Computer Science  Data Structures

There are many different data structures. It is important to understand the advantages and disadvantages of each, so that you know which data structure is most appropriate for any given situation. This tutorial will briefly describe several common data structures, when they should be used, and lists their pros and cons. Additionally, there are several tables at the end listing the computational complexity of various operations on the data structures.

A Primer on Computational Complexity

The performance of algorithms and data structures is rarely described in terms of absolute time. Saying that an operation takes 7 ms is only accurate for a specific computer running on a specific input. Instead performance is described relative to the input size. Commonly used is big-O notation, which is an upper bound on an algorithms performance. For example, a linear time algorithm takes time proportional to the input size. If the size of the input doubles, so does the amount of time to perform the operation. In big-O notation, the complexity of a linear time algorithm would be written as O(n).

Big-O notation, however, does not provide all the information that may be needed. It is a measure of asymptotic performance, only accounting for how the complexity increases for large inputs. Only the fastest growing term affects the asymptotic behavior, and so the other terms are not included. Still, big-O notation provides a reasonable estimate of the performance for most operations on inputs larger than 20 elements.

Another computational complexity idea is amortized cost. This can be thought of as an average. While it may occasionally take more time to run the operation, any sequence of n operations will average no more than the amortized cost per operation. Determining the complexity of algorithms is beyond the scope of this tutorial, but reference tables are provided at the end of the post for the data structures discussed here.

Common Computational Complexities
Notation Common name Example
O(1) Constant Adding two numbers
O(log n) Logarithmic Finding an element in a sorted list
O(n) Linear Finding an element in an unsorted list
O(n log n) Log-linear Sorting a list
O(n^2) Quadratic Testing collisions between every pair of objects in a group
O(n^3) Cubic Matrix multiplication
O(c^n), c>1 Exponential Guessing a password by brute force

Dynamic Array

Similar data structures / aliases: Vector, Resizable array, Array list

The simplest dynamic array is a wrapper around a static array. When the original array runs out of space, a new larger array is allocated, and the contents of the original array are copied to the new array. Elements are accessed like a static array, although many implementations provide additional safety checks when doing this. Normally, dynamic arrays grow only from one end. This results in better performance for operations on that end. There are, however, some implementations that can grow from either size at the cost of additional memory.

Dynamic arrays are useful just about everywhere an ordinary array is. Their ability to grow and shrink in size makes them more versatile, but they retain their very fast indexing and iterations. Inserting and deleting elements at the end of a dynamic array is also fast, but doing the same anywhere else means all the elements after the modification must be copied and moved.

Another concern is that while the amortized cost of each insert at the end is constant, the worst case performance is linear. This is because if the reserved space is full, the entire array must be copied. This can lead to the occasional unexpectedly expensive operation. For this reason, you need to be careful when using dynamic arrays in a real-time application. If a very large dynamic array needs to be grow, it can hold up the rest of the program. For smaller dynamic arrays this is not much of an issue because memory copying is normally a very fast (though linear time) operation.

Pros

  • Very fast indexing and iteration
  • Memory coherency

Cons

  • Adding element at end may take O(n) time (but usually not)
  • Adding elements in middle is slow
  • Wastes space is array isnt full

Linked List

Similar structures/aliases: Singly linked list, Doubly linked list

A linked list wraps each element in a node. For a singly linked list, each node contains a reference to the next node. A doubly linked list contains references to both the next and previous nodes. The advantages of a doubly linked list is that deletions becomes easier and the list can be traversed in either direction, but this comes at the cost of more memory overhead.

Linked lists a good data structure is you will frequently add or delete elements at both ends or if you will need to insert and delete elements while iterating through the list. They are, however, very poor at random element access. Iterating over the list is also a bit slower than dynamic arrays, because of the number of references that need to be traversed.

Pros

  • Modifications made while iterating over the list are fast
  • Insertions/deletions at both beginning and end of list are fast

Cons

  • Slower iterations than arrays
  • Extra memory overhead for each element
  • Random access is slow

Binary Tree

Similar structures/aliases: Red-black tree

A binary tree consists of nodes, each with two children. The left child is always less than the parent and the right child is always greater. This simple structure allows most operations on a binary tree to be carried out in logarithmic time. More advanced variations, such as red-black trees, keep the tree balanced so that the worst case performance remains logarithmic.

Almost all operations on binary trees have logarithmic performance. This is both a strength and weakness. While the operations are not the fastest, they are very consistent. The average case and worst case performances are the same. Elements in a binary tree must also have an ordering such as alphabetically.

Pros

  • Logarithmic performance for most operations
  • Elements always remain sorted

Cons

  • Logarithmic performance for most operations
  • Slower iteration
  • Elements must have an ordering

Hash Map

Similar structures/aliases: Hash table

A hash map consists of a set of keys, each uniquely mapping to an element. The keys are transformed by an algorithm into an integer which is then used as an index into an array. The array is where the actual elements are stored. Because of the limited size of the array, collisions may occur when two keys hash to the same value. For these cases, collision resolution routines must be used which reduce performance.

A hash map is best at storing a large number of elements that need to be quickly located. Find, insert, and remove all operate in constant time so even very large data sets can be operated on efficiently. This performance does have a cost. While, the operations may be constant time, that constant can be quite large. Consider hashing by name. Every character of the name must be run through some algorithm so that it produces a sufficiently random hash value. For a long name, this could take some time. For this reason, once an element has been found, a reference to it should be kept so that it does not need to be located repeatedly. There is also a significant memory cost for hash maps. To reduce the number of collisions, the hash map must be significantly larger than the number of elements in it.

Pros

  • Constant time find, insert, and delete

Cons

  • Hashing function may be slow
  • Performance degrades as table fills up
  • Large memory overhead

Complexity Reference

Lists

Lists are your basic interface. They allow for random access (i.e. elements can be looked at, inserted, or removed at any index). The iteration speed is O(n) for almost all data structures, so instead the speed relative to one another is listed. Binary trees and hash maps are also listed even though they dont support all of the list operations.

Peek at first / middle / last Insert first / middle / last Remove first / middle / last Iteration
Dynamic Array O(1) O(n) / O(n) / O(1)* O(n) / O(n) / O(1) Fastest
Singly Linked List O(1) / O(n) / O(1) O(1) O(1) / O(n) / O(n) Fast
Doubly Linked List O(1) / O(n) / O(1) O(1) O(1)** Fast
Binary Tree O(log n) - - Moderate
Hash Map - - - Slow

* amortized cost, ** does not include search time

Sorted Lists

A sorted list acts like a list except items are indexed in order.

Peek at first / middle / last Insert Remove first / middle / last Find
Dynamic Array (unsorted) O(1) O(1)* O(n) O(n)
Dynamic Array (sorted) O(1) O(n) O(n) O(log n)
Singly Linked List (sorted) O(1) / O(n) / O(1) O(n) O(1) / O(n) / O(n) O(n)
Doubly Linked List (sorted) O(1) / O(n) / O(1) O(n) O(1) / O(n) / O(n) O(n)
Binary Tree O(log n) O(log n) O(log n) O(log n)

* amortized cost

Stacks

In a stack, an element can be inserted at the end (pushed) or removed from the end (popped). This means that the last element added will always be the first to be removed. Most implementations also allow the last element to be looked at without modifying the stack.

Peek at first Insert Remove
Dynamic Array O(1) O(1)* O(1)
Singly Linked List O(1) O(1) O(n)
Doubly Linked List O(1) O(1) O(1)

* amortized cost

Queues

In a queue elements can be added to the end and removed from the front. The element that was in the queue the longest is therefore the first to be removed.

Peek at first Insert Remove
Dynamic Array O(1) O(1)* O(1)
Singly Linked List O(1) O(1) O(1)
Doubly Linked List O(1) O(1) O(1)

* amortized cost

Double Ended Queues

A double ended queue allows elements to be added and removed at both the front and back.

Peek at first / last Insert first / last Remove first / last
Dynamic Array O(1) O(1)* O(1)
Singly Linked List O(1) O(1) O(1) / O(n)
Doubly Linked List O(1) O(1) O(1)

* amortized cost

Maps

A map identifies each element not by an index, but by a unique key (such as a name). In the case of a sorted dynamic array or binary tree, the keys must have an ordering (i.e. alphabetically).

Insert/remove Find
Dynamic Array (sorted) O(n) O(log n)
Binary Tree O(log n) O(log n)
Hash Map O(1) O(1)

-Eric

Leave a comment ?

0 Comments.

Leave a Comment

* Copy this password:

* Type or paste password here:

976 Spam Comments Blocked so far by Spam Free Wordpress


NOTE - You can use these HTML tags and attributes:
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Performance Optimization WordPress Plugins by W3 EDGE