Advanced Data Structures and Algorithms

This book requires that you first read Data Structures.

This book requires that you first read Algorithms.

This is a book to complement the Data Structures book and the Algorithms book, and assumes these books as prerequisites.

There are two conflicting goals in online book writing:

to be an introduction to fundamental material
to be a complete reference work that includes discussions of tangential material

We've decided to do both. The Data Structures text and the Algorithms text focus on just the fundamentals. This book (Advanced Data Structures and Algorithms) is a place for reference material. The idea is that a student in the span of a year or less can cover those fundamentals and then move on the advanced topics in this book.

Possible topics

Parallel Programming (section; an elaboration of divide and conquer, or pipeline decoupling for streaming algorithms)
Memory Management and Garbage Collection (whole chapter)
NP-Completeness (whole chapter)
Proof of the master theorem (section)
Quadtrees and Octrees and R-trees (chapter)
Trie data structure
B-trees (chapter) (didn't we already cover this in Data Structures/Trees#B Trees and Algorithm Implementation/Trees/B+ tree ?)
(pseudo) random number generators (see Statistics/Numerical Methods/Random Number Generation, etc.)
Introspective Sort (section; a hybrid sorting algorithm)
Closures ( What Is Closure? )
Continuations ( continuations )
Cache-Aware and Cache-Oblivious Algorithms
Approximation
Maximum Flow
Primality testing (see Algorithm Implementation/Mathematics/Primality Testing, Algorithm Implementation/Mathematics/Prime number generation, Some Basic and Inefficient Prime Number Generating Algorithms, Cryptography/Mathematical Background#Primality Testing, Discrete Mathematics/Number theory#Testing for primality, Cryptography/RSA#Key Generation, etc.)

While there is no content here yet, please feel free to start writing sections! Because this book is more of a reference, chapters can be more independent of each other. Further, you can assume the reader already knows the basic data-structures and algorithms.

FFT Multiplication*

As covered in the Algorithms book, by breaking an integer into two parts and treating it as a polynomial, we were able to derive a better-than-grade-school multiplication algorithm. Even more gains in efficiency can be made if we break the integer into three parts instead. However, we don't need to stop there: we can generalize the technique of polynomial multiplication and interpolation by breaking an n digit binary number into a degree n polynomial, which breaks the number up into its individual digits (and is the furthest we can divide such a number).

We noticed that evaluating the polynomial representations at points 1, -1, and 0 made the expressions easy, but if we are to use degree-2n polynomials, how can we come up with enough points to use? The trick is to evaluate the polynomial using complex numbers, in particular, by using "roots of unity": that is, complex numbers that are all distance 1 from the origin.

Trie data structure

A Trie, also called digital tree and sometimes radix tree or prefix tree (as they can be searched by prefixes), is an ordered tree data structure often used to store an associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores a complete key.

Tree-List

w:Rope (computer science)

When dealing with lists, we normally have two choices: either we get $O(1)$ indexed access but $O(N)$ insertion with vectors, or $O(N)$ indexing and $O(1)$ insertion with linked lists. It'd be nice if we could find some middle ground, say with $O(\log N)$ (amortized) running time for all operations. Well, Soren Sandmann has implemented GSequence, which uses a splay tree to achieve exactly that. [TODO: This structure doesn't have a widely accepted name yet. If someone can think of a better name, suggest it.]

I think B-tree algorithms give worst-case O(ln(N)) insertion, deletion, and access by name. Is GSequence better in some way? I've been told that some operating systems stored files and directories on disk as B-trees. --DavidCary 23:39, 21 July 2005 (UTC)

There's a number of balanced trees that give O(ln N) performance. The idea here is just wrapping a list interface around a tree. You might as well just say that you can use a tree since a list interface is only for implementation convenience.

There are various heap data structures that give O(log(N)) insertion, deletion and access by name.

I believe these are called random-access lists. Also, it is possible to get O(1) insertion and O(log N) indexing; e.g. Okasaki's skew-binomial random access lists. If there is an interest, I can upload a demonstration-quality implementation in Common Lisp.