Carleton University
Technical Report TR-95-03
February 1995

Red-Black Balanced Trie Hashing

E. J. Otoo

Abstract

Trie hashing is a scheme, proposed by Litwin, for indexing records with very long alphanumeric keys. The records are grouped into buckets of capacity b records per bucket and maintained on secondary storage. To retrieve a record, the memory-resident trie is traversed from the root to a leaf node, where the address of the target bucket is found. Using this address, the data bucket is read into memory and searched to determine the presence or absence of the record. For all practical purposes, the scheme locates a record in one or two disk accesses. Unlike a trie, the scheme suffers from: i) potential degeneracy when the keys are inserted in order, and ii) expensive reconstruction cost if a system failure occurs during a session. We present a new approach to implementing trie hashing that resolves the problem of potential degeneracy. Our approach combines the basic trie hashing algorithm with the balancing techniques of the Red-Black binary search tree to produce a relatively balanced trie hashing scheme. As a result, we ensure that the trie has height O(log n_p), where n_p is the number of buckets, and we achieve an average data storage utilization of 67%, which is typical of storage organizations based on bucket splitting. Our method considerably improves upon the performance of the original trie hashing scheme.
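The lookup-and-split cycle described in the abstract can be illustrated with a minimal Python sketch. It is a deliberate simplification, not Litwin's algorithm and not the Red-Black variant of this report. Internal nodes store whole separator strings (the shortest prefix of the new bucket's first key that still exceeds the old bucket's last key, as in a simple prefix B-tree) rather than Litwin's digit-and-position pairs. The "disk" is a Python list of buckets, and the names `TrieHashFile`, `shortest_separator` and `height` are hypothetical. No balancing is performed, so inserting keys in sorted order builds the degenerate right spine the abstract warns about.

```python
import random
from bisect import insort
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Leaf:
    bucket: int  # address of a data bucket in the simulated secondary storage


@dataclass
class Node:
    sep: str       # keys < sep go left, keys >= sep go right (simplified routing rule)
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def shortest_separator(lo: str, hi: str) -> str:
    """Shortest prefix s of hi with lo < s <= hi (assumes lo < hi)."""
    for k in range(1, len(hi) + 1):
        if hi[:k] > lo:
            return hi[:k]
    raise ValueError("lo must be strictly less than hi")


def height(t: Tree) -> int:
    """Number of internal nodes on the longest root-to-leaf path."""
    return 0 if isinstance(t, Leaf) else 1 + max(height(t.left), height(t.right))


class TrieHashFile:
    def __init__(self, b: int = 4) -> None:
        self.b = b                              # bucket capacity in records
        self.buckets: List[List[str]] = [[]]    # simulated disk: address -> sorted keys
        self.root: Tree = Leaf(0)               # memory-resident index

    def _descend(self, key: str) -> Tuple[Leaf, Optional[Node], bool]:
        # Walk the in-memory tree from the root to the leaf holding the bucket address.
        node, parent, went_left = self.root, None, False
        while isinstance(node, Node):
            parent, went_left = node, key < node.sep
            node = node.left if went_left else node.right
        return node, parent, went_left

    def search(self, key: str) -> bool:
        leaf, _, _ = self._descend(key)
        return key in self.buckets[leaf.bucket]  # the single simulated "disk read"

    def insert(self, key: str) -> None:
        leaf, parent, went_left = self._descend(key)
        bucket = self.buckets[leaf.bucket]
        if key in bucket:
            return
        insort(bucket, key)
        if len(bucket) <= self.b:
            return
        # Overflow: split the b+1 sorted keys around the middle into a new bucket.
        m = len(bucket) // 2
        sep = shortest_separator(bucket[m - 1], bucket[m])
        self.buckets.append(bucket[m:])
        del bucket[m:]
        # Replace the leaf by an internal node whose children are the old and new buckets.
        new_node = Node(sep, leaf, Leaf(len(self.buckets) - 1))
        if parent is None:
            self.root = new_node
        elif went_left:
            parent.left = new_node
        else:
            parent.right = new_node


def build(keys: List[str], b: int = 4) -> TrieHashFile:
    f = TrieHashFile(b)
    for k in keys:
        f.insert(k)
    return f


keys = [f"key{i:04d}" for i in range(200)]
ordered = build(keys)                  # keys arrive in sorted order
shuffled_keys = keys[:]
random.Random(1).shuffle(shuffled_keys)
shuffled = build(shuffled_keys)        # same keys, random order
print(height(ordered.root), height(shuffled.root))
```

With sorted input, every insertion lands in the rightmost bucket, so each split adds one level to the tree. A shuffled order gives a much shallower tree. Bounding the height at O(log n_p) regardless of insertion order is what the Red-Black balancing in this report is meant to guarantee.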
