Carleton University
Technical Report TR-01-05
July 2001

Nearly-Optimal Fano-Based Canonical Codes

Luis G. Rueda & B. John Oommen

Abstract

Statistical coding techniques have been used for a long time in lossless data compression, using meth- ods such as Human’s algorithm, arithmetic coding, Shannon’s method, Fano’s method, etc. Most of these methods can be implemented either statically or adaptively. Canonical codes, in which the code words are arranged in a lexicographical order, are advantageous because they can be decoded extremely expediently. Although Human’s algorithm is optimal, the generation of a canonical Human code is not straightforward. Conversely, while the Fano coding is sub-optimal, it can lead to canonical codes.In this paper, we resolve the dilemma by focusing on the static implementation of Fano’s method. By taking advantage of the properties of the encoding schemes generated by this method, and the concept of code word arrangement, we present an enhanced version of the static Fano’s method, namely Fano+. We formally analyze Fano+ by presenting some properties of Fano trees, and the theory of list rearrangements. Our enhanced algorithm achieves compression ratios arbitrarily close to those of Human’s algorithm. Empirical results on les of the Calgary corpus and the Canterbury corpus corroborate the almost-optimal eciency of our enhanced algorithm and its canonical nature. We believe that the compression eciency of Fano+ can be made to attain the compression ratios of the best known schemes if a structure model of the data is also incorporated.

TR-01-05.pdf