Carleton University
Technical Report TR-174
May 1990
A Presortedness Metric for Ensembles of Data Sequences
Abstract
Sorting is a well studied problem in the field of computer science. Many excellent sorting methods are known today, including the recent adaptive methods which sort all sequences of data ( or permutations) and perform particularly well on sequences that are “almost sorted” according to some “presortedness” measure. The methods existing in the literature have focussed on assigning a “presortedness” measure to a single input sequence. The particular measure in question can then be related to the computational complexity of the algorithm being utilized. Such a measure, in itself, cannot be used to predict the expected complexity of au algorithm. A comparison of any two sorting algorithms (in the expected sense) necessarily requires a knowledge of the distribution of the data sequences presented to the sorting program.
In an attempt to capture the nature of the distribution of data sequences presented to a sorting program, we propose a presortedness metric which characterizes the properties of an ensemble of data sequences by a single parameter. This parameter indicates how sorted or unsorted the ensemble of random data sequences is. Various theoretical properties of the metric are derived. The estimation procedure for determining the Maximum Likelihood Estimate of the “Ensemble Presortedness metric” for a given ensemble of data sequences is also included. The results have been experimentally validated.