Carleton University
Technical Report TR-99-06
February 1999

Trapezoidal Atribute Cardinality Map: A New Histogram-like Technique for Query Optimization

B. John Oommen & Murali Thiyagarajah

Abstract

Histogram techniques are used to efficiently estimate query result sizes in most of the modern-day database systems. This is done by approximating the frequencies of the attribute values of the underlying data distributions. Even though they have been in use for nearly two decades, there has been no significant mathematical techniques ( other than those used in statistics for traditional histogram approximations) to study them. In a recent work [12), we introduced a new histogram-like approximation strategy, called the Rectangular Attribute Cardinality Map (R-ACM), which approximates the density function within a given sector by a rectangular cell. In this paper, we introduce another histogram-like approximation strategy, called the Trapezoidal Attribute Cardi­nality Map (T-ACM), that aims to approximate the density of the underlying attribute values using the philosophies of numerical integration.
In this new histogram-like approximation method, the density function within a given sector is approximated by a trapezoidal cell, where the slope of the trapezoid is obtained so as to guarantee that the actual probability mass within the cell equals the true probability mass. Analytically, we show that for the T-ACM, the distribu­tion of an attribute value within the sector is Binomially distributed. This permits us to derive worst-case and average-case results for the estimation errors of the probabil­ity mass itself. Our theoretical results, which include a rigorous maximum likelihood and expected-case analyses, and an extensive set of experiments demonstrate that the T-ACM scheme (which is essentially histogram-like) is much more accurate than the traditional equi-width and equi-depth histograms for query result size estimation. Due to its high accuracy and low construction costs, we hope that, along with the R-ACM introduced in [12], the T-ACM could become an invaluable tool for query optimization in the future database systems.

TR-99-06.pdf