Carleton University
Technical Report TR-01-06
July 2001
The Efficiency of Modern-Day Histogram-Like Techniques for Query Optimization
Abstract
One of the most dicult tasks in modern day Database Management Systems (DBMS) is information retrieval. Basically, this task involves a user query, written in a high-level language such as the Structured Query Language (SQL), and some internal operations, which are transparent to the user. The internal operations are carried out through very complex modules that decompose, optimize and execute the dif- ferent operations. We consider the problem of Query Optimization which consists of the system choosing, among many dierent Query Evaluation Plans (QEP), the most economical one. Since the number of QEPs increases exponentially as the number of relations involving the query is larger, query optimization is a very complex problem.Many estimation techniques have been developed in order to approximate the cost of a QEP. Histogram- based techniques are the most used methods in this context. In this paper, we discuss the eciency of some of these methods: Equi-width, Equi-depth, the Rectangular Attribute Cardinality Map (R-ACM), and the Trapezoidal Attribute Cardinality Map (T-ACM). These methods are used to estimate the cost of the dierent QEPs and choose the best one. It has been shown that the errors of the estimates from the R-ACM and T-ACM are signicantly less than the corresponding errors obtained from the Equi-width and Equi-depth. This fact has been formally demonstrated using reasonable statistical distributions for the cost of a QEP, the doubly exponential distribution and the normal distribution. For the empirical analysis, we have developed a formal, rigorous prototype model used to analyze these methods on random databases. Our empirical results demonstrate that the R-ACM chooses a superior QEP more than three times as often as the Equi-width and Equi-depth. In the case of the T-ACM the ratio is even greater than four. Indeed, in the most general scenario, we analytically prove that under certain models, the better the accuracy of an estimation technique, the greater the probability of choosing the most ecient QEP.