Advanced techniques for high performance query optimization in database systems
Abstract (summary)
Current query optimizers employ many sources of information about the database to optimize queries. Statistics and integrity constraints—primary key and foreign key constraints in particular—have long played a role in the optimizer. Strategies that exploit constraints have been seen to offer good improvement for query evaluation. A problem, however, is that often the “constraints” that would be useful for optimization for a given database are not explicitly available for the optimizer.
In this dissertation we investigate how to discover useful constraint-like characterizations of large data sets via data mining. Specifically, we propose join holes as a new way for characterizing data values for multiple attributes. An efficient and scalable algorithm is presented for identifying all maximal join holes over large data sets. We explore the applications of join holes for query optimization in two areas: query rewrite and join size estimation. We propose SIEQE, a new statistic built over non base tables by means of join holes for cardinality prediction. The proposed techniques extend the functionalities of current query optimizers and provide better estimation of cardinality for join queries in an efficient way.
Database systems nowadays make use of materialized views for the efficiency of query evaluation. Determining which views to materialize, however, is a non-trivial task. We address the view selection problem in this dissertation by presenting an algorithm to automatically recommend vew to materialize for given workload queries. Our work in this area has been implemented in latest DB2 product which demonstrated significant improvement.
Skyline is a proposed query operator that could be useful in expressing preference queries. Computing skyline queries expressed in standard SQL statements would be cumbersome and expensive with current database optimization. We address this by developing sort-filter-skyline, an efficient algorithm that is well-behaved in a relational context, and is easily accommodated by the query optimizer.