A data warehouse is a repository of information collected from multiple, possibly heterogeneous, autonomous,
distributed databases. The information stored in the data warehouse takes the form of views, referred to as materialized
views. Selecting which views to materialize is one of the most important decisions in designing a data warehouse.
Materialized views are stored in the data warehouse so that on-line analytical processing (OLAP) queries can be
answered efficiently. The user's primary concern is query response time. In this paper, we therefore develop
algorithms that select a set of views to materialize in a data warehouse so as to minimize the total view maintenance cost
under the constraint of a given query response time. We call this the query_cost view_selection problem.
First, the cost graph and cost model of the query_cost view_selection problem are presented. Second, methods for
selecting materialized views with randomized algorithms are presented. A genetic algorithm is applied to the
materialized view selection problem. As the genetic process evolves, however, legal solutions become
increasingly difficult to produce, so many solutions are eliminated and the time needed to generate solutions grows.
We therefore present an improved algorithm that combines simulated
annealing with the genetic algorithm to solve the query_cost view_selection problem. Finally, in
order to test the effectiveness and efficiency of our algorithms, we run simulation experiments. The experiments show that
the proposed methods provide near-optimal solutions in limited time and work better in practical cases. Randomized
algorithms will become invaluable tools for data warehouse evolution.
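
To make the problem statement concrete, the following is a minimal sketch of one way a genetic search could be combined with a simulated-annealing acceptance rule. It is an illustration under stated assumptions, not the algorithm developed in this paper: the additive cost model, the constants MAINT, SAVING, BASE_QUERY_COST and MAX_QUERY_COST, and the function select_views are all hypothetical, and the cost graph is replaced by a simple per-view saving.

```python
import math
import random

# Hypothetical toy instance (not from the paper): each candidate view has a
# maintenance cost and an assumed additive reduction in total query cost.
MAINT = [4.0, 3.0, 6.0, 2.0, 5.0, 1.0]
SAVING = [5.0, 2.0, 7.0, 1.0, 4.0, 1.0]
BASE_QUERY_COST = 30.0  # assumed query cost with no views materialized
MAX_QUERY_COST = 20.0   # assumed query response time constraint


def maintenance_cost(sel):
    """Total maintenance cost of the selected views (objective to minimize)."""
    return sum(m for m, bit in zip(MAINT, sel) if bit)


def query_cost(sel):
    """Total query cost under the assumed additive saving model."""
    return BASE_QUERY_COST - sum(s for s, bit in zip(SAVING, sel) if bit)


def is_legal(sel):
    """A solution is legal if it satisfies the query cost constraint."""
    return query_cost(sel) <= MAX_QUERY_COST


def select_views(pop_size=20, generations=200, t0=5.0, cooling=0.98, seed=0):
    rng = random.Random(seed)
    n = len(MAINT)
    # Start from "materialize everything", which is legal for this instance.
    population = [[1] * n for _ in range(pop_size)]
    best = population[0][:]
    temp = t0
    for _ in range(generations):
        for i in range(pop_size):
            parent = population[i]
            mate = population[rng.randrange(pop_size)]
            # Genetic step: one-point crossover followed by a one-bit mutation.
            cut = rng.randrange(1, n)
            child = parent[:cut] + mate[cut:]
            child[rng.randrange(n)] ^= 1
            if not is_legal(child):
                continue  # illegal children are discarded
            # Annealing step: always accept improvements, and accept a worse
            # child with a probability that shrinks as the temperature drops.
            delta = maintenance_cost(child) - maintenance_cost(parent)
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                population[i] = child
                if maintenance_cost(child) < maintenance_cost(best):
                    best = child[:]
        temp *= cooling
    return best
```

Calling select_views() returns a 0/1 list marking the views to materialize. In this sketch the annealing rule lets the population keep some legal but temporarily worse children instead of eliminating them, which is one plausible reading of why such a combination could help when legal solutions are hard to produce.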