## Abstract

We consider the index selection problem. Given either a fixed query workload or an unknown probability distribution on possible future queries, and a bound B on how much space is available to build indices, we seek to build a collection of indices for which the average query response time is minimized. We give strong negative and positive peformance bounds. Let m be the number of queries in the workload. We show how to obtain with high probability a collection of indices using space O(B ln m) for which the average query cost is opt_{B} the optimal performance possible for indices using at most B total space. Moreover, this space relaxation is necessary: unless N P ⊆ n^{o(log log n)}, no polynomial time algorithm can guarantee average query cost less than M^{1-∈} opt_{B} using space αB, for any constant α, where M is the size of the dataset. We quantify the error in performance introduced by running the algorithm on a sample drawn from a query distribution.

Original language | English (US) |
---|---|

Pages | 244-251 |

Number of pages | 8 |

DOIs | |

State | Published - 2003 |

Externally published | Yes |

Event | Twenty second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2003 - San Diego, CA, United States Duration: Jun 9 2003 → Jun 11 2003 |

### Other

Other | Twenty second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2003 |
---|---|

Country/Territory | United States |

City | San Diego, CA |

Period | 6/9/03 → 6/11/03 |

## ASJC Scopus subject areas

- Software
- Information Systems
- Hardware and Architecture