Abstract
Concepts are sequences of words that represent real or imaginary entities or ideas that users are interested in. As a first step towards building a web of concepts that will form the backbone of the next generation of search technology, we develop a novel technique to extract concepts from large datasets. We approach the problem of concept extraction from corpora as a market-basket problem, adapting statistical measures of support and confidence. We evaluate our concept extraction algorithm on datasets drawn from a large number of users (e.g., the AOL query log dataset) and show that a high-precision concept set can be extracted.
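To make the market-basket framing concrete, the following is a minimal sketch under stated assumptions, not the algorithm from the paper. It treats each query as a basket of words, counts the support of each contiguous word sequence as the number of queries containing it, and uses the textbook association-rule confidence (support of the sequence divided by support of its prefix). The abstract says only that these measures are adapted, so the exact definitions, the function names, the thresholds, and the toy queries here are all hypothetical.

```python
from collections import Counter


def ngrams(words, n):
    """Return the set of contiguous n-word tuples in a list of words."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def support_counts(queries, max_len=3):
    """Count, for each k-gram with k <= max_len, how many queries contain it."""
    counts = Counter()
    for query in queries:
        words = query.lower().split()
        for n in range(1, max_len + 1):
            # ngrams() returns a set, so each query contributes at most once.
            counts.update(ngrams(words, n))
    return counts


def confidence(counts, gram):
    """Textbook confidence of the rule prefix -> last word (an assumption,
    not necessarily the adapted measure used in the paper)."""
    prefix = gram[:-1]
    return counts[gram] / counts[prefix] if counts[prefix] else 0.0


def extract_concepts(queries, min_support=2, min_confidence=0.6, max_len=3):
    """Keep multi-word sequences that pass both (hypothetical) thresholds."""
    counts = support_counts(queries, max_len)
    return sorted(
        " ".join(gram)
        for gram in counts
        if len(gram) >= 2
        and counts[gram] >= min_support
        and confidence(counts, gram) >= min_confidence
    )


# Toy queries invented for illustration; not taken from any real query log.
queries = [
    "new york hotels",
    "new york weather",
    "cheap flights new york",
    "new car prices",
    "york university",
]
print(extract_concepts(queries))  # → ['new york']
```

Here "new york" appears in three queries and "new" in four, giving a confidence of 0.75, while every other multi-word sequence appears only once and falls below the support threshold.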
| Original language | English (US) |
|---|---|
| Pages (from-to) | 566-577 |
| Number of pages | 12 |
| Journal | Proceedings of the VLDB Endowment |
| Volume | 3 |
| Issue number | 1 |
| State | Published - Sep 2010 |
| Externally published | Yes |
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- General Computer Science