Towards the web of concepts: Extracting concepts from large datasets

Aditya Parameswaran, Hector Garcia-Molina, Anand Rajaraman

Research output: Contribution to journal › Article › peer-review


Concepts are sequences of words that represent real or imaginary entities or ideas that users are interested in. As a first step towards building a web of concepts that will form the backbone of the next generation of search technology, we develop a novel technique to extract concepts from large datasets. We approach the problem of concept extraction from corpora as a market-basket problem, adapting statistical measures of support and confidence. We evaluate our concept extraction algorithm on datasets containing data from a large number of users (e.g., the AOL query log data set), and we show that a high-precision concept set can be extracted.
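As a rough illustration of the market-basket framing, the sketch below scores word sequences from a toy query list by support and confidence. The abstract does not give the paper's exact measures or thresholds, so everything here is an assumption for illustration: support is taken to be the number of times a sequence occurs, confidence is taken to be the sequence's support divided by the support of its prefix or suffix, and the names `ngram_counts`, `candidate_concepts`, `min_support`, and `min_confidence` are hypothetical.

```python
from collections import Counter


def ngram_counts(queries, max_len=3):
    """Count every contiguous word sequence of up to max_len words."""
    counts = Counter()
    for query in queries:
        words = query.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts


def candidate_concepts(queries, min_support=2, min_confidence=0.5, max_len=3):
    """Keep multi-word sequences that pass both thresholds.

    Assumed definitions (not taken from the paper):
      support            = occurrences of the sequence
      prefix confidence  = support / support of the sequence minus its last word
      suffix confidence  = support / support of the sequence minus its first word
    """
    counts = ngram_counts(queries, max_len)
    kept = {}
    for gram, support in counts.items():
        if len(gram) < 2 or support < min_support:
            continue
        prefix_conf = support / counts[gram[:-1]]
        suffix_conf = support / counts[gram[1:]]
        if min(prefix_conf, suffix_conf) >= min_confidence:
            kept[" ".join(gram)] = (support, prefix_conf, suffix_conf)
    return kept


# Toy data, invented for this example.
queries = [
    "new york hotels",
    "new york weather",
    "cheap new york flights",
    "new car prices",
]
found = candidate_concepts(queries)
print(found)
```

On this toy input only "new york" survives: it occurs three times, while every other multi-word sequence occurs once and falls below the assumed support threshold.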

Original language: English (US)
Pages (from-to): 566-577
Number of pages: 12
Journal: Proceedings of the VLDB Endowment
Issue number: 1
State: Published - Sep 2010

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)

