Topical crawling for business intelligence

Gautam Pant, Filippo Menczer

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

The Web provides us with a vast resource for business intelligence. However, the large size of the Web and its dynamic nature make the task of foraging for appropriate information challenging. General-purpose search engines and business portals may be used to gather some basic intelligence. Topical crawlers, driven by richer contexts, can then leverage this basic intelligence to facilitate in-depth and up-to-date research. In this paper we investigate the use of topical crawlers in creating a small document collection that helps locate relevant business entities. The problem of locating business entities is encountered when an organization looks for competitors, partners or acquisitions. We formalize the problem, create a test bed, introduce metrics to measure the performance of crawlers, and compare the results of four different crawlers. Our results underscore the importance of identifying good hubs and exploiting link contexts based on tag trees for accelerating the crawl and improving the overall results.

Original language: English (US)
Title of host publication: Research and Advanced Technology for Digital Libraries
Subtitle of host publication: 7th European Conference, ECDL 2003, Trondheim, Norway, August 17-22, 2003. Proceedings
Editors: Traugott Koch, Ingeborg Torvik Sølvberg
Publisher: Springer
Pages: 233-244
Number of pages: 12
ISBN (Print): 9783540407263
State: Published - 2003
Externally published: Yes

Publication series

Name: Lecture Notes in Computer Science
Volume: 2769
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
