TY - CHAP
T1 - Topical crawling for business intelligence
AU - Pant, Gautam
AU - Menczer, Filippo
PY - 2003
Y1 - 2003
N2 - The Web provides us with a vast resource for business intelligence. However, the large size of the Web and its dynamic nature make the task of foraging appropriate information challenging. General-purpose search engines and business portals may be used to gather some basic intelligence. Topical crawlers, driven by richer contexts, can then leverage on the basic intelligence to facilitate in-depth and up-to-date research. In this paper we investigate the use of topical crawlers in creating a small document collection that helps locate relevant business entities. The problem of locating business entities is encountered when an organization looks for competitors, partners or acquisitions. We formalize the problem, create a test bed, introduce metrics to measure the performance of crawlers, and compare the results of four different crawlers. Our results underscore the importance of identifying good hubs and exploiting link contexts based on tag trees for accelerating the crawl and improving the overall results.
AB - The Web provides us with a vast resource for business intelligence. However, the large size of the Web and its dynamic nature make the task of foraging appropriate information challenging. General-purpose search engines and business portals may be used to gather some basic intelligence. Topical crawlers, driven by richer contexts, can then leverage on the basic intelligence to facilitate in-depth and up-to-date research. In this paper we investigate the use of topical crawlers in creating a small document collection that helps locate relevant business entities. The problem of locating business entities is encountered when an organization looks for competitors, partners or acquisitions. We formalize the problem, create a test bed, introduce metrics to measure the performance of crawlers, and compare the results of four different crawlers. Our results underscore the importance of identifying good hubs and exploiting link contexts based on tag trees for accelerating the crawl and improving the overall results.
UR - http://www.scopus.com/inward/record.url?scp=35048900251&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=35048900251&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-45175-4_22
DO - 10.1007/978-3-540-45175-4_22
M3 - Chapter
AN - SCOPUS:35048900251
SN - 9783540407263
T3 - Lecture Notes in Computer Science
SP - 233
EP - 244
BT - Research and Advanced Technology for Digital Libraries
A2 - Koch, Traugott
A2 - Sølvberg, Ingeborg Torvik
PB - Springer
ER -