Abstract
HTML anchors are often surrounded by text that appears to describe the destination page appropriately. The text surrounding a link, or the link-context, is used for a variety of tasks in Web information retrieval. These tasks can benefit from identifying regularities in the manner in which "good" contexts appear around links. In this paper, we describe a framework for conducting such a study. The framework serves as an evaluation platform for comparing various link-context derivation methods. We apply the framework to a sample of Web pages obtained from more than 10,000 different categories of the Open Directory Project (ODP). Our focus is on understanding the potential merits of using a Web page's tag tree structure for deriving link-contexts. We find that good link-context can be associated with the tag tree hierarchy. Our results show that climbing up the tag tree when the link-context provided by greater depths is too short can provide better performance than some of the traditional techniques.
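The idea of climbing the tag tree when a link-context is too short can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the method's details, so the word-count threshold `min_words`, the helper names `Node`, `TreeBuilder`, `parse_anchors`, and `link_context`, and the simple tree built with the standard-library `html.parser` are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so the builder must not descend into them.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """A tag tree node; children are either Node objects or text strings."""

    def __init__(self, tag, parent=None, attrs=None):
        self.tag = tag
        self.parent = parent
        self.attrs = dict(attrs or [])
        self.children = []

    def text(self):
        # Concatenate all text in this subtree and normalize whitespace.
        # Simplification: adjacent text fragments are always joined by a space.
        parts = [c if isinstance(c, str) else c.text() for c in self.children]
        return " ".join(" ".join(parts).split())


class TreeBuilder(HTMLParser):
    """Builds a simplified tag tree and records every <a> element.

    Assumption: the markup closes its tags explicitly; implicit end tags
    (for example an unclosed <li>) are not repaired in this sketch.
    """

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current, attrs)
        self.current.children.append(node)
        if tag == "a":
            self.anchors.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data)


def parse_anchors(html):
    """Parse HTML and return the list of anchor nodes in document order."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.anchors


def link_context(anchor, min_words=5):
    """Start at the anchor and climb to the parent while the context is too short."""
    node = anchor
    while len(node.text().split()) < min_words and node.parent is not None:
        node = node.parent
    return node.text()
```

For an anchor such as `<a href="/t">click here</a>` inside a list item, the anchor text alone is shorter than the threshold, so the sketch returns the text of the enclosing `<li>` instead. The threshold trades off context that is too short to be descriptive against context that is broad enough to include unrelated text.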
Original language | English (US) |
---|---|
Pages | 49-55 |
Number of pages | 7 |
State | Published - 2003 |
Externally published | Yes |
Event | 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, DMKD '03 - San Diego, CA, United States; Duration: Jun 13 2003 → Jun 13 2003 |
Conference
Conference | 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, DMKD '03 |
---|---|
Country/Territory | United States |
City | San Diego, CA |
Period | 6/13/03 → 6/13/03 |
Keywords
- DOM
- Link-context
- Tag tree
ASJC Scopus subject areas
- Computer Graphics and Computer-Aided Design
- Computer Science Applications