Deriving link-context from HTML tag tree

Research output: Contribution to conference; Paper; peer-review

Abstract

HTML anchors are often surrounded by text that seems to describe the destination page appropriately. The text surrounding a link, or the link-context, is used for a variety of tasks associated with Web information retrieval. These tasks can benefit from identifying regularities in the manner in which "good" contexts appear around links. In this paper, we describe a framework for conducting such a study. The framework serves as an evaluation platform for comparing various link-context derivation methods. We apply the framework to a sample of Web pages obtained from more than 10,000 different categories of the Open Directory Project (ODP). Our focus is on understanding the potential merits of using a Web page's tag tree structure for deriving link-contexts. We find that good link-context can be associated with the tag tree hierarchy. Our results show that climbing up the tag tree when the link-context provided by greater depths is too short can provide better performance than some of the traditional techniques.
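The idea of climbing the tag tree when the context is too short can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify how "too short" is measured, so the word-count threshold `min_words`, the helper names (`Node`, `TreeBuilder`, `find_anchor`, `link_context`), and the use of Python's standard `html.parser` module are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so the builder must not descend into them.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """One element of a simplified tag tree (hypothetical helper)."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []  # mix of Node objects and text strings

    def text(self):
        """All text under this node, with whitespace normalized."""
        parts = [c if isinstance(c, str) else c.text() for c in self.children]
        return " ".join(" ".join(parts).split())


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML; tolerant of unclosed tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data)


def find_anchor(node, href):
    """Depth-first search for the first <a> whose href matches exactly."""
    if node.tag == "a" and node.attrs.get("href") == href:
        return node
    for child in node.children:
        if not isinstance(child, str):
            found = find_anchor(child, href)
            if found is not None:
                return found
    return None


def link_context(html, href, min_words=5):
    """Start at the anchor and climb to ancestors until the text is long enough.

    min_words is an assumed threshold, not a value taken from the paper.
    Returns None when no matching anchor exists.
    """
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    node = find_anchor(builder.root, href)
    if node is None:
        return None
    while len(node.text().split()) < min_words and node.parent is not None:
        node = node.parent
    return node.text()


SAMPLE = ('<div><p>Resources on graph theory and combinatorics: '
          '<a href="http://example.org/g">Graphs</a></p>'
          '<p>Other unrelated paragraph text here.</p></div>')
```

Calling `link_context(SAMPLE, "http://example.org/g", min_words=1)` stops at the anchor text itself, while a larger `min_words` moves the context up to the enclosing paragraph and then to the enclosing `div`. A higher threshold trades precision for coverage, since text from sibling elements is pulled in as the climb continues.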

Original language: English (US)
Pages: 49-55
Number of pages: 7
State: Published - 2003
Externally published: Yes
Event: 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, DMKD '03 - San Diego, CA, United States
Duration: Jun 13 2003 - Jun 13 2003

Conference

Conference: 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, DMKD '03
Country/Territory: United States
City: San Diego, CA
Period: 6/13/03 - 6/13/03

Keywords

  • DOM
  • Link-context
  • Tag tree

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Computer Science Applications
