Collaborative wrapping: A turbo framework for Web data extraction

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

To access data sources on the Web, a crucial step is wrapping, which translates query responses, rendered in textual HTML, back into their relational form. Traditionally, this problem has been addressed with syntax-based approaches for a single source. However, as online databases multiply, we often need to wrap multiple sources, in particular for domain-based integration. Observing that sources in the same domain usually share common fields, we propose a novel wrapping concept, collaborative wrapping, in which multiple sources are extracted concurrently with content-based synchronization to produce consentaneous extractions. Toward this concept, recognizing wrapping as a communication process, we develop the turbo wrapper, building on the insight of turbo codes, a multi-code decoding scheme in information theory. Our experiment shows that the turbo wrapper consistently outperforms baseline single-source methods, is robust, and benefits from larger scales of source collaboration.

Original language: English (US)
Title of host publication: 23rd International Conference on Data Engineering, ICDE 2007
Pages: 1261-1262
Number of pages: 2
State: Published - Sep 24 2007
Event: 23rd International Conference on Data Engineering, ICDE 2007 - Istanbul, Turkey
Duration: Apr 15 2007 - Apr 20 2007

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627

Other

Other: 23rd International Conference on Data Engineering, ICDE 2007
Country: Turkey
City: Istanbul
Period: 4/15/07 - 4/20/07

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems
