HyLiEn: A hybrid approach to general list extraction on the web

Fabio Fumarola, Tim Weninger, Rick Barber, Donato Malerba, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We consider the problem of automatically extracting general lists from the Web. Existing approaches mostly depend on either the underlying HTML markup or the visual structure of the Web page. We present HyLiEn, an unsupervised, Hybrid approach for automatic List discovery and Extraction on the Web. It employs general assumptions about the visual rendering of lists and the structural representation of the items they contain. We show that our method significantly outperforms existing methods.

Original language: English (US)
Title of host publication: Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011
Pages: 35-36
Number of pages: 2
DOIs: https://doi.org/10.1145/1963192.1963211
State: Published - Apr 29 2011
Event: 20th International Conference Companion on World Wide Web, WWW 2011 - Hyderabad, India
Duration: Mar 28 2011 to Apr 1 2011

Publication series

Name: Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011

Other

Other: 20th International Conference Companion on World Wide Web, WWW 2011
Country: India
City: Hyderabad
Period: 3/28/11 to 4/1/11

Keywords

  • web information integration
  • web lists
  • web mining

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems

Cite this

Fumarola, F., Weninger, T., Barber, R., Malerba, D., & Han, J. (2011). HyLiEn: A hybrid approach to general list extraction on the web. In Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011 (pp. 35-36). (Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011). https://doi.org/10.1145/1963192.1963211