Structured databases on the web: Observations and implications

Kevin Chen-Chuan Chang, Bin He, Chengkai Li, Mitesh Patel, Zhen Zhang

Research output: Contribution to journal › Review article

Abstract

The Web has been rapidly "deepened" by the prevalence of databases online. With the potentially unlimited information hidden behind their query interfaces, this "deep Web" of searchable databases is clearly an important frontier for data access. This paper surveys this relatively unexplored frontier, measuring characteristics pertinent to both exploring and integrating structured Web sources. On one hand, our "macro" study surveys the deep Web at large, in April 2004, adopting the random IP-sampling approach, with one million samples. (How large is the deep Web? How is it covered by current directory services?) On the other hand, our "micro" study surveys source-specific characteristics over 441 sources in eight representative domains, in December 2002. (How "hidden" are deep-Web sources? How do search engines cover their data? How complex and expressive are query forms?) We report our observations and publish the resulting datasets to the research community. We conclude with several implications (of our own) which, while necessarily subjective, might help shape research directions and solutions.
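The random IP-sampling idea mentioned in the abstract can be illustrated with a small sketch. It assumes the simplest possible estimator: the fraction of sampled IP addresses found to host a deep-Web site, scaled up to the size of the address space, with a normal-approximation interval. The function name `estimate_site_count` and all numbers in the usage lines are hypothetical and are not taken from the paper, whose actual procedure may differ.

```python
import math


def estimate_site_count(hits, samples, population, z=1.96):
    """Scale a sample proportion up to a population total.

    hits       -- sampled IPs found to host a deep-Web site (hypothetical input)
    samples    -- number of IPs sampled uniformly at random
    population -- size of the address space being sampled
    z          -- normal quantile for the interval (1.96 is roughly 95%)

    Returns (estimate, lower_bound, upper_bound).
    """
    if samples <= 0 or not 0 <= hits <= samples:
        raise ValueError("need samples > 0 and 0 <= hits <= samples")
    p = hits / samples                        # observed proportion
    se = math.sqrt(p * (1 - p) / samples)     # standard error of the proportion
    low = max(0.0, p - z * se) * population   # clamp the interval to [0, 1]
    high = min(1.0, p + z * se) * population
    return p * population, low, high


# Illustrative numbers only; they are not results from the survey.
estimate, low, high = estimate_site_count(
    hits=100, samples=1_000_000, population=2_000_000_000
)
print(f"estimated sites: {estimate:.0f} (interval {low:.0f} to {high:.0f})")
```

With so few hits relative to the sample size, the interval is wide, which is one reason a large sample (one million here) matters for this kind of estimate.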

Original language: English (US)
Pages (from-to): 61-70
Number of pages: 10
Journal: SIGMOD Record
Volume: 33
Issue number: 3
DOI: 10.1145/1031570.1031584
State: Published - Sep 2004

Fingerprint

Search engines
World Wide Web
Macros
Sampling

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Structured databases on the web: Observations and implications. / Chang, Kevin Chen-Chuan; He, Bin; Li, Chengkai; Patel, Mitesh; Zhang, Zhen.

In: SIGMOD Record, Vol. 33, No. 3, 09.2004, p. 61-70.

@article{5915479682f44379be2ec6fbe396ebd0,
title = "Structured databases on the web: Observations and implications",
abstract = "The Web has been rapidly {"}deepened{"} by the prevalence of databases online. With the potentially unlimited information hidden behind their query interfaces, this {"}deep Web{"} of searchable databases is clearly an important frontier for data access. This paper surveys this relatively unexplored frontier, measuring characteristics pertinent to both exploring and integrating structured Web sources. On one hand, our {"}macro{"} study surveys the deep Web at large, in April 2004, adopting the random IP-sampling approach, with one million samples. (How large is the deep Web? How is it covered by current directory services?) On the other hand, our {"}micro{"} study surveys source-specific characteristics over 441 sources in eight representative domains, in December 2002. (How {"}hidden{"} are deep-Web sources? How do search engines cover their data? How complex and expressive are query forms?) We report our observations and publish the resulting datasets to the research community. We conclude with several implications (of our own) which, while necessarily subjective, might help shape research directions and solutions.",
author = "Chang, {Kevin Chen-Chuan} and Bin He and Chengkai Li and Mitesh Patel and Zhen Zhang",
year = "2004",
month = "9",
doi = "10.1145/1031570.1031584",
language = "English (US)",
volume = "33",
pages = "61--70",
journal = "SIGMOD Record",
issn = "0163-5808",
publisher = "Association for Computing Machinery (ACM)",
number = "3",

}

TY - JOUR

T1 - Structured databases on the web

T2 - Observations and implications

AU - Chang, Kevin Chen-Chuan

AU - He, Bin

AU - Li, Chengkai

AU - Patel, Mitesh

AU - Zhang, Zhen

PY - 2004/9

Y1 - 2004/9

AB - The Web has been rapidly "deepened" by the prevalence of databases online. With the potentially unlimited information hidden behind their query interfaces, this "deep Web" of searchable databases is clearly an important frontier for data access. This paper surveys this relatively unexplored frontier, measuring characteristics pertinent to both exploring and integrating structured Web sources. On one hand, our "macro" study surveys the deep Web at large, in April 2004, adopting the random IP-sampling approach, with one million samples. (How large is the deep Web? How is it covered by current directory services?) On the other hand, our "micro" study surveys source-specific characteristics over 441 sources in eight representative domains, in December 2002. (How "hidden" are deep-Web sources? How do search engines cover their data? How complex and expressive are query forms?) We report our observations and publish the resulting datasets to the research community. We conclude with several implications (of our own) which, while necessarily subjective, might help shape research directions and solutions.

UR - http://www.scopus.com/inward/record.url?scp=5444262639&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=5444262639&partnerID=8YFLogxK

U2 - 10.1145/1031570.1031584

DO - 10.1145/1031570.1031584

M3 - Review article

AN - SCOPUS:5444262639

VL - 33

SP - 61

EP - 70

JO - SIGMOD Record

JF - SIGMOD Record

SN - 0163-5808

IS - 3

ER -