TY - GEN
T1 - Collaborative wrapping
T2 - 23rd International Conference on Data Engineering, ICDE 2007
AU - Chuang, Shui Lung
AU - Chang, Kevin Chen Chuan
AU - Zhai, Cheng Xiang
PY - 2007/9/24
Y1 - 2007/9/24
N2 - To access data sources on the Web, a crucial step is wrapping, which translates query responses, rendered in textual HTML, back into their relational form. Traditionally, this problem has been addressed with syntax-based approaches for a single source. However, as online databases multiply, we often need to wrap multiple sources, in particular for domain-based integration. Observing that sources in the same domain usually share common fields, we propose a novel wrapping concept, collaborative wrapping, where multiple sources are extracted concurrently with content-based synchronization to produce consentaneous extractions. Toward this concept, recognizing wrapping as a communication process, we develop the turbo wrapper, building on the insight of turbo codes, a multi-code decoding scheme in information theory. Our experiment shows that the turbo wrapper consistently outperforms baseline single-source methods, is robust, and does benefit from extended scales of source collaboration.
AB - To access data sources on the Web, a crucial step is wrapping, which translates query responses, rendered in textual HTML, back into their relational form. Traditionally, this problem has been addressed with syntax-based approaches for a single source. However, as online databases multiply, we often need to wrap multiple sources, in particular for domain-based integration. Observing that sources in the same domain usually share common fields, we propose a novel wrapping concept, collaborative wrapping, where multiple sources are extracted concurrently with content-based synchronization to produce consentaneous extractions. Toward this concept, recognizing wrapping as a communication process, we develop the turbo wrapper, building on the insight of turbo codes, a multi-code decoding scheme in information theory. Our experiment shows that the turbo wrapper consistently outperforms baseline single-source methods, is robust, and does benefit from extended scales of source collaboration.
UR - http://www.scopus.com/inward/record.url?scp=34548796463&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34548796463&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2007.368988
DO - 10.1109/ICDE.2007.368988
M3 - Conference contribution
AN - SCOPUS:34548796463
SN - 1424408032
SN - 9781424408030
T3 - Proceedings - International Conference on Data Engineering
SP - 1261
EP - 1262
BT - 23rd International Conference on Data Engineering, ICDE 2007
Y2 - 15 April 2007 through 20 April 2007
ER -