Schema matching is a critical problem for integrating heterogeneous information sources. Traditionally, matching multiple schemas has relied on finding pairwise attribute correspondences in isolation. In contrast, we propose a new matching paradigm, holistic schema matching, that matches many schemas at the same time and finds all matchings at once. By handling a set of schemas together, we can exploit their context information, which reflects the semantic correspondences among attributes; such information is not available when schemas are matched only in pairs. As realizations of holistic schema matching, we develop two alternative approaches: global evaluation and local evaluation. Global evaluation exhaustively assesses all possible "models," where a model expresses all attribute matchings. In particular, we propose the MGS framework for global evaluation, building on the hypothesis that a hidden schema model exists and probabilistically generates the observed schemas. Local evaluation, on the other hand, independently assesses each individual matching to construct such a model incrementally. Specifically, we develop the DCM framework for local evaluation, building on the observation that co-occurrence patterns across schemas often reveal complex relationships among attributes. We apply both approaches to matching query interfaces on the deep Web. The results show the effectiveness of both MGS and DCM, which together demonstrate the promise of holistic schema matching.
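The co-occurrence intuition behind local evaluation can be illustrated with a small sketch. The idea is that attributes which form a group (such as a first and a last name) tend to appear together across schemas, while synonymous attributes rarely appear in the same schema. Everything below is an assumption for illustration: the toy schemas, the attribute names, the hypothetical `mine_pairs` function, and its thresholds are invented. The score is plain Jaccard similarity over the sets of schemas that contain each attribute, a deliberately simple stand-in and not the correlation measure that DCM itself defines.

```python
from itertools import combinations

# Hypothetical toy "query interfaces": each schema is the set of attribute names it exposes.
SCHEMAS = [
    {"author", "title", "subject"},
    {"author", "title", "isbn"},
    {"name", "title", "keyword"},
    {"first name", "last name", "title"},
    {"first name", "last name", "category"},
    {"name", "title", "publisher"},
]


def occurrences(schemas):
    """Map each attribute to the set of schema indices in which it appears."""
    occ = {}
    for i, schema in enumerate(schemas):
        for attr in schema:
            occ.setdefault(attr, set()).add(i)
    return occ


def jaccard(a_ids, b_ids):
    """Fraction of schemas containing a or b that contain both (0.0 if neither occurs)."""
    union = a_ids | b_ids
    return len(a_ids & b_ids) / len(union) if union else 0.0


def mine_pairs(schemas, min_support=2, group_threshold=0.8):
    """Return (grouping, synonym) candidate pairs from co-occurrence alone.

    Stand-in scoring (an assumption, not DCM's measure):
    grouping: frequent attributes whose Jaccard score is at least group_threshold.
    synonym:  frequent attributes that never appear in the same schema.
    """
    occ = occurrences(schemas)
    # Drop rare attributes: an attribute seen in fewer than min_support schemas
    # says little about how it correlates with others.
    frequent = sorted(a for a, ids in occ.items() if len(ids) >= min_support)
    grouping, synonym = [], []
    for a, b in combinations(frequent, 2):
        score = jaccard(occ[a], occ[b])
        if score >= group_threshold:
            grouping.append((a, b))
        elif score == 0.0:
            synonym.append((a, b))
    return grouping, synonym


if __name__ == "__main__":
    grouping, synonym = mine_pairs(SCHEMAS)
    print("grouping:", grouping)
    print("synonym:", synonym)
```

On this toy data, `first name` and `last name` come out as the only grouping pair, and the synonym candidates are the pairs drawn from `author`, `name`, and the two name parts. Together these hint at a complex matching {author} = {name} = {first name, last name}. The sketch stops at pairwise candidates; assembling them into complete many-to-many matchings and ranking the results would require further steps that this example does not attempt.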