Abstract

To enable information integration, schema matching is a critical step for discovering semantic correspondences of attributes across heterogeneous sources. As a new attempt, this paper studies such matching as a data mining problem. Specifically, although complex matchings are common, most existing techniques focus on simple 1:1 matchings because the search space for complex matchings is far more complex. To tackle this challenge, this paper takes a conceptually novel approach by viewing schema matching as correlation mining, for our task of matching Web query interfaces to integrate the myriad databases on the Internet. On this "deep Web," query interfaces generally form complex matchings between attribute groups (e.g., {author} corresponds to {first name, last name} in the Books domain). We observe that the co-occurrence patterns across query interfaces often reveal such complex semantic relationships: grouping attributes (e.g., {first name, last name}) tend to be co-present in query interfaces and are thus positively correlated. In contrast, synonym attributes are negatively correlated because they rarely co-occur. This insight enables us to discover complex matchings by a correlation mining approach, which consists of dual mining of positive and negative correlations. We evaluate our approach on deep Web sources in several object domains (e.g., Books and Airfares), and the results show that the correlation mining approach does discover semantically meaningful matchings among attributes.
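The co-occurrence intuition in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not name a correlation measure, so the phi coefficient is used here as a stand-in, and the function names, thresholds, and toy schemas are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def phi(schemas, a, b):
    """Phi coefficient between the presence of attributes a and b.

    ASSUMPTION: phi is a stand-in measure chosen for illustration;
    the abstract does not say which correlation measure is used.
    """
    n = len(schemas)
    n11 = sum(1 for s in schemas if a in s and b in s)
    n10 = sum(1 for s in schemas if a in s and b not in s)
    n01 = sum(1 for s in schemas if a not in s and b in s)
    n00 = n - n11 - n10 - n01
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # An attribute present in every schema (or in none) carries no signal.
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


def mine_correlations(schemas, pos_threshold=0.5, neg_threshold=-0.5):
    """Split attribute pairs into grouping and synonym candidates.

    The thresholds are hypothetical values picked for the toy data below.
    """
    attributes = sorted(set().union(*schemas))
    grouping, synonyms = [], []
    for a, b in combinations(attributes, 2):
        score = phi(schemas, a, b)
        if score >= pos_threshold:
            grouping.append((a, b))   # co-present: positively correlated
        elif score <= neg_threshold:
            synonyms.append((a, b))   # rarely co-occur: negatively correlated
    return grouping, synonyms


# Toy query interfaces in a Books-like domain (made-up data).
schemas = [
    {"author", "title"},
    {"first name", "last name", "title"},
    {"author", "title"},
    {"first name", "last name", "title"},
]
grouping, synonyms = mine_correlations(schemas)
print("grouping candidates:", grouping)
print("synonym candidates:", synonyms)
```

On this toy input, first name and last name come out as a grouping candidate, and author comes out as a synonym candidate of each of them, which is consistent with the {author} = {first name, last name} example. Combining such pairs into full n:m matchings would need a further step that this sketch does not attempt.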

Original language: English (US)
Title of host publication: Workshop Proceedings - The 9th Workshop on Research Issues in Data Mining and Knowledge Discovery, DMKD 2004, In Conjunction with ACM SIGMOD International Conference on Management of Data, SIGMOD-04
Pages: 3-10
Number of pages: 8
State: Published - 2004
Event: 9th Workshop on Research Issues in Data Mining and Knowledge Discovery, DMKD 2004, In Conjunction with ACM SIGMOD International Conference on Management of Data, SIGMOD-04 - Paris, France
Duration: Jun 13 2004 to Jun 13 2004

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

ASJC Scopus subject areas

  • Software
  • Information Systems

Title: Mining complex matchings across web query interfaces