TY - GEN
T1 - Mining complex matchings across web query interfaces
AU - He, Bin
AU - Chang, Kevin Chen-Chuan
AU - Han, Jiawei
PY - 2004
Y1 - 2004
N2 - To enable information integration, schema matching is a critical step for discovering semantic correspondences of attributes across heterogeneous sources. As a new attempt, this paper studies such matching as a data mining problem. Specifically, while complex matchings are common, because of their far more complex search space, most existing techniques focus on simple 1:1 matchings. To tackle this challenge, this paper takes a conceptually novel approach by viewing schema matching as correlation mining, for our task of matching Web query interfaces to integrate the myriad databases on the Internet. On this "deep Web," query interfaces generally form complex matchings between attribute groups (e.g., {author} corresponds to {first name, last name} in the Books domain). We observe that the co-occurrence patterns across query interfaces often reveal such complex semantic relationships: grouping attributes (e.g., {first name, last name}) tend to be co-present in query interfaces and thus positively correlated. In contrast, synonym attributes are negatively correlated because they rarely co-occur. This insight enables us to discover complex matchings by a correlation mining approach, which consists of dual mining of positive and negative correlations. We evaluate our approach on deep Web sources in several object domains (e.g., Books and Airfares) and the results show that the correlation mining approach does discover semantically meaningful matchings among attributes.
AB - To enable information integration, schema matching is a critical step for discovering semantic correspondences of attributes across heterogeneous sources. As a new attempt, this paper studies such matching as a data mining problem. Specifically, while complex matchings are common, because of their far more complex search space, most existing techniques focus on simple 1:1 matchings. To tackle this challenge, this paper takes a conceptually novel approach by viewing schema matching as correlation mining, for our task of matching Web query interfaces to integrate the myriad databases on the Internet. On this "deep Web," query interfaces generally form complex matchings between attribute groups (e.g., {author} corresponds to {first name, last name} in the Books domain). We observe that the co-occurrence patterns across query interfaces often reveal such complex semantic relationships: grouping attributes (e.g., {first name, last name}) tend to be co-present in query interfaces and thus positively correlated. In contrast, synonym attributes are negatively correlated because they rarely co-occur. This insight enables us to discover complex matchings by a correlation mining approach, which consists of dual mining of positive and negative correlations. We evaluate our approach on deep Web sources in several object domains (e.g., Books and Airfares) and the results show that the correlation mining approach does discover semantically meaningful matchings among attributes.
UR - http://www.scopus.com/inward/record.url?scp=77954009165&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77954009165&partnerID=8YFLogxK
U2 - 10.1145/1008694.1008696
DO - 10.1145/1008694.1008696
M3 - Conference contribution
AN - SCOPUS:77954009165
SN - 158113908X
SN - 9781581139082
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 3
EP - 10
BT - Workshop Proceedings - The 9th Workshop on Research Issues in Data Mining and Knowledge Discovery, DMKD 2004, In Conjunction with ACM SIGMOD International Conference on Management of Data, SIGMOD-04
T2 - 9th Workshop on Research Issues in Data Mining and Knowledge Discovery, DMKD 2004, In Conjunction with ACM SIGMOD International Conference on Management of Data, SIGMOD-04
Y2 - 13 June 2004 through 13 June 2004
ER -