TY - GEN
T1 - Fast best-effort pattern matching in large attributed graphs
AU - Tong, Hanghang
AU - Faloutsos, Christos
AU - Gallagher, Brian
AU - Eliassi-Rad, Tina
PY - 2007
Y1 - 2007
N2 - We focus on large graphs where nodes have attributes, such as a social network where the nodes are labelled with each person's job title. In such a setting, we want to find subgraphs that match a user query pattern. For example, a "star" query would be, "find a CEO who has strong interactions with a Manager, a Lawyer,and an Accountant, or another structure as close to that as possible". Similarly, a "loop" query could help spot a money laundering ring. Traditional SQL-based methods, as well as more recent graph indexing methods, will return no answer when an exact match does not exist. This is the first main feature of our method. It can find exact-, as well as near-matches, and it will present them to the user in our proposed "goodness" order. For example, our method tolerates indirect paths between, say, the "CEO" and the "Accountant" of the above sample query, when direct paths don't exist. Its second feature is scalability. In general, if the query has n q nodes and the data graph has n nodes, the problem needs polynomial time complexity O(n n q), which is prohibitive. Our G-Ray ("Graph X-Ray") method finds high-quality subgraphs in time linear on the size of the data graph. Experimental results on the DLBP author-publication graph (with 356K nodes and 1.9M edges) illustrate both the effectiveness and scalability of our approach. The results agree with our intuition, and the speed is excellent. It takes 4 seconds on average fora 4-node query on the DBLP graph.
AB - We focus on large graphs where nodes have attributes, such as a social network where the nodes are labelled with each person's job title. In such a setting, we want to find subgraphs that match a user query pattern. For example, a "star" query would be, "find a CEO who has strong interactions with a Manager, a Lawyer,and an Accountant, or another structure as close to that as possible". Similarly, a "loop" query could help spot a money laundering ring. Traditional SQL-based methods, as well as more recent graph indexing methods, will return no answer when an exact match does not exist. This is the first main feature of our method. It can find exact-, as well as near-matches, and it will present them to the user in our proposed "goodness" order. For example, our method tolerates indirect paths between, say, the "CEO" and the "Accountant" of the above sample query, when direct paths don't exist. Its second feature is scalability. In general, if the query has n q nodes and the data graph has n nodes, the problem needs polynomial time complexity O(n n q), which is prohibitive. Our G-Ray ("Graph X-Ray") method finds high-quality subgraphs in time linear on the size of the data graph. Experimental results on the DLBP author-publication graph (with 356K nodes and 1.9M edges) illustrate both the effectiveness and scalability of our approach. The results agree with our intuition, and the speed is excellent. It takes 4 seconds on average fora 4-node query on the DBLP graph.
KW - Attributed graph
KW - Pattern match
KW - Random walk
UR - http://www.scopus.com/inward/record.url?scp=36849032888&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=36849032888&partnerID=8YFLogxK
U2 - 10.1145/1281192.1281271
DO - 10.1145/1281192.1281271
M3 - Conference contribution
AN - SCOPUS:36849032888
SN - 1595936092
SN - 9781595936097
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 737
EP - 746
BT - KDD-2007
T2 - KDD-2007: 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Y2 - 12 August 2007 through 15 August 2007
ER -