TY - GEN
T1 - On implementing provenance-aware regular path queries with relational query engines
AU - Dey, Saumen
AU - Cuevas-Vicenttín, Víctor
AU - Köhler, Sven
AU - Gribkoff, Eric
AU - Wang, Michael
AU - Ludäscher, Bertram
N1 - Copyright:
Copyright 2013 Elsevier B.V., All rights reserved.
PY - 2013
Y1 - 2013
N2 - Use of graphs is growing rapidly in social networks, semantic web, biological databases, scientific workflow provenance, and other areas. Regular Path Queries (RPQs) can be seen as a core graph query language to answer pattern-based reachability queries. Unfortunately, the number of freely available systems for querying graphs using RPQs is rather limited, and available implementations do not provide direct support for a number of desirable variants of RPQs, e.g., to return those edges that are contained in some (or all) paths that match the given regular expression R. Thus, by returning not just a pair (x, y) of end points of paths that match R, but also "witness edges" (u, v) inbetween, our RPQ variants can be understood as returning additional provenance information about the answer (x, y), i.e., those edges (u, v) that are in some (or all) paths from x to y matching R. We propose a number of such RPQ variants and show how they can be implemented using either Datalog or a suitable RDBMS. Our initial experimental results indicate that RPQs and our provenance-aware variants (RPQProv), when implemented using conventional relational technologies, yield reasonable performance even for relatively large graphs. On the other hand, the overhead associated with some of these variants also makes efficient handling of provenance-aware graph queries an interesting challenge for future research.
AB - Use of graphs is growing rapidly in social networks, semantic web, biological databases, scientific workflow provenance, and other areas. Regular Path Queries (RPQs) can be seen as a core graph query language to answer pattern-based reachability queries. Unfortunately, the number of freely available systems for querying graphs using RPQs is rather limited, and available implementations do not provide direct support for a number of desirable variants of RPQs, e.g., to return those edges that are contained in some (or all) paths that match the given regular expression R. Thus, by returning not just a pair (x, y) of end points of paths that match R, but also "witness edges" (u, v) inbetween, our RPQ variants can be understood as returning additional provenance information about the answer (x, y), i.e., those edges (u, v) that are in some (or all) paths from x to y matching R. We propose a number of such RPQ variants and show how they can be implemented using either Datalog or a suitable RDBMS. Our initial experimental results indicate that RPQs and our provenance-aware variants (RPQProv), when implemented using conventional relational technologies, yield reasonable performance even for relatively large graphs. On the other hand, the overhead associated with some of these variants also makes efficient handling of provenance-aware graph queries an interesting challenge for future research.
KW - datalog
KW - graph querying
KW - regular path queries
UR - http://www.scopus.com/inward/record.url?scp=84876807388&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84876807388&partnerID=8YFLogxK
U2 - 10.1145/2457317.2457353
DO - 10.1145/2457317.2457353
M3 - Conference contribution
AN - SCOPUS:84876807388
SN - 9781450315999
T3 - ACM International Conference Proceeding Series
SP - 214
EP - 223
BT - Joint EDBT/ICDT 2013 Workshops - Proceedings
T2 - Joint EDBT/ICDT 2013 Workshops
Y2 - 18 March 2013 through 22 March 2013
ER -