TY - GEN
T1 - Database support for exploring scientific workflow provenance graphs
AU - Anand, Manish Kumar
AU - Bowers, Shawn
AU - Ludäscher, Bertram
PY - 2012
Y1 - 2012
N2 - Provenance graphs generated from real-world scientific workflows often contain large numbers of nodes and edges denoting various types of provenance information. A standard approach used by workflow systems is to visually present provenance information by displaying an entire (static) provenance graph. This approach makes it difficult for users to find relevant information and to explore and analyze data and process dependencies. We address these issues through a set of abstractions that allow users to construct specialized views of provenance graphs. Our model provides operations that allow users to expand, collapse, filter, group, and summarize all or portions of provenance graphs to construct tailored provenance views. A unique feature of the model is that it can be implemented using standard relational database technology, which has a number of advantages in terms of supporting existing provenance frameworks and efficiency and scalability of the model. We present and formalize the operations within the model as a set of relational queries expressed against an underlying provenance schema. We also present a detailed experimental evaluation that demonstrates the feasibility and efficiency of our approach against provenance graphs generated from a number of scientific workflows.
AB - Provenance graphs generated from real-world scientific workflows often contain large numbers of nodes and edges denoting various types of provenance information. A standard approach used by workflow systems is to visually present provenance information by displaying an entire (static) provenance graph. This approach makes it difficult for users to find relevant information and to explore and analyze data and process dependencies. We address these issues through a set of abstractions that allow users to construct specialized views of provenance graphs. Our model provides operations that allow users to expand, collapse, filter, group, and summarize all or portions of provenance graphs to construct tailored provenance views. A unique feature of the model is that it can be implemented using standard relational database technology, which has a number of advantages in terms of supporting existing provenance frameworks and efficiency and scalability of the model. We present and formalize the operations within the model as a set of relational queries expressed against an underlying provenance schema. We also present a detailed experimental evaluation that demonstrates the feasibility and efficiency of our approach against provenance graphs generated from a number of scientific workflows.
UR - http://www.scopus.com/inward/record.url?scp=84863429325&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863429325&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-31235-9_23
DO - 10.1007/978-3-642-31235-9_23
M3 - Conference contribution
AN - SCOPUS:84863429325
SN - 9783642312342
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 343
EP - 360
BT - Scientific and Statistical Database Management - 24th International Conference, SSDBM 2012, Proceedings
T2 - 24th International Conference on Scientific and Statistical Database Management, SSDBM 2012
Y2 - 25 June 2012 through 27 June 2012
ER -