Provenance Management Framework: Provenance Algebra and Materialized View-based Storage
In Collaboration With Microsoft Research
Provenance, from the French word "provenir" meaning "to come from", describes the lineage of an entity. In eScience, provenance is critical for interpreting scientific results accurately. Although information provenance has been recognized as a hard problem in computing science (British Computer Society, 2004), many fundamental research issues in provenance have yet to be addressed. In this work, we propose a provenance management system, composed of a novel provenance algebra and a materialized view-based provenance store, to address these issues.
eScience requires a common provenance model that represents workflow provenance, database provenance, and domain-specific details in an integrated manner. Further, the scale of provenance metadata generated in high-throughput eScience experiments precludes manual interpretation and requires processing by software applications. Hence, a common provenance model should also allow software applications to interpret provenance consistently and to reason over it using entailment rules. We introduce a common provenance model called the Provenir ontology, defined in the OWL-DL language. The Provenir ontology includes provenance classes and explicitly modeled named relations between them. Modeling relations as first-class entities enables the Provenir ontology to capture provenance details that are closer to real-world eScience experiments.
The Provenir ontology forms the core component of a modular approach to our eScience provenance framework. Instead of a single monolithic provenance ontology that models all possible details from different domains, the modular framework integrates multiple ontologies, each modeling the provenance metadata of a particular domain (for example, the ProPreO ontology represents proteomics domain-specific provenance). These ontologies use the Provenir ontology as their common reference model, which makes it easier for them to interoperate. The modular framework is a scalable, flexible, and maintainable approach that can be adapted to the specific requirements of different domains.
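The ideas above (named relations as first-class entities, domain ontologies anchored to a common reference model, and reasoning by entailment rules) can be illustrated with a minimal Python sketch. It is not the Provenir ontology itself: the class names, the `derives_from` relation, the domain mapping, and the transitivity rule are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A named relation treated as a first-class, inspectable object."""
    subject: str
    name: str
    target: str


# Hypothetical core classes of a common reference model.
CORE_CLASSES = {"data", "process", "agent"}

# Hypothetical domain module: each domain class maps to one core class.
DOMAIN_TO_CORE = {"peak_list": "data", "ms_run": "process"}


def core_class(cls):
    """Resolve a core or domain class name to its core class."""
    if cls in CORE_CLASSES:
        return cls
    return DOMAIN_TO_CORE[cls]


def entail(relations, transitive_names):
    """Apply a transitivity rule repeatedly until no new relation appears."""
    closed = set(relations)
    changed = True
    while changed:
        changed = False
        for a in list(closed):
            for b in list(closed):
                if (a.name == b.name and a.name in transitive_names
                        and a.target == b.subject):
                    inferred = Relation(a.subject, a.name, b.target)
                    if inferred not in closed:
                        closed.add(inferred)
                        changed = True
    return closed


# Illustrative facts; the entity names are made up.
FACTS = {
    Relation("peak_list_7", "derives_from", "raw_spectrum_7"),
    Relation("raw_spectrum_7", "derives_from", "sample_42"),
}
INFERRED = entail(FACTS, {"derives_from"})
```

Because each relation is an object with its own name, a rule can be attached to a specific relation (here, transitivity of the assumed `derives_from`) rather than to edges in general, which is the kind of consistent, machine-driven interpretation the paragraph above calls for.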
|Provenir Ontology Schema|
Schema of Provenir Ontology
|Provenance Query Operators|
The provenance literature features a large variety of queries, each addressing the specific requirements of an application under discussion. But without a systematic classification of provenance queries it is difficult to clearly identify the common and distinct characteristics of these queries, and more importantly, define query operators to support them.
We propose, for the first time, a classification scheme for provenance queries in eScience and, based on this classification, define a set of provenance query operators. The query operators are defined in terms of the Provenir ontology.
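To make the notion of a query operator concrete, the following Python sketch shows one plausible operator: retrieving the complete lineage of an entity by following named relations. The operator name `lineage_of`, the relation names, and the sample edges are assumptions made for illustration; the operators actually defined in this work are not reproduced here.

```python
from collections import deque

# Hypothetical provenance edges: (subject, relation name, target).
EDGES = [
    ("result_table", "derives_from", "peak_list"),
    ("peak_list", "derives_from", "raw_spectrum"),
    ("raw_spectrum", "generated_by", "ms_run"),
    ("ms_run", "has_agent", "instrument_A"),
]


def lineage_of(entity, edges):
    """Return every edge reachable from `entity`, in breadth-first order."""
    # Index outgoing edges by subject so each lookup is a dictionary access.
    outgoing = {}
    for subject, relation, target in edges:
        outgoing.setdefault(subject, []).append((relation, target))

    seen = {entity}
    queue = deque([entity])
    lineage = []
    while queue:
        node = queue.popleft()
        for relation, target in outgoing.get(node, []):
            lineage.append((node, relation, target))
            if target not in seen:  # guard against cycles and repeats
                seen.add(target)
                queue.append(target)
    return lineage
```

Calling `lineage_of("result_table", EDGES)` walks back through the peak list, raw spectrum, and run to the instrument. Defining such an operator once over the common model, rather than per application, is what allows queries from different domains to share one implementation.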
|Materialized View-based Provenance Storage|
We implemented a practical provenance store on a commercial relational database system using a materialized view-based approach. The implementation demonstrates that a relational provenance management system can feasibly answer complex queries over large datasets when it combines well-defined provenance query operators with materialized views.
In database systems that support provenance, the need to dynamically maintain large amounts of complex data conflicts with the demand for sub-second query response times. Our answer to this dilemma is materialized views and indices, both of which precompute information that queries would otherwise derive at run time. A database can use materialized views to prejoin tables, presort result sets, and integrate semantic information. A materialized view can be configured to keep itself in sync with the base data automatically, refreshing at predetermined intervals.