Provenance Management Framework: Provenance Algebra and Materialized View-based Storage

In Collaboration With Microsoft Research


Provenance, from the French word "provenir" meaning "to come from", describes the lineage of an entity. In eScience, provenance is critical for interpreting scientific results accurately. Although information provenance has been recognized as a hard problem in computing science (British Computer Society, 2004), many fundamental research issues in provenance have yet to be addressed. In this work, we propose a provenance management system composed of a novel provenance algebra and a materialized view-based provenance storage to address these issues.

Provenir Ontology

eScience requires a common provenance model that represents workflow provenance, database provenance, and domain-specific details in an integrated manner. Further, the scale of provenance metadata generated in high-throughput eScience experiments precludes manual interpretation and requires processing by software applications. Hence, a common provenance model should also allow software applications both to interpret provenance consistently and to reason over it using entailment rules. We introduce a common provenance model called the Provenir ontology, defined using the OWL-DL language. The Provenir ontology includes provenance classes and explicitly modeled named relations between them. Modeling relations as first-class entities enables the Provenir ontology to capture provenance details that are closer to real-world eScience experiments.

The Provenir ontology forms the core component of a modular approach for our eScience provenance framework. Instead of a single monolithic provenance ontology that models all possible details from different domains, our proposed modular provenance framework involves the integrated use of multiple ontologies, each modeling specific provenance metadata for a particular domain (for example, the ProPreO ontology represents proteomics domain-specific provenance). These ontologies use the Provenir ontology as the common reference model, which makes it easier for them to interoperate. This modular framework represents a scalable, flexible, and maintainable approach that can be adapted to the specific requirements of different domains.
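To make the idea of relations as first-class entities more concrete, the following is a minimal Python sketch, not the ontology itself (which is defined in OWL-DL). The class kinds ("data", "process", "agent"), the relation names `has_input` and `performed_by`, and the example entities are all assumptions chosen for illustration; they are not taken from the Provenir schema. The point is only that a named relation carries its own definition (here, which kinds of entities it may connect), so an application can check an assertion against it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str  # hypothetical kinds: "data", "process", "agent"


@dataclass(frozen=True)
class Relation:
    """A named relation modeled as an object in its own right, not a bare edge."""
    name: str
    domain_kind: str  # kind of entity allowed as the subject
    range_kind: str   # kind of entity allowed as the object


@dataclass(frozen=True)
class Assertion:
    """One provenance statement: subject --relation--> obj."""
    subject: Entity
    relation: Relation
    obj: Entity

    def __post_init__(self):
        # Because the relation is explicit, its constraints can be checked.
        if self.subject.kind != self.relation.domain_kind:
            raise ValueError(f"{self.relation.name}: bad subject kind {self.subject.kind}")
        if self.obj.kind != self.relation.range_kind:
            raise ValueError(f"{self.relation.name}: bad object kind {self.obj.kind}")


# Hypothetical relations and entities, for illustration only.
HAS_INPUT = Relation("has_input", "process", "data")
PERFORMED_BY = Relation("performed_by", "process", "agent")

spectrum = Entity("raw_spectrum_17", "data")
search = Entity("peptide_search_run", "process")
analyst = Entity("lab_technician", "agent")

assertions = [
    Assertion(search, HAS_INPUT, spectrum),
    Assertion(search, PERFORMED_BY, analyst),
]
```

An assertion that swaps subject and object, such as `Assertion(spectrum, HAS_INPUT, search)`, is rejected with a `ValueError`. This is a very rough stand-in for the kind of consistent interpretation that an ontology with explicit relations allows.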

Figure: Provenir Ontology Schema

Provenance Query Operators

The provenance literature features a large variety of queries, each addressing the specific requirements of the application under discussion. Without a systematic classification of provenance queries, however, it is difficult to identify the common and distinct characteristics of these queries and, more importantly, to define query operators that support them. We propose, for the first time, a classification scheme for provenance queries in eScience and, based on this classification, define a set of provenance query operators. The query operators are defined in terms of the Provenir ontology.
Provenance Query Classification:

  • Querying for provenance metadata: Given a data entity, this category of queries returns the complete set of provenance information that influenced the current state of the data entity.
  • Querying for data values: This category takes the opposite perspective to the first. Given a set of constraints defined over both provenance metadata and data, expressed using a formal context structure, these queries retrieve the set of data entities that satisfy the constraints.
  • Modifying provenance metadata: This category of queries is defined over the provenance metadata itself. Example operations include merging of provenance from different stages of an experiment and comparison of provenance for two datasets from different sources.
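
The three categories above can be sketched in Python over a toy lineage graph. This is a simplified illustration under stated assumptions, not the operators defined in this work: the function names, the `DERIVED_FROM` graph, and the entity names are hypothetical, and the second category is reduced to "provenance must contain these items" rather than a formal context structure.

```python
from collections import deque

# Hypothetical lineage graph: each key was derived from the listed items.
DERIVED_FROM = {
    "peptide_list": ["search_run"],
    "search_run": ["raw_spectrum", "search_params"],
    "raw_spectrum": ["mass_spectrometer"],
}


def provenance_of(entity, graph):
    """Category 1: everything that influenced `entity` (transitive closure)."""
    seen, queue = set(), deque([entity])
    while queue:
        for parent in graph.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def data_with_provenance(candidates, required, graph):
    """Category 2: candidates whose provenance contains every required item."""
    return [c for c in candidates if set(required) <= provenance_of(c, graph)]


def merge_provenance(first, second):
    """Category 3: merge two lineage graphs, e.g. from two experiment stages."""
    merged = {key: list(parents) for key, parents in first.items()}
    for key, parents in second.items():
        bucket = merged.setdefault(key, [])
        bucket.extend(p for p in parents if p not in bucket)
    return merged


def same_provenance(a, b, graph):
    """Category 3: compare the provenance of two entities for equality."""
    return provenance_of(a, graph) == provenance_of(b, graph)
```

For example, `provenance_of("peptide_list", DERIVED_FROM)` walks back through the search run to the spectrum, the search parameters, and the instrument, while `data_with_provenance(["peptide_list", "raw_spectrum"], ["search_params"], DERIVED_FROM)` keeps only the entities whose lineage involves those parameters.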

Materialized View-based Provenance Storage

A practical provenance storage solution is implemented on a commercial relational database system using a materialized view-based approach. The implementation demonstrates that, with well-defined provenance query operators and materialized views, a relational database system can feasibly support complex provenance queries over large datasets. In database systems that support provenance, the need to dynamically maintain large amounts of complex data conflicts with the demand for subsecond query response times. Our answer to this dilemma is materialized views and indices, both of which precompute aggregate information. A database can use materialized views to prejoin tables, presort solution sets, and integrate semantic information. A materialized view can be set up to keep itself in sync with the base data automatically, updating itself at predetermined intervals.
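
The precompute-then-refresh idea can be illustrated with a small Python sketch using the standard `sqlite3` module. Several assumptions apply: SQLite has no native materialized views, so the sketch emulates one with an ordinary table (`lineage_mv`) that is rebuilt by an explicit refresh function, whereas a commercial system would typically offer a built-in mechanism with scheduled refresh. The table names, column names, and sample rows are hypothetical and do not reflect the schema used in this work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical base table: one row per direct "child was derived from parent" link.
conn.executescript("""
CREATE TABLE derived_from (child TEXT, parent TEXT);
INSERT INTO derived_from VALUES
  ('peptide_list', 'search_run'),
  ('search_run', 'raw_spectrum'),
  ('raw_spectrum', 'mass_spectrometer');
""")


def refresh_lineage_view(conn):
    """Rebuild the emulated materialized view from the base data.

    The recursive query precomputes the full lineage (transitive closure),
    so later lookups need neither joins nor recursion.
    """
    conn.executescript("""
    DROP TABLE IF EXISTS lineage_mv;
    CREATE TABLE lineage_mv AS
    WITH RECURSIVE closure(entity, ancestor) AS (
        SELECT child, parent FROM derived_from
        UNION
        SELECT closure.entity, derived_from.parent
        FROM closure
        JOIN derived_from ON derived_from.child = closure.ancestor
    )
    SELECT entity, ancestor FROM closure;
    CREATE INDEX lineage_mv_entity ON lineage_mv (entity);
    """)


def lineage(conn, entity):
    """Answer a provenance query with a single indexed lookup."""
    rows = conn.execute(
        "SELECT ancestor FROM lineage_mv WHERE entity = ?", (entity,)
    )
    return sorted(row[0] for row in rows)


refresh_lineage_view(conn)
```

The trade-off shows up when the base data changes: a row added to `derived_from` is not visible through `lineage_mv` until `refresh_lineage_view(conn)` runs again, which corresponds to the view updating itself at predetermined intervals.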


Satya Sahoo

© 2012 Kno.e.sis | 377 Joshi Research Center, 3640 Colonel Glenn Highway, Dayton, OH 45435 | (937 - 775 - 5217)