
Provenance Management Framework: Provenance Algebra and Materialized View-based Storage

In Collaboration With Microsoft Research

Introduction

Provenance, from the French word "provenir" meaning "to come from", describes the lineage of an entity. In eScience, provenance is critical for interpreting scientific results accurately. Although information provenance has been recognized as a hard problem in computing science (British Computing Society, 2004), many fundamental research issues in provenance have yet to be addressed. In this work, we propose a provenance management system, composed of a novel provenance algebra and a materialized view-based provenance storage, to address these open issues.


Provenir Ontology

eScience requires a common provenance model that represents workflow provenance, database provenance, and domain-specific details in an integrated manner. Further, the scale of provenance metadata generated in high-throughput eScience experiments precludes manual interpretation and requires processing by software applications. Hence, a common provenance model should also let software applications interpret provenance consistently and reason over it using entailment rules. We introduce a common provenance model called the Provenir ontology, defined using the OWL-DL language. The Provenir ontology includes provenance classes and explicitly modeled named relations between them. Modeling relations as first-class entities enables the Provenir ontology to capture provenance details that are closer to real-world eScience experiments.

The Provenir ontology forms the core component of a modular approach for our eScience provenance framework. Instead of a single monolithic provenance ontology that models all possible details from different domains, our proposed modular framework integrates multiple ontologies, each modeling the provenance metadata specific to a particular domain (for example, the ProPreO ontology represents proteomics domain-specific provenance). These ontologies use the Provenir ontology as their common reference model, which makes it easier for them to interoperate. The modular framework is a scalable, flexible, and maintainable approach that can be adapted to the specific requirements of different domains.
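The modular idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Provenir ontology itself and not OWL-DL: the class names (`data`, `process`, `agent`, `mass_spectrum`, `peptide_search`), the relation names (`has_participant`, `has_agent`), and the `Ontology` helper are all hypothetical. The sketch shows a domain ontology importing a core reference model, so that a relation declared once in the core can be checked against domain-specific classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Ontology:
    """Toy ontology: a class hierarchy plus named relations with a domain and a range.

    Hypothetical helper for illustration; real ontologies would be written in OWL-DL.
    """
    name: str
    superclass: Dict[str, Optional[str]] = field(default_factory=dict)
    relations: Dict[str, Tuple[str, str]] = field(default_factory=dict)  # name -> (domain, range)
    imports: List["Ontology"] = field(default_factory=list)

    def knows(self, cls: str) -> bool:
        return cls in self.superclass or any(o.knows(cls) for o in self.imports)

    def _parent(self, cls: str) -> Optional[str]:
        # Look in this ontology first, then in the imported reference models.
        if cls in self.superclass:
            return self.superclass[cls]
        for onto in self.imports:
            if onto.knows(cls):
                return onto._parent(cls)
        return None

    def is_a(self, cls: str, ancestor: str) -> bool:
        current: Optional[str] = cls
        while current is not None:
            if current == ancestor:
                return True
            current = self._parent(current)
        return False

    def relation(self, name: str) -> Optional[Tuple[str, str]]:
        if name in self.relations:
            return self.relations[name]
        for onto in self.imports:
            found = onto.relation(name)
            if found is not None:
                return found
        return None

    def allows(self, subject_cls: str, relation: str, object_cls: str) -> bool:
        """Check a statement against the relation's declared domain and range."""
        signature = self.relation(relation)
        if signature is None:
            return False
        domain, range_ = signature
        return self.is_a(subject_cls, domain) and self.is_a(object_cls, range_)


# Core reference model (assumed class and relation names).
core = Ontology(
    name="core",
    superclass={"data": None, "process": None, "agent": None},
    relations={
        "has_participant": ("process", "data"),
        "has_agent": ("process", "agent"),
    },
)

# Domain ontology that only adds domain-specific classes and imports the core.
proteomics = Ontology(
    name="proteomics",
    superclass={"mass_spectrum": "data", "peptide_search": "process"},
    imports=[core],
)

valid = proteomics.allows("peptide_search", "has_participant", "mass_spectrum")
invalid = proteomics.allows("mass_spectrum", "has_agent", "peptide_search")
print(valid, invalid)
```

Because the domain ontology declares no relations of its own, every check resolves against the shared core, which is the property that lets separately maintained domain ontologies interoperate.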

Provenir Ontology Schema



Provenance Query Operators

The provenance literature features a large variety of queries, each addressing the specific requirements of the application under discussion. Without a systematic classification of provenance queries, however, it is difficult to identify the common and distinct characteristics of these queries and, more importantly, to define query operators that support them. We propose, for the first time, a classification scheme for provenance queries in eScience and, based on this classification, define a set of provenance query operators. The query operators are defined in terms of the Provenir ontology.
Provenance Query Classification:

  • Querying for provenance metadata: Given a data entity, this category of queries returns the complete set of provenance information that influenced the current state of the data entity.
  • Querying for data values: The converse of the first category. Given a set of constraints defined over both provenance metadata and data, expressed using a formal context structure, this category of queries retrieves the set of data entities that satisfy the constraints.
  • Modifying provenance metadata: This category of queries is defined over the provenance metadata itself. Example operations include merging of provenance from different stages of an experiment and comparison of provenance for two datasets from different sources.
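The three categories above can be illustrated with a small Python sketch over an in-memory set of triples. It is a simplified reading of the classification, not the operators defined in this work: the function names (`provenance_of`, `data_matching`, `merge`, `compare`), the triple layout, and the sample relation names are assumptions made for the example, and constraints are reduced to required (relation, object) pairs rather than a formal context structure.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object); hypothetical layout


def provenance_of(graph: Set[Triple], entity: str) -> Set[Triple]:
    """Category 1: collect every triple reachable from the given data entity."""
    seen, frontier, result = {entity}, [entity], set()
    while frontier:
        node = frontier.pop()
        for triple in graph:
            subject, _, obj = triple
            if subject == node:
                result.add(triple)
                if obj not in seen:
                    seen.add(obj)
                    frontier.append(obj)
    return result


def data_matching(graph: Set[Triple], entities: Iterable[str],
                  required: Set[Tuple[str, str]]) -> Set[str]:
    """Category 2: return entities whose provenance contains every required pair."""
    matches = set()
    for entity in entities:
        pairs = {(rel, obj) for _, rel, obj in provenance_of(graph, entity)}
        if required <= pairs:
            matches.add(entity)
    return matches


def merge(a: Set[Triple], b: Set[Triple]) -> Set[Triple]:
    """Category 3: merge provenance recorded at different stages."""
    return a | b


def compare(a: Set[Triple], b: Set[Triple]) -> Tuple[Set[Triple], Set[Triple]]:
    """Category 3: report the triples found only in a and only in b."""
    return a - b, b - a


# Made-up sample provenance from two experiment runs.
run1 = {("peaks_1", "derived_from", "spectrum_1"),
        ("spectrum_1", "generated_by", "instrument_A")}
run2 = {("peaks_2", "derived_from", "spectrum_2"),
        ("spectrum_2", "generated_by", "instrument_B")}

graph = merge(run1, run2)
lineage = provenance_of(graph, "peaks_1")
from_a = data_matching(graph, ["peaks_1", "peaks_2"], {("generated_by", "instrument_A")})
only_1, only_2 = compare(lineage, provenance_of(graph, "peaks_2"))
print(sorted(lineage), from_a)
```

The first function walks from a data entity to its provenance, while the second starts from constraints on provenance and works back to data entities, which mirrors the opposite directions of the first two categories.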


Materialized View-based Provenance Storage

A practical provenance storage solution is implemented on a commercial relational database system using a materialized view-based approach. The implementation demonstrates that a relational provenance management system can feasibly answer complex queries over large datasets, provided it implements well-defined provenance query operators and uses materialized views. In database systems that support provenance, the need to dynamically maintain large amounts of complex data conflicts with the demand for subsecond query response times. Our answer to this dilemma is materialized views and indices, both of which precompute information that queries would otherwise have to derive at run time. A database can use materialized views to prejoin tables, presort result sets, and integrate semantic information. A materialized view can be configured to keep itself in sync with the base data automatically by refreshing at predetermined intervals.
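The effect of a prejoined, presorted view refreshed at intervals can be sketched with Python's built-in `sqlite3` module. Two caveats apply: SQLite has no native materialized views, so the sketch emulates one with an ordinary table that is rebuilt on demand, and the schema (`process_run`, `data_item`, `provenance_mv`) and the `refresh_provenance_view` helper are hypothetical names, not the storage schema used in this work. A commercial system would typically provide the view and its refresh schedule natively.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE process_run (
        run_id INTEGER PRIMARY KEY, process_name TEXT, agent TEXT);
    CREATE TABLE data_item (
        data_id INTEGER PRIMARY KEY, label TEXT,
        run_id INTEGER REFERENCES process_run(run_id));
""")
conn.executemany("INSERT INTO process_run VALUES (?, ?, ?)",
                 [(1, "peak_detection", "instrument_A"),
                  (2, "peak_detection", "instrument_B")])
conn.executemany("INSERT INTO data_item VALUES (?, ?, ?)",
                 [(10, "peaks_1", 1), (11, "peaks_2", 2)])


def refresh_provenance_view(conn: sqlite3.Connection) -> None:
    """Rebuild the emulated materialized view (a prejoined, presorted table).

    A scheduler would call this at predetermined intervals.
    """
    with conn:  # commit on success, roll back on error
        conn.execute("DROP TABLE IF EXISTS provenance_mv")
        conn.execute("""
            CREATE TABLE provenance_mv AS
            SELECT d.label, p.process_name, p.agent
            FROM data_item AS d
            JOIN process_run AS p ON p.run_id = d.run_id
            ORDER BY d.label
        """)
        conn.execute("CREATE INDEX provenance_mv_label ON provenance_mv (label)")


def lookup(conn: sqlite3.Connection, label: str):
    """Answer a provenance lookup from the precomputed table, with no join at query time."""
    return conn.execute(
        "SELECT process_name, agent FROM provenance_mv WHERE label = ?",
        (label,)).fetchall()


refresh_provenance_view(conn)
first = lookup(conn, "peaks_1")

# New base data is invisible to the view until the next refresh.
conn.execute("INSERT INTO data_item VALUES (?, ?, ?)", (12, "peaks_3", 1))
stale = lookup(conn, "peaks_3")
refresh_provenance_view(conn)
fresh = lookup(conn, "peaks_3")
print(first, stale, fresh)
```

The stale result between refreshes is the trade-off of interval-based maintenance: queries read a precomputed join quickly, at the cost of lagging behind the base tables until the view is rebuilt.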



Publications


Contact

Satya Sahoo

© 2012 Kno.e.sis | 377 Joshi Research Center, 3640 Colonel Glenn Highway, Dayton, OH 45435 | (937 - 775 - 5217)