ProPreO
| Knowledge Representation and Management |
Ontologies, as formal models that capture essential aspects of a domain, are being rapidly adopted
in many disciplines, including the biological sciences. In collaboration with
researchers at |
| ProPreO: comprehensive Proteomics data and process provenance ontology | The proteomics discipline, and glycoproteomics in particular, is focused on two core objectives:
The experimental protocols for proteomics are rapidly maturing and reaching industrial scale for data generation, also termed high-throughput experimentation. As in genomics, the limiting step in this scenario will be the computational and related analytical tools available to process this large volume of data and generate useful information.
We adhered to four major criteria during the development of ProPreO:
I. Logical rigor: We are using ProPreO to annotate experimental proteomics data. Using this annotated data, information management applications will be able not only to store, retrieve, and integrate multiple datasets but also to infer implicit knowledge that gives proteomics researchers insight for hypothesis formulation and validation. Hence, to allow computational tools to use ProPreO for reasoning, we ensured that the ProPreO schema contains no incorrectly determined classes, incorrect or inappropriate naming schemes, or ill-defined relationships between concepts. The ProPreO schema includes 390 rigorously defined classes, 32 generic relations and 172 specific restrictions on the generic relations to correctly describe each concept and its relation to other concepts.
II. Compatibility with existing biomedical ontologies: It is now well understood and accepted that the life sciences domain requires multiple ontologies to manage its inherent complexity. Hence, where multiple related ontologies are involved, it is critical that semantic applications can use them in an integrated manner. We have followed the Basic Formal Ontology (BFO) (Smith B. et al. 2002) approach to class and relationship creation in ProPreO. The three top-level classes of ProPreO are 'data' (datasets and parameter data), (experimental) 'instrument', and (experimental) 'task'. Additionally, we created the relations in ProPreO by defining generic and easily understandable relations at the level of the top-level classes. Using various restrictions, we then defined how each generic relation applies to each class, thereby modeling the characteristics of each concept and its relations with other concepts accurately and efficiently. Currently, we are working on issues related to the integration, mapping and alignment of ProPreO with ontologies listed in the Open Biomedical Ontologies (OBO) repository.
III. Use of the OWL-DL language: The Web Ontology Language (OWL) has three flavors, namely OWL-Lite, OWL-DL and OWL-Full. Because we intended ProPreO to be used by computational applications while expressing the inherent complexity of the proteomics experimental domain as accurately as possible, we chose OWL-DL as the language for ProPreO. OWL-DL enables us to be expressive while ensuring acceptable computational properties.
IV. Populated ontology: We believe that an ontology schema is of limited use without real-world knowledge. We have populated ProPreO with instances corresponding to the concepts modeled in the ontology schema. ProPreO has 3.1 million instances and 18.6 million triples. Populating ProPreO with millions of real-world instances enables us to build computational tools that integrate large volumes of high-throughput experimental data within an overarching semantic framework and reason over it for knowledge discovery.
These four criteria have enabled ProPreO to provide the formal semantic foundation for modeling and incorporating comprehensive provenance information in wide-ranging, high-throughput proteomics research. |
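The modeling pattern described under criteria I, II and IV (generic relations declared on the top-level classes, narrowed by class-specific restrictions, and populated with instance triples) can be illustrated with a small Python sketch. Only the three top-level class names 'data', 'instrument' and 'task' come from the text above. Every subclass, relation name (`has_agent`, `has_output`) and instance below is a hypothetical placeholder, not taken from the actual ProPreO schema. The check is also a simple closed-world validation, which is much weaker than the reasoning an OWL-DL reasoner performs.

```python
# Top-level classes named in the ProPreO description.
TOP_LEVEL = {"data", "instrument", "task"}

# Hypothetical subclass hierarchy (child -> parent); not from the real schema.
SUBCLASS_OF = {
    "mass_spectrometer": "instrument",
    "ms_analysis": "task",
    "peak_list": "data",
}

# Hypothetical generic relations declared on top-level classes: name -> (domain, range).
GENERIC_RELATIONS = {
    "has_agent": ("task", "instrument"),
    "has_output": ("task", "data"),
}

# Hypothetical class-specific restrictions that narrow a generic relation's range.
RESTRICTIONS = {
    ("ms_analysis", "has_agent"): "mass_spectrometer",
    ("ms_analysis", "has_output"): "peak_list",
}


def ancestors(cls):
    """Return cls followed by its superclasses, nearest first."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def allowed_range(subject_cls, relation):
    """Most specific range for a relation, falling back to the generic one."""
    for cls in ancestors(subject_cls):
        if (cls, relation) in RESTRICTIONS:
            return RESTRICTIONS[(cls, relation)]
    return GENERIC_RELATIONS[relation][1]


def is_valid(triple, types):
    """Check a (subject, relation, object) triple against the toy schema."""
    subject, relation, obj = triple
    if relation not in GENERIC_RELATIONS:
        return False
    domain = GENERIC_RELATIONS[relation][0]
    if domain not in ancestors(types[subject]):
        return False
    return allowed_range(types[subject], relation) in ancestors(types[obj])


# Hypothetical instances: instance name -> class.
TYPES = {
    "run_42": "ms_analysis",
    "qtof_1": "mass_spectrometer",
    "hplc_1": "instrument",
    "peaks_42": "peak_list",
}

if __name__ == "__main__":
    for t in [("run_42", "has_agent", "qtof_1"), ("run_42", "has_agent", "hplc_1")]:
        print(t, "valid" if is_valid(t, TYPES) else "invalid")
```

In this sketch the generic relation `has_agent` accepts any 'instrument', but the restriction on the hypothetical class `ms_analysis` narrows it to `mass_spectrometer`, so the second triple is rejected. This mirrors, in a simplified way, how a small set of generic relations plus many restrictions can describe each concept precisely.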
| Access ProPreO (version: 1.1) |
|
| Collaborator |
Complex Carbohydrate Research Center (CCRC), University of Georgia |
| Publication |
Satya S. Sahoo, Christopher Thomas, Amit Sheth, William S. York, and Samir Tartir, "Knowledge Modeling and its application in Life Sciences: A Tale of two ontologies", the 15th World Wide Web (WWW, 2006) conference, |
| Contact Person: Satya S. Sahoo |