Knowledge representation (ontology, formal logic), Schema-driven data & information integration, Semantic Provenance in eScience (Bioinformatics, Sensor Web), Scientific workflows (using Semantic Web services), and Semantic Web


Provenance Management Framework (in collaboration with Microsoft Research)

Provenance, from the French word "provenir" meaning "to come from", describes the lineage of an entity. In eScience, provenance is critical for interpreting scientific results accurately. Though information provenance has been recognized as a hard problem in computing science (British Computer Society, 2004), many fundamental research issues in provenance have yet to be addressed. In collaboration with Microsoft Research, we have proposed a provenance management system composed of a novel provenance algebra and a materialized view-based provenance store to address these issues.

Ontology-driven Information Integration (in collaboration with Lister Hill National Center, NLM/NIH)

We use an ontology-driven approach to integrate two gene resources (Entrez Gene and HomoloGene) and three pathway resources (KEGG, Reactome and BioCyc) for five organisms, including humans. We created the Entrez Knowledge Model (EKoM), an information model in OWL for the gene resources, and integrated it with the existing BioPAX ontology designed for pathway resources. The integrated schema is populated with data from the pathway resources, which are publicly available in a BioPAX-compatible format, and from the gene resources, for which a population procedure was created. Queries over the integrated knowledge base are formulated in the SPARQL query language to answer complex biological questions.
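The idea of querying across integrated gene and pathway data can be sketched without an RDF library. The example below merges two small sets of triples and joins triple patterns over them, in the spirit of a SPARQL basic graph pattern. It is only an illustration: the identifiers, the "ex:" property names, and the `match` and `query` helpers are hypothetical, and the actual work uses OWL, BioPAX, and SPARQL rather than this code.

```python
def match(triples, pattern):
    """Yield variable bindings for one (s, p, o) pattern; terms starting with '?' are variables."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def query(triples, patterns):
    """Join the patterns left to right, carrying bindings forward."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for solution in solutions:
            # Substitute variables that earlier patterns have already bound.
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match(triples, bound):
                next_solutions.append({**solution, **binding})
        solutions = next_solutions
    return solutions


# Hypothetical data standing in for a gene resource and a pathway resource.
gene_triples = [
    ("gene:G1", "ex:organism", "human"),
    ("gene:G2", "ex:organism", "mouse"),
]
pathway_triples = [
    ("pathway:P1", "ex:participant", "gene:G1"),
    ("pathway:P2", "ex:participant", "gene:G2"),
]

# Integration here is just a merge; shared gene identifiers link the two sources.
knowledge_base = gene_triples + pathway_triples

# "Which pathways involve a human gene?"
results = query(knowledge_base, [
    ("?gene", "ex:organism", "human"),
    ("?pathway", "ex:participant", "?gene"),
])
print(results)
```

The join only works because both sources refer to a gene by the same identifier, which is the role a shared schema plays in the integration described above.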

W3C RDB2RDF Incubator Group to Study Mapping Relational Data (RDB) into RDF

Member of the W3C (World Wide Web Consortium) RDB2RDF Incubator Group, representing Wright State University. Co-authored an extensive survey of current state-of-the-art techniques for converting relational databases to RDF (report). This literature survey is part of the final report of the RDB2RDF Incubator Group to the W3C. Actively participated in the preparation of the RDB2RDF final report, which also includes the "Ontology-driven Information Integration of Gene and Biological Pathway Data" work as a use case for information integration using RDB2RDF techniques.
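As a rough illustration of what a relational-to-RDF conversion involves, the sketch below maps rows of a table to triples in N-Triples syntax, treating the table as a class, each row as a resource, and each column as a property. This is one simple, commonly described pattern and is not taken from the RDB2RDF report; the base URI, table and column names, and the `rows_to_ntriples` helper are all hypothetical.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical base URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value):
    """Escape characters that cannot appear raw inside an N-Triples string literal."""
    return (str(value)
            .replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r"))


def rows_to_ntriples(table, key, rows):
    """Emit one rdf:type triple per row plus one triple per non-null, non-key column."""
    lines = []
    for row in rows:
        subject = f"<{BASE}{table}/{quote(str(row[key]))}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{BASE}{table}> .")
        for column, value in row.items():
            if column == key or value is None:
                continue  # NULL values produce no triple
            lines.append(
                f'{subject} <{BASE}{table}#{column}> "{escape_literal(value)}" .'
            )
    return lines


# Hypothetical rows from a "gene" table.
genes = [
    {"gene_id": 1, "symbol": "GENE_A", "organism": "Homo sapiens"},
    {"gene_id": 2, "symbol": "GENE_B", "organism": None},
]

triples = rows_to_ntriples("gene", "gene_id", genes)
for line in triples:
    print(line)
```

Every value is emitted as a plain string literal here; a fuller mapping would also handle datatypes and turn foreign keys into links between resources.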

ProPreO ontology - a large domain provenance ontology

ProPreO is being developed as a provenance ontology that models experimental proteomics. The current version of ProPreO contains 490 concepts and 30 relationships (170 class-level restrictions), along with more than 3 million data instances. ProPreO describes proteomics experiments using three top-level concepts: (a) data, (b) material object, and (c) task. Organizing concepts in this manner facilitates the annotation of high-throughput experimental data, allowing contextually relevant parameters and parameter collections (such as mass spectral data) to be efficiently identified, extracted, and analyzed by software applications.

Select Publications

S.S. Sahoo, R.S. Barga, J. Goldstein, A. Sheth, "Provenance Algebra and Materialized View-based Provenance Management", Microsoft Research Technical Report (MSR-TR-2008-170), November 2008 (pdf)

S.S. Sahoo, A. Sheth, C. Henson, "Semantic Provenance for eScience: 'Meaningful' Metadata to Manage the Deluge of Scientific Data", IEEE Internet Computing, Web-Scale Workflow Track, Track editors: M. Brian Blake and Michael Huhns, July/August 2008 (Vol. 12, No. 4) pp. 46-54 (pdf)

S.S. Sahoo, O. Bodenreider, J.L. Rutter, K.J. Skinner, A.P. Sheth, 'An ontology-driven semantic mash-up of gene and biological pathway information: Application to the domain of nicotine dependence.', Journal of Biomedical Informatics, Volume 41, Issue 5, October 2008 (Special Issue: Semantic Mashup of Biomedical Data), Pages 752-765 (pdf)

S.S. Sahoo, C. Thomas, A. Sheth, W.S. York and S. Tartir, 'Knowledge Modeling and its application in Life Sciences: A Tale of two Ontologies', In Proceedings of the 15th international Conference on World Wide Web (Edinburgh, Scotland, May 23 - 26, 2006). WWW '06. ACM, New York, NY, 317-326. (Acceptance Rate: 11%) (pdf)

S.S. Sahoo, K. Zeng, O. Bodenreider, A.P. Sheth, 'From "glycosyltransferase" to "congenital muscular dystrophy": Integrating knowledge from NCBI Entrez Gene and the Gene Ontology'. Proceedings of Medinfo Conference 2007, Brisbane, Australia, 20-24 August, 2007. PMID: 17911917 (pdf)

B. Aleman-Meza, C. Halaschek-Wiener, S.S. Sahoo, A. Sheth, I.B. Arpinar, 'Template Based Semantic Similarity for Security Applications', Proceedings of the IEEE Intl. Conference on Intelligence and Security Informatics (ISI-2005), May 19-20, 2005 (pdf)

S.S. Sahoo, A. Sheth, B. Hunter, W.S. York, 'SemBOWSER - adding Semantics to biological Web services registry' in Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences, Edited by Christopher J. O. Baker and Kei-Hoi Cheung. New York: Springer; 2007, pp. 317-340 (Abstract) (Book chapter)

Kno.e.sis Center - Research Assistant, January 2007 to present.

Lister Hill National Center for Biomedical Communications (NLM/National Institutes of Health), Bethesda MD - Research Intern, 2007, 2006


Email - first name last name google mail