Research
Projects
eScience is a paradigm shift in scientific research. Provenance metadata describes the lineage of data, and provenance is increasingly recognized as a fundamental metadata component for scalable, effective, and efficient integration and management of eScience data. The Semantic Provenance Framework (SPF) focuses on incorporating domain knowledge and ontological underpinning in provenance using expressive domain-specific provenance ontologies. 'Semantic provenance' is defined as provenance that imposes a formally defined, domain-specific conceptual view on scientific data (domain semantics), mitigates or eliminates terminological heterogeneity, and enables the use of reasoning tools for knowledge discovery. SPF is being applied to two real-world application scenarios: (a) Semantic Provenance Annotation of Proteomics Data (SPADE) and (b) Semantic Provenance for Sensors Data Retrieval (CLOVER).
ProPreO is being developed as a provenance ontology that models experimental proteomics. The current version of ProPreO contains 490 concepts and 30 relationships (170 class-level restrictions), along with more than 3 million data instances. ProPreO describes proteomics experiments using three top-level concepts: (a) data, (b) material object, and (c) task. Organizing concepts in this manner facilitates the annotation of high-throughput experimental data, allowing software applications to efficiently identify, extract, and analyze contextually relevant parameters and parameter collections (such as mass spectral data).
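As a rough illustration of how three top-level concepts can organize annotations, the Python sketch below models a tiny concept hierarchy and reports which top-level concept each term falls under. Only `data`, `material_object`, and `task` come from the description above; every other concept name, and both function names, are hypothetical placeholders and not actual ProPreO terms.

```python
# Hypothetical mini-hierarchy: child concept -> parent concept.
# Only the three roots are taken from the text; all other names are illustrative.
PARENT = {
    "mass_spectrum": "data",
    "peak_list": "mass_spectrum",
    "sample": "material_object",
    "separation": "task",
}
TOP_LEVEL = {"data", "material_object", "task"}


def top_level_of(concept):
    """Walk parent links until one of the three top-level concepts is reached."""
    seen = set()
    while concept not in TOP_LEVEL:
        if concept in seen or concept not in PARENT:
            raise KeyError(f"unknown or cyclic concept: {concept}")
        seen.add(concept)
        concept = PARENT[concept]
    return concept


def group_by_top_level(concepts):
    """Group annotation terms under their top-level concept."""
    groups = {name: [] for name in sorted(TOP_LEVEL)}
    for concept in concepts:
        groups[top_level_of(concept)].append(concept)
    return groups
```

Grouping terms this way is what lets an application pull out, for example, every annotation that descends from `data` without knowing the individual concept names in advance.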
The GLYcan Data Exchange (GLYDE) standard is an XML-based representation format that enables interoperability and exchange of glycomics data. The XML representation is compatible with the GlycO ontology and facilitates the computational processing of glycan structures. Although GLYDE-II is mainly intended to provide a standard for representing the chemical structures of complex glycans, it also aims to provide sufficient flexibility so that its specifications can be easily integrated into a systems biology concept.
In collaboration with the Lister Hill National Center for Biomedical Communications (U.S. National Library of Medicine, NIH), we are working on integrating biological data held in structured resources, namely relational databases, using Semantic Web representational formats. We converted the NCBI Entrez Gene (EG) data source into RDF, using named relationships to relate data entities in EG. This enabled us to capture the logical, domain-relevant connections between genes, the proteins encoded by these genes, the disease information associated with these genes, and their location on the chromosomes. Next, we integrated the Gene Ontology (GO) structure, available in RDF format, with the EG RDF and were able to effectively answer research queries linking 'glycosyltransferase' to 'congenital muscular dystrophy'. Currently, we are working to integrate all gene-related NCBI data sources with EG and GO using RDF as the common representational format.
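The idea of named relationships can be sketched in plain Python, with each RDF statement represented as a (subject, relationship, object) tuple. This is a minimal illustration, not the project's actual data or vocabulary: the identifiers (`gene:G1`, `protein:P1`, and so on), the relationship names, and the function `genes_linking` are all hypothetical.

```python
# Hypothetical triples (subject, named relationship, object).
# Identifiers and relationship names are illustrative, not real EG or GO terms.
TRIPLES = [
    ("go:glycosyltransferase_activity", "has_label", "glycosyltransferase"),
    ("gene:G1", "has_molecular_function", "go:glycosyltransferase_activity"),
    ("gene:G1", "encodes", "protein:P1"),
    ("gene:G1", "associated_with_disease", "congenital muscular dystrophy"),
    ("gene:G2", "has_molecular_function", "go:glycosyltransferase_activity"),
    ("gene:G2", "located_on", "chromosome:C1"),
]


def genes_linking(function_label, disease, triples):
    """Return genes annotated with the given function and linked to the disease."""
    # Ontology terms whose label matches the requested function.
    functions = {s for s, p, o in triples
                 if p == "has_label" and o == function_label}
    # Genes annotated with one of those terms.
    with_function = {s for s, p, o in triples
                     if p == "has_molecular_function" and o in functions}
    # Genes associated with the requested disease.
    with_disease = {s for s, p, o in triples
                    if p == "associated_with_disease" and o == disease}
    return sorted(with_function & with_disease)


print(genes_linking("glycosyltransferase", "congenital muscular dystrophy", TRIPLES))
```

Because the ontology annotations and the gene records share one triple representation, a single query can cross from an ontology term to a disease association. In practice such a query would more likely be written in an RDF query language against a triple store.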
Select Publications
S.S. Sahoo, A. Sheth, C. Henson, "Semantic Provenance for eScience: 'Meaningful' Metadata to Manage the Deluge of Scientific Data", IEEE Internet Computing, Web-Scale Workflow Track, Track editors: M. Brian Blake and Michael Huhns, July/August 2008 (Vol. 12, No. 4) pp. 46-54 (pdf)
S.S. Sahoo, O. Bodenreider, J.L. Rutter, K.J. Skinner, A.P. Sheth, 'An ontology-driven semantic mash-up of gene and biological pathway information: Application to the domain of nicotine dependence.', Journal of Biomedical Informatics (Special Issue: Semantic Biomedical Mashups) 2008 (in press) (pdf)
S.S. Sahoo, C. Thomas, A. Sheth, W.S. York and S. Tartir, 'Knowledge Modeling and its application in Life Sciences: A Tale of two Ontologies', 15th International WWW2006 Conference, Edinburgh, Scotland, May 23 - May 26, 2006 (Acceptance Rate: 11%) (pdf)
S.S. Sahoo, K. Zeng, O. Bodenreider, A.P. Sheth, 'From "glycosyltransferase" to "congenital muscular dystrophy": Integrating knowledge from NCBI Entrez Gene and the Gene Ontology'. Proceedings of Medinfo Conference 2007, Brisbane, Australia, 20-24 August, 2007. PMID: 17911917 (pdf)
S.S. Sahoo, C. Thomas, A. Sheth, C. Henson, W.S. York, 'GLYDE-an expressive XML standard for the representation of glycan structure'. Journal of Carbohydrate Research 2005 Dec 30;340(18):2802-7. Epub 2005 Oct 20. PMID: 16242678
B. Aleman-Meza, C. Halaschek-Wiener, S.S. Sahoo, A. Sheth, I.B. Arpinar, 'Template Based Semantic Similarity for Security Applications', Proceedings of the IEEE Intl. Conference on Intelligence and Security Informatics (ISI-2005), May 19-20, 2005 (pdf)
S.S. Sahoo, A. Sheth, B. Hunter, W.S. York, 'SemBOWSER - adding Semantics to biological Web services registry' in Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences, Edited by Christopher J. O. Baker and Kei-Hoi Cheung. New York: Springer; 2007, pp. 317-340 (Abstract) (Book chapter)
complete list of publications in CV