TcruziKB

The Trypanosoma cruzi Knowledge Base

In this project we mapped TcruziDB.org to ontologies and provided a flexible query interface that is meant to guide the user in exploring that genome database.

We start by helping the user build questions through a series of conceptual relationships between the types of data stored (e.g. "gene -> codes for -> protein -> expressed in -> life cycle stage", which asks for genes expressed in a given life cycle stage). We then show the results in several formats: a plain tabular format, a graph-based relationship explorer and a statistical explorer. In the video I demonstrate only the graph-based explorer, where the user can keep drilling down for more details and eventually reveal connections between the data that were not explicit initially.
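
As a rough illustration of how such a question could be handled internally, the sketch below splits the example chain into subject/relation/object triples and renders them as a SPARQL-like graph pattern. The function names and the pattern syntax are assumptions made for this example; they are not taken from the TcruziKB code.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def path_to_triples(path: List[str]) -> List[Triple]:
    """Split an alternating concept/relation chain into (subject, relation, object) triples."""
    if len(path) < 3 or len(path) % 2 == 0:
        raise ValueError("path must alternate: concept, relation, ..., concept")
    return [(path[i], path[i + 1], path[i + 2]) for i in range(0, len(path) - 2, 2)]


def triples_to_pattern(triples: List[Triple]) -> str:
    """Render triples as a SPARQL-like pattern, one variable per concept (hypothetical syntax)."""
    def var(concept: str) -> str:
        return "?" + concept.replace(" ", "_")

    def pred(relation: str) -> str:
        return ":" + relation.replace(" ", "_")

    return "\n".join(f"{var(s)} {pred(p)} {var(o)} ." for s, p, o in triples)


# The example question from the text, written as a chain of concepts and relations.
question = ["gene", "codes for", "protein", "expressed in", "life cycle stage"]
triples = path_to_triples(question)
pattern = triples_to_pattern(triples)
print(pattern)
```

Because "protein" appears in both triples under the same variable name, the two steps of the question are joined automatically, which is what lets the user chain relationships without writing a query by hand.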

In addition, we try to estimate the importance of relationships based on the co-occurrence of data. For example, if a group of orthologous genes all carry a given annotation (e.g. a protein domain), this suggests that the annotation is probably an important feature of that class of proteins. We color the graph edges in shades of red to highlight the most important features.
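
One simple way to turn this idea into numbers is sketched below: the score of an annotation is the fraction of genes in the group that carry it, and the score is then mapped to a shade of red. This is only an assumed scoring scheme for illustration; the actual measure used by the system may differ, and the gene and domain names are made up.

```python
from collections import Counter
from typing import Dict, Set


def annotation_scores(group: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of genes in the group (0.0 to 1.0) that carry each annotation."""
    if not group:
        return {}
    counts = Counter(a for annotations in group.values() for a in set(annotations))
    return {a: n / len(group) for a, n in counts.items()}


def red_shade(score: float) -> str:
    """Map a score in [0, 1] to a hex color from white (0) to pure red (1)."""
    level = round(255 * (1.0 - score))
    return f"#ff{level:02x}{level:02x}"


# Hypothetical group of orthologous genes with their annotations.
group = {
    "geneA": {"domain_X", "domain_Y"},
    "geneB": {"domain_X"},
    "geneC": {"domain_X", "domain_Z"},
    "geneD": {"domain_X", "domain_Y"},
}
scores = annotation_scores(group)
for annotation, score in sorted(scores.items()):
    print(annotation, score, red_shade(score))
```

Here domain_X is shared by every gene in the group, so it receives the highest score and the strongest red, while domain_Z, found in a single gene, stays close to white.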

The data was obtained from TcruziDB.org, a genome database based on the Genome Unified Schema (GUS). An ontology was created to define the types of data, and the relationships between them, that can be found in the database. The mapping process comprised an automatic step derived from the database schema and a manual step for enhancement and reuse of existing standards such as OBO (Open Biomedical Ontologies).
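
The automatic step could, for instance, follow the common convention of turning tables into classes, foreign keys into relationships and the remaining columns into attributes. The sketch below shows that convention on a toy schema; both the convention and the table and column names are assumptions for illustration and do not reproduce the real GUS schema or the actual mapping rules.

```python
from typing import Dict, List, Optional, Tuple

# Toy schema: table -> {column: referenced table, or None for a plain column}.
Schema = Dict[str, Dict[str, Optional[str]]]


def schema_to_ontology(schema: Schema) -> Dict[str, list]:
    """Derive a draft ontology: tables become classes, foreign keys become
    relationships, and all other columns become attributes."""
    classes: List[str] = sorted(schema)
    relations: List[Tuple[str, str, str]] = []
    attributes: List[Tuple[str, str]] = []
    for table in classes:
        for column, target in schema[table].items():
            if target is None:
                attributes.append((table, column))
            else:
                relations.append((table, column, target))
    return {"classes": classes, "relations": relations, "attributes": attributes}


# Hypothetical two-table schema, not taken from GUS.
schema = {
    "Gene": {"name": None, "protein_id": "Protein"},
    "Protein": {"sequence": None},
}
draft = schema_to_ontology(schema)
print(draft)
```

A draft like this only mirrors the database layout; the manual step is still needed to rename generated relationships (e.g. "protein_id" to "codes for") and to align classes with terms from existing standards.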

This application works through programmatic web access (Ajax) to an ontology-based layer, with no database-specific code, so any database mapped to an ontology is a potential data source for this system. Moreover, the system is not specific to any ontology, so it could be reused in virtually any domain of knowledge.