Dissertation Research
My research is motivated by the need to use textual knowledge effectively in building intelligent systems. Over the years, several compelling examples have shown the need to automate the identification of emergent knowledge from text. This capability has also been listed as a milestone in Microsoft's 2020 vision. The need for such a capability was pointed out by Dr. Don Swanson when he discovered potential therapeutic uses of magnesium in alleviating some migraines. He did this by manually linking article titles, and the discovered associations were subsequently validated by wet-lab experiments. The utility of traversing named associations between objects was pointed out as far back as 1945 by Dr. Vannevar Bush in his MEMEX vision. All of these have served as motivation for my research.
Research threads
I have developed algorithms for entity and relationship extraction from text. The multi-relational knowledge extracted by these algorithms is represented using RDF. I have also developed an algorithm for informative subgraph discovery over the resulting RDF graph. For my dissertation I have focused on the following research problems as they apply to the biomedical domain:
- Unsupervised Identification of compound entities in biomedical text - Named entities occurring in text can be structurally and semantically complex, and simple entities can be nested to form complex ones. Supervised approaches to identifying and typing these nested entities have been explored, but they require training data. My research has sought to circumvent the need for training data for this problem. Details of my approach can be found in the following paper: EKAW 2008.
- Unsupervised Extraction of relationships from biomedical text - Dictionaries of terms in the biomedical domain contain entities organized in hierarchies (e.g. MeSH). My research on relationship extraction has sought to extract named relationships between these entities. I have investigated two approaches to this problem. Details are available in the following papers: WI 2008 and ISWC 2006.
- Subgraph discovery over Multi-relational Networks - Discovering patterns in graphs has long been an area of interest. Most approaches to such pattern discovery use either quantitative anomalies or the frequency of substructures to measure the interestingness of a pattern. My approach sought instead to define the "interestingness" of subgraphs based on domain semantics. In this work I adapted a fast connection subgraph algorithm to answer the following question over a given RDF graph: "What are the most relevant ways in which entity X is related to entity Y?" The response to this question is a subgraph connecting X to Y that contains the most relevant paths between X and Y. Details are available in the following paper: SIGKDD Explorations 2005.
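To make the subgraph question above concrete, here is a minimal Python sketch of the general idea. It is not the algorithm from the SIGKDD Explorations paper: it swaps in a simple bounded path enumeration with a product-of-weights score. The triples, the predicate names, the weights in `PREDICATE_WEIGHT`, and all function names are assumptions made up for illustration. The intermediate concepts are placeholders, not extracted findings.

```python
from collections import defaultdict

# Hypothetical RDF-style (subject, predicate, object) triples.
# ConceptA/B/C are placeholders, not real extracted knowledge.
TRIPLES = [
    ("Magnesium", "inhibits", "ConceptA"),
    ("ConceptA", "associated_with", "Migraine"),
    ("Magnesium", "regulates", "ConceptB"),
    ("ConceptB", "associated_with", "Migraine"),
    ("Magnesium", "mentioned_with", "ConceptC"),
    ("ConceptC", "mentioned_with", "Migraine"),
]

# Assumed stand-in for domain semantics: more specific predicates
# are treated as more relevant than generic co-mention.
PREDICATE_WEIGHT = {
    "inhibits": 0.9,
    "regulates": 0.8,
    "associated_with": 0.7,
    "mentioned_with": 0.2,
}


def build_index(triples):
    """Index triples by node, ignoring edge direction for traversal."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((o, (s, p, o)))
        index[o].append((s, (s, p, o)))
    return index


def simple_paths(index, source, target, max_hops):
    """Yield every cycle-free path (a list of triples) up to max_hops long."""
    stack = [(source, [], {source})]
    while stack:
        node, path, seen = stack.pop()
        if node == target and path:
            yield path
            continue
        if len(path) == max_hops:
            continue
        for neighbor, triple in index[node]:
            if neighbor not in seen:
                stack.append((neighbor, path + [triple], seen | {neighbor}))


def path_score(path, weights, default=0.1):
    """Score a path as the product of its predicate weights."""
    score = 1.0
    for _s, p, _o in path:
        score *= weights.get(p, default)
    return score


def connection_subgraph(triples, source, target, max_hops=3, top_k=2,
                        weights=PREDICATE_WEIGHT):
    """Return the triples on the top_k highest-scoring paths, in rank order."""
    index = build_index(triples)
    ranked = sorted(
        simple_paths(index, source, target, max_hops),
        key=lambda path: path_score(path, weights),
        reverse=True,
    )
    subgraph = []
    for path in ranked[:top_k]:
        for triple in path:
            if triple not in subgraph:
                subgraph.append(triple)
    return subgraph


if __name__ == "__main__":
    for triple in connection_subgraph(TRIPLES, "Magnesium", "Migraine"):
        print(triple)
```

With the toy weights above, the two paths through the specific predicates outrank the generic co-mention path, so the latter is left out of the returned subgraph. Exhaustive path enumeration only suits small graphs. A connection subgraph algorithm of the kind adapted in the original work is meant to scale well beyond this.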
Resources
For details about the projects described above, please visit Text mining and knowledge discovery @ Kno.e.sis.
My dissertation based on this work is available as a PDF.
Colleagues & Collaborators
The work listed above was done in collaboration with Pablo N. Mendes, Dr. Matthew Perry, Willie Milnor, and Dr. Shaojun Wang, under the guidance of my academic advisor, Dr. Amit P. Sheth.