
Semantic Discovery: Discovering Complex Relationships in the Semantic Web

An NSF Medium ITR project

SemDis is a collaborative project among the Kno.e.sis Center at Wright State University (WSU), the LSDIS Lab at the University of Georgia, Athens (SemDis at UGA), and eBiquity at the University of Maryland, Baltimore County (SemDis at UMBC).

Project Summary

Research in search techniques was a critical component of the first generation of the Web and has moved from academia into the mainstream. A second-generation "Semantic Web" will be built by adding semantic annotations that software can understand and from which humans can benefit. Modeling, discovering, and reasoning about complex relationships on the Semantic Web will enable this vision and transform the hunt for documents into a more automated analysis enabled by semantic technology. The beginnings of this shift from search to analysis can be observed in research and industry, as users look beyond finding relevant documents by keyword to finding actionable information that supports decision making and insight. Cumulative advances in entity identification, automatic classification, taxonomy and ontology development, and metadata extraction now make large-scale semantic annotation of data (both domain-independent and domain-specific) possible. The next frontier, which fundamentally changes the way we acquire and use knowledge, is the automatic identification of complex relationships between entities in semantically annotated data. Instead of a search engine that returns documents containing terms of interest, we envision a system that returns actionable information (with associated sources and supporting evidence) to a user or application. The user interacts with the information universe through a hypothesis-driven approach that combines search and inferencing, enabling more complex analysis and deeper insight. The examples in our narrative show that such a capability also greatly enhances the ability of intelligence analysts to obtain timely information, leading to a more secure homeland and world.

Our research will focus on the design, prototyping, and evaluation of a system called SemDis (Semantic Discovery) that supports indexing and querying of complex semantic relationships and is driven by notions of information trust and provenance, and by models of the hypotheses and arguments under investigation.

From a scientific perspective, we face the challenges of formally defining and representing meaningful and interesting relationships (which we call semantic associations), and of defining measures of result quality analogous to the familiar metrics of precision, recall, and document ranking. Another challenge is the (semi-)automatic construction of argument structures built on these relationships to validate or refute a given hypothesis. Additional scientific and engineering challenges include the scale of storing large metadata sets and processing complex queries over them, which calls for correspondingly more complex data structures to represent entities and relationships; the need to use context to select relevant subsets of metadata to process; and new techniques that use information provenance and trust to improve the ranking of relationships. These challenges call for a fresh look at indexing, query processing, and ranking, as well as for tractable and scalable graph algorithms that exploit heuristics. We propose to address these challenges by building on our preliminary results in semantic metadata extraction, practical domain-specific ontology creation, definition of semantic associations, main-memory query processing, use of distributed trust to enforce security policies, and knowledge representation and reasoning on the Semantic Web. Scientific results from SemDis will be grounded in detailed scenarios and an evaluation testbed, and will be measured in terms of novel techniques as well as metrics of quality, scalability, and performance for computing complex semantic relationships. Reflecting the breadth and depth of the topics involved, ours is a collaborative proposal involving researchers at Wright State, UGA, and UMBC, covering information modeling and knowledge representation, storage and database management, information retrieval, and artificial intelligence.

Our efforts will have broad impact beyond the education and training of graduate students and the publication of research findings. Results from our research will be integrated into courses we teach, both existing and new. We will use existing institutional mechanisms to seek participation of students from underrepresented groups. Datasets used for testbed evaluations and some targeted tools will be made public or released as open source, and new measures for the relevance and ranking of semantic associations will provide input to future work comparing various approaches and techniques. Our work will also benefit from the investigators' several university-industry collaborations. We will have the opportunity to leverage commercial infrastructure and raw metadata provided by Semagix and IBM, and, when appropriate, technology licensing will be encouraged. The researchers will collaborate with industry, and students will be encouraged to intern at collaborating industrial labs. Within a broader social context, emerging knowledge-centric technologies raise legitimate privacy and civil liberties concerns. Building on past policymaking experience, we will comment on the potential implications of our scientific progress.

Specific Focus Areas at Kno.e.sis


Personnel

Talks and Presentations
Publications

Under Review
  • Cartic Ramakrishnan and Amit Sheth, “Blazing Semantic Trails in Text: Extracting Complex Relationships from Biomedical Literature.”
2007

2006 and earlier

Work on this project before January 2007 was carried out at the LSDIS Lab (SemDis at UGA) and at eBiquity (SemDis at UMBC).

Journal Papers
  • I. Budak Arpinar, Amit Sheth, Cartic Ramakrishnan, E. Lynn Usery, Molly Azami, and Michelle Kwan, “Geospatial Ontology Development and Semantic Analytics,” Transactions in GIS (Blackwell) 10 (4), 2006, pp. 551–576.
  • Vipul Kashyap, Cartic Ramakrishnan, Christopher Thomas, and Amit Sheth, “TaxaMiner: An experimentation framework for automated taxonomy bootstrapping,” International Journal of Web and Grid Services 1 (2), 2005, pp. 240–266.
  • Cartic Ramakrishnan, William H. Milnor, Matthew Perry, and Amit P. Sheth, “Discovering Informative Connection Subgraphs in Multi-relational Graphs,” special issue: Link Mining, SIGKDD Explorations 7 (2), December 2005.
  • Boanerges Aleman-Meza, Christian Halaschek-Wiener, I. Budak Arpinar, Cartic Ramakrishnan, and Amit Sheth, “Ranking Complex Relationships on the Semantic Web,” IEEE Internet Computing 9 (3), May–June 2005, pp. 37–44.
  • Amit Sheth, Boanerges Aleman-Meza, I. Budak Arpinar, Chris Halaschek, Cartic Ramakrishnan, Clemens Bertram, Yashodhan Warke, David Avant, F. Sena Arpinar, Kemafor Anyanwu, and Krys Kochut, “Semantic Association Identification and Knowledge Discovery for National Security Applications,” special issue: Database Technology for Enhancing National Security, L. Zhou and W. Kim (Eds.), Journal of Database Management 16 (1), January–March 2005, pp. 33–53.
  • Kemafor Anyanwu and Amit P. Sheth, “The rho Operator: Discovering and Ranking Associations on the Semantic Web,” special issue on Amicalola Workshop, SIGMOD Record 31 (4), December 2002, pp. 42–47.
Conference Publications
  • Cartic Ramakrishnan, K. Kochut, and A.P. Sheth, “A Framework for Schema-Driven Relationship Discovery from Unstructured Text,” 5th International Semantic Web Conference (ISWC 2006), Athens, GA, November 5–9, 2006, Lecture Notes in Computer Science, vol. 4273, Springer, 2006.
  • M. Perry, F. Hakimpour, and A. P. Sheth, “Analyzing Theme, Space and Time: An Ontology-based Approach,” Proceedings of the 14th International Symposium on Advances in Geographic Information Systems (ACM-GIS 2006), Arlington, VA, November 10–11, 2006, New York: ACM Press, 2006.
  • J. Hassell, B. Aleman-Meza, and I.B. Arpinar, “Ontology-Driven Automatic Entity Disambiguation in Unstructured Text,” 5th International Semantic Web Conference (ISWC 2006), Athens, GA, November 5–9, 2006, I. Cruz et al. (Eds.), Lecture Notes in Computer Science, vol. 4273, Springer, 2006.
  • Leo Deligiannidis, Amit Sheth, and Boanerges Aleman-Meza, “Semantic Analytics Visualization,” IEEE International Conference on Intelligence and Security Informatics (ISI-2006), San Diego, CA, May 23–24, 2006.
  • B. Aleman-Meza, M. Nagarajan, C. Ramakrishnan, L. Ding, P. Kolari, A. Sheth, I. B. Arpinar, A. Joshi, and T. Finin, “Semantic Analytics on Social Networks: Experiences in Addressing the Problem of Conflict of Interest Detection,” 15th International World Wide Web Conference, Edinburgh, Scotland, May 23–26, 2006 (acceptance rate 11%).
  • Maciej Janik and Krys Kochut, “BRAHMS: A WorkBench RDF Store and High Performance Memory System for Semantic Association Discovery,” Proceedings of 4th International Semantic Web Conference (ISWC 2005), Galway, Ireland, November 2005, pp. 431–445.
  • M. Perry, M. Janik, C. Ramakrishnan, C. Ibanez, I.B. Arpinar, and A. Sheth, “Peer-to-Peer Discovery of Semantic Associations,” 2nd International Workshop on Peer-to-Peer Knowledge Management (P2PKM '05), San Diego, CA, July 17, 2005.
  • K. Anyanwu, A. Maduko, and A. Sheth, “SemRank: Ranking Complex Relationship Search Results on the Semantic Web,” Proceedings of 14th International World Wide Web Conference (WWW2005), Chiba, Japan, May 2005, pp. 117–12.
  • Boanerges Aleman-Meza, Phillip Burns, Matthew Eavenson, Devanand Palaniswami, and Amit Sheth, “An Ontological Approach to the Document Access Problem of Insider Threat,” Proceedings of IEEE International Conference on Intelligence and Security Informatics (ISI-2005), May 19–20, 2005.
  • Kemafor Anyanwu and Amit P. Sheth, “ρ-Queries: Enabling Querying for Semantic Associations on the Semantic Web,” 12th International World Wide Web Conference, Budapest, Hungary, May 2003.
Workshop Papers
  • Farshad Hakimpour, Boanerges Aleman-Meza, Matthew Perry, and Amit Sheth, “Data Processing in Space, Time and Semantics Dimensions,” Terra Cognita 2006: Directions to the Geospatial Semantic Web, workshop at the 5th International Semantic Web Conference, Athens, GA, November 6, 2006.
  • Boanerges Aleman-Meza, Chris Halaschek, Amit Sheth, I. Budak Arpinar, and Gowtham Sannapareddy, “SWETO: Large-Scale Semantic Web Test-bed,” Proceedings of 16th International Conference on Software Engineering & Knowledge Engineering (SEKE2004): Workshop on Ontology in Action, Banff, Canada, June 21–24, 2004, pp. 490–493.
  • Boanerges Aleman-Meza, Chris Halaschek, I. Budak Arpinar, and Amit Sheth, “Context-Aware Semantic Association Ranking,” Proceedings of 1st International Workshop on Semantic Web and Databases, Berlin, Germany, September 7–8, 2003, pp. 33–50.
Book Chapters
Demonstrations and Short Papers
Technical Reports
  • Boanerges Aleman-Meza, Farshad Hakimpour, I. Budak Arpinar, and Amit P. Sheth, “SwetoDblp Ontology of Computer Science Publications,” Technical Report, LSDIS Lab, Computer Science Department, University of Georgia, October 2006. (Predecessor of our SwetoDblp article in the Journal of Web Semantics.)
  • Boanerges Aleman-Meza, Christian Halaschek-Wiener, Satya Sanket Sahoo, Amit Sheth, and I. Budak Arpinar, “Template Based Semantic Similarity for Security Applications,” Technical Report, LSDIS Lab, Computer Science Department, University of Georgia, January 2005.
  • William H. Milnor, Cartic Ramakrishnan, Matthew Perry, Amit P. Sheth, John A. Miller, and Krzysztof J. Kochut, “Discovering Informative Subgraphs in RDF Graphs,” Technical Report, LSDIS Lab, Computer Science Department, University of Georgia, April 2005.
Related Presentations
  • B. Aleman-Meza, A. Sheth, I.B. Arpinar, C. Halaschek, and the SemDIS team, “Semantic Web Technology Evaluation Ontology (SWETO): A test bed for evaluating tools and benchmarking applications,” Semantic Web Track, Developers Day, International World Wide Web Conference, New York, NY, May 2004.
  • Amit Sheth, “Relationships at the Heart of Semantic Web: Modeling, Discovering, Validating and Exploiting Complex Semantic Relationships,” Keynote Address, 29th Annual Conference on Current Trends in Theory and Practice of Informatics (SOFSEM 2002), Milovy, Czech Republic, November 2002.
  • Amit Sheth, “Semantic Content Management for Enterprises and National Security,” Keynote Address, Content and Semantic-based Information Retrieval, held in conjunction with 6th World Multi-conference on Systemics, Cybernetics, and Informatics (SCI 2002), Orlando, Florida, July 14–18, 2002.
Scientific and Community Resources

This project has produced several scientific and community resources, including ontologies, tools, and systems, that are made available to the research community at no cost under open-source licenses. Some of these are widely used and cited by others in the Semantic Web research community. The resources include:
  • Semantic Web Technology Evaluation Ontology (SWETO), a large, high-quality test ontology against which ontology management tools can be evaluated for scalability and other properties.
  • SwetoDblp, a large ontology (a spin-off of SWETO) of bibliographic data on computer science publications, drawn mainly from DBLP.
  • TOntoGen, a test (synthetic, parameterized) ontology generation tool.
  • BRAHMS, a fast main-memory RDF/S store capable of storing, accessing, and querying large ontologies.
  • Semantic Browser, a tool that demonstrates the concept of a Relationship Web by creating a relationship-centric metaweb over documents. It lets users traverse semantically connected documents through domain-specific relationships and builds on research in entity and relationship extraction.
  • SemDis API, a simple yet flexible set of interfaces intended as a basis for implementations of RDF data access suited to the types of algorithms being developed in the SemDis project.
  • Swoogle, a Semantic Web search engine and metadata service provider, whose development was funded by SemDis and Spire (another NSF-funded project at UMBC).
  • Semantic visualization, a subproject that provides interactive search and analytics interfaces for visual modeling and display, graphical query formulation, and other Semantic Web capabilities. Three tools have been developed: OntoVista, for life science applications; SAV, a 3D visualization tool for semantic analytics; and SET (Semantic EventTracker), a highly interactive visualization tool for tracking and associating activities (events).


This material is based on work supported by the National Science Foundation under Award No. 071441 to Wright State University and No. IIS-0325464 to University of Georgia titled “SemDis: Discovering Complex Relationships in the Semantic Web.” Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. PI: Amit Sheth; Co-PIs: I. Budak Arpinar, Krys Kochut, and John Miller.
