Research in search techniques was a critical component of the first generation of the Web and has gone from academe to mainstream.
A second generation "Semantic Web" will be built by adding semantic annotations that software can understand and
from which humans can benefit. Modeling, discovering, and reasoning about complex relationships on the Semantic Web will enable
this vision and transform the hunt for documents into a more automated analysis enabled by semantic technology. The beginnings of
this shift from search to analysis can be observed in research and industry as users look beyond finding relevant documents based
on keywords to finding actionable information leading to decision making and insights. Large-scale semantic annotation of data
(domain-independent and domain-specific) is now made possible by cumulative advances in entity identification, automatic
classification, taxonomy and ontology development, and metadata extraction. The next frontier, which changes the way
we acquire and use knowledge fundamentally, is the automatic identification of complex relationships between entities in
semantically annotated data. Instead of a search engine that returns documents containing terms of interest, we envision a system
that returns actionable information (with associated sources and supporting evidence) to a user or application. The user interacts
with the information universe through a hypothesis-driven approach that combines search and inferencing, enabling more complex analysis
and deeper insight. The examples in our narrative show that such a capability also greatly enhances the capacity of intelligence
analysts to obtain (in time) information leading to a more secure homeland and world.
Our research will focus on the design, prototyping and evaluation of a system, called SemDIS (Semantic Discovery), that supports
indexing and querying of complex semantic relationships and is driven by notions of information trust and provenance and models of
hypotheses and arguments under investigation.
From a scientific perspective, we face the challenges of formally defining and representing meaningful and interesting relationships
(which we call semantic associations), and defining the notion of quality of results similar to the familiar metrics of precision,
recall, and document ranking. Another challenge is the (semi) automatic construction of argument structures built on these
relationships to validate or deny a given hypothesis. Additional scientific and engineering challenges include those related to
storing large metadata sets and processing complex queries over them at scale, with correspondingly more complex data structures to
represent entities and relationships; the need to utilize context to select relevant subsets of metadata to process; and new
techniques that use information provenance and trust to improve ranking of relationships. These challenges call for a fresh look
at indexing, query processing, and ranking as well as tractable and scalable graph algorithms that exploit heuristics.
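As a rough illustration of what discovering and ranking semantic associations can mean in practice, the following minimal sketch enumerates bounded-length paths between two entities in a small labeled graph and orders them by hop count. It is not the SemDIS implementation: the toy triples, the function names, and the shortest-first ranking are assumptions made for this example, and the trust- and provenance-aware ranking described above is deliberately left out.

```python
from collections import deque

# Hypothetical toy metadata: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
    ("alice", "coAuthorOf", "carol"),
    ("carol", "advises", "bob"),
    ("bob", "livesIn", "athens"),
]


def build_graph(triples):
    """Index triples as an adjacency list that can be walked in both directions.

    Inverse edges are marked with a "^-1" suffix (an assumed convention).
    """
    graph = {}
    for s, p, o in triples:
        graph.setdefault(s, []).append((p, o))
        graph.setdefault(o, []).append((p + "^-1", s))
    return graph


def find_associations(graph, source, target, max_hops=3):
    """Enumerate simple paths from source to target with at most max_hops edges.

    A path is a list alternating entities and edge labels:
    [entity, label, entity, ..., entity].
    """
    results = []
    queue = deque([(source, [source])])
    while queue:
        node, path = queue.popleft()
        if node == target and len(path) > 1:
            results.append(path)
            continue
        if len(path) // 2 >= max_hops:  # number of edges used so far
            continue
        for label, neighbor in graph.get(node, []):
            if neighbor not in path[::2]:  # avoid revisiting an entity
                queue.append((neighbor, path + [label, neighbor]))
    return results


def rank_associations(paths):
    """Rank shorter associations first (a simple stand-in for richer ranking)."""
    return sorted(paths, key=len)


if __name__ == "__main__":
    g = build_graph(TRIPLES)
    for path in rank_associations(find_associations(g, "alice", "bob")):
        print(" ".join(path))
```

The hop bound keeps the enumeration tractable on this toy graph; on large metadata sets the number of candidate paths grows quickly, which is why the paragraph above calls for heuristics and specialized indexing.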
Our work proposes to address these challenges building on our preliminary results in semantic metadata extraction, practical
domain-specific ontology creation, definition of semantic associations, main-memory query processing, use of distributed trust to
enforce security policies, and knowledge representation and reasoning on the semantic web. Scientific results from SemDIS will
involve detailed scenarios and an evaluation testbed and will be assessed in terms of novel techniques as well as measures of
quality, scalability, and performance for computing complex semantic relationships. Corresponding to the
breadth and depth of the topics involved in the challenge undertaken, ours is a collaborative proposal involving researchers at
Wright State, UGA, and UMBC and covering the areas of information modeling and knowledge representation, storage and database
management, information retrieval, and artificial intelligence.
Our efforts will have broad effects beyond the education and training of graduate students and the publication of research findings.
Results from our research will be integrated with courses we teach, both existing and new. We will use institutional mechanisms in
place to seek participation of students from underrepresented groups. Datasets used for test bed evaluations and some targeted tools
will be made public or open source, and new measures for relevance and ranking of semantic associations will
provide input to future work comparing various approaches and techniques. Our work will also gain from several university-industry
collaborations of the investigators. We will have the opportunity to leverage commercial infrastructure and raw metadata provided
by Semagix and IBM, and, when appropriate, technology licensing will be encouraged. The researchers will collaborate with industry,
and the students will be encouraged to intern at collaborating industrial labs. Within a broader social context, emerging
knowledge-centric technologies raise legitimate privacy and civil liberties concerns. Building on past policymaking experience,
we will comment on potential implications of our scientific progress.
Specific Focus areas at Kno.e.sis
Personnel
Talks and Presentations
- Amit Sheth, “Realizing the Relationship Web:
Morphing information access on the Web from today's document-
and entity-centric paradigm to a relationship-centric paradigm,” Keynote at the ACM Multimedia
International Workshop: Many Faces of Multimedia Semantics,
Augsburg, Germany, September 28, 2007.
- “Relationship Web: Spinning the Semantic
Web from Trailblazing to Complex Hypothesis Evaluation,” seminar talk at the College of Engineering, University of Illinois at
Chicago, August 31, 2007.
- Amit Sheth, “Trailblazing, Complex Hypothesis Evaluation, Abductive Inference and Semantic
Web—Exploring possible synergy,” address at Evidence and Intelligent Systems: ARO Workshop on Abductive Reasoning,
Adelphi, MD, August 24, 2007.
- Amit Sheth, “Relationship Web: Spinning the Semantic Web from Trailblazing to Complex Hypothesis
Evaluation,” talk in the Cyber Center Lecture Series at Purdue University,
Lafayette, Indiana, August 16, 2007.
- Amit P. Sheth, “Relationship Web: Realizing the Memex vision with the help of Semantic Web,”
address at SemGrail 2007, Redmond, WA, June 21–22, 2007.
- Amit P. Sheth and Susie Stephens, “Semantic Web:
Technologies and Applications for the Real-World,”
presentation at 16th World Wide Web Conference (WWW2007), Banff, Canada, May 8–12, 2007.
Publications
Under Review
- Cartic Ramakrishnan and Amit Sheth, “Blazing Semantic Trails in Text: Extracting Complex
Relationships from Biomedical Literature.”
2007
- Boanerges Aleman-Meza, Meenakshi Nagarajan, Li Ding, Amit Sheth, I. Budak Arpinar, Anupam Joshi, and Tim Finin,
“Scalable Semantic Analytics on Social Networks for Addressing the Problem of Conflict of Interest Detection,”
ACM Transactions on the Web (accepted for publication in February 2008 issue).
- B. Aleman-Meza, S. Decker, D. Cameron, and I.B. Arpinar, “Association
Analytics for Network Connectivity in a Bibliographic and Expertise Dataset,”
in Semantic Web Engineering in the Knowledge Society, J. Cardoso and M.D. Lytras
(Eds.), 2008 (in press).
- B. Aleman-Meza, “Ranking
Documents based on Relevance of Semantic Relationships,” PhD Dissertation,
University of Georgia, Computer Science Department, August 2007 (under the direction of I. Budak Arpinar).
- B. Aleman-Meza, F. Hakimpour, I.B. Arpinar, A.P. Sheth,
“SwetoDblp
Ontology of Computer Science Publications,”
Web Semantics: Science, Services and Agents on the World Wide Web (Elsevier), online March 2007.
- Kemafor Anyanwu, “Supporting Link Analysis Using Advanced Querying Methods on Semantic Web Databases,”
PhD Dissertation, University of Georgia, Computer Science Department, August 2007 (under the direction of Amit P. Sheth).
- K. Anyanwu, A. Maduko, and A. Sheth, “SPARQ2L: Towards Supporting
Subgraph Extraction Queries in RDF Databases,”
Proceedings of 16th World Wide Web Conference (WWW2007), Banff, Canada, May 8–12, 2007.
- D. Cameron, B. Aleman-Meza, I. B. Arpinar, “Collecting Expertise of Researchers for Finding Relevant Experts in a
Peer-Review Setting,” 1st International ExpertFinder Workshop (EFW 2007), co-located with 7th Knowledge Web General
Assembly, Berlin, Germany, January 16, 2007.
- Farshad Hakimpour, Boanerges Aleman-Meza, Matthew Perry, and Amit P. Sheth, “Spatiotemporal-Thematic Data Processing
in Semantic Web,” in The Geospatial Web: How Geobrowsers, Social Software and the Web 2.0 are Shaping the Network Society, Arno Scharl and
Klaus Tochtermann (Eds.), Advanced Information and Knowledge Processing Series, London: Springer, May 2007, Ch. 8.
- K. Gomadam, A. Ranabahu, L. Ramaswamy, A. P. Sheth, and K. Verma, “Semantic Framework for Identifying Events in a
Service Oriented Architecture,” International Conference on Web Services (ICWS),
Salt Lake City, July 9–13, 2007, Proceedings, IEEE Xplore. ISBN: 0-7695-2924-0.
- K. Kochut and M. Janik, “SPARQLeR:
Extended Sparql for Semantic Association Discovery,”
4th European Semantic Web Conference, Innsbruck, Austria, June 3–7, 2007, pp. 145–159.
- A. Maduko, K. Anyanwu, A. Sheth, and P. Schliekelman, “Estimating the Cardinality of
RDF Graph Patterns,” 16th World Wide Web Conference (WWW2007), Banff, Canada, May 8–12, 2007.
- M. Nagarajan, A. P. Sheth, M. Aguilera, K. Keeton, A. Merchant, and M. Uysal, “Altering Document
Term Vectors for Classification—Ontologies as Expectations of Co-occurrence,” 16th World Wide Web Conference
(WWW2007), Banff, Canada, May 8–12, 2007.
- M. Perry, A. P. Sheth, F. Hakimpour, and P. Jain, “Supporting Complex Thematic,
Spatial and Temporal Queries over Semantic Web Data,”
2nd International Conference on Geospatial Semantics (GeoS 2007), Mexico City, November 29–30, 2007.
- S. Tartir and I. B. Arpinar, “Ontology Evaluation
and Ranking using OntoQA,” 1st IEEE International Conference
on Semantic Computing, Irvine, CA, September 17–19, 2007, pp. 185–192.
- S. Tartir, I. B. Arpinar, and A. Sheth, “Ontology Evaluation and
Validation,” in TAO—Theory and
Applications of Ontologies, Vol. II: Ontology: The Information-Science Stance, Springer (forthcoming).
2006 and earlier
Work on this project before January 2007 was carried
out by LSDIS Lab, SemDis at UGA, and SemDis at UMBC.
Journal Papers
- I. Budak Arpinar, Amit Sheth, Cartic Ramakrishnan, E. Lynn Usery, Molly Azami, and Michelle Kwan,
“Geospatial Ontology Development and
Semantic Analytics,” Transactions in GIS (Blackwell) 10 (4), 2006, pp. 551–576.
- Vipul Kashyap, Cartic Ramakrishnan, Christopher Thomas, and Amit Sheth, “TaxaMiner: An experimentation
framework for automated taxonomy bootstrapping,” International Journal of Web and Grid Services 1 (2), 2005,
pp. 240–266.
- Cartic Ramakrishnan, William H. Milnor, Matthew Perry, and Amit P. Sheth, “Discovering Informative Connection Subgraphs
in Multi-relational Graphs,” special issue: Link Mining, SIGKDD Explorations 7 (2), December 2005.
- Boanerges Aleman-Meza, Christian Halaschek-Wiener, I. Budak Arpinar, Cartic Ramakrishnan, and Amit Sheth,
“Ranking
Complex Relationships on the Semantic Web,” IEEE Internet Computing 9(3), May–June 2005, pp. 37–44.
- Amit Sheth, Boanerges Aleman-Meza, I. Budak Arpinar, Chris Halaschek, Cartic Ramakrishnan, Clemens Bertram, Yashodhan (Yash)
Warke, David Avant, F. Sena Arpinar, Kemafor Anyanwu, and Krys Kochut, “Semantic Association Identification and
Knowledge Discovery for National Security Applications,” special issue: Database Technology for Enhancing National
Security, L. Zhou and W. Kim (Eds.), Journal of Database Management 16 (1), January–March 2005, pp. 33–53.
- Kemafor Anyanwu and Amit P. Sheth, “The rho Operator: Discovering and Ranking Associations on the Semantic Web,”
Special issue on Amicalola Workshop, SIGMOD Record 31 (4), December 2002, pp. 42–47.
Conference Publications
- Cartic Ramakrishnan, K. Kochut, and A.P. Sheth, “A Framework for Schema-Driven Relationship
Discovery from Unstructured Text,” 5th International Semantic Web Conference (ISWC2006), Athens, GA,
November 5–9, 2006, Lecture Notes in Computer Science, vol. 4273, Springer, 2006.
- M. Perry, F. Hakimpour, and A. P. Sheth, “Analyzing Theme, Space and Time: An Ontology-based
Approach,” Proceedings of the 14th International Symposium on Advances in Geographic Information Systems (ACM-GIS 2006),
Arlington, VA, November 10–11, 2006, New York: ACM Press, 2006.
- J. Hassell, B. Aleman-Meza, and I.B. Arpinar, “Ontology-Driven
Automatic Entity Disambiguation in Unstructured Text,” 5th International Semantic Web Conference
(ISWC 2006), Athens, GA, November 5–9, 2006, I. Cruz et al. (Eds.), Lecture Notes in Computer Science, vol. 4273, Springer, 2006.
- Leo Deligiannidis, Amit Sheth, and Boanerges Aleman-Meza, “Semantic Analytics Visualization,” IEEE International
Conference on Intelligence and Security Informatics (ISI-2006), San Diego, CA, May 23–24, 2006.
- B. Aleman-Meza, M. Nagarajan, C. Ramakrishnan, L. Ding, P. Kolari, A. Sheth, I. B. Arpinar, A. Joshi,
and T. Finin, “Semantic Analytics on Social Networks: Experiences in Addressing the Problem of Conflict of Interest
Detection,” 15th International World Wide Web Conference, Edinburgh, Scotland, May 23–26, 2006 (acceptance rate 11%).
Web version
- Maciej Janik and Krys Kochut, “BRAHMS: A WorkBench RDF Store and High Performance Memory System for Semantic
Association Discovery,” Proceedings of 4th International Semantic Web Conference (ISWC 2005), Galway,
Ireland, November 2005, pp. 431–445.
- M. Perry, M. Janik, C. Ramakrishnan, C. Ibanez, I. B. Arpinar, and A. Sheth, “Peer-to-Peer Discovery of
Semantic Associations,” 2nd International Workshop on Peer-to-Peer Knowledge Management (P2PKM '05),
San Diego, CA, July 17, 2005.
- K. Anyanwu, A. Maduko, and A. Sheth, “SemRank: Ranking Complex Relationship Search Results on the Semantic Web,”
Proceedings of 14th International World Wide Web Conference (WWW2005), Chiba, Japan, May 2005, pp. 117–12.
Paper Presentation
- Boanerges Aleman-Meza, Phillip Burns, Matthew Eavenson, Devanand Palaniswami, and Amit Sheth, “An Ontological
Approach to the Document Access Problem of Insider Threat,” Proceedings of IEEE International
Conference on Intelligence and Security Informatics (ISI-2005), May 19–20, 2005. Presentation (conference version)
- Kemafor Anyanwu and Amit P. Sheth, “r-Queries: Enabling Querying for Semantic Associations on the Semantic
Web,” 12th International World Wide Web Conference, Budapest, Hungary, May 2003. Paper (html), Presentation (ppt),
Presentation (pdf)
Workshop Papers
- Farshad Hakimpour, Boanerges Aleman-Meza, Matthew Perry, and Amit Sheth, “Data Processing in Space, Time and
Semantics Dimensions,” paper for Workshop, Terra Cognita 2006 - Directions
to the Geospatial Semantic Web, at 5th International Semantic Web Conference, Athens, GA, November 6, 2006.
- Boanerges Aleman-Meza, Chris Halaschek, Amit Sheth, I. Budak Arpinar, and Gowtham Sannapareddy, “SWETO: Large-Scale
Semantic Web Test-bed,” Proceedings of 16th International Conference on Software Engineering & Knowledge Engineering
(SEKE2004): Workshop on Ontology in Action, Banff, Canada, June 21–24, 2004, pp. 490–493.
- Boanerges Aleman-Meza, Chris Halaschek, I. Budak Arpinar, and Amit Sheth,
“Context-Aware
Semantic Association Ranking,” Proceedings of 1st International Workshop on Semantic Web
and Databases, Berlin, Germany, September 7–8, 2003, pp. 33–50.
Book Chapters
- Boanerges Aleman-Meza, Amit Sheth, Devanand Palaniswami, Matthew Eavenson, and I. Budak Arpinar, “Semantic Analytics in Intelligence:
Applying Semantic Association Discovery to Determine Relevance of Heterogeneous Documents,” in
Advanced Topics in Database Research, vol. 5, Keng Siau, Ed., Idea Group Publishing, 2006, pp. 401–419.
- Boanerges Aleman-Meza, Christian Halaschek-Wiener, and Ismailcem Budak Arpinar, “Collective Knowledge Composition in a
P2P Network,” in Encyclopedia of Database Technologies and Applications, L.C. Rivero et al. (Eds.),
Idea Group, 2005, pp. 74–77. ISBN 1-59140-560-2.
Demonstrations and Short Papers
- Chris Halaschek, Boanerges Aleman-Meza, I. Budak Arpinar, and Amit Sheth, “Discovering and Ranking Semantic
Associations over a Large RDF Metabase,” 30th International Conference on Very Large Data Bases, Toronto, Canada,
August 30–September 3, 2004. Demonstration Paper (conference version), Poster, Presentation.
- Boanerges Aleman-Meza, Christian Halaschek-Wiener, Satya Sanket Sahoo, Amit Sheth, and
I. Budak Arpinar, “Template
Based Semantic Similarity for Security Applications,” Proceedings of IEEE
International Conference on Intelligence and Security Informatics (ISI-2005), May 19–20, 2005. Extended Abstract
(conference version)
Technical Reports
- Boanerges Aleman-Meza, Farshad Hakimpour, I. Budak Arpinar, and Amit P. Sheth, “SwetoDblp Ontology of Computer
Science Publications,” Technical Report, LSDIS Lab, Computer Science Department, University of Georgia, October 2006.
(Predecessor of our SwetoDblp Article in J. Web Semantics)
- Boanerges Aleman-Meza, Christian Halaschek-Wiener, Satya Sanket Sahoo, Amit Sheth, and I. Budak Arpinar, “Template
Based Semantic Similarity for Security Applications,” Technical Report, LSDIS Lab, Computer Science Department,
University of Georgia, January 2005.
- William H. Milnor, Cartic Ramakrishnan, Matthew Perry, Amit P. Sheth, John A. Miller, and Krzysztof J. Kochut,
“Discovering Informative Subgraphs in RDF Graphs,” Technical Report, LSDIS Lab, Computer Science Department,
University of Georgia, April 2005.
Related Presentations
- B. Aleman-Meza, A. Sheth, I.B. Arpinar, C. Halaschek, and SemDIS team, “Semantic Web Technology Evaluation Ontology (SWETO):
A test bed for evaluating tools and benchmarking applications,” Developers Day: Semantic Web Track. Intl WWW
Conference Developers Day, New York, NY, May 2004. Presentation (PDF), Abstract
- Amit Sheth, “Relationships at the Heart of Semantic Web: Modeling, Discovering, Validating and Exploiting
Complex Semantic Relationships,” Keynote Address, 29th Annual Conference on Current Trends in Theory
and Practice of Informatics (SOFSEM 2002), Milovy, Czech Republic, November 2002. Presentation
- Amit Sheth, “Semantic Content Management for Enterprises and National Security,” Keynote Address,
Content and Semantic-based Information Retrieval, held in conjunction with 6th World Multi-conference on
Systemics, Cybernetics, and Informatics (SCI 2002), Orlando, Florida, July 14–18, 2002. Abstract, Presentation (ppt)
Scientific and Community Resources
This project has resulted in several scientific and community resources including ontologies, tools, and
systems that are made available under an open source license at no cost to the research community. Some of these are
widely used and referred to by others in the Semantic Web research community. The resources include:
- Semantic Web Technology Evaluation Ontology (SWETO), a large, high-quality test
ontology from which various ontology management tools can assess and test scalability and other properties.
- SwetoDblp, a large ontology (spin-off of SWETO ontology) focused on bibliographic data from
computer science publications, for which the main source is DBLP.
- TOntoGen, a test (synthetic, parameterized) ontology generation tool.
- BRAHMS, a fast main-memory RDF/S storage, capable of storing, accessing, and querying large ontologies.
- Semantic Browser, a tool that demonstrates the concept of Relationship Web by creating a
relationships-centric metaweb on documents. It allows users to traverse semantically connected
documents through domain-specific relationships and uses research in entity and relationship extraction.
- SemDis API, a simple yet flexible set of interfaces
intended to be a basis for implementations of RDF data access suitable to the types of algorithms being developed in the
SemDis project.
- Swoogle, a semantic Web search engine and metadata service provider,
whose development was funded by SemDis and Spire (another NSF-funded project at UMBC).
- The Semantic visualization subproject provides interactive search and analytics interfaces for visual modeling and
display, graphical query formulation and other semantic Web capabilities. Three tools have been developed: OntoVista, for life
science applications; SAV, a 3D visualization tool for semantic analytics; and SET, or Semantic EventTracker, a highly
interactive visualization tool for tracking and associating activities (events).