Kno.e.sis seeks to make a significant majority of the tools, data, and ontologies developed under federal funding available under an open source license for use by fellow researchers and, often, for broader use, especially when the quality of the software and tools is ready for external use. We also endeavor to support evolving standards and specifications in a variety of related fields through the W3C and other channels. Kno.e.sis has a history of developing community resources such as Semantic Web datasets, open source tools, and public services, which it has hosted for significant periods after the end of the respective projects and/or made available through public or open source distribution channels. For more information, please email opensource at knoesis.org.
Tools & Services (Active Projects) * Tools & Services (Past Projects) * Ontologies (Active Projects) * Ontologies (Past Projects) * Data Sets * Standards * Challenge/Competition
Tools & Services (Active Projects)
- EmojiNet: EmojiNet is the largest machine-readable emoji sense inventory that links Unicode emoji representations to their English meanings extracted from the Web. EmojiNet is a dataset consisting of: (i) 12,904 sense labels over 2,389 emoji, which were extracted from the Web and linked to machine-readable sense definitions in BabelNet; (ii) context words associated with each emoji sense, which are inferred through word embedding models trained over a Google News corpus and a Twitter message corpus for each emoji sense definition; and (iii) to account for discrepancies in how emoji are presented on different platforms, a specification of the most likely platform-based emoji sense for a selected set of emoji. The dataset is hosted as an open service (a Web tool with downloadable JSON datasets) with a REST API and is available at http://emojinet.knoesis.org/.
- Emotion Identification Dataset: User-generated content on Twitter provides a rich source for gleaning people's emotions, which is necessary for a deeper understanding of people's behaviors and actions. Extant studies on emotion identification lack comprehensive coverage of "emotional situations" because they use relatively small training datasets. To overcome this bottleneck, we have automatically created a large emotion-labeled dataset by harnessing emotion-related hashtags available in the tweets. It contains manually curated hashtag-to-emotion mappings as well as Twitter IDs with their corresponding emotion labels.
- Social-media Depression Detector (SDD): This tool allows you to detect depression on social media using the ssToT method introduced in our ASONAM 2017 paper, "Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media." The tool is written in Python.
- Depression Lexicon: A lexicon that contains common depression symptoms from the established clinical assessment questionnaire PHQ-9. We rank the terms and compile a list of informative lexicon terms for each user and use them as seed terms to discover latent topics (depression symptoms) discussed by the subject in their tweets. We use this lexicon in our Social-media Depression Detector (SDD) to detect depression among social media users.
- Geoann: Geoann is an annotation tool for geocoding location names in texts. We use Brat (a web-based tool for NLP-assisted text annotation) for visualizing the annotated texts. Geoann allows the annotator to retrieve all the required geo-information needed without leaving the annotation panel, i.e., it facilitates the retrieval and search of location names using Google Maps API. The tool then allows users to annotate location names by drawing bounding boxes of their spatial extents. Then, it saves the annotations in the same file-based stand-off format of each tweet.
- Location Name Extractor (LNEx): A fine-grained geoparsing tool that extracts location mentions from texts and geocodes them using OpenStreetMap. The tool was specifically designed for disaster-related use cases to support spatio-temporal analysis of data for disaster response. [Pre-print: URL].
- Twitris: A Semantic Social Web platform that facilitates understanding of social perceptions about real-world events by analyzing user-generated data on social media. Twitris addresses challenges in large-scale processing of social data and analyzes data along multiple dimensions including location, time, topic, user, network, sentiment, and emotion (the latest version, v4, is available at http://twitris.knoesis.org).
- Crisis Computing API (NSF SOCS project): This API provides 'Classification as a Service' based on our research on seeking-supplying intent classifiers to assist coordination: donation-related messages, requests for help, offers of help, etc. (Also integrated with Ushahidi's CrisisNET project.)
- Projects: METEOR-S [PI: Prof. Amit Sheth]
- MobiCloud: MobiCloud is a Domain Specific Language (DSL)-based, platform-agnostic application development paradigm for cloud-mobile hybrid applications. A cloud-mobile hybrid is simply an application that runs partly on the mobile device and partly in the cloud. MobiCloud makes it extremely easy to develop these applications and deploy them to clouds and mobile devices.
- Twarql: Twarql investigates the representation of tweets as RDF in order to enable flexibility in handling the information overload faced by those interested in collectively analyzing social media for sensemaking. The Twarql source can be accessed at https://sourceforge.net/projects/twarql/ and is available under the BSD license.
- Cuebee: A flexible, extensible application for querying the Semantic Web. It provides a friendly interface to guide users through the process of formulating complex queries. The Cuebee source can be accessed at http://cuebee.sourceforge.net/ and is available under the Creative Commons Attribution-No Derivative Works 3.0 Unported License.
- Kino (also known as KinoE) is a Web document annotation and indexing system that helps scientists annotate and index Web documents. Kino uses a browser plugin to add annotations and an Apache Solr-based backend to index and store the Web pages. The Kino source can be accessed at https://sourceforge.net/p/sarestannotator/code/HEAD/tree/branches/NCBO-bound-annotator/ and is available under the Apache 2.0 license.
- The Doozer model creation framework extracts entities and relationships from text with the goal of building comprehensive formal models of emerging or continuously changing domains. Upon completion, the code will be made available here. Example domain models created with the prototype can be found on the project page: http://knoesis.org/research/semweb/projects/knowledge-extraction/
- BLOOMS: An acronym for Bootstrapping-based Linked Open Data Ontology Matching System, BLOOMS is an ontology alignment system based on the idea of bootstrapping information already present on the LOD cloud. It was developed particularly for Linked Open Data schema alignment. Further details are available on the BLOOMS wiki page.
- Scooner: Scooner is a prototype search application that integrates the Web of pages with Linked Open Data. A demo of Scooner is available; see also the wiki page.
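The hashtag-based labeling behind the Emotion Identification Dataset listed above can be illustrated with a minimal Python sketch. The mapping entries, the function name `label_tweet`, and the rule of skipping tweets whose hashtags point to more than one emotion are assumptions made for illustration only; the curated mappings distributed with the dataset remain the authoritative source.

```python
import re

# Hypothetical excerpt of a hashtag-to-emotion mapping (illustrative entries only).
HASHTAG_TO_EMOTION = {
    "happy": "joy",
    "furious": "anger",
    "heartbroken": "sadness",
}

HASHTAG_PATTERN = re.compile(r"#(\w+)")


def label_tweet(text):
    """Return the single emotion signaled by the tweet's hashtags, or None.

    Assumption: tweets with no mapped hashtag, or with hashtags that map to
    different emotions, are left unlabeled rather than guessed.
    """
    emotions = {
        HASHTAG_TO_EMOTION[tag.lower()]
        for tag in HASHTAG_PATTERN.findall(text)
        if tag.lower() in HASHTAG_TO_EMOTION
    }
    return emotions.pop() if len(emotions) == 1 else None
```

For example, `label_tweet("Got the job! #happy")` yields `"joy"`, while a tweet carrying both `#happy` and `#furious` is left unlabeled under the assumed ambiguity rule.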
Tools & Services (Past Projects)
- SA-REST Annotator: SA-REST is a specification for adding annotations to Web pages. SA-REST Annotator is a Firefox-based browser plugin that allows users to add SA-REST annotations and publish them. The SA-REST Annotator source can be accessed at http://sarestannotator.sourceforge.net/ and is available under the Apache 2.0 license.
- SAWSDL4J: A clean object model for handling SAWSDL. The source and the binaries for this project can be downloaded from http://sawsdl4j.sourceforge.net/. The software is available for use under the Apache 2.0 license.
- Test Ontology Generation Tool (TOntoGen): TOntoGen generates large, high-quality data sets for testing Semantic Web applications. It has been implemented as a Protégé plugin. TOntoGen can be downloaded from the LSDIS TOntoGen project page.
- Radiant: Radiant is an Eclipse based graphical UI for annotating existing WSDL documents into WSDL-S or SAWSDL via an OWL Ontology. Radiant can be downloaded via the LSDIS Radiant project page.
- BRAHMS: A fast main-memory RDF/S storage, capable of storing, accessing, and querying large ontologies.
- Semantic Visualization Tools: Semantic Visualization [SemViz] was a subproject within the SemDis project that developed some of the earliest Semantic Web/RDF visualization tools: Semantic Analytics Visualization [SAV], a 3D visualization tool for semantic analytics; Semantic EventTracker [SET], a highly interactive visualization tool for tracking and associating activities (events); and Paged Graph Visualization [PGV].
- SemDis API: A simple yet flexible set of interfaces intended to be a basis for implementations of RDF data access suitable to the types of algorithms being developed in the SemDis project.
- Semantic Browser: A tool that demonstrates the concept of Relationship Web by creating a relationships-centric metaweb on documents. It allows users to traverse semantically connected documents through domain-specific relationships and uses research in entity and relationship extraction.
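As a rough illustration of what the SAWSDL annotations handled by tools such as SAWSDL4J and Radiant look like to a program, the sketch below reads `sawsdl:modelReference` attributes from a schema fragment using only the Python standard library. The sample document, the `example.org` URI, and the function name `model_references` are hypothetical; this is not the SAWSDL4J or Radiant API.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SAWSDL recommendation.
SAWSDL_NS = "http://www.w3.org/ns/sawsdl"
MODEL_REF = f"{{{SAWSDL_NS}}}modelReference"

# Hypothetical schema fragment with one annotated element.
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:sawsdl="http://www.w3.org/ns/sawsdl">
  <xs:element name="OrderRequest"
      sawsdl:modelReference="http://example.org/purchaseorder#OrderRequest"/>
  <xs:element name="Comment"/>
</xs:schema>"""


def model_references(xml_text):
    """Map each annotated element's name to its list of model reference URIs."""
    root = ET.fromstring(xml_text)
    refs = {}
    for elem in root.iter():
        value = elem.get(MODEL_REF)
        if value:
            # A modelReference value is a whitespace-separated list of URIs.
            refs[elem.get("name")] = value.split()
    return refs
```

Calling `model_references(SAMPLE)` returns a dictionary containing only the annotated `OrderRequest` element; the unannotated `Comment` element is skipped.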
Ontologies (Active Projects)
- Asthma ontology
- Drug Abuse Ontology (DAO)
- CEVO ontology
- CEVO Wiki web page
- CEVO Documentation
- Developer: Saeedeh Shekarpour
- Publication: CEVO: Comprehensive EVent Ontology Enhancing Cognitive Annotation [Shekarpour et al. 2017]
- Cardiology ontology
- Developer: Sujan Perera
- Publications: Data Driven Knowledge Acquisition Method for Domain Knowledge Enrichment in Healthcare [Perera et al. 2012]; Semantics Driven Approach for Knowledge Acquisition from EMRs [Perera et al. 2014]
- HazardSEES ontology
Ontologies (Past Projects)
- Semantic Sensor Network (SSN) Ontology: The Semantic Sensor Network ontology, known as the SSN ontology, answers the need for a domain-independent and end-to-end model for sensing applications by merging sensor-focused (e.g., SensorML), observation-focused (e.g., Observation & Measurement), and system-focused views. It covers sensor-specific sub-domains such as sensing principles and capabilities, and can be used to define how a sensor will perform in a particular context to help characterize the quality of sensed data or to better task sensors in unpredictable environments. Although the ontology leaves the observed domain unspecified, domain semantics, units of measurement, time and time series, and location and mobility ontologies can be easily attached when instantiating the ontology for any particular sensors in a domain. The alignment between the SSN ontology and the DOLCE Ultra Lite upper ontology has helped to normalise the structure of the ontology to assist its use in conjunction with ontologies or linked data resources developed elsewhere. This ontology is publicly accessible.
- Developer: W3C Semantic Sensor Network Incubator Group (SSN-XG) [Cory Henson, Amit Sheth]
- Projects: Semantic Sensor Web [PI: Prof. Amit Sheth]
- SoCS Ontology for Crisis Coordination (SOCC): We extend the concepts of two domain knowledge-driven models, the MOAC (Management Of A Crisis) ontology (Limbu 2012) and UNOCHA's HXL (Humanitarian Exchange Language) ontology (Keßler et al. 2013), with required but missing concepts for organizing data during crisis response coordination: seeker and supplier behavior, and indicators of resource needs expressed through a lexicon. For example, the 'shelter' class contains the words 'emergency center,' 'tent,' and 'shelter,' along with lexical alternatives. For the present demonstration, we focus on three resource categories: food, shelter, and medical needs. Thus, we endeavor to exploit a minimal, but always expandable, subset that provides the maximum coverage while controlling false alarms. For creating lexicons of indicator words for concepts, we relied on various documents collected via interactions with domain experts (Flach et al. 2013), our Community Emergency Response Team (CERT) training, Rural Domestic Preparedness Consortium training, and publicly available references (Homeland Security 2010; FEMA 2012; OCHA, Verity 2011). Using a first aid handbook (Swienton and Subbarao 2012), we created an extensive 'medical' subset of emergency indicators, where we identified words that pertained specifically to first aid or injuries and included those words along with variations in form (e.g., breath, breathing, breathes) and common abbreviations (e.g., mouth to mouth, mouth 2 mouth, CPR). A local expert with FEMA experience augmented the model with additional indicators and provided anecdotal context. The current model with food, medical, and shelter resource indicators contains 43 concepts and 45 relationships. We created this domain model in the OWL language using the Protégé ontology editor (Protégé 2013). Each type of disaster is listed as an entity type, with indicators for that disaster listed as individuals under a corresponding indicator entity. A relationship is therefore declared stating that a particular disaster concept, say Flood, relates by the property 'has_a_positive_indicator' to the 'Flood_i' indicator entity, which includes all words under that heading. Each disaster has a declared negative relationship with the negative indicator list (e.g., 'erotic' under sexual words indicators) under the entity name Negative_Indicator_i. Finally, resources are declared as individuals under the appropriate entity in the same way, but relationships are not explicitly stated with any disaster in order to provide flexibility. [Read more: Purohit et al., JCSCW]
- Available at: SOCC ontology
- Developers: Hemant Purohit, Drew Hampton, Shreyansh Bhatt, Prof. Valerie Shalin. Guidance: Prof. Amit Sheth; External collaborators: Dr. Carlos Castillo (QCRI), Oshani Seveniratne (CSAIL, MIT)
- Project: Interdisciplinary NSF SoCS project: Social Media Enhanced Organizational Sensemaking During Emergency Response [PI: Prof. Amit Sheth]
- Provenir: A reference ontology for modeling domain-specific provenance. Additional information and a download under a Creative Commons license are available at: http://wiki.knoesis.org/index.php/Provenir_Ontology.
- Proteomics data and process provenance (ProPreO): ProPreO is a large glycoproteomics provenance ontology. The ProPreO schema includes 480 classes and attendant relations, whereas the populated ontology includes 3.1 million instances. More information is available at the Kno.e.sis ProPreO page. This ontology is publicly accessible via NCBO BioPortal.
- Parasite Experiment Ontology (PEO): The ontology comprehensively models the processes, instruments, parameters, and sample details that will be used to annotate experimental results with provenance metadata (derivation history of results).
- Parasite Life Cycle Ontology (PLO): PLO models the life cycle stage details of T. cruzi and two related kinetoplastids, Trypanosoma brucei and Leishmania major.
- SWETO and SWETODBLP: The Semantic Web Technology Evaluation Ontology (SWETO) and its follow-on, SWETODBLP, were early populated ontologies created by extracting real-world data using tools created by Taalee/Voquette/Semagix (a company founded by Prof. Amit Sheth) that were made available at no cost for research use. The latest SWETODBLP data is available under a Creative Commons license at http://knoesis.wright.edu/library/ontologies/swetodblp/.
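The indicator structure described for the SOCC ontology above (resource concepts carrying lexicons of positive indicator words, plus a shared negative indicator list) can be approximated with plain dictionary matching. The sketch below is not the OWL model itself: the variable and function names are hypothetical, the 'food' words are placeholders not taken from the ontology, and treating any negative indicator as suppressing all matches is an assumption about how false alarms might be controlled.

```python
import re

# Indicator words per resource concept. 'shelter' and 'medical' entries come
# from the description above; the 'food' entries are illustrative placeholders.
RESOURCE_INDICATORS = {
    "shelter": {"emergency center", "tent", "shelter"},
    "medical": {"breath", "breathing", "breathes", "mouth to mouth",
                "mouth 2 mouth", "cpr"},
    "food": {"food", "meal"},
}

# Words that signal an off-topic message (example taken from the description).
NEGATIVE_INDICATORS = {"erotic"}


def _contains(text, phrase):
    """True if the phrase occurs in text on word boundaries."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


def find_resources(message):
    """Return the resource concepts whose indicators appear in the message."""
    text = message.lower()
    # Assumption: one negative indicator is enough to discard the message.
    if any(_contains(text, word) for word in NEGATIVE_INDICATORS):
        return set()
    return {
        resource
        for resource, words in RESOURCE_INDICATORS.items()
        if any(_contains(text, word) for word in words)
    }
```

Matching on word boundaries keeps 'tent' from firing inside words such as 'attention', which is one simple way to limit false alarms in a lexicon-driven approach.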
Data Sets
- EmojiNet: EmojiNet is a machine-readable emoji meaning dataset consisting of: (i) 12,904 sense labels over 2,389 emoji, which were extracted from the Web and linked to machine-readable sense definitions in BabelNet; (ii) context words associated with each emoji sense, which are inferred through word embedding models trained over a Google News corpus and a Twitter message corpus for each emoji sense definition; and (iii) to account for discrepancies in how emoji are presented on different platforms, a specification of the most likely platform-based emoji sense for a selected set of emoji. The EmojiNet dataset can be downloaded from http://emojinet.knoesis.org/dataset.php.
- EmoSim508: EmoSim508 is the largest emoji similarity dataset, providing emoji similarity scores for 508 carefully selected emoji pairs. The most frequently co-occurring emoji pairs in a tweet corpus (containing 147 million tweets) were used to create the dataset, and each emoji pair was annotated for its similarity by 10 human annotators. The EmoSim508 dataset also includes the emoji similarity scores generated from 8 different emoji embedding models proposed in the paper "A Semantics-Based Measure of Emoji Similarity." The EmoSim508 dataset can be downloaded from http://emojinet.knoesis.org/emosim508.php.
- Singleton Property Datasets: The singleton property approach can be used to represent statements about statements in RDF without the use of reification. The main idea of the approach is to create a property instance and enforce the uniqueness of the property instance in only one triple. This approach is compatible with RDF/RDFS and SPARQL.
- CityPulse Dataset: This webpage offers a number of semantically annotated datasets collected from partners of the CityPulse EU FP7 project and relevant resources for smart city data (over 120 GB of data in 6 large datasets as of November 2014).
- Linked Sensor Data and Linked Observation Data: Linked Sensor Data is an RDF dataset containing expressive descriptions of ~20,000 weather stations in the United States. Linked Observation Data is a 1.7 billion triple RDF dataset containing expressive descriptions of hurricane and blizzard observations in the United States. All data is also included in the Linked Open Data Cloud.
- City Event Extraction Dataset: Using citizen sensor observations in the form of microblogs to extract city events provides city authorities direct access to the pulse of the populace. This dataset contains textual data (tweets) collected from the San Francisco Bay Area over four months and ground truth data for traffic-related events collected from 511.org. This dataset can be used for evaluating city event extraction techniques/algorithms, as it has both the textual events and the ground truth. This dataset is available under a Creative Commons License on the Open Science Framework.
- Harassment-Corpus: Publishing a Quality Context-aware Annotated Corpus and Lexicon for Harassment Research. Identifying profane or offensive words is a standard way of starting the investigation of a cyberbullying incident. For this reason, we initially created a lexicon from the profane words and divided our dictionary into six contexts: 1) Sexual, 2) Appearance-related, 3) Intellectual, 4) Political, 5) Racial, and 6) Combined. We utilized the first five categories of our lexicon as seed terms for collecting tweets from Twitter. Requiring at least one offensive word, we collected 10,000 tweets for each contextual type, for a total of 50,000. The use of offensive words in a given tweet does not ensure that the tweet is harassing, because individuals might use offensive words in a friendly manner or in quotes. Therefore, we rely on human-judged annotations to discriminate harassing tweets from non-harassing tweets.
- Developer: Mohammadreza Rezvan
- Dataset: The lexicon used to collect the tweets. Email the developer to request access to the tweets.
- Publication: Publishing a quality Context-aware Annotated Corpus and Lexicon for Harassment Research
- Project: Context-Aware Harassment Detection on Social Media
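The singleton property approach mentioned in the Singleton Property Datasets entry above can be sketched in a few lines of Python. The idea is to mint a property instance that is used in exactly one triple, link it back to the generic property, and then attach statements about the statement to that instance. The prefixed names below are plain strings rather than real IRIs, the `#index` naming scheme and the function name `singleton_triples` are hypothetical, and `rdf:singletonPropertyOf` is used as the linking predicate following the singleton property literature.

```python
SINGLETON_PROPERTY_OF = "rdf:singletonPropertyOf"


def singleton_triples(subject, predicate, obj, index, metadata):
    """Build triples for one statement plus metadata about that statement.

    `index` makes the property instance unique to this single triple;
    `metadata` maps predicates to values describing the statement.
    """
    # Mint a property instance used in exactly one triple.
    singleton = f"{predicate}#{index}"
    triples = [
        (subject, singleton, obj),
        (singleton, SINGLETON_PROPERTY_OF, predicate),
    ]
    # Statements about the statement hang off the singleton property.
    triples.extend((singleton, key, value) for key, value in metadata.items())
    return triples
```

A call such as `singleton_triples(":BobDylan", ":isMarriedTo", ":SaraLownds", 1, {":hasSource": ":Wikipedia"})` produces three triples, with the provenance attached to `:isMarriedTo#1` rather than to a reified statement node.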
Standards
Kno.e.sis and its researchers have had significant impact on standards and have shown strong leadership in standards activities. Wright State University is an official member of the World Wide Web Consortium (W3C). Prof. Amit Sheth has served as a W3C Advisory Committee member since 2002.
- Prof. Sheth and his team defined WSDL-S for annotating semantic web services, which was submitted to W3C in collaboration with IBM. SAWSDL, adopted as a recommendation (standard) in 2007, was directly based on WSDL-S, and Kno.e.sis members were active members of the W3C SAWSDL working group that defined SAWSDL. Prof. Sheth also co-chaired the W3C Semantic Web Service Testbed Incubator Group (XG). [Kno.e.sis contributors: Karthik Gomadam, Meena Nagarajan, Ajith Ranabahu, Amit Sheth, Kunal Verma, with John Miller at UGA]
- Prof. Sheth proposed GLYDE, which was subsequently developed by his team with Prof. William York of UGA's Complex Carbohydrate Research Center. GLYDE-II is an XML standard for data exchange that has been accepted as the standard protocol by the leading carbohydrate databases in the United States, Germany, and Japan. [Kno.e.sis/LSDIS contributors: Cory Henson, Satya Sahoo, Amit Sheth, Christopher Thomas]
- Prof. Sheth was an active early member of the W3C Semantic Web for Health Care and Life Sciences interest group (HCLSIG) and provided its earliest use case, based on the Active Semantic Electronic Medical Record, a semantic web application operationally deployed in a clinical setting since January 2006.
- Dr. Satya Sahoo (Advisor: Prof. Sheth), while at Kno.e.sis, was a key participant in the W3C Provenance XG. He defined semantic provenance and developed the Provenir ontology, which influenced the Provenance XG's related work, which was then adapted by the Provenance Working Group. Satya (now at Case Western Reserve University) was a contributor to the W3C Provenance XG Final Report.
- Dr. Matthew Perry (Advisor: Prof. Sheth), who worked on SPARQL-ST and spatial, temporal, and thematic analytics over Semantic Web data at Kno.e.sis, continued his work on spatial extensions of SPARQL after joining Oracle. He was one of the two editors of the Open Geospatial Consortium's OGC GeoSPARQL - A Geographic Query Language for RDF Data.
- Prof. Sheth co-founded and co-chaired the W3C Semantic Sensor Networking (SSN) XG, which developed the now widely used SSN Ontology. Cory Henson is a co-editor of the SSN XG Final Report and the primary author of the semantic sensor data annotation aspects. [Participants: Cory Henson, Amit Sheth]
- Prof. Sheth and his team have developed SA-REST, which is also a W3C member submission. Many use cases and tools have been developed to support SA-REST-based semantic web service annotation, semantic search/discovery of Web APIs and RESTful services, etc. [Participants: Karthik Gomadam, Ajith Ranabahu, Amit Sheth]
- Kno.e.sis researchers have participated in or are participating in a number of W3C working/interest/community groups, including: Linked Data Platform, HCLSIG.
Challenge/Competition
We are increasingly involved in conducting challenges and competitions using products from our research (e.g., EmojiNet at Kaggle, EmoSim508 at Kaggle, and the Knowledge Extraction for the Web of Things Challenge Track at the 2018 Web Conference). We anticipate continuing this practice.