Computer games are, in a sense, an example of virtual environments. To facilitate a fully immersive experience, we developed computer games that support quad-buffered stereo. Combined with, for example, 3D-capable displays and active shutter glasses, these games provide a truly 3D experience. Similarly, existing games and game engines, such as Cube 2, can be ported to support such 3D capabilities. Because Cube 2 is open source, we adapted its game engine to support 3D stereo. The adapted version, including Windows and Linux binaries as well as the source code, is available for download.
Millions of sensors around the globe currently collect avalanches of data about our environment. The rapid development and deployment of sensor technology involves many different types of sensors, both remote and in situ, with such diverse capabilities as range, modality, and maneuverability. It is possible today to utilize networks with multiple sensors to detect and identify objects of interest up close or from a great distance. The lack of integration and communication between these networks, however, often leaves this avalanche of data stovepiped and intensifies the existing problem of too much data and not enough knowledge. With a view to alleviating this glut, we propose that sensor data be annotated with semantic metadata to provide contextual information essential for situational awareness. This research was supported by The Dayton Area Graduate Studies Institute (DAGSI), AFRL/DAGSI Research Topic SN08-8: Architectures for Secure Semantic Sensor Networks for Multi-Layered Sensing.
The objective of Cirrocumulus is to develop a methodology for cloud application development and management at an abstract level by incorporating semantic enrichments at each phase of the application's lifecycle. This is to be achieved by using domain-specific languages (DSLs) for developing and configuring applications and by introducing a middleware layer as a facade for core cloud services.
The problem of efficient and high-quality clustering of extreme-scale datasets with complex clustering structures continues to be one of the most challenging data analysis problems. An innovative use of data clouds would provide a unique opportunity to address this challenge. In this project, we propose the CloudVista framework to address (1) the problems caused by using sampling/summarization in the existing approaches and (2) the problems with the latency caused by cloud-side processing. The CloudVista framework aims to explore the entire large dataset stored in the cloud with the help of the visual frame data structure and the previously developed VISTA visualization model. The latency of processing large data is addressed by the RandGen algorithm, which generates a series of related visual frames in the cloud without user intervention, and by a hierarchical exploration model supported by cloud-side subset processing. Experimental studies show that this framework is effective and efficient for visually exploring clustering structures in extreme-scale datasets stored in the cloud.
Hadoop/MapReduce has been a top choice for big data analysis in the cloud. While the elasticity and economics of cloud computing are attractive, there is no effective tool for scientists to deploy MapReduce programs with their requirements on time and budget satisfied, or with energy consumption minimized. We propose an analysis framework that aims to efficiently learn the closed-form cost model for any specific MapReduce program. This framework includes a robust regression method learning closed-form cost models from small-scale settings, the component-wise cost-variance analysis and reduction, and a fast approximate model learning method based on the model library.
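The idea of learning a closed-form cost model from small-scale runs can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the one-variable linear form (time = intercept + slope × input size), the sample measurements, and the function names are hypothetical, and the Theil-Sen estimator stands in for the project's robust regression method, which the description above does not specify.

```python
from itertools import combinations
from statistics import median


def fit_theil_sen(xs, ys):
    """Robustly fit y = intercept + slope * x with the Theil-Sen estimator.

    The slope is the median of all pairwise slopes, so a few outlying
    runs (for example, a job delayed by a straggler) have little effect.
    """
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1
    ]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return intercept, slope


def predict(model, x):
    """Evaluate the closed-form model at input size x."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical small-scale measurements: input size (GB) -> job time (s).
# The 4 GB run is a deliberate outlier.
input_gb = [1, 2, 3, 4, 5, 6]
seconds = [25, 45, 65, 300, 105, 125]

model = fit_theil_sen(input_gb, seconds)
estimate = predict(model, 100)  # extrapolate to a large-scale setting
```

An ordinary least-squares fit would be pulled toward the outlying 4 GB run, whereas the median-based fit recovers the trend of the remaining runs. A realistic cost model for a MapReduce program would have more terms (for example, numbers of map and reduce slots), which this sketch does not attempt to capture.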
The general objective of this project is to develop a novel rationale for diagnosis of diffuse coronary artery disease (DCAD) using clinical non-invasive imaging of the coronary arteries. The indices of diagnosis will be validated in studies of an atherosclerotic porcine model with DCAD. Our unique algorithms for accurately extracting morphometric data from computerized tomography angiography (CTA) images of normal and diseased patients, along with our quantitative approach, uniquely position us to undertake this research.
This project studies knowledge transfer oriented data mining (KTDM). Given two data sets, the idea of KTDM is to discover models that are common to both data sets, as well as models that are unique to one data set. These common and unique models with respect to the two data sets will provide a tool to leverage the already-understood properties of one data set for the purpose of understanding the other, probably less understood, data set. This EAGER project concentrates on models in the form of a diversified set of classification trees. The KTDM approach is useful for real-world applications in part because it allows users to narrow down to particular models, guided by existing knowledge from another data set. It will help realize transfer of knowledge and learning in various domains. The project will support a graduate student and will seek collaboration with experts in the medical domain, which will increase the impact of the project.
This supplementary paper contains supplementary information about shared decision trees mined from various pairs of datasets, including 3 microarray gene expression datasets for cancer and 3 microarray gene expression datasets for cancer treatment outcome.
The Cleveland Clinic Foundation and its partners, Riverain Medical, Wright State University and University Hospitals Health System, have joined together to form the Early Lung Disease Detection Alliance (ELDDA), a multidisciplinary research and commercialization program that will develop, test (through clinical trials), and bring to market new image-analysis systems that permit the early detection of lung cancer and other lung diseases. This computer-aided detection (CAD) system will be applied to the most widely available and used imaging exam, the chest x-ray. The fight against lung cancer is waged on three major fronts: prevention, detection and treatment. The goal of this collaboration is to detect disease at an early stage (i.e. stage I for lung cancer), a necessary step to improve the treatment and survival of lung cancer patients and those at risk for lung cancer throughout Ohio.
ESQUILO develops exploratory techniques to richly interlink components of the Linked Open Data (LOD) cloud and then addresses the challenge of querying the LOD cloud, i.e., of obtaining answers to questions which require accessing, retrieving and combining information from different parts of the LOD cloud. Techniques for overcoming semantic heterogeneity include semantic enrichment through Wikipedia bootstrapping, semantic integration through abstraction by means of upper-level ontologies, and massively parallel methods for tractable ontology reasoning. Specifically, this research will: (1) identify richer, broader, and more relevant relationships between LOD datasets at instance and schema level (these relationships will promote better knowledge discovery, querying, and mapping of ontologies); (2) realize LOD query federation through an upper-level ontology; and (3) enable access to implicit knowledge through ontology reasoning. The project involves significant risk as it treads new paths in a new terrain, primarily due to the lack of descriptive information (schema) about the data provided by highly autonomous data sources, the significant syntactic and semantic heterogeneity among data originating from independent data sources, and the significantly larger scale, as well as unforeseeable obstacles associated with a rapidly changing and expanding environment.
FAnToM (Field Analysis using Topological Methods) is a software system that allows a user to explore vector fields by applying different analysis and visualization algorithms. Among other algorithms, it is capable of analyzing the topology of a 2-D or 3-D vector field, including complex structures such as closed streamlines. This helps a user comprehend the structure of complex vector fields to a degree that traditional visualization methods cannot achieve.
The project involves extending our work in focused knowledge (entity-relationship) extraction from scientific literature, automatic taxonomy extraction from selected community-authored content (e.g., Wikipedia), and semi-automatic ontology development with limited expert guidance. These are combined to create a framework that will allow domain experts and computer scientists to semi-automatically create knowledge bases through an iterative process. The final goal is to provide superior (both in quality and speed) search and retrieval over scientific literature for life scientists that will enable them to elicit valuable information in the area of human performance and cognition.
We aim to build large-scale distributed syntactic, semantic, and lexical language models, trained on corpora of up to web scale on a supercomputer, to substantially improve the performance of machine translation and speech recognition systems. The work is conducted under the directed Markov random field paradigm, integrating both topics and syntax to form complex distributions for natural language. It uses hierarchical Pitman-Yor processes to model the long-tail properties of natural language. By exploiting the particular structure, the seemingly complex statistical estimation and inference algorithms are decomposed and performed in a distributed environment. Moreover, a long-standing open problem, smoothing fractional counts due to latent variables in Kneser-Ney's sense in a principled manner, might be solved. We demonstrate how to put the complex language models into the one-pass decoders of machine translation systems and the lattice-rescoring decoder of a speech recognition system.
Current CT scanners allow the retrieval of vessels only up to a certain point due to their limited resolution. Recent techniques developed by Benjamin Kaimovitz et al. allow the extension of such scans down to the vessels at the capillary level, resulting in a model of the entire arterial vasculature. Such a model is enormous in size, which makes it challenging to visualize. We implemented visualization software that is capable of handling a model several GBs in size, exceeding the main memory of desktop computers. The software is highly optimized for tree-shaped geometrical objects to achieve the best rendering performance possible.
The objective of the MobiCloud project is to provide a singular approach to address the challenges of the heterogeneity of the multitude of existing clouds as well as the multitude of mobile applications. The MobiCloud project is based on a platform-agnostic application development paradigm, built on a Domain Specific Language (DSL), for cloud-mobile hybrid applications.
The goal of PREDOSE is to develop automated data collection and analysis tools to process social media (tweets, web forums) to understand the knowledge, attitudes, and behaviors of prescription-drug abusers who misuse buprenorphine, OxyContin and other pharmaceutical opioids. Instead of relying on traditional epidemiological surveillance methods such as population surveys or face-to-face interviews with drug-involved individuals, PREDOSE focuses on the web, which provides venues for individuals to freely share their experiences, post questions, and offer comments about different drugs. Such User Generated Content (UGC) can be used as a very rich source of unsolicited, unfiltered and anonymous self-disclosures of drug use behaviors. The automatic extraction of such data enables qualitative researchers to overcome scalability limitations imposed by existing methods of qualitative studies.
With the wide deployment of public cloud computing infrastructures, using clouds to host data query services has become an appealing solution because of its advantages in scalability and cost savings. However, some data might be so sensitive that the data owner does not want to move it to the cloud unless data confidentiality and query privacy are guaranteed. On the other hand, a secured query service should still provide efficient query processing and significantly reduce the in-house workload to fully realize the benefits of cloud computing. We summarize these key features for hosting a query service in the cloud as the CPEL criteria: data Confidentiality, query Privacy, Efficient query processing, and Low in-house processing cost. Bearing the CPEL criteria in mind, we propose the RASP data perturbation method to provide secured range query and kNN query services for data in the cloud. The RASP data perturbation method combines order-preserving encryption, dimensionality expansion, random noise injection, and random projection, which provides strong resilience to attacks on the perturbed data. The RASP perturbation preserves the multidimensional ranges for queries, which allows existing indexing techniques such as the R-tree to be applied in query processing. Range query processing is conducted in two stages: first query on the bounding box of the transformed range, then filter out irrelevant results with secured conditions. Both stages can be done in the cloud with exact results returned to the client, which guarantees the EL criteria of CPEL. The kNN-R algorithm is designed to work with the RASP range query algorithm to efficiently process kNN queries. We also carefully analyzed the attacks on data and queries under a precisely defined threat model. Extensive experiments are conducted to show the advantages of this approach on the CPEL criteria.
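The two-stage structure of range query processing described above (bounding-box query, then filtering) can be sketched as follows. This is not the RASP method and provides no security: the fixed 2×2 integer matrix `A` is a hypothetical stand-in for the secret transformation, and the stage-2 test uses plain inequalities rather than the secured conditions of the actual approach. The sketch only shows why a bounding box in the transformed space yields false positives that a second filtering stage must remove.

```python
from itertools import product

# Hypothetical invertible matrix standing in for the secret transformation.
A = ((2, 1), (1, 3))
DET = A[0][0] * A[1][1] - A[0][1] * A[1][0]  # 5, positive


def transform(point):
    """Map an original point (x, y) to the transformed space (u, v)."""
    x, y = point
    return (A[0][0] * x + A[0][1] * y, A[1][0] * x + A[1][1] * y)


def bounding_box(lo, hi):
    """Axis-aligned box enclosing the transformed corners of [lo, hi]."""
    corners = [transform(c) for c in product((lo[0], hi[0]), (lo[1], hi[1]))]
    us = [u for u, _ in corners]
    vs = [v for _, v in corners]
    return (min(us), min(vs)), (max(us), max(vs))


def in_transformed_range(q, lo, hi):
    """Exact range test in the transformed space.

    DET * x and DET * y are recovered as integer combinations of (u, v),
    so the test avoids floating-point division.
    """
    u, v = q
    x_scaled = A[1][1] * u - A[0][1] * v   # equals DET * x
    y_scaled = -A[1][0] * u + A[0][0] * v  # equals DET * y
    return (DET * lo[0] <= x_scaled <= DET * hi[0]
            and DET * lo[1] <= y_scaled <= DET * hi[1])


def range_query(transformed_points, lo, hi):
    """Stage 1: bounding-box candidates. Stage 2: filter false positives."""
    (u_lo, v_lo), (u_hi, v_hi) = bounding_box(lo, hi)
    candidates = [q for q in transformed_points
                  if u_lo <= q[0] <= u_hi and v_lo <= q[1] <= v_hi]
    results = [q for q in candidates if in_transformed_range(q, lo, hi)]
    return candidates, results


points = [(1, 1), (2, 5), (4, 2), (6, 6), (3, 3), (5, 1)]
stored = [transform(p) for p in points]  # what the server would hold
candidates, results = range_query(stored, lo=(2, 2), hi=(4, 5))
```

In this sketch the stage-1 scan is linear; in practice a bounding-box query is the kind of operation a multidimensional index such as an R-tree can answer efficiently, which is why preserving ranges under the transformation matters.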
SA-REST is a poshformat for adding meta-data to documents, including (but not limited to) REST API descriptions in HTML or XHTML. Meta-data from various models, such as an ontology, a taxonomy or a tag cloud, can be embedded into the documents. This embedded meta-data permits various enhancements, such as improved search, facilitated data mediation and easier integration of services.
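As a rough illustration of embedding meta-data in an HTML service description and reading it back, consider the sketch below. The class name `sem-annotation`, the use of the `title` attribute to carry a concept URI, and the `example.org` URIs are hypothetical choices made for this example and are not taken from the SA-REST specification.

```python
from html.parser import HTMLParser


class AnnotationExtractor(HTMLParser):
    """Collect (tag, concept URI) pairs from annotated elements."""

    def __init__(self):
        super().__init__()
        self.annotations = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        # Hypothetical convention: the class marks the element as annotated,
        # and the title attribute holds the URI of the ontology concept.
        if "sem-annotation" in classes and attributes.get("title"):
            self.annotations.append((tag, attributes["title"]))


DOC = (
    '<p>Returns the '
    '<span class="sem-annotation" '
    'title="http://example.org/onto#Temperature">temperature</span> '
    'for a '
    '<span class="sem-annotation" '
    'title="http://example.org/onto#City">city</span>.</p>'
)

extractor = AnnotationExtractor()
extractor.feed(DOC)
annotations = extractor.annotations
```

Because the annotations live in ordinary attributes, the page still renders normally in a browser while tools such as search engines or mediators can extract the embedded concepts.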
Over the last few years, there has been a growing public fascination with 'social media' and its role in modern society. At the heart of this fascination is the ability for users to create and share content via a variety of platforms such as blogs, micro-blogs, collaborative wikis, multimedia sharing sites, social networking sites, etc. Our research primarily focuses on the analysis of various aspects of User-Generated Content (UGC) that are central to understanding inter-personal communication on social media. More recently, our interdisciplinary collaboration has been studying People-Content-Network analysis. The objective of our work on semantic content analysis is to bring structure and organization to unstructured chatter on social media by examining what, why and how users write content. We also ask what the dynamics of the evolution of interactions among these users are, how they are affected by sentiments and opinions, and how such dynamics change in real time. We address these various facets in multiple sub-projects under this one umbrella.
Social and sensor data are increasingly being used in continuous monitoring of events such as disasters (natural or man-made), political unrest, etc. Collecting data from multi-modal sources provides a holistic view of an event, since the sources of information may complement each other, and consequently provides better Situational Awareness. The representation of sensors (machines or humans) and their observations (quantitative or qualitative) will help us annotate the raw data for further analysis and integration of sensor observations. This project is focused on research issues involved in representation, modeling, and annotation of sensors and observations using OGC standards along with Semantic Web technologies for access, discovery, search, and integration of sensors and their observations.
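To make the idea of annotating a raw observation concrete, the sketch below turns one sensor reading into subject-predicate-object triples. The namespace `http://example.org/sensor#`, the property names (`observedBy`, `observedProperty`, and so on) and the sample reading are hypothetical; they are not terms from an OGC standard or from any published ontology.

```python
# Hypothetical namespace for illustration only.
EX = "http://example.org/sensor#"


def annotate(observation):
    """Represent one raw observation as a list of (s, p, o) triples."""
    obs_uri = EX + "obs/" + observation["id"]
    return [
        (obs_uri, EX + "observedBy", EX + "sensor/" + observation["sensor"]),
        (obs_uri, EX + "observedProperty", EX + observation["property"]),
        (obs_uri, EX + "hasValue", observation["value"]),
        (obs_uri, EX + "hasUnit", observation["unit"]),
        (obs_uri, EX + "resultTime", observation["time"]),
    ]


# Hypothetical raw reading from a machine sensor.
raw = {
    "id": "42",
    "sensor": "thermo-7",
    "property": "AirTemperature",
    "value": 31.5,
    "unit": "Cel",
    "time": "2011-06-01T12:00:00Z",
}

triples = annotate(raw)
```

Once observations from different sources share such a representation, they can be searched and integrated by the property observed or by the event they relate to, rather than by source-specific formats.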
Online social networks and always-connected mobile devices have created an immense opportunity that empowers citizens and organizations to communicate and coordinate effectively in the wake of critical events. Specifically, there have been many isolated examples of using Twitter to provide timely and situational information about emergencies to relief organizations, and to conduct ad-hoc coordination. However, few attempts have been made to understand the full ramifications of using social networks in a more concerted manner for effective organizational sensemaking in such contexts. This multi-disciplinary project, spanning computational and social sciences, seeks to fill this gap.
The analysis and visualization of tensor fields is an advancing area in scientific visualization. Topology-based methods that investigate the eigenvector fields of second-order tensor fields have gained increasing interest in recent years. To complete the topological analysis, we developed an algorithm for detecting closed hyper-streamlines as an important topological feature.
The Semantic Web is based on describing the meaning - or semantics - of data on the Web by means of metadata - data describing other data - in the form of ontologies. The World Wide Web Consortium (W3C) has published several recommended standards for ontology languages which differ in expressivity and ease of use. Central to these languages is that they come with a formal semantics, expressed in model-theoretic terms, which enables access to implicit knowledge by automated reasoning. Progress in the adoption of reasoning for ontology languages in practice is currently being made, but several obstacles remain to be overcome for wide adoption on the Web. Two of the central technical issues are the scalability of reasoning algorithms and dealing with inconsistency of the ontological knowledge bases. These two issues are being addressed in this project. The scalability issue has its origin in the fact that the expression of complex knowledge requires sophisticated ontology languages, like the Web Ontology Language OWL, which are inherently difficult to reason with - as witnessed by high computational complexities, usually ExpTime or beyond. This project builds on recent developments in polynomial-time languages around OWL in order to remedy this. In particular, in this project efficient algorithms and tools are developed for the largest currently known polynomial-time ontology language, called SROELVn. Reasoning with knowledge bases with expressivity beyond SROELVn is enabled through approximating these knowledge bases within SROELVn. The inconsistency issue has its origin in the fact that large knowledge bases, in particular on the web, are usually not centrally engineered, but arise out of the merging of different knowledge bases with different underlying perspectives and rationales. In this project tools are developed for efficient, i.e., polynomial-time, reasoning with inconsistent ontologies.
The concrete outcome of the project is an open source reasoning system which is able to reason efficiently with (possibly) inconsistent knowledge bases around OWL, in at least an approximate manner.
Social networks are increasingly used by humans, both civilians and military personnel, to report on observations related to a vast variety of events. Use of mobile devices and smart phones has further accelerated the rate at which such social data is shared through social networks. This is complemented by a regular stream of observations reported by machine sensors at an ever-growing pace, already exceeding petascale. It has consequently become impossible for humans to derive insights or make decisions by just accessing and searching such observational data. Instead, what is necessary is integrated access to a variety of multimodal sensor and social data centered on events, together with their analysis, such that humans are presented with highly relevant information at a level of abstraction that lends itself to decision making. Current efforts in semantic sensor web and semantic social web are showing promise in achieving this capability. However, it is critical that the trustworthiness of observational data, as well as of reported information at higher levels of abstraction, be an integral part of any system that is of value to military decision makers.
Trust relationships occur naturally in many diverse contexts such as e-commerce, social interactions, social networks, ad hoc mobile networks, distributed systems, decision-support systems, (semantic) sensor web, emergency response scenarios, etc. As the connections and interactions between humans and/or machines (collectively called agents) evolve, and as the agents providing content and services become increasingly removed from the agents that consume them, miscreants attempt to corrupt, subvert or attack existing infrastructure. This in turn calls for support for robust trust inference (e.g., gleaning, aggregation, propagation) and update (also called trust management). Because Web, social networking and sensor information often provide complementary and overlapping information about an activity or event that is critical for overall situational awareness, there is a unique need for understanding and developing techniques for managing trust that span all these information channels. Currently, we are pursuing research on trust and trustworthiness issues in interpersonal, social, and sensor networks, to potentially unify and integrate them for exploiting their complementary strengths.
Cardiovascular diseases, such as atherosclerosis and coronary artery disease, are major risk factors for cardiac pain and death. We implemented visualization software that enables interactive 3-D visualization of the cardiac vasculature retrieved using CT scanning technology, as well as an interactive flight through the vessels. Bifurcation angles and radii of the vessels can be measured while exploring the tree. Areas of high risk that could cause potential problems can be identified by this method. The project is conducted in collaboration with Dr. Ghassan Kassab's lab at the Department of Biomedical Engineering at the Indiana University Purdue University, who provided the data set.