Research

Deduce focuses on three research themes that are necessary for next-generation data integration frameworks:

Detecting and Measuring Impact of Data Change. The goal of this research area is to explore methods for evaluating how sensitive data analyses and data products are to changes in the underlying data used to produce them. We will explore techniques from uncertainty estimation and propagation to identify methods for quantifying the expected change in results due to changes in the underlying data. We will also investigate methods that help the user understand the impact of dynamic data changes and decide whether to reanalyze or rebuild a data product. Our research in this area has included understanding data changes in a variety of science data sets, including MODIS, Fluxnet, SDSS, Advanced Light Source, NGEE Tropics and Rifle.
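
As an illustration of the rebuild decision described above, the following is a minimal sketch and not Deduce's actual method. It assumes a data product that is a single summary statistic and a user-chosen tolerance. The names `relative_change`, `should_rebuild` and `tolerance` are hypothetical.

```python
import statistics


def relative_change(old_values, new_values, analysis=statistics.mean):
    """Relative change in an analysis result between two versions of a data set."""
    old_result = analysis(old_values)
    new_result = analysis(new_values)
    if old_result == 0:
        # Avoid dividing by zero: any nonzero new result is an unbounded change.
        return float("inf") if new_result != 0 else 0.0
    return abs(new_result - old_result) / abs(old_result)


def should_rebuild(old_values, new_values, tolerance=0.05, analysis=statistics.mean):
    """Suggest a rebuild when the result moves by more than the (assumed) tolerance."""
    return relative_change(old_values, new_values, analysis) > tolerance


# Two versions of a toy data set; only the last value was revised.
old = [10.0, 12.0, 11.0, 13.0]
new = [10.0, 12.0, 11.0, 19.0]

rebuild_mean = should_rebuild(old, new)                                # mean-based product
rebuild_median = should_rebuild(old, new, analysis=statistics.median)  # median-based product
```

In this toy case the same data change shifts the mean by about 13% but leaves the median unchanged, so whether a rebuild is warranted depends on the analysis as well as on the data.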

Distributed Data Semantics. The goal of this research area is to develop techniques that capture the representation, semantics and provenance of data so that they can be used later for discovery and integration. These techniques will enable automation of the data integration pipeline. In this research area, we will focus on two specific topics: a) user research to understand users' data integration use cases and the interfaces they need, and b) automated metadata and provenance collection and management to facilitate data integration.

Dynamic Data Lifecycle Management. The goal of this research area is to explore data and resource lifecycle management using data semantics. Specifically, we will focus on a) intelligent automated or semi-automated infrastructure to manage data, data changes, and data products, b) semantic-driven data movement infrastructure, and c) use of operating systems research to manage job and resource execution to meet the real-time needs of data integration pipelines. Our contributions in this area include frameworks for data change detection and elastic resource management on HPC resources.
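
To make the idea of data change detection concrete, here is a minimal sketch based on content hashing. It is an assumed, generic approach rather than the project's framework. The function names and the dictionary-of-records layout are hypothetical.

```python
import hashlib


def fingerprint(records):
    """Map each record key to a SHA-256 digest of its content."""
    return {
        key: hashlib.sha256(repr(value).encode("utf-8")).hexdigest()
        for key, value in records.items()
    }


def detect_changes(old_fp, new_fp):
    """Compare two fingerprints and list added, removed and modified keys."""
    added = sorted(new_fp.keys() - old_fp.keys())
    removed = sorted(old_fp.keys() - new_fp.keys())
    modified = sorted(
        key for key in old_fp.keys() & new_fp.keys() if old_fp[key] != new_fp[key]
    )
    return {"added": added, "removed": removed, "modified": modified}


# Two snapshots of a toy data set keyed by (hypothetical) site identifier.
snapshot_v1 = {"site_a": [1.0, 2.0], "site_b": [3.0], "site_c": [4.0, 5.0]}
snapshot_v2 = {"site_a": [1.0, 2.0], "site_b": [3.5], "site_d": [6.0]}

changes = detect_changes(fingerprint(snapshot_v1), fingerprint(snapshot_v2))
```

Storing only the digests lets a later snapshot be compared without keeping the full earlier copy. A report like `changes` could then feed a decision about which data products to refresh.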