CAREER: Supervised Learning for Incomplete and Uncertain Data

Principal Investigator: Alina Zare

Sponsor: NSF

Start Date: October 1, 2016

End Date: April 30, 2021

Amount: $327,359


This CAREER project will advance the state of the art in supervised machine learning to allow for incomplete, uncertain, and unspecific label information. Supervised machine learning algorithms produce desired outputs for given input data by learning from example training data. These methods generally rely on completely and accurately labeled training data to drive the learning algorithm. However, many applications are plagued by labels that are incomplete, uncertain, and unspecific (lacking precision). Current techniques do not adequately handle such data.

For example, analysis of satellite imagery to identify the content of each pixel is often conducted by coupling unsupervised learning methods (which do not rely on labeled training data) with manual exploration. This is time-consuming, error-prone, and expensive. Imagine, instead, easy-to-use tools that could understand the content of each pixel in satellite imagery. Extremely large amounts of road map data (for example, from Google Maps or OpenStreetMap) and social media information (for example, geo-tagged photographs, video clips, and social networking posts) are continually collected and stored. These data could be used as sparsely labeled training data (with varying degrees of specificity and uncertainty) to guide understanding of satellite imagery.

Although the data are available, algorithms have yet to be developed to combine these data sources and identify the content of pixels in satellite images. This work will advance this and other potential applications of machine learning where incomplete, uncertain, and unspecific labels in training data challenge the development of effective machine learning algorithms.

This CAREER project will achieve these advances through the following research objectives:

(1) Investigate and develop a mathematical framework and associated algorithms for Multiple Instance Function Learning that addresses linear and non-linear classification and regression problems with varying levels and types of sparsity, uncertainty, and specificity in training labels.

(2) Study and apply the proposed framework and algorithms towards the fusion of satellite imagery, road map data and social media for global scene understanding.
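
As a rough illustration of the kind of label imprecision objective (1) targets, the sketch below shows the standard multiple-instance learning setup, in which a label is attached to a group ("bag") of instances rather than to any single instance. This summary does not define Multiple Instance Function Learning, so this is not the project's method. The names `bag_score`, `predict_bag`, and `linear_score`, the weights, and the toy feature values are all hypothetical. The assumption used here is the common one that a bag is positive when at least one of its instances is positive.

```python
from typing import Callable, List, Sequence

Instance = Sequence[float]   # feature vector for one instance (e.g. one pixel)
Bag = List[Instance]         # a group of instances sharing a single label


def bag_score(bag: Bag, instance_score: Callable[[Instance], float]) -> float:
    """Score a bag by its highest-scoring instance.

    This encodes the standard multiple-instance assumption: one positive
    instance is enough to make the whole bag positive.
    """
    return max(instance_score(x) for x in bag)


def predict_bag(bag: Bag,
                instance_score: Callable[[Instance], float],
                threshold: float = 0.0) -> int:
    """Return 1 if the bag is predicted positive, otherwise 0."""
    return 1 if bag_score(bag, instance_score) > threshold else 0


def linear_score(x: Instance) -> float:
    """Hypothetical, hand-picked linear instance scorer (not a learned model)."""
    w = (1.0, -0.5)
    b = -1.0
    return w[0] * x[0] + w[1] * x[1] + b


# Toy bags: each tuple stands in for the features of one pixel.
# Only the bag-level label is known, not which pixel caused it.
road_bag = [(0.2, 0.1), (2.0, 0.4), (0.3, 0.9)]
no_road_bag = [(0.1, 0.2), (0.4, 0.8)]

print(predict_bag(road_bag, linear_score))
print(predict_bag(no_road_bag, linear_score))
```

In this toy setting, a map annotation such as "road" could be treated as a bag-level label for a patch of pixels. Learning the instance scorer from bag labels alone is the harder problem and is not shown here.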
