Taming The Data Monster To Make Better Decisions

In today’s increasingly connected world, the sheer volume of messages we receive on any subject, including speech, images, video and metadata, as well as text embedded in images and videos, is mind-boggling.

When something unexpected happens, government policy analysts must immediately inform and advise our nation’s leaders about its causes and effects. Faced with huge volumes of information, they can quickly become overloaded.

In 2013, Frank Konkel wrote in FCW, “Less than 24 hours after two explosions killed three people and injured dozens more at the April 15 Boston Marathon, the Federal Bureau of Investigation had compiled 10 terabytes of data in hopes of finding needles in haystacks of information that might lead to the suspects.”

Daisy Zhe Wang of the University of Florida is collaborating with researchers at USC’s Information Sciences Institute (ISI), Rensselaer Polytechnic Institute and Columbia University to shorten the time it takes intelligence analysts to collect and interpret data about national and international events.

Dr. Boyan Onyshkevych, DARPA Program Manager for the AIDA (Active Interpretation of Disparate Alternatives) research program, emphasizes that analysts face a difficult task because messages from each data source are frequently considered independently of other sources, often resulting in only one interpretation. He elaborates that when two or more single interpretations are compared late in the process, the conclusion reached by analysts may not reflect a true consensus.

The ISI-led research team, one of several in the DARPA-funded AIDA program, will use machine intelligence to collect all messages from the different sources and then construct a complete knowledge base from the extracted information.

The research program will consider all messages as having equal value, but the probability graphs created from the knowledge base should help reduce possible bias or ambiguities in the information.

With a $1.17 million grant from ISI/DARPA, Wang will develop computer algorithms that can answer a query by reasoning over an event-driven knowledge base and generating disparate hypotheses about the links between causes and effects for the event in question.

“My algorithms will look at the knowledge graph and extract multiple hypotheses to answer an analyst’s question,” said Wang. “We will take into consideration a variety of measures including relevancy, uncertainty, similarity, consistency and connectedness between the knowledge elements, to generate a ranked list of hypotheses to answer a query.”
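To make the idea of a ranked list concrete, here is a minimal Python sketch that scores candidate hypotheses on the five measures Wang names and sorts them. The article does not say how the measures are combined, so the weighted sum, the weights themselves, the 0-to-1 scale, the `Hypothesis` class and the sample values are all assumptions made purely for illustration, not the team's actual method.

```python
from dataclasses import dataclass

# Hypothetical weights: the article does not specify how measures are combined.
WEIGHTS = {
    "relevancy": 1.0,
    "uncertainty": -1.0,   # assumption: more uncertainty lowers the score
    "similarity": 0.5,
    "consistency": 1.0,
    "connectedness": 0.5,
}


@dataclass
class Hypothesis:
    """A candidate answer to an analyst's query, with measures assumed in [0, 1]."""
    label: str
    relevancy: float
    uncertainty: float
    similarity: float
    consistency: float
    connectedness: float


def score(hypothesis, weights=WEIGHTS):
    """Combine the measures into a single number with a weighted sum."""
    return sum(w * getattr(hypothesis, name) for name, w in weights.items())


def rank_hypotheses(hypotheses, weights=WEIGHTS):
    """Return the hypotheses ordered from highest to lowest score."""
    return sorted(hypotheses, key=lambda h: score(h, weights), reverse=True)


# Made-up values, for illustration only.
candidates = [
    Hypothesis("A", 0.9, 0.2, 0.5, 0.8, 0.6),
    Hypothesis("B", 0.7, 0.6, 0.5, 0.4, 0.3),
    Hypothesis("C", 0.8, 0.1, 0.5, 0.9, 0.7),
]
ranked = rank_hypotheses(candidates)
ranked_labels = [h.label for h in ranked]
```

A weighted sum is only one simple way to turn several measures into an ordering; a real system might learn the weights from data or combine the measures in some other way.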

Dr. Wang is an Associate Professor in the Department of Computer & Information Science & Engineering at UF’s Herbert Wertheim College of Engineering. She is also Director of the UF Data Science Research Lab and a member of the National Science Foundation’s Center for Big Learning at UF. Her research centers on reasoning and query processing over probabilistic knowledge bases, which are composed of entities, events, attributes and the relationships between them. Her research has been supported by NSF and Google.
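As a rough illustration of what a probabilistic knowledge base might look like in code, the Python sketch below stores each piece of knowledge as a statement with a probability attached and lets a caller filter statements by confidence. The `Fact` and `ProbabilisticKB` names, the subject/predicate/object layout and the sample data are hypothetical and are not drawn from Dr. Wang's systems.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fact:
    """One uncertain statement about an entity or event."""
    subject: str        # entity or event identifier
    predicate: str      # attribute or relationship name
    obj: str            # attribute value, or another entity or event
    probability: float  # assumed confidence that the statement is true


@dataclass
class ProbabilisticKB:
    """A toy knowledge base: just a list of facts with probabilities."""
    facts: list = field(default_factory=list)

    def add(self, subject, predicate, obj, probability):
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must be between 0 and 1")
        self.facts.append(Fact(subject, predicate, obj, probability))

    def query(self, subject=None, predicate=None, min_probability=0.0):
        """Return matching facts, most probable first."""
        matches = [
            f for f in self.facts
            if (subject is None or f.subject == subject)
            and (predicate is None or f.predicate == predicate)
            and f.probability >= min_probability
        ]
        return sorted(matches, key=lambda f: f.probability, reverse=True)


# Made-up example: two competing explanations for the same event.
kb = ProbabilisticKB()
kb.add("event:outage_1", "caused_by", "entity:storm", 0.7)
kb.add("event:outage_1", "caused_by", "entity:equipment_failure", 0.3)
kb.add("entity:storm", "located_in", "entity:region_x", 0.9)

likely_causes = kb.query(predicate="caused_by", min_probability=0.5)
all_causes = kb.query(subject="event:outage_1", predicate="caused_by")
```

Because competing statements about the same event are kept side by side with their probabilities, a query can return several alternative explanations rather than a single answer.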

By using the multi-hypothesis semantic engine (MHSE) that will be developed from the AIDA research, analysts can generate clear “alternative interpretations of events, situations, and trends from sometimes noisy, conflicting, and potentially deceptive information environments,” according to Onyshkevych.

As analysts examine a hypothesis, they will be able to view the stream of messages associated with it, which should raise their confidence in its reliability.

Dependable information supplied to our country’s leaders in a timely manner will help produce sound government policies, fulfilling the ultimate vision of Dr. Wang and her colleagues for making the world a safer place.