Reducing the data demands of machine learning is the focus of DARPA initiative

News

July 12, 2018

Lisa Daigle

Assistant Managing Editor

Military Embedded Systems

ARLINGTON, Va. The Defense Advanced Research Projects Agency (DARPA) has announced a new initiative it calls Learning with Less Labels (LwLL), in which the agency will research new machine learning (ML) algorithms that require greatly reduced amounts of information to train or update.

“Under LwLL, we are seeking to reduce the amount of data required to build a model from scratch by a millionfold, and reduce the amount of data needed to adapt a model from millions to hundreds of labeled examples,” said Wade Shen, a DARPA program manager in the Information Innovation Office (I2O) who is leading the LwLL program. “This is to say, what takes one million images to train a system today, would require just one image in the future, or requiring roughly 100 labeled examples to adapt a system instead of the millions needed today.”
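The targets Shen describes come down to simple ratios, and a minimal sketch can make them concrete. The function name `reduction_factor` is hypothetical. The value of 1,000,000 used for "millions" in the adaptation case is an assumed lower bound, since the quote does not give an exact figure.

```python
def reduction_factor(examples_today: int, examples_target: int) -> float:
    """Return how many times fewer labeled examples the target requires."""
    if examples_today <= 0 or examples_target <= 0:
        raise ValueError("example counts must be positive")
    return examples_today / examples_target


# Building a model from scratch: one million images today, one image in the future.
scratch_factor = reduction_factor(1_000_000, 1)

# Adapting a model: "millions" today (assumed here to be at least 1,000,000)
# versus roughly 100 labeled examples.
adapt_factor = reduction_factor(1_000_000, 100)

print(f"from scratch: {scratch_factor:,.0f}x fewer examples")
print(f"adaptation:   at least {adapt_factor:,.0f}x fewer examples")
```

Under these assumptions, the from-scratch goal is a millionfold reduction and the adaptation goal is a reduction of at least ten thousandfold. The adaptation ratio grows if "millions" means more than one million.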

DARPA materials state that as ML systems have progressed, deep neural networks (DNNs) have emerged as the state of the art in ML models; these DNNs can drive tasks like machine translation and speech or object recognition with a much higher degree of accuracy. The drawback: Training DNNs requires massive amounts of labeled data, typically billions or tens of billions of training examples, and amassing and labeling that much data is costly and time-consuming. Additionally, most ML models are brittle and prone to breaking when there are even small changes in their operating environment. For example, if a room's acoustics change or a microphone's sensors pick up something anomalous, a speech-recognition or speaker-identification system may need to be retrained on an entirely new data set.

DARPA officials say that the LwLL researchers will explore two technical areas. The first, building learning algorithms that efficiently learn and adapt, calls for researching and developing algorithms that cut the number of labeled examples required to the levels set by the program metrics without sacrificing system performance. The second charges research teams with formally characterizing ML problems, both in terms of their decision difficulty and the true complexity of the data used to make decisions. “Today, it’s difficult to understand how efficient we can be when building ML systems or what fundamental limits exist around a model’s level of accuracy. Under LwLL, we hope to find the theoretical limits for what is possible in ML and use this theory to push the boundaries of system development and capabilities,” noted Shen.

DARPA is holding a Proposers Day on July 13 for those interested in learning more about the LwLL program. For additional information, interested parties can visit https://www.fbo.gov/index.php?s=opportunity&mode=form&id=3f255bc43c88d5006ed20cee13e97062&tab=core&_cview=0. A full description of the program will be made available in a forthcoming Broad Agency Announcement.