An Information-Theoretic Reinforcement Learning Framework for Autonomous Navigation

Principal Investigator: José Príncipe

Co-PIs: Andreas Keil, Matthew Emigh

Sponsor: US Navy/ONR

Start Date: April 1, 2018

End Date: March 31, 2021

Amount: $728,244

Abstract

Robotic platforms have evolved significantly in the last 20 years, fueled by advances in mechanical engineering, materials, miniaturization, and manufacturing. Computer power has also improved a hundredfold over the same period; however, efficient autonomous exploration of unknown environments is still an open problem. It is well accepted that the bottleneck is no longer on the mechanical side of robotics, but on the artificial intelligence side [1]. More recently, advances in deep machine learning have conquered sensory processing tasks such as object recognition in imagery and speech and natural language understanding [2], and the frontier has moved to unsupervised object recognition in video [3]. Navigation systems have been evolving with control theory and SLAM [4], but their precision is still insufficient for many underwater tasks.

In our current Science of Autonomy grant, we linked a hierarchical vision system with a navigation controller to improve the navigation of autonomous underwater vehicles (AUVs) in docking maneuvers. Our inspiration for the design of the object recognition system was the human visual system [5]: organisms have evolved to survive in an unpredictable world, so uncovering the principles they use to roam it has proven quite productive. Our multidisciplinary team successfully proposed new paradigms for designing the vision system architecture to detect objects in video [6] and linked it directly to the navigation controller as a nested feedback system [7].

In our opinion, the biggest bottleneck in our current architecture is the lack of a framework that mimics the human perception-action-reward cycle (PARC) [8], which links the perceptual and motor (navigation) components of cognition in a feedback fashion, through the environment. In our current system architecture, the vision system talks directly to the navigation controller, forgoing the opportunity to "learn by doing," which constitutes perhaps the lion's share of all our world experience embodied in our common-sense knowledge.
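As a concrete illustration of the loop that the PARC closes, the sketch below shows a toy perception-action-reward cycle in Python. Everything here is hypothetical scaffolding rather than the proposed architecture: the one-dimensional docking environment and the perceive/act stand-ins for the vision system and navigation controller exist only to make explicit how perception and action are coupled in feedback through the environment.

```python
# Minimal perception-action-reward loop (hypothetical interfaces).
# The perception stage produces a state estimate, the navigation
# stage selects an action, and the environment returns a reward
# that closes the feedback loop ("learning by doing").

import random

class Environment:
    """Toy 1-D docking task: the agent must reach position 0."""
    def __init__(self):
        self.position = random.uniform(-10.0, 10.0)

    def observe(self):
        # Stand-in for raw sensory input (e.g., camera imagery).
        return self.position

    def step(self, action):
        self.position += action
        # Reward is higher the closer the agent is to the dock.
        reward = -abs(self.position)
        done = abs(self.position) < 0.1
        return reward, done

def perceive(observation):
    # Stand-in for a hierarchical vision system: map the raw
    # observation to a state estimate used by the controller.
    return observation

def act(state):
    # Stand-in for a navigation controller: move toward the dock.
    return -0.5 if state > 0 else 0.5

env = Environment()
for t in range(100):
    state = perceive(env.observe())   # perception
    action = act(state)               # action (navigation)
    reward, done = env.step(action)   # reward, via the environment
    if done:
        break
```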