**Principal Investigator:** Sean Meyn

**Co-PI:** Dr. Prashant Mehta (University of Illinois, Urbana-Champaign)

**Sponsor:** U.S. Army Research Office

**Start Date:** July 28, 2018

**End Date:** July 27, 2021

**Amount:** $290,000

## Abstract

The objective of the proposed research is to develop fundamental mathematics for general state space Markov processes and controlled interacting particle systems. The mathematical goals are to develop existence, uniqueness, and regularity theory for a Poisson equation, to clarify the underlying assumptions, to obtain regularity estimates, and to establish the relationship to Lyapunov exponents. Several representations of the gradient of the solution of the Poisson equation are discussed: one based on the theory of elliptic PDEs along with certain compact embedding arguments for Sobolev spaces, a Lyapunov-based construction, a representation in terms of the generalized resolvent, and a construction in which the semigroup is approximated in terms of a diffusion map. These representations of the gradient are used to obtain new algorithms for both reinforcement learning and nonlinear filtering. These include a kernel-based algorithm built on the diffusion map approximation and a stochastic approximation algorithm rooted in ideas from approximate value iteration. A further goal of the proposed research is to explore connections between these approximations and, more broadly, between the underlying mathematical concepts.
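For orientation, one standard form of the Poisson equation for a diffusion is sketched below. The notation is an assumption introduced here for illustration and is not taken from the proposal: $X$ is a diffusion with drift $a$ and diffusion matrix $\sigma$, $\mathcal{A}$ is its generator, $\pi$ is an invariant probability measure, $c$ is a $\pi$-integrable function, and $\eta = \int c \, d\pi$ is its steady-state mean.

$$
\mathcal{A} h = -(c - \eta), \qquad \mathcal{A} h = a \cdot \nabla h + \tfrac{1}{2} \operatorname{tr}\!\left(\sigma \sigma^{\top} \nabla^{2} h\right).
$$

The gradient $\nabla h$ of a solution $h$ is the object whose representations are listed above.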

The theoretical results will be applied to topics in optimal control (primarily mean-field games), optimal filtering (the feedback particle filter), and reinforcement learning. In the research involving mean-field games, the goal is to develop a methodological framework that relates the emergent collective behavior of the population to the underlying interaction and control mechanisms, in particular those resulting from competitive interactions among individuals. The emergent collective behavior is modeled as a phase transition. Analysis methods and computational tools for the study of phase transitions in the mean-field oscillator game model are to be developed. Learning algorithms are based on approximate dynamic programming (ADP) adaptation schemes, whereby each agent in the population learns its sub-optimal control policy. A deeper analysis of these numerical algorithms will be needed to quantify the efficiency properties of the resulting mean-field equilibria.

In the research involving nonlinear filtering, the goal is to develop the theory and numerics of the feedback particle filter (FPF) algorithm. This includes algorithms for efficient approximation of the gain function, algorithms for distributed implementation of the FPF, and error estimates for the finite-N FPF algorithm. For learning, ADP-based approaches are proposed to develop adaptation schemes in which an optimal FPF model is learned directly from data. The optimality and convergence properties of such schemes will be investigated.
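As a minimal sketch of what a gain function approximation can look like, the block below implements the constant-gain approximation, a simple Galerkin-type estimate that replaces the gain by its average over the particles. It is a basic approximation from the feedback particle filter literature, swapped in here purely for illustration; it is not the kernel-based or ADP-based algorithm proposed in this project. The function name `constant_gain`, the scalar-valued observation function `h`, and the assumption of unit observation noise are all choices made for this example.

```python
import numpy as np


def constant_gain(particles, h):
    """Constant-gain approximation of the filter gain (illustrative sketch).

    particles : array of shape (N,) or (N, d), the particle states X^i
    h         : hypothetical observation function; maps a state of shape (d,)
                to a scalar
    Returns an array of shape (d,) equal to
        (1/N) * sum_i (h(X^i) - h_hat) * X^i,
    where h_hat is the particle average of h. Unit observation noise is assumed.
    """
    X = np.asarray(particles, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat 1-D input as N scalar-valued particles

    h_vals = np.array([h(x) for x in X], dtype=float)  # h(X^i), shape (N,)
    h_hat = h_vals.mean()                              # empirical mean of h

    # Empirical average of (h(X^i) - h_hat) * X^i over the particles.
    return ((h_vals - h_hat)[:, None] * X).mean(axis=0)


# Usage: Gaussian particles with a linear observation h(x) = x[0].
rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=2.0, size=(1000, 1))
gain = constant_gain(samples, lambda x: x[0])
```

For a linear observation function, this estimate is the empirical cross-covariance between the state and the observation, which matches the Kalman gain up to the observation noise scaling.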