Nathan Lambert, University of California, Berkeley
Abstract: Model-based reinforcement learning is emerging as a promising approach to data-efficient control synthesis for complex robotic tasks. Simple one-step dynamics models learned from small amounts of data have proven useful in a wide variety of simulated and experimental tasks. Frequently, the one-step models are unrolled to form longer trajectory predictions for optimization in model-predictive control. In this talk, we detail how the two separate optimizations, first of one-step prediction accuracy and then of a trajectory-based controller, can result in an objective mismatch. We then detail work that begins to address this mismatch and improve the peak performance and computational efficiency of model-based reinforcement learning.
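To make the pipeline in the abstract concrete, here is a minimal sketch of fitting a one-step model and unrolling it inside a model-predictive controller. Everything in it is an illustrative assumption and not the method presented in the talk: the toy scalar dynamics, the linear least-squares model standing in for a learned dynamics model, the random-shooting planner, and the function names `fit_one_step_model`, `unroll`, and `random_shooting_mpc`. Note where the two objectives sit: the model is fit to one-step prediction error, while the controller scores whole unrolled trajectories by reward.

```python
import numpy as np


def fit_one_step_model(states, actions, next_states):
    """Least-squares fit of s' ~ [s, a] @ W (objective: one-step prediction error)."""
    X = np.hstack([states, actions])
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    return W


def unroll(W, s0, action_seq):
    """Apply the one-step model repeatedly to predict a multi-step trajectory."""
    s = s0
    traj = []
    for a in action_seq:
        s = np.concatenate([s, a]) @ W
        traj.append(s)
    return np.array(traj)


def random_shooting_mpc(W, s0, horizon, n_candidates, reward_fn, rng):
    """Sample action sequences, score their unrolled predictions by reward,
    and return the first action of the best sequence (objective: task reward)."""
    best_action, best_return = None, -np.inf
    for _ in range(n_candidates):
        seq = rng.uniform(-1.0, 1.0, size=(horizon, 1))
        ret = sum(reward_fn(s) for s in unroll(W, s0, seq))
        if ret > best_return:
            best_action, best_return = seq[0], ret
    return best_action


# Hypothetical toy system used only for illustration: s' = 0.9 s + 0.5 a.
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(50, 1))
actions = rng.uniform(-1.0, 1.0, size=(50, 1))
next_states = 0.9 * states + 0.5 * actions

W = fit_one_step_model(states, actions, next_states)

# Three-step prediction from s0 = 1 with zero actions.
traj = unroll(W, np.array([1.0]), np.zeros((3, 1)))

# Plan toward the origin: reward is the negative squared state.
first_action = random_shooting_mpc(
    W,
    np.array([1.0]),
    horizon=5,
    n_candidates=200,
    reward_fn=lambda s: -float(s[0] ** 2),
    rng=rng,
)
```

In this noise-free toy case the fitted model is essentially exact, so the two objectives agree. The mismatch described in the abstract concerns the more realistic setting where the model is imperfect: a lower one-step error on the training data does not by itself guarantee that the unrolled predictions lead the planner to higher-reward actions.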