Uncertainty-aware Reinforcement Learning for Autonomous Driving with Multimodal Digital Driver Guidance

Wenhui Huang, Zitong Shan, Shanhe Lou, Chen Lv
Nanyang Technological University
Submitted to ICRA 2024

Learning from Multimodal Guidance (LfMG).

Approach Overview. (Check the full video below)

Abstract

Existing learning-from-intervention (LfI) methods within the human-in-the-loop reinforcement learning (HiL-RL) paradigm largely assume that human policies are homogeneous and deterministic with low variance. Natural human driving behaviors, however, are multimodal with intrinsic uncertainties, so accommodating diverse human capabilities is essential for practical deployment. This work proposes an enhanced LfI approach that learns the optimal RL policy by leveraging multimodal human behaviors in a setting of N-driver concurrent interventions. Specifically, we first learn N human digital drivers from a multi-human demonstration dataset, where each driver possesses its own policy distribution. The trained drivers are then kept in the RL training loop and provide multimodal driving guidance whenever intervention is required. Additionally, to better exploit the provided guidance, we augment the underlying RL architecture and optimization objectives, yielding the proposed uncertainty-aware reinforcement learning (UnaRL) algorithm. The proposed approach, which won 2$^{nd}$ place in the Alibaba Future Car Innovation Challenge 2022, is evaluated in two challenging autonomous driving scenarios against state-of-the-art (SOTA) LfI baselines, and results from both simulation and real-world experiments confirm the superiority of our method in terms of learning robustness and driving performance. Videos and source code are provided.
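To make the pipeline described above concrete, the following is a minimal Python sketch of an N-driver concurrent-intervention training loop. All names here (DigitalDriver, wants_to_intervene, the intervention rule, and the averaging of concurrent guidance) are illustrative assumptions rather than the paper's actual implementation.

```python
# Hypothetical sketch of the LfMG training loop: N pretrained digital drivers
# monitor the RL agent and provide multimodal guidance when they intervene.
import numpy as np

class DigitalDriver:
    """A digital driver with its own stochastic policy distribution."""
    def __init__(self, policy):
        # policy(obs) -> (mean_action, std), e.g., a network trained on one human's demonstrations
        self.policy = policy

    def act(self, obs):
        mean, std = self.policy(obs)
        return np.random.normal(mean, std), std

def wants_to_intervene(driver, obs, agent_action, threshold=0.5):
    # Placeholder intervention rule: intervene when the driver's intended action
    # deviates strongly from the agent's proposed action.
    mean, _ = driver.policy(obs)
    return np.linalg.norm(mean - agent_action) > threshold

def run_episode(env, agent, drivers, buffer):
    obs, done = env.reset(), False
    while not done:
        agent_action = agent.act(obs)
        # Collect concurrent, possibly conflicting guidance from the N digital drivers.
        guidance = [d.act(obs) for d in drivers if wants_to_intervene(d, obs, agent_action)]
        if guidance:
            actions = np.stack([a for a, _ in guidance])
            exec_action = actions.mean(axis=0)         # executed guided action (illustrative choice)
            guidance_var = actions.var(axis=0).mean()  # mixture variance, later used to weight objectives
        else:
            exec_action, guidance_var = agent_action, None
        next_obs, reward, done, _ = env.step(exec_action)
        buffer.add(obs, exec_action, reward, next_obs, done,
                   guided=bool(guidance), guidance_var=guidance_var)
        obs = next_obs
```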

Preview

Simulation

Unprotected Left Turn

Highway Ramp Merge

Real World Experiment

Yield

Pass

Video Presentation (unmute audio to enjoy the video)

Ablation Study

Effectiveness of Adaptive Confidence Adjustment Module.



We employ ten digital drivers with success rates between 50% and 70% to concurrently intervene in vehicle control, and compare our method against PHIL-RL in the Ramp Merge scenario in terms of the robustness of policy optimization and the quality of the post-trained driving policy.

UnaRL converges considerably faster than PHIL-RL when the human guidance carries high uncertainty. Observing both the reward curve and the intervention-rate curve, we can see that even though the intervention rate of PHIL-RL drops as fast as that of UnaRL, the driving performance of PHIL-RL does not improve at the same pace as our approach. This is because PHIL-RL underestimates the multimodality of human guidance and blindly imitates the human policy regardless of its uncertainty. In contrast, our approach demonstrates superior data efficiency and convergence performance even under diverse, multimodal guidance. This robustness can be attributed to the adaptive confidence adjustment module, which adaptively adjusts the confidence (weights) of the learning objectives based on the guidance-policy and mixture variances.
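As a rough illustration of this idea (not the paper's exact formulation), the snippet below sketches how a confidence weight could be derived from the mixture variance of the drivers' guidance and used to trade off the imitation and RL objectives; the function names and the exponential mapping are assumptions.

```python
# Minimal PyTorch sketch of variance-based confidence weighting, in the spirit of
# the adaptive confidence adjustment module; the exact rule in UnaRL may differ.
import torch

def adaptive_confidence(guidance_var, scale=1.0):
    """Map the mixture variance of the drivers' guidance to a weight in (0, 1]:
    low variance (drivers agree) -> high confidence in imitation; high variance -> low."""
    var = torch.as_tensor(guidance_var, dtype=torch.float32)
    return torch.exp(-scale * var)

def combined_actor_loss(rl_loss, imitation_loss, guidance_var):
    w = adaptive_confidence(guidance_var)
    # Down-weight the imitation objective when guidance is uncertain (high variance),
    # and rely more on the RL objective instead.
    return (1.0 - w) * rl_loss + w * imitation_loss

# Example: drivers largely agree (low variance), so imitation dominates the update.
loss = combined_actor_loss(rl_loss=torch.tensor(1.2),
                           imitation_loss=torch.tensor(0.4),
                           guidance_var=0.05)
```

With such a weighting, highly consistent guidance pushes the update toward imitation, while conflicting guidance lets the RL objective dominate.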

Previous Work: Preference-Guided Deep Q-Network (PGDQN), TNNLS