IMPROVING POLICY GRADIENT BY EXPLORING UNDER-APPRECIATED REWARDS

Ofir Nachum, Mohammad Norouzi, Dale Schuurmans

ICLR 2017 | poster