IMPROVING POLICY GRADIENT BY EXPLORING UNDER-APPRECIATED REWARDS
Ofir Nachum, Mohammad Norouzi, Dale Schuurmans
ICLR 2017
poster | openreview | arxiv | code*