Off-policy evaluation and learning from logged bandit feedback: Error reduction via surrogate policy

Yuan Xie, Qiang Liu, Yuan Zhou, Boyi Liu, Zhaoran Wang, Jian Peng

Research output: Contribution to conference · Paper · peer-review
