Fingerprint
Dive into the research topics of 'Off-policy evaluation and learning from logged bandit feedback: Error reduction via surrogate policy'. Together they form a unique fingerprint.
Yuan Xie, Qiang Liu, Yuan Zhou, Boyi Liu, Zhaoran Wang, Jian Peng
Research output: Contribution to conference › Paper › peer-review