Removing RLHF Protections in GPT-4 via Fine-Tuning

Qiusi Zhan, Richard Fang, Rohan Bindu, Akul Gupta, Tatsunori Hashimoto, Daniel Kang

Research output: Contribution to conference (paper, peer-reviewed)
