# Reinforcement Learning from Human Feedback
Reinforcement Learning from Human Feedback (RLHF) is a technique that fine-tunes language models using human-generated preference data to align model outputs with desired behaviors.
vLLM can be used to generate completions for RLHF. The best way to do this is with existing libraries such as TRL, OpenRLHF, and verl.
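For instance, here is a minimal sketch of using vLLM's offline `LLM` API to sample rollout completions that could then be scored by a reward model. The model name, prompts, and sampling settings are placeholders, not part of any specific RLHF recipe:

```python
from vllm import LLM, SamplingParams

prompts = [
    "Explain the rules of chess in one paragraph.",
    "Write a short poem about the ocean.",
]

# Sample several completions per prompt so a reward model can score or rank them.
sampling_params = SamplingParams(n=4, temperature=1.0, top_p=0.95, max_tokens=256)

# Placeholder policy model for illustration; substitute the model you are training.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    for completion in output.outputs:
        # Each completion would be scored by a reward model and used to update
        # the policy with an RLHF algorithm such as PPO or GRPO.
        print(repr(completion.text))
```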
If you don’t want to use an existing library, see the following basic examples to get started: