<?xml version="1.0" encoding='utf-8'?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" "http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
<card id="card1" title="Reinforcement learning from human feedback - Page 5 - Wikipedia">
<p>
<a accesskey="1" href="page.php?w=reinforcement_learning_from_human_feedback&amp;p=4">1.Previous</a><br />
<a accesskey="3" href="page.php?w=reinforcement_learning_from_human_feedback&amp;p=6">3.Next</a>
</p>
<p>Though RLHF does not require massive amounts of data to improve performance, sourcing high-quality preference data is still an expensive process. Furthermore, if the data is not carefully collected from a representative <a href="page.php?w=sampling_%28statistics%29">sample</a>, the resulting model may exhibit unwanted <a href="page.php?w=algorithmic_bias">biases</a>.</p>

<p><big>Background and motivation</big></p>
<p>Optimizing a model based on human feedback is desirable when a task is difficult to specify yet easy to judge. For example, one may want</p>

<do type="prev" label="Search">
        <go href="search.wml"/>
</do>

</card>
</wml>
