<?xml version="1.0" encoding='utf-8'?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" "http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
<card id="card1" title="Stochastic gradient descent - Page 24 - Wikipedia">
<p>
<a accesskey="1" href="page.php?w=Stochastic_gradient_descent&amp;p=23">1.Previous</a><br />
<a accesskey="3" href="page.php?w=Stochastic_gradient_descent&amp;p=25">3.Next</a>
</p>
<p>first iterations cause large changes in the parameters, while later ones only fine-tune them. Such schedules have been known since MacQueen's work on <a href="page.php?w=K-means_clustering">k-means clustering</a>. Spall gives practical guidance on choosing the step size in several variants of SGD.</p>
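<p>The behaviour of a decreasing schedule can be seen in a minimal sketch. It assumes a one-dimensional squared loss Q_n(w) = 0.5 (w - x_n)^2 and the step size eta_n = 1/n. The function name sgd_running_mean and the sample values are illustrative and do not come from the source. With this schedule the first step jumps all the way to the first sample and later steps shrink. The iterate equals the running mean of the samples seen so far, which matches the incremental cluster-centre update of online k-means restricted to a single cluster.</p>
<p>
```python
import statistics


def sgd_running_mean(samples, eta0=1.0):
    """SGD on the per-sample loss Q_n(w) = 0.5 * (w - x_n)**2 with step eta_n = eta0 / n.

    Returns the final iterate and the list of step sizes that were used.
    """
    w = 0.0
    etas = []
    for n, x in enumerate(samples, start=1):
        eta = eta0 / n      # 1/n schedule: large steps first, fine-tuning later
        grad = w - x        # derivative of 0.5 * (w - x)**2 with respect to w
        w -= eta * grad
        etas.append(eta)
    return w, etas


samples = [4.0, 8.0, 6.0, 2.0]   # hypothetical data
w, etas = sgd_running_mean(samples)
print(w, statistics.mean(samples))  # 5.0 5.0
print(etas)                         # [1.0, 0.5, 0.3333333333333333, 0.25]
```
</p>
<p>With eta0 = 1 the first update moves w from 0 directly to the first sample, while each later update moves it only a fraction 1/n of the way toward the new sample.</p>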

<p><big>Implicit updates (ISGD)</big></p>
<p>As mentioned earlier, classical stochastic gradient descent is generally sensitive to the <a href="page.php?w=learning_rate">learning rate</a>. Fast convergence requires large learning rates, but this may induce</p>
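<p>The sensitivity to the learning rate, and the way an implicit update addresses it, can be illustrated with a minimal sketch. It assumes a least-squares loss Q_i(w) = 0.5 (y_i - x_i . w)^2, hypothetical noise-free data generated from w* = (2, -1), and the implicit form in which the gradient is evaluated at the new iterate, w_new = w - eta grad Q_i(w_new). For this quadratic loss that equation has a closed-form solution. The helper names dot and sgd_least_squares are illustrative, not part of any library.</p>
<p>
```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def sgd_least_squares(data, eta, implicit=False, epochs=200):
    """Constant-step SGD on Q_i(w) = 0.5 * (y_i - x_i . w)**2.

    explicit: w_new = w + eta * r * x_i, with r = y_i - x_i . w
    implicit: w_new = w + eta / (1 + eta * |x_i|**2) * r * x_i
    The implicit line is the closed-form solution of
    w_new = w - eta * grad Q_i(w_new) for this quadratic loss.
    """
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            r = y - dot(x, w)                         # residual at the current iterate
            if implicit:
                step = eta / (1.0 + eta * dot(x, x))  # effective step stays below 1 / |x_i|**2
            else:
                step = eta
            w = [wj + step * r * xj for wj, xj in zip(w, x)]
    return w


# Hypothetical noise-free data generated from w_star = [2, -1].
w_star = [2.0, -1.0]
xs = [[1.0, 0.5], [0.3, 2.0], [1.5, -1.0], [-0.7, 1.2]]
data = [(x, dot(x, w_star)) for x in xs]

for eta, implicit in [(0.1, False), (2.0, False), (2.0, True)]:
    w = sgd_least_squares(data, eta, implicit)
    finite = all(math.isfinite(v) for v in w)
    print(f"eta={eta} implicit={implicit} finite={finite} w={w}")
```
</p>
<p>Along x_i the explicit step multiplies the error by 1 - eta |x_i|^2, so it overshoots and grows whenever eta |x_i|^2 exceeds 2. With eta = 2.0 on this data the explicit run therefore blows up, while with eta = 0.1 it approaches w*. The implicit step multiplies the same error by 1 / (1 + eta |x_i|^2), which is always between 0 and 1, so it still converges with eta = 2.0.</p>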

<do type="prev" label="Search">
        <go href="search.wml"/>
</do>

</card>
</wml>
