This set of Machine Learning Multiple Choice Questions & Answers (MCQs) focuses on “SGD Variants”.
1. Which of the following is not a variant of stochastic gradient descent?
a) Adding a projection step
b) Variable step size
c) Strongly convex functions
d) Strongly non-convex functions
Answer: d
Explanation: There are several variants of stochastic gradient descent (SGD). Strongly non-convex functions are not a variant of SGD, whereas adding a projection step, using a variable step size, and specializing the method to strongly convex functions are three variants used to improve it.
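For context, here is a minimal sketch of the plain SGD update that these variants build on; the toy squared-loss data and the constant step size η = 0.1 are illustrative assumptions, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # toy features (illustrative)
y = X @ np.array([1.0, -2.0, 0.5])      # toy linear targets (illustrative)

w = np.zeros(3)
eta = 0.1                               # constant step size

for t in range(1000):
    i = rng.integers(len(X))            # sample one example at random
    grad = (X[i] @ w - y[i]) * X[i]     # stochastic gradient of the squared loss
    w = w - eta * grad                  # plain SGD step

print(w)
```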
2. The projection step is used to overcome the problem of stepping outside the bounded hypothesis class while maintaining the same convergence rate.
a) True
b) False
Answer: a
Explanation: Gradient descent and stochastic gradient descent are sometimes required to keep w* in a B-bounded hypothesis class, i.e. w* lies in the set H = {w : ∥w∥ ≤ B}. A step in the direction opposite to the gradient might then take w out of this bound. The projection step is used to overcome this problem while maintaining the same convergence rate.
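A minimal sketch of such a projection, assuming H is the Euclidean ball of radius B; in that case projecting simply rescales any vector whose norm exceeds B back onto the boundary, which is the closest point in H because only the norm shrinks while the direction is unchanged.

```python
import numpy as np

def project_onto_ball(w, B):
    """Euclidean projection onto H = {w : ||w|| <= B}:
    if w lies outside the ball, scale it back to the boundary."""
    norm = np.linalg.norm(w)
    return w if norm <= B else (B / norm) * w

# Example: a vector of norm 5 projected onto the ball of radius 2.
w = np.array([3.0, 4.0])
print(project_onto_ball(w, B=2.0))      # -> [1.2, 1.6], which has norm 2
```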
3. Which of the following statements is not true about the two-step update rule?
a) Two-step update rule is a way to add a projection step
b) First subtract a sub-gradient from the current value of w and then project the resulting vector onto H
c) First add a sub-gradient to the current value of w and then project the resulting vector onto H
d) The projection step replaces the current value of w by the vector in H closest to it
Answer: c
Explanation: In the two-step update rule, we do not add a sub-gradient to the current value of w. The two-step rule is a way to add a projection step: we first subtract a sub-gradient from the current value of w and then project the resulting vector onto H. The projection then replaces the current value of w by the vector in H closest to it.
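A sketch of the two-step update rule under the same bounded-ball assumption for H; the step size η and the example numbers are illustrative.

```python
import numpy as np

def project_onto_ball(w, B):
    """Euclidean projection onto H = {w : ||w|| <= B}."""
    norm = np.linalg.norm(w)
    return w if norm <= B else (B / norm) * w

def two_step_update(w, subgrad, eta, B):
    """Two-step rule: (1) subtract a sub-gradient, (2) project onto H."""
    w_half = w - eta * subgrad              # step 1: gradient step
    return project_onto_ball(w_half, B)     # step 2: replace by the closest point in H

# Illustrative call with made-up numbers.
w = np.array([1.5, -0.5])
g = np.array([-2.0, 1.0])
print(two_step_update(w, g, eta=1.0, B=2.0))
```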
4. A variable step size decreases the step size as a function of the iteration number, t.
a) True
b) False
Answer: a
Explanation: A variable step size decreases the step size as a function of the iteration number, t. The value of w is updated with ηt rather than with a constant η. When w is closer to the minimum of the function, the algorithm takes its steps more carefully, so as not to overshoot the minimum.
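A sketch of SGD with a decreasing step size; the schedule ηt = η0/√t and the toy data are illustrative assumptions (the exact schedule depends on the setting being analyzed).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # toy features (illustrative)
y = X @ np.array([1.0, -2.0, 0.5])      # toy linear targets (illustrative)

w = np.zeros(3)
eta0 = 0.5

for t in range(1, 1001):
    eta_t = eta0 / np.sqrt(t)           # step size shrinks as t grows
    i = rng.integers(len(X))
    grad = (X[i] @ w - y[i]) * X[i]
    w = w - eta_t * grad                # smaller, more careful steps near the minimum

print(w)
```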
5. More sophisticated averaging schemes can improve the convergence speed in the case of strongly convex functions.
a) False
b) True
Answer: b
Explanation: Averaging techniques are one of the variants of stochastic gradient descent used to improve the convergence speed in the case of strongly convex functions. The algorithm can output the average of w(t) over the last αT iterations, for some α ∈ (0, 1), or it can take a weighted average of the last few iterates.
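A sketch of this tail-averaging scheme, reusing the same illustrative toy data and decreasing step size as above; the choice α = 0.25 is an arbitrary example value.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # toy features (illustrative)
y = X @ np.array([1.0, -2.0, 0.5])      # toy linear targets (illustrative)

T = 1000
alpha = 0.25                            # average over the last alpha*T iterates
w = np.zeros(3)
eta0 = 0.5
tail = []

for t in range(1, T + 1):
    i = rng.integers(len(X))
    grad = (X[i] @ w - y[i]) * X[i]
    w = w - (eta0 / np.sqrt(t)) * grad
    if t > (1 - alpha) * T:             # keep only the last alpha*T iterates
        tail.append(w.copy())

w_out = np.mean(tail, axis=0)           # output: the average of the tail iterates
print(w_out)
```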