Machine Learning Questions and Answers – SGD Variants

This set of Machine Learning Multiple Choice Questions & Answers (MCQs) focuses on “SGD Variants”.

1. Which of the following is not a variant of stochastic gradient descent?
a) Adding a projection step
b) Variable step size
c) Strongly convex functions
d) Strongly non-convex functions
View Answer

Answer: d
Explanation: There are several variants of stochastic gradient descent (SGD). Strongly non-convex functions are not a variant of SGD, whereas adding a projection step, using a variable step size, and exploiting strongly convex functions are three variants of SGD used to improve it.

2. The projection step is used to overcome the problem of stepping outside the bounded hypothesis class while maintaining the same convergence rate.
a) True
b) False
View Answer

Answer: a
Explanation: Gradient descent and stochastic gradient descent analyses restrict w* to a B-bounded hypothesis class, i.e. w* lies in the set H = {w : ∥w∥ ≤ B}. A step in the direction opposite to the gradient might therefore take w outside this bound. The projection step is used to overcome this problem while maintaining the same convergence rate.
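
For illustration, here is a minimal sketch in Python (with NumPy) of the projection onto H; the helper name project_onto_ball and the choice of the Euclidean norm are assumptions made for this sketch, not part of the original explanation.

import numpy as np

def project_onto_ball(w, B):
    """Project w onto H = {w : ||w|| <= B}: if w is already inside the ball
    it is returned unchanged, otherwise it is rescaled to the closest point
    on the boundary."""
    norm = np.linalg.norm(w)
    if norm <= B:
        return w
    return (B / norm) * w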

3. Which of the following statements is not true about the two-step update rule?
a) Two-step update rule is a way to add a projection step
b) First subtract a sub-gradient from the current value of w and then project the resulting vector onto H
c) First add a sub-gradient to the current value of w and then project the resulting vector onto H
d) The projection step replaces the current value of w by the vector in H closest to it
View Answer

Answer: c
Explanation: In the two-step update rule we do not add a sub-gradient to the current value of w. The two-step rule is a way to add a projection step: we first subtract a sub-gradient from the current value of w and then project the resulting vector onto H, replacing the current value of w by the vector in H closest to it.
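
A rough sketch of the two-step update rule in Python is given below, repeating the hypothetical project_onto_ball helper so the snippet is self-contained; the names two_step_update, eta, and subgradient are placeholders, not from the original text.

import numpy as np

def project_onto_ball(w, B):
    """Euclidean projection onto H = {w : ||w|| <= B}."""
    norm = np.linalg.norm(w)
    return w if norm <= B else (B / norm) * w

def two_step_update(w, subgradient, eta, B):
    """Two-step update rule: (1) subtract a (scaled) sub-gradient from the
    current value of w, (2) project the resulting vector onto H."""
    w_half = w - eta * subgradient       # step 1: may step outside H
    return project_onto_ball(w_half, B)  # step 2: replace with closest vector in H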

4. Variable step size decreases the step size as a function of the iteration, t.
a) True
b) False
View Answer

Answer: a
Explanation: Variable step size decreases the step size as a function of the iteration t. The value of w is updated with ηt rather than with a constant η. Closer to the minimum of the function, the steps are taken more carefully, so as not to overshoot the minimum.
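
A small sketch of a decreasing step-size schedule follows; the specific choice ηt = η0/√t is one common schedule and is an assumption of this sketch, not something stated in the question.

import numpy as np

def step_size(t, eta0=0.1):
    """Variable step size: eta_t decreases as a function of the iteration t
    (t is assumed to start at 1)."""
    return eta0 / np.sqrt(t)

# Inside the SGD loop, update with eta_t instead of a constant eta, e.g.
# w = w - step_size(t) * subgradient   (followed by a projection onto H, if one is used)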

5. More sophisticated averaging schemes can improve the convergence speed in the case of strongly convex functions.
a) False
b) True
View Answer

Answer: b
Explanation: Averaging techniques are a variant of stochastic gradient descent used to improve the convergence speed in the case of strongly convex functions. The algorithm can output the average of w(t) over the last αT iterations, for some α ∈ (0, 1), or it can take a weighted average of the last few iterates.
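
A rough sketch of such a suffix-averaging scheme in Python; storing every iterate in a list and the default α = 0.5 are simplifications assumed only for this illustration.

import numpy as np

def suffix_average(iterates, alpha=0.5):
    """Average of w(t) over the last alpha*T iterations, for alpha in (0, 1).
    `iterates` is the list of all iterates w(1), ..., w(T)."""
    T = len(iterates)
    start = int((1 - alpha) * T)
    return np.mean(np.asarray(iterates[start:]), axis=0)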

Sanfoundry Global Education & Learning Series – Machine Learning.

To practice all areas of Machine Learning, here is the complete set of 1000+ Multiple Choice Questions and Answers.
