This set of Machine Learning Multiple Choice Questions & Answers (MCQs) focuses on “Learning with SGD”.

1. Stochastic gradient descent cannot be used for risk minimisation.
a) False
b) True

Answer: a
Explanation: Stochastic gradient descent (SGD) can indeed be used for risk minimisation. In learning, the problem we face is minimising the risk function LD(w). With SGD, all we need is an unbiased estimate of the gradient of LD(w), that is, a random vector whose expectation equals the gradient.
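A minimal sketch of the idea above: at each step we draw a fresh example z from the distribution, and the gradient of the loss on that single example is an unbiased estimate of the gradient of the risk LD(w). The helper names (`sgd_risk_minimisation`, `sample_z`, `grad_loss`) and the toy linear-regression distribution are hypothetical, chosen only for illustration.

```python
import random

def sgd_risk_minimisation(sample_z, grad_loss, w0=0.0, eta=0.01, steps=5000):
    """Minimise the risk LD(w) = E_z[loss(w, z)] using only unbiased
    gradient estimates: draw z ~ D and step along -grad loss(w, z),
    whose expectation is the gradient of LD(w)."""
    w = w0
    for _ in range(steps):
        z = sample_z()                 # fresh example from D
        w -= eta * grad_loss(w, z)     # unbiased gradient step
    return w

# Hypothetical distribution D: x ~ U(-1, 1), y = 2x + small noise,
# so the risk minimiser for squared loss is w* = 2.
random.seed(0)
def sample_z():
    x = random.uniform(-1.0, 1.0)
    y = 2.0 * x + random.gauss(0.0, 0.1)
    return (x, y)

# Squared loss l(w, z) = (w*x - y)^2 and its gradient in w.
def grad_loss(w, z):
    x, y = z
    return 2.0 * (w * x - y) * x

w_hat = sgd_risk_minimisation(sample_z, grad_loss)
```

Note that the algorithm never evaluates LD(w) itself; the unbiased per-example gradient is all it needs.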

2. Stochastic gradient descent can be used for convex-smooth learning problems.
a) False
b) True

Answer: b
Explanation: Stochastic gradient descent can be used for convex-smooth learning problems. Assume that for every z the loss function l(·, z) is convex, β-smooth, and nonnegative. Then we can run the SGD algorithm to minimise LD(w), and the convergence guarantee for this class of problems holds.
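A small sketch of the convex-smooth setting above. For convex, β-smooth, nonnegative losses the standard guarantee is stated for the average of the iterates, so the routine returns the averaged w. The names (`sgd_averaged`, `grad_estimate`) and the toy loss l(w, z) = (w − z)² (which is convex, 2-smooth, and nonnegative) are assumptions made for illustration.

```python
import random

def sgd_averaged(grad_estimate, eta, steps, w0=0.0):
    """SGD for convex-smooth learning problems. The guarantee in this
    setting holds for the average of the iterates, so return w_bar
    rather than the last iterate."""
    w, total = w0, 0.0
    for _ in range(steps):
        total += w
        w -= eta * grad_estimate(w)
    return total / steps

# Hypothetical example: l(w, z) = (w - z)^2 with z ~ U(0, 2);
# the risk minimiser is E[z] = 1.
random.seed(1)
def grad_estimate(w):
    z = random.uniform(0.0, 2.0)
    return 2.0 * (w - z)

# Step size kept below 1/beta = 0.5, as smoothness requires.
w_bar = sgd_averaged(grad_estimate, eta=0.05, steps=4000)
```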

3. Which of the following statements is not true about stochastic gradient descent for regularised loss minimisation?
a) Stochastic gradient descent has the same worst-case sample complexity bound as regularised loss minimisation
b) On some distributions, regularised loss minimisation yields a better solution than stochastic gradient descent
c) In some cases we solve the optimisation problem as associated with regularised loss minimisation
d) Stochastic gradient descent has entirely different worst-case sample complexity bound from regularised loss minimisation

Answer: d
Explanation: Stochastic gradient descent has the same worst-case sample complexity bound as regularised loss minimisation. However, on some distributions, regularised loss minimisation yields a better solution than stochastic gradient descent, so in some cases we still solve the optimisation problem associated with regularised loss minimisation.

4. In convex learning problems where the loss function is convex, the preceding problem is also a convex optimisation problem.
a) False
b) True

Answer: b
Explanation: In convex learning problems, where the loss function is convex, the preceding problem is also a convex optimisation problem that can be solved using SGD. When f is a strongly convex function, we can apply the SGD variant that constructs an unbiased estimate of a sub-gradient of f at w(t) and takes a step along it.
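The strongly convex variant above can be sketched as follows, using the standard decaying step size η_t = 1/(λt) for a λ-strongly convex f. The function names and the toy objective f(w) = E_z[(w − z)²/2] with z ~ N(3, 0.5) (which is 1-strongly convex, with w − z an unbiased sub-gradient estimate) are hypothetical, for illustration only.

```python
import random

def sgd_strongly_convex(subgrad_estimate, lam, steps, w0=0.0):
    """SGD variant for a lam-strongly convex f: at step t, move along an
    unbiased estimate of a sub-gradient of f at w(t) with step size
    eta_t = 1 / (lam * t), and return the averaged iterate."""
    w, total = w0, 0.0
    for t in range(1, steps + 1):
        total += w
        w -= (1.0 / (lam * t)) * subgrad_estimate(w)
    return total / steps

# Hypothetical example: f(w) = E_z[(w - z)^2 / 2], z ~ N(3, 0.5),
# is 1-strongly convex; its minimiser is E[z] = 3.
random.seed(2)
def subgrad_estimate(w):
    z = random.gauss(3.0, 0.5)
    return w - z  # unbiased estimate of f'(w) = w - E[z]

w_avg = sgd_strongly_convex(subgrad_estimate, lam=1.0, steps=2000)
```

With λ = 1 this step size makes each iterate the running average of the samples seen so far, which is one way to see why the variant converges.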

Sanfoundry Global Education & Learning Series – Machine Learning.

To practice all areas of Machine Learning, here is the complete set of 1000+ Multiple Choice Questions and Answers.
