This set of Machine Learning Multiple Choice Questions & Answers (MCQs) focuses on “Linear Regression – Gradient Descent”.

1. What is the goal of gradient descent?

a) Reduce complexity

b) Reduce overfitting

c) Maximize cost function

d) Minimize cost function

Answer: d

Explanation: Gradient descent starts with some random t_{0} and t_{1}. It keeps altering them to reduce the cost function J(t_{0}, t_{1}) and stops at a point where it assumes the cost function is minimal.

2. Gradient descent always finds the minimum of the cost function.

a) True

b) False

Answer: b

Explanation: Often, gradient descent reaches only a local minimum of the cost function. From there, any direction it moves increases the cost, so the learner assumes that point gives the minimum cost. In this way, the global minimum is never reached.
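This is easy to reproduce on a non-convex function. The sketch below is illustrative (the function x^4 - 3x^2 + x and the learning rate are not from the question): started on the wrong side of the curve, gradient descent settles in the local minimum near x = 1.13 and never finds the global minimum near x = -1.30.

```python
def gradient_descent(grad, x0, lr=0.01, iters=2000):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(iters):
        x -= lr * grad(x)
    return x

f = lambda x: x**4 - 3 * x**2 + x       # non-convex: two minima
grad = lambda x: 4 * x**3 - 6 * x + 1   # its derivative

x_local = gradient_descent(grad, 2.0)    # converges to the local minimum
x_global = gradient_descent(grad, -2.0)  # converges to the global minimum
print(f(x_local) > f(x_global))          # True: the first run got stuck
```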

3. What happens when the learning rate is high?

a) It always reaches the minima quickly

b) It overshoots the maxima

c) Most of the time, it overshoots the minima

d) Nothing happens

Answer: c

Explanation: If the learning rate is high, gradient descent overshoots the minima most of the time. When it is close to the minimum, the large steps carry the parameters past it instead of settling on it, so the algorithm keeps overshooting.
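A minimal sketch of the effect, assuming a one-parameter cost J(t) = t^{2} (an illustrative choice) with gradient 2t:

```python
def descend(lr, t=1.0, steps=20):
    """Run gradient descent on J(t) = t**2, whose gradient is 2*t."""
    for _ in range(steps):
        t -= lr * 2 * t    # each step scales t by (1 - 2*lr)
    return abs(t)

print(descend(0.1))   # small rate: |t| shrinks toward the minimum at 0
print(descend(1.1))   # high rate: every step overshoots, and |t| grows
```

With lr = 1.1 each update multiplies t by -1.2, so the iterate jumps over the minimum and diverges.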

4. What is the correct way to update t_{0} and t_{1}?

a) Calculate t_{0} and t_{1} and then update t_{0} and t_{1}

b) Update t_{0} and t_{1} and then calculate t_{0} and t_{1}

c) Calculate t_{0}, update t_{0} and then calculate t_{1}, update t_{1}

d) Calculate t_{1}, update t_{1} and then calculate t_{0}, update t_{0}

Answer: a

Explanation: Both calculations are done first, and only then are the parameters updated. If one parameter is updated before both are calculated, the updated first parameter would be used to calculate the second one, which gives an incorrect (non-simultaneous) update.
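A minimal sketch of the simultaneous update for the hypothesis h(x) = t_{0} + t_{1}x (function and variable names are illustrative):

```python
def step(t0, t1, xs, ys, alpha):
    """One gradient-descent step; both gradients use the CURRENT t0, t1."""
    m = len(xs)
    grad0 = sum(t0 + t1 * x - y for x, y in zip(xs, ys)) / m
    grad1 = sum((t0 + t1 * x - y) * x for x, y in zip(xs, ys)) / m
    # Only after BOTH gradients are computed are the parameters updated.
    return t0 - alpha * grad0, t1 - alpha * grad1
```

Assigning the new t0 before computing grad1 would mix old and new parameters and produce a different step.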

5. The cost function contains a squared term and is divided by 2*m where m is the number of training examples. What is in the denominator of gradient descent function?

a) 2*m

b) m

c) m/2

d) m^{2}

Answer: b

Explanation: Gradient descent takes the partial derivative of the cost function. Differentiating the squared term brings down a factor of two, which cancels the two in the denominator, leaving only "m" there.
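The cancellation can be written out explicitly, using the standard squared-error cost in the question's t_{0}, t_{1} notation:

```latex
J(t_0, t_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right)^2

\frac{\partial J}{\partial t_0}
  = \frac{1}{2m} \sum_{i=1}^{m} 2 \left( h(x^{(i)}) - y^{(i)} \right)
  = \frac{1}{m} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right)
```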

6. Cost function has a squared term, but gradient descent does not. Why?

a) Integration of cost function

b) The square root of the cost function

c) Differentiation of cost function

d) They are not related

Answer: c

Explanation: Gradient descent performs a partial derivative of the cost function. The term raised to the power of 2 is also differentiated, so its power goes down by 1 and no squared terms remain.

7. What is the output of gradient descent after each iteration?

a) Updated t_{0}, t_{1}

b) J(t_{0}, t_{1})

c) J(t_{1}, t_{0})

d) A better learning rate

Answer: a

Explanation: The goal of gradient descent is to alter t_{0} and t_{1} until a minimum of J(t_{0}, t_{1}) is reached. It only ever updates t_{0} and t_{1}. This, in turn, changes J(t_{0}, t_{1}), but J is never updated directly by the gradient descent algorithm.

8. Who invented gradient descent?

a) Ross Quinlan

b) Leslie Valiant

c) Thomas Bayes

d) Augustin-Louis Cauchy

Answer: d

Explanation: Cauchy invented gradient descent in 1847. Bayes invented Bayes’ theorem. Leslie Valiant introduced the idea of PAC learning. Quinlan developed the decision-tree algorithms ID3 and C4.5.

9. h(x) = t_{0} + t_{1}x. Alpha value (learning rate) is 0.1. Initial theta values are 0, 0. X = [1, 2, 3] and Y = [1, 3, 5]. What is the value of the cost function after the 1^{st} iteration?

a) 0.3

b) 0.73

c) 1.2953

d) 0.425

Answer: c

Explanation: With m = 3, alpha = 0.1, and starting values t_{0} = t_{1} = 0 (the t_{1} gradient multiplies each error by its x value):

t_{0} = 0 - 0.1/3 × ((0 + 0×1 - 1) + (0 + 0×2 - 3) + (0 + 0×3 - 5)) = -0.1/3 × (-9) = 0.3

t_{1} = 0 - 0.1/3 × ((0 + 0×1 - 1)×1 + (0 + 0×2 - 3)×2 + (0 + 0×3 - 5)×3) = -0.1/3 × (-22) = 2.2/3 ≈ 0.73

Cost = 1/6 × ((0.3 + 0.7333×1 - 1)^{2} + (0.3 + 0.7333×2 - 3)^{2} + (0.3 + 0.7333×3 - 5)^{2}) = 1.2953 (using the unrounded t_{1} = 2.2/3).
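The arithmetic can be checked with a short script (variable names are illustrative); note that 1.2953 follows only if the unrounded t_{1} = 2.2/3 is carried into the cost:

```python
X, Y = [1, 2, 3], [1, 3, 5]
alpha, m = 0.1, len(X)
t0, t1 = 0.0, 0.0

# One simultaneous update: errors weighted by 1 for t0, by x for t1.
grad0 = sum(t0 + t1 * x - y for x, y in zip(X, Y)) / m        # (-1 - 3 - 5) / 3 = -3
grad1 = sum((t0 + t1 * x - y) * x for x, y in zip(X, Y)) / m  # (-1 - 6 - 15) / 3
t0, t1 = t0 - alpha * grad0, t1 - alpha * grad1               # 0.3, 2.2/3 ≈ 0.7333

# Cost at the updated parameters: J = 1/(2m) * sum of squared errors.
cost = sum((t0 + t1 * x - y) ** 2 for x, y in zip(X, Y)) / (2 * m)
print(round(cost, 4))   # 1.2954 (the quiz truncates this to 1.2953)
```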

10. h(x) = t_{0} + t_{1}x. Alpha value (learning rate) is 0.1. Initial theta values are 0.3, 0.73. X = [1, 2, 3] and Y = [1, 3, 5]. What is the value of t_{0} after 1^{st} iteration?

a) 0.73

b) 0.425

c) 1.064444

d) 0.392

Answer: b

Explanation: Here,

t_{0} = 0.3 - 0.1/3 × ((0.3 + 0.73×1 - 1) + (0.3 + 0.73×2 - 3) + (0.3 + 0.73×3 - 5)) = 0.3 - 0.1/3 × (-3.72) = 0.424

The closest listed option is b) 0.425; the small gap comes from rounding t_{1} to 0.73.
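The same style of check for this question, using the stated starting values (variable names are illustrative):

```python
X, Y = [1, 2, 3], [1, 3, 5]
alpha, m = 0.1, len(X)
t0, t1 = 0.3, 0.73

# Gradient with respect to t0: mean of the prediction errors.
grad0 = sum(t0 + t1 * x - y for x, y in zip(X, Y)) / m   # (0.03 - 1.24 - 2.51) / 3
t0_new = t0 - alpha * grad0
print(round(t0_new, 3))   # 0.424, closest to the listed 0.425
```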

11. What is the generalized goal of gradient descent?

a) Minimize J(t_{1})

b) Minimize J(t_{0}, t_{1}, t_{2}, …, t_{n})

c) Minimize J(t_{0}, t_{1})

d) Maximize J(t_{1})

Answer: b

Explanation: The generalized goal of gradient descent is to minimize the cost function J(t_{0}, t_{1}, t_{2}, …, t_{n}). The goal of gradient descent for linear regression is to minimize J(t_{0}, t_{1}), and for linear regression with the simplified hypothesis it is to minimize J(t_{1}). Its goal is never to maximize the cost function.

**Sanfoundry Global Education & Learning Series – Machine Learning**.

To practice all areas of Machine Learning, __here is the complete set of 1000+ Multiple Choice Questions and Answers__.

**If you find a mistake in question / option / answer, kindly take a screenshot and email to [email protected]**