# Machine Learning Questions and Answers – Linear Regression – Gradient Descent

This set of Machine Learning Multiple Choice Questions & Answers (MCQs) focuses on “Linear Regression – Gradient Descent”.

1. What is the goal of gradient descent?
a) Reduce complexity
b) Reduce overfitting
c) Maximize cost function
d) Minimize cost function

Explanation: Gradient descent starts with some random t0 and t1 and keeps altering them to reduce the cost function J(t0, t1). It stops at a point where it assumes the cost function is minimal.
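
The procedure described above can be sketched in Python. This is a minimal illustration, not from the original text; the parameter names (t0, t1, alpha) follow the conventions used in these questions.

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=1000):
    """Minimize J(t0, t1) = 1/(2m) * sum((t0 + t1*x - y)^2) by gradient descent."""
    m = len(xs)
    t0, t1 = 0.0, 0.0  # arbitrary starting point
    for _ in range(iterations):
        # partial derivatives of the cost J with respect to t0 and t1
        grad0 = sum(t0 + t1 * x - y for x, y in zip(xs, ys)) / m
        grad1 = sum((t0 + t1 * x - y) * x for x, y in zip(xs, ys)) / m
        # move both parameters a small step against the gradient
        t0 -= alpha * grad0
        t1 -= alpha * grad1
    return t0, t1

# For points on the line y = 2x - 1, the fit approaches t0 = -1, t1 = 2
t0, t1 = gradient_descent([1, 2, 3], [1, 3, 5])
```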

2. Gradient descent always gives minimal cost function.
a) True
b) False

Explanation: Gradient descent can settle in a local minimum of the cost function. From there, any move increases the cost, so the learner assumes that point gives the minimum cost function, and the global minimum is never reached. (The squared-error cost of linear regression happens to be convex, but gradient descent on arbitrary cost functions has this limitation.)

3. What happens when the learning rate is high?
a) It always reaches the minima quickly
b) It overshoots the maxima
c) Most of the time, it overshoots the minima
d) Nothing happens

Explanation: If the learning rate is high, gradient descent overshoots the minima most of the time. When the algorithm is close to the minima, instead of settling there, it takes steps that are too large and jumps past it. This leads to overshooting the minima.
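
A tiny numerical illustration (not from the original text): minimizing J(t) = t^2, whose gradient is 2t, with a small and a large learning rate.

```python
def descend(alpha, steps=10, t=1.0):
    """Run gradient descent on J(t) = t^2 (gradient 2t) starting from t = 1."""
    for _ in range(steps):
        t -= alpha * 2 * t  # update rule: t := t - alpha * dJ/dt
    return t

small = descend(0.1)  # each step multiplies t by 0.8: converges toward 0
large = descend(1.1)  # each step multiplies t by -1.2: overshoots and diverges
```

With alpha = 0.1, |t| shrinks every step; with alpha = 1.1, every step jumps past the minimum at 0 and lands farther away than before.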

4. What is the correct way to update t0 and t1?
a) Calculate t0 and t1 and then update t0 and t1
b) Update t0 and t1 and then calculate t0 and t1
c) Calculate t0, update t0 and then calculate t1, update t1
d) Calculate t1, update t1 and then calculate t0, update t0

Explanation: Both gradients are calculated first, and then both parameters are updated. If one parameter were updated before both calculations are done, the second gradient would be computed using the already-updated first parameter, leading to an incorrect update.
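
The difference can be made concrete in code (an illustrative sketch; the function names are ours):

```python
def step_simultaneous(t0, t1, xs, ys, alpha):
    """Correct: compute both gradients from the OLD t0, t1, then update both."""
    m = len(xs)
    grad0 = sum(t0 + t1 * x - y for x, y in zip(xs, ys)) / m
    grad1 = sum((t0 + t1 * x - y) * x for x, y in zip(xs, ys)) / m
    return t0 - alpha * grad0, t1 - alpha * grad1

def step_sequential(t0, t1, xs, ys, alpha):
    """Incorrect: the second gradient sees the already-updated t0."""
    m = len(xs)
    t0 = t0 - alpha * sum(t0 + t1 * x - y for x, y in zip(xs, ys)) / m
    t1 = t1 - alpha * sum((t0 + t1 * x - y) * x for x, y in zip(xs, ys)) / m
    return t0, t1
```

Starting from t0 = t1 = 0 on X = [1, 2, 3], Y = [1, 3, 5] with alpha = 0.1, the two versions already disagree after one step (t1 ≈ 0.733 vs t1 ≈ 0.673).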

5. The cost function contains a squared term and is divided by 2*m, where m is the number of training examples. What is in the denominator of the gradient descent update?
a) 2*m
b) m
c) m/2
d) m²

Explanation: Gradient descent takes the partial derivative of the cost function. The squared term produces a factor of two after differentiation, which cancels the two in the denominator, leaving only m.
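
The cancellation can be written out explicitly (standard calculus, using the t0, t1 notation from above):

```latex
J(t_0, t_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( t_0 + t_1 x_i - y_i \right)^2

\frac{\partial J}{\partial t_0}
  = \frac{1}{2m} \sum_{i=1}^{m} 2 \left( t_0 + t_1 x_i - y_i \right)
  = \frac{1}{m} \sum_{i=1}^{m} \left( t_0 + t_1 x_i - y_i \right)

\frac{\partial J}{\partial t_1}
  = \frac{1}{m} \sum_{i=1}^{m} \left( t_0 + t_1 x_i - y_i \right) x_i
```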

6. The cost function has a squared term, but gradient descent does not. Why?
a) Integration of cost function
b) The square root of the cost function
c) Differentiation of cost function
d) They are not related

Explanation: Gradient descent takes the partial derivative of the cost function. The term raised to the power of 2 is differentiated as well, so its power goes down by one and no squared term remains.

7. What is the output of gradient descent after each iteration?
a) Updated t0, t1
b) J(t0, t1)
c) J(t1, t0)
d) A better learning rate

Explanation: The goal of gradient descent is to alter t0 and t1 until a minimum of J(t0, t1) is reached. It only updates t0 and t1; this, in turn, changes the value of J(t0, t1), but J itself is never updated by the algorithm.

8. Who invented gradient descent?
a) Ross Quinlan
b) Leslie Valiant
c) Thomas Bayes
d) Augustin-Louis Cauchy

Explanation: Cauchy invented gradient descent in 1847. Bayes formulated Bayes’ theorem. Leslie Valiant introduced the idea of PAC learning. Quinlan developed the decision tree algorithms ID3 and C4.5.

9. h(x) = t0 + t1*x. The alpha value (learning rate) is 0.1 and the initial theta values are 0, 0. X = [1, 2, 3] and Y = [1, 3, 5]. What is the value of the cost function after the 1st iteration?
a) 0.3
b) 0.73
c) 1.2953
d) 0.425

Explanation: t0 = 0 - 0.1/3 * ((0 + 0*1 - 1) + (0 + 0*2 - 3) + (0 + 0*3 - 5)) = -0.1/3 * (-9)
t0 = 0.3
t1 = 0 - 0.1/3 * ((0 + 0*1 - 1)*1 + (0 + 0*2 - 3)*2 + (0 + 0*3 - 5)*3) = -0.1/3 * (-22)
t1 = 22/30 ≈ 0.73
Cost = 1/6 * ((0.3 + (22/30)*1 - 1)^2 + (0.3 + (22/30)*2 - 3)^2 + (0.3 + (22/30)*3 - 5)^2) ≈ 1.2953 (keeping t1 unrounded).
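
The arithmetic can be checked numerically (an illustrative script; the cost matches 1.2953 only if t1 is kept unrounded at 22/30):

```python
X, Y = [1, 2, 3], [1, 3, 5]
m, alpha = len(X), 0.1
t0, t1 = 0.0, 0.0

# one gradient descent step from (0, 0)
g0 = sum(t0 + t1 * x - y for x, y in zip(X, Y)) / m        # -3.0
g1 = sum((t0 + t1 * x - y) * x for x, y in zip(X, Y)) / m  # -22/3
t0, t1 = t0 - alpha * g0, t1 - alpha * g1                  # 0.3, 0.7333...

# cost after the step, J = 1/(2m) * sum of squared errors
cost = sum((t0 + t1 * x - y) ** 2 for x, y in zip(X, Y)) / (2 * m)
print(round(t0, 2), round(t1, 2), round(cost, 4))  # 0.3 0.73 1.2954
```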

10. h(x) = t0 + t1*x. The alpha value (learning rate) is 0.1 and the initial theta values are 0.3, 0.73. X = [1, 2, 3] and Y = [1, 3, 5]. What is the value of t0 after the 1st iteration?
a) 0.73
b) 0.425
c) 1.064444
d) 0.392

Explanation: Here,
t0 = 0.3 - 0.1/3 * ((0.3 + 0.73*1 - 1) + (0.3 + 0.73*2 - 3) + (0.3 + 0.73*3 - 5)) = 0.3 - 0.1/3 * (-3.72)
t0 = 0.3 + 0.124 = 0.424 ≈ 0.425 (the small gap comes from rounding t1 to 0.73).
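
A quick numerical check (illustrative; with t1 rounded to 0.73 the update gives 0.424, which matches option (b) 0.425 up to rounding):

```python
X, Y = [1, 2, 3], [1, 3, 5]
t0, t1, alpha, m = 0.3, 0.73, 0.1, 3

# gradient of the cost with respect to t0, then one update step
g0 = sum(t0 + t1 * x - y for x, y in zip(X, Y)) / m  # -3.72 / 3 = -1.24
t0_new = t0 - alpha * g0
print(round(t0_new, 3))  # 0.424
```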

11. What is the generalized goal of gradient descent?
a) Minimize J(t1)
b) Minimize J(t0, t1, t2, …, tn)
c) Minimize J(t0, t1)
d) Maximize J(t1)

Explanation: The generalized goal of gradient descent is to minimize the cost function J(t0, t1, t2, …, tn). The goal of gradient descent for linear regression is to minimize J(t0, t1), and for linear regression with the simplified hypothesis it is to minimize J(t1). Its goal is never to maximize the cost function.
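
The generalized update can be sketched as one simultaneous step over all n+1 parameters (an illustration; the function and variable names are ours):

```python
def gd_step(theta, rows, ys, alpha):
    """One simultaneous gradient descent step for h(x) = t0 + t1*x1 + ... + tn*xn.

    theta: [t0, t1, ..., tn]; rows: list of feature vectors [x1, ..., xn].
    """
    m = len(rows)
    ext = [[1.0] + row for row in rows]  # prepend 1 so theta[0] acts as the intercept t0
    preds = [sum(t * xi for t, xi in zip(theta, row)) for row in ext]
    grads = [sum((p - y) * row[j] for p, y, row in zip(preds, ys, ext)) / m
             for j in range(len(theta))]
    return [t - alpha * g for t, g in zip(theta, grads)]
```

With a single feature this reduces to the two-parameter update from the earlier questions: gd_step([0, 0], [[1], [2], [3]], [1, 3, 5], 0.1) gives [0.3, 0.733…].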

Sanfoundry Global Education & Learning Series – Machine Learning.
