Machine Learning Questions and Answers – Linear Regression – Gradient Descent

This set of Machine Learning Multiple Choice Questions & Answers (MCQs) focuses on “Linear Regression – Gradient Descent”.

1. What is the goal of gradient descent?
a) Reduce complexity
b) Reduce overfitting
c) Maximize cost function
d) Minimize cost function

Answer: d
Explanation: Gradient descent starts with some initial (often random) values of t0 and t1 and repeatedly alters them to reduce the cost function J(t0, t1). It stops at a point where it assumes the cost function has reached a minimum.
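
As an illustration (not part of the original question set), here is a minimal Python sketch of this procedure for the single-feature hypothesis h(x) = t0 + t1*x; the function and variable names are only examples:

def gradient_descent(X, Y, alpha=0.1, iterations=100):
    m = len(X)
    t0, t1 = 0.0, 0.0                                        # start from some initial values
    for _ in range(iterations):
        errors = [t0 + t1 * x - y for x, y in zip(X, Y)]
        grad0 = sum(errors) / m                              # partial derivative of J w.r.t. t0
        grad1 = sum(e * x for e, x in zip(errors, X)) / m    # partial derivative of J w.r.t. t1
        t0, t1 = t0 - alpha * grad0, t1 - alpha * grad1      # step downhill on J(t0, t1)
    return t0, t1

print(gradient_descent([1, 2, 3], [1, 3, 5], iterations=5000))   # approaches (-1.0, 2.0): the line y = 2x - 1 fits these points exactly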

2. Gradient descent always finds the minimum of the cost function.
a) True
b) False

Answer: b
Explanation: Often, gradient descent reaches only a local minimum of the cost function. From there, any move increases the cost, so the learner assumes that point gives the minimum cost. In this way, the global minimum may never be reached.
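
The squared-error cost used in linear regression is convex and has a single minimum; the local-minimum problem appears when the cost surface is not convex. A small illustrative sketch (the function f(t) = t^4 - 3t^2 + t is chosen purely for demonstration):

# Gradient descent on a non-convex function f(t) = t**4 - 3*t**2 + t (illustration only).
def descend(t, alpha=0.01, iterations=2000):
    for _ in range(iterations):
        t = t - alpha * (4 * t**3 - 6 * t + 1)   # f'(t)
    return t

print(descend(2.0))    # ≈ 1.13: settles in a local minimum
print(descend(-2.0))   # ≈ -1.30: settles in the global minimum

Which minimum is found depends entirely on the starting point.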

3. What happens when the learning rate is high?
a) It always reaches the minimum quickly
b) It overshoots the maximum
c) Most of the time, it overshoots the minimum
d) Nothing happens

Answer: c
Explanation: If the learning rate is too high, gradient descent usually overshoots the minimum. When the parameters are close to the minimum, the algorithm changes them by too large a step instead of settling into it, so it jumps past the minimum and may even diverge.
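
A tiny demonstration on the one-parameter cost J(t) = t^2 (the learning-rate values are only illustrative):

# Effect of the learning rate on gradient descent for J(t) = t**2 (illustration only).
def descend(t, alpha, iterations=20):
    for _ in range(iterations):
        t = t - alpha * 2 * t        # gradient of t**2 is 2*t
    return t

print(descend(1.0, alpha=0.1))   # ≈ 0.01: small steps settle toward the minimum at t = 0
print(descend(1.0, alpha=1.1))   # ≈ 38: every step jumps past the minimum and the iterates blow up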

4. What is the correct way to update t0 and t1?
a) Calculate t0 and t1 and then update t0 and t1
b) Update t0 and t1 and then calculate t0 and t1
c) Calculate t0, update t0 and then calculate t1, update t1
d) Calculate t1, update t1 and then calculate t0, update t0

Answer: a
Explanation: Both gradients are calculated first, and only then are the parameters updated (a simultaneous update). If one parameter is updated before the other's gradient is computed, the second gradient is calculated using the already-updated first parameter, which gives an incorrect result. A sketch of a correct step follows below.
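
A sketch of one simultaneous update step for h(x) = t0 + t1*x (the variable names are only illustrative):

# Correct: compute both gradients from the old t0, t1, then assign (simultaneous update).
def step(t0, t1, X, Y, alpha):
    m = len(X)
    errors = [t0 + t1 * x - y for x, y in zip(X, Y)]
    temp0 = t0 - alpha * sum(errors) / m
    temp1 = t1 - alpha * sum(e * x for e, x in zip(errors, X)) / m
    return temp0, temp1
# Updating t0 before computing the t1 gradient would feed the new t0 into the t1
# calculation, which is exactly the error described above.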

5. The cost function contains a squared term and is divided by 2*m, where m is the number of training examples. What is in the denominator of the gradient descent function?
a) 2*m
b) m
c) m/2
d) m^2

Answer: b
Explanation: Gradient descent takes the partial derivative of the cost function. Differentiating the squared term brings down a factor of two, which cancels the two in the denominator, leaving only m there.
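
For one training example, differentiating (t0 + t1*x - y)^2 / (2*m) with respect to t1 gives 2*(t0 + t1*x - y)*x / (2*m) = (t0 + t1*x - y)*x / m, so only m is left in the denominator. A quick symbolic check (illustrative sketch; assumes the sympy package is available):

import sympy as sp

t0, t1, x, y, m = sp.symbols('t0 t1 x y m')
term = (t0 + t1 * x - y)**2 / (2 * m)   # one term of the cost function
print(sp.diff(term, t1))                # roughly x*(t0 + t1*x - y)/m -- the 2 has cancelled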

6. The cost function has a squared term, but the gradient descent update does not. Why?
a) Integration of cost function
b) The square root of the cost function
c) Differentiation of cost function
d) They are not related

Answer: c
Explanation: Gradient descent takes the partial derivative of the cost function. The term raised to the power of 2 is differentiated as well, so its power drops by 1 and no squared term remains.

7. What is the output of gradient descent after each iteration?
a) Updated t0, t1
b) J(t0, t1)
c) J(t1, t0)
d) A better learning rate

Answer: a
Explanation: The goal of gradient descent is to alter t0 and t1 until a minimum of J(t0, t1) is reached. It only updates t0 and t1; this in turn changes J(t0, t1), but J itself is never updated directly by the algorithm.

8. Who invented gradient descent?
a) Ross Quinlan
b) Leslie Valiant
c) Thomas Bayes
d) Augustin-Louis Cauchy

Answer: d
Explanation: Cauchy invented gradient descent in 1847. Bayes formulated Bayes’ theorem. Leslie Valiant introduced the idea of PAC learning. Quinlan is best known for decision tree learning algorithms such as ID3 and C4.5.

9. h(x) = t0 + t1*x. The learning rate (alpha) is 0.1, the initial theta values are 0 and 0, X = [1, 2, 3] and Y = [1, 3, 5]. What is the value of the cost function after the 1st iteration?
a) 0.3
b) 0.73
c) 1.2953
d) 0.425

Answer: c
Explanation: With t0 = 0 and t1 = 0:
t0 = 0 - (0.1/3) * ((0 + 0*1 - 1) + (0 + 0*2 - 3) + (0 + 0*3 - 5)) = -(0.1/3) * (-9) = 0.3
t1 = 0 - (0.1/3) * ((0 + 0*1 - 1)*1 + (0 + 0*2 - 3)*2 + (0 + 0*3 - 5)*3) = -(0.1/3) * (-22) ≈ 0.7333
Cost = (1/6) * ((0.3 + 0.7333*1 - 1)^2 + (0.3 + 0.7333*2 - 3)^2 + (0.3 + 0.7333*3 - 5)^2) ≈ 1.2953.
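
The same arithmetic can be checked with a short script (illustrative):

X, Y = [1, 2, 3], [1, 3, 5]
alpha, m = 0.1, 3
t0, t1 = 0.0, 0.0

errors = [t0 + t1 * x - y for x, y in zip(X, Y)]                   # one pass over the data
t0, t1 = (t0 - alpha * sum(errors) / m,
          t1 - alpha * sum(e * x for e, x in zip(errors, X)) / m)  # simultaneous update
cost = sum((t0 + t1 * x - y)**2 for x, y in zip(X, Y)) / (2 * m)
print(t0, t1, cost)   # ≈ 0.3, 0.7333, 1.295 -- option c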

10. h(x) = t0 + t1*x. The learning rate (alpha) is 0.1, the initial theta values are 0.3 and 0.73, X = [1, 2, 3] and Y = [1, 3, 5]. What is the value of t0 after the 1st iteration?
a) 0.73
b) 0.425
c) 1.064444
d) 0.392

Answer: b
Explanation: Here,
t0 = 0.3 - (0.1/3) * ((0.3 + 0.73*1 - 1) + (0.3 + 0.73*2 - 3) + (0.3 + 0.73*3 - 5)) = 0.3 - (0.1/3) * (-3.72) = 0.424 ≈ 0.425.
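
This too can be verified with a short script (illustrative; the exact figure depends on how t1 = 0.73 was rounded):

X, Y = [1, 2, 3], [1, 3, 5]
alpha, m = 0.1, 3
t0, t1 = 0.3, 0.73

errors = [t0 + t1 * x - y for x, y in zip(X, Y)]
print(t0 - alpha * sum(errors) / m)   # ≈ 0.424, i.e. option b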

11. What is the generalized goal of gradient descent?
a) Minimize J(t1)
b) Minimize J(t0, t1, t2, …, tn)
c) Minimize J(t0, t1)
d) Maximize J(t1)

Answer: b
Explanation: The generalized goal of gradient descent is to minimize the cost function J(t0, t1, t2, …, tn). The goal of gradient descent for linear regression is to minimize J(t0, t1), and with the simplified hypothesis it is to minimize J(t1). Its goal is never to maximize the cost function.
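
A sketch of the generalized update over parameters t0..tn (illustrative names), where each row of X is prefixed with a constant 1 so that t0 is treated like every other parameter:

def gradient_descent(X, Y, alpha=0.1, iterations=100):
    m, n = len(X), len(X[0])                    # m examples, n parameters (rows include the leading 1)
    t = [0.0] * n
    for _ in range(iterations):
        errors = [sum(tj * xj for tj, xj in zip(t, row)) - y for row, y in zip(X, Y)]
        grads = [sum(e * row[j] for e, row in zip(errors, X)) / m for j in range(n)]
        t = [tj - alpha * gj for tj, gj in zip(t, grads)]   # simultaneous update of t0..tn
    return t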

Sanfoundry Global Education & Learning Series – Machine Learning.
