Machine Learning Questions and Answers – Linear Regression – Gradient Descent

This set of Machine Learning Multiple Choice Questions & Answers (MCQs) focuses on “Linear Regression – Gradient Descent”.

1. What is the goal of gradient descent?
a) Reduce complexity
b) Reduce overfitting
c) Maximize cost function
d) Minimize cost function

Answer: d
Explanation: Gradient descent starts with some initial (often random) values of t0 and t1 and repeatedly alters them to reduce the cost function J(t0, t1). It stops at a point where it assumes the cost function has reached a minimum.
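
As an illustration (not part of the original question set), here is a minimal Python sketch of this procedure for the single-feature hypothesis h(x) = t0 + t1*x; the function and variable names are only examples:

def gradient_descent(X, Y, alpha=0.1, iterations=100):
    m = len(X)
    t0, t1 = 0.0, 0.0                                        # start from some initial values
    for _ in range(iterations):
        errors = [t0 + t1 * x - y for x, y in zip(X, Y)]
        grad0 = sum(errors) / m                              # partial derivative of J w.r.t. t0
        grad1 = sum(e * x for e, x in zip(errors, X)) / m    # partial derivative of J w.r.t. t1
        t0, t1 = t0 - alpha * grad0, t1 - alpha * grad1      # step downhill on J(t0, t1)
    return t0, t1

print(gradient_descent([1, 2, 3], [1, 3, 5], iterations=5000))   # approaches (-1.0, 2.0): the line y = 2x - 1 fits these points exactly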

2. Gradient descent always finds the minimum of the cost function.
a) True
b) False

Answer: b
Explanation: Often, gradient descent reaches only a local minimum of the cost function. From there, any move increases the cost, so the learner assumes that point gives the minimum cost. In this way, the global minimum may never be reached.
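
The squared-error cost used in linear regression is convex and has a single minimum; the local-minimum problem appears when the cost surface is not convex. A small illustrative sketch (the function f(t) = t^4 - 3t^2 + t is chosen purely for demonstration):

# Gradient descent on a non-convex function f(t) = t**4 - 3*t**2 + t (illustration only).
def descend(t, alpha=0.01, iterations=2000):
    for _ in range(iterations):
        t = t - alpha * (4 * t**3 - 6 * t + 1)   # f'(t)
    return t

print(descend(2.0))    # ≈ 1.13: settles in a local minimum
print(descend(-2.0))   # ≈ -1.30: settles in the global minimum

Which minimum is found depends entirely on the starting point.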

3. What happens when the learning rate is high?
a) It always reaches the minimum quickly
b) It overshoots the maximum
c) Most of the time, it overshoots the minimum
d) Nothing happens

Answer: c
Explanation: If the learning rate is too high, gradient descent usually overshoots the minimum. When the parameters are close to the minimum, the algorithm changes them by too large a step instead of settling into it, so it jumps past the minimum and may even diverge.
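
A tiny demonstration on the one-parameter cost J(t) = t^2 (the learning-rate values are only illustrative):

# Effect of the learning rate on gradient descent for J(t) = t**2 (illustration only).
def descend(t, alpha, iterations=20):
    for _ in range(iterations):
        t = t - alpha * 2 * t        # gradient of t**2 is 2*t
    return t

print(descend(1.0, alpha=0.1))   # ≈ 0.01: small steps settle toward the minimum at t = 0
print(descend(1.0, alpha=1.1))   # ≈ 38: every step jumps past the minimum and the iterates blow up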

4. What is the correct way to update t0 and t1?
a) Calculate t0 and t1 and then update t0 and t1
b) Update t0 and t1 and then calculate t0 and t1
c) Calculate t0, update t0 and then calculate t1, update t1
d) Calculate t1, update t1 and then calculate t0, update t0

Answer: a
Explanation: Both gradients are calculated first, and only then are the parameters updated (a simultaneous update). If one parameter is updated before the other's gradient is computed, the second gradient is calculated using the already-updated first parameter, which gives an incorrect result. A sketch of a correct step follows below.
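
A sketch of one simultaneous update step for h(x) = t0 + t1*x (the variable names are only illustrative):

# Correct: compute both gradients from the old t0, t1, then assign (simultaneous update).
def step(t0, t1, X, Y, alpha):
    m = len(X)
    errors = [t0 + t1 * x - y for x, y in zip(X, Y)]
    temp0 = t0 - alpha * sum(errors) / m
    temp1 = t1 - alpha * sum(e * x for e, x in zip(errors, X)) / m
    return temp0, temp1
# Updating t0 before computing the t1 gradient would feed the new t0 into the t1
# calculation, which is exactly the error described above.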

5. The cost function contains a squared term and is divided by 2*m, where m is the number of training examples. What is in the denominator of the gradient descent function?
a) 2*m
b) m
c) m/2
d) m^2

Answer: b
Explanation: Gradient descent takes the partial derivative of the cost function. Differentiating the squared term brings down a factor of two, which cancels the two in the denominator, leaving only m there.
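
For one training example, differentiating (t0 + t1*x - y)^2 / (2*m) with respect to t1 gives 2*(t0 + t1*x - y)*x / (2*m) = (t0 + t1*x - y)*x / m, so only m is left in the denominator. A quick symbolic check (illustrative sketch; assumes the sympy package is available):

import sympy as sp

t0, t1, x, y, m = sp.symbols('t0 t1 x y m')
term = (t0 + t1 * x - y)**2 / (2 * m)   # one term of the cost function
print(sp.diff(term, t1))                # roughly x*(t0 + t1*x - y)/m -- the 2 has cancelled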

6. The cost function has a squared term, but the gradient descent update does not. Why?
a) Integration of cost function
b) The square root of the cost function
c) Differentiation of cost function
d) They are not related

Answer: c
Explanation: Gradient descent takes the partial derivative of the cost function. The term raised to the power of 2 is differentiated as well, so its power drops by 1 and no squared term remains.

7. What is the output of gradient descent after each iteration?
a) Updated t0, t1
b) J(t0, t1)
c) J(t1, t0)
d) A better learning rate

Answer: a
Explanation: The goal of gradient descent is to alter t0 and t1 until a minimum of J(t0, t1) is reached. It only updates t0 and t1; this in turn changes J(t0, t1), but J itself is never updated directly by the algorithm.

8. Who invented gradient descent?
a) Ross Quinlan
b) Leslie Valiant
c) Thomas Bayes
d) Augustin-Louis Cauchy

Answer: d
Explanation: Cauchy invented gradient descent in 1847. Bayes formulated Bayes’ theorem. Leslie Valiant introduced the idea of PAC learning. Quinlan is best known for decision tree learning algorithms such as ID3 and C4.5.

9. h(x) = t0 + t1*x. The learning rate (alpha) is 0.1, the initial theta values are 0 and 0, X = [1, 2, 3] and Y = [1, 3, 5]. What is the value of the cost function after the 1st iteration?
a) 0.3
b) 0.73
c) 1.2953
d) 0.425

Answer: c
Explanation: With t0 = 0 and t1 = 0:
t0 = 0 - (0.1/3) * ((0 + 0*1 - 1) + (0 + 0*2 - 3) + (0 + 0*3 - 5)) = -(0.1/3) * (-9) = 0.3
t1 = 0 - (0.1/3) * ((0 + 0*1 - 1)*1 + (0 + 0*2 - 3)*2 + (0 + 0*3 - 5)*3) = -(0.1/3) * (-22) ≈ 0.7333
Cost = (1/6) * ((0.3 + 0.7333*1 - 1)^2 + (0.3 + 0.7333*2 - 3)^2 + (0.3 + 0.7333*3 - 5)^2) ≈ 1.2953.
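
The same arithmetic can be checked with a short script (illustrative):

X, Y = [1, 2, 3], [1, 3, 5]
alpha, m = 0.1, 3
t0, t1 = 0.0, 0.0

errors = [t0 + t1 * x - y for x, y in zip(X, Y)]                   # one pass over the data
t0, t1 = (t0 - alpha * sum(errors) / m,
          t1 - alpha * sum(e * x for e, x in zip(errors, X)) / m)  # simultaneous update
cost = sum((t0 + t1 * x - y)**2 for x, y in zip(X, Y)) / (2 * m)
print(t0, t1, cost)   # ≈ 0.3, 0.7333, 1.295 -- option c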

10. h(x) = t0 + t1*x. The learning rate (alpha) is 0.1, the initial theta values are 0.3 and 0.73, X = [1, 2, 3] and Y = [1, 3, 5]. What is the value of t0 after the 1st iteration?
a) 0.73
b) 0.425
c) 1.064444
d) 0.392

Answer: b
Explanation: Here,
t0 = 0.3 - (0.1/3) * ((0.3 + 0.73*1 - 1) + (0.3 + 0.73*2 - 3) + (0.3 + 0.73*3 - 5)) = 0.3 - (0.1/3) * (-3.72) = 0.424 ≈ 0.425.
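
This too can be verified with a short script (illustrative; the exact figure depends on how t1 = 0.73 was rounded):

X, Y = [1, 2, 3], [1, 3, 5]
alpha, m = 0.1, 3
t0, t1 = 0.3, 0.73

errors = [t0 + t1 * x - y for x, y in zip(X, Y)]
print(t0 - alpha * sum(errors) / m)   # ≈ 0.424, i.e. option b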

11. What is the generalized goal of gradient descent?
a) Minimize J(t1)
b) Minimize J(t0, t1, t2, …, tn)
c) Minimize J(t0, t1)
d) Maximize J(t1)

Answer: b
Explanation: The generalized goal of gradient descent is to minimize the cost function J(t0, t1, t2, …, tn). The goal of gradient descent for linear regression is to minimize J(t0, t1), and with the simplified hypothesis it is to minimize J(t1). Its goal is never to maximize the cost function.
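
A sketch of the generalized update over parameters t0..tn (illustrative names), where each row of X is prefixed with a constant 1 so that t0 is treated like every other parameter:

def gradient_descent(X, Y, alpha=0.1, iterations=100):
    m, n = len(X), len(X[0])                    # m examples, n parameters (rows include the leading 1)
    t = [0.0] * n
    for _ in range(iterations):
        errors = [sum(tj * xj for tj, xj in zip(t, row)) - y for row, y in zip(X, Y)]
        grads = [sum(e * row[j] for e, row in zip(errors, X)) / m for j in range(n)]
        t = [tj - alpha * gj for tj, gj in zip(t, grads)]   # simultaneous update of t0..tn
    return t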

Sanfoundry Global Education & Learning Series – Machine Learning.
