# Machine Learning Questions and Answers – Gradient Descent for Multiple Variables

This set of Machine Learning Multiple Choice Questions & Answers (MCQs) focuses on “Gradient Descent for Multiple Variables”.

1. The cost function is minimized by __________
a) Linear regression
b) Polynomial regression
c) PAC learning
d) Gradient descent

Explanation: Gradient descent starts with random values of t0, t1, …, tn. It alters them in order to reduce the cost function, at a rate set by the learning rate. Once it reaches a local minimum, it stops and outputs the final values of t0, t1, …, tn.

2. What is the minimum number of parameters of the gradient descent algorithm?
a) 1
b) 2
c) 3
d) 4

Explanation: Since multivariate linear regression is being considered, there are at least two features. These two features need two parameters, t1 and t2, and one more parameter, t0, is required for the y-intercept, giving a minimum of three parameters.

3. What happens when the learning rate is low?
a) It always reaches the minima quickly
b) It reaches the minima very slowly
c) It overshoots the minima
d) Nothing happens

Explanation: If the learning rate is low, gradient descent reaches the minima very slowly. Each update changes the parameters by only a small amount, so a lot of iterations are required to reach the minima. It is time-inefficient.
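The slowdown described above can be seen in a small experiment. Below is a hedged sketch that counts iterations of gradient descent on the toy cost function J(t) = t² (gradient 2t); the function name, starting point, and tolerance are illustrative, not part of the question set.

```python
# Compare a low vs. a moderate learning rate on the toy cost J(t) = t^2.
# Starting point, tolerance, and iteration cap are illustrative choices.

def iterations_to_converge(learning_rate, start=10.0, tol=1e-3, max_iter=100000):
    """Count the iterations until |t| < tol when minimizing J(t) = t^2."""
    t = start
    for i in range(max_iter):
        if abs(t) < tol:
            return i
        t -= learning_rate * 2 * t   # update rule: t := t - alpha * dJ/dt
    return max_iter

slow = iterations_to_converge(0.001)   # low learning rate
fast = iterations_to_converge(0.1)     # moderate learning rate
print(slow, fast)                      # the low rate needs far more iterations
```

Running this shows the low learning rate needing thousands of iterations where the moderate one needs only dozens, which is exactly the time-inefficiency the explanation points at.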

4. When was gradient descent invented?
a) 1847
b) 1947
c) 1857
d) 1957

Explanation: Augustin-Louis Cauchy, a French mathematician, invented the concept of gradient descent in 1847. Since then, it has been modified a few times. The gradient descent algorithm has many different applications.

5. Gradient descent tries to _____________
a) maximize the cost function
b) minimize the cost function
c) minimize the learning rate
d) maximize the learning rate

Explanation: Gradient descent tries to minimize the cost function by updating the values of t0, t1, …, tn after each iteration. The change in the values of t0, t1, …, tn depends on the learning rate.
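The iterative update of t0, t1, …, tn can be sketched in code. This is a hedged illustration of batch gradient descent for linear regression with two features; the synthetic data, learning rate, and iteration count are made-up values, not from the question set.

```python
# Batch gradient descent for linear regression: every parameter
# t0, t1, ..., tn is updated after each iteration, scaled by alpha.

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Minimize the squared-error cost by updating t0, t1, ..., tn."""
    m = len(y)
    n = len(X[0])
    t = [0.0] * (n + 1)              # t[0] is the y-intercept t0
    for _ in range(iters):
        # hypothesis h(x) = t0 + t1*x1 + ... + tn*xn
        preds = [t[0] + sum(t[j + 1] * row[j] for j in range(n)) for row in X]
        errors = [p - yi for p, yi in zip(preds, y)]
        # simultaneous update of every parameter
        t[0] -= alpha * sum(errors) / m
        for j in range(n):
            t[j + 1] -= alpha * sum(e * row[j] for e, row in zip(errors, X)) / m
    return t

# Tiny synthetic set generated from y = 1 + 2*x1 + 3*x2
X = [[0, 0], [1, 0], [0, 1], [1, 1]]
y = [1, 3, 4, 6]
print(gradient_descent(X, y))  # approaches [1, 2, 3]
```

Each pass moves every parameter a step proportional to the learning rate and the prediction error, which is the dependence the explanation describes.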

6. Feature scaling can be used to simplify gradient descent for multivariate linear regression.
a) True
b) False

Explanation: There are multiple features in multivariate linear regression and all of them have different ranges. This increases the complexity of gradient descent. So, feature scaling is used to make the ranges of each feature similar.

7. x1’s range is 0 to 300. x2’s range is 0 to 1000. What are the suitable ranges of x1 and x2 after mean normalization?
a) x1 = (x1 – 150)/300, x2 = (x2-500)/1000
b) x1 = x2 – 700
c) x1 = x1 – 300, x2 = x2 – 1000
d) x1 = x1/300, x2 = x2/1000

Explanation: Mean normalization tries to make the range of each feature similar. It subtracts the mean from the value and divides the result by the width of the range. After updating x1 = (x1 – 150)/300 and x2 = (x2 – 500)/1000, x1’s range is -0.5 to 0.5 and x2’s range is -0.5 to 0.5.
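The transformation in option a) can be written as a small helper. This is a hedged sketch assuming ranges that start at a known lower bound; the function name and arguments are illustrative, with the 0–300 and 0–1000 ranges taken from the question.

```python
# Mean normalization: subtract the midpoint of the range (the mean of a
# uniformly spread feature) and divide by the range width.

def mean_normalize(value, low, high):
    """Map a value in [low, high] to roughly [-0.5, 0.5]."""
    mid = (low + high) / 2
    return (value - mid) / (high - low)

print(mean_normalize(0, 0, 300), mean_normalize(300, 0, 300))  # endpoints of x1
print(mean_normalize(500, 0, 1000))                            # midpoint of x2
```

With the question’s ranges, the endpoints map to -0.5 and 0.5 and the midpoint to 0, matching the explanation.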

8. x1’s range is 0 to 300. x2’s range is 0 to 1000. What are the suitable ranges of x1 and x2 after feature scaling?
a) x1 = x1 – 300, x2 = x2 – 1000
b) x1 = x2 – 700
c) x1 = x1/1000, x2 = x2/300
d) x1 = x1/300, x2 = x2/1000

Explanation: Feature scaling tries to make the range of each feature similar. After updating x1 = x1/300 and x2 = x2/1000, x1’s range is 0 to 1 and x2’s range is 0 to 1.
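Option d)’s scaling can likewise be sketched as a one-liner over a data set. This is a hedged illustration; the sample rows are made up, and only the column maxima (300 and 1000) come from the question.

```python
# Feature scaling by the maximum: dividing each feature by the top of its
# range maps every column into [0, 1].

def scale_features(rows, maxima):
    """Divide each column by its maximum so every feature lies in [0, 1]."""
    return [[x / m for x, m in zip(row, maxima)] for row in rows]

rows = [[150, 250], [300, 1000]]          # columns: x1 in 0-300, x2 in 0-1000
print(scale_features(rows, [300, 1000]))  # [[0.5, 0.25], [1.0, 1.0]]
```

After scaling, both features span the same 0-to-1 range, which is what keeps gradient descent from zig-zagging along the larger-range feature.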

9. On which factors does the updating of each parameter depend?
a) The number of training examples
b) Target variable
c) The learning rate and the target variable
d) The learning rate

Explanation: Updating each parameter depends on both the learning rate and the target variable. If the learning rate is high, the change is larger, and vice versa. The size of the update also depends on how close the value predicted by the hypothesis is to the value of the target variable.

10. What is updated by gradient descent after each iteration?
a) The learning rate
b) Independent variables
c) Target variable
d) The number of training examples

Explanation: The gradient descent algorithm updates the coefficients attached to the independent variables, t0, t1, …, tn, after each iteration. It does this in order to minimize the cost function. The size of each change depends on the learning rate.

11. Who introduced the topic of gradient descent?
a) Vapnik
b) Augustin-Louis Cauchy
c) Chervonenkis
d) Alan Turing

Explanation: Cauchy invented gradient descent in 1847. Vapnik and Chervonenkis introduced the concept of VC dimension. Alan Turing is known as the father of computer science for his work in artificial intelligence, cryptanalysis, and other fields.

12. Mean normalization can be used to simplify gradient descent for multivariate linear regression.
a) True
b) False

Explanation: Mean normalization tries to reduce the complexity of gradient descent by scaling down the range of each feature. It subtracts the mean from the value of the independent variable and divides it by the upper limit of its range.

Sanfoundry Global Education & Learning Series – Machine Learning.

To practice all areas of Machine Learning, here is the complete set of 1000+ Multiple Choice Questions and Answers.