This set of Machine Learning Multiple Choice Questions & Answers (MCQs) focuses on “Gradient Descent for Multiple Variables”.

1. The cost function is minimized by __________

a) Linear regression

b) Polynomial regression

c) PAC learning

d) Gradient descent

Answer: d

Explanation: Gradient descent starts with random values of t_{0}, t_{1}, …, t_{n}. It then iteratively adjusts them, at a given learning rate, to reduce the cost function. Once it reaches a local minimum, it stops and outputs the final values of t_{0}, t_{1}, …, t_{n}.
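The procedure in the explanation can be sketched in a few lines of Python. This is an illustrative implementation for linear regression with the squared-error cost; the data, learning rate, and iteration count are assumptions, not part of the quiz.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent for linear regression.

    X is an (m, n) feature matrix; a column of ones is prepended
    so t[0] plays the role of t_0, the intercept."""
    m = X.shape[0]
    Xb = np.c_[np.ones(m), X]           # add intercept column
    t = np.zeros(Xb.shape[1])           # start from t_0 = t_1 = ... = 0
    for _ in range(iters):
        grad = Xb.T @ (Xb @ t - y) / m  # gradient of the squared-error cost
        t = t - alpha * grad            # step against the gradient
    return t

# Fit y = 1 + 2*x: the recovered parameters approach (1, 2).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
t = gradient_descent(X, y)
```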

2. What is the minimum number of parameters of the gradient descent algorithm?

a) 1

b) 2

c) 3

d) 4

Answer: c

Explanation: Since multivariate linear regression is being considered, there are at least two features, with corresponding parameters t_{1} and t_{2}. One more parameter, t_{0}, gives the y-intercept, so a minimum of three parameters is required.

3. What happens when the learning rate is low?

a) It always reaches the minima quickly

b) It reaches the minima very slowly

c) It overshoots the minima

d) Nothing happens

Answer: b

Explanation: If the learning rate is low, gradient descent reaches the minima very slowly. Each update changes the parameters by only a small amount, so a lot of iterations are required to reach the minima, which makes the process time-inefficient.
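The slowdown is easy to see on a toy problem. The sketch below (the function, starting point, and tolerance are illustrative assumptions) counts how many iterations gradient descent needs to minimize f(t) = t^2 at two different learning rates.

```python
# Minimize f(t) = t^2 (gradient 2t) starting from t = 10, and count
# the iterations needed to get within 1e-3 of the minimum at t = 0.
def iters_to_converge(alpha, t=10.0, tol=1e-3, cap=100000):
    for i in range(cap):
        if abs(t) < tol:
            return i
        t -= alpha * 2 * t   # gradient-descent step
    return cap

# A low learning rate needs far more iterations than a moderate one.
slow = iters_to_converge(0.001)
fast = iters_to_converge(0.1)
```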

4. When was gradient descent invented?

a) 1847

b) 1947

c) 1857

d) 1957

Answer: a

Explanation: Augustin-Louis Cauchy, a French mathematician, invented the concept of gradient descent in 1847. Since then, it has been modified a few times. The gradient descent algorithm has many different applications.

5. Gradient descent tries to _____________

a) maximize the cost function

b) minimize the cost function

c) minimize the learning rate

d) maximize the learning rate

Answer: b

Explanation: Gradient descent tries to minimize the cost function by updating the values of t_{0}, t_{1}, …, t_{n} after each iteration. The change in the values of t_{0}, t_{1}, …, t_{n} depends on the learning rate.

6. Feature scaling can be used to simplify gradient descent for multivariate linear regression.

a) True

b) False

Answer: a

Explanation: There are multiple features in multivariate linear regression and all of them have different ranges. This increases the complexity of gradient descent. So, feature scaling is used to make the ranges of each feature similar.

7. x_{1}’s range is 0 to 300. x_{2}’s range is 0 to 1000. What are the suitable ranges of x_{1} and x_{2} after mean normalization?

a) x_{1} = (x_{1} – 150)/300, x_{2} = (x_{2}-500)/1000

b) x_{1} = x_{2} – 700

c) x_{1} = x_{1} – 300, x_{2} = x_{2} – 1000

d) x_{1} = x_{1}/300, x_{2} = x_{2}/1000

Answer: a

Explanation: Mean normalization tries to make the range of each feature similar. It subtracts the mean from the value and divides it by the upper bound of the range. After updating x_{1} = (x_{1} – 150)/300 and x_{2} = (x_{2} – 500)/1000, x_{1}’s range is -0.5 to 0.5 and x_{2}’s range is -0.5 to 0.5.
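The transformation in option a can be written as a small helper. This is a sketch assuming features whose ranges are known in advance; the function name is illustrative.

```python
import numpy as np

def mean_normalize(x, lo, hi):
    """Mean-normalize a feature with range [lo, hi]: subtract the
    mid-range mean, then divide by the width of the range."""
    return (x - (lo + hi) / 2) / (hi - lo)

# x_1 in [0, 300] and x_2 in [0, 1000] both map into [-0.5, 0.5].
x1 = mean_normalize(np.array([0.0, 150.0, 300.0]), 0, 300)
x2 = mean_normalize(np.array([0.0, 500.0, 1000.0]), 0, 1000)
```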

8. x_{1}’s range is 0 to 300. x_{2}’s range is 0 to 1000. What are the suitable ranges of x_{1} and x_{2} after feature scaling?

a) x_{1} = x_{1} – 300, x_{2} = x_{2} – 1000

b) x_{1} = x_{2} – 700

c) x_{1} = x_{1}/1000, x_{2} = x_{2}/300

d) x_{1} = x_{1}/300, x_{2} = x_{2}/1000

Answer: d

Explanation: Feature scaling tries to make the range of each feature similar. After updating x_{1} = x_{1}/300 and x_{2} = x_{2}/1000, x_{1}’s range is 0 to 1 and x_{2}’s range is 0 to 1.
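The scaling in option d amounts to dividing by the upper bound of each feature's range. A minimal sketch (the helper name is an illustrative assumption):

```python
def max_scale(x, hi):
    """Divide each value by the range's upper bound, as in x_1/300
    and x_2/1000. A [0, hi] range becomes [0, 1]."""
    return [v / hi for v in x]

x1 = max_scale([0.0, 150.0, 300.0], 300)
x2 = max_scale([0.0, 500.0, 1000.0], 1000)
```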

9. On which factors does the updating of each parameter depend?

a) The number of training examples

b) Target variable

c) The learning rate and the target variable

d) The learning rate

Answer: c

Explanation: Updating each parameter depends on both the learning rate and the target variable. If the learning rate is high, the change will be larger, and vice versa. The update also depends on how close the value predicted by the hypothesis is to the value of the target variable.
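Both dependencies are visible in the standard linear-regression update rule for a single parameter t_j, sketched below (the function and sample numbers are illustrative assumptions, not from the quiz):

```python
def update_tj(t_j, alpha, preds, targets, xj):
    """One gradient-descent update of parameter t_j:
    t_j := t_j - alpha * (1/m) * sum((h(x_i) - y_i) * x_ij).
    The step scales with the learning rate alpha, and its size depends
    on how far each prediction is from the target value."""
    m = len(targets)
    grad = sum((p - y) * x for p, y, x in zip(preds, targets, xj)) / m
    return t_j - alpha * grad

# Prediction 1.0 vs target 0.0, feature value 2.0, learning rate 0.5:
new_t = update_tj(0.0, 0.5, [1.0], [0.0], [2.0])
```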

10. What is updated by gradient descent after each iteration?

a) The learning rate

b) Independent variables

c) Target variable

d) The number of training examples

Answer: b

Explanation: The gradient descent algorithm updates the values associated with all the features. It does so in order to minimize the cost function. The change in the values of the independent variables depends on the learning rate.

11. Who introduced the topic of gradient descent?

a) Vapnik

b) Augustin-Louis Cauchy

c) Chervonenkis

d) Alan Turing

Answer: b

Explanation: Cauchy invented gradient descent in 1847. Vapnik and Chervonenkis introduced the concept of VC dimension. Alan Turing is known as the father of computer science for his work in artificial intelligence, cryptanalysis, and other fields.

12. Mean normalization can be used to simplify gradient descent for multivariate linear regression.

a) True

b) False

Answer: a

Explanation: Mean normalization tries to reduce the complexity of gradient descent by scaling down the range of each feature. It subtracts the mean from the value of the independent variable and divides it by the upper limit of its range.

**Sanfoundry Global Education & Learning Series – Machine Learning**.

