# Regression Trees Questions and Answers

This set of Machine Learning Multiple Choice Questions & Answers (MCQs) focuses on “Regression Trees”.

1. Continuous Variable Decision tree has a categorical target variable.
a) False
b) True

Explanation: A Continuous Variable Decision tree does not have a categorical target variable; it has a continuous target variable. It is mainly used to predict values for continuous variables (that is, when the dependent variable is continuous).

2. Which of the following statements is not true about the Regression trees?
a) The general regression tree building methodology allows input variables to be a mixture of continuous and categorical variables
b) The terminal nodes of the tree contain the predicted output variable values
c) Regression tree is a variant of decision trees, designed to approximate real-valued functions
d) The root node holds the final prediction value

Explanation: The root node does not hold the final prediction value; the terminal nodes of the tree contain the predicted output values. The general regression tree building methodology allows input variables to be a mixture of continuous and categorical variables, and a regression tree is a variant of decision trees designed to approximate real-valued functions.

3. A Regression tree is built through a process known as binary recursive partitioning.
a) True
b) False

Explanation: A regression tree is built iteratively by splitting the data into partitions, or branches. This process is called binary recursive partitioning. At each iteration, the records in each branch are split again into smaller groups.

4. Which of the following statements is not true about the Regression trees?
a) The algorithm allocates the data into the first two partitions or branches, using every possible binary split on every field
b) Initially, all records in the training set (pre-classified records that are used to determine the structure of the tree) are grouped into the same partition
c) The algorithm selects the split that minimizes the sum of the squared deviations from the mean in the two separate partitions
d) The algorithm starts from the leftmost leaf nodes

Explanation: The regression tree algorithm does not start from the leaf nodes. Initially, all records in the training set are grouped into the same partition. The algorithm then allocates the data into the first two partitions, or branches, by trying every possible binary split on every field, and selects the split that minimizes the sum of the squared deviations from the mean in the two separate partitions. The same procedure is then repeated on each new branch.
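The split selection described above can be sketched in a few lines of Python. This is only an illustration for a single numeric field: the helper names `sse` and `best_split`, the midpoint thresholds, and the sample data are assumptions made for the example, not part of the original question.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) for the best binary split x <= threshold."""
    best = None
    distinct = sorted(set(xs))
    # Candidate thresholds lie midway between adjacent distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical data: two clearly separated groups of responses.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
threshold, total = best_split(xs, ys)
print(threshold, total)
```

With several fields, the same search would be run on each field and the field and threshold with the smallest total would be kept.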

5. Which of the following statements is not true about the Regression trees?
a) It has the advantage of being concise
b) It is able to make few assumptions beyond normality of the response
c) It is not fast to compute
d) It works equally well with numerical or categorical predictors

Explanation: Regression trees (RT) are fast to compute, which is one of their main advantages. The other three statements are also advantages of RT: they are concise, they work well with both numerical and categorical predictors, and they make few assumptions (no linearity or smoothness is assumed).

6. Which of the following statements is not true about the Regression trees?
a) It needs more data than other regression techniques
b) It is especially sensitive to the particular data used to build the tree
c) It gives crude predictions when it is sensitive to the particular data
d) It gives processed predictions when it is sensitive to the particular data

Explanation: Regression trees do not give processed predictions. Their main disadvantages are that they need more data than other regression techniques, they are especially sensitive to the particular data used to build the tree, and they give crude predictions.

7. Regression trees follow a top down greedy approach.
a) True
b) False

Explanation: Regression trees follow a top-down greedy approach. The algorithm begins at the top of the tree, where all observations belong to a single region, and successively splits the predictor space into two new branches further down the tree (top-down). At each step it looks for the best variable for the current split only (greedy).
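As a rough sketch of the top-down greedy idea, the following Python code grows a tree on one numeric predictor. The dictionary layout, the `min_size` stopping rule, and the sample data are assumptions chosen for illustration; real implementations differ in their stopping and pruning rules.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(xs, ys, min_size=2):
    """Grow a one-predictor regression tree, one greedy split at a time."""
    best = None
    distinct = sorted(set(xs))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [(x, y) for x, y in zip(xs, ys) if x <= t]
        right = [(x, y) for x, y in zip(xs, ys) if x > t]
        if len(left) < min_size or len(right) < min_size:
            continue  # assumed stopping rule: keep at least min_size records per branch
        cost = sse([y for _, y in left]) + sse([y for _, y in right])
        if best is None or cost < best[0]:
            best = (cost, t, left, right)
    # Stop when no valid split exists or the best split does not reduce the error.
    if best is None or best[0] >= sse(ys):
        return {"leaf": True, "value": mean(ys)}
    _, t, left, right = best
    return {
        "leaf": False,
        "threshold": t,
        # Only the current split is optimized; future splits are never considered.
        "left": build_tree([x for x, _ in left], [y for _, y in left], min_size),
        "right": build_tree([x for x, _ in right], [y for _, y in right], min_size),
    }


def predict(tree, x):
    """Walk from the root down to a terminal node and return its value."""
    while not tree["leaf"]:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["value"]


# Hypothetical training data.
xs = [1, 2, 3, 4, 10, 11, 12, 13]
ys = [5.0, 5.0, 6.0, 6.0, 20.0, 20.0, 22.0, 22.0]
tree = build_tree(xs, ys)
print(tree["threshold"], predict(tree, 2.2), predict(tree, 12.5))
```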

8. Which of the following is expressed by the equation $Y = \beta_0 + \beta_1 X + \varepsilon$, in which a real-valued dependent variable Y is modeled as a function of a real-valued independent variable X plus noise?
a) Binary classification
b) Linear Regression
c) Multiple Regression
d) Multi classification

Explanation: The given equation describes linear regression. In simple linear regression, a real-valued dependent variable Y is modeled as a linear function of a real-valued independent variable X plus noise. Here $\varepsilon$ is the noise term.
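For illustration, the two coefficients of the simple linear model can be estimated with the standard least-squares formulas. The function name `fit_simple_linear` and the noise-free sample data are assumptions made for this sketch.

```python
def fit_simple_linear(xs, ys):
    """Least-squares estimates (beta0, beta1) for Y = beta0 + beta1 * X + noise."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta1 = sxy / sxx                 # slope
    beta0 = mean_y - beta1 * mean_x   # intercept
    return beta0, beta1


# Hypothetical data generated from y = 1 + 2x with no noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
beta0, beta1 = fit_simple_linear(xs, ys)
print(beta0, beta1)
```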

9. Which of the following is expressed by the equation $Y = \beta_0 + \beta^T X + \varepsilon$, in which a real-valued dependent variable Y is modeled as a function of multiple independent variables $X_1, X_2, \dots, X_p \equiv X$ plus noise?
a) Binary classification
b) Linear Regression
c) Multiple Regression
d) Multi classification

Explanation: The given equation describes multiple regression. The independent variables $X_1, X_2, \dots, X_p$ are collected into the vector X, and the real-valued dependent variable Y is modeled as a function of these variables plus noise, where $\varepsilon$ is the noise term.
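The vector form of the equation is just a dot product between the coefficients and the predictors, added to the intercept. A minimal sketch of the noise-free part, with made-up coefficient values and a hypothetical helper `predict`:

```python
def predict(beta0, beta, x):
    """Noise-free part of the model: beta0 + beta^T x."""
    if len(beta) != len(x):
        raise ValueError("beta and x must have the same length")
    return beta0 + sum(b * xi for b, xi in zip(beta, x))


# Hypothetical coefficients for p = 3 predictors.
beta0 = 1.0
beta = [2.0, -0.5, 3.0]
x = [1.0, 4.0, 2.0]
y_hat = predict(beta0, beta, x)  # 1 + 2*1 - 0.5*4 + 3*2
print(y_hat)
```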

10. Linear regression is a global model.
a) True
b) False

Explanation: Linear regression is a global model: a single predictive formula holds over the entire data space. When the data has many features that interact in complicated, nonlinear ways, assembling a single global model can be very difficult, and the result can be hard to interpret even when you do succeed.

11. Which of the following statements is not true about the Regression trees?
a) It divides the predictor space into distinct and non-overlapping regions
b) It divides the independent variables into distinct and non-overlapping regions
c) It always looks for the best variable in the future split
d) It cares about only the current split

Explanation: A regression tree always looks for the best variable for the current split, not for a future split. Like classification trees, it divides the predictor space (the independent variables) into distinct, non-overlapping regions.

12. The value assigned to a terminal node from the training data is the mean response of the observations falling in that region.
a) True
b) False

Explanation: In a regression tree, the value assigned to a terminal node from the training data is the mean response of the observations falling in that region. If an unseen observation falls in that region, the tree predicts this mean value for it.
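A small sketch of this behavior, assuming a tree with a single split at a fixed threshold (the threshold, function names, and data are invented for the example):

```python
def fit_leaf_means(xs, ys, threshold):
    """Mean response of the training observations in each of the two regions."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return sum(left) / len(left), sum(right) / len(right)


def predict(x, threshold, left_mean, right_mean):
    """An unseen observation gets the mean of the region it falls into."""
    return left_mean if x <= threshold else right_mean


# Hypothetical training data and split point.
xs = [1, 2, 3, 8, 9]
ys = [10.0, 12.0, 14.0, 30.0, 34.0]
left_mean, right_mean = fit_leaf_means(xs, ys, threshold=5)
print(predict(2.5, 5, left_mean, right_mean))
print(predict(100, 5, left_mean, right_mean))
```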

13. Which of the following statements is not true about the Regression trees?
a) User can visualize each step which helps with making decisions
b) Making decisions based on regression is much easier than with other methods
c) It is not easy to prepare a regression tree
d) User can give the priority to a decision criterion

Explanation: A regression tree is easy to prepare compared to other methods, and it is easy to present because it can be drawn as a simple chart or diagram. The other three statements are advantages of regression trees.

14. The table below shows the number of players who play a particular game on various days under different weather conditions. What is the standard deviation of players for the Sunny climate instances?

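| Day | Climate | Temperature | Wind | Players |
|-----|---------|-------------|--------|---------|
| 1 | Sunny | Cool | Strong | 15 |
| 2 | Sunny | Hot | Weak | 10 |
| 3 | Rainy | Medium | Weak | 20 |
| 4 | Winter | Cool | Weak | 25 |
| 5 | Rainy | Cool | Strong | 15 |
| 6 | Winter | Cool | Strong | 20 |
| 7 | Sunny | Hot | Strong | 5 |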

a) 10
b) 15.71
c) 4.08
d) 7.07

Explanation: From the table we have,

| Day | Climate | Temperature | Wind | Players |
|-----|---------|-------------|--------|---------|
| 1 | Sunny | Cool | Strong | 15 |
| 2 | Sunny | Hot | Weak | 10 |
| 7 | Sunny | Hot | Strong | 5 |

Players for Sunny climate = (15, 10, 5)
Average = (15 + 10 + 5) / 3
= 10
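$$
\begin{aligned}
\text{Standard deviation for Sunny climate} &= \sqrt{\frac{(15 - 10)^2 + (10 - 10)^2 + (5 - 10)^2}{3}} \\
&= \sqrt{\frac{5^2 + 0^2 + (-5)^2}{3}} \\
&= \sqrt{\frac{25 + 0 + 25}{3}} \\
&= \sqrt{\frac{50}{3}} \\
&\approx \sqrt{16.67} \\
&\approx 4.08
\end{aligned}
$$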
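The arithmetic for question 14 can be checked with a short Python snippet. Note that the explanation divides by the number of instances (3), so this is the population standard deviation, which corresponds to `statistics.pstdev` in the standard library; the variable names are illustrative.

```python
import math
import statistics

sunny_players = [15, 10, 5]  # Players on days 1, 2 and 7 (Sunny climate)

mean = sum(sunny_players) / len(sunny_players)
# Population variance: divide by n, as in the explanation.
variance = sum((p - mean) ** 2 for p in sunny_players) / len(sunny_players)
manual_sd = math.sqrt(variance)

# The standard library gives the same population standard deviation.
library_sd = statistics.pstdev(sunny_players)

print(round(manual_sd, 2))  # 4.08
```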

15. The table below shows the number of players who play a particular game on various days under different weather conditions. What is the weighted standard deviation of players over all values of Wind?

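| Day | Climate | Temperature | Wind | Players |
|-----|---------|-------------|--------|---------|
| 1 | Sunny | Cool | Strong | 15 |
| 2 | Sunny | Hot | Weak | 10 |
| 3 | Rainy | Medium | Weak | 20 |
| 4 | Winter | Cool | Weak | 30 |
| 5 | Rainy | Cool | Strong | 15 |
| 6 | Winter | Cool | Strong | 25 |
| 7 | Sunny | Hot | Strong | 5 |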

a) 7.54
b) 15.71
c) 8.17
d) 7.07

Explanation: From the table for Strong Wind we have,

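| Day | Climate | Temperature | Wind | Players |
|-----|---------|-------------|--------|---------|
| 1 | Sunny | Cool | Strong | 15 |
| 5 | Rainy | Cool | Strong | 15 |
| 6 | Winter | Cool | Strong | 25 |
| 7 | Sunny | Hot | Strong | 5 |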

Players for Strong Wind = (15, 15, 25, 5)
Average = (15 + 15 + 25 + 5) / 4
= 15
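$$
\begin{aligned}
\text{Standard deviation for Strong Wind} &= \sqrt{\frac{(15 - 15)^2 + (15 - 15)^2 + (25 - 15)^2 + (5 - 15)^2}{4}} \\
&= \sqrt{\frac{0^2 + 0^2 + 10^2 + (-10)^2}{4}} \\
&= \sqrt{\frac{0 + 0 + 100 + 100}{4}} \\
&= \sqrt{\frac{200}{4}} \\
&= \sqrt{50} \\
&\approx 7.07
\end{aligned}
$$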
And we have,

| Day | Climate | Temperature | Wind | Players |
|-----|---------|-------------|------|---------|
| 2 | Sunny | Hot | Weak | 10 |
| 3 | Rainy | Medium | Weak | 20 |
| 4 | Winter | Cool | Weak | 30 |

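Players for Weak Wind = (10, 20, 30)
Average = (10 + 20 + 30) / 3
= 20

$$
\begin{aligned}
\text{Standard deviation for Weak Wind} &= \sqrt{\frac{(10 - 20)^2 + (20 - 20)^2 + (30 - 20)^2}{3}} \\
&= \sqrt{\frac{(-10)^2 + 0^2 + 10^2}{3}} \\
&= \sqrt{\frac{100 + 0 + 100}{3}} \\
&= \sqrt{\frac{200}{3}} \\
&\approx \sqrt{66.67} \\
&\approx 8.17
\end{aligned}
$$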
Hence we get,

| Wind | Standard deviation of players | Instances |
|--------|-------------------------------|-----------|
| Strong | 7.07 | 4 |
| Weak | 8.17 | 3 |

Weighted standard deviation for Wind = (7.07 * (4/7)) + (8.17 * (3/7))
= (7.07 * 0.57) + (8.17 * 0.43)
= 4.03 + 3.51
= 7.54
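The whole calculation for question 15 can be reproduced in Python as a check. The helper name `weighted_sd` and the list-of-pairs layout are assumptions for this sketch; the numbers come from the Wind and Players columns of the question's table.

```python
import statistics

# (Wind, Players) pairs for days 1 to 7, taken from the table in question 15.
rows = [
    ("Strong", 15), ("Weak", 10), ("Weak", 20), ("Weak", 30),
    ("Strong", 15), ("Strong", 25), ("Strong", 5),
]


def weighted_sd(rows):
    """Population SD of each group, weighted by the group's share of the rows."""
    groups = {}
    for key, players in rows:
        groups.setdefault(key, []).append(players)
    total = len(rows)
    return sum(statistics.pstdev(v) * len(v) / total for v in groups.values())


result = weighted_sd(rows)
print(round(result, 2))  # 7.54
```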

16. The table below summarizes the players who play a particular game on various days, grouped by Temperature. The standard deviation of players over all instances is 6.58. What is the standard deviation reduction for Temperature?

| Temperature | Standard deviation of players | Instances |
|-------------|-------------------------------|-----------|
| Hot | 5.85 | 4 |
| Medium | 7.54 | 2 |
| Cool | 4.58 | 4 |

a) 5.68
b) 0.9
c) 1.89
d) 2.34

Explanation: From the table we have,
Weighted standard deviation of Temperature = (5.85 * (4/10)) + (7.54 * (2/10)) + (4.58 * (4/10))
= (5.85 * 0.4) + (7.54 * 0.2) + (4.58 * 0.4)
= 2.34 + 1.51 + 1.83
= 5.68
Standard deviation reduction for Temperature = Standard deviation of players – Weighted standard deviation of Temperature
= 6.58 – 5.68
= 0.9
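The same steps for question 16 can be written as a small function. The name `sd_reduction` and the `(standard deviation, instances)` tuple layout are assumptions for this sketch; the inputs are the values given in the question.

```python
def sd_reduction(parent_sd, branches):
    """Return (reduction, weighted_sd) for a list of (sd, instances) branches."""
    total = sum(n for _, n in branches)
    weighted = sum(sd * n / total for sd, n in branches)
    return parent_sd - weighted, weighted


# (standard deviation, instances) for Hot, Medium and Cool.
temperature = [(5.85, 4), (7.54, 2), (4.58, 4)]
sdr, weighted = sd_reduction(6.58, temperature)
print(round(weighted, 2), round(sdr, 2))
```

When a regression tree is grown with this criterion, the attribute with the largest standard deviation reduction would be the one chosen for the split.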

Sanfoundry Global Education & Learning Series – Machine Learning.
