# Machine Learning Questions and Answers – Statistical Learning Framework

This set of Machine Learning Multiple Choice Questions & Answers (MCQs) focuses on “Statistical Learning Framework”.

1. How are the points in the domain set given as input to the algorithm?
a) Vector of features
b) Scalar points
c) Polynomials
d) Clusters

Explanation: The variables are converted into a vector of features, and then given as an input to the algorithm. The vector is of the size (number of features x number of training data sets). The output of the learner is usually given as a polynomial.

2. To which input does the learner has access to?
a) Testing Data
b) Label Data
c) Training Data
d) Cross-Validation Data

Explanation: The learner gets access to a particular set of data on which it trains. This data is called as training data. Testing Data is used for testing of the learner’s outputs. The best outputs are then used on the cross-validation data. The label data is a representation of different types of the dependent variables.

3. The set which represents the different instances of the target variable is known as ______
a) domain set
b) training set
c) label set
d) test set

Explanation: Label Set denotes all the possible forms the target variable can take (for e.g. {0,1} or {yes, no} in a logistic regression problem). Domain Set represents the vector of features, given as input to the learner. Training Set and Test Set are parts of the Domain Set which are used for training and testing respectively.

4. What is the learner’s output also called?
a) Predictor, or Hypothesis, or Classifier
b) Predictor, or Hypothesis, or Trainer
c) Predictor, or Trainer, or Classifier
d) Trainer, or Hypothesis, or Classifier

Explanation: The output is called a predictor when it is used to predict the type or the numerical value of the target variable. It is called a hypothesis when it is a general statement about the data set. It is called a classifier when it is used to classify the training set in two or more types.

5. It is assumed that the learner has prior knowledge about the probability distribution which generates the instances in a training set.
a) True
b) False

Explanation: The learner has no prior knowledge about the distribution. It is assumed that the distribution is completely arbitrary. It is also assumed that there is a function which “correctly” labels the training examples. The learner’s job is to find out this function.

6. The labeling function is known to the learner in the beginning.
a) True
b) False

Explanation: The function is unknown to the learner as this is what the learner is trying to find out. In the beginning, the learner just knows about the training set and the corresponding label set.

7. The papaya learning algorithm is based on a dataset that consists of three variables – color, softness, tastiness of the papaya. Which is more likely to be the target variable?
a) Tastiness
b) Softness
c) Papaya
d) Color

Explanation: The tastiness is dependent on how ripe the papaya is. The ripeness is determined by the color and softness. Hence color and softness are the independent variables and the tastiness is the dependent variable or target variable.

8. The error of classifier is measured with respect to _________
a) variance of data instances
b) labeling function
c) probability distribution
d) probability distribution and labeling function

Explanation: The error is the probability of choosing a random instance from the data set and then misclassifying it using the labeling function.

9. What is not accessible to the learner?
a) Training Set
b) Label Set
c) Labeling Function
d) Domain Set

Explanation: The learner has access to the domain set, from which it extracts the training set. The label set is also given. Then the algorithm is applied to the training set to teach the learner, a function to determine the correct label of a given instance. This is the labeling function.

10. What are the possible values of A, B, and C in the following diagram?

a) A – Training Set, B – Domain Set, C – Cross-Validation Set
b) A – Training Set, B – Test Set, C – Cross-Validation Set
c) A – Training Set, B – Test Set, C – Domain Set
d) A – Test Set, B – Domain Set, C – Training Set

Explanation: Domain Set comprises of the total input data set. It is usually divided into a training set, a test set and a cross-validation set in the ratio 3:1:1. Since the learner learns about the data set from the training set, the later is usually larger than the test and cross-validation set.

Sanfoundry Global Education & Learning Series – Machine Learning.

To practice all areas of Machine Learning, here is complete set of 1000+ Multiple Choice Questions and Answers.

If you find a mistake in question / option / answer, kindly take a screenshot and email to [email protected]