# Data Mining Questions and Answers – Data Reduction

This set of Data Mining Multiple Choice Questions & Answers (MCQs) focuses on “Data Reduction”.

1. Which of the following is not a data reduction strategy?
a) Dimensionality reduction
b) Numerosity reduction
c) Data compression
d) Data migration

Explanation: Data reduction techniques produce a reduced representation of the data to make the mining process more efficient. The main strategies are dimensionality reduction, numerosity reduction, and data compression. Data migration is not one of them.

2. Wavelet transforms project the original data onto a smaller space.
a) True
b) False

Explanation: Dimensionality reduction is one of the most commonly used data reduction techniques. It includes wavelet transforms and principal components analysis, both of which project the original data onto a smaller space.
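
As a rough illustration of projecting data onto a smaller space, the sketch below reduces 2-D points to 1-D scores along their first principal component. It relies on the closed-form eigenvalue of a symmetric 2x2 covariance matrix, so it only covers the two-attribute case. The function name `pca_project_2d` and the sample points are hypothetical and chosen only for this example.

```python
import math

def pca_project_2d(points):
    """Project 2-D points onto their first principal component (1-D scores)."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)

    # A matching eigenvector; fall back to a coordinate axis when b == 0.
    if b != 0:
        vx, vy = b, lam - a
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)

    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    return scores, direction, (mean_x, mean_y)

# Made-up points that lie exactly on the line y = 2x.
points = [(1, 2), (2, 4), (3, 6), (4, 8)]
scores, direction, mean = pca_project_2d(points)

# One number per point is stored instead of two.
print(scores)
```

Because these sample points lie on a single line, the one stored score per point is enough to rebuild them exactly; with noisier data the reconstruction would only be approximate.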

3. Which of the following is not a non-parametric numerosity reduction technique?
a) Histogram
b) Clustering
c) Sampling
d) Regression

Explanation: Numerosity reduction techniques are divided into parametric and non-parametric techniques. Histograms, clustering, and sampling are commonly used non-parametric techniques. Regression is a parametric technique: it stores only the estimated model parameters instead of the actual data.
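
To make the distinction concrete, the sketch below reduces one made-up list of prices to an equal-width histogram (non-parametric: bucket counts are stored) and a second made-up series to a fitted line (parametric: only a slope and an intercept are stored). The data and the helper names `equal_width_histogram` and `fit_line` are hypothetical.

```python
from collections import Counter

def equal_width_histogram(values, width):
    """Count values per bucket; a bucket starting at s covers [s, s + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Non-parametric: 20 prices are replaced by 3 bucket counts.
prices = [1, 1, 5, 5, 5, 8, 8, 10, 10, 10, 12, 14, 14, 15, 15, 18, 18, 20, 21, 25]
histogram = equal_width_histogram(prices, 10)

# Parametric: 10 (x, y) pairs are replaced by 2 model parameters.
xs = list(range(10))
ys = [3 * x + 2 for x in xs]
slope, intercept = fit_line(xs, ys)

print(histogram)
print(slope, intercept)
```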

4. Data cube aggregation can be used for non-parametric numerosity reduction.
a) True
b) False

Explanation: Numerosity reduction techniques replace the original data with a smaller representation. They are divided into parametric and non-parametric techniques. Data cube aggregation is one of the non-parametric techniques.
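
A minimal sketch of the idea behind aggregation, assuming made-up quarterly sales figures: rolling the quarters up to yearly totals leaves far fewer rows to store and mine. The function name `aggregate_by_year` is hypothetical, and a real data cube would hold aggregates along several dimensions at once.

```python
from collections import defaultdict

# Made-up (year, quarter, sales) rows.
quarterly_sales = [
    (2022, "Q1", 224), (2022, "Q2", 408), (2022, "Q3", 350), (2022, "Q4", 586),
    (2023, "Q1", 300), (2023, "Q2", 400), (2023, "Q3", 350), (2023, "Q4", 500),
]

def aggregate_by_year(rows):
    """Sum sales over all quarters of each year."""
    totals = defaultdict(int)
    for year, _quarter, amount in rows:
        totals[year] += amount
    return dict(totals)

annual_sales = aggregate_by_year(quarterly_sales)

# Eight rows are reduced to two.
print(annual_sales)
```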

5. If the original data can be reconstructed from the compressed data without any information loss, it is also known as _____
a) Lossless data reduction
b) Lossy data reduction
c) Loss-full data transformation
d) Lost data transformation

Explanation: Data compression is a data reduction technique in which a compressed representation of the original data is obtained. It can be lossless or lossy. When the original data can be reconstructed from the compressed data without any loss of information, the reduction is lossless.
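
The lossless case can be sketched with Python's standard `zlib` module: the compressed form is smaller, and decompressing it gives back exactly the original bytes. The sample payload is made up and deliberately repetitive so that it compresses well.

```python
import zlib

# A made-up, highly repetitive payload.
original = b"AAAABBBBCCCCDDDD" * 64

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))
print(restored == original)
```

A lossy method, by contrast, would only return an approximation of `original`.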

6. Wavelet-transformed data has _____
a) The same length as the original data
b) Double the length of the original data
c) Half the length of the original data
d) One-fourth the length of the original data

Explanation: The wavelet transform is a dimensionality reduction technique that projects the data onto a smaller space. Initially, the wavelet-transformed data has the same length as the original data. It is then truncated so that only the strongest wavelet coefficients are stored.
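
The two steps can be sketched with the Haar wavelet, one of the simplest discrete wavelet transforms. This is an illustrative sketch, assuming an input whose length is a power of two; the function names `haar_dwt` and `keep_strongest` and the sample data are hypothetical.

```python
import math

def haar_dwt(data):
    """Full orthonormal Haar transform; len(data) must be a power of two."""
    coeffs = list(data)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        # Pairwise smoothing (scaled sums) and detail (scaled differences).
        smooth = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        coeffs[:n] = smooth + detail
        n = half  # recurse on the smoothed half only
    return coeffs

def keep_strongest(coeffs, k):
    """Zero out all but the k largest-magnitude coefficients."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]

data = [2, 2, 0, 2, 3, 5, 4, 4]
coeffs = haar_dwt(data)                # same length as the input
truncated = keep_strongest(coeffs, 3)  # mostly zeros after truncation

print(len(data), len(coeffs))
print(truncated)
```

Since most entries of `truncated` are zero, only the surviving coefficients and their positions would need to be stored.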

7. Given a set of wavelet coefficients, an approximation of the original data can be created by _____
a) Again copying the same discrete wavelet transform
b) Applying the inverse of the discrete wavelet transform used
c) Applying some other discrete wavelet transform
d) Aggregating the data over a certain range

Explanation: Wavelet-transformed data consists of a vector of wavelet coefficients. Applying the inverse of the discrete wavelet transform that was used to this coefficient vector gives an approximation of the original data.
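
A minimal sketch of the reconstruction step, assuming the orthonormal Haar transform was the one used. The coefficient vector below is the Haar transform of the made-up series `[2, 2, 0, 2, 3, 5, 4, 4]`, written out by hand for this example; `haar_idwt` is a hypothetical helper name.

```python
import math

def haar_idwt(coeffs):
    """Invert a full orthonormal Haar transform (length must be a power of two)."""
    data = list(coeffs)
    n = 1
    while n < len(data):
        smooth, detail = data[:n], data[n:2 * n]
        merged = []
        for s, d in zip(smooth, detail):
            # Undo one level of pairwise smoothing and differencing.
            merged.append((s + d) / math.sqrt(2))
            merged.append((s - d) / math.sqrt(2))
        data[:2 * n] = merged
        n *= 2
    return data

r2 = math.sqrt(2)

# All coefficients of the assumed series, and a copy keeping only the 3 strongest.
full = [11 / r2, -5 / r2, 1.0, 0.0, 0.0, -r2, -r2, 0.0]
truncated = [11 / r2, -5 / r2, 0.0, 0.0, 0.0, -r2, 0.0, 0.0]

exact = haar_idwt(full)        # recovers the series up to rounding error
approx = haar_idwt(truncated)  # only an approximation of the series

print(exact)
print(approx)
```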

8. Which of the following is true?
a) Discrete wavelet transform is better for lossy compression than discrete Fourier transform
b) Discrete Fourier transform is better for lossy compression than discrete wavelet transform
c) Discrete Fourier transform approximation requires less space than discrete wavelet transform
d) Discrete Fourier transform provides more accurate approximation of the data

Explanation: The discrete wavelet transform (DWT) and the discrete Fourier transform (DFT) can both be used for data compression. However, the DWT achieves better lossy compression: if the same number of coefficients is retained, the DWT gives a more accurate approximation of the original data, so it needs less space for an equivalent approximation.
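
The sketch below compares the two transforms on a single hand-picked signal, keeping the same number of coefficients for each. It is an illustration rather than a proof: the step-shaped signal is made up and happens to suit the Haar wavelet, which is the only wavelet tried here, and all function names are hypothetical.

```python
import cmath
import math

ROOT2 = math.sqrt(2)

def haar_dwt(data):
    """Full orthonormal Haar transform; len(data) must be a power of two."""
    out = list(data)
    n = len(out)
    while n > 1:
        half = n // 2
        smooth = [(out[2 * i] + out[2 * i + 1]) / ROOT2 for i in range(half)]
        detail = [(out[2 * i] - out[2 * i + 1]) / ROOT2 for i in range(half)]
        out[:n] = smooth + detail
        n = half
    return out

def haar_idwt(coeffs):
    """Inverse of haar_dwt."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for s, d in zip(out[:n], out[n:2 * n]):
            merged += [(s + d) / ROOT2, (s - d) / ROOT2]
        out[:2 * n] = merged
        n *= 2
    return out

def dft(data):
    """Naive discrete Fourier transform."""
    n = len(data)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(data))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [(sum(c * cmath.exp(2j * cmath.pi * k * t / n)
                 for k, c in enumerate(spectrum)) / n).real
            for t in range(n)]

def keep_strongest(coeffs, k):
    """Zero out all but the k largest-magnitude coefficients."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]

def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

signal = [1, 1, 1, 1, 5, 5, 5, 5]  # made-up step signal
k = 3                              # coefficients retained by each method

haar_approx = haar_idwt(keep_strongest(haar_dwt(signal), k))
dft_approx = idft(keep_strongest(dft(signal), k))

haar_error = squared_error(signal, haar_approx)
dft_error = squared_error(signal, dft_approx)
print(haar_error, dft_error)
```

On this signal the Haar approximation is essentially exact, while the truncated DFT leaves a visible error. Other signals and other wavelets may behave differently.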

9. Which of the following is the most appropriate reason for normalizing data before applying principal component analysis?
a) Attributes with larger domains will not dominate attributes with smaller domains
b) Attributes with smaller domains will not dominate attributes with larger domains
c) Larger amount of data can be searched
d) Better clarity of data

Explanation: The input data usually contains many attributes with different domains. Normalization is performed so that attributes with larger domains do not outweigh attributes with smaller domains.
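
One common way to do this is z-score normalization, sketched below on two made-up attributes with very different ranges. This is only one possible choice of normalization, and the attribute values and the function name `z_score` are hypothetical.

```python
import statistics

def z_score(values):
    """Rescale values to zero mean and unit (population) variance."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [(v - mean) / stdev for v in values]

# Made-up attributes with very different domains.
ages = [25, 35, 45, 55]
incomes = [30000, 50000, 70000, 90000]

# Before normalization, income variance dwarfs age variance.
print(statistics.pvariance(ages), statistics.pvariance(incomes))

z_ages = z_score(ages)
z_incomes = z_score(incomes)

# After normalization, both attributes have variance 1.
print(statistics.pvariance(z_ages), statistics.pvariance(z_incomes))
```

Without this step, a variance-based method such as principal component analysis would be driven almost entirely by the income attribute.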

10. Which of the following is not true about principal component analysis?
a) It can handle sparse data
b) It can handle skewed data
c) It cannot handle skewed data
d) It can handle multidimensional data

Explanation: Principal component analysis is a dimensionality reduction technique. It can handle sparse data as well as skewed data. Data with more than two dimensions can be handled by reducing the problem to two dimensions.

Sanfoundry Global Education & Learning Series – Data Mining.
