NPTEL Introduction to Machine Learning Week 6 Assignment Answers 2024
1. Entropy for a 90-10 split between two classes is:
- 0.469
- 0.195
- 0.204
- None of the above
Answer :- 0.469
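This follows from the binary entropy formula H(p) = -p log2(p) - (1-p) log2(1-p) with p = 0.9, giving roughly 0.469 bits. A quick check in Python:

```python
import math

def entropy(p):
    """Binary entropy in bits for a class split (p, 1 - p)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(round(entropy(0.9), 3))  # 0.469
```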
2. Consider a dataset with only one attribute (categorical). Suppose there are 8 unordered values in this attribute; how many possible combinations are needed to find the best split-point for building the decision tree classifier?
- 511
- 1023
- 512
- 127
Answer :- 127
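A binary split on a categorical attribute partitions its k unordered values into two non-empty subsets, which gives 2^(k-1) - 1 distinct splits; with k = 8 that is 2^7 - 1 = 127. A quick check:

```python
def binary_splits(k):
    """Number of distinct ways to partition k unordered categorical
    values into two non-empty subsets: 2**(k - 1) - 1."""
    return 2 ** (k - 1) - 1

print(binary_splits(8))  # 127
```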
3. Having built a decision tree, we use reduced-error pruning to reduce the size of the tree, and we select a node to collapse. At this node, the left branch holds three training data points with outputs 5, 7, 9.6, and the right branch holds four training data points with outputs 8.7, 9.8, 10.5, 11. The response of a branch is the average of the outputs of its data points. The original responses along the left and right branches are response_left and response_right, and the new response after collapsing the node is response_new. What are the values of response_left, response_right, and response_new (the numbers in each option are given in that order)?
- 9.6, 11, 10.4
- 7.2, 10, 8.8
- 5, 10.5, 15
- Depends on the tree height
Answer :- 7.2, 10, 8.8
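Each branch's response is the mean of the outputs of its training points, and the collapsed node's response is the mean over all seven points: response_left = 21.6/3 = 7.2, response_right = 40/4 = 10, and response_new = 61.6/7 = 8.8. Verified below:

```python
left = [5, 7, 9.6]
right = [8.7, 9.8, 10.5, 11]

response_left = sum(left) / len(left)                         # 7.2
response_right = sum(right) / len(right)                      # 10.0
response_new = sum(left + right) / (len(left) + len(right))   # 8.8

print(round(response_left, 1), round(response_right, 1), round(response_new, 1))
```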
4. Which of the following is a good strategy for reducing the variance in a decision tree?
- If the improvement from taking any split is very small, don't make a split (early stopping)
- Stop splitting a leaf when the number of points is less than a set threshold K.
- Stop splitting all leaves in the decision tree when any one leaf has less than a set threshold K points.
- None of the Above.
Answer :- The first two options are both sound early-stopping strategies for reducing variance; stopping growth of all leaves because a single leaf falls below the threshold (option 3) is not.
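Both strategies correspond to standard pre-pruning hyperparameters; in scikit-learn, for example, a minimal sketch looks like this (the threshold values below are illustrative, not tuned):

```python
from sklearn.tree import DecisionTreeClassifier

# Pre-pruning (early-stopping) controls that reduce variance:
clf = DecisionTreeClassifier(
    min_impurity_decrease=0.01,  # option 1: skip splits whose improvement is very small
    min_samples_split=10,        # option 2: don't split a node with fewer than K points
)
```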
5. Which of the following statements about multiway splits in decision trees with categorical features is correct?
- They always result in deeper trees compared to binary splits
- They always provide better interpretability than binary splits
- They can lead to overfitting when dealing with high-cardinality categorical features
- They are computationally less expensive than binary splits for all categorical features
Answer :- They can lead to overfitting when dealing with high-cardinality categorical features
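A toy illustration of the overfitting risk: a multiway split on an ID-like feature with one training point per category makes every child pure, so the apparent information gain is maximal even though the feature carries no real signal (the counts below are made up for the example):

```python
import math

def entropy(counts):
    """Entropy in bits of a label distribution given as class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

parent = entropy([5, 5])  # 10 training points, 5 per class: 1.0 bit

# A multiway split on a 10-value, ID-like categorical feature sends one
# point down each branch, so every child is pure (entropy 0) and the
# apparent gain is maximal despite the feature carrying no real signal.
children = [entropy([1]) for _ in range(10)]
gain = parent - sum(h / 10 for h in children)
print(gain)  # 1.0 on training data; such a split generalizes badly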
6. Which of the following statements about imputation in data preprocessing is most accurate?
- Mean imputation is always the best method for handling missing numerical data
- Imputation should always be performed after splitting the data into training and test sets
- Missing data is best handled by simply removing all rows with any missing values
- Multiple imputation typically produces less biased estimates than single imputation methods
Answer :- Multiple imputation typically produces less biased estimates than single imputation methods
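A minimal sketch of leakage-free imputation with scikit-learn's IterativeImputer (a MICE-style multivariate imputer; running it several times with sample_posterior=True approximates full multiple imputation). The toy arrays are illustrative only; the key point is that the imputer is fit on the training split alone and then applied to both splits:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Toy feature matrix with missing entries (values are illustrative).
X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, np.nan],
              [5.0, 6.0], [7.0, 8.0], [np.nan, 9.0]])
y = np.array([0, 1, 0, 1, 0, 1])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the imputer on the training split only, then apply it to both
# splits, so no information from the test set leaks into the model.
imputer = IterativeImputer(random_state=0)
X_train_imp = imputer.fit_transform(X_train)
X_test_imp = imputer.transform(X_test)
```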
7. Consider the following dataset:
Which among the following split-points for feature2 would give the best split according to the misclassification error?
- 186.5
- 188.6
- 189.2
- 198.1
Answer :-
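Since the dataset table is not reproduced above, the answer cannot be re-derived here, but a generic helper for scoring a candidate split-point by misclassification error might look like the sketch below (feature2 and labels are placeholders for the assignment's columns, with labels assumed to be small non-negative integers such as 0/1):

```python
import numpy as np

def misclassification_error(feature, labels, threshold):
    """Misclassification error of a binary split at `threshold`,
    where each side predicts its own majority class."""
    errors = 0
    for side in (feature <= threshold, feature > threshold):
        side_labels = labels[side]
        if side_labels.size:
            # Points outside the majority class of this side are errors.
            errors += side_labels.size - np.bincount(side_labels).max()
    return errors / labels.size

# Evaluate each candidate split-point from the options, e.g.:
# for t in (186.5, 188.6, 189.2, 198.1):
#     print(t, misclassification_error(feature2, labels, t))
```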