NPTEL Introduction to Machine Learning Week 6 Assignment Answers 2024

By Sanket

1. Entropy for a 90-10 split between two classes is:

• 0.469
• 0.195
• 0.204
• None of the above
`Answer :- `
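The entropy in Question 1 can be worked out directly from the binary entropy formula H(p) = -p log2(p) - (1-p) log2(1-p). A minimal sketch (the function name is mine):

```python
from math import log2

def binary_entropy(p):
    """Entropy (in bits) of a two-class split with class proportions p and 1 - p."""
    if p in (0.0, 1.0):
        return 0.0  # a pure node has zero entropy
    return -p * log2(p) - (1 - p) * log2(1 - p)

# Entropy of a 90-10 split:
print(round(binary_entropy(0.9), 3))  # 0.469
```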

2. Consider a dataset with only one attribute (categorical). Suppose there are 8 unordered values in this attribute; how many possible combinations need to be examined to find the best split for building the decision tree classifier?

• 511
• 1023
• 512
• 127
`Answer :- `
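For an unordered categorical attribute with k values, a binary split assigns a non-empty proper subset of values to one branch, and a subset and its complement describe the same split, giving (2^k - 2)/2 = 2^(k-1) - 1 distinct splits. A brute-force sketch that verifies this count by enumeration (function name is mine):

```python
from itertools import combinations

def num_binary_splits(values):
    """Count the distinct binary partitions of an unordered categorical attribute.

    A split sends a non-empty proper subset to the left branch; a subset and
    its complement are the same split, so the count is 2**(k - 1) - 1.
    """
    values = list(values)
    k = len(values)
    seen = set()
    for r in range(1, k):  # subset sizes 1 .. k-1
        for subset in combinations(values, r):
            rest = tuple(v for v in values if v not in subset)
            # Deduplicate {subset, complement} pairs.
            seen.add(frozenset([frozenset(subset), frozenset(rest)]))
    return len(seen)

print(num_binary_splits(range(8)))  # 2**(8 - 1) - 1 = 127
```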

3. Having built a decision tree, we are using reduced-error pruning to reduce the size of the tree. We select a node to collapse. At this node, the left branch holds three training data points with outputs 5, 7, 9.6, and the right branch holds four training data points with outputs 8.7, 9.8, 10.5, 11. The response of a branch is the average of the outputs of its data points. The original responses along the two branches (left and right respectively) were response-left and response-right, and the new response after collapsing the node is response-new. What are the values of response-left, response-right and response-new (the numbers in each option are given in the same order)?

• 9.6, 11, 10.4
• 7.2, 10, 8.8
• 5, 10.5, 15
• depends on the tree height.
`Answer :- `
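The branch responses in Question 3 are just means: the left and right branches average their own points, and the collapsed node averages all seven. A quick sketch (function name is mine):

```python
def branch_responses(left, right):
    """Mean responses of each branch, and of the node after collapsing."""
    response_left = sum(left) / len(left)
    response_right = sum(right) / len(right)
    response_new = sum(left + right) / (len(left) + len(right))
    return response_left, response_right, response_new

left = [5, 7, 9.6]
right = [8.7, 9.8, 10.5, 11]
print(tuple(round(v, 1) for v in branch_responses(left, right)))  # (7.2, 10.0, 8.8)
```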

4. Which of the following is a good strategy for reducing the variance in a decision tree?

• If improvement of taking any split is very small, don’t make a split. (Early Stopping)
• Stop splitting a leaf when the number of points is less than a set threshold K.
• Stop splitting all leaves in the decision tree when any one leaf has less than a set threshold K points.
• None of the Above.
`Answer :- `

5. Which of the following statements about multiway splits in decision trees with categorical features is correct?

• They always result in deeper trees compared to binary splits
• They always provide better interpretability than binary splits
• They can lead to overfitting when dealing with high-cardinality categorical features
• They are computationally less expensive than binary splits for all categorical features
`Answer :- `

6. Which of the following statements about imputation in data preprocessing is most accurate?

• Mean imputation is always the best method for handling missing numerical data
• Imputation should always be performed after splitting the data into training and test sets
• Missing data is best handled by simply removing all rows with any missing values
• Multiple imputation typically produces less biased estimates than single imputation methods
`Answer :- `
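One pitfall behind Question 6 is data leakage: if the imputation value is computed before the train/test split, test-set information leaks into preprocessing. A minimal sketch of fitting the fill value on training data only (names and toy values are mine):

```python
def fit_mean(train_col):
    """Learn the imputation value (column mean) from training data only."""
    observed = [v for v in train_col if v is not None]
    return sum(observed) / len(observed)

def impute(col, fill):
    """Replace missing entries (None) with the previously learned fill value."""
    return [fill if v is None else v for v in col]

train = [2.0, None, 4.0]
test = [None, 10.0]
fill = fit_mean(train)       # mean of observed training values = 3.0
print(impute(train, fill))   # [2.0, 3.0, 4.0]
print(impute(test, fill))    # [3.0, 10.0]  -- test uses the training mean
```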

7. Consider the following dataset:

Which among the following split-points for feature2 would give the best split according to the misclassification error?

• 186.5
• 188.6
• 189.2
• 198.1
`Answer :- `
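Since the assignment's data table is not reproduced above, here is a sketch of how such candidate thresholds are compared: each side of a split predicts its majority class, and the split's misclassification error is the fraction of points that disagree with their side's majority. The `feature2` values and labels below are hypothetical stand-ins, not the assignment's data:

```python
def misclassification_error(feature, labels, threshold):
    """Misclassification error of a binary split at `threshold`.

    Points with feature <= threshold go left, the rest go right; each side
    predicts its majority class, and the error is the overall fraction of
    points that disagree with their side's majority.
    """
    left = [y for x, y in zip(feature, labels) if x <= threshold]
    right = [y for x, y in zip(feature, labels) if x > threshold]
    errors = 0
    for side in (left, right):
        if side:
            majority = max(set(side), key=side.count)
            errors += sum(1 for y in side if y != majority)
    return errors / len(labels)

# Hypothetical values of feature2 with class labels (assignment table not shown here).
feature2 = [185.0, 188.0, 189.0, 190.0, 197.0, 199.0]
labels = ["A", "A", "B", "A", "B", "B"]
for t in (186.5, 188.6, 189.2, 198.1):
    print(t, misclassification_error(feature2, labels, t))
```

On real data, the best split-point is the candidate threshold with the lowest error.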