**NPTEL Introduction To Machine Learning Assignment Answer**

## NPTEL Introduction To Machine Learning Week 4 Assignment Answer 2023

**Q1. Consider the data set given below.**

Claim: PLA (perceptron learning algorithm) can learn a classifier that achieves zero misclassification error on the training data. This claim is:

True

False

Depends on the initial weights

True, only if we normalize the feature vectors before applying PLA.

**Answer :-**

Q2. Which of the following loss functions are convex? (Multiple options may be correct)

- 0-1 loss (sometimes referred to as misclassification loss)
- Hinge loss
- Logistic loss
- Squared error loss

**Answer :-**
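Convexity here is a property of each loss viewed as a function of the margin m = y·f(x). As an illustrative numerical check (not a proof), the midpoint inequality f((a+b)/2) ≤ (f(a)+f(b))/2 can be tested on a grid of margin values; the helper names and the grid below are our own choices:

```python
import numpy as np

# Each loss as a function of the margin m = y * f(x)
def zero_one(m):  return (m <= 0).astype(float)    # 0-1 (misclassification) loss
def hinge(m):     return np.maximum(0.0, 1.0 - m)  # hinge loss
def logistic(m):  return np.log1p(np.exp(-m))      # logistic loss
def squared(m):   return (1.0 - m) ** 2            # squared error loss on the margin

def is_midpoint_convex(f, grid):
    """Check f((a+b)/2) <= (f(a)+f(b))/2 for all pairs of grid points."""
    a, b = np.meshgrid(grid, grid)
    return bool(np.all(f((a + b) / 2) <= (f(a) + f(b)) / 2 + 1e-12))

grid = np.linspace(-3, 3, 61)
for name, f in [("0-1", zero_one), ("hinge", hinge),
                ("logistic", logistic), ("squared", squared)]:
    print(name, is_midpoint_convex(f, grid))
```

The 0-1 loss fails the check (e.g. a = −3, b = 1 gives f(−1) = 1 > 0.5), while the other three pass on this grid, consistent with hinge, logistic, and squared error being convex.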

Q3. Which of the following are valid kernel functions?

- (1 + ⟨x, x′⟩)^{d}
- tanh(K_{1}⟨x, x′⟩ + K_{2})
- exp(−γ||x − x′||^{2})

**Answer :-**
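A necessary condition for a valid (Mercer) kernel is that every Gram matrix it produces is positive semi-definite. The sketch below checks this numerically for the polynomial and RBF kernels on random points; the `gram` helper and the chosen values d = 3 and γ = 0.5 are illustrative assumptions (the tanh kernel is PSD only for certain K_{1}, K_{2}, so it is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))  # 20 random 2-D points

def gram(k, X):
    """Gram matrix K[i, j] = k(x_i, x_j)."""
    n = len(X)
    return np.array([[k(X[i], X[j]) for j in range(n)] for i in range(n)])

poly = lambda x, z: (1 + x @ z) ** 3                    # (1 + <x, x'>)^d with d = 3
rbf = lambda x, z: np.exp(-0.5 * np.sum((x - z) ** 2))  # exp(-γ||x - x'||²), γ = 0.5

for name, k in [("poly", poly), ("rbf", rbf)]:
    min_eig = np.linalg.eigvalsh(gram(k, X)).min()
    print(name, "min eigenvalue:", min_eig)  # ≥ 0 (up to round-off) for a valid kernel
```

A clearly negative eigenvalue on any point set would disprove kernel validity; non-negative eigenvalues on one sample are consistent with it but not a proof.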

Q4. Consider the 1-dimensional dataset:

(Note: x is the feature, and y is the output)

State true or false: The dataset becomes linearly separable after using basis expansion with the following basis function ϕ(x) = [1, x^{3}]

- True
- False

**Answer :-**

Q5. State True or False:

SVM cannot classify data that is not linearly separable even if we transform it to a higher-dimensional space.

- True
- False

**Answer :-**

Q6. State True or False:

The decision boundary obtained using the perceptron algorithm does not depend on the initial values of the weights.

- True
- False

**Answer :-**
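To see why the learned boundary can depend on where the weights start, here is a minimal PLA sketch on a hypothetical separable toy set (the data and both initializations are made up for illustration): two different starting points both reach zero training error, yet converge to different weight vectors, i.e. different boundaries.

```python
import numpy as np

def perceptron(X, y, w0, max_epochs=100):
    """Plain PLA: on each misclassified point, update w += y_i * x_i."""
    w = w0.astype(float).copy()
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:
                w += yi * xi
                mistakes += 1
        if mistakes == 0:  # converged: zero training error
            break
    return w

# Hypothetical linearly separable toy data; last column is a bias feature.
X = np.array([[1.0, 2.0, 1.0], [2.0, 3.0, 1.0],
              [-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]])
y = np.array([1, 1, -1, -1])

w_a = perceptron(X, y, np.zeros(3))
w_b = perceptron(X, y, np.array([5.0, -3.0, 1.0]))
print(w_a, w_b)  # both achieve zero training error, yet the weights differ
```

PLA stops at the first separating hyperplane it finds, so different initial weights can end at different (all valid) boundaries.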

Q7. Consider a linear SVM trained with n labeled points in R^{2} without slack penalties and resulting in k=2 support vectors, where n>100. By removing one labeled training point and retraining the SVM classifier, what is the maximum possible number of support vectors in the resulting solution?

- 1
- 2
- 3
- n − 1
- n

**Answer :-**

Q8. Consider an SVM with a second order polynomial kernel. Kernel 1 maps each input data point x to K_{1}(x) = [x, x^{2}]. Kernel 2 maps each input data point x to K_{2}(x) = [3x, 3x^{2}]. Assume the hyper-parameters are fixed. Which of the following options is true?

- The margin obtained using K_{2}(x) will be larger than the margin obtained using K_{1}(x).
- The margin obtained using K_{2}(x) will be smaller than the margin obtained using K_{1}(x).
- The margin obtained using K_{2}(x) will be the same as the margin obtained using K_{1}(x).

**Answer :-**

## NPTEL Introduction To Machine Learning Week 3 Assignment Answer 2023

**1. Which of the following are differences between LDA and Logistic Regression?**

- Logistic Regression is typically suited for binary classification, whereas LDA is directly applicable to multi-class problems
- Logistic Regression is robust to outliers whereas LDA is sensitive to outliers
- both (a) and (b)
- None of these

**Answer :-** c

**2. We have two classes in our dataset. The two classes have the same mean but different variance.**

LDA can classify them perfectly.

LDA can NOT classify them perfectly.

LDA is not applicable in data with these properties

Insufficient information

**Answer :-** b

**3. We have two classes in our dataset. The two classes have the same variance but different mean.**

LDA can classify them perfectly.

LDA can NOT classify them perfectly.

LDA is not applicable in data with these properties

Insufficient information

**Answer :-** d

**4. Given the following distribution of data points:**

What method would you choose to perform Dimensionality Reduction?

Linear Discriminant Analysis

Principal Component Analysis

Both LDA and/or PCA.

None of the above.

**Answer :-** a

**5. If log((1 − p(x)) / (1 + p(x))) = β0 + βx, what is p(x)?**

p(x) = (1 + e^{β0+βx}) / e^{β0+βx}

p(x) = (1 + e^{β0+βx}) / (1 − e^{β0+βx})

p(x) = e^{β0+βx} / (1 + e^{β0+βx})

p(x) = (1 − e^{β0+βx}) / (1 + e^{β0+βx})

**Answer :-** d
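Option d follows by solving the stated equation for p(x):

```latex
\log\frac{1 - p(x)}{1 + p(x)} = \beta_0 + \beta x
\;\Rightarrow\; 1 - p(x) = \bigl(1 + p(x)\bigr)e^{\beta_0 + \beta x}
\;\Rightarrow\; p(x) = \frac{1 - e^{\beta_0 + \beta x}}{1 + e^{\beta_0 + \beta x}}
```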

**6. For the two classes ’+’ and ’-’ shown below.**

While performing LDA on it, which line is the most appropriate for projecting data points?

Red

Orange

Blue

Green

**Answer :-** c

**7. Which of these techniques do we use to optimise Logistic Regression:**

Least Square Error

Maximum Likelihood

(a) or (b) are equally good

(a) and (b) perform very poorly, so we generally avoid using Logistic Regression

None of these

**Answer :-** b

**8. LDA assumes that the class data is distributed as:**

Poisson

Uniform

Gaussian

LDA makes no such assumption.

**Answer :-** c

9. Suppose we have two variables, X and Y (the dependent variable), and we wish to find their relation. An expert tells us that the relation between the two has the form Y = m e^{X} + c. Suppose samples of the variables X and Y are available to us. Is it possible to apply linear regression to this data to estimate the values of m and c?

No.

Yes.

Insufficient information.

None of the above.

**Answer :-** b
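The relation Y = m e^{X} + c is linear in the transformed feature Z = e^{X}, so ordinary least squares on (Z, Y) recovers m and c. A minimal numpy sketch on synthetic noiseless data (the values `m_true`, `c_true`, and the sampling range are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
m_true, c_true = 2.5, -1.0
X = rng.uniform(0, 2, size=50)
Y = m_true * np.exp(X) + c_true  # noiseless samples of Y = m*e^X + c

# Ordinary least squares on the transformed feature Z = e^X (plus an intercept column)
A = np.column_stack([np.exp(X), np.ones_like(X)])
(m_hat, c_hat), *_ = np.linalg.lstsq(A, Y, rcond=None)
print(m_hat, c_hat)  # recovers m = 2.5 and c = -1.0
```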

**10. What might happen to our logistic regression model if the number of features is more than the number of samples in our dataset?**

It will remain unaffected

It will not find a hyperplane as the decision boundary

It will over fit

None of the above

**Answer :-** c

## NPTEL Introduction To Machine Learning Week 2 Assignment Answer 2023

**1. The parameters obtained in linear regression**

- can take any value in the real space
- are strictly integers
- always lie in the range [0,1]
- can take only non-zero values

**Answer :-** a. can take any value in the real space

**2. Suppose that we have N independent variables (X1, X2, …, XN) and the dependent variable is Y. Now imagine that you are applying linear regression by fitting the best fit line using the least square error on this data. You found that the correlation coefficient for one of its variables (say X1) with Y is -0.005.**

- Regressing Y on X1 mostly does not explain away Y.
- Regressing Y on X1 explains away Y.
- The given data is insufficient to determine if regressing Y on X1 explains away Y or not.

**Answer :-** b. Regressing Y on X1 mostly does not explain away Y.

**3. Which of the following is a limitation of subset selection methods in regression?**

- They tend to produce biased estimates of the regression coefficients.
- They cannot handle datasets with missing values.
- They are computationally expensive for large datasets.
- They assume a linear relationship between the independent and dependent variables.
- They are not suitable for datasets with categorical predictors.

**Answer :-** c. They are computationally expensive for large datasets.

**4. The relation between studying time (in hours) and grade on the final examination (0-100) in a random sample of students in the Introduction to Machine Learning class was found to be: Grade = 30.5 + 15.2 (h)**

**How will a student’s grade be affected if she studies for four hours?**

- It will go down by 30.4 points.
- It will go up by 30.4 points.
- It will go up by 60.8 points.
- The grade will remain unchanged.
- It cannot be determined from the information given

**Answer :-** c. It will go up by 60.8 points.
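The coefficient 15.2 is the expected grade gain per hour of study, so four hours of study raise the predicted grade by:

```latex
\Delta\,\mathrm{Grade} = 15.2 \times 4 = 60.8
```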

**5. Which of the statements is/are True?**

- Ridge has sparsity constraint, and it will drive coefficients with low values to 0.
- Lasso has a closed form solution for the optimization problem, but this is not the case for Ridge.
- Ridge regression does not reduce the number of variables since it never leads a coefficient to zero but only minimizes it.
- If there are two or more highly collinear variables, Lasso will select one of them randomly

**Answer :-** c. Ridge regression does not reduce the number of variables since it never leads a coefficient to zero but only minimizes it.
d. If there are two or more highly collinear variables, Lasso will select one of them randomly.

**6. Find the mean of the squared errors for the given predictions:**

**Hint: Find the squared error for each prediction and take the mean of that.**

- 1
- 2
- 1.5
- 0

**Answer :-** a. 1

**7. Consider the following statements:**

**Statement A: In Forward stepwise selection, in each step, that variable is chosen which has the maximum correlation with the residual, then the residual is regressed on that variable, and it is added to the predictor.**

**Statement B: In Forward stagewise selection, the variables are added one by one to the previously selected variables to produce the best fit till then.**

- Both the statements are True.
- Statement A is True, and Statement B is False
- Statement A is False and Statement B is True
- Both the statements are False.

**Answer :-** a. Both the statements are True.

**8. The linear regression model y = a_{0} + a_{1}x_{1} + a_{2}x_{2} + … + a_{p}x_{p} is to be fitted to a set of N training data points having p attributes each. Let X be the N×(p+1) matrix of input values (augmented by 1‘s), Y be the N×1 vector of target values, and θ be the (p+1)×1 vector of parameter values (a_{0}, a_{1}, a_{2}, …, a_{p}). If the sum squared error is minimized for obtaining the optimal regression model, which of the following equations holds?**

- X^{T}X = XY
- Xθ = X^{T}Yθ
- X^{T}Xθ = Y
- X^{T}Xθ = X^{T}Y

**Answer :-** d. X^{T}Xθ = X^{T}Y
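The normal equations in option d can be checked numerically: solving X^{T}Xθ = X^{T}Y gives the same θ as a least-squares solver. A small numpy sketch on random data (the dimensions and random seed are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # N x (p+1), augmented by 1's
Y = rng.normal(size=N)

# Solve the normal equations X^T X θ = X^T Y directly
theta = np.linalg.solve(X.T @ X, X.T @ Y)

# Same answer as a dedicated least-squares solver
theta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(theta, theta_lstsq))  # True
```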

**9. Which of the following statements is true regarding Partial Least Squares (PLS) regression?**

- PLS is a dimensionality reduction technique that maximizes the covariance between the predictors and the dependent variable.
- PLS is only applicable when there is no multicollinearity among the independent variables.
- PLS can handle situations where the number of predictors is larger than the number of observations.
- PLS estimates the regression coefficients by minimizing the residual sum of squares.
- PLS is based on the assumption of normally distributed residuals.
- All of the above.
- None of the above.

**Answer :-** a

**10. Which of the following statements about principal components in Principal Component Regression (PCR) is true?**

- Principal components are calculated based on the correlation matrix of the original predictors.
- The first principal component explains the largest proportion of the variation in the dependent variable.
- Principal components are linear combinations of the original predictors that are uncorrelated with each other.
- PCR selects the principal components with the highest p-values for inclusion in the regression model.
- PCR always results in a lower model complexity compared to ordinary least squares regression.

**Answer :-** c. Principal components are linear combinations of the original predictors that are uncorrelated with each other.

| Course Name | Introduction To Machine Learning |
| --- | --- |
| Category | NPTEL Assignment Answer |
| Home | Click Here |
| Join Us on Telegram | Click Here |

## NPTEL Introduction To Machine Learning Week 1 Assignment Answer 2023

**1. Which of the following is a supervised learning problem?**

- Grouping related documents from an unannotated corpus.
- Predicting credit approval based on historical data.
- Predicting if a new image has cat or dog based on the historical data of other images of cats and dogs, where you are supplied the information about which image is cat or dog.
- Fingerprint recognition of a particular person used in biometric attendance from the fingerprint data of various other people and that particular person.

**Answer :- b, c, d**

**2. Which of the following are classification problems?**

- Predict the runs a cricketer will score in a particular match.
- Predict which team will win a tournament.
- Predict whether it will rain today.
- Predict your mood tomorrow.

**Answer :- b, c, d**

**3. Which of the following is a regression task?**

- Predicting the monthly sales of a cloth store in rupees.
- Predicting if a user would like to listen to a newly released song or not based on historical data.
- Predicting the confirmation probability (in fraction) of your train ticket whose current status is waiting list based on historical data.
- Predicting if a patient has diabetes or not based on historical medical records.
- Predicting if a customer is satisfied or unsatisfied with the product purchased from an e-commerce website, using the reviews he/she wrote for the purchased product.

**Answer :- a, c**

**4. Which of the following is an unsupervised learning task?**

- Group audio files based on language of the speakers.
- Group applicants to a university based on their nationality.
- Predict a student’s performance in the final exams.
- Predict the trajectory of a meteorite.

**Answer :- a, b**

**5. Which of the following is a categorical feature?**

- Number of rooms in a hostel.
- Gender of a person
- Your weekly expenditure in rupees.
- Ethnicity of a person
- Area (in sq. centimeter) of your laptop screen.
- The color of the curtains in your room.
- Number of legs of an animal.
- Minimum RAM requirement (in GB) of a system to play a game like FIFA, DOTA.

**Answer :- b, d, f**

**6. Which of the following is a reinforcement learning task?**

- Learning to drive a cycle
- Learning to predict stock prices
- Learning to play chess
- Learning to predict spam labels for e-mails

**Answer :- a, c**

**7. Let X and Y be uniformly distributed random variables over the intervals [0,4] and [0,6] respectively. If X and Y are independent, compute the probability P(max(X,Y) > 3).**

- 1/6
- 5/6
- 2/3
- 1/2
- 2/6
- 5/8
- None of the above

**Answer :- f (5/8)**
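The maximum exceeds 3 unless both X ≤ 3 and Y ≤ 3, so by independence:

```latex
P\bigl(\max(X,Y) > 3\bigr) = 1 - P(X \le 3)\,P(Y \le 3)
= 1 - \frac{3}{4}\cdot\frac{3}{6} = \frac{5}{8}
```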

**8. Find the mean of 0-1 loss for the given predictions:**

- 1
- 0
- 1.5
- 0.5

**Answer :- d (0.5)**

**9. Which of the following statements are true? Check all that apply.**

- A model with more parameters is more prone to overfitting and typically has higher variance.
- If a learning algorithm is suffering from high bias, only adding more training examples may not improve the test error significantly.
- When debugging learning algorithms, it is useful to plot a learning curve to understand if there is a high bias or high variance problem.
- If a neural network has much lower training error than test error, then adding more layers will help bring the test error down because we can fit the test set better.

**Answer :- b, d**

**10. Bias and variance are given by:**

- E[f^(x)] − f(x), E[(E[f^(x)] − f^(x))^{2}]
- E[f^(x)] − f(x), E[(E[f^(x)] − f^(x))]^{2}
- (E[f^(x)] − f(x))^{2}, E[(E[f^(x)] − f^(x))^{2}]
- (E[f^(x)] − f(x))^{2}, E[(E[f^(x)] − f^(x))]^{2}

**Answer :- a**
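The formulas in option (a) can be estimated by Monte Carlo for a toy estimator. Here f^(x) is taken to be the mean of n noisy observations of f(x); the true value, noise model, and sample sizes are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
f_x = 2.0              # true value f(x) at a fixed input x (made up for illustration)
n, trials = 5, 200_000

# Toy estimator f^(x): the mean of n noisy observations y = f(x) + ε, ε ~ N(0, 1)
f_hat = rng.normal(f_x, 1.0, size=(trials, n)).mean(axis=1)

bias = f_hat.mean() - f_x                        # E[f^(x)] − f(x)
variance = np.mean((f_hat.mean() - f_hat) ** 2)  # E[(E[f^(x)] − f^(x))^2]
print(bias, variance)  # bias ≈ 0, variance ≈ 1/n = 0.2
```

For this unbiased estimator the bias estimate is near 0 and the variance estimate is near 1/n, matching the standard theory.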
