## NPTEL Deep Learning – IIT Ropar Week 4 Assignment Answer 2023

**1. Which step does Nesterov accelerated gradient descent perform before finding the update size?**

- Increase the momentum
- Estimate the next position of the parameters
- Adjust the learning rate
- Decrease the step size

Answer :-
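Questions 1 and 10 both hinge on the look-ahead step of Nesterov accelerated gradient (NAG). The minimal sketch below assumes the common formulation $u_t = \gamma u_{t-1} + \eta \nabla f(w_t - \gamma u_{t-1})$, $w_{t+1} = w_t - u_t$, and applies it to a hypothetical 1-D loss $f(w) = (w - 3)^2$; the function names and constants are illustrative only.

```python
def grad(w):
    """Gradient of the illustrative loss f(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)


def nag(w0, eta=0.1, gamma=0.9, steps=200):
    """Nesterov accelerated gradient descent on a single scalar parameter."""
    w, u = w0, 0.0  # u holds the accumulated (momentum) update
    for _ in range(steps):
        w_lookahead = w - gamma * u               # 1) estimate the next position first
        u = gamma * u + eta * grad(w_lookahead)   # 2) gradient evaluated at the look-ahead point
        w = w - u                                 # 3) apply the update
    return w


print(round(nag(0.0), 4))
```

The only difference from plain momentum is that the gradient is taken at `w - gamma * u` instead of at `w`.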

**2. Which parameter of vanilla gradient descent controls the step size in the direction of the gradient?**

- Learning rate
- Momentum
- Gamma
- None of the above

Answer :-

**3. What does the distance between two contour lines on a contour map represent?**

- The change in the output of function
- The direction of the function
- The rate of change of the function
- None of the above

Answer :-

**4. Which of the following represents the contour plot of the function $f(x, y) = x^2 - y$?**

Answer :-

**5. What is the main advantage of using Adagrad over other optimization algorithms?**

- It converges faster than other optimization algorithms.
- It is less sensitive to the choice of hyperparameters (learning rate).
- It is more memory-efficient than other optimization algorithms.
- It is less likely to get stuck in local optima than other optimization algorithms.

Answer :-
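Adagrad divides the learning rate of each parameter by the square root of that parameter's accumulated squared gradients, so each parameter effectively gets its own learning rate. The sketch below, with hypothetical gradients, shows that a sparse feature (non-zero gradient only occasionally, as in question 7) ends up with a larger effective learning rate than a dense one. Names and values are illustrative assumptions.

```python
import math


def adagrad_step(w, g, cache, eta=0.1, eps=1e-8):
    """One Adagrad update for a list of parameters.

    cache accumulates each parameter's squared gradients, so parameters that
    receive large or frequent gradients get a smaller effective learning rate.
    """
    new_cache = [c + gi * gi for c, gi in zip(cache, g)]
    new_w = [wi - eta * gi / (math.sqrt(ci) + eps)
             for wi, gi, ci in zip(w, g, new_cache)]
    return new_w, new_cache


# Hypothetical gradients: feature 0 is dense (gradient every step),
# feature 1 is sparse (non-zero gradient only every 10th step).
w, cache = [0.0, 0.0], [0.0, 0.0]
for t in range(100):
    g = [1.0, 1.0 if t % 10 == 0 else 0.0]
    w, cache = adagrad_step(w, g, cache)

effective_lr = [0.1 / math.sqrt(c) for c in cache]
print(effective_lr)
```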

**6. We are training a neural network using the vanilla gradient descent algorithm. We observe that the change in weights is small in successive iterations. What are the possible causes of this phenomenon?**

- η is large
- ∇w is small
- ∇w is large
- η is small

Answer :-
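Questions 2 and 6 both follow from the vanilla gradient descent update $w_{t+1} = w_t - \eta \nabla w_t$: the size of each step is $\eta$ times the gradient magnitude. The short sketch below evaluates one update for a few hypothetical $(\eta, \nabla w)$ pairs.

```python
def gd_step(w, grad_w, eta):
    """One vanilla gradient descent update: w_new = w - eta * grad_w."""
    return w - eta * grad_w


w = 1.0
cases = [(0.5, 2.0), (0.001, 2.0), (0.5, 0.002)]  # hypothetical (eta, gradient) pairs
changes = [abs(gd_step(w, g, eta) - w) for eta, g in cases]
for (eta, g), change in zip(cases, changes):
    print(f"eta={eta}, grad={g}: |change in w| = {change:.4f}")
```

Shrinking either the learning rate or the gradient shrinks the change in the weight by the same factor.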

**7. You are given labeled data, which we call X, where rows are data points and columns are features. One column has most of its values equal to 0. Which algorithm should we use here to converge faster and reach the optimal value of the loss function?**

- NAG
- Adam
- Stochastic gradient descent
- Momentum-based gradient descent

Answer :-

**8. What is the update rule for the ADAM optimizer?**

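- $w_t = w_{t-1} - \text{lr} \cdot \frac{m_t}{\sqrt{v_t} + \epsilon}$
- $w_t = w_{t-1} - \text{lr} \cdot m$
- $w_t = w_{t-1} - \text{lr} \cdot \frac{m_t}{v_t + \epsilon}$
- $w_t = w_{t-1} - \text{lr} \cdot \frac{v_t}{m_t + \epsilon}$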

Answer :-
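The options above write Adam's update in terms of the first moment $m_t$ and second moment $v_t$. As published by Kingma and Ba, Adam also bias-corrects both moments ($\hat m_t = m_t / (1 - \beta_1^t)$, $\hat v_t = v_t / (1 - \beta_2^t)$) before applying $w_t = w_{t-1} - \text{lr} \cdot \hat m_t / (\sqrt{\hat v_t} + \epsilon)$. The sketch below implements that on a hypothetical scalar loss $f(w) = (w - 3)^2$; the default hyperparameters are the commonly used ones, not values from the question.

```python
import math


def adam(grad, w0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Adam on a scalar parameter, including bias correction of both moments."""
    w, m, v = w0, 0.0, 0.0
    history = []
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g          # first moment (momentum)
        v = beta2 * v + (1 - beta2) * g * g      # second moment (squared gradients)
        m_hat = m / (1 - beta1 ** t)             # bias-corrected estimates
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
        history.append(w)
    return w, history


# Hypothetical loss f(w) = (w - 3)**2 with gradient 2 * (w - 3)
w_final, history = adam(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(history[0], w_final)
```

Because of the bias correction, the very first step has magnitude close to `lr` regardless of the gradient's scale.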

**9. What is the advantage of using mini-batch gradient descent over batch gradient descent?**

- Mini-batch gradient descent is more computationally efficient than batch gradient descent.
- Mini-batch gradient descent leads to a more accurate estimate of the gradient than batch gradient descent.
- Mini-batch gradient descent gives us a better solution.
- Mini-batch gradient descent can converge faster than batch gradient descent.

Answer :-
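Mini-batch gradient descent estimates the gradient from a small random subset of the data, so it makes many cheap updates per pass instead of one expensive full-batch update. The sketch below fits a line to hypothetical noise-free data from $y = 2x + 1$; batch size, learning rate, and epoch count are illustrative choices.

```python
import random


def minibatch_gd(xs, ys, batch_size=10, lr=0.1, epochs=200, seed=0):
    """Fit y ≈ w * x + b with mini-batch gradient descent on the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)  # new random order every epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            # Gradient of the MSE estimated from this batch only
            err = [(w * xs[i] + b) - ys[i] for i in batch]
            grad_w = 2 * sum(e * xs[i] for e, i in zip(err, batch)) / len(batch)
            grad_b = 2 * sum(err) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Hypothetical noise-free data from y = 2x + 1
xs = [i / 100 for i in range(100)]
ys = [2 * x + 1 for x in xs]
w, b = minibatch_gd(xs, ys)
print(round(w, 3), round(b, 3))
```

With 100 points and a batch size of 10, each epoch performs 10 parameter updates, whereas full-batch gradient descent would perform one.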

**10. Which of the following is a variant of gradient descent that uses an estimate of the next gradient to update the current position of the parameters?**

- Momentum optimization
- Stochastic gradient descent
- Nesterov accelerated gradient descent
- Adagrad

Answer :-

## NPTEL Deep Learning – IIT Ropar Week 3 Assignment Answer 2023

1. Which of the following statements about backpropagation is true?

- It is used to optimize the weights in a neural network.
- It is used to compute the output of a neural network.
- It is used to initialize the weights in a neural network.
- It is used to regularize the weights in a neural network.

Answer:- a
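Backpropagation computes the gradient of the loss with respect to every weight via the chain rule; an optimizer such as gradient descent then uses those gradients to update the weights. The sketch below does this by hand for a hypothetical single sigmoid neuron with squared-error loss and checks the result against central finite differences.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    """Squared error of a single sigmoid neuron (illustrative model)."""
    return 0.5 * (sigmoid(w * x + b) - y) ** 2


def backprop(w, b, x, y):
    """Gradients of the loss with respect to w and b via the chain rule."""
    a = sigmoid(w * x + b)      # forward pass
    dL_da = a - y               # derivative of 0.5 * (a - y)**2
    da_dz = a * (1 - a)         # derivative of the sigmoid
    dL_dz = dL_da * da_dz
    return dL_dz * x, dL_dz     # dz/dw = x, dz/db = 1


w, b, x, y = 0.5, -0.2, 1.5, 1.0
gw, gb = backprop(w, b, x, y)
h = 1e-6  # central finite differences as an independent check
num_gw = (loss(w + h, b, x, y) - loss(w - h, b, x, y)) / (2 * h)
num_gb = (loss(w, b + h, x, y) - loss(w, b - h, x, y)) / (2 * h)
print(gw, num_gw)
print(gb, num_gb)
```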

2. Let y be the true class label and p be the predicted probability of the true class label in a binary classification problem. Which of the following is the correct formula for binary cross entropy?

Answer:- a
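For reference, the usual binary cross-entropy is $L = -\left[y \log p + (1 - y)\log(1 - p)\right]$, where $p$ is the predicted probability of class 1 (this is the common convention; the question's wording defines $p$ slightly differently). A minimal sketch, with clipping added as an assumption to avoid $\log 0$:

```python
import math


def binary_cross_entropy(y, p, eps=1e-12):
    """Binary cross-entropy for a true label y in {0, 1} and predicted P(y = 1) = p."""
    p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


print(binary_cross_entropy(1, 0.9))  # confident, correct prediction: small loss
print(binary_cross_entropy(1, 0.1))  # confident, wrong prediction: large loss
```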

3. Let $y_i$ be the true class label of the $i$-th instance and $p_i$ be the predicted probability of the true class label in a multi-class classification problem. Write down the formula for the multi-class cross-entropy loss.

Answer:-
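Multi-class cross-entropy is commonly written $L = -\sum_i y_i \log p_i$. With a one-hot label vector only the true class term survives, so the loss reduces to $-\log$ of the probability assigned to the true class. A minimal sketch with a hypothetical softmax output:

```python
import math


def cross_entropy(y_true, p_pred, eps=1e-12):
    """Multi-class cross-entropy: -sum_i y_i * log(p_i).

    y_true is a one-hot (or any probability) vector, p_pred the predicted distribution.
    """
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, p_pred))


y = [0, 1, 0]          # true class is index 1
p = [0.2, 0.7, 0.1]    # hypothetical softmax output
print(cross_entropy(y, p))  # equals -log(0.7)
```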

4. Can the cross-entropy loss between two probability distributions be negative?

- Yes
- No

Answer:-

5. Let $p$ and $q$ be two probability distributions. Under what conditions will the cross-entropy between $p$ and $q$ be minimized?

- $p = q$
- All the values in $p$ are lower than the corresponding values in $q$
- All the values in $p$ are lower than the corresponding values in $q$
- $p = \mathbf{0}$ (0 is a vector)

Answer:-
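Questions 4 to 6 can be checked numerically with Gibbs' inequality: for a fixed target $p$, $H(p, q) = -\sum_i p_i \log q_i \geq H(p, p) = H(p) \geq 0$, with equality in the first step only when $q = p$. The sketch below compares a few hypothetical candidate distributions $q$ against a fixed $p$ (log base 2).

```python
import math


def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i * log2(q_i) for discrete distributions p and q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.3, 0.2]  # hypothetical fixed target distribution
candidates = {
    "q = p": [0.5, 0.3, 0.2],
    "uniform": [1 / 3, 1 / 3, 1 / 3],
    "skewed": [0.8, 0.1, 0.1],
}
values = {name: cross_entropy(p, q) for name, q in candidates.items()}
for name, value in values.items():
    print(name, round(value, 4))
```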

6. Which of the following is false about the cross-entropy loss between two probability distributions?

- It is always in the range (0, 1).
- It can be negative.
- It is always positive.
- It can be 1.

Answer:-

7. The probabilities of all the events $x_1, x_2, \ldots, x_n$ in a system are equal ($n > 1$). What can you say about the entropy $H(X)$ of that system? (The base of the log is 2.)

- H(X)≤1
- H(X)=1
- H(X)≥1
- We can’t say anything conclusive with the provided information.

Answer:-
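For $n$ equally likely events, $H(X) = -\sum_{i=1}^{n} \frac{1}{n}\log_2 \frac{1}{n} = \log_2 n$. The sketch below computes Shannon entropy in bits and compares it with $\log_2 n$ for a few values of $n$.

```python
import math


def entropy(probs):
    """Shannon entropy in bits: H(X) = -sum p * log2(p)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


for n in [2, 3, 4, 8]:
    uniform = [1 / n] * n
    print(n, round(entropy(uniform), 4), round(math.log2(n), 4))
```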

8. Suppose we have a problem where data $x$ and label $y$ are related by $y = x^4 + 1$. Which of the following is not a good choice for the activation function in the hidden layer if the activation function at the output layer is linear?

- Linear
- ReLU
- Sigmoid
- $\tan^{-1}(x)$

Answer:-
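A composition of affine maps is itself affine, so a network whose hidden and output activations are both linear can only represent a straight line in $x$ and cannot fit a curve like $y = x^4 + 1$. The sketch below uses a hypothetical 1-3-1 network with arbitrary weights and confirms that it collapses to a single line $a x + c$.

```python
def linear_layer(weights, biases, x):
    """Affine map: out_j = sum_i weights[j][i] * x[i] + biases[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


# Hypothetical 1 -> 3 -> 1 network with a linear (identity) hidden activation
W1, b1 = [[2.0], [-1.0], [0.5]], [1.0, 0.0, -2.0]
W2, b2 = [[1.0, 3.0, -4.0]], [0.5]


def network(x):
    return linear_layer(W2, b2, linear_layer(W1, b1, [x]))[0]


# Recover the equivalent single line a * x + c from two evaluations
c = network(0.0)
a = network(1.0) - c
for x in [-2.0, 0.5, 3.0]:
    print(x, network(x), a * x + c)
```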

9. We are given that the probability of Event A happening is 0.95 and the probability of Event B happening is 0.05. Which of the following statements is true?

- Event A has a high information content
- Event B has a low information content
- Event A has a low information content
- Event B has a high information content

Answer:-
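The information content (self-information) of an event with probability $P(x)$ is $I(x) = -\log_2 P(x)$ bits, so rarer events carry more information. A minimal sketch for the two events in the question:

```python
import math


def information_content(p):
    """Self-information in bits: I(x) = -log2 P(x)."""
    return -math.log2(p)


for event, p in [("A", 0.95), ("B", 0.05)]:
    print(event, round(information_content(p), 3))
```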

10. Which of the following activation functions can only give outputs strictly greater than 0?

- Sigmoid
- ReLU
- Tanh
- Linear

Answer:-

## NPTEL Deep Learning – IIT Ropar Week 2 Assignment Answer 2023

**1. What is the range of the sigmoid function $\sigma(x) = \frac{1}{1 + e^{-x}}$?**

- (−1,1)
- (0,1)
- (−∞, ∞)
- (0,∞)

Answer :-

**2. What happens to the output of the sigmoid function as |x| becomes very small?**

- The output approaches 0.5
- The output approaches 1.
- The output oscillates between 0 and 1.
- The output becomes undefined.

Answer :-
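Questions 1 and 2 can be explored numerically. The sketch below evaluates $\sigma(x) = 1/(1 + e^{-x})$ at a few points; the two-branch form is an implementation choice that avoids overflow in `math.exp` for large $|x|$. Mathematically the output stays strictly between 0 and 1, although in floating point it rounds to exactly 1.0 once $x$ is large enough (roughly beyond 37).

```python
import math


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + e^(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


for x in [-10, -1, -0.01, 0, 0.01, 1, 10]:
    print(x, round(sigmoid(x), 4))
```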

**3. Which of the following theorems states that a neural network with a single hidden layer containing a finite number of neurons can approximate any continuous function?**

- Bayes’ theorem
- Central limit theorem
- Fourier’s theorem
- Universal approximation theorem

Answer :-

**4. We have a function that we want to approximate using 150 rectangles (towers). How many neurons are required to construct the required network?**

- 301
- 451
- 150
- 500

Answer :-
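The tower construction can be sketched as follows, assuming each tower is the difference of two steep sigmoid neurons (one rising at the tower's left edge, one at its right edge) and a single output neuron sums the towers. Under that assumption, $n$ towers use $2n$ hidden neurons plus one output neuron. The target function $x^2$, the interval $[0, 1]$, and the `sharpness` value are hypothetical choices for illustration.

```python
import math


def sigmoid(z):
    # Overflow-safe logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tower(x, left, right, height, sharpness=1e4):
    """A tower built from two steep sigmoid neurons: about `height` on [left, right], about 0 elsewhere."""
    return height * (sigmoid(sharpness * (x - left)) - sigmoid(sharpness * (x - right)))


def tower_approximation(f, n_towers, lo=0.0, hi=1.0):
    """Sum of n_towers towers whose heights are f evaluated at each tower's midpoint."""
    width = (hi - lo) / n_towers
    edges = [(lo + i * width, lo + (i + 1) * width) for i in range(n_towers)]

    def approx(x):
        return sum(tower(x, a, b, f((a + b) / 2)) for a, b in edges)

    return approx, edges


def target(x):
    return x ** 2  # hypothetical function to approximate


approx, edges = tower_approximation(target, 150)
hidden_neurons = 2 * len(edges)  # two sigmoid neurons per tower (assumed construction)
print(hidden_neurons, hidden_neurons + 1)  # hidden neurons, and total with one output neuron
mid = sum(edges[40]) / 2
print(target(mid), approx(mid))
```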

**5. A neural network has an input layer with 2 neurons, two hidden layers with 5 neurons each, and an output layer with 3 neurons. How many weights are there in total? (Don't assume any bias terms in the network.)**

Answer :-
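Assuming every pair of adjacent layers is fully connected, the weight count is the sum over adjacent layers of (neurons in) × (neurons out). A minimal sketch:

```python
def count_weights(layer_sizes):
    """Number of weights in a fully connected feed-forward network without biases."""
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# input 2 -> hidden 5 -> hidden 5 -> output 3
print(count_weights([2, 5, 5, 3]))  # 2*5 + 5*5 + 5*3
```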

**6. What is the derivative of the ReLU activation function with respect to its input at 0?**

- 0
- 1
- −1
- Not differentiable

Answer :-
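ReLU's one-sided slopes at 0 disagree: the slope from the left is 0 and from the right is 1, so the derivative at exactly 0 is not defined (in practice, libraries typically just pick a value there by convention). A small numeric check with one-sided difference quotients:

```python
def relu(x):
    return max(0.0, x)


h = 1e-6
left = (relu(0.0) - relu(-h)) / h    # slope approaching 0 from the left
right = (relu(h) - relu(0.0)) / h    # slope approaching 0 from the right
print(left, right)
```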

**7. Consider the function $f(x) = x^3 - 3x^2 + 2$. What is the value of $x$ after the 3rd iteration of the gradient descent update if the learning rate is 0.1 and the initial value of $x$ is 4?**

Answer :-
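The iterations can be traced with $f'(x) = 3x^2 - 6x$ and the update $x \leftarrow x - \eta f'(x)$. A minimal sketch that prints each iterate:

```python
def f_prime(x):
    """Derivative of f(x) = x**3 - 3*x**2 + 2."""
    return 3 * x ** 2 - 6 * x


x, eta = 4.0, 0.1
for it in range(1, 4):
    x = x - eta * f_prime(x)
    print(f"iteration {it}: x = {x:.6f}")
```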

**8. Which of the following statements is true about the representation power of a multilayer network of sigmoid neurons?**

- A multilayer network of sigmoid neurons can represent any Boolean function.
- A multilayer network of sigmoid neurons can represent any continuous function.
- A multilayer network of sigmoid neurons can represent any function.
- A multilayer network of sigmoid neurons can represent any linear function.

Answer :-

**9. How many boolean functions can be designed for 3 inputs?**

- 65,536
- 82
- 56
- 64

Answer :-
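A Boolean function of $n$ inputs is fully determined by its truth table, which has $2^n$ rows, each mapped to 0 or 1; so there are $2^{2^n}$ distinct functions. The sketch below computes this formula and cross-checks it by brute-force enumeration of every possible output column for small $n$.

```python
from itertools import product


def count_boolean_functions(n):
    """Each of the 2**n input rows can map to 0 or 1, giving 2**(2**n) functions."""
    return 2 ** (2 ** n)


def brute_force(n):
    """Enumerate every possible truth-table output column and count them."""
    rows = 2 ** n
    return sum(1 for _ in product([0, 1], repeat=rows))


for n in range(1, 5):
    print(n, count_boolean_functions(n))
```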

**10. How many neurons do you need in the hidden layer of a perceptron to learn any boolean function with 6 inputs? (Only one hidden layer is allowed)**

- 16
- 64
- 16
- 32

Answer :-

## NPTEL Deep Learning – IIT Ropar Week 1 Assignment Answer 2023

**1. The table below shows the temperature and humidity data for two cities. Is the data linearly separable?**

- Yes
- No
- Cannot be determined from the given information

Answer :- Yes

**2. What is the perceptron algorithm used for?**

- Clustering data points
- Finding the shortest path in a graph
- Classifying data
- Solving optimization problems

Answer :- Classifying data

**3. What is the most common activation function used in perceptrons?**

- Sigmoid
- ReLU
- Tanh
- Step

Answer :-

**4. Which of the following Boolean functions cannot be implemented by a perceptron?**

- AND
- OR
- XOR
- NOT

Answer :-
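Questions 2 and 4 can be illustrated with the perceptron learning rule (with a bias term and labels $\pm 1$): on a linearly separable function such as AND it stops making mistakes, while on XOR, which is not linearly separable, it never completes a mistake-free epoch. The function and variable names below are hypothetical; the epoch limit is an arbitrary cap.

```python
def train_perceptron(data, epochs=100):
    """Perceptron learning rule with a bias; labels are +1 / -1.

    Returns (weights, bias, converged), where converged means an epoch
    passed with no mistakes.
    """
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            activation = w[0] * x[0] + w[1] * x[1] + b
            if y * activation <= 0:  # misclassified (or exactly on the boundary)
                w = [w[0] + y * x[0], w[1] + y * x[1]]
                b += y
                mistakes += 1
        if mistakes == 0:
            return w, b, True
    return w, b, False


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = [(x, 1 if x == (1, 1) else -1) for x in inputs]
XOR = [(x, 1 if x[0] != x[1] else -1) for x in inputs]

print("AND converged:", train_perceptron(AND)[2])
print("XOR converged:", train_perceptron(XOR)[2])
```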

**5. We are given 4 points in $\mathbb{R}^2$: $x_1 = (0, 1)$, $x_2 = (-1, -1)$, $x_3 = (2, 3)$, $x_4 = (4, -5)$. The labels of $x_1, x_2, x_3, x_4$ are $-1, 1, -1, 1$, respectively. We initialize the perceptron algorithm with weight $w_0 = (0, 0)$ on this data. What will be the value of $w_0$ after the algorithm converges? (Take the points in sequential order from $x_1$ to x. An update happens when the value of the weight changes.)**

- (0,0)
- (−2,−2)
- (−2,−3)
- (1,1)

Answer :-

**6. We are given the following data:**

Can you classify every label correctly by training a perceptron algorithm? (Assume the bias to be 0 while training.)

- Yes
- No

Answer :-

**7. Suppose we have a Boolean function that takes 5 inputs $x_1, x_2, x_3, x_4, x_5$. We have an MP neuron with parameter $\theta = 1$. For how many inputs will this MP neuron give the output $y = 1$?**

- 21
- 31
- 30
- 32

Answer :-
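A McCulloch-Pitts (MP) neuron fires when the sum of its Boolean inputs reaches the threshold $\theta$. The sketch below, assuming all five inputs are excitatory (no inhibitory inputs), enumerates all $2^5$ input combinations and counts how many make the neuron fire with $\theta = 1$.

```python
from itertools import product


def mp_neuron(inputs, theta):
    """McCulloch-Pitts neuron: output 1 iff the sum of the Boolean inputs is at least theta."""
    return 1 if sum(inputs) >= theta else 0


n, theta = 5, 1
firing = sum(mp_neuron(x, theta) for x in product([0, 1], repeat=n))
print(firing)  # every combination except all zeros fires
```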

**8. Which of the following best represents the meaning of the term “Artificial Intelligence”?**

- The ability of a machine to perform tasks that normally require human intelligence
- The ability of a machine to perform simple, repetitive tasks
- The ability of a machine to follow a set of pre-defined rules
- The ability of a machine to communicate with other machines

Answer :-

**9. Which of the following statements is true about error surfaces in deep learning?**

- They are always convex functions.
- They can have multiple local minima.
- They are never continuous.
- They are always linear functions.

Answer :-
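A non-convex error surface can have several local minima, and gradient descent ends up in whichever basin it starts in. The sketch below uses the hypothetical 1-D loss $f(w) = w^4 - 2w^2$, which has minima at $w = -1$ and $w = +1$, and runs plain gradient descent from different starting points.

```python
def grad(w):
    """Gradient of the illustrative non-convex loss f(w) = w**4 - 2*w**2."""
    return 4 * w ** 3 - 4 * w


def gradient_descent(w, eta=0.05, steps=500):
    for _ in range(steps):
        w -= eta * grad(w)
    return w


for start in [-2.0, -0.5, 0.5, 2.0]:
    print(start, round(gradient_descent(start), 4))
```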

**10. What is the output of the following MP neuron for the AND Boolean function?**

$$
y = \begin{cases} 1, & \text{if } x_1 + x_2 + x_3 \geq 1 \\ 0, & \text{otherwise} \end{cases}
$$

- y=1 for (x1,x2,x3)=(0,1,1)
- y=0 for (x1,x2,x3)=(0,0,1)
- y=1 for (x1,x2,x3)=(1,1,1)
- y=0 for (x1,x2,x3)=(1,0,0)

Answer :-