**NPTEL Deep Learning – IIT Ropar Assignment Answer**

## NPTEL Deep Learning – IIT Ropar Week 4 Assignment Answer 2023

**1. Which step does Nesterov accelerated gradient descent perform before finding the update size?**

- Increase the momentum
- Estimate the next position of the parameters
- Adjust the learning rate
- Decrease the step size

**Answer :- **For Answer **Click Here**
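
As a rough illustration of the look-ahead idea behind Nesterov accelerated gradient descent, here is a minimal sketch. The function name `nag_step`, the toy objective f(w) = w^2, and the values of `lr` and `gamma` are assumptions chosen for this example and do not come from the assignment.

```python
def nag_step(w, velocity, grad_fn, lr=0.1, gamma=0.9):
    """One Nesterov accelerated gradient step (illustrative sketch)."""
    # Look ahead: estimate where the momentum term alone would carry w.
    w_lookahead = w - gamma * velocity
    # Evaluate the gradient at the look-ahead point, not at w itself.
    grad = grad_fn(w_lookahead)
    # Combine the momentum history with the look-ahead gradient.
    velocity = gamma * velocity + lr * grad
    return w - velocity, velocity


# Toy objective f(w) = w^2 (an assumption), whose gradient is 2w.
w, v = 5.0, 0.0
for _ in range(200):
    w, v = nag_step(w, v, lambda x: 2.0 * x)
```

After the loop, `w` should sit very close to the minimum at 0.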

**2. Select the parameter of vanilla gradient descent that controls the step size in the direction of the gradient.**

- Learning rate
- Momentum
- Gamma
- None of the above

**Answer :- **For Answer **Click Here**

**3. What does the distance between two contour lines on a contour map represent?**

- The change in the output of function
- The direction of the function
- The rate of change of the function
- None of the above

**Answer :- **For Answer **Click Here**

**4. Which of the following represents the contour plot of the function f(x,y) = x^{2}−y?**

**Answer :- **For Answer **Click Here**

**5. What is the main advantage of using Adagrad over other optimization algorithms?**

- It converges faster than other optimization algorithms.
- It is less sensitive to the choice of hyperparameters (learning rate).
- It is more memory-efficient than other optimization algorithms.
- It is less likely to get stuck in local optima than other optimization algorithms.

**Answer :- **For Answer **Click Here**
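
For readers who want to see how Adagrad scales the step for each parameter, the sketch below may help. It is a minimal version under assumed values: the name `adagrad_step`, the toy gradients, and `lr=0.1` are all hypothetical and not taken from the course material.

```python
import math


def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
    """One Adagrad step on lists of floats; returns (new_w, new_accum)."""
    # Accumulate the squared gradient separately for every parameter.
    new_accum = [a + g * g for a, g in zip(accum, grad)]
    # Divide the step by the root of the accumulated history.
    new_w = [wi - lr * g / (math.sqrt(a) + eps)
             for wi, g, a in zip(w, grad, new_accum)]
    return new_w, new_accum


w = [1.0, 1.0]
accum = [0.0, 0.0]
# Assumed toy gradients: parameter 0 is updated every step,
# parameter 1 (a "sparse" feature) only receives a gradient on the last step.
grads = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
for g in grads:
    prev = w
    w, accum = adagrad_step(w, g, accum)

# Size of the final step taken by each parameter.
last_step = [p - n for p, n in zip(prev, w)]
```

In this toy run the rarely updated parameter takes a larger final step than the frequently updated one, because its accumulated squared gradient is smaller.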

**6. We are training a neural network using the vanilla gradient descent algorithm. We observe that the change in weights is small in successive iterations. What are the possible causes of this phenomenon?**

- η is large
- ∇w is small
- ∇w is large
- η is small

**Answer :- **For Answer **Click Here**
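
The vanilla gradient descent update changes the weight by −η∇w, so the size of the change depends on both factors. A small sketch with assumed numbers (the helper `weight_change` and all values are illustrative only):

```python
def weight_change(eta, grad):
    """Change applied by one vanilla gradient descent step: -eta * grad."""
    return -eta * grad


# Assumed toy values for comparison.
baseline = abs(weight_change(eta=0.1, grad=2.0))
small_eta = abs(weight_change(eta=0.001, grad=2.0))   # smaller learning rate
small_grad = abs(weight_change(eta=0.1, grad=0.02))   # smaller gradient
```

Both `small_eta` and `small_grad` come out smaller than `baseline`.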

**7. You are given labeled data, which we call X, where rows are data points and columns are features. One column has most of its values as 0. What algorithm should we use here for faster convergence and to achieve the optimal value of the loss function?**

- NAG
- Adam
- Stochastic gradient descent
- Momentum-based gradient descent

**Answer :- **For Answer **Click Here**

**8. What is the update rule for the ADAM optimizer?**

- w_t = w_{t−1} − lr ∗ (m_t / (√v_t + ϵ))
- w_t = w_{t−1} − lr ∗ m_t
- w_t = w_{t−1} − lr ∗ (m_t / (v_t + ϵ))
- w_t = w_{t−1} − lr ∗ (v_t / (m_t + ϵ))

**Answer :- **For Answer **Click Here**
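
For context, here is a minimal sketch of one step of the commonly used form of Adam, with first and second moment estimates and bias correction. The options above write m_t and v_t without the bias-correction step, so treating them as bias-corrected values is an assumption. The name `adam_step` and the toy objective are hypothetical.

```python
import math


def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step for a scalar parameter (illustrative sketch)."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction compensates for m and v starting at zero.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Step along m_hat, scaled by the root of v_hat.
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# Toy objective f(w) = w^2 (an assumption), gradient 2w, starting at w = 5.
w0 = 5.0
w1, m1, v1 = adam_step(w0, 2.0 * w0, m=0.0, v=0.0, t=1, lr=0.1)
```

On the very first step the ratio m_hat / sqrt(v_hat) has magnitude close to 1, so `w1` moves by roughly `lr` regardless of the gradient's scale.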

**9. What is the advantage of using mini-batch gradient descent over batch gradient descent?**

- Mini-batch gradient descent is more computationally efficient than batch gradient descent.
- Mini-batch gradient descent leads to a more accurate estimate of the gradient than batch gradient descent.
- Mini-batch gradient descent gives us a better solution.
- Mini-batch gradient descent can converge faster than batch gradient descent.

**Answer :- **For Answer **Click Here**

**10. Which of the following is a variant of gradient descent that uses an estimate of the next gradient to update the current position of the parameters?**

- Momentum optimization
- Stochastic gradient descent
- Nesterov accelerated gradient descent
- Adagrad

**Answer :- **For Answer **Click Here**

Course Name | Deep Learning – IIT Ropar |

Category | NPTEL Assignment Answer |

## NPTEL Deep Learning – IIT Ropar Week 3 Assignment Answer 2023

1. Which of the following statements about backpropagation is true?

- It is used to optimize the weights in a neural network.
- It is used to compute the output of a neural network.
- It is used to initialize the weights in a neural network.
- It is used to regularize the weights in a neural network.

`Answer:- a`

2. Let y be the true class label and p be the predicted probability of the true class label in a binary classification problem. Which of the following is the correct formula for binary cross entropy?

`Answer:- a`

3. Let y_i be the true class label of the i-th instance and p_i be the predicted probability of the true class label in a multi-class classification problem. Write down the formula for multi-class cross entropy loss.

`Answer:- For Answer `**Click Here**

4. Can cross-entropy loss be negative between two probability distributions?

- Yes
- No

`Answer:- For Answer `**Click Here**

5. Let p and q be two probability distributions. Under what conditions will the cross entropy between p and q be minimized?

- p=q
- All the values in p are lower than corresponding values in q
- All the values in p are lower than corresponding values in q
- p = 0 [0 is a vector]

`Answer:- For Answer `**Click Here**

6. Which of the following is false about cross-entropy loss between two probability distributions?

- It is always in range (0,1)
- It can be negative.
- It is always positive.
- It can be 1.

`Answer:- For Answer `**Click Here**
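
To experiment with the cross-entropy questions above, a small sketch using the standard definition H(p, q) = −Σ p_i log2 q_i may be useful. The helper name `cross_entropy` and the example distributions are assumptions made for illustration.

```python
import math


def cross_entropy(p, q):
    """Cross entropy H(p, q) in bits for two discrete distributions."""
    # Terms with p_i = 0 contribute nothing and are skipped.
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.5]
h_same = cross_entropy(p, p)            # q equal to p
h_diff = cross_entropy(p, [0.9, 0.1])   # q different from p
```

For this fixed p, the value with q = p is the entropy of p, and the mismatched q gives a larger value that also exceeds 1.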

7. The probability of all the events x1, x2, …, xn in a system is equal (n>1). What can you say about the entropy H(X) of that system? (The base of the log is 2.)

- H(X)≤1
- H(X)=1
- H(X)≥1
- We can’t say anything conclusive with the provided information.

`Answer:- For Answer `**Click Here**
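
The entropy of a system with n equally likely events can be checked numerically. The sketch below uses the standard definition H(X) = −Σ p log2 p. The helper name `entropy_bits` and the choice of n values are assumptions for this example.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Entropy of a uniform distribution over n events, for a few values of n.
uniform_entropy = {n: entropy_bits([1.0 / n] * n) for n in (2, 4, 8)}
```

For a uniform distribution the result equals log2(n), which is 1 bit at n = 2 and grows with n.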

8. Suppose we have a problem where data x and label y are related by y=x^{4}+1. Which of the following is not a good choice for the activation function in the hidden layer if the activation function at the output layer is linear?

- Linear
- ReLU
- Sigmoid
- Tan^{−1}(x)

`Answer:- For Answer `**Click Here**

9. We are given that the probability of Event A happening is 0.95 and the probability of Event B happening is 0.05. Which of the following statements is True?

- Event A has a high information content
- Event B has a low information content
- Event A has a low information content
- Event B has a high information content

`Answer:-For Answer `**Click Here**
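
The information content of an event with probability p is usually defined as −log2(p) bits. A quick sketch for the two events in this question (the helper name `information_bits` is an assumption):

```python
import math


def information_bits(p):
    """Information content (self-information) of an event, in bits."""
    return -math.log2(p)


info_a = information_bits(0.95)  # likely event
info_b = information_bits(0.05)  # unlikely event
```

The unlikely event carries far more information (about 4.32 bits) than the likely one (about 0.07 bits).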

10. Which of the following activation functions can only give positive outputs greater than 0?

- Sigmoid
- ReLU
- Tanh
- Linear

`Answer:- For Answer `**Click Here**

## NPTEL Deep Learning – IIT Ropar Week 2 Assignment Answer 2023

**1. What is the range of the sigmoid function σ(x)=1/(1+e^{−x})?**

- (−1,1)
- (0,1)
- (−∞,∞)
- (0,∞)

**Answer :- Click Here**

**2. What happens to the output of the sigmoid function as |x| becomes very small?**

- The output approaches 0.5
- The output approaches 1.
- The output oscillates between 0 and 1.
- The output becomes undefined.

**Answer :- Click Here**
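
The behaviour of the sigmoid function in the two questions above can be explored with a few lines of code. This is a minimal sketch; the sample inputs are arbitrary choices.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


# Evaluate at large negative, near-zero, and large positive inputs.
inputs = (-10.0, -0.001, 0.0, 0.001, 10.0)
values = [sigmoid(x) for x in inputs]
```

Every value stays strictly between 0 and 1, and the inputs close to zero give outputs close to 0.5.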

**3. Which of the following theorems states that a neural network with a single hidden layer containing a finite number of neurons can approximate any continuous function?**

- Bayes’ theorem
- Central limit theorem
- Fourier’s theorem
- Universal approximation theorem

**Answer :- Click Here**

**4. We have a function that we want to approximate using 150 rectangles (towers). How many neurons are required to construct the required network?**

- 301
- 451
- 150
- 500

**Answer :- Click Here**

**5. A neural network has two hidden layers with 5 neurons in each layer, an output layer with 3 neurons, and an input layer with 2 neurons. How many weights are there in total? (Don't assume any bias terms in the network.)**

**Answer :- Click Here**
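
Assuming the network is fully connected between consecutive layers and has no bias terms, the weight count is the sum of the products of adjacent layer sizes. A short sketch (the helper name `count_weights` is hypothetical):

```python
def count_weights(layer_sizes):
    """Number of weights in a fully connected network without biases."""
    # Each pair of consecutive layers contributes (inputs * outputs) weights.
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


# Input layer, two hidden layers, output layer, as described in the question.
total = count_weights([2, 5, 5, 3])  # 2*5 + 5*5 + 5*3
```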

**6. What is the derivative of the ReLU activation function with respect to its input at 0?**

- 0
- 1
- −1
- Not differentiable

**Answer :- **

**7. Consider a function f(x)=x^{3}−3x^{2}+2. What is the updated value of x after the 3rd iteration of the gradient descent update, if the learning rate is 0.1 and the initial value of x is 4?**

**Answer :- Click Here**
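
The iterations can be traced with a short script. This sketch assumes a learning rate of 0.1, a starting point of x = 4, and the plain update x ← x − η f′(x) with f′(x) = 3x^2 − 6x.

```python
def grad_f(x):
    """Derivative of f(x) = x^3 - 3x^2 + 2."""
    return 3 * x ** 2 - 6 * x


x = 4.0
history = [x]  # history[k] is the value of x after k updates
for _ in range(3):
    x = x - 0.1 * grad_f(x)
    history.append(x)
```

The values move from 4 to 1.6, then 1.792, and then roughly 1.9038 after the third update.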

**8. Which of the following statements is true about the representation power of a multilayer network of sigmoid neurons?**

- A multilayer network of sigmoid neurons can represent any Boolean function.
- A multilayer network of sigmoid neurons can represent any continuous function.
- A multilayer network of sigmoid neurons can represent any function.
- A multilayer network of sigmoid neurons can represent any linear function.

**Answer :- Click Here**

**9. How many boolean functions can be designed for 3 inputs?**

- 65,536
- 82
- 56
- 64

**Answer :- Click Here**

**10. How many neurons do you need in the hidden layer of a perceptron to learn any boolean function with 6 inputs? (Only one hidden layer is allowed)**

- 16
- 64
- 16
- 32

**Answer :- Click Here**

## NPTEL Deep Learning – IIT Ropar Week 1 Assignment Answer 2023

**1. The table below shows the temperature and humidity data for two cities. Is the data linearly separable?**

- Yes
- No
- Cannot be determined from the given information

**Answer :- **Yes

**2. What is the perceptron algorithm used for?**

- Clustering data points
- Finding the shortest path in a graph
- Classifying data
- Solving optimization problems

**Answer :- **Classifying data

**3. What is the most common activation function used in perceptrons?**

- Sigmoid
- ReLU
- Tanh
- Step

**Answer :- Click Here**

**4. Which of the following Boolean functions cannot be implemented by a perceptron?**

- AND
- OR
- XOR
- NOT

**Answer :- Click Here**

**5. We are given 4 points in R^{2}, say x1=(0,1), x2=(−1,−1), x3=(2,3), x4=(4,−5). The labels of x1, x2, x3, x4 are given to be −1, 1, −1, 1. We initiate the perceptron algorithm with an initial weight w0=(0,0) on this data. What will be the value of w0 after the algorithm converges? (Take points in sequential order from x1 to x4; an update happens when the value of the weight changes.)**

- (0,0)
- (−2,−2)
- (−2,−3)
- (1,1)

**Answer :- Click Here**

**6. We are given the following data:**

Can you classify every label correctly by training a perceptron algorithm? (assume bias to be 0 while training)

- Yes
- No

**Answer :- Click Here**

**7. Suppose we have a boolean function that takes 5 inputs x1, x2, x3, x4, x5. We have an MP neuron with parameter θ=1. For how many inputs will this MP neuron give output y=1?**

- 21
- 31
- 30
- 32

**Answer :- Click Here**
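
One way to check a count like this is to enumerate every input combination. The sketch below assumes a McCulloch-Pitts neuron with no inhibitory inputs that outputs 1 when the sum of its binary inputs is at least θ. The helper name `mp_neuron` is hypothetical.

```python
from itertools import product


def mp_neuron(inputs, theta):
    """MP neuron without inhibitory inputs: fires when sum(inputs) >= theta."""
    return 1 if sum(inputs) >= theta else 0


# Count how many of the 2^5 binary input vectors make the neuron fire.
firing = sum(mp_neuron(bits, theta=1) for bits in product((0, 1), repeat=5))
```

Only the all-zero input fails to reach the threshold, so `firing` is 31 out of 32 combinations.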

**8. Which of the following best represents the meaning of the term “Artificial Intelligence”?**

- The ability of a machine to perform tasks that normally require human intelligence
- The ability of a machine to perform simple, repetitive tasks
- The ability of a machine to follow a set of pre-defined rules
- The ability of a machine to communicate with other machines

**Answer :- Click Here**

**9. Which of the following statements is true about error surfaces in deep learning?**

- They are always convex functions.
- They can have multiple local minima.
- They are never continuous.
- They are always linear functions.

**Answer :- **

**10. What is the output of the following MP neuron for the AND Boolean function?**

**y = 1 if x1+x2+x3 ≥ 1, and y = 0 otherwise**

- y=1 for (x1,x2,x3)=(0,1,1)
- y=0 for (x1,x2,x3)=(0,0,1)
- y=1 for (x1,x2,x3)=(1,1,1)
- y=0 for (x1,x2,x3)=(1,0,0)

**Answer :- Click Here**
