NPTEL Deep Learning – IIT Ropar Week 4 Assignment Answers 2024
1. What is the primary benefit of using Adagrad compared to other optimization algorithms?
- It converges faster than other optimization algorithms.
- It is more memory-efficient than other optimization algorithms.
- It is less sensitive to the choice of hyperparameters (learning rate).
- It is less likely to get stuck in local optima than other optimization algorithms.
Answer :-
2. A team has a data set that contains 100 samples for training a feed-forward neural network. Suppose they decide to use the gradient descent algorithm to update the weights. Suppose further that they use a line search algorithm for the learning rate as follows: η = [0.01, 0.1, 1, 2, 10]. How many times do the weights get updated after training the network for 10 epochs? (Note: for each weight update, the loss has to decrease.)
- 100
- 5
- 500
- 10
- 50
Answer :-
3. The figure below shows the change in loss value over iterations. The oscillation in the loss value might be due to:
- Mini-batch gradient descent algorithm used for parameter updates
- Batch gradient descent with constant learning rate algorithm used for parameter updates
- Stochastic gradient descent algorithm used for parameter updates
- Batch gradient descent with line search algorithm used for parameter updates
Answer :-
4. Given data where one column predominantly contains zero values, which algorithm should be used to achieve faster convergence and optimize the loss function?
- Adam
- NAG
- Momentum-based gradient descent
- Stochastic gradient descent
Answer :-
5. In Nesterov accelerated gradient descent, what step is performed before determining the update size?
- Increase the momentum
- Adjust the learning rate
- Decrease the step size
- Estimate the next position of the parameters
Answer :-
6. We have the following functions: x³, ln(x), eˣ, x, and 4. Which of these functions has the steepest slope at x = 1?
- x³
- ln(x)
- eˣ
- 4
Answer :-
7. Which of the following represents the contour plot of the function f(x,y) = x² − y²?
Answer :-
8. Which parameter in vanilla gradient descent determines the step size taken in the direction of the gradient?
- Learning rate
- Momentum
- Gamma
- None of the above
Answer :-
9. Which of the following are among the disadvantages of Adagrad?
- It doesn’t work well for sparse matrices.
- It usually goes past the minima.
- It gets stuck before reaching the minima.
- Weight updates are very small at the initial stages of the algorithm.
Answer :-
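A tiny numeric sketch (hypothetical quadratic objective, assumed textbook Adagrad update) shows why the effective learning rate keeps shrinking: the accumulated squared gradient v only grows, so lr/√(v+ε) can only decay.

```python
import math

# Adagrad sketch on f(w) = (w - 2)^2 (hypothetical objective).
# v accumulates squared gradients, so the effective step shrinks.
lr, eps = 0.1, 1e-8
w, v = 0.0, 0.0
effective_lrs = []
for t in range(20):
    g = 2.0 * (w - 2.0)              # gradient of (w - 2)^2
    v += g * g                       # monotonically non-decreasing
    step = lr / math.sqrt(v + eps)   # effective learning rate
    effective_lrs.append(step)
    w -= step * g
```

Note how this also illustrates slow progress: even after many iterations w creeps toward the minimum because each update is damped by the growing denominator.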
10. Which of the following can help avoid getting stuck in a poor local minimum while training a deep neural network?
- Using a smaller learning rate.
- Using a smaller batch size.
- Using a shallow neural network instead.
- None of the above.
Answer :-
NPTEL Deep Learning – IIT Ropar Week 4 Assignment Answer 2023
1. Which step does Nesterov accelerated gradient descent perform before finding the update size?
- Increase the momentum
- Estimate the next position of the parameters
- Adjust the learning rate
- Decrease the step size
Answer :-
2. Select the parameter of vanilla gradient descent that controls the step size in the direction of the gradient.
- Learning rate
- Momentum
- Gamma
- None of the above
Answer :-
3. What does the distance between two contour lines on a contour map represent?
- The change in the output of function
- The direction of the function
- The rate of change of the function
- None of the above
Answer :-
4. Which of the following represents the contour plot of the function f(x,y) = x² − y?
Answer :-
5. What is the main advantage of using Adagrad over other optimization algorithms?
- It converges faster than other optimization algorithms.
- It is less sensitive to the choice of hyperparameters (learning rate).
- It is more memory-efficient than other optimization algorithms.
- It is less likely to get stuck in local optima than other optimization algorithms.
Answer :-
6. We are training a neural network using the vanilla gradient descent algorithm. We observe that the change in weights is small in successive iterations. What are the possible causes of this phenomenon?
- η is large
- ∇w is small
- ∇w is large
- η is small
Answer :-
7. You are given labeled data, which we call X, where rows are data points and columns are features. One column has most of its values as 0. What algorithm should we use here for faster convergence while achieving the optimal value of the loss function?
- NAG
- Adam
- Stochastic gradient descent
- Momentum-based gradient descent
Answer :-
8. What is the update rule for the ADAM optimizer?
- wt = wt−1 − lr∗(mt/(√vt + ϵ))
- wt = wt−1 − lr∗mt
- wt = wt−1 − lr∗(mt/(vt + ϵ))
- wt = wt−1 − lr∗(vt/(mt + ϵ))
Answer :-
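For reference, the standard Adam update with bias correction can be sketched as follows (β₁, β₂, and lr are typical defaults assumed here, not values given in the question):

```python
import math

# Adam sketch: m/v are the first/second moment estimates; the update is
#   w_t = w_{t-1} - lr * m_hat / (sqrt(v_hat) + eps)
def adam_minimize(grad, w, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, steps=200):
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g          # first moment (mean of gradients)
        v = b2 * v + (1 - b2) * g * g      # second moment (uncentered variance)
        m_hat = m / (1 - b1 ** t)          # bias correction
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Toy usage: minimize (w - 3)^2 starting from w = 0.
w_final = adam_minimize(lambda w: 2.0 * (w - 3.0), w=0.0)
```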
9. What is the advantage of using mini-batch gradient descent over batch gradient descent?
- Mini-batch gradient descent is more computationally efficient than batch gradient descent.
- Mini-batch gradient descent leads to a more accurate estimate of the gradient than batch gradient descent.
- Mini-batch gradient descent gives us a better solution.
- Mini-batch gradient descent can converge faster than batch gradient descent.
Answer :-
10. Which of the following is a variant of gradient descent that uses an estimate of the next gradient to update the current position of the parameters?
- Momentum optimization
- Stochastic gradient descent
- Nesterov accelerated gradient descent
- Adagrad
Answer :-
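The "estimate the next position first" idea behind Nesterov accelerated gradient can be sketched on a toy quadratic (assumed update form; γ and lr are illustrative values):

```python
# NAG sketch: evaluate the gradient at the look-ahead point w - gamma*u,
# i.e. estimate the next position of the parameters before computing the update.
def grad(w):
    return 2.0 * w          # gradient of the toy objective f(w) = w^2

w, u = 5.0, 0.0
gamma, lr = 0.9, 0.1
for _ in range(100):
    lookahead = w - gamma * u            # estimated next position
    u = gamma * u + lr * grad(lookahead) # update size uses look-ahead gradient
    w -= u
```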
NPTEL Deep Learning – IIT Ropar Week 3 Assignment Answer 2023
1. Which of the following statements about backpropagation is true?
- It is used to optimize the weights in a neural network.
- It is used to compute the output of a neural network.
- It is used to initialize the weights in a neural network.
- It is used to regularize the weights in a neural network.
Answer:- a
2. Let y be the true class label and p be the predicted probability of the true class label in a binary classification problem. Which of the following is the correct formula for binary cross entropy?
Answer:- a
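The standard binary cross-entropy formula, L(y, p) = −(y·log p + (1−y)·log(1−p)), can be checked numerically:

```python
import math

# Binary cross entropy in its standard form.
def bce(y, p):
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

confident_right = bce(1, 0.9)   # small loss: confident and correct
confident_wrong = bce(1, 0.1)   # large loss: confident and wrong
```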
3. Let yᵢ be the true class label of the i-th instance and pᵢ be the predicted probability of the true class label in a multi-class classification problem. Write down the formula for the multi-class cross-entropy loss.
Answer:-
4. Can cross-entropy loss be negative between two probability distributions?
- Yes
- No
Answer:-
5. Let p and q be two probability distributions. Under what conditions will the cross entropy between p and q be minimized?
- p = q
- All the values in p are lower than the corresponding values in q
- All the values in p are higher than the corresponding values in q
- p = 0 [0 is a vector]
Answer:-
6. Which of the following is false about cross-entropy loss between two probability distributions?
- It is always in range (0,1).
- It can be negative.
- It is always positive.
- It can be 1.
Answer:-
7. The probability of each of the events x1, x2, x3, …, xn in a system is equal (n > 1). What can you say about the entropy H(X) of that system? (The base of the log is 2.)
- H(X) ≤ 1
- H(X) = 1
- H(X) ≥ 1
- We can’t say anything conclusive with the provided information.
Answer:-
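The entropy of a uniform distribution can be verified directly: for n equally likely events, H(X) = −Σ p·log₂ p = log₂ n, which is at least 1 whenever n ≥ 2.

```python
import math

# Entropy of a uniform distribution over n equally likely events.
def uniform_entropy(n):
    p = 1.0 / n
    return -sum(p * math.log2(p) for _ in range(n))  # equals log2(n)

entropies = {n: uniform_entropy(n) for n in (2, 4, 8)}
```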
8. Suppose we have a problem where data x and label y are related by y = x⁴ + 1. Which of the following is not a good choice for the activation function in the hidden layer if the activation function at the output layer is linear?
- Linear
- ReLU
- Sigmoid
- tan⁻¹(x)
Answer:-
9. We are given that the probability of Event A happening is 0.95 and the probability of Event B happening is 0.05. Which of the following statements is True?
- Event A has a high information content
- Event B has a low information content
- Event A has a low information content
- Event B has a high information content
Answer:-
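Self-information I(x) = −log₂ P(x) makes the likely/rare contrast concrete: a near-certain event carries little information, a rare one carries a lot.

```python
import math

# Self-information: rarer events carry more information.
info_A = -math.log2(0.95)   # likely event  -> low information content
info_B = -math.log2(0.05)   # rare event    -> high information content
```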
10. Which of the following activation functions can only give positive outputs greater than 0?
- Sigmoid
- ReLU
- Tanh
- Linear
Answer:-
NPTEL Deep Learning – IIT Ropar Week 2 Assignment Answer 2023
1. What is the range of the sigmoid function σ(x) = 1/(1 + e⁻ˣ)?
- (−1, 1)
- (0, 1)
- (−∞, ∞)
- (0, ∞)
Answer :-
2. What happens to the output of the sigmoid function as |x| becomes very small?
- The output approaches 0.5
- The output approaches 1.
- The output oscillates between 0 and 1.
- The output becomes undefined.
Answer :-
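The two sigmoid questions above can be checked with a few evaluations (arguments kept moderate so the outputs stay strictly inside (0, 1) in floating point):

```python
import math

# sigma(x) = 1/(1 + exp(-x)) maps any real x into (0, 1):
# approx. 0 for very negative x, exactly 0.5 at x = 0, approx. 1 for large x.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

values = [sigmoid(x) for x in (-20, 0, 20)]
```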
3. Which of the following theorem states that a neural network with a single hidden layer containing a finite number of neurons can approximate any continuous function?
- Bayes’ theorem
- Central limit theorem
- Fourier’s theorem
- Universal approximation theorem
Answer :-
4. We have a function that we want to approximate using 150 rectangles (towers). How many neurons are required to construct the required network?
- 301
- 451
- 150
- 500
Answer :-
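Assuming the tower construction from the universal-approximation argument (each rectangular tower built from two shifted sigmoid neurons, with one output neuron summing the towers — an assumption about the intended construction, not stated in the question), the count is simple arithmetic:

```python
# Tower construction (assumed): 2 hidden neurons per rectangular tower,
# plus 1 output neuron that sums all towers.
towers = 150
hidden_neurons = 2 * towers
output_neurons = 1
total_neurons = hidden_neurons + output_neurons
```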
5. A neural network has an input layer with 2 neurons, two hidden layers with 5 neurons each, and an output layer with 3 neurons. How many weights are there in total? (Do not assume any bias terms in the network.)
Answer :-
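The weight count for a fully connected network without biases is the sum of products of adjacent layer sizes:

```python
# Layer sizes for the question above: input, hidden1, hidden2, output.
layer_sizes = [2, 5, 5, 3]
# Each consecutive pair (a, b) contributes a*b weights: 2*5 + 5*5 + 5*3.
weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
```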
6. What is the derivative of the ReLU activation function with respect to its input at 0?
- 0
- 1
- −1
- Not differentiable
Answer :-
7. Consider the function f(x) = x³ − 3x² + 2. What is the updated value of x after the 3rd iteration of the gradient descent update, if the learning rate is 0.1 and the initial value of x is 4?
Answer :-
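The three updates can be computed mechanically with f′(x) = 3x² − 6x:

```python
# Vanilla gradient descent on f(x) = x^3 - 3x^2 + 2, f'(x) = 3x^2 - 6x.
def f_prime(x):
    return 3 * x**2 - 6 * x

x, lr = 4.0, 0.1
history = [x]
for _ in range(3):
    x -= lr * f_prime(x)   # x <- x - lr * f'(x)
    history.append(x)
# history traces 4.0 -> 1.6 -> 1.792 -> 1.9038208
```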
8. Which of the following statements is true about the representation power of a multilayer network of sigmoid neurons?
- A multilayer network of sigmoid neurons can represent any Boolean function.
- A multilayer network of sigmoid neurons can represent any continuous function.
- A multilayer network of sigmoid neurons can represent any function.
- A multilayer network of sigmoid neurons can represent any linear function.
Answer :-
9. How many boolean functions can be designed for 3 inputs?
- 65,536
- 82
- 56
- 64
Answer :-
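The counting argument behind questions like this can be sketched: with n boolean inputs there are 2ⁿ possible input rows, and each row can independently map to 0 or 1, giving 2^(2ⁿ) distinct functions.

```python
# Number of distinct boolean functions of n inputs: 2 ** (2 ** n).
def num_boolean_functions(n):
    return 2 ** (2 ** n)

counts = {n: num_boolean_functions(n) for n in (2, 3, 4)}
```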
10. How many neurons do you need in the hidden layer of a perceptron to learn any boolean function with 6 inputs? (Only one hidden layer is allowed)
- 16
- 64
- 128
- 32
Answer :-
NPTEL Deep Learning – IIT Ropar Week 1 Assignment Answer 2023
1. The table below shows the temperature and humidity data for two cities. Is the data linearly separable?
- Yes
- No
- Cannot be determined from the given information
Answer :- Yes
2. What is the perceptron algorithm used for?
- Clustering data points
- Finding the shortest path in a graph
- Classifying data
- Solving optimization problems
Answer :- Classifying data
3. What is the most common activation function used in perceptrons?
- Sigmoid
- ReLU
- Tanh
- Step
Answer :-
4. Which of the following Boolean functions cannot be implemented by a perceptron?
- AND
- OR
- XOR
- NOT
Answer :-
5. We are given 4 points in R², say x1 = (0,1), x2 = (−1,−1), x3 = (2,3), x4 = (4,−5). The labels of x1, x2, x3, x4 are given to be −1, 1, −1, 1. We initiate the perceptron algorithm with an initial weight w0 = (0,0) on this data. What will be the value of w0 after the algorithm converges? (Take points in sequential order from x1 to x4; an update happens when the value of the weight changes.)
- (0,0)
- (−2,−2)
- (−2,−3)
- (1,1)
Answer :-
6. We are given the following data:
Can you classify every label correctly by training a perceptron algorithm? (assume bias to be 0 while training)
- Yes
- No
Answer :-
7. Suppose we have a boolean function that takes 5 inputs x1, x2, x3, x4, x5. We have an MP neuron with parameter θ = 1. For how many inputs will this MP neuron give output y = 1?
- 21
- 31
- 30
- 32
Answer :-
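The count can be brute-forced over all 2⁵ boolean input combinations: with θ = 1 the MP neuron fires for every input except the all-zeros one.

```python
from itertools import product

# MP neuron with threshold theta: fires (y = 1) when the input sum >= theta.
theta = 1
firing = [bits for bits in product((0, 1), repeat=5) if sum(bits) >= theta]
# Only (0, 0, 0, 0, 0) fails the threshold, so 2**5 - 1 inputs fire.
```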
8. Which of the following best represents the meaning of term “Artificial Intelligence”?
- The ability of a machine to perform tasks that normally require human intelligence
- The ability of a machine to perform simple, repetitive tasks
- The ability of a machine to follow a set of pre-defined rules
- The ability of a machine to communicate with other machines
Answer :-
9. Which of the following statements is true about error surfaces in deep learning?
- They are always convex functions.
- They can have multiple local minima.
- They are never continuous.
- They are always linear functions.
Answer :-
10. What is the output of the following MP neuron for the AND Boolean function?
y = 1 if x1 + x2 + x3 ≥ 1, and y = 0 otherwise
- y=1 for (x1,x2,x3)=(0,1,1)
- y=0 for (x1,x2,x3)=(0,0,1)
- y=1 for (x1,x2,x3)=(1,1,1)
- y=0 for (x1,x2,x3)=(1,0,0)
Answer :-