NPTEL Deep Learning – IIT Ropar Week 4 Assignment Answers 2024
1. What is the primary benefit of using Adagrad compared to other optimization algorithms?
- It converges faster than other optimization algorithms.
- It is more memory-efficient than other optimization algorithms.
- It is less sensitive to the choice of hyperparameters (learning rate).
- It is less likely to get stuck in local optima than other optimization algorithms.
Answer :-
2. A team has a data set that contains 100 samples for training a feed-forward neural network. Suppose they decide to use the gradient descent algorithm to update the weights. Suppose further that they use a line search algorithm for the learning rate as follows: η = [0.01, 0.1, 1, 2, 10]. How many times do the weights get updated after training the network for 10 epochs? (Note: for each weight update, the loss has to decrease.)
- 100
- 5
- 500
- 10
- 50
Answer :-
3. The figure below shows the change in loss value over iterations. The oscillation in the loss value might be due to:
- Mini-batch gradient descent algorithm used for parameter updates
- Batch gradient descent with constant learning rate algorithm used for parameter updates
- Stochastic gradient descent algorithm used for parameter updates
- Batch gradient descent with line search algorithm used for parameter updates
Answer :-
4. Given data where one column predominantly contains zero values, which algorithm should be used to achieve faster convergence and optimize the loss function?
- Adam
- NAG
- Momentum-based gradient descent
- Stochastic gradient descent
Answer :-
5. In Nesterov accelerated gradient descent, what step is performed before determining the update size?
- Increase the momentum
- Adjust the learning rate
- Decrease the step size
- Estimate the next position of the parameters
Answer :-
6. We have the following functions: x³, ln(x), eˣ, x, and 4. Which of these functions has the steepest slope at x = 1?
- x³
- ln(x)
- eˣ
- 4
Answer :-
7. Which of the following represents the contour plot of the function f(x,y) = x² − y²?
Answer :-
8. Which parameter in vanilla gradient descent determines the step size taken in the direction of the gradient?
- Learning rate
- Momentum
- Gamma
- None of the above
Answer :-
9. Which of the following are among the disadvantages of Adagrad?
- It doesn’t work well for sparse matrices.
- It usually goes past the minima.
- It gets stuck before reaching the minima.
- Weight updates are very small at the initial stages of the algorithm.
Answer :-
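A tiny numeric sketch (hypothetical quadratic objective, assumed textbook Adagrad update) shows why the effective learning rate keeps shrinking: the accumulated squared gradient v only grows, so lr/√(v+ε) can only decay.

```python
import math

# Adagrad sketch on f(w) = (w - 2)^2 (hypothetical objective).
# v accumulates squared gradients, so the effective step shrinks.
lr, eps = 0.1, 1e-8
w, v = 0.0, 0.0
effective_lrs = []
for t in range(20):
    g = 2.0 * (w - 2.0)              # gradient of (w - 2)^2
    v += g * g                       # monotonically non-decreasing
    step = lr / math.sqrt(v + eps)   # effective learning rate
    effective_lrs.append(step)
    w -= step * g
```

Note how this also illustrates slow progress: even after many iterations w creeps toward the minimum because each update is damped by the growing denominator.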
10. Which of the following can help avoid getting stuck in a poor local minimum while training a deep neural network?
- Using a smaller learning rate.
- Using a smaller batch size.
- Using a shallow neural network instead.
- None of the above.
Answer :-
NPTEL Deep Learning – IIT Ropar Week 4 Assignment Answer 2023
1. Which step does Nesterov accelerated gradient descent perform before finding the update size?
- Increase the momentum
- Estimate the next position of the parameters
- Adjust the learning rate
- Decrease the step size
Answer :-
2. Select the parameter of vanilla gradient descent that controls the step size in the direction of the gradient.
- Learning rate
- Momentum
- Gamma
- None of the above
Answer :-
3. What does the distance between two contour lines on a contour map represent?
- The change in the output of function
- The direction of the function
- The rate of change of the function
- None of the above
Answer :-
4. Which of the following represents the contour plot of the function f(x,y) = x² − y?
Answer :-
5. What is the main advantage of using Adagrad over other optimization algorithms?
- It converges faster than other optimization algorithms.
- It is less sensitive to the choice of hyperparameters (learning rate).
- It is more memory-efficient than other optimization algorithms.
- It is less likely to get stuck in local optima than other optimization algorithms.
Answer :-
6. We are training a neural network using the vanilla gradient descent algorithm. We observe that the change in weights is small in successive iterations. What are the possible causes of this phenomenon?
- η is large
- ∇w is small
- ∇w is large
- η is small
Answer :-
7. You are given labeled data, which we call X, where rows are data points and columns are features. One column has most of its values as 0. What algorithm should we use here for faster convergence while achieving the optimal value of the loss function?
- NAG
- Adam
- Stochastic gradient descent
- Momentum-based gradient descent
Answer :-
8. What is the update rule for the ADAM optimizer?
- wt = wt−1 − lr∗(mt/(√vt + ϵ))
- wt = wt−1 − lr∗mt
- wt = wt−1 − lr∗(mt/(vt + ϵ))
- wt = wt−1 − lr∗(vt/(mt + ϵ))
Answer :-
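For reference, the standard Adam update with bias correction can be sketched as follows (β₁, β₂, and lr are typical defaults assumed here, not values given in the question):

```python
import math

# Adam sketch: m/v are the first/second moment estimates; the update is
#   w_t = w_{t-1} - lr * m_hat / (sqrt(v_hat) + eps)
def adam_minimize(grad, w, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, steps=200):
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g          # first moment (mean of gradients)
        v = b2 * v + (1 - b2) * g * g      # second moment (uncentered variance)
        m_hat = m / (1 - b1 ** t)          # bias correction
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Toy usage: minimize (w - 3)^2 starting from w = 0.
w_final = adam_minimize(lambda w: 2.0 * (w - 3.0), w=0.0)
```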
9. What is the advantage of using mini-batch gradient descent over batch gradient descent?
- Mini-batch gradient descent is more computationally efficient than batch gradient descent.
- Mini-batch gradient descent leads to a more accurate estimate of the gradient than batch gradient descent.
- Mini-batch gradient descent gives us a better solution.
- Mini-batch gradient descent can converge faster than batch gradient descent.
Answer :-
10. Which of the following is a variant of gradient descent that uses an estimate of the next gradient to update the current position of the parameters?
- Momentum optimization
- Stochastic gradient descent
- Nesterov accelerated gradient descent
- Adagrad
Answer :-
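The "estimate the next position first" idea behind Nesterov accelerated gradient can be sketched on a toy quadratic (assumed update form; γ and lr are illustrative values):

```python
# NAG sketch: evaluate the gradient at the look-ahead point w - gamma*u,
# i.e. estimate the next position of the parameters before computing the update.
def grad(w):
    return 2.0 * w          # gradient of the toy objective f(w) = w^2

w, u = 5.0, 0.0
gamma, lr = 0.9, 0.1
for _ in range(100):
    lookahead = w - gamma * u            # estimated next position
    u = gamma * u + lr * grad(lookahead) # update size uses look-ahead gradient
    w -= u
```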
NPTEL Deep Learning – IIT Ropar Week 3 Assignment Answer 2023
1. Which of the following statements about backpropagation is true?
- It is used to optimize the weights in a neural network.
- It is used to compute the output of a neural network.
- It is used to initialize the weights in a neural network.
- It is used to regularize the weights in a neural network.
Answer:- a
2. Let y be the true class label and p be the predicted probability of the true class label in a binary classification problem. Which of the following is the correct formula for binary cross entropy?
Answer:- a
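The standard binary cross-entropy formula, L(y, p) = −(y·log p + (1−y)·log(1−p)), can be checked numerically:

```python
import math

# Binary cross entropy in its standard form.
def bce(y, p):
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

confident_right = bce(1, 0.9)   # small loss: confident and correct
confident_wrong = bce(1, 0.1)   # large loss: confident and wrong
```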
3. Let yᵢ be the true class label of the i-th instance and pᵢ be the predicted probability of the true class label in a multi-class classification problem. Write down the formula for the multi-class cross-entropy loss.
Answer:-
4. Can cross-entropy loss be negative between two probability distributions?
- Yes
- No
Answer:-
5. Let p and q be two probability distributions. Under what conditions will the cross entropy between p and q be minimized?
- p = q
- All the values in p are lower than the corresponding values in q
- All the values in p are higher than the corresponding values in q
- p = 0 [0 is a vector]
Answer:-
6. Which of the following is false about cross-entropy loss between two probability distributions?
- It is always in range (0,1).
- It can be negative.
- It is always positive.
- It can be 1.
Answer:-
7. The probability of each of the events x1, x2, x3, …, xn in a system is equal (n > 1). What can you say about the entropy H(X) of that system? (The base of the log is 2.)
- H(X) ≤ 1
- H(X) = 1
- H(X) ≥ 1
- We can’t say anything conclusive with the provided information.
Answer:-
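The entropy of a uniform distribution can be verified directly: for n equally likely events, H(X) = −Σ p·log₂ p = log₂ n, which is at least 1 whenever n ≥ 2.

```python
import math

# Entropy of a uniform distribution over n equally likely events.
def uniform_entropy(n):
    p = 1.0 / n
    return -sum(p * math.log2(p) for _ in range(n))  # equals log2(n)

entropies = {n: uniform_entropy(n) for n in (2, 4, 8)}
```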
8. Suppose we have a problem where data x and label y are related by y = x⁴ + 1. Which of the following is not a good choice for the activation function in the hidden layer if the activation function at the output layer is linear?
- Linear
- ReLU
- Sigmoid
- tan⁻¹(x)
Answer:-
9. We are given that the probability of Event A happening is 0.95 and the probability of Event B happening is 0.05. Which of the following statements is True?
- Event A has a high information content
- Event B has a low information content
- Event A has a low information content
- Event B has a high information content
Answer:-
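Self-information I(x) = −log₂ P(x) makes the likely/rare contrast concrete: a near-certain event carries little information, a rare one carries a lot.

```python
import math

# Self-information: rarer events carry more information.
info_A = -math.log2(0.95)   # likely event  -> low information content
info_B = -math.log2(0.05)   # rare event    -> high information content
```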
10. Which of the following activation functions can only give positive outputs greater than 0?
- Sigmoid
- ReLU
- Tanh
- Linear
Answer:-
NPTEL Deep Learning – IIT Ropar Week 2 Assignment Answer 2023
1. What is the range of the sigmoid function σ(x) = 1/(1 + e⁻ˣ)?
- (−1, 1)
- (0, 1)
- (−∞, ∞)
- (0, ∞)
Answer :-
2. What happens to the output of the sigmoid function as |x| becomes very small?
- The output approaches 0.5
- The output approaches 1.
- The output oscillates between 0 and 1.
- The output becomes undefined.
Answer :-
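The two sigmoid questions above can be checked with a few evaluations (arguments kept moderate so the outputs stay strictly inside (0, 1) in floating point):

```python
import math

# sigma(x) = 1/(1 + exp(-x)) maps any real x into (0, 1):
# approx. 0 for very negative x, exactly 0.5 at x = 0, approx. 1 for large x.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

values = [sigmoid(x) for x in (-20, 0, 20)]
```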
3. Which of the following theorem states that a neural network with a single hidden layer containing a finite number of neurons can approximate any continuous function?
- Bayes’ theorem
- Central limit theorem
- Fourier’s theorem
- Universal approximation theorem
Answer :-
4. We have a function that we want to approximate using 150 rectangles (towers). How many neurons are required to construct the required network?
- 301
- 451
- 150
- 500
Answer :-
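Assuming the tower construction from the universal-approximation argument (each rectangular tower built from two shifted sigmoid neurons, with one output neuron summing the towers — an assumption about the intended construction, not stated in the question), the count is simple arithmetic:

```python
# Tower construction (assumed): 2 hidden neurons per rectangular tower,
# plus 1 output neuron that sums all towers.
towers = 150
hidden_neurons = 2 * towers
output_neurons = 1
total_neurons = hidden_neurons + output_neurons
```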
5. A neural network has an input layer with 2 neurons, two hidden layers with 5 neurons each, and an output layer with 3 neurons. How many weights are there in total? (Do not assume any bias terms in the network.)
Answer :-
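The weight count for a fully connected network without biases is the sum of products of adjacent layer sizes:

```python
# Layer sizes for the question above: input, hidden1, hidden2, output.
layer_sizes = [2, 5, 5, 3]
# Each consecutive pair (a, b) contributes a*b weights: 2*5 + 5*5 + 5*3.
weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
```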
6. What is the derivative of the ReLU activation function with respect to its input at 0?
- 0
- 1
- −1
- Not differentiable
Answer :-
7. Consider the function f(x) = x³ − 3x² + 2. What is the updated value of x after the 3rd iteration of the gradient descent update, if the learning rate is 0.1 and the initial value of x is 4?
Answer :-
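The three updates can be computed mechanically with f′(x) = 3x² − 6x:

```python
# Vanilla gradient descent on f(x) = x^3 - 3x^2 + 2, f'(x) = 3x^2 - 6x.
def f_prime(x):
    return 3 * x**2 - 6 * x

x, lr = 4.0, 0.1
history = [x]
for _ in range(3):
    x -= lr * f_prime(x)   # x <- x - lr * f'(x)
    history.append(x)
# history traces 4.0 -> 1.6 -> 1.792 -> 1.9038208
```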
8. Which of the following statements is true about the representation power of a multilayer network of sigmoid neurons?
- A multilayer network of sigmoid neurons can represent any Boolean function.
- A multilayer network of sigmoid neurons can represent any continuous function.
- A multilayer network of sigmoid neurons can represent any function.
- A multilayer network of sigmoid neurons can represent any linear function.
Answer :-
9. How many boolean functions can be designed for 3 inputs?
- 65,536
- 82
- 56
- 64
Answer :-
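The counting argument behind questions like this can be sketched: with n boolean inputs there are 2ⁿ possible input rows, and each row can independently map to 0 or 1, giving 2^(2ⁿ) distinct functions.

```python
# Number of distinct boolean functions of n inputs: 2 ** (2 ** n).
def num_boolean_functions(n):
    return 2 ** (2 ** n)

counts = {n: num_boolean_functions(n) for n in (2, 3, 4)}
```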
10. How many neurons do you need in the hidden layer of a perceptron to learn any boolean function with 6 inputs? (Only one hidden layer is allowed)
- 16
- 64
- 128
- 32
Answer :-
NPTEL Deep Learning – IIT Ropar Week 1 Assignment Answer 2023
1. The table below shows the temperature and humidity data for two cities. Is the data linearly separable?
- Yes
- No
- Cannot be determined from the given information
Answer :- Yes
2. What is the perceptron algorithm used for?
- Clustering data points
- Finding the shortest path in a graph
- Classifying data
- Solving optimization problems
Answer :- Classifying data
3. What is the most common activation function used in perceptrons?
- Sigmoid
- ReLU
- Tanh
- Step
Answer :-
4. Which of the following Boolean functions cannot be implemented by a perceptron?
- AND
- OR
- XOR
- NOT
Answer :-
5. We are given 4 points in R², say x1 = (0,1), x2 = (−1,−1), x3 = (2,3), x4 = (4,−5). The labels of x1, x2, x3, x4 are given to be −1, 1, −1, 1. We initiate the perceptron algorithm with an initial weight w0 = (0,0) on this data. What will be the value of w0 after the algorithm converges? (Take points in sequential order from x1 to x4; an update happens when the value of the weight changes.)
- (0,0)
- (−2,−2)
- (−2,−3)
- (1,1)
Answer :-
6. We are given the following data:
Can you classify every label correctly by training a perceptron algorithm? (assume bias to be 0 while training)
- Yes
- No
Answer :-
7. Suppose we have a boolean function that takes 5 inputs x1, x2, x3, x4, x5. We have an MP neuron with parameter θ = 1. For how many inputs will this MP neuron give output y = 1?
- 21
- 31
- 30
- 32
Answer :-
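The count can be brute-forced over all 2⁵ boolean input combinations: with θ = 1 the MP neuron fires for every input except the all-zeros one.

```python
from itertools import product

# MP neuron with threshold theta: fires (y = 1) when the input sum >= theta.
theta = 1
firing = [bits for bits in product((0, 1), repeat=5) if sum(bits) >= theta]
# Only (0, 0, 0, 0, 0) fails the threshold, so 2**5 - 1 inputs fire.
```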
8. Which of the following best represents the meaning of term “Artificial Intelligence”?
- The ability of a machine to perform tasks that normally require human intelligence
- The ability of a machine to perform simple, repetitive tasks
- The ability of a machine to follow a set of pre-defined rules
- The ability of a machine to communicate with other machines
Answer :-
9. Which of the following statements is true about error surfaces in deep learning?
- They are always convex functions.
- They can have multiple local minima.
- They are never continuous.
- They are always linear functions.
Answer :-
10. What is the output of the following MP neuron for the AND Boolean function?
y = 1 if x1 + x2 + x3 ≥ 1, and y = 0 otherwise
- y=1 for (x1,x2,x3)=(0,1,1)
- y=0 for (x1,x2,x3)=(0,0,1)
- y=1 for (x1,x2,x3)=(1,1,1)
- y=0 for (x1,x2,x3)=(1,0,0)
Answer :-