NPTEL Deep Learning for Computer Vision Week 10 Assignment Answers 2024
1. Why might Segment Anything (SAM) be particularly useful in data annotation tasks compared to traditional segmentation models?
- It produces perfect segmentation masks.
- It can adapt to segment any object, even those not seen during training.
- It requires less computational resources
- It automatically labels all objects in an image without user input
Answer :-
2. What is the primary advantage of using DETR (Detection Transformer) over traditional object detection methods?
- DETR uses an anchor-based approach, which simplifies the object localization process.
- DETR eliminates the need for region proposals and anchor boxes, simplifying the object detection pipeline.
- DETR requires significantly fewer training epochs compared to traditional methods.
- DETR can only detect objects in high-resolution images due to its reliance on self-attention mechanisms.
Answer :-
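As a sketch of the set-prediction idea behind question 2: DETR matches its fixed set of predictions to ground-truth objects by solving a bipartite matching over a cost matrix (it uses the Hungarian algorithm; the toy cost values below are made up for illustration, and brute force suffices at this size).

```python
import itertools
import numpy as np

# Toy matching cost: rows = 3 predicted boxes, cols = 3 ground-truth objects.
cost = np.array([[0.9, 0.1, 0.8],
                 [0.2, 0.7, 0.9],
                 [0.8, 0.9, 0.1]])

# Find the one-to-one assignment of predictions to ground truths that
# minimizes total cost; p[i] is the ground truth matched to prediction i.
best = min(itertools.permutations(range(3)),
           key=lambda p: sum(cost[i, p[i]] for i in range(3)))
print(best)  # (1, 0, 2)
```

Because every prediction is matched (or assigned to "no object"), no anchor boxes or region proposals are needed.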
3. How does the patch size in a Vision Transformer impact performance?
- Smaller patch sizes lead to better local feature extraction but increase computational cost.
- Larger patch sizes always improve model performance.
- Smaller patch sizes are computationally cheaper but may miss global context.
- Patch size has no significant impact on model performance.
Answer :-
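A quick sanity check on the tradeoff in question 3: at a fixed input resolution, halving the patch size quadruples the number of tokens, and self-attention cost grows roughly with the square of the token count. Assuming a 224x224 input (a common ViT setting):

```python
# Number of patch tokens in a ViT for a square image, ignoring the class token.
def vit_tokens(image_size=224, patch_size=16):
    return (image_size // patch_size) ** 2

for p in (8, 16, 32):
    n = vit_tokens(patch_size=p)
    # pairwise attention scales ~ n^2, so smaller patches cost far more
    print(f"patch {p:>2}: {n:>4} tokens, ~{n * n:>7} attention pairs")
```

Smaller patches capture finer local detail at sharply higher compute; larger patches are cheap but coarse.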
4. What is a key characteristic of the Swin Transformer that differentiates it from the standard Vision Transformer (ViT)?
- Swin Transformer uses global attention throughout the entire image for every layer.
- Swin Transformer employs a hierarchical structure with shifted windows for local attention, allowing it to scale to larger images.
- Swin Transformer is designed exclusively for small image resolutions.
- Swin Transformer eliminates the use of multi-head self-attention in favor of convolutional operations.
Answer :-
5. What is the purpose of the class token in a Vision Transformer?
- It encodes the position of each image patch.
- It serves as the representation of the entire image, which is used for classification.
- It performs the same function as a softmax layer in traditional neural networks.
- It stores the output of each transformer layer.
Answer :-
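A minimal sketch of the class token in question 5, with made-up shapes (196 patch embeddings of dimension 64): the learnable class token is prepended to the patch embeddings, and after the transformer layers its output serves as the whole-image representation for the classification head.

```python
import numpy as np

patches = np.random.randn(196, 64)  # patch embeddings (illustrative values)
cls_token = np.zeros((1, 64))       # learnable in practice; zeros here for illustration
tokens = np.concatenate([cls_token, patches], axis=0)  # class token prepended
print(tokens.shape)  # (197, 64); tokens[0] would feed the classification head
```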
6. Why do Vision Transformers often require large datasets for effective training?
- They are inherently more data-efficient than CNNs.
- They lack the inductive biases of convolutions, making them more reliant on data to learn structure.
- Their self-attention mechanism directly reduces the need for large datasets.
- They can overfit more easily without large datasets.
Answer :-
7. What is the primary challenge when training GANs?
- Maximizing the discriminator loss.
- Ensuring the generator and discriminator learn in balance.
- Training the generator faster than the discriminator.
- Reducing the number of parameters in the generator.
Answer :-
8. Which of the following best describes “mode collapse” in GANs?
- The discriminator becoming too powerful.
- The generator producing a limited variety of outputs.
- The loss function of the discriminator diverging.
- The generator generating random noise instead of real-like data.
Answer :-
9. What is the role of the latent space in a VAE?
- It stores the compressed data.
- It stores real-valued outputs of the decoder.
- It represents the error between the input and output.
- It captures a distribution of latent variables for data generation.
Answer :-
10. Which of the following statements are false? (Select all that apply)
- Generative adversarial networks (GANs) generate sharper images compared to Variational AutoEncoders (VAE)
- GAN is an example of an implicit density estimation model
- Fully connected layers in mapping network of Style-GAN do not change the dimension of its input
- The generator and discriminator are always trained together in a GAN
Answer :-
11. What are the capabilities of these models?
Models:
1) Discriminative model
2) Generative model
3) Conditional generative model
Capabilities:
i) Assigns labels to data; performs supervised feature learning
ii) Assigns labels while rejecting outliers; generates new data conditioned on input labels
iii) Detects outliers; performs unsupervised feature learning; samples to generate new data
- 1→iii, 2→i, 3→ii
- 1→i, 2→iii, 3→ii
- 1→ii, 2→iii, 3→i
- 1→i, 2→ii, 3→iii
Answer :-
In a VAE, the encoder outputs μ = [0.3, 0.1, 0.2, 0.4] and σ = [0.1, 0.4, 0.2, 0.3], and ε sampled from N(0, I) is [0.6, 0.2, 0.4, 0.1]. What is the latent value z passed to the decoder? (Questions 12–15)
12. Element 1:______________
Answer :-
13. Element 2:______________
Answer :-
14. Element 3:_______________
Answer :-
15. Element 4:______________
Answer :-
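The blanks above follow from the reparameterization trick used in VAEs, z = μ + σ ⊙ ε, applied element-wise to the values given in the question:

```python
import numpy as np

mu = np.array([0.3, 0.1, 0.2, 0.4])
sigma = np.array([0.1, 0.4, 0.2, 0.3])
eps = np.array([0.6, 0.2, 0.4, 0.1])

# Reparameterization trick: sample z deterministically from mu, sigma, and eps
# so that gradients can flow through the sampling step.
z = mu + sigma * eps
print(z)  # [0.36 0.18 0.28 0.43]
```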