NPTEL Reinforcement Learning Week 11 Assignment Answers 2024

By Sanket

1. Which of the following options is correct for sub-task terminations in the MAXQ framework?

  • The termination is stochastic
  • The termination is deterministic
Answer :-

2. In MAXQ learning, we have a collection of SMDPs. In a conventional value function, the only argument is the state. In the MAXQ value function decomposition, the value function has the form Vπ(i,s), where π is the policy and s is the current state. What is ′i′ supposed to be in this notation?

  • The number of times we have visited state s
  • It means it is ith iteration of updates
  • i is the identity of the sub-task/SMDP.
  • None of the above.
Answer :-

Common setup for questions 3 to 6
Consider the following taxi-world problem. The grey cells are inaccessible and can be thought of as obstacles. The corner cells marked R, G, B, Y are the allowed pickup and drop-off points for passengers.

[Figure: taxi-world grid with pickup/drop-off points R, G, B, Y]

Suppose the following is the call graph for the above taxi-world problem.

[Figure: call graph for the taxi-world problem]

3. From the below list of actions:
i Left
ii Drop off
iii Navigate
iv put-down
Which among them are the primitive actions?

  • i, ii, iii, iv
  • ii, iii
  • i, iv
  • None of the above
Answer :- 

4. In the class discussion, it was said that Navigate is not a single sub-task. What is the parameter ′t′ in ′Navigate(t)′?

  • the number of times ’Pick up’ or ’Drop off’ have called sub-task Navigate
  • the maximum number of primitive actions permitted to finish sub-task
  • the destination (in this case, one of R, G, B, Y)
  • None of the above
Answer :- 

5. State True/False. The ordering in the above call graph is important, and sub-tasks must be performed in that order.

  • True
  • False
Answer :- 

6. Suppose the passenger is always either inside the taxi or at one of the four pickup/drop-off locations, so there are 5 possible values for the passenger’s location. For the given taxi-world, how many states suffice to capture all the information?

  • 18
  • 18*5
  • 18*5*4
  • None of the above
Answer :-
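The counting argument behind question 6 can be sketched by enumerating a factored state space. This is only an illustration under stated assumptions: the figure of 18 accessible cells is taken from the answer options rather than counted from the grid, and the passenger's destination is assumed to be part of the state, as in the standard taxi formulation. The variable names are hypothetical.

```python
from itertools import product

# Assumption: 18 accessible taxi cells (taken from the answer options,
# not counted from the grid figure).
N_ACCESSIBLE_CELLS = 18

# Passenger is at one of the four corners or inside the taxi (5 values).
passenger_locations = ["R", "G", "B", "Y", "in_taxi"]

# Assumption: the destination is one of the four corners and is part of the state.
destinations = ["R", "G", "B", "Y"]

# Each state is a (taxi_cell, passenger_location, destination) triple.
states = list(product(range(N_ACCESSIBLE_CELLS), passenger_locations, destinations))

num_states = len(states)
print(num_states)
```

Dropping the destination from the state would leave only the first two factors, which is why the choice of what counts as "all the information" matters here.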

7. State True/False. Bottlenecks are useful surrogate measures for option discovery.

  • True
  • False
Answer :- 

8. Which of the following can be considered as a good option in Hierarchical RL?

  • An option that can be reused often
  • An option that can cut down exploration
  • An option that helps in transfer learning
  • None of the above
Answer :- 

9. We define the action value for MAXQ as qπ(i,s,a) = vπ(a,s) + Cπ(i,s,a), where qπ(i,s,a) can be interpreted as the expected return when you are in sub-task i and state s and decide to perform sub-task a. Assume that executing a yields reward r1, and that after a completes you receive reward r2 while completing sub-task i. Choose the correct value of Cπ(i,s,a) from the following.

  • Cπ(i,s,a)=r2
  • Cπ(i,s,a)=r1+r2
  • Cπ(i,s,a)=r1
  • None of the above
Answer :- 
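The decomposition in question 9 can be sketched numerically. This is a minimal illustration, assuming a single deterministic episode with no discounting, so that expectations reduce to the observed rewards. The function name and the reward values are hypothetical.

```python
def maxq_action_value(v_child: float, completion: float) -> float:
    """Return q(i, s, a) = v(a, s) + C(i, s, a).

    v_child:    return collected while executing child sub-task a from state s
    completion: return collected after a finishes, until parent sub-task i ends
    """
    return v_child + completion


# Hypothetical rewards for illustration only.
r1 = 3.0  # reward obtained while performing sub-task a
r2 = 5.0  # reward obtained after a completes, while finishing sub-task i

v_a = r1      # v(a, s): value of the child sub-task itself
c_i = r2      # C(i, s, a): completion term for the parent sub-task
q_i = maxq_action_value(v_a, c_i)

print(q_i)
```

Under these assumptions, the total return r1 + r2 splits into the part earned inside a and the part earned afterwards inside i.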

10. In the MAXQ approach to solving a problem, suppose that sub-task Mi invokes sub-task Mj. Do the pseudo-rewards of Mj have any effect on sub-task Mi?

  • Yes
  • No
Answer :-