NPTEL Reinforcement Learning Week 11 Assignment Answers 2024
2. Which of the following options is correct for sub-task termination in the MAXQ framework?
- The termination is stochastic
- The termination is deterministic
Answer :-
2. In MAXQ learning, we have a collection of SMDPs. In the conventional value function, the only argument was the state. In the MAXQ value function decomposition, we have a value function of the form Vπ(i,s), where π is the policy and s is the current state. What is 'i' supposed to be in the above notation?
- The number of times we have visited state s
- It means it is ith iteration of updates
- i is the identity of the sub-task/SMDP.
- None of the above.
Answer :-
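To make the extra argument in Vπ(i,s) concrete, here is a minimal sketch: each sub-task/SMDP i carries its own value function over states, so the table is keyed by the pair (sub-task, state). The sub-task and state names below are illustrative placeholders, not part of any library API.

```python
# A minimal sketch of what the argument i means in V(i, s):
# each sub-task/SMDP i has its own value function over states.
from collections import defaultdict

V = defaultdict(float)  # V[(i, s)]: value of state s under sub-task i

# Hypothetical entries for the taxi example discussed below.
V[("Navigate", "cell_3")] = -2.0  # value of cell_3 within the Navigate sub-task
V[("Root", "cell_3")] = 7.5       # value of the same state within the Root task

# The same state can have different values under different sub-tasks,
# which is why the state alone is not enough as an argument.
print(V[("Navigate", "cell_3")], V[("Root", "cell_3")])
```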
Common setting for questions 3 to 6
Consider the following taxi-world problem. The grey-colored cells are inaccessible and can be thought of as obstacles. The corner cells marked R, G, B, Y are the allowed pickup/drop-off points for passengers.
Say the following is the call-graph for the above taxi-world problem.
3. From the below list of actions:
i Left
ii Drop off
iii Navigate
iv put-down
Which among them are the primitive actions?
- i, ii, iii, iv
- ii, iii
- i, iv
- None of the above
Answer :-
4. From the discussion in class, it is said that Navigate is not a single sub-task. What is the parameter 't' in 'Navigate(t)' from the class discussion?
- the number of times ’Pick up’ or ’Drop off’ have called sub-task Navigate
- the maximum number of primitive actions permitted to finish sub-task
- the destination (in this case, one of R, G, B, Y)
- None of the above
Answer :-
5. State True/False. The ordering in the above call-graph is important, and sub-tasks should be performed in that order.
- True
- False
Answer :-
6. Suppose the passenger is always either inside the taxi or at one of the four pickup/drop-off locations. That means there are 5 states for the passenger's location. Then, for the given taxi-world, what is the number of states that suffices to capture all the information?
- 18
- 18*5
- 18*5*4
- None of the above
Answer :-
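The state count in question 6 is a straightforward product of the independent state variables. The sketch below assumes the grid from the problem statement has 18 accessible cells (the figure itself is not reproduced here); the other two factors come directly from the question.

```python
# State-space size for the taxi-world in question 6 (a sketch; the
# 18 accessible cells are assumed from the problem's grid figure).
taxi_cells = 18          # accessible cells the taxi can occupy
passenger_locations = 5  # R, G, B, Y, or inside the taxi
destinations = 4         # R, G, B, or Y

num_states = taxi_cells * passenger_locations * destinations
print(num_states)  # 360, i.e. the 18*5*4 option
```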
7. State True/False. Bottlenecks are useful surrogate measures for option discovery.
- True
- False
Answer :-
8. Which of the following can be considered as a good option in Hierarchical RL?
- An option that can be reused often
- An option that can cut down exploration
- An option that helps in transfer learning
- None of the above
Answer :-
9. We define the action value for MAXQ as qπ(i,s,a) = vπ(a,s) + Cπ(i,s,a), where qπ(i,s,a) can be interpreted as the expected return when you are in sub-task i and state s and decide to perform sub-task a. Assume that in taking a you get reward r1, and after the completion of a you get reward r2 in completing sub-task i. Choose the correct value of Cπ(i,s,a) from the following.
- Cπ(i,s,a)=r2
- Cπ(i,s,a)=r1+r2
- Cπ(i,s,a)=r1
- None of the above
Answer :-
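The decomposition in question 9 can be sketched as a small lookup: the value of performing sub-task a inside sub-task i splits into the return earned while executing a (the vπ(a,s) term) plus the completion value earned afterwards (the Cπ(i,s,a) term). The dictionaries and the sub-task names below are hypothetical illustrations, not a library API.

```python
# A minimal sketch of the MAXQ action-value decomposition
# q(i, s, a) = V(a, s) + C(i, s, a). V and C are hypothetical tables.
V = {}  # V[(a, s)]: expected return while completing sub-task a from state s
C = {}  # C[(i, s, a)]: expected return after a finishes, until sub-task i finishes

def q(i, s, a):
    """Expected return in sub-task i, state s, when choosing sub-task a."""
    return V.get((a, s), 0.0) + C.get((i, s, a), 0.0)

# Illustrative numbers: within Root, choosing Navigate from state 's0'.
V[("Navigate", "s0")] = -1.0          # reward r1 earned while executing a
C[("Root", "s0", "Navigate")] = 5.0   # reward r2 earned completing i after a
print(q("Root", "s0", "Navigate"))    # -1.0 + 5.0 = 4.0
```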
10. In the MAXQ approach to solving a problem, suppose that sub-task Mi invokes sub-task Mj. Do the pseudo-rewards of Mj have any effect on sub-task Mi?
- Yes
- No
Answer :-