NPTEL Reinforcement Learning Week 11 Assignment Answers 2024
2. Which of the following options is correct for sub-task termination in the MAXQ framework?
- The termination is stochastic
- The termination is deterministic
Answer :-
2. In MAXQ learning, we have a collection of SMDPs. In the conventional value function, the only argument was the state. In the MAXQ value function decomposition, we have a value function of the form Vπ(i,s), where π is the policy and s is the current state. What is 'i' supposed to be in the above notation?
- The number of times we have visited state s
- It means it is ith iteration of updates
- i is the identity of the sub-task/SMDP.
- None of the above.
Answer :-
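To make the extra argument in Vπ(i,s) concrete, here is a minimal sketch: each sub-task/SMDP i carries its own value function over states, so the table is keyed by the pair (sub-task, state). The sub-task and state names below are illustrative placeholders, not part of any library API.

```python
# A minimal sketch of what the argument i means in V(i, s):
# each sub-task/SMDP i has its own value function over states.
from collections import defaultdict

V = defaultdict(float)  # V[(i, s)]: value of state s under sub-task i

# Hypothetical entries for the taxi example discussed below.
V[("Navigate", "cell_3")] = -2.0  # value of cell_3 within the Navigate sub-task
V[("Root", "cell_3")] = 7.5       # value of the same state within the Root task

# The same state can have different values under different sub-tasks,
# which is why the state alone is not enough as an argument.
print(V[("Navigate", "cell_3")], V[("Root", "cell_3")])
```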
Common setting for questions 3 to 6
Consider the following taxi-world problem. The grey-colored cells are inaccessible and can be thought of as obstacles. The corner cells marked R, G, B, Y are the allowed pickup/drop-off points for passengers.
Say the following is the call-graph for the above taxi-world problem.
3. From the below list of actions:
i Left
ii Drop off
iii Navigate
iv put-down
Which among them are the primitive actions?
- i, ii, iii, iv
- ii, iii
- i, iv
- None of the above
Answer :-
4. From the discussion in class, it is said that Navigate is not a single sub-task. What is the parameter 't' in 'Navigate(t)' from the class discussion?
- the number of times ’Pick up’ or ’Drop off’ have called sub-task Navigate
- the maximum number of primitive actions permitted to finish sub-task
- the destination (in this case, one of R, G, B, Y)
- None of the above
Answer :-
5. State True/False. The ordering in the above call-graph is important, and sub-tasks should be performed in that order.
- True
- False
Answer :-
6. Suppose the passenger is always either inside the taxi or at one of the four pickup/drop-off locations. That means there are 5 states for the passenger's location. Then, for the given taxi-world, what is the number of states that suffices to capture all the information?
- 18
- 18*5
- 18*5*4
- None of the above
Answer :-
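The state count in question 6 is a straightforward product of the independent state variables. The sketch below assumes the grid from the problem statement has 18 accessible cells (the figure itself is not reproduced here); the other two factors come directly from the question.

```python
# State-space size for the taxi-world in question 6 (a sketch; the
# 18 accessible cells are assumed from the problem's grid figure).
taxi_cells = 18          # accessible cells the taxi can occupy
passenger_locations = 5  # R, G, B, Y, or inside the taxi
destinations = 4         # R, G, B, or Y

num_states = taxi_cells * passenger_locations * destinations
print(num_states)  # 360, i.e. the 18*5*4 option
```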
7. State True/False. Bottlenecks are useful surrogate measures for option discovery.
- True
- False
Answer :-
8. Which of the following can be considered as a good option in Hierarchical RL?
- An option that can be reused often
- An option that can cut down exploration
- An option that helps in transfer learning
- None of the above
Answer :-
9. We define the action value for MAXQ as qπ(i,s,a) = vπ(a,s) + Cπ(i,s,a), where qπ(i,s,a) can be interpreted as the expected return when you are in sub-task i and state s and decide to perform sub-task a. Assume that in taking a you get reward r1, and after the completion of a you get reward r2 in completing sub-task i. Choose the correct value of Cπ(i,s,a) from the following.
- Cπ(i,s,a)=r2
- Cπ(i,s,a)=r1+r2
- Cπ(i,s,a)=r1
- None of the above
Answer :-
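The decomposition in question 9 can be sketched as a small lookup: the value of performing sub-task a inside sub-task i splits into the return earned while executing a (the vπ(a,s) term) plus the completion value earned afterwards (the Cπ(i,s,a) term). The dictionaries and the sub-task names below are hypothetical illustrations, not a library API.

```python
# A minimal sketch of the MAXQ action-value decomposition
# q(i, s, a) = V(a, s) + C(i, s, a). V and C are hypothetical tables.
V = {}  # V[(a, s)]: expected return while completing sub-task a from state s
C = {}  # C[(i, s, a)]: expected return after a finishes, until sub-task i finishes

def q(i, s, a):
    """Expected return in sub-task i, state s, when choosing sub-task a."""
    return V.get((a, s), 0.0) + C.get((i, s, a), 0.0)

# Illustrative numbers: within Root, choosing Navigate from state 's0'.
V[("Navigate", "s0")] = -1.0          # reward r1 earned while executing a
C[("Root", "s0", "Navigate")] = 5.0   # reward r2 earned completing i after a
print(q("Root", "s0", "Navigate"))    # -1.0 + 5.0 = 4.0
```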
10. In the MAXQ approach to solving a problem, suppose that sub-task Mi invokes sub-task Mj. Do the pseudo-rewards of Mj have any effect on sub-task Mi?
- Yes
- No
Answer :-