NPTEL Reinforcement Learning Week 7 Assignment Answers 2024

1. Which of the following is the corrected n-step truncated return?

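$R_{t+n} + \gamma^{n} V_t(s_{t+n})$
$R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \dots + \gamma^{n-1} R_{t+n} + \gamma^{n} V_t(s_{t+n})$
$\gamma R_{t+1} + \gamma^{2} R_{t+2} + \gamma^{3} R_{t+3} + \dots + \gamma^{n} R_{t+n} + \gamma^{n+1} V_t(s_{t+n})$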
None of the above.

Answer :-
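A minimal sketch of the corrected n-step truncated return, assuming the n rewards $R_{t+1}, \dots, R_{t+n}$ and the current estimate $V_t(s_{t+n})$ are already known. The function name and the example numbers are hypothetical, chosen only for illustration. The discounted rewards are summed, and the truncated tail is "corrected" by bootstrapping from the value estimate of the state reached after n steps.

```python
def corrected_n_step_return(rewards, v_bootstrap, gamma, n):
    """Corrected n-step truncated return:
    R_{t+1} + gamma*R_{t+2} + ... + gamma**(n-1)*R_{t+n} + gamma**n * V_t(s_{t+n}).

    rewards: the n rewards R_{t+1}, ..., R_{t+n} observed after time t.
    v_bootstrap: the current estimate V_t(s_{t+n}) used to correct the truncation.
    """
    if len(rewards) != n:
        raise ValueError("expected exactly n rewards")
    # Discounted sum of the first n rewards: reward k (0-based) gets gamma**k.
    g = sum(gamma ** k * r for k, r in enumerate(rewards))
    # Correction term: bootstrap from the value of the state n steps ahead.
    return g + gamma ** n * v_bootstrap


# Hypothetical numbers: n = 3, gamma = 0.5.
print(corrected_n_step_return([1.0, 2.0, 4.0], 8.0, 0.5, 3))  # → 4.0
```

With n = 1 this reduces to the familiar one-step TD target $R_{t+1} + \gamma V_t(s_{t+1})$.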

2. Suppose that in a particular problem, the agent keeps going back to the same state in a loop. What is the maximum value that can be taken by the eligibility trace of such a state if we consider accumulating traces with λ=0.5 and γ=0.5?

Answer :-
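A small simulation of an accumulating trace, assuming the state is revisited on every time step (the tightest possible loop, which gives the largest trace) and that the trace starts at 0. On each step the trace decays by γλ and then grows by 1, so after k visits it equals the partial geometric sum $1 + \gamma\lambda + \dots + (\gamma\lambda)^{k-1}$, which is bounded by $1/(1-\gamma\lambda)$. The function name is hypothetical.

```python
def accumulating_trace_in_loop(gamma, lam, steps):
    """Trace of a state visited on every step, with accumulating traces:
    e <- gamma * lam * e + 1 on each visit. Returns the trace after each step."""
    e = 0.0
    history = []
    for _ in range(steps):
        e = gamma * lam * e + 1.0
        history.append(e)
    return history


trace = accumulating_trace_in_loop(0.5, 0.5, 30)
print(trace[:4])                # → [1.0, 1.25, 1.3125, 1.328125]
print(1.0 / (1.0 - 0.5 * 0.5))  # limit 1 / (1 - gamma*lam), i.e. 4/3
```

With γλ = 0.25 the trace climbs towards 4/3 and approaches it asymptotically rather than jumping to it.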

3. Consider the TD(λ) algorithm. Which of these is true when λ=1 and γ=1?

The method behaves like a Monte Carlo method for an undiscounted, episodic task.
The values of all states are updated by the TD error in each episode
Eligibility traces do not decay with time
None of the above

Answer :-
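A sketch comparing TD(λ) at λ = 1, γ = 1 with every-visit Monte Carlo on one episode. It assumes offline updating (V is held fixed for the whole episode) and accumulating traces; under those assumptions the summed TD(1) increments equal the summed Monte Carlo errors $G_t - V(S_t)$, because the TD errors telescope. With online updating the match is only approximate. The episode, the value table and the function names are hypothetical. Note that with γλ = 1 the decay step leaves every trace unchanged.

```python
from collections import defaultdict


def offline_td1_increments(states, rewards, V, gamma=1.0):
    """Summed TD(1) increments delta_t * e_t(s) per state over one episode.

    Accumulating traces, lambda = 1, and V held fixed for the whole episode
    (offline updating). states[t] is S_t and rewards[t] is R_{t+1}; the
    episode terminates after the last reward, so the terminal value is 0.
    """
    lam = 1.0
    e = defaultdict(float)
    total = defaultdict(float)
    T = len(states)
    for t in range(T):
        s = states[t]
        v_next = V[states[t + 1]] if t + 1 < T else 0.0
        delta = rewards[t] + gamma * v_next - V[s]
        for k in e:          # decay every trace by gamma * lambda (= 1 here)
            e[k] *= gamma * lam
        e[s] += 1.0          # accumulate on the visited state
        for k, ek in e.items():
            total[k] += delta * ek
    return dict(total)


def every_visit_mc_increments(states, rewards, V, gamma=1.0):
    """Summed every-visit Monte Carlo errors G_t - V(S_t) per state."""
    total = defaultdict(float)
    g = 0.0
    for t in reversed(range(len(states))):
        g = rewards[t] + gamma * g
        total[states[t]] += g - V[states[t]]
    return dict(total)


# Hypothetical episode and value estimates, chosen only for illustration.
states = ["A", "B", "A", "C"]
rewards = [1.0, 0.0, 2.0, -1.0]
V = {"A": 0.5, "B": -0.25, "C": 1.0}

td = offline_td1_increments(states, rewards, V)
mc = every_visit_mc_increments(states, rewards, V)
print(all(abs(td[s] - mc[s]) < 1e-12 for s in mc))  # → True
```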

4. In solving the control problem, suppose that the first action that is taken is not an optimal action according to the current policy at the start of an episode. Would an update be made corresponding to this action and the subsequent reward received in Watkins's Q(λ) algorithm?

Answer :-
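A sketch of one tabular step of Watkins's Q(λ) with accumulating traces, modelled on the standard textbook form of the algorithm. The function name, state labels and step-size values are hypothetical. The point it illustrates is the order of operations: the trace of the pair (s, a) that was just taken is incremented and Q is updated before any cut, and the cut depends on whether the *next* action a_next is greedy.

```python
from collections import defaultdict


def watkins_q_lambda_step(Q, e, s, a, r, s_next, a_next, actions,
                          alpha, gamma, lam):
    """One tabular Watkins's Q(lambda) step with accumulating traces.

    Q and e map (state, action) -> float. a_next is the action the
    behaviour policy has already selected in s_next. Terminal states are
    not special-cased here: an unvisited s_next simply has Q = 0.
    """
    best = max(Q[(s_next, b)] for b in actions)
    # Greedy target action; if a_next ties for the max, use it as a*.
    if Q[(s_next, a_next)] == best:
        a_star = a_next
    else:
        a_star = max(actions, key=lambda b: Q[(s_next, b)])
    delta = r + gamma * Q[(s_next, a_star)] - Q[(s, a)]
    e[(s, a)] += 1.0  # the pair just taken is eligible even if it was exploratory
    keep_traces = a_next == a_star
    for key in list(e):
        Q[key] += alpha * delta * e[key]
        # Traces decay after a greedy next action and are cut after an exploratory one.
        e[key] = gamma * lam * e[key] if keep_traces else 0.0
    return delta


actions = [0, 1]
Q = defaultdict(float)
e = defaultdict(float)
Q[("s0", 1)] = 1.0  # action 1 is greedy in s0
# The first action of the episode is exploratory: action 0 in s0.
delta = watkins_q_lambda_step(Q, e, "s0", 0, 1.0, "s1", 0, actions,
                              alpha=0.1, gamma=0.9, lam=0.8)
print(delta, Q[("s0", 0)])  # → 1.0 0.1
```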

Answer :-

6. For the above question, what is the eligibility value if replacing traces are used?

γ⁷λ⁷
γ⁶λ⁶
γλ+γ⁴λ⁴+γ⁶λ⁶
γλ

Answer :-
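A sketch contrasting accumulating and replacing traces for a single state. With accumulating traces every past visit contributes $(\gamma\lambda)^{k}$, where k is the number of steps since that visit; with replacing traces the trace is reset to 1 on each visit, so only the most recent visit survives. The visit times, the product γλ = 0.5 and the function name are hypothetical, chosen only for illustration.

```python
def trace_at(visit_times, t_end, gamma_lam, replacing):
    """Eligibility of a single state at time t_end, given the time steps at
    which it was visited. Each step the trace decays by gamma*lambda; on a
    visit it is incremented by 1 (accumulating) or reset to 1 (replacing)."""
    visits = set(visit_times)
    e = 0.0
    for t in range(t_end + 1):
        e *= gamma_lam
        if t in visits:
            e = 1.0 if replacing else e + 1.0
    return e


# Hypothetical visits at t = 0, 2, 3, trace read at t = 5, gamma*lambda = 0.5.
print(trace_at([0, 2, 3], 5, 0.5, replacing=False))  # → 0.40625  (0.5**5 + 0.5**3 + 0.5**2)
print(trace_at([0, 2, 3], 5, 0.5, replacing=True))   # → 0.25     (0.5**2, last visit only)
```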

7. State True or False:
The idea in Sarsa(λ) is to apply the TD(λ) prediction method to just the states rather than to state-action pairs.

True
False

Answer :-
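A sketch of one tabular Sarsa(λ) step with accumulating traces, showing that both the action values and the eligibility traces are indexed by (state, action) pairs. The function name, state labels and parameter values are hypothetical; the unvisited state s2 stands in for a terminal state because its action values default to 0.

```python
from collections import defaultdict


def sarsa_lambda_step(Q, e, s, a, r, s_next, a_next, alpha, gamma, lam):
    """One tabular Sarsa(lambda) step with accumulating traces.
    Both Q and the traces e are indexed by (state, action) pairs."""
    delta = r + gamma * Q[(s_next, a_next)] - Q[(s, a)]
    e[(s, a)] += 1.0  # the pair just taken becomes (more) eligible
    for key in list(e):
        Q[key] += alpha * delta * e[key]
        e[key] *= gamma * lam  # every pair's trace decays after the update
    return delta


Q = defaultdict(float)
e = defaultdict(float)
# Hypothetical two-step episode: (s0, 0) -> reward 0 -> (s1, 1) -> reward 1 -> s2.
sarsa_lambda_step(Q, e, "s0", 0, 0.0, "s1", 1, alpha=0.5, gamma=1.0, lam=0.5)
sarsa_lambda_step(Q, e, "s1", 1, 1.0, "s2", 0, alpha=0.5, gamma=1.0, lam=0.5)
print(Q[("s0", 0)], Q[("s1", 1)])  # → 0.25 0.5
```

The earlier pair (s0, 0) receives credit for the later reward through its decayed trace, which is the backward-view mechanism applied to state-action pairs.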

8. Assertion: Eligibility traces provide a way to implement Monte Carlo algorithms in an incremental fashion.
Reason: The λ-return can be set to the Monte Carlo return, which can be implemented with eligibility traces.

Assertion and Reason are both true and Reason is a correct explanation of Assertion
Assertion and Reason are both true and Reason is not a correct explanation of Assertion
Assertion is true and Reason is false
Both Assertion and Reason are false
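A sketch of the forward-view λ-return for an episodic task of length T, $G_t^{\lambda} = (1-\lambda)\sum_{n=1}^{T-t-1} \lambda^{n-1} G_t^{(n)} + \lambda^{T-t-1} G_t$, where $G_t^{(n)}$ is the n-step return and $G_t$ is the full Monte Carlo return. Setting λ = 1 zeroes every n-step weight and leaves only $G_t$; setting λ = 0 leaves only the one-step target. The function names and the episode data are hypothetical.

```python
def n_step_return(rewards, values, t, n, gamma):
    """G_t^(n). rewards[k] is R_{k+1} and values[k] is V(S_k) for k < T.

    Bootstraps from V(S_{t+n}) unless the episode (length T) ends first,
    in which case this is the full Monte Carlo return from time t.
    """
    T = len(rewards)
    end = min(t + n, T)
    g = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
    if t + n < T:
        g += gamma ** n * values[t + n]
    return g


def lambda_return(rewards, values, t, lam, gamma):
    """Episodic lambda-return: a (1 - lam) * lam**(n-1) weighted mix of the
    n-step returns, with the remaining weight lam**(T-t-1) on the MC return."""
    T = len(rewards)
    mc_return = n_step_return(rewards, values, t, T - t, gamma)
    mixed = sum((1 - lam) * lam ** (n - 1) * n_step_return(rewards, values, t, n, gamma)
                for n in range(1, T - t))
    return mixed + lam ** (T - t - 1) * mc_return


# Hypothetical three-step episode.
rewards = [1.0, 0.0, 2.0]
values = [0.5, -0.25, 1.0]
print(lambda_return(rewards, values, 0, 1.0, 0.9) == n_step_return(rewards, values, 0, 3, 0.9))  # → True
```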