## NPTEL Reinforcement Learning Week 7 Assignment Answers 2024

1. Which of the following is the corrected n-step truncated return?

- $R_{t+n} + \gamma^{n} V_{t}(s_{t+n})$
- $R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \dots + \gamma^{n-1} R_{t+n} + \gamma^{n} V_{t}(s_{t+n})$
- $\gamma R_{t+1} + \gamma^{2} R_{t+2} + \gamma^{3} R_{t+3} + \dots + \gamma^{n} R_{t+n} + \gamma^{n+1} V_{t}(s_{t+n})$
- None of the above.

Answer :-
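For reference, the textbook definition of the corrected n-step truncated return sums the first n rewards, discounted from the first reward onward, and then bootstraps from the current value estimate of the state reached after n steps. A minimal Python sketch of that arithmetic follows; the function name `n_step_return` and its argument names are hypothetical, chosen only for illustration.

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """Corrected n-step truncated return (illustrative sketch).

    rewards         -- [R_{t+1}, ..., R_{t+n}], the n rewards observed after time t
    bootstrap_value -- V_t(s_{t+n}), the current estimate for the state after n steps
    gamma           -- discount factor
    """
    n = len(rewards)
    # R_{t+1} is undiscounted, R_{t+2} is discounted once, and so on.
    discounted_rewards = sum(gamma ** k * r for k, r in enumerate(rewards))
    # The bootstrap term corrects for truncating the return after n steps.
    return discounted_rewards + gamma ** n * bootstrap_value


if __name__ == "__main__":
    print(n_step_return([1.0, 2.0, 3.0], bootstrap_value=10.0, gamma=0.5))
```

With n = 1 this reduces to the familiar one-step TD target, R_{t+1} + γV_t(s_{t+1}).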

2. Suppose that in a particular problem, the agent keeps going back to the same state in a loop. What is the maximum value that can be taken by the eligibility trace of such a state if we consider accumulating traces with λ=0.5 and γ=0.5?

- 0.5
- 5
- 4
- 3

Answer :-
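One way to reason about this question is to simulate the accumulating-trace update directly. The sketch below assumes the extreme case in which the state is revisited on every single step (a self-loop), so the trace is decayed by γλ and then incremented by 1 each time; revisiting less often can only give a smaller trace. The function name `accumulating_trace_limit` and the `steps` parameter are hypothetical.

```python
def accumulating_trace_limit(gamma, lam, steps=200):
    """Iterate e <- gamma * lam * e + 1 for a state visited on every step.

    Assumption: the state is revisited at every time step, which is the
    case that produces the largest accumulating trace.
    """
    e = 0.0
    for _ in range(steps):
        e = gamma * lam * e + 1.0  # decay the old trace, then add 1 for the visit
    return e


if __name__ == "__main__":
    gamma, lam = 0.5, 0.5
    simulated = accumulating_trace_limit(gamma, lam)
    closed_form = 1.0 / (1.0 - gamma * lam)  # sum of the geometric series
    print(simulated, closed_form)
```

Under that assumption the trace grows towards the geometric-series limit 1 / (1 − γλ) and never exceeds it.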

3. Consider the TD(λ) algorithm. Which of these is true when λ=1 and γ=1?

- The method behaves like a Monte Carlo method for an undiscounted, episodic task.
- The values of all states are updated by the TD error in each episode
- Eligibility traces do not decay with time
- None of the above

Answer :-

4. In solving the control problem, suppose that the first action that is taken is not an optimal action according to the current policy at the start of an episode. Would an update be made corresponding to this action and the subsequent reward received in Watkins's Q(λ) algorithm?

- Yes
- No

Answer :-

5.

Answer :-

6. For the above question, what is the eligibility value if replacing traces are used?

- $\gamma^{7}\lambda^{7}$
- $\gamma^{6}\lambda^{6}$
- $\gamma\lambda + \gamma^{4}\lambda^{4} + \gamma^{6}\lambda^{6}$
- $\gamma\lambda$

Answer :-

7. State True or False:

The idea in Sarsa(λ) is to apply the TD(λ) prediction method to just the states rather than to state-action pairs.

- True
- False

Answer :-

8. **Assertion:** Eligibility traces provide a way to implement a Monte Carlo algorithm in an incremental fashion.
   **Reason:** The λ-return can be set to the Monte Carlo return, which can be implemented with eligibility traces.

- Assertion and Reason are both true and Reason is a correct explanation of Assertion
- Assertion and Reason are both true and Reason is not a correct explanation of Assertion
- Assertion is true and Reason is false
- Both Assertion and Reason are false

Answer :-

9.

Answer :-

10. Considering episodic tasks and for λ ∈ (0, 1), is it true that the one-step return always gets assigned the maximum weight in the λ-return?

- Yes
- No

Answer :-
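The weights in the λ-return for an episodic task can be tabulated to check this. In the standard definition, the n-step return receives weight (1 − λ)λ^{n−1} for every n short of termination, and the complete (Monte Carlo) return receives the remaining weight λ^{T−t−1}. The sketch below computes both; the function name `lambda_return_weights` and the parameter `steps_to_termination` (standing for T − t) are hypothetical.

```python
def lambda_return_weights(lam, steps_to_termination):
    """Weights of the lambda-return in an episodic task (illustrative sketch).

    steps_to_termination -- T - t, the number of steps left in the episode.
    Returns (n_step_weights, final_weight), where n_step_weights[n - 1] is
    the weight on the n-step return for n = 1, ..., T - t - 1 and
    final_weight is the weight on the complete return.
    """
    horizon = steps_to_termination
    n_step_weights = [(1 - lam) * lam ** (n - 1) for n in range(1, horizon)]
    final_weight = lam ** (horizon - 1)
    return n_step_weights, final_weight


if __name__ == "__main__":
    for lam in (0.5, 0.9):
        n_step, final = lambda_return_weights(lam, steps_to_termination=3)
        print(lam, n_step, final, sum(n_step) + final)
```

Comparing the first n-step weight with the weight on the complete return for different values of λ and different episode lengths shows how the balance shifts as λ approaches 1 or as termination gets closer.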