## NPTEL Reinforcement Learning Week 4 Assignment Answers 2024

1. State True/False

The state transition graph for any MDP is a directed acyclic graph.

- True
- False

Answer :-

2. Consider the following statements:

(i) The optimal policy of an MDP is unique.

(ii) We can determine an optimal policy for an MDP using only the optimal value function (v∗), without accessing the MDP parameters.

(iii) We can determine an optimal policy for a given MDP using only the optimal q-value function (q∗), without accessing the MDP parameters.

Which of these statements are false?

- Only (ii)
- Only (iii)
- Only (i), (ii)
- Only (i), (iii)
- Only (ii), (iii)

Answer :-
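The difference between statements (ii) and (iii) can be sketched numerically: acting greedily with respect to q∗ needs only an argmax per state, while acting greedily with respect to v∗ requires a one-step lookahead through the transition model. A minimal sketch, using a made-up q-table for a hypothetical 2-state, 2-action MDP:

```python
import numpy as np

# Hypothetical optimal q-table (the numbers are illustrative, not from the question):
# rows are states, columns are actions.
q_star = np.array([[1.0, 3.0],   # q*(s0, a0), q*(s0, a1)
                   [2.5, 0.5]])  # q*(s1, a0), q*(s1, a1)

# A greedy policy can be read off q* with no transition probabilities or rewards:
policy = q_star.argmax(axis=1)
print(policy)  # optimal action index for each state
```

Doing the same from v∗ alone is impossible: evaluating argmax_a Σ_s' p(s'|s,a)[r + γv∗(s')] needs the MDP parameters p and r.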

3. Which of the following statements are true for a finite MDP? (Select all that apply).

- The Bellman equation of a value function of a finite MDP defines a contraction in Banach space (using the max norm).
- If 0≤γ<1, then the eigenvalues of γP_{π} are less than 1.
- We call a normed vector space ’complete’ if Cauchy sequences exist in that vector space.
- The sequence defined by v_{n}=r_{π}+γP_{π}v_{n−1} is a Cauchy sequence in Banach space (using the max norm). (P_{π} is a stochastic matrix)

Answer :-
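The contraction property behind these options can be checked numerically: for the fixed-policy Bellman operator L(v) = r_{π} + γP_{π}v, we have L(v₁) − L(v₂) = γP_{π}(v₁ − v₂), and a stochastic matrix never increases the max norm. A small sketch with a randomly generated P_{π} and r_{π} (illustrative values, not from the assignment):

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9
P = rng.random((3, 3))
P /= P.sum(axis=1, keepdims=True)  # rows sum to 1: a stochastic matrix P_pi
r = rng.random(3)                  # an arbitrary reward vector r_pi

def L(v):
    """Bellman operator for a fixed policy: L(v) = r + gamma * P @ v."""
    return r + gamma * P @ v

v1, v2 = rng.random(3), rng.random(3)
lhs = np.max(np.abs(L(v1) - L(v2)))      # ||L(v1) - L(v2)||_inf
rhs = gamma * np.max(np.abs(v1 - v2))    # gamma * ||v1 - v2||_inf
print(lhs <= rhs + 1e-12)                # contraction in the max norm
```

Because L is a γ-contraction on a complete space, iterating v_{n} = L(v_{n−1}) produces a Cauchy sequence converging to the unique fixed point v^{π}.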

4. Which of the following is a benefit of using RL algorithms for solving MDPs?

- They do not require the state of the agent for solving an MDP.
- They do not require the action taken by the agent for solving an MDP.
- They do not require the state transition probability matrix for solving an MDP.
- They do not require the reward signal for solving an MDP.

Answer :-

- Only (i)
- Only (i), (ii)
- Only (ii), (iii)
- Only (i), (iii)
- (i), (ii), (iii)

Answer :-

6. What is true about the γ (discount factor) in reinforcement learning?

- Discount factor can be any real number
- The value of γ cannot affect the optimal policy
- The lower the value of γ, the more myopic the agent gets, i.e. the agent maximises the rewards it receives over a shorter horizon

Answer :-
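The effect of a low γ can be illustrated with a toy choice between a small immediate reward and a larger delayed one (the reward sequences below are made up for illustration):

```python
def discounted_return(rewards, gamma):
    """Sum of gamma^t * r_t over the reward sequence."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

immediate = [1, 0, 0, 0, 0]   # reward 1 now
delayed   = [0, 0, 0, 0, 10]  # reward 10 four steps later

# A myopic agent (low gamma) prefers the immediate reward...
print(discounted_return(immediate, 0.1), discounted_return(delayed, 0.1))
# ...while a far-sighted agent (gamma near 1) prefers the delayed one.
print(discounted_return(immediate, 0.99), discounted_return(delayed, 0.99))
```

With γ = 0.1 the delayed reward is worth only 10 × 0.1⁴ = 0.001, so the agent effectively optimises over a short horizon; with γ = 0.99 it is worth about 9.6, so long-term rewards dominate.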

7. Consider the following statements for a finite MDP (I is an identity matrix with dimensions |S|×|S|, S is the set of all states, and P_{π} is a stochastic matrix):

(i) MDP with stochastic rewards may not have a deterministic optimal policy.

(ii) There can be multiple optimal stochastic policies.

(iii) If 0≤γ<1, then rank of the matrix I−γP_{π} is equal to |S|.

(iv) If 0≤γ<1, then rank of the matrix I−γP_{π} is less than |S|.

Which of the above statements are true?

- Only (ii), (iii)
- Only (ii), (iv)
- Only (i), (iii)
- Only (i), (ii), (iii)

Answer :-
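Statements (iii) and (iv) concern the invertibility of I−γP_{π}: since every eigenvalue of a stochastic matrix has magnitude at most 1, the eigenvalues of γP_{π} have magnitude at most γ < 1, so I−γP_{π} has no zero eigenvalue and is full rank. A quick numerical check with an illustrative random stochastic matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n, gamma = 4, 0.9                        # |S| = 4, discount factor < 1
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)        # rows sum to 1: a stochastic matrix P_pi
A = np.eye(n) - gamma * P
print(np.linalg.matrix_rank(A))          # full rank n, so (I - gamma*P_pi)^{-1} exists
```

This is what guarantees the policy-evaluation system v = r_{π} + γP_{π}v has the unique solution v = (I−γP_{π})⁻¹r_{π}.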

8. Consider an MDP with 3 states A, B, C. At each state we can go to either of the other two states, i.e. if we are in state A then we can perform 2 actions: going to state B or C. The rewards for the transitions are r(A,B)=−3 (reward if we go from A to B), r(B,A)=−1, r(B,C)=8, r(C,B)=4, r(A,C)=0, r(C,A)=5, and the discount factor is 0.9. Find the fixed point of the value function for the policy π(A)=B (if we are in state A we choose the action to go to B), π(B)=C, π(C)=A. v^{π}([A,B,C])=? (round to 1 decimal place)

- [20.6, 21.8, 17.6]
- [30.4, 44.2, 32.4]
- [30.4, 37.2, 32.4]
- [21.6, 21.8, 17.6]

Answer :-
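For a deterministic policy like this one, the fixed point of v = r_{π} + γP_{π}v can be computed directly by solving the linear system (I−γP_{π})v = r_{π}. A sketch using the transitions and rewards given in the question:

```python
import numpy as np

gamma = 0.9
# Under pi(A)=B, pi(B)=C, pi(C)=A the transitions are deterministic:
P = np.array([[0, 1, 0],   # A -> B
              [0, 0, 1],   # B -> C
              [1, 0, 0]])  # C -> A
r = np.array([-3.0, 8.0, 5.0])  # r(A,B), r(B,C), r(C,A) along the policy

# Fixed point of v = r + gamma * P @ v  is  v = (I - gamma*P)^{-1} r
v = np.linalg.solve(np.eye(3) - gamma * P, r)
print(np.round(v, 1))  # [30.4 37.2 32.4]
```

Unrolling by hand gives the same thing: v(A) = −3 + 0.9·v(B), v(B) = 8 + 0.9·v(C), v(C) = 5 + 0.9·v(A), so v(A) = 8.25/0.271 ≈ 30.4.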


10. For an operator L, which of the following properties must be satisfied by x for it to be a fixed point of L? (Multi-Correct)

- Lx=x
- L²x=x
- ∀λ>0, Lx=λx
- None of the above

Answer :-
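A small illustration of why these conditions differ: a fixed point Lx = x always satisfies L²x = x, but the converse fails. The operator L(x) = −x (a made-up example) satisfies L²x = x for every x, yet its only fixed point is 0:

```python
def L(x):
    """Illustrative operator L(x) = -x."""
    return -x

x = 1.0
print(L(L(x)) == x)  # True: applying L twice returns x
print(L(x) == x)     # False: x = 1 is not a fixed point of L
```

Similarly, Lx = λx for λ ≠ 1 is an eigenvector condition, not a fixed-point condition.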