# NPTEL Reinforcement Learning Week 4 Assignment Answers 2024

By Sanket


1. State True/False
The state transition graph for any MDP is a directed acyclic graph.

• True
• False
`Answer :- `
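The reasoning behind Q1 can be sanity-checked in code: an MDP's state-transition graph may contain cycles (for example, two states that can each reach the other), so it need not be acyclic. A minimal sketch with a hypothetical two-state adjacency map:

```python
def has_cycle(adj):
    """Detect a cycle in a directed graph via DFS with a recursion stack."""
    visiting, done = set(), set()

    def dfs(u):
        visiting.add(u)
        for v in adj.get(u, []):
            # A neighbour already on the recursion stack means a cycle.
            if v in visiting or (v not in done and dfs(v)):
                return True
        visiting.discard(u)
        done.add(u)
        return False

    return any(dfs(u) for u in adj if u not in done)

# Toy MDP where states A and B can reach each other: clearly cyclic.
print(has_cycle({"A": ["B"], "B": ["A"]}))   # -> True
# A one-way chain, by contrast, is acyclic.
print(has_cycle({"A": ["B"], "B": []}))      # -> False
```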

2. Consider the following statements:
(i) The optimal policy of an MDP is unique.
(ii) We can determine an optimal policy for an MDP using only the optimal value function (v∗), without accessing the MDP parameters.
(iii) We can determine an optimal policy for a given MDP using only the optimal q-value function (q∗), without accessing the MDP parameters.

Which of these statements are false?

• Only (ii)
• Only (iii)
• Only (i), (ii)
• Only (i), (iii)
• Only (ii), (iii)
`Answer :- `
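For intuition on statements (ii) and (iii): acting greedily with respect to q∗ needs no MDP parameters, whereas a one-step lookahead from v∗ alone requires the transition probabilities and rewards. A sketch with a hypothetical 2-state, 2-action q-table:

```python
import numpy as np

# Hypothetical optimal q-table: rows are states, columns are actions.
q_star = np.array([[1.0, 2.5],    # in state 0, action 1 has the higher q-value
                   [3.0, 0.5]])   # in state 1, action 0 has the higher q-value

# Greedy policy recovered from q* alone -- no transition matrix needed.
pi = q_star.argmax(axis=1)
print(pi.tolist())   # -> [1, 0]
```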

3. Which of the following statements are true for a finite MDP? (Select all that apply).

• The Bellman equation of a value function of a finite MDP defines a contraction in Banach space (using the max norm).
• If 0≤γ<1, then the eigenvalues of γPπ are less than 1.
• We call a normed vector space 'complete' if Cauchy sequences exist in that vector space.
• The sequence defined by vn=rπ+γPπvn−1 is a Cauchy sequence in Banach space (using the max norm). (Pπ is a stochastic matrix.)
`Answer :- `
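To build intuition for the contraction and Cauchy-sequence options, one can iterate vn = rπ + γPπvn−1 on a toy problem and watch the max-norm gap between successive iterates shrink by at least a factor of γ each step. A sketch with a hypothetical stochastic matrix Pπ and reward vector rπ:

```python
import numpy as np

gamma = 0.9
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])   # rows sum to 1 (stochastic matrix)
r = np.array([1.0, -2.0, 0.5])

v = np.zeros(3)
gaps = []                         # max-norm distances between iterates
for _ in range(50):
    v_next = r + gamma * P @ v
    gaps.append(np.max(np.abs(v_next - v)))
    v = v_next

# The contraction property: each gap is at most gamma times the previous one,
# so the iterates form a Cauchy sequence in the max norm.
print(all(gaps[i + 1] <= gamma * gaps[i] + 1e-12 for i in range(len(gaps) - 1)))
```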

4. Which of the following is a benefit of using RL algorithms for solving MDPs?

• They do not require the state of the agent for solving an MDP.
• They do not require the action taken by the agent for solving an MDP.
• They do not require the state transition probability matrix for solving an MDP.
• They do not require the reward signal for solving an MDP.
`Answer :- `
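The point behind Q4 is that model-free RL methods learn from sampled transitions rather than from a known transition matrix. A minimal TD(0)-style update on one hypothetical sampled transition (state, reward, next state):

```python
# TD(0) value update: uses only a sampled (s, r, s') tuple -- no matrix P.
alpha, gamma = 0.5, 0.9   # step size and discount factor (chosen for the sketch)
V = {0: 0.0, 1: 0.0}      # tabular value estimates for a 2-state toy problem

s, r, s_next = 0, 1.0, 1  # one hypothetical sampled transition
V[s] += alpha * (r + gamma * V[s_next] - V[s])
print(V[0])               # -> 0.5
```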
• Only (i)
• Only (i), (ii)
• Only (ii), (iii)
• Only (i), (iii)
• (i), (ii), (iii)
`Answer :- `

6. What is true about the γ (discount factor) in reinforcement learning?

• Discount factor can be any real number
• The value of γ cannot affect the optimal policy
• The lower the value of γ, the more myopic the agent gets, i.e., the agent maximises rewards that it receives over a shorter horizon
`Answer :- `
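The effect described in the last option is easy to see numerically: with a constant reward stream, a small γ makes the discounted return close to the immediate reward, while a larger γ lets rewards far ahead still contribute. A quick sketch:

```python
# Discounted return of 20 steps of reward 1.0 under two discount factors.
rewards = [1.0] * 20
returns = {}
for gamma in (0.1, 0.9):
    returns[gamma] = sum(gamma**t * r for t, r in enumerate(rewards))

print(round(returns[0.1], 3))   # -> 1.111: dominated by the first few rewards
print(round(returns[0.9], 3))   # -> 8.784: distant rewards still matter
```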

7. Consider the following statements for a finite MDP (I is an identity matrix with dimensions |S|×|S| (S is the set of all states) and Pπ is a stochastic matrix):
(i) MDP with stochastic rewards may not have a deterministic optimal policy.
(ii) There can be multiple optimal stochastic policies.
(iii) If 0≤γ<1, then rank of the matrix I−γPπ is equal to |S|.
(iv) If 0≤γ<1, then rank of the matrix I−γPπ is less than |S|.
Which of the above statements are true?

• Only (ii), (iii)
• Only (ii), (iv)
• Only (i), (iii)
• Only (i), (ii), (iii)
`Answer :- `
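Statements (iii) and (iv) concern the invertibility of I−γPπ. Since Pπ is stochastic, its eigenvalues lie in the closed unit disc, so for 0≤γ<1 every eigenvalue of γPπ has magnitude at most γ<1, and I−γPπ is full rank. A numerical sketch with a hypothetical 3×3 stochastic matrix:

```python
import numpy as np

gamma = 0.9
P = np.array([[0.1, 0.9, 0.0],
              [0.5, 0.0, 0.5],
              [0.3, 0.3, 0.4]])   # rows sum to 1 (stochastic)

M = np.eye(3) - gamma * P
rank = np.linalg.matrix_rank(M)
spectral_radius = np.max(np.abs(np.linalg.eigvals(gamma * P)))

print(rank)                 # full rank, equal to |S| = 3
print(spectral_radius < 1)  # all eigenvalues of gamma*P stay inside the unit circle
```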

8. Consider an MDP with 3 states A, B, C. From each state we can go to either of the other two states; i.e., if we are in state A we can perform 2 actions: go to state B or go to state C. The rewards for each transition are r(A,B)=−3 (reward if we go from A to B), r(B,A)=−1, r(B,C)=8, r(C,B)=4, r(A,C)=0, r(C,A)=5, and the discount factor is 0.9. Find the fixed point of the value function for the policy π(A)=B (if we are in state A we choose the action to go to B), π(B)=C, π(C)=A. vπ([A B C])=? (round to 1 decimal place)

• [20.6, 21.8, 17.6]
• [30.4, 44.2, 32.4]
• [30.4, 37.2, 32.4]
• [21.6, 21.8, 17.6]
`Answer :- `
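The fixed point in Q8 can be computed directly: for a deterministic policy, vπ solves vπ = rπ + γPπvπ, i.e. vπ = (I − γPπ)⁻¹rπ. A sketch of that computation (my own working, not the official answer key):

```python
import numpy as np

gamma = 0.9
# State order [A, B, C]; each row is the next state chosen by the policy.
P_pi = np.array([[0, 1, 0],    # pi(A) = B
                 [0, 0, 1],    # pi(B) = C
                 [1, 0, 0]])   # pi(C) = A
r_pi = np.array([-3.0, 8.0, 5.0])   # r(A,B), r(B,C), r(C,A)

# Solve (I - gamma * P_pi) v = r_pi for the fixed point.
v = np.linalg.solve(np.eye(3) - gamma * P_pi, r_pi)
print(np.round(v, 1))   # -> [30.4 37.2 32.4]
```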
`Answer :- `

10. For an operator L, which of the following properties must be satisfied by x for it to be a fixed point for L?(Multi-Correct)

• Lx=x
• L2x=x
• ∀λ>0, Lx=λx
• None of the above
`Answer :- `
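For Q10, a fixed point of L is any x with Lx = x; applying L again then necessarily gives L²x = x as well. A sketch using an affine Bellman-style operator whose fixed point has a closed form (the r, P, and γ values here are hypothetical):

```python
import numpy as np

gamma = 0.9
P = np.array([[0.5, 0.5],
              [0.2, 0.8]])       # stochastic matrix for a 2-state sketch
r = np.array([1.0, -1.0])

L = lambda v: r + gamma * P @ v  # affine operator L(v) = r + gamma * P v

# Its fixed point solves (I - gamma * P) v = r.
v_star = np.linalg.solve(np.eye(2) - gamma * P, r)
print(np.allclose(L(v_star), v_star))       # Lx = x holds
print(np.allclose(L(L(v_star)), v_star))    # hence L^2 x = x holds too
```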