NPTEL Reinforcement Learning Week 4 Assignment Answers 2024
1. State True/False
The state transition graph for any MDP is a directed acyclic graph.
- True
- False
Answer :-
2. Consider the following statements:
(i) The optimal policy of an MDP is unique.
(ii) We can determine an optimal policy for an MDP using only the optimal value function (v∗), without accessing the MDP parameters.
(iii) We can determine an optimal policy for a given MDP using only the optimal q-value function (q∗), without accessing the MDP parameters.
Which of these statements are false?
- Only (ii)
- Only (iii)
- Only (i), (ii)
- Only (i), (iii)
- Only (ii), (iii)
Answer :-
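A minimal sketch of the distinction behind statements (ii) and (iii), using a made-up 2-state, 2-action MDP (all numbers below are illustrative assumptions): acting greedily with respect to q∗ needs no model, while a greedy one-step lookahead from v∗ needs the transition probabilities and rewards.

```python
import numpy as np

gamma = 0.9
# Made-up 2-state, 2-action MDP used only to illustrate the point.
P = np.array([[[0.8, 0.2],    # P[s, a, s']: transition probabilities
               [0.1, 0.9]],
              [[0.5, 0.5],
               [0.3, 0.7]]])
R = np.array([[1.0, 0.0],     # R[s, a]: expected immediate rewards
              [0.0, 2.0]])

def greedy_from_q(q_star):
    # q* alone suffices: act greedily per state, no model access needed.
    return np.argmax(q_star, axis=1)

def greedy_from_v(v_star):
    # v* alone does not suffice: the one-step lookahead needs P and R.
    q = R + gamma * np.einsum('sap,p->sa', P, v_star)
    return np.argmax(q, axis=1)
```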
3. Which of the following statements are true for a finite MDP? (Select all that apply).
- The Bellman equation of a value function of a finite MDP defines a contraction in Banach space (using the max norm).
- If 0≤γ<1, then the eigenvalues of γPπ are less than 1.
- We call a normed vector space 'complete' if Cauchy sequences exist in that vector space.
- The sequence defined by vn = rπ + γPπvn−1 is a Cauchy sequence in Banach space (using the max norm), where Pπ is a stochastic matrix.
Answer :-
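A quick numeric sketch of the contraction behind the last statement, assuming a toy 2-state policy with made-up Pπ and rπ: successive iterates of vn = rπ + γPπvn−1 get closer in the max norm by at least a factor of γ, which is exactly what makes the sequence Cauchy.

```python
import numpy as np

gamma = 0.9
# Assumed toy policy: 2 states, made-up stochastic matrix P^pi and rewards r^pi.
P_pi = np.array([[0.7, 0.3],
                 [0.4, 0.6]])
r_pi = np.array([1.0, -1.0])

v = np.zeros(2)
prev_gap = None
for _ in range(25):
    v_next = r_pi + gamma * P_pi @ v
    gap = np.max(np.abs(v_next - v))       # max-norm distance between iterates
    if prev_gap is not None:
        # Each gap shrinks by at least a factor of gamma: the contraction property.
        assert gap <= gamma * prev_gap + 1e-12
    prev_gap, v = gap, v_next

print(v)  # approaches the fixed point v^pi = (I - gamma * P^pi)^(-1) r^pi
```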
4. Which of the following is a benefit of using RL algorithms for solving MDPs?
- They do not require the state of the agent for solving an MDP.
- They do not require the action taken by the agent for solving an MDP.
- They do not require the state transition probability matrix for solving an MDP.
- They do not require the reward signal for solving an MDP.
Answer :-
- Only (i)
- Only (i), (ii)
- Only (ii), (iii)
- Only (i), (iii)
- (i), (ii), (iii)
Answer :-
6. What is true about the γ (discount factor) in reinforcement learning?
- Discount factor can be any real number
- The value of γ cannot affect the optimal policy
- The lower the value of γ, the more myopic the agent becomes, i.e., the agent maximises the rewards it receives over a shorter horizon
Answer :-
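A small illustration of the myopia point (the rewards and horizon below are arbitrary assumptions): with a low γ, a large reward five steps away is worth less than a small immediate reward, while with γ close to 1 the delayed reward dominates.

```python
# Arbitrary example rewards: 1 now versus 10 at time step 5.
def discounted_value(reward, step, gamma):
    return (gamma ** step) * reward

for gamma in (0.2, 0.95):
    immediate = discounted_value(1, 0, gamma)
    delayed = discounted_value(10, 5, gamma)
    # Low gamma: the immediate reward wins (myopic); high gamma: the delayed reward wins.
    print(gamma, immediate, round(delayed, 4))
```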
7. Consider the following statements for a finite MDP (I is the identity matrix of dimension |S|×|S|, where S is the set of all states, and Pπ is a stochastic matrix):
(i) MDP with stochastic rewards may not have a deterministic optimal policy.
(ii) There can be multiple optimal stochastic policies.
(iii) If 0≤γ<1, then the rank of the matrix I−γPπ is equal to |S|.
(iv) If 0≤γ<1, then the rank of the matrix I−γPπ is less than |S|.
Which of the above statements are true?
- Only (ii), (iii)
- Only (ii), (iv)
- Only (i), (iii)
- Only (i), (ii), (iii)
Answer :-
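A numeric check of the idea behind statements (iii) and (iv), using an assumed 3-state stochastic matrix: for a stochastic Pπ the eigenvalues of γPπ have modulus at most γ < 1, so I−γPπ has no zero eigenvalue and is therefore invertible.

```python
import numpy as np

gamma = 0.9
# Assumed 3-state stochastic matrix, for illustration only.
P_pi = np.array([[0.5, 0.5, 0.0],
                 [0.1, 0.6, 0.3],
                 [0.2, 0.2, 0.6]])

M = np.eye(3) - gamma * P_pi
print(np.linalg.matrix_rank(M))                          # 3, i.e. equal to |S|
print(np.max(np.abs(np.linalg.eigvals(gamma * P_pi))))   # at most gamma < 1
```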
8. Consider an MDP with 3 states A, B, C. From each state we can move to either of the other two states, i.e. if we are in state A we can perform 2 actions: going to state B or going to state C. The rewards for the transitions are r(A,B)=−3 (reward if we go from A to B), r(B,A)=−1, r(B,C)=8, r(C,B)=4, r(A,C)=0, r(C,A)=5, and the discount factor is 0.9. Find the fixed point of the value function for the policy π(A)=B (if we are in state A we choose the action to go to B), π(B)=C, π(C)=A. vπ([A, B, C]) = ? (round to 1 decimal place). A short computation sketch follows this question.
- [20.6, 21.8, 17.6]
- [30.4, 44.2, 32.4]
- [30.4, 37.2, 32.4]
- [21.6, 21.8, 17.6]
Answer :-
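One way to compute this fixed point (my own worked sketch, not the official solution): with a deterministic policy, Pπ is a permutation matrix over the state ordering [A, B, C] and rπ collects the rewards of the chosen transitions, so vπ solves the linear system (I − γPπ)vπ = rπ.

```python
import numpy as np

gamma = 0.9
# Deterministic policy pi(A)=B, pi(B)=C, pi(C)=A over the ordering [A, B, C]:
# P^pi is a permutation matrix, r^pi holds the rewards of the chosen transitions.
P_pi = np.array([[0.0, 1.0, 0.0],   # A -> B
                 [0.0, 0.0, 1.0],   # B -> C
                 [1.0, 0.0, 0.0]])  # C -> A
r_pi = np.array([-3.0, 8.0, 5.0])   # r(A,B), r(B,C), r(C,A)

# Fixed point of v = r^pi + gamma * P^pi v, i.e. (I - gamma * P^pi) v = r^pi.
v_pi = np.linalg.solve(np.eye(3) - gamma * P_pi, r_pi)
print(np.round(v_pi, 1))            # values for [A, B, C]
```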
10. For an operator L, which of the following properties must be satisfied by x for it to be a fixed point of L? (Multi-Correct)
- Lx=x
- L2x=x
- ∀λ>0, Lx=λx
- None of the above
Answer :-
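For reference, the standard fixed-point reasoning (a general observation about any operator, not the official answer key): if Lx = x, then L²x = L(Lx) = Lx = x, so applying L repeatedly still returns x; the scaling condition ∀λ>0, Lx=λx is a different, eigenvector-like property rather than the definition of a fixed point.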