NPTEL Reinforcement Learning Week 12 Assignment Answers 2024

1. Consider an environment in which an agent is randomly dropped into either state s₁ or s₂ with equal probability. The agent can only view obstacles present immediately to the North, South, East or West. However, the observation made in each direction by the agent may be wrong with a probability of 0.1. If in state s₁ obstacles are present to the North and South, and in s₂ obstacles are present to the East and West, what is the probability of the agent being in state s₁ if the observation made is that there are obstacles present to the North and West?

81/82
41/82
73/82
None of the above.

Answer :-
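
The posterior asked for here follows from Bayes' rule. Below is a minimal sketch, assuming the four directional readings are misread independently of one another and that "obstacles to the North and West" means no obstacle was seen to the South or East (the question states neither explicitly). The names `LAYOUT`, `likelihood` and `posterior`, and the (North, South, East, West) tuple encoding, are illustrative and not part of the assignment.

```python
from fractions import Fraction

ERR = Fraction(1, 10)  # probability that a single directional reading is wrong

# True obstacle layout per state as (North, South, East, West).
# This tuple encoding is an assumption made for the sketch.
LAYOUT = {
    "s1": (True, True, False, False),
    "s2": (False, False, True, True),
}
PRIOR = {"s1": Fraction(1, 2), "s2": Fraction(1, 2)}


def likelihood(obs, truth):
    """P(obs | state), assuming each direction is misread independently."""
    p = Fraction(1)
    for seen, actual in zip(obs, truth):
        p *= (1 - ERR) if seen == actual else ERR
    return p


def posterior(obs):
    """Bayes update of the uniform prior given one observation tuple."""
    joint = {s: PRIOR[s] * likelihood(obs, LAYOUT[s]) for s in LAYOUT}
    total = sum(joint.values())
    return {s: joint[s] / total for s in joint}


# Observation from this question: obstacles seen to the North and West only.
belief = posterior((True, False, False, True))
print(belief["s1"])  # 1/2
```

Under these assumptions the observation agrees with each state in exactly two directions, so both likelihoods equal 0.9 · 0.1 · 0.9 · 0.1 and the posterior stays at the prior, 1/2 (that is, 41/82).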

2. In the same environment as Question 1, suppose state s₁ has obstacles present only to the North and South, and s₂ has obstacles present only to the East and West. What is the probability of the agent being in state s₁ if the observation made is that there are obstacles present only to the North, East and West?

81/82
41/82
73/82
None of the above.

Answer :-
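
One way to set up this computation, assuming each direction is misread independently with probability 0.1. Here o denotes the observation (obstacle seen to the North, East and West, none seen to the South), and the factors are listed in the order North, South, East, West; the symbol o is introduced only for this sketch.

```latex
\begin{align*}
P(o \mid s_1) &= 0.9 \cdot 0.1 \cdot 0.1 \cdot 0.1 = 0.0009 \\
P(o \mid s_2) &= 0.1 \cdot 0.9 \cdot 0.9 \cdot 0.9 = 0.0729 \\
P(s_1 \mid o) &= \frac{0.5 \cdot 0.0009}{0.5 \cdot 0.0009 + 0.5 \cdot 0.0729}
              = \frac{0.0009}{0.0738} = \frac{1}{82}
\end{align*}
```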

3. Assertion: One of the reasons that history-based methods are not feasible is because of the significant increase in the state space size when the trajectory is long.
Reason: The number of states increases polynomially w.r.t. trajectory length.

Both Assertion and Reason are true, and Reason is the correct explanation for Assertion.
Both Assertion and Reason are true, but Reason is not the correct explanation for Assertion.
Assertion is true, Reason is false
Both Assertion and Reason are false

Answer :-
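
To see how quickly a history-based state space grows, one can count the distinct action-observation histories of a given length. The sketch below treats every distinct history as its own state; the function name `num_histories` and the sizes used (4 actions, 16 observations) are hypothetical values chosen for illustration.

```python
def num_histories(num_actions, num_observations, length):
    """Count distinct action-observation histories of exactly `length` steps.

    Each step contributes one action and one observation, so there are
    (num_actions * num_observations) choices per step.
    """
    return (num_actions * num_observations) ** length


# Hypothetical sizes: 4 actions, 16 possible sensor readings.
for t in (1, 5, 10, 20):
    print(t, num_histories(4, 16, t))
```

The count is a fixed base raised to the trajectory length, so it grows exponentially in the length rather than polynomially.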

4. In the case of POMDPs, which of the following is a good estimate of the return of a trajectory of a policy π, given the current belief state and the solution to the underlying MDP (value function for all states)?

Average of all V^π(s) where b(s)>0.
Weighted average of all V^π(s) where b(s) are the weights
Average of all V^π(s) where b(s)≥α, where α is a small positive value
None of the above

Answer :-
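
One common approximation combines the belief with the underlying MDP's value function by taking the expectation of V^π(s) under b(s). The sketch below shows that computation; the function name `belief_weighted_value` and the numbers are hypothetical, and the estimate ignores any value the agent could gain by reducing its uncertainty later.

```python
def belief_weighted_value(belief, values):
    """Expected underlying-MDP value under the belief: sum over s of b(s) * V(s)."""
    return sum(p * values[s] for s, p in belief.items())


belief = {"s1": 0.25, "s2": 0.75}  # hypothetical belief state b(s)
values = {"s1": 4.0, "s2": 8.0}    # hypothetical V^pi(s) from the underlying MDP

print(belief_weighted_value(belief, values))  # 0.25 * 4.0 + 0.75 * 8.0 = 7.0
```

States with zero belief contribute nothing, and states with higher belief count proportionally more, which a plain unweighted average over states would not capture.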

5. Consider the grid-world shown below:

Walls and obstacles are colored gray. The agent is equipped with a sensor that can detect the presence of walls or obstacles immediately to its North, South, East or West.
Which of the following are true if we represent states by their sensor observations?