NPTEL Reinforcement Learning Week 12 Assignment Answers 2024

By Sanket

1. Consider an environment in which an agent is randomly dropped into either state s1 or s2 with equal probability. The agent can only view obstacles present immediately to the North, South, East or West. However, the observation made by the agent in each direction may be wrong with a probability of 0.1. If in state s1 obstacles are present to the North and South, and in s2 obstacles are present to the East and West, what is the probability of the agent being in state s1 if the observation made is that there are obstacles present to the North and West?

  • 81/82
  • 41/82
  • 73/82
  • None of the above.
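
The posterior asked for here follows from Bayes' rule. The sketch below is one way to compute it, assuming the four directional readings err independently of each other (the question does not say this explicitly) and that "obstacles to the North and West" means no obstacle was observed to the South or East. The names `likelihood` and `posterior` are illustrative helpers, not part of the assignment.

```python
from fractions import Fraction

DIRECTIONS = ("N", "S", "E", "W")
ERROR = Fraction(1, 10)  # per-direction sensor error rate stated in the question


def likelihood(true_obstacles, observed_obstacles, error=ERROR):
    """P(observation | state), assuming each direction errs independently."""
    p = Fraction(1)
    for d in DIRECTIONS:
        agrees = (d in true_obstacles) == (d in observed_obstacles)
        p *= (1 - error) if agrees else error
    return p


def posterior(states, prior, observed_obstacles):
    """Apply Bayes' rule over a finite set of states."""
    joint = {
        s: prior[s] * likelihood(obstacles, observed_obstacles)
        for s, obstacles in states.items()
    }
    total = sum(joint.values())
    return {s: j / total for s, j in joint.items()}


# s1 has obstacles to the North and South, s2 to the East and West.
states = {"s1": {"N", "S"}, "s2": {"E", "W"}}
prior = {"s1": Fraction(1, 2), "s2": Fraction(1, 2)}

belief = posterior(states, prior, {"N", "W"})
print(belief["s1"])  # 1/2
```

Under these assumptions the observation matches each state in exactly two directions, so both likelihoods equal 0.9 · 0.9 · 0.1 · 0.1 and the posterior stays at the prior, 1/2 (that is, 41/82).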

2. In the same environment as Question 1, suppose state s1 has obstacles present only to the North and South, and s2 has obstacles present only to the East and West. What is the probability of the agent being in state s1 if the observation made is that there are obstacles present only to the North, East and West?

  • 81/82
  • 41/82
  • 73/82
  • None of the above.
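
A worked version of the calculation, assuming the four directional readings err independently and writing o for the observation "obstacle to the North, East and West, none to the South" (the symbol o is introduced here for illustration):

```latex
\begin{align*}
P(o \mid s_1) &= \underbrace{0.9}_{\text{N}} \cdot \underbrace{0.1}_{\text{S}}
                 \cdot \underbrace{0.1}_{\text{E}} \cdot \underbrace{0.1}_{\text{W}} = 0.0009, \\
P(o \mid s_2) &= \underbrace{0.1}_{\text{N}} \cdot \underbrace{0.9}_{\text{S}}
                 \cdot \underbrace{0.9}_{\text{E}} \cdot \underbrace{0.9}_{\text{W}} = 0.0729, \\
P(s_1 \mid o) &= \frac{0.5 \cdot 0.0009}{0.5 \cdot 0.0009 + 0.5 \cdot 0.0729}
               = \frac{0.0009}{0.0738} = \frac{1}{82}.
\end{align*}
```

Under this independence assumption the result is 1/82, which does not appear among the listed fractions.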

3. Assertion: One reason history-based methods are not feasible is the significant increase in state-space size when the trajectory is long.
Reason: The number of states increases polynomially w.r.t. trajectory length.

  • Both Assertion and Reason are true, and Reason is the correct explanation for Assertion.
  • Both Assertion and Reason are true, but Reason is not the correct explanation for Assertion.
  • Assertion is true, Reason is false.
  • Both Assertion and Reason are false.
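
To get a feel for the growth rate in question, the sketch below counts distinct action-observation histories of a given length. If every action can be followed by every observation, there are (|A| · |O|)^t histories of length t. The sizes used (4 actions, 16 possible sensor readings) are hypothetical values chosen for illustration, not taken from the assignment.

```python
def num_histories(num_actions, num_observations, length):
    """Count action-observation sequences of the given length,
    assuming any action can be followed by any observation."""
    return (num_actions * num_observations) ** length


# Hypothetical sizes: 4 movement actions, 2**4 = 16 four-direction sensor readings.
for t in (1, 2, 5, 10):
    print(t, num_histories(4, 16, t))
```

The count is multiplied by |A| · |O| with every additional step, so a history-based state space grows exponentially with trajectory length.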

4. In the case of POMDPs, which of the following is a good estimate of the return of a trajectory of a policy π, given the current belief state and the solution to the underlying MDP (value function for all states)?

  • Average of all V^π(s) where b(s) > 0.
  • Weighted average of all V^π(s) where b(s) are the weights.
  • Average of all V^π(s) where b(s) ≥ α, where α is a small positive value.
  • None of the above
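
For reference, the belief-weighted option corresponds to taking the expectation of the underlying MDP value function under the belief, that is, the sum over s of b(s) · V^π(s). A minimal sketch of that computation follows; the belief and value numbers are made up for illustration and `belief_weighted_value` is a hypothetical helper name.

```python
def belief_weighted_value(belief, values):
    """Expected value under the belief: sum over s of b(s) * V(s)."""
    return sum(belief[s] * values[s] for s in belief)


# Made-up belief over two states and made-up MDP state values.
belief = {"s1": 0.25, "s2": 0.75}
values = {"s1": 10.0, "s2": 2.0}

estimate = belief_weighted_value(belief, values)
print(estimate)  # 0.25 * 10 + 0.75 * 2 = 4.0
```

Unlike a plain average over states with nonzero belief, this weighting lets unlikely states contribute proportionally little to the estimate.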

5. Consider the grid-world shown below:

[Figure: screenshot of the grid-world; image not available]

Walls and obstacles are colored gray. The agent is equipped with a sensor that can detect the presence of walls or obstacles immediately to its North, South, East or West.
Which of the following are true if we represent states by their sensor observations?

  • The grid-world is a 1st-order Markov system.
  • The grid-world is a 2nd-order Markov system.
  • The grid-world is a 3rd-order Markov system.
  • The grid-world is a 4th-order Markov system.