Another reason to include $\gamma$:

It helps us avoid a common error when designing MDPs: accidentally creating an MDP in which every policy gives the agent infinite expected return.

In the 687 Gridworld, $\gamma = 0.9$.
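To see why $\gamma < 1$ rules out infinite returns (a standard bound, assuming rewards are bounded above by some $R_{\max}$): the discounted return of any policy satisfies

$$\sum_{t=0}^{\infty} \gamma^t R_t \;\le\; \sum_{t=0}^{\infty} \gamma^t R_{\max} \;=\; \frac{R_{\max}}{1-\gamma},$$

so with $\gamma = 0.9$ the return is at most $10\,R_{\max}$, whereas with $\gamma = 1$ the same sum can diverge.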

Property: If $|\mathcal{S}| < \infty$, $|\mathcal{A}| < \infty$, $\gamma < 1$, and $R_t$ is bounded, then an optimal deterministic (i.e., not stochastic) policy exists.

Can you have an optimal policy that only reaches a final goal state? There is no such thing as a "final goal state" in the MDP specification. We can have an MDP with no notion of a goal that never terminates, in which the agent moves around forever.

Terminal State

A terminal state is a state that always transitions to a special state, $S_\infty \in \mathcal{S}$, called the "terminal absorbing state".

  • Once in $S_\infty$, the agent can never leave.
  • If we are in $S_\infty$, then $R_t = 0$ always.

We can include $S_\infty$ in the state set $\mathcal{S}$, leave it out, or include it but only transition into it in some cases.
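As an illustration (a minimal sketch, not the course's code, using a hypothetical `step` function and toy state names), the two bullet points above amount to the terminal absorbing state transitioning only to itself with reward 0:

```python
S_INF = "s_infinity"  # the terminal absorbing state

def step(state, action):
    """Hypothetical transition/reward sketch for a toy MDP with a
    terminal absorbing state. Returns (next_state, reward)."""
    if state == S_INF:
        # Once in S_infinity the agent can never leave, and R_t = 0 always.
        return S_INF, 0.0
    if state == "goal":
        # A terminal state always transitions to the terminal absorbing state.
        return S_INF, 0.0
    # Ordinary dynamics for non-terminal states (toy example).
    next_state = "goal" if action == "finish" else state
    reward = 1.0 if next_state == "goal" else 0.0
    return next_state, reward
```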

Episode

When the agent reaches $S_\infty$, the current trial, called an "episode" ends, and we start a new one.
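Continuing the sketch above (hypothetical `step` and `S_INF`, with $\gamma = 0.9$), an episode is just one run of interaction that stops once the agent reaches $S_\infty$; each new call starts a new episode:

```python
GAMMA = 0.9

def run_episode(policy, start_state, max_steps=1000):
    """Run one episode: interact until the terminal absorbing state is
    reached, accumulating the discounted return sum_t gamma^t * R_t."""
    state, ret, discount = start_state, 0.0, 1.0
    for _ in range(max_steps):
        if state == S_INF:
            break  # the episode ends at the terminal absorbing state
        action = policy(state)
        state, reward = step(state, action)
        ret += discount * reward
        discount *= GAMMA
    return ret

# Each call below starts a fresh episode from the start state.
returns = [run_episode(lambda s: "finish", "start") for _ in range(3)]
```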

Planning

Planning: the transition function $p$ and the reward function $R$ are known. RL: at least $p$ is unknown. Most RL methods will not estimate $p$ (the transition function).
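A hedged sketch of the distinction (toy arrays, not any specific course algorithm): a planner can do a Bellman backup because it can query $p$ and $R$ directly, while an RL method such as Q-learning only uses sampled transitions $(s, a, r, s')$ from the environment and never needs $p$ itself.

```python
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)

# Planning: p and R are known, so values can be backed up directly.
p = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # p[s, a, s']
R = rng.normal(size=(n_states, n_actions))                        # R[s, a]
V = np.zeros(n_states)
V_new = np.max(R + gamma * (p @ V), axis=1)   # one value-iteration backup

# RL: p is unknown; Q-learning updates from a sampled (s, a, r, s') instead.
Q = np.zeros((n_states, n_actions))
s, a, r, s_next = 0, 1, 0.5, 2                # a sampled transition
alpha = 0.1                                   # step size
Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```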