Another reason to include $\gamma$:
It helps us avoid a common error when designing MDPs: reward functions under which every policy gives the agent infinite expected return.
In the 687 Grid World, $\gamma = 0.9$.
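A quick check of why $\gamma < 1$ with bounded rewards rules out infinite returns (assuming $|R_t| \le R_{\max}$ for all $t$):

$$\left|\sum_{t=0}^{\infty} \gamma^t R_t\right| \le \sum_{t=0}^{\infty} \gamma^t R_{\max} = \frac{R_{\max}}{1-\gamma} < \infty.$$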
Property: If $|\mathcal{S}|<\infty$, $|\mathcal{A}|<\infty$, $\gamma <1$, and $R_t$ is bounded, then an optimal deterministic (i.e., non-stochastic) policy exists.
Can you have an optimal policy that is defined only by reaching a final goal state? There is no such thing as a "final goal state" in the MDP specification. We can have an MDP with no notion of a goal, in which the agent never terminates and moves around forever.
Terminal State
A terminal state is a state that always transitions to a special state, $S_\infty \in \mathcal{S}$, called the "terminal absorbing state".
- Once in $S_\infty$, the agent can never leave.
- If we are in $S_\infty$, then $R_t = 0$ always.
We can include $S_\infty$ in the state set or leave it out; or include it but, depending on the formulation, only sometimes transition into it.
Episode
When the agent reaches $S_\infty$, the current trial, called an "episode" ends, and we start a new one.
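A minimal sketch of how episodes and $S_\infty$ fit together in code; the toy environment and the `env`/`policy` interface below are made up for illustration and are not the 687 Grid World:

```python
import random

class ToyEnv:
    """Toy episodic environment (illustrative only).
    Each step gives reward 1; with probability 0.5 the next state is terminal."""

    def reset(self):
        self.done = False
        return 0                       # single non-terminal state, labeled 0

    def step(self, action):
        if self.done:                  # already in S_infinity: stay there, reward 0
            return 0, 0.0, True
        self.done = random.random() < 0.5
        return 0, 1.0, self.done

def run_episode(env, policy, gamma=0.9, max_steps=1000):
    """Run one episode and return the discounted return G."""
    s = env.reset()
    G, discount = 0.0, 1.0
    for _ in range(max_steps):
        a = policy(s)
        s, r, terminal = env.step(a)
        G += discount * r
        discount *= gamma
        if terminal:                   # reached S_infinity: the episode ends
            break
    return G

print(run_episode(ToyEnv(), policy=lambda s: 0))
```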
Planning
Planning: $p$ (the transition function) and $R$ are known. RL: at least $p$ is unknown, and most RL methods do not estimate $p$.
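A rough sketch of the contrast on a tiny made-up MDP: value iteration (planning) uses $p$ and $R$ directly, while a single Q-learning update (model-free RL) uses only a sampled transition and never estimates $p$. All states, numbers, and step sizes here are illustrative.

```python
import numpy as np

# Tiny made-up MDP with 2 states and 2 actions, purely for illustration.
# p[s, a, s2] = Pr(S_{t+1} = s2 | S_t = s, A_t = a);  R[s, a] = expected reward.
p = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

# Planning (p and R known): value iteration.
v = np.zeros(2)
for _ in range(200):
    v = np.max(R + gamma * (p @ v), axis=1)

# Model-free RL (p unknown, never estimated): one Q-learning update
# from a single sampled transition (s, a, r, s2).
q = np.zeros((2, 2))
alpha = 0.1
s, a, r, s2 = 0, 1, 0.0, 1            # a sampled transition
q[s, a] += alpha * (r + gamma * np.max(q[s2]) - q[s, a])
```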