Two most-related fields: (1) Operations research; (2) Control theory (classical / adaptive)
- The main difference: these fields typically assume the environment (the plant) can and should be directly approximated.
- In the vast majority of RL methods, we don't want to model the environment (the plant).
Question: When should you use RL?
- As a last resort: if your problem is not well suited to standard control methods, and it is not a supervised-learning problem, that is, the feedback is evaluative, the problem is sequential, and you don't expect the environment to be easy to model, then it is the perfect setting for reinforcement learning.
The key properties of RL: 1. Evaluative feedback. 2. Sequential decisions.
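The evaluative-feedback property can be sketched with a bandit-style toy problem (my own hypothetical example, not from the lecture): instructive feedback would tell the agent which action was correct, while evaluative feedback only scores the action actually taken, so the agent must estimate values from its own experience. The arm means `true_means` are assumed values the agent never observes directly.

```python
import random

# Hypothetical 3-armed bandit: the agent gets only evaluative feedback,
# i.e., a noisy score for the arm it pulled, never the best arm's label.
true_means = [0.2, 0.5, 0.8]  # assumed; unknown to the agent

def pull(arm):
    """Return a noisy reward for the chosen arm (evaluative feedback)."""
    return true_means[arm] + random.gauss(0, 0.1)

random.seed(0)
estimates = [0.0, 0.0, 0.0]  # the agent's running-mean value estimates
counts = [0, 0, 0]
for _ in range(300):
    arm = random.randrange(3)  # uniform exploration, kept simple for the sketch
    r = pull(arm)
    counts[arm] += 1
    estimates[arm] += (r - estimates[arm]) / counts[arm]  # incremental mean

best = max(range(3), key=lambda a: estimates[a])  # agent's best guess
```

After enough pulls the estimates concentrate around the true means, and `best` identifies the highest-value arm even though no step ever told the agent which arm was correct.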
The agent-environment interaction may require deliberately taking poor actions repeatedly in order to reach a state where a very large reward is available (e.g., years of undergraduate and graduate school before the payoff). That is why the sequential property matters.
The difference between a state set (in RL) and a state space: a space is a set that has some additional structure, such as a notion of distance. For example, a space of vectors where we use Euclidean distance to measure distances between them. Sets are more general: every space has a set underneath.
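The set-versus-space distinction can be made concrete in code (a sketch with made-up states): a state set is just a bag of labels with no geometry, while a state space of vectors comes with a distance function.

```python
import math

# A state *set*: just a collection of labels; no distance is defined
# between "hungry" and "asleep".
state_set = {"hungry", "full", "asleep"}

# A state *space*: vectors in R^2, with Euclidean distance as the
# extra structure layered on top of the underlying set of points.
def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

s1, s2 = (0.0, 0.0), (3.0, 4.0)
euclidean(s1, s2)  # → 5.0
```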