The Markov property is a core concept of Markov processes, and understanding it gives you a broader horizon on Reinforcement Learning. Put simply, a Markov process does not care about the past. Yet it is the past that determines the present: the present is the outcome of the past. Nevertheless, the only thing we need to do is focus on the present, because the present will soon become the past.
So, what should we take into consideration? Remembering the entire past is not an ideal method; we should summarize it instead. From Sutton’s book: “What we would like, ideally, is a state signal that summarizes past sensations compactly, yet in such a way that all relevant information is retained. … A state signal that succeeds in retaining all relevant information is said to be Markov, or to have the Markov property.”
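To make this concrete, here is a minimal sketch (my own illustrative example, not from the book) where a running sum acts as a compact state signal: the full history of rewards retains all information, but the sum alone is enough, because the next sum depends only on the current sum, not on how it was reached.

```python
import random

def step(state_sum, reward):
    # The transition uses only the current summarized state,
    # never the full history -- this is what makes it Markov.
    return state_sum + reward

random.seed(0)
history = []   # the full past: every individual reward
state = 0      # the compact Markov state: the running sum
for _ in range(10):
    r = random.choice([0, 1])
    history.append(r)
    state = step(state, r)

# The compact state retains all the relevant information
# that the full history carries for predicting what comes next.
assert state == sum(history)
print(state)
```

The point of the sketch is that `state` is updated from itself alone, yet nothing relevant is lost compared to keeping `history` around.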
I have been reading Sutton’s new book, Reinforcement Learning: An Introduction (2nd edition), for many days, and what confused me was the deterministic model. What is a deterministic model?
It is a mathematical model in which outcomes are precisely determined through known relationships among states and events, without any room for random variation. In such models, a given input always produces the same output, as in a known chemical reaction. In contrast, stochastic models use ranges of values for variables in the form of probability distributions.
In deterministic models, the output is fully determined by the parameter values and the initial conditions. Take the multi-armed bandit problem in Sutton’s book as an example: if the value of each action is drawn from a normal distribution, say with mean 0 and variance 1, the model is stochastic; if instead each action has a fixed value, the model is deterministic.
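The contrast can be sketched in a few lines of Python. This is an illustrative toy, not the book’s testbed: the fixed per-action values and the function names are my own assumptions.

```python
import random

FIXED_VALUES = [0.5, 1.0, -0.3]   # assumed fixed values, one per action

def deterministic_pull(action):
    # Same action always yields the same value: no randomness at all.
    return FIXED_VALUES[action]

def stochastic_pull(action, rng):
    # Value drawn fresh from N(mean=0, variance=1) on every pull.
    return rng.gauss(0.0, 1.0)

rng = random.Random(42)

# Deterministic: repeated pulls of action 1 always return 1.0.
assert deterministic_pull(1) == deterministic_pull(1) == 1.0

# Stochastic: repeated pulls of the same action generally differ.
samples = [stochastic_pull(1, rng) for _ in range(5)]
print(samples)
```

In the deterministic case the parameter values (here, `FIXED_VALUES`) fully determine the output; in the stochastic case only the distribution of outputs is determined.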