Previously, we discussed Markov decision processes and algorithms to find the optimal action-value function $q_*(s, a)$ and the optimal state-value function $v_*(s)$. We used policy iteration and value iteration to solve for the optimal policy.

It's nice and all to have dynamic programming solutions to reinforcement learning, but they come with many restrictions. For example, are there a lot of real-world problems where you actually know the state transition probabilities? Can you arbitrarily start at any state at the beginning? Is your MDP finite?

Well, I think you'll be glad to know that Monte Carlo methods, a classic way to approximate difficult probability distributions, can handle all of these worries associated with dynamic programming solutions! Monte Carlo methods look at the problem in a completely different way than dynamic programming does: rather than requiring a full model of the environment, they estimate value functions by averaging the returns observed over sampled episodes of experience. Monte Carlo simulations are named after the gambling hot spot in Monaco, since chance and random outcomes are central to the modeling technique, much as they are to games like roulette, dice, and slot machines.

Once again, we will be following Sutton and Barto's RL book [1], with extra explanation and examples that the book does not offer.
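To make that contrast concrete, here is a minimal sketch of first-visit Monte Carlo prediction: it estimates $v_\pi(s)$ purely by averaging sampled returns, never touching transition probabilities. The toy random-walk environment and the `sample_episode` helper are hypothetical, invented just for illustration; they stand in for any episodic environment you can sample from.

```python
import random
from collections import defaultdict

def sample_episode(policy, start_state=0, n_states=5):
    """Hypothetical toy environment: a random walk over states 0..n_states-1.

    Returns a list of (state, reward) pairs. Stepping past the rightmost
    state ends the episode with reward +1; stepping left of state 0 ends
    it with reward 0.
    """
    episode, state = [], start_state
    while True:
        action = policy(state)              # -1 (left) or +1 (right)
        next_state = state + action
        if next_state >= n_states:
            episode.append((state, 1.0))    # terminal: success
            return episode
        if next_state < 0:
            episode.append((state, 0.0))    # terminal: failure
            return episode
        episode.append((state, 0.0))
        state = next_state

def first_visit_mc_prediction(policy, n_episodes=10_000, gamma=1.0):
    """Estimate v_pi(s) by averaging returns from the first visit to each state."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for _ in range(n_episodes):
        episode = sample_episode(policy)
        G = 0.0
        first_visit = {}  # state -> return from its earliest occurrence
        # Walk the episode backwards, accumulating the discounted return;
        # earlier visits overwrite later ones, so the first visit wins.
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            G = gamma * G + reward
            first_visit[state] = G
        for state, ret in first_visit.items():
            returns_sum[state] += ret
            returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}

if __name__ == "__main__":
    random_policy = lambda s: random.choice([-1, 1])
    v = first_visit_mc_prediction(random_policy)
    print({s: round(val, 3) for s, val in sorted(v.items())})
```

Notice that the only thing the algorithm asks of the environment is the ability to generate complete episodes under the policy, which is exactly the assumption that replaces dynamic programming's need for a full transition model.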