Part 1 ended with a table of 192 entries, one per state-action pair, updated by a single rule. That post covered four foundational algorithms from 1957 to 1989; by then, much of what is fundamental in reinforcement learning was in place: value functions, policies, temporal-difference errors, the tension between exploration and exploitation, actors and critics. The three lessons in this post cross that boundary. A neural network replaces the table. A probability distribution rep...
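As a refresher, the tabular setup from Part 1 can be sketched as follows. This is an illustrative guess at the details: the 192 entries are assumed to come from, say, 48 states times 4 actions, and the single update rule is assumed to be a temporal-difference step of the Q-learning form; the state/action sizes, learning rate, and discount factor here are placeholders, not the actual values from Part 1.

```python
import numpy as np

# Hypothetical dimensions: 48 states x 4 actions = 192 entries,
# one per state-action pair (the actual environment is not shown here).
n_states, n_actions = 48, 4
Q = np.zeros((n_states, n_actions))

alpha, gamma = 0.1, 0.99  # illustrative learning rate and discount factor

def td_update(s, a, r, s_next):
    """One application of the single update rule: a temporal-difference step
    toward the reward plus the discounted best value of the next state."""
    td_error = r + gamma * Q[s_next].max() - Q[s, a]
    Q[s, a] += alpha * td_error
    return td_error
```

The lessons in this post replace the `Q` array with a function approximator, but the shape of the update, nudging an estimate toward a bootstrapped target, carries over.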