Q-learning is a model-free reinforcement learning technique. It can be used to find an optimal action-selection policy for any given finite Markov decision process (MDP).
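For reference, the standard one-step Q-learning update is usually written as below. The symbols follow the common textbook convention and are not taken from this page: $s_t$ and $a_t$ are the current state and action, $r_{t+1}$ is the reward received, $\alpha$ is the learning rate and $\gamma$ is the discount factor.

$$
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
$$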
Algorithm:
- Agent senses its environment, using this information to determine its current state
- Agent takes an action and obtains a penalty or reward
- Agent senses its environment again to see what effect its chosen action had
- Agent learns from its experience (and so makes ‘better’ decisions next time)
Source: How does Q-learning work?
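The four steps above can be sketched as a minimal tabular Q-learning loop. This is an illustrative sketch only, not the code from the tutorial linked on this page: the five-cell corridor environment, the `step` and `train` helpers, the reward values, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are all assumptions chosen to keep the example small. Learning moves each table entry towards the reward plus the discounted best value of the next state.

```python
import random

# Hypothetical toy environment (an assumption, not from the source):
# a corridor of 5 cells; the agent starts in cell 0, the goal is cell 4.
N_STATES = 5
ACTIONS = (0, 1)          # 0 = step left, 1 = step right
GOAL = N_STATES - 1


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(GOAL, state + 1)
    done = next_state == GOAL
    reward = 1.0 if done else -0.01   # reward at the goal, small penalty otherwise
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Run tabular Q-learning and return the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0                     # step 1: sense the current state
        done = False
        while not done:
            # step 2: take an action (epsilon-greedy exploration)
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            # step 3: sense the environment again and obtain the reward
            next_state, reward, done = step(state, action)
            # step 4: learn by moving Q(state, action) towards the target
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy policy: the best-valued action in each state.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES)]
```

In this toy setting the greedy policy should pick "step right" in every non-goal cell. The goal cell itself is never updated, so its entry in `policy` carries no meaning.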
Implementation:
Python: http://mnemstudio.org/path-finding-q-learning-tutorial.htm [Raw]
Links:
- Awesome Reinforcement Learning
- Reinforcement Learning
- Monte Carlo Methods
- Q-learning with Neural Networks