A gentle introduction to the fundamentals of reinforcement learning
The k-armed bandit problem, also called the multi-armed bandit problem, is as follows:
We have k slot machines and an octopus with 10 arms operating those k slot machines (k > 10).
Every time a lever is pulled, the slot machine gives a random reward drawn from a normal distribution specific to that machine.
The goal is to maximize the total reward obtained by pulling the levers of the machines.
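The setup above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the class name `GaussianBandit`, the helper `play_randomly`, the unit variance of each reward distribution, the choice to draw each machine's true mean from a standard normal, and the simplification of one lever pull per step are all assumptions made for the example. The random policy is a naive baseline used only to show the interaction loop.

```python
import random


class GaussianBandit:
    """k slot machines; machine i pays rewards drawn from Normal(means[i], 1)."""

    def __init__(self, k, seed=None):
        self.k = k
        self.rng = random.Random(seed)
        # Assumption: each machine's true mean reward is drawn from Normal(0, 1).
        self.means = [self.rng.gauss(0.0, 1.0) for _ in range(k)]

    def pull(self, arm):
        """Pull the lever of machine `arm` and return one random reward."""
        if not 0 <= arm < self.k:
            raise IndexError(f"arm must be in [0, {self.k}), got {arm}")
        # Assumption: every machine has unit standard deviation.
        return self.rng.gauss(self.means[arm], 1.0)


def play_randomly(bandit, steps):
    """Baseline policy: pick a machine uniformly at random at every step."""
    total_reward = 0.0
    for _ in range(steps):
        arm = bandit.rng.randrange(bandit.k)
        total_reward += bandit.pull(arm)
    return total_reward


if __name__ == "__main__":
    bandit = GaussianBandit(k=12, seed=0)
    print("total reward after 1000 random pulls:", play_randomly(bandit, 1000))
```

Any strategy that aims to maximize total reward has to do better than this baseline, which it can only do by learning from the rewards it observes which machines tend to pay more.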
Points to be noted:
If a value function \(q^*(a)\) were to be evaluated for the k-armed bandit testbed, then \(a\) is: