What is a K-Armed Bandit?

A gentle introduction to the fundamentals of reinforcement learning

The K-armed bandit problem, also called the multi-armed bandit problem, is as follows:

We have K slot machines and an octopus with 10 arms operating them (K > 10).

Every time a lever is pulled, the slot machine gives a random reward drawn from a normal distribution specific to that machine.

The multi-armed octopus pulling m levers (here m = 10) of the K slot machines

The goal here is to maximize the total reward obtained by pulling the levers of the machines.
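To make the setup concrete, here is a minimal sketch of it in Python. The names `SlotMachine`, `make_testbed`, and `play_round` are made up for illustration. The choice of K = 15, a standard deviation of 1 for every machine, means drawn from a standard normal distribution, and levers picked uniformly at random are all assumptions, not part of the problem statement.

```python
import random


class SlotMachine:
    """One slot machine whose reward comes from its own normal distribution."""

    def __init__(self, mean, std=1.0, rng=None):
        self.mean = mean
        self.std = std
        self.rng = rng or random.Random()

    def pull(self):
        # Each pull draws a fresh random reward, so rewards are not predictable.
        return self.rng.gauss(self.mean, self.std)


def make_testbed(k, seed=0):
    """Build k machines. Assumption: each mean is itself drawn from N(0, 1)."""
    rng = random.Random(seed)
    return [SlotMachine(rng.gauss(0.0, 1.0), 1.0, rng) for _ in range(k)]


def play_round(machines, chosen):
    """Pull the chosen levers once each and return the summed reward."""
    return sum(machines[i].pull() for i in chosen)


K = 15      # number of slot machines (assumed value, only needs K > 10)
ARMS = 10   # the octopus can pull 10 levers per round

machines = make_testbed(K)
picker = random.Random(1)

total = 0.0
for _ in range(100):
    # Naive strategy: pick 10 distinct machines uniformly at random.
    chosen = picker.sample(range(K), ARMS)
    total += play_round(machines, chosen)

print(f"Total reward after 100 rounds: {total:.2f}")
```

Picking levers at random ignores everything learned from earlier pulls. The interesting part of the bandit problem is replacing that random choice with a strategy that earns a higher total reward.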

Points to be noted:

We will not get the same reward every time, nor a certain or predictable reward, out of the slot machines
We do not get to pull the levers of all K slot machines at once, as we have only 10 arms

If a value function \(q^*(a)\) were to be evaluated for the K-armed bandit testbed, then \(a\) is: