What is a K-Armed Bandit?

A gentle introduction to the fundamentals of reinforcement learning

The k-armed bandit problem, also known as the multi-armed bandit problem, is as follows:

We have k slot machines and an octopus with 10 arms operating them (k > 10).

Every time a lever is pulled, the slot machine gives a random reward drawn from a normal distribution specific to that machine.

The multi-armed octopus pulling m levers of the k slot machines

The goal here is to maximize the total reward obtained by pulling the levers of the machines.
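The setup above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `KArmedBandit`, the helper `run_random_policy`, the choice of k = 12, the unit variance of each reward distribution, and drawing each machine's mean from a standard normal are all hypothetical choices, not something the problem statement fixes.

```python
import numpy as np


class KArmedBandit:
    """Hypothetical testbed: k slot machines, each with its own normal reward distribution."""

    def __init__(self, k=12, seed=0):
        self.rng = np.random.default_rng(seed)
        self.k = k
        # Assumption: each machine's mean reward is itself drawn from N(0, 1).
        self.means = self.rng.normal(0.0, 1.0, size=k)

    def pull(self, lever):
        # One pull returns a sample from N(mean of that machine, 1).
        return self.rng.normal(self.means[lever], 1.0)


def run_random_policy(bandit, steps=1000, arms=10):
    """Pull `arms` distinct levers chosen uniformly at random each step; return the total reward."""
    total = 0.0
    for _ in range(steps):
        # The octopus uses all of its arms at once, one lever per arm.
        levers = bandit.rng.choice(bandit.k, size=arms, replace=False)
        total += sum(bandit.pull(lever) for lever in levers)
    return total


if __name__ == "__main__":
    bandit = KArmedBandit(k=12, seed=0)
    print("Total reward of a random policy:", run_random_policy(bandit))
```

A random policy like this one is only a baseline. Maximizing the total reward means learning which machines pay more and favoring them.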

Points to be noted:

If a value function \(q^*(a)\) were to be evaluated for the k-armed bandit testbed, then \(a\) is:

  1. a scalar lever number
  2. a vector that represents the levers pulled at time instant \(t\)
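
For reference, the usual single-lever formulation writes the value of an action as the expected reward given that the action is selected. The symbols \(A_t\) (the action chosen at time \(t\)) and \(R_t\) (the reward received for it) are assumptions introduced here, since the text above does not name them:

\[
q^*(a) = \mathbb{E}\left[ R_t \mid A_t = a \right]
\]

Under the first reading, \(a\) ranges over the lever numbers \(1, \dots, k\). Under the second, \(a\) would be the whole set of levers pulled at time \(t\), and \(R_t\) would presumably be the combined reward of those pulls.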