Carleton University
Technical Report TR-89
May 1986
Ergodic Learning Automata Capable of Incorporating A Priori Information
Abstract
We consider learning automata which update their action probabilities on the basis of the responses they receive from a random environment. The automata update these probabilities whether the environment responds with a reward or a penalty. Learning automata are said to be ergodic if the distribution of the limiting action probability vector is independent of the initial distribution.
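As a point of reference, the familiar two-action linear Reward-Penalty (L_RP) scheme mentioned below is one such ergodic updating rule. The following minimal sketch follows the standard definition from the learning-automata literature; the variable names (p, lam, and the penalty probabilities c) are ours:

```python
import random

def lrp_step(p, chosen, rewarded, lam):
    """One step of the two-action linear Reward-Penalty (L_RP) scheme.

    p        : current probability of selecting action 0
    chosen   : action just performed (0 or 1)
    rewarded : True if the environment responded with a reward
    lam      : learning parameter in (0, 1)
    """
    q = p if chosen == 0 else 1.0 - p   # probability of the chosen action
    if rewarded:
        q += lam * (1.0 - q)            # move toward the rewarded action
    else:
        q *= (1.0 - lam)                # move away from the penalised action
    return q if chosen == 0 else 1.0 - q


# Short simulation against a stationary environment with penalty
# probabilities c = (0.2, 0.6). For L_RP the mean of the limiting
# distribution is known to be c[1] / (c[0] + c[1]) = 0.75,
# independent of the initial value of p.
c = (0.2, 0.6)
p, lam = 0.5, 0.05
random.seed(0)
for _ in range(20000):
    a = 0 if random.random() < p else 1
    p = lrp_step(p, a, random.random() >= c[a], lam)
print(round(p, 2))
```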
In this paper, we present an ergodic scheme which can take into consideration a priori information about the action probabilities. This is the only scheme reported in the literature that is capable of achieving this. The mean and the variance of the limiting distribution of the automaton are derived, and it is shown that the mean is not independent of the a priori information. Further, it is shown that the expressions for the above quantities are generalizations of the corresponding quantities derived for the familiar L_RP scheme. Finally, it is shown that by constantly updating the parameter quantifying the a priori information, a resultant linear scheme can be obtained. This scheme is indeed counter-intuitive, for it is shown to be of a Reward-Reward flavour and yet absolutely expedient. This demonstrates that Absolutely Expedient schemes have far more general properties than those known in the past [8].
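The scheme itself is not reproduced in this abstract. Purely as an illustration of the idea of biasing an ergodic update toward an a priori estimate, one might layer a hypothetical mixing step on top of the L_RP rule sketched above; `prior` and `theta` are names introduced here for illustration, and the scheme actually derived in the paper may differ:

```python
def prior_biased_step(p, chosen, rewarded, lam, prior, theta):
    """Hypothetical sketch (NOT the paper's scheme): an ergodic linear
    update that, after the usual L_RP step, shrinks the action
    probability toward an a priori estimate `prior`, weighted by
    `theta` in [0, 1). Setting theta = 0 recovers plain L_RP,
    mirroring the abstract's claim that the L_RP quantities arise
    as special cases.
    """
    q = lrp_step(p, chosen, rewarded, lam)    # base ergodic update
    return (1.0 - theta) * q + theta * prior  # linear pull toward the prior
```

Because the mixing is linear, the mean of the limiting distribution under such a rule depends on `prior`, which is consistent with the abstract's statement that the mean is not independent of the a priori information.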