Carleton University
Technical Report TR-32
August 1983
Multi-Action Learning Automata Possessing Ergodicity of the Mean
B.J. Oommen & M.A.L. Thathachar
Abstract
This paper considers multi-action learning automata that update their action probabilities on the basis of the responses they get from an environment. The automata update the probabilities whether the environment responds with a reward or a penalty. Learning automata are said to possess Ergodicity of the Mean (EM) if the mean action probability vector is the state probability (or unconditional probability) vector of an ergodic Markov chain. The only known algorithm that is Ergodic in the Mean is the Symmetric Linear Reward-Penalty (LRP) scheme. Earlier, in [11], necessary and sufficient conditions were derived for two-action nonlinear updating schemes to be Ergodic in the Mean.
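The abstract does not state the update rules, so the sketch below assumes the standard multi-action symmetric LRP rules (reward and penalty step sizes both equal to a) in a stationary environment where action i is penalized with a hypothetical probability c_i. Under these assumptions, taking the expectation of one update gives a linear recursion for the mean, E[p_i(n+1)] = (1 - a c_i) E[p_i(n)] + sum over j != i of a c_j E[p_j(n)] / (r - 1). This is exactly the state probability of a Markov chain that leaves state i with probability a c_i and jumps uniformly to one of the other r - 1 states, which is what EM asks for. The function names (lrp_update, mean_step, exact_mean, simulated_mean, limiting_mean) are illustrative only, and the code compares the exact recursion with a Monte Carlo average over independent runs.

```python
import random

def lrp_update(p, action, penalized, a):
    """One step of the symmetric multi-action L_RP scheme (assumed rules; reward and penalty step both a)."""
    r = len(p)
    if not penalized:
        # Reward: shrink every probability by (1 - a) and give the freed mass to the chosen action.
        q = [(1 - a) * pj for pj in p]
        q[action] = p[action] + a * (1 - p[action])
    else:
        # Penalty: shrink the chosen action and spread mass a evenly over the other r - 1 actions.
        q = [a / (r - 1) + (1 - a) * pj for pj in p]
        q[action] = (1 - a) * p[action]
    return q

def mean_step(m, c, a):
    """Exact one-step update of E[p(n)]: state distribution of a Markov chain that leaves
    state i with probability a*c_i and jumps uniformly to one of the other r - 1 states."""
    r = len(m)
    return [
        m[i] * (1 - a * c[i]) + sum(m[j] * a * c[j] / (r - 1) for j in range(r) if j != i)
        for i in range(r)
    ]

def exact_mean(c, a, steps):
    """Iterate the mean recursion from the uniform initial probability vector."""
    m = [1.0 / len(c)] * len(c)
    for _ in range(steps):
        m = mean_step(m, c, a)
    return m

def simulated_mean(c, a, steps, runs, seed=0):
    """Monte Carlo estimate of E[p(steps)] in a stationary environment with penalty probabilities c."""
    rng = random.Random(seed)
    r = len(c)
    total = [0.0] * r
    for _ in range(runs):
        p = [1.0 / r] * r
        for _ in range(steps):
            action = rng.choices(range(r), weights=p)[0]   # choose an action according to p
            penalized = rng.random() < c[action]           # environment penalizes w.p. c[action]
            p = lrp_update(p, action, penalized, a)
        total = [t + x for t, x in zip(total, p)]
    return [t / runs for t in total]

def limiting_mean(c):
    """Stationary distribution of the chain above: proportional to 1/c_i."""
    inv = [1.0 / ci for ci in c]
    s = sum(inv)
    return [x / s for x in inv]

if __name__ == "__main__":
    c, a = [0.2, 0.4, 0.8], 0.05   # hypothetical penalty probabilities and step size
    print("exact mean after 200 steps:    ", exact_mean(c, a, 200))
    print("simulated mean after 200 steps:", simulated_mean(c, a, 200, runs=1000))
    print("limit proportional to 1/c_i:   ", limiting_mean(c))
```

With c = (0.2, 0.4, 0.8), the balance equations of this chain give a limiting mean of (4/7, 2/7, 1/7). The limit does not depend on a. The step size only controls how quickly the mean settles and how widely individual runs fluctuate around it.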
