Carleton University
Technical Report TR-74
May 1985

Absorbing and Ergodic Discretized Two Action Learning Automata

John Oommen

Abstract

A learning automaton is a machine that interacts with a random environment and simultaneously learns the optimal action that the environment offers to it. In this paper we consider learning automata which have a variable structure. Such automata are completely defined by a set of probability updating rules [4,9,20]. All the Variable Structure Stochastic Automata (VSSA) discussed in the literature update the probabilities in such a way that an action probability can take any real value in the interval [0,1]. As opposed to these, in this paper we shall discretize the probability space so as to permit the action probability to assume one of a finite number of distinct values in [0,1]. The discretized automaton is termed linear or nonlinear depending on whether or not the sub-intervals of [0,1] are of equal length. We shall prove that:
(1) Discretized Two-Action Linear Reward-Inaction Automata are absorbing and ε-optimal in all environments.

(2) Discretized Two-Action Linear Inaction-Penalty Automata are ergodic and expedient in all environments.
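
The abstract does not state the updating rules, so the following is only an illustrative sketch of how a discretized two-action linear reward-inaction scheme might look. It assumes that [0,1] is split into n equal sub-intervals, that the probability of choosing action 1 is p1 = k/n for a state index k in {0, ..., n}, and that a reward moves k one step toward the chosen action while a penalty leaves it unchanged. The names update, simulate, n, k and reward_probs are hypothetical and are not taken from the report.

```python
import random


def update(k, n, action, rewarded):
    """Return the new state index after one interaction.

    State k in {0, ..., n} encodes p1 = k / n (assumed encoding).
    """
    if not rewarded:
        return k                 # inaction on penalty
    if action == 1:
        return min(k + 1, n)     # reward for action 1: step toward p1 = 1
    return max(k - 1, 0)         # reward for action 2: step toward p1 = 0


def simulate(n, reward_probs, steps, seed=0):
    """Run the assumed scheme against a stationary random environment.

    reward_probs[i] is the (assumed) probability that action i + 1 is rewarded.
    Returns the final state index k.
    """
    rng = random.Random(seed)
    k = n // 2                   # start near the middle of the probability space
    for _ in range(steps):
        if k in (0, n):
            break                # p1 is 0 or 1: the state can no longer change
        action = 1 if rng.random() < k / n else 2
        rewarded = rng.random() < reward_probs[action - 1]
        k = update(k, n, action, rewarded)
    return k
```

Under these assumptions the states k = 0 and k = n cannot be left, since only one action is ever chosen there and neither a reward nor a penalty moves the index, which matches the absorbing behaviour claimed in (1). A call such as simulate(10, (0.8, 0.4), 10000) returns the final index, from which p1 = k/10 can be read off.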
