Carleton University
Technical Report TR-74
May 1985
Absorbing and Ergodic Discretized Two Action Learning Automata
Abstract
A learning automaton is a machine that interacts with a random environment and simultaneously learns the optimal action that the environment offers it. In this paper we consider learning automata that have a variable structure. Such automata are completely defined by a set of probability updating rules [4,9,20]. All the Variable Structure Stochastic Automata (VSSA) discussed in the literature update the probabilities in such a way that an action probability can take any real value in the interval [0,1]. In contrast, in this paper we discretize the probability space so that the action probability may assume only one of a finite number of distinct values in [0,1]. The discretized automaton is termed linear or nonlinear depending on whether or not the sub-intervals of [0,1] are of equal length. We shall prove that:
(1) Discretized Two-Action Linear Reward-Inaction Automata are
absorbing and ε-optimal in all environments.
(2) Discretized Two-Action Linear Inaction-Penalty Automata are
ergodic and expedient in all environments.
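
As an illustration of the discretization idea, the following Python sketch simulates a two-action reward-inaction automaton whose action probability is restricted to the equally spaced values 0, 1/N, ..., 1. The abstract does not state the updating rule, so the one-step move on reward used here is an assumption, as are the function name, the parameter names, the starting state and the environment model (fixed penalty probabilities c1 and c2). It should be read as a sketch under those assumptions, not as the scheme analysed in the report.

```python
import random


def simulate_discretized_lri(c1, c2, resolution=100, max_steps=100_000, seed=0):
    """Simulate a two-action discretized reward-inaction automaton (sketch).

    c1, c2      -- assumed penalty probabilities of actions 1 and 2
    resolution  -- hypothetical parameter N: number of equal sub-intervals of [0, 1]
    max_steps   -- upper bound on the number of interactions simulated
    Returns the final probability of choosing action 1.
    """
    rng = random.Random(seed)
    n = resolution
    state = n // 2  # probability of action 1 is state / n (assumed starting point)

    for _ in range(max_steps):
        # States 0 and n are absorbing: one action is then chosen with certainty.
        if state in (0, n):
            break

        p1 = state / n
        action = 1 if rng.random() < p1 else 2

        # The environment rewards the chosen action with probability 1 - c_i.
        penalty_prob = c1 if action == 1 else c2
        rewarded = rng.random() >= penalty_prob

        if rewarded:
            # Assumed rule: move one discrete step toward the rewarded action.
            state += 1 if action == 1 else -1
        # On a penalty the state is left unchanged (inaction).

    return state / n


if __name__ == "__main__":
    # Action 1 is penalized less often, so it is the better action here.
    print(simulate_discretized_lri(c1=0.2, c2=0.8, resolution=10))
```

Because the state changes only on a reward, the two end states 0 and N can never be left once entered, which is the absorbing behaviour referred to in claim (1). Under this assumed rule, a larger `resolution` gives finer probability steps at the cost of slower convergence.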