Carleton University
Technical Report TR-226
July 1993
Switching Models for Non-Stationary Random Environments
Abstract
Learning automata are stochastic finite state machines that attempt to learn the characteristics of an unknown random environment with which they interact. The fundamental problem is that of learning, through feedback, the action that has the highest probability of being rewarded by the environment. The problem of designing automata for stationary environments has been extensively studied. When the environment is non-stationary, the question of modelling the non-stationarity is, in itself, a very interesting problem. In this paper, we generalize the model used in [14,15] to present three models of non-stationarity. In the first two cases, the non-stationarity is modelled by a homogeneous Markov chain governing the way in which the characteristics of the environment change. The final model considers the more general case in which the transition matrix of this chain itself changes with time in a geometric manner. In each case we analyze the stochastic properties of the resultant switching environment. The question of analyzing the various learning machines when interacting with these environments introduces an entirely new avenue of open research problems. We are currently investigating how the three models introduced here (and, in particular, the time-varying model) can be applied to modelling telephone traffic.
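
As an informal illustration of the homogeneous case described above, the following minimal sketch simulates a switching environment in which a Markov chain selects the reward-probability vector that is active at each step. It is not the construction used in the report: the class name SwitchingEnvironment, the method respond, the two-state and two-action setup, the numerical values, and the convention that the chain steps once after every response are all assumptions made for this example. The time-varying (geometric) model is not sketched, since the abstract does not specify its form.

```python
import random


class SwitchingEnvironment:
    """Hypothetical switching environment (illustrative assumption only).

    A homogeneous Markov chain selects which reward-probability vector
    is active; the automaton sees only the reward/penalty response.
    """

    def __init__(self, reward_probs, transition, seed=0):
        # reward_probs[s][a]: probability that action a is rewarded in state s
        # transition[s][t]:   probability of moving from state s to state t
        self.reward_probs = reward_probs
        self.transition = transition
        self.state = 0  # assumed initial state of the chain
        self.rng = random.Random(seed)

    def respond(self, action):
        """Return 1 (reward) or 0 (penalty), then advance the chain one step."""
        rewarded = self.rng.random() < self.reward_probs[self.state][action]
        states = range(len(self.transition))
        # Draw the next state from the row of the current state.
        self.state = self.rng.choices(states, weights=self.transition[self.state])[0]
        return 1 if rewarded else 0


# Usage: two environment states that favour different actions.
env = SwitchingEnvironment(
    reward_probs=[[0.8, 0.2], [0.3, 0.7]],  # assumed values
    transition=[[0.9, 0.1], [0.1, 0.9]],    # assumed values
)
responses = [env.respond(action=0) for _ in range(1000)]
```

Because the transition matrix is fixed, the chain here is homogeneous; an automaton interacting with env would have to track which action is currently favoured without observing env.state directly.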