{"id":12620,"date":"2021-11-14T20:10:49","date_gmt":"2021-11-15T01:10:49","guid":{"rendered":"https:\/\/carleton.ca\/scs\/?page_id=12620"},"modified":"2021-11-14T20:10:49","modified_gmt":"2021-11-15T01:10:49","slug":"tr-112-e-optimal-discretized-linear-reward-penalty-learning-automata","status":"publish","type":"page","link":"https:\/\/carleton.ca\/scs\/research\/scs-technical-reports\/technical-reports-1987\/tr-112-e-optimal-discretized-linear-reward-penalty-learning-automata\/","title":{"rendered":"TR-112: E-Optimal Discretized Linear Reward-Penalty Learning Automata"},"content":{"rendered":"<p>Carleton University<br \/>\n<a href=\"https:\/\/carleton.ca\/scs\/research\/scs-technical-reports\/technical-reports-1987\/\">Technical Report<\/a> <strong>TR-112<\/strong><br \/>\nMay 1987<\/p>\n<h2 class=\"tr_t1\">E-Optimal Discretized Linear Reward-Penalty Learning Automata<\/h2>\n<div class=\"tr_t3\">\n<div class=\"tr_t3\">B.J. Oommen &amp; J.P.R. Christensen<\/div>\n<\/div>\n<div>\n<h3>Abstract<\/h3>\n<p>In this paper we consider Variable Structure Stochastic Automata (VSSA) which interact with an environment and which dynamically learn the optimal action which the automaton offers. Like all VSSA the automata are fully defined by a set of action probability updating rules [4,9,22]. However, to minimize the requirements on the random number generator used to implement the VSSA, and to increase the speed of convergence of the automaton, we consider the case in which the probability updating functions can assume only a finite number of values. These values discretize the probability space [0,1] and hence they are called Discretized Learning Automata. The discretized automata are linear because the sub-intervals of [0,1] are of equal length. We shall prove the following results: (i) Two-Action Discretized Linear Reward-Penalty Automata are ergodic and \u03b5-optimal in all environments whose minimum penalty probability is less than 0.5. 
(ii) There exist Discretized Two-Action Linear Reward-Penalty Automata which are ergodic and \u03b5-optimal in all random environments. (iii) Discretized Two-Action Linear Reward-Penalty Automata with artificially created absorbing barriers are \u03b5-optimal in all random environments.<br \/>\nApart from the above theoretical results, simulation results will be presented which indicate the properties of the automata discussed. The rate of convergence of all these automata and some open problems are also presented.<\/p>\n<\/div>\n<p><a href=\"https:\/\/carleton.ca\/scs\/wp-content\/uploads\/tr-112.pdf\">TR-112.pdf<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Carleton University Technical Report TR-112 May 1987 E-Optimal Discretized Linear Reward-Penalty Learning Automata B.J. Oommen &amp; J.P.R. Christensen Abstract In this paper we consider Variable Structure Stochastic Automata (VSSA) which interact with an environment and which dynamically learn the optimal action which the automaton offers. 
Like all VSSA the automata are fully defined by a [&hellip;]<\/p>\n","protected":false},"author":49,"featured_media":0,"parent":11827,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_relevanssi_hide_post":"","_relevanssi_hide_content":"","_relevanssi_pin_for_all":"","_relevanssi_pin_keywords":"","_relevanssi_unpin_keywords":"","_relevanssi_related_keywords":"","_relevanssi_related_include_ids":"","_relevanssi_related_exclude_ids":"","_relevanssi_related_no_append":"","_relevanssi_related_not_related":"","_relevanssi_related_posts":"","_relevanssi_noindex_reason":"","_mi_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":"","_links_to":"","_links_to_target":""},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v21.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>TR-112: E-Optimal Discretized Linear Reward-Penalty Learning Automata - School of Computer Science<\/title>\n<meta name=\"description\" content=\"Carleton University Technical Report TR-112 May 1987 E-Optimal Discretized Linear Reward-Penalty Learning Automata B.J. Oommen &amp; J.P.R. Christensen\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/carleton.ca\/scs\/research\/scs-technical-reports\/technical-reports-1987\/tr-112-e-optimal-discretized-linear-reward-penalty-learning-automata\/\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/carleton.ca\/scs\/research\/scs-technical-reports\/technical-reports-1987\/tr-112-e-optimal-discretized-linear-reward-penalty-learning-automata\/\",\"url\":\"https:\/\/carleton.ca\/scs\/research\/scs-technical-reports\/technical-reports-1987\/tr-112-e-optimal-discretized-linear-reward-penalty-learning-automata\/\",\"name\":\"TR-112: E-Optimal Discretized Linear Reward-Penalty Learning Automata - School of Computer Science\",\"isPartOf\":{\"@id\":\"https:\/\/carleton.ca\/scs\/#website\"},\"datePublished\":\"2021-11-15T01:10:49+00:00\",\"dateModified\":\"2021-11-15T01:10:49+00:00\",\"description\":\"Carleton University Technical Report TR-112 May 1987 E-Optimal Discretized Linear Reward-Penalty Learning Automata B.J. Oommen &amp; J.P.R. Christensen\",\"breadcrumb\":{\"@id\":\"https:\/\/carleton.ca\/scs\/research\/scs-technical-reports\/technical-reports-1987\/tr-112-e-optimal-discretized-linear-reward-penalty-learning-automata\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/carleton.ca\/scs\/research\/scs-technical-reports\/technical-reports-1987\/tr-112-e-optimal-discretized-linear-reward-penalty-learning-automata\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/carleton.ca\/scs\/research\/scs-technical-reports\/technical-reports-1987\/tr-112-e-optimal-discretized-linear-reward-penalty-learning-automata\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/carleton.ca\/scs\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Research\",\"item\":\"https:\/\/carleton.ca\/scs\/research\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"SCS Technical 
Reports\",\"item\":\"https:\/\/carleton.ca\/scs\/research\/scs-technical-reports\/\"},{\"@type\":\"ListItem\",\"position\":4,\"name\":\"Technical Reports 1987\",\"item\":\"https:\/\/carleton.ca\/scs\/research\/scs-technical-reports\/technical-reports-1987\/\"},{\"@type\":\"ListItem\",\"position\":5,\"name\":\"TR-112: E-Optimal Discretized Linear Reward-Penalty Learning Automata\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/carleton.ca\/scs\/#website\",\"url\":\"https:\/\/carleton.ca\/scs\/\",\"name\":\"School of Computer Science\",\"description\":\"Carleton University\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/carleton.ca\/scs\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"TR-112: E-Optimal Discretized Linear Reward-Penalty Learning Automata - School of Computer Science","description":"Carleton University Technical Report TR-112 May 1987 E-Optimal Discretized Linear Reward-Penalty Learning Automata B.J. Oommen &amp; J.P.R. Christensen","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/carleton.ca\/scs\/research\/scs-technical-reports\/technical-reports-1987\/tr-112-e-optimal-discretized-linear-reward-penalty-learning-automata\/","twitter_misc":{"Est. 
reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/carleton.ca\/scs\/research\/scs-technical-reports\/technical-reports-1987\/tr-112-e-optimal-discretized-linear-reward-penalty-learning-automata\/","url":"https:\/\/carleton.ca\/scs\/research\/scs-technical-reports\/technical-reports-1987\/tr-112-e-optimal-discretized-linear-reward-penalty-learning-automata\/","name":"TR-112: E-Optimal Discretized Linear Reward-Penalty Learning Automata - School of Computer Science","isPartOf":{"@id":"https:\/\/carleton.ca\/scs\/#website"},"datePublished":"2021-11-15T01:10:49+00:00","dateModified":"2021-11-15T01:10:49+00:00","description":"Carleton University Technical Report TR-112 May 1987 E-Optimal Discretized Linear Reward-Penalty Learning Automata B.J. Oommen &amp; J.P.R. Christensen","breadcrumb":{"@id":"https:\/\/carleton.ca\/scs\/research\/scs-technical-reports\/technical-reports-1987\/tr-112-e-optimal-discretized-linear-reward-penalty-learning-automata\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/carleton.ca\/scs\/research\/scs-technical-reports\/technical-reports-1987\/tr-112-e-optimal-discretized-linear-reward-penalty-learning-automata\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/carleton.ca\/scs\/research\/scs-technical-reports\/technical-reports-1987\/tr-112-e-optimal-discretized-linear-reward-penalty-learning-automata\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/carleton.ca\/scs\/"},{"@type":"ListItem","position":2,"name":"Research","item":"https:\/\/carleton.ca\/scs\/research\/"},{"@type":"ListItem","position":3,"name":"SCS Technical Reports","item":"https:\/\/carleton.ca\/scs\/research\/scs-technical-reports\/"},{"@type":"ListItem","position":4,"name":"Technical Reports 
1987","item":"https:\/\/carleton.ca\/scs\/research\/scs-technical-reports\/technical-reports-1987\/"},{"@type":"ListItem","position":5,"name":"TR-112: E-Optimal Discretized Linear Reward-Penalty Learning Automata"}]},{"@type":"WebSite","@id":"https:\/\/carleton.ca\/scs\/#website","url":"https:\/\/carleton.ca\/scs\/","name":"School of Computer Science","description":"Carleton University","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/carleton.ca\/scs\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"}]}},"acf":{"banner_image_type":"none","banner_button":"no"},"_links":{"self":[{"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/pages\/12620"}],"collection":[{"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/users\/49"}],"replies":[{"embeddable":true,"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/comments?post=12620"}],"version-history":[{"count":1,"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/pages\/12620\/revisions"}],"predecessor-version":[{"id":12621,"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/pages\/12620\/revisions\/12621"}],"up":[{"embeddable":true,"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/pages\/11827"}],"wp:attachment":[{"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/media?parent=12620"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}