<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article"><front><journal-meta><journal-id journal-id-type="publisher-id">INFORMATICA</journal-id><journal-title-group><journal-title>Informatica</journal-title></journal-title-group><issn pub-type="epub">0868-4952</issn><issn pub-type="ppub">0868-4952</issn><publisher><publisher-name>VU</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="publisher-id">inf14209</article-id><article-id pub-id-type="doi">10.15388/Informatica.2003.018</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research article</subject></subj-group></article-categories><title-group><article-title>Efficient Exploration in Reinforcement Learning Based on Utile Suffix Memory</article-title></title-group><contrib-group><contrib contrib-type="Author"><name><surname>Pchelkin</surname><given-names>Arthur</given-names></name><email xlink:href="mailto:arturp@balticom.lv">arturp@balticom.lv</email><xref ref-type="aff" rid="j_INFORMATICA_aff_000"/></contrib><aff id="j_INFORMATICA_aff_000">Faculty of Computer Science and Information Technology, Riga Technical University, 1 Kalku Str., LV‐1658 Riga, Latvia</aff></contrib-group><pub-date pub-type="epub"><day>01</day><month>01</month><year>2003</year></pub-date><volume>14</volume><issue>2</issue><fpage>237</fpage><lpage>250</lpage><history><date date-type="received"><day>01</day><month>03</month><year>2003</year></date></history><abstract><p>Reinforcement learning addresses the question of how an autonomous agent can learn to choose optimal actions to achieve its goals. Efficient exploration is of fundamental importance for autonomous agents that learn to act. Previous approaches to exploration in reinforcement learning usually address the case in which the environment is fully observable. In contrast, we study the case in which the environment is only partially observable. We consider different exploration techniques applied to the learning algorithm “Utile Suffix Memory” and, in addition, discuss an adaptive fringe depth. Experimental results in a partially observable maze show that exploration techniques have a serious impact on the performance of the learning algorithm.</p></abstract><kwd-group><label>Keywords</label><kwd>reinforcement learning</kwd><kwd>exploration</kwd><kwd>hidden state</kwd><kwd>short‐term memory</kwd></kwd-group></article-meta></front></article>