Understanding the Order of 500 and 1000 Rupees Notes Ban using Reinforcement Learning
Abstract
Reinforcement learning is a field of machine learning that addresses complicated sequential decision-making problems. In reinforcement learning, an agent learns behavior through trial-and-error runs in order to determine the optimal policy, that is, the sequence of actions that maximizes reward. Because many reinforcement learning methods use dynamic programming approaches, the environment is characterized as a Markov Decision Process (MDP). This research applies reinforcement learning with bigram, trigram, and 4-gram models to tweets collected for "500 and 1000 notes banned." The word graph is drawn as a multistage graph problem, and the Bayes method is used to compute the probabilities. For a given word sequence, the method determines the shortest route between source and destination. The path is then defined by the agent's randomly selected states and actions, which are followed to receive rewards. Epsilon-greedy selection chooses a random action with probability epsilon in order to explore the environment, and otherwise chooses the action with the highest estimated value.
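To make the bigram and epsilon-greedy steps concrete, the following is a minimal Python sketch, not the authors' implementation. The toy corpus `TWEETS`, the helper names `bigram_probabilities` and `epsilon_greedy`, and the value epsilon = 0.1 are all hypothetical. Probabilities are estimated by simple relative frequency of adjacent word pairs, and using them directly as action values is an assumption made here for illustration only.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy corpus; the tweets used in the study are not reproduced here.
TWEETS = [
    "500 and 1000 notes banned",
    "500 and 1000 notes banned today",
    "1000 notes banned in india",
]


def bigram_probabilities(sentences):
    """Estimate P(next word | current word) by relative frequency."""
    pair_counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        # Count every adjacent (current, next) word pair.
        for current, nxt in zip(words, words[1:]):
            pair_counts[current][nxt] += 1
    probs = {}
    for current, counter in pair_counts.items():
        total = sum(counter.values())
        probs[current] = {nxt: count / total for nxt, count in counter.items()}
    return probs


def epsilon_greedy(action_values, epsilon, rng):
    """Explore with probability epsilon, otherwise take the best-valued action."""
    actions = list(action_values)
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore: uniformly random action
    return max(actions, key=action_values.get)  # exploit: greedy action


probs = bigram_probabilities(TWEETS)
rng = random.Random(0)  # fixed seed so that runs are repeatable

# Walk the word graph from a source word; each word is a state and each
# possible next word is an action (an assumption of this sketch).
word = "500"
path = [word]
while word in probs and len(path) < 10:
    word = epsilon_greedy(probs[word], epsilon=0.1, rng=rng)
    path.append(word)
print(path)
```

In this sketch each word plays the role of a state and each candidate next word the role of an action. Trigram and 4-gram variants would condition on the previous two or three words instead of one.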