Develop a reinforcement learning model in R that implements the Actor-Critic Model of the Basal Ganglia
Andrew Barto has suggested a mapping between the components of the actor-critic model and the components of an important structure in the brain known as the basal ganglia. http://www.cse.iitm.ac.in/~cs670/book/the-book.html
Neurothesis is a not-for-profit research organisation looking to build a basic simulation of the basal ganglia for Parkinson's research.
<See image attached>
The dashed box on the left represents the components of the basal ganglia and the dashed box on the right represents the components of the actor-critic model.
Note: Neurothesis would like to create a basic model that uses a hidden layer, i.e. a multi-layered network, for mapping state estimates to values and state estimates to actions.
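To make the hidden-layer requirement concrete, here is a minimal sketch of a forward pass in R. It assumes a single shared hidden layer feeding a critic output (state estimate to value) and an actor output (state estimate to action probabilities). The layer sizes, the tanh and softmax choices, the weight initialisation and all names (W_hidden, w_critic, W_actor, forward) are illustrative assumptions, not part of the brief.

```r
# Hypothetical sizes; the brief does not specify them.
set.seed(1)
n_state  <- 4   # length of the state-estimate vector
n_hidden <- 8   # number of hidden units
n_action <- 2   # e.g. turn left / turn right

# Small random initial weights (assumed initialisation scheme).
W_hidden <- matrix(rnorm(n_hidden * n_state, sd = 0.1), n_hidden, n_state)
w_critic <- rnorm(n_hidden, sd = 0.1)
W_actor  <- matrix(rnorm(n_action * n_hidden, sd = 0.1), n_action, n_hidden)

# Softmax turns action preferences into probabilities.
softmax <- function(x) {
  z <- exp(x - max(x))  # subtract the max for numerical stability
  z / sum(z)
}

# Forward pass: one shared hidden layer, two output heads.
forward <- function(state) {
  h <- tanh(as.vector(W_hidden %*% state))
  list(
    hidden = h,
    value  = sum(w_critic * h),                  # critic: state -> value
    policy = softmax(as.vector(W_actor %*% h))   # actor: state -> action probabilities
  )
}

out <- forward(c(1, 0, 0, 0))
```

Sharing the hidden layer between the two heads is only one possible design; separate networks for the actor and the critic would satisfy the same description.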
We can see that there's a rough similarity between the components on the left side and the right side.
For example, the DA here on the left side stands for the dopamine signal.
We would like to treat the dopamine signal as the temporal difference prediction error signal that we see in a temporal difference learning model. This is ultimately a very abstract, high-level model of basal ganglia function, but it could perhaps serve as a simulator or, hopefully, as a research starting point for more detailed models.
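As an illustration of the dopamine-as-prediction-error idea, the sketch below computes a TD(0) error and uses it to update both a tabular critic and a tabular actor. The function names (td_error, ac_step), the learning rates, the discount factor and the simple "add the error to the chosen action's preference" actor rule are assumptions chosen for brevity; a submitted solution may use a different formulation.

```r
# TD(0) prediction error, read here as the dopamine-like signal.
td_error <- function(reward, v_next, v_current, gamma = 0.9) {
  reward + gamma * v_next - v_current
}

# One actor-critic step on tabular quantities.
# V:     numeric vector of state values (critic)
# prefs: matrix of action preferences, states x actions (actor)
ac_step <- function(V, prefs, s, a, reward, s_next,
                    alpha_critic = 0.1, alpha_actor = 0.1,
                    gamma = 0.9, terminal = FALSE) {
  v_next <- if (terminal) 0 else V[s_next]   # terminal states have value 0
  delta  <- td_error(reward, v_next, V[s], gamma)
  V[s]        <- V[s] + alpha_critic * delta        # critic: policy evaluation
  prefs[s, a] <- prefs[s, a] + alpha_actor * delta  # actor: policy improvement
  list(V = V, prefs = prefs, delta = delta)
}

V     <- c(0, 0)
prefs <- matrix(0, nrow = 2, ncol = 2)
res   <- ac_step(V, prefs, s = 1, a = 2, reward = 1, s_next = 2)
```

With all values starting at zero, a reward of 1 gives a prediction error of 1, so the value of state 1 and the preference for action 2 in state 1 both rise to 0.1. The same error signal drives both updates, which is the property that the dopamine analogy relies on.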
The successful developer will provide five (5) sample scenarios, each with 100 trials per scenario, in the final code. The solution should clearly show, in both graphical and tabular form, the Actor (“Policy Improvement”) and Critic (“Policy Evaluation”) learning state over N trials.
The source must clearly show that values act as surrogate immediate rewards, i.e. that a locally optimal choice leads to a globally optimal policy in “Markov” environments.
The domains can be randomly generated in R, using set.seed() to keep the randomness reproducible.
Scenario 1. 10 x Variables (simulating the reward)
Scenario 2. 100 x Variables
Scenario 3. 1000 x Variables
Scenario 4. 10000 x Variables
Scenario 5. 100000 x Variables
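One way the five domains above could be generated reproducibly is sketched below. It assumes that each "variable" is a single reward value drawn uniformly from [0, 1]; the brief does not define the distribution, and the helper name make_scenario and the default seed are hypothetical.

```r
# Generate one domain of n_vars reward variables.
# Assumption: each variable is a reward drawn uniformly from [0, 1].
make_scenario <- function(n_vars, seed = 42) {
  set.seed(seed)   # same seed -> same domain on every run
  runif(n_vars)
}

sizes <- c(10, 100, 1000, 10000, 100000)
scenarios <- lapply(sizes, make_scenario)
names(scenarios) <- paste0("scenario_", seq_along(sizes))

# Tabular overview of the generated domains.
overview <- data.frame(
  scenario    = names(scenarios),
  n_variables = lengths(scenarios, use.names = FALSE),
  mean_reward = vapply(scenarios, mean, numeric(1), USE.NAMES = FALSE)
)
```

Calling set.seed() inside the generator, rather than once at the top of the script, means each scenario can be regenerated on its own without rerunning the others.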
The example scenario is to simulate a rat in a barn deciding whether to turn left or right. The rat's goal is to consume as much as possible in the fewest trials, while the simulation shows how its path is optimised through temporal difference learning.
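A minimal sketch of the rat example follows, reduced to a single choice point with two actions and one reward per trial. The food probabilities, learning rates, seed and the function name run_rat are all assumptions made for illustration; the brief leaves these details open. Because each trial ends after one step, the value of the next state is taken as zero and the TD error reduces to reward minus the current value.

```r
run_rat <- function(n_trials = 100,
                    p_food = c(left = 0.2, right = 0.8),  # assumed food probabilities
                    alpha_critic = 0.1, alpha_actor = 0.2,
                    seed = 123) {
  set.seed(seed)
  V     <- 0                        # critic: value of the choice point
  prefs <- c(left = 0, right = 0)   # actor: action preferences
  softmax <- function(x) { z <- exp(x - max(x)); z / sum(z) }

  log <- data.frame(trial = seq_len(n_trials), action = NA_character_,
                    reward = NA_real_, td_error = NA_real_,
                    value = NA_real_, p_right = NA_real_,
                    stringsAsFactors = FALSE)

  for (t in seq_len(n_trials)) {
    a <- sample(names(prefs), 1, prob = softmax(prefs))  # choose left or right
    r <- as.numeric(runif(1) < p_food[[a]])              # 1 if food is found, else 0
    delta <- r - V                                       # TD error (next value is 0)
    V <- V + alpha_critic * delta                        # critic update
    prefs[a] <- prefs[a] + alpha_actor * delta           # actor update

    log$action[t]   <- a
    log$reward[t]   <- r
    log$td_error[t] <- delta
    log$value[t]    <- V
    log$p_right[t]  <- softmax(prefs)[["right"]]
  }
  log
}

results <- run_rat()
```

The returned data frame is the tabular view of the run, with one row per trial. For the graphical view, a call such as plot(results$trial, results$p_right, type = "l") traces the actor's probability of turning right, and the same can be done with results$value for the critic.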
The successful candidate must have qualifications in computational neuroscience, must provide well-commented code, and must present a vignette with their code and a summary of their findings.