
The distinction between model-based (MB) and model-free (MF) reinforcement learning (RL) can provide a plausible computational account of the distinction between goals and habits: action selection in an MB agent would be immediately sensitive to current goal values following a change in outcome value, while an MF agent would update its predicted value signals only incrementally after such a change [14].
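
The contrast can be illustrated with a minimal sketch of an outcome-devaluation test. Everything in it is an assumption made for illustration rather than a model taken from the cited work: the one-step task, the names `TRANSITIONS`, `mb_value`, `mf_update` and `simulate_devaluation`, and the learning rate `ALPHA` are all hypothetical. The MB agent reads the current outcome value through its model, so its action value changes as soon as the outcome is devalued, whereas the MF agent holds a cached value that moves only a fraction of the way toward the new outcome value on each experienced trial.

```python
# Hypothetical one-step task: each action leads deterministically to one outcome.
TRANSITIONS = {"press_left": "food", "press_right": "water"}  # assumed task structure
ALPHA = 0.1  # assumed model-free learning rate


def mb_value(action, outcome_values):
    """Model-based value: follow the model to the outcome, then read its current value."""
    return outcome_values[TRANSITIONS[action]]


def mf_update(q, action, reward, alpha=ALPHA):
    """Model-free update: move the cached value a fraction alpha toward the reward."""
    rpe = reward - q[action]  # reward prediction error
    q[action] += alpha * rpe
    return rpe


def simulate_devaluation(n_train=200, n_post=5):
    """Train both agents, devalue 'food', then compare their values for 'press_left'."""
    outcome_values = {"food": 1.0, "water": 0.5}
    q = {action: 0.0 for action in TRANSITIONS}

    # Training phase: the MF agent experiences every action repeatedly.
    for _ in range(n_train):
        for action, outcome in TRANSITIONS.items():
            mf_update(q, action, outcome_values[outcome])

    # Devaluation: the outcome value changes, but no action has been taken yet.
    outcome_values["food"] = 0.0
    mb_after = mb_value("press_left", outcome_values)  # reflects the change at once
    mf_before_experience = q["press_left"]             # still the old cached value

    # The MF value declines only as the devalued outcome is experienced.
    mf_trace = []
    for _ in range(n_post):
        mf_update(q, "press_left", outcome_values["food"])
        mf_trace.append(q["press_left"])
    return mb_after, mf_before_experience, mf_trace


if __name__ == "__main__":
    mb_after, mf_before, mf_trace = simulate_devaluation()
    print("MB value right after devaluation:", mb_after)
    print("MF value right after devaluation:", round(mf_before, 3))
    print("MF value over post-devaluation trials:", [round(v, 3) for v in mf_trace])
```

In this sketch the MF agent would keep choosing the devalued action for several trials, which is the behavioral signature usually associated with habits.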

There is evidence that the brain holds value representations that are MB rather than MF: activity in the vmPFC correlates better with an MB value signal that takes into account knowledge of task structure (such as the presence of contingency reversals) than with an MF signal that incorporates no such knowledge [9] (see also [17 and 18•]). Further, while the MF system uses reward-prediction errors to update predictions about future reward, the MB system may use a different type of prediction error to facilitate learning of the model itself. Neural correlates of such signals, dubbed ‘state prediction errors’ (SPEs), have been reported in frontal and parietal areas [19] (Figure 1B). A key part of the MB system is a representation of the model itself. Little is known about how the model is implemented, although inferior parietal cortex is implicated in encoding information about actions and associated outcomes independently of the value of those outcomes, which would be at least one component of a model representation [20•]. Another important feature of MB computations is that policy selection in the MB system requires active planning (i.e. forward or backward searching through the decision tree in order to select a trajectory).

Correlates of such planning signals have been found in dorsolateral prefrontal cortex and hippocampus [21•]. MF value signals have been found in the posterior putamen [18• and 22•], consistent with previous evidence implicating this region in habitual learning [23 and 24]. On the other hand, other studies have reported signals reflecting a mix of both strategies in the striatum [21•, 25 and 26]. The extent to which MB and MF representations can be separated might depend on details of the tasks used to assay them, as well as on how the two systems are hypothesized to interact to control behavior, as discussed below.

The finding of both MB and MF RL signals in the brain raises the question of how these two systems interact in order to control behavior. Building on the proposal that the interaction between the two systems may be governed by the relative degree of uncertainty in their estimates [14], one approach used an approximation of uncertainty based on the amount of prediction error accumulated within each system [18•]. For example, if at a particular moment the MF system has recently generated many reward prediction errors (PEs) but the MB system has generated few SPEs, this implies that the predictions of the MF system may be less reliable than those of the MB system at that point in time.
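
The arbitration idea can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the model used in [18•]: the class name `Arbitrator`, the `decay` parameter, the exponentially weighted average of unsigned prediction errors, and the mapping from accumulated error to reliability are all hypothetical choices made here for clarity.

```python
class Arbitrator:
    """Weights MB and MF values by how little prediction error each system has produced lately."""

    def __init__(self, decay=0.8):
        self.decay = decay    # assumed forgetting factor for older errors
        self.mf_error = 0.0   # running average of unsigned reward prediction errors
        self.mb_error = 0.0   # running average of unsigned state prediction errors

    def observe(self, reward_pe, state_pe):
        """Update both running averages with the prediction errors from one trial."""
        self.mf_error = self.decay * self.mf_error + (1 - self.decay) * abs(reward_pe)
        self.mb_error = self.decay * self.mb_error + (1 - self.decay) * abs(state_pe)

    def mb_weight(self):
        """Return the MB system's share of control, between 0 and 1."""
        # Assumed mapping: reliability falls as accumulated error grows.
        mf_reliability = 1.0 / (1.0 + self.mf_error)
        mb_reliability = 1.0 / (1.0 + self.mb_error)
        return mb_reliability / (mb_reliability + mf_reliability)

    def combined_value(self, q_mb, q_mf):
        """Blend the two systems' action values according to the current weight."""
        w = self.mb_weight()
        return w * q_mb + (1 - w) * q_mf


if __name__ == "__main__":
    arbitrator = Arbitrator()
    # Ten trials with large reward PEs and no SPEs: the MF system looks unreliable.
    for _ in range(10):
        arbitrator.observe(reward_pe=1.0, state_pe=0.0)
    print("MB weight:", round(arbitrator.mb_weight(), 3))
```

With equal accumulated error the two systems share control equally; after a run of large reward PEs and few SPEs, the weight shifts toward the MB system, matching the example in the text.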
