eLife Assessment
The authors studied cognitive control signals in the anterior cingulate cortex (ACC) while rats selected between small immediate and larger delayed rewards. The description of behavioral strategies related to value-tracking signals in ACC is potentially useful. The evidence in support of this finding is incomplete due to issues with the task design, analyses, and modeling.
Reviewer #1 (Public review):
Summary:
Adult (4mo) rats were tasked to either press one lever for an immediate reward or another for a delayed reward. The task had an adjusting amount structure in which (1) the number of pellets provided on the immediate reward lever changed as a function of the decisions made, (2) rats were prevented from pressing the same lever three times in a row.
While the authors have been very responsive to the reviews, and I appreciate that, unfortunately, the new analyses reported in this revision actually lead me to deeper concerns about the adequacy of the data to support the conclusions. In this revision, it has become clear that the conclusions are forced and not supported by the data. Alternative theories are not considered or presented. This revision has revealed deep problems with the task, the analyses, and the modeling.
Data Weaknesses
Most importantly, the inclusion of the task behavior data has revealed a deep problem with the entire structure of the data. As is obvious in Figure 1D, there is a slow learning effect that is changing over the sessions as the animals learn to stop taking the delayed outcome. Unfortunately, the 8s delays came *after* the 4s. The first 20 sessions contain 19 4s delays and 1 8s delay, while the last 20 sessions contain 14 8s delays and 6 4s delays. Given the changes across sessions, it is likely that a large part of the difference is due to across-session learning (which is never addressed or considered).
These data are not shown by subject, and I suspect that individual subjects did all 4s then all 8s and some subjects switched tasks at different times. If my suspicion is true, then any comparisons between the 4s and 8s conditions (which are a major part of the authors' claims) may have nothing to do with the delays, but rather with increased experience on the task.
Furthermore, the four "groups", which are still poorly defined, seem to have been assessed at a session-by-session level. So when did each animal fall into a given group? Why is Figure 1D not showing which session fell into which group and why are we not seeing each animal's progression? They also admit that animals used a mixture of strategies, which implies that the "group" assignment is an invalid analysis, as the groups do not accommodate strategy mixing.
Figure 2 shows that none of the differences between the group behavior and random choice with a basic p(delay) are significant. They use a KS test to measure these differences. KS tests are notoriously sensitive, as they simply measure whether there is any statistical difference between two distributions. They do not report the full statistics for Figure 2, but only say that the 4HI group was not significant (KS p-value = 0.72) and the 8LO group showed a p-value of 0.1 (which they interpret as significant). p=0.1 is not significant. They don't report the values for the 4LO or 8HI groups (why not?), but say they are in-between these two extremes. That means *none* of the differences are significant.
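As an illustration of the kind of reporting being asked for, a minimal sketch (with placeholder data, not values from the paper) of a two-sample KS comparison that reports the full statistic alongside the p-value:

```python
# Illustrative sketch (not the authors' code): comparing a group's per-session
# reward totals against totals simulated from a fixed p(delay) agent, and
# reporting the full KS statistic alongside the p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
observed_totals = rng.normal(150, 20, size=30)       # placeholder session totals
simulated_totals = rng.normal(150, 20, size=1000)    # placeholder p(delay) simulations

ks_stat, p_value = stats.ks_2samp(observed_totals, simulated_totals)
print(f"KS D = {ks_stat:.3f}, p = {p_value:.3f}")    # report both, not just the p-value
```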
They then test a model with additional parameters and say that the model includes more than the minimal p_D parameter, but they never report BIC or AIC model comparisons. In order to claim that the model is better than the bare p_D assumption, they should be reporting model-comparison statistics. But given that the p_D parameter alone is enough (q.v. Figure 2), this entire model seems unnecessary.
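To make the requested model comparison concrete, a minimal sketch of how AIC/BIC could be computed for the bare p_D model versus the fuller model (the log-likelihoods here are hypothetical placeholders, not fitted values from the paper):

```python
# Illustrative sketch: a model-comparison statistic for the bare p_D model
# versus the model with extra parameters. In practice the log-likelihoods
# come from fitting each model to the trial-by-trial choice data.
import numpy as np

def aic(log_lik, k):
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n_trials):
    return k * np.log(n_trials) - 2 * log_lik

n_trials = 40
ll_pD_only = -25.0   # hypothetical fitted log-likelihood, 1 parameter (p_D)
ll_full = -23.5      # hypothetical fitted log-likelihood, 3 parameters

print("p_D only :", aic(ll_pD_only, 1), bic(ll_pD_only, 1, n_trials))
print("full     :", aic(ll_full, 3), bic(ll_full, 3, n_trials))
# The fuller model is preferred only if its AIC/BIC is lower despite the penalty.
```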
It took me a while to determine what was being shown in Figure 3, but I was eventually able to determine that 0 is the time at which the animal made the choice to wait out the delay side, so the 4s in Figure 3A1 with high power in the low-frequency (<5 Hz) range is the waiting time. They don't show the full 8s time. Nor do they show the spectrograms separated by group (assuming that group is the analytical tool they are using). In B they show only theta power, but it is unclear how to interpret these changes over time.
In Figure 4, panel A is mostly useless because it is just five sample sessions showing firing rate plotted on the same panels as the immediate reward amount. If they want to claim correlation, they should show and test it. But moreover, this is not how neural data should be presented - we need to know what the cells are doing, population-wise. We need to have an understanding of the neural ensemble. These data are clearly being picked and chosen, which is not OK.
Figure 4, panels B and C show that the activity trivially reflects the reward that has been delivered to the animal, if I am understanding the graphs correctly. (The authors do not interpret it this way, but the data is, to my eyes, clear.) The "immediate" signal shows up immediately at choice and reflects the size of the immediate reward (which is varying). The "delay" signal shows up after the delay and does not, which makes sense as the animals get 6 pellets on the delayed side no matter what. In fact, the max delayed side activity = the max immediate side activity, which is 6 pellets. This is just reward-related firing.
Figure 5 is poorly laid out, switching the panel order from 5C to 2, 1, 3 in E and F. (Why?!) The statistics for Figure 5 on page 17 should ask whether there are differences between neuron types, not whether there is a choice x time interaction in a given neuron type. When I look at Figure 5F1-3, all three types look effectively similar with different levels of noise. It is unclear why they are doing this complicated PC analysis or what we should draw from it.
Figure 6 mis-states pie charts as "total number" rather than proportions.
Interpretation Weaknesses
The separation of cognitive effort into "resource-based" and "resistance-based" seems artificial to me. I still do not understand why the ability to resist a choice does not also depend on resources, or why using resources is not a form of resistance. Doesn't every action in the end depend on the resources one has available? And doesn't every use of a resource resist one option by taking another? Even if one buys these two separate cognitive control processes (which at this point in reading the revision, I do not), the paper starts from the assumption that a baseline probability of waiting out the delays is "resistance-based cognitive control" (why?) and a probability of choice that takes into account the size of the immediate value (confusingly abbreviated as ival) is "resource-based cognitive control" (again, why?).
Reviewer #2 (Public review):
Summary:
I appreciate the considerable work the authors have done on the revision. The manuscript is markedly improved.
Strengths still include the strong theoretical basis, well-done experiments, and clear links to LFP / spectral analyses that have links to human data. The task is now more clearly explained, and the neural correlates better articulated.
Weaknesses:
I had remaining questions, many related to my previous questions.
(1) The results have some complexity, but I still had questions about which is resource-based and which is resistance-based. The authors say in the last sentence of the discussion: "Prominent pre-choice theta power was associated with a behavioral strategy characterized by a strong bias towards a resistance-based strategy, whereas the neural signature of ival-tracking was associated with a strong bias towards a resource-based strategy."
I might suggest making this simpler and clearer in the abstract and the first paragraph of the discussion. A simple statement like "pre-choice theta was biased towards resistance whereas single neurons were biased towards resources" might make this idea come across?
(2) I think most readers would like to see raw single trial LFP traces in Figure 3, single unit rasters in Figure 4, and spike-field records in Figure 5.
(3) What limitations are there to this work? I wonder if readers might benefit from some contextualization - the sample size, heterogenous behavior - lack of cell-type specificity - using PC3 to define spectral relationships - I might suggest pointing these out.
(4) I still wasn't sure what 4 Hz vs. theta 6-12 Hz meant - is it all based on PC3's pos/neg correlation? I wonder if showing a scatter plot with the y-axis being PC3 and the x-axis being theta 4 Hz power would help distinguish these? Is this the first time this sort of analysis has been done? If so, it requires clearer definitions.
Reviewer #3 (Public review):
Summary:
The study investigated decision making in rats choosing between small immediate rewards and larger delayed rewards, in a task design where the size of the immediate rewards decreased when this option was chosen and increased when it was not chosen. The authors conceptualise this task as involving two different types of cognitive effort: 'resistance-based' effort putatively needed to resist the smaller immediate reward, and 'resource-based' effort needed to track the changing value of the immediate reward option. They argue based on analyses of the behaviour, and computational modelling, that rats use different strategies in different sessions, with one strategy in which they preferentially choose the delayed reward option irrespective of the current immediate reward size, and another strategy in which they preferentially choose the immediate reward option when the immediate reward size is large, and the delayed reward option when the immediate reward size is small. The authors recorded neural activity in anterior cingulate cortex. They propose that oscillatory activity in the 6-12Hz theta band occurs when subjects use a 'resistance-based' strategy of choosing the delayed option irrespective of the current value of the immediate reward option. They also examine neural representation of the current value of the immediate reward option, and suggest that this value is more strongly represented when subjects are using this value information to guide choice. They further argue that neurons whose activity is modulated by theta oscillations are less involved in tracking the value of the immediate reward option than neurons whose activity is not theta modulated. If solid, these findings will be of interest to researchers working on cognitive control and ACC's involvement in decision making. However, there are some issues with the modelling and analysis which preclude high confidence in the validity of the conclusions.
Strengths:
The behavioural task used is interesting and the recording methods used (64 channel silicon probes) should enable the collection of good quality single unit and LFP electrophysiology data. The authors recorded from a sizable sample of subjects for this type of study. The approach of splitting the data into sessions where subjects used different strategies and then examining the neural correlates of each is in principle interesting, though I have some reservations about the strength of evidence for the existence of multiple strategies.
Limitations:
The dataset is unbalanced in terms of both the number of sessions contributed by each subject, and their distribution across the different putative behavioural strategies (see Table 1), with some subjects contributing 7 sessions to a given strategy and others 0. Further, only 2 of 10 subjects contribute any sessions to one of the behavioural strategies (8LO), and a single subject contributes >50% of the sessions (7 of 13) sessions to another strategy (8HI). Apparent differences in brain activity between the strategies could therefore in fact reflect differences between subjects, which could arise due to e.g. differences in electrode placement. To make firm conclusions that neural activity is different in sessions where different strategies are thought to be employed, it would be necessary to account for potential cross-subject variation in the data. The current statistical methods don't appear to do this as they use within subject measures (e.g. trials or neurons) as the experimental unit and ignore which subject the neuron/trial came from.
The starting point for the analysis was the splitting of sessions into 4 groups based on the duration of the delay (4 vs 8 seconds) and then clustering within each delay category into two sub-groups. It was not clear why 2 clusters per delay category were used, nor whether the data did in fact have a clear split into two distinct clusters or continuous variation across the population of sessions. The simplified RL model used in the revised manuscript (which is an improvement on that used in the previous version) could in principle help to quantify variation across the population of sessions, by using model fitting and comparison methods to evaluate variation in strategy across subjects. However, as far as I could tell, no model fitting or comparison was performed, and the only attempt to link the model to data was by simulating data using a fixed probability of choosing the delayed lever (i.e. with no learning across trials) and comparing the distribution of total rewards obtained per session with that of the subjects in each group (Figure 2). Total reward per session is a very coarse behavioural metric, and using likelihood-based methods to fit model parameters to subjects' trial-by-trial choice data would provide a more sensitive way of using the modelling to assess behavioural strategy across sessions.
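As an illustration of the suggested likelihood-based approach, a minimal sketch fitting a fixed probability of choosing the delayed lever to a hypothetical trial-by-trial choice sequence:

```python
# Illustrative sketch: maximum-likelihood fitting of a fixed-probability-of-delay
# model to trial-by-trial choices, rather than comparing total reward per session.
# `choices` is a hypothetical sequence with 1 = delayed lever, 0 = immediate lever.
import numpy as np
from scipy.optimize import minimize_scalar

choices = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])   # placeholder choice sequence

def neg_log_lik(p_delay):
    p = np.clip(p_delay, 1e-6, 1 - 1e-6)
    return -np.sum(choices * np.log(p) + (1 - choices) * np.log(1 - p))

fit = minimize_scalar(neg_log_lik, bounds=(0.0, 1.0), method="bounded")
print(f"fitted p_delay = {fit.x:.2f}, NLL = {fit.fun:.2f}")
# Richer models (e.g. with trial-by-trial learning) can be fit the same way
# and compared by AIC/BIC.
```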
Conceptually, it is not obvious that choices towards the delayed vs immediate lever reflect use of different strategies employing different types of cognitive effort. Rather these could reflect a single strategy which compares the estimated value of the two levers, with differences in behaviour between sessions accounted for either by differences in the task itself (between the 8s and 4s delay condition) or differences in the parameters of the strategy, such as the strength of temporal discounting.
Even if one accepts the claim that the task recruits two distinct types of cognitive control, the argument that theta oscillations, which occur on delay choice trials in the 4s delay condition, are a correlate of a 'resistance-based' strategy (resisting the immediate reward), is hard to reconcile with the fact that theta oscillations do not occur on delay choice trials in the 8s delay condition (Figure 3). The authors note this discrepancy, but state that 'The reason was because these groups largely avoided the delayed lever (Figure 1) and thereby abandoned the need to implement resistance-based control altogether.' However, the data in Figure 1D show that even in the 8s condition the subjects choose the delayed lever on around 50% of trials. It is not obvious why choosing the delayed lever on 50% of trials in the 8s condition does not require 'resistance-based' cognitive effort, while choosing it in the 4s delay condition does.
The other main claims regarding the neural data are that the neuronal representation of the value of the immediate reward lever (ival) is stronger in sessions where subjects are choosing that lever more often, particularly the 8LO group, and that neurons whose activity tracks ival are a different population from neurons whose activity is theta modulated. However, the analysis methods used to make these claims are rather convoluted and make it hard to assess the strength of the evidence for them.
To evaluate the strength of ival representation in neural activity, the authors first fit a regression model predicting each neuron's activity at different timepoints as a function of behavioural variables including ival, which is a sensible first step. However, they then perform clustering on the regression coefficients and then plot neural activity only for the cluster which they state 'provided the clearest example of value tracking'. It is not clear how the clustering was done, whether there were in fact well defined clusters in the neural activity, how the clusters whose activity is plotted were chosen, nor the proportion of neurons in this cluster for each group of sessions. The analysis therefore provides only limited information about the strength of ival representation in different session groups. It would be useful to quantify the variance explained by ival in neural activity for each group of sessions using a simpler quantification of the regression analysis, such as cross-validated coefficient of partial determination.
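A minimal sketch of the suggested cross-validated coefficient of partial determination, using synthetic data and hypothetical regressor names (not the recorded activity):

```python
# Illustrative sketch of a cross-validated coefficient of partial determination
# (CPD) for ival: the extra variance in a neuron's firing rate explained by ival
# beyond the other regressors, evaluated on held-out trials.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(1)
n_trials = 200
X_other = rng.normal(size=(n_trials, 3))   # e.g. choice, choice history, time (hypothetical)
ival = rng.normal(size=(n_trials, 1))
rate = X_other @ [0.5, -0.3, 0.2] + 0.8 * ival[:, 0] + rng.normal(size=n_trials)

X_full = np.hstack([X_other, ival])
cpd = []
for train, test in KFold(5, shuffle=True, random_state=0).split(X_full):
    full = LinearRegression().fit(X_full[train], rate[train])
    reduced = LinearRegression().fit(X_other[train], rate[train])
    sse_full = np.sum((rate[test] - full.predict(X_full[test])) ** 2)
    sse_red = np.sum((rate[test] - reduced.predict(X_other[test])) ** 2)
    cpd.append((sse_red - sse_full) / sse_red)
print(f"cross-validated CPD for ival: {np.mean(cpd):.3f}")
```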
The analysis of how theta modulation related to representation of ival across neurons was also complicated and non-standard. To determine whether individual neurons were theta modulated, the authors did PCA on a matrix comprised of spike train autocorrelations for individual neurons, and then grouped neurons according to the projection of their autocorrelation function onto the 3rd Principal Component, on the basis that neurons with negative projection onto this component showed a peak roughly at theta frequency in the power spectrum of their autocorrelation. Even ignoring the fact that the peak in the power spectrum is broad and centred above the standard theta frequency (see figure 5B3), this is an arbitrary and unnecessarily complex way to determine if neurons are theta modulated. It would be much simpler and greatly preferable to either directly assess the modulation depth of individual neurons spike train autocorrelation in the theta band, or to use a metric of spike-LFP coupling in the theta band instead. The authors do include some analysis of spike field coherence in Figure 6 and this is a much more sensible approach. However, it is worth noting that the only session group which shows a difference in coherence at theta frequency relative to the other groups is 8LO, to which only 2 of 8 animals contribute any data and 70% of sessions come from one animal. It is therefore unclear whether differences in this group are due to differences in behavioural strategy, or reflect other sources of cross-animal variation.
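For illustration, a minimal sketch (on a synthetic autocorrelogram) of the simpler, direct assessment of theta-band modulation suggested above:

```python
# Illustrative sketch: directly assessing theta-band modulation of a spike-train
# autocorrelation, rather than classifying neurons by projection onto a PCA
# component. The autocorrelogram here is synthetic.
import numpy as np

bin_s = 0.005                                    # 5 ms autocorrelogram bins
lags = np.arange(1, 201) * bin_s                 # 5 ms - 1 s lags
autocorr = 1 + 0.3 * np.cos(2 * np.pi * 8 * lags) * np.exp(-lags / 0.3)  # 8 Hz ringing

spectrum = np.abs(np.fft.rfft(autocorr - autocorr.mean())) ** 2
freqs = np.fft.rfftfreq(len(autocorr), d=bin_s)

theta = (freqs >= 6) & (freqs <= 12)
other = (freqs >= 1) & (freqs <= 50) & ~theta
theta_index = spectrum[theta].mean() / spectrum[other].mean()
print(f"theta modulation index: {theta_index:.2f}")   # >1 suggests a theta-band peak
```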
Author response:
The following is the authors’ response to the current reviews.
We would like to thank the reviewers for their efforts and feedback on our preprint. We have elected to rework the manuscript for publication in a different journal. In this process, we will alter many of the approaches and re-evaluate the conclusions. With this, many of the points raised by the reviewers will no longer be relevant and therefore do not require a response. Again, we thank the reviewers for their time and helpful feedback.
The following is the authors’ response to the original reviews.
eLife Assessment:
The authors present a potentially useful approach of broad interest arguing that anterior cingulate cortex (ACC) tracks option values in decisions involving delayed rewards. The authors introduce the idea of a resource-based cognitive effort signal in ACC ensembles and link ACC theta oscillations to a resistance-based strategy. The evidence supporting these new ideas is incomplete and would benefit from additional detail and more rigorous analyses and computational methods.
We are extremely grateful for the several excellent comments of the reviewers. To address these concerns, we have completely reworked the manuscript, adding more rigorous approaches in each phase of the analysis and computational model. We realize that it has taken some time to prepare the revision. However, given the comments of the reviewers, we felt it necessary to thoroughly rework the paper based on their input. Here is a (non-exhaustive) overview of the major changes we made:
We have developed a way to more adequately capture the heterogeneity in the behavior
We have completely reworked the RL model
We have added additional approaches and rigor to the analysis of the value-tracking signal.
Reviewer #1 (Public Review):
Summary:
Young (2.5 mo [adolescent]) rats were tasked to either press one lever for immediate reward or another for delayed reward.
Please note that at the time of testing and training that the rats were > 4 months old.
The task had a complex structure in which (1) the number of pellets provided on the immediate reward lever changed as a function of the decisions made, (2) rats were prevented from pressing the same lever three times in a row. Importantly, this task is very different from most intertemporal choice tasks which adjust delay (to the delayed lever), whereas this task held the delay constant and adjusted the number of 20 mg sucrose pellets provided on the immediate value lever.
Several studies parametrically vary the immediate lever (PMID: 39119916, 31654652, 28000083, 26779747, 12270518, 19389183). While most versions of the task will yield qualitatively similar estimates of discounting, the adjusting-amount version is preferred as it provides the most consistent estimates (PMID: 22445576). More specifically, this version of the task avoids the contrast effects that result from changing the delay during the session (PMID: 23963529, 24780379, 19730365, 35661751), which complicate value estimates.
Analyses are based on separating sessions into groups, but group membership includes arbitrary requirements and many sessions have been dropped from the analyses.
We have updated this approach and now provide a more comprehensive assessment of the behavior. The updated approach applies a hierarchical clustering model to the behavior in each session. This was applied at each delay to separate animals that prefer the immediate option more/less. This results in 4 statistically dissociable groups (4LO, 4HI, 8LO, 8HI) and includes all sessions. Please see Figure 1.
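A minimal sketch of the kind of per-delay clustering described here (illustrative only; the variable names and data are placeholders, not the authors' pipeline):

```python
# Illustrative sketch: hierarchical clustering of per-session behavior at each
# delay into two sub-groups, of the sort that could yield labels like 4LO/4HI.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
# one row per session, e.g. [proportion of delayed choices, mean immediate value]
sessions_4s = rng.uniform(size=(25, 2))          # placeholder 4 s delay sessions

Z = linkage(sessions_4s, method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")  # split into two sub-groups
print("4 s sub-group sizes:", np.bincount(labels)[1:])
```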
Computational modeling is based on an overly simple reinforcement learning model, as evidenced by fit parameters pegging to the extremes.
We have completely reworked the simulations in the revision. In the updated RL model we carefully add parameters to determine which are necessary to explain the experimental data. We feel that it is simplified yet more descriptive. Please see Figure 2 and associated text.
The neural analysis is overly complex and does not contain the necessary statistics to assess the validity of their claims.
We have dramatically streamlined the spike train analysis approach and added several statistical tests to ensure the rigor of our results. Please see Figures 4,5,6 and associated text.
Strengths:
The task is interesting.
Thank you for the positive comment
Weaknesses:
Behavior:
The basic behavioral results from this task are not presented. For example, "each recording session consisted of 40 choice trials or 45 minutes". What was the distribution of choices over sessions? Did that change between rats? Did that change between delays? Were there any sequence effects? (I recommend looking at reaction times.) Were there any effects of pressing a lever twice vs after a forced trial?
Please see the updated statistics and panels in Figures 1 and 2. We believe these address this valid concern.
This task has a very complicated sequential structure that I think I would be hard pressed to follow if I were performing this task.
Human tasks implement a similar task structure (PMID: 26779747). Please note the response above that outlines the benefits of using this task.
Before diving into the complex analyses assuming reinforcement learning paradigms or cognitive control, I would have liked to have understood the basic behaviors the rats were taking. For example, what was the typical rate of lever pressing? If the rats are pressing 40 times in 45 minutes, does waiting 8s make a large difference?
Thank you for this suggestion. Our additions to Figure 1 are intended to better explain and quantify the behavior of the animals. Note that this task is designed to hold the rate of reinforcement constant no matter the choices of the animals. Our analysis supports the long-held view in the literature that rats do not like waiting for rewards, even at small delays. Going from the 4 to the 8 sec delay results in significantly more immediate choices, indicating that the rats will forgo waiting 8 sec for a larger reinforcer and take a smaller reinforcer at 4 sec.
For that matter, the reaction time from lever appearance to lever pressing would be very interesting (and important). Are they making a choice as soon as the levers appear? Are they leaning towards the delay side, but then give in and choose the immediate lever? What are the reaction time hazard distributions?
This is an excellent suggestion; we have added a brief analysis of reaction times (please see the section entitled "4 behavioral groups are observed across all sessions" in the Results). Please note that an analysis of the reaction times in this data set has been presented previously (White et al., 2024). In addition, an analysis of reaction times in this task was performed in Linsenbardt et al. (2017). In short, animals tend to choose within 1 second of the lever appearing. In addition, our prior work shows that responses on the immediate lever tend to be slower, which we viewed as evidence of increased deliberation requirements (possibly required to integrate value signals).
It is not clear that the animals on this task were actually using cognitive control strategies on this task. One cannot assume from the task that cognitive control is key. The authors only consider a very limited number of potential behaviors (an overly simple RL model). On this task, there are a lot of potential behavioral strategies: "win-stay/lose-shift", "perseveration", "alternation", even "random choices" should be considered.
The strategies the Reviewer mentioned are descriptors of the actual choices the rats made. For example, perseveration means the rat is choosing one of the levers at an excessively high rate whereas alternation means it is choosing the two levers more or less equally, independent of payouts. But the question we are interested in is why? We are arguing that the type of cognitive control determines the choice behavior, but cognitive control is an internal variable that guides behavior, rather than simply a descriptor of the behavior. For example, the animal opts to perseverate on the delayed lever because the cognitive control required to track ival is too high. We then searched the neural data for signatures of the two types of cognitive control.
The delay lever was assigned to the "non-preferred side". How did side bias affect the decisions made?
The side bias clearly does not impact performance as the animals prefer the delay lever at shorter delays, which works against this bias.
The analyses based on "group" are unjustified. The authors compare the proportion of delayed to immediate lever press choices on the non-forced trials and then did k-means clustering on this distribution. But the distribution itself was not shown, so it is unclear whether the "groups" were actually different. They used k=3, but do not describe how this arbitrary number was chosen. (Is 3 the optimal number of clusters to describe this distribution?) Moreover, they removed three group 1 sessions with an 8s delay and two group 2 sessions with a 4s delay, making all the group 1 sessions 4s delay sessions and all group 2 sessions 8s delay sessions. They then ignore group 3 completely. These analyses seem arbitrary and unnecessarily complex. I think they need to analyze the data by delay. (How do rats handle 4s delay sessions? How do rats handle 6s delay sessions? How do rats handle 8s delay sessions?). If they decide to analyze the data by strategy, then they should identify specific strategies, model those strategies, and do model comparison to identify the best explanatory strategy. Importantly, the groups were session-based, not rat based, suggesting that rats used different strategies based on the delay to the delayed lever.
We have completely reworked our approach for capturing the heterogeneity in behavior. We have taken care to show more of the behavioral statistics that have gone into identifying each of the groups. All sessions are included in this analysis. As the reviewer suggests, we used the statistics from each of the behavioral groups to inform the RL model that explores neural signals that underlie decisions in this task. We strongly disagree that groups should be rat-based rather than session-based, as the behavior of the animal can, and does, change from day to day. This is important to consider when analyzing the neural data, as rat-based groupings would ignore this potential source of variance.
The reinforcement learning model used was overly simple. In particular, the RL model assumes that the subjects understand the task structure, but we know that even humans have trouble following complex task structures. Moreover, we know that rodent decision-making depends on much more complex strategies (model-based decisions, multi-state decisions, rate-based decisions, etc). There are lots of other ways to encode these decision variables, such as softmax with an inverse temperature rather than epsilon-greedy. The RL model was stated as a given and not justified. As one critical example, the RL model fit to the data assumed a constant exponential discounting function, but it is well-established that all animals, including rodents, use hyperbolic discounting in intertemporal choice tasks. Presumably this dramatically changes the effect of 4s and 8s. As evidence that the RL model is incomplete, the parameters found for the two groups were extreme. (Alpha=1 implies no history and only reacting to the most recent event. Epsilon=0.4 in an epsilon-greedy algorithm is a 40% chance of responding randomly.)
While we agree that the approach was not fully justified, we do not agree that it was invalid. Simply stated, a softmax approach gives the best fit to the choice behavior, whereas our epsilon-greedy approach attempted to reproduce the choice behavior using a naïve agent that progressively learns the values of the two levers on a choice-by-choice basis. Nevertheless, we certainly appreciate that important insights can be gained by fitting a model to the data as suggested. We feel that the new modeling approach we have now implemented is optimal for the present purposes and it replaces the one used in the original manuscript.
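To illustrate the distinction between the two choice rules discussed here, a minimal sketch (with placeholder action values, not the fitted model):

```python
# Illustrative sketch of the two choice rules under discussion: an epsilon-greedy
# rule, as used for simulating a naive agent, versus a softmax rule, as typically
# used when fitting choices. Values are hypothetical.
import numpy as np

rng = np.random.default_rng(3)
q_immediate, q_delayed = 3.0, 4.5      # placeholder action values (pellets)

def epsilon_greedy(q_i, q_d, epsilon=0.1):
    if rng.random() < epsilon:
        return rng.integers(2)          # explore: 0 = immediate, 1 = delayed
    return int(q_d > q_i)               # exploit the higher-valued lever

def softmax_p_delayed(q_i, q_d, beta=2.0):
    return 1.0 / (1.0 + np.exp(-beta * (q_d - q_i)))

print("epsilon-greedy choice:", epsilon_greedy(q_immediate, q_delayed))
print("softmax p(delayed)  :", softmax_p_delayed(q_immediate, q_delayed))
```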
The authors do add a "dbias" (which is a preference for the delayed lever) term to the RL model, but note that it has to be maximal in the 4s condition to reproduce group 2 behavior, which means they are not doing reinforcement learning anymore, just choosing the delayed lever.
The dbias term was dropped in the new model implementation.
Neurophysiology:
The neurophysiology figures are unclear and mostly uninterpretable; they do not show variability, statistics or conclusive results.
While the reviewer is justified in criticizing the clarity of the figures, the statement that “they do not show variability, statistics or conclusive results” is not correct. Each of the figures presented in the first draft of the manuscript, except Figure 3, are accompanied by statistics and measures of variability. Nonetheless we have updated each of the neurophysiology analyses. We hope that the reviewer will find our updates more rigorous and thorough.
As with the behavior, I would have liked to have seen more traditional neurophysiological analyses first. What do the cells respond to? How do the manifolds change aligned to the lever presses? Are those different between lever presses?
We have added several figures that plot the mean +/- SEM of the neural activity (see Figures 4 and 5). Hopefully this provides a more intuitive picture of the changes in neural activity throughout the task.
Are there changes in cellular information (both at the individual and ensemble level) over time in the session?
We provide several analyses of how firing rate changes over trials in relation to ival over time and trials in the session. In addition, we describe how these signals change in each of the behavioral groups.
How do cellular responses differ during that delay while both levers are out, but the rats are not choosing the immediate lever?
We were somewhat unclear about this suggestion, as the delay follows the lever press. In addition, there is no delay after immediate presses.
Figure 3, for example, claims that some of the principal components tracked the number of pellets on the immediate lever ("ival"), but they are just two curves. No statistics, controls, or justification for this is shown. BTW, on Figure 3, what is the event at 200s?
This comment is no longer relevant based on the changes we’ve made to the manuscript.
I'm confused. On Figure 4, the number of trials seems to go up to 50, but in the methods, they say that rats received 40 trials or 45 minutes of experience.
This comment is no longer relevant based on the changes we’ve made to the manuscript.
At the end of page 14, the authors state that the strength of the correlation did not differ by group and that this was "predicted" by the RL modeling, but this statement is nonsensical, given that the RL modeling did not fit the data well and depended on extreme values. Moreover, this claim is dependent on "not statistically detectable", which is, of course, not interpretable as "not different".
This comment is no longer relevant based on the changes we’ve made to the manuscript.
There is an interesting result on page 16 that the increases in theta power were observed before a delayed lever press but not an immediate lever press, and then that the theta power declined after an immediate lever press.
Thank you for the positive comment.
These data are separated by session group (again group 1 is a subset of the 4s sessions, group 2 is a subset of the 8s sessions, and group 3 is ignored). I would much rather see these data analyzed by delay itself or by some sort of strategy fit across delays.
Thank you for the excellent suggestion. Our new group assignments take delay into account.
That being said, I don't see how this description shows up in Figure 6. What does Figure 6 look like if you just separate the sessions by delay?
We are unclear what the reviewer means by “this description”.
Discussion:
Finally, it is unclear to what extent this task actually gets at the questions originally laid out in the goals and returned to in the discussion. The idea of cognitive effort is interesting, but there are no data presented that this task is cognitive at all. The idea of resource-based cognitive effort and resistance-based cognitive effort is interesting, but presumably the way one overcomes resistance is through resource-limited components, so it is unclear that these two cognitive effort strategies are different.
The basis for the reviewer's assertion that "the way one overcomes resistance is through resource-limited components" is not clear. In the revised version, we have taken greater care to outline how each type of effort signal facilitates performance of the task and to articulate these possibilities in our stochastic and RL models. We view the strong evidence for ival tracking presented herein as a critical component of resource-based cognitive effort.
The authors state that "ival-tracking" (neurons and ensembles that presumably track the number of pellets being delivered on the immediate lever - a fancy name for "expectations") "taps into a resourced-based form of cognitive effort", but no evidence is actually provided that keeping track of the expectation of reward on the immediate lever depends on attention or mnemonic resources. They also state that a "dLP-biased strategy" (waiting out the delay) is a "resistance-based form of cognitive effort", but no evidence is provided that going to the delayed side takes effort.
We challenge the reviewer's assertion that ival tracking is a "fancy name for expectations". We make no claim about the prospective or retrospective nature of the signal. Clearly, expectations should be prospective and therefore different from ival tracking. Regarding the resistance signal: first, animals avoid the delay lever more often at the 8 sec delay (Figure 1). We have shown that increasing the delay systematically biases responses AWAY from the delay (Linsenbardt et al., 2017). This is consistent with a well-developed literature showing that rats and mice do not like waiting for delayed reinforcers. We contend that enduring something you don't like takes effort.
The authors talk about theta synchrony, but never actually measure theta synchrony, particularly across structures such as amygdala or ventral hippocampus. The authors try to connect this to "the unpleasantness of the delay", but provide no measures of pleasantness or unpleasantness. They have no evidence that waiting out an 8s delay is unpleasant.
We have added spike-field coherence to better connect with the literature on synchrony. Note that we never refer to our results as "synchrony". However, we would be remiss not to address the growing literature on theta synchrony in effort allocation. There is a well-developed literature showing that rats and mice do not like waiting for delayed reinforcers. If waiting out the delay were not unpleasant, then why would the animals forgo larger rewards to avoid it?
The authors hypothesize that the "ival-tracking signal" (the expectation of number of pellets on the immediate lever) "could simply reflect the emotional or autonomic response". Aside from the fact that no evidence for this is provided, if this were to be true, then, in what sense would any of these signals be related to cognitive control?
This is proposed as an alternative explanation for the ival signal in the discussion. It was added as our due diligence. Emotional state could provide feedback to the currently implemented control mechanism. If waiting for reinforcement is too unpleasant, this could drive the animals toward ival tracking and choosing the immediate option more frequently. We provide this option only as a possibility, not a conclusion. We have clarified this in the revised text. Nevertheless, based on our review of the literature, autonomic tracking in some form seems to be the most likely function of ACC (Seamans & Floresco 2022). While the reviewer may disagree with this, we feel it is at least as valid as all the complex, cognitively based interpretations that commonly appear in the literature.
Reviewer #2 (Public Review):
Summary:
This manuscript explores the neuronal signals that underlie resistance vs resource-based models of cognitive effort. The authors use a delayed discounting task and computational models to explore these ideas. The authors find that the ACC strongly tracks value and time, which is consistent with prior work. Novel contributions include quantification of a resource-based control signal among ACC ensembles, and linking ACC theta oscillations to a resistance-based strategy.
Strengths:
The experiments and analyses are well done and have the potential to generate an elegant explanatory framework for ACC neuronal activity. The inclusion of local-field potential / spike-field analyses is particularly important because these can be measured in humans.
Thank you for the endorsement of our work.
Weaknesses:
I had questions that might help me understand the task and details of neuronal analyses.
(1) The abstract, discussion, and introduction set up an opposition between resource- and resistance-based forms of cognitive effort. It's clear that the authors find evidence for each (ACC ensembles = resource, theta = resistance?) but I'm not sure where the data fall on this dichotomy.
(a) An overall very simple schematic early in the paper (prior to the MCML model? or even the behavior) may help illustrate the main point.
(b) In the intro, results, and discussion, it may help to relate each point to this dichotomy.
(c) What would resource-based signals look like? What would resistance based signals look like? Is the main point that resistance-based strategies dominate when delays are short, but resource-based strategies dominate when delays are long?
(d) I wonder if these strategies can be illustrated? Could these two measures (dLP vs ival tracking) be plotted on separate axes or extremes, and behavior, neuronal data, LFP, and spectral relationships be shown on these axes? I think Figure 2 is working towards this. Could these be shown for each delay length? This way, as the evidence from behavior, model, single neurons, ensembles, and theta is presented, it can be related to this framework, and the reader can organize the findings.
These are excellent suggestions, and we have implemented them, where possible.
(2) The task is not clear to me.
(a) I wonder if a task schematic and a flow chart of training would help readers.
Yes, excellent idea, we have now included this in Figure 1.
(b) This task appears to be relatively new. Has it been used before in rats (Oberlin and Grahame is a mouse study)? Some history / context might help orient readers.
Indeed, this task has been used in rats in several prior studies. Please see the following references (PMID: 39119916, 31654652, 28000083, 26779747, 12270518, 19389183).
(c) How many total sessions were completed with ascending delays? Was there criteria for surgeries? How many total recording sessions per animal (of the 54?)
Please note that the delay does not change within a session. There were no criteria for surgery.
(d) How many trials completed per session (40 trials OR 45 minutes)? Where are there errors? These details are important for interpreting Figure 1.
Every animal in this data set completed 40 trials, and we have updated the task description to clarify this issue. There are no errors in this task; rather, the task is designed to capture the tendency to make an impulsive choice (smaller reward now).
(3) Figure 1 is unclear to me.
(a) Delayed vs immediate lever presses are being plotted - but I am not sure what is red, and what is blue. I might suggest plotting each animal.
We have updated Figure 1 considerably for clarity.
(b) How many animals and sessions go into each data point?
We hope this is clarified now with our new group assignments as all sessions were included in the analysis.
(c) Table 1 (which might be better referenced in the paper) refers to rats by session. Is it true that some rats (2 and 8) were not analyzed for the bulk of the paper? Some rats appear to switch strategies, and some stay in one strategy. How many neurons come from each rat?
We have updated Table 1 based on our new groupings. The rats that contribute the most sessions also tend to be represented across the behavioral groups therefore it is unlikely that effort allocation strategies across groupings are an esoteric feature of an animal.
(d) Task basics - RT, choice, accuracy, video stills - might help readers understand what is going into these plots
(e) Does the animal move differently (i.e., RTs) in G1 vs. G2?
Excellent suggestion. We have added more analysis of the task variables in the revision (e.g. RT, choice comparisons across delays, etc…)
(4) I wasn't sure how clustered G1 vs. G2 vs G3 are. To make this argument, the raw data (or some axis of it) might help.
(a) This is particularly important because G3 appears to be a mix of G1 and G2, although upon inspection, I'm not sure how different they really are
(b) Was there some objective clustering criteria that defined the clusters?
(c) Why discuss G3 at all? Can these sessions be removed from analysis?
Based on our updates to the behavioral analysis these comments are no longer relevant.
(5) The same applies to neuronal analyses in Fig 3 and 4
(a) What does a single neuron peri-event raster look like? I would include several of these.
(b) What does PC1, 2 and 3 look like for G1, G2, and G3?
(c) Certain PCs are selected, but I'm not sure how they were selected - was there a criteria used? How was the correlation between PCA and ival selected? What about PCs that don't correlate with ival?
(d) If the authors are using PCA, then scree plots and PETHs might be useful, as well as comparisons to PCs from time-shuffled / randomized data.
We hope that our reworking of the neural data analysis has clarified these issues. We now include several firing rate examples and aggregate data.
(6) I had questions about the spectral analysis
(a) Theta has many definitions - why did the authors use 6-12 Hz? Does it come from the hippocampal literature, and is this the best definition of theta? What about other bands (delta - 1-4 Hz), theta (4-7 Hz); and beta - 13- 30 Hz? These bands are of particular importance because they have been associated with errors, dopamine, and are abnormal in schizophrenia and Parkinson's disease.
This designation comes mainly from the hippocampal and ACC literature in rodents. In addition, this range best captured the peak in the power spectrum in our data. Note that we focus our analysis on theta given the literature regarding theta in the ACC as a correlate of cognitive control (references in manuscript). We did interrogate other bands as a sanity check, and the results were mostly limited to theta. Given the scope of our manuscript and the concerns raised regarding complexity, we are concerned that adding frequency analyses beyond theta would obfuscate the take-home message.
However, the spectrograms in Figure 3 show a range of frequencies and highlight the ones in the theta band as the most dynamic prior to the choice.
(b) Power spectra and time-frequency analyses may justify the authors' focus. I would show these (y-axis - frequency, x-axis - time, z-axis - power).
Thank you for the suggestion. We have added this to Figure 3.
(7) Using PC3 of the autocorrelation doesn't seem to be the right way to infer theta entrainment or spike-field relationships, as PCA can be vulnerable to phantom oscillations, and coherence can be transient. It is also difficult to compare to traditional measures of phase-locking. Why not simply use spike-field coherence? This is particularly important with reference to the human literature, which the authors invoke.
Excellent suggestion. Note that PCA provided a way to classify neurons that exhibited peaks in the autocorrelation at theta frequencies. We have added spike-field coherence, and this analysis confirms the differences in theta entrainment of the spike trains across the behavioral groups. Please see Figure 6D.
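For illustration, a minimal sketch of a spike-field coherence estimate on synthetic data (not the recorded signals), summarizing the 6-12 Hz band:

```python
# Illustrative sketch: spike-field coherence between a binned spike train and
# the LFP, with the 6-12 Hz band summarized. All signals are synthetic.
import numpy as np
from scipy.signal import coherence

fs = 1000                                           # Hz
t = np.arange(0, 30, 1 / fs)
rng = np.random.default_rng(4)
lfp = np.sin(2 * np.pi * 8 * t) + rng.normal(0, 0.5, t.size)       # 8 Hz rhythm + noise
spike_prob = 0.01 * (1 + 0.8 * np.sin(2 * np.pi * 8 * t))          # theta-locked spiking
spikes = (rng.random(t.size) < spike_prob).astype(float)

f, coh = coherence(spikes, lfp, fs=fs, nperseg=2048)
theta = (f >= 6) & (f <= 12)
print(f"mean 6-12 Hz spike-field coherence: {coh[theta].mean():.2f}")
```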
Reviewer #3 (Public Review):
Summary:
The study investigated decision making in rats choosing between small immediate rewards and larger delayed rewards, in a task design where the size of the immediate rewards decreased when this option was chosen and increased when it was not chosen. The authors conceptualise this task as involving two different types of cognitive effort: 'resistance-based' effort putatively needed to resist the smaller immediate reward, and 'resource-based' effort needed to track the changing value of the immediate reward option. They argue based on analyses of the behaviour, and computational modelling, that rats use different strategies in different sessions, with one strategy in which they consistently choose the delayed reward option irrespective of the current immediate reward size, and another strategy in which they preferentially choose the immediate reward option when the immediate reward size is large, and the delayed reward option when the immediate reward size is small. The authors recorded neural activity in anterior cingulate cortex (ACC) and argue that ACC neurons track the value of the immediate reward option irrespective of the strategy the rats are using. They further argue that the strategy the rats are using modulates their estimated value of the immediate reward option, and that oscillatory activity in the 6-12Hz theta band occurs when subjects use the 'resistance-based' strategy of choosing the delayed option irrespective of the current value of the immediate reward option. If solid, these findings will be of interest to researchers working on cognitive control and ACC's involvement in decision making. However, there are some issues with the experiment design, reporting, modelling and analysis which currently preclude high confidence in the validity of the conclusions.
Strengths:
The behavioural task used is interesting and the recording methods should enable the collection of good quality single unit and LFP electrophysiology data. The authors recorded from a sizable sample of subjects for this type of study. The approach of splitting the data into sessions where subjects used different strategies and then examining the neural correlates of each is in principle interesting, though I have some reservations about the strength of evidence for the existence of multiple strategies.
Thank you for the positive comments.
Weaknesses:
The dataset is very unbalanced in terms of both the number of sessions contributed by each subject, and their distribution across the different putative behavioural strategies (see table 1), with some subjects contributing 9 or 10 sessions and others only one session, and it is not clear from the text why this is the case. Further, only 3 subjects contribute any sessions to one of the behavioural strategies, while 7 contribute data to the other such that apparent differences in brain activity between the two strategies could in fact reflect differences between subjects, which could arise due to e.g. differences in electrode placement. To firm up the conclusion that neural activity is different in sessions where different strategies are thought to be employed, it would be important to account for potential cross-subject variation in the data. The current statistical methods don't do this as they all assume fixed effects (e.g. using trials or neurons as the experimental unit and ignoring which subject the neuron/trial came from).
In the revised manuscript, we have updated the group assignments. We have improved our description of the logic and methods for employing these groupings as well. With this new approach, all sessions are now included in the analysis. The group assignments are made purely on the behavioral statistics of an animal in each session. We feel this approach is preferable to eliminating neurons or sessions with the goal of balancing them, which may introduce bias. Further, the rats that contribute the most sessions also tend to be represented across the behavioral groups; therefore, it is unlikely that effort allocation strategies across groupings are an esoteric feature of an animal. As neurons are randomly sampled from each animal on a given session, we feel that we're justified in treating these as fixed effects.
It is not obvious that the differences in behaviour between the sessions characterised as using the 'G1' and 'G2' strategies actually imply the use of different strategies, because the behavioural task was different in these sessions, with a shorter wait (4 seconds vs 8 seconds) for the delayed reward in the G1 strategy sessions where the subjects consistently preferred the delayed reward irrespective of the current immediate reward size. Therefore the differences in behaviour could be driven by difference in the task (i.e. external world) rather than a difference in strategy (internal to the subject). It seems plausible that the higher value of the delayed reward option when the delay is shorter could account for the high probability of choosing this option irrespective of the current value of the immediate reward option, without appealing to the subjects using a different strategy.
Further, even if the differences in behaviour do reflect different behavioural strategies, it is not obvious that these correspond to allocation of different types of cognitive effort. For example, subjects' failure to modify their choice probabilities to track the changing value of the immediate reward option might be due simply to valuing the delayed reward option higher, rather than not allocating cognitive effort to tracking immediate option value (indeed this is suggested by the neural data). Conversely, if the rats assign higher value to the delayed reward option in the G1 sessions, it is not obvious that choosing it requires overcoming 'resistance' through cognitive effort.
The RL modelling used to characterise the subject's behavioural strategies made some unusual and arguably implausible assumptions:
Thank you for the feedback, based on these comments (and those above) we have completely reworked the RL model. In addition, we’ve taken care to separate out the variables that correspond to a resistance- versus a resource-based signal.
There were also some issues with the analyses of neural data which preclude strong confidence in their conclusions:
Figure 4I makes the striking claim that ACC neurons track the value of the immediately rewarding option equally accurately in sessions where two putative behavioural strategies were used, despite the behaviour being insensitive to this variable in the G1 strategy sessions. The analysis quantifies the strength of correlation between a component of the activity extracted using a decoding analysis and the value of the immediate reward option. However, as far as I could see this analysis was not done in a cross-validated manner (i.e. evaluating the correlation strength on test data that was not used for either training the MCML model or selecting which component to use for the correlation). As such, the chance level correlation will certainly be greater than 0, and it is not clear whether the observed correlations are greater than expected by chance.
We have added more rigorous methods to assess the ival tracking signal (Figure 4 and 5). In addition, we’ve dropped the claim that ival tracking is the same across the behavioral groups. We suspect that this was an artifact of a suboptimal group assignment approach in the previous version.
An additional caveat with the claim that ACC is tracking the value of the immediate reward option is that this value likely correlates with other behavioural variables, notably the current choice and recent choice history, that may be encoded in ACC. Encoding analyses (e.g. using linear regression to predict neural activity from behavioural variables) could allow quantification of the variance in ACC activity uniquely explained by option values after controlling for possible influence of other variables such as choice history (e.g. using a coefficient of partial determination).
We agree that the ival tracking signal may be influenced by other variables – especially ones that are not cognitive but rather more generated by the autonomic system. We have included a discussion of this possibility in the Discussion section. Our previous work has explored the role of choice history on neural activity, please see White et al., (2024).
Figure 5 argues that there are systematic differences in how ACC neurons represent the value of the immediate option (ival) in the G1 and G2 strategy sessions. This is interesting if true, but it appears possible that the effect is an artefact of the different distribution of option values between the two session types. Specifically, due to the way that ival is updated based on the subjects' choices, in G1 sessions where the subjects are mostly choosing the delayed option, ival will on average be higher than in G2 sessions where they are choosing the immediate option more often. The relative number of high, medium and low ival trials in the G1 and G2 sessions will therefore be different, which could drive systematic differences in the regression fit in the absence of real differences in the activity-value relationship. I have created an ipython notebook illustrating this, available at: https://notebooksharing.space/view/a3c4504aebe7ad3f075aafaabaf93102f2a28f8c189ab9176d4807cf1565f4e3. To verify that this is not driving the effect it would be important to balance the number of trials at each ival level across sessions (e.g. by subsampling trials) before running the regression.
This is an excellent point and led us to abandon the linear regression-based approach to quantifying differences in ival coding across behavioral groups.
Recommendations for the authors:
Reviewer #1 (Recommendations For The Authors):
This paper was extremely hard to read. In addition to the issues raised in the public review (overly complex and incomplete analyses), one of the hardest things to deal with was the writing.
Thank you for the feedback. Hopefully we have addressed this with our thorough rewrite.
The presentation was extremely hard to follow. I had to read through it several times to figure out what the task was. It wasn't until I got to the RL model Figure 2A that I realized what was really going on with the task. I strongly recommend having an initial figure that lays out the actual task (without any RL or modeling assumptions) and identifies the multiple different kinds of sessions. What is the actual data you have to start with? That was very unclear.
Excellent idea. We have implemented this in Figure 1.
Labeling sessions by "group" is very confusing. I think most readers take "group" as the group of subjects, but that's not what you mean at all. You mean some sessions were one way and some were another. (And, as I noted in the public review, you ignore many of the sessions, which I think is not OK.) I think a major rewrite would help a lot. Also, I don't think the group analysis is necessary at all. In the public review, I recommend doing the analyses very differently and more classically.
We have updated the group assignments in a manner that is more intuitive, reflects the delays, and includes all sessions.
The paper is full of arbitrary abbreviations that are completely unnecessary. Every time I came to "ival", I had to translate that into "number of pellets delivered on the immediate lever" and every time I came to dLP, I had to translate that into "delayed lever press". Making the text shorter does not make the text easier to read. In general, I was taught that unless the abbreviation is the common term (such as "DNA" not "deoxyribonucleic acid"), you should never use an abbreviation. While there are some edge cases (ACC probably over "anterior cingulate cortex"), dLP, iLP, dLPs, iLPs, ival, are definitely way over the "don't do that" line.
We completely agree here and apologize for the excessive use of abbreviations. We have removed nearly all of them.
The figures were incomplete, poorly labeled, and hard to read. A lot of figures were missing, for example
Basic task structure
Basic behavior on the task
Scatter plot of the measures that you are clustering (lever press choice X number of pellets on the immediate lever; you can use color or multiple panels to indicate the delay to the delayed lever)
Figure 3 is just a couple of examples. That isn't convincing at all.
Figure 4 is missing labels. In Figure 4, I don't understand what you are trying to say.
I don't see how the results on page 16 arise from Figure 6. I strongly recommend starting from the actual data and working your way to what it means rather than forcing this into this unreasonable "session group" analysis.
We have completely reworked the Figures for clarity and content.
The statement that "no prior study has explored the cellular correlates of cognitive effort" is ludicrous and insulting. There are dozens of experiments looking at ACC in cognitive effort tasks, in humans, other primates, and rodents. There are many dozens of experiments looking at cellular correlates in intertemporal choice tasks, some with neural manipulations, some with ensemble recordings. There are many dozens of experiments looking at cellular relationships to waiting out a delay.
We agree that our statement was extremely imprecise. We have updated this to say: "Further, a role for theta oscillations in allocating physical effort has been identified. However, the cellular mechanisms within the ACC that control and deploy types of cognitive effort have not been identified."
Reviewer #2 (Recommendations For The Authors):
In Figure 2, the panels below E and F are referred to as 'right' - but they are below? I would give them letters.
I would make sure that animal #s, neuron #s, and LFP#s are clearly presented in the results and in each figure legend. This is important to follow the results throughout the manuscript.
Some additional proofreading ('Fronotmedial') might help with clarity.
Based on our updates, this is no longer relevant.
Reviewer #3 (Recommendations For The Authors):
In addition to the suggestions above to address specific issues, it would be useful to report some additional information about aspects of the experiments and analyses:
Specify how spike sorting was performed and what metrics were used to select well isolated single units.
Done.
Provide histology showing the recording locations for each subject.
Histological assessments of electrode placements are provided in White et al. (2024), but we provide an example placement. This has been added to the text.
Indicate the sequence of recording sessions that occurred for each subject, including for each session what delay duration was used and which dataset the session contributed to, and indicate when the neural probes were advanced between sessions.
We feel that this adds complexity unnecessarily as we make no claims about holding units across sessions for differences in coding in the dorsoventral gradient of ACC.
Indicate the experimental unit when reporting uncertainty measures in figure legends (e.g. mean +/- SEM across sessions).
Done.