Main

Obsessive–compulsive behaviors are highly prevalent in the general population and cause significant distress to the individual1,2. Due to their presence in multiple psychiatric conditions, such as obsessive–compulsive disorder (OCD), these behaviors are increasingly identified as a core dimension for psychiatric classification7. Existing treatments for the heterogeneous obsessive–compulsive symptoms produce disappointing outcomes and are accompanied by side effects that weaken compliance8,9. With non-invasive neuromodulation (for example, transcranial alternating current stimulation, tACS), interventions can be personalized according to individual neurophysiological dynamics10. Such personalized interventions have been proposed to improve treatment outcomes for obsessive–compulsive behaviors11.

Obsessive–compulsive behaviors are considered outcomes of maladaptive habit learning due to abnormalities in a frontostriatal network engaged in repetitive execution of learned behaviors3,4. Previous studies have associated these behaviors with excessive habit learning as a function of reward during reinforcement learning12, along with abnormal connectivity between the medial orbitofrontal cortex (OFC) and the ventral striatum3,4,12,13,14. Interestingly, a parallel line of work has proposed functionally relevant oscillatory activity in the same network during reward learning. Novel, unexpected or improbable reward, but not punishment, feedback elicits enhanced activity in the high beta–low gamma frequency range over frontal sites in human magneto- and electroencephalography (MEG/EEG, 20–35 Hz; Fig. 1)5,6,15,16. This effect is thought to be a motivational value signal originating from the OFC5,6 that is transmitted to the ventral striatum to link preceding stimulus–response mapping to consequent rewards5,17. Modulation of these rhythms in the OFC during reward processing might influence the learning mechanisms facilitating transition to habitual behavior, thereby normalizing obsessive–compulsive behaviors.

Fig. 1: Model integrating beta–gamma activity with reward learning circuitry.

Arrows show anatomical relationships between components of the reward and learning circuitry. Dashed black arrows show the loop of activity between the hippocampus (Hipp), nucleus accumbens (NAcc) and ventral tegmental area (VTA) put forward by Axmacher et al.38, with solid black arrows showing other major connections involved in learning and motivation39. The positive reward feedback beta–gamma effect shown to the right is theorized to derive from generators in the orbitofrontal cortex (OFC) or ventromedial prefrontal cortex (vmPFC)5,6, to be modulated by the substantia nigra (SN) and VTA (short red arrow), and to synchronize with beta–gamma rhythms in the ventral striatum (VS), providing a key mechanism by which the communication between frontostriatal regions can result in the transmission of a motivational value signal to the reward circuit in learning or decision-making contexts (long red arrow)5. Amy, amygdala; dACC, dorsal anterior cingulate cortex; Th, thalamus; VP, ventral pallidum.

Across two experiments, we used personalized, non-invasive, high-definition tACS (HD-tACS) to establish the functional contribution of OFC beta–gamma rhythms in reward learning and examined its potential in normalizing obsessive–compulsive behaviors. First (experiment 1), we sought to causally demonstrate that beta–gamma neuromodulation selectively influences reward- but not punishment-guided learning. We determined the intrinsic reward-related beta–gamma frequency of participants performing a reinforcement learning task and used HD-tACS to modulate these rhythms in the OFC during the task. Additionally, given that beta and gamma rhythms are proposed to map rewards with preceding events5,17, they may facilitate transition to habitual behavior, which can be reflected in exploitative decision-making18. However, reward-related beta–gamma activity has also been associated with low tonic prefrontal dopamine19, which is further associated with exploratory decision-making20. We used reinforcement learning models to study these candidate computational mechanisms during beta–gamma neuromodulation. Second (experiment 2), we examined the impact of chronic beta–gamma OFC modulation over 5 days on obsessive–compulsive characteristics in a non-clinical population. Similar to experiment 1, personalized neuromodulation frequencies were determined during a previous session while participants performed the reinforcement learning task. Obsessive–compulsive behaviors were examined over a period of 3 months. Finally, to identify common underlying mechanisms, we examined how neuromodulation-induced changes in reward learning and in obsessive–compulsive symptoms across both experiments are associated with intrinsic beta–gamma dynamics.

To examine the functional role of beta–gamma rhythms in reward learning, we used a between-participants, double-blind, double-sham-controlled design in experiment 1 involving a monetary reinforcement learning task21 (Fig. 2a). Sixty participants were randomly assigned to either the passive sham, active control (alpha, 10 Hz) or personalized beta–gamma (mean, 26.6 Hz ± 3.3 s.d.) modulation group. Personalized beta–gamma frequencies were determined through a pre-experiment EEG recording while participants learned to associate stimuli with monetary rewards and losses. Electrical field modeling was used to target the medial OFC (Fig. 2b). All participants performed a 30 min block of the task before, during and after modulation, and their choice behavior during each period was analyzed separately for reward and punishment trials.

Fig. 2: Monetary reinforcement learning task and orbitofrontal neuromodulation protocol.

a, Participants chose one of two abstract visual stimuli and later observed the outcome. Following a jittered interstimulus interval (ISI), the outcome depending on the trial type was shown. In reward trials, one stimulus was associated with an 80% probability of winning US$10 and a 20% probability of winning nothing, and the other stimulus had the opposite probability structure. In punishment trials, one stimulus was associated with an 80% probability of losing US$10 and a 20% probability of losing nothing, and the other stimulus had the opposite probability structure. b, Personalized orbitofrontal neuromodulation protocol and current-flow models on three-dimensional reconstructions of the cortical surface. The location and current intensity value of each electrode are shown.
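The probability structure described in panel a can be made concrete with a short simulation. This is a minimal sketch rather than the task code used in the study: the stimulus labels `"A"` and `"B"`, the names `TASK`, `expected_value`, `optimal_choice` and `sample_outcome`, and the definition of the optimal choice as the stimulus with the higher expected value are all assumptions made for illustration.

```python
import random

# Outcome magnitude (US$) and probability of a non-zero outcome per stimulus,
# following the task description. "A" and "B" are placeholder labels for the
# two abstract stimuli of a pair (an assumption, not the study's naming).
TASK = {
    "reward": {"magnitude": 10, "p_nonzero": {"A": 0.8, "B": 0.2}},
    "punishment": {"magnitude": -10, "p_nonzero": {"A": 0.8, "B": 0.2}},
}


def expected_value(trial_type, stimulus):
    """Expected monetary outcome of choosing `stimulus` on a given trial type."""
    spec = TASK[trial_type]
    return spec["magnitude"] * spec["p_nonzero"][stimulus]


def optimal_choice(trial_type):
    """Stimulus with the higher expected value (assumed definition of 'optimal')."""
    return max(("A", "B"), key=lambda s: expected_value(trial_type, s))


def sample_outcome(trial_type, stimulus, rng):
    """Draw one probabilistic outcome: the full magnitude or nothing."""
    spec = TASK[trial_type]
    hit = rng.random() < spec["p_nonzero"][stimulus]
    return spec["magnitude"] if hit else 0
```

Under these assumptions, the optimal choice is the stimulus that wins more often on reward trials and the stimulus that loses less often on punishment trials.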

Optimal choice behavior showed a significant interaction effect of the modulation group (sham, alpha, beta–gamma), valence (reward, punishment), and time (pre-modulation, modulation, post-modulation) (Fig. 3). Differences in choice behavior were observed when comparing beta–gamma with sham, and beta–gamma with alpha, but not when comparing alpha with sham, which suggests that beta–gamma modulation alone changed behavior. Comparing optimal choice behavior across groups during the modulation period, we found significantly impaired behavior during reward trials in the beta–gamma group. No differences across groups were observed for punishment trial behavior. In the beta–gamma group, choice behavior during reward trials was significantly impaired during the modulation period relative to the pre-modulation period but recovered in the post-modulation period, which suggests that the effects were acutely sensitive to concurrent beta–gamma HD-tACS (Fig. 3c). No differences were observed in punishment trial behavior over time. Control analyses in the alpha and sham groups did not show any differences over time in reward or punishment trials (all P > 0.468, all n = 20). The three groups did not differ in the pre-modulation performance for reward (all P > 0.369, all n = 40) or punishment trials (all P > 0.604, all n = 40); this rules out potential between-participant confounding effects related to performance fluctuations at baseline. Together, these results demonstrate that frequency-personalized, beta–gamma OFC neuromodulation can exert a strong, valence-specific influence on choice behavior during reward learning in a rapid and reversible manner.

Fig. 3: Results of experiment 1, the monetary reinforcement learning task.

a–c, Observed optimal behavioral choices for reward (blue) and punishment (gold) during the pre-modulation (Pre, dotted), modulation (Mod, solid), and post-modulation (Post, dashed) blocks for the passive sham (a), active control/alpha (b) and personalized beta–gamma (c) groups. A significant modulation group × valence × time interaction was observed (\(F_{4,114}\) = 4.337, P = 0.003, \(\eta _p^2\) = 0.132, n = 60). The interaction effect was present when comparing beta–gamma with sham (\(F_{2,76}\) = 3.643, P = 0.036, \(\eta _p^2\) = 0.087, n = 40) and beta–gamma with alpha (\(F_{2,76}\) = 7.162, P = 0.001, \(\eta _p^2\) = 0.159, n = 40), but not when comparing alpha with sham (\(F_{2,76}\) = 1.286, P = 0.282, n = 40). During the modulation period, a significant group effect was observed for reward trials (\(F_{2,57}\) = 17.735, P = \(1 \times 10^{-6}\), \(\eta _p^2\) = 0.384, n = 60), with beta–gamma modulation driving the effect (sham versus beta–gamma, \(F_{1,38}\) = 16.170, P = \(2.65 \times 10^{-4}\), \(\eta _p^2\) = 0.299, n = 40; alpha versus beta–gamma, \(F_{1,38}\) = 35.997, P = \(5.69 \times 10^{-7}\), \(\eta _p^2\) = 0.486, n = 40; sham versus alpha, \(F_{1,38}\) = 0.805, P = 0.375, n = 40), but no differences were observed for punishment trials across groups (all F < 0.291, all P > 0.593, all n = 40). In the beta–gamma group (c, right panel), there was a significant effect of time on reward trials (\(F_{2,38}\) = 20.159, P = \(2 \times 10^{-6}\), \(\eta _p^2\) = 0.515, n = 20), with the modulation period showing significant differences compared with both the pre-modulation (\(F_{1,19}\) = 29.533, P = \(3 \times 10^{-5}\), \(\eta _p^2\) = 0.609, n = 20) and post-modulation (\(F_{1,19}\) = 23.012, P = \(1.25 \times 10^{-4}\), \(\eta _p^2\) = 0.548, n = 20) periods. No differences were observed between pre-modulation and post-modulation periods in reward trials (\(F_{1,19}\) = 0.822, P = 0.376, n = 20). Beta–gamma modulation did not affect choice behavior in punishment trials over time (all F < 0.370, all P > 0.550, all n = 20). 
Mixed ANOVAs used the within-participants factors of valence (reward, punishment) and time (pre-modulation, modulation, post-modulation), and the between-participants factor of group (sham, alpha, beta–gamma). Error bars and shaded error bands, ±1 s.e.m. ***P < 0.001. NS, not significant.
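The reported effect sizes can be cross-checked against the F statistics using the standard identity for partial eta squared, \(\eta _p^2 = F \cdot df_{effect} / (F \cdot df_{effect} + df_{error})\). The sketch below is only a reader-side consistency check; the function name `partial_eta_squared` and the list `REPORTED` are hypothetical, and the values are copied from the caption above.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error, reported partial eta squared) from the Fig. 3 caption.
REPORTED = [
    (4.337, 4, 114, 0.132),   # group x valence x time interaction
    (3.643, 2, 76, 0.087),    # beta-gamma versus sham
    (7.162, 2, 76, 0.159),    # beta-gamma versus alpha
    (17.735, 2, 57, 0.384),   # group effect, reward trials, modulation period
    (29.533, 1, 19, 0.609),   # modulation versus pre-modulation, beta-gamma group
]

for f_value, df_effect, df_error, reported in REPORTED:
    computed = partial_eta_squared(f_value, df_effect, df_error)
    print(f"F({df_effect},{df_error}) = {f_value}: "
          f"computed {computed:.3f}, reported {reported:.3f}")
```

Each computed value agrees with the reported one to within rounding.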

Next, we used reinforcement learning models to identify the computational mechanisms affected by beta–gamma neuromodulation during reward learning. We fitted these models to each participant’s choice behavior during reward trials. We used four models: a standard Q-learning algorithm (StandardQ)22; a standard Q-learning algorithm with a free parameter allowing differential weighting of positive and negative outcomes (WeightedQ)23; an actor-critic algorithm (Actor-Critic)22,23; and a hybrid model incorporating both Q-learning and actor-critic learning (Hybrid)23,24. The Actor-Critic model had the best fit or least amount of information loss, as indicated by the lowest Akaike information criterion pooled across participants and conditions (StandardQ, 34,435.68; WeightedQ, 32,703.25; Actor-Critic, 28,817.58; Hybrid, 31,126.38). Repeated measures analysis of variance (ANOVA) of the Actor-Critic model parameters for the within-participants factor of time (pre-modulation, modulation, post-modulation) and the between-participants factor of group (sham, alpha, beta–gamma) showed a significant interaction effect on mean temperature (β) (Extended Data Fig. 1). Follow-up ANOVAs showed a significant time × group interaction for beta–gamma compared with alpha, and for beta–gamma compared with sham, but not for alpha compared with sham. These interactions were driven by an increase in temperature during the modulation period, which rapidly recovered once neuromodulation was switched off, thus mirroring the effects of beta–gamma HD-tACS on reward choice behavior (Fig. 3). No other model parameters showed significant effects. To summarize, beta–gamma OFC neuromodulation acutely impaired reward-guided choice behavior by increasing an exploratory tendency, perhaps through the addition of decision noise or indeterminacy to the neural computations involved in choice.
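The role of the temperature parameter can be illustrated with a small actor-critic simulation. This is a minimal sketch under stated assumptions, not the models fitted in the study: the update rules, the learning rates `alpha_critic` and `alpha_actor`, the 1/0 reward coding and all function names are hypothetical. Only the 80%/20% reward probabilities and the pooled AIC values are taken from the text. In this parameterization, a larger temperature flattens the choice probabilities, which corresponds to more exploratory behavior.

```python
import math
import random


def softmax_probs(weights, temperature):
    """Softmax choice probabilities; higher temperature gives flatter probabilities."""
    scaled = [w / temperature for w in weights]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_actor_critic(n_trials, temperature, alpha_critic=0.1,
                          alpha_actor=0.1, p_reward=(0.8, 0.2), seed=0):
    """Simulate reward trials and return the fraction of optimal (index 0) choices."""
    rng = random.Random(seed)
    value = 0.0            # critic: expected reward of the stimulus pair
    weights = [0.0, 0.0]   # actor: preference for each stimulus
    n_optimal = 0
    for _ in range(n_trials):
        probs = softmax_probs(weights, temperature)
        action = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if rng.random() < p_reward[action] else 0.0
        delta = reward - value                  # prediction error
        value += alpha_critic * delta           # critic update
        weights[action] += alpha_actor * delta  # actor update
        n_optimal += action == 0
    return n_optimal / n_trials


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate less information loss."""
    return 2 * n_params - 2 * log_likelihood


# Pooled AIC values reported in the text; the lowest identifies the best fit.
REPORTED_AIC = {"StandardQ": 34435.68, "WeightedQ": 32703.25,
                "Actor-Critic": 28817.58, "Hybrid": 31126.38}
best_model = min(REPORTED_AIC, key=REPORTED_AIC.get)
```

For a fixed preference difference, `softmax_probs([1.0, 0.0], 0.1)` favors the first stimulus far more strongly than `softmax_probs([1.0, 0.0], 1.0)`, which is the sense in which an increase in temperature adds decision noise.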

After establishing that beta–gamma OFC HD-tACS modulates reward learning, we examined its effects on obsessive–compulsive behaviors. In experiment 2, we used a between-participants, double-blind, active-sham-controlled design. A non-clinical group of 64 participants with obsessive–compulsive symptoms in the subclinical and clinical range was studied. For each participant, we determined the endogenous reward-sensitive beta–gamma peak frequency using an abbreviated version of the learning task with concurrent EEG. Participants were assigned to either the personalized beta–gamma group (mean 27.1 Hz ± 3.6 s.d.) or the active control group undergoing alpha (10 Hz) modulation (that is, active control/alpha). HD-tACS, targeted at the medial OFC, was applied for 30 min each day for 5 consecutive days. We quantified obsessive–compulsive behaviors using the Obsessive–Compulsive Inventory-Revised (OCI-R)25, the most commonly used self-report scale in OCD research26,27. The baseline measurement was taken prior to HD-tACS on the first day. Follow-up measurements were taken on the fifth day (the last day of HD-tACS), and at 1 month, 2 months and 3 months after the last day of HD-tACS.

Beta–gamma modulation induced a robust and long-lasting improvement in obsessive–compulsive behaviors. A significant group × time interaction was observed in the OCI-R total score. In the beta–gamma group (n = 32), Bonferroni-corrected pairwise comparisons showed significant OCI-R reductions between pre-modulation baseline and all post-modulation periods (Fig. 4a, left panel): on the last intervention day (P = 0.042), and at the post-intervention timepoints of 1 month (P = 0.013), 2 months (P = 0.002) and 3 months (P = 0.003). On average, the OCI-R total score reduced by nearly 28% over 3 months. No significant changes in the OCI-R score were observed in the alpha group. Thus, beta–gamma neuromodulation induced frequency-specific improvements in obsessive–compulsive symptoms that could not be attributed to stimulation artifacts.

Fig. 4: Results of experiment 2, change in obsessive–compulsive symptoms after HD-tACS.

a, Mean OCI-R total score (left panel) and mean OCI-R subscale scores (right panels) with 95% confidence intervals (with mean at center of error bars) shown for each modulation group and timepoint. The shaded region shows the 5 day intervention period. For total OCI-R (left panel), there was a significant group × time interaction (\(F_{4,248}\) = 6.748, P = \(2.43 \times 10^{-4}\), \(\eta _p^2\) = 0.098, n = 64). There was a main effect of time on OCI-R in the beta–gamma group (\(F_{4,124}\) = 10.278, P = \(5.5 \times 10^{-5}\), \(\eta _p^2\) = 0.249, n = 32), and no effect for the control/alpha group (\(F_{4,124}\) = 0.307, P = 0.80, n = 32). See main text for statistics of pairwise comparisons. For subscales, significant group × time effects were evident for hoarding (\(F_{4,248}\) = 2.886, P = 0.030, \(\eta _p^2\) = 0.044, n = 64), ordering (\(F_{4,248}\) = 4.234, P = 0.004, \(\eta _p^2\) = 0.064, n = 64) and obsessing (\(F_{4,248}\) = 2.817, P = 0.032, \(\eta _p^2\) = 0.043, n = 64). The beta–gamma modulation group showed significant main effects of time for hoarding (\(F_{4,124}\) = 8.169, P = \(1.16 \times 10^{-4}\), \(\eta _p^2\) = 0.209, n = 32), checking (\(F_{4,124}\) = 4.827, P = 0.008, \(\eta _p^2\) = 0.135, n = 32), ordering (\(F_{4,124}\) = 9.8, P = \(3.9 \times 10^{-5}\), \(\eta _p^2\) = 0.24, n = 32) and washing (\(F_{4,124}\) = 3.029, P = 0.043, \(\eta _p^2\) = 0.089, n = 32). Mixed ANOVAs used the within-participants factor of time and the between-participants factor of group (control/alpha, beta–gamma). Follow-up Bonferroni-corrected pairwise comparisons (right panels) in the beta–gamma group (n = 32) revealed a rapid reduction in hoarding (P = 0.02) and ordering (P = 0.012) by the last day of intervention, relative to baseline. These scores remained reduced or continued to drop at 1 month (hoarding, P = 0.009; ordering, P = 0.01), 2 months (hoarding, P = 0.039; ordering, P = 0.002) and 3 months (hoarding, P = \(1.9 \times 10^{-5}\); ordering, P = 0.003). 
Of note, relative to baseline, there were significant reductions in checking on day 5 (P = 0.035), at 1 month (P = 0.008) and at 2 months (P = 0.015) after intervention. The alpha group (n = 32) showed no main effects of time (all F < 1.359, all P > 0.257) and no pairwise differences at any timepoint relative to baseline (all P > 0.228). b, Scatter plots with 95% confidence intervals of individual participant total OCI-R scores obtained before the intervention (pre-modulation) shown against the modulation-induced change in OCI-R (that is, baseline minus post-modulation measurement) for each post-intervention timepoint and for each modulation group. The beta–gamma group (n = 32) showed significant Pearson correlations (two-tailed, corrected for multiple comparisons) on day 5 (\(r_{32}\) = 0.624, P = \(1.35 \times 10^{-4}\)), 1 month (\(r_{32}\) = 0.602, P = \(2.71 \times 10^{-4}\)), 2 months (\(r_{32}\) = 0.712, P = \(5 \times 10^{-6}\)) and 3 months (\(r_{32}\) = 0.651, P = \(5.6 \times 10^{-5}\)). *P < 0.05; **P < 0.01; ***P < 0.001.

Beta–gamma modulation selectively improved several OCI-R subscales related to compulsivity (Fig. 4a, right panels). There were significant group × time interactions for hoarding, ordering and obsessing behaviors. Within the beta–gamma group, main effects of time were observed for hoarding, checking, ordering and washing behaviors. Follow-up Bonferroni-corrected pairwise comparisons in the beta–gamma group revealed rapid reductions in the OCI-R hoarding and ordering subscales by the last day of intervention. These reductions were sustained up to the post-intervention timepoint of 3 months. Of note, significant reductions were also observed in checking for up to 2 months. The alpha group showed no main effects of time and no pairwise differences at any timepoint relative to baseline. Pearson correlations between all time intervals in the alpha group showed good to excellent reliability for total and subscale scores (r ranging from 0.526 to 0.947; all P < 0.002, all n = 32), consistent with previous test–retest reliability studies of OCI-R in clinical and subclinical OCD populations25,28.

Participants with higher OCI-R scores at baseline showed greater reductions in obsessive–compulsive behaviors after beta–gamma modulation. We performed participant-wise correlations in the beta–gamma group between the total OCI-R score at baseline and the change in this score at each post-modulation timepoint. After correcting for multiple comparisons, significant correlations were found at each timepoint; this indicates that individuals with higher baseline ratings had the largest HD-tACS-induced improvements (Fig. 4b). Conversely, individuals with relatively low OCI-R scores at baseline had no change or had slightly increased symptoms after repeated beta–gamma modulation. However, participants who showed no improvement were a clear minority (6 of 32). The overwhelming majority of participants (26 of 32) had modulation-induced improvements, especially those who reported more severe obsessive–compulsive behaviors.

In addition to the total OCI-R score, subscale scores at baseline also predicted modulation-induced improvements at various timepoints. Significant correlations that survived P value correction for multiple comparisons (P < 0.0008) and were present for at least three out of four post-modulation timepoints were observed for hoarding (all \(r_{32}\) > 0.638, all P < \(8.7 \times 10^{-5}\), all n = 32), checking (all \(r_{32}\) > 0.628, all P < \(1.2 \times 10^{-4}\), all n = 32), ordering (all \(r_{32}\) > 0.590, all P < \(3.8 \times 10^{-4}\), all n = 32) and washing (all \(r_{32}\) > 0.590, all P < \(3.8 \times 10^{-4}\), all n = 32). By contrast, analyses of the same relationships for total OCI-R (all \(r_{32}\) < 0.237, all P > 0.192, all n = 32) and subscales (all \(r_{32}\) < 0.554, all P > 0.001, all n = 32) in the active control/alpha group did not reach significance at the corrected alpha level and the large majority of correlations from the control group were not significant even at an alpha level of P < 0.05 (23 out of 28, all \(r_{32}\) < 0.336, all P > 0.06, all n = 32), thus confirming the improvements to be frequency specific in the beta–gamma range. The intervention was seen to be most effective for non-clinical individuals with more severe compulsivity characteristics.

Across the two experiments, we observed impairments in reward learning and reduction in obsessive–compulsive symptoms due to beta–gamma OFC modulation. Next, we devised a series of complementary analyses across both experiments to confirm that changes in reward learning and obsessive–compulsive behaviors were related to each other and similarly associated with OFC beta–gamma neurophysiology. If obsessive–compulsive behaviors arise from maladaptive habit learning, then individuals with more severe obsessive–compulsive characteristics should have higher optimal choice behavior during reward trials. To test this prediction, we divided participants in experiment 2 into two groups based on their baseline OCI-R score (low versus high), collapsed across neuromodulation groups, and analyzed their reward choice behavior during the abbreviated, pre-experiment reinforcement learning task. The group with high OCI-R scores had significantly higher behavioral accuracy (mean ± s.e.m., 0.91 ± 0.02, n = 34) than the low OCI-R group (0.83 ± 0.02, n = 30; \(F_{1,62}\) = 6.759, P = 0.012, \(\eta _p^2\) = 0.098, n = 64). Participant-wise regression analysis revealed that individuals with high reward choice behavior were also those with high OCI-R scores (\(r_{64}\) = 0.494, P = \(3.4 \times 10^{-5}\), n = 64). Thus, high symptom severity was associated with better learning from rewards, consistent with previous work linking compulsivity and excessive reward-based habit learning12.

Next, we examined the relationship between reward choice behavior and the change in the intrinsic beta–gamma power in response to rewarding feedback relative to punishing feedback. We computed the beta–gamma effect during the pre-experiment EEG in experiment 1 and correlated it with the choice behavior during the pre-modulation periods. A significant participant-wise negative correlation (\(r_{60}\) = −0.467, P = \(1.7 \times 10^{-4}\), n = 60) was observed, which suggests that individuals who exhibited a smaller beta–gamma effect in response to rewarding feedback consistently made better choices during reward trials.

Optimal reward choice behavior was associated with a smaller beta–gamma effect (in experiment 1) and higher obsessive–compulsive symptom severity (in experiment 2). This implies that individuals with higher OCI-R scores should show a smaller beta–gamma effect in their pre-experiment EEG. We compared the beta–gamma effect between individuals with high and low OCI-R total scores at baseline. An ANOVA using the within-participants factor of valence (reward versus punishment) and the between-participants factor of OCI-R score (high versus low) indicated a significant valence × OCI-R interaction (\(F_{1,62}\) = 9.462, P = 0.003, \(\eta _p^2\) = 0.132, n = 64). By parsing the interaction, we observed that the high OCI-R group exhibited no beta–gamma effect, in agreement with our prediction, given that power values relative to baseline on reward trials (mean ± s.e.m., 0.013 ± 0.010 dB) and on punishment trials (0.0003 ± 0.008 dB) did not significantly differ (\(F_{1,33}\) = 0.954, P = 0.336, \(\eta _p^2\) = 0.028, n = 34). By contrast, individuals who scored relatively low on the OCI-R scale showed a significant beta–gamma effect (\(F_{1,29}\) = 20.712, P = \(8.8 \times 10^{-5}\), \(\eta _p^2\) = 0.417, n = 30) due to a large increase in beta–gamma power on reward trials (0.074 ± 0.011 dB) relative to punishment trials (0.008 ± 0.013 dB). Critically, these observations suggest that both obsessive–compulsive behaviors and reward-guided choice behavior during learning are similarly associated with rhythms in the beta–gamma range.

As previously discussed, modulation gains were stronger for individuals with more severe obsessive–compulsive characteristics (Fig. 4b). Given that obsessive–compulsive characteristics were associated with better choice performance in reward trials, individuals with more optimal reward choice behavior at baseline should experience stronger impairments during beta–gamma neuromodulation. Indeed, a significant negative correlation was observed between reward choice behavior during the pre-modulation and modulation periods across participants in the beta–gamma group in experiment 1 (\(r_{20}\) = −0.578, P = 0.008, n = 20). Together, these results establish parallel effects of neuromodulation in both experiments and suggest that convergent mechanisms underlie the regulation of reward-guided choice behavior and obsessive–compulsive behaviors through beta–gamma neuromodulation.

Activity in the beta–gamma range is considered a key mechanism for coupling frontostriatal areas during reward learning5. These rhythms are thought to transmit motivational value signals within the reward network to assist learning (Fig. 1)5,17. In light of these theories, it might seem surprising that modulation of these rhythms in the OFC in the current study caused impairments, and not enhancements, in optimal choice behavior. A facilitatory role of beta–gamma rhythms in learning may still emerge if both OFC and ventral striatum are simultaneously modulated. For synchronized circuits, neuromodulation of a single node in the circuit may not be sufficient to provide additional benefits10. Development of temporal interference-based subcortical neuromodulation29 for human studies, paired with OFC modulation using HD-tACS, may be key to empirically testing this possibility. Another possibility is that neuromodulation may have saturated the intrinsic beta–gamma activity in the OFC, which would leave little room for reward-induced modulations for behavioral adaptation. Although more work is needed to test these hypotheses, our observations of impaired choice behavior align with another perspective that takes into account the role of dopamine circuitry in the prefrontal cortex. Specifically, reward-related beta–gamma activity is stronger in individuals with reduced tonic prefrontal dopamine19. Furthermore, reduction in OFC dopamine levels is associated with increased exploratory behavior20. Examination of choice behavior through reinforcement learning models supports this perspective. Using the actor-critic framework, we detected an increase in exploration during decision-making while participants underwent beta–gamma neuromodulation. It is possible that beta–gamma neuromodulation increased local beta–gamma activity and reduced dopamine levels in the OFC to increase exploratory behavior. 
These modulations may be accompanied by changes in the frontostriatal connectivity and striatal dopamine levels20. The increase in exploration could also be due to added noise in maintaining or transmitting the identity information of the rewarding stimuli30, or due to downstream effects in a broader network that controls exploratory behavior such as the locus coeruleus31,32 or the motor cortex33. Future neuromodulation studies carried out concurrently with electrophysiology and neuroimaging will be needed to examine these neural processes that are potentially influenced by beta–gamma modulation during reward-guided choice behavior.

Compulsive behaviors are characterized as outcomes of dysfunctional habit learning4,12. In support of these theories, we observed greater severity of compulsive behaviors associated with more exploitative reward choice behavior. Chronic administration of HD-tACS over the medial OFC personalized to the intrinsic beta–gamma frequencies of the reward network decreased compulsive behaviors for at least 3 months. Together with prior evidence of an association of obsessive–compulsive behavior with habit learning4,12, these observations suggest that OFC neuromodulation may have improved obsessive–compulsive symptoms by promoting exploratory behavior. Indeed, reduced exploration during reward choice behavior has been observed in some pathologies of compulsivity such as alcohol use disorder34, and in animal models of compulsive behavior35. It is plausible that chronic OFC neuromodulation recovered the dysfunctional frontostriatal connectivity associated with compulsivity characteristics4, and may have affected the striatum, which also exhibits beta frequency activity during habitual behavior36. These predictions can be better examined through concurrent neuromodulation and neuroimaging. At present, our results provide causal evidence for the involvement of the OFC in obsessive–compulsive behaviors in humans, and the frequency-specific effects of neuromodulation provide a mechanistic insight into the neurophysiology that can be targeted for symptom improvement. The personalized neuromodulation design leverages the neuroplastic properties of the brain to produce long-lasting effects via chronic HD-tACS administration37. Although we examined a subclinical population in the present study, the benefits observed provide motivation for further investigation of personalized neuromodulation as an experimental medicine for clinical obsessive–compulsive spectrum disorders and other conditions of frontostriatal dysfunction.

Methods

Participants

A total of 128 participants (60 in experiment 1, 68 in experiment 2) signed informed consent forms, approved by the Boston University Institutional Review Board, and were paid. For experiment 1, data on all 60 participants recruited were analyzed (sham, n = 20, mean age, 26.6 years ± 5.8 s.d., 10 female, all right-handed; alpha, n = 20, mean age, 25.8 years ± 5.6 s.d., 10 female, all right-handed; beta–gamma, n = 20, mean age, 25.1 years ± 6.2 s.d., 11 female, all right-handed; two-tailed independent samples t-tests showed no between-group age differences, all \(t_{38}\) < 0.790, all P > 0.435, all n = 40). For experiment 2, we sought a sample that reflected a wide range of scores on the OCI-R for correlational analysis and to determine the impact of the beta–gamma intervention on people with varying levels of obsessive–compulsive behavior severity. This approach was motivated by conceptualizations in dimensional psychiatry that emerged as Research Domain Criteria (RDoC)40 and align with transdiagnostic perspectives of obsessive–compulsive spectrum disorders7. To this end, we sought a relatively equal distribution of low and high OCI-R scores across the spectrum, similar to previous research methods41. Thirty-four participants with relatively low OCI-R scores (0–17) and another 34 participants with relatively high OCI-R scores (≥18) were recruited. Four participants were lost to attrition due to multiple testing sessions. 
The final group of 64 participants consisted of 30 people with low scores (active control/alpha, n = 15, mean age 23.9 years ± 4.0 s.d., 8 female, all right-handed; beta–gamma, n = 15, 25.3 years ± 3.8 s.d., 8 female, all right-handed; two-tailed independent samples t-test for age, \(t_{28}\) = 1.026, P = 0.314, n = 30), and 34 people with high scores (active control/alpha, n = 17, 23.4 years ± 4.0 s.d., 9 female, all right-handed; beta–gamma, n = 17, 23.0 years ± 3.7 s.d., 9 female, all right-handed; two-tailed independent samples t-test on age, \(t_{32}\) = 0.327, P = 0.745, n = 34). Independent samples two-tailed t-tests showed no significant age differences between active control and beta–gamma modulation groups (\(t_{62}\) = 0.491, P = 0.625, n = 64). Of the 34 participants with high scores, 28 (active control/alpha, n = 14; beta–gamma, n = 14) scored higher than 20, which indicates the likely presence of OCD25.

All participants reported no metal implants in head, no implanted electronic devices, no history of neurological problems or head injury, no skin sensitivity, no claustrophobia, not being pregnant, no history of an Axis I psychiatric disorder, no current use of psychoactive medication, no substance abuse or dependence in the past 6 months, normal color vision, normal or corrected-to-normal visual acuity, and not being enrolled in any other research or treatment study for the duration of their study participation.

Stimuli and procedures

Overview

Experiment 1 was a randomized, between-participants, double-blind study using passive and active control procedures and pre–post within-participants measurements. Sixty participants were randomly assigned to either the passive sham, the active control (alpha, 10 Hz) or the personalized beta–gamma (mean, 26.6 Hz ± 3.3 s.d.) experimental group (20 participants each), with the use of a covariate adaptive randomization method to produce equal-sized groups balanced by sex and age42. All procedures for each experimental group were identical, with the exception of the HD-tACS protocol (passive sham, active control/alpha or personalized beta–gamma; Fig. 2). In addition to a passive sham control (the gold standard in the field of non-invasive neuromodulation) we instituted an active control procedure using alpha-rate HD-tACS to minimize potential confounding effects related to peripheral co-stimulation and to test for frequency specificity of HD-tACS effects10,43. For each experimental group, the duration of task performance was 90 min, and this consisted of three consecutive periods (pre-modulation, modulation, post-modulation; 30 min each) during which participants performed a classic monetary reinforcement learning task (Fig. 2a). The additional pre–post, within-participants measurements increased the rigor and reproducibility of the study. The pre-modulation period offered an additional baseline period from which to compare the effects of each intervention within groups, and provided the opportunity to evaluate the stability of the baseline data collected across groups. The post-modulation period enabled pre–post comparisons, evaluation of practice effects, and the duration of modulation effects to be examined. Behavioral data were analyzed before, during and after the 30 min application of HD-tACS in each experimental group (sham, alpha, beta–gamma).

Experiment 2 was a randomized, between-participants, double-blind study using an active control procedure (alpha-rate HD-tACS) and pre–post, within-participants measurements. Sixty-four participants were randomly assigned to either the active control (alpha, 10 Hz) or personalized beta–gamma (mean, 27.1 Hz ± 3.6 s.d.) experimental group (32 participants each) using a covariate adaptive randomization method to produce equal-sized groups balanced by sex, age and OCI-R score42. The procedures across experimental groups were identical, with the exception of modulation frequency (10 Hz versus personalized ~27 Hz; Fig. 2). In addition to the use of electrical field modeling and HD-tACS to help to focalize the stimulated area and minimize unwanted peripheral co-stimulation44,45,46, we also instituted an active control procedure that matched the personalized beta–gamma intervention in terms of estimated effects on extra-cranial tissue, shunting and peripheral co-stimulation10,43. All participants completed a baseline pre-modulation OCI-R assessment, followed by 5 consecutive days of alpha or beta–gamma HD-tACS (30 min each day). On the fifth day, after neuromodulation, participants completed a post-modulation OCI-R assessment, followed by three longer-term post-modulation OCI-R assessments at 1 month, 2 months and 3 months after the last day (that is, day 5) of the intervention. OCI-R scores were analyzed for all five data collection timepoints (baseline, day 5, 1 month, 2 months and 3 months).

Experimental task

In experiment 1, participants performed three consecutive blocks (pre-modulation, modulation, post-modulation) of a well-validated monetary reinforcement learning task (Fig. 2a), coded in Psychtoolbox47, based on methods from ref. 21. Each block was approximately 30 min and consisted of three runs, with new pairs of stimuli in each run. During a run, participants performed 80 trials (40 reward, 40 punishment). Each trial began with fixation (500 ms), followed by the presentation of two novel visual stimuli (letters from the Agathodaimon font; 2,500 ms) to the left and right of fixation (relative stimulus position counterbalanced across trials). Participants were instructed to choose between stimuli to maximize payoffs. Upon making a selection using a button press from a handheld gamepad, a red circle appeared around the chosen stimulus (500 ms). Following a jittered interstimulus interval (500–1,500 ms), feedback was presented (2,500 ms). Each pair of stimuli was related to a pair of outcomes (in the reward trials, the two outcomes were +US$10 or US$0, and in the punishment trials they were −US$10 or US$0), with the two stimuli in each trial each having reciprocal probabilities of the two outcomes (80%/20% and 20%/80%, respectively). Reward and punishment trials were randomly interleaved within a run. To maximize payoffs, participants needed to learn stimulus–outcome contingencies via trial and error. Participants were told that their wins from a randomly selected run of trials would be their compensation for participation, but in actuality all participants left with the same fixed amount.
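The reciprocal outcome probabilities described above can be made concrete with a short simulation. This is only a Python sketch under stated assumptions, not the authors' Psychtoolbox code: the function names (`sample_outcome`, `make_run`) are hypothetical, and we assume that the "better" stimulus is the one with an 80% chance of the favourable outcome (+US$10 on reward trials, US$0 on punishment trials).

```python
import random

P_BETTER = 0.8  # chance of the favourable outcome for the better stimulus


def sample_outcome(valence, chose_better, rng):
    """Return the outcome in US$ for one trial.

    valence: "reward" (outcomes +10 or 0) or "punishment" (outcomes -10 or 0).
    chose_better: True if the chosen stimulus has the 80% chance of the
    favourable outcome (assumed mapping; see lead-in).
    """
    p_good = P_BETTER if chose_better else 1.0 - P_BETTER
    good = rng.random() < p_good
    if valence == "reward":
        return 10 if good else 0
    if valence == "punishment":
        return 0 if good else -10
    raise ValueError("valence must be 'reward' or 'punishment'")


def make_run(rng, n_per_valence=40):
    """One run: 40 reward and 40 punishment trials, randomly interleaved."""
    trials = ["reward"] * n_per_valence + ["punishment"] * n_per_valence
    rng.shuffle(trials)
    return trials


rng = random.Random(0)
run = make_run(rng)
outcomes = [sample_outcome(v, True, rng) for v in run]
```

A participant who always picks the better stimulus would, under these assumptions, win US$10 on roughly 80% of reward trials and lose US$10 on roughly 20% of punishment trials.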

Clinical instrument

In experiment 2, participants' obsessive–compulsive characteristics were evaluated using the OCI-R25. The OCI-R is one of the most commonly used self-report assessments in obsessive–compulsive behavior research worldwide26,27. It has been used to quantify obsessive–compulsive symptoms in various disorders, including OCD25. It is an 18-item unipolar self-report questionnaire of DSM-IV obsessive–compulsive symptoms with six subscales (washing, checking, neutralizing, obsessing, ordering and hoarding), which have been validated in clinical48, non-clinical49 and combined samples25. Each response is scored on a five-point scale from ‘not at all’ (coded as 0) to ‘extremely’ (coded as 4). The total score is the sum of the scores on all 18 items. People with OCD typically have a score of 21 or higher25. The OCI-R has good to excellent convergent and discriminant validity, test–retest reliability, and internal consistency in clinical and non-clinical samples25,48,49.
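The scoring rule is simple enough to write out. The following Python sketch assumes only what is stated in the text (18 items coded 0 to 4, a total equal to their sum, a recruitment split at 0–17 versus ≥18, and a typical OCD score of 21 or higher); the function names are hypothetical, and subscale scoring is omitted because the item-to-subscale mapping is not given here.

```python
def oci_r_total(item_scores):
    """Sum the 18 item scores, each coded 0 ('not at all') to 4 ('extremely')."""
    if len(item_scores) != 18:
        raise ValueError("the OCI-R has 18 items")
    if any(s not in (0, 1, 2, 3, 4) for s in item_scores):
        raise ValueError("each item is coded 0-4")
    return sum(item_scores)


def recruitment_group(total):
    """Low (0-17) versus high (>=18) split used for recruitment in experiment 2."""
    return "low" if total <= 17 else "high"


def at_or_above_typical_ocd_score(total):
    """True for totals of 21 or higher, the typical range for people with OCD."""
    return total >= 21
```

The possible range of the total score is therefore 0 to 72.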

HD-tACS

The alternating current stimulation was delivered non-invasively using a nine-channel high-definition transcranial electrical-current stimulator (Soterix Medical). The HD-tACS procedure used eight sintered 12-mm-diameter Ag-AgCl electrodes, attached to custom high-definition plastic holders, filled with conductive gel, and embedded in a BrainCap (Brain Vision). Electrical field modeling using HD-Explore and HD-Targets (Soterix Medical) guided decision-making about where to place electrodes, with the goal of delivering focalized current to medial regions of the bilateral OFC. Figure 2b shows the modulation parameters, including the number of electrodes, electrode location and current intensity values (Fp1, −0.3 mA; Fp2, −0.3 mA; Cz, 0.6 mA; C5, 0.3 mA; C6, 0.3 mA; Ex17, −0.6 mA; Ex18, −0.6 mA; Iz, 0.6 mA). A bipolar sinusoidal alternating current was applied at either a participant-defined beta–gamma frequency (that is, personalized beta–gamma) or at 10 Hz (that is, active control/alpha). Modulation intensity (peak to peak) was set to 1.8 mA. Modulation was done for 30 min while participants performed the reinforcement learning task (experiment 1) or were seated at rest with eyes open, not engaged in any task (experiment 2). All participants confirmed that the modulation procedure was acceptable and did not induce painful skin sensations or phosphenes.
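As a quick arithmetic check on the montage, the listed electrode currents should balance, because current entering the head must also leave it. The Python sketch below uses the values quoted above, with the last site written as Iz (the inion position in the 10–20 nomenclature, which is our reading of the label). The variable names are hypothetical.

```python
# Electrode currents in mA, as listed for Fig. 2b
montage_mA = {
    "Fp1": -0.3, "Fp2": -0.3,
    "Cz": 0.6, "C5": 0.3, "C6": 0.3,
    "Ex17": -0.6, "Ex18": -0.6,
    "Iz": 0.6,
}

net_current = sum(montage_mA.values())                     # should be ~0
injected = sum(v for v in montage_mA.values() if v > 0)    # total positive current
returned = -sum(v for v in montage_mA.values() if v < 0)   # total negative current

print(f"net = {net_current:.3f} mA, injected = {injected:.1f} mA, returned = {returned:.1f} mA")
```

The positive and negative currents each sum to 1.8 mA, which coincides with the stated modulation intensity.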

Control procedures

We implemented numerous procedures, in accordance with previously established methods10,50,51,52, to ensure that information about the experiments would not bias results. First, both experiments were between-participant in design to minimize learning and the transfer of effects across conditions. Second, for both experiments, within-participant, pre- and post-modulation assessment periods were implemented for each modulation group to reduce random noise such as confounding effects related to participant differences. In addition to comparisons afforded by the between-participant design, the addition of pre–post assessment periods allowed us to: (1) examine the effects of each type of modulation through time in the same group of participants; (2) evaluate the stability of baseline task performance across the different groups of participants; and (3) determine how long the modulation effects lasted.

Third, the experiments were sham-controlled. Experiment 1 used the standard passive sham control. The passive sham followed the same procedure as the active beta–gamma modulation, except that current was delivered for only 30 s, ramping up and down at the beginning and end of the 30 min period, to simulate the tingling sensation that participants commonly experience and then quickly adjust to during active modulation51. Additionally, experiments 1 and 2 utilized an active control10. The parameters of the active control procedure were identical to those of beta–gamma modulation, except that alternating current was applied at a neighboring frequency band (alpha, 10 Hz). This was an especially rigorous control procedure because, unlike the passive sham, HD-tACS during active control was applied to the same cortical target (OFC), at the full 1.8 mA intensity, for the full 30 min duration. The active control effectively minimizes unwanted effects associated with changes in extra-cranial tissue, shunting and peripheral co-stimulation, such as transretinal or transcutaneous stimulation43. The improved neuromodulation technology (HD-tACS) and electrical field modeling that we used have also been shown to minimize the effects of peripheral co-stimulation because these methods enhance anatomical targeting precision and produce more focal current distributions44,53.

Fourth, we used a double-blind procedure, in which a second experimenter set the mode (for example, active or sham) on the stimulator, but otherwise did not interact with the participant or with the experimenter who performed data collection. Fifth, we conducted recordings in a sound-attenuated, electrically shielded chamber to avoid line noise interference and reduce the possibility of participant–experimenter interaction during testing, which could have at least implicitly biased the participants.

Last, we confirmed that participants remained blind to the type of modulation that they received. After the experiment had ended (experiment 1) or each modulation test day had ended (experiment 2), we administered a safety questionnaire54 and visual analog scale55, which included questions regarding attention, concentration, mood, vision, headache, fatigue and skin sensations under electrodes. Scores on these ratings were not significantly different across modulation groups (experiment 1, all F1,38 < 0.307, all P > 0.583, all n = 40; experiment 2, all F1,62 < 1.229, all P > 0.272, all n = 64). In addition, all participants were asked at the end of the experiments whether they could guess which modulation group they had been randomly assigned to. Participants' guesses were near chance (experiment 1, 31.67% hit rate; experiment 2, 53.13% hit rate).

Data analysis

Pre-experiment EEG frequency tuning

Dynamic systems theory predicts that systems with intrinsic periodic dynamics have preferred modulation frequencies and that neuronal modulation should be most effective when the external modulation frequency of weak periodic perturbations is closest to the brain network’s preferred frequency56. To design a personalized HD-tACS intervention, we determined the peak frequency of each participant’s endogenous positive feedback beta–gamma effect. Each participant completed a brief pre-study EEG recording while they performed 30 min of the reinforcement learning task. EEG was recorded using our standard procedures10. The EEG signal was collected from 64 electrodes arrayed according to the international 10–20 system using an ActiChamp active channel amplifier sampling at 1,000 Hz (Brain Vision). The right mastoid electrode served as the online reference. Horizontal eye position was monitored by recording the electro-oculogram from bipolar electrodes placed at the outer canthus of each eye, and vertical eye position and blinks were monitored with bipolar electrodes placed above and below the left orbit.

Offline, data were re-referenced to the average of the left and right mastoids. Broad segments were cut from 1,000 ms before to 2,000 ms after feedback onset so that edge artifacts induced by wavelet convolution fell outside the critical middle of this window. Visual inspection was used to remove large muscle artifacts, followed by independent component analysis to remove blink and noise artifacts57. The data were converted to current source density (CSD)58 to improve topographical localization59. Total power was calculated by convolving the CSD-EEG time series in each segment with a set of complex Morlet wavelets, defined with a Gaussian envelope using a constant ratio (σf = f/7) and a wavelet duration (6σt), in which f is the center frequency and σt = 1/(2πσf). Frequencies ranged from 1 to 60 Hz in 0.5 Hz steps. Following single-trial EEG spectral decomposition, the magnitude (that is, length) of the complex number vectors was extracted, squared and averaged, yielding a measure of intertrial total power for a given frequency, timepoint and electrode. Power was decibel normalized, in which for each electrode and frequency, the average power during an interval of −100 to 0 ms before feedback onset served as the baseline activity. Analysis focused on the beta–gamma band frequencies (20–35 Hz) at electrode Fz during a conservatively broad temporal window (200–500 ms after feedback onset), consistent with the spectral, spatial and temporal locus of the positive feedback beta–gamma effect5,15,16. Within the beta–gamma band, the frequency with maximum difference in mean total power between reward and punishment trials was extracted on an individual participant basis and used as the target modulation frequency. Individually frequency-tuned modulation was applied with a resolution of 0.5 Hz, consistent with prior work10. All EEG processing and analysis were conducted using MATLAB calls on the FieldTrip toolbox60.
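To make the frequency-tuning steps concrete, the following is a minimal Python sketch of the wavelet definition, decibel normalization and peak-frequency selection described above. It is not the authors' MATLAB/FieldTrip pipeline. All function names are hypothetical, the wavelet envelope is scaled to unit sum as a convenience (an assumption on our part; any per-frequency scale cancels under decibel normalization), and re-referencing, artifact removal and the CSD transform are omitted.

```python
import cmath
import math


def morlet(f, fs, ratio=7.0, width=6.0):
    """Complex Morlet wavelet with sigma_f = f/ratio, spanning width*sigma_t seconds."""
    sigma_f = f / ratio
    sigma_t = 1.0 / (2.0 * math.pi * sigma_f)
    half = int(round(width * sigma_t * fs / 2.0))
    ks = range(-half, half + 1)
    env = [math.exp(-((k / fs) ** 2) / (2.0 * sigma_t ** 2)) for k in ks]
    total = sum(env)  # assumption: unit-sum Gaussian envelope
    return [e / total * cmath.exp(2j * math.pi * f * k / fs) for e, k in zip(env, ks)]


def power_at(signal, wavelet, n):
    """Squared magnitude of the convolution of signal and wavelet at sample n."""
    half = len(wavelet) // 2
    acc = 0j
    for k, w in enumerate(wavelet):
        idx = n + half - k
        if 0 <= idx < len(signal):
            acc += w * signal[idx]
    return abs(acc) ** 2


def to_db(power, baseline_power):
    """Decibel change of power relative to the pre-feedback baseline."""
    return 10.0 * math.log10(power / baseline_power)


def candidate_freqs(lo=20.0, hi=35.0, step=0.5):
    """Beta-gamma band frequencies at the 0.5 Hz tuning resolution."""
    n = int(round((hi - lo) / step))
    return [lo + i * step for i in range(n + 1)]


def peak_frequency(reward_db, punish_db):
    """Frequency (dict key) with the largest reward-minus-punishment power."""
    return max(reward_db, key=lambda f: reward_db[f] - punish_db[f])
```

In use, `reward_db` and `punish_db` would map each candidate frequency to the mean decibel-normalized power at Fz in the 200–500 ms window for the two trial types, and the returned frequency would serve as the personalized modulation frequency.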

Primary dependent measures

In experiment 1, participants' choices for each trial were averaged across the three runs in each block, resulting in a learning curve composed of 40 choice scores for each participant, for each reward and punishment trial type, and for each pre-modulation, modulation and post-modulation time period. In experiment 2, OCI-R total score and subscales were generated by adding the item scores based on a five-point Likert scale. For both experiments, we examined participants' beta–gamma effect, that is, the difference in the beta–gamma power (20–35 Hz) following reward feedback relative to punishment feedback (200–500 ms after feedback onset) at electrode Fz, during the pre-experiment period, and its relationship with reward optimal choice behavior and OCI-R total score.

Computational modeling

To best match the pattern of choice behavior in the reinforcement learning task, we used four models: two variants of a standard Q-learning model (StandardQ and WeightedQ)22,23, an Actor-Critic model22,23, and a Hybrid model incorporating both Q-learning and actor-critic elements23,24.

Q-learning models

For every participant, the Q-learning algorithm22 calculates the expected reward value (Q) for an action (a) based on their choice and feedback history. If the expected reward value for an action a in trial t is Q(a,t), then the expected reward value for the same action in trial t + 1 is computed as per the following rule:

$$Q\left( {a,t + 1} \right) = Q\left( {a,t} \right) + \alpha _o \times \delta \left( t \right),$$

where ɑo represents the learning rate and quantifies the degree to which feedback in trial t leads to adjustments in the expected value during trial t + 1. The value of ɑo ranges between 0 and 1, with 0 indicating no influence and 1 indicating maximum influence. δ(t) represents the prediction error computed as the difference between the actual outcome following the action a in trial t and the expected reward value as follows:

$$\delta \left( t \right) = {\mathrm{outcome}}\left( t \right) - Q\left( {a,t} \right).$$

We used two variants of the Q-learning algorithm. In the StandardQ variant, the outcome(t) variable was set to 1 for positive feedback, 0 for neutral feedback, and −1 for negative feedback. In the WeightedQ variant, we allowed positive and negative feedback to be weighted differently using an additional free parameter, d23. The outcome(t) value was set to 1 − d for positive feedback, 0 for neutral feedback, and −d for negative feedback, allowing exclusive weighting for positive and negative outcomes when d = 0 and d = 1, respectively, and equal weighting of positive and negative outcomes when d = 0.5. In both variants, the expected values (Q) were set to 0 at the beginning of every run.

When choosing between two actions, a1 and a2, in a given trial t, both models computed the expected reward value for each action, Q(a1,t) and Q(a2,t), using the choice and feedback history as described above. Using these expected values, the probability of choosing the action a1 in that trial was computed using the following softmax rule:

$$P\left( {a_1,t} \right) = {\mathrm{exp}}\left( {Q\left( {a_1,t} \right)/\beta } \right)/\left[ {{\mathrm{exp}}\left( {Q\left( {a_1,t} \right)/\beta } \right) + {\mathrm{exp}}\left( {Q\left( {a_2,t} \right)/\beta } \right)} \right].$$

Here, the parameter β represents the noise or the temperature during the choice process. Higher β values suggest greater randomness in the choice process and lesser sensitivity to expected reward values. Lower β values suggest higher propensity to choose the stimulus with larger expected rewards.
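The update and choice rules above translate directly into a few lines of code. This Python sketch is illustrative only (the authors worked in MATLAB); the function names are hypothetical, feedback is coded as +1, 0 or −1, and the softmax is written in an algebraically equivalent form that avoids overflow when β is small.

```python
import math


def outcome_value(feedback, d=None):
    """Map feedback (+1, 0, -1) to outcome(t).

    StandardQ when d is None: 1, 0, -1.
    WeightedQ otherwise: 1 - d, 0, -d.
    """
    if d is None:
        return float(feedback)
    if feedback > 0:
        return 1.0 - d
    if feedback < 0:
        return -d
    return 0.0


def q_update(q, outcome, alpha_o):
    """Q(a,t+1) = Q(a,t) + alpha_o * delta(t), with delta(t) = outcome(t) - Q(a,t)."""
    delta = outcome - q
    return q + alpha_o * delta


def softmax_p1(q1, q2, beta):
    """P(a1,t) = exp(q1/beta) / (exp(q1/beta) + exp(q2/beta)), computed stably."""
    z = (q1 - q2) / beta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

For example, starting from Q = 0 with a learning rate of 0.5, one positive outcome under StandardQ moves the expected value halfway to 1. With equal expected values the choice probability is 0.5 for any β.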

Actor-Critic model

In the Actor-Critic model, every pair of stimuli is represented as a state s, and the expected value associated with that state in a given trial t is represented as V(s,t). The value functions are updated using a prediction error, ε(t), as follows:

$$V\left( {s,t + 1} \right) = V\left( {s,t} \right) + \alpha _C \times \varepsilon \left( t \right),$$
$$\varepsilon \left( t \right) = {\mathrm {outcome}}\left( t \right) - V\left( {s,t} \right),$$

where ɑC represents the learning rate of the critic module, which determines the degree to which state values are updated according to the prediction error, calculated as the difference between the expected value for the current state, V(s,t), and the actual outcome, outcome(t). Similar to the WeightedQ model, outcome(t) was set to 1 − d for positive feedback, 0 for neutral feedback, and −d for negative feedback.

The actor module selects from among multiple actions, a, within a state, s, on a trial, t, according to their weights, w(s,a,t). The weight of the selected action is adjusted using the prediction error from the critic module, ε(t), using the following rule:

$$w\left( {s,a,t + 1} \right) = w\left( {s,a,t} \right) + \alpha _A \times \varepsilon \left( t \right),$$

where ɑA is the learning rate for the actor module, which is used to determine the degree of influence of the prediction error on the action weight. Following ref. 23, we normalized the actor weights using the sum of their absolute values to avoid unbounded growth, and the weights were initialized at 0.01. For example, given two possible actions, a1 and a2, which reflect selection of stimulus 1 or stimulus 2, respectively, actor weight for action 1 was normalized as follows:

$$w\left( {s,a_1,t} \right) \leftarrow w\left( {s,a_1,t} \right)/\left( {\left| {w\left( {s,a_1,t} \right)} \right| + \left| {w\left( {s,a_2,t} \right)} \right|} \right).$$

Using these weights, the probability of choosing the action a1 in a trial t, P(a1,t), was determined using the following softmax rule:

$$P\left( {a_1,t} \right) = {\mathrm{exp}}\left( {w\left( {s,a_1,t} \right)/\beta } \right)/\left[ {{\mathrm{exp}}\left( {w\left( {s,a_1,t} \right)/\beta } \right) + {\mathrm{exp}}\left( {w\left( {s,a_2,t} \right)/\beta } \right)} \right],$$

where the β parameter reflects the temperature or the degree of exploration during the choice.
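A single Actor-Critic trial update can be sketched as follows. This Python example is a hedged illustration rather than the authors' implementation: the function name is hypothetical, and we assume that normalization by the sum of absolute weights is applied immediately after each actor update, which the text does not specify.

```python
def actor_critic_step(v, weights, chosen, outcome, alpha_c, alpha_a):
    """One trial of critic and actor updates for a two-action state.

    v: current state value V(s,t)
    weights: [w(s,a1,t), w(s,a2,t)]
    chosen: index (0 or 1) of the selected action
    Returns (V(s,t+1), normalized weights).
    """
    eps = outcome - v                      # critic prediction error
    v_next = v + alpha_c * eps             # critic update
    w_next = list(weights)
    w_next[chosen] += alpha_a * eps        # only the selected action's weight changes
    norm = abs(w_next[0]) + abs(w_next[1])
    if norm > 0:                           # assumed: normalize after every update
        w_next = [w / norm for w in w_next]
    return v_next, w_next


# Weights start at 0.01 as described in the text; other values are illustrative
v, w = actor_critic_step(0.0, [0.01, 0.01], chosen=0, outcome=1.0,
                         alpha_c=0.5, alpha_a=0.5)
```

After this step the absolute weights sum to 1, so repeated rewards cannot drive them without bound.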

Hybrid actor-critic Q-learning model

We used a Hybrid model23,24, which combines the expected value of each action from the WeightedQ model with the action weights from the Actor-Critic model, and therefore reflects an interaction between model-based and model-free learning mechanisms. The expected values of each action in the WeightedQ model were combined with the weights of each action in the Actor-Critic model using a mixing parameter, c, the value of which reflects the relative contribution of each model as per the following rule:

$$H\left( {s,a,t} \right) = \left[ {\left( {1 - c} \right) \times w\left( {s,a,t} \right) + c \times Q\left( {a,t} \right)} \right].$$

As c increases from 0 to 1, the model transitions from a purely Actor-Critic model to a purely WeightedQ model. Action a1 is then selected in a given trial t with a probability P(a1,t) using a softmax function:

$$P\left( {a_1,t} \right) = {\mathrm{exp}}\left( {H\left( {s,a_1,t} \right)/\beta } \right)/\left[ {{\mathrm{exp}}\left( {H\left( {s,a_1,t} \right)/\beta } \right) + {\mathrm{exp}}\left( {H\left( {s,a_2,t} \right)/\beta } \right)} \right],$$

where the β parameter reflects the temperature or the degree of exploration during the choice.
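The effect of the mixing parameter is easiest to see at its endpoints. The short Python sketch below uses hypothetical function names and arbitrary illustrative values for the weights and expected values (none taken from the study).

```python
import math


def hybrid_value(w, q, c):
    """H(s,a,t) = (1 - c) * w(s,a,t) + c * Q(a,t)."""
    return (1.0 - c) * w + c * q


def choice_prob(h1, h2, beta):
    """Softmax probability of action a1 given hybrid values, computed stably."""
    z = (h1 - h2) / beta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Illustrative values only
actor_w = (0.9, 0.1)   # actor weights for a1, a2
q_vals = (0.4, 0.2)    # WeightedQ expected values for a1, a2
for c in (0.0, 0.5, 1.0):
    h1 = hybrid_value(actor_w[0], q_vals[0], c)
    h2 = hybrid_value(actor_w[1], q_vals[1], c)
    print(c, h1, h2, choice_prob(h1, h2, beta=0.5))
```

At c = 0 the hybrid values equal the actor weights, and at c = 1 they equal the Q values, so the same softmax reproduces the Actor-Critic and WeightedQ choice rules, respectively.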

Parameter estimation

Using maximum likelihood estimation with the MATLAB function fmincon, we optimized two parameters in the StandardQ model (ɑo, β), three parameters in the WeightedQ model (ɑo, β, d), four parameters in the Actor-Critic model (ɑC, ɑA, β, d), and six parameters in the Hybrid model (ɑo, ɑC, ɑA, β, d, c). The parameters were estimated such that the probability of actual choices was maximized for reward trials. These estimations were performed for each run in every block (pre-modulation, modulation, post-modulation) for participants in every group (sham, alpha, beta–gamma). We also performed model fits after combining reward and punishment trials and the pattern of results remained similar.
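The likelihood being maximized can be illustrated for the simplest case, the StandardQ model. The Python sketch below is not the authors' procedure: it replaces the constrained optimizer (fmincon) with a coarse grid search purely for illustration, and the function names, grid values and toy data are all hypothetical.

```python
import math


def neg_log_likelihood(params, choices, outcomes):
    """Negative log-likelihood of a choice sequence under StandardQ.

    params: (alpha_o, beta)
    choices: chosen action per trial (0 or 1)
    outcomes: outcome(t) per trial (1, 0 or -1)
    """
    alpha_o, beta = params
    q = [0.0, 0.0]  # expected values start at 0 at the beginning of a run
    nll = 0.0
    for a, r in zip(choices, outcomes):
        z = (q[a] - q[1 - a]) / beta
        # -log P(a) = log(1 + exp(-z)), written to avoid overflow
        if z >= 0:
            nll += math.log1p(math.exp(-z))
        else:
            nll += -z + math.log1p(math.exp(z))
        q[a] += alpha_o * (r - q[a])
    return nll


def grid_search(choices, outcomes, alphas, betas):
    """Return (alpha_o, beta, nll) minimizing the negative log-likelihood on a grid."""
    best = min((neg_log_likelihood((a, b), choices, outcomes), a, b)
               for a in alphas for b in betas)
    return best[1], best[2], best[0]


# Toy data: the same action is always chosen and always rewarded
alpha_hat, beta_hat, nll = grid_search([0] * 20, [1.0] * 20,
                                       alphas=[0.1, 0.5, 0.9], betas=[0.1, 1.0])
```

With these toy data the best grid point is the fastest learning rate and the lowest temperature, because that combination makes the observed, perfectly consistent choices most probable. A continuous optimizer with bounds on each parameter would refine such an estimate.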

Statistics

In experiment 1, we carried out repeated measures ANOVA using the within-participants factors of time (pre-modulation versus modulation versus post-modulation) and valence (reward versus punishment), and the between-participants factor of modulation group (passive sham versus active control/alpha versus personalized beta–gamma) to assess optimal choice performance. The analysis of temperature used the within-participants factor of time (pre-modulation versus modulation versus post-modulation) and the between-participants factor of modulation group (passive sham versus active control/alpha versus personalized beta–gamma). When appropriate, follow-up ANOVAs were conducted to test specific preplanned hypotheses. To examine the relationship between reward-guided optimal choice behavior and the reward feedback beta–gamma effect, we conducted a participant-wise two-tailed Pearson correlation between the mean EEG beta–gamma power in the pre-experiment reward feedback (20–35 Hz, 200–500 ms after feedback onset) and optimal choice accuracy on reward trials during the pre-modulation period, collapsed across modulation groups (passive sham, active control/alpha, personalized beta–gamma). Participant-wise two-tailed Pearson correlation was used to examine the association between reward optimal choice behavior during the pre-modulation and the modulation periods for the beta–gamma group.

In experiment 2, we carried out repeated measures ANOVA for the OCI-R total and subscale scores using the within-participants factor of time (pre-modulation versus day 5 versus 1 month versus 2 months versus 3 months) and the between-participants factor of modulation group (active control/alpha versus personalized beta–gamma). Interactions were parsed with Bonferroni-corrected pairwise comparisons. Participant-wise two-tailed Pearson correlations, using an adjusted P value (P = 0.0008) to control for multiple comparisons, were conducted to examine relationships between participants' pre-modulation baseline OCI-R scores and each post-modulation OCI-R score. To examine the relationship between baseline OCI-R total scores and pre-experiment reward choice behavior, the data were collapsed across modulation groups (active control/alpha, personalized beta–gamma) and an ANOVA was run on pre-experiment optimal choice behavior from reward trials using the between-participants factor of OCI-R score (low versus high). Participant-wise Pearson correlations (two-tailed) were conducted to examine the association between individual participant pre-experiment reward choice behavior and baseline OCI-R total score. To compare the EEG beta–gamma effect (20–35 Hz, 200–500 ms after feedback onset) between individuals with high and low baseline OCI-R total scores, we conducted an ANOVA using the within-participants factor of valence (reward versus punishment) and the between-participants factor of OCI-R score (low versus high). For both experiments, ANOVA P values were adjusted using the Greenhouse–Geisser epsilon correction for non-sphericity when this assumption was violated. The experiments of this study were not repeated independently. However, partial eta squared effect sizes were estimated to facilitate comparison between studies and promote replication. All tests were two-tailed with the exception of F-tests, which by definition are one-tailed. 
Statistical parameters, including the definitions and exact value of n (that is, total number of participants), P values, and the type of statistical tests, are reported primarily in the figure legends.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.