Introduction

Mental illnesses, such as schizophrenia and mood disorders, can emerge early in life and can potentially become a lifetime burden for many patients [1]. Recent studies have shown that mental illnesses can be progressive in their course after onset [2,3,4,5,6,7]. Repeated psychiatric episodes may be associated with irreversible alterations of the brain [5, 6, 8]. Early detection of these disorders is therefore paramount, so that effective treatments can mitigate or preempt further episodes and the progression of illness.

Schizophrenia is a severe, highly disabling psychiatric disorder marked by delusions, hallucinations, poor motivation, and cognitive impairments [1]. There is a substantial public health cost, with the economic burden estimated at $155.7 billion in 2013 in the United States [9], highlighting the importance of early detection and intervention in schizophrenia. Advances in magnetic resonance imaging (MRI) techniques in recent decades provide a unique opportunity to search for biomarkers of schizophrenia non-invasively. Equipped also with advanced machine learning techniques, researchers have made unprecedented progress in discovering biomarkers that can increasingly enable identification of individual patients with schizophrenia [10, 11]. However, it is still much more challenging to identify patients in the early phases of the disorder than those with chronic illness [11]. Furthermore, treatments with antipsychotic medication have been found to contribute to the brain changes observed over the progression of schizophrenia [8, 12]. Thus, finding biomarkers of schizophrenia at the first episode without the confounding effects of treatment has been challenging, and identifying reliable non-invasive brain imaging markers in the early phase of illness remains an important goal [11, 13]. Using these biomarkers to make individual predictions of future treatment responses to antipsychotic medicine in the early phase of schizophrenia would be clinically invaluable.

In this study, we aimed to discover dysconnectivity between cortical regions in patients with first-episode drug-naive (FEDN) schizophrenia using the mutual information and the correlations of brain activity measured by functional MRI. We aimed to employ these measures for diagnostic classification at an individual level. We also aimed to predict individual responses to antipsychotic treatment. The results of this study represent an important step toward the development of translational tools for early diagnostic identification, as well as personalized approaches to initial treatment in first-episode schizophrenia in the future.

Materials and methods

Participants

Subjects were recruited from the community and Beijing Hui-Long-Guan hospital. The patients with FEDN schizophrenia were inpatients at the hospital (N = 43; age 28.3 ± 9.9 years; 24 females). Healthy controls (HC) were recruited from the community (N = 29; age 27.7 ± 7.8 years; 16 females). Diagnoses and clinical characteristics were assessed by fully trained staff using the Structured Clinical Interview for DSM-IV (SCID). Patients were included only if they were diagnosed with schizophrenia during their first psychotic episode. Subjects were excluded if they had a history of taking psychiatric medication, head trauma with residual effects, neurological disorders, or uncontrolled major medical conditions. HC were excluded if they had a history of any Axis I disorder according to SCID, had a first-degree relative with any Axis I disorder, or used psychoactive medication within two weeks before the study. This study was approved by the local Institutional Review Board of Beijing Hui-Long-Guan hospital according to the Declaration of Helsinki guidelines. All patients and HC signed written informed consent before participating in the study.

Treatment and measurement

All patients were treated with a stable dose of risperidone for 10 weeks. The dose of risperidone was increased to 3–6 mg a day during the first week of administration, and doses were maintained at those levels until the end of the clinical study. Concomitant medications included chloral hydrate or lorazepam for insomnia, and benzhexol hydrochloride as antiparkinsonian agents for extrapyramidal symptoms, as needed. No other concomitant psychotropic medications were used during the study.

Patients with first-episode schizophrenia appear to be more sensitive to the therapeutic and extrapyramidal effects of antipsychotic medications [14, 15]. Numerous randomized, controlled studies have shown that risperidone is effective and well-tolerated in patients with a first psychotic episode at low doses (≤6 mg/day) and with slow titration compared to other atypical antipsychotics, although risperidone seems to produce a greater prolactin increase than most other atypical antipsychotics [16, 17]. Moreover, several studies have also reported significant positive and negative symptom improvement with risperidone in first-episode psychotic patients within a shorter period of treatment [18, 19]. It is known that prompt and effective amelioration of psychotic symptoms is important because a patient’s first experiences with a drug are crucial in determining long-term compliance and optimal long-term outcome [14, 15]. Hence, the good tolerability of risperidone and its prompt and effective control of symptoms may improve the long-term outcome in patients with first-episode schizophrenia. Together, these are the reasons risperidone was chosen as the antipsychotic to treat these drug-naive patients with first-episode schizophrenia. These benefits are well recognized by clinicians, making risperidone a commonly employed choice in the treatment of schizophrenia, which facilitates recruitment of participants for investigations such as the current study.

The positive symptoms and hallucination subscale of the Positive and Negative Syndrome Scale (PANSS) was assessed by two clinical psychiatrists who were blind to the purpose of the study. All patients were rated at baseline and after the treatment session. To ensure consistency and reliability of ratings across the study, two psychiatrists who had worked at least 5 years in clinical practice simultaneously attended a training session in the use of the PANSS before the start of the study. After training, a correlation coefficient >0.8 was maintained for the PANSS total score by repeated assessments during the course of the study. Five patients did not participate in the follow-up assessment, leaving 38 patients whose treatment response was evaluated.

MRI data acquisition and preprocessing

The structural T1-weighted scan of each subject was acquired on a GE 3 Tesla MRI scanner (GE Healthcare, Buckinghamshire, United Kingdom) using the spoiled gradient echo (SPGR) sequence with the following parameters: repetition time (TR) = 6.2 ms, echo time (TE) = 2.8 ms, flip angle = 8°, field of view (FOV) = 240 mm, slice thickness = 1.2 mm, matrix size = 256 × 256 and 142 slices.

Resting-state functional MRI scans were acquired using the echo planar imaging (EPI) sequence with the following parameters: TR = 2000 ms, TE = 30 ms, flip angle = 90°, field of view (FOV) = 240 mm, slice thickness = 4 mm, matrix size = 64 × 64 and 33 slices. The total scan time for rs-fMRI was about 6 min and 20 s.

Cortical reconstruction was performed with the FreeSurfer software (version 5.3.0; http://surfer.nmr.mgh.harvard.edu). The procedure included intensity normalization, automated topology corrections and automatic segmentations of cortical and subcortical regions [20,21,22]. The cortex was segmented with the Desikan-Killiany Atlas [23].

Functional images were preprocessed with the “preproc-sess” pipeline in FreeSurfer, which included motion correction, smoothing with a Gaussian kernel with a full width at half maximum of 5 mm, and brain-mask creation. The preprocessed functional images were registered and resampled according to the structural images of the corresponding subjects, so that the average time series of the blood-oxygen-level dependent (BOLD) signals in each brain region in the Desikan-Killiany Atlas could be extracted. This yielded 68 regions across the whole brain. The mutual information (MI) as well as the zero-lag correlation between each pair of regional BOLD time series was calculated using a MATLAB toolbox as functional connections (FC) [24, 25], resulting in 2278 (68 × 67/2) functional connection values, excluding the self-correlation.
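The pairwise computation described above can be illustrated with a minimal Python sketch. It is not the MATLAB toolbox used in the study: the histogram-based (plug-in) MI estimator, the bin count, the function names and the synthetic time series are all assumptions made for illustration, and the 190 time points are inferred from the stated scan time (about 6 min 20 s) and TR (2000 ms).

```python
import numpy as np


def histogram_mutual_information(x, y, bins=8):
    """Plug-in MI estimate (in nats) from a 2-D histogram of two time series.

    Assumption: a simple histogram estimator; the study's toolbox may differ.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # joint probability table
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of y, shape (1, bins)
    nonzero = pxy > 0                         # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))


def connectivity_features(ts, bins=8):
    """ts has shape (n_timepoints, n_regions).

    Returns the MI and zero-lag Pearson correlation for every unordered
    pair of distinct regions (self-connections are excluded).
    """
    n_regions = ts.shape[1]
    rows, cols = np.triu_indices(n_regions, k=1)
    corr = np.corrcoef(ts, rowvar=False)[rows, cols]
    mi = np.array([
        histogram_mutual_information(ts[:, i], ts[:, j], bins)
        for i, j in zip(rows, cols)
    ])
    return mi, corr


# Synthetic stand-in for one subject's regional BOLD time series.
rng = np.random.default_rng(0)
ts = rng.standard_normal((190, 68))
mi, corr = connectivity_features(ts)
print(mi.shape, corr.shape)  # one value per region pair: 68 * 67 / 2 = 2278
```

Taking only the upper triangle of the 68 × 68 matrix is what produces the 2278 unique connections per measure.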

Statistical analyses

Statistical analyses were conducted using MATLAB (version R2015a, The Mathworks Inc., Natick, Massachusetts). For the MI and correlation between each pair of regions, a t-test was performed to evaluate the difference between the FEDN schizophrenia and HC group. A strict false discovery rate (FDR) of 0.05 was applied to the t-test results as the multiple comparison correction [26, 27]. The MI and correlation of cortical regions were correlated with the PANSS scales using the Pearson’s correlation. We considered p-values <0.05 or a p-value threshold defined by FDR as significant.
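The group comparison with FDR correction can be sketched as follows. The study ran this analysis in MATLAB; the Python version below is only an illustration. It assumes the Benjamini-Hochberg procedure and a standard two-sample t-test with equal variances, neither of which is spelled out in the text, and it uses synthetic data and a hypothetical function name.

```python
import numpy as np
from scipy import stats


def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of p-values declared significant at FDR level q.

    Assumption: the Benjamini-Hochberg step-up procedure.
    """
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)
    thresholds = q * np.arange(1, m + 1) / m      # i/m * q for rank i
    passed = pvals[order] <= thresholds
    significant = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()           # largest rank meeting its threshold
        significant[order[:k + 1]] = True         # reject all hypotheses up to that rank
    return significant


# Synthetic connectivity values: 43 patients, 29 controls, 2278 connections.
rng = np.random.default_rng(0)
n_connections = 2278
patients = rng.standard_normal((43, n_connections))
controls = rng.standard_normal((29, n_connections))
controls[:, :5] += 3.0   # planted group differences in the first five connections

t_values, p_values = stats.ttest_ind(patients, controls, axis=0)
significant = benjamini_hochberg(p_values, q=0.05)
print(int(significant.sum()), "connections pass FDR correction")
```

With 2278 tests, the smallest p-value must fall below roughly 0.05/2278 for anything to survive, so an FDR-based threshold is considerably stricter than an uncorrected p < 0.05.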

Classification with the support vector machine

We used the MI and correlation FC between bilateral superior temporal cortex (STC) and all the other cortical regions as input features to the linear support vector classifier (SVC). We used the default settings in the “sklearn” package [28] of Python, except for “class_weight = ‘balanced’”, which was set because of the imbalanced sample sizes between the HC and patients. Leave-one-out cross-validation was used, where one subject was left out iteratively as the testing target and the rest of the sample was used to train the SVC. Within each training fold, a linear SVC was used to estimate the weights of the features for the patient identification model. Only the top ten features were used to classify the testing subject. The SVC was used to identify patients with FEDN schizophrenia among all the subjects.
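A minimal sketch of this cross-validation scheme is given below. Several details are assumptions rather than the authors' code: `SVC(kernel="linear")` stands in for the linear classifier, features are ranked by absolute weight, a second classifier is refit on the ten selected features, and the data are synthetic. The feature count of 133 is likewise an inferred placeholder (two STC seeds × 67 partners, minus the shared left-right STC pair).

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC


def loocv_top_k_svc(X, y, k=10):
    """Leave-one-out predictions with feature selection inside each fold."""
    predictions = np.empty_like(y)
    for train_idx, test_idx in LeaveOneOut().split(X):
        X_train, y_train = X[train_idx], y[train_idx]
        # Step 1: fit a linear SVC on all features of the training fold only.
        ranker = SVC(kernel="linear", class_weight="balanced").fit(X_train, y_train)
        # Step 2: keep the k features with the largest absolute weights (assumed ranking rule).
        top = np.argsort(np.abs(ranker.coef_[0]))[-k:]
        # Step 3: refit on the selected features and classify the held-out subject.
        clf = SVC(kernel="linear", class_weight="balanced").fit(X_train[:, top], y_train)
        predictions[test_idx] = clf.predict(X[test_idx][:, top])
    return predictions


# Synthetic data: 43 patients (label 1) and 29 controls (label 0).
rng = np.random.default_rng(0)
y = np.array([1] * 43 + [0] * 29)
X = rng.standard_normal((72, 133))
X[y == 1, :10] -= 1.0   # planted "lower connectivity" in patients

pred = loocv_top_k_svc(X, y)
print("balanced accuracy:", balanced_accuracy_score(y, pred))
```

Selecting features inside each training fold, rather than once on the full sample, keeps the left-out subject from influencing which features are used to classify it.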

We also applied linear support vector regression (SVR) to predict the percentage drop of PANSS total scores after the treatment using features that were significantly correlated with PANSS total scores (Pearson’s correlation, p < 0.05). We considered a patient a responder if the patient showed a decrease of 30% or more in the PANSS total score. The percentage drop of PANSS total score was calculated as

$$d = \frac{{{\mathrm{PANSS}}_{{\mathrm{baseline}}} - {\mathrm{PANSS}}_{{\mathrm{followup}}}}}{{{\mathrm{PANSS}}_{{\mathrm{baseline}}} - 30}}.$$

The 30 in the denominator corresponds to the “non-symptomatic” score of PANSS total score. A patient was predicted as a responder if the patient’s predicted percentage drop of PANSS total score was 30% or more based on the SVR. A similar approach has been applied in a previous study [29]. Both SVC and SVR belong to a family of machine learning algorithms, support vector machines (SVM) [30].
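The response criterion can be written out in a few lines of Python. The function names and the example scores are hypothetical and chosen only for illustration; the floor of 30 and the 30% threshold come from the text.

```python
def panss_percentage_drop(baseline, followup, floor=30):
    """Fractional drop in PANSS total score relative to the non-symptomatic score."""
    if baseline <= floor:
        raise ValueError("baseline must exceed the non-symptomatic score")
    return (baseline - followup) / (baseline - floor)


def is_responder(drop, threshold=0.30):
    """True when an observed or predicted drop meets the 30% criterion."""
    return drop >= threshold


# Hypothetical scores, for illustration only.
drop = panss_percentage_drop(baseline=90, followup=66)  # (90 - 66) / (90 - 30) = 0.4
print(is_responder(drop))  # a 40% drop meets the criterion
```

Subtracting 30 in the denominator matters: without it, the same hypothetical patient would show a drop of only 24/90, about 27%, and would be counted as a non-responder.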

Results

We evaluated functional connections (FC) between cortical regions in a pairwise manner using the mutual information (MI) of the BOLD signals measured by resting-state functional MRI (rs-fMRI). We found widespread lower MI in FEDN schizophrenia patients compared to age- and gender-matched healthy controls (HC) across all evaluated cortical regions. However, only eight connections survived the multiple comparison correction at 5% false discovery rate (FDR; Fig. 1). All eight connections with low MI in FEDN schizophrenia involved the STC: seven of them directly involved the STC, while the remaining connection involved the right STC and the left cuneus through the left lingual cortex. As a complementary measure, we also evaluated the functional connectivity of cortical regions using the zero-lag correlation of BOLD signals between cortical regions, but no comparisons of FEDN and HC connectivity employing this measure survived 5% FDR correction.

Fig. 1 Lower mutual information (MI) of superior temporal cortex (STC) in patients with first-episode drug-naive schizophrenia (FEDN) than healthy controls

The MI between the STC and regions spanning dorsal-lateral prefrontal, cingulate, temporal and parietal cortices was correlated with the PANSS subtotal score of positive symptoms and the hallucination subscore (p < 0.05; Fig. S1) [31]. However, these findings did not survive multiple comparison correction at the FDR of 5%. No correlations between STC connections and other PANSS subscales were observed.

As we found that all eight connections with low MI in FEDN schizophrenia involved the STC, we further investigated whether we could exploit this information to identify patients with FEDN schizophrenia at an individual level. We used the MI between bilateral STC and all the other cortical regions as input features to a machine learning algorithm. We used leave-one-out cross-validation to “predict” whether the unseen subject was an FEDN schizophrenia patient. Our model accurately identified individuals with FEDN schizophrenia (balanced accuracy: 78.6%; accuracy: 77.8%; sensitivity: 74.4%; specificity: 82.8%; Table 1A). Functional connections between left STC and right caudal middle frontal cortex, bilateral insula, left fusiform, right precentral, left inferior temporal cortex, and between right STC and right pars triangularis and left fusiform were the top features selected across cross-validations (Fig. S2 and S3). The accuracy of the same model using correlation FC was much lower than that using MI (Table S1A).
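As a consistency check on these figures, balanced accuracy is conventionally defined as the mean of sensitivity and specificity. Under that standard definition (assumed here, since the text does not restate it), the reported values agree with each other:

$$\mathrm{balanced\ accuracy} = \frac{\mathrm{sensitivity} + \mathrm{specificity}}{2} = \frac{74.4\% + 82.8\%}{2} = 78.6\%.$$

Because each class contributes equally to this average regardless of its size, the measure is not inflated by the larger patient group (N = 43 versus N = 29 controls).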

Table 1A Individual identification of FEDN schizophrenia

We also followed up the patients after they received antipsychotic treatment for 10 weeks. Post-treatment PANSS scores were evaluated and compared to the baseline. Patients with FEDN schizophrenia showed significant improvement of symptoms, as reflected in the decreases of PANSS total scores (p < 0.0001) and subtotal scores of positive (p < 0.0001), negative (p = 0.0109) and general symptoms (p < 0.0001). Patients were considered responders if they showed a decrease of 30% or more in the PANSS total score. We found that our cross-validated regression model using correlations as functional connections between STC and other cortical regions could predict the percentage drop of the PANSS total score (r = 0.69; p < 0.0001). Based on the predicted percentage drop of the PANSS, we could correctly predict 88.0% of responders and 76.9% of non-responders to antipsychotic treatment (balanced accuracy 82.5%) (Table 1B). The prediction accuracy of the same model using MI was much lower than that using correlation FC (Table S1B).

Table 1B Individual prediction of responders to antipsychotic treatment

Discussion

In this study, we showed abnormal functional connectivity involving the superior temporal cortex (STC) in patients with first-episode drug-naive (FEDN) schizophrenia compared to healthy subjects. By using machine learning algorithms and functional connections between STC and other cortical regions as input, we successfully identified the FEDN schizophrenia patients and predicted their responses to antipsychotic treatment at the individual level. To the best of our knowledge, this is the first study to simultaneously identify and predict treatment response of first-episode schizophrenia patients using the functional connectivity derived from resting-state fMRI without the potential confounds of medication.

Previous studies have shown important progress in the individual identification of chronic schizophrenia using structural and functional MRI data with machine learning [10, 32,33,34,35]. Individual identification of first-episode schizophrenia has been more challenging than for chronic schizophrenia, but decent performance has been achieved using structural and diffusion MRI data [36,37,38,39,40,41]. Notably, two of these studies included a group of subjects at risk for schizophrenia and were able to distinguish the groups of first-episode schizophrenia, at-risk and healthy subjects [36, 41]. However, none of these studies could exclude possible medication effects from antipsychotic treatment, as the samples were not drug-naive, in contrast with our sample of FEDN schizophrenia.

One prospective study showed that it was possible to identify high-risk individuals who would transition to psychosis using machine learning and electroencephalogram data [42]. The study found that the STC contributed to the prediction of psychosis transition, consistent with our current findings. The STC is an important region for auditory perception and sensory integration. Molecular, structural, and functional abnormalities in STC have been consistently observed in chronic and FEDN schizophrenia [8, 43,44,45,46,47,48,49,50,51]. Lower volume of the superior temporal gyrus, part of the STC, was found in chronic schizophrenia and associated with auditory hallucination [52]. In our study, we found that the mutual information between STC and other cortical regions in patients with FEDN schizophrenia was lower than in healthy subjects. These findings are consistent with the disconnection theory of schizophrenia [53]. The mutual information between STC and several other cortical regions was also correlated with the hallucination subscale of PANSS in our sample, although the correlations did not survive multiple comparison correction. This may suggest that functional dysconnectivity in STC is already observable at the earliest stage of illness onset and progression of psychosis in schizophrenia, and could be a potential predictive marker for psychosis risk [42].

Simultaneous identification and treatment response prediction of early schizophrenia using biomarkers acquired non-invasively will be a crucial component of “precision medicine approaches” for this severe mental disorder. However, there is still a dearth of studies of this kind. Some prior research has shown predictive power of an index derived from the functional connectivity seeding from the striatum in estimating response to antipsychotic treatment [54, 55]. In our study, we were able to simultaneously identify schizophrenia diagnosis and predict the response to antipsychotic treatment at an individual level in first-episode drug-naive patients using the functional connectivity of STC.

Selecting an antipsychotic medication for the treatment of schizophrenia largely remains a trial-and-error process, with no specific biomarkers or methods to lend decision support [56]. Poor initial antipsychotic choice delays the benefits of more effective treatment and often comes with the additional disadvantage of aversive side effects. A method that could optimize the treatment for each patient with schizophrenia at the early stage may greatly improve the treatment efficacy and reduce the aversive effects of antipsychotics. Our study provides such a computational model to inform clinicians as to whether a patient with FEDN schizophrenia will respond to risperidone as the first treatment. If the model predicts that the patient is unlikely to respond to risperidone, alternative antipsychotic medications can be considered by the clinician. Along with other studies [54, 55], the results of the current study could be considered a first step toward a model that is capable of providing clinicians with decision support in selecting the ideal antipsychotic treatment for patients with schizophrenia in a personalized manner.

It is interesting to note that our study also demonstrated for the first time the differential effectiveness of mutual information versus correlation coefficients of the cortical activities as the functional connectivity in diagnostic identification and treatment outcome prediction. According to our observation, mutual information could be more sensitive than the correlation coefficients in identifying FEDN schizophrenia. On the other hand, the correlation coefficients might be superior in the prediction of antipsychotic treatment response compared to mutual information. These findings may indicate that in employing computational psychiatry approaches to identifying neuroimaging biomarkers, what particular analytic measure serves as the more robust predictor may vary depending on whether the prediction goal pertains to diagnosis or treatment response.

Several considerations need to be taken into account with respect to the interpretation of our findings. Although our sample of patients with FEDN schizophrenia was reasonably large given the inherent challenges in FEDN patient recruitment, the sample size was limited. Replication in larger samples will strengthen the generalizability of our methods. A machine learning model trained on a large sample of patients with FEDN schizophrenia may also help to fully explore the parameter and feature space to increase the accuracy of identification and treatment response prediction of FEDN schizophrenia. Furthermore, the individual identification of FEDN schizophrenia was based on a sample with schizophrenia patients and healthy controls only. As such, our findings will not necessarily apply to clinical settings where a new patient may experience first-episode psychosis in association with conditions other than schizophrenia, such as bipolar or major depressive disorders. Thus, our ability to identify FEDN schizophrenia should still be considered preliminary with respect to clinical applicability as a diagnostic tool. Relatedly, a model of treatment response prediction in FEDN schizophrenia might provide helpful clinical suggestions, but further validation focused on antipsychotic medications other than risperidone will still be necessary.

In summary, we were able to identify FEDN schizophrenia patients and predict their responses to antipsychotic treatment at the individual level using machine learning and functional connectivity between cortical regions derived from a non-invasive brain imaging technique, rs-fMRI. The integration of this easily accessible measure of brain activity with powerful machine learning approaches provides a step toward individualized identification and treatment response prediction of schizophrenia, laying the basis for precision medicine approaches at the early phases of this debilitating illness.