Introduction

Drought is a complex, multifaceted phenomenon involving climate, water resources, and socioeconomic drivers and impacts1,2,3,4. It causes billions of dollars in economic losses and severe stress on ecosystem productivity5,6,7. While significant efforts are underway to develop drought monitoring and forecasting systems, the ability to forecast drought is limited due to inherent uncertainties in precipitation forecasts at long lead times (months to years)8,9,10,11,12,13,14. For example, the NOAA’s Seasonal Climate Outlook did not predict the 2012 agricultural drought (also commonly referred to as soil moisture drought) in the central US which resulted in $30 billion economic losses15,16. In this study, we assess the initialized16 and uninitialized5 large ensemble experiments of the Community Earth System Model (CESM) for North America and show that recent advances in earth system modeling17 combined with an improved understanding of long-memory land surface processes18,19 can enable skillful predictions of soil moisture drought several months in advance (Fig. 1) despite limited skills in the precipitation forecast.

Fig. 1: Skillful prediction of the 2012 soil moisture drought in the California–Nevada region (30°–37° north latitudes and 110°–120° west longitudes).
figure 1

The CESM-DPLE drought forecast, initialized in November 2011, and the uninitialized CESM-LE forecast are compared with two observation equivalent data (CLM and GLEAM). The thin lines show anomalies from individual ensemble members, and the thick lines show ensemble mean.

The soil moisture variability represents an integrated effect of climate, vegetation, and soil processes. The Community Land Model (CLM)20,21, which is the land component of the CESM, simulates a range of biophysical and biogeochemical processes, including land surface heterogeneity, radiation scheme, momentum, energy balance, hydrology, photosynthesis, stomatal conductance, and prognostic vegetation phenology22,23. The CLM simulates soil moisture variability with high fidelity at par with the observationally constrained remote-sensing estimates (Supplementary Fig. 1). Previous studies have compared soil moisture variability with various drought metrics and found the soil moisture metric’s superiority for agricultural drought applications24,25. The soil moisture variability is correlated significantly with the Palmer Drought Severity Index in the CESM experiment (Supplementary Fig. 2).

The slowly varying ocean anomalies, e.g., El Niño-Southern Oscillation (ENSO)12,13, Atlantic Multi-decadal Variability26,27, and Pacific Decadal Variability (PDV)28,29,30,31 affect land hydroclimate at seasonal to decadal (S2D) timescales through atmospheric teleconnection processes32. Multi-season to multi-year memory from deep soil moisture and groundwater may also contribute to S2D predictability of hydroclimate features such as drought33,34,35,36,37, pluvials31, and wildfire38. External climate forcing from greenhouse gas emissions and anthropogenic aerosol emissions can provide predictability on multi-year timescales32,39.

We quantify and disentangle the effects of ocean and land initializations on S2D hydroclimate predictability, and demonstrate that land surface processes (i.e., soil moisture memory, reemergence, and land–atmosphere interactions) can contribute to the improvement of forecast skills on S2D timescales by transforming a weak precipitation signal into a predictable soil moisture signal. This finding is an advancement compared with the previous studies that have documented soil moisture as a source of predictability on sub-seasonal to seasonal timescales, only32,40. We also develop an observational constraint on the predictability estimates that can motivate future development of S2D prediction systems. Finally, we aim to bring fundamental advances by (a) developing the predictability estimates on the S2D timescale continuum instead of multi-year only41,42 or seasonal only timescale43 and (b) assessing hydroclimate predictability in the mid-latitude regions where the signal to noise ratio is known to be smaller than in equatorial regions12,44 and therefore where predicting drought is more challenging13.

The CESM Decadal Prediction Large Ensemble (CESM-DPLE) uses observation equivalent ocean and sea ice states derived from the forced ocean–sea ice (FOSI) simulations45,46, and land condition from a selected realization of the Large Ensemble simulation9 for the given time to initialize S2D earth system prediction47 (Table 1). Each initialization consists of a 40-member ensemble forecast generated by randomly perturbing the initial atmospheric condition, and each forecast is integrated for 10 years. In contrast, the CESM Large Ensemble (CESM-LE)9 is a fully coupled 40-member ensemble climate simulations where neither the ocean initial states nor the land initial states are constrained at the start of each forecast period. Additionally, the CESM-LE allows disentangling the effect of land initialization by contrasting the effect of the initialized land anomalies with the remaining 39 ensembles not used for initializations (Table 1). Hence, CESM-DPLE and its uninitialized parallel CESM-LE provide a rich dataset to estimate the potential of drought predictability on the S2D timescales.

Table 1 List of climate experiments employed to investigate S2D hydroclimate predictability.

Predictability of soil moisture on the S2D timescales

Our analysis demonstrates that soil moisture predictability is much higher and spatially extensive than for precipitation on S2D timescales. Figure 2 compares the root zone (0–0.5 m) soil moisture predictability with the precipitation over North America for 36 initialization dates (1980, 1981, …., 2015). We use a recent period (1980–2015) in our analysis because of the significant trend in the longer period (1955–2015) (not shown). We employ a signal to noise ratio metric that compares the variance of the ensemble average forecast (signal) to that of the ensemble spread (noise) to determine the effects of the initializations48 (see the “Methods” section). This is a measure of the potential predictability (referred to as predictability hereafter) in the perfect model world. Actual realized forecast skill is likely to be lower than the predictability due to model biases and uncertain initial conditions43,49.

Fig. 2: The S2D predictability of (a) root zone (0–0.5 m) soil moisture and its comparison with the (b) precipitation predictability in the CESM-DPLE, and using signal to (signal + noise) ratio metric.
figure 2

Stippling shows statistical significance at a 95% confidence level. The bottom panel (c) shows areas of the significant predictability for soil moisture and precipitation in North America.

The spatial extent of the statistically significant signal is three times greater for soil moisture than for precipitation. Averaged over year 1–10 forecasts, 51% of the North American land area shows a statistically significant soil moisture predictability than 18.5% area for the precipitation. The DPLE soil moisture predictability is about four times higher than the soil moisture persistence model forecast (Supplementary Fig. 3). Furthermore, there is predictability for soil moisture beyond the timescale of the precipitation predictability. For example, there is almost minimal area of predictability beyond year 5 for precipitation, whereas there is predictability seen beyond year 5 in soil moisture in about 36% of land area that includes many areas in Western North America, which has a history of the mega-drought50. A cyclic behavior in the forecast skill (Fig. 2c) can be due to differences in skill across the annual cycle43 because of the Nov1 initialization date (Supplementary Fig. 4).

Spring (MAM) has the highest soil moisture predictability (55%) averaged over the 10 years of the forecast, while fall (SON) has the lowest predictability (39%). Soil moisture predictability for winter (DJF) and summer (JJA) are 50% and 53%, respectively (Fig. 3). A previous study18 has found soil moisture anomalies reemergence in the root zone from deeper soil layer anomalies stored in the preceding seasons. A stronger land–atmosphere coupling in spring and summer can enhance the predictability due to memory stored in the land surface51. As expected, soil moisture predictability decreases with increasing lead times52, but it remains considerably higher than precipitation throughout the forecast period (Fig. 2). The majority of the land area (68% of North America) shows significant forecast skill for soil moisture by the end of year-2, and decreasing to 37% land area by year 10.

Fig. 3: Comparison of the total soil moisture predictability area in North America due to ocean and land initialization effects (Ocean + Land) with that due to land-only initialization effect (Land).
figure 3

Thin gray lines show the expected statistical distribution if the land-only were not initialized. We determined the land-only effects by computing the anomaly correlations between initialized total soil moisture anomalies and the ensemble average forecast anomalies for the root zone soil moisture. Thin gray lines are obtained by correlating the uninitialized soil moisture initial condition (remaining 39 LE initial conditions) with the ensemble average DPLE forecast.

External climate forcing, e.g., greenhouse gas emissions, contribute to a signal in the decadal prediction32, i.e., the global warming trend is a part of the signal. Hence, we did not remove the trend from our predictability analysis. However, a global warming trend can also counteract the effects of the initialization. For example, a forecast initialized during the positive phase of the PDV would often exhibit a wetting signal in the US southwest53 that can be muted at long lead times by the gradual soil moisture drying due to the global warming trend4. A removal of trend slightly improves the predictability at longer lead-time, particularly for soil moisture with an accentuated annual cycle in years 7–10 (Supplementary Fig. 5).

Land initialization can bring considerable improvements in S2D soil moisture forecasts. To disentangle the effects of land initialization, we computed the anomaly correlations between initial condition total soil moisture anomaly (past 12 months average before November 1) and the ensemble average forecast anomalies for the 36 initialization date (1980–2015) (Fig. 3). We compared it with the remaining 39 LE ensembles’ anomaly correlations that were not used to initialize the DPLE land condition (thin gray line in Fig. 3). If the spatial extent of significant anomaly correlation with the initialized land anomalies is greater than that for the remaining 39 ensembles, it demonstrates a significant impact of land initialization on soil moisture predictability. This methodology gives an improved result than a previous method that assessed the effect of land initialization by correlating the DPLE forecast anomalies with the evolution of soil moisture in uninitialized LE#3454 (Supplementary Fig. 6), where soil moisture anomalies evolved from a different SST initial condition than the DPLE forecast.

Land initialization brings statistically significant improvement in predictability for lead times exceeding one year in three out of four seasons, i.e., effects of initialized soil moisture anomalies are significantly higher than the remaining 39 ensembles not used in initializations (Fig. 3). For DJF, the impact of land initialization extends out to 7 years. This is a noteworthy result in at least two respects: (1) previous studies have reported improved predictability due to soil moisture and/or land initialization is generally limited to sub-seasonal to seasonal timescales32,52. Our results show that land initialization can bring improvement in predictability for multiple years. (2) CESM-DPLE uses a synthetic land initialization taken from a selected ensemble of the uninitialized experiment (Table 1). We are not aware of any observationally constrained land initialization technique conceptually similar to FOSI simulations45,46 for the ocean. We hope that our results provide the necessary motivation for developing the observationally constrained land initialization technique.

Why is soil moisture more predictable? We compared the predictability of soil moisture and precipitation averaged over the annual timescale. The areal extent of predictability is similar between soil moisture and precipitation when averaged annually (Fig. 4). Significant predictability is found for most of the North American land areas (>50%) from year 1 to year 10, both for precipitation and soil moisture, but not always co-located particularly at the longer lead time, e.g., lead 10-year suggesting different climate processes contributing to the predictability. Since the low-frequency (inter-annual) precipitation variability component is smaller (7%) compared with the sub-seasonal (43%) and seasonal (42%) variability components in North America (Supplementary Fig. 7); annually averaged precipitation predictability does not contribute to improving sub-seasonal to seasonal precipitation predictability (e.g., Fig. 2). Also, the magnitude of the signal strength is weaker for precipitation than soil moisture in years 1 and 2 (Fig. 4, and Supplementary Fig. 8).

Fig. 4: The annual average predictability of (a) the root zone (0–0.5 m) soil moisture and its comparison with (b) the precipitation predictability in the CESM-DPLE, and using signal to (signal + noise) ratio metric.
figure 4

Twelve-month average anomalies (non-overlapping) are calculated for each year of the forecast lead time, and before calculating the signal to total ratio. Stippling shows statistical significance at a 95% confidence level. The bottom panel (c) shows areas of the significant predictability area for soil moisture and precipitation in North America.

The sub-seasonal to seasonal variability in precipitation is an order of magnitude higher than the soil moisture variability (Fig. 5), i.e., a highly variable sub-seasonal to seasonal precipitation is less predictable in the DPLE experiment. However, the soil moisture memory, reemergence, and land–atmosphere interactions processes can boost the signal strengths18,55,56 and contribute to a much higher predictability for soil moisture on those timescales. For example, soil moisture variability is smaller (22% in CLM and 7% in GLEAM) at sub-seasonal timescale despite a higher precipitation variability at sub-seasonal scale (43%). In contrast, inter-annual soil moisture variability is higher (16% in CLM, and 29% in GLEAM) than inter-annual precipitation variability (7%) (Supplementary Fig. 7), suggesting contribution from the memory effect55,56.

Fig. 5: Apportionment entropy (AE) for the observed precipitation and soil moisture data.
figure 5

Please note that the color scale for the AE of the precipitation is one order magnitude higher than that of the soil moisture. AE71 is a non-parametric measure of sub-seasonal to seasonal variability in the data—higher AE represents a higher variability in the data (see the “Methods” section).

The soil moisture forecast skill seen in Figs 14 should be considered conservative estimates of the predictability compared to the observational estimates shown in Fig. 6. The CESM exhibits weak soil moisture to the precipitation feedback loop. The CESM-DPLE precipitation forecasts do not show significant improvements due to soil moisture initialization, i.e., the anomaly correlation between initial soil moisture anomalies and ensemble average precipitation forecast anomalies in DPLE are not statistically distinguishable from the anomaly correlations between the remaining 39 ensemble initial soil moisture anomalies and the precipitation forecasts (Supplementary Fig. 9). Since the CESM-DPLE has retrospective forecasts, it provides an opportunity to test those forecasts against observations.

Fig. 6: The scope of improving the forecast skill by developing models that show comparable effect of the antecedent soil moisture on precipitation forecast as found in the observational estimates.
figure 6

Figure compares the effect of land initialization on precipitation and soil moisture forecasts in the CESM-DPLE with that of the observational estimates from CLM4 and CLM4.5. The figure shows anomaly correlations between initial condition total soil moisture anomalies and the ensemble average forecast (colored lines) or observations (black lines) in the respective experiments. The GLEAM root zone soil moisture data results is also shown.

We compare the anomaly correlations for CLM’s soil moisture and precipitation, which are considered here as a proxy for observations, with that of the CESM-DPLE forecasts (Fig. 6). We find that the anomaly correlations between CLM initial condition soil moisture and the succeeding observed precipitation are considerably higher than those for the DPLE forecast during the first year (Fig. 6a). For example, the observations show statistically significant correlations for 30% land area during the first 6 months, whereas the DPLE does not show a significant anomaly correlation. This analysis suggests that weaker feedbacks between antecedent soil moisture and precipitation57 contribute to smaller improvements to soil moisture predictability due to land initialization effects in the DPLE (Fig. 6b).

Perspective on seasonal to multi-year drought forecasting

Predictability of drought based primarily on precipitation forecast has low skill, particularly in the mid-latitudes10,12,13,58 to effectively support drought early warning and response. Land surface integrates the stochastic tendency of weather at longer timescales within the memory of the deep soil layers18,33, and land–atmosphere coupling effects by decreasing the humidity gradient between land and atmosphere, and therefore increasing the soil moisture memory55,59. The new soil moisture predictability analysis presented here suggests that opportunities may exist to develop skillful drought prediction systems focused on soil state, which would be highly relevant for agricultural drought predictions24,25.

Land initialization contributes to approximately a third of total soil moisture predictability, with the remainder arising from ocean conditions (Fig. 3). Previous studies have made considerable efforts to develop observationally constrained ocean initializations, e.g., FOSI simulations45,46. This study’s results imply that a similar effort to develop observationally constrained land-initial conditions could be beneficial. The recent availability of multi-scale multi-source soil moisture data, including in situ60,61 and remote-sensing measurements62,63, may provide an opportunity to develop the land initializations.

Why did the ocean initializations provide skill in the soil moisture forecast but limited skill in the precipitation forecast on the S2D timescale (Fig. 2)? Soil moisture variability has a redder spectrum than precipitation64. A recent study hypothesized a new climate process, soil moisture reemergence, in which moisture anomalies stored deep in the soil column can provide year-to-year soil moisture memory18. In parallel research, ocean reemergence studies65,66,67 have found that much of the PDV results from the reemergence, which acts to redden the ENSO signal in the North Pacific19. It will be worth investigating further whether the reddened soil moisture signal is driven by atmospheric teleconnections only32 or if the land processes contribute to the redness by integrating the ENSO signal in the same way as the North Pacific does.

The soil moisture-precipitation feedback is weak in the climate model used for this study (CESM1) (Fig. 6). When compared to observations and reanalysis data, Mei and Gulling57 found a smaller effect of antecedent soil moisture conditions on precipitation in an earlier version of the CESM1 model (CAM4-CLM4). Infanti and Kirtman68 have also found a weak land initialization effect on precipitation in CAM4-CLM4. A weak soil moisture to precipitation feedback contributes to a muted effect of land initialization on soil moisture predictability compared with the observation (Fig. 6). Clearly, there exists an opportunity to develop and improve the seasonal to multi-year soil moisture drought forecasting system based on recent advances in earth system modeling and our improved understanding of long-memory land surface processes.

Methods

Forecast anomalies calculation

The DPLE forecast anomalies are computed with respect to its lead time-dependent forecast climatology as described below69:

$${\mathrm {fA}}_{i,l,y} = f_{i,l,y} - \frac{1}{p}\mathop {\sum}\limits_{y = 1}^p {\left( {\frac{1}{n}\mathop {\sum }\limits_{i = 1}^n f_{i,l,y}} \right)}$$
(1)

where fi, l, y is the DPLE forecast initialized in year y (1980, 1981, …… 2015), at lead months l (1, 2, …. 122), and ith ensemble (1, 2, …40) for a given variables soil moisture or precipitation; \({\mathrm {fA}}_{i,l,y}\) is the corresponding forecast anomalies.

The LE anomalies are computed with respect to its ensemble average climatology for the given month as below:

$${\mathrm {LEA}}_{i,m,y} = {\mathrm {LE}}_{i,m,y} - \frac{1}{p}\mathop {\sum}\limits_{y = 1}^p {\left( {\frac{1}{n}\mathop {\sum }\limits_{i = 1}^n {\mathrm {LE}}_{i,m,y}} \right)}$$
(2)

where LEi, m, y is the LE simulations for year y (1980, 1981, …… 2015), and the given month m (1, 2, …. 12), and ith ensemble (1, 2, …, 40) for a given variables soil moisture or precipitation; LEAi, m, y is the corresponding LE anomalies.

The observation (CLM) anomalies are computed with respect to observation climatology:

$${\mathrm {OA}}_{m,y} = O_{m,y} - \frac{1}{p}\mathop {\sum}\limits_{y = 1}^p {\left( {O_{m,y}} \right)}$$
(3)

where Om, y is the CLM land-only simulations for year y (1980, 1981, ……, 2015), and the given month m (1, 2, …., 12) for a given variables soil moisture or precipitation; OAm, y is the corresponding observation anomalies. Figure 1 demonstrates that the methodology outlined here successfully removed the initial drift in the DPLE forecast47,69, and gives comparable results among three experiments: DPLE, LE, and Observations (CLM).

Signal to total ratio

To diagnose potential predictability, we determined the squared signal-to-noise ratio (SNR)48 at a given lead time as

$${\mathrm {SNR}} = \frac{{V_s}}{{V_n}} = \frac{{\frac{1}{p}\mathop {\sum }\nolimits_p \left( {{\mathrm {fA}}_{ \cdot yl} - {\mathrm {fA}}_{ \cdot \cdot l}} \right)^2}}{{\frac{1}{{np}}\mathop {\sum }\nolimits_p \mathop {\sum }\nolimits_i \left( {{\mathrm {fA}}_{iyl} - {\mathrm {fA}}_{ \cdot yl}} \right)^2}}$$
(4)

where \({\mathrm {fA}}_{ \cdot \cdot l} = \frac{1}{{np}}{\sum} {_{y = 1}^p} \mathop {\sum}\nolimits_{i = 1}^n {{\mathrm {fA}}_{iyl}}\) and \(f_{ \cdot yl} = \frac{1}{n}\mathop {\sum}\nolimits_{i = 1}^n {{\mathrm {fA}}_{iyl}}\). VS represents the variability of the ensemble mean, which is a potentially predictable signal due to initializations; Vn is variability about the ensemble mean, or the noise term. The null hypothesis of no predictability can be rejected at the 95% level if \({\mathrm {SNR}} \ge {F}_{{p} - 1,\,{p}({n} - 1)}^{0.05} \ast \frac{{p - 1}}{{p(n - 1)}}\), where \({F}_{{p} - 1,\,{p}({n} - 1)}^{0.05}\) is the upper 5% threshold for the F-distribution with p-1 and p (n−1) degrees of freedom. The “signal-to-total ratio” is then defined as \({\mathrm{STR}} = \frac{{{\mathrm{SNR}}}}{{{\mathrm{SNR}} + 1}}\), which varies from 0 to 1. Note that for an infinite-member perfect-model ensemble, we may measure forecast skill of the predicted signal by the anomaly correlation, which can be shown is equivalent to the square of STR70; that is, STR is a measure of predictability. We computed STR for each forecast month (1–122 months lead forecasts), and their 3-month average values, representing different seasons, are shown in Fig. 2.

Disentangling the effects of land initialization

We determined initial condition soil moisture anomalies from the LE ens# 34, averaged over the previous year. For example, for the forecast start date on November 1, 1980, we computed the initial condition soil moisture anomalies averaged from November 1, 1979 to October 31, 1980, and over the entire soil depth (0–3.8 m). We tested other permutation of the initial condition anomalies, e.g., the last 3 months, and the root zone anomalies. We found that the past 12 months and total depths gives the best result for soil moisture predictability at greater than one-year lead-time. We found that antecedent condition from the past 3 months give a higher anomaly correlation during the first year, but its influence decreases after one year (Supplementary Fig. 6). Then we computed anomaly correlations between initial condition soil moisture anomalies and the ensemble average forecast anomalies at the given lead and for 36 start date (1980– 2015). For observational analysis (black lines in Fig. 6), we shifted the analysis from 1970 to 2005, so that for November 1, 2005, soil moisture anomalies, the observations are available at 10 years to lead time, i.e., in October 2015.

Apportionment entropy (AE)

AE is a non-parametric measure to assess sub-seasonal to seasonal variability in the data. We applied a modified form of the AE formula taken from Konapala et al.71.

$${\mathrm{AE}} = {\mathrm{log}}\,12 + \mathop {\sum}\limits_{i = 1}^{12} {\left( {x_i/X} \right){\mathrm{log}}\left( {x_i/X} \right)}$$
(5)
$$X = \mathop {\sum}\limits_{i = 1}^{12} {x_i}$$
(6)

where xi is the monthly precipitation or soil moisture climatology. If the total annual precipitation or soil moisture is equally distributed across the 12 months i.e. xi/X = 1/12 then AE = 0. Whereas, if all the precipitation fall in one month only, then AE = log 12, i.e., AE ranges between 0 and log 12, with the smaller values representing less sub-seasonal to seasonal variability and larger values representing a higher sub-seasonal to seasonal variability. We have used natural logarithm (base e) and multiplied the AE by an arbitrary number 100 for better number representation in Fig. 5.