Background and Summary

Soil and Water Assessment Tool (SWAT)1 is a comprehensive hydrological model for watershed simulation. SWAT is a continuous-time, semi-distributed, process-based model that couples upland and river processes. The land phase of SWAT includes hydrology, soil erosion, crop growth, nutrient cycling, algae transport, pesticide fate and transport, crop management, water transfer, snowfall and snowmelt, and soil temperature. The routing phase in channels and rivers includes flood routing, sediment routing, nutrient routing, pesticide routing, and routing through reservoirs. The model is currently being upgraded and restructured as SWAT+, and is also being coupled to glacier melt, heavy metal fate and transport, and other watershed-related processes.

SWAT is the most widely used hydrological and water quality model in the world2. SWAT models are used in a variety of applications, including quantification of water resources availability3,4,5,6,7,8,9,10, the impact of climate and landuse changes11,12,13,14,15,16, soil erosion17,18,19, water quality4,20,21,22,23,24,25, and ecosystem services26,27,28. SWAT also contains a modified version of the Environmental Policy Integrated Climate (EPIC) model29,30 for crop yield simulation8,12,31. In total, more than 4,500 ISI publications have used SWAT to address various watershed-related issues and ecosystem services around the world. This is by far the most extensive body of such literature, with an average of 550 peer-reviewed publications per year over the last 4 years (data gathered from the Web of Science, October 2019).

Lack of data in many parts of the world is a severe impediment to hydrologic modeling. At the same time, the abundance of data generated at global and local scales poses a different modeling problem, as it creates an additional source of uncertainty. Previous works have shown that using different databases for the same region leads to different model outputs and, consequently, to different estimates of water resources and of ecosystem variables4,32. In addition to model uncertainty, we have previously used the term conditionality4,33,34 to describe another constraint on a so-called calibrated model: all calibrated model parameters are uniquely conditioned on model assumptions, model structure, input data, calibration data, calibration routines, and objective function definition. A calibration program, SWAT-CUP (SWAT Calibration and Uncertainty Procedures)25,35, was developed for the calibration of SWAT models. SWAT-CUP provides five different calibration routines and a choice of 11 different objective functions. We have previously shown that different routines and objective functions lead to different parameter sets while producing equally acceptable calibration results36,37. Ideally, one would always obtain unconditional model parameters that are independent of the calibration procedure and objective function. For this reason, the new version of the program offers a multi-objective calibration option, in which any combination of the objective functions can be chosen.
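The multi-objective option can be illustrated by scoring a single simulation against several criteria at once. The sketch below is a minimal, hypothetical example that assumes a weighted average of the Nash-Sutcliffe efficiency (NSE) and a transformed percent bias (PBIAS); the function names, the weighting scheme, and the PBIAS transformation are our assumptions and do not reproduce SWAT-CUP's internal formulation.

```python
from typing import Callable, Dict, Sequence


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, below 0 is worse than the mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def pbias(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Percent bias: 0 is perfect; the sign shows under- or over-estimation."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def combined_score(obs: Sequence[float], sim: Sequence[float],
                   weights: Dict[str, float]) -> float:
    """Weighted average of criteria, each oriented so that larger is better."""
    criteria: Dict[str, Callable[[Sequence[float], Sequence[float]], float]] = {
        "nse": nse,
        # Assumed transformation: map |PBIAS| onto a larger-is-better scale.
        "pbias": lambda o, s: 1.0 - abs(pbias(o, s)) / 100.0,
    }
    total_w = sum(weights.values())
    return sum(w * criteria[name](obs, sim) for name, w in weights.items()) / total_w
```

Changing the weights shifts which aspect of the fit the calibration favours, which is one way the choice of objective function conditions the resulting parameters.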

Furthermore, processing and formatting data for different applications is highly time-consuming and error-prone, so that much research time is spent on data preparation instead of model application and analysis. For this reason, we have assembled global soil, landuse, and historical and future weather databases for use in SWAT and other similar watershed models (Table 1), as described in the next section. This collection provides a valuable resource for modeling, especially in data-scarce regions.

Table 1 Sources and resolutions of databases available at the Pangaea and www.2w2e.com websites.

Methods

Soil maps of the world

FAO/UNESCO soil map of the world

There is a general lack of reliable soil information for many parts of the world, which has significantly hampered soil erosion and land degradation assessments, environmental impact studies, and sustainable land management programs. Two widely used global soil maps are the FAO/UNESCO Soil Map of the World and the Harmonized World Soil Database (HWSD_v121). Both maps describe only a limited set of soil parameters, which are not directly usable by hydrologic models. We have, therefore, used pedotransfer functions developed from soils around the world to derive the needed parameters, such as hydraulic conductivity, available water capacity, and bulk density. Pedotransfer functions “translate data we have into data we need”38: they estimate parameters that are difficult to measure from easily measured soil properties, such as texture, color, and structure, that soil surveyors routinely record39.

The FAO/UNESCO soil map of the world was prepared using the topographic map series of the American Geographical Society of New York at a nominal scale of 1:5,000,000, and consists of a 30-cm topsoil layer and a 70-cm subsoil layer (Fig. 1). The associated files we produced include “Lookup_Soil_FAO-UNESCO.txt,” which contains the correspondence between the soil map and the soil database, and the SWAT usersoil table in the main SWAT database “SWAT2012.mdb”.

Fig. 1 Unique soil units in the FAO/UNESCO Soil Map of the World.

Initially, in 2004, the first author created the soil database for the FAO/UNESCO 1995 soil map to quantify water availability and quality in Africa9,10. Soil names were created by concatenating the FAO mapping unit (e.g., Af14-3C) and the FAO Soil-ID (e.g., 1) to give Af14-3C-1. Soil hydrologic groups were determined according to the SWAT Manual40 based on the criteria in Supplementary Table S1. The fraction of porosity from which anions are excluded (ANION_EXCL) was set to 0.5 according to the SWAT Manual40. The potential or maximum crack volume of the soil profile (SOL_CRK), expressed as a fraction of the total soil volume, was set to zero because no information was available to evaluate this parameter. Other soil properties were initially calculated9,10 using the program ROSETTA41. In the current study, we have updated this database using a large number of pedotransfer functions, as described below.
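The naming rule and the two fixed defaults described above can be summarized in a few lines. This is a minimal sketch: the hyphen separator follows the Af14-3C-1 example in the text, while the function names and the "NAME" dictionary key are hypothetical placeholders rather than the actual usersoil column layout.

```python
# Defaults stated in the text (SWAT Manual): ANION_EXCL = 0.5; SOL_CRK = 0
# because no information was available to estimate the crack volume.
ANION_EXCL_DEFAULT = 0.5
SOL_CRK_DEFAULT = 0.0


def fao_soil_name(mapping_unit: str, soil_id: int) -> str:
    """Join the FAO mapping unit and the FAO Soil-ID with a hyphen."""
    return f"{mapping_unit}-{soil_id}"


def new_soil_record(mapping_unit: str, soil_id: int) -> dict:
    """Start a usersoil-style record.

    'NAME' is a hypothetical key for the soil-name column; the remaining
    soil properties would be filled in from pedotransfer functions.
    """
    return {
        "NAME": fao_soil_name(mapping_unit, soil_id),
        "ANION_EXCL": ANION_EXCL_DEFAULT,
        "SOL_CRK": SOL_CRK_DEFAULT,
    }


print(fao_soil_name("Af14-3C", 1))  # Af14-3C-1
```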

Harmonized world soil database (HWSD)

The Food and Agriculture Organization of the United Nations (FAO) and the International Institute for Applied Systems Analysis (IIASA) combined the available regional and national soil information with the data already contained in the 1:5,000,000 scale FAO-UNESCO map into a new, comprehensive Harmonized World Soil Database (HWSD_v121). This map has a resolution of about 1 km (30 arc seconds) and consists of a 30-cm topsoil layer and a 70-cm subsoil layer (Supplementary Fig. S1).

The soil variables provided in the Harmonized World Soil Database42 and the FAO/UNESCO Soil Map of the World include soil texture (%sand, %silt, %clay), organic carbon, pH, and electrical conductivity (EC). From a hydrological point of view, however, we require parameters such as bulk density, water storage capacity, and hydraulic conductivity for the different soil layers, which we estimated using pedotransfer functions. We estimated soil bulk density (Table 2), soil available water capacity (Table 3), soil hydraulic conductivity (Table 4), the soil erodibility factor of the universal soil loss equation (USLE) (Table 5), and moist soil albedo (Table 6). The pedotransfer functions used are derived from soils from around the world and should therefore provide more universally applicable parameters. These variables were calculated for all soil records in the two soil maps.

Table 2 Soil Bulk Density (ρb) pedotransfer function (g cm−3). OC = %organic carbon, C = %clay, T = %silt, S = %sand.
Table 3 Available Water Capacity, AWC( = θ33–θ1500) (cm cm−1) pedotransfer functions. θ33 = soil water content at field capacity, θ1500 = soil water content at wilting point, C = %clay, ρb = bulk density (g cm−3), T = %silt, OC = %organic carbon, S = %sand.
Table 4 Soil Hydraulic Conductivity (cm day−1) pedotransfer functions. θ33 = soil water content at field capacity, θ1500 = soil water content at wilting point, C = %clay, ρb = bulk density (g cm−3), T = %silt, OC = %organic carbon, S = %sand, topsoil = an ordinal variable having the value of 1 for (depth 0–30 cm) or 0 (depth > 30 cm).
Table 5 Soil erodibility factor (USLE K) pedotransfer function. S = %sand, T = %silt, C = %clay, OC = %organic carbon.
Table 6 Moist Soil Albedo based on the water content at field capacity (θ33).
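Table 3 defines AWC as θ33 − θ1500, and Table 4 reports hydraulic conductivity in cm day−1. The sketch below shows how such estimates might be converted into SWAT usersoil inputs, assuming that SOL_AWC is stored as a depth fraction (mm mm−1, numerically equal to cm cm−1) and SOL_K in mm h−1. The function names are hypothetical, and the actual pedotransfer equations of Tables 2 to 6 are not reproduced here.

```python
def available_water_capacity(theta_33: float, theta_1500: float) -> float:
    """AWC = theta33 - theta1500 (cm/cm, numerically the same as mm/mm).

    A negative result signals an unreasonable estimate that should be
    screened out during pre-processing.
    """
    return theta_33 - theta_1500


def ksat_cm_day_to_mm_hr(ksat_cm_day: float) -> float:
    """Convert cm/day (Table 4 units) to mm/hr (assumed SOL_K units):
    multiply by 10 mm/cm and divide by 24 hr/day."""
    return ksat_cm_day * 10.0 / 24.0
```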

Furthermore, to account for parameter uncertainty, the soils were sorted into textural classes based on the USDA classification42: Clay, Clay-loam, Heavy-clay, Loam, Loamy-sand, Sand, Sandy-clay, Sandy-clay-loam, Sandy-loam, Silt-loam, Silty-clay, and Silty-clay-loam. For each textural class, we pooled the estimates of the various pedotransfer functions from both the FAO/UNESCO and HWSD databases and calculated their cumulative probability distributions, from which we obtained parameter values at the 5%, 50%, and 95% probability levels. Values for bulk density are shown in Table 7 as an example, while the other parameters are given in Supplementary Tables S2–S6. An example calculation of the 95 percent prediction uncertainty (95PPU) is shown in Supplementary Fig. S2 for the hydraulic conductivity of sandy loam topsoil. The 95PPU range sets physically meaningful limits on the parameters of each soil textural class and is instrumental in constraining these parameters during model calibration. Users can, of course, modify these ranges as needed.
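The pooling step can be sketched as follows, assuming each record is a (textural class, estimate) pair in which every estimate comes from one pedotransfer function applied to one soil record of either database. Reading the 5%, 50%, and 95% levels off the empirical distribution with linear interpolation between sorted values is our choice; the exact interpolation used for Table 7 is not specified in the text.

```python
from collections import defaultdict
from math import ceil, floor
from typing import Dict, Iterable, List, Tuple


def percentile(values: List[float], p: float) -> float:
    """Empirical percentile (0-100) with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100.0
    lo, hi = floor(pos), ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def class_ranges(records: Iterable[Tuple[str, float]]) -> Dict[str, Tuple[float, ...]]:
    """Pool all estimates per textural class and return (5%, 50%, 95%) values."""
    pooled: Dict[str, List[float]] = defaultdict(list)
    for texture, estimate in records:
        pooled[texture].append(estimate)
    return {t: tuple(percentile(v, p) for p in (5, 50, 95)) for t, v in pooled.items()}
```

The 5% and 95% values give the lower and upper bounds of the 95PPU range that can be used to constrain a parameter during calibration.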

Table 7 Average and uncertainty estimates of bulk density for top and subsoil based on the textural classes. The numbers in the brackets are the number of samples.

When pre-processing the HWSD database, as with FAO/UNESCO, we modified the data where necessary by replacing zero values of %sand, %silt, and %clay with 1 and making sure that the three fractions sum to 100%. Also, after applying the various pedotransfer functions, we replaced negative or unreasonable values with the overall averages to avoid errors during model runs. Finally, we should point out that the soil parameters in both databases must still be calibrated for a specific location.
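The two cleaning steps above might look like the sketch below. It assumes proportional rescaling to restore the 100% total and user-chosen plausibility bounds (lo, hi) for flagging unreasonable values; neither detail is specified in the text, and the function names are hypothetical.

```python
from statistics import mean
from typing import List, Optional, Tuple


def fix_texture(sand: float, silt: float, clay: float) -> Tuple[float, ...]:
    """Replace zero fractions with 1, then rescale proportionally so the
    three fractions sum to 100%."""
    parts = [1.0 if v == 0 else float(v) for v in (sand, silt, clay)]
    total = sum(parts)
    return tuple(100.0 * v / total for v in parts)


def replace_invalid(values: List[Optional[float]], lo: float, hi: float) -> List[float]:
    """Replace missing, negative, or out-of-range estimates with the average
    of the valid estimates (bounds lo and hi are assumptions)."""
    valid = [v for v in values if v is not None and lo <= v <= hi]
    if not valid:
        raise ValueError("no valid values to average")
    fill = mean(valid)
    return [v if v is not None and lo <= v <= hi else fill for v in values]
```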

Landcover maps of the world

Global land cover characterization (GLCC)

The GLCC from USGS is a landuse and land cover classification dataset based primarily on unsupervised classification of 1-km AVHRR (Advanced Very High Resolution Radiometer) 10-day NDVI (Normalized Difference Vegetation Index) composites (Supplementary Fig. S3). The AVHRR source imagery dates from April 1992 through March 1993. The GLCC map contains 24 land cover types. We established the correspondence between the GLCC map units and the SWAT (crop) database (Supplementary Table S7) based on the land cover descriptions provided with the map and the SWAT landuse definitions.

Global landuse GlobCover

The GlobCover is a European Space Agency initiative to develop global composites and land cover maps using observations from the 300-m MERIS sensor onboard the ENVISAT satellite mission (Supplementary Fig. S4). The GlobCover map covers the period from December 2004 to June 2006 and is derived by automatic, regionally tuned classification of a MERIS full-resolution surface reflectance time series. The GlobCover map contains 23 land cover types. We established the correspondence between the GlobCover units and the SWAT (crop) database (Supplementary Table S8) based on the land cover descriptions provided with the map and the SWAT landuse definitions.

The databases for the above two global landuse maps are supported by the crop table in the SWAT2012.mdb database and the lookup tables “Lookup_Landuse_GlobCover.txt” and “Lookup_Landuse_USGS.txt”. However, as with the soil parameters, landuse parameters must be calibrated for a given location.
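A lookup table links each raster value to a SWAT landuse code. The sketch below assumes a comma-separated text file with a header row followed by "value,code" rows; the actual layout of the distributed lookup files should be checked, and the codes in the usage example are illustrative rather than taken from Supplementary Tables S7 or S8.

```python
import csv
import io
from typing import Dict, Iterable, List


def read_lookup(text: str) -> Dict[int, str]:
    """Parse a two-column lookup table (raster value -> SWAT landuse code).

    Assumes one header row followed by rows such as '1,URML'.
    """
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return {int(row[0]): row[1].strip() for row in reader if row}


def reclassify(grid_values: Iterable[int], lookup: Dict[int, str],
               nodata: str = "NODATA") -> List[str]:
    """Translate raster cell values to SWAT codes; unmapped values get `nodata`."""
    return [lookup.get(v, nodata) for v in grid_values]


table = read_lookup("Value,Landuse\n1,URML\n2,AGRR\n")
print(reclassify([1, 2, 9], table))  # ['URML', 'AGRR', 'NODATA']
```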

Historical weather data

The historical (1970–2005) reanalysis temperature and precipitation data from the Climatic Research Unit, University of East Anglia (CRU TS 3.1)43 were reformatted from NetCDF into SWAT-readable text files. The data are daily at a resolution of 0.5° and cover the entire globe in 67,420 files.
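To work with these files, one typically needs the 0.5° cell containing a site and a dated daily series. The sketch below assumes that cell edges fall on multiples of 0.5° (so cell centres lie at .25 and .75) and that each file holds a start-date line (YYYYMMDD) followed by one daily value per line, with -99 marking missing values, as in the common ArcSWAT weather text layout. Both are assumptions to verify against the downloaded files.

```python
from datetime import date, timedelta
from math import floor
from typing import List, Optional, Tuple


def cell_center(lat: float, lon: float, res: float = 0.5) -> Tuple[float, float]:
    """Centre of the res-degree grid cell containing (lat, lon); valid for lat < 90."""
    return (floor(lat / res) * res + res / 2, floor(lon / res) * res + res / 2)


def parse_daily(text: str, missing: float = -99.0) -> List[Tuple[date, Optional[float]]]:
    """Parse a start-date line followed by one daily value per line.

    Only the first comma-separated column is read (e.g. Tmax in a
    'max,min' temperature file); missing values become None.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    start = date(int(lines[0][:4]), int(lines[0][4:6]), int(lines[0][6:8]))
    series = []
    for i, ln in enumerate(lines[1:]):
        v = float(ln.split(",")[0])
        series.append((start + timedelta(days=i), None if v == missing else v))
    return series
```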

Future weather data

We provide data from five global climate models (GCMs), each under four carbon emission scenarios, supported by ISI-MIP5 (Inter-Sectoral Impact Model Intercomparison Project)44. These daily data cover the period 1950–2099 at a resolution of 0.5°. Like the CRU data, they have been reformatted from NetCDF into SWAT-formatted text files.

The five GCMs are HadGEM2-ES, IPSL-CM5A-LR, MIROC-ESM-CHEM, GFDL-ESM2M, and NorESM1-M (Table 1), each with four Representative Concentration Pathway (RCP) scenarios (RCP2.6, RCP4.5, RCP6.0, and RCP8.5)45. The 0.5° gridded WATCH Forcing Data46 for January 1, 1960, to December 31, 1999 (the reference period) were used as observation data to downscale the five GCMs44. WATCH combines the ERA-40 daily data (the 40-year reanalysis of the European Centre for Medium-Range Weather Forecasts) with the Climatic Research Unit TS2.1 dataset (CRU)43. The WATCH Forcing Data combine the daily statistics of ERA-40 with the monthly mean characteristics of the CRU and Global Precipitation Climatology Centre (GPCC) datasets and represent a complete gridded observational dataset for bias correction of global climate data44.

The historical and future data can be downloaded for any given geographic location from www.2w2e.com using the template illustrated in Supplementary Fig. S5. The Climate Change Toolkit (CCT) program47 is linked to these databases and can be used for bias correction if local data are available. CCT uses an additive correction for temperature and a multiplicative correction factor for precipitation. The program can also be used for extreme climate analysis48.
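The two correction types can be illustrated with a simple mean-based scheme: temperature is shifted by the reference-period bias, and precipitation is scaled by the ratio of reference-period means. This is a minimal sketch assuming a single correction over the whole reference period; CCT's actual implementation (for example, whether corrections are computed per month) may differ, and the function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def correct_temperature(obs_ref: Sequence[float], gcm_ref: Sequence[float],
                        gcm_future: Sequence[float]) -> List[float]:
    """Additive correction: add (observed mean - GCM mean) over the reference period."""
    delta = mean(obs_ref) - mean(gcm_ref)
    return [t + delta for t in gcm_future]


def correct_precipitation(obs_ref: Sequence[float], gcm_ref: Sequence[float],
                          gcm_future: Sequence[float]) -> List[float]:
    """Multiplicative correction: scale by (observed mean / GCM mean)."""
    gcm_mean = mean(gcm_ref)
    if gcm_mean == 0:
        raise ValueError("GCM reference-period mean precipitation is zero")
    factor = mean(obs_ref) / gcm_mean
    return [p * factor for p in gcm_future]
```

The multiplicative form keeps corrected precipitation non-negative and leaves dry days at zero, which an additive shift would not guarantee.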

Global actual evapotranspiration data

Actual evapotranspiration (AET) from the earth’s land surface was estimated by NASA from satellite data for 1982 to 200349,50 (Supplementary Fig. S6). The algorithm calculates canopy transpiration and soil evaporation using a modified Penman-Monteith approach with biome-specific canopy conductance determined from the normalized difference vegetation index (NDVI). The Priestley-Taylor approach was used to quantify open water evaporation. Observations from 34 flux network (FLUXNET) tower sites51 were used to parameterize an NDVI-based canopy conductance model, and the global ET algorithm was validated using measurements from 48 additional, independent flux towers49,50.
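For reference, the Priestley-Taylor equation for open water evaporation can be written as E = α Δ/(Δ + γ) (Rn − G)/λ. The sketch below uses the textbook coefficient α = 1.26 and FAO-56 style expressions for the saturation vapour pressure slope, with a sea-level psychrometric constant; the constants and parameterization used in the NASA algorithm may differ.

```python
from math import exp


def priestley_taylor_mm_day(rn: float, g: float, t_air: float,
                            alpha: float = 1.26, gamma: float = 0.067,
                            lam: float = 2.45) -> float:
    """Open water evaporation (mm/day) from the Priestley-Taylor equation.

    rn, g : net radiation and heat flux into the water body, MJ m-2 day-1
    t_air : air temperature, deg C
    gamma : psychrometric constant, kPa/degC (sea-level value assumed)
    lam   : latent heat of vaporization, MJ/kg
    """
    es = 0.6108 * exp(17.27 * t_air / (t_air + 237.3))  # saturation vapour pressure, kPa
    delta = 4098.0 * es / (t_air + 237.3) ** 2          # slope of the es curve, kPa/degC
    # (MJ m-2 day-1) / (MJ kg-1) = kg m-2 day-1, i.e. mm of water per day
    return alpha * delta / (delta + gamma) * (rn - g) / lam
```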

AET has been used before to calibrate SWAT when other observed data are not available52. A measure of AET is crucial when calibrating a SWAT model with river discharge data. Using river discharge alone, we can confidently estimate runoff and infiltration; however, the components of the infiltrated water cannot be estimated with any degree of confidence. These components include soil moisture (S), aquifer recharge (AR), and actual evapotranspiration (AET) (Fig. 2). Using an estimate of AET in calibration can significantly increase our confidence in the other components of the infiltrated water.
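A simplified reading of this argument, assuming the partition of Fig. 2 and ignoring lateral flow and the other terms SWAT also simulates, can be written as follows, where P (precipitation), Q (runoff), and I (infiltration) are our notation and S denotes the change in soil moisture over the period:

```latex
P = Q + I, \qquad I = S + AR + AET
```

Discharge data constrain Q and hence I, but leave the second equation with three unknowns; an independent AET estimate reduces it to two, which is why adding AET to calibration narrows the plausible values of S and AR.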

Fig. 2 Schematic illustration of the conceptual water balance model in SWAT.

To use the provided MODIS–NASA data for calibration in SWAT-CUP, users should overlay the MODIS-AET grid with the subbasin map of their ArcSWAT/QSWAT project and average the AET grid points inside each subbasin into a single value representing the subbasin’s AET.
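In practice, this overlay is done with GIS zonal statistics, but the logic can be sketched with a ray-casting point-in-polygon test. The sketch assumes that grid points are given as (lon, lat, aet) tuples and that the subbasin boundary is a single polygon in the same unprojected WGS-84 coordinates; subbasins smaller than the grid spacing may contain no point and need separate handling (for example, the nearest grid point).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat)


def inside(pt: Point, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: count crossings of a ray cast in the +x direction."""
    x, y = pt
    result = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y-coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                result = not result
    return result


def subbasin_aet(grid: List[Tuple[float, float, float]],
                 polygon: Sequence[Point]) -> float:
    """Average AET of all grid points (lon, lat, aet) inside the subbasin polygon."""
    values = [aet for lon, lat, aet in grid if inside((lon, lat), polygon)]
    if not values:
        raise ValueError("no AET grid point falls inside this subbasin")
    return sum(values) / len(values)
```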

Data Records

The Global FAO/UNESCO Soil Map of the World and associated SWAT data files (Lookup Table and SWAT2012.mdb)53 are deposited at Pangaea and www.2w2e.com sites. There are 4,931 soil records in this data set.

The Harmonized World Soil map and associated SWAT data files (Lookup Table and SWAT2012.mdb)54 are deposited at Pangaea and www.2w2e.com sites. There are 16,328 soil records in this data set.

The Global Land Cover Characterization (GLCC) map from USGS and associated SWAT data files (Lookup Table and SWAT2012.mdb)55 are deposited at Pangaea and www.2w2e.com sites. There are 24 landcover types in this database.

The GlobCover from the European Space Agency and associated data files (Lookup Table and SWAT2012.mdb)56 are deposited at Pangaea and www.2w2e.com sites. There are 23 landcover types in this database.

The historical CRU and future GCM weather data57 are deposited at Pangaea and www.2w2e.com.

Finally, the Global Actual Evapotranspiration Data58 in text format is deposited at Pangaea and www.2w2e.com.

Technical Validation

The global soil and landuse databases have been successfully used in many SWAT applications around the world4,6,9,10,16,59,60,61,62. Validation of these maps, including the satellite-based landcover maps, is provided by ground-truth observations conducted by the map developers and reported in various studies63,64.

There is significant variation in the reported values of soil parameters in the literature and among agencies. In this research, we used a large number of pedotransfer functions and soil samples from around the world to estimate texture-based soil parameters. In Table 8, we compare our estimated values of bulk density and hydraulic conductivity with values reported by the U.S. Department of Agriculture (USDA), STRUCTx (STRUCTURAL ENGINEERING RESOURCES website, see Table 8), and other sources. Reference values for the remaining parameters could not be found by textural class. As is evident, all estimates vary significantly, especially those for saturated hydraulic conductivity. For this reason, it is essential to have a range of estimates so that parameter values can be limited to a likely range during model calibration.

Table 8 Comparison of the values of bulk density and saturated hydraulic conductivity estimated in this research with values reported by different sources.

Usage Notes

There are 4,931 soil records in the FAO/UNESCO database and 16,328 records in the HWSD soil map. Both (usersoil) tables are in the SWAT2012.mdb database. The field (Name) is a concatenation of the fields SU-SYM74, SU-SYM90, MU_GLOBAL, and ISSOL as given in the original HWSD database. SU-SYM74 is the soil unit symbol according to the FAO-74 soil classification, SU-SYM90 is the soil unit symbol according to the FAO-90 soil classification, MU_GLOBAL is the Global Mapping Unit identifier, which links the GIS soil units to the attribute database, and ISSOL indicates whether the soil mapping unit is a soil (1) or a non-soil (0). All maps are provided in the WGS-84 geographic coordinate system without any projection. Users will have to project these maps as needed before using them in ArcSWAT or QSWAT.

Different soil and landuse maps are provided to emphasize that more than one database is often available for building and calibrating a model, and to encourage users to try different databases and thereby recognize the conditionality of their calibrated models. Calibrated model parameters are always conditioned on the input data, meaning that a different set of parameters could be obtained with a different set of available data. This is probably the most disappointing aspect of calibration.

A calibrated model is, therefore, always non-unique, subjective, and conditional, and consequently limited in the scope of its use. To achieve unconditionality, the calibrated parameters must be integrated over all conditioning factors. Hence, we recommend using different physical inputs and multi-objective calibration procedures.