Main

The COVID-19 pandemic has exposed many existing inequalities in the United States. The unprecedented impacts of the pandemic, including disruptions to daily life, economic loss and emotional distress, have fallen disproportionately on low-income populations and communities of colour1,2,3,4. These same groups have also faced greater exposure to COVID-19 through high public-contact jobs5, which often make social distancing difficult or impossible, and have experienced higher case rates as a result1,2,6,7,8.

Access to nature is also unequally distributed in the United States, with vegetation and parks often less available in low-income neighbourhoods and communities of colour. We define ‘nature’ as greenness (the total amount of vegetation, including trees, shrubs and grass) and urban parks; while these two metrics do not fully describe nature, they capture two important elements of what is often identified as ‘nature’ in urban settings9,10. Many studies11,12,13,14,15,16 have shown persistent patterns of inequity within individual cities and groups of cities, and nationwide for the Normalized Difference Vegetation Index (NDVI) at the census-tract scale17. Thus, the communities most affected by COVID-19 may also have the least access to nature. A negative association between COVID-19 case rates and greenness has been shown with county-level data in the United States18, but it is not known whether this association holds at finer spatial resolution, nor whether park access has the same negative association with COVID-19 case rates.

Access to nature, including greenness and park proximity, has the potential to reduce some of the distress associated with the pandemic by improving mental health and providing safe spaces for socializing, physical activity and recreation19,20,21. We quantify greenness using NDVI, which reflects the overall amount of vegetation in an area but does not distinguish between vegetation types such as trees, shrubs and lawns. Despite this drawback, NDVI is widely used, and access to greenness measured with NDVI, along with proximity to parks, has been strongly linked to improvements in physical and mental health, including lower risk of mortality, lower odds of depression and lower rates of obesity, diabetes and cardiovascular disease22,23. Thus, inequity in nature access has the potential to translate into inequities in mental and physical health both during and beyond the pandemic.

In this study, we document the extent of these two ‘stacked’ inequities; that is, that low-income and majority people of colour (POC) communities have both more COVID-19 cases and less nature. We also explore whether access to nature is associated with COVID-19 incidence after accounting for income, race/ethnicity and other potentially confounding variables. Multiple mechanisms could produce such an association. For instance, a lack of access to nature might not only deprive individuals of a much-needed mental health resource but might also interfere with the body’s ability to fight infection: contact with nature appears to play an important role in our defences against viruses by boosting Natural Killer (NK) cells24,25. Weaker viral defences in areas with less nature could add to the burden on those already suffering from health inequities, while stronger defences in areas with more nature could keep a higher proportion of cases subclinical or asymptomatic, and thus less likely to be reported. This would result in a negative correlation between greenness and reported COVID-19 case rates that persists after accounting for sociodemographic characteristics and other factors that are also likely to be related to both greenness and COVID-19. While other mechanisms could also produce this pattern, a first step is to identify whether such a correlation exists. We emphasize that our data can neither distinguish between potential mechanisms nor establish causality; they demonstrate only an association among correlated variables.

Here, we use two analyses to quantify nature inequity across all census block groups in urbanized areas in the United States and to relate nature access to COVID-19 case rates for ZIP codes in 17 states. Specifically, we ask: (1) do low-income and predominantly POC communities have both higher COVID-19 case rates and less nature access (defined here by NDVI and park proximity); (2) is nature access related to COVID-19 case rates after accounting for income, race/ethnicity and other potentially confounding variables; and (3) does inequity in nature access persist in both greenness and park proximity when examined across the entire urbanized United States at resolutions finer than the census tract? We quantify access to both parks and greenness to ask whether inequity is systematic across all urbanized areas in the United States.

Results

We found that majority POC ZIP codes had both higher COVID-19 case rates and less greenness (Fig. 1). As of 30 September 2020, majority POC ZIP codes had nearly twice as many COVID-19 cases per 100,000 people as majority-white ZIP codes (Fig. 1). Less-green ZIP codes also had higher COVID-19 case rates, even after adjusting for differences in population density, race/ethnicity, income, time since the first recorded case, age and state (Fig. 2 and Extended Data Fig. 1). In a negative binomial mixed effects model of COVID-19 cases, each 0.1 increase in NDVI was associated with a 4.1% decrease in COVID-19 cases (95% confidence interval (CI) for the decrease, derived from the incidence rate ratio (IRR): 0.9–6.8%). Unlike NDVI, park proximity was not significantly related to COVID-19 case rates when controlling for other variables (Fig. 2 and Supplementary Table 1).

Fig. 1: COVID-19 case rates are related to both greenness and race/ethnicity.

a, This analysis used reported COVID-19 cases at the ZIP code scale from 17 US states. b,c, Average NDVI values (b) and COVID-19 case rates per 100,000 people (c) across ZIP codes around Chicago, IL, as an example. d, Bar chart of COVID-19 case rates by greenness (NDVI) quantile, showing a decline in cases with higher NDVI. e, Bar chart of greenness, showing higher greenness in majority-white ZIP codes. f, COVID-19 case rates (per 100,000), showing lower case rates in majority-white ZIP codes. Error bars represent approximate 95% CIs.

Fig. 2: Greener ZIP codes have fewer COVID-19 cases after accounting for other factors.

a, Coefficient values represent effect sizes from a negative binomial mixed effects model for the relationship between rates of COVID-19 (cases per 100,000 people) and greenness (NDVI), park access (ha), median income, median age, proportion of persons of colour (proportion POC), days since the first recorded case (county scale) and population density. Coefficient values are shown as dots, bars represent 95% CIs and significant variables are shown in red. b,c, Predicted COVID-19 case rates per 100,000 people, showing a decline in cases as NDVI increases in ZIP codes with low (20%), average (50%) and high (80%) proportion POC, in a state with a low case rate (Pennsylvania; b) and in a state with a high case rate during the study period (Illinois; c). Predicted values are from the model described in a, and shading represents 95% CIs.

We also found inequity in nature access at the US scale. Across all urbanized areas, majority POC block groups are less green (0.1 lower NDVI on average) and have fewer parks (0.5 fewer ha of park on average). Similarly, low-income block groups are less green (0.09 lower NDVI on average) and have fewer parks (3.6 fewer ha on average; Fig. 3, Extended Data Fig. 2 and Supplementary Table 2). For context, a difference of 0.1 in NDVI is roughly comparable to a 1 s.d. difference in greenness in our sample: the s.d. in NDVI across all block groups is 0.15 and the average within-city s.d. is 0.08. In simultaneous autoregressive (SAR) models that account for spatial autocorrelation, the proportion POC in a block group and median household income were both significant predictors of NDVI and park proximity (Extended Data Fig. 2). These statistically significant differences in access to nature remain after accounting for population density (in the models of both park proximity and greenness) and aridity (in the model of greenness).
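To make the comparison with the standard deviations explicit, the 0.1 NDVI difference can be expressed in s.d. units using the two values reported in the paragraph above; it falls between the national and the average within-city spread:

```latex
\frac{0.1}{0.15} \approx 0.67\ \text{s.d. (across all block groups)}, \qquad
\frac{0.1}{0.08} = 1.25\ \text{s.d. (average within-city)}
```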

Fig. 3: Nature access is inequitably distributed across urbanized areas in the United States.

a, Greenness (NDVI) and park proximity (ha) across all 486 urbanized areas in the United States (including 142,325 block groups). Each urbanized area is represented by a point, with greenness and park proximity values averaged within urbanized areas. b,c, Bar charts of greenness (NDVI; b) and park proximity (c) by race/ethnicity, showing higher greenness and more parks in majority-white block groups. d,e, Bar charts of greenness (NDVI; d) and park proximity (e) across all income quantiles, showing higher greenness and more parks in block groups with higher income. Error bars represent approximate 95% CIs.

Discussion

We show that COVID-19 has inflicted the greatest burden on communities that also face widespread inequity in nature access. Using ZIP code-scale data, we show that communities with the least greenness had the highest COVID-19 case rates. Using fine-resolution data from all urbanized areas across all states, we also show that inequity in access to greenness and parks is widespread in the United States. Taken together, our results indicate that the pandemic has compounded the disadvantages of low-income areas and communities of colour that already have less parkland available for recreation and less greenness.

We found that an association between greenness and COVID-19 case rates persisted after accounting for income, race/ethnicity and other confounding factors. While observational data such as ours cannot establish causal relationships, previous findings from the literature suggest multiple possible mechanisms that could explain this statistical association. Greenness might affect COVID-19 case rates if it helps the body fight the virus once exposed, keeping a higher proportion of cases subclinical or asymptomatic. For example, NK cells play a key role in the body’s defence against viral infections, seeking out and attacking or ‘clearing’ virus-infected cells24,25, and contact with nature appears to play an important role in boosting our NK defences26. Other possible explanations for the nature–COVID-19 association include mediation through air pollution and temperature, both of which have been shown to be positively related to COVID-19 case rates27,28 and are often higher in areas with less greenness and fewer trees29,30. It is also possible that having less greenness in a neighbourhood makes it more difficult to socialize safely outdoors, so that people in places with less greenness and fewer available parks may have socialized more frequently in indoor spaces. While municipal-wide restrictions on outdoor socializing make this explanation less likely early in the pandemic, variation in how people socialized outdoors is possible after initial lockdowns were eased, particularly in private spaces such as yards and courtyards. While any, all or none of these explanations might underlie the lower rates of COVID-19 in areas with greater access to nature, this finding raises the possibility that populations lacking ready access to nature during the pandemic may not only be deprived of a much-needed mental health resource but may also be at greater risk of contracting COVID-19. Further research using patient-level data is needed to uncover the mechanistic drivers behind the patterns we show in this work.

After adjusting for race/ethnicity, NDVI and age, other factors including income, population density and the number of days since the first recorded case were not significantly related to the number of cases. While the virus arrived later in lower-density areas, it also tended to hit a larger fraction of the population (Supplementary Table 3), which could explain why both population density and the time since the first recorded case were poor predictors of COVID-19 case rates during the study period. Other work from Barcelona, Spain, has found a relationship between income and case rates1. One potential explanation for why we did not see this pattern is that exposure and transmission may be highest in occupations requiring more face-to-face interactions and these occupations may not be strongly correlated with income in the United States.

We found widespread evidence of inequity in access to nature across urbanized areas in the United States. These results may have cascading impacts, given that nature in urban settings has been associated with many human health benefits while also supporting other ecosystem services and biodiversity22,23,31,32. Previous work has shown inequity in the distribution of greenness and parks along income and racial lines both in the United States and abroad11,12,13,33. Within the United States, similar patterns have been found nationally, but only for NDVI at the scale of census tracts17, and at finer spatial scales in individual cities or groups of cities12,13,14,15. Our results build on this previous work to show that patterns of inequity persist for both greenness and parks across all urbanized areas in the United States and at the fine spatial scale of census block groups. To put our results in context, a difference of 0.1 in NDVI has been linked in other research to specific health impacts. For example, living with NDVI 0.1 lower around the home has been linked to 12% higher all-cause mortality34, 20.6 g lower birth weight in infants and a higher likelihood of preterm birth35, 10% higher odds of poor self-reported health and lower neighbourhood satisfaction and social capital36, and a 39% decrease in the odds of moderate-to-vigorous physical activity in children37. Similarly, the area of available green space has also been linked to health: pregnant women living in neighbourhoods without a green space larger than 0.5 ha within 300 m are 13% more likely to report depressive symptoms38. Living closer to larger parks or to a greater total area of parks has been associated with less stress39, more physical activity40 and lower odds of cardiometabolic disease41. These results suggest that differences in access to greenness and parks of a similar magnitude to those shown here have the potential to affect a range of physical and mental health outcomes for low-income populations and communities of colour.

Our results suggest that inequity in nature access has implications for mental health and social interactions during a period of profound social and economic upheaval and mental health distress. In the short term, actions to overcome barriers to nature access during the pandemic, such as keeping urban parks in low-income neighbourhoods and communities of colour open, safe and accessible, could help to relieve some of the distress associated with the pandemic. As vaccination rates increase and the acute phase of the pandemic begins to wane, resuming programmes such as free public transportation to nearby parks could become an increasingly safe option to help underserved communities access nature. Recognition of the public health implications of nature inequity could help to reframe urban nature as critical infrastructure, providing justification for sustaining and increasing funding for public greening programmes. Over the long term, actions taken to redress inequity through park creation and greening interventions could have substantial public health value beyond the pandemic.

Methods

We combined spatially explicit data on nature access, sociodemographic characteristics and COVID-19 case rates. We conducted two separate analyses at separate spatial scales, both limited to urban areas. In the first, we combined COVID-19 data with nature access and sociodemographic data at the ZIP code scale across 17 states to ask whether communities with the highest COVID-19 case rates also have less access to nature. In the second analysis, we related nature access with sociodemographic data across all 486 urbanized areas in the United States at the block group scale to explore US-wide patterns of nature inequity.

Data

Study extent

For the COVID-19 analysis, the availability of fine-scale case data limited the study sites to 17 states that provide publicly accessible statewide data at the ZIP code scale. While individual counties also publish COVID-19 case data at the ZIP code scale, the timelines, systems and formats for reporting and publishing these data are variable and inconsistent, and reconciling these differences was beyond the scope of this analysis. We used the ZIP code scale rather than the county scale because counties are large enough to contain substantial heterogeneity in both greenness and sociodemographic characteristics, which could obscure relationships among these variables. For example, in the 17 states included in our analysis, counties contain an average of six ZIP codes (range 1–162). Comparing ZIP codes within the same county, the average difference in median income between the ZIP codes with the lowest and highest median incomes is US$27,404. Likewise, the average within-county difference in the proportion of POC is 20.6 percentage points and the average difference in greenness is 0.12 in NDVI. We limited our analyses to ZIP codes whose centroids (that is, geographic centres) fall within either urbanized areas (50,000 or more people) or urban clusters (2,500 to 49,999 people) as defined by the US Census Bureau. We also removed 66 ZIP codes with a median age value of 0 and 382 ZIP codes with a median income value of 0. The remaining dataset contained 2,652 urban ZIP codes across the 17 states in our analysis.
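The ZIP code screening and the within-county comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the record fields (`county`, `centroid_in_urban_area`, `median_age`, `median_income`) are hypothetical, each ZIP code is assumed to be assigned to a single county, and the urban-centroid test is assumed to have been precomputed in a GIS.

```python
from collections import defaultdict
from statistics import mean


def filter_urban_zips(zips):
    """Keep ZIP codes whose centroid falls in an urbanized area or urban
    cluster, dropping records with a median age or median income of 0.

    Field names are hypothetical placeholders for illustration."""
    return [
        z for z in zips
        if z["centroid_in_urban_area"]
        and z["median_age"] != 0
        and z["median_income"] != 0
    ]


def mean_within_county_range(zips, field):
    """Average, across counties, of the gap between the highest and lowest
    value of `field` among the ZIP codes assigned to each county."""
    by_county = defaultdict(list)
    for z in zips:
        by_county[z["county"]].append(z[field])
    return mean(max(vals) - min(vals) for vals in by_county.values())


# Toy data (not study data): ZIP "C" has median age 0, ZIP "D" is not urban.
zips = [
    {"zip": "A", "county": "C1", "centroid_in_urban_area": True, "median_age": 35, "median_income": 50_000},
    {"zip": "B", "county": "C1", "centroid_in_urban_area": True, "median_age": 40, "median_income": 80_000},
    {"zip": "C", "county": "C2", "centroid_in_urban_area": True, "median_age": 0, "median_income": 60_000},
    {"zip": "D", "county": "C2", "centroid_in_urban_area": False, "median_age": 30, "median_income": 40_000},
    {"zip": "E", "county": "C2", "centroid_in_urban_area": True, "median_age": 33, "median_income": 45_000},
]
kept = filter_urban_zips(zips)
print([z["zip"] for z in kept])                          # ['A', 'B', 'E']
print(mean_within_county_range(kept, "median_income"))   # 15000
```

In this sketch a county left with a single ZIP code contributes a range of 0 to the average; the text does not say how such counties were handled, so this is one possible choice.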

For the nature-equity analysis, we considered US Census block groups across all 486 urbanized areas (excluding urban clusters) in the United States (excluding Puerto Rico), including 142,325 block groups and 5,197 incorporated cities. Each state was represented by at least one urban area.

COVID-19 data

We compiled publicly available COVID-19 case data at the ZIP code scale from individual state department of health websites on 1 October 2020; the most recent data available for each state were dated between 1 and 30 September 2020 (Supplementary Table 3). We considered only reported cases of COVID-19 in the earlier phases of the pandemic (March through September) because some states, such as New Jersey, ceased to update their websites with new ZIP code-scale data after September. We were not able to obtain locally specific data quantifying variation in testing rates among demographic groups. Evidence from some states (for example, Illinois; Supplementary Table 3) suggests that minority groups were tested at much lower rates than white people, particularly in the early phases of the pandemic. Accounting for such differences would probably have strengthened our results: because POC-majority ZIP codes have both higher case rates and less greenness, lower testing in these ZIP codes would, if anything, lead to underestimated case rates where greenness is lowest.

We compared COVID-19 case rates to nature access and sociodemographic variables using the data described below. We calculated case rates as the cumulative number of cases per 100,000 people for each ZIP code, using the total population of each ZIP Code Tabulation Area (ZCTA) from the American Community Survey (ACS) 2018. ZCTAs were designed to represent ZIP code delivery routes as two-dimensional areas; while there are minor discrepancies in some places, these are uncommon in the urban areas included in this analysis42. We also calculated the number of days since the first recorded case (available only at the county scale) for each ZIP code, using data from the New York Times US Coronavirus Database43.
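The two derived quantities in the paragraph above reduce to simple arithmetic, sketched below. The function names are hypothetical, and the sketch assumes that each ZIP code inherits the first-case date of the county it is assigned to and that the as-of date is each state's latest reporting date; the text does not specify either detail.

```python
from datetime import date


def case_rate_per_100k(cumulative_cases, population):
    """Cumulative reported cases per 100,000 residents of a ZCTA."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to limit floating-point error.
    return cumulative_cases * 100_000 / population


def days_since_first_case(first_case_date, as_of_date):
    """Days between the county's first recorded case and the data date.
    Assumption: every ZIP code inherits the value of its assigned county."""
    return (as_of_date - first_case_date).days


# Toy example (not study data).
print(case_rate_per_100k(250, 12_500))                               # 2000.0
print(days_since_first_case(date(2020, 3, 15), date(2020, 9, 30)))  # 199
```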

Nature access data

We used two metrics to quantify inequity in nature access: the amount of greenness and proximity to parks. We calculated these two metrics at the level of US Census block groups for the nature inequity analyses and at the ZIP code level for the COVID-19 analyses. Greenness was quantified using NDVI, which measures the reflectance of green vegetation and is linked to the amount, health and leaf characteristics of vegetation; NDVI is unitless, with values ranging from −1 to 1. Values between 0.2 and 1 range from sparse to heavily vegetated, and values close to or below zero represent other types of land cover such as impervious surfaces, water, clouds or snow. NDVI does not distinguish between types of vegetation such as trees and shrubs, and although NDVI is often highly correlated with tree canopy cover, our dataset spans many ecoregions, including arid regions where tree cover is often sparse and NDVI is less strongly linked to tree canopy. Average NDVI values were calculated across each block group (nature-equity analysis) or ZCTA (COVID-19 analysis). Calculated in this way, NDVI does not distinguish between publicly accessible greenness and green areas on private property such as residential yards. Thus, NDVI values for ZIP codes and block groups combine greenness that people can access directly, greenness that is visible along streets but not necessarily accessible, and greenness that is nearby but neither visible nor accessible (such as in private backyards). NDVI data were derived from Landsat imagery and processed using Google Earth Engine, filtering images from 1 January 2017 to 31 December 2018 to correspond most closely to the time period in which the socio-economic and sociodemographic data were collected. To account for broad geographic patterns in NDVI, which varies at regional scales with climate and aridity, we included the Global Aridity Index in our model of NDVI inequity. This publicly available dataset represents the ratio between precipitation and vegetation water demand, with higher values representing more humid conditions44.
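NDVI is conventionally computed per pixel as (NIR − Red)/(NIR + Red), and the per-unit averages described above are zonal means of those pixel values. The sketch below uses generic near-infrared and red reflectance values because the text does not say which Landsat sensor or bands were used, and it omits the Google Earth Engine compositing of the 2017–2018 images, which is not described. Pixels are assumed to be pre-assigned to block groups or ZCTAs, and the zone labels are hypothetical. The aridity index is shown as the precipitation-to-demand ratio described in the text.

```python
import math
from collections import defaultdict


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red) for one pixel's reflectance values."""
    denom = nir + red
    if denom == 0:
        return math.nan  # undefined, e.g. for no-data pixels
    return (nir - red) / denom


def zonal_mean(values, zone_ids):
    """Average pixel values within each zone (e.g. block group or ZCTA),
    ignoring NaN pixels. Pixels are assumed to be pre-assigned to zones."""
    sums, counts = defaultdict(float), defaultdict(int)
    for value, zone in zip(values, zone_ids):
        if not math.isnan(value):
            sums[zone] += value
            counts[zone] += 1
    return {zone: sums[zone] / counts[zone] for zone in sums}


def aridity_index(precipitation, potential_evapotranspiration):
    """Ratio of precipitation to vegetation water demand; higher = more humid."""
    return precipitation / potential_evapotranspiration


# Toy reflectances (not study data) for four pixels in two block groups.
nir = [0.5, 0.3, 0.2, 0.25]
red = [0.1, 0.1, 0.2, 0.05]
pixel_ndvi = [ndvi(n, r) for n, r in zip(nir, red)]
means = zonal_mean(pixel_ndvi, ["bg1", "bg1", "bg2", "bg2"])
print({zone: round(v, 3) for zone, v in means.items()})  # {'bg1': 0.583, 'bg2': 0.333}
print(aridity_index(500.0, 1000.0))                       # 0.5
```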

To measure park proximity, we generated a database of publicly accessible parks in the United States that is as comprehensive as possible by combining four publicly available nationwide datasets (Trust for Public Land ParkServe, US Protected Areas Database, National Conservation Easement Database and ESRI Parks; Supplementary Table 4). Together, these datasets included 337,441 parks across the entire United States, 143,228 of which are within the 486 urbanized areas in the United States. We did not exclude parks below a size threshold, nor did we filter parks based on characteristics such as amount of greenness or recreation type. The size of parks within urbanized areas ranged from 0.001 to 8,720 ha and included small municipal parks that may contain relatively little nature if their primary function is to provide sports facilities such as basketball courts, playgrounds or other types of recreation that typically require large paved areas.

Park proximity was calculated as the total area of park (ha) within 1,000 m of each census block centroid. This distance corresponds roughly to a 10 min walk, a common metric used by park advocates and for measuring park accessibility45,46. For both ZIP codes and block groups, we took population-weighted averages of block-level park proximity to derive a park proximity value for each ZIP code and block group. Population weighting reduces the influence of areas with high park proximity where very few people live.
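The population weighting described above can be sketched as follows. This is an illustration, not the authors' code: the `(unit_id, park_ha, population)` tuple layout is hypothetical, and the block-level park areas within 1,000 m of each centroid are assumed to have been computed beforehand in a GIS.

```python
from collections import defaultdict


def population_weighted_park_proximity(blocks):
    """Population-weighted mean of block-level park proximity (ha of park
    within 1,000 m of each block centroid) for each block group or ZIP code.

    `blocks` is an iterable of hypothetical (unit_id, park_ha, population)
    tuples; block-level park areas are assumed to be precomputed."""
    weighted, pops = defaultdict(float), defaultdict(float)
    for unit_id, park_ha, population in blocks:
        weighted[unit_id] += park_ha * population
        pops[unit_id] += population
    return {
        unit_id: weighted[unit_id] / pops[unit_id] if pops[unit_id] > 0 else float("nan")
        for unit_id in weighted
    }


# Toy blocks (not study data): the unpopulated block next to a large park does
# not affect the weighted value, whereas a simple average would be ~21.3 ha.
blocks = [
    ("bg1", 12.0, 100),
    ("bg1", 2.0, 300),
    ("bg1", 50.0, 0),
]
print(population_weighted_park_proximity(blocks))  # {'bg1': 4.5}
```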

Sociodemographic data

Socio-economic and sociodemographic data were obtained from the US Census Bureau 2014–2018 ACS 5-year estimates47, which summarize data collected from 1 January 2014 to 31 December 2018. These data were collected for all block groups with their centroids within US urbanized areas (nature inequity analysis) and for ZCTAs within the 17 states that report COVID-19 data at the ZIP code scale (COVID-19 analysis). Variables included median household income (total gross income before taxes during the past 12 months), the number of white people in each block group or ZCTA, median age and total population (used to derive population density and the proportion of POC in each block group or ZCTA).
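The derived covariates mentioned above can be sketched as below. Two assumptions are labelled here and in the code: ‘POC’ is taken to mean everyone not counted as white, and land area, which population density requires but which is not among the variables listed in the text, is assumed to come from census boundary files. The majority-POC flag mirrors the grouping used in the Results. The function name is hypothetical.

```python
def derived_covariates(total_population, white_population, land_area_km2):
    """Derive proportion POC, a majority-POC flag and population density.

    Assumptions: 'POC' is everyone not counted as white, and land area
    (km^2) comes from a separate source such as census boundary files."""
    if total_population <= 0 or land_area_km2 <= 0:
        raise ValueError("population and area must be positive")
    proportion_poc = 1 - white_population / total_population
    return {
        "proportion_poc": proportion_poc,
        "majority_poc": proportion_poc > 0.5,
        "density_per_km2": total_population / land_area_km2,
    }


# Toy block group (not study data).
print(derived_covariates(2000, 500, 0.5))
# {'proportion_poc': 0.75, 'majority_poc': True, 'density_per_km2': 4000.0}
```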

Statistical analyses

We conducted two analyses at different spatial scales. In the first, we analysed COVID-19 case rates using data from 17 states at the ZIP code scale with a negative binomial generalized linear mixed effects model. In the second, we quantified nature inequity in all urbanized areas in the United States at the block group scale, using SAR models to relate NDVI and park proximity to sociodemographic factors. All analyses were performed in R v.4.0.2 (ref. 48). SAR models were fitted using the package spdep (ref. 49) and negative binomial mixed effects models using the package lme4 (ref. 50).

COVID-19

We analysed COVID-19 case rates by ZIP code using a negative binomial generalized linear mixed effects model51. A single full model related COVID-19 case rates in each ZIP code to fixed effects for NDVI, park proximity, the proportion of white people, median income, population density, median age and the total number of days since the first recorded case (county scale) (Fig. 2 and Supplementary Table 1). We included state as a random effect to account for the non-independence of data from the same state, which could arise from processes not captured by the available data, including differences in the timing of public policy responses such as lockdowns or mask mandates51. All explanatory variables were centred and scaled. To quantify the effect of a change in NDVI, we calculated the IRR by fitting an additional model using unscaled NDVI multiplied by 10 (all other variables scaled), so that the IRR and its 95% CI estimate how a 0.1 change in NDVI affects COVID-19 case rates52. We used the Variance Inflation Factor and Pearson correlation coefficients to diagnose potential multicollinearity among explanatory variables (Supplementary Tables 1 and 5).
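The scaling and IRR steps above amount to the arithmetic sketched below. On the log link of a negative binomial model, a coefficient β gives IRR = exp(β), and entering NDVI × 10 makes one unit equal to a 0.1 change in NDVI, so the percent change per 0.1 NDVI is (exp(β) − 1) × 100. The coefficient used here is back-calculated from the reported 4.1% decrease purely for illustration; it is not a fitted value. The Wald interval is one common approximation, and the text does not say which interval method was used.

```python
import math
from statistics import mean, stdev


def center_and_scale(values):
    """Centre on the mean and divide by the sample s.d. (as R's scale() does)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def irr_percent_change(beta):
    """Percent change in expected cases per one-unit increase in a predictor
    on the log-link scale. With NDVI entered as NDVI * 10, one unit = 0.1 NDVI."""
    return (math.exp(beta) - 1) * 100


def irr_wald_ci(beta, se, z=1.96):
    """Approximate 95% Wald interval for the IRR: exp(beta -/+ z * se)."""
    return math.exp(beta - z * se), math.exp(beta + z * se)


print(center_and_scale([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]

# Back-calculated illustration: an IRR of 0.959 corresponds to the reported
# 4.1% decrease per 0.1 NDVI; this coefficient is derived, not fitted.
beta = math.log(0.959)
print(round(irr_percent_change(beta), 1))  # -4.1
```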

Model validation

The negative binomial mixed effects model fits the data structure well, providing an appropriate error structure for overdispersed count data. Two issues not addressed by this model were spatial autocorrelation and potential endogeneity of regressors, so we built two additional models to validate its results. In the first, we confirmed the presence of spatial autocorrelation (Moran’s I = 0.37, P < 0.0001) and used a SAR model to adjust for it. We used the queen contiguity criterion to build the neighbours matrix and the Akaike information criterion (AIC) to compare three potential structures for where the spatial autoregressive process is assumed to occur: a spatial error model, where spatial dependence is assumed to occur in the error term; a spatial lag model, where spatial dependence is assumed to occur in the response variable; and a mixed or Durbin model, where spatial dependence is assumed to influence both the response and explanatory variables (Supplementary Tables 6 and 7)53.
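To illustrate what the Moran’s I diagnostic above measures, the sketch below computes global Moran’s I with queen contiguity on a regular grid, where queen neighbours are cells sharing an edge or a corner. This is a simplification under stated assumptions: real ZIP code and block group polygons would use shared-boundary contiguity (for example spdep’s `poly2nb` with `queen = TRUE`), spdep commonly row-standardizes the weights whereas this sketch uses binary 0/1 weights, and significance testing (for example with spdep’s `moran.test`) is not shown.

```python
def queen_neighbours(r, c, nrows, ncols):
    """Cells sharing an edge or a corner with cell (r, c) on a regular grid."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr, dc) != (0, 0) and 0 <= rr < nrows and 0 <= cc < ncols:
                yield rr, cc


def morans_i(grid):
    """Global Moran's I with binary (0/1) queen-contiguity weights:
    I = (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, where z = x - mean(x)."""
    nrows, ncols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    xbar = sum(values) / n
    cross, w_total = 0.0, 0
    for r in range(nrows):
        for c in range(ncols):
            zi = grid[r][c] - xbar
            for rr, cc in queen_neighbours(r, c, nrows, ncols):
                cross += zi * (grid[rr][cc] - xbar)
                w_total += 1
    variance_term = sum((v - xbar) ** 2 for v in values)
    return (n / w_total) * (cross / variance_term)


gradient = [[0, 1, 2, 3] for _ in range(4)]                  # smooth west-east trend
checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]  # alternating values
print(morans_i(gradient) > 0)      # True: similar values cluster
print(round(morans_i(checker), 3))  # -0.143
```

Positive values, such as the 0.37 reported for the ZIP code data, indicate that neighbouring units tend to have similar values, which violates the independence assumption of non-spatial models.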

To address potential endogeneity between COVID-19 cases, NDVI and parks, we used instrumental variable (IV) regression implemented with the two-stage least squares method54. Endogeneity could occur if an omitted variable, such as community investment or regulatory salience, were related both to COVID-19 cases (through health care investment, for example) and to NDVI or park access (through community investment in green infrastructure, for example) within communities. In the IV model, we used Built Up Intensity55 and the Aridity Index as instruments for NDVI. Both variables are correlated with NDVI but are less likely to be correlated with COVID-19 cases or with underlying omitted variables such as regulatory salience or community investment in health and green infrastructure. Two-stage least squares IV regression was conducted using the R package AER (Supplementary Table 8)54. These validation models confirm the basic relationships between NDVI, park access, the proportion of white people in a ZIP code and COVID-19 cases, while also accounting for the spatially structured and observational nature of our data. Given the nature and structure of our data, model coefficients and effect magnitudes are best interpreted using the negative binomial mixed effects model (Fig. 2 and Supplementary Table 1).
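The logic of two-stage least squares can be sketched on simulated data: stage 1 regresses the endogenous regressor on the instruments (plus any exogenous controls), and stage 2 regresses the outcome on the stage-1 fitted values. In the sketch, `x` plays the role of NDVI and the two columns of `z` stand in for instruments such as Built Up Intensity and the Aridity Index; this mapping, the linear outcome and all data are hypothetical. The sketch returns point estimates only, because the naive stage-2 standard errors are not valid IV standard errors (AER’s `ivreg` computes corrected ones).

```python
import numpy as np


def two_stage_least_squares(y, x_endog, instruments, exog=None):
    """Point estimates from two-stage least squares.

    Returns [intercept, controls..., coefficient on the endogenous regressor].
    Note: stage-2 OLS standard errors are not valid IV standard errors."""
    n = len(y)
    controls = np.ones((n, 1)) if exog is None else np.column_stack([np.ones(n), exog])
    # Stage 1: project the endogenous regressor onto instruments + controls.
    design = np.column_stack([controls, instruments])
    gamma, *_ = np.linalg.lstsq(design, x_endog, rcond=None)
    x_hat = design @ gamma
    # Stage 2: regress the outcome on the fitted values + controls.
    beta, *_ = np.linalg.lstsq(np.column_stack([controls, x_hat]), y, rcond=None)
    return beta


# Simulated data (not study data): an unobserved confounder u raises both the
# regressor x and the outcome y; the instruments z affect y only through x.
rng = np.random.default_rng(42)
n = 5_000
u = rng.normal(size=n)
z = rng.normal(size=(n, 2))
x = 0.8 * z[:, 0] + 0.5 * z[:, 1] + u + 0.5 * rng.normal(size=n)
y = 2.0 - 0.5 * x + u + 0.5 * rng.normal(size=n)

iv_slope = two_stage_least_squares(y, x, z)[-1]
ols_slope = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0][1]
print(f"2SLS slope: {iv_slope:.3f}  OLS slope: {ols_slope:.3f}  (true: -0.5)")
```

Here ordinary least squares is pulled towards zero by the confounder, while the 2SLS estimate recovers the simulated effect, which is the kind of bias the IV model is meant to guard against.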

Nature inequity

To evaluate the relationship between nature access and sociodemographic variables, we built two models, analysing park proximity and NDVI separately. Both models included median income, the proportion of white people and population density in the block group as covariates, and the NDVI model also included the aridity index. We used SAR models to account for spatial autocorrelation: at the block group scale, both the NDVI model (Moran’s I = 0.64, P < 0.001) and the park proximity model (Moran’s I = 0.62, P < 0.001) showed evidence of spatial autocorrelation. We used the queen contiguity criterion for the neighbourhood matrix and AIC to compare a spatial error model and a spatial lag model. We did not include a mixed or Durbin model for nature equity owing to problems with model convergence (Supplementary Table 6)53.