Drought Tolerance in Tropical Tree Species

Comparison of linear models, summarizing tropical tree growth during a severe drought
MEDS
R
Statistics
Author
Affiliation
Published

December 10, 2023

Motivation

Climate change, in conjunction with other environmental stressors, continue to threaten forests around the world. Mass tree deaths, fires, and deforestation can create a negative feedback loop, turning these well-recognized carbon sinks into a carbon source. Forests are also recognized for their role in preventing erosion, filtering air, and providing habitat and recreation.

Droughts have been proven to adversely affect forest health. Minimal and inconsistent water supply affects phenology and increases the chance of insect outbreaks or wildfires1. While the severity of these threats is not novel to me, I recognized that most of my knowledge on the topic was based on studies conducted in temperate forests. I was curious to learn more about how droughts impacted tropical species, which are less adapted to severe weather fluctuations.

Goal

To conduct a statistical assessment of tropical tree growth under drought conditions. This was accomplished using a series of linear regressions to model the relationship between tree growth and several climate variables. Using these regressions, I sought to examine if tree species with a widespread familial distribution (left) were better prepared for droughts than species with narrow familial distribution (right).

Distribution data from Global Biodiversity Information Facility (GBIF)

Distribution data from Global Biodiversity Information Facility (GBIF)

Data

About the Data

Data of tree diameter over time is publicly available from the DataONE repository. This dataset includes measures of tree diameter at breast height (dbh) for several tropical tree species in the Luquillo National Research Forest in Puerto Rico. The Luquillo Forest is part of the Long-Term Ecological Research Network (LTER), so substantial meteorological data for the area was also available. The diameter measurements were taken during a severe drought, lasting from 2013 to 2016.

Although there were several climate variables available, I reduced the number of variables in the models to minimize the possibility of over fitting. Mean average temperature and total rainfall were included to summarize the effects of drought. Photosynthetic photon flux density (ppfd) was also included. PPFD is a measurement of the number of photons between 400 and 700 nm in wavelength, and how frequently these waves this the leaf’s surface. This wavelength range is optimal for plant molecules to absorb, thus ppfd acts as a measurement of energy available for plants.2

Tree Species

In an effort to minimize error, modeling was limited to the four most sampled tree species. Species were also selected based on their taxonomic family, and whether that family had a wide or narrow distribution. Species and family information was provided in the metadata, but I will summarize it here:

Abbreviation Species Family Family Distribution
DACEXC Dacryodes excelsa Burseraceae narrow
MANBID Manilkara bidentata Sapotaceae narrow
CASARB Casearia arborea Salicaceae wide
INGLAU Inga laurina Fabaceae wide

Data Cleaning

Although data collection for tree diameters began in 2011, measurements were not consistently taken until June 2014. In order to create time series models without significant data gaps, I restricted the data from June 2014 to July 2016, when diameter data was collected monthly.

View Code
## =====================================
##           Clean Tree Data        ----
## =====================================

dbh <- dbh_raw %>% 
  
  # correct date object format
  mutate(date = as.Date(paste(year, doy, sep = "-"), "%Y-%j")) %>% 
  
  # filter to desired species
  filter(species %in% c("DACEXC", "MANBID", "CASARB", "INGLAU")) %>% 
  
  # calculate diameter averages per species
  select(-c("doy", "year", "flag")) %>% 
  group_by(date, species) %>% 
  summarise(mean_daily_dbh = mean(dbh, na.rm = TRUE)) %>% 
  ungroup() %>% 
  
  # filter to desired time range
  filter(date(date) >= "2014-06-01" & date(date) < "2016-08-01")
  
# add family and distribution information
dbh <- dbh %>% 
  mutate(family = case_when(species == 'CASARB' ~ 'Salicaceae',
                            species == 'MANBID' ~ 'Sapotaceae',
                            species == 'DACEXC' ~ 'Burseraceae',
                            species == 'INGLAU' ~ 'Fabaceae'),
         distribution = case_when(family == 'Salicaceae' ~ 'wide',
                                  family == 'Sapotaceae' ~ 'narrow',
                                  family == 'Burseraceae' ~ 'narrow',
                                  family == 'Fabaceae' ~ 'wide'),
         distribution = as.factor(distribution))

Climate data that was collected daily had far more missing values than data collected hourly, so I used hourly climate data to calculate daily and monthly averages.

View Code
## =====================================
##           Clean Climate Data     ----
## =====================================

# select and rename climate variables (1999-2014)
clim_1999_2014 <- clim_1999_2014 %>% 
  mutate(datetime = mdy_hm(datetime)) %>% 
  filter(date(datetime) >= "2014-06-01" & date(datetime) != "2015-01-01") %>% 
  select(c("datetime", "rain_mm", "temp_air_degrees_c", "ppfd_millimoles_m2_hour")) %>% 
  rename("temp_c" = "temp_air_degrees_c", 
         "ppfd_mmol_m2_hour" = "ppfd_millimoles_m2_hour")

# select and rename climate variables (2015-2023)
clim_2015_2023 <- clim_2015_2023 %>% 
  mutate(datetime = ymd_hms(datetime)) %>% 
  filter(date(datetime) < "2016-08-01") %>% 
  select(c("datetime", "rain_mm_tot", "air_tc_avg", "par_tot")) %>% 
  rename("rain_mm" = "rain_mm_tot",
         "temp_c" = "air_tc_avg",
         "ppfd_mmol_m2_hour" = "par_tot")

# bind to combine study time (June 2014 - July 2016)
hourly_conditions <- rbind(clim_1999_2014, clim_2015_2023) %>% 
  mutate(year_mo = yearmonth(datetime))

## =====================================
##         Calculate Averages       ----
## =====================================
# convert hourly to daily averages
daily_conditions <- hourly_conditions %>% 
  group_by(date = date(datetime)) %>% 
  summarise(tot_rain_mm = sum(rain_mm, na.rm = TRUE),
            avg_temp_c = mean(temp_c, na.rm = TRUE),
            avg_ppfd_mmol_m2 = mean(ppfd_mmol_m2_hour, na.rm = TRUE)) %>% 
   mutate(year_mo = yearmonth(date))

# create monthly conditions
monthly_conditions <- hourly_conditions %>% 
  group_by(year_mo) %>% 
  summarise(tot_rain_mm = sum(rain_mm, na.rm = TRUE),
            avg_temp_c = mean(temp_c, na.rm = TRUE),
            avg_ppfd_mmol_m2 = mean(ppfd_mmol_m2_hour, na.rm = TRUE))

# replace zeros w/NA, no data collected October 2014
monthly_conditions['tot_rain_mm'][monthly_conditions['tot_rain_mm'] == 0] <- NA

# replace NAs with the mean of previous and next month
monthly_conditions$tot_rain_mm <- na.approx(monthly_conditions$tot_rain_mm)

# remove raw data variables
rm(clim_1999_2014, clim_2015_2023, dbh_raw, hourly_conditions)

# create fully joined df
clim_dbh_full <- left_join(dbh, daily_conditions, by = c("date"))

Analysis - Static Time Series

In order to summarize the results I was most interested in, I developed the following function:

View Code
# create function to run model and clean results
tidy_results_fun <- function(model, data){
  
  # clean model outputs
  tidy <- broom::tidy(model)
  summary <- broom::glance(model)
  
  # combine desired results (keeping R/R^2 only once)
  tidy$adj_r_squared <- ifelse(1:nrow(tidy) == 1, summary$adj.r.squared, NA)
  tidy$r_squared <- ifelse(1:nrow(tidy) == 1, summary$r.squared, NA)
  
  # save results to environment
  assign(paste0(tolower(unique(data$species)), "_model"), tidy, envir = .GlobalEnv)
  
}

Simple linear regression

A simple time series regression looking at tree diameter (y) over time (x) for the four species was conducted first. This can be written out mathematically as:\[ \hat{y} = \beta_0 + \beta_1 x_1 \]

Where \(\beta_1\) is the average change in diameter given a one unit change in time (\(x_1\)), and \(\beta_0\) is the estimated diameter when time (\(x_1\)) is zero.

View Code
# apply function to all species
for (i in unique(clim_dbh_full$species)) {
  data <- clim_dbh_full %>% filter(species == i)
  model <- lm(mean_daily_dbh ~ date, data = data)
  tidy_results_fun(model, data)
}

# combine species based on distributions
wide <- rbind(inglau_model, casarb_model) %>% gt()
narrow <- rbind(dacexc_model, manbid_model) %>% gt()

# view for wide distribution
wide %>% 
  tab_header("Wide Distribution") %>% 
  tab_row_group(label = "I. laurina", rows = 1:2) %>% 
  tab_row_group(label = "C. arborea", rows = 3:4) %>% 
  tab_options(row_group.background.color = "grey90")
Wide Distribution
term estimate std.error statistic p.value adj_r_squared r_squared
C. arborea
(Intercept) -55.41282623 2.007586e+01 -2.7601721 8.456750e-03 0.70391672 0.71064589
date 0.01244958 1.211459e-03 10.2765188 3.747518e-13 NA NA
I. laurina
(Intercept) -32.27795546 1.112263e+02 -0.2902007 7.730576e-01 0.05806837 0.07947591
date 0.01293233 6.711850e-03 1.9267901 6.063085e-02 NA NA
View Code
# view results for narrow distribution
narrow %>% 
  tab_header("Narrow Distribution") %>% 
  tab_row_group(label = "D. excelsa", rows = 1:2) %>% 
  tab_row_group(label = "M. bidentata", rows = 3:4) %>% 
  tab_options(row_group.background.color = "grey90")
Narrow Distribution
term estimate std.error statistic p.value adj_r_squared r_squared
M. bidentata
(Intercept) 1.375030e+02 8.129896e+00 16.913252 1.287841e-20 0.12825930 0.1480716
date 1.341184e-03 4.905911e-04 2.733811 9.053291e-03 NA NA
D. excelsa
(Intercept) 2.224853e+02 2.421531e+01 9.187794 1.058230e-11 0.09937944 0.1198481
date 3.535865e-03 1.461251e-03 2.419753 1.983184e-02 NA NA

Multiple Linear Regression

This regression compares diameter over time, adding all climate variables as predictors to see if the models predictive capability improves. This can be written out mathematically as:\[ \hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4\]

Where \(x_2\) through \(x_4\) are the added climate variables.

View Code
# apply function to all species (now using multiple regression)
for (i in unique(clim_dbh_full$species)) {
  data <- clim_dbh_full %>% filter(species == i)
  model <- lm(mean_daily_dbh ~ date + tot_rain_mm + avg_temp_c + avg_ppfd_mmol_m2, data = data)
  tidy_results_fun(model, data)
}

# combine species based on distributions
wide <- rbind(inglau_model, casarb_model) %>% gt()
narrow <- rbind(dacexc_model, manbid_model) %>% gt()

# view for wide distribution
wide %>% 
  tab_header("Wide Distribution") %>% 
  tab_row_group(label = "I. laurina", rows = 1:2) %>% 
  tab_row_group(label = "C. arborea", rows = 3:4) %>% 
  tab_options(row_group.background.color = "grey90")
Wide Distribution
term estimate std.error statistic p.value adj_r_squared r_squared
C. arborea
tot_rain_mm 2.253365e-02 5.057704e-02 0.4455312 6.586038e-01 NA NA
avg_temp_c -2.175391e-01 1.320654e+00 -0.1647207 8.700852e-01 NA NA
I. laurina
(Intercept) 5.427322e+01 1.451973e+02 0.3737894 7.107535e-01 0.03662887 0.1329660
date 7.605868e-03 8.364894e-03 0.9092605 3.692601e-01 NA NA
avg_ppfd_mmol_m2 6.862826e-03 4.535362e-03 1.5131816 1.389640e-01 NA NA
(Intercept) -6.751628e+01 2.517764e+01 -2.6815968 1.098954e-02 0.70104536 0.7309408
date 1.249580e-02 1.450497e-03 8.6148412 2.846680e-10 NA NA
tot_rain_mm -1.394576e-03 8.770206e-03 -0.1590129 8.745474e-01 NA NA
avg_temp_c 4.835603e-01 2.290053e-01 2.1115683 4.173115e-02 NA NA
avg_ppfd_mmol_m2 -1.744404e-04 7.864450e-04 -0.2218087 8.257163e-01 NA NA
View Code
# view results for narrow distribution
narrow %>% 
  tab_header("Narrow Distribution") %>% 
  tab_row_group(label = "D. excelsa", rows = 1:5) %>% 
  tab_row_group(label = "M. bidentata", rows = 6:10) %>% 
  tab_options(row_group.background.color = "grey90")
Narrow Distribution
term estimate std.error statistic p.value adj_r_squared r_squared
M. bidentata
(Intercept) 1.551021e+02 9.074015e+00 17.0929988 7.383427e-19 0.30398727 0.3735885
date 3.676208e-04 5.227587e-04 0.7032322 4.864344e-01 NA NA
tot_rain_mm 1.958652e-03 3.160780e-03 0.6196737 5.393752e-01 NA NA
avg_temp_c -1.042075e-01 8.253344e-02 -1.2626099 2.148471e-01 NA NA
avg_ppfd_mmol_m2 1.058597e-03 2.834345e-04 3.7348907 6.484181e-04 NA NA
D. excelsa
(Intercept) 2.182800e+02 3.184127e+01 6.8552534 5.086965e-08 0.04474788 0.1402731
date 3.803115e-03 1.834392e-03 2.0732289 4.536645e-02 NA NA
tot_rain_mm -1.142779e-02 1.109137e-02 -1.0303323 3.097269e-01 NA NA
avg_temp_c 4.315974e-02 2.896148e-01 0.1490246 8.823659e-01 NA NA
avg_ppfd_mmol_m2 -1.071374e-03 9.945890e-04 -1.0772032 2.885552e-01 NA NA

Analysis - Dynamic Time Series

Climate impacts on tree growth are likely not immediate, in order to produce a more accurate model it would be best to add a lag to tree growth. Ideally, this regression would test if current diameter at breast height is dependent on past diameter *and* past climate variables.

View Code
# conduct dynamic regression for a single species

# apply function to all species (dynamic time series)
for (i in unique(clim_dbh_full$species)) {
  
  data <- clim_dbh_full %>% filter(species == i)
  
  model <- dynlm(mean_daily_dbh ~ tot_rain_mm + 
                   lag(tot_rain_mm, 1) + 
                   avg_temp_c + 
                   lag(avg_temp_c, 1) + 
                   avg_ppfd_mmol_m2 + 
                   lag(avg_ppfd_mmol_m2, 1), 
                 data = data)
  
  tidy_results_fun(model, data)
}

Upon running these models, I recognized they were producing abnormally high values of r-squared (between 0.96 and 1). Inflated values of the adjusted r-squared can be caused by autocorrelation, which violates one of the assumptions that must be met to run a dynamic linear model. The autocorrelation between previous and current diameter breast height was very high, and can be seen by creating an ACF plot:

View Code
# isolate single species for example
casarb <- clim_dbh_full %>% filter(species == "CASARB")

# plot autocorrelation
acf(casarb$mean_daily_dbh, lag.max = 12, na.action = na.pass,  main = 'Autocorrelation for CASARB Diameter')

Results

After running several regressions, I found that the addition of climate variables did not improve the model’s predictive capabilities. In fact, the simple time series for C. arborea (CASARB) was the most accurate, with a model that explained 70% of variation in diameter growth (y). CASARB has the largest increase in mean diameter height between the first and last measurement, and I believe this is what produced such a well-performing model. Given more time, I predict that time alone would be a much better indicator of diameter growth than what is demonstrated here. All species except M. bidentata (MANBID) experienced decreases in adjusted r-squared values when climate variables were added. This suggests climate variables caused overfitting, but the one exception is interesting. By adding climate variables, the MANBID model explained 30% of variation in diameter, as opposed to 12% in the initial model. While this percentage is still rather low, it suggests potential discrepancies between which tree species are more rapidly impacted by climate.

Limitations

Given more time, I would have analyzed the change in diameter at breast height between measurements instead of the diameter itself. This could reduce the impact of autocorrelation, so a dynamic time series could be run instead. Alternatively, an autoregressive moving average (or ARIMA) model would allow me to compare diameters regardless of the high autocorrelation. As mentioned, I suspect that data collected over a longer period would demonstrate a stronger linear relationship between tree diameter and time. To expand this analysis, growth rates during periods of drought could be compared to rates during favorable climate conditions.

Conclusion

Much more work needs to be done to properly assess the impact that climate change has on tropical rain forests, and how these impacts may be different from temperate forests. Data on trees physiological responses can be much harder to collect, leaving gaps in historic records. Similarly, species distribution data can be hard to difficult consolidate. Still, improvements in technology, climate science, and environmental empathy continue to lead us in the right direction.

Footnotes

  1. Effects of Drought on Forests and Rangelands | Climate Change Resource Center. https://www.fs.usda.gov/ccrc/topics/effects-drought-forests-and-rangelands. Accessed 10 Dec. 2023.↩︎

  2. Rabinowitz, Harold, and Suzanne Vogel, editors. “Chapter 11 - Style and Usage in Earth Science and Environmental Science.” The Manual of Scientific Style, Academic Press, 2009, pp. 427–68. ScienceDirect, https://doi.org/10.1016/B978-012373980-3.50015-0.↩︎

Citation

BibTeX citation:
@online{barajas2023,
  author = {Barajas, Briana},
  title = {Drought {Tolerance} in {Tropical} {Tree} {Species}},
  date = {2023-12-10},
  url = {https://github.com/briana-barajas/luquillo-tree-growth},
  langid = {en}
}
For attribution, please cite this work as:
Barajas, Briana. 2023. “Drought Tolerance in Tropical Tree Species.” December 10, 2023. https://github.com/briana-barajas/luquillo-tree-growth.