---
title: "Model continuous trait evolution"
author: "Maël Doré"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Model continuous trait evolution}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r set_options, include = FALSE}
knitr::opts_chunk$set(
eval = FALSE, # Chunks of codes will not be evaluated by default
collapse = TRUE,
comment = "#>",
fig.width = 7, fig.height = 5 # Set device size at rendering time (when plots are generated)
)
```
```{r setup, eval = TRUE, include = FALSE}
library(deepSTRAPP)
is_dev_version <- function (pkg = "deepSTRAPP")
{
# # Check if ran on CRAN
# not_cran <- identical(Sys.getenv("NOT_CRAN"), "true") # || interactive()
# Version number check
version <- tryCatch(as.character(utils::packageVersion(pkg)), error = function(e) "")
dev_version <- grepl("\\.9000", version)
# not_cran || dev_version
return(dev_version)
}
```
```{r adjust_dpi_CRAN, include = FALSE, eval = !is_dev_version()}
knitr::opts_chunk$set(
dpi = 72 # Lower DPI to save space
)
```
```{r adjust_dpi_dev, include = FALSE, eval = is_dev_version()}
knitr::opts_chunk$set(
dpi = 72 # Default DPI for the dev version
)
```
This vignette presents the different options available to model __continuous trait evolution__ within deepSTRAPP.
It builds mainly upon functions from R packages `[geiger]` and `[phytools]` to offer a simplified framework to model and visualize __continuous trait evolution__ on a __time-calibrated phylogeny__.
Please, cite also the initial R packages if you are using deepSTRAPP for this purpose.
For an example with __categorical__ data, see this vignette: `vignette("model_categorical_trait_evolution")`
For an example with __biogeographic__ data, see this vignette: `vignette("model_biogeographic_range_evolution")`
```{r model_trait_evolution_cont}
# ------ Step 0: Load data ------ #
## Load phylogeny and tip data
library(phytools)
data(eel.tree)
data(eel.data)
# Dataset of feeding mode and maximum total length from 61 species of elopomorph eels.
# Source: Collar, D. C., P. C. Wainwright, M. E. Alfaro, L. J. Revell, and R. S. Mehta (2014)
# Biting disrupts integration to spur skull evolution in eels. Nature Communications, 5, 5505.
# Extract body size
eel_tip_data <- stats::setNames(eel.data$Max_TL_cm,
rownames(eel.data))
plot(eel.tree)
ape::Ntip(eel.tree) == length(eel_tip_data)
## Check that trait tip data and phylogeny are named and ordered similarly
all(names(eel_tip_data) == eel.tree$tip.label)
# Reorder tip_data as in phylogeny
eel_tip_data <- eel_tip_data[eel.tree$tip.label]
# ------ Step 1: Prepare continuous trait data ------ #
## Goal: Map continuous trait evolution on the time-calibrated phylogeny
# 1.1/ Fit evolutionary models to trait data using Maximum Likelihood.
# 1.2/ Select the best fitting model comparing AICc.
# 1.3/ Infer ancestral characters estimates (ACE) at nodes.
# 1.4/ Infer ancestral states along branches using interpolation to produce a `contMap`.
library(deepSTRAPP)
# All these actions are performed by a single function: deepSTRAPP::prepare_trait_data()
?deepSTRAPP::prepare_trait_data()
# Model selection is performed internally with deepSTRAPP::select_best_model_from_geiger()
?deepSTRAPP::select_best_model_from_geiger()
# Plotting of the contMap is carried out with deepSTRAPP::plot_contMap()
?deepSTRAPP::plot_contMap()
## Macroevolutionary models for continuous traits
?geiger::fitContinuous() # For more in-depth information on the models available
## 7 models from geiger::fitContinuous() are available
# "BM": Brownian Motion model.
# Default model that assumes a Brownian random walk in the trait space.
# No trend. No time-dependence.
# Correlation structure is proportional to the extent of shared ancestry for pairs of species.
# sigma² ('sigsq') is the evolutionary rate that represents
# the expected variance in traits in proportion to time.
# 'z0' is the ancestral character estimate (trait value) at the root.
# "OU": Ornstein-Uhlenbeck model.
# Random walk with a central tendency (= an optimum).
# Attraction toward the central tendency is controlled by parameter 'alpha'.
# "EB": Early-burst model.
# Time-dependent model where rate of evolution increases or decreases exponentially
# through time under the model r[t] = r[0] * exp(a * t), where r[0] is the initial rate,
# 'a' is the rate change parameter, and t is time. By default, 'a' is set to be negative,
# therefore the model represents a decelerating rate of evolution.
# Parameter estimate boundaries can be change to allow accelerating evolution
# with positive values of 'a' (as in the ACDC model).
# "rate_trend": Linear trend model.
# Time dependent model where rate of evolution varies linearly with time
# (i.e., following a slope). If the 'slope' parameter is positive,
# rates are increasing, and conversely.
# "lambda": Pagel's lambda model
# Based on branch length transformation. Modulates the extent to which the phylogeny
# predicts covariance among trait values for species (i.e., the degree of phylogenetic signal).
# The model multiplies all internal branch lengths by 'lambda'.
# 'lambda' close to zero indicates no phylogenetic signal.
# 'lambda' close to one approximate the 'BM' model and indicates strong phylogenetic signal.
# "kappa": Pagel's kappa model.
# Based on branch length transformation. Punctuational (speciational) model where
# trait divergence is related to the number of speciation events between pairs of species.
# Assumes that speciation events are responsible for trait divergence.
# The model raises all branch lengths to an estimated power 'kappa'.
# 'kappa' close to zero indicates a strong dependency of trait evolution on speciation events.
# 'kappa' close to one approximate the 'BM' model and indicates strong phylogenetic signal.
# "delta": Pagel's delta model.
# Based on branch length transformation. Time-dependent model that modulates
# the relative contributions of early vs. late evolution in the tree.
# The model raises all node depths to an estimated power 'delta'.
# 'delta' close to one approximate the 'BM' model and indicates strong phylogenetic signal.
# 'delta' lower than one gives more weight to early evolution, thus represents decelerating rates.
# 'delta' higher than one gives more weight to late evolution, thus represents accelerating rates.
## Model trait data evolution
eel_cont_data <- prepare_trait_data(
tip_data = eel_tip_data,
trait_data_type = "continuous",
phylo = eel.tree,
seed = 1234, # Set seed for reproducibility
# All possible models
evolutionary_models = c("BM", "OU", "EB", "rate_trend", "lambda", "kappa", "delta"),
control = list(niter = 200), # Example of additional parameters that can be pass down
# to geiger::fitContinuous() to control parameter optimization.
res = 100, # To set the reoslution of the continuous mapping of trait value on the contMap
color_scale = c("darkgreen", "limegreen", "orange", "red"),
plot_map = FALSE,
# PDF_file_path = "./eel_contMap.pdf", # To export in PDF the contMap generated
return_ace = TRUE, # To include Ancestral Character Estimates (ACE) at nodes in the output
return_best_model_fit = TRUE, # To include the best model fit in the output
return_model_selection_df = TRUE, # To include the df for model selection in the output
verbose = TRUE) # To display progress
# ------ Step 2: Explore output ------ #
## Explore output
str(eel_cont_data, 1)
## Extract the contMap showing interpolated continuous trait evolution
## on the phylogeny as estimated from the model
eel_contMap <- eel_cont_data$contMap
# Plot with initial color_scale
plot_contMap(eel_contMap,
fsize = c(0.6, 1)) # Adjust tip label size
# Plot with updated color_scale
plot_contMap(contMap = eel_contMap,
color_scale = c("purple", "violet", "cyan", "blue"),
fsize = c(0.6, 1)) # Adjust tip label size
# The contMap is the main input needed to perform a deepSTRAPP run on continuous trait data.
## Extract the Ancestral Character Estimates (ACE) = trait values at nodes
eel_ACE <- eel_cont_data$ace
head(eel_ACE)
# This is a named numerical vector with names = internal node ID and values = ACE.
# It can be used as an optional input in deepSTRAPP run to provide
# perfectly accurate estimates for trait values at internal nodes.
## Explore summary of model selection
eel_cont_data$model_selection_df # Summary of model selection
# Models are compared using the corrected Akaike's Information Criterion (AICc)
# Akaike's weights represent the probability that a given model is the best among
# the set of candidate models, given the data.
# Here, the best model is Pagel's lambda
## Explore best model fit (Pagel's lambda)
eel_cont_data$best_model_fit # Summary of best model optimization by geiger::fitContinuous()
eel_cont_data$best_model_fit$opt # Parameter estimates and goodness-of-fit information
# 'lambda' = 0.636. The best model detects a moderate degree of phylogenetic signal.
## Inputs needed to run deepSTRAPP are the contMap (eel_contMap), and optionally,
## the tip_data (eel_tip_data), and the ACE (eel_ACE)
```
```{r model_trait_evolution_cont_eval, eval = TRUE, echo = FALSE, out.width = "100%"}
suppressWarnings(library(maps))
suppressWarnings(library(ape))
suppressWarnings(library(phytools))
data(eel.tree)
data(eel.data)
# Extract body size
eel_tip_data <- stats::setNames(eel.data$Max_TL_cm,
rownames(eel.data))
# Reorder tip_data as in phylogeny
eel_tip_data <- eel_tip_data[eel.tree$tip.label]
## Model trait data evolution
eel_cont_data <- suppressWarnings(prepare_trait_data(
tip_data = eel_tip_data,
trait_data_type = "continuous",
phylo = eel.tree,
seed = 1234, # Set seed for reproducibility
evolutionary_models = c("BM", "OU", "EB", "rate_trend", "lambda", "kappa", "delta"), # All possible models
control = list(niter = 200), # Example of additional parameters that can be pass down to geiger::fitContinuous() to control parameter optimization.
res = 100, # To set the resolution of the continuous mapping of trait value on the contMap
color_scale = c("darkgreen", "limegreen", "orange", "red"),
plot_map = FALSE,
# PDF_file_path = "./eel_contMap.pdf", # To export in PDF the contMap generated
return_ace = TRUE, # To include Ancestral Character Estimates (ACE) at nodes in the output
return_best_model_fit = TRUE, # To include the best model fit in the output
return_model_selection_df = TRUE, # To include the df for model selection in the output
verbose = FALSE)) # To display progress
# Plot with initial color_scale
plot_contMap(contMap = eel_cont_data$contMap,
fsize = c(0.6, 1)) # Adjust tip label size
# Plot with updated color_scale
plot_contMap(contMap = eel_cont_data$contMap,
color_scale = c("purple", "violet", "cyan", "blue"),
fsize = c(0.6, 1)) # Adjust tip label size
## Explore summary of model selection
eel_cont_data$model_selection_df # Summary of model selection
```