First Steps with BEAST 2
Tim Vaughan
(Slides adapted from a presentation by Louis du Plessis)
cEvo group, D-BSSE, ETH Zurich
Taming the BEAST, 18th Feb, 2019

Why are we here?

We all have one thing in common...

We all use (or want to use) BEAST 2 to analyse our data.

But how?

Bayesian inference

\begin{equation*} P(\color{blue}{\text{model params}}|\color{darkorange}{\text{data}}) = \frac{P(\color{darkorange}{\text{data}}|\color{blue}{\text{model params}})P(\color{blue}{\text{model params}})}{P(\color{darkorange}{\text{data}})} \end{equation*}
\begin{equation*} \text{Posterior} = \frac{\text{Likelihood}\times\text{Prior}}{\text{Model Evidence}} \end{equation*}
Prior
Probability of the model parameters/components before the data are taken into account. All parameters have priors, whether you specify them or not!
Posterior
Updated probability for the model parameters in light of the data.
Likelihood
Probability of the data given the parameters (defined by the model).
Model Evidence
Probability of the data given the model, averaged over all parameter values (weighted by the prior): used for Bayesian model selection.
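To make the four terms concrete, here is a minimal numerical sketch on a toy problem that is not a BEAST 2 model: a single hypothetical parameter p (the per-site probability that two sequences differ), a uniform prior, and a binomial likelihood for k = 6 differences observed in n = 20 sites. The evidence integral is approximated by a sum over a grid of p values.

```python
import math

# Toy data (hypothetical): k differing sites out of n aligned sites.
n, k = 20, 6

# Discretise p on a fine grid so the evidence integral becomes a sum.
grid = [(i + 0.5) / 1000 for i in range(1000)]
prior = [1.0 / len(grid)] * len(grid)  # uniform prior over the grid

# Likelihood P(data | p): binomial probability of k differences in n sites.
likelihood = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for p in grid]

# Model evidence P(data): likelihood averaged over the prior.
evidence = sum(l * pr for l, pr in zip(likelihood, prior))

# Posterior = likelihood x prior / evidence, evaluated at each grid point.
posterior = [l * pr / evidence for l, pr in zip(likelihood, prior)]

posterior_mean = sum(p * w for p, w in zip(grid, posterior))
print(f"evidence = {evidence:.4f}, posterior mean of p = {posterior_mean:.3f}")
```

With a uniform prior the evidence should come out close to 1/(n+1) and the posterior mean close to (k+1)/(n+2), the analytic Beta-binomial results.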

What goes into a BEAST 2 Model?

The Data

  • Typically an alignment of RNA or DNA sequences.
  • Can also be an alignment of amino acid sequences.
  • Often split into multiple "partitions" to allow specific sites to evolve at different rates, e.g.
    • Coding/non-coding sites,
    • Different codon positions (1+2 vs 3).
  • Not necessarily composed of genetic sequences!
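As an illustration of the codon-position split (1+2 vs 3), the sketch below partitions a small hypothetical coding alignment column by column. It assumes the reading frame starts at the first column and that all sequences have equal length; in practice this split is set up in BEAUti rather than by hand.

```python
# Hypothetical toy alignment of coding sequences: taxon name -> sequence.
alignment = {
    "taxonA": "ATGGCCAAATTT",
    "taxonB": "ATGGCTAAGTTC",
    "taxonC": "ATGGCAAAATTT",
}

def partition_by_codon_position(aln):
    """Split each sequence into a '1+2' partition and a '3' partition.

    Assumes the reading frame starts at column 0, so column i belongs to
    codon position (i % 3) + 1.
    """
    pos12, pos3 = {}, {}
    for taxon, seq in aln.items():
        pos12[taxon] = "".join(c for i, c in enumerate(seq) if i % 3 != 2)
        pos3[taxon] = "".join(c for i, c in enumerate(seq) if i % 3 == 2)
    return pos12, pos3

p12, p3 = partition_by_codon_position(alignment)
for taxon in alignment:
    print(taxon, p12[taxon], p3[taxon])
```

In this toy data all the variation sits at third codon positions, which is exactly the situation where giving that partition its own rate is useful.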

What goes into a BEAST 2 Model?

The Genealogy

The fundamental genealogical structure in BEAST 2 is the rooted time tree:

  • This tree is a "sampled" or "reconstructed" tree.
  • Displays relationships between sampled individuals/taxa only.
  • In contrast to the "full" tree including unsampled individuals/taxa.
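The relationship between the "full" and the "reconstructed" tree can be sketched as pruning: drop unsampled tips, then remove any internal node left with a single child, adding its branch length to that child. The dict-based tree representation and the `prune`, `leaf` and `newick` helpers are hypothetical, written only for this illustration.

```python
def leaf(name, length):
    """Hypothetical tip node: name plus length of the branch above it."""
    return {"name": name, "length": length, "children": []}

def prune(node, sampled):
    """Return a pruned copy of `node` keeping only tips in `sampled`.

    Returns None if no sampled tip lies below `node`. Internal nodes left
    with one child are removed and their branch length is added to that
    child, so node times are preserved.
    """
    if not node["children"]:
        return dict(node) if node["name"] in sampled else None
    kept = [c for c in (prune(ch, sampled) for ch in node["children"]) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        child = kept[0]  # always a fresh copy, so mutating it is safe
        child["length"] += node["length"]
        return child
    return {"name": node["name"], "length": node["length"], "children": kept}

def newick(node):
    """Write a (hypothetical) node dict as a Newick string."""
    if not node["children"]:
        return f"{node['name']}:{node['length']}"
    return "(" + ",".join(newick(c) for c in node["children"]) + f"):{node['length']}"

# "Full" tree including unsampled tips X and Y.
full = {"name": None, "length": 0.0, "children": [
    {"name": None, "length": 1.0, "children": [leaf("A", 1.0), leaf("X", 1.0)]},
    {"name": None, "length": 0.5, "children": [leaf("B", 1.5), leaf("Y", 1.5)]},
]}
reconstructed = prune(full, sampled={"A", "B"})
print(newick(full))           # ((A:1.0,X:1.0):1.0,(B:1.5,Y:1.5):0.5):0.0
print(newick(reconstructed))  # (A:2.0,B:2.0):0.0
```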

What goes into a BEAST 2 Model?

Different population dynamics produce different phylogenetic tree shapes

Demographic model

  • Describes the dynamics of the population (of individuals, species, etc.).
  • How does a population of organisms change through time?
  • How does the species richness change through time?
  • Described by the "tree prior".
  • Usually a birth-death or coalescent model.
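To see how a tree prior links population dynamics to tree shape, here is a minimal simulation sketch of the constant-size Kingman coalescent for samples taken at the same time. With k lineages the next merger happens at rate k(k-1)/2 divided by the (scaled) population size. This is an illustration, not BEAST 2's implementation: BEAST 2 evaluates the tree prior density rather than simulating from it. A growing or shrinking population would change these rates over time and hence the tree shape.

```python
import random

def simulate_coalescent_times(n_tips, pop_size, rng):
    """Ages of internal nodes (measured back from the present) for n_tips
    contemporaneous samples under a constant-size Kingman coalescent.

    pop_size is the effective population size scaled by generation time:
    with k lineages the waiting time to the next merger is exponential
    with rate k*(k-1)/2 / pop_size.
    """
    times = []
    t = 0.0
    for k in range(n_tips, 1, -1):
        rate = k * (k - 1) / 2 / pop_size
        t += rng.expovariate(rate)  # waiting time until the next merger
        times.append(t)
    return times  # n_tips - 1 ages, the last one is the root age

rng = random.Random(42)
times = simulate_coalescent_times(10, pop_size=1.0, rng=rng)
print([round(x, 3) for x in times])

# Under this model E[root age] = 2 * pop_size * (1 - 1/n_tips).
heights = [simulate_coalescent_times(10, 1.0, rng)[-1] for _ in range(20000)]
mean_height = sum(heights) / len(heights)
print(mean_height)  # should be close to 2 * (1 - 1/10) = 1.8
```

Note how the waiting times get longer towards the root, since fewer lineages remain to coalesce.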

What goes into a BEAST 2 Model?

Site model

  • Links the genome sequences to the genealogy.
  • We observe sequences at the tips, not their histories.
  • Multiple substitutions at the same site mean that not all substitutions are observed.
  • The site model describes the relative rates of substitution between characters, together with the equilibrium frequencies of those characters; this determines the probability of each character change over a given genetic distance.
  • May also permit site-to-site rate variation.
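The simplest nucleotide site model, Jukes-Cantor (JC69), assumes equal substitution rates and equal equilibrium frequencies. The sketch below computes its transition probabilities for a genetic distance d (expected substitutions per site) and shows the effect of multiple hits: the expected proportion of visibly different sites is always smaller than d and saturates at 0.75.

```python
import math

def jc69_matrix(d):
    """JC69 transition probabilities for genetic distance d.

    P[i][j] is the probability of observing nucleotide j at the end of a
    branch of length d that started with nucleotide i.
    """
    decay = math.exp(-4.0 * d / 3.0)
    same = 0.25 + 0.75 * decay
    diff = 0.25 - 0.25 * decay
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def expected_observed_difference(d):
    """Expected proportion of sites that differ after distance d.

    Smaller than d because repeated substitutions at one site are hidden.
    """
    return 1.0 - jc69_matrix(d)[0][0]

for d in (0.01, 0.1, 0.5, 1.0, 5.0):
    print(f"d = {d}: observed proportion of differing sites = "
          f"{expected_observed_difference(d):.3f}")
```

Richer models available in BEAST 2, such as HKY or GTR, relax the equal-rates and equal-frequencies assumptions, but the idea of mapping distance to change probabilities is the same.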

What goes into a BEAST 2 Model?

Clock model

  • Determines how quickly sequences are evolving along the tree.
  • Different tree edges may have the same or different clock rates:
    • Strict clock models: all edges have equal rates
    • Relaxed clock models: edges may take different rates
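The difference between strict and relaxed clocks can be sketched by converting branch durations (time) into branch lengths (expected substitutions per site). Under a strict clock every branch uses the same rate; the relaxed variant below draws an independent log-normal rate multiplier with mean 1 for each branch, in the spirit of an uncorrelated log-normal relaxed clock. The function name and its parameters are hypothetical.

```python
import random

def branch_lengths(durations, clock_rate, relaxed_sigma=None, rng=None):
    """Turn branch durations into expected substitutions per site.

    Strict clock (relaxed_sigma is None): length = clock_rate * duration.
    Relaxed clock (hypothetical sketch): each branch gets rate
    clock_rate * m, with m log-normal, mean 1, log-scale s.d. relaxed_sigma.
    """
    if relaxed_sigma is None:
        return [clock_rate * t for t in durations]
    rng = rng or random.Random()
    mu = -0.5 * relaxed_sigma**2  # gives E[m] = exp(mu + sigma^2 / 2) = 1
    return [clock_rate * rng.lognormvariate(mu, relaxed_sigma) * t
            for t in durations]

durations = [1.0, 2.0, 0.5, 3.0]  # hypothetical branch durations, e.g. years
strict = branch_lengths(durations, clock_rate=1e-3)
relaxed = branch_lengths(durations, clock_rate=1e-3, relaxed_sigma=0.5,
                         rng=random.Random(1))
print(strict)
print(relaxed)
```

Because the multiplier has mean 1, clock_rate keeps its meaning as the average rate across branches in both variants.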

Putting it all together...

Assume independence of various model components:
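A generic sketch of the resulting factorisation, using symbols introduced here for illustration only: $\mathcal{T}$ is the time tree, $D$ the alignment, and $\theta_{\text{site}}$, $\theta_{\text{clock}}$ and $\theta_{\text{tree}}$ the parameters of the site model, clock model and tree prior.

\begin{equation*} P(\mathcal{T}, \theta_{\text{site}}, \theta_{\text{clock}}, \theta_{\text{tree}} \mid D) \propto \underbrace{P(D \mid \mathcal{T}, \theta_{\text{site}}, \theta_{\text{clock}})}_{\text{Likelihood}}\; \underbrace{P(\mathcal{T} \mid \theta_{\text{tree}})}_{\text{Tree prior}}\; P(\theta_{\text{site}})\, P(\theta_{\text{clock}})\, P(\theta_{\text{tree}}) \end{equation*}

The model evidence $P(D)$ does not depend on the parameters, which is why MCMC only needs this product up to a constant.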

BEAST 2 In Practice

BEAST 2 Workflow

BEAUti: Bayesian Evolutionary Analysis Utility

Graphical tool for setting up a BEAST analysis.
Input
Genetic sequence data, together with other data sources (locations, sample times, etc.)
Output
Compact XML description of data, model and prior distributions.
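To give a feel for the XML that BEAUti produces, the sketch below parses a small hand-written fragment in the style of a BEAST 2 alignment block, using Python's standard `xml.etree.ElementTree`. The fragment is an assumption for illustration, not real BEAUti output: an actual file also describes the model, priors, operators and loggers.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment in the style of a BEAST 2 alignment block
# (illustrative only; not generated by BEAUti).
xml_text = """
<beast version="2.0">
  <data id="alignment" dataType="nucleotide">
    <sequence id="seq_A" taxon="A" value="ACGTACGT"/>
    <sequence id="seq_B" taxon="B" value="ACGTTCGT"/>
    <sequence id="seq_C" taxon="C" value="ACGAACGT"/>
  </data>
</beast>
"""

root = ET.fromstring(xml_text.strip())
data = root.find("data")
# Map each taxon name to its aligned sequence.
alignment = {seq.get("taxon"): seq.get("value") for seq in data.iter("sequence")}
print(data.get("id"), data.get("dataType"))
print(alignment)
```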

BEAST 2: Bayesian Evolutionary Analysis by Sampling Trees

  • Performs MCMC analyses of sequences under the selected sequence evolution and tree (e.g. epidemiological or speciation) models.
  • Similar to BEAST 1.8.4/1.10 but completely separate and generally incompatible.
  • BEAST2 and BEAST1 have a common origin, have much of the same functionality but have diverged over time.
  • BEAST2 has a modular design that makes it easy to extend.
Input
XML model description file
Output
  • (Trace) log file
  • Tree (log) file
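At its core, MCMC repeatedly proposes a change to the current state and accepts it with a probability based on the posterior ratio, logging the state at regular intervals. BEAST 2 uses many operators acting on trees and parameters; the minimal sketch below uses a single random-walk Metropolis-Hastings proposal on one parameter of a hypothetical toy binomial model (k = 6 differences in n = 20 sites, uniform prior), and writes a tab-separated trace whose column names are only loosely modelled on a BEAST trace log.

```python
import io
import math
import random

def log_posterior(p, n=20, k=6):
    """Unnormalised log posterior: binomial likelihood x uniform(0, 1) prior."""
    if not 0.0 < p < 1.0:
        return -math.inf  # the prior density is zero outside (0, 1)
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def metropolis_hastings(n_steps, step_size, log_every, rng, log_file):
    """Random-walk Metropolis-Hastings sampler writing a tab-separated trace."""
    p = 0.5
    current = log_posterior(p)
    log_file.write("Sample\tposterior\tp\n")
    samples = []
    for i in range(n_steps):
        if i % log_every == 0:  # log the current state, as a trace log would
            log_file.write(f"{i}\t{current:.4f}\t{p:.5f}\n")
            samples.append(p)
        # Symmetric proposal: the Hastings ratio is 1, so only the
        # posterior ratio enters the acceptance probability.
        proposal = p + rng.uniform(-step_size, step_size)
        proposed = log_posterior(proposal)
        if rng.random() < math.exp(min(0.0, proposed - current)):
            p, current = proposal, proposed
    return samples

rng = random.Random(1)
trace_log = io.StringIO()  # stands in for a .log file on disk
samples = metropolis_hastings(200_000, 0.1, 100, rng, trace_log)
kept = samples[len(samples) // 10:]  # discard the first 10% as burn-in
print(sum(kept) / len(kept))  # compare with the analytic posterior mean 7/22
```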

BEAST 2 Packages

  • BEAST 2 is organized into a central "core" together with a large number of separate "packages".
  • Packages can be developed by anybody - including you!
  • Can be directly integrated into BEAST 2 and updated frequently without waiting for a full BEAST 2 release.
  • Packages add new models or completely new functionality:
    • Phylogeography,
    • bacterial ARG inference,
    • morphological models,
    • model selection and averaging,
    • stochastic simulations,
    • ...
  • Install new packages through BEAUti.

Tracer (http://beast.community)

  • Analyse (parameter) log files from BEAST2 runs.
  • Assess mixing, effective sample size (ESS), autocorrelation time (ACT) and parameter correlations.
  • Provides an overview of posterior parameter estimates.
  • Compares several analyses.
  • Tracer is primarily a diagnostic tool: you will usually want to perform final analyses in a statistical package like R.
Input
One or more log files
Output
Insight
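The ESS reported by tools like Tracer is roughly the number of samples divided by the autocorrelation time (ACT). The sketch below estimates it after discarding the first 10% as burn-in (a commonly used default) and truncates the autocorrelation sum at the first non-positive lag; this truncation rule is a simple choice made here, and Tracer's exact estimator may differ. A frequently quoted rule of thumb is to aim for an ESS above 200 for every parameter.

```python
import random

def effective_sample_size(trace, burnin_fraction=0.1):
    """Estimate ESS = N / ACT after discarding burn-in.

    ACT = 1 + 2 * sum of autocorrelations, with the sum truncated at the
    first lag whose sample autocorrelation is not positive.
    """
    x = trace[int(len(trace) * burnin_fraction):]
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev) / n
    if var == 0.0:
        raise ValueError("trace has zero variance; ESS is undefined")
    act = 1.0
    for lag in range(1, n):
        cov = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0.0:
            break  # stop once the autocorrelation is no longer positive
        act += 2.0 * rho
    return n / act

rng = random.Random(3)
iid = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # independent draws
ar = [0.0]
for _ in range(1999):
    ar.append(0.9 * ar[-1] + rng.gauss(0.0, 1.0))  # poorly mixing chain
ess_iid = effective_sample_size(iid)
ess_ar = effective_sample_size(ar)
print(round(ess_iid), round(ess_ar))
```

Both traces have 1800 samples after burn-in, but the strongly autocorrelated one carries far less information, which is what a low ESS signals.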

TreeAnnotator (included with BEAST 2)

  • Analyses tree files from BEAST2 runs.
  • Produces a single summary tree, such as the maximum clade credibility (MCC) tree, with node annotations (including clade posterior probabilities).
  • Positions internal nodes according to the average MRCA times of the corresponding taxon sets in the trees file.
  • Note that the MCC tree is just a heuristic summary: it may produce negative edge lengths when topological uncertainty is large!
Input
Tree log file
Output
File containing annotated summary tree
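The MCC criterion can be sketched as follows: estimate each clade's posterior probability as its frequency among the sampled trees, then pick the sampled tree whose clades have the largest product of probabilities. This sketch only handles topologies written as nested tuples (a representation chosen for illustration); node heights, which TreeAnnotator assigns separately, are ignored.

```python
import math
from collections import Counter

def clades(tree):
    """Set of clades (frozensets of taxon names) of a rooted tree given as
    nested tuples, e.g. (("A", "B"), "C"). Leaves are strings."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf (taxon name)
            return frozenset([node])
        taxa = frozenset().union(*(walk(child) for child in node))
        found.add(taxa)  # every internal node defines one clade
        return taxa

    walk(tree)
    return found

def mcc_tree(trees):
    """Return the sampled tree maximising the product of its clade
    posterior probabilities, plus the clade probabilities themselves."""
    n = len(trees)
    counts = Counter(c for t in trees for c in clades(t))
    support = {c: counts[c] / n for c in counts}

    def log_credibility(tree):
        return sum(math.log(support[c]) for c in clades(tree))

    return max(trees, key=log_credibility), support

sampled_trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "D"), "C"),
    ((("A", "B"), "C"), "D"),
    (("A", "C"), ("B", "D")),
]
best, support = mcc_tree(sampled_trees)
print(best)                           # ((('A', 'B'), 'C'), 'D')
print(support[frozenset({"A", "B"})])  # 0.75
```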

FigTree (tree.bio.ed.ac.uk/software/figtree/)

  • Visualise trees from BEAST2 runs.
  • Annotate branches and nodes with probabilities and labels.
  • Many different tree visualisation styles: circular, unrooted, etc.
  • Allows highlighting of particular clades, colouring of edges and more.
Input
Tree file (e.g. TreeAnnotator output)
Output
Tree visualisation

IcyTree (icytree.org)

  • Similar to FigTree, but places an emphasis on quick visualisation rather than publication-quality output.
  • Only rooted rectangular style visualisation supported.
  • Rudimentary support for phylogenetic networks.
  • Web app: no installation required, just visit icytree.org.

ggtree (guangchuangyu.github.io/software/ggtree/)

  • R package for visualising trees, built on the grammar of graphics implemented in Hadley Wickham's ggplot2.
  • Works with BEAST2 tree files (and output from many other programs).
  • Makes it easy to annotate trees with the results of other analyses in R.

Summary: BEAST 2 Workflow

Summary: Tools of the Trade

BEAST2
Software implementing MCMC for model parameter and tree inference
BEAUti2
Part of the BEAST2 package, for setting up the input file (.xml)
Tracer
Analysis of BEAST1 and BEAST2 output files (.log)
TreeAnnotator
Analysis of BEAST2 output files (.trees)
FigTree, IcyTree, ggtree
Visualisation of trees (.trees)

Tutorial

Open the tutorial webpage at

taming-the-beast.org/tutorials/Introduction-to-BEAST2/

or follow the link from the workshop programme page.