Feast

Tasty additions to BEAST 2

GitHub Repository: https://github.com/tgvaughan/feast

1. Overview

This is a small BEAST 2 (http://www.beast2.org) package which contains additions to the core functionality. The common thread which connects these additions is that they all work to increase the flexibility of BEAST and to empower users to set up a broader range of analyses without needing to write additional Java code.

You won't find new models here, but you will find different ways to combine existing models and make certain tasks easier.

1.1. Installation

You can install directly from within BEAUti by opening the package manager via File→"Manage packages", selecting "feast" and clicking the "Install/Upgrade" button.

1.2. Building from source

To build the package you'll need OpenJDK 17 or later, the OpenJFX SDK (https://openjfx.io) and Apache ANT (http://ant.apache.org). Once these dependencies are in place, issue the following command from the root directory of this repository:

JAVA_FX_HOME=/path/to/openjfx/libs ant

(Replace /path/t/openjfx/libs with the location of the OpenJFX libraries.) The package archive will be left in the dist/ directory.

2. Features

This section aims to be a complete list of the features feast provides, organised by topic.

2.1. Importing data from files

:CUSTOMID: id-2-Tree-and-Alignment-Input/Output

2.1.1. TreeFromNewickFile & TreeFromNexusFile

These classes allow trees to be loaded at runtime from Newick files rather than stored in the BEAST 2 XML:

<tree spec='feast.fileio.TreeFromNewickFile' fileName="example.tree"
    IsLabelledNewick="true" adjustTipHeights="false" .../>

or

<tree spec='feast.fileio.TreeFromNexusFile' fileName="example.nexus"
    IsLabelledNewick="true" adjustTipHeights="false" .../>

The optional attribute treeIndex can be used to specify the index (starting from zero) of the tree to read from the file.

For Nexus files, only the first trees block encountered is processed.

Both classes are thin wrappers around TreeParser, so all of the various inputs supported by that class are also supported. In particular, setting adjustTipHeights to "true" causes all tip ages to be set to zero, which is sometimes desirable when reading in trees with only extant tips in order to avoid rounding errors. (Such rounding errors can cause problems with tree priors such as BDSKY which use the age of tips to classify samples as resulting from ρ-sampling or ψ-sampling.)

2.1.2. TraitSetFromTaxonSet

This class allows TraitSet objects to be configured directly from TaxonSet objects. This is similar in spirit to the "auto configuration" option provided by BEAUti to set up tree tip dates from information encoded in the sequence names. While it's not directly i/o related, it solves a common problem that occurs when trees are loaded from an external file.

Use this as you would usually use a TraitSet, but instead of including the trait assignments directly in the body, specify inputs indicating how to extract the trait values from the taxon names.

For instance:

<trait spec="feast.fileio.TraitSetFromTaxonSet"
       delimiter="|"
       everythingAfterLast="true"
       traitname="date"
       dateFormat="yyyy-M-dd">
       <taxa id="taxonSet" spec="TaxonSet" alignment="@alignment"/>
</trait>

Instead of everythingAfterLast="true", one can also use everythingBeforeFirst="true" or takeGroup="N" where N is the index of the group (delimited by the chosen delimiter).

2.1.3. TraitSetFromXSV

This class allows TraitSet objects to be initialised from external CSV/TSV files.

Use this as you would use TraitSet, but instead of including the trait assignments in the body, use the inputs to direct the class to load the trait assignments from a file.

For example:

<trait spec="feast.fileio.TraitSetFromXSV"
       fileName="example.csv" sep=",">
  <taxa spec="TaxonSet" alignment="@alignment"/>
</trait>

loads trait values for a taxon set from the CSV file example.csv, which would have the following format

first_taxon, 12.3
second_taxon, 7.8
third_taxon, 15.4
...

The order of the lines in the input file is not important, and any taxon names in the file which don't match the taxa provided by the taxa input are ignored.

By default, taxon names are read from the first column (index 0) of the input file and values are read from the second (index 1). This behaviour can be customized through the taxonNameCol and traitValueCol inputs, which accept the indicies of the columns to use. All other columns in the input file besides the two specified here are ignored, so you can read multiple trait sets from a single file simply by specifying different traitValueCol values.

Finally, you can set the optional input skipFirstRow to "true" to cause TraitSetFromXSV to ignore the first line of the input file. This can be useful for CSV/TSV files which include a header line.

2.1.4. TaxonSetFromTree & TipDatesFromTree

These classs allow TaxonSet and TraitSet objects to be initialized from trees. Like TraitSetFromTaxonSet, these classes don't themselves perform any file i/o, but makes it easier to work with trees loaded from external files, or simulated inline.

An example use of TaxonSetFromTree is:

<taxonset spec="feast.fileio.TaxonSetFromTree" tree="@tree"/>

which could be included as an input to a TraitSetFromXSV as described in the previous section.

TipDatesFromTree is used similarly to create a TraitSet object representing the ages of tips on an input tree:

<traitset id="tipDates" spec="feast.fileio.TipDatesFromTree" tree="@tree">
  <taxa id="taxonSet" spec="feast.fileio.TaxonSetFromTree" tree="@tree"/>
</traitset>

2.1.5. AlignmentFromNexus/Fasta

This class allows alignments to be loaded at runtime from Nexus or Fasta files rather than stored in the BEAST 2 XML:

<!-- Nexus: -->
<alignment spec='feast.fileio.AlignmentFromNexus' fileName="example.nexus"/>

<!-- Fasta: -->
<alignment spec='feast.fileio.AlignmentFromFasta' fileName="example.fasta"/>

Sequence files can also be referenced using a URL:

<!-- Nexus: -->
<alignment spec="feast.fileio.AlignmentFromNexus"
           url="https://example.com/alignment.nexus"/>

<!-- Fasta: -->
<alignment spec="feast.fileio.AlignmentFromFasta"
           url="https://example.com/alignment.fasta"/>

The Fasta import uses the sequence labels from the file as taxon labels.

Obviously these classes runs contrary to BEAST's philosophy of keeping everything necessary to run an analysis in the XML file. However, it is sometimes convenient to be able to do this. As a bonus, the xmlFileName attribute can be used to write the appropriate XML fragment to disk.

2.1.6. Initialize RealParameters from CSV/TSV files

When setting up complicated analyses, it can be useful to extract some (fixed) parameter values from an external CSV or TSV (or whatever-SV) file. The RealParameterFromXSV and RealParameterFromLabelledXSV classes can be used to do this.

RealParameterFromXSV simply initializes the parameter with values read (in row-major order) from a rectangular portion of the table in the specified file. For instance,

<migrationRates spec="feast.fileio.RealParameterFromXSV"
                fileName="migration_rates.csv"
                sep=","
                startRow="1" rowCount="3"
                startCol="1" colCount="3"/>

could be used to initialize a 3x3 matrix of migration rates in some analysis using elements in a CSV file. The inputs startRow and startCol both default to "0", while leaving either rowCount and colCount undefined will cause values to be taken up until the end of the corresponding row or column.

RealParameterFromLabelledXSV works simillarly, but assumes that the either or both of the first row or column contain labels, and allows these labels to be used to specify elements. For example,

<samplingProportions spec="feast.fileio.RealParameterFromLabelledXSV"
    fileName="sampling.tsv"
    colLabels="sampling"
    rowLabels="early_cretaceous, late_cretaceous, jurassic"/>

could be used to set sampling rates for geological epochs from some external data file. If rowLabels is omitted, the entire column will be read. Similarly, if colLabels is omitted, the entire row is read.

2.2. Function (array) creation and manipulation

This section describes feast classes which implement the BEAST 2 Function interface. A Function here refers to an array of real numbers. (The naming is an unfortunate historical decision.) Many BEASTObject classes implement Function, including RealParameter and Tree. (In the example of Tree, the real numbers represent node ages.)

Many of the classes here take one or more Function objects as input, and act to transform or combine the inputs to yield the value of the output Function. This means that instances of these classes can be chained so as to achieve a variety of fairly complex transformations.

Note that all of the classes discussed here also implement the Loggable interface, meaning that they can be passed directly as inputs to the <logger/> elements in the XML.

2.2.1. Numeric sequences as Functions

An instance of feast.function.Sequence is a Function which contains a sequence of numbers evenly separated between two bounds. For instance, the following XML fragment represents a sequence of 101 numbers evenly spaced between 0 and 10, i.e. 0,0.1,0.2,…,9.9,10:

<function spec="feast.function.Sequence" start="0" stop="10" length="101"/>

2.2.2. Function Slicing

An instance of the feast.function.Slice class is a Function which represent or more elements of another Function. This allows element-specific priors to be set, and individual elements to be logged.

Here's a simple example of using Slice to place a prior on a subset of elements of a RealParameter:

<distrib spec="Prior">
    <x spec="feast.function.Slice" arg="@samplingProportion" index="2" count="1"/>
    <distr spec="LogNormalDistributionModel" M="0" S="1"/>
</distrib>

The optional by input can be used to increase the step size between elements. For example, here's how one could use Slice to apply a prior to elements 1,3,5 of the same RealParamter:

<distrib spec="Prior">
    <x spec="feast.function.Slice" arg="@samplingProportion" index="1" count="3" by="2"/>
</distrib>

2.2.3. Function reversing

Instances of the feast.function.Reverse class are Funtion and Loggable which represent the elements of another Function but in reverse order.

2.2.4. Function concatenation

An instance of the feast.function.Concatenate class is a Function whose elements are composed of the elements of their input Function objects joined together. For example, here is a Function representing a piecewise constant reproductive number to be used as input to the BDSKY tree prior, but composed of distinct input parameters rather than a single real parameter with multiple dimensions:

<reproductiveNumber spec="feast.function.Concatenate">
    <arg spec="RealParameter" value="1.0"/>
    <arg spec="RealParameter" value="2.0"/>
</reproductiveNumber>

2.2.5. Function scaling

Instances of feast.function.Scale allow one Function to be scaled by another:

<scaledBirthRate spec="feast.function.Scale">
  <function spec="RealParameter" value="1.0 2.0 3.0"/>
  <scaleBy spec="RealParameter" value="1.5"/>
</scaledBirthRate>

2.2.6. Function interleaving

Instances of feast.function.Interleave represent the result of interleaving the elements of several input Function objects. For example:

<twoDemeBirthRate spec="feast.function.interleave">
  <arg spec="RealParameter" value="1 3 5 7"/>
  <arg spec="RealParameter" value="2 4 6 8"/>
</twoDemeBirthRate>

is as a Function with elements 1,2,3,4,5,6,7,8.

Finally, instances of feast.function.UniqueElementCount are Function objects which contain a single integer representing the number of unique values contained in the elements of an input Function. For example,

<uniqueValues spec="feast.function.UniqueElementCount">
    <arg spec="RealParameter" value="1.5 10 1.5 1.4"/>
</uniqueValues>

is a Function with a value of 3.

2.3. Mathematical Expressions

The following classes, which are found in feast.expressions, provide support for parsing of simple arithmetic expressions.

2.3.1. ExpCalculator

Takes an arithmetic expression and returns the result by acting as a Loggable or a Function. Binary operators can be applied to Functions (including Parameters) of different lengths as in R, with the result having the maximum of the two lengths, and the index into the shortest parameter being the result index modulo the length of that parameter. True and false are represented by 1.0 and 0.0 respectively.

Example expressions follow. (I and J are IDs of RealParameters with elements {1.0, 2.0, 3.0} and {5.0, 10.0}, respectively.)

Expression Loggable/Function value
2.5*I {2.5, 5.0, 7.5}
I+J {6.0, 12.0, 8.0}
exp(I[0]) {2.718…}
{I,J} {1.0, 2.0, 3.0, 5.0, 10.0}
1:3 {1.0, 2.0, 3.0}
J[1:0] {10.0, 5.0}
1[1:3] {1.0, 1.0, 1.0}
-log(exp(J))/(1.5+0.5*I[0]) {-2.5, -5.0}
sqrt(J) {2.236…, 3.162…}
J! {1.0, 2.0, 6.0}
2I {2.0, 4.0, 6.0}
sum(I) {6.0}
cumsum(I) {1.0, 3.0, 6.0}
diff(I) {1.0, 1.0}
{sin(0), cos(0)} {0.0, 1.0}
{1,2,3}2 {1.0, 4.0, 9.0}
theta(I-2) {0.0, 1.0, 1.0}
I < 3 && I >=1 {1.0, 1.0, 0.0}
I < 3 ? 12 : 13 {12.0, 12.0, 13.0}
{min(J), max(J), len(J)} {5.0, 10.0, 2.0}
sort({5,1,3}) {1.0, 3.0, 5.0}

Note that since each ExpCalculator is a Function object itself, it can be used as the input for other ExpCalculators.

Inspired by RPNcalculator by Joseph Heled (BEAST 1, BEAST 2 port by Denise Kuehnert). (This uses a parser generated by ANTLR, which is cheating.)

2.3.2. ExpCalculatorDistribution

This is basically identical to ExpCalculator, but extends the abstract Distribution class, allowing the definition of arbitrary distributions over Rn at runtime. Distributions can be specified in terms of their log or directly as probabilities (/probability densities).

The file ECdistribTest.xml in the examples/ folder is a simple example where a multi-variate normal distribution is constructed. This example also illustrates the nested expression concept mentioned above. The file ECdistribArraytest.xml in the same folder samples the same posterior, but uses array notation to express the target density.

2.3.3. ExpCalculatorParametricDistribution

This provides a cut-down version of the ExpCalculatorDistribution functionality that is compatible with beast.math.distributions.Prior.

2.4. RealParameter initialisation

2.4.1. Initialize RealParameters from Functions

The RealParameterFromFunction class provides a RealParameter which is initialized from a Function. Thus you can do things like

<samplingRateChangeTimes spec="feast.parameter.RealParameterFromFunction">
    <function spec="feast.expressions.ExpCalculator" value="{0,max(taxonTimes)+0.1}">
        <arg id="taxonTimes" spec="feast.function.TraitSetAsFunction" traitSet="@dateTrait"/>
    </function>
</samplingRateChangeTimes>

which initializes a BDSKY sampling rate change time array so that the change automatically falls just before the oldest sample.

Beware that RealParameterFromFunction only retrieves the Function values during initialization. Changes in the Function values after initialization will not be reflected in the parameter!

2.4.2. Initializing RealParameters using formatted times

Particularly in birth-death models, it is often desirable to specify several points in time as ages relative to the "present". This is done when setting up change times for birth-death parameters, and requires one to do a fairly large number of error-prone calculations involving converting date strings like "05/03/1980" and "27/02/1980" to fractional years before some later date such as "01/01/1990". Once converted, these values become extremely important yet difficult to interpret entries in the final BEAST XML file.

The TimeParameter class is intended to combat this. It can be used anywhere that a RealParameter is expected. For example, to create a RealParameter with the ages corresponding to the times above, use

<parameter spec="feast.parameter.TimeParameter"
           time="05/03/1980 27/02/1980"
           mostRecentSampleTime="01/01/1990"
           timeFormat="dd/MM/yyyy"/>

This is completely equivalent to

<parameter spec="RealParameter"
           value="9.825137 9.844262"/>

but much more readable.

In addition, the timeEarlier and timeLater inputs can be used to set the bounds of the parameter. These are equivalent to upper and lower inputs of RealParameter, but accept formatted times.

Note that the time format specification string is the same as used by the dateFormat input to TraitSet, and applies both to the values of time and the value of mostRecentSampleTime. If this format specifier is not provided, the time is assumed to have already been converted to fractional years, so the only action taken by TimeParameter is to subtract this value from mostRecentSampleTime.

2.4.3. Randomly initializing RealParameter values

Sometimes several attempts are needed to find a acceptable starting state for MCMC. An initializer for trees, capable of generating a new starting tree at each attempt is implemented in BEAST. RandomRealParameter allows you to sample new initial parameter values at each attempt. The distribution, from which to sample, can be the parameter prior or any other proper distribution.

For example, here we sample a birth rate value from a uniform distribution at each initialization attempt:

<init id="birthRateInitializer" spec="feast.parameter.RandomRealParameter"
    initial="@birthRate">
    <Uniform id="Uniform.birthRate" name="distr" lower="0" upper="1.5"/>
</init>

2.5. RealParameter and IntegerParameter Operators

2.5.1. Scaling subsets of RealParameter elements together

It's occasionally necessary to tie multiple elements of a RealParameter together, for instance to represent a vector of rates where a subset of these rates are conditioned to be identical. The BlockScaleOperator operator makes this possible. It behaves essentially identically to the standard BEAST 2 ScaleOperator, with inputs "parameter" and "indicator" specifying respectively the RealParameter to scale and the BooleanParameter indicating which elements to scale. The differences are that BlockScaleOperator

  1. always scales the selected elements together (i.e. using the same scale factor), and
  2. it correctly takes into account the number of degrees of freedom (i.e. unique element values) among the selected elements.

An example usage of this operator is as follows:

<operator spec="feast.operators.BlockScaleOperator" parameter="@paramToScale"
          scaleFactor="0.8" weight="1.0">
    <indicator spec="BooleanParameter" value="true true false false" estimate="false"/>
</operator>

Here "@paramToScale" references a RealParameter with 4 elements, and the operator jointly scales the first two.

An alternative approach is available in the SmartScaleOperator, which does not use indicator variables but instead considers elements with identical values to represent the same random variable. Thus, the following achieves the same result as above, with the added requirement that values to be scaled together must be initialised to the same value:

<state>
    <stateNode spec="RealParameter" id="paramToScale" value="1 1 2 2"/>
</state>

<operator spec="feast.operators.SmartScaleOperator" parameter="@paramToScale"
          scaleFactor="0.8" weight="1.0"/>

The SmartRealRandomWalkOperator acts analogously, but implements a random walk equivalent to BEAST's RealRandomWalkOperator applied only to unique elements.

2.5.2. Operators on IntegerParameters

Parameters of type IntegerParameter require special operators. Beast has two main operators for this type: IntUniformOperator (now deprecated in favour of UniformOperator) and IntRandomWalkOperator. Feast exptends this collection with a pair of operators allowing for joint operation on a subset of elements of a single IntegerParameter. The new operators are:

  1. BlockIntUniformOperator and
  2. BlockIntRandomWalkOperator.

These operators take the same inputs as the vanilla (non-block) BEAST 2 versions, but with the addition of an input named indicator which takes the same BooleanParameter input as the similarly-named input of the BlockScaleOperator described above.

2.6. Coalescent population functions

2.6.1. Time-shifted coalescent population functions

The ShiftedPopulationModel class allows you to create a new PopulationFunction which is a time-shifted view of an existing population function.

For example, here is a time-shifted exponential growth population mode:

<populationModel spec="feast.popmodels.ShiftedPopulationModel" offset="0.5">
    <populationModel spec="ExponentialGrowth" popSize="1.0" growthRate="1.0"/>
</populationModel>

At time 0, the population size of this shifted model is the same as the population size of the exponential growth model at time 0.5.

The offset input expects a Function value, so RealParameter objectss etc. can also be used here.

2.6.2. Compound coalescent population functions

BEAST 2 includes several commonly-used coalescent population models such as a constant population function and an exponential growth function. To add other population functions required writing a new Java class. The CompoundPopulationModel class lifts this restriction.

This class allows you to combine existing population functions (i.e. objects which implement the PopulationFunction interface) in a piecewise manner to construct arbitrarily complicated compound models.

For example, here is a piecewise-constant population function created by joining together three ConstantPopulation models:

<populationModel spec="feast.popmodels.CompoundPopulationModel">
    <populationModel spec="ConstantPopulation"> <popSize spec="RealParameter" value="5.0"/></populationModel>
    <populationModel spec="ConstantPopulation"> <popSize spec="RealParameter" value="10.0"/></populationModel>
    <populationModel spec="ConstantPopulation"> <popSize spec="RealParameter" value="2.0"/></populationModel>
    <changeTimes spec="RealParameter" value="1.0 3.0"/>
</populationModel>

The changeTimes input specifies the times of the transitions between models, and thus must have one less element than the number of constituent models.

Setting the boolean makeContinuous input to "true" (by default it is "false") will cause all component populations besides the most recent (i.e. first) to be scaled so that the resulting compound population function is continuous. While this is useless for compound population functions composed solely of constant components (as it will result in a single constant population function), this is useful for creating piecewise-exponential functions or mixtures between exponential and constant components.

The file piecewiseCoalescent.xml illustrates how this class can be used in a simple coalescent analysis.

2.6.3. Expression coalescent population functions

The ExpressionPopulationModel class allows you to specify a population function symbolically using a mathematical expression evaluated at runtime.

For example, here is a sinusoidally-varying population function with an exponentially-decaying amplitude:

<populationModel spec="feast.popmodels.ExpressionPopulationModel">
  exp(0.1t)*(cos(2*t) + 2)
</populationModel>

These models can also depend on one or more RealParameter or Function objects:

<populationModel spec="ExpressionPopulationModel">
  exp(gamma*t)*(cos(omega*t) + 2)
  <arg id="omega" spec="RealParameter" value="0.3"/>
  <arg idref="gamma"/>
</populationModel>

taking care that these objects have IDs defined, since these IDs are used to refer to the objects inside of the expression.

The optional isLog input (boolean) can be used to indicate that the value of the expression is the natural logarithm of the population size rather than the population size itself.

The optional maxTimeToConsider input (double) can be used to change the maximum time to be considered when numerically inverting the "intensity function" (integral of 1/N(t)), a procedure necessary for simulating coalescent trees using this model.

For a complete description of the arithmetic expression syntax which can be used to specify population dynamics, refer to 2.3.1.

Beware: ExpressionPopulationModel uses numerical integration and numerical root finding algorithms to compute the "intensity" (integral of 1/N(t)) and its inverse. The current implementation of this algorithm, particularly the inverse, is slow and susceptible to numerical errors. Take care when using this population model, particularly for quickly-changing population functions!

2.7. Model selection and averaging

2.7.1. DirichletProcessOperator and DirichletProcessPrior

The DirichletProcessPrior and DirichletProcessOperator allow the application of Dirichlet Process Priors (DPPs) to the elements of ~Function~s.

For example, suppose you have a model with parameter X to which you wish to apply such a prior. To do this, include the following prior for X in the target density:

<distribution spec="feast.modelselect.DirichletProcessPrior" parameter="@X">
    <scaleParameter id="dppScale" spec="RealParameter" value="1.0"/>
    <baseDistrib id="dppBase" spec="LogNormalDistributionModel" M="0.8" S="0.5"/>
</distribution>

You also need to include the following operator:

<operator spec="DirichletProcessOperator" parameter="@X"
          scaleParameter="@dppScale" baseDistrib="@dppBase" weight="1.0"/>

Other operators on X can be included, but only if they do not alter the number of elements of X which are identical.

2.7.2. ModelSelectionParameter

While model selection is extremely complicated in general, it can be almost trivial when the number of models you wish to consider is small and the models can be regarded as special cases of either each other (i.e. the are nested) or can easily expressed as special cases of a single general model.

The ModelSelectionParameter is created with this kind of BSSVS-style model selection in mind. It is a CalculationNode implementing Function which takes one or more parameters (also Function~s) as input, together with an ~IntegerParameter known as the selection index. The ModelSelectionParameter acts as though it is the parameter specified by the selection index.

The simplest application of this is to allow BEAST to select between models 1 and 2, where in model 1 a parameter (eg the extinction rate) is zero, while in model 2 the extinction rate has some unknown non-zero value.

We can implement this by defining the death rate to be a ModelSelectionParameter:

<deathRate id="deathRate" spec="feast.modelselect.ModelSelectionParameter">
    <parameter spec="RealParameter" value="0.0"/>
    <parameter id="nonZeroDeathRate" spec="RealParameter" value="1.0"/>
    <selectionIndices id="selectionIdx" spec="IntegerParameter" value="1" lower="0" upper="1"/>
</deathRate>

We then use this "deathRate" parameter as input to the tree prior, place some sensible prior on "nonZeroDeathRate", and add an operator to modify the "selectionIdx" IntegerParameter. The result is equivalent to placing a "spike and slab" prior on the death rate parameter, which is a mixture between a point mass at zero and whatever prior you place on "nonZeroDeathRate".

In addition to this simple case, if "selectionIndices" is an IntegerParameter with more than one element, the additional ModelSelectionParameter input, "thisIndex", can be used to specify which element to use to select the parameter. This allows you to sample across models where, for example, subsets of different parameters are allowed to be exactly equal. In this case, each model parameter is its own ModelSelectionParameter, shares exactly the same set of "parameter" and "selectionIndices" inputs, but has its own unique value of "thisIndex".

Beware that ModelSelectionParameter is that is only compatible with distributions which take Function objects rather than RealParameter objects as input.

2.7.3. Sampling from a discrete set of states

Occasionally we wish to characterise the posterior for a parameter or tree which is constrained to a small set of discrete traits. This can be accomplished using the DiscreteUniformJumpOperator.

For instance, suppose we have a small set of trees we want the MCMC to sample from. We can accomplish this as follows:

<operator spec="feast.modelselect.DiscreteUniformJumpOperator" x="@tree" weight="1">
  <possibility spec="TreeParser" adjustTipHeight="false" taxa="@alignment" estimate="false"
               newick="((((1:0.232 ..."/>
  <possibility spec="TreeParser" adjustTipHeight="false" taxa="@alignment" estimate="false"
               newick="(((5:1.2321 ..."/>
  <possibility spec="TreeParser" adjustTipHeight="false" taxa="@alignment" estimate="false"
               newick="((((3:4.222 ..."/>
</operator>

where the "…" are the remainder of the newick strings.

Take care when using this operator that the state nodes specified by the <possibility/> elements are actually possible values of x. I.e. parameters must have the same dimension and satisfy the bounds of x, and trees must correspond to the same set of taxa as x.

2.8. Simulation and Testing

2.8.1. Easy state simulation

One BEAST 2 idiom which is pervasive in my code at least is the existance of StateNodes that initialize their own state stochastically. RandomTree is an example of this in the core: this class is a Tree that initializes itself by drawing from a coalescent distribution with a given population function.

While such classes usually exist to choose a starting state for the MCMC chain, it is often necessary - particularly in the validation phase of model development - to simulate a large number of instances of this simulated state. This is usually because the simulation is often done from some known distribution, which can then be compared against the output of MCMC.

The class feast.simulation.GPSimulator (from "general purpose simulator") class simply makes it possible to configure this kind of simulation using a BEAST XML file. It is simply a beast.core.Runnable that takes a BEASTObject as input, a number of iterations and some loggers. For each iteration, it then simply calls the BEASTObject's initAndValidate method and tells the loggers to write their results.

See examples/SimulateCoalescentTrees.xml for an example of using GPSimulator to simulate 100 coalescent trees.

2.8.2. Sequence alignment simulation

While BEAST contains its own alignment simulator in beast.app.seqgen.SimulatedAlignment, this can be awkward to use as it requires pre-specifying the taxa to associate with the simulated alignment.

The feast.simulation.SimulatedAlignment class is a simpler alignment simulator which lifts this requirement. For example, the following will initialize an alignment by simulating sequences down the input tree using a Jukes-Cantor substitution model:

<alignment spec="feast.simulation.SimulatedAlignment"
           outputFileName="simulated_alignment.nexus"
           tree="@tree" sequenceLength="1000">
    <siteModel spec="SiteModel">
        <substModel spec="JukesCantor"/>
    </siteModel>
</alignment>

The class supports the same inputs as Alignment and beast.app.seqgen.SimulatedAlignment, with the exception that the data input should be omitted - it will cause strange errors if you include it. Also, only strict clocks are currently supported, so there is no branchRateModel input.

feast.simulation.SimulatedAlignment also allows you to specify a known ancestral sequence for the simulation. To do this, supply a Sequence to the startingSequence input as follows:

<alignment spec="feast.simulation.SimulatedAlignment"
           outputFileName="simulated_alignment.nexus"
           tree="@tree" sequenceLength="10">
  <startingSequence spec="Sequence" taxon="ancestor"> GATCGATCGA </startingSequence>
  <siteModel spec="SiteModel">
    <substModel spec="JukesCantor"/>
  </siteModel>
</alignment>

When used with startingSequence, the optional input startingSequenceAge can be used to specify the age relative to the most recent leaf at which the lineage above the root (the stem lineage) held the specified sequence. This might be useful for birth-death models which consider the stem lineage to be part of the tree.

2.8.3. DensityMapper

This class provides an easy way to produce plots of the variation in a particular Distribution (i.e. probability density) as a function of one or more parameters. The class extends the BEAST Runnable abstract class and thus takes the place of the MCMC class in a regular BEAST XML. To use DensityMapper you'll therefore need to create an XML with the following general form:

<beast version="2.0">
    <run spec="feast.mapping.DensityMapper">
        <distribution id="distrib" ... />

        <realParam spec="RealParameter" id="paramA"
            value="V_A" lower="LOWER_A" upper="UPPER_A"/>
        <steps spec="IntegerParameter"
            value="STEPS_A"/>

        <realParam spec="RealParameter" id="paramB"
            value="V_B1 V_B2" lower="LOWER_B" upper="UPPER_B"/>
        <steps spec="IntegerParameter"
            value="STEPS_B1 STEPS_B2"/>

        <logger spec="Logger" logEvery="1">
            <log idref="paramA"/>
            <log idref="paramB"/>
            <log idref="distrib"/>
        </logger>

        <!-- Additional (screen?) loggers -->
    </run>
</beast>

The <realParam> elements specify which parameters are varied, the upper and lower bounds of these parameters specify the range over which the parameter is varied, and the corresponding <steps> elements specify the number of steps to use for each parameter. When the number of steps is 1 the parameter is not varied and instead the contents of the value attribute is used.

When parameters are vectors, the associated <steps> parameter may have dimension 1 in which case all elements of the vector parameter are varied together (e.g. [0,0], [0.1,0.1], …, [1, 1]). Alternatively, the steps parameter may have dimension equal to that of the real parameter in which case each element of the parameter is varied independently with the corresponding number of steps (e.g. [0,0], [0, 0.1], …, [0, 1], [0.1, 0.0], …, [1, 1]). To keep one component of a vector parameter fixed, set the corresponding steps element to 1.

If, in addition to the <realParam> and <steps> elements a <logScale> BooleanParameter element is provided, DensityMapper will evaluate the densities at a logarithmic distribution of parameter values between the same bounds. I.e., the following snippet will cause paramA to be set to values logarithmically distributed between LOWER_A and UPPER_A:

<realParam spec="RealParameter" id="paramA"
    value="V_A" lower="LOWER_A" upper="UPPER_A"/>
<steps spec="IntegerParameter"
    value="STEPS_A"/>
<logScale spec="BooleanParameter"
    value="true"/>

Take care that if the log-scale element is provided for one parameter that it is also provided for all other parameters.

An example usage of DensityMapper which produces the variation in a coalescent tree density as a function of the population size is provided as DensityMapper.xml which can be found in the examples directory.

2.8.4. Sequence alignment shuffling

Occasionally, for the purpose of date randomization tests, it is useful to be able to randomize the association between taxon names and the input sequences. The ShuffledAlignment class makes this easy. Simply replace the existing alignment definition in the XML, which is usually something like

<alignment id="data" spec="Alignment"> ... </alignment>

with the folllowing:

<alignment id="data" spec="feast.simulation.ShuffledAlignment">
  <alignment spec="Alignment"> ... </alignment>
</alignment>

When doing this, take care that you move any ID defined on the original alignment object to the new ShuffledAlignment object.

2.9. Log File Post-processing

Occasionally it's useful to be able to resurrect BEAST states from trace and tree log files in order to perform further processing. This is useful, for instance, if you realise that it would have been useful to compute additional tree statistics. Or perhaps you'd like to stochastically map ancestral traits onto trees belonging to the posterior of a multi-type birth-death analysis. The feast.fileio.LogFileIterator class and its friends allow you to do these things and more.

The LogFileIterator class is a Runnable class, meaning that you use it in place of BEAST's usual MCMC class at the top level. A trivial example is as follows:

<beast version='2.0' namespace='feast.fileio.logfileiterator:beast.evolution.tree:beast.core.parameter'>

  <run spec="LogFileIterator">
    <logFileState spec="TraceLogFileState" logFileName="original_analysis.log">
      <logFileEntry spec="LogFileRealParameter" fieldName="hky.kappa">
        <fieldParameter id="kappa" spec="RealParameter" value="0.0"/>
      </logFileEntry>
    </logFileState>

    <logFileState spec="TreeLogFileState" logFileName="original_analysis.trees">
      <tree spec="beast.evolution.tree.Tree" id="tree"/>
    </logFileState>

    <logger logEvery="10000" fileName="processed.log">
      <log idref="recoveredParameter"/>
      <log id="treestat" spec="beast.evolution.tree.TreeStatLogger" tree="@tree"/>
    </logger>

    <logger logEvery="100000">
      <log idref="kappa"/>
      <log idref="treestat"/>
    </logger>
  </run>
</beast>

In this example (which is also provided in the examples directory as TreeLogFileIteratorTest.xml) we use the "hky.kappa" field of the original_analysis.log trace log file to set the value of a RealParameter named "kappa". Simultaneously, we use the trees in the tree log file original_analysis.trees to set the value of the Tree named "tree". The value of "kappa" and the height and length statistics of "tree" are then written to the file "processed.log" and the screen in the usual way using loggers.

2.10. Single BEAST 2.7 Jar files

Unlike earlier versions, BEAST 2.7 has a sophisticated system for controlling which package classes are available to use in analyses. While this is a great step forward, its implementation currently precludes building the stand-alone jar file bundles of BEAST 2 and its packages which are so helpful for collaboration using in-development BEAST 2 models.

Feast now includes a class feast.app.FeastMain which addresses this problem. It is a small wrapper around beastfx.app.beast.BeastMain which, before handing control of the execution over to BEAST, seaches through all of the directories and jar files in the runtime classpath for files named beast_services.xml. When it finds one of these files, FeastMain treats these the same as a standard version.xml file by registering all of the services it contains.

In order to use this wrapper to make a stand-alone jar file for a package, follow these steps:

  1. Copy (or symlink) any necessary version.xml files into a package directory below the src/ directory of your project. You can place them in any package directory you like, but be sure to name each file beast_services.xml. Beware: jar files share a single namespace, so avoid placing the file directly in the src/ directory itself when bundling multiple BEAST 2 projects into the jar.
  2. Build a jar file with the Main-Class: attribute in MANIFEST.MF set to feast.app.FeastMain. (You don't necessarily need to edit this file directly, most software that builds jar files will ask you at some point for the main class: simply enter feast.app.FeastMain when asked.)

To launch an analysis using this jar file (here we suppose it's named my.jar), use either

java -jar my.jar analysis.xml

or, if you are not using a combined openjdk+openjfx installation,

java --module-path=/path/to/openjfx/libs --add-modules javafx.fxml -jar my.jar analysis.xml

here /path/to/openjfx/libs specifies the location of the JavaFX libraries.

2.11. Feast Query: BEAST 2.7 Object Browser

Feast includes a simple GUI application called FeastQuery which can be launched using BEAST 2.7's AppLauncher utility or by selection "launch apps" from BEAUti's File menu.

FeastQuery provides information about the XML-compatible BEASTObjects in all of the packages currently installed on your system. It is intended to be used as an aid to writing and editing BEAST 2.7 XML scripts.

feastquery.png

The documentation provided by FeastQuery is mostly equivalent to that provided by the BEAST 2.7 core utility, DocMaker. The main advantages of FeastQuery are:

  1. It doesn't require you to create, store and continually update a directory full of HTML files.
  2. It doesn't require a separate web browser.
  3. It has a dynamic filtering facility that lets you quickly identify the class you're looking for.

3. License

The feast package is free (as in freedom). With the exception of the libraries on which it depends, it is made available under the terms of the GNU General Public Licence version 3, which is distributed with feast or may be found online at https://www.gnu.org/licenses/gpl-3.0.html.

Author: Tim Vaughan

Created: 2024-02-09 Fri 09:57