Feast
Tasty additions to BEAST 2
GitHub Repository: https://github.com/tgvaughan/feast
1. Overview
This is a small BEAST 2 (http://www.beast2.org) package which contains additions to the core functionality. The common thread which connects these additions is that they all work to increase the flexibility of BEAST and to empower users to set up a broader range of analyses without needing to write additional Java code.
You won't find new models here, but you will find different ways to combine existing models and make certain tasks easier.
1.1. Installation
You can install directly from within BEAUti by opening the package manager via File→"Manage packages", selecting "feast" and clicking the "Install/Upgrade" button.
1.2. Building from source
To build the package you'll need OpenJDK 17 or later, the OpenJFX SDK (https://openjfx.io) and Apache ANT (http://ant.apache.org). Once these dependencies are in place, issue the following command from the root directory of this repository:
JAVA_FX_HOME=/path/to/openjfx/libs ant
(Replace /path/t/openjfx/libs
with the location of the OpenJFX libraries.)
The package archive will be left in the dist/
directory.
2. Features
This section aims to be a complete list of the features feast provides, organised by topic.
2.1. Importing data from files
2.1.1. TreeFromNewickFile & TreeFromNexusFile
These classes allow trees to be loaded at runtime from Newick files rather than stored in the BEAST 2 XML:
<tree spec='feast.fileio.TreeFromNewickFile' fileName="example.tree" IsLabelledNewick="true" adjustTipHeights="false" .../>
or
<tree spec='feast.fileio.TreeFromNexusFile' fileName="example.nexus" IsLabelledNewick="true" adjustTipHeights="false" .../>
The optional attribute treeIndex
can be used to specify the index (starting
from zero) of the tree to read from the file.
For Nexus files, only the first trees block encountered is processed.
Both classes are thin wrappers around TreeParser
, so all of the various inputs
supported by that class are also supported. In particular, setting adjustTipHeights
to "true" causes all tip ages to be set to zero, which is sometimes desirable
when reading in trees with only extant tips in order to avoid rounding errors.
(Such rounding errors can cause problems with tree priors such as BDSKY
which use the age of tips to classify samples as resulting from
ρ-sampling or ψ-sampling.)
2.1.2. TraitSetFromTaxonSet
This class allows TraitSet
objects to be configured directly from
TaxonSet
objects. This is similar in spirit to the "auto
configuration" option provided by BEAUti to set up tree tip dates from
information encoded in the sequence names. While it's not directly
i/o related, it solves a common problem that occurs when trees are
loaded from an external file.
Use this as you would usually use a TraitSet
, but instead of including the
trait assignments directly in the body, specify inputs indicating how to extract
the trait values from the taxon names.
For instance:
<trait spec="feast.fileio.TraitSetFromTaxonSet" delimiter="|" everythingAfterLast="true" traitname="date" dateFormat="yyyy-M-dd"> <taxa id="taxonSet" spec="TaxonSet" alignment="@alignment"/> </trait>
Instead of everythingAfterLast="true"
, one can also use everythingBeforeFirst="true"
or
takeGroup="N"
where N is the index of the group (delimited by the chosen delimiter).
2.1.3. TraitSetFromXSV
This class allows TraitSet
objects to be initialised from external
CSV/TSV files.
Use this as you would use TraitSet
, but instead of including the
trait assignments in the body, use the inputs to direct the class
to load the trait assignments from a file.
For example:
<trait spec="feast.fileio.TraitSetFromXSV" fileName="example.csv" sep=","> <taxa spec="TaxonSet" alignment="@alignment"/> </trait>
loads trait values for a taxon set from the CSV file example.csv
, which
would have the following format
first_taxon, 12.3 second_taxon, 7.8 third_taxon, 15.4 ...
The order of the lines in the input file is not important, and any taxon
names in the file which don't match the taxa provided by the taxa
input
are ignored.
By default, taxon names are read from the first column (index 0) of
the input file and values are read from the second (index 1). This
behaviour can be customized through the taxonNameCol
and traitValueCol
inputs, which accept the indicies of the columns to use. All other
columns in the input file besides the two specified here are ignored,
so you can read multiple trait sets from a single file simply by
specifying different traitValueCol
values.
Finally, you can set the optional input skipFirstRow
to "true" to cause
TraitSetFromXSV
to ignore the first line of the input file. This can
be useful for CSV/TSV files which include a header line.
2.1.4. TaxonSetFromTree & TipDatesFromTree
These classs allow TaxonSet
and TraitSet
objects to be initialized from trees.
Like TraitSetFromTaxonSet
, these classes don't themselves perform any file i/o, but
makes it easier to work with trees loaded from external files, or simulated inline.
An example use of TaxonSetFromTree
is:
<taxonset spec="feast.fileio.TaxonSetFromTree" tree="@tree"/>
which could be included as an input to a TraitSetFromXSV
as
described in the previous section.
TipDatesFromTree
is used similarly to create a TraitSet
object
representing the ages of tips on an input tree:
<traitset id="tipDates" spec="feast.fileio.TipDatesFromTree" tree="@tree"> <taxa id="taxonSet" spec="feast.fileio.TaxonSetFromTree" tree="@tree"/> </traitset>
2.1.5. AlignmentFromNexus/Fasta
This class allows alignments to be loaded at runtime from Nexus or Fasta files rather than stored in the BEAST 2 XML:
<!-- Nexus: --> <alignment spec='feast.fileio.AlignmentFromNexus' fileName="example.nexus"/> <!-- Fasta: --> <alignment spec='feast.fileio.AlignmentFromFasta' fileName="example.fasta"/>
Sequence files can also be referenced using a URL:
<!-- Nexus: --> <alignment spec="feast.fileio.AlignmentFromNexus" url="https://example.com/alignment.nexus"/> <!-- Fasta: --> <alignment spec="feast.fileio.AlignmentFromFasta" url="https://example.com/alignment.fasta"/>
The Fasta import uses the sequence labels from the file as taxon labels.
Obviously these classes runs contrary to BEAST's philosophy of keeping
everything necessary to run an analysis in the XML file. However, it
is sometimes convenient to be able to do this. As a bonus, the
xmlFileName
attribute can be used to write the appropriate XML
fragment to disk.
2.1.6. Initialize RealParameters from CSV/TSV files
When setting up complicated analyses, it can be useful to extract some
(fixed) parameter values from an external CSV or TSV (or whatever-SV) file.
The RealParameterFromXSV
and RealParameterFromLabelledXSV
classes can be
used to do this.
RealParameterFromXSV
simply initializes the parameter with values read
(in row-major order) from a rectangular portion of the table in the specified
file. For instance,
<migrationRates spec="feast.fileio.RealParameterFromXSV" fileName="migration_rates.csv" sep="," startRow="1" rowCount="3" startCol="1" colCount="3"/>
could be used to initialize a 3x3 matrix of migration rates in some analysis
using elements in a CSV file. The inputs startRow
and startCol
both
default to "0", while leaving either rowCount
and colCount
undefined
will cause values to be taken up until the end of the corresponding row
or column.
RealParameterFromLabelledXSV
works simillarly, but assumes that the either
or both of the first row or column contain labels, and allows these labels
to be used to specify elements. For example,
<samplingProportions spec="feast.fileio.RealParameterFromLabelledXSV" fileName="sampling.tsv" colLabels="sampling" rowLabels="early_cretaceous, late_cretaceous, jurassic"/>
could be used to set sampling rates for geological epochs from some
external data file. If rowLabels
is omitted, the entire column
will be read. Similarly, if colLabels
is omitted, the entire row
is read.
2.2. Function (array) creation and manipulation
This section describes feast classes which implement the BEAST 2 Function
interface.
A Function
here refers to an array of real numbers. (The naming is an unfortunate
historical decision.) Many BEASTObject
classes implement Function
, including RealParameter
and Tree
. (In the example of Tree
, the real numbers represent node ages.)
Many of the classes here take one or more Function
objects as input,
and act to transform or combine the inputs to yield the value of the
output Function
. This means that instances of these classes can be chained
so as to achieve a variety of fairly complex transformations.
Note that all of the classes discussed here also implement the Loggable
interface,
meaning that they can be passed directly as inputs to the <logger/>
elements in the
XML.
2.2.1. Numeric sequences as Functions
An instance of feast.function.Sequence
is a Function
which contains a sequence
of numbers evenly separated between two bounds. For instance, the following
XML fragment represents a sequence of 101 numbers evenly spaced between 0 and 10,
i.e. 0,0.1,0.2,…,9.9,10:
<function spec="feast.function.Sequence" start="0" stop="10" length="101"/>
2.2.2. Function Slicing
An instance of the feast.function.Slice
class is a Function
which
represent or more elements of another Function
. This allows
element-specific priors to be set, and individual elements to be logged.
Here's a simple example of using Slice
to place a prior on a subset of elements
of a RealParameter
:
<distrib spec="Prior"> <x spec="feast.function.Slice" arg="@samplingProportion" index="2" count="1"/> <distr spec="LogNormalDistributionModel" M="0" S="1"/> </distrib>
The optional by
input can be used to increase the step size between elements. For
example, here's how one could use Slice
to apply a prior to elements 1,3,5 of the
same RealParamter
:
<distrib spec="Prior"> <x spec="feast.function.Slice" arg="@samplingProportion" index="1" count="3" by="2"/> </distrib>
2.2.3. Function reversing
Instances of the feast.function.Reverse
class are
Funtion
and Loggable
which represent the elements of another
Function
but in reverse order.
2.2.4. Function concatenation
An instance of the feast.function.Concatenate
class is a Function
whose elements are composed of the elements of their
input Function
objects joined together. For example, here is a Function
representing a piecewise constant reproductive number to be used as
input to the BDSKY tree prior, but composed of distinct input parameters
rather than a single real parameter with multiple dimensions:
<reproductiveNumber spec="feast.function.Concatenate"> <arg spec="RealParameter" value="1.0"/> <arg spec="RealParameter" value="2.0"/> </reproductiveNumber>
2.2.5. Function scaling
Instances of feast.function.Scale
allow one Function
to be scaled
by another:
<scaledBirthRate spec="feast.function.Scale"> <function spec="RealParameter" value="1.0 2.0 3.0"/> <scaleBy spec="RealParameter" value="1.5"/> </scaledBirthRate>
2.2.6. Function interleaving
Instances of feast.function.Interleave
represent the result of interleaving
the elements of several input Function
objects. For example:
<twoDemeBirthRate spec="feast.function.interleave"> <arg spec="RealParameter" value="1 3 5 7"/> <arg spec="RealParameter" value="2 4 6 8"/> </twoDemeBirthRate>
is as a Function
with elements 1,2,3,4,5,6,7,8.
Finally, instances of feast.function.UniqueElementCount
are
Function
objects which contain a single integer representing the
number of unique values contained in the elements of an input
Function
. For example,
<uniqueValues spec="feast.function.UniqueElementCount"> <arg spec="RealParameter" value="1.5 10 1.5 1.4"/> </uniqueValues>
is a Function
with a value of 3.
2.3. Mathematical Expressions
The following classes, which are found in feast.expressions
, provide
support for parsing of simple arithmetic expressions.
2.3.1. ExpCalculator
Takes an arithmetic expression and returns the result by acting as a Loggable or a Function. Binary operators can be applied to Functions (including Parameters) of different lengths as in R, with the result having the maximum of the two lengths, and the index into the shortest parameter being the result index modulo the length of that parameter. True and false are represented by 1.0 and 0.0 respectively.
Example expressions follow. (I and J are IDs of RealParameters with elements 1.0,2.0,3.0 and 5.0,10.0, respectively.)
Expression | Loggable/Function value |
---|---|
2.5*I |
{2.5, 5.0, 7.5} |
I+J |
{6.0, 12.0, 8.0} |
exp(I[0]) |
{2.718…} |
{I,J} |
{1.0, 2.0, 3.0, 5.0, 10.0} |
1:3 |
{1.0, 2.0, 3.0} |
J[1:0] |
{10.0, 5.0} |
1[1:3] |
{1.0, 1.0, 1.0} |
-log(exp(J))/(1.5+0.5*I[0]) |
{-2.5, -5.0} |
sqrt(J) |
{2.236…, 3.162…} |
J! |
{1.0, 2.0, 6.0} |
2^I |
{2.0, 4.0, 6.0} |
sum(I) |
{6.0} |
cumsum(I) |
{1.0, 3.0, 6.0} |
diff(I) |
{1.0, 1.0} |
{sin(0), cos(0)} |
{0.0, 1.0} |
{1,2,3}^2 |
{1.0, 4.0, 9.0} |
theta(I-2) |
{0.0, 1.0, 1.0} |
I < 3 && I >=1 |
{1.0, 1.0, 0.0} |
I < 3 ? 12 : 13 |
{12.0, 12.0, 13.0} |
{min(J), max(J), len(J)} |
{5.0, 10.0, 2.0} |
sort({5,1,3}) |
{1.0, 3.0, 5.0} |
Note that since each ExpCalculator is a Function object itself, it can be used as the input for other ExpCalculators.
As an example, here is a use of ExpCalculator
to transform a
parameter representing the reproductive number to a parameter
representing the birth rate of some birth-death model:
<reproductiveNumber spec="feast.expressions.ExpCalculator" value="birthRate/(deathRate+sampRate)"> <arg idref="birthRate"/> <arg idref="deathRate"/> </reproductiveNumber>
Notice that the expression is supplied as the value
input. The
arg
inputs provide one or more Function
objects to be referenced
within the expression by their IDs.
When used exclusively for logging, the value caching behaviour
ExpCalculator
uses by default to reduce unnecessary computation can
result in the expression not being updated as necessary. To avoid
this, set the boolean useCaching
input to false
.
2.3.2. ExpCalculatorDistribution
This is basically identical to ExpCalculator, but extends the abstract Distribution class, allowing the definition of arbitrary distributions over Rn at runtime. Distributions can be specified in terms of their log or directly as probabilities (/probability densities).
The file ECdistribTest.xml
in the examples/ folder is a simple
example where a multi-variate normal distribution is constructed.
This example also illustrates the nested expression concept mentioned
above. The file ECdistribArraytest.xml
in the same folder samples
the same posterior, but uses array notation to express the target
density.
2.3.3. ExpCalculatorParametricDistribution
This provides a cut-down version of the ExpCalculatorDistribution functionality that is compatible with beast.math.distributions.Prior.
2.4. RealParameter initialisation
2.4.1. Initialize RealParameters from Functions
The RealParameterFromFunction
class provides a RealParameter
which is
initialized from a Function
. Thus you can do things like
<samplingRateChangeTimes spec="feast.parameter.RealParameterFromFunction"> <function spec="feast.expressions.ExpCalculator" value="{0,max(taxonTimes)+0.1}"> <arg id="taxonTimes" spec="feast.function.TraitSetAsFunction" traitSet="@dateTrait"/> </function> </samplingRateChangeTimes>
which initializes a BDSKY sampling rate change time array so that the change automatically falls just before the oldest sample.
Beware that RealParameterFromFunction
only retrieves the Function
values
during initialization. Changes in the Function
values after initialization
will not be reflected in the parameter!
2.4.2. Initializing RealParameters using formatted times
Particularly in birth-death models, it is often desirable to specify several points in time as ages relative to the "present". This is done when setting up change times for birth-death parameters, and requires one to do a fairly large number of error-prone calculations involving converting date strings like "05/03/1980" and "27/02/1980" to fractional years before some later date such as "01/01/1990". Once converted, these values become extremely important yet difficult to interpret entries in the final BEAST XML file.
The TimeParameter
class is intended to combat this. It can be used
anywhere that a RealParameter
is expected. For example, to create
a RealParameter
with the ages corresponding to the times above, use
<parameter spec="feast.parameter.TimeParameter" time="05/03/1980 27/02/1980" mostRecentSampleTime="01/01/1990" timeFormat="dd/MM/yyyy"/>
This is completely equivalent to
<parameter spec="RealParameter" value="9.825137 9.844262"/>
but much more readable.
In addition, the timeEarlier
and timeLater
inputs can be used to
set the bounds of the parameter. These are equivalent to upper
and
lower
inputs of RealParameter
, but accept formatted times.
Note that the time format specification string is the same as used by
the dateFormat
input to TraitSet
, and applies both to the values
of time
and the value of mostRecentSampleTime
. If this format
specifier is not provided, the time is assumed to have already been
converted to fractional years, so the only action taken by
TimeParameter is to subtract this value from mostRecentSampleTime
.
2.5. RealParameter and IntegerParameter Operators
2.5.1. Scaling subsets of RealParameter elements together
It's occasionally necessary to tie multiple elements of a RealParameter
together, for instance to represent a vector of rates where a subset of
these rates are conditioned to be identical. The BlockScaleOperator
operator makes this possible. It behaves essentially identically to the
standard BEAST 2 ScaleOperator
, with inputs "parameter" and "indicator"
specifying respectively the RealParameter
to scale and the BooleanParameter
indicating which elements to scale. The differences are that BlockScaleOperator
- always scales the selected elements together (i.e. using the same scale factor), and
- it correctly takes into account the number of degrees of freedom (i.e. unique element values) among the selected elements.
An example usage of this operator is as follows:
<operator spec="feast.operators.BlockScaleOperator" parameter="@paramToScale" scaleFactor="0.8" weight="1.0"> <indicator spec="BooleanParameter" value="true true false false" estimate="false"/> </operator>
Here "@paramToScale" references a RealParameter
with 4 elements, and the
operator jointly scales the first two.
An alternative approach is available in the SmartScaleOperator
, which
does not use indicator variables but instead considers elements with
identical values to represent the same random variable. Thus, the following
achieves the same result as above, with the added requirement that values
to be scaled together must be initialised to the same value:
<state> <stateNode spec="RealParameter" id="paramToScale" value="1 1 2 2"/> </state> <operator spec="feast.operators.SmartScaleOperator" parameter="@paramToScale" scaleFactor="0.8" weight="1.0"/>
The SmartRealRandomWalkOperator
acts analogously, but implements a random
walk equivalent to BEAST's RealRandomWalkOperator
applied only to unique
elements.
2.5.2. Operators on IntegerParameters
Parameters of type IntegerParameter
require special operators. Beast has
two main operators for this type: IntUniformOperator
(now deprecated in favour
of UniformOperator
) and IntRandomWalkOperator
. Feast exptends this
collection with a pair of operators allowing for joint operation on a subset
of elements of a single IntegerParameter
. The new operators are:
BlockIntUniformOperator
andBlockIntRandomWalkOperator
.
These operators take the same inputs as the vanilla (non-block) BEAST 2 versions,
but with the addition of an input named indicator
which takes the same
BooleanParameter
input as the similarly-named input of the BlockScaleOperator
described above.
2.6. Coalescent population functions
2.6.1. Time-shifted coalescent population functions
The ShiftedPopulationModel
class allows you to create a new
PopulationFunction
which is a time-shifted view of an existing population
function.
For example, here is a time-shifted exponential growth population mode:
<populationModel spec="feast.popmodels.ShiftedPopulationModel" offset="0.5"> <populationModel spec="ExponentialGrowth" popSize="1.0" growthRate="1.0"/> </populationModel>
At time 0, the population size of this shifted model is the same as the population size of the exponential growth model at time 0.5.
The offset
input expects a Function
value, so RealParameter
objectss etc. can
also be used here.
2.6.2. Compound coalescent population functions
BEAST 2 includes several commonly-used coalescent population models such
as a constant population function and an exponential growth function.
To add other population functions required writing a new Java class.
The CompoundPopulationModel
class lifts this restriction.
This class allows you to combine existing population functions (i.e.
objects which implement the PopulationFunction
interface) in a piecewise
manner to construct arbitrarily complicated compound models.
For example, here is a piecewise-constant population function created
by joining together three ConstantPopulation
models:
<populationModel spec="feast.popmodels.CompoundPopulationModel"> <populationModel spec="ConstantPopulation"> <popSize spec="RealParameter" value="5.0"/></populationModel> <populationModel spec="ConstantPopulation"> <popSize spec="RealParameter" value="10.0"/></populationModel> <populationModel spec="ConstantPopulation"> <popSize spec="RealParameter" value="2.0"/></populationModel> <changeTimes spec="RealParameter" value="1.0 3.0"/> </populationModel>
The changeTimes
input specifies the times of the transitions between
models, and thus must have one less element than the number of constituent models.
Setting the boolean makeContinuous
input to "true" (by default it is "false")
will cause all component populations besides the most recent (i.e. first) to be
scaled so that the resulting compound population function is continuous.
While this is useless for compound population functions composed solely of
constant components (as it will result in a single constant population function),
this is useful for creating piecewise-exponential functions or mixtures between
exponential and constant components.
The file piecewiseCoalescent.xml
illustrates how this class can be used in a
simple coalescent analysis.
2.6.3. Expression coalescent population functions
The ExpressionPopulationModel
class allows you to specify a
population function symbolically using a mathematical expression
evaluated at runtime.
For example, here is a sinusoidally-varying population function with an exponentially-decaying amplitude:
<populationModel spec="feast.popmodels.ExpressionPopulationModel"> exp(0.1t)*(cos(2*t) + 2) </populationModel>
These models can also depend on one or more RealParameter
or Function
objects:
<populationModel spec="ExpressionPopulationModel"> exp(gamma*t)*(cos(omega*t) + 2) <arg id="omega" spec="RealParameter" value="0.3"/> <arg idref="gamma"/> </populationModel>
taking care that these objects have IDs defined, since these IDs are used to refer to the objects inside of the expression.
The optional isLog
input (boolean) can be used to indicate that the value of the
expression is the natural logarithm of the population size rather than the
population size itself.
The optional maxTimeToConsider
input (double) can be used to change the maximum
time to be considered when numerically inverting the "intensity function" (integral
of 1/N(t)), a procedure necessary for simulating coalescent trees using this model.
For a complete description of the arithmetic expression syntax which can be used to specify population dynamics, refer to 2.3.1.
Beware: ExpressionPopulationModel
uses numerical integration and numerical
root finding algorithms to compute the "intensity" (integral of 1/N(t)) and
its inverse. The current implementation of this algorithm, particularly the inverse,
is slow and susceptible to numerical errors. Take care when using this population
model, particularly for quickly-changing population functions!
2.7. Model selection and averaging
2.7.1. DirichletProcessOperator and DirichletProcessPrior
The DirichletProcessPrior
and DirichletProcessOperator
allow the application
of Dirichlet Process Priors (DPPs) to the elements of ~Function~s.
For example, suppose you have a model with parameter X
to which you wish
to apply such a prior. To do this, include the following prior for X
in the target density:
<distribution spec="feast.modelselect.DirichletProcessPrior" parameter="@X"> <scaleParameter id="dppScale" spec="RealParameter" value="1.0"/> <baseDistrib id="dppBase" spec="LogNormalDistributionModel" M="0.8" S="0.5"/> </distribution>
You also need to include the following operator:
<operator spec="DirichletProcessOperator" parameter="@X" scaleParameter="@dppScale" baseDistrib="@dppBase" weight="1.0"/>
Other operators on X
can be included, but only if they do not alter the
number of elements of X
which are identical.
2.7.2. ModelSelectionParameter
While model selection is extremely complicated in general, it can be almost trivial when the number of models you wish to consider is small and the models can be regarded as special cases of either each other (i.e. the are nested) or can easily expressed as special cases of a single general model.
The ModelSelectionParameter
is created with this kind of BSSVS-style model
selection in mind. It is a CalculationNode
implementing Function
which takes
one or more parameters (also Function~s) as input, together with an
~IntegerParameter
known as the selection index. The ModelSelectionParameter
acts as though it is the parameter specified by the selection index.
The simplest application of this is to allow BEAST to select between models 1 and 2, where in model 1 a parameter (eg the extinction rate) is zero, while in model 2 the extinction rate has some unknown non-zero value.
We can implement this by defining the death rate to be a ModelSelectionParameter
:
<deathRate id="deathRate" spec="feast.modelselect.ModelSelectionParameter"> <parameter spec="RealParameter" value="0.0"/> <parameter id="nonZeroDeathRate" spec="RealParameter" value="1.0"/> <selectionIndices id="selectionIdx" spec="IntegerParameter" value="1" lower="0" upper="1"/> </deathRate>
We then use this "deathRate" parameter as input to the tree prior, place some
sensible prior on "nonZeroDeathRate", and add an operator to modify the
"selectionIdx" IntegerParameter
. The result is equivalent to placing a
"spike and slab" prior on the death rate parameter, which is a mixture between
a point mass at zero and whatever prior you place on "nonZeroDeathRate".
In addition to this simple case, if "selectionIndices" is an IntegerParameter
with more than one element, the additional ModelSelectionParameter
input,
"thisIndex", can be used to specify which element to use to select the parameter.
This allows you to sample across models where, for example, subsets of different
parameters are allowed to be exactly equal. In this case, each model parameter
is its own ModelSelectionParameter
, shares exactly the same set of
"parameter" and "selectionIndices" inputs, but has its own unique value of
"thisIndex".
Beware that ModelSelectionParameter
is that is only compatible with
distributions which take Function
objects rather than
RealParameter
objects as input.
2.7.3. Sampling from a discrete set of states
Occasionally we wish to characterise the posterior for a parameter or tree
which is constrained to a small set of discrete traits. This can be
accomplished using the DiscreteUniformJumpOperator
.
For instance, suppose we have a small set of trees we want the MCMC to sample from. We can accomplish this as follows:
<operator spec="feast.modelselect.DiscreteUniformJumpOperator" x="@tree" weight="1"> <possibility spec="TreeParser" adjustTipHeight="false" taxa="@alignment" estimate="false" newick="((((1:0.232 ..."/> <possibility spec="TreeParser" adjustTipHeight="false" taxa="@alignment" estimate="false" newick="(((5:1.2321 ..."/> <possibility spec="TreeParser" adjustTipHeight="false" taxa="@alignment" estimate="false" newick="((((3:4.222 ..."/> </operator>
where the "…" are the remainder of the newick strings.
Take care when using this operator that the state nodes specified by
the <possibility/>
elements are actually possible values of x
.
I.e. parameters must have the same dimension and satisfy the bounds of
x
, and trees must correspond to the same set of taxa as x
.
2.8. Simulation and Testing
2.8.1. Easy state simulation
One BEAST 2 idiom which is pervasive in my code at least is the existance
of StateNodes that initialize their own state stochastically. RandomTree
is an example of this in the core: this class is a Tree
that initializes
itself by drawing from a coalescent distribution with a given population
function.
While such classes usually exist to choose a starting state for the MCMC chain, it is often necessary - particularly in the validation phase of model development - to simulate a large number of instances of this simulated state. This is usually because the simulation is often done from some known distribution, which can then be compared against the output of MCMC.
The class feast.simulation.GPSimulator
(from "general purpose simulator")
class simply makes it possible to configure this kind of simulation using
a BEAST XML file. It is simply a beast.core.Runnable
that takes a BEASTObject
as input, a number of iterations and some loggers. For each iteration, it then simply calls the
BEASTObject
's initAndValidate
method and tells the loggers to write their
results.
See examples/SimulateCoalescentTrees.xml
for an example of using GPSimulator
to simulate 100 coalescent trees.
2.8.2. Sequence alignment simulation
While BEAST contains its own alignment simulator in beast.app.seqgen.SimulatedAlignment
,
this can be awkward to use as it requires pre-specifying the taxa to associate
with the simulated alignment.
The feast.simulation.SimulatedAlignment
class is a simpler alignment simulator
which lifts this requirement. For example, the following will initialize
an alignment by simulating sequences down the input tree using a Jukes-Cantor
substitution model:
<alignment spec="feast.simulation.SimulatedAlignment" outputFileName="simulated_alignment.nexus" tree="@tree" sequenceLength="1000"> <siteModel spec="SiteModel"> <substModel spec="JukesCantor"/> </siteModel> </alignment>
The class supports the same inputs as Alignment
and beast.app.seqgen.SimulatedAlignment
,
with the exception that the data
input should be omitted - it will cause
strange errors if you include it. Also, only strict clocks are currently
supported, so there is no branchRateModel
input.
feast.simulation.SimulatedAlignment
also allows you to specify a known ancestral
sequence for the simulation. To do this, supply a Sequence
to the startingSequence
input as
follows:
<alignment spec="feast.simulation.SimulatedAlignment" outputFileName="simulated_alignment.nexus" tree="@tree" sequenceLength="10"> <startingSequence spec="Sequence" taxon="ancestor"> GATCGATCGA </startingSequence> <siteModel spec="SiteModel"> <substModel spec="JukesCantor"/> </siteModel> </alignment>
When used with startingSequence
, the optional input
startingSequenceAge
can be used to specify the age relative to the
most recent leaf at which the lineage above the root (the stem lineage) held the
specified sequence. This might be useful for birth-death models which
consider the stem lineage to be part of the tree.
2.8.3. Randomly initializing RealParameter values
Sometimes several attempts are needed to find a acceptable starting
state for MCMC. An initializer for trees, capable of generating a new
starting tree at each attempt is implemented in BEAST. In the same
way, RandomRealParameter
allows you to sample new initial parameter
values at each attempt. The distribution, from which to sample, can be
the parameter prior or any other proper distribution.
For example, here we sample a birth rate value from a uniform distribution at each initialization attempt:
<init spec="feast.parameter.RandomRealParameter" initial="@birthRate"> <distr spec="beast.base.inference.distribution.Uniform" lower="0" upper="1.5"/> </init>
Alternatively, you can use RandomRealParameter
wherever BEAST expects a RealParameter
:
<birthRate spec="feast.parameter.RandomRealParameter"> <distr spec="beast.base.inference.distribution.Uniform" lower="0" upper="1.5"/> </birthRate>
2.8.4. Sequence alignment shuffling
Occasionally, for the purpose of date randomization tests, it is useful to be
able to randomize the association between taxon names and the input sequences.
The ShuffledAlignment
class makes this easy. Simply replace the existing
alignment definition in the XML, which is usually something like
<alignment id="data" spec="Alignment"> ... </alignment>
with the folllowing:
<alignment id="data" spec="feast.simulation.ShuffledAlignment"> <alignment spec="Alignment"> ... </alignment> </alignment>
When doing this, take care that you move any ID defined on the original alignment
object to the new ShuffledAlignment
object.
2.8.5. DensityMapper
This class provides an easy way to produce plots of the variation in a
particular Distribution
(i.e. probability density) as a function of one or
more parameters. The class extends the BEAST Runnable
abstract class and
thus takes the place of the MCMC
class in a regular BEAST XML. To use
DensityMapper
you'll therefore need to create an XML with the following
general form:
<beast version="2.0"> <run spec="feast.mapping.DensityMapper"> <distribution id="distrib" ... /> <realParam spec="RealParameter" id="paramA" value="V_A" lower="LOWER_A" upper="UPPER_A"/> <steps spec="IntegerParameter" value="STEPS_A"/> <realParam spec="RealParameter" id="paramB" value="V_B1 V_B2" lower="LOWER_B" upper="UPPER_B"/> <steps spec="IntegerParameter" value="STEPS_B1 STEPS_B2"/> <logger spec="Logger" logEvery="1"> <log idref="paramA"/> <log idref="paramB"/> <log idref="distrib"/> </logger> <!-- Additional (screen?) loggers --> </run> </beast>
The <realParam>
elements specify which parameters are varied, the upper and
lower bounds of these parameters specify the range over which the parameter is
varied, and the corresponding <steps>
elements specify the number of steps to
use for each parameter. When the number of steps is 1 the parameter is not
varied and instead the contents of the value
attribute is used.
When parameters are vectors, the associated <steps>
parameter may have
dimension 1 in which case all elements of the vector parameter are varied
together (e.g. [0,0], [0.1,0.1], …, [1, 1]). Alternatively, the steps
parameter may have dimension equal to that of the real parameter in which case
each element of the parameter is varied independently with the corresponding
number of steps (e.g. [0,0], [0, 0.1], …, [0, 1], [0.1, 0.0], …, [1, 1]).
To keep one component of a vector parameter fixed, set the corresponding steps
element to 1.
If, in addition to the <realParam>
and <steps>
elements a
<logScale>
BooleanParameter element is provided, DensityMapper will
evaluate the densities at a logarithmic distribution of parameter
values between the same bounds. I.e., the following snippet will
cause paramA
to be set to values logarithmically distributed between
LOWER_A
and UPPER_A
:
<realParam spec="RealParameter" id="paramA" value="V_A" lower="LOWER_A" upper="UPPER_A"/> <steps spec="IntegerParameter" value="STEPS_A"/> <logScale spec="BooleanParameter" value="true"/>
Take care that if the log-scale element is provided for one parameter that it is also provided for all other parameters.
An example usage of DensityMapper
which produces the variation in a
coalescent tree density as a function of the population size is provided as
DensityMapper.xml
which can be found in the examples directory.
2.9. Log File Post-processing
Occasionally it's useful to be able to resurrect BEAST states from trace
and tree log files in order to perform further processing. This is useful,
for instance, if you realise that it would have been useful to compute
additional tree statistics. Or perhaps you'd like to stochastically map
ancestral traits onto trees belonging to the posterior of a multi-type
birth-death analysis. The feast.fileio.LogFileIterator
class and its
friends allow you to do these things and more.
The LogFileIterator
class is a Runnable
class, meaning that you use it
in place of BEAST's usual MCMC
class at the top level. A trivial example
is as follows:
<beast version='2.0' namespace='feast.fileio.logfileiterator:beast.evolution.tree:beast.core.parameter'> <run spec="LogFileIterator"> <logFileState spec="TraceLogFileState" logFileName="original_analysis.log"> <logFileEntry spec="LogFileRealParameter" fieldName="hky.kappa"> <fieldParameter id="kappa" spec="RealParameter" value="0.0"/> </logFileEntry> </logFileState> <logFileState spec="TreeLogFileState" logFileName="original_analysis.trees"> <tree spec="beast.evolution.tree.Tree" id="tree"/> </logFileState> <logger logEvery="10000" fileName="processed.log"> <log idref="kappa"/> <log id="treestat" spec="beast.evolution.tree.TreeStatLogger" tree="@tree"/> </logger> <logger logEvery="100000"> <log idref="kappa"/> <log idref="treestat"/> </logger> </run> </beast>
In this example (which is also provided in the examples directory as TreeLogFileIteratorTest.xml
)
we use the "hky.kappa" field of the original_analysis.log
trace log file to set the value of a
RealParameter
named "kappa". Simultaneously, we use the trees in the tree log file original_analysis.trees
to set the value of the Tree
named "tree". The value of "kappa" and the height and length statistics
of "tree" are then written to the file "processed.log" and the screen in the usual way using loggers.
2.10. Single BEAST 2.7 Jar files
Unlike earlier versions, BEAST 2.7 has a sophisticated system for controlling which package classes are available to use in analyses. While this is a great step forward, its implementation currently precludes building the stand-alone jar file bundles of BEAST 2 and its packages which are so helpful for collaboration using in-development BEAST 2 models.
Feast now includes a class feast.app.FeastMain
which addresses this
problem. It is a small wrapper around beastfx.app.beast.BeastMain
which, before handing control of the execution over to BEAST, seaches
through all of the directories and jar files in the runtime classpath
for files named beast_services.xml
. When it finds one of these files,
FeastMain
treats these the same as a standard version.xml
file by
registering all of the services it contains.
In order to use this wrapper to make a stand-alone jar file for a package, follow these steps:
- Copy (or symlink) any necessary version.xml files into a package
directory below the
src/
directory of your project. You can place them in any package directory you like, but be sure to name each filebeast_services.xml
. Beware: jar files share a single namespace, so avoid placing the file directly in thesrc/
directory itself when bundling multiple BEAST 2 projects into the jar. - Build a jar file with the
Main-Class:
attribute inMANIFEST.MF
set tofeast.app.FeastMain
. (You don't necessarily need to edit this file directly, most software that builds jar files will ask you at some point for the main class: simply enterfeast.app.FeastMain
when asked.)
To launch an analysis using this jar file (here we suppose it's named
my.jar
), use either
java -jar my.jar analysis.xml
or, if you are not using a combined openjdk+openjfx installation,
java --module-path=/path/to/openjfx/libs --add-modules javafx.fxml -jar my.jar analysis.xml
here /path/to/openjfx/libs
specifies the location of the JavaFX libraries.
2.11. Feast Query: BEAST 2.7 Object Browser
Feast includes a simple GUI application called FeastQuery
which
can be launched using BEAST 2.7's AppLauncher
utility
or by selection "launch apps" from BEAUti's File menu.
FeastQuery
provides information about the XML-compatible
BEASTObjects in all of the packages currently installed on your system.
It is intended to be used as an aid to writing and editing BEAST 2.7 XML
scripts.
The documentation provided by FeastQuery
is mostly equivalent to
that provided by the BEAST 2.7 core utility, DocMaker
. The main
advantages of FeastQuery
are:
- It doesn't require you to create, store and continually update a directory full of HTML files.
- It doesn't require a separate web browser.
- It has a dynamic filtering facility that lets you quickly identify the class you're looking for.
3. License
The feast package is free (as in freedom). With the exception of the libraries on which it depends, it is made available under the terms of the GNU General Public Licence version 3, which is distributed with feast or may be found online at https://www.gnu.org/licenses/gpl-3.0.html.