Investigating seasonal NZ influenza dynamics using genomic and mobile phone data

Tim Vaughan, André Lichtsteiner, David Welch, Alexei Drummond
Centre for Computational Evolution
The University of Auckland
21st Annual New Zealand Phylogenomics Meeting
12-17 February, 2017
Waiheke Island

Influenza



  • RNA virus with a 14 kb genome consisting of 8 distinct segments.
  • Evolves $\sim 10^6$ times faster than the human genome: a measurably evolving pathogen.
  • Seasonal epidemics recur around the world in a predictable order.
 

Perfect system for the application of phylodynamic methods.

Genomic data



  • ESR have sequenced 141 full H3N2 influenza genomes.
  • Sequences from isolates sampled during 2012, 2013 and 2014 seasons.
  • Samples distributed across 18 of the 20 New Zealand District Health Boards.

Host movement data

  • QRIOUS (qrious.co.nz) have generously provided anonymized and aggregated information on the movement of IMSIs (international mobile subscriber identities) within NZ.
  • Data provided as a matrix specifying the fraction of time individuals identified as having a home in a particular area spend in other areas.
  • Locations (home and away) specify only which of the 30 "regional tourism organizations" spread across the country the entry refers to.
  • Although the data comes only from Spark subscribers, a correction has been applied to account for non-uniform market share.

Host movement data

Combining the host movement matrix with a stochastic SIR model yields simulated epidemics.
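One way such a simulation could be set up is sketched below. It is a minimal discrete-time (chain-binomial) SIR model, not necessarily the formulation used for the talk. The function name `simulate_sir`, its parameters, and the way the time-share matrix enters the force of infection are all illustrative assumptions: residents of region $i$ are exposed to the prevalence in each region $a$ in proportion to the fraction of time they spend there.

```python
import math
import random


def simulate_sir(time_share, pop, infected0, beta, gamma, steps, dt=1.0, seed=0):
    """Chain-binomial SIR over regions coupled by a time-share matrix.

    time_share[i][a] is the fraction of time residents of region i spend in
    region a (each row sums to 1). All names here are hypothetical.
    """
    rng = random.Random(seed)
    n = len(pop)
    S = [pop[i] - infected0[i] for i in range(n)]
    I = list(infected0)
    R = [0] * n
    history = [(list(S), list(I), list(R))]
    p_rec = 1.0 - math.exp(-gamma * dt)  # per-step recovery probability

    for _ in range(steps):
        # Prevalence among the people present in each region a.
        present = [sum(time_share[j][a] * pop[j] for j in range(n)) for a in range(n)]
        infectious = [sum(time_share[j][a] * I[j] for j in range(n)) for a in range(n)]
        prev = [infectious[a] / present[a] if present[a] > 0 else 0.0 for a in range(n)]

        new_S, new_I, new_R = list(S), list(I), list(R)
        for i in range(n):
            # Force of infection felt by residents of region i.
            force = beta * sum(time_share[i][a] * prev[a] for a in range(n))
            p_inf = 1.0 - math.exp(-force * dt)
            # Binomial draws written as sums of Bernoulli trials.
            new_inf = sum(rng.random() < p_inf for _ in range(S[i]))
            new_rec = sum(rng.random() < p_rec for _ in range(I[i]))
            new_S[i] -= new_inf
            new_I[i] += new_inf - new_rec
            new_R[i] += new_rec
        S, I, R = new_S, new_I, new_R
        history.append((list(S), list(I), list(R)))
    return history


# Toy example: two regions, infection seeded in the first one only.
toy_history = simulate_sir([[0.9, 0.1], [0.2, 0.8]], [200, 100], [5, 0],
                           beta=0.5, gamma=0.2, steps=30, seed=1)
```

With an identity time-share matrix the regions decouple, so an epidemic seeded in one region never reaches the others; off-diagonal entries are what allow spread between regions.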

Questions


Is there evidence for multiple introductions during a single season?

Does population structure influence the pathogen evolution?

If so, is this structure correlated with what mobile phones tell us of host movements?

Phylogeography of seasonal epidemics

Potential for phylogeographic analyses

  • Each genome tagged with location down to DHB-level.
  • Have evaluated two distinct phylogeographic inference methods:
    1. Discrete trait phylogeography ("mugration") model. [Lemey, Rambaut and Drummond, 2009]
    2. Structured coalescent model.
      [Hudson, 1990; Notohara, 1990]
  • Under the SC model, the genomic data alone contain evidence for North Island/South Island structure.

Neither of these models deals explicitly with the epidemiological dynamics: to do!

SC results (North/South, 2012)

SC results (North/South, 2013)

SC results (North/South, 2014)

Detecting multiple introductions

Qualitative evidence

Summary tree lineage count at the season start is a rough estimate for the introduction count.
[Figure: summary trees annotated with lineage counts at season start: 8, 8 and 10.]

Semi-quantitative evidence

Use distribution of sampled tree lineage counts at season start as proxy for intro. count posterior.
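The lineage-count proxy can be sketched as follows, assuming each sampled tree has already been reduced to a list of `(parent_time, child_time)` branch intervals in calendar time. That representation and the function names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def lineages_at(branches, t):
    """Count the branches of one tree that cross calendar time t.

    branches: iterable of (parent_time, child_time) pairs with
    parent_time < child_time (hypothetical representation).
    """
    return sum(1 for parent, child in branches if parent < t <= child)


def lineage_count_distribution(trees, t):
    """Empirical distribution of lineage counts at time t over sampled trees."""
    counts = Counter(lineages_at(branches, t) for branches in trees)
    total = sum(counts.values())
    return {k: v / total for k, v in sorted(counts.items())}


# Toy tree: root at 2011.5, one internal node at 2011.9, three tips in 2012.
toy_tree = [(2011.5, 2011.9), (2011.9, 2012.5), (2011.9, 2012.6), (2011.5, 2012.4)]
toy_count = lineages_at(toy_tree, 2012.0)
```

Each lineage present at the season start is an ancestor of at least one sampled genome, which is why the count serves as a rough stand-in for the number of introductions.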

North/South/World model

Use structured coalescent model to explicitly model introductions using an additional World deme.

Posterior for 2012 introductions

Incorporating host movement data

How does host movement data fit in?

  • Structured coalescent analyses with uninformative priors are usually limited to approximately 10 demes.
  • Can use the movement data to fix the rate matrix (up to a multiplicative constant).
  • Drastically reduces the computational burden of sampling the combined parameter/multi-type tree space.

Movement matrix transformation

[Figure panels: RTOs; DHBs]

Movement matrix transformation

We transform the matrix using a simple reweighting: $$ r_{ab} = \sum_{ij} f_{ia} f_{jb} r_{ij} $$ where $f_{ia}$ is the fraction of RTO $i$ in DHB $a$.


This basis transformation introduces additional noise.
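The reweighting $r_{ab} = \sum_{ij} f_{ia} f_{jb} r_{ij}$ can be sketched in plain Python as below. The function name `rto_to_dhb` and the nested-list layout are assumptions made for illustration; `f[i][a]` holds the fraction of RTO $i$ lying in DHB $a$.

```python
def rto_to_dhb(r_rto, f):
    """Return r_dhb with r_dhb[a][b] = sum_ij f[i][a] * f[j][b] * r_rto[i][j].

    r_rto: square matrix of RTO-to-RTO rates.
    f:     f[i][a] is the fraction of RTO i lying in DHB a.
    """
    n_rto = len(r_rto)
    n_dhb = len(f[0])
    return [
        [
            sum(f[i][a] * f[j][b] * r_rto[i][j]
                for i in range(n_rto) for j in range(n_rto))
            for b in range(n_dhb)
        ]
        for a in range(n_dhb)
    ]


# Toy example: three RTOs mapped onto two DHBs, the middle RTO split evenly.
toy_f = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
toy_r = [[0.0, 1.0, 2.0], [3.0, 0.0, 4.0], [5.0, 6.0, 0.0]]
toy_dhb = rto_to_dhb(toy_r, toy_f)
```

Note that an RTO straddling two DHBs contributes to the diagonal of the transformed matrix even when the original diagonal is zero; only the off-diagonal entries are meaningful as between-DHB rates.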

The forward-time rate matrix is then transformed into the following backward-time migration matrix for ancestral lineages in the SC model: $$ m_{ab} = r_{ba}N_{b}/N_{a} $$
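A minimal sketch of this conversion, with the hypothetical name `backward_migration` and deme sizes passed as a list `pop`; setting the diagonal to zero is a choice made here, not something stated above.

```python
def backward_migration(r, pop):
    """Return m with m[a][b] = r[b][a] * pop[b] / pop[a] for a != b.

    r:   forward-time rate matrix, r[a][b] is the rate from a to b.
    pop: deme sizes N_a.
    """
    n = len(pop)
    return [
        [0.0 if a == b else r[b][a] * pop[b] / pop[a] for b in range(n)]
        for a in range(n)
    ]


# Toy example: two demes of unequal size.
toy_m = backward_migration([[0.0, 0.1], [0.4, 0.0]], [100.0, 50.0])
```

The construction satisfies $N_a m_{ab} = N_b r_{ba}$: the backward flow of lineages from $a$ to $b$ matches the forward flow of hosts from $b$ to $a$.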

Ancestral locations inferred using phone data

How well do host movements fit the genetic data?

Can express the log BF as the following definite integral $$\text{logBF} = \int_0^1 E_{\beta}[U]d\beta$$
where $U = \log P(D,\theta|M_1) - \log P(D,\theta|M_2)$

and $E_{\beta}[U]$ is the expected value of $U$ over $\theta$ under the normalised product distribution proportional to

$$P(D,\theta|M_1)^{\beta}P(D,\theta|M_2)^{1-\beta}$$
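The identity can be checked numerically on a toy problem in which $\theta$ takes only a few discrete values, so that $E_{\beta}[U]$ is computed exactly rather than estimated by MCMC as it would be in a real analysis. The function names and the toy numbers below are illustrative assumptions, and the product distribution is normalised over $\theta$ before the expectation is taken.

```python
import math


def expected_u(log_p1, log_p2, beta):
    """E_beta[U] under the distribution proportional to p1^beta * p2^(1-beta).

    log_p1[k], log_p2[k] are log P(D, theta_k | M) for each discrete theta_k.
    """
    log_w = [beta * a + (1.0 - beta) * b for a, b in zip(log_p1, log_p2)]
    top = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(x - top) for x in log_w]
    z = sum(w)
    return sum(wi * (a - b) for wi, a, b in zip(w, log_p1, log_p2)) / z


def log_bf_path(log_p1, log_p2, n_steps=1000):
    """Trapezoidal estimate of the integral of E_beta[U] over beta in [0, 1]."""
    vals = [expected_u(log_p1, log_p2, k / n_steps) for k in range(n_steps + 1)]
    return sum(0.5 * (vals[k] + vals[k + 1]) / n_steps for k in range(n_steps))


def log_sum_exp(xs):
    """log of sum(exp(x)) computed stably."""
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))


# Toy joint probabilities over three values of theta (made-up numbers).
toy_log_p1 = [math.log(p) for p in (0.2, 0.05, 0.01)]
toy_log_p2 = [math.log(p) for p in (0.02, 0.03, 0.04)]
toy_path = log_bf_path(toy_log_p1, toy_log_p2)
toy_exact = log_sum_exp(toy_log_p1) - log_sum_exp(toy_log_p2)
```

Here `toy_exact` is the log ratio of the two marginal likelihoods obtained by summing over $\theta$ directly, and `toy_path` should agree with it up to the discretisation error of the $\beta$ grid.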

How well do host movements fit the genetic data?

Bayes factor in favor of host movement-derived migration matrix: $\sim 10^7$.

Summary

  1. Population structure, particularly the North/South island split, shapes the seasonal evolutionary dynamic of H3N2.
  2. There does seem to be strong evidence for multiple introductions within a single season.
  3. Mobile phone-derived host movement data is well supported as a model for national H3N2 structure.

Outlook

  1. Compare mobile phone model with simple distance matrix-based model.
  2. Include a "World" population in analysis to allow more defensible ancestral location inferences.
  3. Acquire more sequences. ESR has collected thousands of isolates as part of the CDC SHIVERS project.
  4. Acquire finer-resolution host movement information, in an appropriate basis.

Acknowledgements

  1. Genomic data was provided by Richard Hall and Sue Huang at ESR.
  2. Generous donation of aggregated phone movement data:
  3. Summer internship students:
    • André Lichtsteiner
    • John Mlyahilu