Book notes: Analysis of Phylogenetics and Evolution with R

Analysis of Phylogenetics and Evolution with R by Emmanuel Paradis was suggested to me by John, and I am reading it now. This post contains my notes on the book. Since Emmanuel Paradis is the author of the ape package in R, I expect this to be the best resource for handling phylogenetic data in R. It should also be a good entry point if I want to learn more about fundamental concepts in computational evolution.

Phylogenetic data in R

Phylogenetic Data as R Objects

Trees

The Class “phylo” (ape):

  • edge: a matrix with two columns; each row represents an edge in the tree (parent node in the first column, child in the second). The $n$ tips are numbered from 1 to $n$, and the $m$ (internal) nodes from $n+1$ to $n+m$ (the root being $n + 1$).
  • edge.length: a vector of edge lengths, one per row of edge (optional).
  • tip.label: a vector of tip labels.
  • Nnode: the number of internal nodes.
  • node.label: a vector of node labels (optional).
  • root.edge: the length of the edge leading to the root, if there is one (optional).
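
To make the numbering convention concrete, here is a minimal sketch that builds the list by hand in base R for a small three-tip tree, ((A,B),C). The tree, the branch lengths, and the variable names are my own illustration, not an example from the book; in practice the object would normally come from an ape reader rather than be assembled manually.

```r
# Hand-built list mirroring the "phylo" elements for the tree ((A,B),C).
# Tips are numbered 1..3, the root is 4 (= n + 1), the other node is 5.
tr <- list(
  edge = matrix(c(4L, 5L,   # root (4) -> internal node 5
                  5L, 1L,   # node 5   -> tip 1 (A)
                  5L, 2L,   # node 5   -> tip 2 (B)
                  4L, 3L),  # root (4) -> tip 3 (C)
                ncol = 2, byrow = TRUE),
  edge.length = c(0.5, 1, 1, 1.5),  # illustrative lengths, one per edge row
  tip.label = c("A", "B", "C"),
  Nnode = 2L
)
class(tr) <- "phylo"

n <- length(tr$tip.label)

# The root is the only node that never appears in the child column.
root <- setdiff(tr$edge[, 1], tr$edge[, 2])

# Edges whose child number is <= n lead to tips.
is_tip_edge <- tr$edge[, 2] <= n
```

Here `root` comes out as `n + 1`, and the largest node number in `edge` equals `n + Nnode`, which is the convention described in the list above.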

Networks

Very often, networks are analyzed as a sum of distinct trees: migration or horizontal gene transfer results in reticulations among populations or species, but the genetic lineages have tree-like dynamics. The class “evonet” is used to store networks.

Splits

A split (or bipartition) is a pair of two exclusive sets of tips (or taxa). The class “prop.part” is used to store splits.
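
As a rough illustration of the definition, the sketch below derives the split induced by one internal node of a four-tip tree, ((A,B),(C,D)), using only base R. The edge matrix and the helper names `tips_below` and `split_for` are hypothetical and only meant to show the idea; this is not how ape stores or computes splits internally.

```r
# Edge matrix for ((A,B),(C,D)): tips 1..4, root 5, internal nodes 6 and 7.
edge <- matrix(c(5L, 6L,
                 6L, 1L,
                 6L, 2L,
                 5L, 7L,
                 7L, 3L,
                 7L, 4L), ncol = 2, byrow = TRUE)
tip_label <- c("A", "B", "C", "D")
n <- length(tip_label)

# Collect the tip numbers descending from a node (hypothetical helper).
tips_below <- function(node) {
  if (node <= n) return(node)              # a tip is its own descendant
  children <- edge[edge[, 1] == node, 2]
  sort(unlist(lapply(children, tips_below)))
}

# A split is the pair: tips below the node, and all remaining tips.
split_for <- function(node) {
  inside <- tips_below(node)
  list(tip_label[inside], tip_label[-inside])
}

s <- split_for(6L)
```

The two elements of `s` share no tip and together cover every tip, which is what makes them a bipartition.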

Molecular Sequences

This part is interesting as different packages handle molecular sequences in different ways.

The Class “DNAbin” (ape):

  • Each nucleotide is coded by a single byte (8 bits); the bit positions below are counted from the most significant bit.
  • 0 to 3: one flag each for A, G, C, T
  • 4: Is the base known?
  • 5: Alignment gap?
  • 6: Unknown character?
  • 7: (unused)

This approach can also record nucleotide ambiguity: setting more than one of the A, G, C, T bits marks a base that could be any of them.
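
The bit layout can be explored with plain integers in base R. The byte values below follow from the layout in the list (for example, A sets the first bit and the "known" bit, giving 0x88); the vector and helper names are my own, and this is a sketch of the coding scheme rather than code taken from ape.

```r
# Byte values implied by the bit layout: high four bits flag A, G, C, T;
# 0x08 = base is known, 0x04 = alignment gap, 0x02 = unknown character.
codes <- c(a = 0x88L, g = 0x48L, c = 0x28L, t = 0x18L,
           r = 0xC0L,   # R = A or G (ambiguous, so "known" is not set)
           n = 0xF0L,   # N = any of A, G, C, T
           "-" = 0x04L, "?" = 0x02L)

# Masks for the four base flags.
base_bits <- c(a = 0x80L, g = 0x40L, c = 0x20L, t = 0x10L)

# Single-bit tests (hypothetical helpers).
is_known <- function(x) bitwAnd(x, 0x08L) != 0L
is_gap   <- function(x) bitwAnd(x, 0x04L) != 0L

# For every code, which of the four bases is it compatible with?
compatible <- outer(codes, base_bits, bitwAnd) != 0L
n_bases <- rowSums(compatible)
```

A code with `n_bases` equal to 1 is an unambiguous base, a value between 2 and 4 is an ambiguity code, and 0 means a gap or an unknown character. Questions such as "could this site be a purine?" then reduce to a single bitwise AND.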

The Class “alignment” (seqinr):

  • nb: the number of sequences.
  • seq: the sequences as a vector of mode character.
  • nam: the names of the sequences.
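
For contrast with the byte-level coding above, here is a small base R sketch of a list with these three elements. The sequence names and contents are made up for illustration, and the list is assembled by hand rather than read from a file with seqinr.

```r
# Hand-built list mirroring the three elements described above.
aln <- list(
  nb  = 2L,                        # number of sequences
  nam = c("seq1", "seq2"),         # hypothetical sequence names
  seq = c("acgt-a", "acgtta")      # one character string per sequence
)
class(aln) <- "alignment"

# In an alignment all sequences should have the same width.
widths <- nchar(aln$seq)

# Split into a character matrix (rows = sequences, columns = sites)
# and flag the sites where the sequences differ.
sites <- do.call(rbind, strsplit(aln$seq, ""))
variable <- apply(sites, 2, function(x) length(unique(x)) > 1)
```

Because each sequence is stored as ordinary text, site-wise operations need a split into characters first, whereas the byte coding allows such tests with bitwise operations.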

Allelic Data

Phenotypic Data

Reading Phylogenetic Data

Phylogenies

