Notes 20101012 BIOL 614
From SnOwy - Ed's Wiki Notebook
Dr. McConkey's bioinformatics tools class
Phylogenetic Models
- model evaluation is useful for testing the applicability and "performance" of a substitution model, scored by an information criterion such as AIC or BIC
- see PhyML
- notice the difference between BLOSUM62 and LG (newer substitution model)
- the first is based on probability of changing from one amino acid to another
- the second provides a rate of change to allow us to build up a branch length
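- note to self: "rate" is not the same as "likelihood" -- rates (together with a branch length) give substitution probabilities, while the likelihood is the probability of the observed alignment given the model and the tree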
- note that for model assessment, the number-of-parameters term counts only independent (free) parameters
- the set {π_A, π_C, π_G, π_T} -- the abundance of each nucleotide -- counts as only three parameters, because the frequencies always sum to one and the fourth is fixed by the other three
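A minimal sketch of how the information criteria trade fit against the number of free parameters. The log-likelihood values and the alignment length below are made up for illustration; in practice they would come from a tool such as PhyML or ProtTest. The parameter counts exclude branch lengths, which is an assumption of this sketch.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n_sites):
    """Bayesian information criterion: k ln(n) - 2 lnL (lower is better)."""
    return k * math.log(n_sites) - 2 * log_likelihood

# Free substitution-model parameters only (branch lengths not counted here).
# HKY85: 1 transition/transversion ratio + 3 free base frequencies = 4.
# GTR:   5 free relative rates + 3 free base frequencies = 8.
FREE_PARAMS = {"JC69": 0, "K80": 1, "HKY85": 4, "GTR": 8}

# Hypothetical log-likelihoods for one alignment (illustration only).
example_lnl = {"JC69": -2450.0, "K80": -2410.0, "HKY85": -2395.0, "GTR": -2392.5}
n_sites = 500  # hypothetical alignment length

aic_scores = {m: aic(lnl, FREE_PARAMS[m]) for m, lnl in example_lnl.items()}
bic_scores = {m: bic(lnl, FREE_PARAMS[m], n_sites) for m, lnl in example_lnl.items()}

best_aic = min(aic_scores, key=aic_scores.get)
best_bic = min(bic_scores, key=bic_scores.get)
print("best by AIC:", best_aic)
print("best by BIC:", best_bic)
```

With these invented numbers GTR fits slightly better than HKY85, but not by enough to pay for its four extra parameters, so both criteria prefer HKY85.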
Evolutionary Distance
- simplest model: fractional alignment distance (p-distance).
- Poisson distance: d_p
- example: PAM -- transitive application of mutations as rates for single aligned columns
- p_{1/2} = 1 - e^{-rt}
- p = 1 - e^{-2rt}
- ... ugh -- missed the last two equations that explain how to get to d_p -- distance
- Gamma distribution
- models variation across different sequence positions
- may be important for my project -- that is, mutation rates, highly conserved regions --
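- the shape parameter a is inversely related to rate variation across sites: small a means strong variation, with many highly conserved sites and a few fast-evolving ones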
- this is not a linear relationship
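A small sketch of the three distances in this list, using the standard textbook forms rather than the equations from the lecture (which I did not capture). It assumes the Poisson correction d = -ln(1 - p), which follows from p = 1 - e^{-2rt} with d = 2rt, and the usual gamma distance d = a[(1 - p)^{-1/a} - 1]. The function names and the two toy sequences are made up for illustration.

```python
import math

def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of aligned, ungapped columns at which the two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)

def poisson_distance(p):
    """Poisson-corrected distance: d = -ln(1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1 - p)

def gamma_distance(p, a):
    """Gamma distance with shape parameter a: d = a * ((1 - p)**(-1/a) - 1)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    if a <= 0:
        raise ValueError("shape parameter a must be positive")
    return a * ((1 - p) ** (-1 / a) - 1)

# Toy aligned protein fragments (invented); they differ in 2 of 10 columns.
seq_a = "MKTAYIAKQR"
seq_b = "MKSAYLAKQR"

p = p_distance(seq_a, seq_b)
print("p-distance:      ", p)
print("Poisson distance:", round(poisson_distance(p), 4))
print("gamma (a = 2):   ", round(gamma_distance(p, 2.0), 4))
```

The gamma distance is always at least as large as the Poisson distance for the same p, and it approaches the Poisson distance as a grows, which is one way to see that the dependence on a is not linear.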
Evolutionary Models
In practice, most biology papers need only a single line explaining model selection -- the selection itself is handled by ProtTest, PhyML, or some other automated model-fitting software.
Nucleotide Evolution
- Jukes-Cantor (JC69) -- all substitutions occur at the same rate
- Kimura (K80) -- adds separate rates for transitions and transversions
- HKY85 -- adds unequal base frequencies (the relative abundance of A, C, G, T) on top of the transition/transversion distinction
- up to modern day -- GTR -- general time reversible -- each of the six nucleotide pairs gets its own exchange rate, and base frequencies are unequal. Wow.
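To make the difference between the two simplest models concrete, here is a sketch of their closed-form pairwise distances. It assumes the standard formulas: Jukes-Cantor d = -(3/4) ln(1 - 4p/3), and Kimura two-parameter d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the observed fractions of transitions and transversions. The helper names and the toy sequences are made up for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def transition_transversion_fractions(seq_a, seq_b):
    """Return (P, Q): fractions of compared columns showing a transition / transversion."""
    ts = tv = n = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        n += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1  # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            tv += 1  # purine <-> pyrimidine
    if n == 0:
        raise ValueError("no comparable columns")
    return ts / n, tv / n

def jc69_distance(p):
    """Jukes-Cantor distance from the overall fraction of differing sites p."""
    return -0.75 * math.log(1 - 4 * p / 3)

def k80_distance(P, Q):
    """Kimura two-parameter distance from transition (P) and transversion (Q) fractions."""
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))

# Toy aligned sequences (invented): 2 transitions and 1 transversion in 10 columns.
seq_a = "ACGTACGTAC"
seq_b = "ATGTACGCAA"

P, Q = transition_transversion_fractions(seq_a, seq_b)
print("P, Q:", P, Q)
print("JC69:", round(jc69_distance(P + Q), 4))
print("K80: ", round(k80_distance(P, Q), 4))
```

The two distances agree when transitions make up exactly one third of the observed differences (P = Q/2), which is what Jukes-Cantor implicitly assumes; here transitions are over-represented, so K80 gives the larger estimate.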
Protein Evolution
- PAM
- BLOSUM
- JTT (Jones, Taylor, Thornton) -- an extension and update of the original PAM matrix
Confidence measures
- bootstrap confidence
- is a Monte Carlo resampling method -- alignment columns are sampled with replacement to build pseudo-alignments of the same length as the original -- 1000 replicates are the norm
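The column resampling step can be sketched as below. This only builds the pseudo-alignments; inferring a tree from each replicate and counting how often each clade appears is left to the tree-building software. The function names, the dict-of-strings alignment layout, and the toy sequences are assumptions made for this sketch.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length.

    alignment: dict mapping sequence name -> aligned sequence (all the same length).
    """
    n_cols = len(next(iter(alignment.values())))
    if any(len(seq) != n_cols for seq in alignment.values()):
        raise ValueError("all sequences must have the same aligned length")
    # The same column indices are applied to every sequence, so columns stay intact.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}

def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Return a list of n_replicates bootstrap pseudo-alignments."""
    rng = random.Random(seed)  # seeded so the replicates are reproducible
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]

# Toy alignment (invented) with three taxa and eight columns.
toy = {
    "taxon1": "ACGTACGT",
    "taxon2": "ACGTTCGT",
    "taxon3": "ACCTACGA",
}

replicates = bootstrap_replicates(toy, n_replicates=1000, seed=42)
print(len(replicates), "replicates; first one:")
for name, seq in replicates[0].items():
    print(name, seq)
```

Sampling whole columns rather than individual characters is the important design point: each column is one observation of the evolutionary process across all taxa, so the taxa must be resampled together.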