Notes 20101019 BIOL 614 Bioinformatics Tools - Phylogeny - Measures of confidence
From SnOwy - Ed's Wiki Notebook
We've backtracked a touch and are covering phylogeny. The course project has been shuffled back and forth in meetings; I'll settle on the piece of the puzzle that comes after the "views" described in my thesis proposal. Dr. McConkey and I both seem to agree that that's the next logical step.
Measurements of Confidence
- Bootstrapping
- tests the confidence level of a phylogenetic hypothesis (e.g. the support for a particular clade)
- uses many replicate data sets
- each replicate is built by resampling alignment columns (sites) with replacement
- parsimony vs. distance methods: neither is guaranteed to produce the correct tree
- the methods do, however, give comparable results
- how reliable the result is depends mostly on the robustness of the input data
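To make the resampling step concrete, here is a minimal toy sketch. It assumes an alignment stored as a dict of equal-length strings, and it swaps the full tree rebuild for a much simpler stand-in decision: "which pair of taxa is closest by p-distance" (the first join UPGMA would make). All function names are hypothetical. Real bootstrapping rebuilds the whole tree for each replicate and reports how often each clade reappears.

```python
import random
from itertools import combinations


def p_distance(s1, s2):
    """Fraction of aligned sites at which two sequences differ."""
    return sum(1 for a, b in zip(s1, s2) if a != b) / len(s1)


def closest_pair(alignment):
    """Stand-in for tree building: the (sorted) pair of taxa with the smallest p-distance."""
    return min(
        combinations(sorted(alignment), 2),
        key=lambda pair: p_distance(alignment[pair[0]], alignment[pair[1]]),
    )


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def bootstrap_support(alignment, pair, n_reps=1000, seed=1):
    """Fraction of replicates in which `pair` is again the closest pair."""
    rng = random.Random(seed)
    hits = sum(
        closest_pair(bootstrap_replicate(alignment, rng)) == pair
        for _ in range(n_reps)
    )
    return hits / n_reps


# Hypothetical toy alignment, not real data.
aln = {"A": "ACGTACGTAC", "B": "ACGTACGTTC", "C": "ACGAACTTAG", "D": "TCGAACTTAG"}
print(bootstrap_support(aln, ("A", "B")))
```

In this toy alignment A-B and C-D are tied at one difference each, so support for ("A", "B") is well below 1. That is the kind of weak, input-dependent signal the notes warn about.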
Phylogeny Programs
- newer: MaxML
Parsimony and Distance Methods
Known Issues
- less reliable for distantly related taxa
- problems arise when, among many taxa, a few are only distantly related: those few end up treated as an outgroup
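For reference on how parsimony actually scores a tree, here is a minimal sketch of Fitch's small-parsimony algorithm. It assumes a rooted binary tree written as nested 2-tuples of leaf names and an alignment given as a dict of equal-length strings. The function names are hypothetical, and it only scores a fixed tree; searching over topologies is a separate problem.

```python
def fitch(tree, states):
    """Fitch small parsimony for one site.

    tree: a leaf name (str) or a 2-tuple (left, right) of subtrees.
    states: dict mapping leaf name -> character at this site.
    Returns (candidate state set at this node, minimum changes in this subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # Children can agree: no extra change needed at this node.
        return common, left_cost + right_cost
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total minimum number of changes over all sites of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


aln = {"A": "AA", "B": "AA", "C": "GA", "D": "GC"}  # hypothetical toy data
print(parsimony_score((("A", "B"), ("C", "D")), aln))
print(parsimony_score((("A", "C"), ("B", "D")), aln))
```

The tree with the lower score is the more parsimonious one. Note that nothing here models multiple hits on a branch, which is why parsimony struggles with distant taxa.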
Maximum Likelihood
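- given a tree and an evolutionary model, what is the probability of observing the aligned sequences? the tree that maximizes this probability (the likelihood) is preferred
- the model defines the probability distribution; each candidate tree (topology plus branch lengths) is one hypothesis scored under it
- requires an explicit substitution model; distance-based methods do not need one to build the tree (although corrected distances such as Jukes-Cantor do assume one)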
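The smallest case that shows "probability of the data given a tree and a model" is two sequences. The "tree" is then a single branch of length d (expected substitutions per site), and ML just picks the d that maximizes the likelihood. This sketch assumes the Jukes-Cantor (JC69) model with equal base frequencies and uses a plain grid search. The function names and grid settings are hypothetical choices, not from a specific program.

```python
import math


def jc69_log_likelihood(d, n_same, n_diff):
    """Log-likelihood of a pairwise alignment under JC69 with branch length d > 0."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)  # P(base i at one end and i at the other)
    p_diff = 0.25 * (0.25 - 0.25 * e)  # P(base i at one end and a specific j != i)
    return n_same * math.log(p_same) + n_diff * math.log(p_diff)


def ml_distance(seq1, seq2, step=0.001, d_max=3.0):
    """Grid-search the branch length that maximizes the JC69 likelihood."""
    n_diff = sum(1 for a, b in zip(seq1, seq2) if a != b)
    n_same = len(seq1) - n_diff
    grid = [step * k for k in range(1, int(d_max / step) + 1)]
    return max(grid, key=lambda d: jc69_log_likelihood(d, n_same, n_diff))


s1 = "ACGT" * 5
s2 = "GCGT" * 4 + "ACGT"  # hypothetical: differs at 4 of 20 sites
print(ml_distance(s1, s2))
```

For JC69 and two taxa, the ML estimate coincides with the closed-form corrected distance -3/4 ln(1 - 4p/3). Here that gives about 0.233 for p = 0.2, which the grid search lands on to within its step size.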
Trees
- are part of the model too
- long branches accumulate multiple substitutions at the same site (we don't observe them directly, but they exist)
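Hidden multiple substitutions are why the raw fraction of differing sites (p-distance) underestimates the true distance. Here is a minimal sketch of the Jukes-Cantor correction, assuming JC69 (equal base frequencies and equal rates). The function names are hypothetical.

```python
import math


def p_distance(s1, s2):
    """Observed fraction of differing sites."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def jc69_distance(p):
    """Jukes-Cantor corrected distance (expected substitutions per site)."""
    if p >= 0.75:
        # Saturated: under JC69, random sequences already differ at 3/4 of sites.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


for p in (0.05, 0.2, 0.4, 0.6, 0.7):
    print(f"p = {p:.2f}  ->  d_JC = {jc69_distance(p):.3f}")
```

At small p the two numbers are nearly equal. As p approaches 0.75 the corrected distance blows up, so on long branches small errors in p turn into large errors in d.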
Long Branch Attraction in Parsimony
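- long branches tend to be pulled together, i.e. wrongly grouped, because convergent changes along them look like shared ancestry
- the problem is better handled by likelihood models, which account for multiple substitutions on long branches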
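With four taxa, long-branch attraction is easy to see. Only "xxyy" sites (two states, each shared by a pair of taxa) discriminate between the three unrooted topologies under parsimony, so the most parsimonious tree is simply the split with the most supporting sites. The sketch below counts those sites. The alignment is hypothetical and hand-built: it assumes the true tree is AB|CD, with long branches to A and C that picked up the same (convergent) change at four sites.

```python
from collections import Counter

# The three possible unrooted topologies for taxa A, B, C, D.
SPLITS = {
    "AB|CD": ({"A", "B"}, {"C", "D"}),
    "AC|BD": ({"A", "C"}, {"B", "D"}),
    "AD|BC": ({"A", "D"}, {"B", "C"}),
}


def supported_split(site):
    """Return the split an xxyy site supports, or None if it is uninformative.

    site: dict taxon -> character for taxa A, B, C, D.
    """
    for name, (left, right) in SPLITS.items():
        left_states = {site[t] for t in left}
        right_states = {site[t] for t in right}
        if len(left_states) == 1 and len(right_states) == 1 and left_states != right_states:
            return name
    return None


def split_counts(alignment):
    """Count parsimony-informative sites supporting each split."""
    counts = Counter()
    for i in range(len(alignment["A"])):
        split = supported_split({t: alignment[t][i] for t in "ABCD"})
        if split:
            counts[split] += 1
    return counts


# Sites 0-1: true signal (AB|CD); sites 2-5: convergent changes on A and C.
aln = {
    "A": "GGTTTTAAAA",
    "B": "GGAAAAAAAA",
    "C": "CCTTTTAAAA",
    "D": "CCAAAAAAAA",
}
print(split_counts(aln))
```

Parsimony counts every shared state as evidence of shared ancestry, so it picks AC|BD here. A likelihood model can instead attribute those matches to multiple hits on the two long branches.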
ProtTest
- how do we determine which model best fits the data?
- an information criterion describes which model is best suited
- it balances the maximized likelihood against (1) the number of free parameters in the model and, for BIC, (2) the sample size (number of sites)
- the lower the information criterion value, the better suited the model
- the parameter penalty guards against overfitting the data
- Akaike Information Criterion (AIC) / Bayesian Information Criterion (BIC)
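Both criteria are short formulas: AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L. Here L is the maximized likelihood, k the number of free parameters, and n the sample size (taken here to be the number of alignment sites, which is an assumption). The log-likelihoods and parameter counts below are made up for illustration, and M1-M3 are hypothetical model names, not ProtTest output.

```python
import math


def aic(log_l, k):
    """Akaike Information Criterion (lower is better)."""
    return 2 * k - 2 * log_l


def bic(log_l, k, n):
    """Bayesian Information Criterion (lower is better)."""
    return k * math.log(n) - 2 * log_l


# Hypothetical fitted models: name -> (maximized log-likelihood, free parameters).
models = {
    "M1": (-5230.4, 1),
    "M2": (-5190.7, 2),
    "M3": (-5183.7, 6),
}
n_sites = 300  # hypothetical alignment length

best_aic = min(models, key=lambda m: aic(models[m][0], models[m][1]))
best_bic = min(models, key=lambda m: bic(models[m][0], models[m][1], n_sites))
print("AIC picks", best_aic, "| BIC picks", best_bic)
```

Once n exceeds e^2 (about 7.4 sites), BIC's per-parameter penalty ln(n) is larger than AIC's 2. That is why the two can disagree, as in this toy case, where the extra four parameters of M3 buy enough likelihood for AIC but not for BIC.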
Other Software
- PAUP
- ModelTest
- DAMBE -- best option according to the floor
Structure Prediction Software
Structure Fitting Software
Sequence to Structure
- bioinformatics approaches for gene and protein function
- prediction of biochemical function
- prediction of interactions
- conservation / phylogenetic distribution
- structure prediction
- relevance of variations
- predicted post-translational modifications
- fold of protein
- multidomain protein
- functionally important residues
- ligands and cofactors
- substrate specificity, enzyme activity
Viewers
- PDBsum: gives a quick-and-dirty secondary-structure-element (SSE) diagram; rather than running up and down, all elements are lined up along the reading axis of the text
- Utilize known tertiary structure to create anchors for multiple sequence alignments
Course Project
- Proposal -- just write up the scoring function starting with "views"; list assumptions, and hand in.
- Need to pick a paper -- I think I want to discuss Elemento/RADAR/HHRepID/TRUST