From SnOwy - Ed's Wiki Notebook
Majority rule supertree
- find all splits that occur in more than k/2 of the k input trees
- this is a voting system
- the result is the tree whose splits each agree with a majority of the input trees
- what if we chose the wrong k trees in the first place?
- remember there are several scenarios for how the trees were picked in the first place
- Bayesian world
- sampling from the posterior
- k different experiments, each on a different gene (pretending the experiments are mutually independent)
- Multiple optima
- we end up with several modes
- majority rule still makes sense
- we are saying that the split that occurs most often is the most likely
- giving one vote to every sample isn't that silly: each sample is one draw from the posterior distribution
- this is kind of weird too: similar to putting everyone's votes on a ballot, then throwing out all the ballots but one
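A minimal Python sketch of the k/2 voting rule. It assumes each input tree is rooted and written as nested tuples of leaf names (a representation chosen here for illustration, not taken from these notes), and it represents a split by the set of leaves below an internal node (a cluster). Clusters seen in more than half of the trees are kept; any two such clusters must co-occur in at least one tree, so they are nested or disjoint and always assemble into a single tree. The sample trees are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, List, Set, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of child subtrees


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets below each internal node, excluding the root (which every tree shares)."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    found.discard(walk(tree))
    return found


def majority_rule(trees: List[Tree]) -> List[FrozenSet[str]]:
    """Clusters that occur in more than k/2 of the k input trees."""
    k = len(trees)
    votes = Counter(c for t in trees for c in clusters(t))
    return [c for c, n in votes.items() if 2 * n > k]


def to_newick(taxa: List[str], kept: List[FrozenSet[str]]) -> str:
    """Assemble pairwise compatible clusters into a rooted Newick string."""

    def build(group: FrozenSet[str]) -> str:
        inside = [c for c in kept if c < group]  # kept clusters strictly below `group`
        maximal = [c for c in inside if not any(c < d for d in inside)]
        covered = frozenset().union(*maximal)
        parts = [build(c) for c in maximal] + sorted(group - covered)
        return "(" + ", ".join(sorted(parts)) + ")"

    return build(frozenset(taxa)) + ";"


# Hypothetical posterior sample of five trees on taxa A..E.
samples = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "C"), "B"), ("D", "E")),
    ((("A", "B"), "D"), ("C", "E")),
    (("A", "B"), ("C", ("D", "E"))),
]
print(to_newick(list("ABCDE"), majority_rule(samples)))  # (((A, B), C), (D, E));
```

Here {A, B} and {D, E} get 4 of 5 votes and {A, B, C} gets 3, so they survive; every other cluster has a single vote and is dropped.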
Parsimony
- three equally parsimonious (optimal) trees, each counting as one vote:
(1) ((((A, B), C), subtree(D)), subtree(E))
(2) ((((A, C), B), subtree(D)), subtree(E))
(3) ((((A, C), subtree(D)), B), subtree(E))
- we basically have two trees, (1) and (2), which agree that A, B and C should be together, and one, (3), that's radically different
- but think about what we're doing: we don't know which of (1), (2) or (3) is better, yet the grouping shared by (1) and (2) effectively gets an additional vote
- is this sane?
- what does this mean in terms of information; does it have an information-theoretic meaning at all?
- another option is the strict consensus tree: keep only the splits that occur in 100% of samples
- assume the search algorithm samples uniformly at random from all optimal trees
- even then, the result is biased by the combinatorial structure of the set of optima
- summary trees end up being useful for visualization etc., but don't count on them to represent reality
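To make the voting argument concrete, here is a small Python sketch that scores the three optima above with both rules. It assumes subtree(D) and subtree(E) can each be collapsed to a single leaf (D, E), which leaves every cluster involving A, B and C unchanged; trees are nested tuples, an illustrative choice rather than anything from the notes.

```python
from collections import Counter

# The three equally parsimonious trees, with subtree(D) and subtree(E) collapsed to leaves.
TREES = [
    (((("A", "B"), "C"), "D"), "E"),  # (1)
    (((("A", "C"), "B"), "D"), "E"),  # (2)
    (((("A", "C"), "D"), "B"), "E"),  # (3)
]


def clusters(tree):
    """Leaf sets below the internal non-root nodes of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    found.discard(walk(tree))
    return found


def consensus(trees, keep):
    """Clusters whose vote count n (out of k trees) satisfies keep(n, k)."""
    k = len(trees)
    votes = Counter(c for t in trees for c in clusters(t))
    return {c for c, n in votes.items() if keep(n, k)}


strict = consensus(TREES, lambda n, k: n == k)
majority = consensus(TREES, lambda n, k: 2 * n > k)
for name, kept in (("strict", strict), ("majority", majority)):
    print(name, sorted("".join(sorted(c)) for c in kept))
```

Strict consensus keeps only {A, B, C, D}, i.e. ((A, B, C, D), E). Majority rule keeps {A, C}, {A, B, C} and {A, B, C, D}, which is exactly tree (2): {A, B, C} wins on the votes of (1) and (2), and {A, C} wins on the votes of (2) and (3).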
Rooted family of trees
- tripartition
- naturally: the common ancestor of three taxa lies further back than the common ancestor of the closest two
- e.g. ((A, B), C): A and B share an ancestor more recent than the one all three share
- Adams consensus: find all tripartitions not incompatible with any of the trees
- topology only (no edge lengths)
- i.e. find all the tripartitions that ARE COMPATIBLE with all trees
- the result is a collection of tripartitions which can be used to build a tree
- each candidate tripartition is checked against every tree
- the result generally matches neither the majority-rule tree nor the strict consensus tree
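A hedged Python sketch of the tripartition (rooted triplet) idea. It reads "not incompatible with any of the trees" as: take every resolved triplet ab|c displayed by at least one input tree and drop it if some tree displays ac|b or bc|a. The notes do not say how the surviving triplets become a tree, so the sketch swaps in the BUILD algorithm of Aho, Sagiv, Szymanski and Ullman; it is not claimed to reproduce Adams' original construction. Trees are nested tuples, and subtree(D), subtree(E) from the parsimony example are collapsed to leaves.

```python
from itertools import combinations


def leafset(node):
    """All leaf names below a node of a nested-tuple tree."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leafset(child) for child in node))


def triplets(tree):
    """Resolved triplets ab|c displayed by a rooted tree, as (frozenset({a, b}), c)."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return
        kids = [leafset(child) for child in node]
        for i, inside in enumerate(kids):
            # a, b below the same child and c below a sibling: this node is lca(a, b, c)
            outside = frozenset().union(*(k for j, k in enumerate(kids) if j != i))
            for a, b in combinations(sorted(inside), 2):
                for c in outside:
                    found.add((frozenset((a, b)), c))
        for child in node:
            walk(child)

    walk(tree)
    return found


def contradicted(triplet, displayed):
    """ab|c is contradicted by a tree that displays ac|b or bc|a."""
    pair, c = triplet
    a, b = sorted(pair)
    return (frozenset((a, c)), b) in displayed or (frozenset((b, c)), a) in displayed


def compatible_triplets(trees):
    per_tree = [triplets(t) for t in trees]
    candidates = set().union(*per_tree)
    return {t for t in candidates if not any(contradicted(t, d) for d in per_tree)}


def build(taxa, trips):
    """BUILD: nested tuples, or None if no tree displays all the triplets."""
    taxa = frozenset(taxa)
    if len(taxa) == 1:
        return next(iter(taxa))
    relevant = [(p, c) for p, c in trips if p | {c} <= taxa]
    # Graph with an edge a-b for every triplet ab|c; its components become the children.
    adj = {x: set() for x in taxa}
    for pair, _ in relevant:
        a, b = sorted(pair)
        adj[a].add(b)
        adj[b].add(a)
    components, seen = [], set()
    for x in sorted(taxa):
        if x in seen:
            continue
        stack, comp = [x], set()
        while stack:
            y = stack.pop()
            if y not in comp:
                comp.add(y)
                stack.extend(adj[y] - comp)
        seen |= comp
        components.append(comp)
    if len(components) == 1:
        return None  # the triplets force all taxa together: they conflict
    children = [build(comp, relevant) for comp in components]
    return None if any(ch is None for ch in children) else tuple(children)


TREES = [
    (((("A", "B"), "C"), "D"), "E"),
    (((("A", "C"), "B"), "D"), "E"),
    (((("A", "C"), "D"), "B"), "E"),
]
kept = compatible_triplets(TREES)
print(len(kept), build("ABCDE", kept))
```

On the three parsimony optima this keeps 7 of the 13 displayed triplets (every conflict about A, B, C or about B, D is dropped) and yields (((A, C), B, D), E), which is neither the strict consensus ((A, B, C, D), E) nor the majority-rule tree ((((A, C), B), D), E).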
Paleontology
- how do we incorporate n living species and m fossils into one tree?
- we must now switch to a different kind of data
- we can get some DNA out of fossils, but not much
- we have morphology: morphological characters
- feathers, claws, bone structures (ratios, linear dimensions, curve parameters, etc.), limbs, and most likely colour (inferred from microscopic structures)
- usually coded as bit-vector elements (e.g. character present/absent)
- how do we build this into a tree?
- fossils are not necessarily ancestors of modern taxa
- most fossils are not direct ancestors of currently extant species; many are aunts, uncles, cousins, etc.
- example: hominids. How close to the trunk of the tree must a species attach (how short must its connecting edge be) for us to classify it as our ancestor?
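As a small illustration of bit-vector coding, here is a Python sketch that packs present/absent morphological characters into an integer and counts the characters on which two taxa differ using XOR. The character list and the three taxa are hypothetical. A raw difference count only compares two taxa; on its own it does not say where a fossil attaches or whether it is an ancestor.

```python
# Hypothetical binary characters: bit i is 1 when CHARACTERS[i] is present.
CHARACTERS = ["feathers", "claws", "long_bony_tail", "teeth"]


def encode(present):
    """Pack the set of present characters into an int used as a bit vector."""
    bits = 0
    for i, name in enumerate(CHARACTERS):
        if name in present:
            bits |= 1 << i
    return bits


def differences(x, y):
    """Number of characters whose state differs between two taxa (Hamming distance)."""
    return bin(x ^ y).count("1")


# Hypothetical taxa, for illustration only.
fossil = encode({"feathers", "claws", "long_bony_tail", "teeth"})
bird = encode({"feathers", "claws"})
lizard = encode({"claws", "long_bony_tail", "teeth"})
print(differences(fossil, bird), differences(fossil, lizard))  # 2 1
```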
Fossil Strata
- should be dense
- a stratum is a period of time
- a given stratum should contain fossils of the species that exist both before and after it
- a stratum that misses a lineage is rare
- parsimony looks for the topology that minimizes the number of forced mutations (character changes)
- with fossils we can add one more term to this objective
- we place the fossils, with their characters, into a phylogenetic tree of modern species
- our objective function pays for all the character changes needed to insert each fossil sensibly into the tree
- we also pay whenever we expected a fossil in a stratum but didn't find one
- this method is very sensitive to the number of fossils that have been found
- stratocladistics == geological parsimony
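A minimal Python sketch of the stratocladistic objective, under simplifying assumptions that are mine, not the notes': trees are rooted binary nested tuples; characters are binary and scored with Fitch's small-parsimony pass; strata are numbered from 0 (oldest) upward; each split is assumed to be no older than the oldest fossil below it; and "debt" is counted only on terminal lineages, as the strata between a taxon's inferred origin and its last appearance in which it left no fossil. It is a simplification of stratigraphic debt, not a full implementation; all data are hypothetical.

```python
from typing import Dict, List, Set, Tuple, Union

Tree = Union[str, tuple]  # a leaf name, or a (left, right) pair of subtrees


def leaves(node: Tree) -> List[str]:
    return [node] if isinstance(node, str) else [x for child in node for x in leaves(child)]


def fitch(node: Tree, state: Dict[str, int]) -> Tuple[Set[int], int]:
    """Fitch's pass for one character: (possible states at node, changes forced below it)."""
    if isinstance(node, str):
        return {state[node]}, 0
    left, right = node
    (ls, lc), (rs, rc) = fitch(left, state), fitch(right, state)
    if ls & rs:
        return ls & rs, lc + rc
    return ls | rs, lc + rc + 1


def character_steps(tree: Tree, chars: Dict[str, List[int]]) -> int:
    n_chars = len(next(iter(chars.values())))
    return sum(fitch(tree, {t: v[i] for t, v in chars.items()})[1] for i in range(n_chars))


def stratigraphic_debt(tree: Tree, seen: Dict[str, Set[int]]) -> int:
    """Strata a terminal lineage must have spanned without leaving a fossil."""
    debt = 0

    def walk(node: Tree) -> None:
        nonlocal debt
        if isinstance(node, str):
            return
        origin = min(min(seen[x]) for x in leaves(node))  # split no older than its oldest fossil
        for child in node:
            if isinstance(child, str):
                span = range(origin, max(seen[child]) + 1)
                debt += sum(1 for s in span if s not in seen[child])
            walk(child)

    walk(tree)
    return debt


def stratocladistic_cost(tree: Tree, chars, seen) -> int:
    return character_steps(tree, chars) + stratigraphic_debt(tree, seen)


# Hypothetical data: F is a fossil, X and Y are living taxa (stratum 3 = present).
chars = {"F": [1, 0], "X": [1, 1], "Y": [0, 1]}
seen = {"F": {0}, "X": {2, 3}, "Y": {1, 3}}
for tree in ((("F", "X"), "Y"), (("X", "Y"), "F"), (("F", "Y"), "X")):
    print(tree, character_steps(tree, chars), stratigraphic_debt(tree, seen))
```

With these made-up data all three topologies need two character changes, so parsimony alone cannot choose; the debt (2 for ((X, Y), F) versus 4 for the other two) breaks the tie. Adding or removing a single fossil occurrence in `seen` changes the debt, which is exactly why the method is sensitive to how many fossils have been found.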
Stratolikelihood
- depositing a fossil happens in some probabilistic process
- with a constant deposition rate λ per unit time, over k time units we expect to see kλ fossils
- likelihood is the continuous (probabilistic) counterpart of parsimony
- we need a model of fossil deposition and a model of evolution
- including fossils that yield molecular sequences
- ancient molecular sequences (e.g. ancient DNA)
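A minimal Python sketch of the deposition half of stratolikelihood. It assumes fossils are deposited as a Poisson process with a constant rate λ (`rate`) per unit time, so a stratum of duration d is expected to hold λd fossils and is empty with probability exp(-λd); the Poisson choice, the function name and the example values are assumptions for illustration. The function scores a lineage's presence/absence pattern over the strata it is inferred to span; a full analysis would combine this with a model of character or sequence evolution on the same tree, which is not shown.

```python
import math
from typing import List


def deposition_loglik(present: List[bool], durations: List[float], rate: float) -> float:
    """Log-likelihood of which strata in a lineage's inferred range yielded fossils,
    assuming independent Poisson deposition with constant rate `rate` per unit time."""
    total = 0.0
    for found, d in zip(present, durations):
        expected = rate * d  # expected number of fossils deposited in this stratum
        # P(no fossil) = exp(-expected); P(at least one) = 1 - exp(-expected)
        total += math.log(-math.expm1(-expected)) if found else -expected
    return total


# Hypothetical lineage seen in its last two strata, under two inferred origins.
long_range = deposition_loglik([False, False, True, True], [1.0] * 4, rate=0.5)
short_range = deposition_loglik([False, True, True], [1.0] * 3, rate=0.5)
print(long_range < short_range)  # True: each extra empty stratum costs rate * d
```

This is the probabilistic analogue of paying for a missing fossil: an empty stratum lowers the log-likelihood by λd instead of adding one unit of debt, so the penalty grows with how many fossils we expected to find there.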