Notes 20100615 CS 798 Inferring Phylogenies
From SnOwy - Ed's Wiki Notebook
Lecturer Dan G. Brown
Contents |
Notes for presentations
- July 8th presentations are to be held in DC1331.
Continuing on Bayesian Phylogenetics
Pr[A | Data] = Pr[Data | A] Pr[A] / Pr[Data] Pr[B | Data] = Pr[Data | B] Pr[B] / Pr[Data] Pr[A | Data] / Pr[B | Data] = (Pr[A] / Pr[B]) (Pr[Data | A] / Pr[Data | B])
- what does this give us in terms of inference
- if for whatever reason we had a prior that surrounds a particular value, then received data that pushes us toward a different value-- we would more and more believe the data-- however, if there was little data, we would be more likely to believe the prior.
- when we see 50 to 200 heads on a coin flip in a row-- we are more likely to believe that this is a two headed coin.
- the original understanding of using Bayesian phylogeny is that if we have a worthwhile prior, trust it until we start to see lots of evidence to the contrary
- if we want the maximum a posteriori for our data, we are looking for the tree that best explains all of the data
- if we can't reliably calculate Pr[A] but we can get Pr[B]-- then we are still OK -- (these are priors)
How to do Bayesian Phylogeny
- find MAP estimator
- summarize posterior distribution
- Dr. Brown remarks this is where Bayesian reasoning is more informative than the likelihood framework
- we could ask for example-- here are four taxa
A C A B A B >--< vs >--< vs >--< B D C D D C Q1 Q2 Q3
- Pr[Qq | Data]?
- that is to ask given the data, how do we determine which quartet is most correct?
- posterior distribution
Metropolis Algorithm
- proof in the finite case
- we have a space Ω of possible hypotheses
- (for today, we operate under that Ω is finite)
- in phylogeny, we operate under a combinatorial finite space
- we know that ∑ --
- for ω ∈ Ω, Pr[ω] = its a priori probability-- that ∑ω ∈ ΩPr[ω] = 1
- given data D, we have PrD = Pr[ω | D]
- for any choice of D (subject to a few constraints we aren't naming) -- that ∑ω ∈ ΩPr[ω] = 1
- let us have an event, A ∈ Ω -- we can ask: what is PrD[A] = ∑ω ∈ ΩPr[ω]
- this is the kind of question that we're asking
- so -- how do we determine if Q1 is the correct solution?
- two choices
- for all ω ∈ A, compute PrD[ω]
- more common is to pick a lot of samples from PrD and ask how many of them are in A and use that ratio as an estimate of the conditional probability of A
- if event A is particularly unlikely, then we would need to sample A LOT
- we are capable of answering of which Q1, Q2, Q3 is the most probable a posteriori since these quartets are often very different (by definition-- they partition the solution space by thirds-- hence the largest must be at least one third).
Markov Chain Monte Carlo
- idea: We want to construct a Markov chain whose stationary distribution is PrD
- What does this mean?
- Running this particular Markov chain long enough will yield members of PrD
- (The kinds of Markov chains we're discussing converge on a stationary distribution)
- start the chain at some arbitrary point and run it "to convergence"-- this gives a point from PrD
- repeat this until we have "enough" samples
- comment from the floor: when this is run in real phylogenetics, researchers may repeat instead from the previous 'stationary point' -- discussion later?
Using this framework to select members of the distribution
- suppose that PrD is uniform
- ∑ω ∈ ΩPr[ω] = 1 -- each ω has probability of 1/|Ω|
- imagine a graph-- nodes ω ∈ Ω edges are "close members" and degree of each node is the same
- stationary distribution is uniform
- consider π = [1/|Ω|, 1/|Ω| ... 1/|Ω|]
- every node has the same indegree and outdegree-- because the distribution is uniform, that means that each node also has the same 'mass' incoming and outgoing.
- if each node has three neighbours, then the transition would be 1/(3|Ω|) -- if each node has k neighbours, then the transition would be 1/(k|Ω|)
- if T is transition matrix for the chain, then πT = π
- if we want a member of the distribution, we would start on a point and perform a random walk for a number of steps on the graph
- can start at the same point
- can start at the selected previous point
- -- this is the general meaning of a Markov Chain Monte Carlo-- a random walk on a weighted graph
How long to convergence?
- If we had a completely connected graph, then the simulation goes to convergence in one step.
- In any ergodic chain, one eigenvalue is 1.
- ergodic : $_ → $_ has a state S that returns to S
- the second eigenvalue is the interesting one -- the second eigenvalue of the chain describes how many steps to convergence
→ → → → → → 0 * * * n ← ← ← ← ← ← (0 self-loops), (points between 0 and n have one arrow pointing to the left and one to the right), (n self-loops).
- it would take a long time to get from point 0 to n
→ → → → → → 0 * * * n (all points have an edge going to 0 except for the point n-- n has a self-loop-- the points between have one arrow pointing right)
- this means that the stationary distribution is all point 'n'-- once we get to 'n', we stay there-- but the probability of getting there is very small
- the second eigenvalue of this (Dr. Brown thinks) is 1-2-n
- the second graph is actually a counter example for using Markov Chains-- due to its sparse connectivity
- if the second eigenvalue is close to 1, the steps to converge is very large
- in a Markov Chain, the first eigenvalue describes whether you are on an edge pointing toward the stationary distribution (is positive) or if you're pointing away from it (is negative)--
- the second eigenvalue tells us how fast we're getting to the stationary distribution (is positive)
- specifically, the number of steps to convergence is O(k) if the second eigenvalue is 1 - 1/k
- the instance we have a connected chain, we can make things aperiodic
Back to the Metropolis Algorithm
- 'Metropolis' is actually a last name
- for each ω ∈ Ω, have k neighbours
- we pick one neighbour ω′ uniformly at random
- we ask what is the PrD[ω] vs PrD[ω′]
- if PrD[ω] >e; PrD[ω′], we move to ω′
- if PrD[ω] < PrD[ω′], ...
- we move to ω′ with probability PrD[ω′] / PrD[ω], ...
- or stay put at ω otherwise.
Theorem (assuming ergodicity) -- stationary distribution
- the stationary distribution of the chain is PrD
- what are the entries of T (the transition matrix)
- let's look at the ath row of T
- in entries corresponding to a's neighbours b where PrD[a] <e; PrD[b], ...
- T[a, b] = 1/k
- in entries corresponding to PrD[a] > PrD[b], ...
- T[a, b] = (1/k)(PrD[b]/PrD[a]) ...
- T[a, a] = ∑neighboursb of a (PrD[a] - PrD[b]) / kPrD[a] ...
- where PrD[b] < PrD[a]
- now let's consider what π = PrDT is.
- let's focus on the ath entry in this vector
- we're going to put mass into π[a] = ...
- Pr[a] ∑neighboursb of a, Pr[b] < Pr[a] (PrD[a] - PrD[b]) / kPrD[a] + ...
- ∑neighboursb of a, Pr[b] < Pr[a] PrD[b] (1/k) + ...
- ∑neighboursb of a, Pr[b] ≥ Pr[a] PrD[b] (1/k) (Pr[a] / Pr[b])
- we want to wind up in the state a-- if we were previously in the distribution PrD-- the above three items correspond each to ...
- we were already in a, and we considered a transition to a neighbour and turned it down since the transition probability was lower
- we were in state b and we either chose to transition to a with a higher probability
- or we were in state b and we chose to transition to state a even though it has a lower probability
- π[a] = ∑neighboursb of a, Pr[b] < Pr[a] Pr[a] (1/k) + ∑neighboursPr[b] ≥ Pr[a] 1/k Pr[a] = Pr[a]
Metropolis / Hasting Algorithm
- we need to modify the algorithm such that we now select from ω a neighbour ω′ at random from the distribution Q(ω)
- that is, things are no longer uniform, we use Q to give us the distribution
- what is Q(ω)[ω′]Pr[ω] vs Q(ω′)[ω]PrD[ω′]?
- this is different than the Metropolis algorithm as follows...
- the incoming and outgoing mass (probability distribution) was balanced in the Metropolis algorithm
- in the Metropolis/Hasting algorithm, the proposal for which point to transition to is different for each point; however, since we want to reach a stationary distribution, we 'borrow' mass from the future and pay it back later.
- -- continued Thursday.