Notes 20100513 CS 798 Ultrametric data tree construction - Class - Dan Brown
From SnOwy - Ed's Wiki Notebook
Dan G Brown : CS798
Contents |
Continuing on with Ultrametric Tree Construction
- we cannot really guarantee that the data set can be used to deduce an ancestral taxon
- we can relax that stipulation a bit for some models
Reversible model of evolution
- is a stochastic model
- for P(Si, t) -- and P(P(Si, t), u)) -- should be the same as P(Si, t + u)
- the above additive property is something that we want
- what does this mean?
- we should be able to make a prediction so that the start string evolving for t-many years then for u-many years should yield the same string as one that evolved for t+u-many years.
- P(Si, t+u) ... Pr[P(S, t) = T] ... Pr[P(T, t) = S]
- the likelihood of our data does not depend on where we started off in the tree
- which is more realistic--
- however, this means that ultrametricity does not necessarily hold.
- Given a sequence Si, we might be able to estimate (max likelihood perhaps) what d[i, j] is for all i, j.
- this is asking slightly less than the molecular clock assumption built into the ultrametric model
Math for this additive data assumption
- D is additive if there exists an edge labelled ???tzee??? with leaves mapped to the rows of D
- such that the path length of i and j on T matches D[i, j] for every i, j.
- example distance tree below...
e
6 4 /1
a-----*-----*
2| 2\
c *6
3/ \
d b
- A distance matrix can be used to build an ultrametric tree...
a
/ \6
/ *
/ |\2
/ 4| c
/ | \
/ * \
/ |\1 \
/ | e \
/ * \ \
/ 3/ \6 \ \
a d b e c
- D[a,x] is increased by mv-D(a,x) --- where mv is max pairwise distance
- For i, j: D'[i,j] = D[i,j] +2mv - d[a,i] - d[a,j]
For an additive tree...
i \ \y z *-----a /x / j
-
What we saw in the original data...-
D[i,j] = y + z; -
D[i,a] = x + y; -
D[j,a] = z + x;
-
-
D'[i,j] = D[i, j] + 2mv - D[a,i] - D[a,j]-
= D[i,j] + 2mv - x - y - x - z -
= y + z + 2mv - x - y - x - z -
= 2mv - 2x-
where mv is the longest distance in the entire graph.
-
-
-
we can get mv given the original distance matrix
Summary
- If we have a distance matrix that has an additive tree
- then there exists a procedure to find that additive tree
- for each distance in the distance matrix, we do this
- D[a,x] is increased as mv-D(a,x),
- then D'[i,j] = D[i,j] + 2mv - D[a,i] - D[a,j]
- D'[i,j] is ultrametric!
- running the tree building algorithm on D'[i,j] (for an ultrametric distance to an ultrametric tree) produces an ultrametric tree
- we turn this back into an additive tree
- we take our distances D'[i,j] for the tree edges, and subtract away (2mv - D[a,i] - D[a,j]) to produce our additive tree
- we turn this back into an additive tree
- the 'a' in this case helps us produce a root because it is a certainly distant point
Neighbour Joining
- has a built in optimistic assumption that we're joining neighbours
- let's take a look at this from the vantage point of UPGMA--
- recall UPGMA:
- pick i, j so that D[i, j] is minimum.
- combine i, j into k
- D[k,l] = (|Ci| D[i,l] + |Cj| D[j,l]) / (|Ci| + |Cj|)
- D[i,k] = D[j,k] = D[i,j] / 2
- the right-hand side for each of the above steps changes in neighbour joining-- but the left-hand side is the same.
- phylogenetic method A is consistent on a class of distance matrices D if A(D) returns T and the proper edge weighting in T to be matching for D.
- for every pair x,y:
- QD(x,t) =
- D[x,y] - [rD(x)+rD(y)]/(n-2)
- rD(x) = sum over y for D[x,y]
- QD(x,t) =
- now for NJ:
- pick i, j so that QD[i, j] is minimum.
- combine i, j into k
- D[k,l] = 0.5(D[i,l] + D[j,l] - D[i,j]) -- see below diagram...
- D[i,k] = (1/(2(n-1))) * sum over l when l is not i or j: D[i,l] + D[i,j] - D[j,l]
- D[j,k] = -- abduct with logic.
i
\ ?
k~~~~~~~l
/
j -- get (k->l) (-- D[k,l]) without knowing l-- add i->l, j->l -- subtract i->j, leaves you with 2k->l
- So what's QD -- it's our weird averaging matrix-- building this thing requires O(N^3).
- this is an open problem-- is there an O(n(3-ε)) algorithm?
- What's Q? and why do we have QD = D[x,y] - [rD(x)+rD(y)]/(n-2) ?
- Consider the effect of adding ki to D[i,j] for every j that is not i.
- Call the new matrix D'.
- QD'[x,y] where x ≠ i, y ≠ i.
- = D[x,y] - [rD'(x)+rD'(y)]/(n-2)
- = QD[x,y] = 2ki/(n-2)
- QD'[x,i] where x ≠ i.
- = D'[x,i] - [rD'(x)+rD'(i)]/(n-2)
- D'[x,i] increases by ki
- rD'(x) decreases by ki / (n-2)
- rD'(i) decreases by (n-1)ki / (n-2)
- = Q(x,i) (2/(n-2))ki
More Discussion on Neighbour Joining
- let's say that we subtracted ki from every distance involving i
- let's say we subtracted kj from every distance involving j
- we need to show that two neighbouring nodes
. . .