Notes 20100817 McConkey Meeting

From SnOwy - Ed's Wiki Notebook

Committee Members

Thesis Proposal Details

Table 3

Examples

Equation 1

From the e-mail

Comments

Table 1 - If the diagonal of a self-alignment is set to zero in the Smith-Waterman array, does the highest-scoring cell in the remaining array identify the highest-scoring repeat? I think it might… is this the basis of the IRF algorithm?
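The question above can be tried out on a toy sequence. The following is only a minimal sketch of the idea, not the IRF algorithm or the proposal's code: the function name, the simple match/mismatch/linear-gap scores, and the choice to fill only the upper triangle (the self-alignment array is symmetric) are all assumptions made for illustration.

```python
def best_off_diagonal_cell(seq, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman self-alignment with the main diagonal forced to zero.

    Returns (score, i, j) for the highest-scoring cell with i < j, where
    i and j are the 1-based positions at which the local alignment ends.
    Scoring values are illustrative assumptions, not taken from the proposal.
    """
    n = len(seq)
    # H[i][j] holds the best local alignment score ending at seq[i-1], seq[j-1].
    H = [[0] * (n + 1) for _ in range(n + 1)]
    best = (0, 0, 0)
    for i in range(1, n + 1):
        # Only cells above the diagonal (j > i) are filled, so the diagonal
        # and the mirror-image lower triangle remain zero.
        for j in range(i + 1, n + 1):
            sub = match if seq[i - 1] == seq[j - 1] else mismatch
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + sub,  # align seq[i-1] with seq[j-1]
                H[i - 1][j] + gap,      # gap in the second copy
                H[i][j - 1] + gap,      # gap in the first copy
            )
            if H[i][j] > best[0]:
                best = (H[i][j], i, j)
    return best


# Toy sequence with the unit "ABCD" repeated around a two-residue spacer.
print(best_off_diagonal_cell("ABCDXXABCD"))
```

On this toy input the best cell is where the two ABCD copies finish aligning. Note that nothing in this sketch stops the two aligned segments from overlapping in a real sequence, so whether the top cell is always the best repeat still needs discussion.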

p. 7 – “Given a protein with rotational symmetry that has had its three dimensional structure solved”: this can limit the problem. What about repeats within a larger structure or sequence? These may not have easily identifiable rotational symmetry or a solved structure.

Finding the best repeat for each profile: is this set up in such a way that it doesn’t matter if there isn’t a repeat, e.g. if sequence A has a repeat but sequence B does not?

p. 11 – Identifyrepeatfunction

- this doesn’t really have a basis in theory. Is it good to include if it’s a dead end?

This has the same issue raised on p. 7: what if only one of the sequences contains a repeat? As currently written, if any sequence has a repeat length <20, the algorithm returns false. Is relatedness based entirely on the length of the profiles? That is likely not the best approach. ‘Relatedness’ is also likely not a good choice of terminology, since it implies homology(?), and we expect all sequences to be related; is this addressing common repeats? I’ll assume Relatedness = true means the sequences share a common repeat(?).


Table 3 “A heuristic to guess where” - to approximate, to estimate? A heuristic implies it does not necessarily produce optimal results; a guess implies there is no formal procedure.

The Boolean output of the Identifyrepeatfunction may be leading down a path that isn’t ideal.

Here, the output of the heuristic should ideally be an alignment, incorporating each of the profiles. There are four cases:

no novel repeats, the best alignment is the current parent alignment
novel repeat in Child1, profile for parent and Child1 adjusted
novel repeat in Child2, profile for parent and Child2 adjusted
novel repeats in Child1 and Child2, all profiles adjusted.
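One way to picture replacing the Boolean output with the four cases listed above is a small enumeration. This is a hypothetical sketch only: `RepeatOutcome`, `classify_node`, and the two flags are names invented here, and how a novel repeat would actually be detected in each child is left open.

```python
from enum import Enum


class RepeatOutcome(Enum):
    """Hypothetical four-way result for one internal node of the tree."""
    NO_NOVEL_REPEAT = "keep the current parent alignment"
    NOVEL_IN_CHILD1 = "adjust the parent and Child1 profiles"
    NOVEL_IN_CHILD2 = "adjust the parent and Child2 profiles"
    NOVEL_IN_BOTH = "adjust all profiles"


def classify_node(novel_in_child1: bool, novel_in_child2: bool) -> RepeatOutcome:
    """Map two per-child findings onto one of the four cases.

    The flags are assumed to come from some separate assessment of each
    child; that assessment is not specified here.
    """
    if novel_in_child1 and novel_in_child2:
        return RepeatOutcome.NOVEL_IN_BOTH
    if novel_in_child1:
        return RepeatOutcome.NOVEL_IN_CHILD1
    if novel_in_child2:
        return RepeatOutcome.NOVEL_IN_CHILD2
    return RepeatOutcome.NO_NOVEL_REPEAT


print(classify_node(False, False).value)
```

Each outcome names which profiles get adjusted, so the caller could then build the corresponding alignment instead of stopping at true/false.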

e.g. node 5 in ricins:

parent length: 35  ;
left length  : 26  ;
right length : 36  ;
>>> I think that the parent repeat is related to the left repeat.
>>> I think that the parent repeat is related to the right repeat.
>>> I don't think that the left repeat is related to the right repeat.
>>> The repeat found in the parent has been inherited by both siblings.
>>> But the repeat has changed since then.
5 (selected)
3  : -------------------------------------------------------THSCLDSNAQGQVYTLGCNQGNYQHWVYAAGNDGVRLRNAQTNNCVGSRANPAP
3  :                   DGTRYQGTVYAIGCDGGAAQLWTTSSDGAGMTFRNAATGECLDSNADGRVYTQGCNHGDYQRWG--
4  : MNTLTKLTIGAVALTGSFLAAAPASAAPAADTTASPALGSQVSAQFASVTIRNAQTGRLLDSNYNGNVYTLPANGGNYQRWT--GPGDGT-VRNAQTGRCLDS------
4  :                   ---NYDGAVYTLPCNGGSYQKWLFYSNGY---IQNVETGRVLDSNYNGNVYTLPANGGNYQKWYTG
3 : YP_710563 (left child)
3  :                                  THSCLDSNAQGQVYTLGCNQGNYQHWVYAAGNDGVRLRNAQTNNCVGSRANPAPDGTR
3  : YQGTVYAIGCDGGAAQLWTTSSDGAGMTFRNAATGECLDSNADGRVYTQGCNHGDYQRWG
4 : Q9KWN0 (right child)
4  : MNTLTKLTIGAVALTGSFLAAAPASAAPAADTTASPALGSQVSAQFASVTIRNAQTGRLLDSNYNGNVYTLPANGGNYQRWTGPGDGTVRNAQTGRCLDSN
4  :                          YDGAVYTLPCNGGSYQKWLFYSNGYIQNVETGRVLDSNYNGNVYTLPANGGNYQKWYTG

The most likely conclusion is that the repeat is common to both, i.e. no new repeats for either child (the most frequent expected result in a large tree?)

In node 6, the repeat for the child node 2 has very different boundaries in the parent alignment than in the child. The parent alignment looks better and is more likely; this is a result of the three-fold repeats, but it is also a case where the length assessment produces an incorrect result. (This raises other questions – should repeats be constrained so they must be adjacent? Should three-fold repeats be addressed directly, and not as two separate events?)

[aside: a simple scoring function for the repeat alignment could address this]
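As a rough illustration of what such a scoring function might look like, the sketch below scores two aligned repeat copies column by column. The function name and the identity-based match, mismatch, and gap values are assumptions for illustration; a real assessment would more likely use a substitution matrix and some normalisation for length.

```python
def repeat_alignment_score(copy_a, copy_b, match=1, mismatch=-1, gap=-2):
    """Score two aligned repeat copies column by column ('-' marks a gap).

    The score values are illustrative assumptions, not taken from the proposal.
    """
    if len(copy_a) != len(copy_b):
        raise ValueError("aligned copies must have the same length")
    score = 0
    for a, b in zip(copy_a, copy_b):
        if a == "-" and b == "-":
            continue  # a column with gaps in both copies carries no information
        if a == "-" or b == "-":
            score += gap
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


# Two short made-up alignments: one with mismatches, one with a single gap.
print(repeat_alignment_score("GCNQ", "GCDG"))
print(repeat_alignment_score("AC-T", "ACGT"))
```

Candidate boundary choices for the same node could then be compared by score rather than by repeat length alone.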


p. 13:

These three examples correspond to the three possible cases of inheritance: (1) the repeat was inherited by both children, (2) the repeat was inherited by only one child, (3) the repeat was not inherited.

Does this address the case of a novel repeat in a child? I think this is what we are looking for…


p. 15: the issue here is that the boundaries have been selected differently; on inspection, the repeats are present in all sequences. We should discuss three-fold repeat detection and boundary cases. The repeat in node 27 looks better than those in node 18 or 28.

p. 16 – wow this is messy. A better means of assessment is needed to distinguish these repeats – it looks like there are several, and different ones are identified in each subset. [node 15 looks like it has alignment issues outside of the defined repeat area]

We should discuss Equation 1 – there may be a nomenclature issue(?).
