Notes 20100921 McConkey Bioinformatics BIOL 614
From SnOwy - Ed's Wiki Notebook
Content
- discussing BLAST
- see the two papers posted today
- statistics are important in this field!
- statistics concepts reviewed today ...
- normal distribution (Gaussian)
- μ (average)
- σ (standard deviation)
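- n (sample size)
- p-value (probability of seeing a value at least this extreme if the data point came from the reference distribution; a small p-value suggests the point belongs to a different distribution)
- we get the p-value by taking the area under the Gaussian curve beyond a given data point (one-tailed)
- Student's t-test: tests whether two samples come from the same population (strictly, whether their means differ)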
- BLAST: each returned sequence in the result can be...
- an actual match for the query sequence
- a non-match for the query sequence (a false positive)
- the score distribution isn't actually Gaussian; instead, it has a long right-hand tail (it is skewed right).
- alignments and gap penalties: gap penalties are determined empirically
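The statistics reviewed above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not code from the lecture: `one_tailed_p` computes the upper-tail area of a normal curve beyond a data point, and `students_t` computes the two-sample t statistic under the assumption of equal variances (pooled). Turning t into a p-value needs the t distribution, which the standard library does not provide, so only the statistic and its degrees of freedom are returned. Both function names are hypothetical.

```python
import math
from statistics import mean, stdev


def one_tailed_p(x, mu, sigma):
    """Area under the normal curve to the right of x (upper-tail p-value)."""
    z = (x - mu) / sigma
    # For a standard normal Z: P(Z >= z) = 0.5 * erfc(z / sqrt(2))
    return 0.5 * math.erfc(z / math.sqrt(2))


def students_t(a, b):
    """Two-sample Student's t statistic, assuming equal variances (pooled)."""
    na, nb = len(a), len(b)
    # Pooled variance from the two sample variances (n - 1 denominators)
    sp2 = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    df = na + nb - 2  # degrees of freedom for looking t up in a table
    return t, df
```

For example, a data point 1.96 standard deviations above the mean gives `one_tailed_p(1.96, 0, 1)` of roughly 0.025.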
Database Searching
- search space -- exhaustive Smith-Waterman alignment (scored with a substitution matrix such as BLOSUM) is simply too slow for searching a whole database
- BLAST and FASTA were thus developed; they are a lot faster
- heuristics employed -- approximation, not guaranteed to find the best result; only examines a fraction of the search space
- taxonomy information can be returned in addition to p-values and E-values
- graphical descriptions -- notice that the query is returned with all of the parameters echoed
- looking for a potassium channel protein
- bit scores
- E-values ...
- E > 1 -- chance alignment likely
- E < 0.05 or 0.1 -- likely biologically significant
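The E-value rules of thumb above can be written as a small classifier. This is a sketch under the thresholds given in these notes only; the function name `interpret_evalue`, the "grey zone" label for values between the cutoff and 1, and the hit names and E-values in the example are all hypothetical, invented for illustration.

```python
def interpret_evalue(e, cutoff=0.05):
    """Rough reading of a BLAST E-value using the rules of thumb in these notes.

    cutoff: 0.05 or 0.1 as mentioned in lecture. E-values above 1 are treated
    as probable chance alignments; anything in between is left undecided.
    """
    if e < cutoff:
        return "likely significant"
    if e > 1:
        return "chance alignment likely"
    return "grey zone"


# Hypothetical hit list (names and E-values invented for illustration only).
hits = [("hit_A", 3e-45), ("hit_B", 0.07), ("hit_C", 4.2)]
for name, e in hits:
    print(name, e, interpret_evalue(e))
```

With the default cutoff, `hit_B` falls in the undecided range; passing `cutoff=0.1` would count it as likely significant.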
BLAST Searching Specifics
- word building (3-mers for protein searches)
- find word matches between two sequences
- seed expansion in both directions
- 1990: each word match was extended
- 1997: two non-overlapping word hits on the same diagonal, within a set distance of each other, are needed before a seed is expanded; the resulting alignments can contain gaps
- BLAST scores of chance alignments follow an extreme value distribution
- $P = 1 - e^{-E}$
- $E = K m n \, e^{-\lambda S}$
- $E = m n \, 2^{-S'}$
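- where $S$ is the raw score, $S'$ is the bit score, $m$ and $n$ are the query and database lengths, and $K$ and $\lambda$ are parameters of the scoring system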
- E < 0.1 considered significant
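The word-building, two-hit, and E-value steps above can be sketched in Python. This is a simplified illustration, not BLAST's implementation: it finds only identical 3-mer words (real BLAST also accepts "neighborhood" words scoring above a threshold under a substitution matrix), it does no extension or scoring, and the `max_dist` window default of 40 is an assumed value. The statistics functions apply the formulas listed above; the bit-score conversion $S' = (\lambda S - \ln K)/\ln 2$ is the standard definition, and $K$ and $\lambda$ must be supplied for the scoring system in use. All function names are hypothetical.

```python
import math


def words(seq, w=3):
    """All overlapping w-mers (BLAST 'words') with their start positions."""
    return [(seq[i:i + w], i) for i in range(len(seq) - w + 1)]


def word_hits(query, subject, w=3):
    """Exact word matches as (query_pos, subject_pos) pairs.

    Simplification: identical words only, no neighborhood words.
    """
    index = {}
    for word, j in words(subject, w):
        index.setdefault(word, []).append(j)
    return [(i, j) for word, i in words(query, w) for j in index.get(word, [])]


def two_hit_seeds(hits, w=3, max_dist=40):
    """Consecutive non-overlapping hits on the same diagonal within max_dist.

    max_dist is an assumed, illustrative window size.
    """
    by_diag = {}
    for i, j in hits:
        by_diag.setdefault(i - j, []).append(i)  # diagonal = query_pos - subject_pos
    seeds = []
    for diag, starts in by_diag.items():
        starts.sort()
        for a, b in zip(starts, starts[1:]):
            if w <= b - a <= max_dist:  # non-overlapping and close enough
                seeds.append((diag, a, b))
    return seeds


def evalue_from_raw(S, m, n, K, lam):
    """E = K m n e^(-lambda S) for raw score S."""
    return K * m * n * math.exp(-lam * S)


def bit_score(S, K, lam):
    """Convert a raw score S to a bit score S'."""
    return (lam * S - math.log(K)) / math.log(2)


def evalue_from_bits(bits, m, n):
    """E = m n 2^(-S') for bit score S'."""
    return m * n * 2 ** (-bits)


def pvalue_from_evalue(E):
    """P = 1 - e^(-E)."""
    return 1 - math.exp(-E)
```

`evalue_from_raw` and `evalue_from_bits` give the same E-value for the same alignment, since the bit score folds $K$ and $\lambda$ into $S'$; for small E, P and E are nearly equal.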
Evaluation
- project -- start thinking now about what project I want to do
- 1- to 2-page summary of what I want this project to be
- seminar