Notes 20100805 Owen Group Meeting
From SnOwy - Ed's Wiki Notebook
Estimating Accuracy RNA Sequencing Microarrays Proteomics
- Debate exists about best way to analyze microarrays
- New RNA techniques allow us to retrieve sequences with higher fidelity
- Claim made that coupling with proteomics allows us to increase fidelity even more
- Transcriptome and proteome are measured relative to some negative control
- Abundances
- Fold change kinds of measurements (result is increased two-fold etc.)
- What about a complete count (absolute value)?
- That would be handy for many reasons
- Systems Biology
- Microarrays in this paper -- an attempt to give absolute abundance quantification
- Gold standard -- transcriptome value is evaluated against proteome value
- Which transcriptome platform is more accurate?
- Specific technologies
- Affymetrix Exon microarrays
- Illumina/Solexa RNA-Seq
- Mass Spec: Generic -- mass spec 'everything' in the array.
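The distinction above between fold-change measurements and a complete (absolute) count can be illustrated with a minimal sketch. The gene names and abundance values are made up for illustration, and `log2_fold_change` is a hypothetical helper, not something from the paper.

```python
import math


def log2_fold_change(treated, control):
    """Relative measure: log2 ratio of a signal against its control."""
    return math.log2(treated / control)


# Two hypothetical genes (arbitrary units) with very different absolute
# abundances but the same two-fold increase over the control.
genes = {
    "gene_a": {"control": 10.0, "treated": 20.0},
    "gene_b": {"control": 1000.0, "treated": 2000.0},
}

for name, g in genes.items():
    lfc = log2_fold_change(g["treated"], g["control"])
    absolute_change = g["treated"] - g["control"]
    print(name, "log2 fold change:", lfc, "absolute change:", absolute_change)
```

Both genes show the same fold change, so a relative measurement alone cannot tell them apart; only the absolute values reveal that one is a hundred times more abundant.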
Affymetrix overview
- General design: small chip -- "genomic locus" (a region of a chromosome, like a gene)
- Probes on microarray designed to complement specific exons
- Possible to detect alternative splicing by using different exon arrays
On Microarrays
- Fixed probe on chip designed to complement labeled target
- Annealing occurs -- probes light up in proportion to the amount of transcript
- Affymetrix has historically allowed the measurement of absolute abundance
- Intentional mismatches used to bind suboptimally (one SNP -- catches random crummy matches vs intended target)
- The fluorescence difference between mismatch cells and intended (perfect-match) cells is taken as the actual abundance
- Algorithms were introduced that described mismatch probes as damaging to precision for relative abundance
- Now we use random strings with varying GC content -- this apparently has some predictive power for crummy matches
- The random GC-based probes are used with the GC-RMA algorithm, which contains a number of normalization steps
- The random GC arrays were released without algorithms, in reaction to the criticism of the intentional-mismatch era
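The mismatch idea described above (abundance read as the difference between intended and mismatch cells) can be sketched as follows. This is a deliberately simplified illustration, not the vendor's algorithm: the function name, the plain averaging, and the intensity values are all assumptions.

```python
def pm_mm_signal(pm, mm):
    """Mean of (perfect-match minus mismatch) intensities over one probe set.

    pm and mm are equal-length lists of fluorescence intensities, paired
    probe by probe. Plain averaging is an assumption made for illustration.
    """
    if not pm or len(pm) != len(mm):
        raise ValueError("pm and mm must be non-empty and the same length")
    diffs = [p - m for p, m in zip(pm, mm)]
    return sum(diffs) / len(diffs)


# Hypothetical intensities for a three-probe set.
pm = [1200.0, 1000.0, 1100.0]
mm = [200.0, 100.0, 300.0]
signal = pm_mm_signal(pm, mm)
print(signal)
```

Note that whenever a mismatch cell is brighter than its intended cell the difference goes negative, which is one way a subtraction like this can add noise at low expression.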
Illumina RNA-Seq
- Many short reads are generated and aligned to the genome in order to determine the correct sequence
- RNA-seq advantages over microarrays
- Technology-agnostic -- it doesn't matter where you got your numbers, the numbers are absolute integer counts
- No probing needed
- We don't need to know anything about the sequence to begin with
- Question: How about sequence errors?
- If there's only one nucleotide difference for instance
- If we get many many reads on a single region, then we can estimate the background rate of sequencing error
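The answer to the sequencing-error question can be sketched as a simple pileup calculation. This is a minimal sketch under a strong assumption: the majority base at each position is the true base, so every disagreeing read is counted as an error (real variants would be miscounted). The function name and the example reads are hypothetical.

```python
from collections import Counter


def estimate_error_rate(pileup_columns):
    """Estimate a background per-base error rate from deeply covered positions.

    pileup_columns: list of strings, each holding the bases that the aligned
    reads report at one genomic position.
    """
    mismatches = 0
    total = 0
    for column in pileup_columns:
        if not column:
            continue  # no coverage at this position
        consensus_count = Counter(column).most_common(1)[0][1]
        mismatches += len(column) - consensus_count
        total += len(column)
    if total == 0:
        raise ValueError("no covered positions")
    return mismatches / total


# Four positions, ten reads each; one read disagrees at the second position.
columns = ["AAAAAAAAAA", "CCCCCCCCCT", "GGGGGGGGGG", "TTTTTTTTTT"]
rate = estimate_error_rate(columns)
print(rate)
```

With an estimate like this in hand, a single-nucleotide difference seen in only a small fraction of reads can be compared against the background rate before being treated as real.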
Label-Free Mass Spec[trometry]
- Spectral count profile should match our expectations given known protein sequence
- Proteins are digested with an enzyme that cleaves specifically at lysine and arginine
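The expected peptides for a known protein sequence can be predicted with an in-silico digest. The sketch below assumes a trypsin-like enzyme that cuts on the C-terminal side of every lysine (K) and arginine (R); it ignores missed cleavages and any exceptions to the rule. The function name and the example sequence are hypothetical.

```python
def digest(sequence, cleave_after="KR"):
    """Split a protein sequence after every residue listed in cleave_after."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in cleave_after:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing peptide that does not end in a cleavage residue.
        peptides.append(sequence[start:])
    return peptides


peptides = digest("MKTAYRAGLK")
print(peptides)
```

The predicted peptide list is what an observed spectral count profile would be compared against.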
Relative reads vs absolute reads of some protein
- relative
- gives a strange curve that is symmetric about y = -x and lies roughly along y = x; if the absolute value is taken, we get a weird sqrt(x)-like curve
- Ed: Use a two-parameter curve, fit by regression as though the data were absolute values, that still accommodates the odd quadrants
- absolute
- example shown is the expression bead chips -- we see an absolute change in concentration of x on an expression level of y -- looks sigmoidal
- where x is very low -- the proportion of the signal that is background is large: leads to larger error bars (known problem)
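One possible reading of Ed's two-parameter suggestion is a signed power law, y = sign(x) * a * |x|^b, fit on absolute values and then reflected into the negative quadrant. This is an assumption about what was meant, not the curve actually proposed in the meeting; the function names and data are hypothetical.

```python
import math


def fit_signed_power(xs, ys):
    """Fit y = sign(x) * a * |x|**b by least squares on log|y| vs log|x|.

    Points where x or y is zero are skipped because their log is undefined.
    Returns the two parameters (a, b).
    """
    pts = [(math.log(abs(x)), math.log(abs(y)))
           for x, y in zip(xs, ys) if x != 0 and y != 0]
    n = len(pts)
    mean_u = sum(u for u, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    suu = sum((u - mean_u) ** 2 for u, _ in pts)
    suv = sum((u - mean_u) * (v - mean_v) for u, v in pts)
    b = suv / suu
    a = math.exp(mean_v - b * mean_u)
    return a, b


def signed_power(x, a, b):
    """Evaluate the fitted curve; the result carries the sign of x."""
    return math.copysign(a * abs(x) ** b, x)


# Hypothetical sqrt-like data spanning both odd quadrants.
xs = [-4.0, -1.0, 1.0, 4.0, 9.0]
ys = [-4.0, -2.0, 2.0, 4.0, 6.0]
a, b = fit_signed_power(xs, ys)
print(a, b, signed_power(-9.0, a, b))
```

The fit is done entirely on magnitudes, so it behaves like a regression on absolute values, while `signed_power` restores the sign so that negative inputs map to negative outputs.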
Experimental Design
- human brain tissue
- microarray vs proteomics
- pools of brains and individual brains used
- pools required for RNA sequencing
- proteomics platform and the individual microarray trials used individual brains (?)
Results and Discussion
- proteomic-microarray trials
- correlation = 0.24
- proteomic-RNA-Seq trials
- correlation = 0.35
- To be fair, the correlation is only expected to fall in the range [0.2, 0.5]
- this is because of the noise introduced at the translation step -- going from mRNA to protein
- Unclear whether the sequencing (new) or microarray (classic) method is better for abundance evaluation
Critique
- Calculation of high leverage points -- would the results be the same if such points were removed?
- PCR -- can be used to get abundance counts -- a known competing method
- Experimental design strange: Brains? Pooled vs. Single?
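The high-leverage critique can be checked with a short calculation: compute each point's leverage for a straight-line fit, drop the points above a cutoff, and compare the correlation before and after. This is a minimal sketch; the cutoff of twice the average leverage is a common rule of thumb chosen here as an assumption, and the data are invented, not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def leverages(xs):
    """Leverage of each point in a simple linear regression on xs."""
    n = len(xs)
    mx = sum(xs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    return [1 / n + (x - mx) ** 2 / sxx for x in xs]


def drop_high_leverage(xs, ys, factor=2.0):
    """Remove points whose leverage exceeds factor * (2 / n).

    2 / n is the average leverage for a two-parameter (slope + intercept) fit.
    """
    cutoff = factor * 2 / len(xs)
    kept = [(x, y) for x, y, h in zip(xs, ys, leverages(xs)) if h <= cutoff]
    return [x for x, _ in kept], [y for _, y in kept]


# Hypothetical data: five modest points plus one extreme point.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0, 20.0]
r_all = pearson(xs, ys)
kept_xs, kept_ys = drop_high_leverage(xs, ys)
r_kept = pearson(kept_xs, kept_ys)
print(r_all, r_kept)
```

In this made-up example the single extreme point inflates the correlation; whether removal raises or lowers the reported values of 0.24 and 0.35 would have to be checked on the real data.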