Notes 20101006 CHEM 731 Meiering Protein Design
From SnOwy - Ed's Wiki Notebook
Arrived late -- was proctoring BIOL 208 midterm 1 of 2
ROSETTA / ROBETTA
- homology modelling
- start with a predicted initial conformation, then "relax" it into a low-energy conformation
- has facilities to predict ligand binding conformation
- example -- designed a protein switch: a trimeric coiled-coil when zinc is absent, a zinc finger when zinc is present
- how? the sequence was optimized for both target structures
- a "Rosetta potential" is the resulting energy term for each amino acid
CASP / CAPRI
- CASP -- sequence to structure prediction contest
- CAPRI -- ligand-protein, protein-protein docking contest
Quantum Mechanics
- it is expensive or intractable to perform comprehensive quantum mechanical calculations for all rotamers in a system
- instead, a rotamer library of precalculated values can be used for prediction
- depending on the purpose and the library, the rotamer library may take the backbone into account
- it has been suggested that predictions can be improved if backbone potentials are used
CHARMM, AMBER
- molecular mechanics force fields; include parameters for the Φ, Ψ (backbone dihedral) angles of amino acids
Gaussian
- Precise quantum mechanical tool to evaluate protein conformation energetics
Rotamer Libraries
- many libraries have been made; which one to use depends on the application
- example: secondary structure predictions
- which rotamers are present in known structures?
- most side chains may only exist in a few conformations in a stable structure
- example: side chains at hydrophobic positions have less freedom than exposed surface side chains
- provides a distribution of Φ, Ψ angles -- generally only a few are provided due to computational expense
- Monte Carlo construction
- rotamers: staggered and eclipsed conformations
- libraries can:
- be used to evaluate energy functions
- be used to search for new (non-favoured?) conformations given an energy function
- take solubility into account
- take backbone into account
- good libraries are available -- will likely improve in content as more structures are solved
Title: Rotamer libraries in the 21st century
- the Ramachandran plot was the very first rotamer library
- the proteins used have changed -- increasing resolution and an increasing number of exemplars are both good
- secondary structure dependent or backbone dependent
- able to switch between rotamer libraries during optimization, between iterations of:
- annealing (energy minimization)
- human searching
- backbone-dependent libraries are not really useful for predictions far from the optimal structure
- backbone-dependent and secondary-structure-dependent libraries are really only useful for homologous sequences with high identity
- not much work has been done on the following:
- covariance -- such as
- correlations between rotamers of different columns
- correlations between a backbone rotamer and a sidechain rotamer of the same column
- entropy -- we have more information and can likely comment on the math
- current libraries are not accurate at the atomic level -- solved crystal structures are used to evaluate how good the predictions made with such libraries are
Search Algorithms
- $E(r) = \sum_i E(r_i) + \sum_{i<j} E(r_i, r_j)$
- $E(r_i)$ = interaction of rotamer $r_i$ with the fixed template (self energy)
- $E(r_i, r_j)$ = interaction between rotamer $r_i$ and rotamer $r_j$ at a different position (pairwise energy)
- Stochastic algorithms
- Monte Carlo, Genetic Algorithms, FASTER
- samples a subset of the search space and finds a good local minimum
- Deterministic algorithms
- Dead-End Elimination, Self-Consistent Mean Field, Belief Propagation, Linear Programming
- attempts to find the actual (global) minimum -- computationally expensive -- likelihood of finding a solution decreases as the problem size increases
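Both families of search algorithm score candidates with the same decomposition: a sum of self energies plus a sum of pairwise energies. A minimal Python sketch of scoring one complete rotamer assignment is below. The table layouts (`self_energy`, `pair_energy`), the function name `total_energy`, and the toy numbers are hypothetical illustration choices, not the API of Rosetta or any other package.

```python
from itertools import combinations

def total_energy(assignment, self_energy, pair_energy):
    """Energy of one rotamer assignment: self terms plus pairwise terms.

    assignment[i]             -- rotamer index chosen at position i (hypothetical layout)
    self_energy[i][a]         -- E(r_i): rotamer a at position i vs. the fixed template
    pair_energy[(i, j)][a][b] -- E(r_i, r_j) for i < j; a missing key means no interaction
    """
    energy = sum(self_energy[i][a] for i, a in enumerate(assignment))
    # Each unordered pair of positions (i < j) is counted exactly once.
    for i, j in combinations(range(len(assignment)), 2):
        table = pair_energy.get((i, j))
        if table is not None:
            energy += table[assignment[i]][assignment[j]]
    return energy

# Toy example: 2 positions, 2 rotamers each (made-up numbers).
self_energy = [[0.0, 1.0], [0.5, 0.0]]
pair_energy = {(0, 1): [[2.0, -1.0], [0.0, 0.5]]}
print(total_energy([0, 1], self_energy, pair_energy))  # -1.0
```

Because no term involves more than two positions, all self and pair energies can be precomputed once and then reused by any of the search methods.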
Monte Carlo
- assign rotamers randomly to starting sequence
- evaluate energy (E)
- make random changes to the sequence or to the rotamers
- evaluate energy
- accept new state according to Metropolis criterion
- if energy decreased -- accept
- if energy increased, accept with probability p given by the Boltzmann factor:
- $p = e^{-(E_{new} - E_{previous})/k_B T}$
- $k_B$ is the Boltzmann constant, $T$ is the absolute temperature
- anneal from high to low temperature -- simulated annealing (SA)
- high temperature allows many unfavourable, high-energy conformations to be accepted
- the conformation "settles" down as the simulation goes on
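The Monte Carlo steps above, combined with simulated annealing, can be sketched as follows. Assumptions: every position has the same number of rotamers, moves only change rotamers (a sequence change could be modelled the same way by enlarging the set of choices per position), temperature is given directly as kT in energy units, and the geometric cooling schedule and all default parameter values are arbitrary illustrative choices.

```python
import math
import random

def metropolis_accept(delta_e, kT, rng):
    """Metropolis criterion: always accept a move that lowers (or keeps) the energy;
    accept an uphill move with probability exp(-delta_e / kT)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kT)

def simulated_annealing(n_positions, n_rotamers, energy,
                        kT_start=10.0, kT_end=0.01, steps=5000, seed=0):
    """Anneal from high to low kT; energy maps a list of rotamer indices to a number."""
    rng = random.Random(seed)
    state = [rng.randrange(n_rotamers) for _ in range(n_positions)]  # random start
    e = energy(state)
    best, best_e = list(state), e
    cooling = (kT_end / kT_start) ** (1.0 / max(steps - 1, 1))  # geometric schedule
    kT = kT_start
    for _ in range(steps):
        i = rng.randrange(n_positions)          # pick a position at random
        old = state[i]
        state[i] = rng.randrange(n_rotamers)    # random rotamer change
        new_e = energy(state)
        if metropolis_accept(new_e - e, kT, rng):
            e = new_e
            if e < best_e:
                best, best_e = list(state), e   # remember the lowest energy seen
        else:
            state[i] = old                      # rejected: undo the move
        kT *= cooling
    return best, best_e

# Toy energy (hypothetical): the sum of rotamer indices, minimised by all zeros.
print(simulated_annealing(4, 3, lambda s: sum(s)))
```

Early on, the large kT lets uphill moves through often; by the end, almost only downhill moves are accepted, which is the "settling" described above.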
Genetic Algorithm
- generate a population of sequences
- mutate randomly at a rate of 1 to 2 mutations per sequence
- evaluate, rank energies of the different sequences
- recombine sequences with lowest energies
- evaluate energies -- randomly select sequences with best energies
- repeat
- advantage: recombination overcomes barriers in sequence space
- disadvantage: performs poorly on highly coupled systems (hydrophobic cores)
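A toy genetic algorithm following the listed steps (mutate at roughly 1 to 2 positions per sequence, rank by energy, recombine the lowest-energy sequences, repeat) might look like this. Keeping the best half as parents, single-point crossover, and the mismatch-count toy energy are illustrative assumptions, not the scheme of any particular design paper.

```python
import random

def genetic_algorithm(n_positions, alphabet, energy, pop_size=30, generations=100,
                      mutations_per_seq=1.5, seed=0):
    """Minimise energy(sequence); alphabet is a string of one-letter codes, n_positions >= 2."""
    rng = random.Random(seed)
    pop = [[rng.choice(alphabet) for _ in range(n_positions)] for _ in range(pop_size)]
    rate = mutations_per_seq / n_positions  # per-position rate -> ~1-2 mutations per sequence
    for _ in range(generations):
        # 1. Mutate every sequence at random.
        mutants = [[rng.choice(alphabet) if rng.random() < rate else aa for aa in seq]
                   for seq in pop]
        # 2. Evaluate and rank old and mutated sequences together (lowest energy first).
        ranked = sorted(pop + mutants, key=energy)
        parents = ranked[: pop_size // 2]
        # 3. Recombine randomly chosen low-energy parents (single-point crossover).
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_positions)
            children.append(a[:cut] + b[cut:])
        # 4. Next generation: surviving parents plus their offspring.
        pop = parents + children
    best = min(pop, key=energy)
    return "".join(best), energy(best)

# Hypothetical energy: number of mismatches to a target hydrophobic/polar pattern.
target = "HPPHHP"
print(genetic_algorithm(6, "HP", lambda s: sum(a != b for a, b in zip(s, target))))
```

Crossover can join two good halves in a single step, which is how recombination can jump barriers that single mutations would have to climb.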
Dead-end Elimination
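- eliminate all rotamers or combinations of rotamers that cannot be part of the global minimum energy conformation (GMEC)
- criterion: rotamer $r_i$ is eliminated if there is an alternative $s_i$ such that
- $E(r_i) + \sum_{j \ne i} \min_{t_j} E(r_i, t_j) > E(s_i) + \sum_{j \ne i} \max_{t_j} E(s_i, t_j)$
- $r_i$ is the rotamer at position $i$ being tested for elimination
- $j$ is a neighbouring position -- $t_j$ is a rotamer at neighbouring position $j$
- $s_i$ is an alternative rotamer to consider at the same position $i$
- if even the best-case (minimum) energy of $r_i$ is higher than the worst-case (maximum) energy of $s_i$, then $r_i$ can never be in the GMEC and is eliminated
- this assumes that the whole energy function is a sum of self and pairwise interaction terms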
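A sketch of the single-rotamer dead-end elimination criterion, applied repeatedly until no further rotamer can be removed. Assumptions: energies are supplied as nested lists, `pair_energy[i][j][r][t]` holds $E(r_i, t_j)$ and is filled in for both orders of `i` and `j`, and the toy numbers are made up. Only single rotamers are eliminated here; eliminating combinations (pairs) of rotamers is not shown.

```python
def dead_end_elimination(self_energy, pair_energy):
    """Return alive[i], the set of rotamers at position i that survive singles DEE."""
    n = len(self_energy)
    alive = [set(range(len(self_energy[i]))) for i in range(n)]
    changed = True
    while changed:                      # repeat: each elimination tightens the bounds
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                # Best case for r: every neighbour j picks its most favourable partner t_j.
                best_case_r = self_energy[i][r] + sum(
                    min(pair_energy[i][j][r][t] for t in alive[j])
                    for j in range(n) if j != i)
                for s in sorted(alive[i]):
                    if s == r:
                        continue
                    # Worst case for s: every neighbour picks its least favourable partner.
                    worst_case_s = self_energy[i][s] + sum(
                        max(pair_energy[i][j][s][t] for t in alive[j])
                        for j in range(n) if j != i)
                    if best_case_r > worst_case_s:
                        alive[i].discard(r)     # r can never be in the GMEC
                        changed = True
                        break
    return alive

# Toy problem: 2 positions x 2 rotamers; pair tables are transposes of each other.
self_energy = [[0.0, 3.0], [0.0, 0.5]]
p01 = [[1.0, -1.0], [0.0, 0.0]]                  # E(rotamer at 0, rotamer at 1)
p10 = [list(row) for row in zip(*p01)]           # E(rotamer at 1, rotamer at 0)
pair_energy = [[None, p01], [p10, None]]
print(dead_end_elimination(self_energy, pair_energy))  # [{0}, {1}]
```

In this toy case elimination alone leaves one rotamer per position, which is the GMEC (energy -0.5); in realistic problems several rotamers usually survive and a final search is still needed.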
Improving energy functions
- loads of research occurring here -- will discuss some of the most prominent ones
Missing Terminology
- gauche plus / gauche minus (conformation)
- χ (chi, side-chain dihedral angle) value
- rotamer -- I need to really understand this topic -- will make some notes later
- actually ... this page does a good job: http://en.wikipedia.org/wiki/Rotamer
Title: Automated design of specificity in molecular recognition
- genetic algorithm example
- multi-state design -- one desired outcome, multiple undesirable states
- algorithm that correctly selects homodimers but not heterodimers
- desired state: homodimer
- undesired state: heterodimers
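One hypothetical way to express a multi-state objective in code: score each candidate sequence by the energy gap between the desired state and its most competitive undesired state, then keep the most specific candidate. The function names, the min-over-competitors gap, and the toy energies are assumptions for illustration; this is not presented as the scoring function used in the paper.

```python
def specificity_gap(seq, energy_fn, desired, undesired):
    """E(desired state) minus E(best competing undesired state); more negative = more specific."""
    e_desired = energy_fn(seq, desired)
    e_competitor = min(energy_fn(seq, state) for state in undesired)
    return e_desired - e_competitor

def most_specific(candidates, energy_fn, desired, undesired):
    """Pick the candidate whose desired state beats its undesired states by the widest margin."""
    return min(candidates, key=lambda s: specificity_gap(s, energy_fn, desired, undesired))

# Made-up energies: "A" is the more stable homodimer, but is even more stable as a heterodimer.
toy = {("A", "homodimer"): -5, ("A", "heterodimer"): -6,
       ("B", "homodimer"): -4, ("B", "heterodimer"): -1}

def energy_fn(seq, state):
    return toy[(seq, state)]

print(most_specific(["A", "B"], energy_fn, "homodimer", ["heterodimer"]))  # B
```

Optimizing only the desired state would choose "A" (lower homodimer energy); accounting for the undesired heterodimer state chooses "B".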
Title: De Novo Protein Design: Fully Automated Sequence Selection
- dead-end elimination
- for a given target structure
- beta-beta-alpha protein motif
The FASTER algorithm
- assume all rotamers at all positions are correct except for one
- find the best rotamer for that position
- do this for each position in the sequence
- update all positions simultaneously
- repeat until convergence
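A sketch that follows only the FASTER steps as listed above: find the best rotamer at each position while holding all other positions fixed, update every position at once, and repeat. It is a simplification, not the published algorithm. Because all positions are updated simultaneously, it can oscillate instead of converging, so `max_iter` caps the loop. The nested-list layout (`pair_energy[i][j][r][t]` = energy of rotamer `r` at `i` with rotamer `t` at `j`, filled for both orders) and the toy numbers are assumptions.

```python
def local_energy(i, r, assignment, self_energy, pair_energy):
    """Energy of rotamer r at position i with every other position fixed."""
    return self_energy[i][r] + sum(
        pair_energy[i][j][r][assignment[j]]
        for j in range(len(assignment)) if j != i)

def faster_sketch(self_energy, pair_energy, start, max_iter=100):
    current = list(start)
    for _ in range(max_iter):
        # Best rotamer at each position, assuming all other positions are correct.
        proposal = [
            min(range(len(self_energy[i])),
                key=lambda r, i=i: local_energy(i, r, current, self_energy, pair_energy))
            for i in range(len(current))]
        if proposal == current:   # converged: no position wants to change
            return current
        current = proposal        # update all positions simultaneously
    return current

# Toy problem: 2 positions x 2 rotamers (made-up numbers).
self_energy = [[0.0, 3.0], [0.0, 0.5]]
p01 = [[1.0, -1.0], [0.0, 0.0]]
p10 = [list(row) for row in zip(*p01)]
pair_energy = [[None, p01], [p10, None]]
print(faster_sketch(self_energy, pair_energy, start=[1, 0]))  # [0, 1]
```

Starting from `[1, 0]`, the sketch passes through `[0, 0]` and settles at `[0, 1]`, the lowest-energy assignment for these toy numbers.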
Title: Full-sequence Computational Design and Solution Structure of a Thermostable Protein Variant
- redesign the engrailed homeodomain -- 51 amino acids
- proteins are synthesized once the design is done, then characterized
Title: NMR-detected conformational exchange observed in a computationally designed variant of protein Gβ1
- dead-end elimination