Notes 20100929 CHEM 731 Meiering Protein Engineering and Design
From SnOwy - Ed's Wiki Notebook
Note to self: do I want to audit this course?
Administrative
- a book for this course is on reserve in the library
Display systems, protein engineering
- phage display
- cell surface display
- binding proteins
- present structure on membrane
- cell-free display
Cell free displays
- directed evolution
In vitro compartmentalization
Directed evolution by in vitro compartmentalization (July 2006)
- constructing artificial cells
- emulsion compartments
- lipid emulsion surrounds each environment
- on average, one DNA sequence per droplet -- incl. RNA manufacturing machinery in each droplet
- about 10^10 genes per emulsion
- (other techniques reach 10^12 to 10^15)
- ligate
- the droplets look like monolayers
- droplets are nanometer size
- particularly useful for enzymes
- enzymes must be complementary to a transition state
- enzymes are then trapped in each droplet
- examples
- DNA altering enzymes -- DNA methyl transferases
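The "on average, one DNA sequence per droplet" point can be made concrete with a little arithmetic. A minimal sketch, assuming genes land in droplets independently so that occupancy follows a Poisson distribution (that assumption and the name `droplet_occupancy` are mine, not from the lecture):

```python
import math

def droplet_occupancy(mean_genes_per_droplet: float, k: int) -> float:
    """Poisson probability that a droplet holds exactly k genes."""
    lam = mean_genes_per_droplet
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 1.0  # "on average, one DNA sequence per droplet"
p_empty = droplet_occupancy(lam, 0)
p_single = droplet_occupancy(lam, 1)
p_multiple = 1.0 - p_empty - p_single
print(f"empty: {p_empty:.3f}, single: {p_single:.3f}, multiple: {p_multiple:.3f}")
```

Under this assumption roughly a quarter of the droplets would hold more than one gene, which blurs the link between a gene and the enzyme it encodes. Lowering the mean reduces that at the cost of more empty droplets.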
Directed evolution methods
in vitro evolution
- faulty DNA polymerase
- primers with amino acid substitutions
- product is bound and ligated
- substrate is covalently linked to the DNA
- retrieve droplets with fluorescent screening
- i.e. find the changed substrate → correct enzyme
Near future
- microfluidics -- engineering attempt to sort the drops better and better
Water-Oil-Water (w/o/w) emulsions
- proteins may not translate well in vitro
- capture each E. coli cell in its own membrane
Back to general directed evolution
- enzyme modifications include
- increase equilibrium affinity
- slow dissociation rates
- protein stability and folding
- evolve enzymes -- the above compartmentalization methods
- the evolution tasks require different approaches
Rational design
force fields, computation
- protein fitness landscapes -- directed evolution
Library construction -- protein engineering
see text -- follows display methods
- error-prone PCR
- low return
- proteins won't tolerate a high density of changes
- recombination
- generate fragments of your protein and recombine them
- orthologous genes
- protein modularity
- site-directed diversity (single nucleotide polymorphism (SNP) / point mutation)
- bioinformatics powered?
- computation --
- scanning mutagenesis
- mutate each amino acid in turn -- perhaps into an alanine -- determines which positions are mandatory
- natural source diversity -- antibody libraries from germline sequences
- things that aren't well discussed
- the extended nature of the diversification -- sufficient diversity to solve the problem / sufficient number of amino acid types
- sampling size problem (degrees of freedom!)
- library quality -- library must represent intended design
- design versus randomization
- drive to use random approaches since there is insufficient mathematical background still -- reminds me of machine learning
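The sampling size problem above is easy to underestimate, so here is a rough calculation. It is a sketch only, assuming every variant is equally likely to be drawn (real libraries are biased); the function names are hypothetical.

```python
import math

def sequence_space(n_positions: int, alphabet: int = 20) -> int:
    """Number of distinct sequences when n_positions are fully randomized."""
    return alphabet ** n_positions

def clones_for_coverage(n_variants: int, p_seen: float = 0.95) -> float:
    """Clones to sample so that one particular variant shows up at least once
    with probability p_seen, assuming all variants are equally likely."""
    return math.log(1.0 - p_seen) / math.log1p(-1.0 / n_variants)

for n_positions in (3, 6, 10):
    variants = sequence_space(n_positions)
    needed = clones_for_coverage(variants)
    print(f"{n_positions:2d} randomized positions: {variants:.2e} variants, "
          f"about {needed:.2e} clones for 95% coverage")
```

For 95% confidence the required library is about three times the number of variants, so fully randomizing even ten positions already strains the library sizes quoted for the display methods.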
Protein fitness landscapes
Exploring protein fitness landscapes by directed evolution (2009)
- mutations far from active site matter -- not well understood
- proteins in general are not very stable
- directed evolution has allowed customization of fluorescent proteins
- single-site modifications allow us to see the mutations that won't survive in nature
- we are suffering from an 'extreme' observer's bias in bioinformatics -- we're only seeing the stuff that survives!
- fitness space
- epistasis: an amino acid's behaviour depends on its neighbours
- general -- a gene's behaviour depends on other genes.
- some proteins cannot be improved -- we're assuming that a starting candidate is evolvable
- extra evolvable: TIM barrels (catalytic apparatus), antibodies (VDJ madness!) ...
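One common way to put a number on epistasis is to compare a double mutant with the sum of its two single mutants. A minimal sketch with made-up fitness values; the additive reference model and the name `epistasis` are assumptions for illustration, not something from the paper.

```python
def epistasis(effect_a: float, effect_b: float, effect_ab: float) -> float:
    """Deviation of the double mutant from the sum of the single-mutant effects."""
    return effect_ab - (effect_a + effect_b)

# Hypothetical fitness changes relative to the parent (arbitrary units)
single_a = 0.5
single_b = -0.2
double_ab = 1.0

eps = epistasis(single_a, single_b, double_ab)
print(f"epistasis = {eps:+.2f}")
```

A value of zero means the two mutations act independently. Anything else means the effect of one mutation depends on whether the other is present, which is what makes the fitness landscape rugged.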
Steps
- parent selection (template, candidate sequence)
- ...
Misc
- screening is difficult -- fluorescence / binding based screens
- incorrect screening → poor results
- screen for iterative improvement
- gradually increase selection pressure
- deleterious mutations occur often -- of random mutations,
- ... 30-50% are deleterious
- ... 50-70% are neutral
- ... 0.01-1% are beneficial
- note -- search is biased due to the candidate
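The 30-50% deleterious figure explains why proteins won't tolerate a high density of changes. A rough sketch, assuming each mutation is independently deleterious with the same probability (this ignores epistasis, so treat it as an order-of-magnitude estimate; `fraction_functional` is a hypothetical name):

```python
def fraction_functional(deleterious_fraction: float, n_mutations: int) -> float:
    """Chance that a variant keeps its function when each mutation is
    independently deleterious with the given probability."""
    return (1.0 - deleterious_fraction) ** n_mutations

for m in (1, 2, 5, 10):
    low = fraction_functional(0.5, m)   # pessimistic end: 50% deleterious
    high = fraction_functional(0.3, m)  # optimistic end: 30% deleterious
    print(f"{m:2d} mutations: {low:.3f} to {high:.3f} of variants still functional")
```

The functional fraction falls off exponentially with mutation count, which fits the advice to screen for iterative improvement rather than taking large jumps.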
Recombination
- splicing modules of different proteins
- spliced orthologous proteins
- may produce more stable proteins with poorer function
- creates new candidate parents for further point mutagenesis
- example: cellobiohydrolase II
Findings for recombined enzymes
- optimization requires trading stability against function
- example enzyme -- a cytochrome P450 that hydroxylates long-chain fatty acids was evolved to hydroxylate propane
- proteins are only as stable as they need to be
- most mutations are destabilizing -- in 80% of diseases involving proteins, the proteins are destabilized
- possible to start from a more stable mutant before attempting to enhance function
Computational approaches
see book
- computational approaches, homology modelling and bioinformatics
- consensus motifs
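For reference, a consensus sequence is just the most common residue in each alignment column. A minimal sketch on a made-up alignment (the sequences and the function name are hypothetical):

```python
from collections import Counter

def consensus(alignment: list[str]) -> str:
    """Most common residue in each column of an equal-length alignment."""
    columns = zip(*alignment)
    return "".join(Counter(col).most_common(1)[0][0] for col in columns)

# Toy alignment, invented for illustration
toy_alignment = ["MKTAY", "MKSAY", "MRTAF", "MKTGY"]
print(consensus(toy_alignment))  # prints MKTAY
```

Each column is handled on its own, so any correlation between columns is discarded.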
Repeat proteins
A recurring theme in protein engineering: the design, stability and folding of repeat proteins (2005)
- fibrillar (non-globular)
- widely used in nature -- relatively simple
- repeating units are 20-40 aa
- different folding properties in terms of hydrophobicity
- structural capping at the end of the long chain of repeats
- know your fibrillars ...
- ANK Ankyrin
- TPR Tetratricopeptide repeat
- LRR Leucine rich ...
- coupling networks -- direct correlation between two specific positions -- e.g. salt bridge Lysine---Glutamate
- novel binding specificity
- unlikely to fold in the way globulars do
- possible to have multiple folding pathways (needs more research)
- repeats in a fibrillar repeat protein might not be perfect lego blocks -- inserting a repeat can cost stability
- however, consensus sequences are OK
- ANK proteins are used very often in protein engineering
Biophysical characterization
- melting point
- ΔG
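Melting point and ΔG are connected through the unfolding equilibrium. A sketch under strong simplifications: two-state unfolding, a temperature-independent unfolding enthalpy, and no heat-capacity term. The numbers and function names are hypothetical.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)

def unfolding_free_energy(temp_k: float, tm_k: float, delta_h: float) -> float:
    """Unfolding free energy (kcal/mol), simplified: dG = dH * (1 - T/Tm)."""
    return delta_h * (1.0 - temp_k / tm_k)

def fraction_unfolded(temp_k: float, tm_k: float, delta_h: float) -> float:
    """Two-state population of the unfolded state at temperature temp_k."""
    dg = unfolding_free_energy(temp_k, tm_k, delta_h)
    return 1.0 / (1.0 + math.exp(dg / (R * temp_k)))

tm, dh = 330.0, 100.0  # hypothetical protein: Tm = 330 K, dH = 100 kcal/mol
for temp in (298.0, 320.0, 330.0, 340.0):
    print(f"T = {temp:.0f} K: dG = {unfolding_free_energy(temp, tm, dh):+.2f} kcal/mol, "
          f"unfolded = {fraction_unfolded(temp, tm, dh):.3f}")
```

By construction ΔG is zero at the melting point, where half the molecules are unfolded.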
Consensus Analysis
Evolutionarily Conserved Pathways of Energetic Connectivity in Protein Families
- energetically coupled residues that are structurally distant
- enzymatic activity and substrate binding depend on more than the binding site itself
- coupled networks
- can be calculated
- statistical free energy for sites i, j
- amino acid frequencies at each position are treated as probabilities in a Boltzmann distribution
- free energy G
- $\Delta G = -RT \ln K$
- $K = [A]/[B]$ (equilibrium constant)
- background probability given by genome
- the networks -- allosteric regulation networks
- can go right through the core of the protein from the binding site to the opposite side.
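To get a feel for how a "statistical free energy for sites i, j" could be calculated, here is a toy version. It is a simplified sketch and not the paper's exact formulation: it uses plain smoothed frequencies, a uniform background, and a three-letter alphabet, all of which are my assumptions, as are the function names and the alignment.

```python
import math
from collections import Counter

def column_frequencies(alignment, pos, alphabet, smoothing=0.05):
    """Residue frequencies in one column, mixed with a uniform prior to avoid log(0)."""
    counts = Counter(seq[pos] for seq in alignment)
    n = len(alignment)
    return {aa: (1.0 - smoothing) * counts[aa] / n + smoothing / len(alphabet)
            for aa in alphabet}

def statistical_energies(freqs, background):
    """Per-residue statistical energy in arbitrary kT-like units: -ln(f / background)."""
    return {aa: -math.log(freqs[aa] / background[aa]) for aa in freqs}

def coupling(alignment, i, j, residue_at_j, alphabet, background):
    """Shift in the energies at site i when only sequences carrying residue_at_j
    at site j are kept (the background cancels in the difference)."""
    subset = [seq for seq in alignment if seq[j] == residue_at_j]
    full = statistical_energies(column_frequencies(alignment, i, alphabet), background)
    sub = statistical_energies(column_frequencies(subset, i, alphabet), background)
    return math.sqrt(sum((sub[aa] - full[aa]) ** 2 for aa in alphabet))

alphabet = "KEA"  # reduced alphabet, for illustration only
background = {aa: 1.0 / len(alphabet) for aa in alphabet}
toy_alignment = ["KAE", "KEE", "KAE", "EAK", "EEK", "EAK"]  # sites 0 and 2 co-vary

coupled = coupling(toy_alignment, 0, 2, "E", alphabet, background)
uncoupled = coupling(toy_alignment, 1, 2, "E", alphabet, background)
print(f"site 0 vs site 2: {coupled:.2f}")
print(f"site 1 vs site 2: {uncoupled:.2f}")
```

Fixing the residue at site 2 changes the distribution at site 0 but leaves site 1 untouched, so only the first pair scores as coupled.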
Pairwise Interactions
Surface Sites for Engineering Allosteric Control in Proteins
- coupling DHFR with PAS to make PAS-DHFR complex -- tuning activity of DHFR with light-sensing PAS
- continuation of previous paper ...
- PAS-DHFR does not normally exist
- PDZ binding site of DHFR connected to core which will talk to PAS
- chromophore absorbs light on surface of PAS
- change on back-side in C-terminus -- coupled using an allosteric regulation network
- helices carry allosteric signal
- DHFR binds folate -- there is a corresponding network there on the back side, the βF-βG loop
- βF-βG loop must be dynamic to enable enzymatic action
- input PAS light
- output DHFR
- two sets of constructs created
- things expected to work
- negative controls -- with intentional sabotage of allostery by removal of important loops (communication cutting)
- conclusion -- allosteric regulatory networks need to be taken into account during protein design
Graphical Analysis
- homology modelling
- stability, affinity through steric complementarity
- affinity through electrostatic complementarity
Fast methods for mutation analysis
Action-at-a-distance interactions enhance protein binding affinity
- folding problem and inverse folding problem (threading problem)
- folding: sequence → structure → function
- dead-end elimination
- self-consistent mean-field theory
- simulated annealing
- calculation of effects of individual mutations
- effect on binding
- calculation capable of determining binding energy
- hydrogen bonds, van der Waals forces -- these ended up not being needed in this case
- charge mutation library possible
- residues aren't directly in binding interface
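Charged residues outside the interface can still matter because Coulomb interactions fall off only as 1/r. A back-of-the-envelope sketch, assuming point charges in a uniform dielectric. That is a crude stand-in for a proper continuum electrostatics calculation, and the dielectric values are illustrative choices, not taken from the paper.

```python
COULOMB_CONSTANT = 332.06  # kcal*Angstrom/(mol*e^2)

def coulomb_energy(q1: float, q2: float, distance_angstrom: float,
                   dielectric: float) -> float:
    """Coulomb energy (kcal/mol) of two point charges in a uniform dielectric."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * distance_angstrom)

for r in (4.0, 8.0, 12.0):
    low = coulomb_energy(+1, -1, r, dielectric=4.0)     # protein-like interior (assumed)
    water = coulomb_energy(+1, -1, r, dielectric=80.0)  # bulk water
    print(f"r = {r:4.1f} A: {low:7.2f} kcal/mol (eps = 4), {water:6.2f} kcal/mol (eps = 80)")
```

Doubling the distance only halves the energy, whereas the van der Waals attraction with its 1/r^6 dependence has all but vanished by then.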
Math and software used
- CHARMM etc.,
- Poisson-Boltzmann equation with DELPHI / PARSE
- simplified force fields
Molecular force fields
- vastly simplified here (thanks!)
- physics-based force fields used for
- protein stability
- denatured state -- required for a reference state
- computing free energy -- explicit / implicit representation
- commonly -- implicit water represented as an all-purpose dielectric medium
- statistical potentials
- potential of mean force (energies based on statistics of protein structures) -- information-based -- ROSETTA
- reference states ← ?
- fundamental interactions between atoms, molecules
Internal energy function
- Hooke's law for springs!
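- $E_{\text{total}} = \sum_{\text{bonds}} K_r (r - r_{eq})^2 + \sum_{\text{angles}} K_\theta (\theta - \theta_{eq})^2 + \sum_{\text{dihedrals}} \frac{V_n}{2} [1 + \cos(n\phi - \gamma)] + \sum_{i<j} \left[ \frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} + \frac{q_i q_j}{\varepsilon R_{ij}} \right]$
- A sets the repulsion between atoms that are very close
- B sets the attraction between nearby atoms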
- Newton's laws solved for very small time intervals
- no quantum mechanical terms here yet -- but they can be included!
- we can try to minimize the energies of the system
- these energy formulas relate to the enthalpy of the system
- the entropy for proteins is not solved -- we can infer some useful entropic properties with molecular dynamics
- Monte Carlo simulation useful for evaluating a random sampling of conformations -- step by step approaching a minimal energy
- Rosetta -- uses conformations observed in real proteins
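The individual terms of the internal energy function are simple enough to write out. A sketch with hypothetical parameters; the Coulomb term is written with a positive sign so that like charges repel (the usual convention), and any unit conversion constant is assumed to be folded into the charges.

```python
import math

def bond_energy(k_r: float, r: float, r_eq: float) -> float:
    """Harmonic (Hooke's law) bond stretch."""
    return k_r * (r - r_eq) ** 2

def angle_energy(k_theta: float, theta: float, theta_eq: float) -> float:
    """Harmonic angle bend (angles in radians)."""
    return k_theta * (theta - theta_eq) ** 2

def dihedral_energy(v_n: float, n: int, phi: float, gamma: float) -> float:
    """Periodic torsion term (angles in radians)."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))

def nonbonded_energy(a_ij: float, b_ij: float, q_i: float, q_j: float,
                     r_ij: float, epsilon: float = 1.0) -> float:
    """Repulsion (A/r^12), attraction (B/r^6), and Coulomb term for one atom pair."""
    return a_ij / r_ij**12 - b_ij / r_ij**6 + q_i * q_j / (epsilon * r_ij)

# Hypothetical parameters, chosen only to exercise each term
total = (bond_energy(300.0, 1.55, 1.53)
         + angle_energy(50.0, math.radians(112.0), math.radians(109.5))
         + dihedral_energy(2.0, 3, math.radians(60.0), 0.0)
         + nonbonded_energy(1.0e6, 600.0, 0.0, 0.0, 4.0))
print(f"toy total energy: {total:.3f}")
```

A real force field sums these terms over every bond, angle, dihedral, and atom pair in the molecule, with parameters taken from the force field's own tables.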
Molecular Dynamics / Mechanics -- Force Fields
BIOMOLECULAR SIMULATIONS: Recent Developments in Force Fields, Simulations of Enzyme Catalysis, Protein-Ligand, Protein-Protein, and Protein-Nucleic Acid Noncovalent Interactions
- explicitly including water is computationally tough
- implicit is easier -- use force fields
- see the alphabet soup of energy functions
- AMBER, CHARMm, . . .
- using various force fields
- keep in mind which force fields work in reality, and which don't -- must draw from literature based on success
- hybridizing molecular dynamics with quantum mechanical methods
- requires Hamiltonian matrix
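Molecular dynamics boils down to integrating Newton's equations over very small time steps. A minimal sketch using the velocity Verlet scheme on a single harmonic "bond" in arbitrary units. The integrator choice and all parameter values are my assumptions; production codes do the same thing for every atom at once.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equations for one coordinate with velocity Verlet."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append((x, v))
    return trajectory

k, x_eq, mass = 100.0, 1.0, 1.0  # hypothetical spring constant, rest length, mass

def force(x):
    """Force from the harmonic energy E = k * (x - x_eq)^2."""
    return -2.0 * k * (x - x_eq)

def energy(x, v):
    """Potential plus kinetic energy."""
    return k * (x - x_eq) ** 2 + 0.5 * mass * v ** 2

traj = velocity_verlet(x=1.1, v=0.0, force=force, mass=mass, dt=0.001, n_steps=5000)
e_start, e_end = energy(*traj[0]), energy(*traj[-1])
print(f"energy drift over {len(traj) - 1} steps: {abs(e_end - e_start) / e_start:.2e}")
```

Total energy should stay nearly constant when the time step is small compared with the oscillation period, which makes the drift a handy sanity check.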
Rosetta Overview
Practically Useful: What the ROSETTA Protein Modeling Suite Can Do for You
- conformational sampling space
- sequence sampling space
Intrigue
- consensus motifs bother the heck out of me because they drop all notion of column to column correlation
- would like to figure out if we can improve similarity scoring with that knowledge
- sounds like Bayes
- see "Consensus Analysis" above -- this has been touched!
- Hamiltonian missing assembly -- undergraduate chemistry :(
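One standard way to measure the column-to-column correlation that consensus motifs throw away is the mutual information between two alignment columns. A sketch on a made-up alignment; this is my own illustration of the idea, not a method from the course.

```python
import math
from collections import Counter

def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment."""
    n = len(alignment)
    p_i = Counter(seq[i] for seq in alignment)
    p_j = Counter(seq[j] for seq in alignment)
    p_ij = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in p_ij.items():
        joint = count / n
        mi += joint * math.log2(joint / ((p_i[a] / n) * (p_j[b] / n)))
    return mi

# Toy alignment, invented for illustration: columns 0 and 2 always change together
toy = ["KAE", "KEE", "KAE", "EAK", "EEK", "EAK"]
print(f"columns 0 and 2: {mutual_information(toy, 0, 2):.2f} bits")
print(f"columns 0 and 1: {mutual_information(toy, 0, 1):.2f} bits")
```

Columns 0 and 2 share a full bit of information while columns 0 and 1 share none. With real alignments the estimate is noisy for small sample sizes, so some correction would be needed before using it for similarity scoring.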