CS[46]83 A3 Hints
This is a Structural Bioinformatics Course Hints Page for Assignment 3 (Winter 2010).
Update: Please visit CS[46]83 A4 Hints for assignment four related items!
Hi!
This temporary page is put together for CS[46]83 students so that fair hints can be shared between students. It is likely that we’ll be bumping into similar snags every so often. Please refrain from posting items that are too revealing; maintain academic integrity. Any code examples should be pseudocode only unless provided by the instructor.
Update: I think short snips of code should be OK if we’re trying to explain syntax or specific function or mathematical semantics etc..
Most items discussed on this page refer to Chimera software — see Dr. Burkowski’s course site for CS[46]83.
Thanks,
Ed
Ed's Big Plans
A3 Exercise 2a – Nitrogens as Tube Terminators
The tubes to be drawn within the alpha helices and beta strands start and end on nitrogens. This is because ending on a backbone carbon is less aesthetically appealing.
Eddie Ma
19 Feb 10 at 11:48 am
A3 Exercise 2a – Normalizing Points to (0, 0, 0)
When solving for the 3×3 Intertial Tensor, is it more correct to first normalize the coordinates so that the centre of mass is (0, 0, 0)?
It looks like one would be able to retrieve a valid solution whether or not coordinates are first normalized.
Answering my own question:
Yes, it is better to normalize all points to the origin as the ‘a’ vectors which represent the positions of atoms assume an offset away from the origin.
This makes the math simpler than calculating many many deltas.
Eddie Ma
19 Feb 10 at 11:50 am
A3 Exercise 2a – Inertial Tensor and Lagrange Multiplier
Each element of the inertial tensor 3×3 matrix is given in the notes on slide 42 of the Lagrange Multiplier lecture PDF.
Eddie Ma
19 Feb 10 at 11:55 am
A3 Exercise 2a – numpy.array( ) and numpy.linalg.eig( )
Use the numpy.array( ) initializer to convert a two-deep nested 3×3 list into a numpy array object (that is, a list with three sublists as elements where each sublist has three floats).
I’ve used the numpy.linalg.eig( ) function to get the eigenpairs for each strand (or helix).
Argument 1: numpy.array object (in the form M, M)
Return 1: numpy.array object (eigenvalues)
Return 2: numpy.array object (eigenvectors)
(In Python, multiple returned values are provided in a tuple.)
Eddie Ma
19 Feb 10 at 12:02 pm
A3 Exercise 2a – Inertial Tensor, Eigenpair — Values too high?
For 2FIF.PDB, the first alpha helix processed gives me these values, does anyone else have numbers this high?
I would gather that “they’re normal” since each element of the tensor has either a sum of squares or a product.
Tensor for 2FIF.PDB, first alpha helix…
[[ 1.22330936e+06 -1.72942892e+02 3.12611704e+02]
[ -1.72942892e+02 3.12419338e+03 2.20207654e+03]
[ 3.12611704e+02 2.20207654e+03 1.22160168e+06]]
Corresponding eigenvalues…
[ 1223364.82869823 1221550.21654422 3120.18905051]
Corresponding eigenvectors…
[[ 9.84605185e-01 -1.74793048e-01 1.42197278e-04]
[ 1.75889188e-04 1.80429631e-03 9.99998357e-01]
[ 1.74793017e-01 9.84603542e-01 -1.80726371e-03]]
I’ll finish the software — we’ll be able to tell with the images.
Eddie Ma
19 Feb 10 at 12:23 pm
A3 Exercise 2a – Orthogonal Projection (perpendicular, vertical)
We need to project the terminal amino acids (one N from each the N-term and C-term) onto the inertial axis– this definition seems to be correct– again, I’ll have to try it to know if it can be applied correctly to this problem.
http://en.wikibooks.org/wiki/Linear_Algebra/Orthogonal_Projection_Into_a_Line
Eddie Ma
19 Feb 10 at 2:14 pm
A3 Exercise 2a – GAK!
Eddie Ma
19 Feb 10 at 2:56 pm
A3 Exercse 2a – IMPORTANT: Using returned eigenvectors from linalg.eig( )
(Saves time, money and lives!)
If you do something like this…
(eigval, eigvec) = linalg.eig(tensor)
DO NOT do this next:
unitaxis = eigvec[i]
DO THIS next instead:
unitaxis = eigvec[:, i]
Not quite done yet– but by sliding the unit axes back onto the centres of mass and by extending each an arbitrary amount in both directions, you can get this:

Notes: Added draft blue poles that directly connect N-term N and C-term N for each strand (or helix); Added draft pink balls for centres of mass.
Eddie Ma
19 Feb 10 at 4:15 pm
Numpy – Link to the official documentation for Numpy
http://docs.scipy.org/doc/
(Ooh! It’s hosted by SciPy!)
Eddie Ma
19 Feb 10 at 4:18 pm
A3 Exercise 2a – It turns out High Eigenvalues and Orthogonal Projection are both OK.
2FIF.PDB:

1MBO.PDB:

Going back, I only have to fix the way strands (or helices) are selected– the problem is, I’m using the chimera.Atom.residue.isSheet and chimera.Atom.residue.isHelix boolean flags and they’re unreasonably faulty– I’ve written to the TA and instructor to see if there’s a different field we should be looking in instead.
The error here is that I accidentally glued all of the strands and all the helices of each chain together! This can happen if Python’s list.extend is used instead of list.append!
Eddie Ma
19 Feb 10 at 4:39 pm
A2 Exercise 2a – All methods, hints above confirmed to work!
Final 1MBO.PDB

Final 2FIF.PDB

Yay! Next exercise!
Eddie Ma
19 Feb 10 at 10:21 pm
A2 Exercise 2a – “Some help would be appreciated.” – Gagan
Guess what!
You’ve managed to get the centres of masses correct!
AND you have the correct eigenvectors!
Just apply that formula on the wikibooks page and you’re done–
You only need to project the end points onto the unit vector and thus scale the vector out to meet those end points’ perpendicular projections–
The formula is VERY straight forward– Look at the diagrams, then look at the math in Example 1.3–
You need only ask yourself — from the example which vector is which:
What is the vector “s” = (1, 2) and what is (2, 3)?
i.e. Which one is the eigenvector?
http://en.wikibooks.org/wiki/Linear_Algebra/Orthogonal_Projection_Into_a_Line
Eddie Ma
20 Feb 10 at 11:44 am
A2 Exercise 2a – EDIT(title): Don’t forget to create new tensors for each helix (or strand)!
EDIT(image): Added Gagan’s image just prior to finding this fix. See if your’re having the same problem.
So after I create my tensor, and was done appending my tube coordinates to the point list (start and end), I forgot to empty my tensor before I’d move on to my next helix/sheet.
Thanks!
– Gagan
Gagan
20 Feb 10 at 10:07 pm
A2 Exercise 3 – Preliminary Outline of Steps 1 to 4
Edit: Fixed description for numpy.subtract( ) — removed reference to non-existent numpy.sum( ) function– it’s actually numpy.add( ) which takes two vectors and returns a third vector which is a component-wise sum.
I’ll remark on each of the steps [1, 4] given on the summary slides of the course notes (page 46).
1) Equivalence set not required (we should be saying “equivalence sequence” really– it preserves the order that the equivalence pairs occur).
Hint: Each molecule to be considered has the exact same number of amino acids (one has many more atoms than the other, don’t be fazed).
2) Calculate the centroids of each molecule …
3) … and put the centroids onto the origin
Hint: We’ve done something similar before, remember when we translated molecules in A1 to get the transmembranous pore of a homotetramer? We pushed everything to a central z-axis. This time, we aren’t moving things to a central z-axis, we’re moving them to the origin.
Note: This is OK since we’re to overlay the two molecules based on ONE chain only– hence in the above, only one half of each of the molecules actually overlaps– note also that we’re only using alpha carbons this time around.
Hint: If you want to use numpy arrays to do the coordinate transformation against the centroids, checkout the functions numpy.add( ) and numpy.subtract( )– this isn’t immediately applicable for the chimera.Point( ) constructor, but it’s easy to transform a numpy.array object to a chimera.Point object.
4a) Calculate the C matrix
Hint: Don’t be fazed by the superscript / subscript notation in the class notes (page 35). It breaks down as follows…
* C is a 3×3 matrix
* gamma is an indexer that runs from 1 to N (0 to N-1 in Python!)
* N is the number of equivalence pairs in the equivalence sequence
* y is a point corresponding to the set of atoms in the second molecule
* x is a point corresponding to the set of atoms in the first molecule
Hint: I haven’t finished this step yet, but I strongly suspect that the y-up-gamma cross x-up-gamma-transformed term can be reduced to numpy.outer(y-up-gamma, x-up-gamma) — that is, the outer product of an equivalence pair y at gamma and x at gamma.
4b) Calculate the SVD for C i.e. C = U S V-transformed
Hint: There’s probably a SVD function in numpy that we can use– I haven’t gotten that far yet, but I think it’s numpy.linalg.svd( )– I’m a bit fuzzy on the re-ordering of the singular values from greatest to least here.
Eddie Ma
20 Feb 10 at 11:08 pm
A3 Exercise 2b – Gagan’s Solution using Orthogonal Projections, Norms
Since we just finished writing code that can do orthogonal projections, just create a set of projected coordinates for each alpha carbon onto the unit axis. We then do numpy.linalg.norm( ) for each pair of alpha carbons and their projections. The norm provides us with the Euclidean distance for each alpha carbon and the unit axis. If that distance is higher than some threshold (3.5 Angstoms?), then perform the adjustments needed in calculating the tube endpoints.
Eddie Ma
21 Feb 10 at 7:07 pm
A3 Exercise 2b – Norms are Distances
Norms literally mean Euclidean distance, without any special context, one means “Frobenius norm”– which is the sum of the difference for each component squared. It’s what we’re all familiar with.
I accidentally thought it returned a normalized vector– don’t get confused. To get a normalized vector, you have to divide each component of a given vector by its norm– in this case, such a vector’s norm is its norm with respect to the origin (0, 0, 0)
Don’t get confused: Norm == Distance.
http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.norm.html
Eddie Ma
21 Feb 10 at 7:12 pm
A3 Exercise 3 EDIT(title): Use Numpy Matrices instead of Numpy Arrays
For Question 3, make sure you use matrix instead of arrays (arrays do not perform the operations properly). I just wanted to check what people are getting for their rotational matrix R; my R is wrong since R*R.T != I, but luckily det(R)=1
matrix([[-0.15740338, 0.83251404, -0.53117281],
[ 0.98353054, 0.18054278, -0.00848431],
[ 0.08883611, -0.52376014, -0.84722102]])
Annie
21 Feb 10 at 8:07 pm
A3 Exercise 3 – Expected Values for C, U, S, VT, R, (Confirmed)
I’ve confirmed (because I tried it and it worked) these values …
C
[[ 1622.06443558 6517.7708076 2344.37364511]
[ 3854.41100215 2248.72862519 -2044.18301078]
[-1945.82524679 -2165.47727046 -5627.83663222]]
U
[[ 0.76561039 -0.21152554 -0.60753409]
[ 0.23260183 -0.78947852 0.56799652]
[-0.59978089 -0.57617759 -0.55523173]]
S
[ 8909.01428405 5270.53766764 2774.78574773]
VT
[[ 0.3710266 0.76461246 0.52697917]
[-0.4297364 -0.36168957 0.82734955]
[ 0.82320465 -0.53343082 0.19438535]]
R
[[-0.12516285 0.98597924 0.11035944]
[ 0.8931465 0.16040955 -0.42018818]
[-0.43199952 0.0459752 -0.90070122]]
detR
1.0
R dot RT
[[ 1.00000000e+00 -3.19189120e-16 2.63677968e-16]
[ -3.19189120e-16 1.00000000e+00 5.55111512e-17]
[ 2.63677968e-16 5.55111512e-17 1.00000000e+00]]
Note that R dot RT is effectively identity 3×3 (to E-16 error– i.e. zero).
Eddie Ma
21 Feb 10 at 10:52 pm
A3 Exercise 3 – Possible Place for Errors
While working on this problem, I made a copy of all of the Alpha Carbon coordinates so I could do the math without worrying about filtering all the other atoms.
I then translated the centroids of each molecule onto the origin AND FORGOT to translate the copies of the Alpha Carbon coordinates.
Be sure to perform the same transformations for every copy of data that you make… OR… don’t make copies.
Eddie Ma
21 Feb 10 at 10:57 pm
A3 Exercise 3 – Outline of Steps 4 to 7 (end)
4a) Calculate the C matrix
Hint: an alternative method than the one mentioned by Dr. Burkowski involves using numpy.outer( ) to obtain a list of outer products. Remember that the dimensions are as follows– arg 1 is a column vector, arg 2 is a row vector, the return is thus a 3×3 square matrix– which is what we expect a single small ‘c’ to be. We take the sum of these small ‘c’s to get big ‘C’.
4b) Calculate the SVD of C — i.e. C = USV-transformed
Hint: it turns out we can use numpy.linalg.svd( ). This function already returns ‘s’ in ‘S’ in descending value! SVD returns U [3x3], S [3], V-transformed [3x3].
5) Calculate R = UV-transformed
Hint: don’t do anything weird to V-transformed, it’s already transformed. You can use numpy.dot( ) — it is defined as matrix multiply when dealing with arrays– Annie has suggested using matrices (numpy.matrix, a subclass of numpy.array). I haven’t played with these objects and do not know their function semantics.
6) Ensure that det(R) is 1.0– if true, goto (7) if not, go to (6.-1).
Hint: go find the function for this yourself
6.-1) Repair the SVD by ‘poisoning’ UV-transformed like Udiag(1, 1, -1)V-transformed.
I haven’t done this step yet, but it should be straight forward to test… imagine overlapping a protein onto its mirror image. From assignment 1, you know how to do this– which components of a set of points do you need to multiply by -1.0 for instance? What does this observation have to do with step (6) here?
7) Apply the rotation to the first molecule (atoms x[i] in molecule X).
This part I have done
We can thus confirm that steps 1 to 5 and 7 as hinted here are OK
Eddie Ma
21 Feb 10 at 11:18 pm
A3 Exercise 6 – General Observations
- Find centre of mass of the mentioned eight carbons
- Draw a plane such that the centre of mass is on that plane
- Change rotation of plane so that we minimize total distance between the plane and those eight carbons
Eddie Ma
22 Feb 10 at 9:37 pm
A3 Exercise 3
I got same C, V-trans, R, det(R), S and C as yours but when I apply rotation to first molecule using:
Pseudocode:
new coordinates = dot(R, old coordinates)
I don’t get the superimposition.
Couple of questions:
1) Are we suppose to shift the entire protein using the centroid or just the necessary alpha-carbon atoms?
2)Are we suppose to apply rotation to entire protein of just the chain?
Thanks,
Esh
23 Feb 10 at 2:23 am
A3 Exercise 3 @Esh
I’ll first clarify the procedure before answering your questions.
First, we calculate the translation vector and rotation matrix using the alpha carbons of the first chain in each PDB file.
Second, we apply the translation vector and rotation matrix to every single atom in each PDB file.
While we’re interested in moving entire proteins, you can see that we’ve only used a subset of the available Cartesian points to do our math. This is important for this part of the assignment since we haven’t created equivalence sets for corresponding atoms in each protein– we’ve just (correctly) assumed each pair lines up (in this case).
1) You shift both entire proteins so that the centroids of only one chain of each ends up at the origin. Remember, the centroid is defined by only a subset of atoms using one chain for each protein, not both chains for each protein.
2) You apply the rotation matrix to every single atom of only the first protein. Remember, the rotation matrix is defined as a distance minimization function between some atoms in the second protein and some atoms in the first protein– in that order (think ‘difference’, but not exactly [non-commutative binary function]).
Eddie Ma
23 Feb 10 at 8:45 am
A3 Exercise 1 – A Hint from Fiona
See slide 49 of RMSD lecture
Eddie Ma
24 Feb 10 at 9:49 pm
A3 Exercise 6 – Frustration
Draft Image 0: A crudely sketched plane with sophisticated imaging software.
Draft Image 1: Connecting the interesting carbons about the centroid at origin.
Eddie Ma
24 Feb 10 at 10:30 pm
A3 Exercise 6 – A Different Approach
Hint: This method won’t allot you the same number of marks as if you derive the solution based on the least squares approach indicated. If you treat this problem like one you’ve already seen in this assignment, and justify whether you are minimizing or maximizing some important quantity– then you can arrive at this alternative solution.
Now that the due date has come and gone, I can reveal what my solution was. I didn’t want to post it previously because it doesn’t follow the pattern that Dr. Burkowski wanted you to derive.
We previously worked on the inertial axis problem in Exercise 2A. This current problem ends up being fairly similar if you consider that the normal unit vector (N) to the plane can be treated as an axis of maximum squares distance. We’ve really just restated the least squares distance problem against a plane! The difference here is that you must now accept the eigenvector for the tensor for the axis corresponding to the maximum eigenvalue. This corresponds to the unit vector that draws the figure with the largest area given the carbons of interest (not convinced? It’s quadratic– maximizing area means reducing the spread in distance away from the centroid– just like drawing a thin rectangle versus a square).
Incidentally, I had wondered how to find a best fit plane when working on TIM barrels before
Now I have working code to that effect.
Eddie Ma
25 Feb 10 at 3:48 am
[...] CS[46]83 A3 Hints « Oh, right! Newick format for tree representation. [...]
Python Crash Course & CS[64]83 Next Directions « Ed's Big Plans
28 Feb 10 at 6:41 pm
EPILOGUE: Protein Structure Overlap Problem
Note to self — If I’m going to use this in real life, there’s a standing issue I have to look after. In either my rotation (likely) or translation (unlikely) operation, helices get crushed strangely if I’m working with a protein with helices. This is an important bug that should be addressed before being used in production.
Eddie Ma
16 Mar 10 at 11:37 am