10 PDB Files and PyMOL Visualizations
If you have taken BS1005: Biochemistry I, then the contents of this practical will be relatively straightforward1.
This lab aims to provide a gentle introduction to PDB files and also PyMOL.
10.1 PDB Files
The first part of the lab examines the structure of a PDB typical PDB file. The later part of this lab then requires you to write your own PDB file for a water molecule (before opening it up in PyMOL to ensure that you have done it correctly.)
The Protein Data Bank (i.e., PDB) is an initiative by the Brookhaven National Laboratory that stores three-dimensional data of biological molecules (i.e., biomolecules). A considerable amount of information has since been uploaded to the PDB from various fields in Biology, including but not limited to genomics, proteonomics, and structural biology.
10.1.1 What is a PDB File?
A PDB file is a plain text document that contains metainformation about a biomolecule - for instance, its atoms, its side chains, and its amino acid residues.
The PDB also has official documentation on the structure of its files that you can view for reference.
10.1.2 Points to Note
Although a typical PDB file contains numerous section and record types, BS3008 only focuses on a few of these records (and their important parts):
The
ATOM
RecordThis is used to define atoms in a PDB file. With this record, the file writer can also define the positions of the atoms (in a three-dimensional space), define side-chain groups, and also define amino acid residues.
Note that all coordinates of length will be in Angstroms in a PDB file.
B-factors
This is also known as the temperature factor. B-factors describe how much an atom’s placement differs from its average values - the higher the B-factor, the higher its displacement.
Areas with high B-factors are usually red (i.e., hot) and vice versa.
Occupancy
This parameter is associated with different conformations of a biomolecule.
Depending on how many conformations there are available, the occupancy may be less than one (but whatever the case, the values of all occupancices will always add up to one).
10.1.3 Making a PDB File for a Water Molecule
A water molecule has two hydrogen atoms and one oxygen atom. Its O-H bond length is 0.957 Angstroms and its H-O-H bond angle is \(109^\circ\).
Hence, first set the Oxygen atom to the origin (0, 0, 0) using an ATOM record2:
1 2 3 4 5 6 7
1234567890123456789012345678901234567890123456789012345678901234567890123456789
ATOM 1 O A 1 0.000 0.000 0.000 1.00 0.00 O
Thereafter, add in the first hydrogen atom at the coordinates (0.957, 0, 0) using another ATOM record. It not necessary to use the aforementioned coordinates - (0, 0.957, 0) and (0, 0, 0.957) also work:
1 2 3 4 5 6 7
1234567890123456789012345678901234567890123456789012345678901234567890123456789
ATOM 1 O A 1 0.000 0.000 0.000 1.00 0.00 O
ATOM 2 H A 1 0.957 0.000 0.000 1.00 0.00 H
Basic trigonometry can then be used to discover the coordinates of the final hydrogen atom. In this case:
\[\begin{align} x &= -\cos(71^\circ) * 0.957 \\ y &= \sin(71^\circ) * 0.957 \end{align}\]
Doing so should yield a pair of coordinates \((-0.312, 0.905)\). The final pair of coordinates (with TER
and END
records) should be:
1 2 3 4 5 6 7
1234567890123456789012345678901234567890123456789012345678901234567890123456789
ATOM 1 O A 1 0.000 0.000 0.000 1.00 0.00 O
ATOM 2 H A 1 0.957 0.000 0.000 1.00 0.00 H
ATOM 3 H A 1 -0.312 0.957 0.000 1.00 0.00 H
TER 4
END
You can open your PDB file in PyMOL to verify that your PDB file is correct.
10.2 Visualizing Molecules in PyMOL
PyMOL is an open-source software for visualizing molecules and measuring their various properties. These “molecules” are often stored in a PDB file format (hence the reason why the first portion of the practical deals with PDB files).
1 2 3 4 5 6 7 8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
ATOM 38 N A -33.869 10.617 7.317 1.00 76.26 N
ATOM 39 CA A -34.134 9.234 6.904 1.00 74.56 C
ATOM 40 C A -33.089 8.813 5.875 1.00 73.03 C
ATOM 41 O A -32.729 9.593 4.989 1.00 72.13 O
ATOM 42 CB A -35.552 9.101 6.317 1.00 75.48 C
ATOM 43 CG A -36.647 9.421 7.338 1.00 77.63 C
ATOM 44 OD1 A -36.667 8.874 8.452 1.00 81.93 O
ATOM 45 ND2 A -37.570 10.304 6.961 1.00 75.30 N
ATOM 46 N A -32.604 7.582 5.965 1.00 74.10 N
ATOM 47 CA A -31.570 7.168 5.019 1.00 77.48 C
ATOM 48 C A -32.043 6.300 3.846 1.00 73.71 C
ATOM 49 O A -31.538 5.201 3.613 1.00 76.65 O
ATOM 50 CB A -30.416 6.504 5.793 1.00 83.44 C
ATOM 51 CG A -29.901 7.343 6.934 1.00 90.57 C
ATOM 52 ND1 A -29.361 8.601 6.754 1.00 92.16 N
ATOM 53 CD2 A -29.899 7.125 8.275 1.00 92.50 C
ATOM 54 CE1 A -29.051 9.123 7.930 1.00 91.80 C
ATOM 55 NE2 A -29.369 8.247 8.871 1.00 92.32 N
ATOM 56 N A -33.001 6.831 3.089 1.00 67.38 N
ATOM 57 CA A -33.551 6.148 1.924 1.00 59.99 C
ATOM 58 C A -33.046 6.785 0.618 1.00 55.78 C
ATOM 59 O A -32.178 7.656 0.640 1.00 54.77 O
ATOM 60 CB A -35.079 6.191 1.993 1.00 58.20 C
ATOM 61 CG A -35.615 7.603 2.156 1.00 55.61 C
ATOM 62 OD1 A -36.782 7.745 2.593 1.00 55.40 O
ATOM 63 OD2 A -34.876 8.561 1.843 1.00 55.57 O
ATOM 64 N A -33.583 6.349 -0.515 1.00 54.36 N
ATOM 65 CA A -33.167 6.876 -1.814 1.00 53.16 C
ATOM 66 C A -33.626 8.317 -2.061 1.00 51.53 C
ATOM 67 O A -33.041 9.039 -2.874 1.00 50.28 O
ATOM 68 CB A -33.673 5.962 -2.929 1.00 56.36 C
ATOM 69 CG A -33.144 4.523 -2.821 1.00 59.12 C
ATOM 70 CD A -33.329 3.750 -4.125 1.00 57.06 C
ATOM 71 CE A -32.718 2.350 -4.022 1.00 60.09 C
ATOM 72 NZ A -32.740 1.596 -5.327 1.00 61.46 N
ATOM 73 N A -34.685 8.721 -1.368 1.00 48.02 N
ATOM 74 CA A -35.197 10.077 -1.478 1.00 42.97 C
ATOM 75 C A -34.131 10.992 -0.891 1.00 43.60 C
ATOM 76 O A -33.858 12.059 -1.426 1.00 44.56 O
ATOM 77 CB A -36.487 10.260 -0.668 1.00 40.66 C
ATOM 78 CG1 A -37.593 9.371 -1.249 1.00 39.01 C
ATOM 79 CG2 A -36.885 11.737 -0.658 1.00 39.33 C
ATOM 80 CD1 A -38.855 9.309 -0.390 1.00 36.41 C
In the second part of the practical, you are given a PDB file (see above) for a polypeptide that is missing its amino acid labels. You will need to open the file using PyMOL, identify the polypeptide’s N and C terminals, and fill in the three-letter code for each ATOM
record’s amino acid in columns 18 to 20.