
Hi friends... I am starting a new series of articles focused on biomolecular visualization and analysis. Today, let us look into the PDB format and get started with visualizing biomolecules on a computer! For this purpose, I will use the VMD software.
Requirements
- I suggest beginners go through this nice and friendly article by @suesa: The Building Blocks of Life - Basics.
- Installation of VMD software on your computer: VMD (Visual Molecular Dynamics) is molecular visualization software developed by the Theoretical and Computational Biophysics Group at the University of Illinois at Urbana-Champaign. You can download it from here. You will need to register there to download this free software. If you have issues with the download or trouble with the installation, please ask for help in the comments (or you can ask me on Discord. My Discord handle: @dexterdev#4231). :smile:
- Download a PDB file from the RCSB website.
Choosing a biomolecule
Let us take an available protein structure from the RCSB website. I am selecting an HIV capsid protein for now. If you search for 1E6J, you will find the structure on the RCSB website.
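If you prefer to fetch the file from a script instead of the browser, here is a minimal Python sketch. The URL pattern `https://files.rcsb.org/download/<ID>.pdb` and the helper names `pdb_url` and `download_pdb` are my assumptions for illustration; they are not part of the steps above, so check the RCSB site if the download fails.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumed URL pattern for RCSB file downloads (not given in the article).
RCSB_DOWNLOAD_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def pdb_url(pdb_id):
    """Return the assumed download URL for a four-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return RCSB_DOWNLOAD_URL.format(pdb_id=pdb_id)


def download_pdb(pdb_id, folder="."):
    """Download <ID>.pdb into `folder` and return the path of the saved file."""
    target = Path(folder) / f"{pdb_id.strip().upper()}.pdb"
    urlretrieve(pdb_url(pdb_id), str(target))
    return target
```

Calling `download_pdb("1E6J")` should then save `1E6J.pdb` in the current folder, provided the assumed URL pattern is right and you are online.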

Understanding PDB format
Taking 1E6J.pdb as an example, let us understand the content and format of a PDB (Protein Data Bank) file. If you open 1E6J.pdb, you will find a file structured as below:
HEADER VIRAL PROTEIN 18-AUG-00 1E6J
TITLE CRYSTAL STRUCTURE OF HIV-1 CAPSID PROTEIN (P24) IN COMPLEX
TITLE 2 WITH FAB13B5
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: IMMUNOGLOBULIN;
COMPND 3 CHAIN: L;
COMPND 4 FRAGMENT: LIGHT CHAIN 1-210;
COMPND 5 OTHER_DETAILS: OBTAINED BY PAPAIN CLEAVAGE (FAB);
COMPND 6 MOL_ID: 2;
COMPND 7 MOLECULE: IMMUNOGLOBULIN;
COMPND 8 CHAIN: H;
COMPND 9 FRAGMENT: HEAVY CHAIN 1-219;
COMPND 10 OTHER_DETAILS: OBTAINED BY PAPAIN CLEAVAGE (FAB);
COMPND 11 MOL_ID: 3;
COMPND 12 MOLECULE: CAPSID PROTEIN P24;
COMPND 13 CHAIN: P;
COMPND 14 FRAGMENT: GAG POLYPROTEIN RESIDUES 143-352;
COMPND 15 SYNONYM: CA;
COMPND 16 ENGINEERED: YES;
COMPND 17 OTHER_DETAILS: HIS6 TAG AT N-TERM, PRO1 AND ILE2 DELETED,
COMPND 18 C-TERMINUS MODIFIED
.
.
.
.
.
KEYWDS HIV CAPSID PROTEIN (P24), P24, FAB, HIV-1, VIRUS ASSEMBLY, CAPSID,
KEYWDS 2 CA, ANTIGEN, ANTIBODY, PROTEIN-PROTEIN INTERACTIONS, VIRAL PROTEIN
EXPDTA X-RAY DIFFRACTION
AUTHOR C.BERTHET-COLOMINAS,S.MONACO,A.NOVELLI,G.SIBAI,F.MALLET, S.CUSACK
.
.
.
.
.
REMARK 470 MISSING ATOM
REMARK 470 THE FOLLOWING RESIDUES HAVE MISSING ATOMS (M=MODEL NUMBER;
REMARK 470 RES=RESIDUE NAME; C=CHAIN IDENTIFIER; SSEQ=SEQUENCE NUMBER;
REMARK 470 I=INSERTION CODE):
REMARK 470 M RES CSSEQI ATOMS
REMARK 470 PRO H 219 O
REMARK 470 ASN L 210 O
REMARK 470 GLY P 220 O
REMARK 475
REMARK 475 ZERO OCCUPANCY RESIDUES
.
.
.
.
.
SEQRES 1 H 219 GLU VAL GLN LEU GLN GLN SER GLY ALA GLU LEU ALA ARG
SEQRES 2 H 219 PRO GLY ALA SER VAL LYS MET SER CYS LYS ALA SER GLY
SEQRES 3 H 219 TYR THR PHE THR SER TYR THR MET HIS TRP VAL LYS GLN
SEQRES 4 H 219 ARG PRO GLY GLN GLY LEU GLU TRP ILE GLY TYR ILE ASN
SEQRES 5 H 219 PRO SER SER GLY TYR SER ASN TYR ASN GLN LYS PHE LYS
.
.
.
.
.
ATOM 3430 N LYS P 25 -18.910 -9.433 -1.456 1.00 54.99 N
ATOM 3431 CA LYS P 25 -17.777 -10.242 -1.877 1.00 58.12 C
ATOM 3432 C LYS P 25 -16.454 -9.507 -1.700 1.00 57.40 C
ATOM 3433 O LYS P 25 -15.452 -10.114 -1.325 1.00 58.93 O
ATOM 3434 CB LYS P 25 -17.962 -10.695 -3.327 1.00 63.00 C
ATOM 3435 CG LYS P 25 -17.782 -12.196 -3.527 1.00 74.97 C
ATOM 3436 CD LYS P 25 -18.598 -13.022 -2.529 1.00 81.03 C
ATOM 3437 CE LYS P 25 -20.085 -12.688 -2.570 1.00 85.48 C
ATOM 3438 NZ LYS P 25 -20.852 -13.453 -1.550 1.00 81.37 N
ATOM 3439 N VAL P 26 -16.460 -8.200 -1.947 1.00 58.95 N
ATOM 3440 CA VAL P 26 -15.259 -7.388 -1.795 1.00 53.87 C
ATOM 3441 C VAL P 26 -14.793 -7.456 -0.347 1.00 54.47 C
ATOM 3442 O VAL P 26 -13.702 -7.948 -0.072 1.00 61.02 O
ATOM 3443 CB VAL P 26 -15.504 -5.912 -2.190 1.00 48.70 C
ATOM 3444 CG1 VAL P 26 -14.259 -5.078 -1.922 1.00 46.68 C
ATOM 3445 CG2 VAL P 26 -15.873 -5.821 -3.657 1.00 45.76 C
ATOM 3446 N VAL P 27 -15.637 -7.009 0.578 1.00 57.38 N
ATOM 3447 CA VAL P 27 -15.305 -7.026 2.005 1.00 65.79 C
ATOM 3448 C VAL P 27 -14.978 -8.450 2.460 1.00 70.41 C
ATOM 3449 O VAL P 27 -14.173 -8.661 3.371 1.00 70.09 O
.
.
.
.
.
HETATM.....
In the above file, you can find a few record types:
- HEADER: The first line of a PDB file. It is like the heading of a document. In this example, it gives the classification VIRAL PROTEIN, the date 18-AUG-00 (the date the structure was deposited in the Protein Data Bank) and the PDB ID 1E6J.
- TITLE: A slightly more detailed description.
- COMPND: Compound details, meaning the names of the chains and a description of the molecules. In this example there are three protein chains: L and H (the light and heavy chains of the Fab antibody fragment) and P (the capsid protein p24).
- EXPDTA: How the structure was determined. In this example, it was X-ray diffraction.
- AUTHOR: The researchers who published the work.
- REMARK: Details such as missing atoms.
- SEQRES: The amino-acid sequence of each chain. A protein is a polymer made of different amino acids, and this field lists them using their 3-letter short forms.
- ATOM: This field contains the real data. Everything up to this point is metadata, much like comments in computer code. Reading the first ATOM line above from left to right: record name (ATOM), atom serial number (3430), atom name (N), residue name (LYS), chain identifier (P), residue number (25), the X, Y and Z coordinates in ångströms (-18.910, -9.433, -1.456), occupancy (1.00), temperature factor (54.99) and element symbol (N).
By the way, occupancy tells you whether the atom has alternative positions. An occupancy of 1.00 means the atom was observed only at the listed XYZ coordinate. Sometimes the same atom has multiple entries, for example two positions with occupancies of 0.60 and 0.40. If you need a single coordinate, you can pick the 0.60 one. A higher temperature factor indicates larger fluctuation of that particular atom.
- HETATM: A hetero-atom entry. It has the same details as ATOM, but for atoms which are not part of the protein itself. For 1E6J.pdb, there are no HETATMs.
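To make the ATOM fields concrete, here is a small Python sketch that slices one record by column position. PDB is a fixed-width format, so fields are read by position rather than by splitting on spaces. The column ranges follow the published PDB format description as I understand it, and the name `parse_atom_line` is just a hypothetical helper. Because the excerpt above has lost its original spacing, the sample line is rebuilt with explicit field widths, using the values of atom 3430.

```python
def parse_atom_line(line):
    """Slice one ATOM/HETATM record into named fields by fixed columns."""
    return {
        "record": line[0:6].strip(),         # columns 1-6
        "serial": int(line[6:11]),           # columns 7-11
        "atom_name": line[12:16].strip(),    # columns 13-16
        "alt_loc": line[16].strip(),         # column 17
        "res_name": line[17:20].strip(),     # columns 18-20
        "chain_id": line[21],                # column 22
        "res_seq": int(line[22:26]),         # columns 23-26
        "x": float(line[30:38]),             # columns 31-38
        "y": float(line[38:46]),             # columns 39-46
        "z": float(line[46:54]),             # columns 47-54
        "occupancy": float(line[54:60]),     # columns 55-60
        "temp_factor": float(line[60:66]),   # columns 61-66
        "element": line[76:78].strip(),      # columns 77-78
    }


# Atom 3430 from the excerpt, rebuilt with explicit widths (80 columns).
SAMPLE_LINE = (
    f"{'ATOM':<6}{3430:>5} {' N':<4} {'LYS':>3} {'P'}{25:>4}{'':4}"
    f"{-18.910:8.3f}{-9.433:8.3f}{-1.456:8.3f}{1.00:6.2f}{54.99:6.2f}"
    f"{'':10}{'N':>2}{'':2}"
)

atom = parse_atom_line(SAMPLE_LINE)
print(atom["res_name"], atom["chain_id"], atom["res_seq"], atom["atom_name"])
print(atom["x"], atom["y"], atom["z"], atom["occupancy"], atom["temp_factor"])
```

To run this over a real file, you would loop over its lines and call `parse_atom_line` only on those starting with `ATOM` or `HETATM`.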
Let us start then
So by now, we have an idea of what is inside the PDB file. Let us look at it using visualization software. We can use VMD for this purpose. My machine runs Linux, so I will just type vmd 1E6J.pdb at my command prompt.

OK. So we visualized a molecule! :tada:
Now let us see if I can visualize it in a different style. From the VMD Main window, let us select the Graphics-->Representations option.

In the Drawing Method list, select the NewCartoon option. The result is as below:

You can change from Perspective to Orthographic in the Display menu of the VMD Main window. Scrolling on the molecule lets you zoom in and out. I do not want to bombard you with too much information, so I will stop here.
Summary
- We looked into the PDB file format.
- We visualized 1E6J.pdb using VMD. Of course, there is a lot more that we can do.
Coming up next
- VMD Shortcuts
- About the powerful Tcl scripting environment embedded in VMD
- Visualize each chain in the PDB file individually
References
- Humphrey, W., Dalke, A. and Schulten, K., "VMD - Visual Molecular Dynamics", J. Molec. Graphics, 1996, vol. 14, pp. 33-38.
- VMD userguide
- VMD tutorial
- Citing PDB related papers: See here.
- Monaco-Malbet, Stéphanie, et al. "Mutual conformational adaptations in antigen and antibody upon complex formation between an Fab and HIV-1 capsid protein p24." Structure 8.10 (2000): 1069-1077. (1E6J structure related paper)
Follow me @dexterdev
FLIPPEN AMAZING. It's incredible what computers can do and how they are helping us move forward in the world. My friends studying chemistry had no clue what this software was but were as amazed as I am by this post. Thank you for sharing!!
Glad to hear that you liked it.
Nice work @dexterdev!
I did my Master's final thesis in structural bioinformatics. The PDB format is so ugly... I had to deal with it, topologies, and force field files; it was a bit exasperating!
Cheers!
Why is the PDB format ugly?
I don't like the fact that it is a fixed-width column format. When I worked with PDB files, I used to make changes in them (because of project needs), and these changes regularly generated errors in the file :S
It seems a rather archaic format to me.
hmm :)
Nice one @dexterdev. It's somewhat similar to structure visualization in PyMOL; I have never tried VMD though. Definitely going to try it, thanks for sharing.
PyMOL is user friendly, I think. But VMD is more powerful, as I understand it.
I sometimes wish I could really communicate and understand the codes defined in the system more about bio-molecules but i like the way you simplified the interpretation for better understanding...Good work sir Dav
There is no complicated code here. All I have explained is the format of a PDB file, which is fairly simple. Let me know if you have any questions.
No doubt...am just admiring the placement of codes while describing molecules..
Awesome work @dexterdev, thank you for providing a download source for the VMD software and valuable info regarding the content and format of PDB files.
Biomolecular engineering (is that right?) is not my field, however I do realize the significance of tutorials for young scientists.
Thank you for your efforts!
Thank you. Computational biophysics may be a more apt name for this.
Thanks for the response.
Thanks @dexterdev. Am really swayed by this software. Can this version of PDB file be used in computational chemistry for molecule structure and properties? Thanks
The PDB file format can technically encode any molecule, so it should be possible (although PDB stands for Protein Data Bank).
If you are asking about the VMD software: if you have a valid PDB file, you can use it. VMD also supports other formats.
Okay. Thanks. It's been awhile we chat!!!.