Visualizing biomolecules on a computer [Part 1]: Let us inspect a PDB file and view it using VMD

in #steemstem, 8 years ago (edited)

3.6-Angstrom cryo-EM structure of a human adenovirus type 5 protein, from here. Only the alpha carbons are shown, and each atom is colored by residue type. Visualized in VMD. Non-polar (hydrophobic) residues are white, basic residues are blue, acidic residues are red, and polar residues are green.

Hi friends... I am starting a new series of articles that will focus on biomolecular visualization and analysis. Today let us look into the PDB format and get started with visualizing biomolecules on a computer! For this purpose, I will use VMD.

Requirements

  • I suggest beginners go through this nice and friendly article by @suesa: The Building Blocks of Life - Basics.
  • Install VMD on your computer: VMD (Visual Molecular Dynamics) is molecular visualization software developed by the Theoretical and Computational Biophysics Group at the University of Illinois at Urbana-Champaign. You can download it from here. You will need to register there to download this free software. If you have trouble with the download or the installation, please ask for help in the comments (or ask me on Discord; my handle is @dexterdev#4231). :smile:
  • Download a PDB file from the RCSB website.

Choosing a biomolecule

Let us take an available protein structure from the RCSB website. I am selecting an HIV capsid protein for now. If you search for 1E6J on the RCSB website, you will find the structure.

A screenshot from the RCSB website showing the Protein Data Bank entry 1E6J. You can download the PDB file from the download button (in blue) by choosing the PDB option.
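
If you would rather fetch the file from a script than click the download button, here is a minimal sketch. It assumes that RCSB serves entries at `https://files.rcsb.org/download/<ID>.pdb` and that your machine has network access. The names `pdb_url` and `download_pdb` are made up for this example.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumption: this is the RCSB file server layout; check the RCSB site if it changes.
RCSB_DOWNLOAD_ROOT = "https://files.rcsb.org/download"


def pdb_url(pdb_id: str) -> str:
    """Build the download URL for a four-character PDB ID such as 1E6J."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"{RCSB_DOWNLOAD_ROOT}/{pdb_id}.pdb"


def download_pdb(pdb_id: str, folder: str = ".") -> Path:
    """Download <ID>.pdb into folder and return the path of the saved file."""
    url = pdb_url(pdb_id)
    target = Path(folder) / url.rsplit("/", 1)[-1]
    urlretrieve(url, str(target))  # needs network access
    return target
```

Calling `download_pdb("1e6j")` should leave a file named 1E6J.pdb in the current folder.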

Understanding PDB format

Taking 1E6J.pdb as an example, let us understand the content and format of a PDB (Protein Data Bank) file. If you open 1E6J.pdb, you will find a file structured as below:

HEADER    VIRAL PROTEIN                           18-AUG-00   1E6J              
TITLE     CRYSTAL STRUCTURE OF HIV-1 CAPSID PROTEIN (P24) IN COMPLEX            
TITLE    2 WITH FAB13B5 
COMPND    MOL_ID: 1;                                                            
COMPND   2 MOLECULE: IMMUNOGLOBULIN;                                            
COMPND   3 CHAIN: L;                                                            
COMPND   4 FRAGMENT: LIGHT CHAIN 1-210;                                         
COMPND   5 OTHER_DETAILS: OBTAINED BY PAPAIN CLEAVAGE (FAB);                    
COMPND   6 MOL_ID: 2;                                                           
COMPND   7 MOLECULE: IMMUNOGLOBULIN;                                            
COMPND   8 CHAIN: H;                                                            
COMPND   9 FRAGMENT: HEAVY CHAIN 1-219;                                         
COMPND  10 OTHER_DETAILS: OBTAINED BY PAPAIN CLEAVAGE (FAB);                    
COMPND  11 MOL_ID: 3;                                                           
COMPND  12 MOLECULE: CAPSID PROTEIN P24;                                        
COMPND  13 CHAIN: P;                                                            
COMPND  14 FRAGMENT: GAG POLYPROTEIN RESIDUES 143-352;                          
COMPND  15 SYNONYM: CA;                                                         
COMPND  16 ENGINEERED: YES;                                                     
COMPND  17 OTHER_DETAILS: HIS6 TAG AT N-TERM, PRO1 AND ILE2 DELETED,            
COMPND  18  C-TERMINUS MODIFIED         
.
.
.
.
.

KEYWDS    HIV CAPSID PROTEIN (P24), P24, FAB, HIV-1, VIRUS ASSEMBLY, CAPSID,    
KEYWDS   2 CA, ANTIGEN, ANTIBODY, PROTEIN-PROTEIN INTERACTIONS, VIRAL PROTEIN   
EXPDTA    X-RAY DIFFRACTION                                                     
AUTHOR    C.BERTHET-COLOMINAS,S.MONACO,A.NOVELLI,G.SIBAI,F.MALLET, S.CUSACK     
      
.
.
.
.
.

REMARK 470 MISSING ATOM                                                         
REMARK 470 THE FOLLOWING RESIDUES HAVE MISSING ATOMS (M=MODEL NUMBER;           
REMARK 470 RES=RESIDUE NAME; C=CHAIN IDENTIFIER; SSEQ=SEQUENCE NUMBER;          
REMARK 470 I=INSERTION CODE):                                                   
REMARK 470   M RES CSSEQI  ATOMS                                                
REMARK 470     PRO H 219    O                                                   
REMARK 470     ASN L 210    O                                                   
REMARK 470     GLY P 220    O                                                   
REMARK 475                                                                      
REMARK 475 ZERO OCCUPANCY RESIDUES       
      
.
.
.
.
.

SEQRES   1 H  219  GLU VAL GLN LEU GLN GLN SER GLY ALA GLU LEU ALA ARG          
SEQRES   2 H  219  PRO GLY ALA SER VAL LYS MET SER CYS LYS ALA SER GLY          
SEQRES   3 H  219  TYR THR PHE THR SER TYR THR MET HIS TRP VAL LYS GLN          
SEQRES   4 H  219  ARG PRO GLY GLN GLY LEU GLU TRP ILE GLY TYR ILE ASN          
SEQRES   5 H  219  PRO SER SER GLY TYR SER ASN TYR ASN GLN LYS PHE LYS        
      
.
.
.
.
.

ATOM   3430  N   LYS P  25     -18.910  -9.433  -1.456  1.00 54.99           N  
ATOM   3431  CA  LYS P  25     -17.777 -10.242  -1.877  1.00 58.12           C  
ATOM   3432  C   LYS P  25     -16.454  -9.507  -1.700  1.00 57.40           C  
ATOM   3433  O   LYS P  25     -15.452 -10.114  -1.325  1.00 58.93           O  
ATOM   3434  CB  LYS P  25     -17.962 -10.695  -3.327  1.00 63.00           C  
ATOM   3435  CG  LYS P  25     -17.782 -12.196  -3.527  1.00 74.97           C  
ATOM   3436  CD  LYS P  25     -18.598 -13.022  -2.529  1.00 81.03           C  
ATOM   3437  CE  LYS P  25     -20.085 -12.688  -2.570  1.00 85.48           C  
ATOM   3438  NZ  LYS P  25     -20.852 -13.453  -1.550  1.00 81.37           N  
ATOM   3439  N   VAL P  26     -16.460  -8.200  -1.947  1.00 58.95           N  
ATOM   3440  CA  VAL P  26     -15.259  -7.388  -1.795  1.00 53.87           C  
ATOM   3441  C   VAL P  26     -14.793  -7.456  -0.347  1.00 54.47           C  
ATOM   3442  O   VAL P  26     -13.702  -7.948  -0.072  1.00 61.02           O  
ATOM   3443  CB  VAL P  26     -15.504  -5.912  -2.190  1.00 48.70           C  
ATOM   3444  CG1 VAL P  26     -14.259  -5.078  -1.922  1.00 46.68           C  
ATOM   3445  CG2 VAL P  26     -15.873  -5.821  -3.657  1.00 45.76           C  
ATOM   3446  N   VAL P  27     -15.637  -7.009   0.578  1.00 57.38           N  
ATOM   3447  CA  VAL P  27     -15.305  -7.026   2.005  1.00 65.79           C  
ATOM   3448  C   VAL P  27     -14.978  -8.450   2.460  1.00 70.41           C  
ATOM   3449  O   VAL P  27     -14.173  -8.661   3.371  1.00 70.09           O
      
.
.
.
.
.

HETATM.....

In the above file, you can find several record types, such as:

  • HEADER: The first line of a PDB file. It is like the heading of a document. In this example, it gives the classification VIRAL PROTEIN, the date 18-AUG-00 (the date the structure was deposited in the Protein Data Bank) and the PDB ID 1E6J.

  • TITLE: A slightly more detailed description.

  • COMPND: Compound details: the chain names and a description of each molecule. In this example there are 3 chains (L, H, and P), one per protein: the Fab light chain, the Fab heavy chain, and the capsid protein p24.

  • EXPDTA: The experimental technique used to determine the structure. In this example, it was X-ray diffraction.

  • AUTHOR: The researchers who published the work.

  • REMARK: Free-form annotations, such as the list of missing atoms.

  • SEQRES: The amino-acid sequence of each chain. A protein is a polymer made of different amino acids, and this record lists them in order using their 3-letter abbreviations.

  • ATOM: These records contain the actual data; everything before them is more like the comments in computer code. Each ATOM line lists, from left to right: the record name, atom serial number, atom name, residue name, chain identifier, residue number, the X, Y and Z coordinates (in Angstroms), occupancy, temperature factor (B-factor), and element symbol.

    By the way, occupancy tells you whether an atom has alternative positions. If occupancy = 1.0, the atom was modeled at that single XYZ coordinate only. Sometimes the same atom has several entries, for example two with occupancies of 0.60 and 0.40. If you need a single coordinate, you can pick the entry with the higher occupancy (0.60 here). Higher temperature factors indicate larger fluctuations of those particular atoms.

  • HETATM: Hetero-atom entries. They have the same layout as ATOM records, but are typically used for atoms that are not part of the protein itself. For 1E6J.pdb, there are no HETATMs.
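
To make the fixed-width layout of ATOM records concrete, here is a small sketch that slices one line into its fields. The column positions follow the published PDB format description. The `Atom` class and the function names `parse_atom_line` and `alpha_carbons` are my own choices for illustration, not part of VMD or any library.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    alt_loc: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float
    element: str


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record using the fixed PDB column positions."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM or HETATM record")
    return Atom(
        serial=int(line[6:11]),          # columns 7-11
        name=line[12:16].strip(),        # columns 13-16
        alt_loc=line[16].strip(),        # column 17, blank if no alternative position
        res_name=line[17:20].strip(),    # columns 18-20
        chain=line[21],                  # column 22
        res_seq=int(line[22:26]),        # columns 23-26
        x=float(line[30:38]),            # columns 31-38
        y=float(line[38:46]),            # columns 39-46
        z=float(line[46:54]),            # columns 47-54
        occupancy=float(line[54:60]),    # columns 55-60
        temp_factor=float(line[60:66]),  # columns 61-66
        element=line[76:78].strip(),     # columns 77-78
    )


def alpha_carbons(lines):
    """Yield the CA atom of every ATOM record in an iterable of PDB lines."""
    for line in lines:
        if line.startswith("ATOM"):
            atom = parse_atom_line(line)
            if atom.name == "CA":
                yield atom


# One record copied from the 1E6J excerpt above.
sample = "ATOM   3431  CA  LYS P  25     -17.777 -10.242  -1.877  1.00 58.12           C  "
atom = parse_atom_line(sample)
print(atom.res_name, atom.chain, atom.res_seq, (atom.x, atom.y, atom.z))
```

Because the fields are located by column and not by whitespace, a line that has been shifted by even one character will be misread. This is worth keeping in mind if you edit PDB files by hand.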

Let us start then

So by now, we have an idea of what is inside a PDB file. Let us now look at it with visualization software; we can use VMD for this. My machine runs Linux, so I will just type vmd 1E6J.pdb in my terminal.

OK. So we visualized a molecule! :tada:
Now let us see if we can visualize it in a different style. From the VMD Main window, select the Graphics-->Representations option.


In the Drawing Method drop-down, select the NewCartoon option. The result is as below:


You can change from Perspective to Orthographic in the Display menu of the VMD Main window. Scrolling over the molecule lets you zoom in and out. I do not want to bombard you with too much information, so I will stop here.
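
The same steps can also be scripted instead of clicked. Below is a rough sketch of a small Python helper that writes the equivalent VMD Tcl commands to a file. The `mol` and `display` commands are standard VMD text commands, while the helper names `vmd_script` and `write_vmd_script` and the file name `newcartoon.tcl` are made up for this example.

```python
from pathlib import Path


def vmd_script(pdb_file: str) -> str:
    """Return Tcl commands that mirror the GUI steps described above."""
    commands = [
        f"mol new {{{pdb_file}}}",          # load the structure
        "mol delrep 0 top",                 # remove the default representation
        "mol representation NewCartoon",    # Drawing Method: NewCartoon
        "mol addrep top",                   # add it to the loaded molecule
        "display projection Orthographic",  # Display: Perspective to Orthographic
    ]
    return "\n".join(commands) + "\n"


def write_vmd_script(pdb_file: str, script_path: str = "newcartoon.tcl") -> Path:
    """Write the script to disk and return its path."""
    path = Path(script_path)
    path.write_text(vmd_script(pdb_file))
    return path
```

After calling `write_vmd_script("1E6J.pdb")`, you could start VMD with `vmd -e newcartoon.tcl` to load the molecule directly in the NewCartoon style.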

Summary

  • We looked into the PDB file format
  • Visualized 1E6J.pdb using VMD. Of course, there is a lot more we can do.

Coming up next

  • VMD Shortcuts
  • About the powerful Tcl scripting environment embedded in VMD
  • Visualize each chain in the PDB file individually

References

  • Humphrey, W., Dalke, A. and Schulten, K., "VMD - Visual Molecular Dynamics", J. Molec. Graphics, 1996, vol. 14, pp. 33-38.
  • VMD userguide
  • VMD tutorial
  • Citing PDB related papers: See here.
  • Monaco-Malbet, Stéphanie, et al. "Mutual conformational adaptations in antigen and antibody upon complex formation between an Fab and HIV-1 capsid protein p24." Structure 8.10 (2000): 1069-1077. (1E6J structure related paper)

Join #steemSTEM

Join the active science community #steemSTEM at discord: https://discord.gg/BZXkmWw

Image courtesy: @elvisxx71

And to steemSTEM beginners:

You can ask for help in our discord page. There are people ready to help you there.

gif courtesy: @rocking-dave


All images without image sources are my creations :)

Follow me @dexterdev


 ____ _______  ______ _________ ____ ______    
/  _ /  __\  \//__ __/  __/  __/  _ /  __/ \ |\
| | \|  \  \  /  / \ |  \ |  \/| | \|  \ | | //
| |_/|  /_ /  \  | | |  /_|    | |_/|  /_| \// 
\____\____/__/\\ \_/ \____\_/\_\____\____\__/


FLIPPEN AMAZING. It's incredible what computers can do and how they are helping us move forward in the world. My friends studying chemistry had no clue what this software was but were as amazed as I am by this post. Thank you for sharing!!

Glad to hear that you liked it.

Nice work @dexterdev!

I made my Master final thesis in structural bioinformatics. PDB format is so ugly... I had to deal with it, topologies, and force fields files, it was a bit exasperating!

Cheers!

why PDB format is ugly?

I don't like the fact that it is a fixed-width column format. When I worked with PDB files, I used to make changes in the PDB files (because of project needs) and these changes, habitually used to generate errors in the file :S
It seems a little archaic format to me.

Nice one @dexterdev. It's somewhat similar to the structure visualization in PyMOL, never tried VMD though. Definitely going to try it, thanks for sharing.

Pymol is user friendly I think. But vmd is more powerful as I understand.

I sometimes wish I could really communicate and understand the codes defined in the system more about bio-molecules but i like the way you simplified the interpretation for better understanding...Good work sir Dav

There is no complicated code here. All I have explained is the format of a PDB file, which is fairly simple. Let me know if you have any doubts.

No doubt...am just admiring the placement of codes while describing molecules..


Awesome work @dexterdev, thank you for providing a download source for the VMD software and valuable info regarding the content and format of PDB files.

Biomolecular engineering (is that right?) is not my field, however I do realize the significance of tutorials for young scientists.

Thank you for your efforts!

Thank you. Computational biophysics may be more apt to call this.

Thanks for the response.

Thanks @dexterdev. Am really swayed by this software. Can this version of PDB file be used in computational chemistry for molecule structure and properties? Thanks

PDB file format can technically encode any molecule. So it should be. (although PDB is protein data bank)

"Can this version of PDB file be used in computational chemistry"

If you are asking about VMD software, if you have a valid PDB file you can use it. VMD also supports other formats.

Okay. Thanks. It's been awhile we chat!!!.
