Putting Proteins into Bilayers (v 3.1)

To incorporate a protein into a lipid bilayer

The principle is to build a molecular surface of the protein, and use a modified version of mdrun that reads the surface and puts an outward force on all atoms within it. The surface can be made using the programs GRASP or MSMS. If you don't have, and for some reason can't get, either GRASP or MSMS, then you can also use the program to make a cylindrical hole, which will be good enough for a lot of membrane proteins.

The instructions 1-6 below will work if your protein has no holes. If it has a hole, i.e. is a channel, things might be a bit more difficult; see the "More Possible Problems" section.

This is a more chatty version of the protocol described in:

"Setting up and optimization of membrane protein simulations"
Jose D. Faraldo-Gómez, Graham R. Smith and Mark S. P. Sansom
Eur. Biophys. J. (2002) 31:217-227
If there are differences, the protocol described in the paper takes precedence. If you use this program, please cite this reference (as well as the usual gromacs references).
  1. We assume that you have pdb and topology files for a pre-equilibrated hydrated lipid bilayer lying roughly in the x-y plane, and that the protein you are going to insert will have its axis along z.

    Get a pdb of your protein. Position it correctly in x, y and z relative to the bilayer (editconf, Quanta or grasp can do this), and align it normal to the bilayer, which should mean roughly along z (again, Quanta if you're doing it by hand).

    Now: prepare the Grasp (2) or msms (3) molecular surface, or if you just want a cylindrical hole go directly to (4)

  2. Run Grasp on the protein pdb. You use the right mouse button to go down through the menus; menu cascades are indicated with >.
    1. Set Parameters > System Miscellaneous > Maximum Surface Resolution. Set it to about 0.33 grids/Angstrom.
    2. Build > Accessible (NOT molecular) Surface > All Atoms (or a selection, if your pdb contains more atoms than you want to put into the bilayer).
    3. Mouse Functions > Scribing
    4. Draw a short line on the visible grasp surface of the protein by holding down the left mouse and dragging. You should get a blue line. Then double click on this line. An area of lime green should quickly extend out from the line to cover the whole of the visible surface.
    5. Write > Grasp Surface File > (give output file name if you want) > Absolute Centering > Currently Scribed Surface (NOT All Surfaces). It will tell you how many vertices and triangles it uses.
    6. Since you are running grasp, you must have a Silicon Graphics computer. If you also want to run mdrun on this machine, you can use the file you have just made with the program in Grasp_bin mode (see below). However, you may well want to run mdrun on an Intel-Linux machine, and this cannot read the binary surface. To convert the binary surface to ascii, compile the program readgsurf.f in src/tools on your sgi:
             f77 -o readgsurf readgsurf.f
             
      and run with
             readgsurf 
             
      It then asks you, on the command line (or from a redirected input file, < inputfile), for the input and output file names and the number of vertices and triangles. The output file is ascii and can be read into the modified mdrun on any architecture if the Grasp_ascii option is specified.

  3. Run MSMS and ancillary programs from the command line.
    1. (optional but probably a good idea) put some dummy atoms in your initial protein pdb to fill any internal cavities. It doesn't matter if they overlap the protein atoms in an unphysical way.
    2. pdb_to_xyzr myfile.pdb > myfile.xyzr
    3. msms -if myfile.xyzr -probe_radius 1.4 -density 0.33 -of myfile
      creates myfile.vert, myfile.face (you may need higher -density, eg 1.0?)

  4. Now get the pdb or gro file for the lipid bilayer. First, remove some lipids that are certain to overlap with the inserted molecule, using the make_hole.pl script in the tools directory. This step is not essential, but it reduces the extent to which the bilayer is perturbed by the mdrun stage below. For example:
       make_hole.pl -f inbilayer.pdb -o outbilayer.pdb -r 1.4 -lipat P8 -lipid DMPC -cx 2.3 -cy 2.8 
       
    This removes all DMPC lipids whose P8 atom lies within 1.4 nm of the point (2.3, 2.8) in the x-y plane. Prepare a .tpr with grompp, containing just the lipid and solvent (i.e. NOT the protein you are intending to insert). Without restraints the lipid leaflets tend to separate, so it helps to put restraints on the z-coordinates of some of the lipid atoms. To restrain just the P8 atoms of POPC, use a file lipid_posre.itp containing e.g.
    [ position_restraints ]
    ; atom  type      fx      fy      fz
         8     1     0.0     0.0  1000.0
    
    (Note that because the restraint belongs to the lipid molecule definition, it is applied to the phosphorus atom of every lipid in the bilayer, not just the one in the first molecule.) Add
    #ifdef POSRES
    #include "lipid_posre.itp"
    #endif
      
    to the .top file, then set define = -DPOSRES in the run.mdp file (which is otherwise normal). The example directory share/tutor/make_hole/ contains an example posre file which restrains the phosphorus and the two end-of-chain carbons in POPC.
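    The selection rule described above for make_hole.pl (remove every lipid whose reference atom lies within a given radius of a point in the x-y plane) can be sketched in a few lines of Python. This is only an illustration of the criterion, not the script itself: the function names and the tuple layout (residue number, residue name, atom name, x, y) are hypothetical, and all lengths are assumed to be in nm.

```python
import math

def lipids_to_remove(atoms, lipid, lipat, cx, cy, r):
    """Residue numbers of `lipid` molecules whose `lipat` atom lies
    within r of the point (cx, cy) in the x-y plane (all in nm)."""
    doomed = set()
    for resnr, resname, atomname, x, y in atoms:
        if resname == lipid and atomname == lipat:
            if math.hypot(x - cx, y - cy) < r:
                doomed.add(resnr)
    return doomed

def remove_lipids(atoms, doomed, lipid):
    """Drop every atom belonging to one of the selected lipid residues."""
    return [a for a in atoms if not (a[1] == lipid and a[0] in doomed)]

# Toy system (hypothetical coordinates): two lipids and one water.
atoms = [
    (1, "DMPC", "P8", 2.5, 3.0),
    (1, "DMPC", "C1", 2.6, 3.1),
    (2, "DMPC", "P8", 5.0, 5.0),
    (2, "DMPC", "C1", 2.3, 2.8),  # tail near the axis, but its P8 is far away
    (3, "SOL", "OW", 2.3, 2.8),   # solvent is never removed by this rule
]
doomed = lipids_to_remove(atoms, "DMPC", "P8", cx=2.3, cy=2.8, r=1.4)
kept = remove_lipids(atoms, doomed, "DMPC")
```

    Note that the whole lipid is kept or removed according to the position of one atom only, which is why some overlapping atoms can remain for the mdrun stage to push out.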

    Now you are ready to run grompp:

      grompp -f run.mdp -o insert.tpr -c outbilayer.pdb -r outbilayer.pdb -p ... 
     
    Now
     mdrun -v -hole -holep hp.mdp -deffnm insert ... 
     
    where all the other parameters you need are set in the new hp.mdp file. There are examples of this in share/tutor/make_hole/.

    holep mdp file options

    holetype = Cyl, Grasp_bin, Grasp_ascii or MSMS
        Selects the way the hole is made.
    hfm = X
        Force constant for the repulsive force due to the surface (units: kJ/mol/nm). The default is 50, which may be too big. Choosing this is not trivial; in Faraldo-Gómez et al., a two-stage protocol using first 10 then 100 was adopted.
    hr = X
        Radius of the hole (Cyl mode only).
    hx = X
        x coordinate of the centre of the hole (Cyl mode only).
    hy = X
        y coordinate of the centre of the hole (Cyl mode only).
    molsurf_file = string
        Filename of the binary grasp surface file, or of the ascii grasp surface file made by readgsurf, or the 'base' filename of the MSMS surface files; e.g. molsurf_file = ../sdir/myprot reads both ../sdir/myprot.vert and ../sdir/myprot.face in MSMS mode.
    sofs = X
        Offset (nm) of the molecular surface used, relative to the one read from the surface file. Default 0. Can be negative, which might be useful if you had to use a big accessibility probe radius.
    hp1 = I, hp2 = I
        By default, all atoms in the system that are within the surface experience an outward force NORMAL to the surface. For atom numbers between hp1 and hp2 inclusive (the numbering starts at 1), the force is instead projected onto the (x,y)-plane. This may be better for the lipid atoms, as it will not disrupt the bilayer so much. It is probably best to use the default surface-normal force for the water.
    supf = I
        Frequency of updating, for each atom, the neighbour list of nearest surface vertices. The default value of 10 is probably OK, but slows things down quite a lot.
    resforces = yes/no
        For all atoms in a residue that have a non-zero hole-making force, i.e. that are inside the surface, average the forces over those atoms before applying them. This helps prevent lipids being spreadeagled over the protein cavity.
    molsurf_log = string
        Name of an extra log file, to which the time, the number of atoms inside the surface, and the depth of the most deeply embedded atom are printed.
    debugsurf = yes/no
        Produces pdb files giving information about the surface forces in the initial configuration, and then stops immediately. The files are insidesurf.pdb and molsurfpdb.pdb: see below.
    sfm = X
        Force constant for an additional z-directed repulsive force, which can be applied to a selected range of atoms. See Inserting Channels below for the rationale for this. This and the next few options control this force. You specify hz1, hz and hz2, with hz1 < hz < hz2. The force is in the -ve z direction for atoms with hz1 < z < hz, and in the +ve z direction for atoms with hz < z < hz2.
    s1 = I, s2 = I
        First / last atom to be affected by the additional z-directed repulsive force (typically, these will be solvent atoms).
    hz = X
        Approximate z coordinate of the middle of the bilayer.
    hz1 = X, hz2 = X
        z-coordinate of the lower / upper boundary for the z-directed force.
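    Several of the options above come with constraints that are easy to get wrong (the allowed holetype values, atom ranges numbered from 1, and hz1 < hz < hz2). The following Python sketch checks a set of options against just those stated constraints before you write them into hp.mdp. The function name check_holep and the example values are hypothetical; the modified mdrun may well apply other checks or defaults that are not described here.

```python
HOLETYPES = ("Cyl", "Grasp_bin", "Grasp_ascii", "MSMS")

def check_holep(opts):
    """Return a list of problems found in a dict of holep options."""
    problems = []
    holetype = opts.get("holetype")
    if holetype not in HOLETYPES:
        problems.append("holetype should be one of: " + ", ".join(HOLETYPES))
    elif holetype != "Cyl":
        # hr, hx and hy are documented as applying to Cyl mode only.
        unused = [k for k in ("hr", "hx", "hy") if k in opts]
        if unused:
            problems.append("Cyl-only options set in %s mode: %s"
                            % (holetype, ", ".join(unused)))
    # Atom ranges are inclusive and atom numbering starts at 1.
    for first, last in (("hp1", "hp2"), ("s1", "s2")):
        if first in opts and last in opts:
            if opts[first] < 1 or opts[first] > opts[last]:
                problems.append("need 1 <= %s <= %s" % (first, last))
    # The z-directed force requires hz1 < hz < hz2.
    if all(k in opts for k in ("hz1", "hz", "hz2")):
        if not opts["hz1"] < opts["hz"] < opts["hz2"]:
            problems.append("need hz1 < hz < hz2")
    return problems

# Hypothetical example values, for illustration only.
good = {"holetype": "MSMS", "molsurf_file": "../sdir/myprot", "hfm": 10,
        "hp1": 1, "hp2": 5000, "sfm": 10, "s1": 5001, "s2": 9000,
        "hz1": 2.0, "hz": 4.0, "hz2": 6.0}
print(check_holep(good))
print(check_holep(dict(good, hz1=5.0)))
```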

    It is recommended that you first run with debugsurf = yes to see what's going on, and only then without. The debugsurf run stops at step 0; examine insidesurf.pdb and molsurfpdb.pdb.

    insidesurf.pdb contains residues of type FDUM, consisting of two dummy atoms for each atom in the bilayer which is experiencing one of the extra forces (molecular surface and/or cylindrical hole and/or perpendicular to bilayer). One dummy atom (select *.CSF) is at the position of the atom, the other (select *.NSF) is displaced 1 Å in the direction of the surface force. Thus you can see where these atoms are and if the forces look reasonable (i.e. they are in the right direction: outwards). Atom and residue numbers should be preserved from the bilayer-and-water configuration file. You may well find that there are a few atoms that have a surface force on them even though they are obviously not inside the surface. To see why this is, it helps to look at the other file.
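    Besides looking at the forces in a viewer, you can make a crude automatic check of the CSF/NSF pairs in the debug pdb file described above. The Python sketch below flags atoms whose force has an inward component in the x-y plane relative to an axis you supply. It rests on several assumptions that are not confirmed by the program: that the file uses standard fixed-column pdb ATOM records, that each CSF atom is immediately followed by its NSF partner, and that your protein is roughly cylindrical about (cx, cy). All names are hypothetical, and the coordinates are in Å, as in any pdb file.

```python
def read_force_pairs(lines):
    """Collect (atom position, displaced position) pairs from pdb lines."""
    pairs = []
    origin = None
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        name = line[12:16].strip()  # atom name, pdb columns 13-16
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        if name == "CSF":
            origin = xyz
        elif name == "NSF" and origin is not None:
            pairs.append((origin, xyz))
            origin = None
    return pairs

def inward_pointing(pairs, cx, cy):
    """Positions of atoms whose force points towards the axis (cx, cy)."""
    bad = []
    for origin, tip in pairs:
        radial = (origin[0] - cx, origin[1] - cy)
        force = (tip[0] - origin[0], tip[1] - origin[1])
        if radial[0] * force[0] + radial[1] * force[1] < 0.0:
            bad.append(origin)
    return bad

def pdb_line(serial, name, resnr, x, y, z):
    """Build a fixed-column ATOM record for the toy example below."""
    return "ATOM  %5d %-4s %-4s %4d    %8.3f%8.3f%8.3f" % (
        serial, name, "FDUM", resnr, x, y, z)

# Toy data: one force pointing away from the axis at (10, 10), one towards it.
lines = [
    pdb_line(1, "CSF", 1, 12.0, 10.0, 5.0),
    pdb_line(2, "NSF", 1, 13.0, 10.0, 5.0),
    pdb_line(3, "CSF", 2, 8.0, 10.0, 5.0),
    pdb_line(4, "NSF", 2, 9.0, 10.0, 5.0),
]
pairs = read_force_pairs(lines)
bad = inward_pointing(pairs, cx=10.0, cy=10.0)
```

    A flagged atom is not necessarily an error; it only marks a place worth inspecting by eye.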

    molsurfpdb.pdb contains residues of type MSF, consisting of two dummy atoms for each vertex of the molecular surface: One type (*.CMS) shows the positions of the vertices in the surface, and the others (*.NMS) are dummy atoms displaced from each vertex along the normal vector at that vertex. You should have a single grey shell with a blue 'outside'. There are two common problems.

    1. One is that the programs will construct 'pockets' inside if there's enough free space to get any water in, and if you have any such pockets then the insertion may well not work. Getting rid of them (or not writing them out) was the point of the rigmarole in steps 2.3-2.5 above, where you tried to select only the outside grasp surface. If you still have internal pockets, then you have a hole in your outer grasp surface, so the pocket is really an invagination or even a channel. Depending on the specific case, you may want to leave these (after all, there may be water or even lipid in them), but if you think you want to get rid of them, you can try increasing the grasp surface probe radius (set parameters > probe radii > surfacing radius). If you have to make it very big, you can to some extent cancel the effect by setting sofs (see the options above) to be negative, though the shape of the protein will change too. Alternatively, put some dummy atoms in your initial protein pdb to fill the holes before you run the molecular surface generator (with MSMS, this is your only option). It doesn't matter if they overlap the protein atoms in an unphysical way.
    2. The other problem is that some surface normals can point 'inwards' because of protuberances in local regions of the surface. These cause surface-generating forces to be put on atoms that are in fact outside the surface. I'm not sure what to do about them. They don't do too much damage, especially with resforces = yes.

    Then do the full run (debugsurf = no).

  5. Choosing the force constants

    With hfm = 25, a short run (10 000 steps of 2 fs) seems to be enough, but you may have to experiment. You can see what's going on by looking at the molsurf_log file.

    The number of atoms inside the surface does not go to zero, and neither does the depth of the most deeply embedded atom. They relax over a time t_grasp to roughly constant values N_in and d_max. t_grasp, N_in and d_max all decrease as you increase hfm; assuming that the atoms that you are expelling do not interact with each other, then a simple calculation with Boltzmann's law implies that N_in and d_max both ~ 1/hfm (and are proportional to the area of the surface).
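    One way to reconstruct that Boltzmann estimate is sketched below in Python. It assumes (this is not stated explicitly above, but is suggested by the units of hfm) that an atom inside the surface feels a constant outward force of magnitude hfm, so that its energy at depth d is hfm*d and the equilibrium population falls off as exp(-hfm*d/kT). The mean depth is then kT/hfm, which gives the 1/hfm scaling. The function names are hypothetical, and 300 K is just an example temperature.

```python
import math

KB = 0.0083144626  # Boltzmann constant in kJ/mol/K

def mean_depth(hfm, temperature=300.0):
    """Mean depth (nm) of a non-interacting atom pushed outwards by a
    constant force hfm (kJ/mol/nm): kT / hfm."""
    return KB * temperature / hfm

def fraction_deeper_than(depth, hfm, temperature=300.0):
    """Boltzmann fraction of atoms found deeper than `depth` (nm)."""
    return math.exp(-hfm * depth / (KB * temperature))

for hfm in (10.0, 100.0):
    print(hfm, mean_depth(hfm), fraction_deeper_than(0.4, hfm))
```

    Under these assumptions the mean depth at 300 K is about 0.25 nm for hfm = 10 and ten times smaller for hfm = 100. Real lipids are far from non-interacting, so treat this only as a guide to the trend.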

    This seems to imply that you want hfm to be as large as possible. Unfortunately, t_grasp then also becomes very small (a few ps) and the lipids are likely to be severely 'crushed' in the process of being expelled (and, if you don't use positional z-restraints, they will get pushed out of the bilayer into the solvent, or the leaflets will separate). This compromises one of the ideas behind the method, which was to be able to use a carefully pre-equilibrated bilayer. The forces used in this process are VERY large: even hfm = 1 corresponds to a pressure of about 100 atmospheres. On the other hand, it seems that d_max must be 0.4 nm or less before you can introduce and minimize the protein successfully. Thus the best idea is to do a couple of runs with increasing hfm (if anyone wants to put some code into the program to do this automatically I'd be grateful; see notes). It has been found that using hfm of, e.g., 10 first, then 100, gives OK results.

    But in any case the acid test is:

  6. Merge the coordinate file of the 'real' protein with that of the bilayer, and see if you can minimize/equilibrate the system.

    More Possible Problems:

    Inserting Channels

    If you are trying to insert a channel (ie, a torus rather than a sphere, topologically speaking), the molecular surface will be such that simple application of the protocol given above will result in some lipids being squeezed into the channel rather than out into the bulk of the bilayer. Probably the easiest thing is to first make a cylindrical hole of radius roughly equal to the outer radius of the channel protein at its narrowest point, then switch to using the grasp surface. First use the make_hole.pl script, then (maybe) clean up the cylindrical hole with mdrun_hole in 'cylinder' mode. Thus, all the lipids will, we hope, be near the outside grasp surface before we begin and so will be pushed into the bulk lipid.

    Moreover, in this case, waters from the bulk solvent might rush into the pore of the channel in an uncontrolled way. In fact this usually doesn't happen, because the surface tension of water on a hydrophobic surface on a molecular scale is huge. I reckon you should try it with no extra stuff and probably nothing will happen. However, if it is a problem, you can prevent it using the holep mdp variables hz, hz1, hz2, s1, s2 and sfm. Here s1 and s2 would specify the first and last solvent atoms (numbered from 1, as in coordinate files), and hz1 and hz2 would be the z-coordinates of the bottom and top of the bilayer (e.g. the average z-coordinate of the phosphorus atoms in each leaflet). If any atom is in a position to feel both the grasp surface force and the perpendicular force, then both are applied.
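    The rule for the direction of the extra z-force, as described in the options list, can be written out as a small Python sketch. It only returns the sign of the force (-1, 0 or +1), because the exact functional form of its magnitude is not given here. The function name and the example numbers are hypothetical.

```python
def z_force_direction(atom_number, z, s1, s2, hz1, hz, hz2):
    """Sign of the extra z-directed force on one atom.

    Atoms numbered s1..s2 (inclusive, counting from 1) are pushed in the
    -ve z direction if hz1 < z < hz and in the +ve z direction if
    hz < z < hz2; all other atoms feel no extra z-force.
    """
    if not hz1 < hz < hz2:
        raise ValueError("need hz1 < hz < hz2")
    if not s1 <= atom_number <= s2:
        return 0
    if hz1 < z < hz:
        return -1
    if hz < z < hz2:
        return 1
    return 0

# Hypothetical bilayer spanning z = 2.0 to 6.0 nm, solvent atoms 5001-9000.
box = dict(s1=5001, s2=9000, hz1=2.0, hz=4.0, hz2=6.0)
print(z_force_direction(6000, 3.0, **box))  # water in the lower leaflet region
print(z_force_direction(6000, 5.0, **box))  # water in the upper leaflet region
print(z_force_direction(100, 3.0, **box))   # lipid atom, outside s1..s2
```

    In other words, selected atoms found inside the slab are pushed away from the bilayer midplane towards the nearer bulk solvent region.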

    Because of the use of the extra z-force, or because of high surface tension, the molecular surface might end up with a vacuum in the pore. If this happens, then when you add the real protein coordinates back in step 6, you will also need to resolvate the protein/bilayer system before starting MD.

    Conical Proteins

    If your protein is wider at one side than the other (KcsA etc.) then there is a danger you will end up with a higher density in one leaflet than the other. Delete some extra protein-overlapping lipids from the leaflet with the thicker half of the channel before you start. You can use Hole v 2.0 in its Connolly-surface mode to calculate the 'surface area' of the protein as a function of z-position.

    Peripheral Proteins

    If your protein does not make a clean hole all the way through, it may not be clear if there are still any lipids overlapping the surface at the end of the mdrun_hole run. In that case try the following:
    mycompyoota % cat confout.pdb molsurfpdb.pdb > confnsurf.pdb
    mycompyoota % rasmol confnsurf.pdb
    RasMol> select all
    RasMol> wireframe off
    RasMol> select *.CMS    
    RasMol> wireframe on
    RasMol> colour red
    RasMol> select within( 3.0, *.CMS) and not *.CMS 
    RasMol> wireframe on
    RasMol> colour green
    

    Miscellaneous

    If the program seems to hang up, try the same system with 'normal' mdrun to check there's not a problem with the starting coordinates before you come crying to me.

    I found that a cavity appeared in the water box on one run for no apparent reason. Careful with this...

    Careful with nm (mdrun_hole) and Å (grasp/msms).

    Changes in v 3.1 from v 2.0

    Changes in v 2.0 from v 1.6

    Notes: -stuff I want to change.

    One way to change hfm automatically would be to try to keep dN_in/dt constant at some value that will take it to zero over a few hundred picoseconds. You could do this by weak-coupling hfm (like temperature and pressure) to dN/dt: d(hfm)/dt = const * hfm * ( dN/dt - (dN/dt)_desired ). Or something.
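    As a starting point for anyone who wants to try this, here is a Python sketch of one explicit-Euler step of the weak-coupling rule suggested above. Nothing like this exists in the program; the function name, the coupling constant and the example numbers are all hypothetical, and dN/dt would have to be estimated from the counts written to the molsurf_log file.

```python
def update_hfm(hfm, dn_dt, dn_dt_desired, coupling, dt):
    """One Euler step of d(hfm)/dt = coupling * hfm * (dn_dt - dn_dt_desired).

    dn_dt and dn_dt_desired are rates of change of the number of atoms
    inside the surface (negative while atoms are being expelled).
    """
    return hfm + dt * coupling * hfm * (dn_dt - dn_dt_desired)

# Atoms are leaving too slowly (-1 per ps, wanted -5 per ps): hfm goes up.
slower = update_hfm(10.0, dn_dt=-1.0, dn_dt_desired=-5.0, coupling=0.01, dt=1.0)

# Atoms are leaving too quickly (-8 per ps, wanted -5 per ps): hfm goes down.
faster = update_hfm(10.0, dn_dt=-8.0, dn_dt_desired=-5.0, coupling=0.01, dt=1.0)
print(slower, faster)
```

    Because the change is proportional to hfm itself, hfm keeps its sign for small enough steps; a real implementation would probably also want upper and lower limits on hfm.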

    The algorithm that finds the closest vertices to a particular atom just uses an all-vertices-against-all-atoms search. A grid search would be much faster.

    It must be possible to remove the problem of the 'inward facing normals' that cause some atoms outside the surface to have forces applied to them. Maybe average each normal with its neighbours.

    Acknowledgements

    The cylinder code and original perl script were written by Peter Tieleman; the rest of the code is by Graham Smith, and the protocol was developed with Jose Faraldo-Gómez.
    5/8/99
    7/12/99
    20/12/99
    28/2/00
    22/4/01 (for v2.0) 
    8-14/10/02 (for v3.1)