Split a multi-model PDB file from the shell

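With awk, a new output file is started each time an ATOM record with serial number 1 is found (the pdbs/ directory must already exist):

awk '$0 ~ /ATOM      1/ {i++} {print >> "pdbs/out_"i".pdb"} {fflush("pdbs/out_"i".pdb")}' filename.pdb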
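The awk command above never closes its output files, so with many models some awk implementations may run out of file descriptors. Here is a minimal sketch of a variant that closes each file before opening the next. It assumes the models are delimited by MODEL records rather than by atom serial numbers, and the sample data and the `workdir`/`outdir` names are made up for the demonstration:

```shell
#!/bin/sh
# Build a tiny two-model PDB file (hypothetical data) in a temporary directory.
workdir=$(mktemp -d)
cat > "$workdir/demo.pdb" <<'EOF'
MODEL        1
ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N
ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00           C
ENDMDL
MODEL        2
ATOM      1  N   ALA A   1      12.104   7.134  -6.504  1.00  0.00           N
ATOM      2  CA  ALA A   1      12.639   7.071  -5.147  1.00  0.00           C
ENDMDL
EOF
mkdir -p "$workdir/pdbs"

# On each MODEL record: close the previous output file, then switch to a new one.
# Lines seen before the first MODEL record are skipped because out is still empty.
awk -v outdir="$workdir/pdbs" '
    /^MODEL/ { if (out) close(out); i++; out = outdir "/out_" i ".pdb" }
    out      { print > out }
' "$workdir/demo.pdb"

ls "$workdir/pdbs"
```

Since only one output file is open at any time, the per-line fflush is no longer needed: close() flushes the file.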

The csplit command can do this too. In the example below, I have a PDB file containing 50000 models, each delimited by MODEL X and ENDMDL records:

csplit -z -f /dev/shm/docking_ -n 5 docking_results.pdb '/ENDMDL/1' '{49999}'

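-f: prefix of the output file names

-n: number of digits in the output file names

-z: remove empty output files

'/ENDMDL/1': split one line after each line matching ENDMDL, so that the ENDMDL line stays with its model

'{49999}': repeat the previous pattern 49999 more times (50000 splits in total)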

In the previous example, the output files (in the /dev/shm/ directory) are named:

docking_00000
docking_00001
...
docking_49999

If you want to add a suffix (for example .pdb), specify the suffix format with -b, which takes a printf-style format and replaces -n:

csplit -z -f /dev/shm/docking_ -b '%05d.pdb' docking_results.pdb '/ENDMDL/1' '{49999}'
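When the number of models is not known in advance, GNU csplit also accepts '{*}' as a repeat count, meaning "repeat until the input is exhausted". The sketch below tries this on a tiny three-model file. It assumes GNU csplit from coreutils (-z, -b and '{*}' are GNU extensions), and the sample data and `workdir` name are made up for the demonstration:

```shell
#!/bin/sh
# Assumes GNU csplit: -z, -b and '{*}' are GNU extensions.
workdir=$(mktemp -d)

# Tiny three-model file (hypothetical data) standing in for docking_results.pdb
for m in 1 2 3; do
    printf 'MODEL     %4d\n' "$m"
    printf 'ATOM      1  N   ALA A   1    %8.3f%8.3f%8.3f  1.00  0.00           N\n' "$m" 0 0
    printf 'ENDMDL\n'
done > "$workdir/demo.pdb"

# '{*}' repeats the split as many times as possible; -s silences the byte counts.
csplit -s -z -f "$workdir/docking_" -b '%05d.pdb' "$workdir/demo.pdb" '/ENDMDL/1' '{*}'

ls "$workdir"
```

This should leave one docking_NNNNN.pdb file per model next to demo.pdb, without having to count the models first.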

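Below is the final function. It counts the models first, so that the repeat count does not have to be hard-coded:

function splitpdb () {
    # Count the MODEL records; anchoring on ^MODEL skips other lines (e.g. REMARK) that mention MODEL
    local n_models
    n_models=$(grep -c '^MODEL' "$1")
    # Split after each ENDMDL line: the pattern is applied once, then repeated n_models - 1 times
    csplit -z -f /dev/shm/model_ -b '%04d.pdb' "$1" '/ENDMDL/1' "{$((n_models - 1))}"
}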
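If the input file contains no MODEL record, the repeat count becomes {-1} and csplit fails. Here is a sketch of a guarded variant that also lets the caller choose the output directory. The name `splitpdb_to`, its second argument and the sample data are assumptions made for illustration, not part of the original function, and GNU csplit is assumed:

```shell
#!/bin/bash
# splitpdb_to is a hypothetical variant of splitpdb: the second argument
# (output directory, default /dev/shm) and the guard are assumptions.
splitpdb_to () {
    local pdb="$1" outdir="${2:-/dev/shm}" n_models
    n_models=$(grep -c '^MODEL' "$pdb")
    if [ "$n_models" -eq 0 ]; then
        echo "no MODEL records in $pdb" >&2
        return 1
    fi
    csplit -s -z -f "$outdir/model_" -b '%04d.pdb' "$pdb" '/ENDMDL/1' "{$((n_models - 1))}"
}

# Demonstration on a two-model file (hypothetical data)
workdir=$(mktemp -d)
mkdir -p "$workdir/out"
cat > "$workdir/demo.pdb" <<'EOF'
MODEL        1
ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N
ENDMDL
MODEL        2
ATOM      1  N   ALA A   1      12.104   7.134  -6.504  1.00  0.00           N
ENDMDL
EOF
splitpdb_to "$workdir/demo.pdb" "$workdir/out"
ls "$workdir/out"
```

Usage is the same as for splitpdb, with the output directory as an optional second argument: splitpdb_to docking_results.pdb /dev/shm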

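The awk approach can also group the models into batches, here 100 per output file (the first file receives models 1 to 99):

awk '$0 ~ /ATOM      1/ {i++} {print >> "pdbs/smap_"int(i/100)".pdb"} {fflush("pdbs/smap_"int(i/100)".pdb")}' smap.pdb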

If you want to ask me a question or leave me a message, add @bougui505 in your comment.