Split a multi-model PDB file from the shell
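The pdbs/ directory must exist beforehand. A new output file starts at every line whose atom serial number is 1:
awk '/^ATOM +1 / {i++} {f = "pdbs/out_" i ".pdb"; print >> f; fflush(f)}' filename.pdb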
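The one-liner keeps every output file open until awk exits, which can run into the open-file limit in some awk implementations when the input holds thousands of structures. Below is a minimal sketch of a variant that closes each file once the next structure begins. The function name split_by_first_atom and its two arguments are made up for illustration, and unlike the one-liner it skips any lines that come before the first structure.

```shell
# Hypothetical helper: split_by_first_atom INPUT OUTDIR
split_by_first_atom() {
    mkdir -p "$2" || return 1
    awk -v dir="$2" '
        /^ATOM +1 / {                 # first atom of a new structure
            if (f != "") close(f)     # release the previous output file
            i++
            f = dir "/out_" i ".pdb"
        }
        f != "" { print > f }         # lines before the first structure are skipped
    ' "$1"
}
```

Calling split_by_first_atom filename.pdb pdbs should write pdbs/out_1.pdb, pdbs/out_2.pdb and so on, with only one output file open at any time.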
The csplit command can do the same job. In the example below, I have a PDB file containing 50000 models, each delimited by MODEL X and ENDMDL:
csplit -z -f /dev/shm/docking_ -n 5 docking_results.pdb '/ENDMDL/+1' '{49999}'
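-n: number of digits in the output file names
-z: remove empty output files
The pattern cuts right after each ENDMDL line, and {49999} repeats it 49999 more times, so each of the 50000 models ends up in its own file.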
In the previous example, the output files (in the /dev/shm/ directory) are named:
docking_00000
docking_00001
...
docking_49999
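To see the naming scheme without a 50000-model file, here is a small sketch that runs the same kind of split on a throwaway three-model input. The file name toy.pdb and the scratch directory are invented for the illustration. It assumes GNU csplit, writes the offset with an explicit sign (+1), and adds -s only to silence the byte counts csplit normally prints.

```shell
# Toy input: three two-line "models" in a scratch directory
workdir=$(mktemp -d)
for m in 1 2 3; do
    printf 'MODEL %d\nENDMDL\n' "$m"
done > "$workdir/toy.pdb"

# 3 models: apply the pattern once, then repeat it 2 more times
csplit -s -z -f "$workdir/toy_" -n 5 "$workdir/toy.pdb" '/ENDMDL/+1' '{2}'

# List the pieces: toy_00000, toy_00001, toy_00002
(cd "$workdir" && ls toy_*)
```

The repeat count is always the number of models minus one, because the pattern itself already accounts for the first split.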
If you want to add a suffix (for example .pdb), you have to specify the output format with -b:
csplit -z -f /dev/shm/docking_ -b '%05d.pdb' docking_results.pdb '/ENDMDL/+1' '{49999}'
Below is the final function:
function splitpdb () {
    # Count models by their ENDMDL terminators, the same records csplit splits on
    local n_models
    n_models=$(grep -c '^ENDMDL' "$1")
    csplit -z -f /dev/shm/model_ -b '%04d.pdb' "$1" '/^ENDMDL/+1' "{$((n_models - 1))}"
}
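If counting the models first is inconvenient, GNU csplit also accepts '{*}' as the repeat count, which repeats the pattern until the input is exhausted. The following is only a sketch of such a variant. The name splitpdb_to and the optional output directory argument are assumptions added for the example and are not part of the function above.

```shell
# Hypothetical variant: splitpdb_to INPUT [OUTDIR]; OUTDIR defaults to /dev/shm
splitpdb_to() {
    outdir=${2:-/dev/shm}
    mkdir -p "$outdir" || return 1
    # '{*}' repeats the split pattern until the input runs out (GNU csplit);
    # -s suppresses the per-file byte counts, -z drops an empty last piece
    csplit -s -z -f "$outdir/model_" -b '%04d.pdb' "$1" '/^ENDMDL/+1' '{*}'
}
```

For example, splitpdb_to docking_results.pdb /tmp/models would write /tmp/models/model_0000.pdb, /tmp/models/model_0001.pdb and so on. Any records after the last ENDMDL (such as a trailing END line) end up in one extra file.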
- If you want 100 structures per output file:
awk '/^ATOM +1 / {n = int(i++ / 100)} {f = "pdbs/smap_" (n + 0) ".pdb"; print >> f; fflush(f)}' smap.pdb
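A quick way to check the result is to count how many structures landed in each output file. The helper below is a sketch, and the name count_structures is made up for this example. It counts the lines whose atom serial number is 1, which is the same marker the awk command uses to detect a new structure.

```shell
# Hypothetical helper: print "file<TAB>number of structures" for each argument
count_structures() {
    for f in "$@"; do
        [ -f "$f" ] || continue      # skip arguments that are not regular files
        printf '%s\t%s\n' "$f" "$(grep -cE '^ATOM +1 ' "$f")"
    done
}
```

Running count_structures pdbs/smap_*.pdb prints one line per chunk. Every file except possibly the last should report the chunk size you asked for.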