Extracting fields from a Protein Data Bank file (PDB) using python

Sometime, for large pdb files, one cannot use the split command from python. Below is the python line to extract fields from a pdb file:

with open('min.pdb') as pdbfile:
    for line in pdbfile:
        if line[:4] == 'ATOM' or line[:6] == "HETATM":
            print line
            splitted_line = [line[:6], line[6:11], line[12:16], line[17:20], line[21], line[22:26], line[30:38], line[38:46], line[46:54]]
            print splitted_line
            print "%-6s%5s %4s %3s %s%4s    %8s%8s%8s\n"%tuple(splitted_line)

Sample output:

# Original line
HETATM10000  O   HOH B3257       2.509  40.006  -4.097  1.00  0.00           O
# Splitted line
['HETATM', '10000', ' O  ', 'HOH', 'B', '3257', '   2.509', '  40.006', '  -4.097']
# Go back to pdb formatted line
HETATM10000  O   HOH B3257       2.509  40.006  -4.097
If you want to ask me a question or leave me a message add @bougui505 in your comment.