PDB File Header

This module defines functions for parsing header data from PDB files.

class Chemical(resname)

A data structure for storing information on chemical components (or heterogens) in PDB structures.

A Chemical instance has the following attributes:

Attribute	Type	Description (RECORD TYPE)
resname	str	residue name (or chemical component identifier) (HET)
name	str	chemical name (HETNAM)
chain	str	chain identifier (HET)
resnum	int	residue (or sequence) number (HET)
icode	str	insertion code (HET)
natoms	int	number of atoms present in the structure (HET)
description	str	description of the chemical component (HET)
synonyms	list	synonyms (HETSYN)
formula	str	chemical formula (FORMUL)
pdbentry	str	PDB entry that chemical data is extracted from

Chemical class instances can be obtained as follows:

In [1]: from prody import *

In [2]: chemical = parsePDBHeader('1zz2', 'chemicals')[0]

In [3]: chemical
 Out[3]: <Chemical: B11 (1ZZ2_A_362)>

In [4]: chemical.name
 Out[4]: 'N-[3-(4-FLUOROPHENOXY)PHENYL]-4-[(2-HYDROXYBENZYL) AMINO]PIPERIDINE-1-SULFONAMIDE'

In [5]: chemical.natoms
 Out[5]: 33

In [6]: len(chemical)
 Out[6]: 33
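The HET-derived attributes in the table above correspond to fixed columns of a HET record. The following is a minimal sketch of that mapping, assuming the column layout of the PDB file format (version 3); `HetRecord` and `parse_het_line` are hypothetical names for illustration and are not part of ProDy. The sample line is built from the values shown in the session above.

```python
from dataclasses import dataclass


@dataclass
class HetRecord:
    """Hypothetical container mirroring the HET-derived Chemical attributes."""
    resname: str
    chain: str
    resnum: int
    icode: str
    natoms: int
    description: str

    def __len__(self):
        # In the session above, len(chemical) and chemical.natoms both give 33.
        return self.natoms


def parse_het_line(line):
    """Parse one fixed-column HET record (assumed PDB v3 column layout)."""
    line = line.ljust(80)
    if not line.startswith("HET   "):
        raise ValueError("not a HET record")
    return HetRecord(
        resname=line[7:10].strip(),       # columns 8-10: chemical component id
        chain=line[12].strip(),           # column 13: chain identifier
        resnum=int(line[13:17]),          # columns 14-17: sequence number
        icode=line[17].strip(),           # column 18: insertion code
        natoms=int(line[20:25]),          # columns 21-25: number of atoms
        description=line[30:70].strip(),  # columns 31-70: free text
    )


het = parse_het_line("HET    B11  A 362      33")
print(het.resname, het.chain, het.resnum, het.natoms, len(het))
```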

chain: chain identifier

description: description of the chemical component

formula: chemical formula

icode: insertion code

name: chemical name

natoms: number of atoms present in the structure

pdbentry: PDB entry that chemical data is extracted from

resname: residue name (or chemical component identifier)

resnum: residue (or sequence) number

synonyms: list of synonyms

class Polymer(chid)

A data structure for storing information on polymer components (protein or nucleic) of PDB structures.

A Polymer instance has the following attributes:

Attribute	Type	Description (RECORD TYPE)
chid	str	chain identifier
name	str	name of the polymer (macromolecule) (COMPND)
fragment	str	specifies a domain or region of the molecule (COMPND)
synonyms	list	synonyms for the polymer (COMPND)
ec	list	associated Enzyme Commission numbers (COMPND)
engineered	bool	indicates that the polymer was produced using recombinant technology or by purely chemical synthesis (COMPND)
mutation	bool	indicates presence of a mutation (COMPND)
comments	str	additional comments
sequence	str	polymer chain sequence (SEQRES)
dbrefs	list	sequence database records (DBREF[1|2] and SEQADV), see `DBRef`
modified	list	modified residues (MODRES); when modified residues are present, each is represented as `(resname, resnum, icode, stdname, comment)`
pdbentry	str	PDB entry that polymer data is extracted from

Polymer class instances can be obtained as follows:

In [7]: polymer = parsePDBHeader('2k39', 'polymers')[0]

In [8]: polymer
 Out[8]: <Polymer: UBIQUITIN (2K39_A)>

In [9]: polymer.pdbentry
 Out[9]: '2K39'

In [10]: polymer.chid
Out[10]: 'A'

In [11]: polymer.name
Out[11]: 'UBIQUITIN'

In [12]: polymer.sequence
Out[12]: 'MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG'

In [13]: len(polymer.sequence)
Out[13]: 76

In [14]: len(polymer)
Out[14]: 76

In [15]: dbref = polymer.dbrefs[0]

In [16]: dbref.database
Out[16]: 'UniProt'

In [17]: dbref.accession
Out[17]: 'P62972'

In [18]: dbref.idcode
Out[18]: 'UBIQ_XENLA'
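The `sequence` attribute is derived from SEQRES records, which list residues by three-letter name. The following is a rough sketch of that conversion, not ProDy's implementation: `sequences_from_seqres` is a hypothetical helper, the column positions assume the PDB file format (version 3), and the two sample lines are reconstructed from the first 26 letters of the sequence shown above.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequences_from_seqres(lines):
    """Return {chain identifier: one-letter sequence} from SEQRES records."""
    chains = {}
    for line in lines:
        if not line.startswith("SEQRES"):
            continue
        chid = line[11]                 # column 12: chain identifier
        residues = line[19:70].split()  # columns 20-70: residue names
        # Residues outside the 20 standard amino acids become "X" in this sketch.
        letters = "".join(THREE_TO_ONE.get(name, "X") for name in residues)
        chains[chid] = chains.get(chid, "") + letters
    return chains


seqres = [
    "SEQRES   1 A   76  MET GLN ILE PHE VAL LYS THR LEU THR GLY LYS THR ILE",
    "SEQRES   2 A   76  THR LEU GLU VAL GLU PRO SER ASP THR ILE GLU ASN VAL",
]
partial = sequences_from_seqres(seqres)["A"]
print(partial)
```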

chid: chain identifier

comments: additional comments

dbrefs: sequence database reference records

ec: list of associated Enzyme Commission numbers

engineered: indicates that the molecule was produced using recombinant technology or by purely chemical synthesis

fragment: specifies a domain or region of the molecule

modified: modified residues

mutation: indicates presence of a mutation

name: name of the polymer (macromolecule)

pdbentry: PDB entry that polymer data is extracted from

sequence: polymer chain sequence

synonyms: list of synonyms for the molecule

class DBRef

A data structure for storing references to sequence databases for polymer components in PDB structures. Information is parsed from DBREF[1|2] and SEQADV records in the PDB header.

accession: database accession code

database: sequence database, one of UniProt, GenBank, Norine, UNIMES, or PDB

dbabbr: database abbreviation, one of UNP, GB, NORINE, UNIMES, or PDB

diff: list of differences between PDB and database sequences, each given as (resname, resnum, icode, dbResname, dbResnum, comment)

first: initial residue numbers, (resnum, icode, dbnum)

idcode: database identification code, e.g. the entry name in UniProt

last: ending residue numbers, (resnum, icode, dbnum)
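To illustrate how these attributes relate to a DBREF record, here is a minimal sketch that reads one line by fixed columns. It is not ProDy's parser: `parse_dbref_line` is a hypothetical helper, the column positions assume the PDB file format (version 3), and the residue ranges in the sample line are assumed for illustration. The accession, identification code, and database abbreviation match the 2K39 session shown earlier.

```python
# Abbreviation-to-name pairs taken from the dbabbr and database lists above.
DATABASES = {
    "UNP": "UniProt", "GB": "GenBank", "NORINE": "Norine",
    "UNIMES": "UNIMES", "PDB": "PDB",
}


def parse_dbref_line(line):
    """Parse one fixed-column DBREF record into a plain dictionary."""
    line = line.ljust(80)
    if not line.startswith("DBREF "):
        raise ValueError("not a DBREF record")
    dbabbr = line[26:32].strip()           # columns 27-32: database
    return {
        "chid": line[12],                  # column 13: chain identifier
        "dbabbr": dbabbr,
        "database": DATABASES.get(dbabbr, dbabbr),
        "accession": line[33:41].strip(),  # columns 34-41: accession code
        "idcode": line[42:54].strip(),     # columns 43-54: identification code
        # (resnum, icode, dbnum), the layout of the first and last attributes
        "first": (int(line[14:18]), line[18].strip(), int(line[55:60])),
        "last": (int(line[20:24]), line[24].strip(), int(line[62:67])),
    }


# Residue ranges in this sample line are assumed, not read from the entry.
sample = "DBREF  2K39 A    1    76  UNP    P62972   UBIQ_XENLA       1     76"
ref = parse_dbref_line(sample)
print(ref["database"], ref["accession"], ref["idcode"], ref["first"], ref["last"])
```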

parsePDBHeader(pdb, *keys)

Return header data dictionary for pdb. This function is equivalent to parsePDB(pdb, header=True, model=0, meta=False). As with parsePDB(), pdb may be an identifier or a filename.

The following header records are parsed:

Record type	Dictionary key(s)	Description
HEADER	classification, deposition_date, identifier	molecule classification, deposition date, PDB identifier
TITLE	title	title for the experiment or analysis
SPLIT	split	list of PDB entries that make up the whole structure when combined with this one
COMPND	polymers	see `Polymer`
EXPDTA	experiment	information about the experiment
NUMMDL	n_models	number of models
MDLTYP	model_type	additional structural annotation
AUTHOR	authors	list of contributors
JRNL	reference	reference information dictionary with keys: authors (list of authors); title (title of the article); editors (list of editors); issn; reference (journal, vol, issue, etc.); publisher (publisher information); pmid (PubMed identifier); doi (digital object identifier)
DBREF[1|2]	polymers	see `Polymer` and `DBRef`
SEQADV	polymers	see `Polymer`
SEQRES	polymers	see `Polymer`
MODRES	polymers	see `Polymer`
HELIX	polymers	see `Polymer`
SHEET	polymers	see `Polymer`
HET	chemicals	see `Chemical`
HETNAM	chemicals	see `Chemical`
HETSYN	chemicals	see `Chemical`
FORMUL	chemicals	see `Chemical`
REMARK 2	resolution	resolution of structures, when applicable
REMARK 4	version	PDB file version
REMARK 350	biomoltrans	biomolecular transformation lines (unprocessed)

Header records that are not parsed are: OBSLTE, CAVEAT, SOURCE, KEYWDS, REVDAT, SPRSDE, SSBOND, LINK, CISPEP, CRYST1, ORIGX1, ORIGX2, ORIGX3, MTRIX1, MTRIX2, MTRIX3, and any REMARK records not mentioned above.
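As an illustration of how one record type maps to several dictionary keys, the sketch below splits a HEADER record into the three keys listed in the table. It is a hedged example, not ProDy's code: `parse_header_record` is a hypothetical helper, the column positions assume the PDB file format (version 3), and the sample line is made up rather than taken from a real entry.

```python
def parse_header_record(line):
    """Split a fixed-column HEADER record into the three keys from the table."""
    line = line.ljust(80)
    if not line.startswith("HEADER"):
        raise ValueError("not a HEADER record")
    return {
        "classification": line[10:50].strip(),   # columns 11-50
        "deposition_date": line[50:59].strip(),  # columns 51-59
        "identifier": line[62:66].strip(),       # columns 63-66
    }


# Made-up sample line, padded to the assumed column layout.
sample_header = (
    "HEADER    "
    + "HYPOTHETICAL PROTEIN".ljust(40)
    + "01-JAN-00"
    + "   "
    + "1ABC"
)
header = parse_header_record(sample_header)
print(header)
```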

assignSecstr(header, atoms, coil=False)

Assign secondary structure from header dictionary to atoms. header must be a dictionary parsed using parsePDB(). atoms may be an instance of AtomGroup, Selection, Chain, or Residue. ProDy can be configured to parse and assign secondary structure information automatically using the confProDy(auto_secondary=True) command. See also the confProDy() function.

DSSP (Dictionary of Protein Secondary Structure) single-letter codes are used for the assignments:

G = 3-turn helix (3₁₀ helix). Min length 3 residues.

H = 4-turn helix (alpha helix). Min length 4 residues.

I = 5-turn helix (pi helix). Min length 5 residues.

T = hydrogen bonded turn (3, 4 or 5 turn)

E = extended strand in parallel and/or anti-parallel beta-sheet conformation. Min length 2 residues.

B = residue in isolated beta-bridge (single pair beta-sheet hydrogen bond formation)

S = bend (the only non-hydrogen-bond based assignment).

C = residues not in one of above conformations.

See http://en.wikipedia.org/wiki/Protein_secondary_structure#The_DSSP_code for more details.

The following PDB helix classes are omitted (class number in parentheses):

Right-handed omega (2)

Right-handed gamma (4)

Left-handed alpha (6)

Left-handed omega (7)

Left-handed gamma (8)

2 - 7 ribbon/helix (9)

Polyproline (10)

Secondary structures are assigned to all atoms in a residue. When coil=True is passed, amino acid residues without any secondary structure assignment in the header section are assigned the coil (C) conformation. With coil=False, the default shown in the signature above, this assignment is not made.
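The mapping described above can be sketched in plain Python. This is an assumption-laden illustration, not ProDy's implementation: `HELIX_CLASS_TO_CODE` and `assign_codes` are hypothetical names, the helix classes kept (1, 3, and 5) are inferred from the list of omitted classes, strands are given the code E, and the `coil` flag here fills unassigned residues with C only when it is true.

```python
# PDB helix classes that are not in the omitted list, mapped to DSSP letters:
# 1 = right-handed alpha, 3 = right-handed pi, 5 = right-handed 3-10.
HELIX_CLASS_TO_CODE = {1: "H", 3: "I", 5: "G"}


def assign_codes(resnums, helices, strands, coil=False):
    """Return {resnum: code} for the residues of one chain.

    helices: iterable of (helix_class, first_resnum, last_resnum)
    strands: iterable of (first_resnum, last_resnum)
    """
    codes = {resnum: ("C" if coil else "") for resnum in resnums}
    for helix_class, first, last in helices:
        code = HELIX_CLASS_TO_CODE.get(helix_class)
        if code is None:
            continue  # omitted helix class, residues keep their current code
        for resnum in range(first, last + 1):
            if resnum in codes:
                codes[resnum] = code
    for first, last in strands:
        for resnum in range(first, last + 1):
            if resnum in codes:
                codes[resnum] = "E"
    return codes


resnums = range(1, 11)
helices = [(1, 2, 5), (2, 7, 8)]  # an alpha helix and an omitted omega helix
strands = [(9, 10)]
with_coil = assign_codes(resnums, helices, strands, coil=True)
without_coil = assign_codes(resnums, helices, strands, coil=False)
print("".join(with_coil[r] for r in resnums))
```

In this sketch the residues of the omitted omega helix (7 and 8) are treated like any other residue without an assignment.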

buildBiomolecules(header, atoms, biomol=None)

Return atoms after applying biomolecular transformations from header dictionary. Biomolecular transformations are applied to all coordinate sets in the molecule.

Some PDB files contain transformations for more than one biomolecule. A specific set of transformations can be chosen using the biomol argument. Transformation sets are identified by numbers, e.g. "1", "2", ...

If multiple biomolecular transformations are provided in the header dictionary, biomolecules will be returned as AtomGroup instances in a list.

If the resulting biomolecule has more than 26 chains, the molecular assembly will be split into multiple AtomGroup instances each containing at most 26 chains. These AtomGroup instances will be returned in a tuple.

Note that atoms in biomolecules are ordered according to chain identifiers.
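A biomolecular transformation is a rotation matrix plus a translation vector taken from the BIOMT lines of REMARK 350. The sketch below shows the arithmetic and the 26-chain split in plain Python. It is not ProDy's implementation: `parse_biomt`, `apply_transformation`, and `split_chains` are hypothetical helpers, the parser assumes whitespace-separated BIOMT fields, and the sample lines are made up.

```python
def parse_biomt(lines):
    """Collect {serial: (3x3 matrix, translation)} from BIOMT1-3 lines."""
    transforms = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 8 or fields[:2] != ["REMARK", "350"]:
            continue
        if not fields[2].startswith("BIOMT"):
            continue
        matrix, translation = transforms.setdefault(int(fields[3]), ([], []))
        matrix.append([float(value) for value in fields[4:7]])
        translation.append(float(fields[7]))
    return transforms


def apply_transformation(coords, matrix, translation):
    """Return new coordinates x' = R x + t for each (x, y, z) triple."""
    return [
        tuple(
            row[0] * x + row[1] * y + row[2] * z + shift
            for row, shift in zip(matrix, translation)
        )
        for x, y, z in coords
    ]


def split_chains(chain_ids, limit=26):
    """Split chain identifiers into groups of at most `limit` chains."""
    return [chain_ids[i:i + limit] for i in range(0, len(chain_ids), limit)]


# Made-up transformations: identity, then a two-fold rotation about z
# combined with a 10 A shift along x.
remark_350 = [
    "REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000",
    "REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000",
    "REMARK 350   BIOMT3   1  0.000000  0.000000  1.000000        0.00000",
    "REMARK 350   BIOMT1   2 -1.000000  0.000000  0.000000       10.00000",
    "REMARK 350   BIOMT2   2  0.000000 -1.000000  0.000000        0.00000",
    "REMARK 350   BIOMT3   2  0.000000  0.000000  1.000000        0.00000",
]
transforms = parse_biomt(remark_350)
coords = [(1.0, 2.0, 3.0)]
copies = [apply_transformation(coords, *transforms[n]) for n in sorted(transforms)]
groups = split_chains(["chain%d" % i for i in range(30)])
print(copies, [len(group) for group in groups])
```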