Help for JST

Sequence listings

Letters

Numbering

Interaction with the sequence

Search for sequence patterns

3D model

Other

Design details

Combination of sequences from SEQRES and ATOM

In PDB-formatted files, SEQRES records contain the sequence of each macromolecule chain, as reported by the authors. ATOM records, on the other hand, provide the coordinates of each atom together with its residue name and number, so they effectively provide a second sequence of residues for each chain.

Although the PDB specification says that both sets of records should match, they often do not. This may be due to residues whose coordinates were not resolved in the X-ray or NMR experiment (crystallographic disorder, physical gap) or to alternate residues found at the same sequence position (sequence microheterogeneity).

JST computes a combination sequence from both sources of information, and highlights the discrepancies using text formatting.

CAVEAT: the algorithm used by JST works reasonably well, but it is not completely reliable, particularly around gaps.
Further research into it would be worthwhile. On the other hand, since we are not aligning two related proteins (as in a generic alignment task) but two sequences of the same protein, failures are unlikely.

A few refs. for future or prospective work:

Alignment algorithm

A rather simplistic implementation of alignment based on the Needleman-Wunsch technique.

Created according to guidelines in (May 2009):

A simple scoring scheme is assumed using

First attempt: values were 1, 0, 0; this sometimes produces mismatches.
Second attempt: 1, -1, 0; this avoids mismatches but was a bit slow for long sequences. (Later optimization of the code reduced the slowdown.)

Particular to our problem, as opposed to a generic alignment method, is that we prefer gaps over mismatches: it makes little sense for SEQRES to indicate Val and ATOM to indicate Leu at the same position, for example, whereas in a generic alignment of two (related) proteins that would be more expected than a gap.
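The following is a minimal sketch of a Needleman-Wunsch style global alignment, not the actual JST code. It assumes that the three scoring values quoted above are the match, mismatch, and gap scores, in that order (the text does not say so explicitly). The function name alignSequences and the shape of its return value are hypothetical. Sequences are arrays of three-letter residue names, and a gap is represented by null.

```javascript
// Sketch only: global alignment of two residue-name arrays.
// Assumption: scores are { match, mismatch, gap }, defaulting to 1, -1, 0.
function alignSequences(seqA, seqB, scores = { match: 1, mismatch: -1, gap: 0 }) {
  const { match, mismatch, gap } = scores;
  const n = seqA.length;
  const m = seqB.length;

  // score[i][j] = best score for aligning the first i residues of seqA
  // with the first j residues of seqB.
  const score = [];
  for (let i = 0; i <= n; i++) {
    score.push(new Array(m + 1).fill(0));
  }
  for (let i = 1; i <= n; i++) score[i][0] = score[i - 1][0] + gap;
  for (let j = 1; j <= m; j++) score[0][j] = score[0][j - 1] + gap;

  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const same = seqA[i - 1] === seqB[j - 1];
      const diag = score[i - 1][j - 1] + (same ? match : mismatch);
      const up = score[i - 1][j] + gap;    // gap in seqB
      const left = score[i][j - 1] + gap;  // gap in seqA
      score[i][j] = Math.max(diag, up, left);
    }
  }

  // Traceback from the bottom-right corner. On ties, a match is tried
  // first, then a gap, and a mismatch only as a last resort.
  const alignedA = [];
  const alignedB = [];
  let i = n;
  let j = m;
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && seqA[i - 1] === seqB[j - 1] &&
        score[i][j] === score[i - 1][j - 1] + match) {
      alignedA.unshift(seqA[i - 1]); alignedB.unshift(seqB[j - 1]); i--; j--;
    } else if (i > 0 && score[i][j] === score[i - 1][j] + gap) {
      alignedA.unshift(seqA[i - 1]); alignedB.unshift(null); i--;
    } else if (j > 0 && score[i][j] === score[i][j - 1] + gap) {
      alignedA.unshift(null); alignedB.unshift(seqB[j - 1]); j--;
    } else {
      // Only reachable when the best path goes through a mismatch.
      alignedA.unshift(seqA[i - 1]); alignedB.unshift(seqB[j - 1]); i--; j--;
    }
  }
  return { alignedA, alignedB, score: score[n][m] };
}

// Example: LEU is present in SEQRES but has no coordinates in ATOM.
const fromSeqres = ["MET", "VAL", "LEU", "SER", "GLY"];
const fromAtom = ["MET", "VAL", "SER", "GLY"];
const result = alignSequences(fromSeqres, fromAtom);
console.log(result.alignedB); // [ 'MET', 'VAL', null, 'SER', 'GLY' ]
```

With a gap score of 0 and a mismatch score of -1, a Val/Leu conflict at one position is reported as two gaps (one in each sequence) rather than as a mismatched pair, which is the behaviour described above. The full score matrix costs time and memory proportional to the product of the two lengths, which may be one reason long sequences were slow.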

Extraction of sequence from SEQRES

The SEQRES records are read from the header section of the PDB file (obtained using Jmol's built-in capabilities) and parsed with JavaScript to compile the sequence of each chain.
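As an illustration of this parsing step, here is a minimal sketch, not the actual JST code. It assumes the header text is already available as a string (how it is fetched from Jmol is not shown) and relies on the fixed-column layout of SEQRES records in the PDB format: chain identifier in column 12, residue names in columns 20 to 70. The function name parseSeqres is hypothetical.

```javascript
// Sketch only: collect three-letter residue names per chain from SEQRES lines.
function parseSeqres(headerText) {
  const chains = {};
  for (const line of headerText.split(/\r?\n/)) {
    if (!line.startsWith("SEQRES")) continue;
    const chainId = line.charAt(11);             // column 12
    const names = line.substring(19, 70)         // columns 20-70
      .trim()
      .split(/\s+/)
      .filter(Boolean);
    if (!chains[chainId]) chains[chainId] = [];
    chains[chainId].push(...names);
  }
  return chains;
}

// Example with a made-up header fragment (chain B is a hypothetical tripeptide).
const exampleHeader = [
  "SEQRES   1 A   21  GLY ILE VAL GLU GLN CYS CYS THR SER ILE CYS SER LEU",
  "SEQRES   2 A   21  TYR GLN LEU GLU ASN TYR CYS ASN",
  "SEQRES   1 B    3  PHE VAL ASN",
].join("\n");
const seqresChains = parseSeqres(exampleHeader);
console.log(seqresChains.A.length); // 21
console.log(seqresChains.B);        // [ 'PHE', 'VAL', 'ASN' ]
```

Continuation lines for the same chain are simply appended in file order, so the result is one array of residue names per chain identifier.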

Extraction of sequence from ATOM

Rather than parsing the text content of the PDB file, the sequence is compiled from the same information Jmol uses to build the 3D model (that is, Jmol's internal representation of the data in the loaded file).
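The idea can be sketched as follows, under loud assumptions: the atom data is taken to be a plain array of objects with chain, resno, and group fields, listed in file order. That shape is hypothetical and chosen only for illustration; it is not a documented Jmol structure, and the call that would obtain the data from Jmol is deliberately left out. The function name sequenceFromAtoms is likewise hypothetical.

```javascript
// Sketch only: collapse a per-atom list into a per-chain residue list.
// Assumption: each atom is { chain, resno, group }, in file order.
function sequenceFromAtoms(atoms) {
  const chains = {};
  for (const atom of atoms) {
    const residues = chains[atom.chain] || (chains[atom.chain] = []);
    const last = residues[residues.length - 1];
    // Start a new residue whenever the number or the name changes.
    if (!last || last.resno !== atom.resno || last.group !== atom.group) {
      residues.push({ resno: atom.resno, group: atom.group });
    }
  }
  return chains;
}

// Example: residue 3 has no atoms, so it is absent from the result.
const exampleAtoms = [
  { chain: "A", resno: 1, group: "MET" },
  { chain: "A", resno: 1, group: "MET" },
  { chain: "A", resno: 2, group: "VAL" },
  { chain: "A", resno: 4, group: "SER" },
  { chain: "A", resno: 4, group: "SER" },
];
const atomChains = sequenceFromAtoms(exampleAtoms);
console.log(atomChains.A.map((r) => r.group)); // [ 'MET', 'VAL', 'SER' ]
```

Keeping the residue number next to each name preserves the jump in numbering (2 to 4 in the example), which hints at where unresolved residues are missing.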