Help for JST

Brief instructions

  1. Load a structure (protein, nucleic acid or a complex of both) either from a local disk or online from the PDB database. You can also load one of the examples provided.
  2. Once the model has finished loading into the Jmol pane, click on the 'prepare' button to have the sequence analysed.
  3. Seq→3D: Hover the mouse pointer over the sequence to see the residues identified. Click on a residue to see it highlighted in the 3D model. Shift+click to focus on the residue in the 3D model.
  4. 3D→Seq: If enabled, click on an atom in the 3D model to have it located in the sequence.
  5. Find: To search for a single residue or a sequence of residues in the sequence listing, type the sequence, choose the chain to search (or none to search all chains), and click on the button to run the search.

Sequence listings

Letters

                             data in SEQRES
  data in ATOM or HETATM     yes           no
  yes, std                   uppercase     [ ]
  yes, non-std               x             [x]
  yes, hetero                [x]
  no                         lowercase

Numbering

Interaction with the sequence

Search for sequence patterns

3D model

Other

Browser compatibility

JST makes heavy use of JavaScript, the DOM (document object model) and CSS styling. As a consequence, some old browsers do not work well with this tool.

Design details

Combination of sequences from SEQRES and ATOM

In PDB-formatted files, SEQRES records contain the sequence of the macromolecule chains, as reported by the authors. ATOM records, on the other hand, provide the coordinates of each atom together with the residue name and number, thereby effectively providing a sequence of residues for each chain.

Although the PDB specification says that both sets of records should match, they often do not. This may be due to residues whose coordinates have not been resolved in the X-ray or NMR experiment (crystallographic disorder, physical gap) or to alternate residues found at the same sequence position (sequence microheterogeneity).

JST computes a combined sequence from both sources of information and highlights the discrepancies using text formatting.
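As an illustration only (this is not JST's code), the formatting step can be sketched as below. It assumes the two sequences have already been aligned into equal-length arrays of three-letter residue names, with null marking a gap, and it follows one reading of the conventions listed under "Letters": uppercase when a residue is in both sources, lowercase when it is only in SEQRES, brackets when it is only in ATOM, and x for non-standard residues. The names ONE_LETTER and formatCombined are hypothetical, and hetero groups are not handled.

```javascript
// Three-letter to one-letter codes for the 20 standard amino acids.
const ONE_LETTER = {
  ALA: "A", ARG: "R", ASN: "N", ASP: "D", CYS: "C", GLN: "Q", GLU: "E",
  GLY: "G", HIS: "H", ILE: "I", LEU: "L", LYS: "K", MET: "M", PHE: "F",
  PRO: "P", SER: "S", THR: "T", TRP: "W", TYR: "Y", VAL: "V"
};

// Hypothetical helper: both arguments are aligned arrays of equal length;
// each entry is a three-letter residue name, or null where that source has a gap.
function formatCombined(seqresAligned, atomAligned) {
  let out = "";
  for (let i = 0; i < seqresAligned.length; i++) {
    const s = seqresAligned[i];
    const a = atomAligned[i];
    const name = a !== null ? a : s;           // take the name from whichever source has it
    const letter = ONE_LETTER[name] || "x";    // non-standard residues are shown as x
    if (s !== null && a !== null) {
      out += letter;                           // present in both: uppercase (or x)
    } else if (s !== null) {
      out += letter.toLowerCase();             // only in SEQRES: lowercase
    } else {
      out += "[" + letter + "]";               // only in ATOM: bracketed
    }
  }
  return out;
}

// Usage: Met and Gly lack coordinates; Ser appears only in ATOM.
console.log(formatCombined(
  ["MET", "ALA", "VAL", "GLY", null, "LYS"],
  [null, "ALA", "VAL", null, "SER", "LYS"]
)); // prints "mAVg[S]K"
```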

CAVEAT: the algorithm used by JST works reasonably well, but it is not completely reliable, particularly around gaps.
Further research into it would be worthwhile. On the other hand, since we are not aligning two related proteins (as in a generic alignment task) but two sequences of the same molecule, failures are unlikely to occur.

A few refs. for future or prospective work:

Alignment algorithm

JST uses a rather simplistic implementation of alignment based on the Needleman-Wunsch technique.

Created according to guidelines (as of May 2009) in:

A simple scoring scheme is assumed using

First attempt: the values were 1, 0, 0; this sometimes produces mismatches.
Second attempt: the values 1, -1, 0 avoid mismatches, but are a bit slow for long sequences. (Later optimization of the code reduced the slowdown.)

Particular to our problem, in contrast to a generic alignment method, is that we prefer gaps over mismatches: it makes little sense for SEQRES to indicate Val while ATOM indicates Leu, for example, whereas in a generic alignment of two (related) proteins such a mismatch would be more expected than a gap.
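The preference for gaps can be illustrated with a minimal Needleman-Wunsch style sketch; it is not JST's actual implementation. It assumes that the three values 1, -1, 0 are the scores for a match, a mismatch and a gap respectively (the text above does not say which is which), and the function name alignSequences is hypothetical. With a free gap and a penalized mismatch, two gapped columns always score higher than one mismatched column.

```javascript
// Global alignment of two arrays of residue names.
// Default scores are an ASSUMED reading of "1, -1, 0": match, mismatch, gap.
function alignSequences(a, b, scores = { match: 1, mismatch: -1, gap: 0 }) {
  const n = a.length;
  const m = b.length;
  // score[i][j] = best score for aligning a[0..i) with b[0..j)
  const score = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
  for (let i = 1; i <= n; i++) score[i][0] = score[i - 1][0] + scores.gap;
  for (let j = 1; j <= m; j++) score[0][j] = score[0][j - 1] + scores.gap;
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const pair = a[i - 1] === b[j - 1] ? scores.match : scores.mismatch;
      score[i][j] = Math.max(
        score[i - 1][j - 1] + pair,        // align a[i-1] with b[j-1]
        score[i - 1][j] + scores.gap,      // gap in b
        score[i][j - 1] + scores.gap       // gap in a
      );
    }
  }
  // Trace back from the bottom-right corner; null marks a gap.
  const alignedA = [];
  const alignedB = [];
  let i = n;
  let j = m;
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0) {
      const pair = a[i - 1] === b[j - 1] ? scores.match : scores.mismatch;
      if (score[i][j] === score[i - 1][j - 1] + pair) {
        alignedA.push(a[i - 1]);
        alignedB.push(b[j - 1]);
        i--;
        j--;
        continue;
      }
    }
    if (i > 0 && score[i][j] === score[i - 1][j] + scores.gap) {
      alignedA.push(a[i - 1]);
      alignedB.push(null);
      i--;
    } else {
      alignedA.push(null);
      alignedB.push(b[j - 1]);
      j--;
    }
  }
  return { alignedA: alignedA.reverse(), alignedB: alignedB.reverse() };
}

// Usage: residues missing from ATOM show up as gaps (null) in alignedB.
const result = alignSequences(
  ["MET", "ALA", "VAL", "GLY", "LYS"],   // e.g. from SEQRES
  ["ALA", "VAL", "LYS"]                  // e.g. from ATOM
);
console.log(result.alignedB); // [ null, 'ALA', 'VAL', null, 'LYS' ]
```

Under these assumed scores, aligning a lone Val against a lone Leu produces two gapped columns rather than one mismatched column.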

Extraction of sequence from SEQRES

The SEQRES fields are read from the header section of the PDB file (obtained using Jmol's built-in capabilities) and parsed with JavaScript to compile the sequence of each chain.
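A minimal sketch of such parsing follows, assuming the header text is already available as a string (how JST obtains it from Jmol is not shown here) and that the records follow the fixed-column PDB layout: chain identifier in column 12 and up to 13 residue names starting at column 20. The function name parseSeqres is hypothetical.

```javascript
// Hypothetical helper: compile the residue names of each chain from SEQRES records.
// Returns an object mapping chain identifier to an array of residue names.
function parseSeqres(headerText) {
  const chains = {};
  for (const line of headerText.split(/\r?\n/)) {
    if (!line.startsWith("SEQRES")) continue;      // ignore all other records
    const chainId = line.charAt(11);               // column 12 (1-based)
    const names = line
      .substring(19, 70)                           // columns 20-70 hold the residue names
      .trim()
      .split(/\s+/)
      .filter(Boolean);                            // drop the empty entry of a blank field
    if (!chains[chainId]) chains[chainId] = [];
    chains[chainId].push(...names);
  }
  return chains;
}

// Usage with two records for chain A and one for chain B.
const header = [
  "SEQRES   1 A   15  GLY ILE VAL GLU GLN CYS CYS THR SER ILE CYS SER LEU",
  "SEQRES   2 A   15  TYR GLN",
  "SEQRES   1 B    3  PHE VAL ASN"
].join("\n");
console.log(parseSeqres(header).A.length); // 15
```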

Extraction of sequence from ATOM

Rather than parsing the text content of the PDB file, the sequence is compiled from the same information used by Jmol in building the 3D model (that is, Jmol's internal representation of the data in the loaded file).
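The idea can be sketched as follows, under loudly stated assumptions: the per-atom data is taken to be an array of plain objects with chain, resno, insCode and resName properties. That shape is hypothetical and is not Jmol's actual data structure, and sequenceFromAtoms is an illustrative name. Each residue contributes many atoms, so the sketch keeps one entry per residue, in order of first appearance.

```javascript
// Hypothetical helper: compile a per-chain residue list from per-atom records.
// Each atom is assumed to look like { chain, resno, insCode, resName }.
function sequenceFromAtoms(atoms) {
  const chains = {};
  const seen = new Set();
  for (const atom of atoms) {
    // A residue is identified by chain, residue number and insertion code.
    const key = atom.chain + "|" + atom.resno + "|" + (atom.insCode || "");
    if (seen.has(key)) continue;                   // later atoms of the same residue
    seen.add(key);
    if (!chains[atom.chain]) chains[atom.chain] = [];
    chains[atom.chain].push({ name: atom.resName, number: atom.resno });
  }
  return chains;
}

// Usage: two atoms of ALA 1 collapse into a single residue entry.
const atoms = [
  { chain: "A", resno: 1, resName: "ALA" },
  { chain: "A", resno: 1, resName: "ALA" },
  { chain: "A", resno: 2, resName: "GLY" }
];
console.log(sequenceFromAtoms(atoms).A.map(r => r.name).join("-")); // ALA-GLY
```

Note that this simple keying keeps only the first residue name seen at a given position, so alternate residues at the same position (microheterogeneity) would need extra handling.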