raj@grserv.med.jhmi.edu
The information supplied in this document is believed to be true, but no liability is assumed for its use or for any infringement of the rights of others resulting from its use.
This package is distributed without any conditions. It may be lent, re-sold, hired out or otherwise circulated without the supplier's prior consent, in any form of packaging or cover. Any part of this manual or accompanying software may be reproduced, stored in a retrieval system on optical or magnetic disk, tape or any other medium, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, for any purpose.
The commands/keywords currently recognised by DANSSR are given below.
Sequence Code Restrain
Measure Print Output
Coords Unix Source
Search Quit
Syntax: sequence {<"string">}
The DANSSR sequence command permits the input of a polypeptide sequence as a series of one-letter codes. The codes representing each amino acid are read from a file called match.sets. The file should exist in one of three places:
a) the directory from which the program was launched
b) the user's home directory
c) the directory pointed to by the variable $DANSSR_DIR
A typical example of the file is as follows:
A A
C C
D D
E E
F F
G G
H H
I I
K K
L L
M M
N N
P P
Q Q
R R
S S
T T
V V
W W
Y Y
0 ARNDCQEGHILKMFPSTWYV
1 ARNDCQEGHILKMFSTWYV
2 ARNDCQEHILKMFSTWYV
3 CILMFWV
4 STNDEQ
5 AKRHY
6 STNDEQKRHY
a STND
The first twenty codes represent the 20 natural amino acids. The code 0 represents any amino acid, the code 1 any amino acid but proline, and so on. The user may edit this file to suit their own requirements. Each code should be only one character long. It is advisable not to alter the first twenty codes in the above file. Following is a list of examples. The command:
sequence "AAAA"
searches for a four residue sequence where all positions are A.
sequence "A[STND]AAA"
searches for a five residue sequence where the second position is one of S, T, N or D while the rest are A.
sequence "0aa0"
searches for a four residue sequence where the second and third positions are one of S,T,N or D while the rest are any residue.
Note the quotes around the query sequence. This is necessary.
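The way a query string expands into per-position residue sets can be sketched in Python. This is only an illustration of the semantics described above, not DANSSR's own code: the names MATCH_SETS, parse_query and find_matches are hypothetical, only three of the extra codes from match.sets are included, and letters inside square brackets are assumed to be literal one-letter residue names.

```python
# Illustrative sketch only; not taken from the DANSSR source.
# Subset of match.sets: the 20 residue codes plus codes 0, 1 and a.
MATCH_SETS = {aa: aa for aa in "ACDEFGHIKLMNPQRSTVWY"}
MATCH_SETS.update({
    "0": "ARNDCQEGHILKMFPSTWYV",  # any amino acid
    "1": "ARNDCQEGHILKMFSTWYV",   # any amino acid but proline
    "a": "STND",
})


def parse_query(query, sets=MATCH_SETS):
    """Expand a query string into one set of allowed residues per position."""
    positions = []
    i = 0
    while i < len(query):
        if query[i] == "[":
            # Assumption: bracket contents are literal residue letters.
            end = query.index("]", i)
            positions.append(set(query[i + 1:end]))
            i = end + 1
        else:
            positions.append(set(sets[query[i]]))
            i += 1
    return positions


def find_matches(query, chain):
    """Return the 0-based start index of every window of chain matching query."""
    positions = parse_query(query)
    n = len(positions)
    return [
        start
        for start in range(len(chain) - n + 1)
        if all(chain[start + k] in positions[k] for k in range(n))
    ]
```

For instance, find_matches("0aa0", chain) reports every four-residue window of chain whose two middle residues are S, T, N or D.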
Syntax: code {<"string">}
The DANSSR code command is used to limit the phi,psi range of each residue in the query sequence to a specific region of the Ramachandran Map. Depending on the database being used various code schemes may be employed.
Three different databases are supplied. In the first case, the phi,psi region of the residue is marked as H, T, E or X depending on whether the residue is in a helix, turn, sheet or coil. The secondary structure was assigned from an automated examination of backbone-backbone hydrogen bonding patterns. The second flavor of the database assigns a single-letter mnemonic to every region of the phi,psi map, following the nomenclature of Zimmermann and Scheraga (PNAS .....). A complete description of the codes is as follows:
CODE  PHI-Range        Psi-Range
A     -110 to -40      -90 to -10
B     -110 to -40      -10 to 50
C     -110 to -40      50 to 130
F     -110 to -40      130 to 180 & -180 to -140
E     -180 to -110     -180 to -140 & 110 to 180
G     -180 to -110     -90 to -40
B     -180 to -110     -40 to 10
D     -180 to -110     10 to 110
a     110 to 40        90 to 10
b     110 to 40        10 to -50
c     110 to 40        50 to -130
f     110 to 40        -130 to -180 & 180 to 140
e     180 to 110       180 to 140 & -110 to -180
g     180 to 110       90 to 40
b     180 to 110       40 to -10
d     180 to 110       -10 to -110
H     anything not above
The third flavor is related to the second method. In this case the phi,psi map is divided into 42 equal 60 deg. x 60 deg. bins, for example -180,180 -180,120 -180,60 -180,0 and so on, with each bin having an associated code. The description of the codes is:
CODE  PHI-Range      Psi-Range
A     -180 +/- 30    -180 +/- 30
B     -180 +/- 30    -120 +/- 30
C     -180 +/- 30    -60 +/- 30
D     -180 +/- 30    0 +/- 30
E     -180 +/- 30    60 +/- 30
F     -180 +/- 30    120 +/- 30
A     -180 +/- 30    180 +/- 30
G     -120 +/- 30    -180 +/- 30
H     -120 +/- 30    -120 +/- 30
I     -120 +/- 30    -60 +/- 30
J     -120 +/- 30    0 +/- 30
K     -120 +/- 30    60 +/- 30
L     -120 +/- 30    120 +/- 30
G     -120 +/- 30    180 +/- 30
M     -60 +/- 30     -180 +/- 30
N     -60 +/- 30     -120 +/- 30
O     -60 +/- 30     -60 +/- 30
P     -60 +/- 30     0 +/- 30
Q     -60 +/- 30     60 +/- 30
R     -60 +/- 30     120 +/- 30
M     -60 +/- 30     180 +/- 30
S     0 +/- 30       -180 +/- 30
T     0 +/- 30       -120 +/- 30
U     0 +/- 30       -60 +/- 30
V     0 +/- 30       0 +/- 30
W     0 +/- 30       60 +/- 30
X     0 +/- 30       120 +/- 30
S     0 +/- 30       180 +/- 30
m     60 +/- 30      -180 +/- 30
r     60 +/- 30      -120 +/- 30
q     60 +/- 30      -60 +/- 30
p     60 +/- 30      0 +/- 30
o     60 +/- 30      60 +/- 30
n     60 +/- 30      120 +/- 30
m     60 +/- 30      180 +/- 30
g     120 +/- 30     -180 +/- 30
l     120 +/- 30     -120 +/- 30
k     120 +/- 30     -60 +/- 30
j     120 +/- 30     0 +/- 30
i     120 +/- 30     60 +/- 30
h     120 +/- 30     120 +/- 30
g     120 +/- 30     180 +/- 30
a     180 +/- 30     -180 +/- 30
f     180 +/- 30     -120 +/- 30
e     180 +/- 30     -60 +/- 30
d     180 +/- 30     0 +/- 30
c     180 +/- 30     60 +/- 30
b     180 +/- 30     120 +/- 30
a     180 +/- 30     180 +/- 30
Specifying the code for a query sequence is analogous to specifying the sequence itself. For example,
code "HHHH"
constrains all four residues to have the conformational code "H",
code "H0HH"
constrains the first, third and fourth residues to be in the H conformation while the second residue may adopt any conformation, and
code "H[HE]HH"
constrains the first, third and fourth residues to be in the H conformation while the second residue may adopt either the H or E conformation, and so on.
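The 60-degree binning behind the rational codes tabulated above can be sketched in Python as a table lookup. This is a hedged illustration rather than the code used to build the supplied databases: the name rational_code is hypothetical, and the treatment of values that fall exactly on a bin boundary (assigned here to the higher bin) is an assumption, since the table does not say which side they belong to.

```python
import math

# One string per phi bin (centres -180, -120, -60, 0, 60, 120, 180).
# Within a string, letters run over the psi bins centred on
# -180, -120, -60, 0, 60, 120; psi = 180 shares the -180 column.
ROWS = ["ABCDEF", "GHIJKL", "MNOPQR", "STUVWX", "mrqpon", "glkjih", "afedcb"]


def rational_code(phi, psi):
    """Return the rational code for a phi,psi pair given in degrees."""
    if not (-180 <= phi <= 180 and -180 <= psi <= 180):
        raise ValueError("phi and psi must lie between -180 and 180 degrees")
    # Assumption: a value exactly on a boundary goes to the higher bin.
    row = math.floor((phi + 210) / 60)
    col = math.floor((psi + 210) / 60) % 6
    return ROWS[row][col]
```

With this lookup a typical alpha-helical pair such as phi = -65, psi = -40 falls in the bin coded O.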
Syntax: restrain [option] [target] [tolerance]
The DANSSR restrain command is used to specify various distance,angle and torsion restraints in the search. In addition the following pre-defined torsional restraints are available:
phi,psi,omega
These latter restraints are specified as follows:
restrain option residue-number target-value tolerance
where option is one of phi,psi,omega
target-value is the desired value ( e.g. 180.0)
and tolerance is the permissible variation from the target value ( e.g. 30.0)
Thus, the command
restrain phi 2 -60 30
constrains the phi value of the 2nd residue in the input sequence to be -60 +/- 30 degrees.
The command
restrain phi all -60 30
constrains the phi value of all residues in the input sequence to be -60 +/- 30 degrees.
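A target-plus-tolerance test on a torsion angle can be sketched in Python as follows. The function names are hypothetical, and the periodic comparison (so that -170 counts as 10 degrees away from 180) is an assumption about how such a restraint would sensibly be evaluated; the manual does not state how DANSSR treats the wrap-around at +/- 180.

```python
def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    diff = (a - b) % 360.0
    return min(diff, 360.0 - diff)


def satisfies(value, target, tolerance):
    """True if value lies within target +/- tolerance, allowing for wrap-around."""
    return angular_difference(value, target) <= tolerance
```

Under this assumption, satisfies(-75, -60, 30) is true for a restraint such as "restrain phi 2 -60 30", while a phi of +60 is rejected.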
In addition atom based constraints may also be specified. Specifically, the distance between any two atoms, the angle between any three atoms, and the torsion between any four atoms may be defined.
Thus the command
restrain distance 1 O 5 N 2.0 3.5
constrains the distance between atom O of residue 1 and atom N of residue 5 to be between 2 and 3.5 Angstroms. The atom names stored in the database follow the Brookhaven convention strictly.
The command
restrain angle 1 O 5 N 5 CA 120 30
constrains the angle between atom O of residue 1 , atom N of residue 5 and atom CA of residue 5 to be 120 +/- 30 degrees.
Similarly, the command
restrain torsion 1 O 5 N 5 CA 4 C 180 30
constrains the dihedral angle between atom O of residue 1 , atom N of residue 5 ,atom CA of residue 5 and atom C of residue 4 to be 180 +/- 30 degrees.
A query may combine any number of these restraints, up to a current limit of 20 each for torsions, distances and angles. Specifying all three restraints listed above, for example, will recover sequences containing a hydrogen bond between residues 1 and 5.
Syntax: measure [option]
The DANSSR measure command is used to measure distances,angles and torsions between atoms. Measurements of phi, psi, omega, chi1, chi2, chi3, chi4 and chi5 should not be specified with this option. These are accessible through the Print option.
Thus, the command
measure distance 1 O 5 N
measures the distance between atom O of residue 1 and atom N of residue 5.
The command
measure angle 1 O 5 N 5 CA
measures the angle between atom O of residue 1 , atom N of residue 5 and atom CA of residue 5.
Similarly, the command
measure torsion 1 O 5 N 5 CA 4 C
measures the dihedral angle between atom O of residue 1 , atom N of residue 5 ,atom CA of residue 5 and atom C of residue 4.
At most 20 each of torsions, distances and angles can currently be measured. All measurements are written to the output file.
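The three quantities involved can be computed from atomic coordinates as in the following Python sketch. It uses the standard textbook formulas and is not taken from DANSSR; the function names are hypothetical, the angle is assumed to be measured at the second of the three atoms, and the sign convention of the torsion is whatever the atan2 form below yields, which may or may not match the program's.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def distance(p, q):
    """Distance between two atoms, in the units of the coordinates."""
    d = sub(p, q)
    return math.sqrt(dot(d, d))


def angle(p, q, r):
    """Angle p-q-r in degrees (assumed to be measured at the middle atom q)."""
    u, v = sub(p, q), sub(r, q)
    cos_t = dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def torsion(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, in the range -180 to 180."""
    b0, b1, b2 = sub(p0, p1), sub(p2, p1), sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of b0 and b2 perpendicular to the central bond.
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))
```

Each function takes (x, y, z) tuples, such as the atom coordinates written out by the coord print option.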
Syntax: print [options]
The DANSSR print command is used to control the level of printing to the output file. By default, only the matched sequence is printed. Additionally, when the measure command is part of the query, all measurements are printed. Further information, if required, can be requested through the print command. The following options are available:
phi - prints the phi torsion value of each residue
psi - prints the psi torsion value of each residue
omega - prints the omega torsion value of each residue
chi1 - prints the chi1 torsion value of each residue
chi2 - prints the chi2 torsion value of each residue
chi3 - prints the chi3 torsion value of each residue
chi4 - prints the chi4 torsion value of each residue
chi5 - prints the chi5 torsion value of each residue
code - prints the conformational code of each residue
coord - prints the coordinates of all atoms in each residue
to a file in pdb format
resarea - prints the solvent accessible surface areas
and the normalized areas of each residue
For example, the command
print phi psi omega code
writes out the phi,psi etc. values. All options may be given on the same line or on separate/multiple lines and in any combination.
Syntax: output to filename
The DANSSR output option is used to specify the file to which results are to be written. The filename may not be longer than 128 characters. Any file in the current directory with the same name will be overwritten.
e.g. output to test1.dat
Syntax: coords to filename
The DANSSR coords option is used to specify the file to which coordinates are to be written. The filename may not be longer than 128 characters. Any file in the current directory with the same name will be overwritten.
e.g. coords to test1.pdb
This command is necessary only if printing of coordinates has been requested.
Syntax: unix command
The DANSSR unix option is used to pass a command to the system. Any valid unix command may be passed. Some examples are
unix more myfile
unix /bin/sh
Remember that user-defined aliases are not recognized. This can have disastrous consequences with commands such as rm, mv etc.
Syntax: source filename
The DANSSR source command is used to read in commands from a file. Any valid command is processed and action taken. Each session also produces a log file called danssr.log which can be edited, renamed and subsequently sourced.
Syntax: search
The DANSSR search command performs the actual search. If a sequence has not been defined before issuing this command, the user will be prompted for a sequence. Similarly, if output filename(s) have not been specified, the user is prompted for them.
Syntax: quit
The DANSSR quit command ends the session.
The main data for the program, such as sequences, coordinates etc., comes from a file called danssr.db. Currently files for two different datasets are supplied. One contains 43 chains and the other 274 chains. The files are, respectively, database.43 and database.274. Additionally there are files called database_zs.43 and database_rc.43, which are identical to database.43 except that the secondary structure column (field 3 of the residue description line) contains codes assigned using the Zimmermann & Scheraga nomenclature and the Rational Codes formalism, respectively (see the Code command for explanations). Similarly suffixed files are available for the 274-chain database.
On startup of the program, a search is made for the location of the files in the following three areas hierarchically:
i) current working directory
ii) the user's HOME directory
iii) the directory pointed to by the variable $DANSSR_DIR
If the search is successful the files are used. If not, the user is prompted for the file to use. Supply the filename with full path information.
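The lookup order described above can be sketched in Python. This is only an illustration of the stated search order, not DANSSR's startup code: the function name locate and its parameters are hypothetical, and returning None stands in for the point at which the program would prompt the user.

```python
import os


def locate(filename, environ=os.environ, cwd=None):
    """Return the first existing path for filename, searching the current
    working directory, then HOME, then DANSSR_DIR; None if not found."""
    candidates = [
        cwd or os.getcwd(),
        environ.get("HOME"),
        environ.get("DANSSR_DIR"),
    ]
    for directory in candidates:
        if directory:  # skip variables that are not set
            path = os.path.join(directory, filename)
            if os.path.isfile(path):
                return path
    return None
```

A copy of the file in the current working directory therefore takes precedence over copies in the other two locations.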
A brief description of the database is as follows. For each protein the first line is a five-letter code (the four-letter PDB code and the chain name, if any).
For each residue in each protein there is :
ii) one line for each atom in the residue containing its name, x, y & z coordinates, and accessible area.
dbmake "lstfil" "dbname"
where lstfil contains the list of files corresponding to each protein to be added to the database. List each file on a separate line, and give full path.
dbname is the name of the database file to create.
Since these files are text files, different databases may be combined (for example, using the cat command in unix) to produce a single larger database. The programs makerc and makezs read in a database produced by dbmake and replace the secondary structure column with rational codes and Zimmermann-Scheraga codes respectively.
Searching for a Sequence
To search for the sequence Ser-X-X-Glu, where X is any amino acid, and to print out the phi, psi and omega values and the conformational codes, the following commands are used. The output is saved to the file sxxe.seq.
sequence "S00E"
print phi psi omega code
output to sxxe.seq
search
Searching for sequence with specified backbone conformation
There are two different ways of specifying the backbone conformation.
Let us search for the sequence Ser-X-X-Glu where X is any amino acid and specify that the residues X,X,Glu should be in an alpha-helical conformation.
Method I:
Let us assume that the alpha-helical conformation is characterized by a phi value of -65 +/- 20 degrees and a psi value of -40 +/- 20 degrees.
sequence "S00E"
print phi psi omega code
output to sxxe.seq
restrain phi 2 -65 20
restrain phi 3 -65 20
restrain phi 4 -65 20
restrain psi 2 -40 20
restrain psi 3 -40 20
restrain psi 4 -40 20
search
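When many residues need the same phi and psi restraints, the repeated restrain lines can be generated rather than typed and then read in with the source command. The following Python sketch is a hypothetical helper, not part of the DANSSR distribution; the function name helix_script and its parameters are assumptions, and it emits only commands documented in this manual.

```python
def helix_script(sequence, residues, phi=(-65, 20), psi=(-40, 20),
                 output="sxxe.seq"):
    """Build a DANSSR command script restraining the given residue numbers
    to the supplied (target, tolerance) values for phi and psi."""
    lines = [
        f'sequence "{sequence}"',
        "print phi psi omega code",
        f"output to {output}",
    ]
    for name, (target, tolerance) in (("phi", phi), ("psi", psi)):
        for residue in residues:
            lines.append(f"restrain {name} {residue} {target} {tolerance}")
    lines.append("search")
    return "\n".join(lines) + "\n"
```

Calling helix_script("S00E", [2, 3, 4]) reproduces the Method I listing; the returned text can be written to a file and passed to source.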
Method II:
In this case we will specify the conformational codes for the residues. The codes will depend on the database being used. In this example it is assumed that the database being used is the one with the rational codes (O corresponds to the region -60 +/- 30, -60 +/- 30).
sequence "S00E"
print phi psi omega code
output to sxxe.seq
code "0OOO"
search