Medical Systems Biology - 2. Running Swat

2. Running Swat

Change into the directory where you unpacked swat and run it with the following command line:

java -jar swat.jar protein.fasta nucleotide.fasta

Important: The protein and nucleotide sequence files must be in FASTA format and their names must end with ".fasta". On the command line, the protein file has to be given first, followed by the nucleotide file.
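The two input rules above can be checked before swat is started. The following is a minimal sketch in Python, not part of swat itself: the helper names build_swat_command and looks_like_fasta are hypothetical, and the FASTA check only tests that the first non-empty line starts with ">".

```python
def build_swat_command(protein, nucleotide, jar="swat.jar", options=()):
    """Return the argument list for a swat run, protein file first.

    Hypothetical helper: it only enforces the ".fasta" suffix rule and
    the argument order described in the text.
    """
    for path in (protein, nucleotide):
        if not str(path).endswith(".fasta"):
            raise ValueError(f"{path} must end with '.fasta'")
    return ["java", "-jar", jar, str(protein), str(nucleotide), *options]


def looks_like_fasta(path):
    """Cheap sanity check: the first non-empty line must start with '>'."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                return line.startswith(">")
    return False  # an empty file is not a valid FASTA file
```

The returned list can be handed to subprocess.run(cmd, check=True) from the directory that contains swat.jar.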

Parameters and defaults:

The following parameters control the alignment. Any parameter that is not explicitly specified takes its default value.

Parameter	Meaning	Range	Default
g=	penalty for starting a gap	-10000 to -1	-10
e=	penalty for extending a gap	-10000 to -1	-1
s=	penalty for a single frameshift	-10000 to -1	-20
d=	penalty for a double frameshift	-10000 to -1	-40
p=	penalty for a stop codon mismatch	-10000 to -1	-4
m=	scoring matrix (path to matrix-file)	-	BLOSUM62
ng	no gaps are allowed in the alignment	true / false	false
na	no affine gap scoring is used	true / false	false
nf	no frame shift mutations are allowed	true / false	false
w	use the worst case calculation for wild bases	true / false	false
o=	file in which the found mutations are saved in JSON format	-	-
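The table can be turned into a small check that rejects out-of-range penalties before swat is called. This is a sketch under assumptions: format_options is a hypothetical helper, and the switches ng, na, nf and w are assumed to be passed as bare words, as in the examples on this page.

```python
PENALTY_KEYS = ("g", "e", "s", "d", "p")  # numeric penalties, range -10000 to -1
FLAG_KEYS = ("ng", "na", "nf", "w")       # switches, off unless present


def format_options(penalties=None, flags=(), matrix=None, output=None):
    """Build the parameter part of a swat command line (hypothetical helper)."""
    args = []
    for key, value in (penalties or {}).items():
        if key not in PENALTY_KEYS:
            raise ValueError(f"unknown penalty parameter: {key}")
        if not -10000 <= value <= -1:
            raise ValueError(f"{key}={value} is outside -10000 to -1")
        args.append(f"{key}={value}")
    if matrix is not None:
        args.append(f"m={matrix}")   # path to the matrix file
    for flag in flags:
        if flag not in FLAG_KEYS:
            raise ValueError(f"unknown switch: {flag}")
        args.append(flag)
    if output is not None:
        args.append(f"o={output}")   # JSON file for the found mutations
    return args
```

For instance, format_options({"g": -12, "e": -2}, flags=["w"]) yields the strings g=-12, e=-2 and w, which can be appended to the java -jar swat.jar call.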

Examples:

java -jar swat.jar prot.fasta nucl.fasta g=-12 e=-2 s=-15 d=-30 m=matrices\PAM250 w

= Align prot.fasta and nucl.fasta with a gap open penalty of -12 and a gap extend penalty of -2. Single and double frameshifts are scored with -15 and -30, respectively. The scoring matrix is PAM250 from the folder "matrices". In addition, the worst case calculation is used for wild bases.
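To illustrate how a gap open and a gap extend penalty combine, here is a small sketch of affine gap scoring. It assumes one common convention, in which the first gap position costs the open penalty and every further position costs the extend penalty. This page does not state which convention swat uses internally, so the numbers are illustrative only.

```python
def affine_gap_cost(length, gap_open=-10, gap_extend=-1):
    """Score contribution of one gap of the given length.

    Assumed convention (not confirmed for swat): the first gap position
    costs gap_open, each additional position costs gap_extend.
    """
    if length < 0:
        raise ValueError("gap length must not be negative")
    if length == 0:
        return 0
    return gap_open + (length - 1) * gap_extend


# With g=-12 and e=-2 from the example, a gap of three positions scores
# -12 + 2 * (-2) = -16 under this convention.
print(affine_gap_cost(3, gap_open=-12, gap_extend=-2))
```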

java -jar swat.jar prot.fasta nucl.fasta ng nf o=mutas.json

= Perform an alignment without gaps or frameshifts and save the mutations in the file mutas.json.

Output:

The command-line output contains the identifiers of the aligned sequences, the lengths of the sequences, a list of the parameters used, the alignment score (Smith-Waterman score), the start and end positions of the local alignment, and the length of the alignment.

A detailed alignment is also displayed.

[Figure: swat alignment protocol]

Explanation of the Markers in the detailed output:

|||	=	exact match of an amino acid and a nucleotide triplet
+++	=	positive match of an amino acid and a nucleotide triplet
***	=	mismatch / replacement of an amino acid and a nucleotide triplet
III	=	insertion of a codon in the nucleotide sequence
DDD	=	deletion of a codon in the nucleotide sequence
|-x	=	frameshift deletion at position x
-xy	=	double frameshift deletion at positions x and y
||i	=	frameshift insertion at position 3 of a codon
|i|	=	frameshift insertion at position 2 of a codon
i||	=	frameshift insertion at position 1 of a codon
|ii	=	frameshift insertion at positions 2 and 3 of a codon
i|i	=	frameshift insertion at positions 1 and 3 of a codon
ii|	=	frameshift insertion at positions 1 and 2 of a codon
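The marker legend can be written as a lookup table for reading a marker line programmatically. This is a sketch, not swat's own code: describe_markers and count_markers are hypothetical helpers, and the marker line is assumed to be a run of three-character markers without separators. The two frameshift deletion markers are left out because the legend does not specify how the positions x and y are printed.

```python
from collections import Counter

# Fixed three-character markers taken from the legend.
MARKERS = {
    "|||": "exact match",
    "+++": "positive match",
    "***": "mismatch / replacement",
    "III": "codon insertion",
    "DDD": "codon deletion",
    "||i": "frameshift insertion at position 3",
    "|i|": "frameshift insertion at position 2",
    "i||": "frameshift insertion at position 1",
    "|ii": "frameshift insertion at positions 2 and 3",
    "i|i": "frameshift insertion at positions 1 and 3",
    "ii|": "frameshift insertion at positions 1 and 2",
}


def describe_markers(marker_line):
    """Split a marker line into 3-character chunks and name each one."""
    chunks = [marker_line[i:i + 3] for i in range(0, len(marker_line), 3)]
    return [MARKERS.get(chunk, "unknown") for chunk in chunks]


def count_markers(marker_line):
    """Tally how often each kind of marker occurs in a marker line."""
    return Counter(describe_markers(marker_line))
```

Any chunk that is not in the table, including the frameshift deletion markers, is reported as "unknown" rather than guessed.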

Finally, information about the number of identical matches, positive matches, gaps, mutations and wild bases is displayed.