
name

malign - pairwise and multiple sequence alignment program

synopsis

malign file_of_seqs [-seg=#] [-m#] [-p#] [-o] [>output]

description

malign aligns amino acid sequences supplied in pir(1) format. The most common usage,

malign file_of_seqs

aligns the sequences in file_of_seqs using the default matrix and gap penalty, and writes all intermediate alignments, the final alignment, and summary statistics to the standard output, which is the screen unless redirected. Similarly,

malign file_of_seqs -m1 -p45 -o > output

selects matrix 1 (the Dayhoff PAM 250 matrix) and a gap penalty of 45, requests output of each of the pairwise alignments, and writes the results to the file output. These options are described below.

If no file_of_seqs is given on the command line, malign prints an error message that briefly describes the required input and the default values assumed when no arguments other than file_of_seqs are given.

Typically the output is passed to the formatting program mformat(1), which displays the alignment. All intermediate alignments, or only the final alignment, may be displayed; the examples below show 50 residues per line.

mformat 50 < output | more

or
malign file_of_seqs | mformat 50 > output

the following options to malign are available:

-seg=#
Specifies a segment length for smoothing the matrix from which the alignment is traced. This increases calculation time considerably; try a value of 6 to start. Not advisable for large numbers of sequences.

-m# Selects one of ten scoring matrices. If unspecified, the default is matrix 0, the Birkbeck structure-based matrix [see: Johnson & Overington (1993). J. Mol. Biol., 232, in press]. The present matrix (birkbeck97) is derived from an analysis of 97 protein structure families (381 structures).

-p# Specifies the penalty assessed for gaps in the alignment. Any decimal value may be given. The default value for the default matrix is shown by running malign without any arguments.

-o Display all intermediate binary (pairwise) alignments.

>output
If no output file is specified, the output is written to the standard output (the screen).
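The option syntax above attaches each value directly to its flag (-m1, -p45, -seg=6). The Python sketch below shows one way to parse that syntax; it is a hypothetical illustration, not malign's source, and the names parse_malign_args and MalignOptions are invented here. Note that >output is a shell redirection and never reaches the program's argument list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MalignOptions:
    """Hypothetical container for the options described on this page."""
    file_of_seqs: str
    seg: Optional[int] = None        # -seg=#  (no smoothing if None)
    matrix: int = 0                  # -m#     (0 = Birkbeck, the documented default)
    penalty: Optional[float] = None  # -p#     (None = use the matrix's default)
    show_intermediate: bool = False  # -o


def parse_malign_args(argv: List[str]) -> MalignOptions:
    """Parse synopsis-style arguments, where each value is attached to its flag."""
    file_of_seqs: Optional[str] = None
    seg: Optional[int] = None
    matrix = 0
    penalty: Optional[float] = None
    show = False
    for arg in argv:
        if arg.startswith("-seg="):
            seg = int(arg[len("-seg="):])
        elif arg.startswith("-m") and len(arg) > 2:
            matrix = int(arg[2:])
        elif arg.startswith("-p") and len(arg) > 2:
            penalty = float(arg[2:])  # any decimal value is accepted
        elif arg == "-o":
            show = True
        elif arg.startswith("-"):
            raise ValueError(f"unknown option: {arg}")
        else:
            file_of_seqs = arg
    if file_of_seqs is None:
        # malign itself prints a usage message in this case
        raise ValueError("usage: malign file_of_seqs [-seg=#] [-m#] [-p#] [-o]")
    return MalignOptions(file_of_seqs, seg, matrix, penalty, show)
```

For example, parse_malign_args(["seqs.pir", "-m1", "-p45", "-o"]) yields matrix 1, penalty 45.0 and intermediate output switched on, with smoothing left off.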

description of method

The alignment is built up iteratively, following the approach of Feng and Doolittle (1987) and using the algorithm of Fredman (1984) for fast pairwise alignments.
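The pairwise step produces a raw alignment score for each pair of sequences. As a rough illustration only, the Python sketch below computes such a score with the textbook Needleman-Wunsch dynamic programming recurrence and a linear (per-position) gap penalty. It is not Fredman's (1984) algorithm, and the toy similarity function toy_sim and the linear gap model are assumptions made for the example.

```python
from typing import Callable


def global_score(a: str, b: str,
                 sim: Callable[[str, str], float], gap: float) -> float:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Illustration only: this is the textbook O(len(a) * len(b)) recurrence,
    NOT Fredman's (1984) fast algorithm used by malign.
    """
    m = len(b)
    # prev holds the previous DP row: best scores for a prefix of a
    # aligned against each prefix b[:j]; row 0 is b against leading gaps
    prev = [-gap * j for j in range(m + 1)]
    for i in range(1, len(a) + 1):
        cur = [-gap * i] + [0.0] * m
        for j in range(1, m + 1):
            cur[j] = max(
                prev[j - 1] + sim(a[i - 1], b[j - 1]),  # pair a[i-1] with b[j-1]
                prev[j] - gap,                          # a[i-1] against a gap
                cur[j - 1] - gap,                       # b[j-1] against a gap
            )
        prev = cur
    return prev[m]


def toy_sim(x: str, y: str) -> float:
    """Hypothetical similarity: 10 for identical residues, 0 otherwise."""
    return 10.0 if x == y else 0.0
```

With toy_sim, global_score("ACD", "AD", toy_sim, 5.0) gives 15.0: two matches at 10 each minus one gap at 5.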

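All pairwise alignments are made, and a binary tree is constructed from the distances $d$ between all pairs of sequences, estimated from the normalized similarity scores $s$: $d = -100.0 \log s$, where $s = S_{a,b} / [0.5\,(S_{a,a} + S_{b,b})]$ and $S_{x,y}$ denotes the alignment score of sequences $x$ and $y$ (the minus sign makes $d$ non-negative for $0 < s \le 1$).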
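A minimal Python sketch of the distance calculation, assuming the raw pairwise scores have already been computed. It assumes the natural logarithm and the sign convention $d = -100 \ln s$, so that identical sequences get $d = 0$ and distance grows as similarity falls; the function names are hypothetical.

```python
import math


def normalized_similarity(s_ab: float, s_aa: float, s_bb: float) -> float:
    """s = S(a,b) / (0.5 * (S(a,a) + S(b,b)))."""
    return s_ab / (0.5 * (s_aa + s_bb))


def distance(s_ab: float, s_aa: float, s_bb: float) -> float:
    """d = -100 ln s (natural log and minus sign are assumptions)."""
    s = normalized_similarity(s_ab, s_aa, s_bb)
    if s <= 0.0:
        raise ValueError("distance is undefined for s <= 0")
    return -100.0 * math.log(s)
```

For example, distance(80, 100, 100) uses s = 0.8 and gives d of about 22.3.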

The multiple alignment is then built in the order in which the tree is constructed from the matrix of distance scores. In general, the most similar sequences (small distances) are aligned first and the most distant sequences last. For the tree (((a b) c)(d e)), a and b are aligned, then d and e, then c is aligned to the result of ab, and finally the result of de is aligned with abc.
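The ordering rule can be illustrated with a small agglomerative clustering loop that repeatedly joins the two closest groups and records each join as an alignment step. This Python sketch assumes UPGMA-style average linkage, which this page does not specify, and uses invented distances chosen to reproduce the (((a b) c)(d e)) example; merge_order and EXAMPLE_DIST are hypothetical names.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Cluster = FrozenSet[str]


def merge_order(names: List[str],
                dist: Dict[FrozenSet[str], float]) -> List[Tuple[str, str]]:
    """Return alignment steps, closest groups first (UPGMA-style average linkage).

    dist maps frozenset({x, y}) to the distance between sequences x and y.
    The linkage rule is an assumption; malign's tree method may differ.
    """
    clusters: List[Cluster] = [frozenset([n]) for n in names]

    def avg(c1: Cluster, c2: Cluster) -> float:
        # mean of all sequence-to-sequence distances between the two groups
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    steps: List[Tuple[str, str]] = []
    while len(clusters) > 1:
        # join the pair of groups with the smallest average distance
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        steps.append(("".join(sorted(c1)), "".join(sorted(c2))))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return steps


# Invented distances chosen to reproduce the (((a b) c)(d e)) example.
EXAMPLE_DIST = {frozenset(pair): d for pair, d in {
    ("a", "b"): 10, ("d", "e"): 20, ("a", "c"): 30, ("b", "c"): 34,
    ("a", "d"): 70, ("a", "e"): 72, ("b", "d"): 68, ("b", "e"): 74,
    ("c", "d"): 66, ("c", "e"): 70,
}.items()}

if __name__ == "__main__":
    print(merge_order(["a", "b", "c", "d", "e"], EXAMPLE_DIST))
```

With EXAMPLE_DIST, merge_order returns [("a", "b"), ("d", "e"), ("c", "ab"), ("de", "abc")], the same order as the example tree above.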

The summary results are the pairwise percentage sequence identities and the normalized alignment scores. The percentage identity is the number of identically matched residues divided by the length of the shorter sequence. The normalized alignment score is $s = 100.0\,S_{a,b} / [0.5\,(S_{a,a} + S_{b,b})]$; that is, the actual alignment score $S_{a,b}$ is divided by the average of the maximum possible alignment scores $S_{a,a}$ and $S_{b,b}$ of the individual sequences a and b, and expressed as a percentage.
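The two summary statistics can be sketched in Python as follows. The gap character "-" and the function names are assumptions for illustration; the identity is scaled by 100 to give a percentage, and sequence lengths are taken without gap characters.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Identical aligned residues / length of the shorter sequence, as a percentage."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != gap)
    # sequence lengths exclude gap characters inserted by the alignment
    shorter = min(len(aln_a.replace(gap, "")), len(aln_b.replace(gap, "")))
    return 100.0 * identical / shorter


def normalized_score(s_ab: float, s_aa: float, s_bb: float) -> float:
    """s = 100 * S(a,b) / (0.5 * (S(a,a) + S(b,b)))."""
    return 100.0 * s_ab / (0.5 * (s_aa + s_bb))
```

For example, percent_identity("AC-D", "AGYD") counts two identical positions over a shorter-sequence length of 3, giving about 66.7.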

files

matricesxx.h - scoring matrices (similarity) scaled between 0 and 100
malign.h - include file

see also

mformat(1), pir(1), sprof(1).

source

Written by MSJ; Fredman alignment as interpreted by D-F Feng.

level

This page describes the features of malign 3.0 and later. Copyright (c) 1990 Kram Enterprises, Inc.


