|
the command

    malign file_of_seqs

aligns the sequences in the input file file_of_seqs using the default matrix and gap penalty, and sends all intermediate alignments, the final alignment and summary statistics to the standard output, which is the screen unless redirected. similarly, in

    malign file_of_seqs -m1 -p45 -o > output

matrix 1 (the dayhoff 250 pam matrix) and a gap penalty of 45 are specified, the output of each of the pairwise alignments is requested, and the results are written to the file output. these options to malign are described below.

if no file_of_seqs is specified on the command line, malign prints a standard error message that gives a brief description of the input required and the default values assumed if no arguments besides file_of_seqs are given.

typically the output is directed to the format program mformat(1), which then displays the alignment. all intermediate alignments, or only the final alignment, may be displayed, for example at 50 residues per line:

    mformat 50 < output | more
the following options to malign are available:
-seg=#

-m#
    you have the option of choosing between ten different scoring matrices. if unspecified, the default matrix (0) is the birkbeck structure-based matrix [see: johnson & overington (1993). j. mol. biol., 232, in press]. the present matrix (birkbeck97) is derived from the analysis of 97 protein structure families (381 structures).

-p#
    you have the option of specifying the penalty to be assessed for the presence of any gaps in the alignment. any decimal value may be input. the default value for the default matrix can be obtained by typing malign without any arguments.

-o
    display all intermediate binary alignments.
>output
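as a rough illustration of how the options above combine on a command line, the following python sketch assembles an argument list for malign. the helper name malign_command and its parameter names are hypothetical and are not part of malign; only the -m, -p and -o options described above are used, and nothing is executed.

```python
import shlex


def malign_command(seq_file, matrix=None, gap_penalty=None, show_pairwise=False):
    """Build an argument list for malign (hypothetical helper, not part of malign)."""
    args = ["malign", seq_file]
    if matrix is not None:
        args.append(f"-m{matrix}")       # -m#: scoring matrix number
    if gap_penalty is not None:
        args.append(f"-p{gap_penalty}")  # -p#: gap penalty, any decimal value
    if show_pairwise:
        args.append("-o")                # -o: display intermediate binary alignments
    return args


# Reproduces the example invocation; the redirection is left to the shell.
cmd = malign_command("file_of_seqs", matrix=1, gap_penalty=45, show_pairwise=True)
print(shlex.join(cmd) + " > output")
```

options left as None are omitted, so malign falls back to its own defaults.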
all pairwise alignments are made, and a binary tree is constructed from the distances d between all pairs of sequences, as estimated from the normalized similarity scores s:

    d = 100.0 log s

where s is calculated as

    s = s{a,b} / (0.5 (s{a,a} + s{b,b}))

the multiple alignment is then built up as the tree is constructed from the matrix of distance scores. in general, the most similar sequences (smallest distances) are aligned first and the most distant sequences last. for a tree (((a b) c)(d e)), a and b would be aligned, then d and e, then c with the result of ab, and finally the result of de with abc.

the summary results are the pairwise percentage sequence identities and the normalized alignment scores. the percentage identity is calculated as the number of identically matched residues divided by the length of the shorter sequence. the normalized alignment score s is

    s = 100.0 s{a,b} / (0.5 (s{a,a} + s{b,b}))

that is, the actual alignment score is divided by the average of the maximum possible alignment scores for the individual sequences a and b.
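the quantities described above can be sketched in python as follows. this is a minimal illustration under stated assumptions, not malign's implementation: all function names are hypothetical, the logarithm is assumed to be natural (the text does not state the base), gaps are assumed to be written as "-", and ordering the merges by subtree height is only an illustrative rule that reproduces the example order for (((a b) c)(d e)).

```python
import math


def normalized_score(s_ab, s_aa, s_bb):
    """s{a,b} divided by the average of the two self-alignment scores."""
    return s_ab / (0.5 * (s_aa + s_bb))


def distance(s_ab, s_aa, s_bb):
    """d = 100.0 log s as written in the text (natural log is an assumption)."""
    return 100.0 * math.log(normalized_score(s_ab, s_aa, s_bb))


def percent_identity(row_a, row_b, gap="-"):
    """Identical matched residues over the length of the shorter sequence, as a percentage."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    matches = sum(1 for x, y in zip(row_a, row_b) if x == y and x != gap)
    shorter = min(len(row_a.replace(gap, "")), len(row_b.replace(gap, "")))
    return 100.0 * matches / shorter


def merge_order(tree):
    """List the pairwise merges of a nested-tuple binary tree, shallowest subtrees first."""
    steps = []  # entries are (subtree height, left label, right label)

    def visit(node):
        if isinstance(node, str):
            return node, 0
        left, h_left = visit(node[0])
        right, h_right = visit(node[1])
        height = max(h_left, h_right) + 1
        steps.append((height, left, right))
        return left + right, height

    visit(tree)
    steps.sort(key=lambda step: step[0])  # stable sort keeps traversal order within a height
    return [(left, right) for _, left, right in steps]


print(100.0 * normalized_score(50, 100, 100))    # normalized alignment score: 50.0
print(percent_identity("ACDEF", "AC-EY"))        # 3 matches / shorter length 4: 75.0
print(merge_order(((("a", "b"), "c"), ("d", "e"))))
# [('a', 'b'), ('d', 'e'), ('ab', 'c'), ('abc', 'de')]
```

the scores passed to normalized_score are made-up numbers chosen only to keep the arithmetic easy to follow.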
|