> I'm interested in whether amino-acid replacements observed
> by aligning proteins from diverged species occur in
> clusters within the 3-D structure of the protein.
>
> If a cluster is startlingly obvious, then it'll be clear to the
> eye. However, it would be nice to apply a rigorous test to detect
> whether it is occurring: my particular interest is to show
> that it is not occurring.
Hello Denis,
Let me just point out that alignments have their limitations: they
may or may not give biologically significant matches/mismatches, because
the results can be skewed by unequal representation of sequences. An
excellent paper by Mark Gerstein et al. might help you out, although it
approaches the problem a little differently.
Basically, their sequence alignment and analysis [of the globins, among
others, represented by 568 (!) sequences] reveal some startling things
that bear on the "clusters" you mentioned:
1) the volume of the protein core varies only slightly (~2.5 %), while
the variation at individual sites is much larger (~13 %). Comparing this
with the variation expected from random sequence changes with no
correlations between sites shows that the small variation observed may
simply be a manifestation of the statistical law of large numbers, and
need not reflect any compensating changes in, or global constraints upon,
protein sequences;
and,
2) that *****protein sequences are random heteropolymers***** [emphasis
mine - this might help you prove that "the clusters are not occurring" -
cf. above], which are edited only slightly by evolution, and that the
structural features of known protein folds can accommodate a wide range of
sequences.
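To make the law-of-large-numbers point in 1) concrete, here is a rough
Python sketch. It assumes, purely for illustration, that every core site
has the same mean volume and an independent ~13 % spread. The function
names, the Gaussian model, and the choice of 27 sites are my own
assumptions and not the calculation used in the paper.

```python
import math
import random
import statistics


def relative_sd_of_sum(site_cv: float, n_sites: int) -> float:
    """Relative SD of a sum of n_sites independent, identically
    distributed terms, each with relative SD site_cv."""
    return site_cv / math.sqrt(n_sites)


def simulate_total_cv(site_cv: float, n_sites: int,
                      n_sequences: int, seed: int = 0) -> float:
    """Monte Carlo check: draw independent site volumes for many
    hypothetical sequences and measure the relative SD of the total."""
    rng = random.Random(seed)
    mean_volume = 100.0  # arbitrary units; only the ratio matters
    totals = [
        sum(rng.gauss(mean_volume, site_cv * mean_volume)
            for _ in range(n_sites))
        for _ in range(n_sequences)
    ]
    return statistics.pstdev(totals) / statistics.mean(totals)


if __name__ == "__main__":
    # Hypothetical core of 27 sites, each varying by 13 %.
    print(relative_sd_of_sum(0.13, 27))
    print(simulate_total_cv(0.13, 27, n_sequences=2000))
```

With these made-up numbers the total varies by roughly 2.5 % even though
no site "compensates" for any other, which is the sense in which a small
overall variation need not imply correlated changes.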
For more info:
===========================================
M Gerstein, E Sonnhammer, C Chothia (1994).
"Volume Changes in Protein Evolution,"
J. Mol. Biol. 236: 1067-1078
===========================================
==========================================================================
A method to weight protein sequences to correct for unequal representation

Name of the archive: ProtEvol.tar.Z
ftp Address:         ftp://hyper.stanford.edu/pub/mbg/ProtEvol/
                     ftp://cele.mrc-lmb.cam.ac.uk/pub/ProtEvol/
WWW URL:             http://hyper.stanford.edu/~mbg/ftp/ProtEvol/

Contents of the archive:
  ProtEvol_Align/               Structure-based alignments of:
    ProtEvol_DHFR.seq             24 Dihydrofolate Reductase sequences
    ProtEvol_HB.seq               577 Globin sequences
    ProtEvol_PLAZ.seq             40 Plastocyanin-azurin sequences
  ProtEvol_Paper/               Text and figures of the paper:
  ProtEvol_StandardVolumes.txt  Standard Residue Volumes and Frequencies
  ProtEvol_TreeWgt/             Program for weighting aligned sequence to
                                correct for over-representation.
    treewgt.SGI                   Iris binary
    treewgt.SUN                   Sun binary
    treewgt.c                     Program source code
  etc.
==========================================================================
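If you just want a feel for what "weighting to correct for unequal
representation" does before fetching the archive, here is a minimal
Python sketch. Note that it is NOT the tree-based method implemented in
treewgt; it swaps in the simpler position-based scheme in the style of
Henikoff and Henikoff, and the function name and the toy alignment are
my own. Gaps are treated as just another symbol, which is a
simplification.

```python
from collections import Counter
from typing import List


def position_based_weights(alignment: List[str]) -> List[float]:
    """Position-based sequence weights for an aligned set of sequences.

    In each column, a sequence receives 1 / (k * n), where k is the
    number of distinct symbols in the column and n is the number of
    sequences sharing this sequence's symbol. The per-column scores are
    summed and normalised so that the weights add up to 1.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    weights = [0.0] * len(alignment)
    for col in range(length):
        column = [seq[col] for seq in alignment]
        counts = Counter(column)
        k = len(counts)
        for i, symbol in enumerate(column):
            weights[i] += 1.0 / (k * counts[symbol])

    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Toy alignment: three identical sequences and one divergent one.
    toy = ["ACDE", "ACDE", "ACDE", "GCDF"]
    for seq, w in zip(toy, position_based_weights(toy)):
        print(seq, round(w, 3))
```

The three identical sequences share their weight, so the divergent one
counts for more than any single copy of the over-represented sequence.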
I hope this helps you out!
Sincerely,
Querix Frio
g3421101@mucc.mahidol.ac.th
PS. How about some "sniglets" in our glossary? :) e.g.,
Stereoisomesmerism: The associated symptoms of slight dizziness and visual
disorientation arising from attempts to cross-eye stereo-view two
side-by-side panels in a journal which are actually different structures.
Usually arises from not reading the legend first :).