Re: Detecting clusters of amino-acid replacements

Enrique Jose Labadan Frio - SCBC - 3421101 (g3421101@mahidol.ac.th)
Thu, 8 Jun 1995 20:56:27 +0700 (GMT)

On Wed, 7 Jun 1995 dshields@biotech.bio.tcd.ie wrote:

> I'm interested in whether amino-acid replacements observed
> by aligning proteins from diverged species occur in
> clusters within the 3-D structure of the protein.
>
> If a cluster is startlingly obvious, then it'll be clear to the
> eye. However, it would be nice to apply a rigorous test to detect
> whether it is occurring: my particular interest is to show
> that it is not occurring.

Hello Denis,

Let me just point out that alignments have their limitations: the
matches/mismatches they give may or may not be biologically significant,
because the results can be skewed by unequal representation of sequences.
An excellent paper by Mark Gerstein et al. might help you out, although it
approaches the problem a little bit differently.

Basically, their results bearing on the "clusters" you mentioned reveal
some startling things after sequence alignment and analysis [of the
globins, among others, represented by 568 (!) sequences]:

1) the variation in the volume of the protein core is small (~2.5 %)
[the variation at individual sites is ~13 %]; comparing it with the
variation expected from random sequence changes with no correlations
between sites shows that the small variation observed may simply be a
manifestation of the statistical law of large numbers, and need not
reflect any compensating changes in, or global constraints upon,
protein sequences;

and,

2) that *****protein sequences are random heteropolymers*****, [emphasis
mine - this might help you prove that "the clusters are not occurring" -
cf. above] which are edited only slightly by evolution, and that the
structural features of known protein folds can accommodate a wide range of
sequences.
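
To make point 1 concrete, here is a minimal Python sketch (mine, not from
the paper) of the law-of-large-numbers argument. It assumes each core
site's volume varies independently, with a relative spread of ~13 % drawn
from a Gaussian; the mean site volume of 130 cubic Angstroms, the site
counts and the function name core_volume_cv are all hypothetical choices
made for illustration. Under those assumptions the relative variation of
the summed volume should fall off roughly as 13 % / sqrt(N), so a few
dozen uncorrelated sites would already bring it down to ~2.5 %.

```python
import random
import statistics


def core_volume_cv(n_sites, site_cv=0.13, n_sequences=2000,
                   mean_volume=130.0, seed=0):
    """Relative variation (std / mean) of the total core volume.

    Assumption: every site varies independently of the others, with a
    Gaussian spread of site_cv * mean_volume (no compensating changes).
    mean_volume is a hypothetical placeholder, not a value from the paper.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sequences):
        # One simulated "sequence": sum the volumes of its core sites.
        total = sum(rng.gauss(mean_volume, site_cv * mean_volume)
                    for _ in range(n_sites))
        totals.append(total)
    return statistics.pstdev(totals) / statistics.fmean(totals)


if __name__ == "__main__":
    for n in (1, 10, 27, 50):
        simulated = core_volume_cv(n)
        expected = 0.13 / n ** 0.5  # law-of-large-numbers prediction
        print(f"{n:3d} sites: simulated {simulated:.3f}, "
              f"predicted {expected:.3f}")
```

If real core volumes varied much less than this uncorrelated prediction,
that would hint at compensating changes; variation close to the
prediction is consistent with sites changing independently.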

For more info:

===========================================
M Gerstein, E Sonnhammer, C Chothia (1994).
"Volume Changes in Protein Evolution,"
J. Mol. Biol. 236: 1067-1078
===========================================

==========================================================================

A method to weight protein sequences to correct for unequal representation

Name of the archive: ProtEvol.tar.Z

ftp Address: ftp://hyper.stanford.edu/pub/mbg/ProtEvol/
             ftp://cele.mrc-lmb.cam.ac.uk/pub/ProtEvol/
WWW URL    : http://hyper.stanford.edu/~mbg/ftp/ProtEvol/

Contents of the archive:

ProtEvol_Align/ Structure-based alignments of:

ProtEvol_DHFR.seq 24 Dihydrofolate Reductase sequences
ProtEvol_HB.seq 577 Globin sequences
ProtEvol_PLAZ.seq 40 Plastocyanin-azurin sequences

ProtEvol_Paper/ Text and figures of the paper:

ProtEvol_StandardVolumes.txt
Standard Residue Volumes and Frequencies

ProtEvol_TreeWgt/ Program for weighting aligned sequence to
correct for over-representation.

treewgt.SGI Iris binary
treewgt.SUN Sun binary
treewgt.c Program source code

etc.
==================================================================
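
If you only want a feel for what weighting does before fetching the
archive, here is a rough Python sketch. Please note that it is NOT the
treewgt program listed above (I have not reproduced its tree-based
method); instead it swaps in the simpler position-based scheme of
Henikoff and Henikoff (1994), in which a residue in a column counts for
1 / (k * n), where k is the number of distinct residue types in that
column and n is the number of sequences sharing that residue. The
function name and the toy alignment are hypothetical, and gap characters
are simply treated as another residue type.

```python
from collections import Counter


def position_based_weights(alignment):
    """Position-based sequence weights, normalised to sum to 1.

    alignment: list of equal-length aligned sequences (strings).
    This is a stand-in illustration, not the treewgt algorithm.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    weights = [0.0] * len(alignment)
    for col in range(length):
        column = [seq[col] for seq in alignment]
        counts = Counter(column)   # occurrences of each residue type
        k = len(counts)            # distinct residue types in this column
        for i, residue in enumerate(column):
            # Residues shared by many sequences contribute less each.
            weights[i] += 1.0 / (k * counts[residue])

    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Three identical sequences and one divergent one (toy data).
    toy = ["ACDE", "ACDE", "ACDE", "GCDF"]
    for seq, w in zip(toy, position_based_weights(toy)):
        print(seq, round(w, 4))
```

The over-represented identical sequences each end up with a smaller
weight than the lone divergent one, which is the kind of correction you
want before counting replacements per site.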

I hope this helps you out!

Sincerely,

Querix Frio

g3421101@mucc.mahidol.ac.th

PS. How about some "snigglets" in our glossary? :) e.g.,

Steroisomesmerism: The associated symptoms of slight dizziness and visual
disorientation arising from attempts to cross-eye stereo-view two
side-by-side panels in a journal which are actually different structures.
Usually arises from not reading the legend first :).