From the size of the number above (how does it compare with the number of particles in the known universe?), it is apparent that only a fraction of possible primary structures exist (or have ever existed).
As a corollary, it is very unlikely that two proteins with similar amino acid sequences have evolved independently. Such similarities therefore indicate that the two proteins must be related and share a common ancestor. Related proteins are termed homologous.
Over evolutionary time-spans, proteins mutate: i.e. their primary structure becomes altered, generally by one amino acid at a time (although more drastic single modifications can also occur). Such alterations are caused by mutations in the genes (linear sequences of nucleotides) which encode them. The storage of genetic information, and how it is translated into protein primary structure, is covered in a later section of this course.
Not only point mutations (the substitution of one amino acid for another) occur; a protein sequence may lose some of its amino acids (deletion mutation) or have amino acids inserted (insertion mutation).
If two primary sequences are more than approximately 20% identical (making reasonable allowance for insertions and deletions) then they are assumed to be homologous.
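The 20% rule of thumb above can be checked with a short calculation. The following is a minimal sketch in Python, assuming the two sequences have already been aligned and that gaps are written as '-'. The function name and the choice to leave gap columns out of the count are assumptions made for illustration; other conventions for the denominator are also in use.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of compared (non-gap) columns that hold identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # insertion/deletion column: left out of the count (assumption)
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical aligned fragments (one-letter amino acid codes), not real proteins.
print(percent_identity("MKT-AYIAK", "MRTQAYLAK"))  # 6 of 8 compared columns match: 75.0
```

A value well above 20% for sequences of reasonable length would, by the criterion above, be taken to suggest homology.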
The fact that two sequences which are to be compared may be of different lengths, and the need to allow for deletions and insertions, makes the optimal alignment (that alignment which gives the closest match, i.e. the smallest number of differences) of the sequences a difficult task.
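One standard way of finding an optimal alignment is dynamic programming. The sketch below uses the Needleman-Wunsch global alignment method, which is not named in this text and is given only as an illustration. The scores for a match, a mismatch and a gap are arbitrary assumptions, and the example sequences are hypothetical.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        # Score for placing a[i-1] opposite b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # residue opposite residue
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = global_align("GATTACA", "GATACA")
print(best)    # 4: six matches (+6) and one gap (-2)
print(top)
print(bottom)
```

The table has (n+1) x (m+1) cells, so the work grows with the product of the two sequence lengths. Several alignments may share the best score; the traceback returns only one of them.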
Generally, a particular type of protein has the same, or a very similar sequence within one species of organism. However there are cases of polymorphism, where several different functional sequences exist for a given type of protein within the population.
If the differences between two homologous proteins are examined, a general tendency is observed for chemically similar amino acid residues to be found at the same position. The substitution, for example, of one acidic residue for another (e.g. Glu for Asp) is likely to be of less consequence to the interactions with nearby residues than would the substitution of Glu by Val, a hydrophobic residue.
This tendency is summarized in the Dayhoff matrix.
Mutations to dissimilar residues are more likely to lead to the 3D conformation being less stable, or even to the inability of the mutant polypeptide chain to ever fold. In such cases, the function of the protein is therefore impaired or disabled, which is likely to disadvantage the organism to some degree or other. The result is that such mutations tend to be lost from the population; they are selected against, while the 'neutral', or even advantageous, mutations persist (are 'fixed'). Of course, some mutations would be expected to be favourable, by altering the 3D structure such that it functions more efficiently.
Consequently, homologous proteins have similar 3D structures: the differences in primary structure do not result in a drastic rearrangement of the folded conformation. If they did, the mutations responsible would in most cases disappear from the population.
In the same way that different primary structures give rise to similar three-dimensional folds, different gene sequences can result in the same primary structures, as will become clear later. Thus, a relative scale of conservation of structure is as follows:
genes < protein primary structure < three-dimensional protein structure
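The first step of this scale can be illustrated ahead of the later section on the genetic code. The sketch below, in Python, uses a small subset of the standard genetic code (only the codons needed here) to show two different nucleotide sequences that encode the same peptide. The two "genes" are invented for illustration and do not come from any real organism.

```python
# Subset of the standard genetic code (DNA codons); deliberately incomplete.
CODON_TABLE = {
    "ATG": "Met",
    "GAA": "Glu", "GAG": "Glu",
    "GAT": "Asp", "GAC": "Asp",
    "GTT": "Val", "GTC": "Val", "GTA": "Val", "GTG": "Val",
    "AAA": "Lys", "AAG": "Lys",
}


def translate(dna: str) -> list:
    """Translate a DNA coding sequence, codon by codon, into residue names."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3)]


gene_1 = "ATGGAAGATGTTAAA"  # hypothetical sequence
gene_2 = "ATGGAGGACGTGAAG"  # hypothetical sequence, different third codon positions

differences = sum(x != y for x, y in zip(gene_1, gene_2))
print(differences)                              # 4 nucleotide differences
print(translate(gene_1))                        # ['Met', 'Glu', 'Asp', 'Val', 'Lys']
print(translate(gene_1) == translate(gene_2))   # True: same primary structure
```

Four of the fifteen nucleotides differ, yet the encoded primary structure is unchanged, which is why the gene sits at the least conserved end of the scale.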
The semiempirical Dayhoff matrix of similarity indices can be used in the alignment of two sequences, in order to detect homology. In such an alignment, the substitution of an amino acid by a 'dissimilar' residue incurs a larger penalty than the alignment of two 'similar' residues.
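The idea of similarity-weighted scoring can be sketched as follows. The values below are NOT the Dayhoff matrix: they are a deliberately crude stand-in, assumed for illustration, in which residues are grouped into a few chemical classes and a pair is scored as identical, similar (same class) or dissimilar. The class membership, the scores and the gap penalty are all assumptions.

```python
# Simplified chemical classes (one-letter codes); residues not listed (e.g. G, P, C, Y)
# are treated as similar only to themselves. This grouping is an assumption.
CLASSES = {
    "acidic": "DE",
    "basic": "KRH",
    "hydrophobic": "AVLIMFW",
    "polar": "STNQ",
}
CLASS_OF = {res: name for name, members in CLASSES.items() for res in members}


def pair_score(x: str, y: str) -> int:
    """Toy similarity index: 2 if identical, 1 if same class, -1 otherwise."""
    if x == y:
        return 2
    cx, cy = CLASS_OF.get(x), CLASS_OF.get(y)
    if cx is not None and cx == cy:
        return 1
    return -1


def alignment_score(aligned_a: str, aligned_b: str, gap_penalty: int = -2) -> int:
    """Sum the column scores of an existing alignment; '-' marks a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for x, y in zip(aligned_a, aligned_b):
        total += gap_penalty if "-" in (x, y) else pair_score(x, y)
    return total


print(pair_score("E", "D"))   # 1: both acidic, a 'similar' substitution
print(pair_score("E", "V"))   # -1: acidic against hydrophobic, 'dissimilar'
print(alignment_score("MKE-V", "MRDAV"))  # 2 + 1 + 1 - 2 + 2 = 4
```

A real similarity matrix assigns a separate value to every pair of amino acids, derived from observed substitutions in related proteins, rather than the three coarse levels used here.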
In the above case, a sub-sequence of the LDL-receptor is homologous to a sub-sequence of C9, and a different sub-sequence of the LDL-receptor is homologous to a sub-sequence of EGF. Strictly, two sequences, or two subsequences, may be either homologous, or they may be non-homologous; there are degrees of similarity between sequences, but no degrees of homology.
The bacterial and mammalian lines diverged from a common ancestor very early in evolution and this is seen as evidence for convergent evolution. On the other hand, for some proteins, similarities between mammalian sequences have been found to be less than with corresponding proteins from a different class (for example, the hormone relaxin). This is seen as evidence for the swapping of functional "microdomains" very early on in evolution, and challenges the current paradigm (Schwabe, C., 1986, Trends in Biochemical Sciences 11, 280-283).
In practice, primary structure is in fact more easily determined by interpreting a gene sequence of nucleotides (with reference to the genetic code), if it is known, than directly from a purified protein itself. The genetic code, and its translation, will be examined in a later section of the course.
The recent advances in recombinant DNA technology have led to an explosion in the number of gene sequences from many organisms. Analysis of these sequences to determine if any are homologous to sequences of known structure allows prediction of possible structure/functions.
The principles of sequence alignment apply not only to protein sequences but also to nucleotide
sequences of DNA and RNA. You should be aware of the aims of pairwise alignment, and of
multiple alignment (optimal alignment of more than 2 sequences); in addition you should
understand the principle of 'evolutionary distances' between homologous proteins, which can be
calculated from differences in their sequence. We shall be returning to the subject of sequence
analysis, and the databases involved, later on in the course.
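As a rough illustration of an 'evolutionary distance', the sketch below computes two simple measures from a pair of aligned sequences: the proportion of differing sites (often called the p-distance) and a Poisson-corrected distance, d = -ln(1 - p), which makes some allowance for repeated substitutions at the same site. These are only two of many possible estimates and are not necessarily the ones used later in the course. The function names and example sequences are assumptions.

```python
import math


def p_distance(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Proportion of compared (non-gap) columns at which the residues differ."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if gap not in (x, y)]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def poisson_distance(aligned_a: str, aligned_b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(aligned_a, aligned_b)
    if p >= 1.0:
        raise ValueError("sequences differ at every site; distance is undefined")
    return -math.log(1.0 - p)


# Hypothetical aligned fragments differing at 2 of 8 positions.
seq_a = "MKTAYIAK"
seq_b = "MRTAYLAK"
print(p_distance(seq_a, seq_b))                   # 0.25
print(round(poisson_distance(seq_a, seq_b), 4))   # 0.2877
```

The corrected distance is always at least as large as the observed proportion of differences, and the gap between the two grows as the sequences diverge.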
Last updated 28th Jan '96