Primary Structure Comparisons
Statistical Analysis of Protein Sequences.
The sequence composition analysis was done with SAPS, run on the NCSA Biology Workbench at the University of Illinois at Urbana-Champaign.
SAPS Algorithm Citation: Brendel, V., Bucher, P., Nourbakhsh, I.R., Blaisdell, B.E. & Karlin, S. (1992) "Methods and algorithms for statistical analysis of protein sequences" Proc. Natl. Acad. Sci. U.S.A. 89, 2002-2006.
Protein Sequences
INTERLEUKIN 4 (Protein data bank entry code 1rcb)
The human sequence has 129 amino acid residues.
1 HKCDITLQEI IKTLNSLTEQ KTLCTELTVT DIFAASKNTT EKETFCRAAT
51 VLRQFYSHHE KDTRCLGATA QQFHRHKQLI RFLKRDLRNL WGLAGLNSCP
101 VKEANQSTLE NFLERLKTIM REKYSKCSS
--------------------------------------------------------------------------------
INTERLEUKIN 2 MUTANT WITH CYS 125 REPLACED BY ALA (C125A) (Protein data bank entry code 3ink)
The human sequence with one mutation (C125A) has 133 amino acid residues.
1 APTSSSTKKT QLQLEHLLLD LQMILNGINN YKNPKLTRML TFKFYMPKKA
51 TELKHLQCLE EELKPLEEVL NLAQSKNFHL RPRDLISNIN VIVLELKGSE
101 TTFMCEYADE TATIVEFLNR WITFAQSIIS TLT
--------------------------------------------------------------------------------
GRANULOCYTE-MACROPHAGE COLONY-STIMULATING FACTOR (Protein data bank entry code 1gmf)
The human sequence has 127 amino acid residues.
1 APARSPSPST QPWEHVNAIQ EARRLLNLSR DTAAEMNETV EVISEMFDLQ
51 EPTCLQTRLE LYKQGLRGSL TKLKGPLTMM ASHYKQHCPP TPETSCATQI
101 ITFESFKENL KDFLLVIPFD CWEPVQE
--------------------------------------------------------------------------------
Compositional Analysis
The amino acid composition is very similar for the three proteins. One significant difference is the number of Prolines: GMCSF has the most (11) and IL4 has the fewest (only 1).
Summary of Amino Acid Composition Analysis
| Residue type | IL4 | IL2 | GMCSF |
|---|---|---|---|
| Gly | 3 (2.3%) | 2 (1.5%) | 3 (2.4%) |
| Ala | 8 (6.2%) | 6 (4.5%) | 8 (6.3%) |
| Pro | 1 (0.8%) | 5 (3.8%) | 11 (8.7%) |
| charged (KRED) | 35 (27.1%) | 30 (22.5%) | 29 (22.8%) |
| hydrophobic (LVIFMW) | 33 (25.6%) | 46 (34.6%) | 35 (27.5%) |
| hydrophilic (SCHNTY) | 49 (38.0%) | 44 (33.1%) | 41 (32.3%) |
All three proteins have very similar numbers of Glycines and Alanines. On the other hand, GMCSF has significantly more Prolines than the ILs, particularly IL4, which has only 1 Proline. The positions of these Prolines within the GMCSF structure are discussed in the Tertiary Structure Section. In summary, the three proteins have very similar proportions of charged and hydrophilic residues. The hydrophobic fraction of IL2 (34.6%) is roughly 7 to 9 percentage points higher than that of IL4 (25.6%) and GMCSF (27.5%).
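The counts in the composition summary can be approximated directly from the three sequences listed above. The following is a minimal sketch, not the SAPS implementation: the residue groupings are taken from the row labels of the summary table, the names `SEQUENCES`, `GROUPS` and `composition` are illustrative, and the output may differ slightly from SAPS if it groups or rounds differently.

```python
from collections import Counter

# Sequences transcribed from the listings above, with the spacing removed.
SEQUENCES = {
    "IL4": (
        "HKCDITLQEIIKTLNSLTEQKTLCTELTVTDIFAASKNTTEKETFCRAAT"
        "VLRQFYSHHEKDTRCLGATAQQFHRHKQLIRFLKRDLRNLWGLAGLNSCP"
        "VKEANQSTLENFLERLKTIMREKYSKCSS"
    ),
    "IL2": (
        "APTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKA"
        "TELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSE"
        "TTFMCEYADETATIVEFLNRWITFAQSIISTLT"
    ),
    "GMCSF": (
        "APARSPSPSTQPWEHVNAIQEARRLLNLSRDTAAEMNETVEVISEMFDLQ"
        "EPTCLQTRLELYKQGLRGSLTKLKGPLTMMASHYKQHCPPTPETSCATQI"
        "ITFESFKENLKDFLLVIPFDCWEPVQE"
    ),
}

# Residue groups as labelled in the summary table (an assumption about
# how the table rows were defined, not taken from the SAPS source).
GROUPS = {
    "Gly": "G",
    "Ala": "A",
    "Pro": "P",
    "charged (KRED)": "KRED",
    "hydrophobic (LVIFMW)": "LVIFMW",
    "hydrophilic (SCHNTY)": "SCHNTY",
}


def composition(seq):
    """Return {group label: (count, percent of sequence length)}."""
    counts = Counter(seq)
    result = {}
    for label, residues in GROUPS.items():
        n = sum(counts[r] for r in residues)
        result[label] = (n, 100.0 * n / len(seq))
    return result


if __name__ == "__main__":
    for name, seq in SEQUENCES.items():
        print(f"{name} ({len(seq)} residues)")
        for label, (n, pct) in composition(seq).items():
            print(f"  {label:<22} {n:3d} ({pct:.1f}%)")
```

Running the script prints one block per protein, which can be compared row by row with the summary table.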
The alignment was done with MSA, run on the NCSA Biology Workbench at the University of Illinois at Urbana-Champaign.
MSA Algorithm Citations:
1. D. Lipman, S. Altschul and J. Kececioglu, "A Tool for Multiple Sequence Alignment", Proc. Natl. Acad. Sci. USA 86 (1989) 4412-4415.
2. S. K. Gupta, J. Kececioglu, and A. A. Schaffer, "Improving the Practical Space and Time Efficiency of the Shortest-Paths Approach to Sum-of-Pairs Multiple Sequence Alignment", J. Computational Biology, to appear. [Note: Our original title was "Making the Shortest-Paths Approach to Sum-of-Pairs Multiple Sequence Alignment More Space Efficient in Practice" and an extended abstract with the original title will appear in Proc. 6th Annual Combinatorial Pattern Matching conference (CPM '95).]
The sequence alignment reveals several "conserved" residues: five hydrophobic residues (four Leucines and 1 Phenylalanine), 1 Glutamic acid and 1 Threonine. The "conserved" Leucines and Phenylalanine are probably buried inside the protein to form the hydrophobic core, whereas the Glutamic acid and Threonine are probably not buried and are likely to be on the surface. The location of these residues and their corresponding structural environment will be discussed in the Tertiary Structure Section.
| IL-4 | IL-2 | GMCSF |
|---|---|---|
| 27L | 25L | 26L |
| 60E | 61E | 51E |
| 66L | 66L | 55L |
| 90L | 85L | 73L |
| 108T | 113T | 102T |
| 112F | 117F | 106F |
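As a consistency check, the positions in the table above can be looked up in the three sequences to confirm that each one holds the stated residue. This is only a sketch of that lookup, assuming 1-based numbering from the first residue of each listing; it does not reproduce the MSA alignment itself, and the names `CONSERVED_ROWS` and `check_conserved` are illustrative.

```python
# Sequences transcribed from the listings above, with the spacing removed.
SEQUENCES = {
    "IL4": (
        "HKCDITLQEIIKTLNSLTEQKTLCTELTVTDIFAASKNTTEKETFCRAAT"
        "VLRQFYSHHEKDTRCLGATAQQFHRHKQLIRFLKRDLRNLWGLAGLNSCP"
        "VKEANQSTLENFLERLKTIMREKYSKCSS"
    ),
    "IL2": (
        "APTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKA"
        "TELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSE"
        "TTFMCEYADETATIVEFLNRWITFAQSIISTLT"
    ),
    "GMCSF": (
        "APARSPSPSTQPWEHVNAIQEARRLLNLSRDTAAEMNETVEVISEMFDLQ"
        "EPTCLQTRLELYKQGLRGSLTKLKGPLTMMASHYKQHCPPTPETSCATQI"
        "ITFESFKENLKDFLLVIPFDCWEPVQE"
    ),
}

# One row per aligned column: (residue, IL4 pos, IL2 pos, GMCSF pos),
# copied from the table of conserved residues. Positions are 1-based.
CONSERVED_ROWS = [
    ("L", 27, 25, 26),
    ("E", 60, 61, 51),
    ("L", 66, 66, 55),
    ("L", 90, 85, 73),
    ("T", 108, 113, 102),
    ("F", 112, 117, 106),
]


def check_conserved(sequences, rows):
    """Return a list of (protein, position, expected, found) mismatches."""
    mismatches = []
    for residue, *positions in rows:
        for name, pos in zip(("IL4", "IL2", "GMCSF"), positions):
            found = sequences[name][pos - 1]  # convert to 0-based index
            if found != residue:
                mismatches.append((name, pos, residue, found))
    return mismatches


if __name__ == "__main__":
    problems = check_conserved(SEQUENCES, CONSERVED_ROWS)
    if problems:
        for name, pos, expected, found in problems:
            print(f"{name} position {pos}: expected {expected}, found {found}")
    else:
        print("All tabulated positions match the listed sequences.")
```

An empty mismatch list means every tabulated position agrees with the sequence listings.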
Kingman Ng
October 25, 1996.