Primary Structure Comparisons


Statistical Analysis of Protein Sequences.

The sequence composition analysis was done with SAPS, run on the NCSA Biology Workbench at the University of Illinois at Urbana-Champaign.

SAPS Algorithm Citation: Brendel, V., Bucher, P., Nourbakhsh, I.R., Blaisdell, B.E. & Karlin, S. (1992) "Methods and algorithms for statistical analysis of protein sequences" Proc. Natl. Acad. Sci. U.S.A. 89, 2002-2006.


Protein Sequences

INTERLEUKIN 4 (Protein data bank entry code 1rcb)

The human sequence has 129 amino acid residues.

1 HKCDITLQEI IKTLNSLTEQ KTLCTELTVT DIFAASKNTT EKETFCRAAT

51 VLRQFYSHHE KDTRCLGATA QQFHRHKQLI RFLKRDLRNL WGLAGLNSCP

101 VKEANQSTLE NFLERLKTIM REKYSKCSS

--------------------------------------------------------------------------------

INTERLEUKIN 2 MUTANT WITH CYS 125 REPLACED BY ALA (C125A) (Protein data bank entry code 3ink)

The human sequence with one mutation (C125A) has 133 amino acid residues.

1 APTSSSTKKT QLQLEHLLLD LQMILNGINN YKNPKLTRML TFKFYMPKKA

51 TELKHLQCLE EELKPLEEVL NLAQSKNFHL RPRDLISNIN VIVLELKGSE

101 TTFMCEYADE TATIVEFLNR WITFAQSIIS TLT

--------------------------------------------------------------------------------

GRANULOCYTE-MACROPHAGE COLONY-STIMULATING FACTOR (Protein data bank entry code 1gmf)

The human sequence has 127 amino acid residues.

1 APARSPSPST QPWEHVNAIQ EARRLLNLSR DTAAEMNETV EVISEMFDLQ

51 EPTCLQTRLE LYKQGLRGSL TKLKGPLTMM ASHYKQHCPP TPETSCATQI

101 ITFESFKENL KDFLLVIPFD CWEPVQE

--------------------------------------------------------------------------------

Compositional Analysis


The amino acid composition is very similar for the three proteins. The most significant difference is the number of prolines: GMCSF has the most (11) and IL4 has the fewest (only 1).


Summary of Amino Acid Composition Analysis

                        IL4           IL2           GMCSF
Gly                     3  (2.3%)     2  (1.5%)     3  (2.4%)
Ala                     8  (6.2%)     6  (4.5%)     8  (6.3%)
Pro                     1  (0.8%)     5  (3.8%)     11 (8.7%)
charged (KRED)          35 (27.1%)    30 (22.5%)    29 (22.8%)
hydrophobic (LVIFMW)    33 (25.6%)    46 (34.6%)    35 (27.5%)
hydrophilic (SCHNTY)    49 (38.0%)    44 (33.1%)    41 (32.3%)

All three proteins have very similar numbers of glycines and alanines. GMCSF, on the other hand, has significantly more prolines (11) than the interleukins, particularly IL4, which has only 1. The positions of these prolines within the GMCSF structure are discussed in the Tertiary Structure Section. In summary, the three proteins have similar proportions of charged and hydrophilic residues, while the hydrophobic fraction of IL2 (34.6%) is roughly 7 to 9 percentage points higher than that of IL4 (25.6%) and GMCSF (27.5%).
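The counts in the summary table can be rechecked directly from the three sequences listed above. The following is a minimal sketch in Python, not the SAPS program itself: the names SEQUENCES, GROUPS and count_group are illustrative assumptions, and the residue groups are simply the ones named in the table.

```python
# Sequences copied from the Protein Sequences section,
# with position numbers and spaces removed.
SEQUENCES = {
    "IL4": (
        "HKCDITLQEIIKTLNSLTEQKTLCTELTVTDIFAASKNTTEKETFCRAAT"
        "VLRQFYSHHEKDTRCLGATAQQFHRHKQLIRFLKRDLRNLWGLAGLNSCP"
        "VKEANQSTLENFLERLKTIMREKYSKCSS"
    ),
    "IL2": (
        "APTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKA"
        "TELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSE"
        "TTFMCEYADETATIVEFLNRWITFAQSIISTLT"
    ),
    "GMCSF": (
        "APARSPSPSTQPWEHVNAIQEARRLLNLSRDTAAEMNETVEVISEMFDLQ"
        "EPTCLQTRLELYKQGLRGSLTKLKGPLTMMASHYKQHCPPTPETSCATQI"
        "ITFESFKENLKDFLLVIPFDCWEPVQE"
    ),
}

# Residue groups as named in the summary table (one-letter codes).
GROUPS = {
    "Gly": "G",
    "Ala": "A",
    "Pro": "P",
    "charged (KRED)": "KRED",
    "hydrophobic (LVIFMW)": "LVIFMW",
    "hydrophilic (SCHNTY)": "SCHNTY",
}


def count_group(sequence, residues):
    """Return (count, percent of sequence length) for the given residue set."""
    count = sum(1 for aa in sequence if aa in residues)
    return count, 100.0 * count / len(sequence)


if __name__ == "__main__":
    for label, residues in GROUPS.items():
        cells = []
        for name, seq in SEQUENCES.items():
            count, percent = count_group(seq, residues)
            cells.append(f"{name}: {count} ({percent:.1f}%)")
        print(f"{label:22s}" + "   ".join(cells))
```

Because this is a plain residue counter, percentages may differ from the SAPS output in the last decimal place depending on how rounding is done.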


Sequence Alignment

The alignment was done with MSA, run on the NCSA Biology Workbench at the University of Illinois at Urbana-Champaign.

MSA Algorithm Citation:

1. D. Lipman, S. Altschul and J. Kececioglu, "A Tool for Multiple Sequence Alignment", Proc. Natl. Acad. Sci. USA 86 (1989) 4412-4415.

2. S. K. Gupta, J. Kececioglu, and A. A. Schaffer, "Improving the Practical Space and Time Efficiency of the Shortest-Paths Approach to Sum-of-Pairs Multiple Sequence Alignment", J. Computational Biology, to appear. [Note: Our original title was "Making the Shortest-Paths Approach to Sum-of-Pairs Multiple Sequence Alignment More Space Efficient in Practice" and an extended abstract with the original title will appear in Proc. 6th Annual Combinatorial Pattern Matching conference (CPM '95).]

The sequence alignment reveals several "conserved" residues: four hydrophobic residues (three leucines and one phenylalanine), one glutamic acid and one threonine. The "conserved" leucines and phenylalanine are probably buried inside the protein to form the hydrophobic core, whereas the glutamic acid and threonine are probably not buried and are likely to be on the surface. The location of these residues and their corresponding structural environment will be discussed in the Tertiary Structure Section.

IL-4      IL-2      GMCSF
27L       25L       26L
60E       61E       51E
66L       66L       55L
90L       85L       73L
108T      113T      102T
112F      117F      106F
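The positions listed above can be checked against the sequences given in the Protein Sequences section. The sketch below is only a consistency check, not part of the MSA program: it assumes 1-based numbering from the first residue of each listed sequence, and the names CONSERVED_ROWS and mismatches are illustrative.

```python
# Sequences copied from the Protein Sequences section,
# with position numbers and spaces removed.
SEQUENCES = {
    "IL4": (
        "HKCDITLQEIIKTLNSLTEQKTLCTELTVTDIFAASKNTTEKETFCRAAT"
        "VLRQFYSHHEKDTRCLGATAQQFHRHKQLIRFLKRDLRNLWGLAGLNSCP"
        "VKEANQSTLENFLERLKTIMREKYSKCSS"
    ),
    "IL2": (
        "APTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKA"
        "TELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSE"
        "TTFMCEYADETATIVEFLNRWITFAQSIISTLT"
    ),
    "GMCSF": (
        "APARSPSPSTQPWEHVNAIQEARRLLNLSRDTAAEMNETVEVISEMFDLQ"
        "EPTCLQTRLELYKQGLRGSLTKLKGPLTMMASHYKQHCPPTPETSCATQI"
        "ITFESFKENLKDFLLVIPFDCWEPVQE"
    ),
}

# One row per aligned column: residue letter, then the 1-based
# position of that residue in IL4, IL2 and GMCSF.
CONSERVED_ROWS = [
    ("L", 27, 25, 26),
    ("E", 60, 61, 51),
    ("L", 66, 66, 55),
    ("L", 90, 85, 73),
    ("T", 108, 113, 102),
    ("F", 112, 117, 106),
]


def mismatches(sequences, rows):
    """Return (protein, position, expected, found) for every row entry
    whose residue in the sequence differs from the tabulated letter."""
    names = ("IL4", "IL2", "GMCSF")
    bad = []
    for residue, *positions in rows:
        for name, pos in zip(names, positions):
            found = sequences[name][pos - 1]  # convert to 0-based index
            if found != residue:
                bad.append((name, pos, residue, found))
    return bad


if __name__ == "__main__":
    problems = mismatches(SEQUENCES, CONSERVED_ROWS)
    print("all rows consistent" if not problems else problems)
```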




Kingman Ng  

October 25, 1996.