See also the comprehensive resource put together by Ethan Benatan and Cornelius Krasel for Assignment 2.
A Tutorial on the Amino Acids, by Sami Raza
A set from Alan Ward, Dept. of Microbiology, University of Newcastle upon Tyne, UK
Adenosine triphosphate (ATP) is particularly important, being the main biochemical energy storage compound. It enters into many enzymic reactions to provide energy, which usually comes from the cleavage of a phosphate moiety to produce ADP.
A, G, C, T (or U in RNA) are used as single-letter abbreviations for these bases, especially in the sequences of DNA (and RNA).
Alanine         Ala   A
Cysteine        Cys   C
Aspartic AciD   Asp   D
Glutamic Acid   Glu   E
Phenylalanine   Phe   F
Glycine         Gly   G
Histidine       His   H
Isoleucine      Ile   I
Lysine          Lys   K
Leucine         Leu   L
Methionine      Met   M
AsparagiNe      Asn   N
Proline         Pro   P
Glutamine       Gln   Q
ARginine        Arg   R
Serine          Ser   S
Threonine       Thr   T
Valine          Val   V
Tryptophan      Trp   W
TYrosine        Tyr   Y
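The list above maps directly onto a lookup structure. The following is a minimal sketch in Python; the names `THREE_TO_ONE` and `three_to_one` are made up for illustration, and the input is assumed to be an unbroken run of three-letter codes.

```python
# Three-letter to one-letter amino acid codes, copied from the list above.
THREE_TO_ONE = {
    "Ala": "A", "Cys": "C", "Asp": "D", "Glu": "E", "Phe": "F",
    "Gly": "G", "His": "H", "Ile": "I", "Lys": "K", "Leu": "L",
    "Met": "M", "Asn": "N", "Pro": "P", "Gln": "Q", "Arg": "R",
    "Ser": "S", "Thr": "T", "Val": "V", "Trp": "W", "Tyr": "Y",
}


def three_to_one(protein: str) -> str:
    """Convert a run of three-letter codes (e.g. 'MetLysTrp') to one-letter codes."""
    if len(protein) % 3 != 0:
        raise ValueError("length must be a multiple of three")
    codes = [protein[i:i + 3] for i in range(0, len(protein), 3)]
    # An unknown code raises KeyError, which is the simplest way to flag bad input.
    return "".join(THREE_TO_ONE[code] for code in codes)


print(three_to_one("MetLysTrp"))  # MKW
```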
First                 Second Position                Third
Position   ------------------------------------     Position
           |  U(T)      C        A        G   |
U(T)          Phe      Ser      Tyr      Cys        U(T)
              Phe      Ser      Tyr      Cys        C
              Leu      Ser      STOP     STOP       A
              Leu      Ser      STOP     Trp        G
C             Leu      Pro      His      Arg        U(T)
              Leu      Pro      His      Arg        C
              Leu      Pro      Gln      Arg        A
              Leu      Pro      Gln      Arg        G
A             Ile      Thr      Asn      Ser        U(T)
              Ile      Thr      Asn      Ser        C
              Ile      Thr      Lys      Arg        A
              Met      Thr      Lys      Arg        G
G             Val      Ala      Asp      Gly        U(T)
              Val      Ala      Asp      Gly        C
              Val      Ala      Glu      Gly        A
              Val      Ala      Glu      Gly        G
Note that in most cases the first two bases are enough to specify the amino acid, with the third (or wobble) base playing only a minor role.
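The role of the third base can be checked against the table directly. The sketch below is illustrative only: it encodes the table as a 64-character string in U, C, A, G order (one-letter amino acid codes, `*` for STOP) and lists the first-two-base pairs for which the third base makes no difference. The names `CODON_TABLE` and `fully_determined_pairs` are made up for this example.

```python
from itertools import product

BASES = "UCAG"  # same ordering as the rows and columns of the table above
# One-letter amino acid codes in table order; '*' marks a STOP codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def fully_determined_pairs():
    """Return the first-two-base pairs whose four codons all code alike."""
    pairs = []
    for first, second in product(BASES, repeat=2):
        meanings = {CODON_TABLE[first + second + third] for third in BASES}
        if len(meanings) == 1:
            pairs.append(first + second)
    return pairs


pairs = fully_determined_pairs()
print(len(pairs), "of 16 pairs ignore the third base:", " ".join(pairs))
```

In most of the remaining pairs the third base only distinguishes U or C from A or G; the exceptions in this table are UG (Cys, STOP, Trp) and AU (Ile, Met).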
Note also the STOP codons, which cause termination of translation by the ribosome.
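Termination at a STOP codon is easy to mimic in code. This is a minimal sketch, assuming a DNA (or RNA) string that is read from its first base and contains only the four standard bases; `translate` is a hypothetical helper, not a function from any particular package.

```python
from itertools import product

BASES = "TCAG"  # table order, written with T for DNA
# One-letter amino acid codes in table order; '*' marks a STOP codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str) -> str:
    """Translate from the first base, stopping at the first STOP codon."""
    dna = seq.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]  # KeyError on non-ACGT characters
        if amino_acid == "*":
            break  # translation terminates here; later codons are ignored
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCTTGGTAAGGC"))  # MAW (GGC after the TAA stop is not read)
```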
Different organisms show different statistical preferences in triplet codon usage, and also use the amino acids in widely varying proportions.
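Codon usage of this kind is simply a tally of triplets in known coding sequence. A minimal sketch follows, assuming the input is already in the correct reading frame; the function names and the short test sequence are invented for illustration and do not come from any real organism.

```python
from collections import Counter


def codon_usage(coding_dna: str) -> Counter:
    """Count each complete triplet, reading from the first base."""
    seq = coding_dna.upper()
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))


def relative_usage(counts: Counter) -> dict:
    """Express each codon count as a fraction of all codons counted."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


usage = codon_usage("ATGCTGCTGTTACTGAAA")  # made-up sequence, illustration only
for codon, n in usage.most_common():
    print(codon, n)
```

Here four of the six codons are leucine codons, and three of those four are CTG rather than TTA; a bias of that sort, measured over many genes, is what a codon usage table records.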
However, you should appreciate that there are three possible reading frames, each one base out of step with the others, and each may give a believable stretch of protein sequence. Thus:
5'-acacggctgaccgatgctagaccccatagtcgcgctatatgctcgaacttgttaa-3'
may code for
5'-acacggctgaccgatgctagaccccatagtcgcgctatatgctcgaacttgttaa-3'
ThrArgLeuThrAspAlaArgProHisSerArgAlaIleCysSerAsnLeuLeu
or
5'-acacggctgaccgatgctagaccccatagtcgcgctatatgctcgaacttgttaa-3'
 HisGlySTPProMetLeuAspProIleValAlaLeuTyrAlaArgThrCysSTP
or
5'-acacggctgaccgatgctagaccccatagtcgcgctatatgctcgaacttgttaa-3'
  ThrAlaAspArgCysSTPThrProSTPSerArgTyrMetLeuGluLeuVal
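The three forward frames can be produced mechanically from the codon table. The sketch below is one way to do it, using the same three-letter codes and STP notation as above; `translate_frame` and the constant names are made up for this example. Each translation is printed indented by its frame offset so that every amino acid sits under its own codon.

```python
from itertools import product

BASES = "TCAG"  # table order, written with T for DNA
# One-letter amino acid codes in table order; '*' marks a STOP codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe", "G": "Gly",
    "H": "His", "I": "Ile", "K": "Lys", "L": "Leu", "M": "Met", "N": "Asn",
    "P": "Pro", "Q": "Gln", "R": "Arg", "S": "Ser", "T": "Thr", "V": "Val",
    "W": "Trp", "Y": "Tyr", "*": "STP",
}


def translate_frame(dna: str, offset: int) -> str:
    """Translate every complete codon starting at base `offset` (0, 1 or 2)."""
    seq = dna.upper()
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(THREE_LETTER[CODON_TABLE[codon]] for codon in codons)


SEQ = "acacggctgaccgatgctagaccccatagtcgcgctatatgctcgaacttgttaa"
print(SEQ)
for offset in range(3):
    print(" " * offset + translate_frame(SEQ, offset))
```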
Indeed, if this just happens to be the complementary strand, rather than the coding strand, then there are another three reading frames, making six in all.
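The other three frames are read from the reverse complement of the sequence. A minimal sketch, assuming plain A, C, G, T input: `reverse_complement` and `six_frames` are hypothetical helper names, and the frame labels (+1 to +3, -1 to -3) are just one common convention.

```python
from itertools import product

BASES = "TCAG"  # table order, written with T for DNA
# One-letter amino acid codes in table order; '*' marks a STOP codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def six_frames(dna: str) -> dict:
    """Translate all three frames on both strands, keyed '+1'..'+3', '-1'..'-3'."""
    frames = {}
    for label, strand in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            codons = (strand[i:i + 3] for i in range(offset, len(strand) - 2, 3))
            frames[f"{label}{offset + 1}"] = "".join(CODON_TABLE[c] for c in codons)
    return frames


for name, protein in six_frames("ATGAAA").items():
    print(name, protein)
```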
Notice that only one of the sequences shown has NO STOP CODONS. This MAY indicate that it's a coding sequence. Such a stop-free stretch is called an Open Reading Frame (ORF).
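Checking a frame for stop codons needs nothing more than the three STOP triplets from the table (TAA, TAG and TGA in DNA). The following sketch reports, for each forward frame of the example sequence, where any stops fall; `stop_positions` is a made-up helper name and positions are counted from zero.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # UAA, UAG, UGA in the RNA table


def stop_positions(dna: str, offset: int) -> list:
    """Return the 0-based start index of every stop codon in the given frame."""
    seq = dna.upper()
    return [i for i in range(offset, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]


SEQ = "acacggctgaccgatgctagaccccatagtcgcgctatatgctcgaacttgttaa"
for offset in range(3):
    stops = stop_positions(SEQ, offset)
    status = "open" if not stops else f"stops at {stops}"
    print(f"frame {offset + 1}: {status}")
```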
There are programs in the Staden package and elsewhere that can use clues like this, and other more sophisticated statistical measures, to find coding stretches in DNA sequences. When such stretches are first found there's usually considerable doubt about which gene, if any, they belong to.
They are referred to as Unidentified Reading Frames (URFs).
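As a rough illustration of the simplest such clue, the sketch below scans all six frames for stop-free stretches of at least a chosen number of codons. It is a naive example only, not how the Staden programs work: it uses no statistical measures, does not require a start codon, and the name `open_stretches` and the `min_codons` threshold are assumptions made for this example.

```python
from itertools import product

BASES = "TCAG"  # table order, written with T for DNA
# One-letter amino acid codes in table order; '*' marks a STOP codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def open_stretches(dna: str, min_codons: int = 10) -> list:
    """List (strand, frame, first_codon_index, peptide) for long stop-free runs."""
    forward = dna.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    results = []
    for strand_label, strand in (("+", forward), ("-", reverse)):
        for offset in range(3):
            protein = "".join(
                CODON_TABLE[strand[i:i + 3]]
                for i in range(offset, len(strand) - 2, 3)
            )
            start = 0  # codon index, within this frame, of the current piece
            for piece in protein.split("*"):
                if len(piece) >= min_codons:
                    results.append((strand_label, offset + 1, start, piece))
                start += len(piece) + 1  # skip the piece and the stop after it
    return results


SEQ = "acacggctgaccgatgctagaccccatagtcgcgctatatgctcgaacttgttaa"
for strand, frame, start, peptide in open_stretches(SEQ, min_codons=15):
    print(strand, frame, start, peptide)
```

On a sequence this short, stop-free stretches can turn up on either strand purely by chance, which is why an open frame alone is only weak evidence of a gene.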