The relationship between the codons of nucleic acids and the amino acids for which they code is embodied in the Genetic Code (which is NOT universal, since slight variations on it are found in mitochondria and chloroplasts). The 64 possible triplets of bases in a codon, and the amino acid each one codes for, are shown in this table :-
First                Second Position              Third
Position   |   U(T)     C       A       G    |    Position
-----------------------------------------------------------
U(T)           Phe     Ser     Tyr     Cys        U(T)
               Phe     Ser     Tyr     Cys        C
               Leu     Ser     STOP    STOP       A
               Leu     Ser     STOP    Trp        G

C              Leu     Pro     His     Arg        U(T)
               Leu     Pro     His     Arg        C
               Leu     Pro     Gln     Arg        A
               Leu     Pro     Gln     Arg        G

A              Ile     Thr     Asn     Ser        U(T)
               Ile     Thr     Asn     Ser        C
               Ile     Thr     Lys     Arg        A
               Met     Thr     Lys     Arg        G

G              Val     Ala     Asp     Gly        U(T)
               Val     Ala     Asp     Gly        C
               Val     Ala     Glu     Gly        A
               Val     Ala     Glu     Gly        G
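Note that much of the coding is done by the first two bases. For eight of the sixteen possible first-two-base combinations the third (or wobble) base makes no difference at all, and for most of the rest it matters only whether it is a purine (A or G) or a pyrimidine (U or C).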
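The role of the third base can be checked directly against the table. The following is only a sketch: it assumes the standard code written out in T, C, A, G order, and the names `CODON_TABLE` and `third_base_irrelevant` are invented for this illustration. It counts the first-two-base combinations for which the third base makes no difference.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated TTT, TTC, TTA, TTG, TCT, ...
# (DNA alphabet; read T as U for RNA). STP marks a STOP codon.
AMINO = (
    "Phe Phe Leu Leu Ser Ser Ser Ser Tyr Tyr STP STP Cys Cys STP Trp "
    "Leu Leu Leu Leu Pro Pro Pro Pro His His Gln Gln Arg Arg Arg Arg "
    "Ile Ile Ile Met Thr Thr Thr Thr Asn Asn Lys Lys Ser Ser Arg Arg "
    "Val Val Val Val Ala Ala Ala Ala Asp Asp Glu Glu Gly Gly Gly Gly"
).split()
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def third_base_irrelevant(prefix):
    """True if all four codons sharing this two-base prefix code alike."""
    return len({CODON_TABLE[prefix + b] for b in BASES}) == 1


prefixes = ["".join(p) for p in product(BASES, repeat=2)]
fixed = [p for p in prefixes if third_base_irrelevant(p)]
print(len(fixed), "of", len(prefixes), "prefixes:", " ".join(fixed))
```

In the remaining prefixes the third base usually only separates pyrimidine (T, C) from purine (A, G); the exceptions are AT (Ile/Met) and TG (Cys/STOP/Trp).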
Note also the STOP codons, which cause termination of translation by the ribosome.
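As a rough illustration of reading the table, and of what a STOP codon does, here is a minimal sketch that translates a DNA string three bases at a time and gives up at the first STOP. The function name `translate` and the short test sequence are made up for this example; real translation also depends on where the ribosome starts, which is ignored here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO = (
    "Phe Phe Leu Leu Ser Ser Ser Ser Tyr Tyr STP STP Cys Cys STP Trp "
    "Leu Leu Leu Leu Pro Pro Pro Pro His His Gln Gln Arg Arg Arg Arg "
    "Ile Ile Ile Met Thr Thr Thr Thr Asn Asn Lys Lys Ser Ser Arg Arg "
    "Val Val Val Val Ala Ala Ala Ala Asp Asp Glu Glu Gly Gly Gly Gly"
).split()
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate from the first base, stopping at the first STOP codon."""
    dna = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    residues = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "STP":
            break  # translation terminates here
        residues.append(aa)
    return residues


# Hypothetical 15-base sequence: ATG GCT TGG TAA GGG
print(translate("ATGGCTTGGTAAGGG"))  # ['Met', 'Ala', 'Trp']
```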
Different organisms show different statistical preferences in their use of triplet codons, and also use the amino acids in widely varying proportions. See 'Of URFs and ORFs' by Russell Doolittle, University Science Books (1986), ISBN 0-935702-54-7.
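Codon preference is easy to tabulate once a coding sequence is in hand. The sketch below simply counts triplets in frame; the function name `codon_usage` and the 18-base sequence (six leucine codons) are invented for illustration and are not taken from any real organism.

```python
from collections import Counter


def codon_usage(coding_dna):
    """Count how often each triplet occurs, reading in frame from base 1."""
    dna = coding_dna.upper()
    return Counter(dna[i:i + 3] for i in range(0, len(dna) - 2, 3))


# Made-up sequence: CTG CTG CTT TTA CTG CTC, all of which code for Leu
usage = codon_usage("CTGCTGCTTTTACTGCTC")
total = sum(usage.values())
for codon, n in usage.most_common():
    print(codon, n, f"{n / total:.2f}")
```

Here all six codons give the same amino acid, yet CTG is used three times and the others once each; that sort of bias is what is meant by a codon preference.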
A piece of DNA sequence may or may not code for a piece of a protein, depending on whether it's part of a gene. If we obtain a stretch of sequence experimentally from genomic DNA, then we can try to guess what it might possibly code for by using the Genetic Code to convert from bases to amino acids.
However, you should appreciate that there are three possible reading frames which may be used, each one base out of step with the others, each of which may give a believable stretch of protein sequence, thus :-
5'-acacggctgaccgatgctagaccccatagtcgcgctatatgctcgaacttgttaa-3'

may code for

5'-acacggctgaccgatgctagaccccatagtcgcgctatatgctcgaacttgttaa-3'
   ThrArgLeuThrAspAlaArgProHisSerArgAlaIleCysSerAsnLeuLeu

or

5'-acacggctgaccgatgctagaccccatagtcgcgctatatgctcgaacttgttaa-3'
    HisGlySTPProMetLeuAspProIleValAlaLeuTyrAlaArgThrCysSTP

or

5'-acacggctgaccgatgctagaccccatagtcgcgctatatgctcgaacttgttaa-3'
     ThrAlaAspArgCysSTPThrProSTPSerArgTyrMetLeuGluLeuVal
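The three frames can be generated mechanically by starting the translation at the first, second or third base. The following sketch assumes the standard code; `translate_frame` is a name made up for this illustration. It prints one translation per frame for the example sequence, writing STOP codons as STP.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO = (
    "Phe Phe Leu Leu Ser Ser Ser Ser Tyr Tyr STP STP Cys Cys STP Trp "
    "Leu Leu Leu Leu Pro Pro Pro Pro His His Gln Gln Arg Arg Arg Arg "
    "Ile Ile Ile Met Thr Thr Thr Thr Asn Asn Lys Lys Ser Ser Arg Arg "
    "Val Val Val Val Ala Ala Ala Ala Asp Asp Glu Glu Gly Gly Gly Gly"
).split()
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

SEQ = "acacggctgaccgatgctagaccccatagtcgcgctatatgctcgaacttgttaa"


def translate_frame(seq, offset):
    """Translate every complete codon, starting `offset` bases in (0, 1 or 2)."""
    dna = seq.upper()
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(offset, len(dna) - 2, 3)
    )


for offset in range(3):
    print("frame", offset + 1, translate_frame(SEQ, offset))
```

Any one or two bases left over at the 3' end are simply ignored, since they don't make up a whole codon.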
Indeed, if this just happens to be the complementary strand, rather than the coding strand, then there are another three reading frames, making six in all.
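To get the other three frames, the sequence is first reverse-complemented so that it again reads 5' to 3', and then translated as before. This is a sketch under the same assumptions as the table above (standard code, plain A/C/G/T letters only); `reverse_complement`, `translate_frame` and `six_frames` are hypothetical helper names.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO = (
    "Phe Phe Leu Leu Ser Ser Ser Ser Tyr Tyr STP STP Cys Cys STP Trp "
    "Leu Leu Leu Leu Pro Pro Pro Pro His His Gln Gln Arg Arg Arg Arg "
    "Ile Ile Ile Met Thr Thr Thr Thr Asn Asn Lys Lys Ser Ser Arg Arg "
    "Val Val Val Val Ala Ala Ala Ala Asp Asp Glu Glu Gly Gly Gly Gly"
).split()
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse, so the result reads 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate_frame(dna, offset):
    """Translate every complete codon, starting `offset` bases in."""
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(offset, len(dna) - 2, 3)
    )


def six_frames(seq):
    """Return {(strand, frame): translation} for all six reading frames."""
    frames = {}
    for strand, dna in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[(strand, offset + 1)] = translate_frame(dna, offset)
    return frames


SEQ = "acacggctgaccgatgctagaccccatagtcgcgctatatgctcgaacttgttaa"
for (strand, frame), protein in six_frames(SEQ).items():
    print(strand, frame, protein)
```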
Notice that only one of the sequences shown has NO STOP CODONS - this MAY indicate it's a coding sequence. It's called an Open Reading Frame (ORF).
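The simplest version of this clue is just a search for STOP codons in each frame. The sketch below is not how the Staden programs work, merely an illustration of the idea; `stop_positions` is an invented name and positions are counted from 0 at the 5' end.

```python
STOPS = {"TAA", "TAG", "TGA"}  # the three STOP codons of the standard code

SEQ = "acacggctgaccgatgctagaccccatagtcgcgctatatgctcgaacttgttaa"


def stop_positions(seq, offset):
    """Start positions (0-based) of in-frame STOP codons for one frame."""
    dna = seq.upper()
    return [
        i for i in range(offset, len(dna) - 2, 3) if dna[i:i + 3] in STOPS
    ]


for offset in range(3):
    stops = stop_positions(SEQ, offset)
    status = "open" if not stops else f"STOP at {stops}"
    print("frame", offset + 1, status)
```

On a stretch as short as this, an open frame is weak evidence at best, since a random sequence of 18 codons will quite often contain no STOP purely by chance.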
There are programs in the Staden package and elsewhere that can use clues like this, and other more sophisticated statistical measures, to find coding stretches in DNA sequences. When such stretches are first found there's usually considerable doubt about which gene, if any, they belong to.
They are referred to as Unidentified Reading Frames (URFs).
Last updated 11th Nov '96