Re: A simple question about the protein assignment.

Peter Murray-Rust (p.murray-rust@mail.cryst.bbk.ac.uk)
Sat, 16 Mar 1996 10:13:07 +0000 (GMT)

On Sat, 16 Mar 1996, Ju-Seog Lee and Terry wrote about a 'simple' problem.

>
> >> Hi Peter
> >>
> >> I don't know of an amino acid abbreviated PCA, but this is a common
> >> abbreviation for "perchloric acid" or "perchlorate anion" which could be a
> >> counterion in the crystal. It would most probably be complexed to the amino
> >> terminus where there is a positively charged group to neutralise.

This is an intelligent suggestion but not correct. (BTW, I AM VERY GLAD
TO SEE COURSE MEMBERS HYPOTHESISING IN PUBLIC LIKE THIS. IT HAS ALWAYS
BEEN A KEY PART OF THE COURSE SPIRIT THAT WE TRY TO WORK THINGS OUT IN
PUBLIC AND THAT THIS MAY MEAN SOME (INCORRECT) IDEAS GET PUT FORWARD. AND
I'M ALSO VERY GLAD TO SEE THE SPIRIT IN WHICH THEY ARE RECEIVED.)

>
> >> Terry
> >
> >Thanks for the information. In my protein, PCA is indeed found at the
> >amino terminus. I am confused, however, that in the pdb file it should
> >appear as an amino acid, associated with alpha and beta carbons. The pdb
> >file starts as follows:

This is NOT a simple problem, so feel reassured. *I* don't know what the
answer is :-).

The primary problem is that the PDB does not have a good mechanism for
describing 'small-molecule' chemistry. This is because it was devised
20+ years ago when the number of proteins examined was very small (<10)
and where 'everyone knew' what was going on. (For example, you'd never
want a number greater than 9 in the PDB code :-) - and 3 letters would be
quite enough...)

There are two mechanisms for 'unusual' components. One is the HET and
HETATM groups. The other is to use a dictionary of 3-letter codes (as
here). For example ornithine is ORN (like lysine but one less carbon -
it occurs in bacterial polypeptides). PDB has a list of these and you
should be able to find it somewhere on the pages although it may still be
in postscript.

> >
> >ATOM 1 N PCA A -3
> >ATOM 2 CA PCA A -3
> >ATOM 3 C PCA A -3
> >ATOM 4 O PCA A -3
> >ATOM 5 CB PCA A -3
> >ATOM 6 N LEU A -2
> >etc.

I couldn't decode this! It looks like alanine, but obviously isn't. It
*might* be dehydro-alanine (N-C(=CH2)-C(=O)-N) but that didn't make much
sense. So I thought there were probably atoms missing (it *is* at the
N-terminus). A common N-terminal mod is pyro-glutamate, where the
carboxy on the Glu side chain has attacked the N-term N and you get:

    N--------C-C(=O)-N
    |        |
  O=C---C----C
if I have counted right.
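
A quick way to test the 'atoms missing' idea is to list which atom names
each residue actually has and compare them with what a complete residue
should contain. Here is a minimal sketch in Python. It splits on
whitespace because the fragment quoted above has no coordinates; a real
PDB file is fixed-column and should be parsed by column. The list of
expected pyro-glutamate atom names (EXPECTED_PCA) is my assumption about
the usual naming, and the function names are made up for illustration.

```python
# Fragment as quoted in this thread (coordinates were not shown).
FRAGMENT = """\
ATOM 1 N PCA A -3
ATOM 2 CA PCA A -3
ATOM 3 C PCA A -3
ATOM 4 O PCA A -3
ATOM 5 CB PCA A -3
ATOM 6 N LEU A -2
"""

# ASSUMPTION: heavy-atom names of a complete pyro-glutamate residue.
EXPECTED_PCA = ["N", "CA", "C", "O", "CB", "CG", "CD", "OE"]


def atoms_by_residue(text):
    """Map (chain, residue number, residue name) -> list of atom names."""
    residues = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip anything that is not a six-field ATOM/HETATM line.
        if len(fields) < 6 or fields[0] not in ("ATOM", "HETATM"):
            continue
        _record, _serial, atom, resname, chain, resnum = fields[:6]
        key = (chain, int(resnum), resname)
        residues.setdefault(key, []).append(atom)
    return residues


def missing_atoms(found, expected):
    """Return the expected atom names that are absent from 'found'."""
    return [name for name in expected if name not in found]


residues = atoms_by_residue(FRAGMENT)
pca_atoms = residues[("A", -3, "PCA")]
print("present:", pca_atoms)
print("missing:", missing_atoms(pca_atoms, EXPECTED_PCA))
```

If the pyro-glutamate guess is right, the atoms reported missing are
exactly the ones needed to close the ring.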

>
> I really don't know what PCA means in relaxin. But it doesn't look like
> perchloric acid to me,
> since atomic component of PCA is N-Calpha-CO
>                                    |
>                                    Cbeta
> which look like amino acid.
> perchloric acid is HClO4.
> I think you really should look up their published paper to know this.
> J Mol Biol 221: 15-21 (1991)[92015205]

This is certainly a good idea. Of course, when we are in the electronic
age it's frustrating!!! (I am at home and don't have any journals.)

>
> Also second PCA (NCCOCCCO) in helix C is not the same as PCA (NCCOC) in helix A.
> If I am wrong, pls correct me.

I thought there would be a problem like this. UNLESS ALL THE ATOMS ARE
IN THE FILE IT CAN BE IMPOSSIBLE TO KNOW WHAT THE PROTEIN IS.
Even then it can be impossible to know whether there are double or single
bonds, e.g. you couldn't tell alanine from dehydro-alanine simply from
the ATOM records - the positions are not accurate enough.

> > >Another question: Why does the residue numbering scheme start
> > >with negative integers?
>
> The reason for this is we want to keep the same numbering among homologous
> proteins.

Yes. In the serine proteases you will find a lot of this. And residues
called 16A, 16B, etc.
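
Labels like -3 or 16A mean a residue 'number' is really a signed integer
plus an optional letter, so it can't be handled as a plain integer. A
minimal Python sketch of splitting and ordering such labels follows. The
function name and the sample labels are made up for illustration, and
sorting the letters alphabetically is an assumption: the order in which
residues appear in the file is the real authority.

```python
import re

# A residue label: optional minus sign, digits, optional single letter.
_LABEL = re.compile(r"^(-?\d+)([A-Za-z]?)$")


def parse_residue_label(label):
    """Split a label such as '-3' or '16A' into (number, letter)."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a residue label: {label!r}")
    return int(match.group(1)), match.group(2)


# Hypothetical labels, deliberately out of order.
labels = ["16A", "-2", "16", "17", "-3", "16B"]

# Sort by number first, then by letter (no letter sorts before 'A').
ordered = sorted(labels, key=parse_residue_label)
print(ordered)
```

Note that there is no guarantee the numbers are consecutive or start at
1, so never use the residue number as an index into the sequence.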

> Here is explanation from pdb guide.
>

Well done. Most of the information is on the WWW site somewhere. But
there are also situations which do require expert analysis. If you have
such a problem (like this) *and* have talked with other people who can't
sort it out, there is a mailing list for the PDB. It's quite acceptable
to ask a simple question so long as you have done your homework first.
And it is read by the PDB staff who may discover things they weren't
aware of.

In the present case I think we can solve this without recourse to the
list. We can:
- display the protein and carefully note the connectivity
- search PDB docs for the list of HET groups
- read all the REMARK and other cards
- go to the original literature
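
For the 'read all the REMARK and other cards' step, it helps to pull the
non-coordinate records out of the file so they can be read together.
Below is a minimal Python sketch. It relies only on the record name
occupying the first six columns of each line. The sample lines are
invented for illustration and are NOT from the relaxin entry, and the
function name is hypothetical.

```python
# Invented sample lines, for illustration only.
SAMPLE = """\
HEADER    HYPOTHETICAL EXAMPLE
REMARK   1 MADE-UP REMARK TEXT FOR ILLUSTRATION
HET    XXX  A   1       5
HETATM    1  N   XXX A   1
ATOM      2  N   LEU A   2
"""


def records_by_type(lines, wanted=("REMARK", "HET", "HETATM")):
    """Group lines by record name, keeping only the wanted record types."""
    grouped = {}
    for line in lines:
        # The record name is in columns 1-6, padded with blanks.
        name = line[:6].strip()
        if name in wanted:
            grouped.setdefault(name, []).append(line.rstrip("\n"))
    return grouped


found = records_by_type(SAMPLE.splitlines())
for name, lines in found.items():
    print(name, len(lines))
    for line in lines:
        print("   ", line)
```

To use it on a real entry, pass an open file instead of
SAMPLE.splitlines(), and extend 'wanted' with whatever other record
types look relevant.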

Best of luck.

P.

Peter Murray-Rust, Glaxo Research & Dev. (pmr1716@ggr.co.uk); (BioMOO: PeterMR)
Birkbeck College, ubcg09q@cryst.bbk.ac.uk, CBMT/Daresbury mbglx@seqnet.dl.ac.uk
http://www.cryst.bbk.ac.uk/PPS/index.html, http://www.dl.ac.uk/CBMT/HOME.html