PPS96 Projects
Cristina Cantale
A bit of history
Integrase is an approximately 40kDa protein, encoded by the 3' end of the
pol gene of retroviruses.
Studies carried out on
avian retroviruses in the early 80's (Grandgenett D.P. et al., 86)
recognized that integrases are involved in the integration of
viral DNA into the host genome.
From 1987 to 1989 more evidence confirmed these
first suggestions (Brown P.O. et al, 87 - Fujiwara T. and
Mizuuchi K., 88).
In 1989 Bowerman obtained the integration of viral
DNA into a target DNA in an in vitro system, using a nucleoprotein
complex recovered from the cytoplasm of cells acutely infected with MLV
(Murine Leukemia virus) after viral DNA synthesis (Bowerman B. et al., 89,
and also Brown P.O. et al., 89).
A general model for integration began to
take shape.
Linear viral DNA present in the PIC (PreIntegration
Complex) is the precursor of integrated DNA. It is cleaved at its 3' ends, and
these recessed ends are joined to cell DNA that is cut in a staggered fashion
(about 5 bp apart). The resulting gaps are filled in, presumably by
cellular DNA repair enzymes, so that these base pairs are duplicated at the
integration site. The joining reaction does not require any exogenous
source of energy.
The central role of Integrase proteins in the above model
was demonstrated by Katz and coworkers in 1990. In their experiments, purified
32 kDa ASLV IN (Avian Sarcoma-Leukosis virus) alone was able to perform both the
breakage and the joining reactions, using, as substrates, synthetic
oligonucleotides mimicking LTR
(Katzman et al., 89 - Katz R.A. et al., 90).
The
same results were obtained using HIV IN in a similar in vitro system
(Bushman F.D. et al., 90 - Vink C. et al., 91-1).
Integration mechanism
Since then, a large amount of work has been carried out using in vitro
systems and mainly HIV IN purified protein (but also INs from ASLV, RSV (Rous
Sarcoma virus), MoMULV (Moloney Murine leukemia virus) and very recently HTLV-II
(Human T-cell Leukemia virus type II)), with the aim of clarifying the
integration mechanism.
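It was demonstrated that IN is able to carry out
three different reactions: 3' processing of the viral DNA ends, strand transfer
(joining of the processed ends to the target DNA) and disintegration (the
reverse of strand transfer).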
The overall reaction carried out by IN is a transesterification, produced by
a nucleophilic attack on an activated phosphodiester bond, performed either by water or
by the 3' hydroxyl group of the recessed 5'-CA(OH)-3' end.
It appears that the same
catalytic core domain is involved in both processing and the DNA transfer.
Both
reactions proceed by a one-step mechanism, as demonstrated using a substrate of
known chirality (phosphorothioate), without the formation of a covalent
protein-DNA intermediate
(Engelman A. et al., 91).
These reactions do not need any
external energy, but a divalent cation (Mg2+ or Mn2+) is
necessary for them to proceed. Besides the hypothesized involvement of
divalent cations in the active-site mechanism, a novel in vitro assay using
immobilized LTR oligonucleotides (Hazuda et al., 94-1) suggests
that the requirement for Mn2+ is correlated with the formation of
the oligomeric structure of IN in solution (Wolfe et al., 96).
This is considered the very first step of the overall integration reaction, that
is, the assembly of a stable complex between integrase and viral DNA
(Ellison V. and Brown P.O., 94 - Ellison V. et al., 95 -
Vink C. et al., 94).
As two sterically and temporally
coordinated reactions (one at each end of the viral DNA) are required for
integration, IN has to be at least a dimer, carrying the
double-stranded viral DNA.
Staggered cleavage of the host DNA should involve at least another
dimer, suggesting that IN works at least as a tetramer.
Complementation experiments using IN mutants lacking different portions support
this hypothesis
(van Gent et al., 93 - Jones C.S. et al., 92 -
Engelman A. et al.,93).
The use of mutants has also shown
that different domains of IN play different roles
(transesterification, multimerization, DNA recognition). Furthermore, it has
permitted the identification of the AAs that are fundamental for IN activities,
confirming the results obtained from the sequence analysis of various retroviral
INs.
To complete the picture, it should be emphasized that, although the
in vitro experiments have clarified many aspects of
IN behaviour, they cannot entirely simulate the in vivo system.
Even the
mechanism of integration is only partially reproduced.
The in vitro
system lacks the concerted strand transfer of the two viral DNA ends, and
just one strand is processed and joined to the DNA target, with the final
product having a typical Y form.
Along with the actual mechanism, the recognition of viral DNA and
host DNA needs further study.
As
previously reported, the specificity of IN for viral
LTR sequences is not very high, and there are reported examples of IN
proteins able to react with an oligonucleotide simulating the LTR of a different
retrovirus (e.g. MoMULV IN with HIV LTR ends in a nonspecific fashion, but not
the reverse (Vink C. et al., 91-2)). Nucleotides next to the
subterminal CA have been reported to be involved, namely the subterminal 6 to 8
nucleotides
(Reicin A.S. et al., 95). The prevailing idea is that the
specificity is not mainly connected with sequence but with other aspects of the LTR
viral DNA.
However, it has been underlined that in in vivo
systems IN and the viral DNA are not free in the cytoplasm, but both are part of an
ordered complex, the PreIntegration Complex (PIC). PICs are such stable
assemblages that they can be extracted from the cytoplasm of infected cells while retaining
their activities.
Consequently, it has been proposed that IN does not need
extensive sequence specificity to recognize its substrate and that only the short,
highly conserved CA sequence is essential for correct positioning and catalysis
(van Gent et al., 91 - Hazuda et al., 94-2),
together with a subterminal portion interacting with the HHCC region of IN
(Vincent K.A. et al., 93).
The main factors promoting IN
attack on host DNA in the strand transfer reaction are still not completely
understood.
It seems that in vivo the site of attack is strongly
influenced by chromatin. There are some preferences, such as regions complexed
with transcription factors (Kassavetis et al., 89) or with histones
(Morse et al., 92), or DNaseI sensitive sites. There is probably
some sequence bias, too.
Some in vitro experiments were carried out using
increasingly complex target DNA structures; particularly efficient
integration was observed into nucleosomal DNA
(Pryciak P.M. and Varmus H.E., 92) and into the most severely deformed and
kinked DNA regions within the nucleosomal core (Pruss et al., 94).
It has been proposed that this is due to the bending of DNA in
these regions, which may activate integration (Muller H.P. et al., 94). The
bending promotes a DNA conformation that is favourable for integration,
widening the minor and/or major groove(s) on the exposed face of the DNA helix.
Other parameters can also be influenced by DNA bending, such as the
affinity for Mn2+; the transfer reaction might also require local
denaturation of DNA, which is easier in a bent region.
In any case, the in vivo
system is very complex: specific interactions with host proteins have to be
taken into account, as observed for the retrotransposon Ty3 (Chalker
D.L. and Sandmeyer S.B., 92), together with the subnuclear localization
of the viral PIC; the state of the host cell during integration could play a role too. Such
interactions have also been proposed to explain the capacity of retroviral DNA
to protect itself from autointegration (Lee M.S. and Craigie
R., 94).
The sequence
The IN primary structure has been examined in depth. To mention only some of
the approaches, secondary structure prediction methods (Lin T. et al.,
89) and multiple alignment procedures, in conjunction with point and
deletion mutagenesis and partial proteolysis, have been used in a concerted
effort to elucidate the reaction mechanism of viral integration
down to the molecular level.
Multiple sequence alignments have been carried
out, comparing portions of IN sequences from different sources. Integrases from
retroviruses and their analogous proteins from retrotransposons and some families
of bacterial Insertion Elements (IS) share distinctive features, despite a very
low overall similarity (Johnson M.S. et al., 86). There is a
pattern of AAs that has been considered an integrase fingerprint, because it
is highly conserved among all these proteins. The motif is located in the central
part of the sequence and is called the DD(35)E motif (Fayet O. et al., 90 -
Kulkosky J. et al., 92).
A point mutation of any of these AAs
eliminates the strand transfer reaction, as widely demonstrated; this suggests
that they are part of the catalytic core.
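The "35" in the motif name can be checked against the residue numbers quoted on this page for the catalytic triads of HIV-1 IN (D64, D116, E152) and ASV IN (D64, D121, E157). The following is a minimal Python sketch of that arithmetic; the names `TRIADS`, `intervening` and `triad_spacing` are illustrative choices made here and do not come from the cited papers.

```python
# Residue numbers of the DD(35)E triad as quoted in this text.
TRIADS = {
    "HIV-1 IN": (64, 116, 152),
    "ASV IN": (64, 121, 157),
}


def intervening(first: int, second: int) -> int:
    """Number of residues lying strictly between two sequence positions."""
    return second - first - 1


def triad_spacing(triad):
    """Return (residues between the two D, residues between second D and E)."""
    d1, d2, e = triad
    return intervening(d1, d2), intervening(d2, e)


for name, triad in TRIADS.items():
    dd, de = triad_spacing(triad)
    print(f"{name}: D-X{dd}-D-X{de}-E")
```

With these numbers, the second aspartate and the glutamate are separated by 35 residues in both enzymes, while the spacing between the two aspartates differs (51 and 56 residues).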
Another characteristic motif is
found at the N terminus of integrases from retroviruses and retrotransposons.
It is an HHCC motif resembling a zinc finger, a motif that is often
involved in DNA interactions.
Partial proteolysis has been another powerful
tool to clarify the functional organization of IN, together with
site-directed mutagenesis. A very large spectrum of techniques and
experimental set-ups has been used, including epitope mapping and
monoclonal antibodies (Nielsen B.M. et al., 96).
Two main hypotheses have been advanced about the functional organization of IN.
The first scenario involves monomers, each including one active site and one
DNA-binding domain, arranged into a tetramer. In the second, the same single
active site is flanked by two different DNA-binding domains, one for viral DNA
and the other for target DNA, leading to a dimeric system.
The different
domains of IN proteins analyzed to understand the different aspects of the
integration reaction are the N-terminal domain, the catalytic core and the
C-terminal domain.
The functional specialization of these three domains has been derived mainly
from in vitro experiments, but in vivo tests are also needed to
examine aspects that are present, and possibly fundamental, in the in vivo
systems, which are more complex and not entirely simulated by the in vitro
ones.
Furthermore, it is important to underline that the results
obtained by in vitro assays depend heavily on the assay details.
The presence and concentration of metals and salts, protein concentration, ionic
strength, temperature and any other experimental parameter may play an essential
role in conditioning the final results.
This fact increases the importance
of similar results obtained by different groups, but it also calls for
great caution in drawing general conclusions.
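As a reading aid for the residue numbers used in the following sections, the fragment boundaries quoted on this page for HIV-1 IN (the 1-55 HHCC peptide, the 50-186 minimal core and the 213-266 DNA-binding region) can be put into a small lookup. This is only a sketch: the fragments are experimental constructs that overlap or leave gaps, not agreed domain limits, and the names `REGIONS` and `regions_for` are made up for this example.

```python
# Residue ranges quoted in this text for HIV-1 IN fragments (inclusive).
# They are experimental constructs, not definitive domain boundaries.
REGIONS = {
    "N-terminal HHCC peptide": (1, 55),
    "minimal catalytic core": (50, 186),
    "C-terminal DNA-binding region": (213, 266),
}


def regions_for(residue: int):
    """List every quoted fragment whose range contains the residue number."""
    return [
        name
        for name, (start, end) in REGIONS.items()
        if start <= residue <= end
    ]


for label, number in [("D64", 64), ("E152", 152), ("W235", 235), ("residue 52", 52)]:
    print(label, "->", regions_for(number) or ["no quoted fragment"])
```

Residues 50-55 fall in two fragments, while residues such as 200 fall in none of these three, which is a reminder that the boundaries depend on the construct used in each assay.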
N-TERMINAL DOMAIN
The H-X3-H-X20-30-C-X2-C motif at the
N-terminus was the first motif observed by comparison between IN sequences from
different sources (Johnson et al., 86).
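The pattern H-X3-H-X20-30-C-X2-C translates directly into a regular expression over one-letter amino acid codes. The sketch below, in Python, is one possible way to scan a sequence for it; the function names are hypothetical, and the test sequence is synthetic filler built only to exercise the pattern. It is not a real integrase sequence.

```python
import re

# H-X3-H-X20-30-C-X2-C written as a regular expression over one-letter codes.
HHCC = re.compile(r"(H).{3}(H).{20,30}(C).{2}(C)")


def find_hhcc(sequence: str):
    """Return 1-based positions of H, H, C, C for the first match, or None."""
    match = HHCC.search(sequence.upper())
    if match is None:
        return None
    return [match.start(group) + 1 for group in range(1, 5)]


def toy_sequence(spacer: int) -> str:
    """Build a synthetic sequence with a chosen H-to-C spacer (NOT a real IN)."""
    return "M" + "A" * 5 + "H" + "AAA" + "H" + "A" * spacer + "C" + "AA" + "C" + "A" * 10


print(find_hhcc(toy_sequence(23)))  # spacer inside the 20-30 window
print(find_hhcc(toy_sequence(10)))  # spacer too short, so no match
```

For the synthetic sequence with a 23-residue spacer the function reports positions 7, 11, 35 and 38; with a 10-residue spacer it returns None. A match of this kind only flags a candidate motif and says nothing about metal binding or function.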
As this motif
resembles the known metal-binding Zn finger domain, which is a characteristic
element of a variety of DNA-binding proteins, it was at first supposed that this
region was involved in DNA recognition and correct positioning.
However,
mutants with different deletions in the N-terminal region were still able to
bind DNA and even to accomplish a detectable DNA disintegration reaction in
vitro, demonstrating the inconsistency of this first hypothesis (Khan
E. et al., 90 - Engelman A. and Craigie R., 92 - Vincent
K. et al., 93 - Vink C. et al., 93).
Nevertheless, the same
assays demonstrated that N-terminus integrity was necessary for the processing and
transfer reactions, suggesting that its functionality could correlate with the
site-specific cleavage activity.
Similar conclusions were drawn from in
vivo tests carried out using Mo-MuLV IN mutants (Roth M.J. et al.,
90). Point mutations at the conserved cysteines or histidines of HIV-1 IN
and MLV IN are not completely disruptive for catalytic activity in vitro,
while they abolish infectivity in vivo.
Generally speaking, there is
an aspect of the
in vivo assays that has to be considered, adding further complexity.
The IN protein, besides its specific functions in the overall viral DNA
transfer, is involved in all the other steps of the life cycle. It is part of
the gag-pol polyprotein (whose correct folding permits the subsequent proteolysis),
of the PIC and of the final mature virion. Within these structures IN has a multitude of
interactions, which are not known and which can be influenced by an IN
mutation; they also play a role in the overall life cycle and thus affect the
results of an in vivo assay. Mutations of IN are reported which affect
gag proteins or which are lethal for the virus at different stages
(Ansari-Lari et al., 95 - Shin C. et al., 94).
A
hypothesis for the tertiary structure of the HHCC motif was formulated on the basis of spectroscopy,
using a 55-AA peptide simulating HIV-1 IN (1-55) in a complex with Zn2+
(Burke C.J. et al., 92), thus demonstrating that this motif can fold
independently and is able to bind Zn2+.
Recently, it has been
proposed that this domain can promote higher order multimerization of integrase
dimers, fundamental for the stable formation of a complex between the IN protein
and viral DNA (Ellison V. et al., 95). This reaction requires a
divalent cation (mainly Mn2+, but it was demonstrated that Mg2+
is also efficient, the results depending on the assay conditions (Engelman A.
and Craigie R., 95)). Zinc-binding domains from other proteins are
reported to play a role in protein-protein interactions.
Despite the
amount of work carried out to define the role of the N-terminus, a model able
to explain all the different and often conflicting observations is not yet
available, partly because the different results are affected by the specific
reaction conditions used.
THE CATALYTIC CORE
Deletion mutants have shown that the shortest fragment of
HIV-1 IN still able to accomplish the disintegration reaction maps to IN50-186.
Therefore this region, which is also the most resistant to proteolytic cleavage,
has to contain the catalytic core (Engelman A. and Craigie R., 92
- Bushman F.D. et al., 93).
Key AAs forming the catalytic
triad are D-64, D-116 and E-152 (in HIV-1).
Even a conservative mutation of
one of these AAs eliminates all detectable activities of integration and viral
replication both in vitro and in vivo
(Kulkosky J. et al., 92).
The role of these AAs is supposed to
be the coordination of a divalent metal cofactor, in analogy to the behaviour of
other enzymes catalyzing phosphoryl transfer reactions (Kulkosky J. et
al., 92).
Other AAs adjacent to DDE motif (like W61, T66, V75, S81,
T115, S123 and I135) are well conserved among retroviruses and the mutation of
each of them can be detrimental.
Besides containing the AAs of the
catalytic triad, the central core region is suggested to be involved in
other functions as well.
It is reported that D116 is also involved in the stable binding
of IN to its viral DNA, but opinions about the role of the central core in
DNA binding, both nonspecific and specific, diverge because of discrepancies in
the results
(Hazuda D. et al., 94-2 - Vink C. et al., 94).
Recently,
results have been reported about chimeric INs. Swapping the N-terminal domain
of HFV IN with that of HIV IN resulted in a chimeric IN having 3' processing activity with
the HFV LTR, indicating that the central domain is crucial for substrate recognition
(Pahl A. and Flugel R.M., 95). The same conclusion is suggested by
other chimeras obtained by swapping domains between Visna and HIV-1 INs (Katzman M. and Sudol
M., 95).
Further results suggest that this region is involved in
dimerization. A potential leucine zipper motif has been identified,
mapped at 151-168 in HIV-1 IN (Lin T. et al., 91), and it has
been proposed as the dimerization domain. In this region K159, R166 and E152 are
highly conserved residues.
Recently the tertiary structures of the core domains of HIV-1 and ASV
INs
have been solved.
C-TERMINAL DOMAIN
While the connection of the N-terminal domain to DNA was being studied, the C-terminal domain was
discovered to be the one specifically involved in DNA recognition. The same
experiments referred to previously (e.g. Khan E. et al., 90) showed
that IN mutants with deletions in the C-terminal domain were no longer able to bind
DNA.
It is unknown which characteristic motif is connected to this function.
The C-terminal domain is considered to be the least conserved, with fewer
distinctive features shared among the IN sequences.
Mutagenesis, complementation
and other assays have mapped to the 200-270 region a nonspecific DNA-binding
function, which does not require a divalent metal ion (Engelman et al., 94).
This region may therefore be involved in the interaction with the target DNA.
Deletion
mutants of different extents have been tested in vitro to better define the
binding region. An interesting result has been reported about a single point
mutation, W235A, which has no effect in vitro but totally blocks the
capacity of the provirus to replicate in vivo. W235 was the only AA reported to
be highly conserved in this region (Johnson M.S. et al., 86).
W235 has consequently been proposed to be a key component of some local
structure involved in target DNA interactions (Cannon P.M. et al., 94).
A more recent analysis of the HIV-1 IN C-terminal domain has been carried out,
comparing retroviruses from different sub-families (Cannon P.M. et
al., 96). Three conserved regions were
identified and designated L, C and N. Region L is conserved only in the
lentiviruses; regions C and N are conserved in all retroviruses (except HFV), although the
consensus sequences differ between lentiviruses and non-lentiviruses. C and N are
encompassed by that part of the C-terminal region which is considered essential for
DNA binding activity, that is HIV-1 IN 213-266
(Engelman A. et al., 94). W235 is near the beginning of the C
region. Some mutants, selected on the basis of the previous observations, were
tested using
in vivo assays. The results confirm that W235 and K186, whose mutation
blocks infectivity at a step beyond reverse transcription and
migration into the nucleus, may be involved in target DNA binding.
The
full meaning of these observations remains to be investigated. The presence of
these sequence features, characteristic of lentiviruses, could be correlated
with aspects of their life cycle that are distinct from other retroviruses, such as
the ability to enter the nucleus of host cells that are not in mitosis.
3D structure
HIV IN has been the subject of various crystallization attempts, but its low
solubility and its tendency to form aggregates have hampered them. The deletion mutants
prepared to test the functionality of the different domains also
yielded truncated proteins whose solubility differs from that of the wild type. The
soluble mutants were candidates for structural studies.
HIV-1 IN50-212
was a promising one. Its biophysical, enzymatic and spectroscopic properties
were measured and found not to be greatly altered in comparison with the full-length
protein. Its solubility was good, but the aggregation was retained (Hickman
A.B. et al., 94). Some point mutants of IN50-212 were
tested, looking for a mutation that improves solubility and prevents aggregation (Jenkins
et al., 95). A single mutation, F185K, produced a soluble protein,
existing as a monodisperse dimer in solution. This protein, containing the
catalytic core domain, was crystallized and its structure was solved at 2.5
Angstrom resolution (Dyda et al., 94).
The strategy of introducing mutations
to obtain soluble proteins has been applied to the whole HIV-1 IN protein, and a
mutant containing two mutations, F185K and C280S, has been selected. It is
completely active and it exists in solution in an equilibrium between dimeric
and tetrameric forms (1M NaCl). It is a good candidate for solving the HIV-1 IN
structure (Cannon P. et al., 96).
The structure of HIV-1 IN
220-270 in solution has been determined using multidimensional NMR
spectroscopy (Lodi P.J. et al., 95).
The same strategy of
selecting suitable mutants was followed with ASV IN (Kulkosky J. et al.,
95).
In 1995 the ASV IN structure was also solved at high resolution,
using a deletion mutant ASV IN52-207, without further mutations
(Bujacz et al., 95).
3D structure of HIV-1 IN catalytic domain
The structure consists of a central five-stranded beta sheet and six helices.
Its overall topology resembles that of E. coli ribonuclease H (RNase H). Other
enzymes share this topology, such as the RNase H domain of HIV-1 Reverse
Transcriptase, the Holliday junction-specific endonuclease RuvC and the core domain
of the transposase protein of bacteriophage Mu.
The structure contains a
disordered region, from residue 140 to 154, including E152, one of the three key
AAs of the catalytic triad. D64 and D116 are ordered and superimpose well on two
catalytic residues of HIV-1 RNase H.
The first six residues (50-55) at the
N-terminus and the last four residues at the C-terminus are not visible.
Fig. 4
is a simple diagram of HIV IN and RNase H topologies.
The core is a dimer, with a large main region of contact encompassing
beta3, alpha1, alpha3, alpha5 and alpha6 (about 1300 square Angstroms per
subunit), which contains salt bridges and hydrogen bonds.
Dimerization is not mediated
through the proposed leucine zipper motif.
[Pictures: core domain of HIV-1 IN; HIV-1 RNase H domain; Holliday junction
endonuclease RuvC from E. coli; the remarkable residues of the core domain, that is
the conserved ones and/or those known to produce interesting mutants; the
highlighted catalytic core residues. Coordinates: 1ITG (HIV-1 IN core domain),
1HRH (HIV-1 RNase H domain), 1HRJ (RuvC).]
3D structure of HIV-1 IN C-terminal domain
[Picture: the C-terminus structure.]
In solution IN220-270 is a dimer. Each
monomer is composed of a five-stranded beta-barrel. The interface is formed by
three antiparallel strands (namely 2, 3 and 4) from each monomer, facing each other in an
antiparallel fashion. The interface is mainly stabilized by hydrophobic
interactions. [Picture: the AAs interacting at the dimer interface, highlighted.]
The overall topology is very similar to that of SH3 domains, which are found in proteins involved in signal transduction (Eijkelenboom A. et al., 95). SH3-like folds are widely present in proteins, despite the absence of any significant sequence identity.
[Picture: the structure of the alpha spectrin SH3 domain.]
The AAs probably involved in DNA recognition
[picture] are compatible with binding to the major groove of DNA.
3D structure of ASV IN catalytic domain
Several structures were solved, because ASV IN52-207 was
crystallized under different conditions of protein buffer (Hepes
or citrate) and precipitant (PEG/IPR and ammonium sulphate). A
selenomethionine-substituted protein was also prepared, and two different data
collection temperatures (20 °C and -165 °C) were used.
The effects of the
different crystallization conditions are generally undetectable among the
structures, except for a ten-residue loop between beta5 and alpha4, which changes
conformation. The loop protrudes from the molecule. The high temperature factors
found at its tip suggest that this is a flexible region. It contains two turns
when crystals grow in PEG, and one smooth turn, intramolecularly stabilized,
when crystals grow in ammonium sulphate. It is interesting to note that about
the same region (12 residues instead of 10) was found to be disordered in the
structure of the core domain of HIV-IN.
A comparison
with the 3D structure of the HIV IN catalytic core reveals a similar general topology,
with the previously reported analogy with RNase H and RuvC.
Fig. 5 is a
schematic diagram of the ASV IN catalytic core fragment topology.
[Picture: ASV IN.]
The ASV IN catalytic core is a dimer, with an interface located between
alpha1 from one monomer and alpha5 from the other, with mainly hydrophobic
interactions. Protruding from the space between the helices, residues from the beta3
strand also participate. The general shape of the interface is a cavity filled
by some positively charged residues and a few water molecules. Several other
polar interactions between charged residues contribute to stabilizing the dimer.
[Picture: a rough view of the catalytic core, which is a dimer.]
Water molecules
play a key role both in the catalytic core and in the stabilization of folding, together
with some conserved residues. The requirement for the highly conserved S85 residue
(S81 in HIV-IN) has been correlated with a structural effect in folding
stabilization.
The D64, D121 and E157 residues of the triad are positioned so as to form an
active site. The side chains interact with each other through three water molecules,
one of which (Wat324) appears to be particularly important and could be replaced
by a divalent cation.
[Picture: the catalytic core.]
The
comparison between the two available 3D structures of integrases reveals a
similar architecture, but there are considerable deviations. Looking at the
dimers, the corresponding elements in each monomer can be recognized, but
they are shifted by up to 6 Angstroms, with a reported relative rotation of 13.3° and a
translation of 4.8 Angstroms.
Furthermore, there are large differences in the orientations of
the key residues in the active site that are difficult to understand.
The
dimer interface is smaller in the ASV IN core fragment, 766 square Angstroms
against 1395 square Angstroms in HIV-1 IN, which is compatible with the weaker
association observed for ASV IN with respect to HIV IN.
Very recently the
structure of the ASV IN catalytic domain crystallized in the presence of the divalent
cations Mg2+ and Mn2+ has been solved (Bujacz G. et al., 96).
[Picture: the triad and the metal ion bound to Asp64 and Asp121.
Coordinates: 1VSD, 1VSF.]
Therapeutical implications
Integration of viral DNA is a fundamental step in the life cycle of
retroviruses, necessary for the production of progeny viruses.
Mutations in
the 3' proximal portion of the pol gene of HIV, in the region coding for IN, produce
mutants identical to the wild type except for integration. Proviral DNA is not
formed and the infection slowly disappears (Goff S.P., 92).
However, the absolute requirement of integration for productive HIV-1 infection
has not yet been unequivocally established. On the contrary, examples of
virus infection without integration have been reported.
In any case, integration
is a target for designing drugs against retroviruses, mainly with reference to
AIDS.
A great research effort is under way to identify inhibitors of the
different steps of integration that can be used therapeutically.
Some substances that inhibit the catalytic activity of IN have been reported,
including caffeic acid phenylethyl ester (CAPE)
(Fesen M.R. et al., 93), topoisomerase inhibitors and
bis-catechols. Often they are not selective for IN.
Recently
a new class of compounds derived from caffeic acid has been tested, with
encouraging results. These compounds are two dicaffeoylquinic acids (DCQAs)
extracted from medicinal plants (Achyrocline satureioides), together
with a synthetic compound, L-chicoric acid (Robinson W.E. et al., 96).
They are reported to be inhibitors of HIV-1 in biochemical assays and also in
in vitro assays of disintegration, the latter evidence suggesting an
interaction with the catalytic core domain of HIV IN. Their effect on host DNA is
unknown, but their selectivity in tissue culture is promising.
In addition, a Fab
fragment (Fab 35) of a specific monoclonal antibody (MAb35) has been
characterized (Barsov E.V. et al., 96). It is able to bind to an
epitope on the C-terminal domain of HIV-1 IN and to block every activity. It has
been suggested that it interferes with the multimerization of IN.
Last updated 25th Oct '96