PPS96 Projects

Cristina Cantale

The Viral Integrases

The Integrase Protein

A bit of history

Integrase is a protein of approximately 40 kDa, encoded by the 3' end of the pol gene of retroviruses.
In studies of avian retroviruses carried out in the early 1980s (Grandgenett D.P. et al., 86), integrases were recognized to be involved in the integration of viral DNA into the host genome.
Between 1987 and 1989 further evidence confirmed these early suggestions (Brown P.O. et al., 87 - Fujiwara T. and Mizuuchi K., 88).
In 1989 Bowerman and coworkers achieved integration of viral DNA into a target DNA in an in vitro system, using a nucleoprotein complex recovered from the cytoplasm of cells acutely infected with MLV (Murine Leukemia Virus), after viral DNA synthesis (Bowerman B. et al., 89 - Brown P.O. et al., 89).
A general model of integration began to take shape.
The linear viral DNA present in the PIC (PreIntegration Complex) is the precursor of the integrated DNA. It is cleaved at its 3' ends, and these recessed ends are joined to the cellular DNA, which is cut in a staggered fashion; the short stretch between the two cuts (about 5 bp) is then duplicated at the integration site, presumably by cellular DNA repair enzymes. This joining reaction does not require any exogenous source of energy.
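The bookkeeping behind the 5-bp duplication can be sketched in a few lines of Python. This is a minimal, simplified model under stated assumptions: only the top strand of the host DNA is tracked, the viral DNA is passed in as an already processed insert, and gap repair is assumed to copy the bases lying between the two staggered cuts. The function name and all sequences are hypothetical.

```python
def integrate(target: str, insert: str, site: int, stagger: int = 5) -> str:
    """Simplified top-strand view of integration into `target`.

    Assumptions (illustrative only): IN cuts the top strand before index
    `site` and the bottom strand `stagger` bases further on; the viral ends
    are joined across the cut, and cellular repair fills the two
    single-stranded gaps, so the `stagger` bases between the cuts end up
    on both sides of the insert (the target-site duplication).
    """
    if not 0 <= site <= len(target) - stagger:
        raise ValueError("cut site too close to the end of the target")
    duplicated = target[site:site + stagger]
    # left flank + first copy, viral DNA, second (repair-made) copy + right flank
    return target[:site] + duplicated + insert + duplicated + target[site + stagger:]


# Hypothetical example: a 13-bp target and a short stand-in for viral DNA
# whose ends carry the TG...CA termini of an integrated provirus
print(integrate("AAAAGGCTTCCCC", "TGXXCA", site=4))  # prints AAAAGGCTTTGXXCAGGCTTCCCC
```

The 5-bp stretch GGCTT now flanks the insert on both sides, which is the duplication described above.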
The central role of the integrase protein in this model was demonstrated by Katz and coworkers in 1990. In their experiments, purified 32 kDa IN of ASLV (Avian Sarcoma-Leukosis Virus) alone was able to perform both the cleavage and the joining reactions, using as substrates synthetic oligonucleotides mimicking the LTR ends (Katzman et al., 89 - Katz R.A. et al., 90).
Equivalent results were obtained with HIV IN in a similar in vitro system (Bushman F.D. et al., 90 - Vink C. et al., 91-1).

Integration mechanism

Since then, a large amount of work has been carried out with in vitro systems, mainly using purified HIV IN protein (but also INs from ASLV, RSV (Rous Sarcoma Virus), MoMuLV (Moloney Murine Leukemia Virus) and, very recently, HTLV-II (Human T-cell Leukemia Virus type II)), with the aim of clarifying the integration mechanism.
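It was demonstrated that IN is able to carry out three different reactions: 3' processing of the viral DNA ends, DNA strand transfer (joining) and disintegration, which is the reverse of strand transfer.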

The overall reaction carried out by IN is a transesterification: a nucleophilic attack on an activated phosphodiester bond, performed either by water or by the 3'-hydroxyl group of the recessed 5'-CA(OH)-3' end.
It appears that the same catalytic core domain is involved in both 3' processing and DNA strand transfer.
Both reactions proceed by a one-step mechanism, without formation of a covalent protein-DNA intermediate, as demonstrated with phosphorothioate substrates of known chirality (Engelman A. et al., 91).
These reactions require no external energy source, but a divalent cation (Mg2+ or Mn2+) is necessary for them to proceed. Besides the proposed involvement of divalent cations in the catalytic mechanism, a novel in vitro assay using immobilized LTR oligonucleotides (Hazuda et al., 94-1) suggests that the requirement for Mn2+ is correlated with the formation of the oligomeric structure of IN in solution (Wolfe et al., 96). Formation of this complex is considered the very first step of the overall integration reaction, namely the assembly of a stable complex between integrase and viral DNA (Ellison V. and Brown P.O., 94 - Ellison V. et al., 95 - Vink C. et al., 94).
Because integration requires two sterically and temporally coordinated reactions, one at each end of the viral DNA, IN must act at least as a dimer bound to the double-stranded viral DNA.
Staggered cleavage of the host DNA should involve at least one more dimer, suggesting that IN works as a tetramer (at least). Complementation experiments with IN mutants lacking different portions of the protein support this hypothesis (van Gent et al., 93 - Jones C.S. et al., 92 - Engelman A. et al., 93).
Mutants have also shown that IN is organized into distinct domains playing different roles (transesterification, multimerization, DNA recognition). Furthermore, they have allowed the identification of the amino acids (AAs) that are fundamental for IN activities, confirming the results of sequence analysis of various retroviral INs.
To complete the picture, it should be emphasized that, although in vitro experiments have clarified many aspects of IN behaviour, they cannot fully simulate the in vivo system.
The integration mechanism itself is only partially reproduced.
In vitro systems largely lack the concerted strand transfer of the two viral DNA ends: usually just one end is processed and joined to the target DNA, giving a final product with a typical Y shape.

Fig. 3 A very simplified model of the IN reaction mechanism in in vivo and in vitro systems

Besides the reaction mechanism itself, the way IN recognizes viral DNA and host DNA needs further investigation.
As previously reported, the specificity of IN for viral LTR sequences is not very high, and some IN proteins have been reported to react with oligonucleotides mimicking the LTRs of other retroviruses (e.g. MoMuLV IN acts on HIV LTR ends, although non-specifically, but not the reverse (Vink C. et al., 91-2)). Nucleotides next to the subterminal CA, namely the subterminal 6 to 8 nucleotides, have been reported to be involved (Reicin A.S. et al., 95). The prevailing idea is that specificity depends not mainly on sequence but on other features of the viral LTR DNA.
However, it has been stressed that in vivo IN and the viral DNA are not free in the cytoplasm but are both part of an ordered complex, the PreIntegration Complex (PIC). PICs are such stable assemblies that they can be extracted from the cytoplasm of infected cells while retaining their activities.
Consequently, it has been proposed that IN does not need a high sequence specificity to recognize its substrate, and that only the short, highly conserved CA repeat is essential for correct positioning and catalysis (van Gent et al., 91 - Hazuda et al., 94-2), together with a subterminal portion interacting with the HHCC region of IN (Vincent K.A. et al., 93).
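What "cleaved at its 3' ends" means at the sequence level can be sketched as follows, assuming the processed strand must end with the conserved CA and that only a few nucleotides (two in HIV-1) lie beyond it. The function name, the `max_trim` bound and the example sequence are hypothetical and not taken from a real LTR.

```python
def three_prime_process(strand: str, max_trim: int = 3) -> str:
    """Sketch of 3' processing on one viral DNA strand written 5'->3'.

    Assumption: the conserved CA lies within the last `max_trim` + 2
    bases (a hypothetical bound); everything 3' of it is removed,
    leaving a recessed ...CA-3'OH end. In HIV-1 two nucleotides are removed.
    """
    tail_start = max(0, len(strand) - (max_trim + 2))
    pos = strand.rfind("CA", tail_start)  # last CA inside the 3' tail
    if pos == -1:
        raise ValueError("no subterminal CA: not a plausible IN substrate here")
    return strand[:pos + 2]


# Hypothetical LTR-end strand (not a real viral sequence)
print(three_prime_process("GGTTTCAGT"))  # prints GGTTTCA
```

Requiring the CA near the end, rather than anywhere in the strand, mirrors the idea that only the short conserved CA is essential for positioning.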
The factors that direct IN attack on host DNA in the strand transfer reaction are still not completely understood.
In vivo, the site of attack appears to be strongly influenced by chromatin. Some preferences have been observed, for example for regions complexed with transcription factors (Kassavetis et al., 89) or with histones (Morse et al., 92), and for DNase I-sensitive sites. There is probably some sequence bias as well.
In vitro experiments with increasingly complex target DNA structures showed particularly efficient integration into nucleosomal DNA (Pryciak P.M. and Varmus H.E., 92), and into the most severely deformed and kinked DNA regions within the nucleosome core (Pruss et al., 94).
It has been proposed that this is due to DNA bending in these regions, which may activate integration (Muller H.P. et al., 94). Bending promotes a DNA conformation favourable for integration by widening the minor and/or major groove(s) on the exposed face of the DNA helix. Other parameters can also be influenced by DNA bending, such as the affinity for Mn2+; moreover, the transfer reaction might require local denaturation of DNA, which is easier in a bent region.
In any case, the in vivo system is very complex: specific interactions with host proteins have to be taken into account, as observed for the retrotransposon Ty3 (Chalker D.L. and Sandmeyer S.B., 92), together with the subnuclear localization of the viral PIC; the state of the host cell during integration could play a role too. Such interactions have also been proposed to explain how retroviral DNA protects itself from autointegration (Lee M.S. and Craigie R., 94).

The sequence

The IN primary structure has been examined in depth. To mention only some of the approaches, secondary structure prediction methods (Lin T. et al., 89) and multiple alignment procedures, in conjunction with point and deletion mutagenesis and partial proteolysis, have been used in a concerted effort to elucidate the mechanism of viral integration at the molecular level.
Multiple sequence alignments have been carried out, comparing portions of IN sequences from different sources. Integrases from retroviruses and their analogous proteins from retrotransposons and some families of bacterial insertion sequences (IS) share distinctive features despite a very low overall similarity (Johnson M.S. et al., 86). One pattern of AAs has been regarded as an integrase fingerprint because it is highly conserved among all these proteins. This motif, located in the central part of the sequence, is called the DD(35)E motif: two aspartates followed by a glutamate, with 35 residues between the second aspartate and the glutamate (Fayet O. et al., 90 - Kulkosky J. et al., 92).
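The motif name encodes a spacing rule that can be checked against the triad positions given in this document (D64, D116, E152 for HIV-1; D64, D121, E157 for ASV) and turned into a simple scan. This is only a sketch: a real search would work on an alignment, and the test sequence used with `find_d35e` is synthetic.

```python
import re

# Catalytic triad positions as given in this document
TRIADS = {
    "HIV-1 IN": (64, 116, 152),
    "ASV IN": (64, 121, 157),
}


def residues_between(first: int, second: int) -> int:
    """Number of residues strictly between two 1-based positions."""
    return second - first - 1


def find_d35e(seq: str, spacing: int = 35):
    """1-based (D, E) positions where a D and an E are separated by exactly `spacing` residues.

    Only the second D and the E are constrained by the motif name; the
    first D lies further upstream at a variable distance and is not searched.
    A lookahead is used so that overlapping candidates are all reported.
    """
    pattern = re.compile(r"(?=(D.{%d}E))" % spacing)
    return [(m.start() + 1, m.start() + spacing + 2) for m in pattern.finditer(seq)]


for name, (d1, d2, e) in TRIADS.items():
    print(name, residues_between(d1, d2), residues_between(d2, e))
# prints:
# HIV-1 IN 51 35
# ASV IN 56 35
```

Both proteins place 35 residues between the second aspartate and the glutamate, while the spacing between the two aspartates differs (51 versus 56), which is why only the second spacing appears in the motif name.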
As widely demonstrated, a point mutation of any of these AAs abolishes the strand transfer reaction, which suggests that they are part of the catalytic core.
Another characteristic motif, the HHCC motif, is found at the N-terminus of integrases from retroviruses and retrotransposons; it resembles a zinc-finger motif, a structure often involved in DNA interactions.
Together with site-directed mutagenesis, partial proteolysis has been another powerful tool for clarifying the functional organization of IN. A very wide range of techniques and experimental set-ups has been used, including epitope mapping and monoclonal antibodies (Nielsen B.M. et al., 96).

Two main hypotheses have been advanced about the functional organization of IN. In the first, monomers each containing one active site and one DNA-binding domain are arranged into a tetramer. In the second, a single active site is flanked by two different DNA-binding domains, one for viral DNA and the other for target DNA, leading to a dimeric system.
The IN domains analyzed to understand the different aspects of the integration reaction are the N-terminal domain, the central catalytic core domain and the C-terminal domain.

The functional specialization of these three domains has been derived mainly from in vitro experiments, but in vivo tests are also needed to examine aspects that are present, and possibly fundamental, in the more complex in vivo systems and are not entirely reproduced in vitro.
It is also important to stress that the results of in vitro assays depend heavily on assay details: the presence and concentration of metals and salts, protein concentration, ionic strength, temperature and any other experimental parameter may play an essential role in determining the final results.
This makes similar results obtained by different groups all the more significant, but it also calls for great caution in drawing general conclusions.

The N-terminal domain
The H-X3-H-X20-30-C-X2-C motif at the N-terminus was the first motif identified by comparing IN sequences from different sources (Johnson et al., 86).
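The motif can be written directly as a regular expression, with "." standing for any residue X. The sketch below uses Python's `re` module; the test sequence is synthetic, but its spacing follows the positions commonly cited for HIV-1 IN (H12, H16, C40, C43). A pattern match alone says nothing about whether the region actually folds or binds zinc.

```python
import re

# H-X3-H-X(20-30)-C-X2-C; "." stands for any residue X
HHCC = re.compile(r"H.{3}H.{20,30}C.{2}C")


def find_hhcc(seq: str):
    """Return 1-based inclusive (start, end) of the first HHCC-like match, or None."""
    m = HHCC.search(seq)
    return None if m is None else (m.start() + 1, m.end())


# Synthetic test sequence with H12, H16, C40 and C43 (HIV-1-like spacing)
demo = "M" * 11 + "H" + "AAA" + "H" + "G" * 23 + "C" + "GG" + "C" + "K" * 5
print(find_hhcc(demo))  # prints (12, 43)
```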
Because this motif resembles the metal-binding zinc-finger domains characteristic of a variety of DNA-binding proteins, this region was at first supposed to be involved in DNA recognition and correct positioning.
However, mutants with various deletions in the N-terminal region were still able to bind DNA and even to carry out a detectable disintegration reaction in vitro, which argued against this first hypothesis (Khan E. et al., 90 - Engelman A. and Craigie R., 92 - Vincent K. et al., 93 - Vink C. et al., 93).
On the other hand, the same assays showed that an intact N-terminus is necessary for the processing and transfer reactions, suggesting that its function could be related to site-specific cleavage activity.
Similar conclusions were drawn from in vivo tests using Mo-MuLV IN mutants (Roth M.J. et al., 90). Point mutations at the conserved cysteines or histidines of HIV-1 IN and MLV IN do not completely disrupt catalytic activity in vitro, yet they abolish infectivity in vivo.
Generally speaking, in vivo assays involve a further complication. Besides its specific functions in viral DNA transfer, the IN protein is involved in all the other steps of the life cycle. It is part of the gag-pol polyprotein (whose correct folding allows subsequent proteolytic processing), of the PIC and of the mature virion. Within these structures IN takes part in many interactions that are still unknown; an IN mutation can alter them, with consequences for the overall life cycle and thus for the results of an in vivo assay. IN mutations have been reported that affect gag proteins or that are lethal for the virus at different stages (Ansari-Lari et al., 95 - Shin C. et al., 94).
A tentative tertiary structure of the HHCC motif was derived from spectroscopic studies of a 55-AA peptide corresponding to residues 1-55 of HIV-1 IN, complexed with Zn2+ (Burke C.J. et al., 92); these studies demonstrated that the motif can fold independently and bind Zn2+.
Recently, it has been proposed that this domain can promote higher-order multimerization of integrase dimers, which is fundamental for the stable formation of a complex between the IN protein and viral DNA (Ellison V. et al., 95). This reaction requires a divalent cation (mainly Mn2+, although Mg2+ has also been shown to be effective, depending on the assay conditions (Engelman A. and Craigie R., 95)). Zinc-binding domains of other proteins are reported to play a role in protein-protein interactions.
Despite the amount of work carried out to define the role of the N-terminus, no model is yet available that explains all the different and often conflicting observations, partly because the results are affected by the specific reaction conditions used.

The catalytic core domain
Deletion mutants have shown that the shortest HIV-1 IN sequence still able to carry out the disintegration reaction maps to IN50-186. This region, which is also the most resistant to proteolytic cleavage, must therefore contain the catalytic core (Engelman A. and Craigie R., 92 - Bushman F.D. et al., 93).
The key AAs forming the catalytic triad are D64, D116 and E152 (in HIV-1).
Even a conservative mutation of any one of these AAs eliminates all detectable integration activity and viral replication, both in vitro and in vivo (Kulkosky J. et al., 92).
These AAs are thought to coordinate a divalent metal cofactor, in analogy with other enzymes that catalyze phosphoryl transfer reactions (Kulkosky J. et al., 92).
Other AAs near the DDE motif (such as W61, T66, V75, S81, T115, S123 and I135) are well conserved among retroviruses, and mutation of any of them can be detrimental.
Besides containing the catalytic AAs, the central core region has been suggested to be involved in other functions.
D116 is reported to be involved also in the stable binding of IN to its viral DNA, but opinions about the role of the central core in DNA binding, both non-specific and specific, diverge because of discrepant results (Hazuda D. et al., 94-2 - Vink C. et al., 94).
Recently, results have been reported on chimeric INs. Replacing the N-terminal domain of HFV (Human Foamy Virus) IN with that of HIV IN resulted in a chimeric IN with 3' processing activity on the HFV LTR, indicating that the central domain is crucial for substrate recognition (Pahl A. and Flugel R.M., 95). Chimeras between visna virus and HIV-1 INs suggest the same conclusion (Katzman M. and Sudol M., 95).
Further results suggest that this region is involved in dimerization. A potential leucine zipper motif has been identified at residues 151-168 of HIV-1 IN (Lin T. et al., 91) and proposed as the dimerization domain. In this region E152, K159 and R166 are highly conserved.
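A leucine zipper is recognized by leucines recurring every seventh residue (one per heptad). As a rough illustration, the hypothetical helper below scans a window such as 151-168 for the heptad register containing the most leucines. The HIV-1 IN sequence is not reproduced here, so the example uses a synthetic sequence, and a hit is only a sequence-level hint, not evidence of a coiled coil.

```python
def best_heptad_register(seq: str, start: int, end: int, residue: str = "L"):
    """Find the heptad register in the 1-based window start..end richest in `residue`.

    Every seventh position is examined for each of the 7 possible offsets;
    returns (positions examined, positions holding `residue`) for the best one.
    """
    best = ([], [])
    for offset in range(7):
        examined = list(range(start + offset, end + 1, 7))
        hits = [p for p in examined if seq[p - 1] == residue]
        if len(hits) > len(best[1]):
            best = (examined, hits)
    return best


# Synthetic 200-residue sequence with leucines at 151, 158 and 165
residues = ["A"] * 200
for p in (151, 158, 165):
    residues[p - 1] = "L"
print(best_heptad_register("".join(residues), 151, 168))
# prints ([151, 158, 165], [151, 158, 165])
```

An 18-residue window such as 151-168 can hold at most three positions of a single register, so three leucines is the maximum this scan can report there.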
The tertiary structures of the HIV-1 and ASV IN core domains have recently been solved.

The C-terminal domain
While the relationship between the N-terminal domain and DNA was being studied, the C-terminal domain was found to be the one specifically involved in DNA recognition. The experiments referred to above (e.g. Khan E. et al., 90) showed that IN mutants with deletions in the C-terminal domain were no longer able to bind DNA.
It is unknown which characteristic motif is connected with this function. The C-terminal domain is considered the least conserved, with the fewest distinctive features among IN sequences.
Mutagenesis, complementation and other assays have mapped a non-specific DNA-binding function, which does not require a divalent metal ion, to the 200-270 region (Engelman et al., 94). This region may therefore be involved in the interaction with the target DNA.
Deletion mutants of different lengths have been tested in vitro to define the binding region more precisely. An interesting result concerns a single point mutation, W235A, which has no effect in vitro but totally blocks the capacity of the provirus to replicate in vivo. W235 was the only AA reported to be highly conserved in this region (Johnson M.S. et al., 86), and it has consequently been proposed as a key component of a local structure involved in interactions with the target DNA (Cannon P.M. et al., 94).
A more recent analysis of the HIV-1 IN C-terminal domain compared retroviruses from different subfamilies (Cannon P.M. et al., 96). Comparing all retroviruses (except HFV), three conserved regions were identified and designated L, C and N. Region L is conserved only in the lentiviruses; regions C and N are conserved in all retroviruses, although the consensus sequences differ between lentiviruses and non-lentiviruses. C and N lie within the part of the C-terminal region considered essential for DNA-binding activity, i.e. HIV IN 213-266 (Engelman A. et al., 94). W235 is near the beginning of region C. Mutants selected on the basis of these observations were tested in in vivo assays. The results support the view that W235 and K186, whose mutation blocks infectivity at a step after reverse transcription and nuclear migration, may be involved in target DNA binding.
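Several overlapping regions of HIV-1 IN are quoted in this document (IN50-186, IN50-212, 200-270, 213-266, IN220-270). A small lookup makes their relationship to the key residues explicit; the region labels are paraphrases of the text, and the helper is hypothetical.

```python
# Regions of HIV-1 IN quoted in the text (1-based, inclusive); labels paraphrase the text
REGIONS = {
    "IN50-186 (shortest disintegration-active fragment)": (50, 186),
    "IN50-212 (soluble core fragment)": (50, 212),
    "200-270 (non-specific DNA binding)": (200, 270),
    "213-266 (essential for DNA binding)": (213, 266),
    "IN220-270 (NMR structure)": (220, 270),
}


def regions_containing(residue: int, regions: dict = REGIONS) -> list:
    """Names of the regions whose inclusive bounds contain `residue`."""
    return [name for name, (lo, hi) in regions.items() if lo <= residue <= hi]


for label, pos in (("W235", 235), ("K186", 186)):
    print(label, regions_containing(pos))
```

W235 falls inside all three C-terminal regions, whereas K186 sits at the very end of the minimal core fragment and outside the mapped C-terminal DNA-binding region.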
The full significance of these observations remains to be investigated. These sequence features, characteristic of lentiviruses, could be related to aspects of the lentiviral life cycle that differ from other retroviruses, such as the ability to enter the nucleus of host cells that are not in mitosis.

3D structure

HIV IN has been the subject of various crystallization attempts, but its low solubility and tendency to form aggregates frustrated them. Deletion mutants prepared to test the functionality of different domains also turned out to differ in solubility from the wild type, and the soluble mutants became candidates for structural studies.
HIV-1 IN50-212 was a promising one. Its biophysical, enzymatic and spectroscopic properties were measured and found not to be greatly altered compared with the full-length protein. Its solubility was good, but it still aggregated (Hickman A.B. et al., 94). Point mutations of IN50-212 were then screened for one that improved solubility and reduced aggregation (Jenkins et al., 95). The single mutation F185K produced a soluble protein, existing as a monodisperse dimer in solution. This protein, containing the catalytic core domain, was crystallized and its structure solved at 2.5 Angstroms (Dyda et al., 94).
The strategy of introducing mutations to obtain soluble proteins has been applied to the whole HIV-1 IN protein, and a mutant carrying two mutations, F185K and C280S, has been selected. It is fully active and exists in solution in an equilibrium between dimeric and tetrameric forms (in 1 M NaCl). It is a good candidate for solving the structure of HIV-1 IN (Cannon P. et al., 96).
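Mutations such as F185K, C280S or W235A follow the usual notation: wild-type residue, position, new residue. A minimal sketch of applying such mutations to a one-letter sequence, with a check that the wild-type residue is really there, could look as follows. The function is hypothetical, and the 288-residue sequence below is a placeholder, not the real HIV-1 IN sequence.

```python
import re

# <wild-type residue><position><new residue>, e.g. "F185K"
MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_mutations(seq: str, *mutations: str, first_residue: int = 1) -> str:
    """Apply point mutations to a one-letter protein sequence.

    `first_residue` is the residue number of seq[0] (e.g. 50 for an
    IN50-212 fragment). The wild-type residue is checked before each
    substitution so that a numbering error is not silently ignored.
    """
    residues = list(seq)
    for mut in mutations:
        m = MUTATION.fullmatch(mut)
        if m is None:
            raise ValueError(f"cannot parse mutation {mut!r}")
        wt, pos, new = m.group(1), int(m.group(2)), m.group(3)
        i = pos - first_residue
        if not 0 <= i < len(residues) or residues[i] != wt:
            raise ValueError(f"{mut}: {wt} not found at position {pos}")
        residues[i] = new
    return "".join(residues)


# Placeholder full-length sequence (288 residues, not the real HIV-1 IN
# sequence) carrying F at 185 and C at 280
placeholder = "A" * 184 + "F" + "A" * 94 + "C" + "A" * 8
mutant = apply_mutations(placeholder, "F185K", "C280S")
print(mutant[184], mutant[279])  # prints K S
```

The `first_residue` offset matters for fragments: in IN50-212, residue 185 is at index 135 of the fragment, not 184.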
The solution structure of HIV-1 IN220-270 has been determined by multidimensional NMR spectroscopy (Lodi P.J. et al., 95).
The same strategy of selecting suitable mutants was followed with ASV IN (Kulkosky J. et al., 95).
In 1995 the structure of the ASV IN catalytic core was also solved at high resolution, using the deletion mutant ASV IN52-207 without further mutations (Bujacz et al., 95).

3D structure of HIV-1 IN catalytic domain

The structure consists of a central five-stranded beta-sheet and six helices. Its overall topology resembles that of E. coli ribonuclease H (RNase H). Other enzymes share this topology, such as the RNase H domain of HIV-1 reverse transcriptase, the Holliday junction-specific endonuclease RuvC and the core domain of the transposase of bacteriophage Mu.
The structure contains a disordered region (residues 140-154) that includes E152, one of the three key AAs of the catalytic triad. D64 and D116 are ordered and superimpose well on two catalytic residues of HIV-1 RNase H.
The first six residues at the N-terminus (50-55) and the last four residues at the C-terminus are not visible in the structure.
Fig. 4 is a simple diagram of HIV IN and RNase H topologies.

The core is a dimer, with a large main contact region involving beta3, alpha1, alpha3, alpha5 and alpha6 (about 1300 square Angstroms per subunit), which includes salt bridges and hydrogen bonds.
The dimer contact is not mediated by the proposed leucine zipper motif.
Here is a picture of the core domain of HIV-1 IN.

Look also at 1HRH, the HIV-1 RNase H domain, and at 1HRJ, the Holliday junction endonuclease RuvC from E. coli.

Look at this picture to see the notable residues, i.e. those that are conserved and/or known to produce interesting mutants.
Look also here at the highlighted residues of the catalytic core.

3D structure of HIV-1 IN C-terminal domain

Here is a picture of the C-terminus structure.
In solution IN220-270 is a dimer. Each monomer is a five-stranded beta-barrel. The interface is formed by three strands (2, 3 and 4) from each monomer, facing each other in an antiparallel fashion, and is stabilized mainly by hydrophobic interactions. In this picture the AAs interacting at the dimer interface are highlighted.

The overall topology is very similar to that of SH3 domains, which are found in proteins involved in signal transduction (Eijkelenboom A. et al., 95). SH3-like folds are widespread among proteins, despite the absence of any significant sequence identity.

Look at the structure of the alpha-spectrin SH3 domain here.
The AAs probably involved in DNA recognition are highlighted in this picture; their arrangement is compatible with binding to the major groove of DNA.

3D structure of ASV IN catalytic domain

Several structures were solved, because ASV IN52-207 was crystallized under different conditions, varying the protein buffer (Hepes or citrate) and the precipitant (PEG/IPR or ammonium sulphate). A selenomethionine-substituted protein was also prepared, and data were collected at two different temperatures (20 °C and -165 °C).
The different crystallization conditions have generally undetectable effects on the structures, except for a ten-residue loop between beta5 and alpha4, which changes conformation. The loop protrudes from the molecule, and the high temperature factors at its tip suggest that it is a flexible region. It contains two turns when crystals grow in PEG, and one smooth, intramolecularly stabilized turn when crystals grow in ammonium sulphate. Interestingly, roughly the same region (12 residues instead of 10) was found to be disordered in the structure of the HIV IN core domain.
Comparison with the 3D structure of the HIV IN catalytic core reveals a similar overall topology, including the analogy with RNase H and RuvC reported above.
Fig. 5 is a schematic diagram of the ASV IN catalytic core fragment topology.

Here is a picture of ASV IN.


The ASV IN catalytic core is a dimer, with an interface between alpha1 of one monomer and alpha5 of the other, formed mainly by hydrophobic interactions. Residues of the beta3 strand, protruding from the space between the helices, also participate. The interface is shaped like a cavity, filled by some positively charged residues and a few water molecules. Several other polar interactions between charged residues help to stabilize the dimer.

This picture gives just a rough idea of the catalytic core (which is a dimer).
Water molecules play a key role both in the catalytic core and in stabilizing the fold, together with some conserved residues. The requirement for the highly conserved residue S85 (S81 in HIV IN) has been attributed to a structural role in stabilizing the fold.
The triad D64, D121, E157 is positioned so as to form an active site. The side chains interact with each other through three water molecules, one of which (Wat324) appears to be particularly important and could be replaced by a divalent cation.
Look here at the catalytic core.
Comparison of the two available 3D integrase structures reveals a similar architecture, but with considerable deviations. In the dimers, corresponding elements of each monomer can be recognized, but they are shifted by up to 6 Angstroms, with a reported relative rotation of 13.3° and translation of 4.8 Angstroms.
Furthermore, there are large differences in the orientations of the key active-site residues that are difficult to explain.
The dimer interface is smaller in the ASV IN core fragment (766 square Angstroms, against 1395 square Angstroms in HIV-1 IN), consistent with the weaker association observed for ASV IN compared with HIV IN.
Very recently the structure of the ASV IN catalytic domain crystallized in the presence of the divalent cations Mg2+ and Mn2+ has been solved (Bujacz G. et al., 96).
Here you can see a picture of the triad and of the metal ion bound to Asp64 and Asp121.

Here you can download the respective coordinates:
1VSD
1VSF
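Coordinate files such as 1VSD and 1VSF use the fixed-column PDB format. As a minimal sketch, assuming such a file has been saved locally, the following Python reads ATOM/HETATM records and lists Asp/Glu carboxylate oxygens close to a Mg2+ or Mn2+ ion. The function names and the 3.0 Angstrom cutoff are arbitrary choices for illustration, not values from the cited work.

```python
import math


def parse_atoms(pdb_lines):
    """Parse ATOM/HETATM records using the fixed-column PDB layout."""
    atoms = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        atoms.append({
            "name": line[12:16].strip(),      # columns 13-16
            "resname": line[17:20].strip(),   # columns 18-20
            "chain": line[21],                # column 22
            "resseq": int(line[22:26]),       # columns 23-26
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        })
    return atoms


def metal_contacts(atoms, metals=("MG", "MN"), cutoff=3.0):
    """(metal, residue, atom, distance) for Asp/Glu carboxylate oxygens within `cutoff` Angstroms of a metal ion."""
    ions = [a for a in atoms if a["resname"] in metals]
    oxygens = [a for a in atoms
               if a["resname"] in ("ASP", "GLU") and a["name"] in ("OD1", "OD2", "OE1", "OE2")]
    contacts = []
    for ion in ions:
        for o in oxygens:
            d = math.dist(ion["xyz"], o["xyz"])
            if d <= cutoff:
                contacts.append((ion["resname"], f"{o['resname']}{o['resseq']}", o["name"], round(d, 2)))
    return contacts
```

To run it on a downloaded entry, pass the lines of the saved file, e.g. `metal_contacts(parse_atoms(open(path)))`, where `path` is wherever the file was stored. `math.dist` requires Python 3.8 or later.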

Therapeutical implications

Integration of viral DNA is a fundamental step in the life cycle of retroviruses, necessary for the production of progeny virus.
Mutations in the 3'-proximal portion of the HIV pol gene, in the region coding for IN, produce mutants identical to the wild type except for integration: proviral DNA is not formed and the infection slowly dies out (Goff S.P., 92). However, the absolute requirement of integration for productive HIV-1 infection has not yet been unequivocally established; on the contrary, cases of viral infection without integration have been reported.
In any case, integration is a target for the design of drugs against retroviruses, above all for AIDS.
A great research effort is under way to identify inhibitors of the different steps of integration that could be used therapeutically.

Several substances that inhibit the catalytic activity of IN have been reported, including caffeic acid phenylethyl ester (CAPE) (Fesen M.R. et al., 93), topoisomerase inhibitors and bis-catechols. They are often not selective for IN.
Recently a new class of compounds derived from caffeic acid has been tested, with encouraging results. These compounds are two dicaffeoylquinic acids (DCQAs) extracted from a medicinal plant (Achyrocline satureioides), together with a synthetic compound, L-chicoric acid (Robinson W.E. et al., 96). They are reported to inhibit HIV-1 in biochemical assays and also in in vitro disintegration assays, the latter suggesting an interaction with the catalytic core domain of HIV IN. Their effect on host DNA is unknown, but their selectivity in tissue culture is promising.
In addition, a Fab fragment (Fab 35) of a specific monoclonal antibody (MAb35) has been characterized (Barsov E.V. et al., 96). It binds to an epitope on the C-terminal domain of HIV-1 IN and blocks all of its activities. It has been suggested that it interferes with the multimerization of IN.


Last updated 25th Oct '96