Comparison of NMR and X-ray as methods of protein structure determination
written by Péter Hudáky (PP136) for PPS
1 Introduction
The determination of macromolecular structure is a recent branch of the
natural sciences. Several types of macromolecule are targets of structure
determination; this work, however, focuses on proteins, which are the most
important in this field. Recent technical developments have given rise to
two competing methods of protein structure determination, NMR and X-ray
crystallography (neutron and electron diffraction are not discussed here).
The two methods are based on completely different properties of these
macromolecules and use different calculation methods when determining a
structure, so NMR and X-ray are not only competing but also serve as
checks on each other.
At present neither of the two methods can be used to determine
the amino acid sequence, so proteins targeted by NMR or X-ray must have
a known primary structure. The development of sequential assignment in
NMR and the increasing resolution of electron density maps in X-ray may
eventually make partial determination of the amino acid sequence possible.
In the next two sections first the NMR, then the X-ray measurement
methods are discussed, focusing on the comparison of the two strategies.
The methods are analysed in the same framework, which makes them easy
to compare. Details not concerning protein structure determination
are excluded. Section 4 outlines a direct comparison of NMR and X-ray.
2 Principles of protein structure determination by NMR
2.1 Theoretical considerations
The NMR measurement is based on a property displayed by certain atomic
nuclei in the presence of a static magnetic field. Nuclei with a spin
of 1/2 have two spin states whose energy levels are separated
by the magnetic field. Spins in the lower energy level can be excited
by a second, radio frequency field, and the radio frequency signal emitted
on return to the ground level can be observed and measured.
The application of NMR to determining molecular structure derives from
the fact that the separation of the energy levels depends not only on
the type of nucleus and the magnitude of the magnetic field, but also
on the chemical surroundings of the nucleus. This last effect is
called chemical shielding; the resulting change in resonance frequency is
the chemical shift, and as a consequence nuclei with different chemical
surroundings are distinguishable. Neighbouring nuclei
have a further effect on each other called coupling, which does not modify
the chemical shift but results in the fine structure of the peak. Measurable
coupling can occur through 1, 2 or 3 bonds. A further phenomenon is the
through space effect (the nuclear Overhauser effect, NOE), which can be
measured up to a distance of ~5 Angstrom between two protons. The different
modern NMR measurement methods probe different details of the chemical
surroundings and couplings of the nuclei.
The nuclei targeted by NMR measurements are those with an odd
total number of protons and neutrons, such as H-1, C-13, N-15, F-19 and
P-31. The most investigated nucleus is the proton (H-1), because it is
the only such isotope in proteins with high natural abundance (over 99%);
however, proteins may be enriched in the C-13 and/or N-15 isotopes to
help solve the structure.
In general, proteins of 50-250 amino acids can be investigated by
NMR measurements. Smaller ones usually do not have a single conformer
(except for small disulphide rich proteins), while larger ones give
poorly resolved NMR spectra (the peaks are broadened by relaxation).
2.2 Protein preparation for NMR measurement
Proteins for NMR measurements can be obtained by isolation and purification
from natural sources or can be overproduced in expression systems. In the
latter case it is possible to generate isotopically labelled proteins.
The proteins prepared must have a chemical purity above 95%. The amount
needed for NMR measurements is several milligrams, in a solution of at least
approximately 1 mM. This protein concentration is higher than that found
under natural conditions. The high concentration may in some cases lead
to folding artefacts or unwanted aggregation (e.g. aggregation of haemoglobin).
The measurement requires fine tuning of temperature, pH, ionic strength
and solvent, because these factors influence the resolution of the spectra
and also affect protein stability.
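As a rough check of the amounts quoted above, the mass of protein in a
sample follows from concentration, volume and molecular mass. The sketch
below is only illustrative: the sample volume of 0.5 ml and the average
residue mass of 110 Da are assumptions, not figures taken from this text.

```python
def protein_mass_mg(conc_mM, volume_ml, n_residues, residue_mass_da=110.0):
    """Mass of protein (mg) in a sample of given concentration and volume.

    residue_mass_da is an assumed average mass per amino acid residue.
    """
    molar_mass = n_residues * residue_mass_da       # g/mol
    moles = (conc_mM * 1e-3) * (volume_ml * 1e-3)   # (mol/l) * l
    return moles * molar_mass * 1e3                 # g -> mg


# A 100-residue protein at 1 mM in an assumed 0.5 ml sample
mass = protein_mass_mg(conc_mM=1.0, volume_ml=0.5, n_residues=100)
print(f"{mass:.1f} mg")
```

Under these assumptions a 100-residue protein already needs several
milligrams, in line with the amount stated in the text.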
2.3 Requirements for NMR instruments
NMR instruments are developing very rapidly. Protein structure
determination became possible several years ago with the appearance of
high field magnets ( >500 MHz ). A stronger static magnetic
field results in a larger separation of the energy levels of the nuclear
spin states and a larger separation of peaks in the NMR spectrum. This rapid
development is further aided by the revolution in the computational
background of NMR instruments and NMR labs.
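The link between field strength and the frequency label of a spectrometer
can be illustrated with the resonance condition nu = gamma * B0 / (2 pi).
The following is a minimal sketch; the proton gyromagnetic ratio and the
Planck constant are standard physical constants, while the example field
values are chosen here purely for illustration.

```python
import math

GAMMA_1H = 267.522e6     # proton gyromagnetic ratio, rad s^-1 T^-1
PLANCK = 6.62607015e-34  # Planck constant, J s


def larmor_frequency_mhz(b0_tesla, gamma=GAMMA_1H):
    """Resonance frequency nu = gamma * B0 / (2 pi), in MHz."""
    return gamma * b0_tesla / (2.0 * math.pi) / 1e6


def level_separation_joule(b0_tesla, gamma=GAMMA_1H):
    """Energy gap between the two spin-1/2 states, delta E = h * nu."""
    return PLANCK * larmor_frequency_mhz(b0_tesla, gamma) * 1e6


# Illustrative field strengths (tesla)
for b0 in (7.05, 11.74, 14.1):
    print(f"{b0:5.2f} T -> {larmor_frequency_mhz(b0):6.1f} MHz")
```

Both the frequency and the level separation grow linearly with the field,
which is why a field of roughly 11.7 T corresponds to a 500 MHz instrument.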
2.4 Experimental techniques
The transition of spins to the higher energy level can be induced by
a radio frequency field whose energy matches the difference between the
spin states (resonance). The energy emitted when the spins revert to the
ground state is measured and represented by peaks in the NMR spectrum.
The original method of measurement was the continuous wave (CW) method,
in which the frequency range was irradiated step by step and the
emission was observed at each step. Nowadays, in FT-NMR, the whole frequency
range is irradiated at once and the resulting free induction decay (FID) is
analysed by Fourier transformation.
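The step from FID to spectrum can be sketched with a toy signal. The code
below builds an artificial FID from two decaying oscillations and applies
a naive discrete Fourier transform; the frequencies, decay time and
sampling are invented for illustration, and real processing software uses
fast transforms and additional corrections not shown here.

```python
import cmath
import math


def simulate_fid(freqs_hz, t2_s, n_points, dwell_s):
    """Sum of exponentially decaying complex oscillations (a toy FID)."""
    fid = []
    for k in range(n_points):
        t = k * dwell_s
        decay = math.exp(-t / t2_s)
        fid.append(sum(decay * cmath.exp(2j * math.pi * f * t) for f in freqs_hz))
    return fid


def dft_magnitude(signal):
    """Naive discrete Fourier transform; returns |S(k)| for each frequency bin."""
    n = len(signal)
    return [
        abs(sum(signal[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n)))
        for k in range(n)
    ]


n_points, dwell = 256, 1.0 / 256.0   # 1 s acquisition, so 1 Hz per bin
fid = simulate_fid([20.0, 55.0], t2_s=0.3, n_points=n_points, dwell_s=dwell)
spectrum = dft_magnitude(fid)

# The two largest local maxima of the spectrum mark the two resonances
peaks = sorted(
    (k for k in range(1, n_points - 1)
     if spectrum[k] > spectrum[k - 1] and spectrum[k] > spectrum[k + 1]),
    key=lambda k: spectrum[k],
    reverse=True,
)[:2]
print(sorted(peaks))
```

All frequencies are excited and recorded together in the time domain, and
the transform separates them again into individual peaks.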
2.4.1 One dimensional measurement
The spectrum of a one dimensional NMR experiment is the distribution of
chemical shifts of one type of nucleus. Each nucleus has a peak at a specific
chemical shift determined by its chemical surroundings; the peaks
also have a fine structure determined by the coupling with other nuclei.
In H-1 NMR of proteins each type of proton ( NH, alphaH, betaH, aromaticH,
OH, etc. ) is positioned in a well defined area of the spectrum. Unfortunately
these areas overlap each other. The identification of most of the
peaks (assignment) is usually not possible in a one dimensional experiment
on proteins, so other measurement types are needed. However, some peaks
positioned at the extreme ends of the spectrum or in an area poor in other
peaks might be assigned.
2.4.2 COSY, Relayed-COSY, DQF-COSY, TOCSY
The second dimension is introduced in order to gain a new dimension
of dispersion, carrying information about the surroundings of a proton.
In COSY, Relayed-COSY, DQF-COSY and TOCSY experiments different details
of the through bond couplings are displayed. In these experiments the coupled
system of hydrogens (the spin system) of each amino acid is mapped. However,
certain amino acids are indistinguishable because they have the same coupling
pattern and chemical shift values. No measurable through bond coupling is
present between hydrogens of neighbouring amino acids because they are always
at least four bonds apart.
2.4.3 NOESY, ROESY
The other two two dimensional experiments, NOESY and ROESY, give the most
useful data about the 3D structure of proteins. In these experiments the
through space couplings are detected, so protons closer than 5 Angstrom give
a measurable peak regardless of their sequential position. Unfortunately, the
relationship between the intensity of an NOE peak and the distance between
the corresponding two protons is ambiguous, therefore only the upper limit of
proton-proton distances can be determined.
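One common way of turning NOE intensities into upper limits can be sketched
as follows. It relies on the approximate r^-6 dependence of the NOE
intensity and a reference proton pair of known distance; the bin edges
(2.7, 3.5 and 5.0 Angstrom) and the calibration values are assumptions made
for this illustration only.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Distance estimate from the r^-6 dependence of the NOE intensity."""
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


def upper_bound(intensity, ref_intensity, ref_distance, bins=(2.7, 3.5, 5.0)):
    """Round the estimate up to the next bin edge, a loose upper limit (Angstrom).

    The bin edges are assumed values; returns None if the estimate
    exceeds the largest bin.
    """
    r = noe_distance(intensity, ref_intensity, ref_distance)
    for limit in bins:
        if r <= limit:
            return limit
    return None


# Hypothetical calibration: a proton pair at a known 2.5 Angstrom, intensity 100
for peak in (120.0, 20.0, 3.0):
    print(peak, upper_bound(peak, ref_intensity=100.0, ref_distance=2.5))
```

Rounding up to a coarse bin reflects the ambiguity described above: the
intensity is trusted only to say that the two protons are no further apart
than the limit.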
Certain protons of sequentially neighbouring residues are close enough to
give an NOE peak, which makes it possible to use them in the sequential
assignment of the protein.
2.4.4 C-13 and N-15 labelled proteins
The structure determination of larger proteins ( >100 amino acids ) usually
requires isotope labelling to gain coupling information between different
nuclei. Coupling between C-13 and H-1 or N-15 and H-1 introduces a new
dimension, in which the peaks are spread out according to the chemical shifts
of the C-13 or N-15 nuclei.
2.5 Constructing the structure from experimental data
2.5.1 Data gained from the measurement
The following data of protein 3D structure is available from NMR:
- hints for phi and chi1 torsional angles (Karplus equation)
- some of the proton-proton distances shorter than 5 Angstrom
- protons with extreme electronic properties
- spin systems of the amino acids
- exchange data
- relaxation parameters
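The Karplus equation mentioned in the list relates a three-bond coupling
constant to a torsional angle. A minimal sketch for the HN-Halpha coupling
and the phi angle is given below; the coefficients (6.4, -1.4, 1.9 Hz) are
one frequently quoted parameterisation and are used here as an assumption,
since other sets of values exist.

```python
import math


def karplus_j_hn_ha(phi_deg, a=6.4, b=-1.4, c=1.9):
    """3J(HN-Halpha) in Hz from the backbone torsion angle phi (degrees).

    J = A cos^2(theta) + B cos(theta) + C, with theta = phi - 60 degrees.
    The default coefficients are an assumed parameterisation.
    """
    theta = math.radians(phi_deg - 60.0)
    return a * math.cos(theta) ** 2 + b * math.cos(theta) + c


# Typical phi values of the two main secondary structure types
for name, phi in (("alpha helix", -57.0), ("beta strand", -120.0)):
    print(f"{name:12s} phi = {phi:6.1f}  J = {karplus_j_hn_ha(phi):.1f} Hz")
```

Because several phi values can give the same coupling constant, the
measured J only hints at the angle, which is why the list speaks of hints
rather than values.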
2.5.2 Aspects of assignment
Assignment is the rate limiting step of NMR. Once a set of one-, two- and
perhaps three-dimensional spectra has been collected, the identification
of the peaks can start. This means determining the chemical
shift of all protons ( and/or N-15, C-13 nuclei ) in the protein. There are
several methods ( e.g. sequential assignment ), but there is still no fully
computerised algorithm for complete automated assignment. Once the assignment
is successful, the most difficult part of the job is done.
2.5.3 Calculation of 3D structure
Having finished the assignment, one has a set of proton-proton distance
limits (restraints). This set of distance limits and the amino acid sequence
are the input data of computer programs that build the protein model(s).
During the calculation of structures, considerations of theoretical chemistry
and applied molecular modelling are also taken into account. Because the
resulting protein structure is itself the product of a computational method
built on measurement data, several models are calculated, all of which agree
with the experiment.
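What it means for a model to agree with the experiment can be sketched as
a check of the distance restraints against model co-ordinates. The proton
names, co-ordinates and limits below are hypothetical, and real structure
calculation programs use far more elaborate target functions.

```python
import math


def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def violations(coords, restraints, tolerance=0.0):
    """Return the restraints whose upper limit is exceeded by the model.

    coords:     {proton_name: (x, y, z)} in Angstrom
    restraints: list of (proton_a, proton_b, upper_limit)
    """
    out = []
    for a, b, limit in restraints:
        d = distance(coords[a], coords[b])
        if d > limit + tolerance:
            out.append((a, b, limit, round(d, 2)))
    return out


# Hypothetical toy model and restraint list
model = {"HN_5": (0.0, 0.0, 0.0), "HA_4": (2.2, 0.0, 0.0), "HB_40": (0.0, 6.0, 0.0)}
restraints = [("HN_5", "HA_4", 2.7), ("HN_5", "HB_40", 5.0)]
print(violations(model, restraints))
```

A model with no (or only small) violations is accepted; since many
different models can satisfy the same set of upper limits, a family of
structures results.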
The final calculation yields an R factor for the co-ordinates of each
atom. This R factor corresponds to the relative motions that occurred during
molecular dynamics (see section 2.6.2 on flexibility).
2.6 Discussion of the resulting protein structure
2.6.1 Confidence of the structure
The confidence of NMR structures relies mainly on two components: the
experimental and the computational part. The crucial aim in NMR experiments
is to have as many NOE peaks as possible. Among the NOE peaks, the long
range peaks ( peaks where the two interacting protons are sequentially
at least five residues apart ) are the most important. These peaks form
the basis of fold determination. About 12-14 NOE peaks per amino
acid are usually needed for a reliable structure. Fewer peaks may result
in several different calculated structures or poorly defined regions of
the protein fold.
It is common that secondary structure elements are better defined by
an NMR experiment than the loops connecting them. In fact alpha helices and
beta sheets may be identified prior to the 3D structure calculation. This
is due to the well defined pattern of proton-proton distances
in periodic structural elements. Loops exposed to solvent or having a looser
conformation are much less well defined by NMR experiments.
Calculation of protein structure is becoming cheaper and cheaper. The
repeated application of simulated annealing and molecular dynamics until the
best possible R values are reached is becoming faster and faster, which
improves the quality of NMR structures. However, the possibility of
misfolding and of becoming trapped in local energy minima is still a problem.
2.6.2 Flexibility of the structure
In the liquid state protein structures are more or less flexible. The
flexibility results in motion of the protein segments relative to each other.
These motions, together with the torsional rotations of amino acid side
chains, are faster than the time scale of an NMR experiment, so the measured
values always represent an average over the motions of the protein structure.
In the case of highly flexible loops, however, NOE peaks may be completely
lost through averaging over different states. This is usually the case at
both ends of the amino acid chain; the co-ordinates of chain termini are
seldom defined. High flexibility is one of the reasons for larger R factors
in certain regions. It should be noted that large R factors may also be
caused by a low number of NOE peaks in that region.
3 Principles of protein structure determination by X-ray
3.1 Theoretical considerations
When photons of an X-ray beam collide with the electrons surrounding the
atoms of a molecule, they are absorbed and a new photon with the same energy
but a different direction is emitted. This is known as scattering and leads
to the appearance of detectable photons in directions other than that of
irradiation. Because in proteins the interaction of photons and electrons
occurs with a low cross-section ( most of the photons go straight through ),
detectable scattering is observed only if many identical proteins are
present, and to interpret the scattering, the proteins also have to be
arranged in a regular array. This is why crystals are needed for an X-ray
measurement.
Of course, photons of other wavelengths ( e.g. visible light ) also interact
with electrons, but the resolution of the detectable (computable) image
of the object is of the same order of magnitude as the wavelength.
For atomic resolution a wavelength of 1-3 Angstrom is needed, and that
is the X-ray range.
Unfortunately images of molecules cannot be produced optically because
the scattered beam cannot be refocused. ( No lens material has a refractive
index for X-rays appreciably different from one. ) The only possibility is
to record the diffraction image, each point of which is formed by parallel
beams scattered by the protein and combined by interference of the
differently phased rays ( Bragg's law ). Photons are detected in the
diffraction plane in directions where the scattered beams reinforce each
other by interference, and no photons are detected in directions where the
beams extinguish or simply do not reinforce each other. The central problem
in calculating the image arises from losing the phases of the individual
X-ray waves when recording the diffraction pattern. The experimental methods
worked out for X-ray are procedures to determine the lost phases.
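Why the lost phases matter can be illustrated with a toy one-dimensional
"crystal". In the sketch below the density is transformed into complex
structure factors, and the image is then rebuilt once with the true phases
and once with all phases set to zero; the grid size and atom weights are
invented for the example.

```python
import cmath
import math


def dft(values):
    """Forward discrete Fourier transform (structure factors of a 1D cell)."""
    n = len(values)
    return [sum(values[x] * cmath.exp(-2j * math.pi * h * x / n) for x in range(n))
            for h in range(n)]


def synthesis(factors):
    """Inverse transform: density rebuilt from complex structure factors."""
    n = len(factors)
    return [(sum(factors[h] * cmath.exp(2j * math.pi * h * x / n)
                 for h in range(n)) / n).real
            for x in range(n)]


# Toy one-dimensional unit cell with two 'atoms' of different weight
density = [0.0] * 16
density[3], density[10] = 8.0, 6.0

factors = dft(density)
amplitudes = [abs(f) for f in factors]   # obtainable from recorded intensities

with_phases = synthesis(factors)                                   # true phases
without_phases = synthesis([complex(a, 0.0) for a in amplitudes])  # phases = 0

print([round(v, 1) for v in with_phases])
print([round(v, 1) for v in without_phases])
```

With the correct phases the two atoms reappear at their positions; with the
amplitudes alone the reconstruction no longer resembles the original
density.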
The main restriction on investigating proteins in an X-ray experiment
is still rather practical: the ability to form crystals of sufficient size.
If a crystal of 0.1-0.5 mm is available, there is no theoretical restriction
on determining the 3D structure.
3.2 Protein preparation for X-ray measurement
Proteins for X-ray measurements can be obtained by isolation and purification
from natural sources or can be overproduced in expression systems.
Proteins must be crystallised for a protein crystallographic measurement.
This procedure determines material demand, concentration, pH, purity and
solvent. There is no general rule or prediction method for the conditions
of protein crystallisation. Crystallisation is a trial of perhaps hundreds
of conditions with only ~75% success. Two to twenty milligrams of 97% pure
protein in a solution of 2-20 mg/ml is usually enough to start hanging
drop crystallisation attempts. Still, in 20-30% of the cases crystallisation
remains unsuccessful, especially for membrane proteins, whose crystallisation
is hindered by solubility restrictions in almost all cases.
3.3 Requirements for X-ray instruments
Instruments that generate X-ray beams have been in use for a long time.
The limitations on using X-rays for protein structure determination arise
from the interpretation of diffraction patterns.
The experiment requires a monochromatic, parallel beam of X-rays bombarding
the object. This is usually obtained from a monochromatic X-ray source, the
beam of which is led through a 10-15 cm long, ~0.5 mm diameter tube.
( However, much more intense X-ray beams are available at synchrotrons. )
The scattered photons are detected on a detector of about 20 cm x 20 cm in
the diffraction plane. The detector should be further away for large proteins
and closer for smaller ones.
3.4 Experimental techniques
Protein crystals are very sensitive: they must be kept in the atmosphere
of their solution and must be cooled during the measurement. To obtain
diffraction patterns from different directions of irradiation, the sample
must be rotated.
3.4.1 Isomorphous replacement
In isomorphous replacement a second diffraction pattern is recorded with
a heavy atom co-ordinated to the object molecule. The large electron density
of the heavy atom influences the intensity of the points of the diffraction
image. In this way the lost phases can be determined and the image calculated
in those cases where the position of the heavy atom in the crystal is known.
3.4.2 Double isomorphous replacement
In the case of proteins the exact co-ordination of the heavy atoms is not
known, so it becomes necessary to introduce two different heavy atoms that
bind to different sites on the protein. This procedure can lead to the
determination of the positions of the heavy atoms. The different effects of
the heavy atoms on the diffraction pattern facilitate the phase
determination. ( Some commonly used heavy atoms and their co-ordination
sites: mercury (cysteine), platinum (histidine). )
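How two heavy atom derivatives help with the phases can be sketched for a
single reflection. The derivative structure factor is the sum of the
protein and heavy atom contributions, so one derivative leaves two possible
protein phases, and a second derivative singles out the one they share.
All amplitudes and phases below are hypothetical, and measurement errors,
which make the real problem statistical, are ignored.

```python
import cmath
import math


def candidate_phases(fp_amp, fph_amp, fh):
    """Two possible protein phases (radians) from one heavy atom derivative.

    fp_amp:  |F_P|, native amplitude
    fph_amp: |F_PH|, derivative amplitude
    fh:      complex heavy atom contribution F_H (known from its position)
    Uses |F_PH|^2 = |F_P|^2 + |F_H|^2 + 2 |F_P||F_H| cos(alpha_P - alpha_H).
    """
    fh_amp, fh_phase = abs(fh), cmath.phase(fh)
    cos_term = (fph_amp ** 2 - fp_amp ** 2 - fh_amp ** 2) / (2.0 * fp_amp * fh_amp)
    delta = math.acos(max(-1.0, min(1.0, cos_term)))
    return (fh_phase + delta, fh_phase - delta)


def angle_diff(a, b):
    """Smallest absolute difference between two angles (radians)."""
    return abs(cmath.phase(cmath.exp(1j * (a - b))))


def resolve_phase(fp_amp, derivatives):
    """Pick the phase on which two derivatives agree best.

    derivatives: two (|F_PH|, F_H) pairs.
    """
    (amp1, fh1), (amp2, fh2) = derivatives
    pairs = [(p, q) for p in candidate_phases(fp_amp, amp1, fh1)
             for q in candidate_phases(fp_amp, amp2, fh2)]
    best = min(pairs, key=lambda pq: angle_diff(*pq))
    return best[0]


# Hypothetical reflection: protein structure factor and two heavy atom terms
f_p = cmath.rect(100.0, math.radians(40.0))
f_h1 = cmath.rect(20.0, math.radians(110.0))
f_h2 = cmath.rect(15.0, math.radians(-30.0))
derivs = [(abs(f_p + f_h1), f_h1), (abs(f_p + f_h2), f_h2)]

print(round(math.degrees(resolve_phase(abs(f_p), derivs)), 1))
```

Only the amplitudes of the native and derivative reflections enter the
calculation, together with the heavy atom contributions; the protein phase
itself is never measured directly.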
3.4.3 Anomalous scattering
If the energy of the X-ray photon penetrating the crystal is close to
an absorption edge of the heavy atom, that is, the energy needed to excite
an electron out of its K or L shell, the scattering by that atom changes in
both amplitude and phase. This is anomalous scattering. The positive/negative
contribution of anomalous scattering measured from opposite sides of the
crystal makes phase determination possible with the introduction of a single
heavy atom.
3.4.4 Homologous proteins
Structures of homologous proteins can be used for phase determination if
there is at least 30% sequence identity. This degree of sequence identity
is accompanied by a structural similarity sufficient for the homologue to
serve as the starting structure for the current protein.
3.5 Constructing the structure from experimental data
3.5.1 Data gained from the measurement
The following data on protein 3D structure are available from X-ray:
- electron density of the atom chains ( hydrogens are not detectable by
X-ray, because they are very poor in electrons )
- crystal packing of the proteins
3.5.2 Aspects of data on the diffraction plane
The diffraction pattern is a consequence of Bragg's law, which gives
the directions of scattered photons in which interference is constructive.
This depends on the spacing of the crystal lattice, the reflection angle and
the wavelength. Each point on the diffraction plane contains scattered X-rays
from all of the atoms; the image can be created from this pattern by
Fourier transformation. If the determined phases of the diffraction pattern
are correct, the result of the Fourier transformation is an electron density
map with high electron density along the atom chains of the protein.
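Bragg's law, n * lambda = 2 * d * sin(theta), can be evaluated directly to
see how lattice spacing, angle and wavelength are linked. The sketch below
assumes a wavelength of 1.54 Angstrom, a typical laboratory value chosen
here for illustration; the spacings are arbitrary examples.

```python
import math


def bragg_angle_deg(d_spacing, wavelength, order=1):
    """Angle theta (degrees) satisfying n * lambda = 2 * d * sin(theta).

    Returns None when no reflection is possible (n * lambda > 2 * d).
    """
    s = order * wavelength / (2.0 * d_spacing)
    if s > 1.0:
        return None
    return math.degrees(math.asin(s))


WAVELENGTH = 1.54  # Angstrom; an assumed, typical laboratory wavelength

for d in (5.0, 3.0, 2.0):
    theta = bragg_angle_deg(d, WAVELENGTH)
    print(f"d = {d:.1f} A  ->  theta = {theta:.1f} deg")
```

Smaller spacings, which carry the finer detail of the structure, scatter to
larger angles, and spacings below half the wavelength give no reflection at
all.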
3.5.3 Calculation of 3D structure
The creation of the 3D structure is nothing other than fitting the amino acid
chain into the channel of high electron density. This is not as easy as
it seems. The electron density map is usually not a continuous channel
for the amino acid chain with knobs for side chains, but a set of clouds
broken in several places, affected by errors in phase determination. The
difficulty of this procedure decreases as the resolution improves.
The first step of the fitting is to find the sequential position of
the amino acid chain in the electron density channel. Neighbouring large
amino acid residues have large shapes, so they can serve as starting points
for positioning; proceeding from them in both directions along the sequence,
all of the residues find their place. The second step is refining the
structure until the backbone, the side chains and the water molecules sit
correctly in the middle of the electron density cloud.
3.6 Discussion of the resulting protein structure
3.6.1 Confidence of the structure
The basic question about the reliability of protein structures determined by
X-ray crystallography is the effect of the crystal lattice on the proteins.
The conditions of crystallisation and the crystal itself are radically
different from the natural environment. However, it seems to be true that in
most cases there are no major changes in protein structure.
The packing of a protein crystal is very loose; in the channels between
the proteins there is a large number of solvent molecules (mainly water),
creating surroundings similar to those in solution. These water molecules
have no fixed position. The movement of water within the crystal allows
heavy atoms to seep into the crystal and allows chemical reactions to be
induced, such as the co-ordination of heavy atoms or specific reactions in
the substrate binding pocket.
The technical aspect of structure confidence is the resolution. In low
resolution images ( ~5 Angstrom ) the identification of alpha helices is
the most that can be achieved. Medium resolution images ( ~3 Angstrom )
can differentiate between small and large side chains. High resolution
images ( better than 2.5 Angstrom ) are suitable for giving proper
conformations and torsional angles.
3.6.2 Flexibility of the structure
The crystal lattice gives much less freedom for internal movement of protein
domains or loops. The somewhat flexible 3D fold is locked into a well
defined conformation in the crystal. This conformation usually, but not
necessarily, represents the average conformation of the fold.
The R-factors given by X-ray measurements have only a very loose connection
with the flexibility of a particular part of the protein; they inform
about the confidence of the determined atom positions. ( They are obtained by
calculating the diffraction amplitudes of a hypothetical crystal of
the model and comparing them with the measured ones. ) Large R-values are
caused, among other things, by larger motions in the crystal, which arise
partly from fewer contacts with other parts of the protein ( this phenomenon
is connected with the flexibility in solution ), from thermal vibration, and
from loose packing in the crystal.
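The comparison of calculated and measured amplitudes is conventionally
summarised as a residual, R = sum | |Fobs| - |Fcalc| | / sum |Fobs|. The
sketch below uses this standard definition, which yields a single number
for the whole model rather than a value per atom; the amplitudes are
hypothetical.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic residual R = sum| |Fobs| - |Fcalc| | / sum|Fobs|."""
    if len(f_obs) != len(f_calc):
        raise ValueError("amplitude lists must have equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / sum(abs(o) for o in f_obs)


# Hypothetical amplitudes for five reflections
observed = [120.0, 80.0, 45.0, 200.0, 60.0]
calculated = [110.0, 90.0, 40.0, 190.0, 65.0]
print(f"R = {r_factor(observed, calculated):.3f}")
```

A model that reproduced the measured amplitudes exactly would give R = 0;
the larger the value, the poorer the agreement between model and data.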
4 Comparison of the two methods
The above description of NMR and X-ray focuses on the differences between
them. Though the procedures are different, their systematic description
in the same framework of protein structure determination highlights the
relation between the two methods. The following points summarise how
the main elements of the two structure determination methods relate to
each other.
4.1 Basic considerations
The two methods of structure determination are based on completely different
properties of proteins. An NMR structure is calculated from magnetic
properties of certain nuclei, while an X-ray structure is derived from the
electron density of non hydrogen atoms.
The calculation of an NMR structure is an indirect method. Some atom-atom
distances are obtained approximately, but the overall structure is calculated
by computer using a procedure parametrized empirically for proteins.
This is the reason why NMR yields several structures instead
of a single 3D structure.
The evaluation of an X-ray structure is a much more direct method. The
electron densities depend on the positions of the atoms, so all non hydrogen
atoms can be localised in space. This method allows a single
structure to be derived.
4.2 Dealing with the protein
For both methods, proteins can be prepared from any source, natural,
synthetic or overexpressed, but an amount of several milligrams must be
obtained. The preparation of proteins for the experiment is similar in the
first step: the protein must be brought into a far more concentrated state
than under natural conditions. This is a limitation for both methods.
However, the degree of concentration differs: for NMR it is lower
than for X-ray. The pH, concentration, temperature, different
ions and other molecules must be adjusted to satisfy both protein stability
and the necessary measurement conditions.
4.3 R factors
Both techniques determine R-factors for the co-ordinates of atoms in
the structure. Taking into account that the bases of the NMR and X-ray
experiments are different, it is not surprising that the R-factors given by
the two methods are not the same. The R-factor of an NMR experiment is the
result of modelled flexibility, while in X-ray it rather reflects the
confidence of position determination in the crystal. However, it is also
true that both R-factors are more or less related to the flexibility of
regions of proteins.
4.4 Confidence of structure
Structures from both methods have a high confidence; the determination of
secondary structure elements, their relative arrangement and the loops
playing a role in catalytic activity is reassuring. The fact that some
catalytic reactions can take place under the measurement conditions
reinforces the confidence in the 3D structures.
Regions weakly defined by NMR or possibly affected by crystal packing
are determined less confidently. These are usually longer surface loops,
chain termini or domain interconnecting loops that have a flexible
conformation under natural conditions as well. Pockets with catalytic
activity and the framework of secondary structures that holds the pocket in
place are usually stable enough not to be affected by the change of
conditions.
The nature of the two methods means that NMR structures
are never as definite as X-ray ones; they allow more freedom for motions
of loops and termini. It is probable that this freedom is related to the
dynamics of these regions in solution; however, the modelling character
of molecular dynamics calculations calls for caution in such comparisons.