Comparison of NMR and X-ray as methods of protein structure determination

The determination of macromolecular structure is a relatively recent field of the natural sciences. Several types of macromolecule are targets of structure determination; this work focuses on proteins, the most important of them. Recent technical developments have given rise to two competing methods of protein structure determination, NMR spectroscopy and X-ray crystallography (neutron and electron diffraction are not considered here). The two methods are based on completely different properties of these macromolecules and use different calculation methods to determine a structure, so NMR and X-ray are not only competing but also cross-checking each other.

At present neither of the two methods can be used to determine the amino acid sequence: proteins studied by NMR or X-ray must have a known primary structure. Advances in sequential assignment in NMR and the increasing resolution of electron density maps in X-ray may partially make amino acid sequence determination possible.

In the next two sections, first the NMR and then the X-ray measurement methods are discussed, with a focus on comparing the two strategies. Both methods are analysed in the same framework, which makes comparison easier. Details not concerning protein structure determination are excluded. Section 4 outlines a direct comparison of NMR and X-ray.

2 Principles of protein structure determination by NMR

2.1 Theoretical considerations

NMR measurement is based on a property of certain atomic nuclei placed in a static magnetic field. Nuclei with a spin quantum number of 1/2 have two spin states, whose energies are split by the magnetic field. Spins in the lower energy level can be excited by a second, radio-frequency field, and the radio-frequency signal emitted as they return to the ground level can be observed and measured.
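The size of the splitting can be made concrete with the Larmor relation, nu = |gamma| * B0 / (2*pi), and Delta E = h * nu. The sketch below is a minimal illustration; the gyromagnetic ratios are approximate literature values and the 11.74 T field is an example chosen for illustration, not a value from the text.

```python
# Resonance frequency and energy gap of spin-1/2 nuclei in a static field B0.
PLANCK_H = 6.62607015e-34  # Planck constant, J s

# |gamma| / (2*pi) in MHz per tesla (approximate literature values)
GAMMA_BAR_MHZ_PER_T = {"H-1": 42.577, "C-13": 10.708, "N-15": 4.316}


def larmor_frequency_mhz(nucleus: str, b0_tesla: float) -> float:
    """Resonance (Larmor) frequency nu = |gamma| * B0 / (2*pi), in MHz."""
    return GAMMA_BAR_MHZ_PER_T[nucleus] * b0_tesla


def spin_state_gap_joule(nucleus: str, b0_tesla: float) -> float:
    """Energy difference between the two spin states: dE = h * nu."""
    return PLANCK_H * larmor_frequency_mhz(nucleus, b0_tesla) * 1e6


if __name__ == "__main__":
    b0 = 11.74  # tesla; example field of a "500 MHz" magnet
    for nucleus in GAMMA_BAR_MHZ_PER_T:
        print(nucleus, round(larmor_frequency_mhz(nucleus, b0), 1), "MHz",
              spin_state_gap_joule(nucleus, b0), "J")
```

Spectrometers are conventionally labelled by their proton frequency, which is why an 11.74 T magnet is called a 500 MHz instrument.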

NMR can be used to determine molecular structure because the separation of the energy levels depends not only on the type of nucleus and the strength of the magnetic field, but also on the chemical surroundings of the nucleus. This last effect is called chemical shielding; it gives rise to the chemical shift and has the consequence that nuclei in different chemical surroundings are distinguishable. Neighbouring nuclei also affect each other through coupling (scalar or J-coupling), which does not change the chemical shift but produces the fine structure of the peak. Measurable coupling can occur through 1, 2 or 3 bonds. A further phenomenon is the through-space effect (the nuclear Overhauser effect, NOE), which can be measured between two protons up to a distance of ~5 Angstrom. Different modern NMR experiments probe different details of the chemical surroundings and couplings of the nuclei.
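Chemical shifts are reported relative to a reference signal, in parts per million (ppm), so that the same nucleus has the same shift on any spectrometer. A minimal sketch of the conversion, assuming a reference compound whose signal frequency is known; the 8.0 ppm amide value and the two spectrometer frequencies are illustrative assumptions:

```python
def chemical_shift_ppm(signal_hz: float, reference_hz: float) -> float:
    """Chemical shift delta = (nu - nu_ref) / nu_ref * 1e6, in parts per million."""
    return (signal_hz - reference_hz) / reference_hz * 1e6


def offset_hz(shift_ppm: float, reference_hz: float) -> float:
    """Inverse relation: frequency offset from the reference for a given shift."""
    return shift_ppm * 1e-6 * reference_hz


if __name__ == "__main__":
    # The same amide proton (assumed 8.0 ppm) on two hypothetical spectrometers
    for ref in (500e6, 800e6):
        off = offset_hz(8.0, ref)
        print(f"{ref / 1e6:.0f} MHz: offset {off:.0f} Hz -> "
              f"{chemical_shift_ppm(ref + off, ref):.2f} ppm")
```

The offset in Hz grows with the field, while the value in ppm stays the same.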

The nuclei targeted by NMR measurements are those with an odd sum of protons and neutrons (an odd mass number), such as H-1, C-13, N-15, F-19 and P-31. The most investigated nucleus is the proton (H-1), because it is the only spin-1/2 isotope present in all proteins at high natural abundance (~99.98%); however, proteins may be enriched in C-13 and/or N-15 (natural abundances ~1.1% and ~0.4%) to help solve the structure.

In general, proteins of 50-250 amino acids can be investigated by NMR. Smaller ones usually do not adopt a single stable conformation (except for small disulphide-rich proteins), while larger ones give poorly resolved spectra, because their slower tumbling in solution speeds up relaxation and broadens the peaks.
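The size limit can be rationalised with the rotational correlation time tau_c, roughly the time a molecule needs to tumble; a longer tau_c means faster transverse relaxation and broader lines. The sketch below estimates tau_c with the Stokes-Einstein-Debye relation for a rigid sphere. The protein density, hydration-shell thickness, water viscosity and the 110 Da average residue mass are assumed typical values, so the results are order-of-magnitude estimates only.

```python
import math

AVOGADRO = 6.02214076e23    # 1/mol
BOLTZMANN_K = 1.380649e-23  # J/K


def tumbling_time_ns(mass_da: float, temp_k: float = 298.0,
                     viscosity_pa_s: float = 0.89e-3,   # water at ~25 C (assumed)
                     density_g_cm3: float = 1.37,       # assumed protein density
                     hydration_nm: float = 0.32) -> float:  # assumed hydration shell
    """Rough tau_c = 4*pi*eta*r^3 / (3*k*T) for a hydrated rigid sphere, in ns."""
    volume_m3 = mass_da / (AVOGADRO * density_g_cm3) * 1e-6   # cm^3 -> m^3
    radius_m = (3.0 * volume_m3 / (4.0 * math.pi)) ** (1.0 / 3.0) + hydration_nm * 1e-9
    tau_s = 4.0 * math.pi * viscosity_pa_s * radius_m ** 3 / (3.0 * BOLTZMANN_K * temp_k)
    return tau_s * 1e9


if __name__ == "__main__":
    for residues in (50, 100, 250):
        print(residues, "residues:", round(tumbling_time_ns(residues * 110.0), 1), "ns")
```

Because tau_c grows roughly with molecular mass, the line broadening caused by slow tumbling increases steadily with protein size.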

2.2 Protein preparation for NMR measurement

Proteins for NMR measurements can be obtained by isolation and purification from natural sources or overproduced in expression systems. In the latter case, isotopically labelled proteins can be produced.

The proteins must have a chemical purity above 95%. Several milligrams are needed for NMR measurements, in a solution of at least ~1 mM. This concentration is higher than under natural conditions, and in several cases it may lead to folding artefacts or unwanted aggregation (e.g. aggregation of haemoglobin).
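The "several milligrams" follow directly from concentration, sample volume and molecular weight. A minimal sketch; the 500 uL sample volume and the 15 kDa (~135 residue) protein are assumed example values:

```python
def protein_mass_mg(conc_mM: float, volume_uL: float, mw_da: float) -> float:
    """Mass of protein (mg) for a given concentration, volume and molecular weight."""
    moles = (conc_mM * 1e-3) * (volume_uL * 1e-6)  # mol/L * L
    return moles * mw_da * 1e3                     # g -> mg (1 Da = 1 g/mol)


if __name__ == "__main__":
    # Assumed example: 1 mM of a 15 kDa protein in a 500 uL sample
    print(round(protein_mass_mg(1.0, 500.0, 15_000.0), 2), "mg")
```

Doubling the molecular weight at the same millimolar concentration doubles the mass required.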

The measurement requires fine tuning of temperature, pH, ionic strength and solvent, because these factors influence both the resolution of the spectra and the stability of the protein.

2.3 Requirements for NMR instruments

NMR instrumentation is developing very rapidly. Protein structure determination became possible several years ago with the appearance of high-field magnets (>500 MHz proton frequency). A stronger static magnetic field results in a larger separation of the spin-state energy levels and a better separation of peaks in the NMR spectrum. This rapid development is further aided by the revolution in the computational background of NMR instruments and laboratories.
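Because shifts in ppm are field-independent, the separation of two peaks in Hz scales with the spectrometer frequency (ppm x MHz = Hz). The sketch below compares two hypothetical peaks 0.015 ppm apart at 500 and 900 MHz; the 10 Hz linewidth and the "separation larger than one linewidth" criterion are simplifying assumptions (real linewidths also depend on the field).

```python
def separation_hz(shift1_ppm: float, shift2_ppm: float, proton_mhz: float) -> float:
    """Frequency separation of two peaks: delta(ppm) * spectrometer frequency (MHz) = Hz."""
    return abs(shift1_ppm - shift2_ppm) * proton_mhz


def resolved(shift1_ppm: float, shift2_ppm: float, proton_mhz: float,
             linewidth_hz: float) -> bool:
    """Crude criterion (an assumption): resolved if separated by more than one linewidth."""
    return separation_hz(shift1_ppm, shift2_ppm, proton_mhz) > linewidth_hz


if __name__ == "__main__":
    for mhz in (500.0, 900.0):
        print(mhz, "MHz:", round(separation_hz(8.000, 8.015, mhz), 1), "Hz,",
              "resolved" if resolved(8.000, 8.015, mhz, 10.0) else "overlapping")
```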

2.4 Experimental techniques

The transition of spins to the higher energy level is induced by a radio-frequency field whose quantum energy matches the energy difference between the spin states (resonance). The energy emitted when the spins return to the ground state is measured and represented by peaks in the NMR spectrum.

The original measurement method was continuous-wave (CW) NMR, in which the frequency range was scanned step by step and the response was observed at each step. In modern Fourier-transform NMR (FT-NMR), the whole frequency range is excited at once by a short pulse, and the resulting free induction decay (FID) is converted into a spectrum by Fourier transformation.
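The FT-NMR idea can be sketched with a simulated, noise-free FID: a sum of decaying complex oscillations, one per resonance, which a discrete Fourier transform turns back into peaks. The peak frequencies, amplitudes, T2 and sweep width are arbitrary assumptions, and NumPy is assumed to be available.

```python
import numpy as np


def simulate_fid(freqs_hz, amplitudes, t2_s, sweep_width_hz, n_points):
    """Noise-free model FID: decaying complex exponentials, one per resonance."""
    t = np.arange(n_points) / sweep_width_hz  # sampling interval = 1 / sweep width
    fid = np.zeros(n_points, dtype=complex)
    for f, a in zip(freqs_hz, amplitudes):
        fid += a * np.exp(2j * np.pi * f * t) * np.exp(-t / t2_s)
    return fid


def fid_to_spectrum(fid, sweep_width_hz):
    """Fourier transform the FID; return the frequency axis (Hz) and complex spectrum."""
    spectrum = np.fft.fftshift(np.fft.fft(fid))
    freqs = np.fft.fftshift(np.fft.fftfreq(len(fid), d=1.0 / sweep_width_hz))
    return freqs, spectrum


if __name__ == "__main__":
    fid = simulate_fid([250.0, -1000.0], [1.0, 0.5], t2_s=0.1,
                       sweep_width_hz=5000.0, n_points=4096)
    freqs, spec = fid_to_spectrum(fid, 5000.0)
    print("strongest peak near", round(float(freqs[np.argmax(np.abs(spec))]), 1), "Hz")
```

Every resonance contributes to every point of the FID, so one short acquisition records the whole spectrum; the transform separates the contributions again.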

2.4.1 One-dimensional measurement

The spectrum of a one-dimensional NMR experiment shows the distribution of chemical shifts for one type of nucleus. Each nucleus gives a peak at a characteristic chemical shift determined by its chemical surroundings, and the peaks also have a fine structure determined by coupling with other nuclei.
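In the simplest (first-order) case, coupling to n equivalent spin-1/2 neighbours splits a peak into n+1 lines spaced by the coupling constant J, with binomial intensities. A minimal sketch, assuming first-order behaviour (shift differences much larger than J) and an illustrative J of 7 Hz, e.g. an alanine alphaH coupled to its three methyl protons while its coupling to the amide proton is ignored:

```python
from math import comb


def first_order_multiplet(n_neighbours: int, j_hz: float, centre_hz: float = 0.0):
    """Lines (position_hz, relative_intensity) for coupling to n equivalent spin-1/2 nuclei."""
    lines = []
    for k in range(n_neighbours + 1):
        position = centre_hz + (k - n_neighbours / 2.0) * j_hz
        lines.append((position, comb(n_neighbours, k)))
    return lines


if __name__ == "__main__":
    # Quartet with 1:3:3:1 intensities, lines 7 Hz apart
    print(first_order_multiplet(3, 7.0))
```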

In the H-1 NMR spectrum of a protein, each type of proton (NH, alphaH, betaH, aromatic H, OH, etc.) appears in a well-defined region of the spectrum. Unfortunately, these regions overlap. Identifying most of the peaks (assignment) is therefore usually not possible from a one-dimensional spectrum of a protein, and other experiment types are needed. However, some peaks at the extreme ends of the spectrum, or in regions with few other peaks, can be assigned.

2.4.2 COSY, Relayed-COSY, DQF-COSY, TOCSY

A second dimension is introduced to spread the peaks further and gain additional information about the surroundings of each proton.

COSY, Relayed-COSY, DQF-COSY and TOCSY experiments display different details of the through-bond couplings. In these experiments the coupled system of protons (spin system) of each amino acid is revealed. However, certain amino acids are indistinguishable because they have the same coupling pattern and similar chemical shift values. No measurable through-bond coupling is present between protons of neighbouring amino acids, because they are never closer than four bonds.
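The four-bond gap across the peptide bond can be checked by counting bonds in a small graph of a backbone fragment. The sketch below uses a hypothetical two-residue backbone (side chains omitted; atom names with a residue-number suffix are illustrative) and a breadth-first search for the shortest bond path:

```python
from collections import deque

# Covalent bonds of a hypothetical two-residue backbone fragment (suffix = residue number).
BONDS = [
    ("H_1", "N_1"), ("N_1", "CA_1"), ("CA_1", "HA_1"), ("CA_1", "C_1"), ("C_1", "O_1"),
    ("C_1", "N_2"),  # peptide bond
    ("N_2", "H_2"), ("N_2", "CA_2"), ("CA_2", "HA_2"), ("CA_2", "C_2"), ("C_2", "O_2"),
]


def bond_graph(bonds):
    """Undirected adjacency sets from a list of bonded atom pairs."""
    graph = {}
    for a, b in bonds:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bonds_between(graph, start, goal):
    """Breadth-first search: number of covalent bonds on the shortest path."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        atom, dist = queue.popleft()
        if atom == goal:
            return dist
        for nxt in graph[atom]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


if __name__ == "__main__":
    g = bond_graph(BONDS)
    for pair in (("H_1", "HA_1"), ("HA_1", "H_2"), ("HA_1", "HA_2")):
        print(pair, bonds_between(g, *pair), "bonds")
```

Within a residue, NH and alphaH are three bonds apart (a measurable coupling), whereas the closest inter-residue pair, alphaH(i) and NH(i+1), is already four bonds apart.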

2.4.3 NOESY, ROESY

The two other two-dimensional experiments, NOESY and ROESY, give the most useful data about the 3D structure of proteins. These experiments detect through-space couplings, so protons closer than ~5 Angstrom give a measurable peak regardless of their sequential position. Unfortunately, the relationship between an NOE peak and the distance between the corresponding two protons is ambiguous, so in practice only an upper limit of each proton-proton distance can be determined.
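A common simplified treatment (the isolated spin-pair approximation) takes NOE intensity as proportional to r^-6, calibrates it on a proton pair of known distance, and then records only a loose upper bound. The sketch below follows that idea; the 2.5 Angstrom reference distance and the 2.7/3.5/5.0 Angstrom classes are assumed illustrative values, since conventions differ.

```python
# Assumed distance classes (upper bounds in Angstrom); conventions differ between groups.
UPPER_BOUND_CLASSES = (2.7, 3.5, 5.0)


def noe_distance(intensity: float, ref_intensity: float, ref_distance: float) -> float:
    """Isolated spin-pair approximation: intensity ~ r^-6, calibrated on a known pair."""
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


def upper_bound(distance: float, classes=UPPER_BOUND_CLASSES):
    """Smallest class bound >= distance, or None if beyond NOE detection."""
    for bound in classes:
        if distance <= bound:
            return bound
    return None


if __name__ == "__main__":
    for rel_intensity in (1.0, 0.3, 1.0 / 64):
        d = noe_distance(rel_intensity, 1.0, 2.5)
        print(f"relative intensity {rel_intensity:.3f}: ~{d:.2f} A -> upper bound {upper_bound(d)}")
```

Because of the sixth-power dependence, a 64-fold weaker peak corresponds to only twice the distance, which is why intensities translate into distance ranges rather than exact values.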

Certain protons of sequentially neighbouring residues are always close enough to give NOE peaks, which makes these peaks usable for the sequential assignment of the protein.

2.4.4 C-13 and N-15 labelled proteins

The structure determination of larger proteins (>100 amino acids) usually requires isotope labelling, which provides coupling information between different nuclei. Coupling between C-13 and H-1, or between N-15 and H-1, introduces a new dimension that spreads the peaks according to the chemical shifts of the C-13 or N-15 nuclei.

2.5 Constructing the structure from experimental data

2.5.1 Data gained from the measurement

The following data on protein 3D structure are available from NMR:

- hints for phi and chi1 torsional angles (Karplus equation)

- some of the proton

- proton distances shorter than 5 Angstrom

- protons with extreme electronic properties

- spin systems of the amino acids

- exchange data

- relaxation parameters
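The Karplus relation listed above links a three-bond coupling constant to a torsion angle. The sketch below uses the common form for 3J(HN, alphaH) as a function of phi, 3J = A cos^2(theta) + B cos(theta) + C with theta = phi - 60 degrees; the coefficients are assumed values close to a widely used parameterisation, and other parameterisations differ slightly.

```python
import math

# Assumed Karplus coefficients (Hz) for 3J(HN, alphaH); published sets differ slightly.
A, B, C = 6.51, -1.76, 1.60


def j_hn_ha(phi_deg: float) -> float:
    """3J(HN,HA) = A cos^2(theta) + B cos(theta) + C, with theta = phi - 60 degrees."""
    theta = math.radians(phi_deg - 60.0)
    return A * math.cos(theta) ** 2 + B * math.cos(theta) + C


def phi_candidates(j_obs: float, tol: float = 0.3, step: int = 1):
    """All phi values (degrees, on a grid) whose predicted coupling is within tol Hz of j_obs."""
    return [phi for phi in range(-180, 180, step) if abs(j_hn_ha(phi) - j_obs) <= tol]


if __name__ == "__main__":
    print("helix, phi = -57:", round(j_hn_ha(-57.0), 2), "Hz")
    print("strand, phi = -120:", round(j_hn_ha(-120.0), 2), "Hz")
```

With these coefficients a helical phi gives roughly 3.7 Hz and an extended phi close to 10 Hz. Inverting the relation yields several possible phi values for one measured coupling, which is why the equation only gives hints for the torsion angles.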

2.5.2 Aspects of assignment

Assignment is the rate-limiting step of NMR structure determination. Once a set of one-, two- and perhaps three-dimensional spectra has been collected, the identification of the peaks can start. This includes determining the chemical shifts of all protons (and/or N-15, C-13 nuclei) in the protein. Several methods exist (e.g. sequential assignment), but there is still no fully computerised algorithm for complete automated assignment. Once the assignment is successful, the most difficult part of the job is done.
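One core idea of sequential assignment can be sketched as pattern matching: spin systems identified from through-bond spectra narrow each residue down to a set of possible amino acid types, sequential NOEs order several spin systems into a fragment, and the fragment is then slid along the known sequence. The sequence, the fragment and the simplified residue-type grouping below are hypothetical illustrations, not a complete assignment algorithm.

```python
from typing import List, Set

# Simplified, assumed grouping of residue types with similar (AMX-type) spin systems
AMX_TYPES = set("CDFHNSWY")


def place_fragment(sequence: str, fragment: List[Set[str]]) -> List[int]:
    """0-based start positions where each spin system's candidate types match the sequence."""
    hits = []
    for start in range(len(sequence) - len(fragment) + 1):
        if all(sequence[start + k] in fragment[k] for k in range(len(fragment))):
            hits.append(start)
    return hits


if __name__ == "__main__":
    seq = "MKTAYIAKQR"                         # hypothetical protein sequence
    fragment = [{"A"}, AMX_TYPES, {"I"}]       # three NOE-linked spin systems
    print(place_fragment(seq, fragment))       # a unique hit fixes the assignment
```

A single spin system (e.g. one alanine) may fit several positions; linking spin systems through sequential NOEs is what makes the placement unique.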

2.5.3 Calculation of 3D structure

After the assignment, one has a set of proton-proton distance limits (restraints). This set of restraints and the amino acid sequence are the input data of computer programs that build the protein model(s). During the structure calculation, considerations of theoretical chemistry and molecular modelling (e.g. standard bond lengths and angles, and the repulsion of atoms that come too close) are also taken into account. Because the resulting structure is computed from measured data rather than observed directly, several models that all agree with the experiment are calculated.
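Whether a model "agrees with the experiment" can be checked by comparing its inter-proton distances with the upper-bound restraints. The sketch below reports violations for each model; the coordinates, atom names and the tolerance used to accept an ensemble are hypothetical assumptions.

```python
import math


def restraint_violations(coords, restraints, tolerance=0.0):
    """(atom_a, atom_b, excess) for every upper-bound restraint exceeded by more than tolerance."""
    violations = []
    for atom_a, atom_b, upper in restraints:
        d = math.dist(coords[atom_a], coords[atom_b])
        if d > upper + tolerance:
            violations.append((atom_a, atom_b, d - upper))
    return violations


def ensemble_agrees(models, restraints, tolerance=0.5):
    """True if every model satisfies all restraints within tolerance (assumed acceptance rule)."""
    return all(not restraint_violations(m, restraints, tolerance) for m in models)


if __name__ == "__main__":
    model = {"HA_3": (0.0, 0.0, 0.0), "HN_4": (3.0, 0.0, 0.0), "HB_10": (0.0, 6.0, 0.0)}
    restraints = [("HA_3", "HN_4", 5.0), ("HA_3", "HB_10", 5.0)]
    print(restraint_violations(model, restraints))
    print(ensemble_agrees([model], restraints))
```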

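The final calculation gives an R factor for the co-ordinates of each atom (in practice usually reported as the root-mean-square deviation, RMSD, of the atom over the ensemble of models). This value reflects how much the position of the atom varies among the calculated models, i.e. the relative motions that arise during the molecular dynamics calculations (see more under flexibility).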

2.6 Discussion of the resulting protein structure

2.6.1 Confidence of the structure

The confidence of NMR structures relies mainly on two components: the experimental part and the computational part. The crucial aim of NMR experiments is to obtain as many NOE peaks as possible. Among these, the long-range peaks (where the two interacting protons are at least five residues apart in the sequence) are the most important, because they form the basis for determining the fold. Usually 12-14 NOE peaks per amino acid are needed for a reliable structure. Fewer peaks may result in several different calculated structures or in poorly defined regions of the fold.
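The NOE bookkeeping described above can be sketched as a classification by sequence separation plus a per-residue count. Long-range follows the text (at least five residues apart); the "medium-range" class for separations of 2-4 and the use of 12 as the threshold are assumptions made for illustration, as is the example restraint list.

```python
from collections import Counter


def noe_range(i: int, j: int) -> str:
    """Classify an NOE by the sequence separation of the two residues."""
    sep = abs(i - j)
    if sep == 0:
        return "intraresidue"
    if sep == 1:
        return "sequential"
    if sep < 5:
        return "medium-range"  # assumed convention for separations 2-4
    return "long-range"        # at least five residues apart, as in the text


def summarise_noes(noes, n_residues, target_per_residue=12):
    """Counts per class, NOEs per residue, and whether the (assumed) target is met."""
    counts = Counter(noe_range(i, j) for i, j in noes)
    per_residue = len(noes) / n_residues
    return counts, per_residue, per_residue >= target_per_residue


if __name__ == "__main__":
    noes = [(1, 1), (1, 2), (3, 6), (2, 20), (5, 30)]  # hypothetical residue pairs
    print(summarise_noes(noes, n_residues=10))
```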

Secondary structure elements are commonly better defined by an NMR experiment than the loops connecting them. In fact, alpha helices and beta sheets may be identified before the 3D structure calculation. This is due to the well-defined pattern of proton-proton distances in periodic structural elements. Loops exposed to solvent or having a looser conformation are much less well defined by NMR experiments.

Structure calculation is becoming cheaper and cheaper. Repeated cycles of simulated annealing and molecular dynamics, run until the best possible agreement (R values) is reached, are faster and faster, which improves the quality of NMR structures. However, misfolding and trapping in local energy minima are still a problem.
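The principle of simulated annealing, and why it helps against local minima, can be shown on a toy problem. The sketch below is a generic Metropolis-style annealing run on a hypothetical one-dimensional double-well energy, not a protein force field or any particular structure calculation program; the cooling schedule, step size and seed are arbitrary assumptions.

```python
import math
import random


def toy_energy(x: float) -> float:
    """Hypothetical double well: global minimum near x = -1.04, local minimum near x = +0.96."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def simulated_annealing(energy, x0, t_start=2.0, t_end=0.01, n_steps=5000, step=0.5, seed=0):
    """Metropolis moves under a geometric cooling schedule; returns the best point found."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for i in range(n_steps):
        t = t_start * (t_end / t_start) ** (i / (n_steps - 1))  # temperature decreases
        x_new = x + rng.gauss(0.0, step)
        e_new = energy(x_new)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


if __name__ == "__main__":
    # Start in the local minimum; high early temperature lets the search escape it
    print(simulated_annealing(toy_energy, 1.0))
```

A purely downhill minimiser started at x = 1 would stay in the local well; accepting occasional uphill moves at high temperature is what allows the barrier to be crossed, although with rugged real energy surfaces there is no guarantee of finding the global minimum.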

2.6.2 Flexibility of the structure

In solution, protein structures are more or less flexible. Flexibility results in motion of protein segments relative to each other. These motions, together with the torsional rotations of amino acid side chains, are faster than the time scale of an NMR experiment, so the measured values always represent an average over the motions of the protein structure. In the case of highly flexible loops, however, NOE peaks may be lost completely through averaging over different states. This is usually the case at both ends of the chain: co-ordinates of chain termini are seldom well defined. High flexibility is one of the reasons for larger R factors in certain regions; large R factors may also be caused by a low number of NOE peaks in that region.
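Within the same r^-6 approximation, a proton pair that fluctuates between conformations gives an NOE governed by the population-weighted average of r^-6. The sketch below shows two consequences under assumed example distances and populations: the effective distance is dominated by the shorter state, and a contact that is only rarely populated yields a much weaker peak, which may fall below detection.

```python
def mean_inverse_sixth(distances, populations):
    """Population-weighted <r^-6> (populations assumed to sum to 1)."""
    return sum(p * r ** -6 for r, p in zip(distances, populations))


def effective_noe_distance(distances, populations):
    """r_eff = <r^-6>^(-1/6)."""
    return mean_inverse_sixth(distances, populations) ** (-1.0 / 6.0)


def relative_noe_intensity(distances, populations, reference_distance):
    """Intensity relative to a rigid pair at reference_distance (isolated spin-pair approx.)."""
    return mean_inverse_sixth(distances, populations) / reference_distance ** -6


if __name__ == "__main__":
    # Equal populations at 2.5 and 6.0 A: effective distance much nearer the short state
    print(round(effective_noe_distance([2.5, 6.0], [0.5, 0.5]), 2), "A")
    # A 3.0 A contact populated only 10% of the time: about a tenth of the rigid intensity
    print(round(relative_noe_intensity([3.0, 7.0], [0.1, 0.9], 3.0), 3))
```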

3 Principles of protein structure determination by X-ray

3.1 Theoretical Considerations

When photons of an X-ray beam collide with the electrons surrounding the atoms of a molecule, they are absorbed and a new photon with the same energy but a different direction is emitted. This is known as scattering, and it leads to detectable photons in directions other than that of the incident beam. Because photons interact with the electrons of a protein with a low cross-section (most photons pass straight through), detectable scattering is observed only if many identical proteins are present, and to interpret the scattering, the proteins also have to be arranged in a regular array. This is why crystals are needed for an X-ray measurement.

Photons of other wavelengths (e.g. visible light) also interact with electrons, but the resolution of the image that can be detected (computed) is of the same order as the wavelength. Atomic resolution requires a wavelength of 1-3 Angstrom, which is in the X-ray range.
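Wavelength and photon energy are related by E = h*c / lambda, with h*c approximately 12.398 keV Angstrom. A minimal sketch; the copper K-alpha energy (~8.05 keV) and the 500 nm visible wavelength are standard example values used only for illustration:

```python
HC_KEV_ANGSTROM = 12.39842  # h * c in keV * Angstrom


def wavelength_angstrom(energy_kev: float) -> float:
    """Photon wavelength (Angstrom) for a given photon energy (keV)."""
    return HC_KEV_ANGSTROM / energy_kev


def photon_energy_kev(wavelength_a: float) -> float:
    """Photon energy (keV) for a given wavelength (Angstrom)."""
    return HC_KEV_ANGSTROM / wavelength_a


if __name__ == "__main__":
    print("Cu K-alpha, 8.048 keV:", round(wavelength_angstrom(8.048), 4), "A")
    print("1-3 A corresponds to", round(photon_energy_kev(3.0), 2), "to",
          round(photon_energy_kev(1.0), 2), "keV")
    print("visible light, 500 nm:", round(photon_energy_kev(5000.0) * 1000, 2), "eV")
```

The three orders of magnitude between visible and X-ray photon energies mirror the difference between ~5000 Angstrom and ~1 Angstrom wavelengths.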

Unfortunately, images of molecules cannot be produced optically, because the scattered X-rays cannot be refocused (no lens material has a refractive index for X-rays appreciably different from one). The only possibility is to record the diffraction pattern, in which each point collects the parallel rays scattered by the crystal in one direction, shaped by interference between the differently phased rays (Bragg's law). Photons are detected in the diffraction plane in directions where the scattered waves reinforce each other, and no photons are detected in directions where they cancel or fail to reinforce each other. The central problem in calculating the image is that the phases of the individual X-ray waves are lost when the diffraction pattern is recorded. The experimental methods developed for X-ray crystallography are procedures for determining these lost phases.
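Why the lost phases matter so much can be shown with a one-dimensional toy "electron density". The sketch below (NumPy assumed available; the Gaussian "atoms" and their positions are hypothetical) shows that amplitudes plus phases give the density back exactly, while combining the amplitudes of one density with the phases of another gives a map that resembles the density whose phases were used.

```python
import numpy as np


def gaussian(x, centre, width=3.0):
    """A Gaussian blob standing in for one atom's electron density."""
    return np.exp(-0.5 * ((x - centre) / width) ** 2)


x = np.arange(128)
density_a = gaussian(x, 20.0)                       # hypothetical molecule A: one atom
density_b = gaussian(x, 60.0) + gaussian(x, 80.0)   # hypothetical molecule B: two atoms

f_a = np.fft.fft(density_a)  # complex "structure factors": amplitude and phase
f_b = np.fft.fft(density_b)

# With both amplitudes and phases, the inverse transform restores the density.
recovered_a = np.fft.ifft(f_a).real

# Hybrid map: amplitudes of A combined with phases of B.
hybrid = np.fft.ifft(np.abs(f_a) * np.exp(1j * np.angle(f_b))).real

corr_with_a = np.corrcoef(hybrid, density_a)[0, 1]
corr_with_b = np.corrcoef(hybrid, density_b)[0, 1]

if __name__ == "__main__":
    print("exact recovery:", np.allclose(recovered_a, density_a))
    print(f"hybrid vs A: {corr_with_a:.2f}, hybrid vs B: {corr_with_b:.2f}")
```

The recorded intensities supply only the amplitudes; most of the positional information sits in the phases, which is why the experimental methods concentrate on recovering them.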

The main restriction on investigating a protein by X-ray is still a practical one: the ability to form crystals of sufficient size. If a crystal of 0.1-0.5 mm is available, there is no theoretical restriction on determining the 3D structure.

3.2 Protein preparation for X-ray measurement

Proteins for X-ray measurements can be obtained by isolation and purification from natural sources or overproduced in expression systems.

Proteins must be crystallised for a protein crystallographic measurement, and this step determines the material demand, concentration, pH, purity and solvent. There are no general rules or prediction methods for the crystallisation conditions of proteins. Crystallisation means trying perhaps hundreds of conditions, and it succeeds in only ~75% of cases. Two to twenty milligrams of 97% pure protein in a solution of 2-20 mg/ml is usually enough to start hanging-drop crystallisation trials. In the remaining 20-30% of cases crystallisation fails; membrane proteins in particular are hindered by solubility problems in almost all cases.

3.3 Requirements for X-ray instruments

Instruments for generating X-ray beams have been in use for a long time. The limitations on using X-rays for protein structure determination arise from the interpretation of diffraction patterns.

The experiment requires a monochromatic, parallel X-ray beam directed at the crystal. This is usually obtained from a monochromatic X-ray source whose beam is led through a 10-15 cm long collimator tube of ~0.5 mm diameter (much more intense X-ray beams are available at synchrotrons). The scattered photons are recorded on a detector of about 20 cm x 20 cm in the diffraction plane. The detector is placed further from the crystal for large proteins, whose large unit cells give closely spaced reflections, and closer for smaller ones.
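The detector-distance trade-off can be estimated with simple geometry. The sketch assumes a flat detector centred on the beam with a 100 mm half-width (from the ~20 cm detector size above) and a copper K-alpha wavelength of 1.5418 Angstrom; both functions are illustrative approximations, the second valid only for low-angle reflections.

```python
import math


def edge_resolution_angstrom(distance_mm: float, half_width_mm: float, wavelength_a: float) -> float:
    """Finest spacing recorded at the detector edge: 2*theta = atan(w / D), d = lambda / (2 sin theta)."""
    two_theta = math.atan(half_width_mm / distance_mm)
    return wavelength_a / (2.0 * math.sin(two_theta / 2.0))


def spot_spacing_mm(distance_mm: float, wavelength_a: float, cell_edge_a: float) -> float:
    """Approximate spacing of neighbouring low-angle reflections on the detector: D * lambda / a."""
    return distance_mm * wavelength_a / cell_edge_a


if __name__ == "__main__":
    for d_mm in (100.0, 200.0):
        print(f"D = {d_mm:.0f} mm: edge resolution {edge_resolution_angstrom(d_mm, 100.0, 1.5418):.2f} A, "
              f"spot spacing for a 200 A cell {spot_spacing_mm(d_mm, 1.5418, 200.0):.2f} mm")
```

Moving the detector away separates the closely spaced reflections of a large unit cell, at the cost of recording fewer high-angle (high-resolution) reflections on the same detector area.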

3.4 Experimental techniques

Protein crystals are very sensitive: they must be kept in the vapour of their own solution so that they do not dry out, and they must be cooled during the measurement. To record diffraction patterns for different directions of irradiation, the crystal must be rotated.

3.4.1 Isomorphous replacement

In isomorphous replacement, a second diffraction pattern is recorded from a crystal in which a heavy atom is bound to the protein. The large electron density of the heavy atom changes the intensities of the points of the diffraction pattern. If the position of the heavy atom in the crystal is known, these intensity changes can be used to determine the lost phases and calculate the image.

3.4.2 Double isomorphous replacement

For proteins, the exact heavy-atom binding sites are not known in advance, and a single derivative leaves each phase with two possible values. Therefore two different heavy atoms that bind to different sites on the protein are introduced. This procedure allows the positions of the heavy atoms to be determined, and the different effects of the two heavy atoms on the diffraction pattern make the phase determination unambiguous. (Commonly used heavy atoms and their binding residues: mercury (cysteine), platinum (histidine).)
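The phase ambiguity and its resolution can be sketched for a single reflection. Writing F_PH = F_P + F_H, the measured amplitudes |F_P| and |F_PH| together with a known heavy-atom contribution F_H allow two phases for F_P per derivative; a second derivative with a different F_H singles out the common one. The sketch assumes F_H has already been calculated from located heavy-atom positions, and all numbers are hypothetical, error-free values.

```python
import cmath
import math


def wrap(angle: float) -> float:
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def sir_phase_candidates(fp_amp: float, fph_amp: float, fh: complex):
    """Phases alpha of F_P consistent with |F_P e^{i alpha} + F_H| = |F_PH| (two in general)."""
    fh_amp, phi_h = abs(fh), cmath.phase(fh)
    cos_term = (fph_amp ** 2 - fp_amp ** 2 - fh_amp ** 2) / (2.0 * fp_amp * fh_amp)
    delta = math.acos(max(-1.0, min(1.0, cos_term)))  # clamp against rounding errors
    return [wrap(phi_h + delta), wrap(phi_h - delta)]


def mir_phase(fp_amp: float, derivatives):
    """Candidate from the first derivative that best agrees with all other derivatives."""
    sets = [sir_phase_candidates(fp_amp, fph, fh) for fph, fh in derivatives]

    def disagreement(alpha):
        return sum(min(abs(wrap(alpha - c)) for c in cands) for cands in sets[1:])

    return min(sets[0], key=disagreement)


if __name__ == "__main__":
    fp_true = cmath.rect(10.0, 1.0)                     # hypothetical protein F, phase 1.0 rad
    fh1, fh2 = cmath.rect(3.0, 0.2), cmath.rect(4.0, 2.5)  # two hypothetical heavy-atom F_H
    derivs = [(abs(fp_true + fh1), fh1), (abs(fp_true + fh2), fh2)]
    print("derivative 1 alone:", sir_phase_candidates(abs(fp_true), *derivs[0]))
    print("both derivatives:", mir_phase(abs(fp_true), derivs))
```

With real, noisy data the candidates never coincide exactly, so practical programs combine phase probabilities instead of picking an exact match.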

3.4.3 Anomalous scattering

If the energy of the incident X-ray photons is close to an absorption edge of the heavy atom (the energy needed to eject one of its K- or L-shell electrons), the heavy atom scatters with a changed amplitude and a phase shift. This is anomalous scattering. As a consequence, pairs of reflections related by inversion (Friedel pairs, measured from opposite sides of the crystal) no longer have equal intensities, and the small differences between them make phase determination possible with a single kind of heavy atom.
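The effect on Friedel pairs can be sketched with the structure factor F(hkl) = sum over atoms of f_j exp(2*pi*i*(h x_j + k y_j + l z_j)). With purely real scattering factors, |F(hkl)| equals |F(-h,-k,-l)|; giving the heavy atom a complex factor f0 + f' + i f'' breaks this equality. The atom positions and the f' and f'' values below are illustrative assumptions, not tabulated data.

```python
import cmath
import math


def structure_factor(atoms, hkl):
    """F(hkl) = sum_j f_j exp(2*pi*i*(h x_j + k y_j + l z_j)); f_j may be complex."""
    h, k, l = hkl
    return sum(f * cmath.exp(2j * math.pi * (h * x + k * y + l * z)) for f, (x, y, z) in atoms)


def friedel_difference(atoms, hkl):
    """|F(hkl)| - |F(-h,-k,-l)|: zero without anomalous scattering."""
    h, k, l = hkl
    return abs(structure_factor(atoms, hkl)) - abs(structure_factor(atoms, (-h, -k, -l)))


# Hypothetical light atoms: (scattering factor ~ electron count, fractional coordinates)
LIGHT_ATOMS = [(6.0, (0.40, 0.10, 0.70)), (7.0, (0.25, 0.60, 0.15)), (8.0, (0.80, 0.35, 0.50))]
# Heavy atom without and with an anomalous term f0 + f' + i f'' (illustrative values)
HEAVY_NORMAL = (80.0, (0.10, 0.20, 0.20))
HEAVY_ANOMALOUS = (80.0 - 5.0 + 10.0j, (0.10, 0.20, 0.20))

if __name__ == "__main__":
    for label, heavy in (("normal", HEAVY_NORMAL), ("anomalous", HEAVY_ANOMALOUS)):
        diff = friedel_difference(LIGHT_ATOMS + [heavy], (1, 2, 3))
        print(label, "Friedel difference:", round(diff, 3))
```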

3.4.4 Homologous proteins

Structures of homologous proteins can be used for phase determination (molecular replacement) if the sequence identity is at least ~30%. This degree of sequence identity is accompanied by a structural similarity sufficient for the homologous structure to serve as the starting structure for the protein under study.
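Sequence identity is computed from an alignment of the two sequences. The sketch below counts identical residues over aligned positions where neither sequence has a gap; this is only one of several conventions (others divide by the length of the shorter sequence, for example), and the sequences and the 30% cut-off check are illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identity over aligned, gap-free positions (one of several conventions)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


def usable_as_starting_model(identity_percent: float, threshold: float = 30.0) -> bool:
    """Rule of thumb from the text: at least ~30% identity."""
    return identity_percent >= threshold


if __name__ == "__main__":
    ident = percent_identity("ACDEFGHIKL", "ACDWYGHQKL")  # hypothetical alignment
    print(ident, usable_as_starting_model(ident))
```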

3.5 Constructing the structure from experimental data

3.5.1 Data gained from the measurement

The following data on protein 3D structure are available from X-ray:

- electron density of the non-hydrogen atoms of the chain (hydrogens are practically undetectable by X-ray, because each has only one electron)

- crystal packing of the proteins

3.5.2 Aspects of data on the diffraction plane

The diffraction pattern is a consequence of Bragg's law, which gives the directions of the scattered photons in which interference is constructive. These directions depend on the spacing of the crystal lattice, the reflection angle and the wavelength. Each point of the diffraction pattern contains X-rays scattered by all atoms, and the image can be reconstructed from this pattern by Fourier transformation. If the phases assigned to the diffraction pattern are correct, the result of the Fourier transformation is an electron density map with high density along the atom chains of the protein.
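Bragg's law, n*lambda = 2*d*sin(theta), links the lattice plane spacing d, the reflection angle theta and the wavelength lambda. The sketch assumes the simplest, cubic lattice, where d(hkl) = a / sqrt(h^2 + k^2 + l^2); protein crystals usually have lower symmetry, and the 50 Angstrom cell edge and 1.5418 Angstrom wavelength are example values.

```python
import math


def d_spacing_cubic(a: float, hkl) -> float:
    """Lattice plane spacing for a cubic cell of edge a."""
    h, k, l = hkl
    return a / math.sqrt(h * h + k * k + l * l)


def bragg_theta_deg(d: float, wavelength: float, n: int = 1):
    """Reflection angle from n*lambda = 2*d*sin(theta); None if no reflection is possible."""
    s = n * wavelength / (2.0 * d)
    if s > 1.0:
        return None  # spacings below lambda / 2 cannot diffract
    return math.degrees(math.asin(s))


if __name__ == "__main__":
    for hkl in ((1, 0, 0), (10, 0, 0), (20, 10, 5)):
        d = d_spacing_cubic(50.0, hkl)
        print(hkl, f"d = {d:.2f} A, theta = {bragg_theta_deg(d, 1.5418)}")
```

Large cell edges give small reflection angles, and finer spacings (higher resolution) appear at larger angles, down to the limit d = lambda / 2.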

3.5.3 Calculation of 3D structure

Building the 3D structure is essentially fitting the amino acid chain into the channel of high electron density. This is not as easy as it seems. The electron density map is usually not a continuous channel of amino acid chain with knobs for side chains, but a set of clouds broken at several places, partly because of errors in phase determination. Higher resolution makes this procedure easier.

The first step of the fitting is to find the sequence position of the amino acid chain in the electron density channel. Neighbouring large amino acid residues have large, recognisable shapes; they can serve as starting points, and proceeding from them in both directions along the sequence, all residues find their place. The second step is refinement of the structure until the backbone, the side chains and the water molecules lie correctly in the middle of the electron density clouds.

3.6 Discussion of the resulting protein structure

3.6.1 Confidence of the structure

The basic question about the reliability of protein structures determined by X-ray crystallography is the effect of the crystal lattice on the proteins. The conditions of crystallisation and the crystal itself are radically different from natural conditions. However, in most cases there appear to be no major changes in protein structure.

The packing of a protein crystal is very loose: the channels between the protein molecules contain a large number of solvent molecules (mainly water), creating surroundings similar to those in solution. Many of these water molecules have no fixed position, and their free movement within the crystal allows heavy atoms to diffuse into the crystal and even allows chemical reactions to take place, such as the binding of heavy atoms or specific reactions in the substrate-binding pocket.

The technical aspect of structure confidence is the resolution. In low-resolution maps (~5 Angstrom), identifying alpha helices is the most that can be achieved. Medium-resolution maps (~3 Angstrom) can distinguish between small and large side chains. High-resolution maps (better than 2.5 Angstrom, i.e. d < 2.5 Angstrom) are suitable for determining accurate conformations and torsion angles.

3.6.2 Flexibility of the structure

The crystal lattice leaves much less freedom for internal movement of protein domains or loops. The somewhat flexible 3D fold is locked into a well-defined conformation in the crystal. This conformation usually, but not necessarily, represents the average conformation of the fold.

The R-factors given by X-ray measurements are only loosely connected with the flexibility of a particular part of the protein; they describe the confidence of the determined atom positions. (Strictly, the per-atom value is the atomic displacement or B-factor, while the R-factor proper is a global measure obtained by comparing the diffraction amplitudes calculated from a hypothetical crystal of the model with the observed ones.) Large values are caused, among other things, by larger motions in the crystal, which arise partly from fewer contacts with other parts of the protein (a phenomenon connected with flexibility in solution), from thermal vibration and from loose packing in the crystal.
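The two quantities distinguished above can be written down explicitly: the crystallographic R-factor, R = sum(| |Fo| - |Fc| |) / sum(|Fo|) over all reflections, and the isotropic B-factor of an atom, B = 8 * pi^2 * <u^2>, where <u^2> is its mean-square displacement. A minimal sketch with hypothetical amplitudes:

```python
import math


def r_factor(f_obs, f_calc) -> float:
    """Global crystallographic R = sum(| |Fo| - |Fc| |) / sum(|Fo|)."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den


def b_from_rms_displacement(u_rms: float) -> float:
    """Per-atom isotropic B-factor (A^2) from RMS displacement u (A): B = 8 pi^2 u^2."""
    return 8.0 * math.pi ** 2 * u_rms ** 2


def rms_displacement_from_b(b: float) -> float:
    """Inverse relation: RMS displacement (A) implied by a B-factor (A^2)."""
    return math.sqrt(b / (8.0 * math.pi ** 2))


if __name__ == "__main__":
    print("R =", round(r_factor([100.0, 50.0, 25.0], [90.0, 55.0, 25.0]), 3))
    print("B for 0.5 A RMS displacement:", round(b_from_rms_displacement(0.5), 1), "A^2")
```

A single R value summarises the whole model, whereas B-factors vary from atom to atom and are the values that typically rise in mobile loops and chain termini.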

4 Comparison of the two methods

The above description of NMR and X-ray focuses on the differences between them. Although the procedures differ, describing them systematically within the same framework of protein structure determination highlights the relationship between the two methods. The following points show how the main elements of the two methods relate to each other.

4.1 Basic considerations

The two methods of structure determination are based on completely different properties of proteins. An NMR structure is calculated from the magnetic properties of certain nuclei, while an X-ray structure is derived from the electron density of the non-hydrogen atoms.

The calculation of an NMR structure is an indirect method. Some atom-atom distances are obtained approximately, but the overall structure is calculated by computer using a procedure parametrised empirically for proteins. This is why NMR yields an ensemble of several structures instead of a single 3D structure.

The evaluation of an X-ray structure is a much more direct method. The electron density depends on the positions of the atoms, so all non-hydrogen atoms can be located in space. This method yields a single structure.

4.2 Dealing with the protein

For both methods, proteins can be prepared from any source (natural, synthetic or overexpressed), but several milligrams must be obtained. The first step of sample preparation is similar: the protein must be brought into a far more concentrated state than under natural conditions. This is a limitation of both methods, although the degree of concentration differs: it is lower for NMR than for X-ray. Adjusting the pH, concentration, temperature, ions and other added molecules affects both protein stability and the measurement conditions.

4.3 R factors

Both technologies report R-factors for the atom co-ordinates of the structure. Since NMR and X-ray experiments rest on different principles, it is not surprising that the R-factors given by the two methods do not mean the same thing. The R-factor of an NMR structure reflects the flexibility modelled in the calculation, while in X-ray it rather expresses the confidence of the atom position determined in the crystal. Nevertheless, both are more or less related to the flexibility of protein regions.

4.4 Confidence of structure

Structures from both methods have high confidence: the determination of secondary structure elements, their relative arrangement and the loops playing a role in catalytic activity is reassuring. The fact that some catalytic reactions can take place under measurement conditions reinforces confidence in the 3D structures. Regions weakly defined by NMR or possibly affected by crystal packing are determined less confidently. These are usually longer surface loops, chain termini or loops connecting domains, which have flexible conformations under natural conditions as well. Catalytic pockets, and the framework of secondary structure elements that holds them in place, are usually stable enough not to be affected by the change of conditions.

Because of the nature of the two methods, NMR structures are never as sharply defined as X-ray structures; they allow more freedom of motion for loops and termini. This freedom is probably related to the dynamics of these regions in solution; however, the modelling character of the molecular dynamics calculations calls for caution in such comparisons.
