User Guide - The Input Form

Protein Name

File location

File Format

Input Units

Lowest wavelength datapoint

Wavelength Specifications

Analysis Programmes

Reference sets

Optional Scaling Factor


Protein Name....

The Protein Name element of the form acts as an identifier and becomes the prefix of any input and output files produced from an analysis. It is restricted to 10 alphanumeric characters only.
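
The restriction above can be checked before a form is submitted. The following is a minimal sketch only: the helper name is hypothetical, and it assumes that "10 alphanumeric characters" means between 1 and 10 characters taken from A-Z, a-z and 0-9. The server's own check may differ.

```python
import re

# Assumption: 1 to 10 characters, letters and digits only.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9]{1,10}")


def is_valid_protein_name(name: str) -> bool:
    """Return True if the name fits the assumed Protein Name rule."""
    return _NAME_PATTERN.fullmatch(name) is not None
```

For example, "lysozyme1" would pass this check, while "my protein" (contains a space) would not.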


File Location....

The file location is the path name to the CD data file. This string is checked for errors and if the server cannot locate the file then the analysis will be terminated and an error message generated. It is advisable to use the browse button as it will specify the correct file location automatically.

Files with the extensions exe, gif, jpg, doc or ppt, or in any other unsuitable format, are not accepted. It is recommended that files are uploaded as raw text (.txt) files.
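
As an illustration of the extension rule, a file name could be screened locally before upload. This sketch is an assumption about how such a check might look, not the server's code. The blocked set contains only the five extensions named above, and the function name is hypothetical.

```python
from pathlib import Path

# Only the extensions named in this guide; the server may reject others too.
BLOCKED_EXTENSIONS = {"exe", "gif", "jpg", "doc", "ppt"}


def has_acceptable_extension(filename: str) -> bool:
    """Return False if the file extension is one of the blocked ones."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension not in BLOCKED_EXTENSIONS
```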


File Format.....

The select options in the file format field are derived from the file formats output from different CD spectroscopy machines. Mainly, the formats differ in the size of the header and the column layout of the data.
 

Example file formats can be viewed below:

Format              Example file   Header lines   Data columns
Aviv 60 DS v4.1*    Aviv1.txt      25             2
Aviv CDS            AvivC.txt      14             2
Aviv v2.86          Aviv2.txt      19             2
Jasco 1.30          Jasco.txt      19             2
YY                  YY.txt         4              1 or 5 (reading across rows)
DRS                 DRS.txt        13             7
BP (2nd column)**   BP2.txt        22             4
BP (4th column)**   BP.txt         22             4
* The 60 DS format may be obtained, even in later versions of the software, by choosing the "export to 60 DS format" option in the instrument data browser window, from the "export data set" pulldown.
** It has been reported that the dichroism data can appear in either column 2 or column 4 for the BP format. Please check which format your BP file is in and select BP (data in col. 2) or BP (data in col. 4) accordingly.

If your data exists in some other format please edit it to match one of the above file formats or use the FREE format option which requires two columns, wavelength and CD data respectively. The data may begin with either high or low wavelength. If the format has been incorrectly chosen an error message will be generated stating that the file uploaded was not suitable for analysis.
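
To show what a FREE format file amounts to, the sketch below reads two whitespace-separated columns (wavelength, then CD value) and returns the points sorted by increasing wavelength, so it does not matter whether the file starts at the high or the low end. The function name, the whitespace separator and the skipping of blank lines are assumptions made for illustration.

```python
def parse_free_format(text: str) -> list:
    """Parse two-column text into (wavelength, cd) tuples, lowest wavelength first."""
    points = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"line {line_number}: expected 2 columns, got {len(fields)}")
        # float() raises ValueError itself if a field is not numeric
        points.append((float(fields[0]), float(fields[1])))
    points.sort(key=lambda point: point[0])
    return points
```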


Input Units....

Circular dichroism can be measured in several ways. Within the literature there are several conflicting measures and definitions. Most of these have been accommodated in the select box, but for clarity, the conversion equations used are detailed below:
 

Delta Epsilons Δε
The per residue molar absorption units of circular dichroism measured in mdeg M-1 cm-1. Δε is sometimes referred to as molar circular dichroism. Data peaks are usually in the range of 0 - 10.

All of the analysis programmes accept these input units except K2D. So if your data is in Δε then no conversions are required.
 

Mean Residue Ellipticity MRE [θ]
Mean residue ellipticity is the most commonly reported unit and is measured in degrees cm2 dmol-1 residue-1. Data peaks are usually in the tens of thousands, and the relationship between [θ] and Δε is shown below:

Δε = [θ] / 3298
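
As a worked illustration of this relationship, the sketch below divides a mean residue ellipticity value by 3298 to give delta epsilon. The function name is hypothetical.

```python
def mre_to_delta_epsilon(mean_residue_ellipticity: float) -> float:
    """Convert mean residue ellipticity to delta epsilon (divide by 3298)."""
    return mean_residue_ellipticity / 3298.0
```

A peak of 32980 in mean residue ellipticity corresponds to a delta epsilon of about 10, which fits the typical ranges quoted for the two units.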

Theta Machine Units θ
Machine units measure the difference in molar extinction coefficients between left and right handed light, in millidegrees. Values are usually between 1 and 100, and need to be corrected to account for the amount of protein used in the sample. To convert from machine units to delta epsilons, the equation below is applied.

Note: on selection of this option you will be asked to specify the mean residue weight (MRW) for the protein in amu, the path length (P) in cm and the protein concentration (CONC) in mg/ml. MRW is the protein mean weight (in atomic mass units/daltons) divided by the number of residues.

Δε = θ * (0.1 * MRW) / ((P * CONC) * 3298)
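
The machine-unit conversion can be sketched as a small function. The names are hypothetical; the arithmetic follows the equation in this section, with theta in millidegrees, MRW in daltons, path length in cm and concentration in mg/ml. The check for non-positive path length or concentration is an added safeguard and is not taken from DichroWeb.

```python
def mdeg_to_delta_epsilon(theta_mdeg: float, mrw: float,
                          path_cm: float, conc_mg_ml: float) -> float:
    """Convert machine units (millidegrees) to delta epsilon."""
    if path_cm <= 0 or conc_mg_ml <= 0:
        raise ValueError("path length and concentration must be positive")
    return theta_mdeg * (0.1 * mrw) / ((path_cm * conc_mg_ml) * 3298.0)
```

For instance, a reading of 10 millidegrees with MRW = 110, P = 0.1 cm and CONC = 1 mg/ml gives 10 * 11 / 329.8, which is roughly 0.33.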

DRS yy units
Often, CD data units are particularly large measurements, and in order to achieve accurate data measures after unit conversion it may be necessary to multiply the machine values. These units are commonly used at Daresbury with the yy file format. The data is usually in the range 0.001 - 0.01.

DRS-yy units are Theta machine units multiplied by a factor of 100. Therefore, the relationship with Delta epsilons is as follows:

Δε = (θ * 100) * (0.1 * MRW) / ((P * CONC) * 3298)

DRS units
These are standard Daresbury units (machine units that have been divided by a factor of 10,000). The relationship with delta epsilons is shown below:

Δε = (θ / 10 000) * (0.1 * MRW) / ((P * CONC) * 3298)

 

Molar Ellipticity (θ)m
Molar ellipticity is a little used unit which has the dimensions degrees decilitres mol-1 decimetre-1. DichroWeb does not accept data in units of (θ)m, but such data may be converted to units of Δε by using the following formula, where Nr represents the number of amino acids in the protein:

Δε = (θ)m * Nr / 3298

If you have data in units of (θ)m, please convert the values to units of Δε and then submit to DichroWeb.


Lowest Wavelength Datapoint.....

Sometimes part of a data set may be collected under conditions which are less than optimal. In these cases, it is desirable to remove the block of unreliable data points from the dataset and avoid trying to use them in any analysis. The "lowest wavelength datapoint" box allows for this without the need to edit the input file which is being submitted to DichroWeb. Just enter the wavelength of the last data point which is of good quality and DichroWeb will ensure that any data below that value cannot be submitted in an analysis. The suspect data is always taken as being the wavelengths below the entered value as the low wavelength data is generally the problematic area of a CD spectrum.
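
The effect of this box can be pictured as a simple filter on the data points. The sketch below is illustrative only (the function name is hypothetical, and it assumes that the point at the entered wavelength itself is kept).

```python
def drop_below(points, lowest_wavelength):
    """Keep only (wavelength, cd) points at or above the lowest good wavelength."""
    return [(wavelength, cd) for wavelength, cd in points
            if wavelength >= lowest_wavelength]
```

With points at 188, 189, 190 and 191 nm and a lowest wavelength datapoint of 190, only the 190 and 191 nm points would remain.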

Why would data be unreliable?
With a conventional radiation source (such as a Xenon lamp), the intensity of the emitted signal drops significantly towards the lowest wavelengths in its range. The lower intensities can still be collected and utilised, but in order to compensate for the loss of signal strength, the detector (typically a photomultiplier unit) has to increase its sensitivity and consequently requires an increased high tension voltage. There is a maximum high tension voltage at which a photomultiplier unit can accurately record transmitted radiation, and when this is approached, the readings become unreliable. Data collected when the high tension voltage is abnormally high should not be used in the analysis, and the "lowest wavelength datapoint" box provides a convenient method for truncating a dataset for this purpose.

If, after applying this cut-off criterion, your data does not extend to sufficiently low wavelengths to enable the various databases and methods to be used for the analyses, it is suggested that you re-collect the data under different conditions, e.g. using shorter pathlengths, lower concentrations of buffers/additives or different buffers/additives. As a good practice guideline, the high tension voltage should not be above 550 mV at 190 nm for the sample, or above 500 mV at any wavelength for the baseline.


Wavelength Specifications.....

Wavelength Step
CD machines can be set to record data at various wavelength intervals. All of the analysis programmes accept data at 1 nm intervals only, so all other data files will be truncated. If the wrong wavelength step is specified, the server will detect this and generate an error message stating that your file is unsuitable for analysis.
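
One way to confirm the wavelength step before filling in the form is to compare neighbouring wavelengths in the file. This is a minimal sketch with a hypothetical function name and an assumed tolerance for rounding in the file. It is not the check the server performs.

```python
def has_uniform_step(wavelengths, step_nm, tolerance=1e-6):
    """Return True if every neighbouring pair of wavelengths differs by step_nm."""
    ordered = sorted(wavelengths)
    return all(abs((later - earlier) - step_nm) <= tolerance
               for earlier, later in zip(ordered, ordered[1:]))
```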

Initial and Final Wavelengths
The wavelength range required for each reference data set is different:

Reference sets 1, 2, 5    178-260 nm
Reference sets 4, 7       190-240 nm
Reference sets 6, 3       185-240 nm
NB: The K2D program does not require a reference data set, but requires data in the range 200-240 nm

A full breakdown of the contents of the reference sets can be found here.

If your data range exceeds the data range required by the programme, then the file will be edited to produce the required datapoints only. However, if the data range is smaller than the programme's required range then analyses will not proceed. There are two reasons for this. Truncating the reference database to match the test data range for each analysis would be inefficient and would require much editing of the analysis software and data files. Adding extra datapoints to the test file as zeros or undefined points would greatly reduce the accuracy of the analysis.
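
The range rule can be sketched as follows, using the wavelength ranges listed for each reference set in this section. The function and table names are hypothetical, and the behaviour on failure (raising an error) merely stands in for the analysis not proceeding.

```python
# Required wavelength range (nm) per reference set, as listed in this guide.
REQUIRED_RANGES = {
    1: (178, 260), 2: (178, 260), 5: (178, 260),
    4: (190, 240), 7: (190, 240),
    6: (185, 240), 3: (185, 240),
}


def trim_to_reference_range(points, reference_set):
    """Trim (wavelength, cd) points to the range needed by a reference set.

    Raises ValueError if the data does not cover the whole required range.
    """
    if not points:
        raise ValueError("no data points supplied")
    low, high = REQUIRED_RANGES[reference_set]
    wavelengths = [wavelength for wavelength, _ in points]
    if min(wavelengths) > low or max(wavelengths) < high:
        raise ValueError(f"data must cover {low}-{high} nm for set {reference_set}")
    return [(wavelength, cd) for wavelength, cd in points
            if low <= wavelength <= high]
```

Data recorded from 180 to 250 nm would therefore be trimmed to 190-240 nm for reference set 4, but would be refused for reference set 1, which needs data down to 178 nm.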


Optional Scaling Factor.....

The scaling factor allows the user to modify the experimental data by small amounts in order to compensate for errors in the intensity of the spectra and thereby, it is hoped, improve the fit. It is possible that some spectrometers have incorrect intensity calibration, and where this is known, a scaling factor may be applied to compensate for such errors.

The scaling factor is applied to all data points, and has a default value of 1, meaning no scaling. It would be highly unusual to require a large scaling factor, and typical scaling values would be in the range 0.95 - 1.05. Scaling factors outside the range 0.5 - 1.5 are considered unfeasible and will be ignored by DichroWeb.
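
The behaviour described above can be pictured with the following sketch. The function name is hypothetical, and treating the limits 0.5 and 1.5 as inclusive is an assumption.

```python
def apply_scaling(cd_values, factor=1.0):
    """Multiply every CD value by the scaling factor.

    Factors outside 0.5 - 1.5 are ignored, i.e. the data is left unscaled.
    """
    if not 0.5 <= factor <= 1.5:
        factor = 1.0
    return [value * factor for value in cd_values]
```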

WARNING

Scaling factors should only be applied to data where there is a known reason for doing so. It is possible to improve the NRMSD of an analysis by tweaking the scaling factor randomly, but this does not necessarily mean that the structure assignment is improved. Scaling factors should be used with caution.

