## Structure Solution

A crystal structure is considered solved when the phases
of enough reflections are known well enough to reveal most, if not all, of the atoms in
the unique part of the unit cell. There are several ways to solve the crystal
structure of a small-molecule compound. The technique most commonly used today
is called direct methods. Another family of common methods still in use is based
on the Patterson function. Most Patterson methods are significantly aided by the
presence of one or more heavy atoms in the structure. In addition to
these methods, difficult small-molecule structures are sometimes solved either by
superposition maps or by rotation and translation functions, which are both very
specialized extensions of the Patterson method.

Solving protein crystal structures is usually more challenging. If the crystal structure of a similarly sized protein, having similar cell parameters and space group symmetry, is already known, then the phases from the previously known structure may be used as a starting point to solve the current problem. This method is known as molecular replacement. Sometimes solutions of heavy-atom compounds are soaked into crystals in the hope that the heavy atoms will settle in a few specific locations in the crystal. The locations of the heavy atoms are determined, and the phases derived from them are used to bootstrap the structure of the protein. Usually this method requires that two or more such heavy-atom derivatives be prepared, which gives it the name multiple isomorphous replacement. At synchrotrons, the wavelength can be tuned to enhance the anomalous scattering of a given atom type (e.g. Se that has replaced S in methionines), and anomalous scattering is used to enhance the phasing power of one or more heavy-atom derivatives. The methods to solve protein crystal structures are beyond the scope of these notes; however, a good introduction to these methods is presented at Bernhard Rupp’s site in the Phasing Techniques section.


### Direct Methods

H. Schenk has prepared an introduction to Direct Methods.

Currently nearly all small-molecule crystal structures are solved by direct methods. Direct methods use probability relationships to assign phases to a small subset of the data. From modified electron density maps based on this subset of the data, it is possible to extract enough of the atom positions to consider the structure solved.

The essential core of direct methods assumes that there is information about the phases contained in the structure factor amplitudes. Two general features of crystal structures have led to the development of a broad range of mathematical relations among the structure factor phases based upon knowing the values of the amplitudes.

- The electron density of the correct model must be ≥ 0 throughout the unique volume of the unit cell (positive electron density condition).
- The structure is composed of discrete atoms (discrete atom condition).

*Positive electron density condition* -- If totally random phases are input
into the electron density function, then it is unlikely that the resulting electron
density would have *ρ*(**r**)
≥ 0 for all **r**. This criterion of
positive electron density led Karle and Hauptman in 1950 to derive a set of
determinant inequalities that relate the phase angles of different structure
factors to one another. They later received the 1985 Nobel Prize in Chemistry for their pioneering
work on phase determination in crystallography.

*Discrete atom condition* -- In 1952, three researchers (Sayre, 1952;
Cochran, 1952; and Zachariasen, 1952) independently arrived at an important relation
involving the phases using the discrete atom condition. Sayre’s derivation is the
clearest and is given below.

For structures with well-resolved, equal atoms, Sayre observed that the functions
$\rho(\mathbf{r})$ and $\rho^{2}(\mathbf{r})$ are quite similar and show maxima at the same positions.

The Fourier transform of $\rho(\mathbf{r})$ is $(1/V)F_{\mathbf{h}}$. The structure factor $F_{\mathbf{h}}$ is related to the positions of the atoms by:

$$F_{\mathbf{h}} = \sum_{j} f_{j}\exp[2\pi i\,(\mathbf{h}\cdot\mathbf{r}_{j})]$$

where the summation $j$ runs over the $N$ atoms in the cell. For equal-atom
structures, every $f_{j}$ is the same scattering factor $f_{\mathbf{h}}$, and this becomes:

$$F_{\mathbf{h}} = f_{\mathbf{h}}\sum_{j}\exp[2\pi i\,(\mathbf{h}\cdot\mathbf{r}_{j})]$$

A similar expression can be written for the Fourier transform of $\rho^{2}(\mathbf{r})$:

$$G_{\mathbf{h}} = g_{\mathbf{h}}\sum_{j}\exp[2\pi i\,(\mathbf{h}\cdot\mathbf{r}_{j})]$$

where $g_{\mathbf{h}}$ is the scattering factor of the squared atom.

By the convolution theorem, the Fourier transform $\mathcal{T}$ of a product of two functions is the convolution ($\ast$) of their transforms:

$$\mathcal{T}[a(x)\,b(x)] = \mathcal{T}[a(x)] \ast \mathcal{T}[b(x)]$$

Thus the transform of $\rho^{2}(\mathbf{r})$, which is equal to $(1/V)G_{\mathbf{h}}$, is also equal to $(1/V)F_{\mathbf{h}} \ast (1/V)F_{\mathbf{h}}$. Since $F_{\mathbf{h}}$ is a discrete function defined only at the points of the reciprocal lattice, the convolution becomes a summation:

$$G_{\mathbf{h}} = \frac{1}{V}\sum_{\mathbf{k}} F_{\mathbf{k}}\,F_{\mathbf{h}-\mathbf{k}}$$

where the summation is over the reflections $\mathbf{k}$. From the
expressions for $F_{\mathbf{h}}$ and $G_{\mathbf{h}}$ above,

$$F_{\mathbf{h}} = \frac{f_{\mathbf{h}}}{g_{\mathbf{h}}}\,G_{\mathbf{h}} = \theta_{\mathbf{h}}\,G_{\mathbf{h}}$$

$$F_{\mathbf{h}} = \frac{\theta_{\mathbf{h}}}{V}\sum_{\mathbf{k}} F_{\mathbf{k}}\,F_{\mathbf{h}-\mathbf{k}}$$

which is Sayre’s equation. Multiplying both sides of this last expression
by $F_{-\mathbf{h}}$ gives

$$|F_{\mathbf{h}}|^{2} = \frac{\theta_{\mathbf{h}}}{V}\sum_{\mathbf{k}} |F_{\mathbf{h}}\,F_{\mathbf{k}}\,F_{\mathbf{h}-\mathbf{k}}|\,\exp[i(\varphi_{-\mathbf{h}} + \varphi_{\mathbf{k}} + \varphi_{\mathbf{h}-\mathbf{k}})]$$

where the summation is over the reflections $\mathbf{k}$. Reflections
with large values of $|F_{\mathbf{h}}|$ will also have large values of $|F_{\mathbf{h}}|^{2}$, and presumably at least some terms on the right-hand side will have large values of $|F_{\mathbf{k}}|$ and $|F_{\mathbf{h}-\mathbf{k}}|$. Because the left-hand side is real and positive, these dominant terms should have a phase sum near zero:

$$\varphi_{-\mathbf{h}} + \varphi_{\mathbf{k}} + \varphi_{\mathbf{h}-\mathbf{k}} \approx 0$$

or, since $\varphi_{-\mathbf{h}} = -\varphi_{\mathbf{h}}$ (Friedel's law), $\varphi_{\mathbf{h}} \approx \varphi_{\mathbf{k}} + \varphi_{\mathbf{h}-\mathbf{k}}$. For structures in centrosymmetric space groups this expression becomes

$$S(-\mathbf{h})\cdot S(\mathbf{k})\cdot S(\mathbf{h}-\mathbf{k}) \approx +1$$

where $S(\mathbf{h})$ stands for the sign of reflection $\mathbf{h}$. In the last two expressions the equalities are only approximate. The probability that they hold increases with increasing values of $|F_{\mathbf{h}}|$, $|F_{\mathbf{k}}|$, and $|F_{\mathbf{h}-\mathbf{k}}|$.

Direct methods work with structure factors that have been modified
to behave as if they denote the scattering from point atoms located at the same
positions as the original atoms. The modified structure factors, called
*E* values or normalized structure factors, are calculated
as shown below.

$$E_{hkl}^{2} = \frac{F_{hkl}^{2}}{\varepsilon\sum_{j} f_{j}^{2}}$$

where $f_{j} = f_{j}^{o}\exp(-B\sin^{2}\theta/\lambda^{2})$ is the scattering factor for the $j$th atom and
$\varepsilon$ is an integer, 1 or greater, that corrects for the fact
that some classes of reflections have expectation values that are larger than
$\sum f_{j}^{2}$ by an integer factor.
Note that the phases associated with the $E_{hkl}$ values
are the same as the phases associated with the $F_{hkl}$
values.
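The normalization can be sketched for a single reflection as follows. This is a minimal illustration, not a program's actual procedure: it assumes the observed *F* is already on an absolute scale, that *B* is known, and that each $f_{j}^{o}$ is a constant such as $Z_{j}$ (real programs use tabulated, angle-dependent scattering factors). The function names are hypothetical.

```python
import math

def scattering_factor(f0, b_iso, s):
    """f_j = f_j^o * exp(-B * s^2), where s = sin(theta)/lambda in 1/Angstrom."""
    return f0 * math.exp(-b_iso * s * s)

def e_value(f_obs, s, f0_values, b_iso, epsilon=1):
    """|E| for one reflection from E^2 = F^2 / (epsilon * sum_j f_j^2).

    Assumes f_obs is on an absolute scale and each f_j^o is a constant
    (for example Z_j); this is a simplification for illustration only.
    """
    sum_f2 = sum(scattering_factor(f0, b_iso, s) ** 2 for f0 in f0_values)
    return math.sqrt(f_obs ** 2 / (epsilon * sum_f2))

# Hypothetical reflection from a cell with ten carbons and two oxygens, B = 3.0 A^2.
e = e_value(f_obs=12.0, s=0.25, f0_values=[6.0] * 10 + [8.0] * 2, b_iso=3.0)
print(f"|E| = {e:.3f}")
```

Because the thermal factor shrinks every $f_{j}$ at high angle, the same observed *F* gives a larger *E* there, which is what makes *E* values behave like point-atom scattering.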

It has been found that only a fraction of the whole data set is needed to correctly
identify an initial model of the structure. The data with the strongest *E*
values contain the most information about the locations of the atoms. Because of
these facts, about 10% of the whole data set, the reflections having the strongest *E* values,
is chosen to carry out direct methods.
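The selection step, and the search for triplets $(\mathbf{h}, \mathbf{k}, \mathbf{h}-\mathbf{k})$ among the strong reflections to which the phase relations apply, might be sketched like this. The |*E*| values below are made up, the 10% default is the rule of thumb from the text, and the helper names are hypothetical; real programs also apply resolution and reliability cut-offs not shown here.

```python
def strongest(e_values, fraction=0.10):
    """Miller indices of the strongest |E| values; e_values maps (h, k, l) -> |E|."""
    n_keep = max(1, round(len(e_values) * fraction))
    return sorted(e_values, key=e_values.get, reverse=True)[:n_keep]

def negate(hkl):
    return tuple(-i for i in hkl)

def triplets(strong):
    """All (h, k, h - k) with each member a strong reflection or its Friedel mate.

    A triplet may appear twice, once with k and once with h - k in the middle.
    """
    pool = set(strong) | {negate(hkl) for hkl in strong}
    found = []
    for h in strong:
        for k in sorted(pool):
            h_minus_k = tuple(a - b for a, b in zip(h, k))
            if h_minus_k in pool:
                found.append((h, k, h_minus_k))
    return found

# Hypothetical |E| values for ten reflections.
e_values = {(1, 0, 0): 2.6, (2, 0, 0): 2.1, (3, 0, 0): 2.4, (0, 1, 0): 0.4,
            (0, 2, 0): 0.9, (1, 1, 0): 0.3, (2, 1, 0): 0.7, (1, 2, 0): 0.5,
            (0, 0, 1): 0.2, (0, 0, 2): 0.8}
strong = strongest(e_values, fraction=0.3)  # larger fraction only because the list is tiny
for h, k, h_minus_k in triplets(strong):
    print(h, k, h_minus_k)
```

Friedel mates are added to the pool because $|E_{-\mathbf{h}}| = |E_{\mathbf{h}}|$, so a strong reflection is strong with either sign of its indices.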

*Structure Invariants*

For a given basis system describing the unit cell, let
$\mathbf{r}_{j}$ be the positional vector of the $j$th atom. In this reference system the structure factor is

$$F_{\mathbf{h}} = \sum_{j} f_{j}\exp(2\pi i\,\mathbf{h}\cdot\mathbf{r}_{j})$$

where the summation $j$ runs over the $N$ atoms in the cell.
Consider the effect on the structure factor if the origin is shifted by
$\mathbf{q}$. With this shift of the origin, the positional vector
of the $j$th atom becomes $\mathbf{r}_{j}' = \mathbf{r}_{j} - \mathbf{q}$, and the structure factor expression becomes

$$F_{\mathbf{h}}' = \sum_{j} f_{j}\exp(2\pi i\,\mathbf{h}\cdot\mathbf{r}_{j}') = \sum_{j} f_{j}\exp[2\pi i\,\mathbf{h}\cdot(\mathbf{r}_{j}-\mathbf{q})] = F_{\mathbf{h}}\exp(-2\pi i\,\mathbf{h}\cdot\mathbf{q})$$

From this relation, it is clear that the structure factor modulus does not change with an origin shift, and that the phase changes according to

$$\varphi_{\mathbf{h}}' = \varphi_{\mathbf{h}} - 2\pi\,\mathbf{h}\cdot\mathbf{q}$$

Thus the structure factor amplitudes are said to be *structure invariants*
because they are not changed by a shift or translation of the unit cell origin.

Do any structure factors exist whose phase would not change with an origin
shift? From the above expression only the phase of
$F_{000} = \sum_{j} f_{j}$ is invariant to any origin translation.

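Are there functions involving the phases that would remain invariant under any origin translation? Consider the product

$$F_{\mathbf{h}_1}F_{\mathbf{h}_2}\cdots F_{\mathbf{h}_n} = |F_{\mathbf{h}_1}F_{\mathbf{h}_2}\cdots F_{\mathbf{h}_n}|\exp[i(\varphi_{1} + \varphi_{2} + \cdots + \varphi_{n})]$$

where $\varphi_{n}$ is shorthand for $\varphi_{\mathbf{h}_n}$.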

According to the phase relation above the product of structure factors transformed by an origin shift would become

$$F_{\mathbf{h}_1}'F_{\mathbf{h}_2}'\cdots F_{\mathbf{h}_n}' = F_{\mathbf{h}_1}F_{\mathbf{h}_2}\cdots F_{\mathbf{h}_n}\exp[-2\pi i(\mathbf{h}_1 + \mathbf{h}_2 + \cdots + \mathbf{h}_n)\cdot\mathbf{q}]$$

This suggests that the product of structure factors would be invariant to an origin shift if

$$\mathbf{h}_1 + \mathbf{h}_2 + \cdots + \mathbf{h}_n = 0$$

Products of structure factors that satisfy the last expression are called
*structure invariants*, since their values do not depend on the origin,
and therefore depend only on the structure (Hauptman & Karle, 1953; Giacovazzo,
1998). The triplet phase sum in Sayre’s relation, $\varphi_{-\mathbf{h}} + \varphi_{\mathbf{k}} + \varphi_{\mathbf{h}-\mathbf{k}}$, is also a structure invariant, because $-\mathbf{h} + \mathbf{k} + (\mathbf{h}-\mathbf{k}) = 0$.

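Examples of structure invariants include:

- $n = 1$: then $\mathbf{h}_1 = 0$, so $F_{000}$ must be a structure invariant.
- $n = 2$: then $\mathbf{h}_1 + \mathbf{h}_2 = 0$, so $\mathbf{h}_2 = -\mathbf{h}_1$ and $F_{\mathbf{h}}F_{-\mathbf{h}} = |F_{\mathbf{h}}|^{2}$ is a structure invariant.
- $n = 3$: then $\mathbf{h}_1 + \mathbf{h}_2 + \mathbf{h}_3 = 0$, so $\mathbf{h}_3 = -(\mathbf{h}_1 + \mathbf{h}_2)$, and $F_{\mathbf{h}_1}F_{\mathbf{h}_2}F_{-(\mathbf{h}_1+\mathbf{h}_2)} = |F_{\mathbf{h}_1}F_{\mathbf{h}_2}F_{\mathbf{h}_1+\mathbf{h}_2}|\exp[i(\varphi_{\mathbf{h}_1} + \varphi_{\mathbf{h}_2} - \varphi_{\mathbf{h}_1+\mathbf{h}_2})]$ is called a *triplet invariant*.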
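Both results above, the phase shift of a single structure factor and the invariance of a triplet product, can be checked numerically. The sketch below assumes a made-up three-atom cell and origin shift; the function names are hypothetical.

```python
import cmath
import math

def structure_factor(h, atoms):
    """F_h = sum_j f_j exp(2*pi*i h.r_j); atoms = [(f_j, (x, y, z)), ...] in fractional coordinates."""
    return sum(f * cmath.exp(2j * math.pi * sum(hi * xi for hi, xi in zip(h, r)))
               for f, r in atoms)

def shift_origin(atoms, q):
    """Move the origin by q, so that r_j' = r_j - q."""
    return [(f, tuple(xi - qi for xi, qi in zip(r, q))) for f, r in atoms]

# Hypothetical three-atom cell and origin shift, chosen only for illustration.
atoms = [(8.0, (0.12, 0.31, 0.45)), (6.0, (0.40, 0.05, 0.77)), (7.0, (0.66, 0.52, 0.18))]
q = (0.13, 0.27, 0.41)
shifted = shift_origin(atoms, q)

h1, h2 = (1, 0, 2), (0, 1, -1)
h3 = tuple(-(a + b) for a, b in zip(h1, h2))  # h1 + h2 + h3 = 0

# A single structure factor keeps its modulus; its phase moves by -2*pi*h.q.
f_before, f_after = structure_factor(h1, atoms), structure_factor(h1, shifted)
expected_factor = cmath.exp(-2j * math.pi * sum(a * b for a, b in zip(h1, q)))

# The triplet product F_h1 F_h2 F_h3 is unchanged by the shift.
triplet_before = (structure_factor(h1, atoms) * structure_factor(h2, atoms)
                  * structure_factor(h3, atoms))
triplet_after = (structure_factor(h1, shifted) * structure_factor(h2, shifted)
                 * structure_factor(h3, shifted))
print(abs(f_before), abs(f_after), triplet_before, triplet_after)
```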

Current direct-methods programs succeed the vast majority of the time. Modern direct methods combine a variety of techniques to generate phases, usually starting from a random phase set and then refining the phases with an annealing step and with tangent refinement and extension. The tangent refinement formula is shown below.

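$$\tan\varphi_{\mathbf{h}} = \frac{\sum_{\mathbf{k}} \kappa(\mathbf{h},\mathbf{k})\,\sin(\varphi_{\mathbf{k}} + \varphi_{\mathbf{h}-\mathbf{k}})}{\sum_{\mathbf{k}} \kappa(\mathbf{h},\mathbf{k})\,\cos(\varphi_{\mathbf{k}} + \varphi_{\mathbf{h}-\mathbf{k}})}$$

where the sums run over the triplets available for reflection $\mathbf{h}$, and $\kappa(\mathbf{h},\mathbf{k})$ is a weight for each triplet, normally proportional to $|E_{\mathbf{h}}E_{\mathbf{k}}E_{\mathbf{h}-\mathbf{k}}|$.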
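One tangent-formula update for a single reflection can be sketched as below. The phases and weights are hypothetical inputs; `atan2` is used rather than a bare arctangent so that the signs of the two sums select the correct quadrant. The equal-atom weight $2|E_{\mathbf{h}}E_{\mathbf{k}}E_{\mathbf{h}-\mathbf{k}}|/\sqrt{N}$ is included as an assumption about a common choice of $\kappa$, not as the weighting of any particular program.

```python
import math

def tangent_phase(terms):
    """New phi_h from triplets; terms = [(kappa, phi_k, phi_h_minus_k), ...] in radians.

    Returns (phi_h, alpha), where alpha is the length of the weighted resultant,
    a rough measure of how reliable the estimate is.
    """
    s = sum(kappa * math.sin(pk + phk) for kappa, pk, phk in terms)
    c = sum(kappa * math.cos(pk + phk) for kappa, pk, phk in terms)
    return math.atan2(s, c), math.hypot(s, c)

def kappa_equal_atoms(e_h, e_k, e_hk, n_atoms):
    """Assumed equal-atom triplet weight 2|E_h E_k E_h-k| / sqrt(N)."""
    return 2.0 * abs(e_h * e_k * e_hk) / math.sqrt(n_atoms)

# Two hypothetical triplets whose phase sums are 0.5 and 0.7 rad, equally weighted.
phi_h, alpha = tangent_phase([(1.0, 0.2, 0.3), (1.0, 0.4, 0.3)])
print(phi_h, alpha)
```

In a full refinement this update is applied to every strong reflection and repeated until the phases stop changing.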

### Patterson Methods

In 1935, A. L. Patterson published a classic paper on the utility of a
Fourier map that uses |*F*|^{2} as coefficients and phase
angles all assumed to be 0° (Patterson, 1935). He
demonstrated that such a map gives peaks corresponding to all vectors between
pairs of atoms.

$$P(u,v,w) = \sum_{hkl} |F_{hkl}|^{2}\cos[2\pi(hu + kv + lw)]$$
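The vector property can be seen in a one-dimensional toy case. This sketch assumes two equal point atoms in a 1D cell, computes |*F*|^{2} from their positions rather than from measured data, and evaluates the sum above on a grid; the names and numbers are hypothetical.

```python
import cmath
import math

def structure_factor_1d(h, atoms):
    """F_h = sum_j f_j exp(2*pi*i*h*x_j) for a 1D cell; atoms = [(f_j, x_j), ...]."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)

def patterson_1d(atoms, h_max, n_grid):
    """P(u) = sum_h |F_h|^2 cos(2*pi*h*u) for -h_max <= h <= h_max, at u = i / n_grid."""
    intensities = {h: abs(structure_factor_1d(h, atoms)) ** 2
                   for h in range(-h_max, h_max + 1)}
    return [sum(intensity * math.cos(2 * math.pi * h * (i / n_grid))
                for h, intensity in intensities.items())
            for i in range(n_grid)]

atoms = [(1.0, 0.10), (1.0, 0.40)]  # hypothetical equal atoms, 0.30 apart
p = patterson_1d(atoms, h_max=20, n_grid=100)
# Skip the origin region (u within 0.1 of 0 or 1) and locate the strongest vector peak.
best = max(range(10, 91), key=lambda i: p[i])
print(best / 100)  # 0.3 or 0.7, i.e. +/-(x2 - x1) modulo 1
```

No phases enter the calculation, yet the map peaks at the interatomic vector and at its centrosymmetric partner.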

The peak heights were found to be proportional to
Z_{i}·Z_{j}. Thus a peak at *uvw* in a Patterson
map indicates that there are atoms in
the crystal at (*x*_{1}*y*_{1}*z*_{1})
and at (*x*_{2}*y*_{2}*z*_{2}) such
that

*u* = *x*_{1} - *x*_{2},
*v* = *y*_{1} - *y*_{2},
*w* = *z*_{1} -
*z*_{2}

For a crystal structure with N atoms, there will be N^{2}
peaks in the Patterson map. N of these are zero-length vectors from each atom to
itself, and they all coincide in a large peak at the origin. The remaining N^{2} -
N peaks are distributed throughout the cell. Since the cell of the Patterson
function is the same size as the cell of the crystal, the Patterson function is
much more densely packed than the corresponding electron density map. This higher
density of peaks causes many peaks in the Patterson map to overlap. The
greater intrinsic breadth of Patterson peaks accentuates this overlap: because each
atom peak has a finite width, a vector peak between two atoms is as broad as the
sum of the widths of the two atom peaks.

Consider a structure with only one heavy atom, say an iodine atom with Z = 53, and
the remaining atoms no heavier than an oxygen with Z = 8. Then the Z_{light} * Z_{light}
peaks would have intensities roughly proportional to 64, the
Z_{light} * Z_{heavy} peaks would have intensities roughly proportional
to 424, and the Z_{heavy} * Z_{heavy} peaks would have intensities
roughly proportional to 2809. Thus the vectors that have at least one end on a
heavy atom are relatively easy to identify in the Patterson map.
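The counting and weighting of Patterson vectors can be tallied with a short sketch. It assumes hypothetical cell contents of two symmetry-related iodines and four oxygens, and ignores positions entirely: each ordered pair of atoms contributes one vector of weight Z_{i} * Z_{j}, and the N self-vectors all fall at the origin.

```python
from collections import Counter
from itertools import product

def vector_classes(z_values):
    """Tally the N*N Patterson vectors by the weight Z_i * Z_j of the two atoms they join."""
    tally = Counter()
    for i, j in product(range(len(z_values)), repeat=2):
        tally["origin" if i == j else z_values[i] * z_values[j]] += 1
    return tally

# Hypothetical cell contents: two symmetry-related iodines (Z = 53) and four oxygens (Z = 8).
classes = vector_classes([53, 53, 8, 8, 8, 8])
print(classes)
```

Even with only two heavy atoms, the 2809 and 424 classes dominate the map, while the many light-atom vectors contribute a low background.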

Because of the many overlaps and the broad peaks, Patterson maps tend
to be an almost featureless distribution of peak density. To remedy this problem, the
|*F*|^{2} values are typically modified to make the scattering appear
to come from point atoms. In modern programs this sharpening of the map
is often accomplished by using *E*^{2} values instead of
|*F*|^{2}.

A less serious problem with Patterson maps is the very large peak
at the origin. Patterson showed that it was possible to remove the origin peak from
the map by subtracting the average value of |*F*|^{2} from
the coefficients.

|*F*|^{2}_{origin removed} =
|*F*|^{2} -
∑ *f*_{j}^{2}

where *f*_{j} is the scattering factor for the
*j*th atom. For a Patterson function sharpened to resemble scattering from
point atoms, the *f*_{j} values become equal to Z_{j}.
*E*^{2} values are calculated to have an average value of 1,
so Patterson maps calculated using *E*^{2} - 1 as coefficients will
have the large peak at the origin removed.
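The effect of the *E*^{2} - 1 coefficients on the origin can be checked with the same kind of 1D toy. This sketch assumes equal point atoms, so $f_{j} = Z_{j}$ at every angle and $E^{2} = |F|^{2}/\sum Z_{j}^{2}$; the function name and atom positions are hypothetical.

```python
import cmath
import math

def patterson_value(u, atoms, h_max, subtract_origin):
    """1D Patterson at u from point atoms [(Z_j, x_j), ...].

    With subtract_origin=True the coefficients are E^2 - 1, using
    E^2 = |F|^2 / sum_j Z_j^2 (point atoms, so f_j = Z_j at all angles).
    """
    sum_z2 = sum(z * z for z, _ in atoms)
    total = 0.0
    for h in range(-h_max, h_max + 1):
        f_h = sum(z * cmath.exp(2j * math.pi * h * x) for z, x in atoms)
        coeff = abs(f_h) ** 2 / sum_z2 - 1.0 if subtract_origin else abs(f_h) ** 2
        total += coeff * math.cos(2 * math.pi * h * u)
    return total

atoms = [(1.0, 0.10), (1.0, 0.40)]  # hypothetical equal point atoms
plain_origin = patterson_value(0.0, atoms, 20, subtract_origin=False)
sharpened_origin = patterson_value(0.0, atoms, 20, subtract_origin=True)
sharpened_peak = patterson_value(0.3, atoms, 20, subtract_origin=True)
print(plain_origin, sharpened_origin, sharpened_peak)
```

With |*F*|^{2} the origin is the largest feature of the map; with *E*^{2} - 1 it nearly vanishes while the interatomic peak at *u* = 0.3 remains.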

Patterson maps have the same symmetry as that of the corresponding
Laue group for the sample. Patterson maps are centrosymmetric because the vectors
between the atoms can point in both directions. Although symmetry elements of the
crystal’s space group do not necessarily appear as such in the Patterson map, they
do leave their traces in the form of higher concentrations of peaks. The locations
to look for these stronger peaks are called Harker lines and planes (Harker, 1936).
These regions of the Patterson map correspond to the vectors between atoms related
by the space group symmetry of the structure. For example in *P*2_{1}
the symmetry operators are *x, y, z* and *-x,* ½*+y, -z*.
In the Patterson map there should be a concentration of vectors at 2*x*, ½,
2*z* in the *v* = ½ plane. This method of locating vectors can be
successfully applied to finding a heavy atom in a crystal structure.

First, consider the relative heights of the peaks in a Patterson
map. If the crystal structure contains only light (Z < 10) atoms, then the peaks
will all be of roughly the same height, Z_{light} * Z_{light}’, except for
accidental overlaps and a slight buildup along the corresponding Harker lines or
planes. If, however, there is a single heavy atom in the compound, then the peaks
between the heavy atom and the light atoms will have heights proportional to
Z_{light} * Z_{heavy}, and peaks between symmetry-related heavy atoms
in the cell will be proportional to Z_{heavy} * Z_{heavy}. Thus the
peaks corresponding to the heavy atom should stand out in comparison to the rest
of the Patterson map peaks.

The vectors between symmetry-related peaks are obtained from the
symmetry operators of the space group. A table is created with the symmetry operators
listed along the left side column and along the top row. The vectors are determined by
subtracting the left side operator from the top row operator. An example of this
process is illustrated for the space group *P*2_{1}/*c*.

| | x, y, z | -x, -y, -z | -x, ½+y, ½-z | x, ½-y, ½+z |
|---|---|---|---|---|
| x, y, z | 0, 0, 0 | -2x, -2y, -2z | -2x, ½, ½-2z | 0, ½-2y, ½ |
| -x, -y, -z | 2x, 2y, 2z | 0, 0, 0 | 0, ½+2y, ½ | 2x, ½, ½+2z |
| -x, ½+y, ½-z | 2x, ½, ½+2z | 0, ½-2y, ½ | 0, 0, 0 | 2x, -2y, 2z |
| x, ½-y, ½+z | 0, ½+2y, ½ | -2x, ½, ½-2z | -2x, 2y, -2z | 0, 0, 0 |
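The same table can be generated numerically for a trial atom position, which is handy for predicting where Harker peaks should appear. This sketch works with numbers rather than symbols: the operators are transcribed from the table header, the trial position is hypothetical, and each vector is reduced into the range 0 to 1.

```python
# Symmetry operators of P2_1/c as functions of fractional coordinates.
P21C_OPS = [
    lambda x, y, z: (x, y, z),
    lambda x, y, z: (-x, -y, -z),
    lambda x, y, z: (-x, 0.5 + y, 0.5 - z),
    lambda x, y, z: (x, 0.5 - y, 0.5 + z),
]

def wrap(v):
    """Map a coordinate into [0, 1); rounding hides floating-point noise such as 0.9999999."""
    return round(v % 1.0, 6) % 1.0

def vector_table(ops, xyz):
    """table[row][col] = image under the top-row (column) operator minus image under the row operator."""
    images = [op(*xyz) for op in ops]
    return [[tuple(wrap(a - b) for a, b in zip(top, left)) for top in images]
            for left in images]

table = vector_table(P21C_OPS, (0.10, 0.20, 0.30))  # hypothetical heavy-atom position
for row in table:
    print(row)
```

For this trial position the entry in the first row and third column, (0.8, 0.5, 0.9), is the numerical value of -2x, ½, ½-2z, a point on the Harker plane *v* = ½.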

The Harker line for this space group is at *u* = 0 and
*w* = ½; and the Harker plane for this space group is at *v* = ½.
To make sure that a heavy atom is properly located from the peaks in the Harker line
(from the *v* coordinate locate *y*) and the Harker plane (from
*u* and *w* locate *x* and *z*), check that there is also
a vector in the Patterson corresponding to 2*x*, 2*y*, 2*z*.

For a structure with only one heavy atom in the asymmetric unit of this example space group, the highest peak at u, ½, w in the Patterson map would be equated to the expression 2x, ½, ½+2z. Thus the x and z coordinates for the heavy atom would be x = u/2 and z = (w - ½)/2. The highest peak on the 0, v, ½ line in the map should correspond to the expression 0, ½+2y, ½, giving y = (v - ½)/2. Then verify these coordinates by looking in the Patterson map for a strong peak with u, v, w coordinates of u = 2x, v = 2y, and w = 2z.
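The arithmetic of this procedure is sketched below. The peak positions are hypothetical values of the kind a heavy atom at (0.10, 0.20, 0.30) would produce. Halving a coordinate that is known only modulo 1 is ambiguous by ½; in this sketch the recovered z comes out as 0.8 rather than 0.3, and that choice should correspond to one of the equivalent origins of *P*2_{1}/*c*. Both choices predict the same 2x, 2y, 2z check peak.

```python
def heavy_atom_from_harker(u, w, v):
    """P2_1/c, one heavy atom in the asymmetric unit.

    Harker plane peak (u, 1/2, w) = (2x, 1/2, 1/2 + 2z)  ->  x = u/2,  z = (w - 1/2)/2
    Harker line peak (0, v, 1/2) = (0, 1/2 + 2y, 1/2)    ->  y = (v - 1/2)/2
    """
    return (u / 2) % 1.0, ((v - 0.5) / 2) % 1.0, ((w - 0.5) / 2) % 1.0

def check_vector(x, y, z):
    """Predicted (2x, 2y, 2z) peak used to confirm the solution, reduced into [0, 1)."""
    return tuple(round((2 * c) % 1.0, 6) % 1.0 for c in (x, y, z))

# Hypothetical peaks: Harker plane (0.2, 0.5, 0.1) and Harker line (0.0, 0.9, 0.5).
x, y, z = heavy_atom_from_harker(u=0.2, w=0.1, v=0.9)
print((x, y, z), check_vector(x, y, z))
```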

Use these x, y, z coordinates for the heavy atom to calculate a simple Fourier map. From this new Fourier map it is usually possible to locate most if not all of the remaining atoms in the crystal structure.

Dr. Donald Ward has prepared tables of Harker lines and planes for
all space groups and published these tables in *Patterson Peaks*. A
brief teaching edition of this book is available on the web at
http://www2.chemistry.msu.edu/staff/ward/PattPeaks/brief.shtml.

A Patterson map shows the vectors between all pairs of atoms, which creates a large number of overlapping images of the molecule. One way to think about all of these images is to consider moving each atom of the structure to the origin in turn and then plotting all vectors between atoms. If the Patterson function could be manipulated to leave only one image of the structure, then the structure would be solved. This type of manipulation is performed by a technique known as Patterson superposition. The steps in performing a superposition calculation are discussed below.

A heavy-atom vector is located in the Patterson map. One copy of the Patterson map is shifted in space by this vector, and at every point the intensities of the shifted map are compared with the intensities of the unshifted Patterson map. The minimum of the two intensities at each point is stored as a new superposition map. A new high-intensity peak is chosen from this superposition map, a copy of the original Patterson map is shifted by this vector, and its intensities are compared with those of the superposition map. Again a minimum function is used in the comparison. With each shift-and-compare step, the new map contains many fewer peaks. Usually 3-4 such superpositions are needed to reveal a single image of the structure. At this point, the symmetry elements of the space group are located in the map, and the positions of the atoms (the remaining peaks in the map) are shifted to put the space group’s symmetry elements in the conventional position(s).
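A single shift-and-minimum step can be sketched on a 1D periodic grid. This is a simplified illustration under stated assumptions: equal point atoms sit on integer grid points (so no interpolation is needed, unlike a real superposition program), and the Patterson is built directly from the hypothetical atom positions rather than from measured intensities. The function names are hypothetical.

```python
def shift_map(values, shift):
    """Translate a periodic 1D map by an integer number of grid points: out[i] = values[i - shift]."""
    n = len(values)
    return [values[(i - shift) % n] for i in range(n)]

def superpose(current, patterson, shift):
    """One superposition step: point-by-point minimum of the current map and the shifted Patterson."""
    return [min(a, b) for a, b in zip(current, shift_map(patterson, shift))]

def point_patterson(positions, n):
    """Patterson of equal point atoms on an n-point periodic grid: one unit per vector x_i - x_j."""
    p = [0] * n
    for xi in positions:
        for xj in positions:
            p[(xi - xj) % n] += 1
    return p

# Hypothetical 1D structure: equal atoms at grid points 0, 3 and 7 of a 20-point cell.
patterson = point_patterson([0, 3, 7], 20)
step1 = superpose(patterson, patterson, 3)  # shift by the vector between the atoms at 0 and 3
peaks = sorted(i for i, v in enumerate(step1) if v > 0)
print(peaks)
```

After one step, the surviving peaks at 0, 3, 7 and 16 are the structure {0, 3, 7} together with its inverted image 3 - {0, 3, 7}; a further superposition with another vector would be needed to separate the two, as the text describes.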

There are a few details about the superposition method that should be mentioned. First, the original Patterson map should be calculated on a fine grid, with adjacent points ≤ 0.25 Å apart. Second, both the original Patterson and all shifted maps should be calculated for the entire unit cell. Finally, the superposition program should properly interpolate the intensities from the map based on the shift vectors. At this time, there are no commercially available superposition programs.

### Molecular Replacement

The molecular replacement method was first developed by Rossmann and Blow (1963,
1964). This method uses a structurally similar model, taken from a
known crystal structure or a chemical model. Parts of the known model that are believed
not to be similar to the problem structure are removed. The model is then rotated through all unique
orientations in the Patterson map and translated within the unit cell
to locate the best fit between the model and the observed data.

### References

- W. Cochran, *Acta Cryst.*, **1952**, *5*, 65-67.
- C. Giacovazzo, (1998), "Direct Phasing in Crystallography, Fundamentals and Applications," Oxford: New York, Chapter 2.
- D. Harker, *J. Chem. Phys.*, **1936**, *4*, 381-390.
- H. Hauptman & J. Karle, (1953), *The Solution of the Phase Problem I. The Centrosymmetric Crystal*, ACA Monograph 3, New York.
- J. Karle & H. Hauptman, *Acta Cryst.*, **1950**, *3*, 181.
- A. L. Patterson, *Z. Krist.*, **1935**, A*90*, 517-542. See also *Phys. Rev.*, **1934**, *46*, 372-376, and *Acta Cryst.*, **1949**, *2*, 339-340.
- D. Sayre, *Acta Cryst.*, **1952**, *5*, 60-65.
- W. H. Zachariasen, *Acta Cryst.*, **1952**, *5*, 68-73; see also L. Lavine, *Acta Cryst.*, **1952**, *5*, 846-847.
- a) M. G. Rossmann and D. M. Blow, *Acta Cryst.*, **1963**, *16*, 39. b) M. G. Rossmann and D. M. Blow, *Acta Cryst.*, **1964**, *17*, 1474.