Introduction to protein folding
From Foldeomics Wiki
Proteins
Arguably, proteins are the most important class of biological macromolecules. They are essential for the existence of all biological systems, and because of their diverse range of functions they are essential for many different reasons. For example: haemoglobin is essential for the transport of oxygen in the blood of mammals (Perutz, 1942; Perutz et al., 1998); glycogen phosphorylase provides energy for the cell by catalysing the first step in converting glycogen into glucose (Hajdu, 1987; 1988); oestrogen receptor and other hormone receptors regulate gene expression (Schwabe et al., 1990; Halachmi, 1994); myosin forms the thick filaments of striated muscle (Granzier & Labeit, 2002) and the molecular chaperone GroEL aids the folding of other proteins (Mayhew et al., 1996).
The function of every protein molecule depends on its precise native three-dimensional structure. The native three-dimensional structure of a protein is determined purely by its environment (e.g., pH) and the sequence of amino-acids (Anfinsen, 1973). The native three-dimensional structure and energy of a protein involve a complex set of atomic interactions. The acquisition of a protein’s three-dimensional structure from a disordered state is called protein folding. Normally the protein folding process occurs correctly; occasionally, however, proteins misfold. Alzheimer’s, Parkinson’s, Huntington’s and CJD are examples of diseases that can result if a protein misfolds and aggregates (Taubes, 1996; Perutz, 1997; Mastrianni & Roos, 2000). Clearly, it is important to understand the process of protein folding, yet protein chemists are still searching for fundamental answers.
Protein Folding
The Protein Folding Problem
For a protein to fulfil its function it must first fold from the unfolded (U) or denatured (D) state – a largely unstructured and highly flexible linear chain of amino acids, into the folded (F) or native (N) state – a compact and specific three-dimensional structure. How the protein folds into the same final native state is called the protein-folding problem (Fersht & Serrano, 1993; Makhatadze & Privalov, 1995).
Protein Folding In Vitro
Although many proteins can fold reversibly in vitro, many cannot. There are various causes for this irreversibility, including: 1, Folding can be irreversible if the unfolded or partially folded protein aggregates prior to, or during, the folding process. This irreversibility is generally caused by poor solubility of the peptide chain in aqueous solutions and is aggravated if the protein consists of several domains or sub-units (Fink, 1995). 2, Folding may be irreversible if a protein undergoes post-translational modification, altering elements essential for folding. For example, subtilisin has a pro-sequence which is required to direct folding (Shinde & Inouye, 1993; Bryan et al., 1995). This pro-sequence is removed by autolysis once subtilisin has folded, thus folding becomes irreversible.
Protein Folding In Vivo
In vivo, protein folding is significantly more problematic. This is due to high protein concentrations and macromolecular crowding in the cell (Ellis, 2001). Because they have exposed hydrophobic surfaces, denatured states and folding intermediates are generally “sticky” and thus at high concentrations will tend to aggregate. Even small, fast-folding proteins aggregate or precipitate at high, or sometimes even low concentrations, in vitro (Silow et al., 1999). To combat aggregation and to aid the folding of proteins, the cell uses a series of accessory proteins, collectively known as molecular chaperones.
Molecular chaperones can aid a protein at a number of stages during the folding process. Examples include: protein disulphide isomerases which rapidly catalyse the shuffling of disulphide linkages (Creighton, 1995a; Freedman, 1995); and peptidyl-prolyl isomerase enzymes such as cyclophilin and FKBP12 which catalyse cis-trans isomerisation (Schmid, 1993). In E. coli, the unfolded state of a nascent polypeptide chain is passed to DnaK (Hsp70 homologue), which maintains it in an extended form, and under the influence of ATP and co-chaperones such as DnaJ and GrpE, it is passed on to chaperonin GroEL which aids its folding (Langer et al., 1992). Chaperonins can bind to different states, including unfolded and partially folded amino-acid chains, thus preventing aggregation (Gething & Sambrook, 1992). Chaperonins can also unfold misfolded states, thus allowing them further chances to refold correctly (Grantcharova et al., 2001).
The exact mechanism of chaperone-assisted folding is not entirely understood, but some specific details have been elucidated. It is generally accepted that GroEL provides a macromolecular cage (a “fusion-cage” or folding cage) in which proteins that are prone to aggregation can fold in a unimolecular environment (Grantcharova et al., 2001; Hartl, 1996; Mayhew et al., 1996). The crystal structures of the GroEL and GroEL/GroES complex support this view (Braig et al., 1994; Xu et al., 1997). However, it has since been shown that monomeric minichaperones, which correspond only to the apical domain of GroEL, possess chaperone activity in vitro and in vivo (Zahn et al., 1996; Chatellier et al., 1998). The crystal structure revealed that the active site is a flexible hydrophobic patch, which fits best to β strands (Buckle et al., 1997). Thus it is proposed that GroEL binds unspecifically to exposed hydrophobic patches of denatured states, folding intermediates and misfolded states, preventing them from self-aggregating and so allowing them to fold correctly.
It is thought that only 10% of proteins present in the cell require molecular chaperones to fold (Fersht, 1998) and in fact their folding in vitro can be retarded by the presence of chaperonins, because a chaperonin will bind preferentially to a denatured state (Gray et al., 1993). It has been found that unstressed bacteria do not have the required quantity of chaperones present to fold all of the cell’s proteins (Lorimer, 1996; Ewalt et al., 1997). Thus, proteins requiring folding assistance may be restricted to a specific class or a specific occasion, such as during synthesis or transport of a protein or during stress (for example, heat shock (Martin et al., 1992; Schmidt et al., 1994; Clarke, 1996)).
A History of the study of Protein Folding
For over thirty years many protein chemists have studied and tried to understand the fundamental processes that govern a protein’s ability to fold. Anfinsen and co-workers were the first to show that denatured bovine pancreatic ribonuclease could spontaneously refold into its native three-dimensional structure with corresponding function, when taken from denaturing to native conditions, in vitro (Anfinsen et al., 1961; Anfinsen, 1973). The experiment proved that all of the information required for a protein to fold into its final three-dimensional structure was stored (encoded) in the protein.
The Levinthal Paradox
So how does a protein fold into its stable, native state? Levinthal argued that if folding was under thermodynamic control then a chain of amino-acids would have to randomly navigate through many possible conformations before finding its minimal energy native structure. However, this cannot be the case because even a small protein cannot possibly fold via this method in a biologically reasonable time scale (Levinthal, 1968). For example, if two conformations per residue were allowed for a small protein of 100 amino-acid residues, and these two conformations interconverted on a femtosecond timescale, then 2¹⁰⁰ ≈ 10³⁰ possible conformations would have to be explored before the correct one was found. Thus, it was proposed that in order for the folding process to occur both efficiently and rapidly a protein must fold along a defined pathway. Such complete kinetic control, however, implies that protein folding will be dependent on its environment, and yet it was known that proteins could refold under varying conditions in vivo and in vitro to the same three-dimensional active native structure. This became known as the Levinthal Paradox.
In retrospect such a calculation was naïve. It assumed that all conformations have equal energy, when in fact, once one conformation is chosen by one amino acid, the others around it become sterically and thus energetically restricted in their choices. Despite this, Levinthal’s Paradox was to influence the folding community for many years and it became widely accepted that proteins must have defined folding pathways. Three major mechanisms of folding have been postulated (see Figure 1).
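The arithmetic behind this estimate can be checked with a short sketch. It assumes, purely for illustration, that the chain samples one conformation per femtosecond; the function and constant names are hypothetical and not part of Levinthal's argument.

```python
# Back-of-envelope Levinthal estimate.
# Assumption: one conformation is sampled per femtosecond (illustrative only).
STATES_PER_RESIDUE = 2
N_RESIDUES = 100
TIME_PER_CONFORMATION_S = 1e-15  # one femtosecond
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_search(states_per_residue, n_residues, time_per_conformation_s):
    """Return (number of conformations, exhaustive search time in years)."""
    n_conformations = states_per_residue ** n_residues
    search_time_years = (
        n_conformations * time_per_conformation_s / SECONDS_PER_YEAR
    )
    return n_conformations, search_time_years


n_conf, years = levinthal_search(
    STATES_PER_RESIDUE, N_RESIDUES, TIME_PER_CONFORMATION_S
)
print(f"{n_conf:.2e} conformations, about {years:.1e} years to sample them all")
```

Under these assumptions an exhaustive search would take tens of millions of years, which is why a purely random search is ruled out for proteins that fold in milliseconds to seconds.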
Figure 1: The Three Classical Models for Protein Folding.
The Framework Model
In 1973, Ptitsyn proposed that a protein could fold through several intermediates. Each intermediate would have an increasing number of native-like structural features and these features would be fixed at subsequent stages of folding (Ptitsyn, 1973). This proposal was called the Framework Model. It allowed both kinetics and thermodynamics to control refolding.
Concepts for this model have been refined and a more precise hypothesis of how proteins fold under this model mechanism has been proposed (Ptitsyn, 1995b). The protein is proposed to have three stages of development in the process of forming its final three-dimensional structure. Initially, local elements of native-like secondary structure form, followed by an overall folding pattern or tertiary structure. Finally, the long distance interactions, which make the overall structure fixed and rigid, are formed. It is the local interactions among immediate neighbours that initiate folding, and the existence of at least two intermediates is necessary: an early intermediate which has secondary structure fluctuating around native positions and a late molten-globule intermediate which is compact and has overall native-like structure, including three-dimensional structural information, but lacks tight packing and final rigidity. This model implies that the folding code is contained within individual amino-acid residues and the local interactions that each residue makes with its immediate neighbours. It is the formation of secondary structure which allows tertiary structure to assemble and so it is the immediate environment of each residue which determines the final protein structure (Kim & Baldwin, 1990). Another refinement, the diffusion-collision model, suggests that elements of secondary structure could diffuse, until they collide and eventually cohere to form stable tertiary structure (Karplus & Weaver, 1994).
The Hydrophobic Collapse Model
Dill and co-workers review an alternative Hydrophobic Collapse Model for protein folding (Dill et al., 1995). In this model, it is proposed that non-local interactions between hydrophobic atoms distant in the primary sequence drive a collapse, which gives rise to a final folded state. The protein collapses rapidly around its hydrophobic side chains and then rearranges from the restricted conformational space occupied by the intermediate. This model implies that native-like tertiary interactions drive secondary structure formation, rather than the converse. The folding code resides in global patterns of specific non-local interactions, which have arisen from the arrangement of polar and non-polar residues in the primary sequence.
In this model, non-local hydrophobic interactions drive the protein to become compact and acquire a non-polar core. Some of these interactions are sequence dependent and so craft a specific tertiary architecture. The collapse helps drive the formation of secondary structure, because compactness induces a stabilisation of hydrogen-bonding interactions. Intermediates, containing many buried hydrophobic residues are formed during the process of hydrophobic collapse. The intermediates must go through a rate determining transition state in which side chains adopt tight packing before the formation of the native state.
The “Classical” Nucleation Model
Like the Framework Model, the Nucleation Model proposed that initially neighbouring residues would form native-like elements of secondary structure. These small elements of secondary structure act as a nucleus from which the native structure propagates in a stepwise manner (Wetlaufer, 1973; 1990). In this classical model the nucleus is strong and localised, for example, two or three turns of an α-helix and there are few or no long-range interactions. The structure then grows from the nucleus.
Intermediates versus Transition States
Many other models have been proposed, such as the Jigsaw Model, which suggests that folding is a heterogeneous process and that each molecule folds along its own distinct path (Harrison & Durbin, 1985); however, most are variations or refinements of the three above. Both the framework and the hydrophobic collapse models imply that folding intermediates must be present, because they could reduce the size of the conformational space a protein has to search through, thus solving the Levinthal Paradox. The protein folding field became dominated by the study of proteins which folded through stable and populated intermediate states which could be easily characterised (Kim & Baldwin, 1982; Kim & Baldwin, 1990; Evans & Radford, 1994). It was assumed that the presence of intermediates on the folding pathway was essential, and so the nucleation mechanisms were disregarded.
However, in 1991, it was shown that stable intermediates were not essential for the fast efficient folding of a protein (Jackson & Fersht, 1991a) and it has subsequently been suggested that stable intermediates may slow the folding process (Fersht, 1995). This discovery switched the experimental focus away from the study of intermediate, disulphide, molten globule, and proline-limited states, and towards the study of transition states and the simple two-state folding of smaller proteins. The problem with the early mechanisms proposed was that each uncoupled the formation of secondary structure from tertiary structure, thus simplifying the search for the folded state. It is now known that this removes a stringent requirement: to have simultaneous formation of secondary and tertiary structure.
Methods used to Analyse Protein Folding
To begin to understand how a protein folds, the folding pathway of a protein must first be thoroughly characterised. This involves a detailed characterisation, both structurally and energetically, of all species on the folding pathway. In the simplest case, when a protein folds in a two-state manner without populating any intermediate states, characterisation of the denatured state ensemble (D), transition state ensemble (‡), and the native state (N) is required. The three-dimensional structure of native proteins can be determined using X-ray crystallography and NMR spectroscopy. Heteronuclear multidimensional NMR techniques have been used to characterise the denatured states of a number of small proteins (Logan et al., 1993; Arcus et al., 1994; Logan et al., 1994; Freund et al., 1996; Wong et al., 1996). From these and other studies it has become clear that unfolded proteins are rarely true “random coils” but can contain regions of residual structure (Shortle & Abeygunawardana, 1993). Unfortunately, the structure of a transition state cannot be determined directly. It can only be inferred indirectly by the study of the kinetics of either refolding or unfolding. Although folding intermediates (Kim & Baldwin, 1982; Evans & Radford, 1994) and transition states have been characterised using a variety of biophysical methods and probes, it was not until the advent of H/D pulsed-quenched flow techniques (Baldwin, 1993) and protein engineering methods (Matouschek et al., 1989) that high resolution structural information became available.
Characterising Folding Pathways
Certain properties of proteins, for example, intrinsic fluorescence, change on going from the denatured to the native state. These changes can be used to probe the stability of the protein and its folding pathway. The thermodynamic stability of proteins can be investigated by performing unfolding/folding experiments under equilibrium conditions. The unfolding/folding equilibrium is perturbed by a variety of methods and the degree to which the protein is consequently unfolded or refolded is measured, which makes it possible to determine the equilibrium constant over a range of conditions. To perform these experiments the protein is denatured by varying one parameter, and a probe is used to measure the equilibrium that is produced. Such experiments involve unfolding the protein using chemical denaturants (urea or guanidinium chloride (GdmCl)), pH or temperature. The Gibbs free energy for the unfolding of a protein, $\Delta G_{\mathrm{D-N}}$, can be calculated from these denaturation experiments. If thermal unfolding is reversible, then thermal denaturations also allow the enthalpy, $\Delta H_{\mathrm{D-N}}$, and the entropy, $\Delta S_{\mathrm{D-N}}$, of unfolding to be calculated.
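As an illustration of how an equilibrium constant maps onto a denaturation curve, the sketch below assumes a two-state transition and a linear dependence of the unfolding free energy on denaturant concentration (a commonly used approximation that the text above does not itself state). The parameter values are hypothetical and chosen only to give a readable curve.

```python
import math

R = 8.314e-3  # gas constant in kJ mol^-1 K^-1


def delta_g_unfolding(denaturant_molar, dg_water, m_value):
    """Free energy of unfolding (kJ/mol) under the assumed linear model."""
    return dg_water - m_value * denaturant_molar


def fraction_unfolded(denaturant_molar, dg_water, m_value, temperature_k=298.15):
    """Two-state fraction of denatured protein, K / (1 + K) with K = [D]/[N]."""
    dg = delta_g_unfolding(denaturant_molar, dg_water, m_value)
    k_unfold = math.exp(-dg / (R * temperature_k))
    return k_unfold / (1.0 + k_unfold)


# Hypothetical parameters, for illustration only.
DG_WATER = 30.0  # free energy of unfolding in water, kJ/mol
M_VALUE = 7.5    # denaturant dependence, kJ/mol per M
midpoint = DG_WATER / M_VALUE  # concentration at which half the protein is unfolded

for conc in (0.0, 2.0, midpoint, 6.0):
    frac = fraction_unfolded(conc, DG_WATER, M_VALUE)
    print(f"{conc:.1f} M denaturant: fraction unfolded = {frac:.4f}")
```

In practice the probe signal (for example fluorescence) is fitted to such a model to extract the free energy in water and the midpoint, rather than the parameters being known in advance.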
Characterising the Transition State
The kinetics of unfolding/refolding allows the energetics of the transition state to be calculated. This indirectly allows the structure of the transition state to be studied. Measuring kinetics under different experimental conditions allows observation of different characteristics of the transition state. A number of different experimental approaches have been used to study the structure and energetics of folding transition states. There are two main approaches: 1, The study of wild-type protein, where the rates of unfolding and refolding are measured, as a function of various experimental conditions and 2, The protein engineering method.
The first approach studies wild-type protein and measures the rates of unfolding and refolding as experimental conditions such as temperature, denaturant concentration, ligand concentration, etc. are varied. This provides information on the thermodynamic nature of the transition state, as well as its compactness, as measured by burial of hydrophobic side chains. Studies on the folding and unfolding of wild-type proteins in the presence and absence of ligand have yielded information on structure formation in ligand-binding pockets during folding (Sancho et al., 1991). Recently, this type of approach has been extended by studying the effect of sugars and alcohols on the rate of folding and unfolding (Chiti et al., 1998b; Chiti et al., 1999a). These experiments have yielded information on the extents of hydration and secondary structure, particularly α-helices, in the transition state. The perturbations applied to wild-type protein provide average properties of the structure and energetics of the transition state (see summary in Table 1). Thus, only a low-resolution picture of the transition state is obtained. To obtain detail at the level of individual residues and atoms, protein engineering techniques must be applied.
The second approach is to use protein engineering techniques to measure the relative stabilities and rates of folding of wild-type and mutant protein. Protein engineering techniques can be used to dissect the interactions and structure present in the transition state for folding. The effect of a point mutation on the energetics of the native state, $\Delta\Delta G_{\mathrm{D-N}}$, and the transition state, $\Delta\Delta G_{\ddagger-\mathrm{D}}$, relative to the denatured state is measured using a combination of equilibrium and kinetic experiments (Fersht et al., 1992). The $\Delta\Delta G_{\mathrm{D-N}}$ and $\Delta\Delta G_{\ddagger-\mathrm{D}}$ values can be compared using a Φ-value analysis (Fersht et al., 1992). A Φ-value analysis quantitatively measures the energetics of structure formation in the transition state. It is the only method available that probes directly the side-chain interactions made in transition states, thus providing a tool from which secondary and tertiary structural information can be inferred.
The details, limitations and background theory for the Φ-value analysis technique are discussed in detail in Section 2. Briefly, a Φ-value is the ratio $\Delta\Delta G_{\ddagger-\mathrm{D}} / \Delta\Delta G_{\mathrm{D-N}}$ and will normally range from 0 to 1. A high Φ-value indicates that the interactions made by the mutated residue’s side chain in the native state, which were lost upon mutation, are highly, or completely (Φ = 1), formed in the transition state of the folding pathway. A low Φ-value indicates that the interactions made by the mutated residue’s side chain in the native state, which were lost upon mutation, are hardly, or not (Φ = 0), formed in the transition state of the folding pathway.
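A minimal sketch of the calculation follows, taking the Φ-value as the ratio of the mutational change in transition-state free energy to the mutational change in native-state stability, both relative to the denatured state. It assumes that the transition-state term is obtained from refolding rate constants as RT ln(k_wt / k_mut), which is a standard transition-state-theory relation but is not spelled out in the text; all function names and numbers are hypothetical.

```python
import math

R = 8.314e-3  # gas constant in kJ mol^-1 K^-1


def ddg_transition_state(kf_wild_type, kf_mutant, temperature_k=298.15):
    """Change in transition-state free energy relative to D (kJ/mol).

    Assumes ddG = RT ln(kf_wt / kf_mut); positive when the mutant folds slower.
    """
    return R * temperature_k * math.log(kf_wild_type / kf_mutant)


def phi_value(kf_wild_type, kf_mutant, dg_wild_type, dg_mutant,
              temperature_k=298.15, min_ddg=1.0):
    """Phi = ddG(transition state) / ddG(native state), both relative to D."""
    ddg_native = dg_wild_type - dg_mutant
    if abs(ddg_native) < min_ddg:
        # Small stability changes make the ratio unreliable (threshold is arbitrary).
        raise ValueError("change in stability too small for a meaningful ratio")
    ddg_ts = ddg_transition_state(kf_wild_type, kf_mutant, temperature_k)
    return ddg_ts / ddg_native


# Hypothetical mutant: destabilised by 5 kJ/mol and folding 2.5 times slower.
phi = phi_value(kf_wild_type=50.0, kf_mutant=20.0,
                dg_wild_type=30.0, dg_mutant=25.0)
print(f"phi = {phi:.2f}")
```

A mutant whose refolding rate is unchanged gives a value of 0 in this sketch, while one whose loss of stability is fully reflected in a slower refolding rate gives 1.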
Molecular Dynamics Simulations
Molecular dynamics, which solves Newton’s equations for every atom in a protein and surrounding solvent, can be used to simulate protein folding at atomic resolution (Daggett et al., 1996; Li & Daggett, 1996; Daggett et al., 1998; Ladurner et al., 1998). There are major difficulties in applying the method to protein folding. First, atomic position and trajectories must be calculated for each of the many thousands of atoms in the protein structure, over a relatively long timescale (nanoseconds in femtosecond steps). Second, the limited time scale accessible to simulation means that millisecond folding reactions must be accelerated to a nanosecond timescale. The relevance of performing a folding reaction at the extremely high temperatures necessary for speeding up the reaction is questionable. Third, there are the same uncertainties in the potential functions that are used to calculate the energetics, as there are in calculating the stabilities of the native state.
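To make the idea of stepping Newton's equations forward in time concrete, here is a toy sketch using the velocity Verlet scheme, an integrator widely used in molecular dynamics. It treats a single particle on a one-dimensional harmonic spring in arbitrary units; a real simulation would use a three-dimensional force field, explicit solvent and femtosecond time steps, none of which is modelled here.

```python
def harmonic_force(position, k_spring=1.0):
    """Force on a particle attached to a spring (toy stand-in for a force field)."""
    return -k_spring * position


def velocity_verlet(position, velocity, mass, dt, n_steps, force=harmonic_force):
    """Integrate Newton's equation of motion; return a list of (x, v) pairs."""
    trajectory = [(position, velocity)]
    f = force(position)
    for _ in range(n_steps):
        velocity_half = velocity + 0.5 * dt * f / mass   # half-step velocity update
        position = position + dt * velocity_half         # full-step position update
        f = force(position)                              # force at the new position
        velocity = velocity_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((position, velocity))
    return trajectory


def total_energy(position, velocity, mass, k_spring=1.0):
    """Kinetic plus potential energy of the toy oscillator."""
    return 0.5 * mass * velocity ** 2 + 0.5 * k_spring * position ** 2


traj = velocity_verlet(position=1.0, velocity=0.0, mass=1.0, dt=0.01, n_steps=1000)
e_start = total_energy(*traj[0], mass=1.0)
e_end = total_energy(*traj[-1], mass=1.0)
print(f"energy drift over {len(traj) - 1} steps: {abs(e_end - e_start):.2e}")
```

The cost noted in the text follows directly from this loop: every atom needs a force evaluation at every step, and reaching even nanoseconds requires on the order of a million femtosecond steps.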
In contrast, molecular dynamics is very well suited to analysing protein unfolding, chiefly because the calculations can start from well-defined crystal and solution structures. Unfolding can then be sped up by simulating denaturing conditions, such as high temperature or the presence of simulated denaturants. The folding pathway can be simulated simply by reversing the unfolding results. Whilst molecular dynamics simulations have the potential to describe the whole pathway of folding, they do require experimental validation. A Φ-value analysis can provide this validation and, in turn, the simulations provide structural information for the Φ-value data.
Techniques Used to Follow Folding and Stability
Many methods can be used to perturb the system, including: addition of denaturant (e.g. urea or GdmCl); changing the pH; heating or cooling; addition of a ligand; applying pressure; and introducing co-solvents. These methods are summarised in Table 1. By coupling a probe, for example intrinsic fluorescence, with a method of perturbation, the protein’s folding pathway can be studied under both equilibrium and non-equilibrium conditions. For example, fluorescence experiments allow the effect of the chemical denaturant, pH or temperature to be monitored by measuring a change in fluorescence as a function of the denaturant concentration, pH or temperature. The fluorescence experiment usually probes a tryptophan or tyrosine residue within the protein, but the non-covalent binding of other fluorescent molecules, for example 8-anilinonaphthalene-1-sulphonic acid (ANS), to the protein, which depends on its degree of unfolding, can also be measured. Folding experiments can also be monitored using circular dichroism (CD) spectroscopy. This technique is useful because, by measuring the ellipticity of the chiral carbon in the amide backbone, it enables the detection of the secondary structure of the protein, and by measuring the ellipticity of the environment in which the tyrosine and tryptophan residues lie, it detects the tertiary structure of the protein. Near-UV (250 - 320 nm) CD detects the tertiary structure and far-UV (200 - 250 nm) CD detects the secondary structure. Examples of techniques used to probe protein stability and folding, and what protein characteristic they follow (directly or indirectly), are summarised in Table 2 (Plaxco & Dobson, 1996).
Table 1: Examples of Perturbations that Affect Protein Stability and Folding.
| Perturbation | Information provided by perturbation | Equilibrium and kinetic parameters that can be measured |
|---|---|---|
| Increasing or decreasing denaturant concentration. | Equilibrium: Information on the midpoint, cooperativity and extent of unfolding. Kinetics: Information on the rate of folding and unfolding. Information on the compactness of the transition state. | Equilibrium: the midpoint of folding, $[\text{denaturant}]_{50\%}$, and the term $m_{\mathrm{D-N}}$ᵃ. Kinetics: $k_f$ᵃ, $k_u$ᵃ, $m_{k_f}$ᵃ and $m_{k_u}$ᵃ. The Tanford value $\beta_T$ᵃ is calculated for the transition state (Tanford, 1968; 1970). |
| Changing pH | Equilibrium: Low-pH and molten globule states which may look like a kinetic intermediate (Jennings & Wright, 1993; Jamin & Baldwin, 1996). Kinetics: pH and ionic strength dependence of the stability of intermediate state I. Characterization of $D_{\mathrm{phys}}$ (see below). | Unfolding kinetics: $\Delta Q^{\mathrm{pH}}_{\ddagger-\mathrm{N}}$, the change in the number of protons taken up in the transition from native to transition state (Oliveberg & Fersht, 1996). Refolding kinetics: $\Delta Q^{\mathrm{pH}}_{\ddagger-\mathrm{D}}$, the change in the number of protons taken up in the transition from denatured to transition state (Tan et al., 1996). |
| Ligand concentration. Varying the ligand concentration alongside the denaturant concentration. | Kinetics: Information on the structure of the ligand binding site in the transition state (Sancho et al., 1991). | |
| Increasing or decreasing temperature | Kinetics: Information on the thermodynamic nature of the transition state. | Kinetics: Values for the change in: enthalpy, ∆H‡; entropy, ∆S‡; and heat capacity, ∆Cp‡, between the initial state and the transition state can be calculated. |
| Co-solvent concentration. The sugar or alcohol concentration can be varied alongside or separately to denaturant and pH variations. | Kinetics: Information on the extent of hydration and secondary structure (particularly α-helices) in the transition state (Chiti et al., 1998a; Chiti et al., 1999a). | |

ᵃ The parameters $m_{\mathrm{D-N}}$, $k_f$, $k_u$, $m_{k_f}$, $m_{k_u}$ and $\beta_T$ are discussed in detail in Section 2.
Table 2ᵃ: Summary of Experimental Techniques/Probes used to Study Protein Stability and Folding.
| Probe | Property of protein technique probes. | Measurements that can be made from applying probe. |
|---|---|---|
| Ultraviolet (UV) Absorbance | Molecular dimensions. | The environment and orientation of predominantly tyrosine side chains. |
| Biological Activity | Tertiary contacts and native structure. | The formation of native tertiary structure and the active site. |
| Far-UV Circular Dichroism (CD) | Secondary structure and persistent hydrogen bonds. | Backbone conformation. |
| Near-UV Circular Dichroism (CD) | Tertiary contacts and native structure. | Formation of stable aromatic and disulphide bond tertiary contacts. |
| Cysteinyl Quenching | Core packing/tertiary contacts. | Protection of cysteine side chains from hydrophilic reactants. |
| Fluorescence: Anisotropy | Molecular dimensions | Tryptophan side chain mobility and overall molecular dimensions. |
| Fluorescence: Energy Transfer | Molecular dimensions | Scalar distance between tryptophan and a covalently attached fluorophore. |
| Fluorescence: Extrinsic, e.g. ANS | Core packing/tertiary contacts. | Formation and disruption of organised hydrophobic patches and clefts. |
| Fluorescence: Intrinsic | Core packing/tertiary contacts. | The orientation and environment of predominantly tryptophan, and some tyrosine side chains. |
| Fluorescence: Quenching | Core packing/tertiary contacts. | Isolation of tryptophan side chains from hydrophilic fluorescence quenchers. |
| Nuclear Magnetic Resonance (NMR) | Depending on the technique used: either tertiary contacts; or secondary structure/persistent hydrogen bonds. | Depending on the technique used: specific side chain tertiary contacts; or the formation of stable backbone hydrogen bonds, sequence specific formation of stable amide and tryptophan hydrogen bonds. |
| Protein Engineering | Site-directed mutagenesis combined with equilibrium and kinetic experiments. | Stability of point mutations. Detailed residue-specific information on the interactions formed in the transition state of folding. |
ᵃ This table is a précis of a table from Plaxco & Dobson, 1996 (see references within).
The Nucleation-Condensation Model and the Unified Folding Scheme
The “New View” of Protein Folding
It has been shown that proteins are biased from random coils towards compact states which could form structure (Shortle, 1993). Studies on the denatured state show there to be a correlation between intrinsic conformational preferences of residues and their secondary structure propensities (Smith et al., 1996). The flaw in Levinthal’s analysis is the assumption that a protein’s search for its folded state will be unbiased. Levinthal had presupposed that the groups present on the protein rotate around their bonds at random, with no stabilisation of any particular conformation until all were in the correct orientation, at which point the native structure snaps into place. If there is a conformational bias in the sequence towards the correct structure then the paradox disappears (Zwanzig et al., 1992; Finkelstein & Badretdinov, 1997; Karplus, 1997).
A dramatic change in the field of protein folding has taken place in the last decade. The discovery of two-state folding proteins and the ongoing development of molecular biology have meant that experimentalists can now focus on describing the structure of each macroscopic state of the folding process at an atomic level. This has coincided with the development of bioinformatics, including a tremendous increase in protein databases, sequence information and theoretical methods for the analysis of folding. Computer simulations range from lattice-based simulations (polymer chains) to precise atomic analysis by molecular dynamics, discussed above. This new wealth of information has moved experimentalists and theoreticians towards a broader “New View” of protein folding (Dill & Chan, 1997; Dobson & Karplus, 1999), although not without controversy (Baldwin & Rose, 1999a,b). Protein energy landscapes are no longer regarded as flat, but rather as slopes that funnel the protein into its native structure (Dill & Chan, 1997; Leopold et al., 1992).
Figure 2: Protein Energy Landscapes (Dill & Chan, 1997). a, The Levinthal 'golf-course' landscape. N is the native conformation. The chain searches for N randomly, that is, on a level playing field of energies. b, The 'pathway' solution to the random search problem of Figure 2a. A pathway is assumed to lead from a denatured conformation A to the native conformation N, so conformational searching is more directed and folding is faster than for random searching. c, d, Examples of the “New View” of a Folding Landscape. c, An idealized funnel landscape. As the chain forms increasing numbers of intrachain contacts, and lowers its internal free energy, its conformational freedom is also reduced. d, A rugged energy landscape with kinetic traps, energy barriers, and some narrow throughway paths to native. Folding can be multi-state.
The Folding of Chymotrypsin Inhibitor 2 (CI2)
Using protein engineering, one of the most characterised folding pathways has become that of chymotrypsin inhibitor 2 (CI2). CI2 is a 64-residue polypeptide inhibitor of serine proteases. It has a binding loop, a single α-helix, and a mixed parallel and antiparallel β-sheet. There are four peptidyl-proline bonds and all are in the favoured trans conformation. The α and β elements are interspersed, giving it an α/β structure. The interatomic interactions that CI2 makes are uniform and do not segregate into regions which make more tertiary interactions with themselves than they do with neighbouring regions, i.e. it has a globular single domain. Because CI2 is a single module of structure it can be regarded as a single folding unit or “foldon” (Panchenko et al., 1996). It could even be a model for a foldon within a larger protein.
CI2 folds according to first-order (two-state) kinetics, with a half-life of 13 ms in H2O at 25 °C:
Image:D-N.png
Note that there is a small fraction of protein which folds slowly because of cis–trans peptidyl-proline isomerization, which can be ignored for the purposes of this discussion. Over 100 mutants have been made and some fold 10 times faster than wild type (Itzhaki et al., 1995a). ΦF-values are generally low: 0.2 - 0.5 in the hydrophobic core; slightly higher in the C-terminus of the α-helix; and highest at the N-cap, around 0.6 - 0.8. The residues with the highest ΦF-values interact with two residues in the β-sheet to form a core. This core is the most highly structured region in the transition state.
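As a small worked example of the first-order kinetics quoted above, the sketch below converts a half-life into a rate constant using the standard relation k = ln 2 / t1/2. The function names are illustrative only and are not taken from the source.

```python
import math

def rate_constant_from_half_life(t_half_s):
    """First-order rate constant (s^-1) from a half-life given in seconds."""
    return math.log(2) / t_half_s

def fraction_remaining(k, t_s):
    """Fraction of starting material left after t_s seconds of a first-order process."""
    return math.exp(-k * t_s)

# CI2: half-life of 13 ms in water at 25 degrees C (value quoted in the text)
k_ci2 = rate_constant_from_half_life(13e-3)
print(round(k_ci2, 1))                              # 53.3 (s^-1)
print(round(fraction_remaining(k_ci2, 13e-3), 3))   # 0.5 after one half-life
```

The same conversion applies to any of the half-lives quoted in this section, provided the phase in question is genuinely first order.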
Studying the folding pathway of CI2 and characterising its transition state identified some important new properties:
(i) Small proteins can fold rapidly in a first-order reaction from a relatively expanded denatured state.
(ii) The transition state is an expanded structure in which secondary and tertiary structures are formed in parallel.
(iii) There are no completely fully formed elements of secondary structure in the transition state of folding – all elements are in the process of being formed.
These three observations rule out the framework, diffusion-collision, hydrophobic collapse, classical nucleation, parallel pathways and jigsaw models of protein folding. In the classical nucleation and framework models, the preformed secondary structure would have ΦF-values of 1.0, and these are not seen. The hydrophobic-collapse and framework models also predict that either tertiary (hydrophobic-collapse) or secondary (framework) interactions form first, whereas in CI2 they form in parallel. The Brønsted plot is linear, not curved, therefore fractional ΦF-values cannot be due to parallel pathways (see Section 3.6 for background theory).
The Nucleation-Condensation Model for Protein Folding
A new Nucleation-Condensation Model was proposed to describe the folding of CI2 (Fersht, 1995; Tan et al., 1996; Fersht, 1997). The mechanism involves a nucleus that consists primarily of adjacent residues, however, it cannot form stable structure without assistance from interactions made with residues that are distant in sequence. Formation of the small nucleus cannot be solely rate determining, because a significant fraction of the overall structure must be in approximately the correct conformation, providing the long-range interactions, which stabilize the nucleus. The nucleus consolidates as the structure forms - consolidation of the nucleus and the extended structure occurs concurrently. The nucleation-condensation model requires no folding intermediate, stable or otherwise. Instead, the coupling of nucleation and condensation leads to a relatively compact transition state. Note that the nucleation site does not need to be preformed in the denatured state and whilst it is extensively formed in the transition state, it may not be completely formed even then. It may be in the process of being formed and the onset of cooperative stabilizing interactions rapidly completes its formation.
The physicochemical basis of the nucleation-condensation mechanism is that there must be a critical number of interactions made in the transition state for folding (Creighton, 1995b; Fersht, 1998). As a protein folds there is an unfavourable loss of chain entropy, which is compensated by the favourable decrease in enthalpy of the interactions that are being formed. There is also an increase in entropy due to the release of H2O from hydrophobic groups. A critical number of interactions is reached and the decrease in enthalpy becomes more rapid than the loss in entropy as further interactions are formed. Enthalpy lowers more rapidly because the stabilising interactions cluster and so are formed cooperatively.
The Folding of Barnase
Barnase is a 110 residue RNase, secreted from Bacillus amyloliquefaciens. It has an α + β structure, with a major α-helix (helix1) and two smaller helices in the first half of its sequence and a five stranded antiparallel β-sheet in the second (Mauguen et al., 1982). The packing of helix1 against one face of the β-sheet forms a well-packed hydrophobic core. All peptidyl-proline bonds are in the trans conformation. There are obvious regions in its structure which make more interactions between themselves than they do with other regions (Yanagawa et al., 1993), thus barnase is a multi-domain protein. Because it has two hydrophobic cores, barnase is also described as a representative of the small multi-foldon proteins (Fersht, 1998).
Barnase has only one observable step for refolding, which has first-order kinetics and a half-life of around 30 ms in H2O at 25 °C (Matouschek et al., 1992). It corresponds to the formation of the native structure from a folding intermediate that is a meta-stable species under conditions which favour folding. Again there is a small fraction that folds slowly, because of cis–trans peptidyl-proline isomerisation, which can be ignored. Note that the kinetics for the formation of the intermediate from the fully unfolded state are unknown.
Over 130 barnase mutant proteins have been made and analysed (Fersht, 1998). The transition state for folding was found to have some regions with ΦF values of 0, other regions with ΦF-values close to 1 and a few regions with intermediate ΦF-values. The folding intermediate had a similar structure, with the biggest difference occurring in the hydrophobic core, which was only very weakly formed. Note that the ΦF values that are 1 in the transition state are already 1 or are only slightly lower, in the intermediate. This suggests that the rate-determining step involves the formation of the hydrophobic core, by the docking of the preformed helix1 onto the preformed β-sheet.
The Folding of Barstar at Microsecond Resolution
Barstar is an 89-residue polypeptide inhibitor of barnase. It has an α/β structure, with three large and two small helices and three strands of a parallel β-sheet. The distinction between single- and multi-foldon proteins can be subjective. CI2 is clearly a single-foldon, barnase clearly has more than one folding unit (a multi-foldon), but barstar falls somewhere between the two. One peptide bond is in the cis conformation in the native state. Thus, trans-cis peptidyl-proline isomerization dominates folding kinetics (trans is the major form in the denatured state).
There is a fast formation of a trans folding intermediate (t1/2 ~ 200 μs), followed by formation of a second, (also trans) highly native-like intermediate (t1/2 ~ 60 ms), which undergoes trans-cis peptidyl-proline isomerization (t1/2 ~ minutes) to give the final native structure (Nolting et al., 1995; Nolting et al., 1997). NMR has been used to detect weak residual native structure in helix1 and helix2 – long-range interactions stabilize secondary structure in the denatured state. ΦF-value analysis shows that the first helix (and second to a smaller extent) becomes substantially consolidated in the first few hundred microseconds of intermediate formation. This conversion of D to Itrans fits the nucleation-condensation mechanism. It also provides insight into the kinetics of formation of the folding intermediate for barnase.
The Unified Folding Scheme
Proteins vary so much in size, structure and properties that it is unlikely that there is a single mechanism for protein folding. Evolution may also interfere with folding mechanisms, sacrificing stability or optimisation of a folding rate towards a specific function (Lee & Vasmatzis, 1997). However, the folding mechanisms of CI2, barnase, barstar and many others all point towards a unified scheme and variations of such a scheme could describe a large number of folding pathways. CI2 clearly folds by a nucleation-condensation mechanism. Barnase folds stepwise: the rapid formation of individual foldons (by nucleation-condensation); followed by their rate-determining docking and consolidation. Barstar folds with rapid nucleation-condensation in the first stage, however the formation of secondary and tertiary structure is less coupled than in CI2 and so there is an element of the barnase pathway.
This Unifying Model, also called the Extended-Nucleus Model, reconciles many of the conflicting results (Otzen et al., 1994). The model suggests that the folding pathways of barnase and barstar are models which can be applied to larger proteins in general, whilst the folding of CI2 is a model for the folding of individual foldons in larger proteins (see Figure 1.3). Folding is concerted or stepwise depending on the stability of individual substructures within the protein and whether they are considered in isolation or loosely complexed. The more stable a foldon, the more likely it is to form independently (via nucleation-condensation) of the rest of the protein and so the folding process becomes stepwise and hierarchical. An example is the 129-residue, two-domain protein CheY: in the transition state, one foldon folds by the nucleation-condensation mechanism, whilst the other remains unstructured (Lopez-Hernandez & Serrano, 1996).
Table 2<sup>a</sup>: Proteins Analysed by ΦF-values.
| Protein | Chain length (residues) | Number of intermediates | Number of mutants | Φ pattern model | Comments |
|---|---|---|---|---|---|
| α-Helical proteins | | | | | |
| Monomeric λ repressor | 80 | 0 | 8 | Barnase | Fits diffusion-collision model. |
| Acylbinding protein (bovine) | 86 | 0 | 26 | CI2 | Nucleus in hydrophobic core. |
| SH3 domains (β-barrels) | | | | | |
| α-spectrin | 62 | 0 | 10 | CI2 | |
| Src | 64 | 0 | 21 | Barnase<sup>d</sup> | |
| β-sandwich domains | | | | | |
| TNfn3 (long form) | 92 | 0 | 33 | CI2 | |
| 10FNIII | 94 | 1 | 41 | CI2 | |
| CD2d1 | 98 | 1 | 7 | CI2 | |
| α/β proteins | | | | | |
| IgG binding domain<sup>b</sup> | 62 | 0 | 4 | Barnase | Turns analysed. |
| CI2 | 64 | 0 | 150 | CI2 | |
| ADAh2<sup>c</sup> | 81 | 0 | 15 | CI2 | |
| Barstar | 89 | 2 | 25 | CI2+Barnase | |
| FKBP12 | 107 | 0 | 43 | CI2 | |
| Barnase | 110 | 1 | 130 | Barnase | |
| CheY | 129 | 0 | 34 | CI2 | One domain = CI2. The other domain is unstructured. |
| PGK | 394 | 0 | 8 | CI2 | Double mutant cycles detected tertiary interactions. |
| α+β proteins | | | | | |
| p13suc1<sup>e</sup> | 113 | 1 | 57 | CI2 | |
| Bimolecular association | | | | | |
| CI2 fragments 1-40 + 41-64 | 64 | 0 | 23 | CI2 | Concurrent folding and association. |
| Barnase fragments 1-22 + 23-110 | 110 | 0 | 5 (1-22) | Barnase | Docking of fully formed helix of 1-22 and consolidation of 23-110. |
| Arc repressor (dimer) | 53 | 0 | 44 | CI2 | Concurrent folding and dimerization. |
<sup>a</sup> This table is a précis of a table from Fersht, 1998 (see references within).
<sup>b</sup> IgG binding domain of streptococcal protein L.
<sup>c</sup> Activation domain of procarboxypeptidase A2.
<sup>d</sup> It has been suggested that src SH3 domain folding is more like CI2 than barnase (Gruebele & Wolynes, 1998).
<sup>e</sup> Schymkowitz et al. (2001).
Two-State Folding Proteins
More than twenty small proteins have now been shown to fold with simple two-state kinetics (Jackson, 1998). Table 3 details those and others which fold via an intermediate, which have undergone a ΦF-value analysis and whether their folding pathway is more like CI2 or more like barnase (Fersht, 1998). The majority have a distribution of ΦF-values similar to CI2, suggesting that nucleation-condensation is the most common mechanism for small proteins. It is likely that many other small proteins, or domains of larger proteins, will also fold in this way. Such proteins fold without detectable intermediates, so providing the simplest models of protein folding – only the denatured state (D) and the native state (N) are populated on the folding pathway. Despite these proteins all folding with two-state kinetics, there is still a tremendous variety in the rates with which two-state proteins fold, the structures of their native state, their stabilities, sequences and the position of the transition state on the reaction co-ordinate.
Many theoretical and experimental studies have attempted to identify the most important factors in determining how a protein folds. Chain length (size), topology, sequence and stability could all be crucial determinants in the rate of folding (Dobson et al., 1998, Plaxco et al., 2000). Some studies suggest topology to be the most important determinant of how a protein will fold (Plaxco et al., 1998; Chiti et al., 1999b; Martinez & Serrano, 1999; Riddle et al., 1999). However, studies on the topologically similar members of the immunoglobulin family have shown that they fold with rate constants which correlate with stability (Clarke et al., 1999). Studies on horse and yeast cytochrome c, also suggest that stability is an important factor (with the exception of two proteins which have very similar topology) (Mines et al., 1996), however, results from studies on the folding of homologues of cold shock protein B disagree (Perl et al., 1998). Another observation has been that conserved residues appear important in forming the folding nucleus (Gunasekaran et al., 2001; Shakhnovich et al., 1996). Conflicting theoretical and experimental studies on the importance of: chain length, topology, stability, sequence and conservation of residues show that there is still only a basic understanding of the determinants of protein folding. Despite this, as the list of characterised folding pathways grows, there is slowly an appearance of trends.
The Nature of the Transition State
In a simple chemical reaction one or two high-energy covalent bonds are formed and broken. The transition state for folding involves the simultaneous making and breaking of many weak non-covalent interactions. Like the native state, it will be an ensemble of structures of similar energy. Predicting and characterising the exact nature of this ensemble has become the work of many theoreticians and experimentalists. Different theoretical studies have predicted wide and narrow ranges of structures present in the transition state (Abkevich et al., 1994; Dobson et al., 1998). However, for most two-state folding proteins, experimental evidence suggests that the transition state is an ensemble of closely related structures. Protein engineering and Brønsted analysis suggest this most strongly (Fersht et al., 1994a,b). A Tanford (βT)-value analysis, which allows the effect of mutations on the position of the transition state to be assessed, also indicates this (Jackson, 1998). If there is a wide distribution of structures within the transition state ensemble, then a mutation could destabilise one set of similar structures and stabilise another set. Thus, the wider the distribution, the greater the sensitivity of the position of the transition state to mutation. However, results show that there is only a small movement of βT with mutation (Jackson, 1998). This has been attributed to Hammond behaviour – the structure of the transition state becomes more native-like as the energy difference between the transition and native state is decreased (Matouschek et al., 1995; Dalby et al., 1998c). There is one experimental exception, monomeric λ repressor, whose transition state for folding is greatly affected by mutation, with large changes in βT observed (βT varies between 0.39 and 0.83) (Burton et al., 1997).
Two- or Three- State Kinetics - Looking for Trends
What determines whether a protein folds with two- or three-state kinetics? In a recent review, two- and three-state proteins were compared and it was shown that the main differences appeared to be chain length and stability (Jackson, 1998). Small proteins, with chain lengths of fewer than 110 residues, tended to fold with two-state kinetics. Also, proteins which fold with three-state kinetics are generally more stable. This is consistent with studies on an immunoglobulin family of β-sheet proteins (Clarke et al., 1999). Other studies have shown that destabilising the native state of acylphosphatase causes it to switch from three- to two-state kinetics (Chiti et al., 1998a; Chiti et al., 1999a). This destabilisation of the native state may result in a destabilisation of the intermediate relative to the denatured state, thus changing the energy pathway. Stabilisation with co-solvents causes ubiquitin to switch from two- to three-state kinetics (Khorasanizadeh et al., 1996). Yet the addition of sodium sulphate does not switch the kinetics from two- to three-state for FKBP12 or the IgG binding domain of peptostreptococcal protein L (Scalley et al., 1997; Main et al., 1999a).
Proteins that fold with two-state kinetics are generally small. It is difficult to find other general trends, because two-state folding proteins have different structures, stabilities and folding rates. The stability of the native state is as low as 2 kcal mol-1 or as high as 8 kcal mol-1. Unfolding and refolding rates vary by more than a factor of 10⁵, which corresponds to a difference of 7 kcal mol-1 in energy terms at 25 °C.
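The link between a ratio of rate constants and an energy difference can be checked numerically. The sketch below assumes the usual relation ΔΔG = RT ln(k1/k2) for two rate constants at the same temperature; the function name is illustrative.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1

def energy_gap_from_rate_ratio(ratio, temp_k=298.15):
    """Free energy difference (kcal mol^-1) corresponding to a ratio of rate constants."""
    return R_KCAL * temp_k * math.log(ratio)

# A 10^5-fold spread in rate constants at 25 degrees C
gap = energy_gap_from_rate_ratio(1e5)
print(round(gap, 1))  # 6.8, i.e. roughly the 7 kcal mol^-1 quoted above
```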
Topology versus Stability – Contact Order
Correlations have been sought between proteins which exhibit two-state folding and their size, stability and topology (Jackson, 1998; Fersht, 2000). A parameter called contact order (CO) was developed to correlate rate constants of folding (kf) with topology (Plaxco et al., 1998, 2000). If a protein has a low contact order, then on average, residues interact with other residues that are close in sequence, for example, α-helical proteins. If a protein has a high contact order, then on average, residues interact with other residues that are far apart in sequence, for example, β-sheet proteins. This correlation points to topology as being an important factor in the rate of folding. The contact order theory is that, on average, bringing into contact residues that are close in sequence will not require as extensive a search through conformational space as does bringing together residues that are distant in sequence. It has been observed that the rate constant of folding, kf, decreases with increasing contact order, thus supporting the contact order theory (Chiti et al., 1999b; Martinez & Serrano, 1999; Riddle et al., 1999). However, in contradiction, it has been found that three members of a family of the same topology fold with rate constants that correlate with stability and not contact order (Clarke et al., 1999). Protein engineering studies also show that mutations which do not affect the contact order can change the folding rate by many orders of magnitude (Jackson, 1998). Thus, other factors must also be significant.
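To make the idea of contact order concrete, the sketch below computes relative contact order as the mean sequence separation of contacting residue pairs divided by the chain length, which is the definition usually attributed to Plaxco et al. (1998). The two contact lists are invented toy examples, not real proteins, and how contacts are identified from a structure (for example by a distance cut-off) is not modelled here.

```python
def relative_contact_order(contacts, chain_length):
    """Mean sequence separation of contacts, normalised by chain length.

    contacts: list of (i, j) residue-number pairs that are in contact.
    """
    if not contacts:
        raise ValueError("at least one contact is required")
    total_separation = sum(abs(i - j) for i, j in contacts)
    return total_separation / (chain_length * len(contacts))

# Hypothetical 20-residue chains (toy data for illustration only)
helix_like = [(i, i + 4) for i in range(1, 17)]   # local i -> i+4 contacts
hairpin_like = [(i, 21 - i) for i in range(1, 9)] # pairs (1,20), (2,19), ... (8,13)

print(relative_contact_order(helix_like, 20))    # 0.2
print(relative_contact_order(hairpin_like, 20))  # 0.6
```

The helix-like pattern gives the lower value because every contact is local in sequence, which is the qualitative distinction the paragraph above draws between α-helical and β-sheet proteins.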
Kinetic Traps/Off-pathway Intermediates/Molten Globule States
A number of proteins that have previously been shown to fold with multi-state kinetics, are now believed to populate intermediate states which act as kinetic traps, and in some cases may be off-pathway (Silow & Oliveberg, 1997). For example, studies on lysozyme revealed the existence of multiple folding pathways (Radford et al., 1992; Lu et al., 1997; Matagne et al., 1997). Under strongly native conditions the majority of molecules (>80%) fold on a slow pathway with a well-populated intermediate state. Extensive characterisation of this intermediate and the transition from it to the native-state shows it to be a kinetic trap to folding – the polypeptide chain must rearrange before proceeding to the native state. Kiefhaber has studied the much smaller fraction of molecules (<20%), which fold rapidly along a “fast track”, using “interrupted-folding experiments” (Kiefhaber, 1995). Results suggest that the fast pathway could correspond to a two-state transition, with no intermediates populated. Yet another experiment, suggests the existence of the highly native-like intermediate also on the lysozyme folding landscape, but not necessarily on the same route (Lu et al., 1997). This lysozyme example is consistent with an energy landscape of folding.
Thus, it is now believed that folding is not intrinsically slow and in many cases, proteins which have been observed to fold slowly may do so as a result of a misfolded species, on or off the folding pathway, but part of the energy landscape. In such a case there may be a fast track to folding which is only populated under certain experimental conditions. Already, in several cases, conditions have been found such that proteins which normally fold through a populated intermediate state can fold with two-state kinetics (Khorasanizadeh et al., 1996; Dalby et al., 1998a; Dalby et al., 1998b).
The highly unfolded state of a protein in concentrated denaturant can be called U, because it most resembles the random coil (Fersht, 1998). The starting state for protein folding studies in vitro is one that is present under physiological conditions – refolding is initiated by restoring a protein from denaturing to folding conditions. U can rearrange in the dead-time of a stopped flow mixing experiment (< 1 ms). This state is usually called a folding intermediate, but because it is more stable than U, it is the most stable of the non-native states under physiological conditions. Thus the intermediate is called Dphys, the denatured state under physiological conditions (Fersht, 1998).
Dphys can sometimes be related to states thought to be molten-globules. Molten-globules are slightly compact, partly folded states of proteins that can sometimes be isolated under mildly denaturing conditions, or when a cofactor or metal ion that is essential for stability has been removed (Ptitsyn, 1995a; Ptitsyn, 1996). Molten-globule states are characterised by having few tertiary interactions, some secondary structure and a fluctuating hydrophobic core. They are separated from the native state by a high activation energy. Their hydrophobic nature is detected by the binding of 8-anilinonaphthalene-1-sulphonic acid (ANS). ANS has an affinity for mobile hydrophobic regions (Stryer, 1965). Work on the helical protein, apomyoglobin, has characterised an equilibrium molten-globule state, which is maximally populated at pH 4 and is very similar to the structure of the kinetic intermediate (Jennings & Wright, 1993; Jamin & Baldwin, 1996).
Disulphide Bonds
One of the reasons why CI2, barnase and barstar were chosen to study the early stages of protein folding is because they have no disulfide crosslinks. Disulfide crosslinks are incorporated at the late stage of protein biosynthesis (Freedman, 1994, 1995). They effectively “staple” the structure together. When a protein with disulfide bonds is denatured, it may retain its crosslinks and so have the constraints of the native-state, thus easily “bouncing” back into shape. Note, however that disulfide-linked proteins provide insight into other stages of folding processes. For example, intermediates on the folding pathway of bovine pancreatic trypsin inhibitor that are covalently linked by -S-S- bond formation, have been trapped and analysed (Creighton, 1992a; Creighton, 1992b; Creighton, 1997).
Section 2 The Theory of Protein Folding
Thermodynamics of Protein Folding
Solvent Denaturation of Proteins
This analysis is for a two-state model of denaturation where only the native and the denatured states are populated. The equilibrium constant for unfolding, KD-N, and the free energy of unfolding, ΔGD-N, in the presence of a denaturant may be calculated from:
Image:KDN formula.png equation 1
where F is the observed fluorescence and FN and FD are the values of the fluorescence of the native and denatured forms of the protein respectively. R is the gas constant, 8.314 J mol-1 K-1. T is the temperature and was 298 K for all experiments unless specified otherwise. If the values for FN and FD are independent of the concentration of denaturant, [denaturant], then equation 1 can be applied directly.
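The equation itself is only present as an image, so the sketch below assumes the standard two-state form, K(D-N) = (FN − F)/(F − FD) and ΔG(D-N) = −RT ln K(D-N), which matches the symbols defined in the text. The fluorescence values are hypothetical.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1 (value given in the text)

def unfolding_equilibrium(f_obs, f_native, f_denatured, temp_k=298.0):
    """Return (K_D-N, dG_D-N in J mol^-1) for a two-state transition.

    Assumed form: K = (F_N - F) / (F - F_D); dG = -R T ln K.
    """
    k_dn = (f_native - f_obs) / (f_obs - f_denatured)
    dg_dn = -R * temp_k * math.log(k_dn)
    return k_dn, dg_dn

# Hypothetical signals: native 100, denatured 20, observed 80 (arbitrary units)
k_dn, dg_dn = unfolding_equilibrium(80.0, 100.0, 20.0)
print(round(k_dn, 3), round(dg_dn))  # K below 1 means the native state is favoured

# At the midpoint the signal is halfway between the two baselines and dG is zero
k_mid, dg_mid = unfolding_equilibrium(60.0, 100.0, 20.0)
```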
Tanford's Equation
It has been found experimentally that the free energy of unfolding of proteins in the presence of a denaturant is linearly related to the concentration of that denaturant (Tanford, 1968; 1970; Pace, 1986):
Image:DGDN denat formula.png equation 2
where Image:DGDN denat.png is the free energy of unfolding at a particular denaturant concentration; Image:DGDN H2O.png is the free energy of unfolding in water and Image:MDN image.png is a constant that is proportional to the increase in degree of exposure of the protein upon denaturation. Image:MDN image.png has units of cal mol-1 M-1 or J mol-1 M-1.
Image:MDN formula.png equation 3
Equation 2 can only be tested over a very narrow range of denaturant concentrations. This is because the fraction of denatured state switches from being immeasurably small to being indistinguishable from 100% in a very small concentration range (e.g. 3 to 5 M urea for a typical small protein). In this small concentration range good linearity is observed. However, it is still difficult to estimate Image:DGDN H2O.png from Equation 2 because of the long extrapolation back to 0 M denaturant.
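The linear extrapolation can be sketched as a straight-line fit to ΔG(D-N) measured inside the transition region, assuming equation 2 has the usual form ΔG(D-N) = ΔG(D-N, H2O) − mD-N[denaturant]. The data points below are hypothetical and noise-free, chosen only to show the mechanics; real data would carry a sizeable uncertainty on the extrapolated intercept.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical free energies of unfolding (J mol^-1) between 3 and 5 M urea
urea_molar = [3.0, 3.5, 4.0, 4.5, 5.0]
dg_unfold = [5000.0, 2500.0, 0.0, -2500.0, -5000.0]

slope, intercept = fit_line(urea_molar, dg_unfold)
m_dn = -slope                 # m-value, J mol^-1 M^-1
dg_water = intercept          # extrapolated free energy of unfolding at 0 M
midpoint = dg_water / m_dn    # concentration at which dG is zero
print(m_dn, dg_water, midpoint)
```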
The mD-N Value
Interesting information can be obtained from the term mD-N. Suppose in a protein, each group, i, has a free energy of transfer of Image:Dgtri.png cal mol-1 from water to a 1 M solution of denaturant when it is fully exposed to the solution. However, if the group does not become fully exposed in the denatured state, then it will increase its exposure by a fraction αi, i.e. αi is the fractional degree of exposure of residues upon unfolding. Thus, the effective free energy change for the group on denaturation is Image:Aidgtri.png. The mD-N value can be described as a sum of all the groups' free energy changes:
Image:MDN formula extended equation 4
Tanford developed a model which related the mD-N-value to α and the amino acid composition of the protein (Tanford, 1968; 1970):
 Image:DGDN formula extended equation 5
Image:Eadgtri can be calculated, as a function of denaturant concentration, using the solvent accessible surface area (s.a.s.a.) of residues (to obtain α ) and values of Image:Dgtri.png. The s.a.s.a. for residues is calculated from the crystal structure relative to those calculated for model peptides (Miller et al., 1987). Values for Image:Dgtri.png are obtained from model compound studies (Pace, 1986). A low experimental value for mD-N compared with the calculated Image:Eadgtri.png value indicates either that the protein does not become highly unfolded when denatured, or that the denaturation process is occurring stepwise rather than in a single cooperative transition. Note also, that mD-N is proportional to the number of groups in the protein. This means that small proteins, because they experience only a small change in surface area upon denaturation compared with larger proteins, will naturally have lower mD-N values.
The solvent-accessible surface area of side and main chain groups of wild-type and mutant FKBP12 were calculated using the program Xplor (Brunger, 1992b). The percentage of solvent accessible surface area in the native state relative to the unfolded state was calculated using solvent accessible surface areas of model tripeptides (Miller et al., 1987).
For many proteins data can be fitted to (Ahmad & Bigelow, 1986):
Image:Log Eadgtri formula extended equation 6
where A is a constant for a particular protein and B is a constant for a particular denaturant, such that:
Image:Log Eadgtri formula equation 7
[denaturant] versus Image:Eadgtri.png can be plotted. The slope at a particular denaturant concentration gives AB[denaturant]B-1. Thus, as the slope also equals mD-N, (equation 4) one can calculate mD-N over a range of denaturant concentrations.
Calculation of: ΔGD-NH2O [denaturant] 50% and mD-N
If both FN and FD (equation 1) are independent of denaturant concentration one obtains equation 8 by combining equations 1 and 2:
Image:F formula equation 8
However, this is not normally the case. Experimental spectroscopic data have sloping baselines because the signals of the denatured and native states change approximately linearly with denaturant concentration, i.e. we observe that both FN and FD are linearly dependent on denaturant concentration. Therefore FN = αN + βN [denaturant] and FD = αD + βD [denaturant] and:
Image:F formula extended equation 9
αN is the fluorescence signal of the native state protein at 0 M denaturant (i.e. the intercept) and βN is the slope of the fluorescence baseline at low denaturant concentration: βN = dFN/d[denaturant]. αD is the extrapolated fluorescence signal of the denatured state protein at 0 M denaturant and βD is the corresponding slope for the denatured state (at high denaturant concentrations): βD = dFD/d[denaturant]. In other words, αN and αD are the intercepts and βN and βD are the slopes of the baselines at low (N) and high (D) denaturant concentrations. For a detailed derivation see (Jackson & Fersht, 1991a; Fersht, 1998).
When comparing the stability of wild-type and mutant proteins it is often more important to know the accuracy of [denaturant]50% , the concentration at which 50% of the protein is unfolded, i.e., the midpoint of unfolding. At [denaturant]50% it is apparent from equation 1 that ΔGD-N[denaturant]50% equals zero and so from equation 2 ΔGD-NH2O, the free energy of unfolding in water equals:
Image:DGDNH2O equation 10
Combining Equation 9 and Equation 10 gives:
[[Image:F formula long]] equation 11
The entire data set from the fluorescence-monitored denaturation experiments can be fitted to equation 11 to obtain values for [denaturant]50% and mD-N. ΔGD-NH2O is obtained using equation 10.
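Because the fitted equation is only available as an image, the sketch below writes out the form that follows from the definitions in this section: linear baselines FN = αN + βN[denaturant] and FD = αD + βD[denaturant], a two-state equilibrium, and ΔG(D-N) = mD-N([denaturant]50% − [denaturant]). All parameter values are hypothetical, and the function is a model to evaluate rather than a fitting routine.

```python
import math

R = 8.314  # J mol^-1 K^-1

def two_state_fluorescence(den, alpha_n, beta_n, alpha_d, beta_d,
                           m_dn, den_50, temp_k=298.0):
    """Fluorescence at denaturant concentration den for a two-state transition
    with sloping native and denatured baselines (assumed form, see lead-in)."""
    f_native = alpha_n + beta_n * den
    f_denatured = alpha_d + beta_d * den
    # K_D-N = exp(-dG/RT) with dG = m_dn * (den_50 - den)
    k_dn = math.exp(m_dn * (den - den_50) / (R * temp_k))
    return (f_native + f_denatured * k_dn) / (1.0 + k_dn)

# Hypothetical parameters: midpoint 4 M, m-value 5000 J mol^-1 M^-1
params = dict(alpha_n=100.0, beta_n=-2.0, alpha_d=20.0, beta_d=1.0,
              m_dn=5000.0, den_50=4.0)

for den in (0.0, 4.0, 8.0):
    print(den, round(two_state_fluorescence(den, **params), 2))
```

At the midpoint the signal is the average of the two baselines, which is one quick sanity check on any implementation of this curve.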
The more general equation can be applied:
Image:DGDN formula equation 12.
Calculation of: ΔΔGD-N
There are four different ΔΔGD-N values which can be calculated using [denaturant]50% and mD-N values for wild-type and mutant (*) protein:
1, It is useful to define ΔΔGD-NH2O, which can be calculated without using an average mD-N value using:
Image:DDGDNH2O formula equation 13
2, ΔΔGD-N[denaturant], the difference in the free energies of unfolding of wild-type and mutant protein, can also be obtained at any denaturant concentration using the more general equation:
Image:DDGDNDenaturant formula equation 14
where: [denaturant]50% is the midpoint of unfolding of wild type; [denaturant]*50% is the midpoint of unfolding of mutant; and m*D-N is the mD-N value for the mutant.
3, ΔΔGD-N[denaturant]50% is the difference in the free energy of unfolding of the wild-type and mutant protein at the mean value of the [denaturant]50% for the two proteins. The value of ΔΔGD-N is found for the concentration:
[denaturant] = 0.5([denaturant]50% + [denaturant]*50%).
ΔΔGD-N[denaturant]50% can be obtained accurately by applying:
ΔΔGD-N[denaturant]50% = 0.5(mD-N + m*D-N) Δ[denaturant]50% equation 15
where Δ[denaturant]50% is the difference in the [denaturant]50% of the wild-type and mutant protein: Δ[denaturant]50% = [denaturant]50% - [denaturant]*50%.
4, Repetitive measurements of mD-N for an individual mutant have a variability of ± 5 to 10%, whereas [denaturant]50% is very reproducible at ± 0.05 M (Itzhaki et al., 1995a). This is because [denaturant]50% is insensitive to small errors in baselines. It can be expected that mD-N values for wild-type and mutants will be the same within experimental error except for a few outliers. Thus the value of ΔΔGD-N<mD-N> can be calculated using:
Image:DDGDNmDN formula equation 16
where <mD-N> is the mean of all mD-N values within the data set.
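The four quantities can be compared side by side in a short sketch. Since the equations are given as images, the forms below are assumptions derived from ΔG(D-N) = mD-N([denaturant]50% − [denaturant]) applied to wild type and mutant, with ΔΔG defined as wild type minus mutant. All numerical values are hypothetical.

```python
def ddg_water(m_wt, c50_wt, m_mut, c50_mut):
    """Difference in free energy of unfolding in water (individual m-values)."""
    return m_wt * c50_wt - m_mut * c50_mut

def ddg_at(den, m_wt, c50_wt, m_mut, c50_mut):
    """Difference in free energy of unfolding at an arbitrary denaturant concentration."""
    return m_wt * (c50_wt - den) - m_mut * (c50_mut - den)

def ddg_at_mean_midpoint(m_wt, c50_wt, m_mut, c50_mut):
    """Difference evaluated at the mean of the two midpoints."""
    return 0.5 * (m_wt + m_mut) * (c50_wt - c50_mut)

def ddg_mean_m(mean_m, c50_wt, c50_mut):
    """Difference using a data-set average m-value and the shift in midpoint."""
    return mean_m * (c50_wt - c50_mut)

# Hypothetical wild type and mutant (m in J mol^-1 M^-1, midpoints in M)
wt = dict(m_wt=5000.0, c50_wt=4.0)
mut = dict(m_mut=4800.0, c50_mut=3.2)

mean_conc = 0.5 * (wt["c50_wt"] + mut["c50_mut"])
print(ddg_water(**wt, **mut))
print(ddg_at(mean_conc, **wt, **mut))
print(ddg_at_mean_midpoint(**wt, **mut))
print(ddg_mean_m(4900.0, wt["c50_wt"], mut["c50_mut"]))
```

Evaluating the general expression at the mean midpoint gives the same number as the dedicated formula, which shows why that value is insensitive to small errors in the individual m-values.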
Kinetic Data Analysis
Unfolding Kinetics If unfolding is monophasic, then data can be fitted to a single exponential process with linear drift and offset:  equation 17 where: F(t) is the fluorescence at time, t; A0 is the amplitude; ku is the rate constant for unfolding; n is the slope of the drift and C is an offset.
The natural logarithm of the rate constant for unfolding, ku , can be plotted against the final denaturant concentration. This plot is found to be linear for many proteins when >(Fersht, 1998):  equation 18  is the rate constant of unfolding at a given denaturant concentration;  is the rate constant of unfolding in water (i.e. it is the value extrapolated to the absence of denaturant);  is a constant of proportionality (i.e. the slope of the plot of  versus ), and  the final denaturant concentration.
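The rate constant for unfolding, $k_u$, relates directly to the stability of the transition state through:
$k_u = A\exp\left(-\Delta G_{\ddagger-N}/RT\right)$ (equation 19)
where $\Delta G_{\ddagger-N}$ is the Gibbs free energy of the wild-type transition state relative to the wild-type folded (native) state, $A$ is the pre-exponential factor, $R$ is the gas constant and $T$ is the absolute temperature. Equation 19 can also be explained in terms of activation energies:
$\Delta G_{\ddagger-N} = \Delta G_{\ddagger-N}^{\mathrm{H_2O}} - m_{\ddagger-N}[\mathrm{denaturant}]$ (equation 20)
where $\Delta G_{\ddagger-N}^{\mathrm{H_2O}}$ is the free energy of activation for unfolding in water. The subscripts ‡ and N refer to the transition state and the native state respectively. (Note that this equation is analogous to equation 2, $\Delta G_{D-N} = \Delta G_{D-N}^{\mathrm{H_2O}} - m_{D-N}[\mathrm{denaturant}]$, the equation for equilibrium unfolding. The term $m_{\ddagger-N}$ is analogous to $m_{D-N}$.)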
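The plot of $\ln k_u$ versus $[\mathrm{denaturant}]$ can, however, show slight deviations from linearity. Such non-linearity has been observed for other proteins and may result from small movements in the position of the transition state with denaturant concentration (Matouschek & Fersht, 1993; Matouschek et al., 1995), or may be due to the intrinsic non-linearity of the free energy of transfer with $[\mathrm{denaturant}]$ (equations 3.6 and 3.7). To account for this non-linearity, the data can also be fitted to a second-order polynomial equation:
$\ln k_u = \ln k_u^{\mathrm{H_2O}} + m_{k_u,1}[\mathrm{denaturant}] + m_{k_u,2}[\mathrm{denaturant}]^2$ (equation 21)
where $m_{k_u,1}$ and $m_{k_u,2}$ are the coefficients for the first- and second-order $[\mathrm{denaturant}]$ terms respectively. The slope of the plot at a particular $[\mathrm{denaturant}]$ is given by:
$\dfrac{\mathrm{d}\ln k_u}{\mathrm{d}[\mathrm{denaturant}]} = m_{k_u,1} + 2m_{k_u,2}[\mathrm{denaturant}]$ (equation 22)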
Refolding Kinetics
The refolding of proteins can be a multiphasic process, with three phases well-resolved in water (0 M urea) at 25 °C. The data can be fitted to equations describing either a single, double, double plus drift or triple exponential process:
$F(t) = A_1\exp(-k_1 t) + C$ single exponential (23)
$F(t) = A_1\exp(-k_1 t) + A_2\exp(-k_2 t) + C$ double exponential (24)
$F(t) = A_1\exp(-k_1 t) + A_2\exp(-k_2 t) + nt + C$ double exponential plus drift (25)
$F(t) = A_1\exp(-k_1 t) + A_2\exp(-k_2 t) + A_3\exp(-k_3 t) + C$ triple exponential (26)
where $F(t)$ is the fluorescence at time $t$; $A_1$, $A_2$, $A_3$ are the amplitudes; $k_1$, $k_2$, $k_3$ are the rate constants for folding; $n$ is the slope of the drift and $C$ is an offset.
Two-State Kinetics
For FKBP12, CI2 and some other small proteins, it is found that when $[\mathrm{denaturant}] < [\mathrm{denaturant}]_{50\%}$, the natural logarithm of the folding rate constant, $\ln k_f$, plotted against the concentration of denaturant follows a linear relationship. If folding is a reversible transition between just two states, the native and denatured states, then the rate constant for folding, $k_f$, must follow the rate law:
$\ln k_f = \ln k_f^{\mathrm{H_2O}} - m_{k_f}[\mathrm{denaturant}]$ (equation 27)
where $k_f$ is the rate constant of refolding at a given denaturant concentration, $k_f^{\mathrm{H_2O}}$ is the rate constant of refolding in water and $-m_{k_f}$ is the slope. Again this can be explained in terms of activation energies:
$\Delta G_{\ddagger-D} = \Delta G_{\ddagger-D}^{\mathrm{H_2O}} + m_{\ddagger-D}[\mathrm{denaturant}]$ (equation 28)
where $\Delta G_{\ddagger-D}^{\mathrm{H_2O}}$ is the free energy of activation for refolding in water. The subscripts ‡ and D refer to the transition and denatured states respectively.
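Chevron Plot
Combining the two rate constant equations, 18 and 27, produces the equation for a chevron plot. This is a V-shaped kinetics curve showing the refolding limb and the unfolding limb:
$\ln k_{obs} = \ln\left[k_f^{\mathrm{H_2O}}\exp\left(-m_{k_f}[\mathrm{denaturant}]\right) + k_u^{\mathrm{H_2O}}\exp\left(m_{k_u}[\mathrm{denaturant}]\right)\right]$ (equation 29)
where $k_{obs}$ is the observed rate constant of unfolding or refolding at a particular denaturant concentration (Jackson & Fersht, 1991a), $k_f^{\mathrm{H_2O}}$ and $k_u^{\mathrm{H_2O}}$ are the rate constants of refolding and unfolding in water, and $m_{k_f}$ and $m_{k_u}$ describe the denaturant dependence of the two limbs. Data can also be fitted to: equation 30
The point of the V is where the midpoint of unfolding, $[\mathrm{denaturant}]_{50\%}$, occurs.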
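A chevron curve can be generated numerically from the four kinetic parameters. This is a minimal Python sketch, assuming the combined two-state rate law has the common form ln k_obs = ln[k_f(water) exp(-m_kf [denaturant]) + k_u(water) exp(m_ku [denaturant])], with both m-values taken as positive numbers. The function names and the parameter values are hypothetical.

```python
import math


def ln_k_obs(denaturant, kf_water, m_kf, ku_water, m_ku):
    """Natural log of the observed rate constant for a two-state chevron."""
    k_fold = kf_water * math.exp(-m_kf * denaturant)
    k_unfold = ku_water * math.exp(m_ku * denaturant)
    return math.log(k_fold + k_unfold)


def midpoint_concentration(kf_water, m_kf, ku_water, m_ku):
    """Denaturant concentration at which k_f equals k_u."""
    return math.log(kf_water / ku_water) / (m_kf + m_ku)


# Hypothetical parameters: rate constants in s^-1, m-values in M^-1.
params = dict(kf_water=5.0, m_kf=1.5, ku_water=1e-4, m_ku=0.9)

mid = midpoint_concentration(**params)
curve = [(c / 2, ln_k_obs(c / 2, **params)) for c in range(0, 17)]

print(f"midpoint = {mid:.2f} M")
for conc, value in curve:
    print(f"{conc:4.1f} M  ln k_obs = {value:7.3f}")
```

Far below the midpoint the curve follows the refolding limb and far above it follows the unfolding limb, which is why each limb can be analysed with a straight line.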
The Tanford β Value
The values of $m_{k_u}$ and $m_{k_f}$ can be related to the average fractional change in degree of exposure of residues between the initial and transition states in an analogous manner to $m_{D-N}$ (Tanford, 1968; 1970). The Tanford β value, $\beta_T$, is defined as a measure of the fractional change in degree of exposure of residues in the transition state relative to the denatured state from the native state (Fersht, 1998), i.e., the fraction of the surface area that is buried between the denatured and transition states, relative to the change in surface area between the denatured and native states (Jackson, 1998): equation 31 This is equivalent to: equation 32 and equation 33 or equation 34 for a two-state system.
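Two State Model
Various criteria indicate that a system follows a two-state model of folding (Jackson & Fersht, 1991a). Some of these criteria have been discussed above or are implicit. However, the most important indication of two-state folding is that the values for $\Delta G_{D-N}^{\mathrm{H_2O}}$ and $m$ calculated from kinetic experiments must be the same as the values calculated from equilibrium experiments (Jackson & Fersht, 1991a), i.e.,
$K_{D-N} = k_u^{\mathrm{H_2O}}/k_f^{\mathrm{H_2O}}$ or $\Delta G_{D-N}^{\mathrm{H_2O}} = -RT\ln\left(k_u^{\mathrm{H_2O}}/k_f^{\mathrm{H_2O}}\right)$ (equation 35)
and
$m_{D-N} = RT\left(m_{k_u} + m_{k_f}\right)$ (equation 36)
must hold, where $k_u^{\mathrm{H_2O}}$ and $k_f^{\mathrm{H_2O}}$ are the rate constants of unfolding and refolding in water, and $m_{k_u}$ and $m_{k_f}$ are the magnitudes of the slopes of the unfolding and refolding limbs.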
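The comparison between kinetic and equilibrium parameters can be written as a short check. This is a minimal Python sketch, assuming the kinetic stability is -RT ln(k_u/k_f) evaluated in water and the kinetic m-value is RT times the sum of the magnitudes of the two kinetic slopes. The function names, the tolerances and the example numbers are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def kinetic_stability(kf_water, ku_water, temperature=298.15):
    """Free energy of unfolding in water estimated from rate constants."""
    return -R_KCAL * temperature * math.log(ku_water / kf_water)


def kinetic_m_value(m_kf, m_ku, temperature=298.15):
    """Equilibrium-style m-value estimated from the two kinetic slopes."""
    return R_KCAL * temperature * (m_kf + m_ku)


def agrees(kinetic, equilibrium, tolerance):
    """True when the two estimates differ by no more than the tolerance."""
    return abs(kinetic - equilibrium) <= tolerance


# Hypothetical kinetic parameters (s^-1 and M^-1) and equilibrium values.
dg_kin = kinetic_stability(kf_water=5.0, ku_water=1e-4)
m_kin = kinetic_m_value(m_kf=1.5, m_ku=0.9)
dg_eq, m_eq = 6.3, 1.45  # kcal mol^-1 and kcal mol^-1 M^-1 (hypothetical)

consistent = agrees(dg_kin, dg_eq, tolerance=0.3) and agrees(m_kin, m_eq, tolerance=0.1)
print(f"dG_kin = {dg_kin:.2f}, m_kin = {m_kin:.2f}, two-state consistent: {consistent}")
```

The tolerances stand in for the experimental errors of the two data sets; in practice they would be taken from the fitting errors rather than fixed by hand.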
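Temperature Dependence Studies
Non-linearity in Eyring plots is observed when there is a significant decrease in the heat capacity between the initial state and the transition state. In this case the activation enthalpy, $\Delta H^{\ddagger}$; the activation entropy, $\Delta S^{\ddagger}$; and the activation free energy, $\Delta G^{\ddagger}$, are dependent on temperature. Chen and co-workers describe an analysis such that data can be fitted to (Chen et al., 1989):
$\ln(k/T) = A + B\,(T_0/T) + C\ln(T_0/T)$ (equation 37)
where:
$A = \ln(k_B/h) + \left[\Delta S^{\ddagger}(T_0) - \Delta C_p^{\ddagger}\right]/R$
$B = \ln(k_B/h) - A - \Delta G^{\ddagger}(T_0)/RT_0$
$C = -\Delta C_p^{\ddagger}/R$
$k$ is the rate constant at temperature $T$. $\Delta C_p^{\ddagger}$ is the heat capacity change between the initial and transition state. $\Delta H^{\ddagger}(T_0)$ is the change in enthalpy between the initial and transition state. $\Delta S^{\ddagger}(T_0)$ is the change in entropy between the initial and transition state. $\Delta G^{\ddagger}(T_0) = \Delta H^{\ddagger}(T_0) - T_0\Delta S^{\ddagger}(T_0)$ is the change in free energy between the initial and transition state. $k_B$ is the Boltzmann constant ($k_B = 1.381 \times 10^{-23}$ J K$^{-1}$), $h$ is the Planck constant and $R$ is the gas constant. $T_0$ is a reference temperature, in this case 25 °C.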
The Tanford β (Heat Capacity) Value
A value analogous to $\beta_T$ can be calculated from the change in heat capacity of the transition state from unfolding and folding experiments: equation 38 where $\Delta C_{p,\ddagger-D}$ is the heat capacity change between the denatured and transition states and $\Delta C_{p,\ddagger-N}$ is the heat capacity change between the native and transition states.
Transition State Theory
Calculation of $\Delta\Delta G_{\ddagger-N}$
The effect of mutation on the energy of the transition state of unfolding can be calculated using transition state theory. In order to obtain information about the structure of the transition state, the stability of the transition state of the mutant protein relative to that of the wild-type protein is calculated using:
$\Delta\Delta G_{\ddagger-N} = \Delta G_{\ddagger-N} - \Delta G'_{\ddagger-N} = -RT\ln\left(k_u/k'_u\right)$ (equation 39)
where $\Delta\Delta G_{\ddagger-N}$ is the difference in energy of the transition state of unfolding relative to the folded state between wild-type and mutant protein; $\Delta G_{\ddagger-N}$ is the Gibbs free energy of the wild-type transition state relative to the wild-type folded state; $\Delta G'_{\ddagger-N}$ is the Gibbs free energy of the mutant transition state relative to the mutant folded state; $k_u$ is the rate constant of unfolding for wild-type protein and $k'_u$ is the rate constant of unfolding for mutant protein.
Calculation of $\Delta\Delta G_{\ddagger-D}$
Information about the structure of the transition state of folding can be obtained by calculating the stability of the transition state of the mutant protein relative to that of the wild-type protein using the equation:
$\Delta\Delta G_{\ddagger-D} = \Delta G'_{\ddagger-D} - \Delta G_{\ddagger-D} = RT\ln\left(k_f/k'_f\right)$ (equation 40)
where $\Delta\Delta G_{\ddagger-D}$ is the difference in energy of the transition state of folding relative to the unfolded state between wild-type and mutant protein; $\Delta G_{\ddagger-D}$ is the Gibbs free energy of the wild-type transition state relative to the wild-type unfolded state; $\Delta G'_{\ddagger-D}$ is the Gibbs free energy of the mutant transition state relative to the mutant unfolded state; $k_f$ is the rate constant of folding for wild-type protein and $k'_f$ is the rate constant of folding for mutant protein.
Phi Value Analysis
The structure of a protein's transition state for folding can be analysed by combining kinetic and equilibrium data to produce a ratio called a Φ-value. This value is obtained by normalising either $\Delta\Delta G_{\ddagger-N}$ (obtained from unfolding kinetics) or $\Delta\Delta G_{\ddagger-D}$ (obtained from refolding kinetics) against $\Delta\Delta G_{D-N}$ (obtained from equilibrium data). The Φ-value for folding, $\Phi_F$, is given by:
$\Phi_F = \Delta\Delta G_{\ddagger-D}/\Delta\Delta G_{D-N}$ (equation 41)
and the Φ-value for unfolding, $\Phi_U$, is given by:
$\Phi_U = \Delta\Delta G_{\ddagger-N}/\Delta\Delta G_{D-N}$ (equation 42)
For proteins which fold with two-state kinetics (e.g. CI2, FKBP12) it is found that: equation 43 and equation 44 Thus: equation 45
equation 46 In general, Φ-values are discussed in terms of $\Phi_F$, unless stated otherwise.
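The normalisation can be made concrete with a small calculation. This is a minimal Python sketch, assuming the kinetic free energy changes are RT ln(k_f(wild type)/k_f(mutant)) for folding and -RT ln(k_u(wild type)/k_u(mutant)) for unfolding, and that each is divided by the equilibrium change in stability. These sign conventions, the function names and the rate constants are assumptions made for illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def ddg_ts_vs_denatured(kf_wt, kf_mut, temperature=298.15):
    """Change in transition-state energy relative to the denatured state."""
    return R_KCAL * temperature * math.log(kf_wt / kf_mut)


def ddg_ts_vs_native(ku_wt, ku_mut, temperature=298.15):
    """Change in transition-state energy relative to the native state."""
    return -R_KCAL * temperature * math.log(ku_wt / ku_mut)


def phi_folding(kf_wt, kf_mut, ddg_equilibrium, temperature=298.15):
    """Phi-value for folding: kinetic change over equilibrium change."""
    return ddg_ts_vs_denatured(kf_wt, kf_mut, temperature) / ddg_equilibrium


def phi_unfolding(ku_wt, ku_mut, ddg_equilibrium, temperature=298.15):
    """Phi-value for unfolding: kinetic change over equilibrium change."""
    return ddg_ts_vs_native(ku_wt, ku_mut, temperature) / ddg_equilibrium


# Hypothetical mutant: folds two-fold slower and unfolds two-fold faster.
kf_wt, kf_mut = 4.0, 2.0        # s^-1
ku_wt, ku_mut = 1e-4, 2e-4      # s^-1

# For this two-state illustration the equilibrium change is taken as the
# sum of the two kinetic contributions.
ddg_eq = ddg_ts_vs_denatured(kf_wt, kf_mut) + ddg_ts_vs_native(ku_wt, ku_mut)

phi_f = phi_folding(kf_wt, kf_mut, ddg_eq)
phi_u = phi_unfolding(ku_wt, ku_mut, ddg_eq)
print(f"phi_F = {phi_f:.2f}, phi_U = {phi_u:.2f}")
```

In practice the equilibrium change in stability comes from an independent denaturation experiment, and small values of it give poorly determined ratios because the experimental error is divided by a number close to zero.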
Interpretation of Φ-Values
$\Phi_F$ = 1 and $\Phi_F$ = 0.
There are two limiting cases of $\Phi_F$ that may be interpreted in a simple manner. A value of $\Phi_F$ = 1 occurs when $\Delta\Delta G_{\ddagger-D} = \Delta\Delta G_{D-N}$, i.e., the interaction energy lost upon mutation is the same in the native and transition states. This implies that the structure at the site of mutation is as folded in the transition state as it is in the folded state, i.e., the interactions deleted at the site of mutation are fully formed in the transition state (see Figure 3a). Conversely, a value of $\Phi_F$ = 0 occurs when $\Delta\Delta G_{\ddagger-D} = 0$, i.e., when the mutation has no effect upon the energy of the transition state relative to the unfolded state. This shows that at the site of mutation there is as little structure in the transition state as in the denatured state, i.e., none of the native state interactions have been formed in the transition state at the site of mutation (see Figure 3b).
Fractional Φ-Values.
Fractional values of $\Phi_F$ are more difficult to interpret because a number of different situations can give rise to a fractional value. Fractional values may result from changes in the solvation energy of the mutated side chain in either the unfolded, transition or native state and be unrelated to structure per se. Consequently, attempts are always made to obtain Φ-values for mutations that substitute a non-polar residue for another non-polar residue, where differences in changes of solvation are small. For these cases, fractional values may result from either: the structure at the site of mutation being weakened in the transition state; or a mixture of species arising from folding via parallel pathways, i.e., portions of the protein are native-like in the transition state of one pathway ($\Phi_F$ = 1) but unfolded in the transition state of another pathway ($\Phi_F$ = 0) (see Figure 4).
Figure 3: Φ-value analysis of a two-state folding reaction depicted as a free energy diagram.
a, $\Phi_F$ = 1 is illustrated. b, $\Phi_F$ = 0 is illustrated. Wild type is shown in black and the mutant in red. Note that this figure is annotated with alternative nomenclature where: U is equivalent to D, the unfolded or denatured state; and F is equivalent to N, the folded or native state. Figure kindly supplied by Dr Jane Clarke, Cambridge, UK.
Figure 4: A Folding Mechanism that Proceeds along Two Parallel Pathways for a Two-State System.

Pathway A shows that the interactions probed are as formed in the transition state as in the native state, producing a $\Phi_F$ = 1. Pathway B shows that the interactions probed are as unfolded in the transition state as in the denatured state, producing a $\Phi_F$ = 0. If a protein folded along two such parallel pathways then the observed Φ-value would equal 0.5.
Brønsted Analysis
Whether there is a genuine single pathway or parallel pathways, which would result in multiple distinct transition states, is a fundamental question. Fersht et al. have proposed a kinetic test to distinguish between these possibilities (Fersht, 1994). This test is based on Brønsted behaviour observed in physical organic chemistry for simple systems in which a single bond is made or broken in the transition state of the reaction. Brønsted behaviour for protein folding reactions can be analysed using: equation 47 where $k_u$ is the rate constant for unfolding of the parent molecule, $k'_u$ is the rate constant for unfolding of the mutant molecule, and $\beta$ is a constant which is related to the degree of structure formation in the transition state. For parallel pathways involving transition states in which interactions are either fully formed or fully broken, one would expect non-linear Brønsted behaviour, because a change in pathway is expected as one pathway becomes destabilised relative to another, depending on the elements of structure that are destabilised upon mutation. For a protein folding reaction to show Brønsted behaviour, plots of $\ln k_u$ versus $\Delta\Delta G_{D-N}$ should be linear for single elements of structure.
Theoretical Calculations
Contact Order (CO)
$CO = \dfrac{1}{L \cdot N}\sum^{N}\Delta S_{ij}$ (equation 48)
where $N$ is the total number of contacts in a protein; $\Delta S_{ij}$ is the number of residues separating the contacting residues $i$ and $j$; and $L$ is the number of residues in the protein. If a protein has a low contact order, then on average residues interact with other residues that are close in sequence, for example, α-helical proteins. If a protein has a high contact order, then on average residues interact with other residues that are far apart in sequence, for example, β-sheet proteins.
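The calculation is simple once a list of contacts is available. This is a minimal Python sketch, assuming the contact order is the mean sequence separation of all contacts divided by the chain length. How contacts are identified (for example, by a distance cut-off between atoms) is not specified here, so the contact lists below are hypothetical toy inputs and the function name is illustrative.

```python
def contact_order(contacts, chain_length):
    """Mean sequence separation of contacts, normalised by chain length.

    contacts: iterable of (i, j) residue-number pairs that are in contact.
    chain_length: total number of residues in the protein.
    """
    contacts = list(contacts)
    if not contacts:
        raise ValueError("at least one contact is required")
    if chain_length <= 0:
        raise ValueError("chain_length must be positive")
    total_separation = sum(abs(j - i) for i, j in contacts)
    return total_separation / (chain_length * len(contacts))


# Toy 10-residue chains (hypothetical contact maps).
helix_like = [(i, i + 4) for i in range(1, 7)]   # local i, i+4 contacts
sheet_like = [(1, 10), (2, 9), (3, 8)]           # long-range hairpin contacts

co_helix = contact_order(helix_like, chain_length=10)
co_sheet = contact_order(sheet_like, chain_length=10)
print(f"helix-like CO = {co_helix:.2f}, sheet-like CO = {co_sheet:.2f}")
```

The toy inputs reproduce the trend described above: local contacts give the lower value and contacts between residues far apart in sequence give the higher value.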

