Flexibility of the “rigid” classics or rugged bottom of the folding funnels of myoglobin, lysozyme, RNase A, chymotrypsin, cytochrome c, and carboxypeptidase A1

ABSTRACT The ability to crystallize a globular protein and to solve its crystal structure seems to represent a triumph of the lock-and-key model of protein functionality, in which the presence of a unique 3D structure resembling an aperiodic crystal is considered a prerequisite for a given protein to possess specific biologic activity. The history of protein crystallography has its roots in the first crystal structures of myoglobin, lysozyme, RNase A, chymotrypsin, cytochrome c, and carboxypeptidase A1, solved more than 50 y ago. This article briefly considers the extensive structural information currently available for these proteins and shows that the bottoms of their folding funnels (i.e., the lowest parts of their potential energy landscapes) are not smooth but rugged. In other words, these crystallization classics are characterized by significant conformational flexibility and are not rigid (immobile) crystal-like entities.

KEYWORDS conformational dynamics; intrinsically disordered proteins; protein function; protein structure; structural flexibility; structure-function relationship

For a long time, it was believed that the specific functionality of a given protein is predetermined by the precise spatial positioning of amino acid side chains and prosthetic groups, which, in turn, is determined by the defined 3-D structure of this protein (the so-called structure-function paradigm). Although proteins participate in a large variety of biologic functions, historically one class, namely enzymes, attracted most of the attention of researchers at the early stages of protein science. This is because enzymes are biologic catalysts that regulate numerous biochemical reactions, the kinetic mechanisms of which could be deduced indirectly from the effect of enzymes on their substrates, which can be easily monitored by one of many spectroscopic techniques. As a result, in early studies (see, for example, ref. 1), much was learned about the kinetics of enzyme action and the mechanisms of enzymatic catalysis just by the simple analysis of substrate-to-product conversion, despite the fact that little information was available about the structural peculiarities of the enzymes themselves. 2,3 However, even at that time, it was clear that precise knowledge of enzyme structure might allow a better understanding of the molecular mechanisms of catalytic action and might open ways to modulate or change existing enzymatic activities, or even to create new enzymes with new functions. This explains why enzymology became a major focus for the scientific curiosity of a great many researchers, and why the majority of significant breakthroughs in the understanding of protein folding, structure, and function were made using enzymes as models.
The classic structure-function paradigm, where protein functionality is directly linked to its unique rigid 3-D structure, is based on the fruitful analogy of enzymes to inorganic catalysts, which are known to accelerate chemical reactions by providing a lower energy pathway between reactants and products, typically via the formation of an intermediate that cannot be formed in the absence of the catalyst. In this view, since the active surface of a classical catalyst is expected (and is known) to be rigid to do its job efficiently, protein catalysts were assumed to need rigid 3-D structures (at least in the vicinity of their active sites) to be functional. This hypothesis represents a critical foundation of modern structural biology and was a cornerstone of protein science well before the resolution of the first protein 3-D structure. As a matter of fact, as early as 1894, Emil Fischer formulated his famous "lock-and-key" hypothesis to explain the remarkable specificity of the enzymatic hydrolysis of glucosidic bonds by different enzymes, where the efficiency of catalysis was suggested to depend on the unique complementarity of the rigid structures of a substrate and an enzyme. 4 It is not surprising, therefore, that when the first crystal structures of proteins were solved by X-ray diffraction, this was taken as strong support for the global validity of the "lock-and-key" hypothesis. As a result, the sequence-structure-function paradigm, according to which the unique 3-D structure of a protein is a prerequisite for its function, seemed to become the absolute truth. In fact, the very first 3-D structure determined for an enzyme, where the bound inhibitor (N-acetylglucosamine, (NAG)3) was co-crystallized with lysozyme, clearly showed that the precise locations of the amino acid side chains in the active site were crucial for facilitating catalysis.
5 The unique and very specific spatial orientation of substrates, inhibitors, and products within the active sites of enzymes relative to the catalytic amino acid side chains, having been incontrovertibly demonstrated for many enzymes, is now a well-established fact. Furthermore, the X-ray structures gave exceptional stereochemical clarity and insight into enzyme action, enabling the ascription of precise roles to functional groups that were localized in the active sites. This hypothesis was tested and proven in many instances by modifying specific functional groups (i.e., by using site-directed mutagenesis, a technique that allows amino acid sequences in proteins to be altered at will) and demonstrating the effects of such substitutions on enzymatic activity. [6][7][8][9][10][11][12] However, although the structure-function paradigm is broadly accepted and seems to be strictly based upon the crucial need for a unique 3-D structure for protein functionality, not all proteins are structured throughout their entire lengths, and many proteins are, in fact, highly flexible or structurally disordered as a whole or contain substantial intrinsically disordered regions. [13][14][15][16][17][18][19][20][21][22][23][24] Furthermore, even if one excludes such intrinsically disordered proteins and hybrid proteins containing ordered domains and functional intrinsically disordered regions from consideration and focuses only on ordered, well-folded globular proteins, which are known to be characterized by the presence of unique 3D structures, one can immediately see that many of these ordered proteins cannot be considered completely rigid, rock-like entities. On the contrary, the importance of conformational flexibility and the need for structural dynamics for the successful functionality of globular proteins (even enzymes) has been emphasized in many studies over the past 55 y (e.g., refs. [25][26][27][28][29][30][31][32][33][34][35][36][37]).
In fact, internal dynamics is known to be crucial for the biologic activity of many ordered, well-folded proteins. Here, functional dynamics involves not only movements of individual amino acid residues or groups of amino acids relative to each other, but even displacements of entire domains. In enzymes, such function-related movements are needed to facilitate catalytic activity, and they can happen over a wide range of time scales, from femtoseconds to seconds. 27,33,34 Therefore, an honest description of a functional globular protein should include consideration of the presence of conformational substates (some of which could be quite different). Such substates of the same overall protein structure originate from atomic displacements of different ranges and result in the appearance of interconverting local configurations. [38][39][40][41][42][43][44] This idea is illustrated by Fig. 1, which schematically represents the potential energy landscape of an ordered protein. 45 Although such a landscape is traditionally considered as a folding funnel with a large set of unfolded conformations constituting a broad mouth at the top of the funnel, and with the lowest energy state corresponding to the native structure being located at the narrow end at the bottom of the funnel, [46][47][48][49][50] careful analysis revealed that the surface of the bottom of the funnel for many globular proteins is actually not smooth but rough or rugged, reflecting the presence of many smaller minima corresponding to different substates sampled by a protein (see Fig. 1).
The first crystal structure of myoglobin (an important protein containing a heme group that reversibly binds oxygen) was solved in 1958, 51 and then refined in 1960. 52 This was, in fact, the very first high-resolution crystal structure of a protein molecule. The crystal structure of lysozyme, determined in 1965, was the first crystal structure of an enzyme. 53 In the same year, the crystal structures of lysozyme-inhibitor complexes were determined. 54 The crystal structure of another enzyme, RNase A, an RNA-cleaving enzyme stabilized by 4 disulfide bonds, was determined in 1967. 55 In the same year, the crystal structure of chymotrypsin, one of the serine proteases, was solved. 56 The first crystal structure of another heme-containing protein, cytochrome c, one of the first mammalian proteins subjected to X-ray crystallography, was also determined in 1967. 57 Finally, the first crystal structure of carboxypeptidase A1, a zinc metallopeptidase, was solved in 1969. 58 Curiously, although the first crystal structures of globular proteins (such as myoglobin, lysozyme, RNase A, chymotrypsin, cytochrome c, and carboxypeptidase A1) were crucial for the "crystallization" of a rigid view of the functional protein, the analysis of currently available structural information for these proteins clearly shows that these "rigid" crystallization classics are in fact rather flexible and are characterized by rugged bottoms of their folding funnels (i.e., by the presence of several, and often rather different, structural substates). This idea is illustrated by Fig. 2, which represents 2 aligned structures determined by X-ray crystallography for each of these 6 crystallization classics.
The structures presented in this figure clearly show that all 6 proteins are characterized by rather significant structural flexibility, which is limited to local structural rearrangements of different ranges for myoglobin, lysozyme, chymotrypsin, and carboxypeptidase A1, but reaches very large scales for cytochrome c and RNase A.
Since multiple crystal structures are known for each of these proteins, the PDBFlex database (http://pdbflex.org/) can be exploited to analyze the degree of their structural flexibility. 59 PDBFlex generates useful information on the flexibility of protein structure based on the analysis of variations between the different structural models of the same protein in the Protein Data Bank (PDB). 59 The results of these analyses are briefly summarized below.
To evaluate the flexibility of myoglobin, the 101mA cluster of the PDBFlex database (http://pdbflex.org/php/api/rmsdProfile.php?pdbID=101m&chainID=A) was analyzed. This cluster contains structural information on 232 individual chains of sperm whale myoglobin with known X-ray structures. Analysis of this cluster revealed that myoglobin is characterized by low but noticeable structural flexibility. In fact, the average root mean square deviation of Cα atoms (RMSD) between all structures in this cluster is 0.551 Å, and these structures are characterized by an average contact map overlap (CMO) of 0.942. Since CMO corresponds to the proportion of inter-residue contacts conserved between a pair of similar structures, a CMO value of 1 corresponds to completely preserved contacts, whereas a CMO value of 0 indicates that all contacts are different. The maximal pairwise RMSD of 2.476 Å, which is the averaged RMSD between a pair of similar structures, was found for the structures with PDB IDs 105mA and 2eb8A. The minimal pairwise CMO was 0.779. It was noted that the RMSD and CMO, being specifically sensitive to different levels of protein flexibility, provide different measures of protein structural dynamics. Here, the presence of high global structural flexibility in the form of hinge-like movements is reflected in high RMSD scores, whereas the presence of high local flexibility corresponding to changes in secondary structure is revealed by low CMO values. 59 Finally, to compare the structural flexibility of myoglobin with the intrinsic disorder predisposition of this protein, Fig.
3A represents the output of the PDBFlex analysis of the 101mA cluster combined with the intrinsic disorder profile of Physeter catodon myoglobin (UniProt ID: P02185) generated by the superposition of the outputs of PONDR® VLXT, PONDR® FIT, PONDR® VL3, PONDR® VSL2, IUPred_short, and IUPred_long, and a consensus disorder profile calculated by averaging the disorder profiles of the individual predictors. This comparison revealed a rather good correlation between the results of the intrinsic disorder analyses and the PDBFlex output, indicating that the structural flexibility of this protein can be, at least in part, explained by the peculiarities of the intrinsic disorder distribution within its sequence. Figure 3B compares the PDBFlex output for hen egg white lysozyme with the intrinsic disorder propensity of this protein (UniProt ID: P00698). In this case, the PDBFlex cluster 1lsgA (http://pdbflex.org/php/api/rmsdProfile.php?pdbID=1lsg&chainID=A) includes 673 chains. The members of this cluster are characterized by relatively low structural flexibility, with an average RMSD of 0.504 Å and an average CMO of 0.950. However, some members of the cluster (e.g., chains 1lkrB and 1a2yC) show noticeable changes in local secondary structure. As a result, the maximal pairwise RMSD for this pair of chains is 1.97 Å, and its minimal pairwise CMO is rather low, 0.661, indicating the presence of a significant structural difference between chains 1lkrB and 1a2yC. Again, Fig. 3B shows that there is a remarkable agreement between the results of the structural flexibility analysis and the propensity of lysozyme for intrinsic disorder.
Maximal structural flexibility (in the form of average local RMSD) is observed for the 90-11 region that includes the 99-104 fragment, which was previously reported to be variable. 60 This Val99-Gly104 variable region is known to be a part of the active site cleft, and can be found in a conformation inflected toward the active site (proximal conformation) or can turn away from the active site cleft (distal conformation), 61 suggesting that the structural flexibility of this region can be of functional importance for hen egg white lysozyme.

[Figure 2, caption partially recovered] ... 65 for myoglobin, 1lkrB (blue ribbon) and 1a2yC (green ribbon) for lysozyme, 1f0vB (blue ribbon) 62 and 3fkzA (green ribbon) 66 for RNase A, 1oxgA (blue ribbon) 67 and 1ex3A (green ribbon) 68 for chymotrypsin, 1u75B (blue ribbon) 69 and 3nbsC (green ribbon) 63 for cytochrome c, and 2abzB (blue ribbon) 70 and 1hdqA (green ribbon) 71 for carboxypeptidase A1. The structures presented in this plot were generated using the molecular graphics program VMD. 72

[Figure 3, caption partially recovered] ... [73][74][75][76][77][78] as well as the IUPred web server, 79 IUPred_short and IUPred_long. The corresponding outputs are shown by black, red, green, pink, yellow, and blue lines, respectively. In each plot, the dark red line shows the mean disorder propensity calculated by averaging the disorder profiles of the individual predictors. The light pink shadow around the PONDR® FIT curve shows the error distribution. In these analyses, predicted intrinsic disorder scores above 0.5 are considered to correspond to disordered residues/regions, whereas regions with disorder scores between 0.2 and 0.5 are considered flexible.

Figure 3C shows the results of the structural flexibility and intrinsic disorder analyses for pancreatic ribonuclease A (RNase A) from Bos taurus (UniProt ID: P61823). This protein can be present as a monomer or can exist in 2 dimeric forms (major and minor) displaying different types of 3D domain swapping.
62 In fact, the minor dimer form is stabilized by swapping of the N-terminal α-helix of one protomer with that of another molecule, whereas the major dimer is formed by swapping of the C-terminal β-strand. 62 This clearly indicates the presence of noticeable structural flexibility in the termini of this protein. In agreement with these observations, PDBFlex analysis of the 1kh8A cluster revealed that this cluster contains 301 chains and is characterized by average RMSD and CMO values of 2.009 Å and 0.949, respectively. The maximal pairwise RMSD of 18.076 Å and the minimal pairwise CMO of 0.747 were found for the chains 3fkzA and 1f0vB. Curiously, the average RMSD for the PDBFlex cluster corresponding to this protein is comparable to the maximal pairwise RMSD values found for myoglobin and lysozyme. These observations indicate that RNase A is characterized by high structural flexibility, especially in its N- and C-terminal regions, and that this structural flexibility is needed for the function of this protein (at least, it seems to play different roles in the different modes of RNase A dimerization). Both termini are also predicted to have significant levels of intrinsic disorder (see Fig. 3C).
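To make the two measures quoted throughout these analyses more concrete, the following minimal Python sketch computes a Cα RMSD and a contact map overlap for a pair of chains. It is an illustration only and not the PDBFlex implementation: it assumes that the two coordinate lists are already superimposed and residue-matched, and the 8 Å contact cutoff, the minimal sequence separation, and the normalization of CMO by the union of contacts are assumptions introduced here. The function names `ca_rmsd`, `contact_set`, and `contact_map_overlap` are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Set, Tuple

Coord = Tuple[float, float, float]


def ca_rmsd(a: List[Coord], b: List[Coord]) -> float:
    """RMSD between two residue-matched lists of C-alpha coordinates.

    Assumption: the structures are already optimally superimposed.
    """
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))


def contact_set(coords: List[Coord], cutoff: float = 8.0,
                min_sep: int = 3) -> Set[Tuple[int, int]]:
    """Residue pairs (i, j) whose C-alpha atoms lie within `cutoff` angstroms.

    Assumption: pairs closer than `min_sep` positions in sequence are ignored.
    """
    contacts = set()
    for i, j in combinations(range(len(coords)), 2):
        if j - i >= min_sep and math.dist(coords[i], coords[j]) <= cutoff:
            contacts.add((i, j))
    return contacts


def contact_map_overlap(a: List[Coord], b: List[Coord], cutoff: float = 8.0,
                        min_sep: int = 3) -> float:
    """Fraction of contacts shared by both structures (1 = all, 0 = none).

    Assumption: normalized by the union of the two contact sets.
    """
    ca, cb = contact_set(a, cutoff, min_sep), contact_set(b, cutoff, min_sep)
    union = ca | cb
    if not union:
        return 1.0  # no contacts in either structure, so nothing differs
    return len(ca & cb) / len(union)


# Toy example: a 4-residue chain whose last residue swings away.
chain_a = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
chain_b = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
rmsd_ab = ca_rmsd(chain_a, chain_b)
cmo_ab = contact_map_overlap(chain_a, chain_b, cutoff=10.0)
```

In this toy case a single displaced residue both raises the RMSD and removes the only long-range contact. In real clusters the two measures can diverge, which is why a hinge-like domain movement may give a large RMSD while the CMO stays high.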
As follows from Fig. 3D and F, chymotrypsin from Bos taurus (UniProt ID: P00766) and carboxypeptidase A1 from Bos taurus (UniProt ID: P00730) are both characterized by relatively low structural flexibility and are expected to have mostly ordered structures. However, even these mostly ordered proteins possess several regions of local structural flexibility (see also Fig. 2) that coincide with the regions predicted to be disordered/flexible by a set of disorder predictors (see Fig. 3D and F).
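The consensus disorder profile used in Fig. 3, i.e., the per-residue average of the individual predictor outputs, together with the score thresholds stated in the figure caption (above 0.5 disordered, 0.2 to 0.5 flexible), can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the predictor scores below are invented toy numbers rather than real PONDR or IUPred output, and the function names `consensus_profile` and `classify` are hypothetical.

```python
from statistics import fmean
from typing import Dict, List


def consensus_profile(profiles: Dict[str, List[float]]) -> List[float]:
    """Average the per-residue disorder scores of several predictors."""
    lengths = {len(scores) for scores in profiles.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one profile, all of the same length")
    # zip(*...) walks the profiles residue by residue.
    return [fmean(column) for column in zip(*profiles.values())]


def classify(score: float, disordered_above: float = 0.5,
             flexible_from: float = 0.2) -> str:
    """Label a residue using the thresholds quoted in the Fig. 3 caption."""
    if score > disordered_above:
        return "disordered"
    if score >= flexible_from:
        return "flexible"
    return "ordered"


# Toy scores for a 3-residue fragment (illustrative values only).
toy_profiles = {
    "predictor_1": [0.1, 0.6, 0.3],
    "predictor_2": [0.1, 0.8, 0.5],
}
consensus = consensus_profile(toy_profiles)
labels = [classify(score) for score in consensus]
```

Averaging is the simplest way to combine predictors. It gives every predictor equal weight, so a single outlying predictor can shift a residue across a threshold.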
The 1cgiE PDBFlex cluster corresponding to chymotrypsin includes 40 [...] show that although heme-containing myoglobin is characterized by low structural flexibility, another heme-containing protein from this set of crystallization classics, cytochrome c from Equus caballus (UniProt ID: P00004), has high structural flexibility. In fact, the PDBFlex cluster 1crcA corresponding to this protein shows average RMSD and CMO values of 6.519 Å and 0.936, respectively. Among the members of this cluster (http://pdbflex.org/php/api/rmsdProfile.php?pdbID=1crc&chainID=A), the largest structural difference is found between chains 3nbsC and 1u75B, which have the maximal pairwise RMSD of 16.062 Å and the minimal pairwise CMO of 0.784. Similar to RNase A, cytochrome c is able to form domain-swapped dimers using its C-terminal α-helix. 63 Also similar to RNase A, cytochrome c is predicted to contain significant levels of intrinsic disorder. Therefore, it seems that high structural flexibility, manifested as differences between the crystal structures solved for a given protein, goes hand in hand with a relatively high predisposition of this protein for intrinsic disorder.
I find it really peculiar (if not funny) that proteins considered as crystallization classics and typically used as strong "living" proof of the "lock-and-key" model of protein functionality and the associated sequence-structure-function paradigm are in fact characterized by rather high structural flexibility. The numerous structures reported for the crystallization classics emphasize that (a) crystallization of a given protein can be induced by different conditions, (b) a protein can be crystallized in different crystal forms, and (c) these different crystal forms can correspond to rather different protein structures. Even more structural differences can be induced by interaction with biologic partners, suggesting that movements of different parts of a protein molecule relative to each other could be crucial for its functionality. All this clearly indicates that the crystal structure of a protein should be considered only as a synchronized snapshot of a conformational ensemble. Although members of such an ensemble are characterized by variable structural flexibility and intrachain mobility, the amplitudes and frequencies of their structural fluctuations or movements can be altered by changes in the protein environment or by interaction of the query protein with biologic partners. Note that these considerations of structural flexibility apply to different structures solved for a given protein. However, the manifestations of protein structural flexibility are not limited to the existence of these different structures and can also be found within a given X-ray structure. There, regions with high B-factors (isotropic temperature factors) reflect another level of structural flexibility, i.e., fluctuations about the mean (native state) positions, and disordered (or highly flexible) regions are characterized by missing electron density.

Disclosure of potential conflicts of interest
No potential conflicts of interest were disclosed.