Mapping intermolecular interactions and active site conformations: from human MMP-1 crystal structure to molecular dynamics free energy calculations

The zinc-dependent Matrix Metalloproteinases (MMPs) found within the extracellular matrix (ECM) of vertebrates are linked to pathological processes such as arthritis, skin ulceration and cancer. Although a general backbone proteolytic mechanism is understood, crystallographic data continue to suggest an active site that is too narrow to encompass the respective substrate. We present a fully parameterised Molecular Dynamics (MD) study of the structural properties of an MMP-1-collagen crystallographic structure (Protein Data Bank – 4AUO), followed by an exploration of the free energy surface of a collagen polypeptide chain entering the active site, using a combined meta-dynamics and umbrella sampling (MDUS) approach. We conclude that the interactions between MMP-1 and the collagen substrate are in good agreement with a number of experimental studies. As such, our unrestrained MD simulations and our MDUS results, which indicate an energetic barrier for a local uncoiling and insertion event, can inform future investigations of the collagen-peptide non-bonded association steps with the active site prior to proteolytic mechanisms. The elucidation of such free energy barriers provides a better understanding of the role of the enzyme in the ECM and is important in the design of future MMP inhibitors.

The general structure of MMPs consists of several distinct domains conserved between types. All types, except MT-MMPs (membrane type -MMPs) which are membrane-bound, begin with a cell secretion leader or predomain at the N-terminus, followed by a highly conserved propeptide, approximately 80 amino acids in length. Cleavage of the propeptide domain results in the separation of a conserved cysteine switch from the catalytic zinc ion, enabling enzymatic activity (Stricklin, Jeffrey, Roswit, & Eisen, 1983). The catalytic domain (CAT) (Figure 1(A)) is approximately 170 amino acids in length and contains the conserved zinc-binding domain amino acid motif HEXXHXXGXXH (Van Wart & Birkedal-Hansen, 1990) (Figure 1(B)). Three histidine residues and a water molecule make up the complete coordination sphere of the catalytic zinc ion. A second zinc ion is found buried close to the rear of the active site cavity (Figure 1(C)), with a coordination sphere consisting of three histidine residues and an aspartic acid residue. A C-terminal hemopexin-like (HPX) domain (Figure 1(E)) of approximately 200 residues in length, arranged as a fourfold symmetrical beta-propeller structure (bladed regions 1-4) around a funnel-like opening, is connected to the CAT domain by a short hinge linker region (Figure 1(D)). Finally, the HPX domain has been implicated as a binding mediator for the degradation of ECM substrates (Bode, 1995;Gomis-Rüth et al., 1996).
The proteolytic mechanism of MMP is thought to depend on the electrostatic attraction between the backbone carbonyl group of an ECM protein with the Zn ion in the active site cavity. Of particular interest to this study is the binding of the collagenases, MMP1, MMP8 and MMP13 to a collagen molecule: a trimeric coiledcoil arrangement of helical polypeptide collagen alpha chains. The zinc ion becomes penta-coordinated during a cascade of nucleophilic rearrangements between a conserved glutamic acid, a carbonyl group from a collagen polypeptide chain backbone and a water molecule (Verma & Hansch, 2007). Quantum mechanical calculations reveal that the presence of water reduces the electrophilic activation barrier of the carbonyl oxygen by approximately 5 kcal/mol, thus increasing the likelihood of a reaction (Pelmenschikov & Siegbahn, 2002).
Even with the current understanding of the proteolytic mechanism of zinc-dependent metzincins, the inceptive period of protein insertion into the active site of MMP, prior to any proteolytic activity, remains unknown. From crystallographic observations of the catalytic domains of MMP-1 (Borkakoti et al., 1994;Manka et al., 2012) and MMP-8 (Bode et al., 1994) and the full length of MMP-1 (Li et al., 1995), the entrance to the active site appears too narrow for the width of a triplehelical collagen molecule. An early report suggests that the HPX domain might bind to collagen through a folding mechanism of the hinge linker region. Not only would this expose the scissile bond of collagen to the active site for a prolonged period of time but favourable interactions between the HPX domain and the collagen residues could also facilitate collagen unwinding (Bode, 1995;Bode et al., 1994;Gomis-Rüth et al., 1996;Ottl et al., 2000). Interestingly, a study by Zhao et al., yielded atomic force microscopy images of a potential fold mechanism in MT1-MMP (Zhao et al., 2015). However, this mechanism is at odds with a recent crystal structure of collagen-bound MMP-1, which shows no indication of a folding type mechanism (Manka et al., 2012). Furthermore, a study based on the unwinding and enzymatic cleavage of collagen confirmed that although inactive, mutated enzyme MMP-1 (E200A) induces a change to the collagen structure (Chung et al., 2004). In particular, collagen digestion was detected after an active secondary cutter had been introduced. It was remarked that the discovery of fragmented collagen from a secondary cutter, typically ineffective until an inactive MMP-1 has been introduced, could rule out the possibility that MMPs themselves undergo the conformational change required to accommodate the substrate. Finally, it would be reasonable to suggest that a conformational folding of the MMP-1 could sterically hinder the secondary cutter from accessing the scissile collagen bond.
An alternative mechanism proposed by de Souza et al., suggests that the collagenase hinge linker, rich in proline residues, adopts a transient poly-Pro II-like helical structure. Upon interacting with collagen, by forming a proline-zipper, the hinge domain competes for a single collagen polypeptide by disrupting the native collagen fold (de Souza, Pereira, Jacchieri, & Brentani, 1996). In the case of MMP-1, the flexibility in the hinge region is attributed to G272 (Tsukada & Pourmotabbed, 2002).
How MMP-1 facilitates collagen unwinding and collagen insertion remains unclear. Whether these events are dependent on conformational dynamics of the enzyme itself, or whether the enzyme retains a consistent structure is also open to debate. Here, we use a Molecular Dynamics (MD) modelling approach to provide further insight into the structure and dynamics of the active site of MMP-1. Firstly, we report on the structural stability of the MMP-1 in complex with a model collagen protein from crystallographic coordinates (4AUO). This also helps to identify any issues with the force-field parameterisation of the protein and the zinc ions. Secondly, the MD model from the first stage is used to elucidate the free energy of insertion and non-bonded association with the active site of a single collagen peptide, the initial event prior to later proteolytic activity, using a combined molecular dynamics and umbrella sampling (MDUS) approach. Given that this manuscript focuses on nonbonded interactions and without the added complexity of covalent bond rearrangement, free energy calculations derived from MD principles are appropriate for a protein of this size.

Simulation details
Atomistic MD simulations were performed using the GROMACS simulation package, version 5.0.5 (Pronk et al., 2013). The time step used in the simulations was 2 fs, and was integrated using the leapfrog algorithm. All bonded interactions were holonomically constrained using the LINCS constraint algorithm (Hess, Bekker, Berendsen, & Fraaije, 1997). The short-range neighbour interaction list cut-off was fixed to 0.8 nm and updated every 10 fs. Short-range interactions were modelled using a 12-6 Lennard-Jones potential and were truncated at 0.8 nm. Short-range electrostatic interactions were truncated at 0.8 nm and long-range electrostatics were calculated using the Particle Mesh Ewald scheme (Darden, York, & Pedersen, 1993). The pressure was maintained at one atmosphere by applying an isotropic coupling Parrinello-Rahman scheme (NPT ensemble) (Parrinello & Rahman, 1981). The temperature was maintained at 294.5 K using the Nose-Hoover thermostat (NVT and NPT ensemble) (Hoover, 1985;Nosé, 1984). Loose coupling was applied between the protein complex and the water environment. Periodic boundary conditions were applied in three dimensions. Coordinates, velocities and energies were saved every 100 ps.
Meta-dynamics were performed using the PLUMED 2.0.2 plugin for Gromacs (Tribello, Bonomi, Branduardi, Camilloni, & Bussi, 2014). The height and width of the bias potentials were maintained at 0.25 kJ/mol and 1.5 Å, respectively. Gaussian potentials were deployed every 250 MD step calculations (500 fs). Umbrella sampling was performed along an absolute distance reaction coordinate defined by the meta-dynamics trajectory (see results and discussion). Fourteen individual umbrella windows were set up to cover the distance interval from 0.38 nm to 1.5 nm and each was equilibrated for 1 ns by applying the above pressure and temperature simulation details whilst subjecting the protein atoms and zinc ions to a 1000 kJ/mol position restraint. An umbrella potential of 1000 kJ/mol in three dimensions was applied between the scissile hydroxyl group oxygen on the collagen peptide backbone and the catalytic zinc ion. Individual umbrella simulations were performed for 20 ns. A total of 380 ns of simulation was performed to adequately sample the reaction coordinate. Force constants were recorded every 2 ps, totalling 10,000 records per simulation. A potential of mean force (PMF) profile was constructed from the individual umbrella windows using the weighted histogram analysis method (WHAM) (Kumar, Rosenberg, Bouzida, Swendsen, & Kollman, 1992). A detailed description of the meta-dynamics and umbrella sampling methodology, including convergence of free energy, can be found in the Supplementary Information.

Derivation of partial charges
In order to accurately model the two zinc-binding domains, zinc and zinc-ligand partial charges were calculated. Two zinc-dependent coordination models were constructed from the crystal structure geometry. The first comprised residues 199 H, 203 H and 209 H in addition to the catalytic zinc and the second comprised residues 149 H, 164 H, 177 H and 151 D and the structural zinc. A single water molecule was added to the extracted zinc catalytic coordination model to make up a full coordination shell as would be present in vivo. Hydrogen atoms were added to the extracted crystal structure residues and partial backbones were terminated according to the R.E.D (RESP ESP charge Derived) Server instructions for restrained partial charge fitting. Partial charges were fitted to the Amber99SB-FF using PyR.E.D (Vanquelef et al., 2011). Geometries were optimised at the Hartree-Fock level of theory with a 6-31G* basis-set to keep in line with the philosophy of the Amber force field.

Model construction
Having defined the simulation parameters and derived partial charges for both zinc-binding sites, a MMP-1 collagen-bound protein complex was generated from crystal structure coordinates using the pdb2gmx program. Protonation of nitrogen epsilon or nitrogen delta of histidine sidechains was achieved at random to represent the almost equal propensity for protonation in experimental systems. The ab initio optimised electronic structure of both structural and catalytic zinc-binding domains was used to replace the respective residues in the newly generated atomistic structure. Bond distances and angles were constrained between zinc and their respective nitrogen and oxygen coordination donors. Force constants were taken from Tuccinardi et al. (2006). Partial charges of each backbone fragment and accompanying sidechain along with the zinc ion were replaced with the R.E.D. derived partial charges. The partial charge on heavy atoms of the main functional group per side chain ligand, in addition to the intermolecular distance between coordinating side chain atoms with each zinc ion, can be seen in Figure 2 (see Supplementary Information for a complete list of partial charges, including charges for hydrogen atoms and the amino acid backbone). The new partial charges over the coordinating zinc ligands and each zinc ion ensure the compatibility with computational techniques used to compute free energy values. The unit cell, adjusted to prevent periodic image artefacts, was fully solvated using TIP4P-ew water molecules enhanced for PME calculations (Horn et al., 2004). Calcium ions were added to yield a net neutral charge.
The protein complex and water environment were subjected to a steepest descent energy minimisation to correct for overlapping atoms. The maximum force of 1000 kJ/mol on any one atom was used as an energy minimisation stopping criteria. To ensure that the temperature of the system had converged to 294.5 K before applying the pressure-coupling scheme, a short 200 ps simulation using the constant number of particles, volume and temperature (NVT) ensemble was performed. During this short simulation, harmonic position restraints of 1000 kJ/mol were applied to all protein heavy atoms. The position restraints were maintained and the system was subjected to a 200 ps simulation at constant pressure using the Parrinello-Rahman pressure-coupling scheme (NPT ensemble). Position restraints were reduced to 100 kJ/mol for an additional 200 ps. A final 200 ps were added with the position restraints reduced to 10 kJ/mol. All position restraints were finally lifted and the system ran for 50 ns. During the unrestrained 50 ns, the pressure, temperature and total energy had all converged at a very early stage.

Results and discussion
The MMP-1 collagen complex To prepare the MMP-1 model, new parameters were defined for the catalytic and structural zinc ions along with their respective coordination donor residues. The ab initio calculations were performed with a water molecule present to act as the fourth coordination donor. It is understood that water coordinates with 200 E in the active site of the wild type, acting as a nucleophilic agent during the proteolytic reaction (Pelmenschikov & Siegbahn, 2002). It must be noted, that the protein sequence, 4AUO, used in this study contains the experimental substitution E200A.
After inserting the optimised zinc-coordination geometry into the structural and catalytic domains, the single catalytic water molecule was removed from the ab initio output. To do otherwise would have resulted in either a bonded water model that prevented an exchange of water ligands during the simulation, or a non-bonded model that could result in a unique water molecule escaping into the bulk. The small difference between the derived partial charge and the forcefield partial charge of water suggests that water native to the forcefield would suffice as a zinc ligand. The structural zinc ion along with ligands 164 H, 149 H, 177 H and 151 D showed very little deviation from the crystal structure.

Properties of the intermolecular association interface
An unrestrained production simulation was performed for 50 ns. There was little in the way of structural deviation of the MMP-1 production run from the original crystallographic structure (Figure 3(A)), nor did alteration to the secondary structure occur (see Supplementary Figure 1), indicating that the system had equilibrated. We therefore inferred that this duration was sufficient to explore the intermolecular associations between enzyme and substrate.
After equilibration, the polarity between interfacial residues within a 5 Å cut-off was evaluated. There was a consistent match in polarity at the interface between collagen and the HPX domain: 285 R, 289 F, 290 Y, 292 E, 294 E, 296 N, 297 F, 300 V, 301 F, 316 A, 317 D, 319 D, 334 G and 335 Q. Interestingly, 290 Y and 289 F protruded from the polar HPX domain in the shape of a small clasp, and gradually shifted across the interfacial region of the collagen. Besides this subtle rearrangement, the paired protein complex was remarkably stable. The polar interface of HPX was followed by a small number of non-polar HPX interfacial residues adjacent to the MMP-1 hinge linker region. These corresponded with the non-polar nature of the accompanying stretch of collagen turns.
The interfacial residues of the CAT domain found within 5 Å of collagen were: 85 (Figure 3(B)) at the top and bottom of the active site were composed predominantly of polar residues glutamine, glutamic acid, histidine, serine and asparagine. The collagen motif G-L-A, revealed to be the collagen cut site (Chung et al., 2004), remained within close proximity of the active site for the entirety of the simulation.
A thorough and verbose description of a network of hydrogen bonds between the MMP-1 and collagen interface is documented in the Supplementary Information. In summary, the simulated dynamics do not deviate far from the initial crystal structure, providing a stable configuration from which a set of free energy calculations can be derived with confidence.

The free energy of local collagen insertion into the active site
A 200 ns meta-dynamic MD simulation was performed using the final time step from the simulation described above. The absolute distance between the scissile bond and the catalytic zinc ion made up the reaction coordinate. It has been noted that at physiological temperature, the numerous non-polar residues surrounding the collagenase cleavage site of the polypeptide chain destabilise and become vulnerable to proteolysis (Fields, 1991;Leikina, Mertts, Kuznetsova, & Leikin, 2002). This made the proposed reaction coordinate an apt starting point for investigating collagen insertion and association with the MMP-1 active site prior to proteolytic activity. During the simulation, the unwinding of the scissile polypeptide chain from the remaining collagen protein was almost an immediate effect. The relaxation of the Nterminal collagen structure could have influenced the free energy surface (FES) landscape during the separation of a single polypeptide collagen chain. However, free energy calculations of collagen-I have yielded metastable states of unwound collagen local to the scissile bond (Nerenberg & Stultz, 2008) An increase in ΔFES by approximately 10.5 kJ/mol (2.509 kcal/mol) resulted in a peak from the initial non-inserted structure (Figure 4(A)) before a significant free energy descent by ΔFES of −44.7 kJ/mol (−10.686 kcal/mol) upon entry of a collagen polypeptide chain into the active site cavity ( Figure 5). This minimum resulted in coordination between the carbonyl group from glycine and the zinc at an approximate distance of 0.3 nm.
The inserted peptide chain was also shown to form a tightly packed association within the active site, notably through a match in polarity with a beta-sheet, a helix, and the accompanying random coils, which make up the rear of the active cavity.

568
A. Nash et al.
The meta-dynamics study was augmented by performing umbrella sampling using recovered frames that sampled the absolute distance between 0.38 and 1.5 nm from the scissile bond to the catalytic zinc ion. This range does not include a representation of the FES minimum 0.3 nm along the reaction coordinate, and it became clear early on that umbrella windows at that distance resulted in an exponential growth in the PMF profile as approaching atoms began to repel one another. Fourteen individual umbrella simulations were sufficient to sample the absolute distance-based reaction coordinate. Individual umbrella windows were simulated for 20 ns and the first nano-second was used as equilibration of the system and therefore disregarded from WHAM calculations. It is immediately obvious that the FES from the meta-dynam-ics analysis is very different from the final PMF profile (Figure 4(B)). It is reasonable to assume that the meta-dynamics simulation sampled degenerate energy states of unique structures at identical points along the single reaction coordinate. The MDUS combined approach though, would take into account all of these different conformations within the confines of the reaction coordinate during the construction of the PMF profile. The greatest level of ΔG deviation was approximately 5.65 kJ/mol (1.35 kcal/mol) as the collagen peptide was within 0.4 nm of the zinc ion. Further from the zinc ion, within the region of 1.2-1.45 nm, there is almost no deviation from the predicted ΔG. The increase in ΔG from the coiled collagen conformation, 1.4 nm along the reaction coordinate, towards the inserted collagen peptide configuration is , approximately 1.4 nm, along the reaction coordinate, respectively. The collagen peptide is in green ribbon; the surrounding structure of the CAT domain is represented in structural illustrations (sheets, helices and turns). E200A substitution has been identified, situated on the α-helix directly behind the coordinating histidine residues. Water within 8 Å of the zinc ion has been identified explicitly. indicative of a system approaching an energy barrier. Given that 200 E was substituted for 200 A during the experimental preparation of this crystal structure, any potential energy minima expected prior to the cleavage of the collagen chain may be missing. Furthermore, an increase in ΔG after the loss of favourable energetic contributions from the entwined collagen triple helix to a loosened collagen molecule model ( Figure 6) is expected. Such distortion to the collagen triple-helical structure will result in a clash in polarity as hydrophobic side chains associate with water molecules. In the PMF profile, there is a small inflection 0.61-0.71 nm along the reaction coordinate after a ΔΔG of approximately +21 kJ/mol (+5 kcal/mol) from the lowest minimum. Within this particular umbrella sampling configuration, the collagen peptide is inserted in the active site ( Figure 5(A)). Such a small, yet significant, difference in free energy may be indicative of a free collagen molecule undergoing an insertion event due to a fluctuation in the dynamics of the microenvironment, such as temperature (Fields, 1991;Leikina et al., 2002). A radial distribution function (RDF) was performed over the solvent from the scissile bond (see Figure 7(A)) of the non-inserted trajectory, 1.4 nm along the reaction coordinate. Two pronounced peaks, each indicative of a solvation sphere were observed. The first peak can be seen 0.8 nm from the scissile bond followed by the second 1.8 nm from the bond. Observations clearly show a separation of the active site from the collagen peptide by water. Given the occupation of the active site by the collagen peptide, it comes as little surprise that the trajectory pertaining to a position 0.65 nm along the reaction coordinate demonstrates little or no order from the scissile bond to bulk water. The RDF calculations of both trajectories yield an expectation of bulk water 3 nm from the bond, with absolute certainty of complete bulk water a further 1.5 nm away.
This study was concluded with a brief investigation into the stability of an unrestrained simulation of the inserted collagen peptide taken from the saddle points observed in the PMF profile. A set of 1000 kJ/mol restraints was applied to the heavy atoms of the complete protein complex and the system progressed for 10 ns. The restraints were then lifted from all but the α-carbon of the zinc-coordinating glycine and the system was extended by a further 10 ns using stored velocities from the previous time step. The two position restraints were reduced to 100 kJ/mol followed by 10 kJ/mol, per additional 10 ns of simulation time. Finally, position restraints were lifted altogether and the system was allowed to progress for 10 ns.
The average distance during the first 8 ns between the carbonyl oxygen of glycine and the zinc ion was approximately 6 Å (Figure 7(B)). The water molecules in the active site cavity were not shown to permeate between the carbonyl group and zinc during this time. During the final 2 ns, the distance was shown to rapidly increase by approximately 2 nm, rendering the interaction ineffective due to the short-range cut-off. Simultaneously, density in water molecules between glycine and the zinc ion also increased. Given the inactive enzymatic substitution of glutamic acid for alanine, there is a loss of the water bridge between the carbonyl groups on the backbone of collagen and the carboxylate functional group on the absent glutamic acid. However, it was still possible to capture a prolonged presence of the polypeptide chain within the active site cavity. There are understood limitations of this particular simulation. Firstly, there would be a discrepancy in partial charges during an attempt to donate the carbonyl group of glycine native to the forcefield to a zinc ion, which had been parameterised without the glycine present. Secondly, it is thought that the presence of a glutamic acid would also have an effect on the polarity of a neighbouring water molecule during the parameterising of the forcefield. We are confident, though, that the short-lived presence of the scissile bond carbonyl within the active site represents the coordination, if only for a brief moment in time, of the collagen backbone during the start of the collagenase reaction mechanism before adjusting to a favourable conformation, as suggested by the ΔG of collagen insertion.
Our results have indicated that there exists only a small positive free energy gradient from a non-inserted collagen to the inserted collagen state. Reflecting on to the three proposed mechanisms of insertion and cleavage, it is entirely possible that the collagen molecule local to the active site may unwind and insert, easily overcoming without assistance the free energy gradient, as proposed by Chung et al. (2004).

Conclusions
The force-field parameterisation and MD simulations of the MMP-1-collagen complex, 4AUO, yielded a stable conformation within the confines of the simulation time.
The meta-dynamics calculations suggest that the collagen sequence adjacent to the active site is amenable to unwinding, thus allowing insertion of a single polypeptide chain into the active site. However, free energy characterisation from umbrella sampling of the transition from non-inserted collagen to inserted collagen was energetically unfavourable. The energy barrier may be surmounted via a rare thermal fluctuation to a triple-coiled collagen molecule unwinding and inserting into the active site. Our final simulation of an unrestrained inserted collagen peptide, demonstrated a short-lived interaction between the catalytic active site and the collagen peptide carbonyl group before returning to a favourable conformation. A quantitative measure of the interaction strength between enzyme and substrate is beyond the scope of this study, although it will be the subject of future work. Our detailed documentation of the intermolecular associations between substrate and enzyme will aid the mapping of human type I collagen onto MMP-1 until a crystal structure has been calculated. Finally, our presented energetic calculations are fundamental in the elucidation of the interaction between MMPs and their substrates and will aid the rational design of de novo MMP-1 inhibitors.

Supplementary material
The supplementary material for this paper is available