Approaching the catalytic mechanism of protein lysine methyltransferases by biochemical and simulation techniques

Abstract Protein lysine methyltransferases (PKMTs) transfer up to three methyl groups to the side chains of lysine residues in proteins and fulfill important regulatory functions by controlling protein stability, localization and protein/protein interactions. The methylation reactions are highly regulated, and aberrant methylation of proteins is associated with several types of diseases including neurologic disorders, cardiovascular diseases, and various types of cancer. This review describes novel insights into the catalytic machinery of various PKMTs achieved by the combined application of biochemical experiments and simulation approaches during the last years, focusing on clinically relevant and well-studied enzymes of this group like DOT1L, SMYD1-3, SET7/9, G9a/GLP, SETD2, SUV420H2, NSD1/2, different MLLs and EZH2. Biochemical experiments have unraveled many mechanistic features of PKMTs concerning their substrate and product specificity, processivity and the effects of somatic mutations observed in PKMTs in cancer cells. Structural data additionally provided information about the substrate recognition, enzyme-substrate complex formation, and allowed for simulations of the substrate peptide interaction and mechanism of PKMTs with atomistic resolution by molecular dynamics and hybrid quantum mechanics/molecular mechanics methods. These simulation technologies uncovered important mechanistic details of the PKMT reaction mechanism including the processes responsible for the deprotonation of the target lysine residue, essential conformational changes of the PKMT upon substrate binding, but also rationalized regulatory principles like PKMT autoinhibition. Further developments are discussed that could bring us closer to a mechanistic understanding of catalysis of this important class of enzymes in the near future. 
The results described here illustrate the power of the investigation of enzyme mechanisms by the combined application of biochemical experiments and simulation technologies.


Introduction
The genetic information of every cell is encoded in the form of the base pair sequence in the DNA, but cellular differentiation is driven by differences in the expression of genes. Epigenetics describes the mechanisms of these often stable but still reversible changes in gene expression patterns that do not involve alterations in the DNA sequence (Allis and Jenuwein 2016). Chromatin is the complex of DNA and proteins that compacts the DNA within the nucleus of eukaryotic cells. Nucleosomes are the smallest structural unit of chromatin and consist of a stretch of about 147 base pairs of DNA wrapped around an octamer of histone proteins, containing two copies each of the core histones H2A, H2B, H3, and H4. Nucleosomes are connected by linker DNA segments, which vary in length and composition, forming a 'beads-on-a-string' structure. This linear array of nucleosomes further compacts the chromatin to form higher-order structures, such as chromatin fibers and chromosomes. DNA and protein methylation are among the most important biochemical reactions for influencing gene expression (Jambhekar et al. 2019; Chen and Zhang 2020; Millán-Zambrano et al. 2022). For instance, cytosine bases in DNA can be methylated by DNA methyltransferases (DNMTs) using the cofactor S-adenosyl-L-methionine (SAM) as a methyl group donor. SAM-dependent methylation also occurs in proteins or peptides and can be found at side chains of lysine (K), arginine (R), aspartate (D), glutamate (E), histidine (H), asparagine (N), glutamine (Q), and cysteine (C), as well as at N-terminal α-amino and C-terminal carboxylate residues (Clarke 2013). Histone tails are the flexible ends of the histone proteins sticking out from the histone octamer and are a key target for lysine and arginine methylation and other post-translational modifications (PTMs) like acetylation, phosphorylation, ubiquitination, and more (Bannister and Kouzarides 2011; Millán-Zambrano et al. 2022). The modified amino acids serve as a platform for the recruitment of proteins and protein complexes that interpret and regulate these modifications. Eventually a signaling cascade of downstream effects is triggered, influencing the activity of chromatin remodelers, which themselves alter the accessibility of DNA and thus gene transcription.
Enzymes catalyzing the transfer of methyl groups from SAM to histone tails or other proteins are called protein methyltransferases (PMTs). If the receiving amino acid is a lysine residue, they are referred to as protein lysine methyltransferases (PKMTs). This class of enzymes is the main focus of this review (Figure 1). For other PMTs with e.g. arginine or histidine as the methylation target, refer to other reviews (Clarke 2013; Jarrold and Davies 2019; Kwiatkowski and Drozak 2020; Jakobsson 2021a; Wu et al. 2023). The functional diversity of PKMTs is vast, with numerous members belonging to the different families of SET domain and non-SET domain, also called seven-beta strand (7BS), PKMTs, each characterized by defined structural features (Qian and Zhou 2006; Falnes et al. 2023). Many PKMTs methylate lysine residues in histone proteins (often the flexible N-terminal tails of histone H3 and H4), which play important roles in the regulation of chromatin (Husmann and Gozani 2019; Millán-Zambrano et al. 2022). However, the last decades of research have shown that lysine methylation is not restricted to histone proteins (Husmann and Gozani 2019), but is frequently observed in many other proteins, where it regulates protein stability, localization and protein/protein interactions, often in combination with other post-translational modifications (Clarke 2013; Biggar and Li 2015; Zhang et al. 2015; Cornett et al. 2019; Di Blasi et al. 2021). Some PKMTs are highly specific, targeting only defined lysine residues in one or few substrate proteins, while others exhibit broader substrate recognition capabilities. Additionally, PKMTs can function individually or as components of large protein complexes, adding another layer of complexity to their regulatory mechanisms. Based on their important physiological roles, PKMTs have been implicated in many diseases and they represent important and emerging drug targets (Copeland 2018; Bhat et al. 2021).
In this review, we specifically summarize studies addressing the catalytic mechanism of PKMTs using simulation technologies, mainly Molecular Dynamics (MD) simulations and Hybrid Quantum Mechanical/Molecular Mechanical (QM/MM) methods. Biochemical experiments matching the simulation experiments were used to complement the simulation approaches and their results. Relevant catalytic properties of PKMTs discussed in this review include: (i) the mechanism of target lysine deprotonation (a necessary precondition for its methylation); (ii) conformational changes upon cofactor and substrate binding; (iii) substrate specificity (the question which target lysine is methylated); (iv) product specificity (the question if mono-, di- or trimethylated lysine is generated); and (v) the assembly of PKMTs with other proteins to regulate their methylation activity or control substrate engagement. One goal of this review is to convey mechanistic principles of PKMT-catalyzed methylation reactions of different PKMT subfamilies. Clinically relevant PKMTs like DOT1L, SMYD1-3, SET7/9, G9a/GLP, SETD2, NSD1/2, MLL enzymes and EZH2 were prioritized, but the mechanisms presented are presumably transferable at least to structurally similar PKMTs not mentioned here specifically. Besides investigations addressing mechanistic principles of PKMTs, numerous modelling studies have been conducted to investigate PKMT inhibitors, which involved molecular docking, MD simulation, free energy calculation and virtual screening approaches. These studies are extremely valuable, especially in a pharmacological context. However, they were excluded from this review, because the approaches and systems used in these studies deal with artificially designed small molecules and/or inhibitors that cause non-natural behaviors, conformational changes and altered PKMT activities. Hence, the results of inhibition studies do not necessarily provide insights into the natural catalytic mechanisms of the investigated PKMTs. We therefore refer to other excellent reviews covering the interplay of various PKMTs with inhibitors using different simulation and modeling approaches (Luo 2015; Schapira 2016; Copeland 2018; López-López et al. 2020; Vougiouklakis et al. 2020; Feoli et al. 2022).

Simulation technologies
Simulation science is an emerging field of research that combines retrospective analysis and modeling methods with predictive simulation approaches. Modeling of proteins at atomistic resolution can be used to rationalize results of biochemical experiments and give novel insights into the underlying mechanisms like enzyme-substrate interactions, mechanistic consequences of cancer mutants, or conformational changes of flexible protein regions. Moreover, simulations possess an immense predictive capacity, enabling scientists to anticipate the outcomes of experiments and design them accordingly before conducting them physically. By recreating experimental conditions and subsequently changing parameters systematically, researchers can explore alternative experimental scenarios and test hypotheses rapidly. This promotes mechanistic understanding, reduces experimental costs, and accelerates the scientific process. The next two sections summarize the two main simulation technologies that have been applied to PKMTs in recent years.

Molecular dynamics simulations
At present, the Protein Data Bank (PDB) holds more than 206,000 experimentally solved structures, including more than 200,000 proteins (https://www.rcsb.org/, retrieved on June 26, 2023). These structures are extremely valuable for identifying architectures of enzymes, understanding enzyme mechanisms and protein interactions with other biomolecules, or as templates for homology modeling. However, despite their enormous utility, the structures stored in the PDB provide only a partial view of the 3D structure of proteins, because proteins are flexible entities, and dynamic conformational changes play key roles in their function. This is particularly true for enzymes and their catalytic function. For instance, the autoinhibitory loop of SET domain-containing PKMTs blocks their active sites, but opens up upon binding to the protein substrate (Yang et al. 2016; Sato et al. 2021; Schnee et al. 2022). While at least snapshots of this process are captured in crystal structures (PDB 5LSU for closed, 5V21 for open), the complete process involving the regulatory contacts between enzyme and substrate can only be investigated using simulation technologies. Adding to this, other processes involve regions that are so flexible that crystallization or cryo-EM techniques are incapable of capturing them. An example is the PKMT SETD2 (aka KMT3A, HYPB, SET2), which was found to have a flexible loop in the post-SET domain. This loop is unstructured in the apo enzyme, but forms a helix and interacts with the core of the enzyme after binding of a substrate (Yang et al. 2016; Schnee et al. 2022).

Figure 1. [...] The cofactor S-adenosyl-L-methionine (SAM) provides the methyl group. It is released after the transfer as S-adenosyl-L-homocysteine (SAH). B| The protein substrate (cyan) and SAM (orange, methyl group is colored black) bind at opposing sites of the SET domain (grey) in SET domain PKMTs. The target lysine (pink) is inserted into a narrow tunnel, where the lysine is deprotonated and oriented for the methyl group transfer (image created using simulation results of PDB 6VDB (Schuhmacher et al. 2020)). C| The methyl group is transferred using a bimolecular nucleophilic substitution (SN2) mechanism, in which multiple geometric criteria need to be fulfilled to reach the transition state.
Another limitation of resolved structures of macromolecules is the necessity for stable complex formation to enable structural analysis, which is specifically relevant for enzymes, since they catalyze reactions after binding their substrate and then release the product. Bound inhibitors can be used to circumvent this problem, but then critical information about catalysis is lost. For example, SET domain-containing PKMTs use a water channel to deprotonate the target lysine residue in the active site. This water channel is only visible if the cofactor SAM is bound and not if the cofactor product SAH binds in the cofactor binding pocket (Zhang and Bruice 2008a). However, in resolved structures SAH is mostly the cofactor present, since it had been used for crystallisation to avoid turnover. Alternatively, a substrate inhibitor, like a peptide with a K-to-M mutation of the target lysine, can be used, which binds stably to SET domain-containing PKMTs, since the methionine cannot be methylated (Fang et al. 2016; Yang et al. 2016; Zhang et al. 2017; Schuhmacher et al. 2020). However, important interactions between the missing target lysine and the PKMT active site are lost in this experimental setup.
Information from resolved structures provides a solid understanding of an enzyme's 3D structure, but in combination with subsequent modeling approaches a closer approximation to reality is possible. In the modeling methods, the resolved structure can be solvated in different buffers, like a physiological molarity of sodium chloride, but also in buffers containing different concentrations of detergents or denaturing agents, or the effect of different pH values can be simulated. Missing or flexible amino acids can be rebuilt, or an inhibitor can be exchanged for the real cofactor or substrate. Besides modifying resolved structures, homology modeling with e.g. AlphaFold2 (Jumper et al. 2021), PyMOD 3.0 (Janson and Paiardini 2021) or SWISS-MODEL (Waterhouse et al. 2018) has become a hallmark of modern protein science.
Another limitation of structural models is that they lack the dynamic behavior of proteins. This obstacle can be overcome by employing MD simulations, which apply forces to the atoms in the modelled system based on their interactions with each other. At the start of an MD simulation, each atom is given a random velocity, based on a temperature-dependent Maxwell-Boltzmann distribution (Meller 2001; Yu and Dalby 2020). Subsequently, the forces are applied and, by utilizing Newton's laws of motion, it becomes possible to determine the dynamical behavior and spatial arrangement of all atoms over time. This results in a sequential progression through time, iteratively applying forces to atoms and updating their positions and velocities. Ultimately, the outcome is a dynamic, three-dimensional, atomic-level resolution representation of a protein, e.g. the behavior of a PKMT in the presence of its substrate and SAM (Hospital et al. 2015; Hollingsworth and Dror 2018).
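The integration loop described above can be sketched in a few lines. The following is a minimal illustration and not the integrator of any specific MD package: it assumes a single particle on a one-dimensional harmonic "bond", uses the velocity Verlet scheme as one common way of applying Newton's laws of motion, and works in nm, ps, g/mol and kJ/mol. All function names and parameter values (mass, force constant, time step) are hypothetical choices made for this example.

```python
import math
import random

KB = 0.0083144626  # Boltzmann constant in kJ/(mol*K)


def maxwell_boltzmann_velocity(mass, temperature, rng):
    """Draw one velocity component (nm/ps) for a particle of mass `mass` (g/mol).

    Each Cartesian component of a Maxwell-Boltzmann distribution is Gaussian
    with zero mean and standard deviation sqrt(kB*T/m).
    """
    sigma = math.sqrt(KB * temperature / mass)
    return rng.gauss(0.0, sigma)


def harmonic_force(x, k=1000.0, x0=0.1):
    """Force (kJ/mol/nm) of a 1D harmonic bond; k and x0 are illustrative values."""
    return -k * (x - x0)


def total_energy(x, v, mass, k=1000.0, x0=0.1):
    """Kinetic plus potential energy (kJ/mol) of the 1D oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * (x - x0) ** 2


def velocity_verlet(x, v, mass, dt, n_steps, force=harmonic_force):
    """Propagate position and velocity for n_steps; returns the list of (x, v)."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # recompute force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory


if __name__ == "__main__":
    rng = random.Random(0)
    v0 = maxwell_boltzmann_velocity(mass=12.0, temperature=300.0, rng=rng)
    traj = velocity_verlet(x=0.11, v=v0, mass=12.0, dt=0.001, n_steps=1000)
    print("energy at start:", total_energy(*traj[0], 12.0))
    print("energy at end:  ", total_energy(*traj[-1], 12.0))
```

With a 1 fs time step the total energy of this toy system stays nearly constant, which is the stability criterion that limits the step size in real simulations.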
Figure 2. Molecular dynamics (MD) and hybrid quantum mechanics/molecular mechanics (QM/MM) simulations have been applied to investigate protein properties. A| In MD simulations, atoms are connected via spring-like bonds. The bonded forces comprise bond stretching, angle bending and dihedral torsion. Non-bonded interactions consist of electrostatic and van der Waals forces. Electrostatic forces are applied between fully or partially charged atoms. Van der Waals forces are described using the Lennard-Jones potential. B| In QM/MM simulations, the simulated system is divided into a QM region (blue transparent sphere) and an MM region. In the MM region, forces are calculated as for MD simulations using force fields. In the QM region, bonds can be broken or formed, and atoms are treated using quantum mechanics. In applications with PKMTs, the active site has been treated using QM methods to cover the methyl group transfer, the rest of the enzyme and bulk solvent was treated with MM (image created using simulation results of PDB 5V21 (Zhang et al. 2017)). C| Phylogenetic tree of SET domain-containing PKMTs. Modified from (Wu et al. 2010; Richon et al. 2011; Luo 2018).

To calculate the forces acting on each atom, a molecular mechanics force field is employed. This force field is derived and calibrated to fit the results of quantum mechanical calculations and often incorporates specific experimental parameters (Monticelli and Tieleman 2013). The forces can be categorized into two main types: bonded interactions and non-bonded interactions. Bonded interactions describe the chemical bonds between atoms by a bond potential, which encompasses stretching, bending, and torsional rotation. Non-bonded interactions involve van der Waals interactions, described by the Lennard-Jones potential accounting for both attractive and repulsive forces, and electrostatic forces between atoms which carry full or partial charges, modeled by a Coulomb potential and an appropriate dielectric constant (Figure 2(A)). Electrons are not explicitly treated in classical molecular dynamics force fields. Comparison of simulations with a variety of experimental data indicates that force fields have improved substantially over the years (Lindorff-Larsen et al. 2012; Lopes et al. 2015; Chmiela et al. 2018; Dasetty et al. 2019), but the uncertainty introduced by the approximations and small inaccuracies inherent in the force fields should be considered when interpreting simulation results. Moreover, in a classical MD simulation, no covalent bonds are formed or broken, meaning that chemical reactions cannot be directly represented. Hybrid quantum mechanics/molecular mechanics (QM/MM) simulations, in which a small part of the system is modeled using quantum mechanical calculations taking electron densities into account, and the remainder of the system is modeled by MD simulation, are frequently employed to study chemical reactions and enzyme catalysis that involve changes in covalent bonds. This method is described in the next chapter.
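To make the bonded and non-bonded terms named above more concrete, the sketch below evaluates a harmonic bond-stretching term, a 12-6 Lennard-Jones term and a Coulomb term for a single pair of atoms. It is an illustrative simplification under stated assumptions: the functional forms are the generic ones found in many classical force fields, the parameter values in the usage example are hypothetical, and real force fields additionally contain angle and dihedral terms, exclusions and cutoff treatments that are omitted here. Units are nm, kJ/mol and elementary charges.

```python
import math

# Electric conversion factor 1/(4*pi*eps0) in kJ mol^-1 nm e^-2
COULOMB_CONSTANT = 138.935458


def bond_energy(r, k, r0):
    """Harmonic bond stretching: 0.5 * k * (r - r0)^2."""
    return 0.5 * k * (r - r0) ** 2


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones potential: repulsive r^-12 and attractive r^-6 parts."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, qi, qj, dielectric=1.0):
    """Coulomb energy between two (partial) charges qi and qj at distance r."""
    return COULOMB_CONSTANT * qi * qj / (dielectric * r)


def nonbonded_energy(r, qi, qj, epsilon, sigma, dielectric=1.0):
    """Sum of van der Waals and electrostatic contributions for one atom pair."""
    return lennard_jones(r, epsilon, sigma) + coulomb(r, qi, qj, dielectric)


if __name__ == "__main__":
    # Hypothetical parameters chosen only to exercise the functions
    sigma, epsilon = 0.34, 0.4
    r_min = 2.0 ** (1.0 / 6.0) * sigma  # position of the Lennard-Jones minimum
    print("LJ energy at minimum:", lennard_jones(r_min, epsilon, sigma))
    print("pair energy at 0.4 nm:", nonbonded_energy(0.4, 0.5, -0.5, epsilon, sigma))
```

The Lennard-Jones minimum lies at r = 2^(1/6) σ with depth ε, which is a convenient sanity check for any implementation of this term.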
To ensure the stability of the molecular structures during the simulation, the time steps in an MD simulation must be short, typically only a few femtoseconds (10^-15 s) each. In contrast, conformational changes of enzymes take place on timescales of nanoseconds, microseconds, up to seconds or even minutes. Hence, billions of time steps are needed to capture the complete dynamics. In each step, all interatomic interactions in the system are calculated, which is computationally expensive. Over time, the continuous and almost exponential advancements in high performance computing power, especially the availability of powerful Graphics Processing Units (GPUs), have allowed for more efficient and longer simulations. For example, benchmark simulations with a 159-residue protein (DHFR) in explicit solvent (<24000 atoms) using OpenMM (Eastman and Pande 2010; Eastman et al. 2017) yielded >1 µs/day on an NVIDIA RTX 2080 Ti/A40/V100 system (https://openmm.org/benchmarks). To simulate even larger systems for longer times, coarse-grained MD simulations offer an alternative. Here, one artificial particle represents a group of atoms rather than a single atom. Thereby, the resolution decreases but accessible timescales increase by orders of magnitude (Marrink and Tieleman 2013). However, these approaches are purely based on empirical force fields, which need to be established beforehand and calibrated with experimental data. Recent reviews on this technique can be found in (Kmiecik et al. 2016; Joshi and Deshmukh 2020).
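The mismatch between femtosecond time steps and the timescales of conformational changes can be made concrete with a short calculation. The sketch below assumes a 2 fs time step and treats the simulation throughput as a free parameter; both numbers in the usage example are hypothetical and serve only to illustrate the orders of magnitude involved.

```python
def steps_required(simulated_time_s, timestep_fs=2.0):
    """Number of integration steps needed to cover simulated_time_s seconds."""
    return round(simulated_time_s / (timestep_fs * 1e-15))


def wall_clock_days(simulated_time_s, throughput_us_per_day):
    """Days of computing needed at a given throughput (microseconds per day)."""
    return simulated_time_s / (throughput_us_per_day * 1e-6)


if __name__ == "__main__":
    for label, t in [("1 microsecond", 1e-6), ("1 millisecond", 1e-3)]:
        print(label, "->", steps_required(t), "steps,",
              wall_clock_days(t, throughput_us_per_day=1.0), "days at 1 us/day")
```

Under these assumptions, one microsecond already requires half a billion steps, and one millisecond would occupy a single machine for years, which motivates the coarse-grained and enhanced sampling approaches mentioned in this review.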
Due to limited computational resources, it is necessary to restrict the system size in simulation experiments. Therefore, periodic boundary conditions are used, and the system of interest is confined in a box, assuming that the properties of the actual system can be approximated by a virtual infinite system of repeating side-by-side boxes. If a molecule passes through the initial box boundary, it will re-enter the box from the opposite boundary, forming a periodic space (Meller 2001; Yu and Dalby 2020). In summary, MD simulations are a helpful tool to reveal molecular processes at an atomistic resolution and can be used to

QM/MM simulations
Hybrid quantum mechanical/molecular mechanical (QM/MM) methods are simulation approaches combining quantum mechanical and molecular mechanical models in one simulation. For this purpose, the simulated system is divided into two regions (Figure 2(B)): the QM region includes the atoms that are treated using quantum mechanics, while the MM region consists of the surrounding environment described by a force field and simulated by MD methods. The QM region is typically small and localized around the specific chemical processes or properties of interest, such as the atoms involved in chemical reactions, or chromophores, if spectroscopic properties are studied. Quantum mechanical methods, such as density functional theory (DFT), are applied to this region to accurately describe the electronic structure and quantum effects. On the other hand, the MM region represents the larger rest of the protein surrounding the QM region, also including the environment like solvent molecules, ions, and other biomolecules. The MM region is treated using force fields, which describe the behavior of atoms based on bonded and non-bonded interactions as described for MD simulations, without forming or breaking covalent bonds. The QM and MM regions interact through electrostatic forces, van der Waals interactions, and sometimes covalent bonding. The boundary between the QM and MM regions is typically defined by a set of atoms called the 'QM/MM interface' or 'link atoms', which bridge the two regions. For a more detailed review focusing also on the physical, mathematical and methodological basis of hybrid QM/MM simulation approaches, we refer to (Senn and Thiel 2009; Groenhof 2013; van der Kamp and Mulholland 2013).
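The partitioning described above can be summarized compactly as an energy expression. The form below is the generic additive QM/MM scheme found in the methodological literature and is given here only as an illustration; the individual studies discussed in this review may use other variants (for example subtractive schemes), and the symbols are introduced for this sketch rather than taken from the cited works.

```latex
E_{\mathrm{total}} = E_{\mathrm{QM}}(\mathrm{QM}) + E_{\mathrm{MM}}(\mathrm{MM}) + E_{\mathrm{QM/MM}}(\mathrm{QM},\mathrm{MM}),
\qquad
E_{\mathrm{QM/MM}} = E_{\mathrm{elec}} + E_{\mathrm{vdW}} + E_{\mathrm{bonded}}
```

Here the first term is the quantum mechanical energy of the QM region, the second the force field energy of the MM region, and the coupling term collects the electrostatic, van der Waals and (where the boundary cuts covalent bonds) bonded interactions between the two regions.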
A drawback of QM/MM simulation approaches is the additional computational cost incurred for the high accuracy in the QM region. This goes hand in hand with a limited system size, timescale and sampling of the conformational space. Nevertheless, hybrid QM/MM methods provide a powerful tool for studying complex biological systems, as they combine the accuracy of quantum mechanics for the region of interest with the efficiency of classical mechanics for the surrounding environment. They have been widely applied to investigate chemical reactions, enzyme mechanisms or protein-ligand interactions (Bramley et al. 2023). More precisely, they can be used to investigate:

• Enzymatic reactions, by treating the reactive region of the system (e.g. the active site) at the quantum mechanical level, while the surrounding environment (e.g. solvent and the rest of the protein) is treated with molecular mechanics. This allows for an accurate description of the reaction mechanism and the influence of the surrounding environment on the reaction kinetics.

• Protein-ligand interactions, where the ligand can be treated quantum mechanically to accurately describe its electronic structure and reactivity, while the protein environment is described with molecular mechanics. This approach allows for the investigation of binding mechanisms, protein-ligand interactions, and the precise role of specific residues in ligand binding.

• Proton transfer reactions in biomolecules, such as enzymatic proton transfers or proton-coupled electron transfer reactions. These simulations can reveal the reaction mechanism for the investigated enzyme, protonation states of key residues, and the influence of the surrounding protein environment on the proton transfer process.

• Photoactive proteins, where QM/MM simulations can be used to study excited-state properties. Thereby, the precise mechanisms of light absorption, energy transfer, fluorescence and signal transduction can be uncovered.

Potential mean force calculations
Free energy calculations were frequently used in the studies described in this review to examine enzymatic reaction mechanisms. Their goal is to determine the free energy difference between reactants and products along a predefined reaction coordinate. Free energy calculations for enzyme reactions in QM/MM simulations typically involve the use of enhanced sampling techniques and the calculation of potential of mean force (PMF) profiles. At first, a suitable reaction coordinate needs to be defined that precisely describes the progress of the reaction. This could be an atom-to-atom distance, an angle, or any other collective variable that captures the essence of the reaction. The reaction coordinate should span from the reactant state to the product state. In the case of a methyl group transfer, the distances of the methyl group C-atom to the donor and acceptor atoms appear suitable to describe the reaction. For PMF calculations, QM/MM is the simulation method of choice, since covalent bonds might be broken or formed. The active site or the amino acids directly involved in the reaction are treated quantum mechanically, while the rest of the enzyme, solvent, and any other surroundings are treated with molecular mechanics. Since enzyme reactions often involve high energy barriers, enhanced sampling techniques are used to explore different regions of the reaction coordinate and enhance the sampling of rare events (Trzesniak et al. 2007). Methods like metadynamics, umbrella sampling, or adaptive biasing force are commonly employed to overcome energy barriers and enhance conformational sampling along the reaction coordinate (Park and Schulten 2004; You et al. 2019; Yoneda et al. 2021). By performing simulations at different stages of the reaction along the reaction coordinate and applying biasing potentials or forces, the PMF can be calculated. The PMF represents the free energy profile along the reaction coordinate, providing information about the free energy barriers and the relative stability of different states along the reaction pathway. The free energy difference between the reactant and product states is determined by integrating the PMF profile. This provides insights into the thermodynamics and kinetics of the enzyme reaction, including reaction energetics, transition state (TS) stability, and the barriers that need to be overcome. Different substrates, products and enzyme mutants can be compared for their ability to minimize the free energy using their catalytic machinery (You et al. 2019). However, the obtained parameters need to be compared to experimental data and used carefully in qualitative statements, e.g. by comparing the differences between two states. An example PMF profile is discussed and visualized in the section 'Analysis of the product specificity and processivity of PKMTs' and in Figure 5(A).
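The relation between sampling along a reaction coordinate and a free energy profile can be illustrated with a deliberately simplified sketch. It assumes the antisymmetric combination d(S-C) minus d(C-N) as the reaction coordinate for a methyl group transfer (one common choice, used here as an assumption) and converts a histogram of unbiased samples into a PMF by Boltzmann inversion, F = -kT ln P. This is not the procedure used in the studies cited above: umbrella sampling, metadynamics and related methods apply biasing potentials that must be removed again (for example by histogram reweighting) before a PMF is obtained. All function names and the histogram counts are hypothetical.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def reaction_coordinate(d_sulfur_carbon, d_carbon_nitrogen):
    """Antisymmetric stretch in Angstrom: negative for reactants, positive for products."""
    return d_sulfur_carbon - d_carbon_nitrogen


def pmf_from_histogram(counts, temperature=300.0):
    """Boltzmann inversion of an unbiased histogram; the minimum is shifted to zero.

    Bins without samples are assigned an infinite free energy.
    """
    total = sum(counts)
    kt = KB_KCAL * temperature
    free = [-kt * math.log(c / total) if c > 0 else math.inf for c in counts]
    f_min = min(free)
    return [f - f_min for f in free]


def barrier_and_reaction_free_energy(pmf, ts_index):
    """Return (barrier, reaction free energy) relative to the reactant minimum.

    Bins before ts_index are treated as reactant side, bins after it as product side.
    """
    reactant = min(pmf[:ts_index])
    product = min(pmf[ts_index + 1:])
    return pmf[ts_index] - reactant, product - reactant


if __name__ == "__main__":
    # Hypothetical bin counts along the reaction coordinate (reactant -> TS -> product)
    counts = [1000, 100, 10, 100, 10000]
    profile = pmf_from_histogram(counts)
    barrier, delta_f = barrier_and_reaction_free_energy(profile, ts_index=2)
    print("PMF (kcal/mol):", [round(f, 2) for f in profile])
    print("barrier:", round(barrier, 2), "reaction free energy:", round(delta_f, 2))
```

In this toy profile the barrier and the reaction free energy are both read off as differences between points on the PMF, which is also how different substrates or enzyme mutants would be compared.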
Firstly, the addition of methyl groups to lysine does not affect the overall charge of the residue at physiological pH, unlike acylation modifications that convert the positively charged ε-amine into a neutral amide. Secondly, lysine methylation represents the smallest PTM, resulting in minimal changes to the size of the side chain when compared to other types of lysine modifications (Luo 2018). Thirdly, up to three methyl groups can be transferred to a single target lysine, creating monomethyl lysine (Kme1), dimethyl lysine (Kme2) and trimethyl lysine (Kme3), and methylation can even be combined with other modifications as in the recently discovered acetyl-methyllysine marks (Lu-Culligan et al. 2023). Lysine (K), Kme1 and Kme2 possess lone-pair electrons on their ε-amine groups, which can be methylated. However, due to their high pKa values (10.2-10.7), the ε-amines of K, Kme1, and Kme2 predominantly exist in a protonated state under physiological conditions (pH 7.4), making them unreactive as nucleophiles. Therefore, one critical requirement of the lysine methylation reaction catalyzed by PKMT enzymes is to overcome the activation barrier of lysine deprotonation. At the next stage, the deprotonated lysine and SAM must be bound in a conformation that facilitates the subsequent nucleophilic substitution reaction leading to the transfer of the methyl group.
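How strongly the protonated state dominates can be estimated with the Henderson-Hasselbalch relation. The short sketch below assumes ideal solution behavior and uses the pKa range and pH quoted above; it ignores the shift in pKa that the enzyme environment can cause, and the function name is our own.

```python
def deprotonated_fraction(pka, ph=7.4):
    """Fraction of amine groups in the neutral (deprotonated) state.

    Henderson-Hasselbalch for a base: [B] / ([B] + [BH+]) = 1 / (1 + 10^(pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    for pka in (10.2, 10.7):
        fraction = deprotonated_fraction(pka)
        print(f"pKa {pka}: {100 * fraction:.3f} % deprotonated at pH 7.4")
```

Under these assumptions, only on the order of one lysine side chain in a thousand is deprotonated in bulk solution at pH 7.4, which illustrates why the enzyme must actively promote deprotonation.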
The human genome encodes over 60 characterized PKMTs, which can be categorized into two classes: SET domain-containing PKMTs (class V methyltransferases, denoted as SET due to the genetic phenotypes observed in Drosophila called Suppressor of variegation 3-9, Enhancer of zeste, and Trithorax) and "non-SET" 7BS MTases (class I methyltransferases) (Figure 2(C)) (Copeland et al. 2009; Richon et al. 2011; Luo 2012). The majority of characterized PKMTs belong to the SET domain family (approximately 90%), but several human 7BS PKMTs have already been investigated as well (Luo 2018). PKMTs can act in complexes with other proteins or contain domains to achieve binding to certain structures, recognition of specific modifications of the substrate protein or regulation of their own activity. This often includes domains interacting with modified histone tails, like chromodomains, bromodomains, Tudor domains, PHD domains or PWWP domains (Patel and Wang 2013). For the MLL (Mixed-Lineage Leukemia) complex PKMTs, also known as COMPASS (Complex Proteins Associated with Set1), it is necessary to associate with other proteins and establish important interactions with each other and the nucleosome to stimulate their methylation activity (Shilatifard 2012; Basta and Rauchman 2015; Park et al. 2019). It is therefore crucial to take caution when attributing their biological functions solely to the methyltransferase domain of PKMTs, as these domains may be heavily influenced by interaction partners in their activity and substrate targeting and can have biological functions even independent of their catalytic activities.

SET domain-containing PKMTs
The SET domain is responsible for the methylation activity of this family of PKMTs. It consists of approximately 130 amino acids and is often flanked by so-called pre-SET and post-SET domains (Qian C and Zhou 2006). Through phylogenetic analysis of SET domain sequences, human SET domain PKMTs can be further classified into subfamilies, each with its own characteristic structural arrangement (Wu et al. 2010). For example, G9a (aka EHMT2, KMT1C), SUV39H1 (aka KMT1A) and SUV39H2 (aka KMT1B) belong to the classical PKMT subfamily, where their SET domains alone are sufficient for catalyzing the methyl group transfer (Figure 3(A)) (Tachibana et al. 2001). On the other hand, SETD2, NSD1 (aka KMT3B), NSD2 (aka MMSET, WHSC1), and NSD3 (aka WHSC1L1) reside within a subfamily of PKMTs featuring an autoinhibitory loop (Figure 3(B)) (An et al. 2011; Yang et al. 2016; Bennett et al. 2017). In this case, the apo enzyme of the SET domain is expected to be catalytically inactive (or only weakly active) and needs to undergo conformational changes for substrate binding and enzyme activity. The SET domains of the MLL subfamily PKMTs are inactive on their own, but become catalytically active in the presence of binding partners such as WDR5, RbBP5, ASH2L, and DPY30, collectively referred to as WRAD (Figure 3(C)) (Cao F et al. 2014; Borkin et al. 2015; Grebien et al. 2015). Similar requirements for complex formation have been observed in EZH1 and EZH2 (aka KMT6) (EZH1/2, EED and Suz12, referred to as PRC2 complexes) (Kim W et al. 2013; He et al. 2017). The 17 members of the PRDM subfamily contain a PR domain that is similar to the SET domain and a variable number of zinc (Zn)-finger repeats (Di Zazzo et al. 2013). The five-member SMYD subfamily is characterized by an insertion of a MYND domain (Myeloid translocation protein 8, Nervy and DEAF-1) within their SET domain. The MYND domain is responsible for protein-protein interactions, possibly recruiting the enzymes to specific substrate proteins. SMYD enzymes are characterized by a bilobal architecture with the protein substrate in the middle (Figure 3(D)) (Saddic et al. 2010; Sirinupong et al. 2010; Ferguson et al. 2011; Sirinupong et al. 2011; Mazur et al. 2014; Mzoughi et al. 2016).

Figure 3. [...] (Jayaram et al. 2016)). SET domain-containing PKMTs incorporate Zn-ions for structural stability in their AWS (associated with SET, magenta), post-SET (yellow) or MYND (rose) domain depending on the enzyme (Dillon 2005; Wu et al. 2011). However, the Zn-ions are not involved in catalysis or conformational changes and they are not shown explicitly in protein structures presented in this review. B| SETD2 complexed with the H3K36 substrate peptide and cofactor SAM (PDB 5JLB (Yang et al. 2016)). The autoinhibitory loop (rose) is in an open position to accommodate the protein substrate. C| MLL1 SET domain (white) associated with WDR5 (green), RbBP5 (light blue), ASH2L (rose) and DPY30 (cyan) bound to a nucleosome core particle (PDB 6PWV (Park et al. 2019)). MLL1 SET domain complexed with the H3K4 peptide (cyan, with the target K in pink) (PDB 6UH5 (Hsu et al. 2019)). D| SMYD2 complexed with the ERα substrate peptide (PDB 4O6F (Jiang et al. 2014)) showing the bilobal or clamshell-like structure and the MYND domain. E| 7BS PKMT DOT1L and cofactor SAM (PDB 1NW3 (Min et al. 2003)). The architecture consists of a DOT1L-specific region (magenta), a seven-beta sheet Rossmann fold (white) and a ubiquitin interaction region (green).

Simulations of the catalytic mechanism of PKMTs
Key mechanistic features of PKMTs, like the SN2 reaction mechanism and the deprotonation of the lysine substrate, have been investigated by simulation technologies, and the results are discussed in the following sections.

Investigation of the SN2 reaction mechanism
QM/MM simulations of SET7/9 (aka SETD7, SET7, SET9, KMT7) were the first to investigate details of the bimolecular nucleophilic substitution (SN2) mechanism that PKMTs use to transfer a methyl group from the cofactor SAM to the ε-amino group of a lysine residue (Hu and Zhang 2006). The lysine nitrogen (Nε) is first deprotonated (as described later) and then acts as the nucleophile attacking the SAM methyl group, whereas the SAM sulfonium cation (S+) acts as the leaving group, converting SAM to SAH. The free electron pair of the deprotonated lysine nitrogen resides in an sp3 orbital at a 109° angle. The SN2 reaction occurs at an aliphatic sp3 carbon center (the C-atom of the transferred methyl group) with the electronegative sulfonium leaving group attached to it. The nucleophile attacks the carbon at a minimal distance of approximately 2.2 Å (Chen et al. 2019). Breaking of the C−S bond and formation of the new bond between the methyl C-atom and the nucleophile occur simultaneously through a trigonal bipyramidal TS in which the methyl C-atom is sp2 hybridized. The nucleophile attacks the methyl C-atom at a 180° angle to the leaving group, since this provides the best overlap between the nucleophile's lone pair and the C−S antibonding orbital. The leaving group is then pushed off at the opposite side and the product is formed with inversion of the tetrahedral geometry at the central carbon atom, which is irrelevant in the case of the achiral methyl group. The TS structure then collapses and the methyl group covalently binds to the nucleophile, the lysine Nε atom in the case of PKMTs, and SAH leaves as product (Copeland et al. 2009). It is important to note that, owing to the high group transfer potential of SAM, the methyl group is transferred rapidly to a deprotonated target lysine once a geometry favorable for the reaction has been achieved, thereby preventing reprotonation.
In conclusion, three geometric criteria can be used to describe a state that most closely resembles the TS of the methyl group transfer (Schnee et al. 2022; Khella et al. 2023):
• The distance between the lysine Nε and the SAM methyl C-atom is < 4.6 Å.
• The angle between the lysine Nε, the lysine Cδ and the SAM methyl C-atom is close to 109°.
• The angle between the lysine Nε, the SAM methyl C-atom and the SAM S-atom is close to 180°.
Later studies based on a combination of computational modeling, QM/MM and kinetic isotope effects showed that PKMT reactions can have different transition states depending on the enzyme and methylation substrate (Linscott et al. 2016; Poulin, Schneck, Matico, McDevitt, et al. 2016; Chen et al. 2019). For example, in the case of SET8 (aka Pr-SET7, SETD8, KMT5A) an early SN2 TS was shown with a short C−S distance (2.0 Å) and a long C−Nε distance (2.4 Å), while a late SN2 TS was observed for NSD2 with a long C−S distance (2.5 Å) and a short C−Nε distance (2.1 Å) (Figure 1(C)). These are interesting results indicating that although the lysine nucleophile, the transferred methyl group and the leaving group are identical, the nature of the TS differs between PKMTs.
Other studies used MD simulations to address the question of how the enzyme-substrate complex reaches a TS conformation. Since these simulations cannot account for bond breakage or formation, TS-like conformations were defined as conformations that fulfill the geometric SN2 reaction criteria. The frequency of formation of TS-like conformations was then monitored throughout the simulation with different substrates, enzyme mutants or conditions. In this way, MD simulations can estimate the likelihood of an SN2 reaction without simulating the actual methyl transfer. Of note, this approach can only provide qualitative statements, and relating the results to quantitative biochemical turnover numbers is not recommended (Schnee et al. 2022; Khella et al. 2023).
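The geometric criteria and their use for counting TS-like conformations can be illustrated with a minimal sketch. It is not the analysis code of the cited studies. The function names, the choice of Nε as the vertex of the 109° angle, the ±30° tolerance and the toy coordinates are assumptions made for illustration; only the 4.6 Å cutoff and the 109° and 180° targets come from the text.

```python
import math

def angle_deg(a, vertex, c):
    """Angle a-vertex-c in degrees."""
    v1 = [a[i] - vertex[i] for i in range(3)]
    v2 = [c[i] - vertex[i] for i in range(3)]
    dot = sum(p * q for p, q in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    cos_angle = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cos_angle))

def is_ts_like(c_delta, n_eps, c_methyl, s_sam, max_dist=4.6, tol=30.0):
    """Check the three geometric SN2 criteria for one frame.

    tol (degrees) is an assumed tolerance for 'close to'; the vertex of the
    109 degree angle is assumed to be the lysine N-epsilon.
    """
    d = math.dist(n_eps, c_methyl)                  # N-epsilon to methyl C
    attack = angle_deg(c_delta, n_eps, c_methyl)    # expected near 109
    inline = angle_deg(n_eps, c_methyl, s_sam)      # expected near 180
    return d < max_dist and abs(attack - 109.0) <= tol and abs(inline - 180.0) <= tol

def ts_like_fraction(frames):
    """Fraction of frames (tuples of four xyz coordinates) that are TS-like."""
    hits = sum(1 for frame in frames if is_ts_like(*frame))
    return hits / len(frames)

# Toy frames in Angstrom: (C-delta, N-epsilon, methyl C, SAM S); not real data.
good = ((-0.479, 1.390, 0.0), (0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (4.8, 0.0, 0.0))
far = ((-0.479, 1.390, 0.0), (0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (7.8, 0.0, 0.0))
bent = ((-0.479, 1.390, 0.0), (0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 1.8, 0.0))
print(ts_like_fraction([good, far, bent, good]))
```

In a real analysis, the coordinates would be read per frame from the trajectory, and the resulting fractions would be compared between substrates or enzyme variants in a qualitative manner only, as stated above.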
A combination of MALDI-TOF MS experiments, NMR analysis, QM/MM and MD simulations revealed that lysine has the optimal length and shape for the SN2 methyl group transfer in the active site of PKMTs. Shorter or longer alkane chains containing a terminal amine (ornithine and homolysine, respectively) lead to a disruption of the SN2 reaction geometry (Figure 4(A)) (Al Temimi et al. 2017). In MD simulations of SET7/9, the average distance between the SAM methyl C-atom and the substrate lysine Nε was 3.2 Å and the nucleophile and leaving group were in line, fulfilling the geometric 180° criterion for an SN2 reaction. For ornithine placed in the same substrate peptide, the average Nδ to methyl C-atom distance was 4.5 Å and the nucleophile and leaving group were not in line. This difference was also reflected in the free energy (PMF) profiles of the two methylation reactions, where exchanging lysine for ornithine raised the free energy barrier from 18.8 to 32 kcal/mol. It was speculated that the lysine nitrogen is stabilized and oriented in the active site in a productive conformation by surrounding residues, namely Y305 in the case of SET7/9. The stabilization and orientation were less efficient for the shorter ornithine (Al Temimi et al. 2017). Unnatural amine-containing N-nucleophiles with the same length as lysine (e.g. azalysine) showed methylation activity, whereas amide/guanidine-containing N-nucleophiles as well as simple O- and C-nucleophiles were not methylated by SET7/9 (Al Temimi et al. 2019a). Surprisingly, azalysine had a slightly lower free energy barrier compared to lysine in the SET8-catalyzed methylation reaction (18.0 and 19.4 kcal/mol, respectively). It was assumed that the Nε of azalysine is slightly more acidic than that of lysine and undergoes easier deprotonation in the PKMT active site. The nucleophilic character and the similar conformation of azalysine in the active site compared to lysine may therefore explain the observation that azalysine can in general undergo PKMT-catalyzed methylation to a similar degree as lysine (Al Temimi et al. 2019a). Slightly bulkier substrates like cyclopropyllysine were also accepted by SET7/9 as substrates. However, considerably bulkier substrates like meta-amino phenylalanine were not accommodated (Al Temimi et al. 2020). Apart from altering the size and shape of lysine mimics, one methylene group in the hydrophobic carbon side chain was substituted with a sulfur, resulting in the unnatural amino acid γ-thialysine. Enzyme kinetics and QM/MM free-energy calculations revealed that γ-thialysine and lysine exhibit comparable efficiencies in methyl transfer reactions, indicating that γ-thialysine is a good lysine mimic for PKMT catalysis (Simon et al. 2007; Al Temimi et al. 2019b).
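To give a feeling for what such differences in free energy barriers mean kinetically, the barriers can be converted into approximate rate constants with the Eyring equation of transition state theory, k = (kB·T/h)·exp(−ΔG‡/RT). The following sketch assumes a transmission coefficient of 1 and a temperature of 298.15 K, neither of which is specified in the cited studies, so the numbers are order-of-magnitude estimates only.

```python
import math

KB = 1.380649e-23         # Boltzmann constant, J/K
H = 6.62607015e-34        # Planck constant, J*s
R = 1.98720425864083e-3   # gas constant, kcal/(mol*K)

def eyring_rate(barrier_kcal_mol, temperature=298.15):
    """Rate constant (1/s) from a free energy barrier via the Eyring equation.

    Assumes a transmission coefficient of 1 (an idealization).
    """
    prefactor = KB * temperature / H
    return prefactor * math.exp(-barrier_kcal_mol / (R * temperature))

k_lys = eyring_rate(18.8)   # lysine barrier quoted in the text
k_orn = eyring_rate(32.0)   # ornithine barrier quoted in the text
print(f"k(Lys) ~ {k_lys:.2e} 1/s, k(Orn) ~ {k_orn:.2e} 1/s")
print(f"estimated rate ratio Lys/Orn ~ {k_lys / k_orn:.1e}")
```

Under these assumptions, the 13.2 kcal/mol higher barrier for ornithine corresponds to a rate reduction of roughly nine to ten orders of magnitude, whereas the 1.4 kcal/mol difference between azalysine and lysine changes the estimated rate by about one order of magnitude.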

Mechanism of target lysine deprotonation
An essential parameter of lysine methylation is the protonation state of the lysine Nε atom. Whereas under physiological conditions the lysine side chain is protonated, it needs to be deprotonated in the active site of PKMTs to become nucleophilic (Trievel et al. 2002). The first models trying to explain the deprotonation of the target lysine focused on a conserved tyrosine residue in the SET domain which might deprotonate the target lysine, e.g. Y335 in SET7/9 (Trievel et al. 2002; Kwon et al. 2003), Y283 in DIM-5 (Zhang et al. 2002), Y287 in Rubisco LSMT (Trievel et al. 2003), and Y336 in SET8 (Jacobs et al. 2002; Xiao et al. 2005). This hypothesis was ruled out based on the crystal structures of the PKMTs in complex with peptide substrates, which showed that Y335 donates a hydrogen bond to the backbone carbonyl oxygen of A295 and is too far away from the target lysine to perform the deprotonation function (Wilson et al. 2002; Zhang et al. 2002; Trievel et al. 2003; Xiao et al. 2003; Couture et al. 2005; Xiao et al. 2005). Later, Guo and coworkers suggested a refined model, in which Y335 donates a hydrogen bond to the backbone carbonyl oxygen of A295 when SAM is bound (Guo and Guo 2007). In simulations without bound cofactor, they observed that Y335 donates a hydrogen bond to a water molecule, suggesting that Y335 could be deprotonated in this environment. Subsequently, the deprotonated Y335 was suggested to deprotonate the target lysine. This mechanism was later challenged, as the calculated pKa of Y335 (>13) indicated that Y335 cannot act as the base in this process (Zhang and Bruice 2007b). Although Y335 could not be responsible for target lysine deprotonation, it was shown via NMR 1H chemical shift measurements coupled with quantum mechanics calculations that the Y335 hydroxyl group has another critical function in catalysis, because it helps to stabilize the SAM methyl group by a CH···O hydrogen bond and is therefore indispensable for SAM binding and stabilization of the partial positive charge on the methyl group C-atom developing in the TS (Horowitz et al. 2011; Horowitz et al. 2014).
Despite the lack of identification of a general base in the catalytic center of SET7/9, computational modeling, MD and QM/MM simulations by Zhang and Bruice revealed a conserved mechanism for the deprotonation of the target lysine in multiple SET domain-containing PKMTs (Zhang and Bruice 2007b). In this mechanism, the deprotonation of the Nε occurs through the transient formation of dynamic water channels in the enzymes' active sites. The residues involved in this water channel are G264, G292, A295, Y305, Y335 and Y337 in the case of SET7/9 (Hu and Zhang 2006; Zhang and Bruice 2008a, 2008b). A water molecule was frequently observed in the crystal structures of SET domain-containing PKMTs; however, it cannot act as the final proton acceptor by itself, since H3O+ is a much stronger acid than Kme1. Instead, the water molecule is suggested to transfer the proton through a chain of water molecules into the aqueous solvent and finally to a buffer molecule (Figure 4(B-D)). In addition, the electrostatic interactions between the positive charges of the SAM sulfonium moiety and the protonated Nε atom decrease the pKa of the latter from 10.9 to 8.2 (Zhang and Bruice 2007b). This could explain the weak methylation activity of PKMTs in acidic and even neutral buffers, as the necessary deprotonation of the target lysine is impeded. In a basic environment, the activity of PKMTs was observed to be increased. Biochemical studies showed that SET7/9 and DIM-5 are active at pH 8 or higher and have a pH optimum of ~10 (Wilson et al. 2002; Zhang et al. 2002). Based on this, Zhang and Bruice suggested a stepwise process (Zhang and Bruice 2007b), in which 1) the water channel appears, 2) the target lysine is deprotonated and the proton is transferred into the solvent, and 3) the target lysine is methylated using the methyl group provided by SAM. Of note, steps 2 and 3 can occur in a concerted manner. This concept was based on MD simulation results of several SET domain-containing PKMTs, in which the water channel appearance was tracked and correlated with the experimentally observed product specificity (Zhang and Bruice 2007a, 2007b, 2007c, 2008a, 2008b, 2008c). As an example, a water channel was observed for unmethylated and monomethylated lysine substrates in the Rubisco LSMT, but not for dimethylated lysine, explaining the biochemically observed dimethyltransferase activity of this enzyme. It is important to note that no water channel could be detected in the absence of cofactor, indicating that essential conformational changes occur upon SAM binding.
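The consequence of the pKa shift from 10.9 to 8.2 for the population of the nucleophilic, deprotonated lysine can be estimated with the Henderson-Hasselbalch relation. The sketch below treats the target lysine as a simple monoprotic acid in equilibrium with the bulk pH, which is a strong simplification of the situation in an enzyme active site; the pH values are chosen for illustration.

```python
def deprotonated_fraction(pka, ph):
    """Fraction of a monoprotic acid in its deprotonated form
    (Henderson-Hasselbalch relation)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))

for ph in (7.4, 8.2, 10.0):
    free = deprotonated_fraction(10.9, ph)    # unperturbed lysine pKa from the text
    bound = deprotonated_fraction(8.2, ph)    # pKa lowered by the SAM sulfonium
    print(f"pH {ph}: pKa 10.9 -> {free:.4f}, pKa 8.2 -> {bound:.4f}")
```

With these idealized assumptions, the lowered pKa increases the deprotonated fraction at near-neutral pH by more than two orders of magnitude, and the fraction rises further at basic pH, in line with the pH dependence of activity described above.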
Exceptions to this model need to be made for 7BS PKMTs like DOT1L (aka KIAA1814, KMT4). Here, the residues located at the target lysine binding pocket seem to be incapable of facilitating deprotonation or the formation of a water channel. It has been speculated that a more hydrophobic active site could reduce the pKa of the target lysine and that the carboxylate of SAM could help in the subsequent deprotonation process (Min et al. 2003; Cheng et al. 2005; Cortopassi et al. 2016).

Analysis of the product specificity and processivity of PKMTs
SET7/9 was initially identified as a histone H3 lysine 4 (H3K4) monomethyltransferase and it was one of the first model enzymes used to investigate the product specificity of PKMTs (Hu and Zhang 2006; Zhang and Bruice 2007b; Dhayalan et al. 2011). Product specificity refers to the capability of PKMTs to transfer a precise number of methyl groups to their target, either only one, or up to two or three methyl groups, creating Kme1, Kme2 or Kme3, respectively. Despite their similarity in the SET domain, PKMTs exhibit different product specificities. Distinct mechanisms are hence needed to control the number of methylation steps for each individual enzyme. In QM/MM simulation studies, SET7/9 was confirmed as a monomethyltransferase. The free energy barriers determined by PMF for the monomethyl transfer were 22.5 kcal/mol (Wang et al. 2007; Hu et al. 2008; Zhang and Bruice 2008b) or 17-19 kcal/mol (Guo and Guo 2007; Zhang and Bruice 2007c; Yao et al. 2012) depending on the study. Of note, the free energy barrier for a spontaneous methyl transfer in aqueous solution was determined to be 8 kcal/mol higher, illustrating the catalytic function of the enzyme (Wang et al. 2007). For further methylation of Kme1 substrates to Kme2, an increase in the free energy barrier of 3-5 kcal/mol was observed consistently across the studies. For the generation of Kme3 from Kme2, a further increase of 3 kcal/mol was observed (Figure 5(A)) (Guo and Guo 2007; Hu et al. 2008; Yao et al. 2012). Multiple explanations for the distinct product specificities of PKMTs have been proposed, which are discussed in the following sections.

Regulation of PKMT product specificity by deprotonation of the target lysine
PKMTs deprotonate the target lysine prior to the methyl transfer to create a suitable nucleophile. The deprotonation process could therefore be a regulatory step controlling the product specificity. As described above, Zhang and Bruice showed for multiple PKMTs that a chain of water molecules forms a water channel to deprotonate the lysine Nε (Zhang and Bruice 2007b). In the context of the product specificity of SET7/9, they proposed that the water channel only forms under defined conditions. In MD simulations of SET7/9, the water channel was only observed in the presence of SAM and unmethylated K4 peptide, but not without cofactor, with SAH or if K4me1 was bound. This could regulate the lysine deprotonation and stop the methylation process after monomethylation. Mechanistically, the methyl group of the monomethylated peptide takes the position of the proton that would be removed through the water channel. Deprotonation and further methylation of Kme1 are therefore impossible (Figure 5(B)) (Zhang and Bruice 2008b).
Later, ~10 times longer MD simulations found that SET7/9 can weakly dimethylate p53-K372, suggesting that there is a strong preference for monomethylation, but not an absolute limitation to the formation of only Kme1. Nevertheless, these studies confirmed the proposed mechanism, as a water channel was also found in the complex with monomethylated lysine (Bai et al. 2011). Methylation stopped at the dimethylated lysine because this, together with freshly bound SAM, disrupted the water channel. Methylation experiments with free histones indeed showed that SET7/9 can also act as a weak dimethyltransferase (Kwon et al. 2003). Furthermore, Dhayalan et al. showed that although SET7/9 was able to transfer two methyl groups to both histone and non-histone targets in vitro, it catalyzed dimethylation at much lower rates (~10% of the monomethylation rate) (Dhayalan et al. 2011). While lysine deprotonation might influence the product specificity, QM/MM studies suggest that deprotonation is not the rate-determining step in the methylation of an unmethylated lysine, as the calculated barrier for the proton transfer is 8.4 kcal/mol, which is more than 10 kcal/mol lower than that of the methyl transfer step (22.5 kcal/mol) (Hu et al. 2008).

Regulation of PKMT product specificity by steric constraints
PKMTs transfer one or multiple methyl group(s) from the cofactor SAM to a target lysine in a linear SN2 reaction mechanism (Trievel et al. 2002). The above-mentioned geometric requirements are crucial to form the transition state. As pointed out by Zhang and Bruice in MD and QM/MM simulations of SET7/9 complexed with the K4 or K4me1 peptide, the distance between the SAM sulfur atom and the lysine nitrogen was larger for K4me1 (6.1 Å) than for the unmethylated K4 (5.7 Å) (Zhang and Bruice 2007b). A similar observation was made by Hu and Zhang in QM/MM simulations of SET7/9 with a bound peptide substrate, where SET7/9 Y245 positioned the unmethylated lysine nitrogen for the SN2 reaction by hydrogen bonds (Hu and Zhang 2006). In simulations with the H3K4me1 peptide, Y245 precluded the rotation of K4me1, leading to a blockage of the active site, which prevented the adoption of a productive SN2 TS conformation (Figure 5(C)) (Hu and Zhang 2006; Yao et al. 2012). To support this model, the SET7/9 Y245A mutant was created in silico and tested for product specificity, showing that the activity towards unmethylated lysine decreased whereas di- and trimethylation activity was elevated (Hu and Zhang 2006; Yao et al. 2012). This was supported by biochemical experiments showing that the Km values of SET7/9 wt and SET7/9 Y245A for the unmethylated TAF10 peptide (K189 as target lysine) were only slightly different (wt 160 ± 17 µM, Y245A 200 ± 35 µM), whereas the kcat values differed strongly (wt 17 ± 0.6 min−1, Y245A 0.53 ± 0.04 min−1) (Del Rizzo et al. 2010). This indicates that Y245A does not alter peptide binding but rather the stabilization of the SN2 TS involving Kme0, Kme1 and Kme2.
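The relative contributions of Km and kcat to the loss of activity of Y245A on the unmethylated peptide can be made explicit by a short calculation using the mean values quoted above (Del Rizzo et al. 2010). The sketch ignores the reported experimental errors, and the variable names are chosen for illustration only.

```python
def catalytic_efficiency(kcat_per_min, km_micromolar):
    """kcat/Km in units of 1/(min * uM)."""
    return kcat_per_min / km_micromolar

# Mean values for the unmethylated TAF10 peptide as quoted in the text
wt = {"kcat": 17.0, "km": 160.0}
y245a = {"kcat": 0.53, "km": 200.0}

fold_km = y245a["km"] / wt["km"]          # change in apparent substrate affinity
fold_kcat = wt["kcat"] / y245a["kcat"]    # change in turnover
fold_eff = catalytic_efficiency(wt["kcat"], wt["km"]) / catalytic_efficiency(
    y245a["kcat"], y245a["km"]
)

print(f"Km increases {fold_km:.2f}-fold in Y245A")
print(f"kcat decreases {fold_kcat:.1f}-fold in Y245A")
print(f"kcat/Km decreases {fold_eff:.1f}-fold in Y245A")
```

The roughly 40-fold loss in catalytic efficiency is thus almost entirely due to the about 32-fold lower kcat, while Km changes only 1.25-fold.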
The space emptied in the active site pocket by the Y-to-A mutation allowed for accommodation and proper orientation of the methylated lysine residues. These simulations are in agreement with results of previous structural and biochemical experiments, which showed that the Y245A mutation converts SET7/9 into a trimethyltransferase with weak monomethyltransferase activity (Xiao et al. 2003). Based on this, the Y245A mutation was postulated to function as a switch between mono- and up to trimethylation activity, which was later confirmed by crystal structures showing the bound di- and trimethylated TAF10 (TATA box binding protein associated factor) peptide in the SET7/9 Y245A active site (Del Rizzo et al. 2010). Additionally, QM/MM simulations of SET7/9 Y245A showed that the free energy barrier was similar for mono-, di- and trimethylation (Yao et al. 2012), indicating that the reaction could proceed similarly once a productive conformation is reached.
Figure 5 (continued). B| Restricted second methylation because of a disrupted water channel and blocked lysine deprotonation of the monomethylated target lysine (green, sticks, PDB 1XQH). C| The SN2 TS cannot be adopted if a monomethylated substrate is present. D| The F/Y-switch position controls the product specificity of certain PKMTs. Phenylalanine (white, sticks) provides more space in the active site, allowing accommodation of a dimethylated product (pink, sticks), while the additional hydroxyl group of a tyrosine leads to steric clashes preventing formation of the dimethylated product. E| Position of the tyrosine residues 245, 305 and 335 in the SET7/9 active site discussed in the main text.
A similar observation regarding the critical orientation of the target lysine was made for PRDM9 (aka PMF6) Y276F, where Y276 structurally overlaps with SET7/9 Y245. Here, QM/MM simulations found that the energy barrier of Y276F increased for the first methyl transfer compared to the wt (Chu et al. 2015), which was confirmed by biochemical experiments (Wu et al. 2013). This effect is potentially caused by the loss of the tyrosine hydroxyl group, which forms hydrogen bonds with the Nε atom of the target lysine residue and helps to stabilize the SN2 TS (Hu and Zhang 2006; Yao et al. 2012). This coordination is missing in the phenylalanine mutant, accounting for the lower TS stabilization.

Regulation of PKMT product specificity by F/Y switches
In multiple sequence alignments, it was found that PKMTs that possess a tyrosine at a so-called F/Y switch position are limited to catalyzing mono- or dimethylation, whereas enzymes that possess a phenylalanine or another hydrophobic residue at this position display di- or trimethyltransferase activity (Collins et al. 2005). This observation was validated and rationalized in a variety of MD and QM/MM simulations as well as biochemical and structural experiments. In many examples, the exchange of that specific tyrosine or phenylalanine for the other amino acid led to changes in product specificity (Trievel et al. 2003; Zhang et al. 2003; Collins et al. 2005; Qian C and Zhou 2006; Wu et al. 2010; Cortopassi et al. 2016; DiFiore et al. 2020). Swaps in product specificity were observed with several SET domain-containing PKMTs, e.g. the trimethyltransferase DIM-5 can be converted into a mono/dimethyltransferase by the F281Y mutation (Zhang et al. 2003). The monomethyltransferases SET7/9 and SET8 can be changed to dimethyltransferases through the corresponding Y-to-F mutation, Y305F for SET7/9 (Zhang et al. 2003; Del Rizzo et al. 2010) and Y334F for SET8 (Couture et al. 2005; Couture et al. 2008; Chu et al. 2010). In MLL3 (aka KMT2C), the Y4884C mutation, which has been observed as a somatic mutation in cancer cells, was shown to convert the enzyme from a monomethyltransferase with substrate preference for H3K4me0 to a trimethyltransferase with H3K4me1 as the preferred substrate (Weirich et al. 2015). Hence, Y4884C can be considered a natural mutant using the F/Y switch mechanism for the control of the product specificity of a PKMT. The F/Y switch might not be fixed to a defined position, as more than one position can function in this way. The dimethyltransferase G9a could be turned into a trimethyltransferase by the Y1067F mutation (Wu et al. 2010), but into a monomethyltransferase by F1152Y, indicating that two susceptible positions exist (Collins et al. 2005). Likewise, the dimethyltransferase GLP (G9a-like protein) could be converted into a monomethyltransferase by F1209Y (Collins et al. 2005) and into a trimethyltransferase by Y1124F (Wu et al. 2010). However, it is important to note that not every tyrosine residue surrounding the active site can be mutated to phenylalanine to alter the product specificity, since SET7/9 Y245F is catalytically inactive (Xiao et al. 2003).
The mechanistic basis of the F/Y switch relies solely on the presence of one single hydroxyl group. The missing hydroxyl group in the Y-to-F mutants creates additional space in the active site. This facilitates the accommodation of water molecules and thereby the deprotonation of the target lysine. In crystal structures of SET7/9 Y305F with the TAF10me1 substrate peptide, water molecules were found in the active site. This was not the case in corresponding structures of wt SET7/9 (Del Rizzo et al. 2010). Additionally, MD and QM/MM simulations of SET8 and its F/Y switch mutant Y334F showed a lowered energy barrier for the dimethylation reaction for the Y334F mutant and a connection to the water channel (Chu et al. 2010; Chu et al. 2012). In contrast, the effect of F-to-Y mutations, which turn trimethyltransferases into mono/dimethyltransferases, could be based on steric effects of the additional hydroxyl group making the active site too narrow to accommodate multiple methyl groups at the lysine Nε (Figure 5(D)) (Hu and Zhang 2006; Chu et al. 2012). The concept of active site volume-dependent product specificity changes has also been illustrated for other mutations like the NSD2 T1150A cancer mutant, which turns NSD2 from a di- into a trimethyltransferase (Khella et al. 2023).

Investigation of the processivity of PKMTs
Enzymes that perform multiple rounds of catalysis on a macromolecular substrate can do so in two different modes of action: in a distributive mechanism, each round of catalysis results in product dissociation and rebinding of a fresh substrate, whereas in a processive mechanism, multiple rounds of catalysis proceed on the same substrate before dissociation of the product. PKMTs using a distributive mechanism dissociate from the substrate protein after methylation of the target lysine residue. Subsequent catalytic turnovers require rebinding of the enzyme to the protein substrate. This also involves dissociation of SAH and rebinding of a new SAM molecule to provide the next methyl group. Each methylation event is generally independent, leading to the stochastic generation of Kme1, Kme2 and Kme3, depending on the product specificity of the PKMT. In contrast, processive PKMTs remain associated with their substrate protein and the target lysine residue remains bound in the active site pocket for multiple consecutive catalytic turnovers without dissociation. This allows them to methylate the same lysine residue multiple times without repeated dissociation and binding steps, leading to the direct generation of higher methylation states. Still, after the first methyl transfer, SAH has to be replaced by a fresh SAM, indicating that a SAM/SAH exchange pathway must exist in the presence of the substrate protein.
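The different product distributions expected for the two modes of action can be illustrated with a deliberately simple kinetic toy model of a dimethyltransferase. It is not taken from the cited literature. The model assumes first-order kinetics with a single rate constant k for both methyl transfers, an instantaneous second transfer whenever the enzyme stays bound, and a parameter `processivity` (a hypothetical name) giving the probability that Kme1 is methylated again without being released.

```python
def simulate_methylation(processivity, k=1.0, dt=0.01, t_end=10.0):
    """Euler integration of a toy Kme0 -> Kme1 -> Kme2 scheme.

    processivity = 0.0: fully distributive (every Kme1 is released)
    processivity = 1.0: fully processive (Kme1 is never released)
    Returns final fractions of Kme0, Kme1, Kme2 and the peak free Kme1 level.
    """
    me0, me1, me2 = 1.0, 0.0, 0.0
    peak_me1 = 0.0
    for _ in range(int(round(t_end / dt))):
        first = k * me0 * dt    # substrate receiving its first methyl group
        second = k * me1 * dt   # released Kme1 that is rebound and methylated
        me0 -= first
        me1 += (1.0 - processivity) * first - second
        me2 += processivity * first + second
        peak_me1 = max(peak_me1, me1)
    return me0, me1, me2, peak_me1

for p in (0.0, 0.5, 1.0):
    me0, me1, me2, peak = simulate_methylation(p)
    print(f"processivity {p}: final Kme2 = {me2:.3f}, peak free Kme1 = {peak:.3f}")
```

In this idealized picture, a distributive enzyme transiently accumulates a substantial pool of free Kme1, whereas a fully processive enzyme converts Kme0 directly into Kme2. This qualitative difference is what time-resolved mass spectrometry or competition experiments aim to detect.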
Biochemical data showed that PKMTs use both mechanisms. Several SET domain-containing PKMTs and also PRMTs were shown to perform multiple rounds of methylation in a processive mechanism (Cheng et al. 2005; Smith BC and Denu 2009; van Aller et al. 2016). Kinetic studies revealed that G9a catalyzes the transfer of two methyl groups to H3K9 in a processive manner (Poulard et al. 2021), which was confirmed in competition methylation experiments (Patnaik et al. 2004). A similar behavior was observed for the H3K9 methyltransferase DIM-5 (Zhang et al. 2003), SMYD3 (aka ZMYND1) (van Aller et al. 2016) and the H3K36 methyltransferase NSD1 (Khella et al. 2023). In MALDI mass spectrometry experiments with NSD1 starting with the unmethylated H3K36 peptide, a clear peak of the H3K36me2 product was observed. In contrast, much lower methylation was detectable when the reaction was started under the same conditions with the H3K36me1 substrate, indicating that H3K36me0 is the preferred substrate and that two methyl groups were transferred in a processive reaction mechanism.
However, attention needs to be paid to details, and the general assumption that SET domain PKMTs always act in a processive manner is incorrect. A distributive methylation mechanism was observed, for example, for SET domain monomethyltransferases carrying mutations that allowed them to exceed their monomethylation capability. SET8 Y334F was converted into a dimethyltransferase but exhibited a distributive reaction mechanism (Couture et al. 2008). This effect was also observed for MLL1 Y3924F (Patel et al. 2009), SET7/9 Y305F and Y245A (Del Rizzo et al. 2010). It is plausible that many canonical di- and trimethyltransferases catalyze multiple methyl group transfers in a processive manner to avoid release of the monomethylated intermediates, which in the case of histone methylation often transmit different biological signals than the di- or trimethylated products. Monomethyltransferases naturally have to release their substrate after the methyl group transfer. This mechanism seems to be retained if monomethyltransferases are artificially converted into di- or trimethyltransferases, which then keep the distributive reaction mechanism. One reason for this could be that the active sites of monomethyltransferases have not been evolutionarily optimized for multiple turnovers, e.g. by not allowing cofactor exchange while the protein substrate is still bound. The mutation allows for higher methylation levels in these enzymes, but the catalytic mechanism remains unchanged. An exception to this postulate might be the Drosophila Su(var)3-9, which was shown to transfer two methyl groups to H3K9 in a non-processive manner although it represents the natural enzyme without any mutation (Eskeland et al. 2004). In contrast to the SET domain-containing PKMTs, DOT1L has been shown to perform multiple rounds of H3K79 methylation through a distributive mechanism (Frederiks et al. 2008).
Rationalizing why a certain PKMT or mutated PKMT operates in a processive or distributive manner is still difficult in many cases (Zhao et al. 2022). MD simulation appears to be a suitable method to unravel this mechanism. Possible MD approaches to investigate the catalytic mechanism could involve the analysis of conformational changes and active site dynamics. Comparing the active site dynamics and interactions between a processive and a distributive PKMT could give information about conserved mechanisms. In particular, in a processive mechanism, the active site needs to undergo substantial conformational changes and rearrangements to first accommodate the transferred methyl group and subsequently position the target lysine correctly for the next round of catalysis without substrate dissociation. This mechanism also needs to include the requirement for cofactor exchange. These potential rearrangement mechanisms would not be required in distributive PKMTs.
In a distributive mechanism, enzyme, product and cofactor dissociate after catalysis. A repulsive interaction between the product and active site residues after the methyl group transfer might therefore be detectable using MD simulation techniques. In an alternative approach, the rebinding of methylated substrates could be probed. Distributive PKMTs need to bind methylated substrates, in contrast to processive PKMTs. In such an experiment, adaptive sampling techniques like steered MD simulation (sMD), Monte Carlo sampling or metadynamics could be used to observe differences between the substrate binding of distributive and processive PKMTs. In the sMD approach, external forces are used to guide bimolecular association processes. Thereby, reactions that otherwise would be too slow to be modelled on the timescale accessible to MD simulations are accelerated and at the same time the conformational sampling is concentrated along a specific, predefined reaction coordinate (Yang T et al. 2019). Here, it could be studied whether a methylated substrate can be bound efficiently by the investigated PKMT. Distributive PKMTs might show mechanisms or conformational changes to productively bind the methylated substrate, whereas processive PKMTs should not. In metadynamics, the sampling of rare events is enhanced by adding a bias to the potential energy surface. The method builds up a history-dependent bias potential to push the system away from previously visited regions, allowing exploration of new conformational spaces and events (Bussi and Laio 2020). For processive PKMTs, the binding of a methylated substrate in the active site will be less likely compared to distributive PKMTs. This would be reflected in a higher energy needed to visit these states.
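The principle of a history-dependent bias can be demonstrated on a one-dimensional toy system. This is a minimal sketch, not a protocol for PKMT simulations. The double-well potential, the overdamped Langevin dynamics in reduced units, and all parameters (barrier of 10 kT, Gaussian hill height, width and deposition stride) are assumptions chosen for illustration. Production work would rely on an established MD engine with a metadynamics implementation and a carefully chosen collective variable.

```python
import math
import random

def double_well_force(x, barrier=10.0):
    """Force for V(x) = barrier * (x^2 - 1)^2, with minima at x = -1 and x = +1."""
    return -4.0 * barrier * x * (x * x - 1.0)

def bias_force(x, centers, height, width):
    """Force from the sum of Gaussian hills deposited so far."""
    force = 0.0
    for c in centers:
        dx = x - c
        force += height * dx / (width * width) * math.exp(-0.5 * dx * dx / (width * width))
    return force

def run(n_steps, use_bias, seed=0, dt=1e-3, kT=1.0, height=0.5, width=0.2, stride=100):
    """Overdamped Langevin dynamics starting in the left well.

    Returns the largest x visited and the number of deposited hills.
    """
    rng = random.Random(seed)
    x = -1.0
    centers = []
    max_x = x
    for step in range(n_steps):
        force = double_well_force(x)
        if use_bias:
            if step % stride == 0:
                centers.append(x)   # deposit a hill at the current position
            force += bias_force(x, centers, height, width)
        x += force * dt + math.sqrt(2.0 * kT * dt) * rng.gauss(0.0, 1.0)
        max_x = max(max_x, x)
    return max_x, len(centers)

biased_max, n_hills = run(20000, use_bias=True)
unbiased_max, _ = run(20000, use_bias=False)
print(f"with bias: max x = {biased_max:.2f} after {n_hills} hills")
print(f"without bias: max x = {unbiased_max:.2f}")
```

In the biased run, the deposited hills gradually fill the starting well until the barrier is crossed, whereas an unbiased run of the same length will typically remain in the left well. In full metadynamics, the accumulated bias additionally provides an estimate of the underlying free energy profile, which is the quantity that would differ between methylated substrate binding to processive and distributive PKMTs.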

Mechanism of substrate specificity of PKMTs
Given their biological functions, it is essential that PKMTs methylate specific lysine residues on histone tails and other proteins. Mistakes in the substrate choice would lead to aberrant methylation signals, which could result in misregulation of chromatin states or protein activity. A highly specific recognition of the protein substrate by PKMTs is therefore indispensable. In this context, multiple research opportunities arise. First, approximately 60% of the SET domain-containing proteins in humans have well-documented methylation activity on histone and/or non-histone proteins (Husmann and Gozani 2019), while the activity of the remaining ones remains to be discovered. Moreover, approximately 150 7BS proteins exist in humans, which methylate various targets including lysine residues (Bhat et al. 2021). Among them, some PKMTs have been discovered, but more enzymes of this class may be identified in future work. As soon as an initial methylation activity of a potential PKMT is detected, deciphering the substrate specificity can directly hint towards its natural methylation targets and potential biological roles (Kudithipudi and Jeltsch 2016). Second, different PKMTs might recognize their target amino acid sequence by a variety of recognition processes. Learning about the individual mechanisms directly benefits further studies, e.g. specific drug design, in which molecules can be tailored towards the PKMT amino acids involved in substrate recognition.

Analysis of the specificity of PKMTs by SPOT peptide array methylation
A highly suitable approach to decipher the substrate specificity of PKMTs is to investigate the methylation of CelluSpots peptide arrays (Bock et al. 2011; Weirich and Jeltsch 2022). In this technology, peptides are synthesized on a cellulose membrane by solid-phase peptide synthesis. The advantage of this approach is that numerous peptide spots can be synthesized on one membrane, each spot containing an individual peptide sequence (Figure 6(A)). The membrane is then incubated with the PKMT of interest and radioactively labeled SAM, allowing methylation to be detected by autoradiography. The signal intensity of the different spots then directly indicates which substrate peptides are preferred and which are disfavored by the PKMT. When peptide arrays are created that contain all possible single amino acid exchanges of the original substrate sequence, a PKMT-specific substrate specificity profile can be derived (Figure 6(B)) (Rathert, Dhayalan, Ma, et al. 2008; Dhayalan et al. 2011; Kudithipudi, Kusevic, et al. 2014; Kudithipudi, Lungu, et al. 2014; Schuhmacher et al. 2015; Weirich et al. 2016; Kusevic et al. 2017). With the obtained specificity profile as a search sequence, novel protein substrate candidates have been identified (Rathert, Dhayalan, Murakami, et al. 2008; Dhayalan et al. 2011; Schuhmacher et al. 2020; Weirich et al. 2020). During the investigation of the substrate specificity of the PKMT SETD2, it was found that the canonical H3K36 substrate sequence was not optimal for activity (Schuhmacher et al. 2020). Surprisingly, multiple single amino acid exchanges caused a higher methylation of the peptide substrates. By combining the amino acids found to be preferred, a non-natural peptide sequence referred to as 'super-substrate K36 (ssK36)' was created. This ssK36 peptide differed at 4 positions from the canonical H3 peptide sequence and was methylated about 100 times more efficiently (Figure 6(C)).
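The step from spot intensities to a specificity profile, and from the profile to a candidate sequence that combines the preferred residues, can be sketched as follows. This is a minimal illustration and not the analysis pipeline of the cited studies: the function names, the data layout and the four-residue peptide with its signal values are hypothetical, and simply combining individually preferred residues assumes that their effects are roughly additive.

```python
def relative_activity(signals, wt_signal):
    """Normalize each spot signal to the signal of the unmodified peptide.

    signals maps (position, amino_acid) -> intensity of the peptide that
    carries this single exchange (hypothetical data layout).
    """
    return {key: value / wt_signal for key, value in signals.items()}


def combine_preferred(profile, wt_sequence, threshold=1.0):
    """At each position keep the wild-type residue unless an exchange is
    methylated more strongly than the wild type (relative activity above
    the threshold); in that case take the best exchange."""
    best = list(wt_sequence)
    best_score = [threshold] * len(wt_sequence)
    for (position, residue), score in profile.items():
        if score > best_score[position]:
            best[position] = residue
            best_score[position] = score
    return "".join(best)


# Hypothetical toy peptide "AKGS" with the target lysine at index 1.
wt_sequence = "AKGS"
wt_signal = 100.0
signals = {
    (0, "R"): 300.0,  # preferred over wild-type A
    (0, "D"): 10.0,   # disfavored
    (2, "F"): 150.0,  # preferred over wild-type G
    (3, "P"): 40.0,   # disfavored
}
profile = relative_activity(signals, wt_signal)
candidate = combine_preferred(profile, wt_sequence)
```

For this toy input, `candidate` is `"RKFS"`: positions 0 and 2 take the preferred exchanges, while the target lysine and position 3 stay unchanged.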
In the case of SUV420H2, which specifically methylates H4K20me1 (but not H4K20me0) to H4K20me2/3, MD and QM/MM simulations showed that specific features in the active site are responsible for the particular preference of this enzyme for a monomethylated target (Qian P et al. 2017). The relatively low free energy barrier for dimethylation (17.9 kcal/mol) compared to that for monomethylation (23.9 kcal/mol) indicates a more effective TS stabilization with the monomethylated substrate. This is achieved through the strengthening of interactions occurring in the TS, such as the CH···O interactions involving the K20me1 methyl group and SUV420H2 residues F160, S161, A179, I181 and Y217, and the presence of a cation-π interaction between F191 and the methyl group in the TS of the reaction (Qian P et al. 2017). The free energy barrier for trimethylation (23 kcal/mol) is then almost at the level of that for monomethylation, indicating the preferential generation of H4K20me2.
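To give a feeling for what these barrier differences mean kinetically, the free energy barriers quoted above can be converted into approximate relative rates with the Eyring equation of transition state theory. The sketch below is an illustration added here, not a calculation from the cited study: it assumes a transmission coefficient of 1 and a temperature of 298.15 K, and computed barriers carry uncertainties that translate into large uncertainties of the rates.

```python
import math

GAS_CONSTANT = 1.987204e-3   # kcal/(mol*K)
BOLTZMANN = 1.380649e-23     # J/K
PLANCK = 6.62607015e-34      # J*s


def eyring_rate(barrier_kcal_per_mol, temperature=298.15):
    """Rate constant in 1/s from the Eyring equation,
    k = (kB*T/h) * exp(-dG/(R*T)), transmission coefficient assumed to be 1."""
    prefactor = BOLTZMANN * temperature / PLANCK
    exponent = -barrier_kcal_per_mol / (GAS_CONSTANT * temperature)
    return prefactor * math.exp(exponent)


# Free energy barriers (kcal/mol) as quoted in the text for SUV420H2.
barriers = {"me0 -> me1": 23.9, "me1 -> me2": 17.9, "me2 -> me3": 23.0}
rates = {step: eyring_rate(dg) for step, dg in barriers.items()}

# How much faster dimethylation is predicted to be than monomethylation.
ratio_di_vs_mono = rates["me1 -> me2"] / rates["me0 -> me1"]
```

Under these assumptions, the 6 kcal/mol difference between the mono- and dimethylation barriers corresponds to a rate difference of roughly four orders of magnitude, consistent with the strong preference of the enzyme for the H4K20me1 substrate.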

MD simulations to investigate the substrate specificity of PKMTs
Complementary to the biochemical approach described in the previous section, MD simulations can be used to investigate the molecular mechanisms behind the substrate specificity of PKMTs. To do this, MD simulations of PKMTs in complex with preferred and disfavored peptide substrates can be conducted. Potential differences between the two peptide-PKMT complexes regarding contacts or the stabilization of TS-like conformations might be observed. For this, known crystal or cryo-EM structures can be utilized as templates even though they were obtained using SAH or target K-to-M mutant substrates, because the active cofactor SAM and the lysine in the substrate peptide can be modelled (Figure 6(D)). This approach was used to investigate the increased methylation activity of SETD2 towards the super-substrate peptide described above (Schnee et al. 2022). Multiple differences in the SETD2-peptide interaction profile were observed between the H3K36 and the ssK36 peptides. This was especially true for the regions surrounding the residues differing between both peptides. Ultimately, the altered residues in ssK36 established specific contacts leading to a better stabilization of TS-like conformations. Furthermore, both peptides approached the TS-like conformation in different ways, indicating that a PKMT may employ different TS conformations depending on the amino acid sequence of the substrate (Linscott et al. 2016; Poulin, Schneck, Matico, McDevitt, et al. 2016; Chen et al. 2019). However, the interactions in the SETD2-peptide complex cover only a part of the whole methylation process. This is important, since MD simulations of the free H3K36 and ssK36 peptides in aqueous solution showed that H3K36 adopts an extended conformation in solution, whereas the amino acid changes in ssK36 caused the peptide to preferentially adopt a U-shaped 'hairpin' conformation, which was confirmed by FRET experiments (Figure 6(E)) (Schnee et al. 2022).
In steered molecular dynamics simulations, the hairpin conformation was shown to associate faster into the active site of SETD2 and to establish more SN2 TS-like conformations when compared to an extended peptide (Figure 6(F)) (Schnee et al. 2022). Upon binding into the binding cleft, the hairpin structure of ssK36 was shown to unfold and extend, gradually establishing contacts with SETD2 towards the peptide ends (Figure 6(G)). This is in line with FRET experiments and crystal structures of SETD2 presenting the bound peptide in an extended conformation (PDB 5V21, 5JJY, 6VDB (Yang et al. 2016; Zhang et al. 2017; Schuhmacher et al. 2020; Schnee et al. 2022)). A similar behavior was observed for SET7/9, as the enzyme showed an increased activity towards an artificially designed peptide in which a disulfide bridge stabilizes a hairpin conformation (Dhayalan et al. 2011). Related to these findings, the lysine demethylase JMJD2A/KDM4A requires a bent peptide conformation for catalytic activity (Ng SS et al. 2007). Moreover, substrate conformation might add an additional layer of PKMT regulation. For PKMTs with multiple substrates of similar sequence, discrimination between the substrates could be achieved through the individual substrate conformation, e.g. a hairpin or extended conformation.
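The principle of a steered simulation, in which a moving harmonic restraint drags the system along a predefined coordinate while the work performed by the restraint is recorded, can be sketched with a one-dimensional toy model. The code below is an illustration only and is unrelated to the actual sMD setup of the cited studies: the double-well potential, the overdamped Langevin integrator, the spring constant and the pulling schedule are all assumptions.

```python
import math
import random


def potential_force(x, barrier=5.0):
    """Force -dV/dx for the toy potential V(x) = barrier * (x^2 - 1)^2."""
    return -4.0 * barrier * x * (x * x - 1.0)


def steered_pull(n_steps=5000, dt=1e-3, spring=50.0,
                 start=-1.0, end=1.0, kT=1.0, seed=0):
    """Pull a particle from `start` to `end` with a moving harmonic restraint.

    Returns the final position and the accumulated external work.
    """
    rng = random.Random(seed)
    x = start
    anchor = start                      # current center of the restraint
    step_size = (end - start) / n_steps
    work = 0.0
    for _ in range(n_steps):
        # Work done by moving the restraint: dW = -k * (x - anchor) * d(anchor)
        work += -spring * (x - anchor) * step_size
        anchor += step_size
        # Overdamped Langevin step under the potential plus the restraint
        force = potential_force(x) - spring * (x - anchor)
        x += force * dt + math.sqrt(2.0 * kT * dt) * rng.gauss(0.0, 1.0)
    return x, work


final_x, pulling_work = steered_pull()
```

In an application to PKMTs, the coordinate would describe, for example, the approach of the peptide to the active site, and work values from repeated pulls with hairpin and extended starting conformations could be compared (a hypothetical usage, not the published protocol).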

SET domain autoinhibition
Autoinhibition refers to a mechanism in which the catalytic activity of an enzyme is suppressed or regulated by one of its own structural elements. Upon stimuli, the autoinhibition can be released, leading to the activation of the catalytic activity. The release of autoinhibition can occur through conformational changes induced by substrate binding, binding of regulatory proteins, or post-translational modifications. Autoinhibition thereby provides a control mechanism allowing enzymes to act in a regulated manner, responsive to cellular signals and environmental conditions. In the case of SET domain-containing proteins like ASH1L (aka KMT2H), NSD1-3, SETD2, and SUV39 family members, autoinhibition is mediated through intramolecular interactions between the SET domain and adjacent regions, such as the post-SET domain or the SET-I (SET insertion or inhibitory SET) domain. Autoinhibition in SMYD enzymes follows a different mechanism and will be discussed in the SMYD enzyme section.

Autoinhibition by the autoinhibitory loop
Initial analyses of crystal structures (PDB 3OOI) and MD simulations of NSD1 with bound cofactor SAM, but without bound substrate, showed that an autoinhibitory loop (AL) is placed on top of the substrate binding cleft, effectively blocking the entrance of a target peptide (Figure 7(A,B)) (Trievel et al. 2003; Xiao et al. 2003; Zhang et al. 2003). The AL connects the SET with the post-SET domain, but its sequence is not conserved among SET domain-containing PKMTs (Couture et al. 2005; Xiao et al. 2005). Mutational studies on the AL of ASH1L showed that stabilizing the closed position of the AL by enforcing hydrophobic interactions (ASH1L K2264L interaction with M2183, C2195, F2197) decreased the methylation activity of ASH1L (Rogawski et al. 2015). Besides influencing the general methylation activity, the AL was speculated to regulate the product specificity of PKMTs. For ASH1L, mutational studies found that a Q2265A mutation turns the enzyme from a dimethyltransferase into a trimethyltransferase. Analysis of NMR structures of ASH1L Q2265A showed altered orientations of residues involved in SAM binding and active site shape (An et al. 2011; Rogawski et al. 2015). This indirect effect could explain the different product specificities observed for PKMTs despite a high sequence similarity in the active site, e.g. at the F/Y-switch position. This model was postulated for NSD1, DIM-5 and SETD2, which have differing product specificities (SETD2 and DIM-5 are trimethyltransferases, NSD1 is a dimethyltransferase) and differing AL residues and lengths, but share a high similarity of active site residues (Qiao et al. 2011).

Placeholder residues
In addition to the steric hindrance of substrate binding in the closed conformation, the AL was also found to position residues directly in the active site, e.g. C2062 in the case of NSD1 (Morishita and di Luccio 2011), C1183 in NSD2 (Jaffe et al. 2013), R1670 in SETD2 (Yang et al. 2016), or S2259 in ASH1L (Yang et al. 2016), which in each case occupy the lysine binding tunnel and overlap with the position of the target lysine in the substrate-bound state. These residues apparently function as placeholders stabilizing the closed conformation. This model is supported by the near abolishment of the methylation activity of ASH1L by the S2259M mutation, in which the placeholder residue serine was exchanged for methionine, which can establish stronger interactions with the hydrophobic lysine binding tunnel, thereby strengthening the closed conformation of the AL (Rogawski et al. 2015). Interestingly, methionine was not found to be a placeholder residue in any SET domain PKMT, indicating that its binding into the active site pocket may be too strong.

Release of autoinhibition by automethylation
One mechanism to relieve autoinhibition and bring the PKMT into an active state was discovered for Clr4, a PKMT of the SUV39H1/2 family found in S. pombe. In this enzyme, automethylation of the placeholder residue K455 by Clr4 led to an increased activity (Iglesias et al. 2018). In addition to K455, K472 was also shown to be automethylated, which is positioned in the AL of Clr4 as well. Automethylation of both residues led to a conformational switch of the AL away from the active site, resulting in increased enzyme activity towards external substrates. A strong increase in activity towards the H3K9 peptide was also observed for the K455A/K472A double mutant. Here, the AL of Clr4 was brought into a permanently open position, which led to an elimination of the autoinhibition (Iglesias et al. 2018). A Clr4 enzyme variant containing a K455M mutation in the AL cannot be automethylated at this site, resulting in a strong reduction in methylation activity towards an H3K9 peptide (Khella et al. 2020). Mutating the amino acids around K455 to fit the substrate specificity of Clr4 led to a strong increase in automethylation and also increased activity towards the external H3K9 substrate (Khella et al. 2020).

Release of autoinhibition by nucleosome interaction
MD simulations, docking experiments and binding isotope effect (BIE) measurements showcased another mechanism to overcome autoinhibition. Interactions with nucleosomes or peptide substrates have been shown to induce conformational changes in PKMTs, causing a displacement of the AL and exposure of the substrate binding cleft (Qiao et al. 2011; Poulin, Schneck, Matico, McDevitt, et al. 2016; Iglesias et al. 2018). These studies were stimulated by data showing that the methyltransferase activity of the NSD1-3 and SETD2 enzymes is greatly stimulated when nucleosomes are used as substrates (Li Y et al. 2009). Equilibrium BIEs indicated that the methyl group of bound SAM in a binary complex with NSD2 experienced steric constraints, which are resolved in the ternary complex with a bound nucleosome (Poulin, Schneck, Matico, Hou, et al. 2016).
For SETD2, a detailed model was developed to describe how this enzyme overcomes autoinhibition in a dynamic manner (Yang et al. 2016). In the AL closed state, the placeholder residue R1670 was observed to occupy the lysine binding tunnel, where the target lysine would normally be placed (PDB 4H12) (Figure 7(C)). However, crystal structures were resolved in which the AL was found in a half-open position (PDB 5JLE). In this state, R1670 was slightly flipped outwards from the active center (Figure 7(D)). Resolved structures with bound substrate show the AL in a fully open conformation, in which R1670 is completely flipped outwards and becomes fully solvent exposed. Furthermore, the post-SET loop (Q1676-K1703 for SETD2) was captured on top of the bound peptide substrate, hydrophobically interacting with the core enzyme (Figure 7(E)). This loop is not resolved in crystal structures without bound peptide, indicating a high flexibility in this state. It is suggested that after substrate binding, the post-SET loop closes like a lid and helps to order the peptide in the binding cleft. A working model can be postulated in which, at the start of the reaction, the post-SET loop is not bound to the core enzyme and R1670 is in the placeholder position. Upon interactions with a substrate, R1670 flips outwards, the AL is lifted and the substrate peptide binds in the binding cleft, while the post-SET loop closes behind it. After methylation, the process is reversed (Zheng et al. 2012; Yang et al. 2016). The complex role of R1670 was highlighted by a strong reduction of SETD2 methylation activity if R1670 is mutated to glycine (Zheng et al. 2012), whereas a "simple" destabilization of the AL by R1670G would have been expected to lead to a higher catalytic activity. Additionally, alanine mutations of different AL residues (Q1669A and Y1671A) lead to a disruption of the SETD2 methylation activity, emphasizing the importance of the AL in catalysis (Zheng et al. 2012). It is important to note that the biochemical, MD simulation and crystallographic results suggesting this model are based on peptides as substrates. For larger substrate structures like proteins or nucleosomes, the post-SET loop might have a different functionality.
This model was later refined, considering that histone tails are extremely flexible structures and adopt a variety of conformations over time (Potoyan and Papoian 2011; Ghoneim et al. 2021; Huertas et al. 2021). Data from sMD simulations suggested an improved association process of peptides in hairpin conformation with the target lysine placed in the loop region of the hairpin (Schnee et al. 2022). This could indicate that histone tails also transiently adopt hairpin conformations and thereby facilitate the association process with PKMTs. Of special interest are the first contacts established between the tip of the hairpin containing the target lysine and the AL, especially the positively charged lysine interacting with solvent-exposed residues of the AL. This might not only function as a probing mechanism to recognize the right target lysine, but may also be the start of the conformational changes of the AL needed to overcome autoinhibition (Figure 8(A)). This model adds to the various roles of the post-SET loop, which can also interact with the DNA, thereby weakening contacts with the AL and priming the AL for conformational changes, but further studies are needed for its validation.
Substrate specificity analysis of SETD2 by CelluSpots array methylation showed that residues around the target lysine are crucial for the SETD2 activity (Schuhmacher et al. 2020). These residues would be the first to interact with the PKMT in a hairpin association. Considering nucleosomes as substrate, a model for the association between PKMTs and histone tails could be envisioned. In this model, the histone tails interact with the nucleosomal DNA, dynamically wrapping around it (Figure 8(B)) (Ghoneim et al. 2021). They may gain conformational freedom after partial unwrapping from the DNA, and the increased flexibility may stimulate the formation of hairpin conformations, exposing the target lysine and nearby residues, which are recognized by the PKMT. This model is in line with cryo-EM structures of MLL1 (aka KMT2A, ALL1, CXXC7, HRX, HTRX, TRX1), MLL3 or yeast Set1 in complex with their associated proteins and their nucleosomal substrate (PDB 6KIX, 6KIU, 6UH5 (Hsu et al. 2019; Xue et al. 2019)). In these structures, the target lysine and the amino acids around it are resolved (A1-R8 for PDB 6UH5) and positioned correctly in the SET domain of the respective PKMT. However, the H3 tail is not completely resolved, as the connecting residues K9-K36 are missing. Connecting the resolved K4 region of the H3 tail with the resolved part in close proximity to the nucleosome creates a picture of a flexible histone tail, with one end wrapped around the nucleosomal DNA, forming a hairpin-like structure which exposes the critical K36 and V35 residues to the PKMT (Figure 12(E,F)). This model is supported by the large number of positively charged residues in the H3 sequence, which could interact with the negatively charged sugar-phosphate backbone (18 arginine, 13 lysine and 2 histidine vs. only 7 glutamate and 4 aspartate residues in the H3 sequence (UniProt P68431)). However, additional experimental and simulation work is needed to further investigate these processes.
... T1150A, has been found in mantle cell lymphoma along with the E1099K mutation (Beà et al. 2013). Multiple biochemical studies highlighted an aberrantly high methylation activity of NSD enzymes bearing these mutations (Li W et al. 2021; Sato et al. 2021; Khella et al. 2023). Li and colleagues solved the cryo-EM structure of the NSD2 E1099K and T1150A mutants bound to a nucleosome (PDB 7CRO), showing a partial unwrapping of the nucleosomal DNA to accommodate NSD2 and an interaction between the AL and the nucleosome that facilitates H3 binding (Li W et al. 2021). The positively charged lysine residue introduced in the E1099K mutant interacts with the sugar-phosphate backbone of the nucleosomal DNA, which is proposed to be a reason for the increased activity (Figure 9(A)). This model was supported by the finding that NSD2 exhibits weak and nonspecific lysine methylation activity on histone octamer substrates, whereas it strongly and specifically methylates H3K36 when nucleosomal substrates comprising the histone octamer with bound DNA are used (Li Y et al. 2009).

Simulation analysis of the effect of disease mutations in PKMTs
An opposing model was presented by Sato and coworkers, who also solved the structure of nucleosome-bound NSD2 E1099K and T1150A (Sato et al. 2021). MD simulations of the binary NSD2 E1099K or T1150A-SAM complex showed that the AL adopts a more open position in the NSD2 mutants compared to the WT. Hence, E1099K and T1150A were suggested to destabilize the interactions that keep the autoinhibitory loop closed. Thereby, the authors explained the increased activity (Figure 9(B-D)). It needs to be noted that the starting conformation of this MD simulation had the AL in a conformation in which the placeholder residue C1183 was already turned outwards. As described earlier, steps before this state are necessary, like the interaction of a substrate with the AL. Still, E1099K/T1150A seemed to destabilize the AL in the closed conformation and support its opening.
Kinetic analyses by both groups revealed that NSD2 E1099K and T1150A have an increased catalytic turnover, but no increased nucleosome affinity. A combination of the proposed mechanisms might be necessary to explain the increased activity of the NSD2 cancer mutants, since the destabilizing effect of E1099K and T1150A needs prior conformational changes induced by substrate interactions. These interactions could be enhanced by the improved K1099-DNA contact.
Besides the hyperactivity of NSD2 E1099K and T1150A, it was recently discovered that the product specificity of the NSD1 (T2029A) and NSD2 (T1150A) cancer mutants is also altered and that they can trimethylate H3K36 on peptide and protein substrates, in contrast to the mono- or dimethylating activity observed with the NSD1 and NSD2 wildtype enzymes (Khella et al. 2023). This phenomenon cannot be explained by the models presented above. To investigate this further, sMD simulation experiments were performed, which found that the binary NSD2-H3K36me2 peptide complex is unable to bind an additional SAM molecule and establish a TS-like conformation (Figure 10(A)) (Khella et al. 2023). In contrast, the NSD2 cancer mutant T1150A was able to do so (Figure 10(B)). To rationalize this behavior, MD simulations of the ternary NSD2-SAM-H3K36 peptide complex were conducted, which showed a significantly increased volume of the active site of NSD2 T1150A compared to the NSD2 wildtype (Figure 10(C)). The mechanistic reason for this was proposed to lie in the contacts established by T1150. An H-bond between the hydroxyl group of T1150 and the backbone amide of Y1092 orients Y1092 and restricts the volume of the active site pocket, consequently disfavoring trimethylation. In the case of the T1150A mutant, the contact with Y1092 was found much less frequently, which supported trimethylation. In addition, the truncation of the T1150 side chain in the T1150A mutant creates space in the active site of the PKMT, supporting higher product methylation levels. Finally, a hydrophobic contact between the T1150 side chain methyl group and the Cδ atoms of L1120 additionally restricted the volume of the active site (Figure 10(C)). Likewise, L1120M was already shown to enhance trimethylation activity in NSD2 (Sato et al. 2021), underscoring the important role of L1120 and its interaction with T1150 in the control of the product specificity of NSD2.
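Statements about how frequently a contact is found in an MD trajectory are typically based on a per-frame distance criterion. A minimal sketch of such an occupancy calculation is given below. It is not the analysis code of the cited study: the distance cutoff of 3.5 Å is a commonly used but assumed criterion, and the distance values are hypothetical numbers chosen to illustrate the calculation.

```python
def contact_frequency(distances, cutoff=3.5):
    """Fraction of trajectory frames in which the distance between two
    atoms (in Angstrom) is within the cutoff (assumed contact criterion)."""
    if not distances:
        raise ValueError("no frames supplied")
    in_contact = sum(1 for d in distances if d <= cutoff)
    return in_contact / len(distances)


# Hypothetical per-frame distances (Angstrom) between two contacting atoms.
wild_type_distances = [2.9, 3.0, 3.1, 2.8, 3.3, 4.2, 3.0, 2.9]
mutant_distances = [4.8, 5.1, 3.2, 6.0, 5.5, 4.9, 3.4, 5.2]

wild_type_occupancy = contact_frequency(wild_type_distances)
mutant_occupancy = contact_frequency(mutant_distances)
```

For these toy values, the contact is present in 7 of 8 frames in the first series and in 2 of 8 frames in the second; with real trajectories, the distances would be extracted per frame with a trajectory analysis library.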

NSD1
Sotos syndrome is a childhood overgrowth syndrome associated with intellectual disability (Kurotaki et al. 2002; Douglas et al. 2003). More than 80% of patients with Sotos syndrome carry NSD1 mutations, mainly in the AWS and SET domains (Tatton-Brown et al. 2005; Waggoner et al. 2005; Saugier-Veber et al. 2007). Biochemical experiments demonstrated that the R1914C, R2005Q and R1952W mutations caused a greatly reduced H3K36 methylation activity in vitro and that the NSD1 mutants Y1971C, R1984Q and R2017Q/L were catalytically inactive (Qiao et al. 2011; Khella et al. 2023). Structural analyses and MD simulations suggested that R2017 occupies a central position stabilizing the conformations of three aromatic residues (Y1870, Y1977, and F2018) via cation-π interactions, the latter two of which are highly conserved residues in SET domain proteins (Qiao et al. 2011). Structural analysis of the NSD1 homologue NSD2 (PDB 7E8D) showed that NSD2 Y1092 (corresponding to NSD1 Y1971) directly contacts the side chain of the target lysine analog H3K36M together with two more aromatic residues (F1177 and Y1179). Hence, the Y1971C mutation in NSD1 disrupts this aromatic cage, which may be needed for positioning and deprotonation of the target lysine, explaining the loss of catalytic activity caused by this mutation (Khella et al. 2023). Adding to this, R1952, R1984, and R2005 are engaged in intramolecular interactions with several D and E residues. The observed Sotos mutations destroy these interactions and could thereby lead to protein instability and reduced methylation activity (Qiao et al. 2011). In a pure modelling study, the NSD1 I2007F Sotos mutation was investigated by MD simulations, showing conformational changes and loss of structural stability due to the bulkier F, which was speculated to lead to loss-of-function of the SET domain (Ha et al. 2016). However, a biochemical characterization of this mutant is still missing.

EZH2
Trimethylation of histone H3 on lysine 27 (H3K27me3) is a repressive posttranslational modification introduced by the SET domain-containing PKMT EZH2, which is a subunit of the Polycomb Repressive Complex 2 (PRC2) (Millán-Zambrano et al. 2022). Various Y641 mutations (Y641E, Y641F, Y641N, Y641S, Y641C, and Y641H) in the SET domain of EZH2 have been identified as gain-of-function mutations that promote melanoma and B-cell lymphoma (Morin et al. 2010; Sneeringer et al. 2010). Kinetic studies showed that EZH2 prefers unmethylated H3K27 as a substrate and that it was much less active on substrates with higher methylation states (H3K27me1 and H3K27me2). Conversely, EZH2 Y641 mutants catalyze the conversion of H3K27me2 to H3K27me3 rapidly, but show significantly lower methylation activity on H3K27me0 (Yap et al. 2011; McCabe et al. 2012; Fioravanti et al. 2018). Another EZH2 mutation, A677G, was identified in B-cell lymphoma and also showed increased levels of H3K27me3 and decreased levels of H3K27me1 and H3K27me2 (McCabe et al. 2012). Modelling of the SET domain of EZH2 and of its A677G and Y641 mutants uncovered that the mutations alter the lysine binding pocket (McCabe et al. 2012). Specifically, Y641 formed a hydrogen bond network with the ε-amino group of K27 and the backbone carbonyl of F665, which supports the interaction with the unmodified K27. This network was weakened by the Y641 mutations, possibly causing the shift in substrate preference from unmethylated K27 to mono- and dimethylated K27. Moreover, Y641 mutations create additional space in the active site, which further supports the generation of higher methylation states, as observed for the F/Y-switch position in other SET domain PKMTs. A similar mechanism was observed for A677G, where the missing alanine methyl group created a more spacious active site, directly promoting the generation of H3K27me2/3 (McCabe et al. 2012).
... (Tisi et al. 2016)). C| H-bond network in NSD2 T1150A, in which the contact with D1182 is lost and the AL is lifted (image created using simulation results as described in Sato et al. (2021)). D| H-bond network in NSD2 E1099K, where K1099 forms a novel salt bridge with D1125, which is accompanied by decreased salt bridge formation between D1123 and K1152, since D1123 is in a polar connection with K1124 (image created using simulation results as described in Sato et al. (2021)).
The EED protein is another essential subunit of the PRC2 complex and functions as a methyl lysine reader protein of the WD40 family (Cao Q et al. 2014). Through the binding of its aromatic cage to H3K27me3, EED functionally stimulates PRC2 activity. EED I363M is a loss-of-function mutation which has been identified in myeloid disorders such as myelodysplastic syndrome (Suh et al. 2019). The PRC2 activator protein JARID2 was found in structural and MD simulation studies to bind EED and EZH2, effectively stabilizing a helix in a region of EZH2, the so-called stimulation-responsive motif (SRM, residues 143-153), adjacent to the catalytic SET domain (Justin et al. 2016; Lee et al. 2018). The SRM helix is supposed to bind to the EZH2 SET-I domain, decreasing its occupancy of the substrate-binding channel and thus maintaining EZH2 in its catalytically active conformation (Suh et al. 2019). In MD simulations, it was observed that the interactions responsible for this activator function of JARID2 were decreased by the EED I363M mutation (Suh et al. 2019). Specifically, the salt bridges between JARID2 R115 and EZH2 D136 and D140 were less frequent in simulations with EED I363M, eventually leading to an unfolding of the SRM helix and preventing the allosteric activation of EZH2, which then reduces EZH2 activity and suppresses the propagation of H3K27me3 repressive histone marks (Ueda et al. 2012; Suh et al. 2019).

Simulations of SMYD enzyme family members
The SMYD enzyme family comprises five PKMTs that are known to target histone and non-histone substrates (Spellmon, Holcomb, et al. 2015). All SMYD family members contain a catalytic SET domain with an inserted MYND domain, which is a zinc finger motif known to mediate protein-protein interactions with proline-rich sequences (Boriack-Sjodin and Swinger 2016). Crystal structures are known for three of the five family members: SMYD1 (Sirinupong et al. 2010), SMYD2 (Ferguson et al. 2011; Jiang et al. 2011; Wang et al. 2011), and SMYD3 (Foreman et al. 2011; Sirinupong et al. 2011; Xu, Wu, et al. 2011), but structures with a bound peptide substrate are known only for SMYD2 and SMYD3. SMYD1-3 have been reported to methylate H3K4, which promotes active transcription (Hamamoto et al. 2004; Abu-Farha et al. 2008; Sirinupong et al. 2010). However, SMYD enzymes do not have an effect on global H3K4 methylation, but appear to impact selective promoter regions (Cock-Rada et al. 2012; Medjkane et al. 2012), and they apparently have important roles in the methylation of non-histone proteins. SMYD enzymes share a bilobal, clamshell-like architecture. The N-terminal domain (NTD) contains the SET, SET-I, post-SET, and MYND domains, while the α-helical C-terminal domain (CTD) is positioned on the opposing side and shows similarities to TPR (tetratricopeptide repeat) domains despite contrasting sequences (Figure 11(B)) (Spellmon, Holcomb, et al. 2015). The MYND domain interacts with the catalytic SET domain but does not participate in substrate or cofactor binding, as its removal does not affect methyltransferase activity (Sirinupong et al. 2010; Jiang et al. 2011; Sirinupong et al. 2011). Rather, it interacts with other proteins containing a similar proline-rich sequence (Liu et al. 2007; Abu-Farha et al. 2008; Abu-Farha et al. 2011), or its positively charged surface contributes to protein-DNA interactions (Hamamoto et al. 2004; Spellmon, Holcomb, et al. 2015). The SET domain can be divided into SET, SET-I, and post-SET domains, which all directly interact with the cofactor SAM and the protein substrates. Together, the interface of the NTD and CTD forms a large, deep binding cleft for protein substrates.
In all structures, conserved residues in the NTD form a narrow channel for the lysine substrate (Spellmon, Sun, et al. 2015). In the following sections, each member of the SMYD enzyme family will be discussed individually, since different, partially opposing, mechanistic details have been observed in MD simulations. Unfortunately, no structural data or simulation results have been reported for SMYD4.
Figure 10. Besides increasing the methylation activity, the T1150A cancer mutation causes a change in the product specificity of NSD2 from a di- to a trimethyltransferase. A-B| sMD simulations showed that the NSD2-H3K36me2 complex was not able to bind SAM in a productive conformation, but the T1150A mutant-H3K36me2 complex was. Shown are representative snapshots from the sMD simulations, where the dashed line indicates the trajectory of the SAM molecule. C| Contacts (dashed red lines) established by T1150 (sticks, white) with Y1092 and L1120 (sticks, green) lead to a compact active site volume (gray spheres), restricting its activity to dimethylation. These contacts are missing for A1150 (sticks, red), resulting in a more spacious active site (blue spheres). Taken and modified from (Khella et al. 2023) with permission.

SMYD1
SMYD1 has the largest CTD among the SMYD enzymes and the most exposed substrate binding cleft, based on crystal structure results (Figure 11(C)) (Spellmon, Sun, et al. 2015). The structure of the CTD is well conserved among SMYD enzymes; the only difference among them is an extended helix that resembles a 'handle' in the SMYD1 structure. The function of the unique SMYD1 C-terminal helix is unknown, but its conserved sequence and involvement in the crystal packing suggest that it may serve as a site for protein-protein interaction (Spellmon, Holcomb, et al. 2015). On the other hand, deletion of the CTD in SMYD1 resulted in an increased binding and methylation of H3, suggesting that the CTD regulates substrate binding through steric effects (Sirinupong et al. 2010; Chandramouli and Chillemi 2016). Yet, this assumption is solely based on crystal structures with only the cofactor, but no substrate peptide or protein bound, because to date no crystal structure of SMYD1 with bound substrate has been published. Homology modeling based on substrate-bound SMYD2 or SMYD3 complexes could still give first insights into potential substrate binding modes, due to the high sequence and structural similarity (SMYD1 vs. SMYD2: 30.7% identity, RMSD (backbone atoms) = 3.55 Å based on SMYD1 PDB 3N71 and SMYD2 PDB 3RIB; SMYD1 vs. SMYD3: 31.4% identity, RMSD = 2.48 Å based on SMYD1 PDB 3N71 and SMYD3 PDB 3PDN). However, as seen for other SMYD family members discussed later, the dynamics when interacting with the protein substrate or SAM are crucial for their catalytic mechanism. Homology modeling as a stand-alone approach might therefore not be able to uncover SMYD1-specific mechanisms. Despite the lack of substrate-bound structures, it was found that the artificial SMYD1 V214Y mutation leads to an increased binding of H3, potentially due to a tighter lysine channel and additional hydrogen bonds between Y214 and the target lysine (Sirinupong et al. 2010).

SMYD2
Structures of SMYD2 with bound P53 (PDB 3S7F) (Ferguson et al. 2011; Jiang et al. 2011) and ERα (PDB 4O6F (Jiang et al. 2014)) peptides have been determined. Both peptides adopt a similar U-shaped conformation in the active site (Figure 11(B)). Methylation of P53 K370 by SMYD2 reduces the binding of P53 to target gene promoters, thereby repressing the P53 transcriptional activity (Huang et al. 2006; Chandramouli and Chillemi 2016). The opposite effect was observed when SET7/9 methylates P53 K372, causing an increase in stability and activity of P53 (Marouco et al. 2013). A fine tuning of the P53 transcriptional activity is therefore obtained via alternative methylation by SMYD2 on K370 and SET7/9 on K372. The mechanism of the interplay between the two lysine residues was unveiled by MD simulation. Methylation of K372 restricted the conformational freedom of the P53 peptide in the SMYD2 active site. This decreased the accessibility of K370 to SAM, effectively hampering SMYD2 methylation of P53 at K370 (Xu, Zhong, et al. 2011; Chandramouli et al. 2019).
The equilibrium of domain opening and closing can be denoted as 'protein breathing', which refers to the inherent dynamic conformational fluctuations within protein ensembles (Klimpel and Fleischman 1984; Makowski et al. 2008; Mariño Pérez et al. 2022). Protein breathing can have significant implications for enzymatic function, especially in the context of substrate binding and product release. It can regulate the access of substrates to the active sites or contribute to allosteric regulation. Interestingly, SAM binding triggers increased flexibility of the CTD, leading SMYD2 to adopt fully open conformations, which then completely expose the substrate binding crevice, representing a mechanism to overcome potential autoinhibition (Al-Shar'i and Alnabulsi 2016; Chandramouli and Chillemi 2016). In the context of a proposed sequential reaction mechanism (Wu et al. 2011), SAM binds first and increases the CTD flexibility, thereby allowing the substrate binding site to open fully and facilitating substrate protein binding. The large and flexible substrate binding crevice of SMYD2 may explain its diverse substrate spectrum (Spellmon, Sun, et al. 2015; Chandramouli and Chillemi 2016). The potential opening of the substrate binding cleft was further investigated by MD simulations with subsequent dynamical network analysis of SMYD2 with bound SAM but no protein substrate. These studies revealed that SMYD2 exhibits a negative correlation of its inter-lobe motion. This clamshell-like or hinge motion is caused by a twisting movement of the CTD with respect to the NTD (Figure 11(C)) (Spellmon, Sun, et al. 2015; Al-Shar'i and Alnabulsi 2016). MD simulations of the ternary SMYD2-P53 peptide-SAM or -SAH complex showed that the degree of conformational freedom of the SMYD2 CTD decreases in the presence of the P53 peptide substrate (Chandramouli et al. 2019). After catalytic turnover, which is simulated by replacing SAM with SAH and using a monomethylated P53 K370 peptide, the CTD exhibited stronger anticorrelated movements with the NTD when compared to the system with unmethylated P53 (Chandramouli et al. 2019). This could indicate that the anticorrelated movement of the CTD and NTD is necessary to bind the substrate and to facilitate product release after the methylation reaction.
Figure 11 (caption fragment, panel E). The molecular mechanism for this focuses on the -2 position of the peptide (in relation to the K260 or K831 target sites in MAP3K2 or VEGFR1, respectively), where MAP3K2 carries F258 (cyan, sticks), which creates a hydrophobic patch (blue circle) with SMYD3 residues (blue, sticks). This effect is reduced for VEGFR1, which has L829 (dark green, sticks) at this position. Moreover, MAP3K2 T263 and Y264 (cyan, sticks) interact with SMYD3 D332 (red, sticks), causing the peptide to bend into a hairpin conformation, while VEGFR1 contains G834, which does not form a contact with SMYD3.
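The anticorrelated inter-lobe motion discussed for SMYD2 is typically quantified with a dynamic cross-correlation matrix, which also serves as the starting point for dynamical network analysis. The following is a minimal sketch, assuming a trajectory of Cα coordinates that has already been aligned to a common reference and is available as a NumPy array; the function names and the grouping of residues into lobes are hypothetical and are not taken from the cited studies.

```python
import numpy as np

def dccm(traj):
    """Dynamic cross-correlation matrix for an aligned trajectory.

    traj: array of shape (n_frames, n_atoms, 3), e.g. C-alpha positions.
    Returns an (n_atoms, n_atoms) matrix with values in [-1, 1];
    +1 means fully correlated, -1 fully anticorrelated displacements.
    Atoms that never move would give a zero norm and are not handled here.
    """
    traj = np.asarray(traj, dtype=float)
    # Displacement of every atom from its time-averaged position.
    disp = traj - traj.mean(axis=0)
    # <dr_i . dr_j>: dot product of displacements averaged over frames.
    cov = np.einsum("fix,fjx->ij", disp, disp) / traj.shape[0]
    norm = np.sqrt(np.diag(cov))
    return cov / np.outer(norm, norm)

def mean_block_correlation(c, idx_a, idx_b):
    """Average correlation between two groups of atoms (e.g. two lobes)."""
    return float(c[np.ix_(idx_a, idx_b)].mean())
```

With index lists for the NTD and CTD residues (to be defined by the user), a clearly negative value of `mean_block_correlation` would correspond to the clamshell-like opening and closing described above.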
Biochemical and modeling experiments have addressed the substrate specificity of SMYD2, heading towards the discovery of unknown targets. Lanouette and coworkers used modeled SMYD2-peptide complexes to predict residues necessary for methylation (Lanouette et al. 2015). By using the solved SMYD2-P53 peptide structure (PDB 3S7F) as a template, they identified a motif as the substrate recognition sequence. SPOT peptide array methylation was used to validate the predicted sequence motif, resulting in a motif that diverges slightly from the predicted one. This model was later refined, with leucine at the -1 position identified as the most important recognition residue and an aversion to acidic residues at the +1 to +3 sites (Weirich et al. 2020). Still, the predictive power of the modeling approach was demonstrated, as 4 new SMYD2 substrates (SIX1, SIX2, SIN3B, and DHX15) were identified based on the modeled sequence motif. However, refinements of the method are necessary, because the previously known SMYD2 substrates HSP90 and RB would not have been identified. So far, 19 non-histone proteins involved in various biological processes have been identified as cellular substrates of SMYD2 (Lanouette et al. 2015; Ahmed et al. 2016; Olsen et al. 2016). Moreover, by using the substrate specificity profile generated from SPOT peptide arrays, 14 additional protein substrates have been described, six of which were more strongly methylated than P53, the best SMYD2 substrate identified until this point (Weirich et al. 2020).
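To illustrate how such a specificity profile can be turned into a substrate search, the sketch below scans a protein sequence for lysines that satisfy the two features named above: leucine at the -1 position and no acidic residue at +1 to +3. This is a deliberately crude yes/no filter written for illustration; the published profiles are quantitative and position specific, and the function name is hypothetical.

```python
def candidate_lysines(seq):
    """Return 1-based positions of lysines with Leu at -1 and no Asp/Glu
    at +1 to +3 (a simplified, binary reading of the refined motif)."""
    seq = seq.upper()
    hits = []
    for i, residue in enumerate(seq):
        # Need a lysine that has a preceding residue.
        if residue != "K" or i == 0:
            continue
        # Leucine at the -1 position.
        if seq[i - 1] != "L":
            continue
        # No acidic residue among the (up to) three following positions.
        if any(r in "DE" for r in seq[i + 1:i + 4]):
            continue
        hits.append(i + 1)
    return hits
```

Applied to a proteome, a filter of this kind only yields a candidate list; as the missed substrates HSP90 and RB show, hits and misses both need experimental validation, for example on peptide arrays.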

SMYD3
Surprisingly, in contrast to the increased flexibility of the SMYD2 CTD upon SAM binding, simulations of SMYD3 revealed that the CTD of SMYD3 became more rigid after SAM binding (Chandramouli et al. 2016; Sun et al. 2021). The restricted flexibility of the CTD caused a compaction of the substrate binding cleft, which might have implications for the substrate spectrum (Chandramouli et al. 2016). Compared to the large substrate spectrum of SMYD2 with 19 known non-histone targets, SMYD3 has been reported to primarily monomethylate H3K4, H4K5, and H4K20 and only 5 non-histone targets: Vascular Endothelial Growth Factor Receptor 1 (VEGFR1), MAP3 Kinase 2 (MAP3K2), Estrogen Receptor (ER), Human Epidermal Growth Factor Receptor 2 (HER2), and serine/threonine-protein kinase AKT1 (Mazur et al. 2014; van Aller et al. 2016; Bottino et al. 2020). Hence, SAM binding seems to have opposite effects on the substrate recognition of SMYD2 and SMYD3. Whereas the relaxed CTD with bound SAM in SMYD2 leads to a broad substrate spectrum, the SMYD3 substrate spectrum is narrowed due to the more rigid conformation. However, kinetic experiments showed that the peptide substrates do not discriminate between apo-SMYD3 and binary SMYD3-SAM complexes (Fabini et al. 2019; Sun et al. 2021). Hence, a random ternary complex mechanism was proposed for SMYD3, in which peptide binding more frequently precedes SAM binding (Fabini et al. 2019), which is different from the ordered mechanism proposed for SMYD2 as described above. Moreover, Chandramouli et al. performed MD simulations with SMYD3 alone and the binary SMYD3-SAM complex, which showed increased solvent access of residues at the target lysine channel (F183, S202, Y239) after SAM binding (Figure 11(D)) (Chandramouli et al. 2016). These residues could potentially have roles in catalysis, since mutagenesis experiments showed that F183A and Y239A abolished methylation activity (Xu, Wu, et al. 2011).
A combination of crystal structures, biochemical experiments and MD simulations was used to rationalize the substrate preference of SMYD3. In vitro methyltransferase assays showed that SMYD3 has a ~14-fold higher activity towards the MAP3K2 peptide compared to the VEGFR1 peptide. Methylation of histone substrates (H3K4 and H4K5) was too weak to be detected (Fu et al. 2016). This preference was shown to be based, on the one hand, on the interaction of the F258 residue at the MAP3K2 -2 position (in relation to the K260 target lysine) (Fu et al. 2016), which is accommodated in a hydrophobic pocket of SMYD3 (L104, L105, L147, V178, I179) (Figure 11(E)). VEGFR1 possesses a leucine (L829) at this position, which showed a moderate decrease in binding free energy in 50 ns MD simulations compared to F258. This is speculated to be caused by the weaker hydrophobic interaction and the different shape. Substitution of MAP3K2 F258 by each of the other 19 natural residues resulted in a decrease of binding free energy in 50 ns MD simulations for each variant (Sun et al. 2019). MAP3K2 F258 and VEGFR1 L829 are hydrophobic, whereas a polar arginine residue is found at the corresponding positions in the histone substrates H3K4 and H4K5, explaining the much lower catalytic activity of SMYD3 towards H3K4 and H4K5 (van Aller et al. 2016). On the other hand, the SMYD3 preference for MAP3K2 could also be based on the C-terminal end of the peptides. Structural comparison between the MAP3K2 and VEGFR1 complexes showed that the peptides superimpose well, except at the C-terminal end (PDB 5HQ8 for SMYD3-MAP3K2, PDB 5EX3 for SMYD3-VEGFR1, Figure 11(E)). There, the residue in VEGFR1 (G834) differs from the residue in MAP3K2 (Y264) (Fu et al. 2016). Y264 of MAP3K2 interacts with SMYD3 D332, which may increase activity, while G834 of VEGFR1 does not interact with D332 and the peptide instead adopts an extended conformation.

SMYD5
The sequence alignment of SMYD1-5 showed that SMYD5 lacks the TPR-like CTD, which is conserved in all other SMYD enzymes. In SMYD5 this appears to be compensated by a new subdomain, which occupies a position that overlaps with the top part of the CTD, thereby contributing to the formation of the substrate binding site. Since the CTD is involved in the formation of the substrate binding site, this structural difference suggests that SMYD5 might use a different mechanism in the recognition of its substrates than the other SMYD enzymes. Reported substrates of SMYD5 include H4K20 (Stender et al. 2012), H3K36 and K37 (Aljazi et al. 2022) and lysine residues in the HIV Tat protein (Boehm et al. 2023).
No crystal structure of SMYD5 has yet been published. Nevertheless, the protein structure prediction software AlphaFold has provided insights into the structural features of SMYD5, which were assessed by Zhang and coworkers (Zhang et al. 2022). The AlphaFold structures were validated by RMSD-based comparisons with other crystal structures and by their deviation among each other. Additionally, a novel inter-residue distance map (IRDM)-based metric was developed to validate the predicted AlphaFold models. IRDM comparisons are superposition independent, because each residue is specified by its distance to all other residues and every residue is involved in specifying all other residues. In this way, a structural distance network is created that precisely describes the overall architecture of the protein. It is claimed that both metrics (classic RMSD and IRDM) suggest that the AlphaFold model of SMYD5 is reliable.
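The superposition independence of distance-map comparisons can be made concrete with a short sketch. The code below builds the matrix of all pairwise inter-residue distances for two structures and reports the root-mean-square difference between the two maps. This is only an illustration of the general idea under the assumption of one representative atom per residue (for example Cα) and identical residue ordering; it is not the published IRDM metric, whose exact definition is not given here.

```python
import numpy as np

def distance_map(coords):
    """All pairwise distances for an (N, 3) array of residue positions."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def distance_map_deviation(coords_a, coords_b):
    """RMS difference between two inter-residue distance maps.

    Both inputs need the same residues in the same order. No superposition
    is required, since internal distances do not change under rotation or
    translation of either structure.
    """
    da = distance_map(coords_a)
    db = distance_map(coords_b)
    if da.shape != db.shape:
        raise ValueError("structures must contain the same number of residues")
    # Use each residue pair once (upper triangle without the diagonal).
    iu = np.triu_indices(da.shape[0], k=1)
    return float(np.sqrt(np.mean((da[iu] - db[iu]) ** 2)))
```

In contrast to RMSD after superposition, a measure of this kind is unaffected by how the two models are oriented, which is convenient when comparing predicted models whose flexible regions would otherwise dominate the fit.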
The overall structure of the AlphaFold SMYD5 model resembles a 'crab with two large legs' that are enriched with negatively charged residues (Figure 11(C)). The body of the crab is made up of four conserved domains, including the SET, MYND, post-SET, and SET-I domains, while the crab legs are formed by the structural features unique to SMYD5. Of the legs, the thinner one is formed by the poly-E tract that forms a single helical structure, while the thicker one is formed by two large insertions into the MYND domain (M-insertion) and the SET domain (S-insertion). The region between the poly-E tail and the M- and S-insertions forms a negatively charged deep cleft, potentially binding positively charged protein substrates, which are known to interact with SMYD5 (Stender et al. 2012; Zhang et al. 2021). The C-terminal poly-glutamic acid tract (poly-E) and a 30-residue long M-insertion have been shown to regulate the structural stability of SMYD5 (Zhang et al. 2021).
An additional N-terminal sequence is unique to SMYD5 and absent in other SMYD members. This sequence was predicted to be a mitochondrial targeting sequence, but is now claimed to function as a novel non-classical nuclear localization signal (NLS) (Zhang et al. 2022). Subcellular localization of SMYD5 was probed by using GFP as a reporter that was tagged to the C-terminus of the protein. It was found that intact SMYD5 is localized to the nucleus in all three cell lines (HEK293, U2OS, and RAW264.7) in an NTD dependent manner (Zhang et al. 2022).

Structural organization of MLL PKMTs
The Mixed Lineage Leukemia (MLL) family of PKMT enzymes, including MLL1-4 (aka KMT2A-2D), SET1A (aka KMT2F) and SET1B (aka KMT2G), represents the major histone H3K4 methyltransferases in mammals (Rao and Dou 2015). While monomethylation of H3K4 is located preferentially at active enhancers, trimethylation is a mark found at open and potentially active promoters. Thus, H3K4 methylation is typically associated with gene transcription (Bochyńska et al. 2018). Numerous studies were conducted to investigate the interaction between the MLL1/2 enzymes and menin because of its pronounced role in acute myeloid leukemia (AML). Since these studies focus on the development of inhibitors blocking the interaction site between MLL and menin, they are not included in this review, as explained earlier. However, we refer to other publications focusing on this topic (Zhou H et al. 2013; Yue et al. 2016; Xu et al. 2020; Perner et al. 2023).

The MLL-WRAD complex
The SET domains of MLL enzymes are slow or even inactive methyltransferases (Luo 2018). For their full catalytic activity, the association of binding partners is necessary: WDR5 (tryptophan-aspartate repeat protein-5), RbBP5 (Retinoblastoma-binding protein-5), ASH2L (Absent, small or homeotic 2-like), and DPY30 (Dumpy-30), forming the so-called WRAD complex (Ernst and Vakoc 2012). Cryo-EM structures of MLL1 SET-WRAD visualize how the 5 subunits interact and build a complex capable of H3K4 methylation (Figure 12(A)) (Hsu et al. 2019; Park et al. 2019; Xue et al. 2019; Ayoub et al. 2022). An example of the drastic difference between the activities of the MLL SET domain alone and the MLL SET domain associated with the WRAD complex was shown for MLL1. Here, the activity towards the H3K4 peptide (residues 1-20) increased 600-fold if the WRAD complex was present (Patel et al. 2009; Del Rizzo and Trievel 2011). A mechanism for this effect was proposed by Southall and coworkers, who solved the crystal structure of the MLL1 SET domain complexed with SAH and the H3K4me peptide (PDB 2W5Z). The MLL1 SET structure was then compared to the DIM-5 SET domain complexed with the H3K9 substrate peptide (PDB 1PEG (Southall et al. 2009)). It was observed that in DIM-5 the SET-I motif is oriented in close proximity to the Post-SET motif, forming a narrow, well-defined lysine binding channel that constrains the motion of the K9 side chain to promote its methylation. Contrary to this closed conformation, the SET-I motif of MLL1 is shifted away from the Post-SET motif, resulting in an open active site that is unable to optimally orient the K4 side chain for methylation (Southall et al. 2009).
Based on this observation, it was hypothesized that the interaction of the WRAD subunits with the SET-I motif of MLL1 may induce the closure of the active site (Southall et al. 2009; Del Rizzo and Trievel 2011). Indeed, a combination of MD and QM/MM simulations later confirmed that the association of MLL3 SET with ASH2L and RbBP5 (MLL3 only needs these subunits for activity (Kwon et al. 2020)) impacts the deprotonation and methyl group transfer via interactions with the SET-I motif and the SAM binding site (Miranda-Rojas et al. 2021). Salt bridges between MLL3 SET and RbBP5 (R4806-E347, R4812-D353, R4845-E338, R4864/K4867-E341 in PDB 5F6K) restricted the conformational space of two regions in the SET domain. The first region is the interaction network in the active site including Y4800, V4824, Y4825 and Y4884. These residues were shown to be particularly critical for lysine deprotonation (Miranda-Rojas et al. 2021). The second restriction of the conformational space was shown to affect the distance between α-helix B (P4843-Y4846) and the SAM binding site. Both interactions increased the accessibility of active conformations, in which shorter distances between the SAM methyl group and the K4 ε-amino group were observed (Miranda-Rojas et al. 2021). Consistent with the simulation results, the Q3867A and R3871A mutations diminished the activity of the MLL1 complex towards H3 peptides, but did not appreciably affect the activity of the isolated MLL1 catalytic domain itself (Southall et al. 2009; Del Rizzo and Trievel 2011). MLL1 residues Q3867 and R3871 are located in the SET-I domain and interact with RbBP5. Additionally, it was found that the somatic cancer mutants R3864C and R3841W, which affect residues interacting with ASH2L and WDR5, are not stimulated by WRAD association (Weirich et al. 2017). This presumably leads to changes in the cellular MLL1 methylation activity, because the mutant enzymes have lost their activity control by the WRAD complex.
Of note, the WRAD complex with a catalytically inactive MLL1 SET mutant can by itself methylate H3K4 peptides. However, the reaction is slow, and micromolar concentrations of SAM, peptides and the complex were necessary to measure H3K4me1 (Patel et al. 2009; Patel et al. 2011; Patel et al. 2014; Shinsky et al. 2014). ASH2L was found to be the subunit responsible for catalyzing this weak H3K4 methylation. Because ASH2L has no detectable homology with SET domain or 7BS methyltransferases, it is of great interest to explore the structural features of the cryptic PKMT active site in this enzyme, which can be informative for identifying novel PKMTs (Luo 2018).

Interactions of the MLL-WRAD complex with nucleosomes
Besides the stimulation of the methylation activity of MLL enzymes upon association with the WRAD complex, MLL activity is further stimulated if nucleosome core particles (NCP) are used as substrates instead of the recombinant H3 protein (Park et al. 2019). NCPs consist of the histone protein octamer (H2A, H2B, H3, H4) with typically 147 bp of double stranded DNA wrapped around it. The increased methylation of the NCP substrate indicates that stimulatory interactions must occur between the NCP and the MLL SET-WRAD complex. In a variety of cryo-EM studies, it was found that the MLL1 SET-WRAD complex can bind the NCP in 2 distinct binding modes. In the first binding mode, MLL1 is positioned at the dyad of the NCP (Figure 12(A)) (Park et al. 2019). In this conformation, RbBP5 was found to interact with the DNA through four positively charged arginine residues: R220, R251, R272, R294 (denoted as Quad-R), which made electrostatic interactions with the DNA phosphate backbone (Figure 12(B)) (Park et al. 2019). In addition, RbBP5 interacts with the NCP in this binding mode with its insertion loop (I-loop, D236-E240) and the anchoring loop (A-loop, T193-T198) (Park et al. 2019). The I-loop was placed between the N-terminal tail of histone H4 and the nucleosomal DNA. The A-loop was positioned parallel to the H4 tail, which was placed between the I-/A-loops of RbBP5 and residues L65-D77 of histone H3 (Figure 12(B)). These interactions were biochemically validated in systematic mutational studies, where R-to-E mutations of the Quad-R residues led to a strong reduction in MLL1 SET methylation activity. Similarly, deletion of the I-loop, and to a lesser extent the A-loop, reduced the activity of the MLL1 SET-WRAD complex. Underlining the role of these interactions in NCP methylation, the mutations in Quad-R, the I-loop, and the A-loop had no effects if the free H3 protein was used as substrate (Park et al. 2019).
Figure 12 (caption fragment, panels B-F). B| Distinct features of the interaction between RbBP5 and the nucleosome, like the Quad-R residues, A-loop and I-loop. C| The ASH2L IDR interacts with the NCP, thereby facilitating the binding of the complex. D| MLL1 SET-WRAD binds to the NCP (grey) in two distinct binding modes. In the first binding mode, MLL1 SET (dark green) is positioned at the dyad, barely interacting with the NCP (PDB 6PWV). The main interaction sites between the MLL1 SET-WRAD complex and the NCP are established by RbBP5 and ASH2L (WRAD colored green). In the second binding mode, RbBP5 (WRAD colored red) rotates clockwise and MLL1 SET (magenta) is positioned on top of the NCP disk, binding to the H3 histone tail (spheres, cyan, PDB 6KIX (Xue et al. 2019)). E| Ubiquitination (yellow) of H2BK120 leads to a shift towards the second binding mode of the MLL1 SET (white)-WRAD complex (red, PDB 6KIU (Xue et al. 2019)). F| The yeast MLL1 homologue Set1 (white) binds an H2BK120 ubiquitinated NCP in the second binding mode (PDB 6UH5). Histone tail residues H3A1-H3R8 (cyan) are resolved and the target lysine K4 (pink) is correctly bound in the SET domain active site. The H3 tail reaches out from the NCP and is not resolved due to its high flexibility and dynamic behavior. Potentially, it wraps around the nucleosomal DNA to establish a stable binding in the Set1 SET domain.
The second major interaction between the MLL1 SET-WRAD complex and the NCP in the first binding mode was mediated by the intrinsically disordered region (IDR) of ASH2L (Park et al. 2019). Positively charged residues (K205/R206/K207) in the ASH2L IDR established contacts with the nucleosomal DNA (Figure 12(C)). Consequently, alanine mutations of K205/R206/K207 reduced the methylation activity of the MLL1 SET-WRAD complex towards the NCP.
In other cryo-EM structures, the MLL1 SET-WRAD complex was found to bind the NCP in a second binding mode (Xue et al. 2019; Ayoub et al. 2022). In this binding mode, MLL1 SET-WRAD was not positioned at the NCP dyad, but diagonally across the nucleosome disc (Figure 12(E)). While ASH2L mostly retained its position, RbBP5 rotated clockwise. The rotation of RbBP5 diminished its NCP interactions present in binding mode 1 (Quad-R, I-loop, A-loop). Instead, in this binding mode RbBP5 was found to interact with H2B through a highly conserved amphipathic loop containing L248, V249, N250, R251 (Ayoub et al. 2022). In contrast to binding mode 1, in which the MLL1 SET domain is placed above the nucleosome dyad without making significant contacts with the NCP, in binding mode 2 the MLL1 SET domain interacts extensively with the nucleosome disc face (Ayoub et al. 2022). In particular, hydrophobic interactions between MLL1 SET (M3812, L3814, M3818) and H2A (N73, L85, L108, P109) as well as a charged interaction (MLL1 SET R3821, H2A D72) were observed (Xue et al. 2019; Ayoub et al. 2022).

Influence of H2B ubiquitination on MLL
The catalytic efficiency of the MLL1 SET-WRAD complex in terms of kcat/Km values is stimulated 2-fold if the NCP substrate is mono-ubiquitinated on histone H2B lysine 120 (H2BK120ub1) (Xue et al. 2019). H2BK120ub1 is a prevalent histone mark that disrupts chromatin compaction and favors open chromatin structures (Kim et al. 2009; Fierz et al. 2011; Xue et al. 2019). Cryo-EM structures of the MLL1-WRAD complex with NCP containing H2BK120ub1 showed that the binding is very similar to binding mode 2 described in the previous section (Figure 12(D)). Besides contacts characteristic of binding mode 2 without H2BK120 monoubiquitination, the WD40 domain of RbBP5 (R228, T232, E238, E240) directly interacts with H2BK120ub1 residues (L8, T9, R42, Q49, H68, R72) (Xue et al. 2019). It was proposed that these additional contacts stabilize the complex and lead to an enhanced catalytic efficiency (Xue et al. 2019; Kwon et al. 2020).
In addition, a recent structural study on the yeast Set1 complex proposed another mechanism to explain how H2BK120ub1 stimulates H3K4 methylation (Hsu et al. 2019; Worden et al. 2020). In yeast, the human proteins WDR5/RbBP5/ASH2L/DPY30 correspond to Swd3/Swd1/Bre2/Sdc1 (Worden et al. 2020). In this complex, the arginine-rich RXXXRR motif of Set1 (R901, R904, R908, R909 in PDB 6VEN (Worden et al. 2020)) makes extensive contacts with the acidic patch of histones H2A and H2B. The RXXXRR motif is located on a helix, which directly interacts with ubiquitin regions E34-P37 and L71-G75. Interestingly, this helix is not resolved in cryo-EM structures of Set1 without H2BK120ub1 (PDB 6UGM (Hsu et al. 2019)). This could indicate that H2BK120ub1 stabilizes this part of Set1, potentially leading to the increased activity. In agreement with this model, the RXXXRR motif was previously demonstrated to be critical for the H2Bub-dependent H3K4 methylation activity of the Set1 complex (Kim J et al. 2013). Additionally, alanine mutations in the RXXXRR motif resulted in a complete loss of the H3K4 methylation activities of SET1A/B, suggesting an evolutionarily conserved process (Kwon et al. 2020). However, the RXXXRR motif is conserved only in human SET1A/B but not in MLL1/2/3/4. This mechanism could therefore be specific to SET1A and SET1B complexes (Kwon et al. 2020; Worden et al. 2020).

Catalytic role of MLL1 binding modes
As described above, the MLL1 SET-WRAD complex binds the NCP in 2 distinct binding modes. In the first binding mode, MLL1 SET is positioned at the NCP dyad, anchored by ASH2L and RbBP5, and does not directly interact with the NCP. In the second binding mode, RbBP5 rotates clockwise and MLL1 SET establishes contacts with the NCP. Multiple studies have been conducted to clarify the relation between these MLL1 SET binding modes and catalytic activity (Hsu et al. 2019; Xue et al. 2019; Worden et al. 2020; Ayoub et al. 2022; Rahman et al. 2022). Indeed, it is challenging to determine which binding mode resembles the catalytically 'active' conformation, and arguments for both models have been provided, which are briefly presented in the following paragraphs.
As mentioned above, ubiquitination of H2BK120 stimulates the catalytic efficiency of MLL1 (Xue et al. 2019; Kwon et al. 2020). Cryo-EM structures of MLL1 SET-WRAD and yeast Set1-COMPASS complexed to an H2BK120ub1 NCP have a high similarity with the second binding mode, establishing similar interactions with the NCP, which could hint that the second binding mode is the catalytically active one. Surprisingly, it was found that Set1-COMPASS binds to unmodified nucleosomes and H2BK120ub1 nucleosomes with the same apparent affinity, indicating that the H2BK120ub1 interaction does not lead to an improvement in binding affinity (Worden et al. 2020). However, it needs to be noted that COMPASS engages both the ubiquitinated and non-ubiquitinated nucleosomes in similar fashions, likely because of its pronounced interaction with the H2A/H2B acidic patch (Nakanishi et al. 2008; Worden et al. 2020). This could point towards a different binding mode of the COMPASS complex than of the human MLL complexes (except SET1A and SET1B).
Moreover, mutagenesis of binding mode 2 specific interactions was used to identify their role and importance in catalysis (Ayoub et al. 2022). Hydrophobic interactions between MLL1 SET and H2A are mediated by MLL1 SET A3806-L3814. Their respective alanine mutants reduced H3K4 methylation on the NCP (Ayoub et al. 2022). Likewise, mutating MLL1 SET M3812/L3814 to alanine also significantly reduced H3K4 methylation on the NCP (Ayoub et al. 2022). However, the described MLL1 SET domain mutants showed similarly reduced methyltransferase activity on recombinant H3, suggesting that they probably function by affecting the intrinsic activity of MLL1 SET instead of specifically disrupting MLL1-NCP interactions. Disruption of other binding mode 2 specific interactions between H2B and the RbBP5 amphipathic loop (L248, N249, N250, R251) by alanine mutations did not affect MLL1 activity on either the NCP or recombinant H3 (Ayoub et al. 2022), which may argue against the hypothesis that binding mode 2 is the catalytically relevant conformation.
In the second binding mode with and without H2BK120ub1, one of the flexible H3 tails was found to be resolved and positioned correctly in the MLL1 SET domain (PDB 6KIX without H2BK120ub, PDB 6KIU with H2BK120ub). This is also true for cryo-EM structures of the yeast homologue Set1-COMPASS in complex with H2BK120ub1 NCP (PDB 6VEN, 6UH5) but not for structures in binding mode 1. The second binding mode could therefore represent a state in which the H3 tail is bent around the nucleosomal DNA, binding the SET domain as a hairpin and forming contacts in a zipper-like fashion as described earlier (Figures 8 and 12(F)). This is only possible because the SET domain is not positioned at the dyad but leans toward one histone tail, suggesting that binding mode 2 is most relevant for activity. In a working model, one could envision that the MLL1 SET-WRAD complex binds the NCP in the dyad position (binding mode 1), anchoring with RbBP5 and ASH2L. This is supported by the strong effect of the Quad-R mutations on activity. From this position, the complex could shift into the diagonal (binding mode 2) conformation and individually methylate the H3 tails at H3K4, which is suggested by the resolved K4 residues or analogues in the MLL1 SET domain. This conformation is further stabilized by H2BK120ub1, explaining the increased catalytic efficiency (Rahman et al. 2022). After methylation, the complex detaches or positions itself again in the dyad conformation, which is consistent with a distributive reaction mechanism (Patel et al. 2009). The dynamic switching between the dyad and diagonal conformation could enable locus specific regulation of H3K4 methylation by other chromatin binding proteins (Ayoub et al. 2022) like the transcription factor ELL or P-TEFb (Smith E et al. 2011) or the chromatin remodeler CHD8 (Subtil-Rodríguez et al. 2014). Previous studies have shown that H3K27me3 and H3K4me3 can be deposited asymmetrically at the same nucleosome, but on opposing H3 tails (Voigt et al. 2012). This asymmetric, bivalent modification of H3K27 and H3K4 is believed to be associated with maintaining promoters in a poised state during differentiation (Worden et al. 2020).

Product specificity of MLL1 and 3
MLL1 primarily acts as a di- and trimethyltransferase, whereas MLL3 is only capable of transferring a single methyl group to the H3K4 target (Shilatifard 2012; Rao and Dou 2015). Xue and coworkers structurally compared complexes of MLL1 SET-WRAD and MLL3 SET-WRAD bound to an NCP (PDB 6KIU for MLL1, PDB 6KIW for MLL3) to investigate the reason(s) for their differing product specificities (Xue et al. 2019). In the cryo-EM structure of MLL1, a hydrophobic patch consisting of MLL1 M3777, F3778, F3780, Y3883, F3904 and RbBP5 W329, F332, F336 directly interacts with the SAM binding pocket, which could facilitate SAM binding. The MLL1-RbBP5 interface is part of the RbBP5 'activation segment' (W329-F336). The RbBP5 activation segment has previously been shown to be a stimulator of MLL activity (Li Y et al. 2016). Likewise, deletion of the activation segment led to a 75% loss in activity of MLL1. In contrast, the described hydrophobic patch is weakened in MLL3, since the involved phenylalanine residues in MLL1 are replaced by tyrosine in MLL3. This could lead to reduced SAM binding, especially at higher methylation stages.
As described earlier for other SET domain-containing PKMTs, tyrosine residues in the active site can modulate the product specificity. Y4884C is a somatic cancer mutation in MLL3. The Y4884 residue is part of an aromatic pocket at the active center of the enzyme. Biochemical data showed that Y4884C has a reduced methylation activity towards recombinant H3 protein and converts MLL3 from a monomethyltransferase with substrate preference for unmethylated H3K4 to a trimethyltransferase with H3K4me1 as preferred substrate (Weirich et al. 2015). Moreover, expression of Y4884C has been shown to lead to aberrant H3K4me3 formation in cells (Weirich et al. 2015). Adding to this, QM/MM simulation of MLL3 demonstrated that Y4884A is able to di- and trimethylate its target (Blanco-Esperguez et al. 2022). However, the phenylalanine mutant Y4884F is inactive, which is also true for Y4800A/F, a tyrosine similarly located in the active site. As for other SET domain-containing PKMTs, the additional space created by the Y4884A mutant allowed for the accommodation of additional methyl groups, which could explain the altered product specificity of the somatic cancer mutation Y4884C. However, additional biochemical experiments need to confirm the change of product specificity observed with Y4884A in the simulations, although the agreement between biochemical data obtained with Y4884C and simulation data for Y4884A is quite compelling.

Simulations of 7BS PKMTs
Non-SET domain PKMTs belong to the Rossmann-fold-like 7BS family (or class I methyltransferases) with their structural topology featuring a seven-stranded β-sheet connected by α-helices (Figure 3(e)). Besides proteins, small molecules, DNA and RNA are among the substrates of this group of methyltransferases (Luo 2018). SET domain-containing PKMTs and the 7BS methyltransferases use different architectures for their catalytic domain, and little is known about the methyl group transfer mechanism in 7BS PKMTs regarding conformational changes, TS stabilization, and substrate or product specificities. Nonetheless, information from crystal or cryo-EM structures and MD simulations has shed first light on the processes involved in catalysis. 7BS PKMTs share structural similarities with PRMTs, which also belong to class I methyltransferases (Richon et al. 2011). PRMTs have been investigated in multiple MD simulation studies (Zhang R et al. 2013; Zhou R et al. 2015; Gathiaka et al. 2016). Mechanisms discovered in PRMTs, especially conformational changes, could potentially hint towards similar mechanisms in 7BS domain PKMTs, given a high sequence and structural similarity, but this conjecture needs to be investigated more closely in the future.

DOT1L
The full-length DOT1L (disruptor of telomeric silencing 1-like) protein transfers up to three methyl groups to H3K79 on nucleosomes marked by H2BK120ub1 (Chandrasekharan et al. 2010). DOT1L represents the most intensively studied 7BS PKMT because of its important role in different forms of leukemia (McLean et al. 2014; Spangler et al. 2022; Yi and Ge 2022). AF10 (aka MLLT10) can bind to DOT1L and alter the specificity of DOT1L toward the dimethylated product. Product specificity is thereby regulated through binding partners potentially causing allosteric effects (Ibáñez et al. 2010). Notably, fusion proteins can cause leukemia by using their AF region to recruit DOT1L to MLL-targeted genes, leading to inappropriate activation of these genes in the affected cells (Feng et al. 2002; Sarno et al. 2020). MD simulations of the DOT1L-AF10 complex (PDB 6JN2) described the interplay as a coiled-coil structure and found multiple residues responsible for the interaction of both proteins (Stodola et al. 2021). Because of the frequent association with leukemia, multiple studies involving molecular docking and virtual screening approaches have been conducted to develop inhibitors for DOT1L (Yao et al. 2011; Basavapathruni et al. 2012; Raj et al. 2015; Boriack-Sjodin and Swinger 2016; Chen et al. 2016; Luo et al. 2016; Luo 2018; Chen and Park 2019; Khirsariya et al. 2022; Flores-León et al. 2023). The presented results show valuable information about conformational changes and altered activities of DOT1L. However, they describe the interaction of artificial molecules with DOT1L and are therefore not included in this review.
Another difference is found at the target lysine channel, where SET domain PKMTs show well-tuned hydrophobic tunnels to orient the ε-amino group.
DOT1L also uses an SN2 reaction mechanism for the methyl group transfer. Yet, no similar aromatic-rich pocket could be mapped for class I PKMTs such as DOT1L. These PKMTs therefore likely adopt different strategies to engage their substrates for catalysis (Min et al. 2003; Luo 2018). This is especially important regarding the mechanistic requirement for deprotonation of the target lysine. Residues in DOT1L located close to the target lysine are incapable of performing this deprotonation and there is no evidence suggesting the formation of a water channel similar to the one in SET-containing PKMTs (Luo 2018). Alternative explanations proposed that the more hydrophobic active site reduces the pKa of the target lysine and that the carboxylate of the SAM plays a role in the lysine deprotonation (Min et al. 2003; Cheng et al. 2005; Cortopassi et al. 2016).
DOT1L methylates H3K79, but biochemical studies showed that histone H3 alone is a poor substrate for DOT1L, suggesting that DOT1L requires activators for full catalytic efficiency (Feng et al. 2002; Lacoste et al. 2002; Ng, Feng, et al. 2002). Cellular and in vitro experiments uncovered that efficient methylation of H3K79 in nucleosomal substrates is only achieved if H2BK120 is ubiquitinated (Briggs et al. 2002; Ng, Xu, et al. 2002; Kim et al. 2005; Valencia-Sánchez et al. 2019). However, electrophoretic mobility shift assays and structures of H2BK120ub1 and unmodified nucleosomes suggested that the binding affinity of DOT1L is not altered by the ubiquitination (Valencia-Sánchez et al. 2019). One explanation for the DOT1L activity enhancement by H2BK120ub1 could be the direct interaction of DOT1L with ubiquitin as observed in cryo-EM structures (e.g. PDB 6NQA, 6NN6), where a hydrophobic patch consisting of DOT1L I290, L322 and F326 interacts with ubiquitin I36, L71 and L73 (Figure 13(c)) (Anderson et al. 2019; Valencia-Sánchez et al. 2019). Similar observations were made for the interaction of MLL1 with H2BK120ub1 as described earlier. Biochemical studies found that disruption of these contacts by I290D, L322D and F326A mutations in DOT1L heavily decreased the in vitro methylation activity of DOT1L (Valencia-Sánchez et al. 2019). MD simulations found additional polar interactions between DOT1L residues E323, K330 and ubiquitin (Stodola et al. 2021). Previously reported comprehensive mutagenesis studies of the ubiquitin surface identified L71 and L73 as necessary for DOT1L activation (Holt et al. 2015). Together, these results show that direct interactions between DOT1L and ubiquitin are necessary for DOT1L to deploy its full catalytic potential. However, the exact mechanisms by which these interactions increase the formation of TS conformations remain unclear. Conformational changes are likely most relevant, because the binding affinity is not influenced.
An additional non-catalytic effect of the DOT1L-H2BK120ub1 crosstalk is the destabilization of the nucleosome. Cryo-EM structures showed DNA detached from the histone octamer, with the degree of detachment greater for the DOT1L-H2BK120ub1 complex compared to the DOT1L complex with unmodified H2B. Interestingly, in the structure without H2BK120ub1, the distance between H3K79 and SAM was 25 Å, whereas it was only 20 Å in the structure with H2BK120ub1. While this is still too far for a methylation reaction, it could indicate that H2BK120ub1 changes the binding mode between DOT1L and the nucleosome into a more productive conformation (Jang et al. 2019). Single-molecule FRET experiments additionally showed that DOT1L binding to nucleosomes with H2BK120ub1 destabilized the nucleosome structure, potentially leading to conformational changes facilitating methylation of H3K79 (Jang et al. 2019).
Besides the interaction with ubiquitin, DOT1L was observed to interact with the basic residues of the H4 tail via D28, E123, N126, E138, K300, S304 and K308 (Valencia-Sánchez et al. 2019; Stodola et al. 2021). However, the DOT1L-H4 tail interaction surface was relatively small, suggesting that the H4 tail binding might only be transient, facilitating rapid association with and dissociation from the nucleosome (Stodola et al. 2021). This model is supported by the observation that not all cryo-EM structures captured DOT1L-H4 tail interactions (Anderson et al. 2019). Related to this, DOT1L was shown to be allosterically stimulated by acetylation of H4K16 (Valencia-Sánchez et al. 2021). Cryo-EM structures of DOT1L bound to a nucleosome containing H4K16ac showed that the enzyme was bound predominantly in the catalytic conformation, while different conformations were observed on a nucleosome lacking H4K16ac (Valencia-Sánchez et al. 2021). Based on this, it was proposed that acetylation of the H4 tail restricts the conformational sampling space of DOT1L, resulting in a more active complex conformation and increased catalytic activity (Valencia-Sánchez et al. 2021). An additional anchor point between DOT1L and the nucleosome is the acidic patch, where DOT1L R278, N280, R282 establish polar contacts with H2B Q47, E113 and H2A E56, E64, N68 (Anderson et al. 2019). Hence, multiple interactions between DOT1L and the NCP regulate the methylation activity of DOT1L on H3K79 in a concerted manner.

Experiments with other 7BS PKMTs
Besides DOT1L, several other 7BS PKMTs have demonstrated lysine methylation activities, including METTL10, METTL13, METTL20, METTL21A, METTL21B, METTL21C, METTL21D, METTL22, eEF1A-KMT1, eEF2-KMT, and CaM-KMT (Luo 2018; Falnes et al. 2023), but so far they have not been subjected to MD simulation experiments. However, recently a novel 7BS methyltransferase, denoted Rv2067c, was discovered in Mycobacterium tuberculosis. Rv2067c was found to be secreted into the host macrophages, trimethylating host H3K79 in a non-nucleosomal context (Singh et al. 2023). The crystal structure of Rv2067c was solved, revealing that Rv2067c and DOT1L markedly differ in their amino acid sequence, 3D structure, domain composition, architecture, and oligomeric state, as Rv2067c was present as a dimer (Singh et al. 2023). Interestingly, in vitro methylation experiments showed that Rv2067c is more active on histone protein substrates compared to nucleosomal substrates, contrary to DOT1L, which is only active on nucleosomal substrates as described above (Singh et al. 2023). This difference was rationalized by MD simulations using the solved crystal structure of Rv2067c (Singh et al. 2023). Firstly, while the active site of DOT1L is a shallow groove, large enough to easily accommodate SAM and H3K79 on the surface of the large NCP, Rv2067c binds the peptide substrate deeply inside the protein.
According to MD simulations and volume calculations, the active site of Rv2067c is able to accommodate the free H3 peptide or the corresponding part of the H3 protein, but not more bulky substrates like the NCP (Singh et al. 2023). Secondly, DOT1L engages the H2BK120ub1 nucleosome multivalently as described earlier, but all structural elements involved in these contacts are missing in Rv2067c. The inability of Rv2067c to interact with nucleosomes was further demonstrated in docking experiments attempting to dock Rv2067c onto an NCP from various directions. This rotational scan revealed a minimum of 211 atoms involved in atomic clashes even in the most favorable Rv2067c-NCP complex model. Moreover, in the corresponding complex, the NCP approaches the Rv2067c active site from the wrong direction, the SAM entry site instead of the substrate-binding site, which would in any case not be compatible with catalytic activity.
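The clash criterion underlying such a rotational scan can be illustrated with a minimal Python sketch. It is not the procedure used by Singh et al. (2023); the function names, the 3.0 Å cutoff and the toy coordinates are assumptions introduced here for illustration only. The sketch counts the atoms of both partners that lie closer than the cutoff to any atom of the other partner and then selects the pose with the fewest clashing atoms.

```python
import math

def count_clashing_atoms(coords_a, coords_b, cutoff=3.0):
    """Count atoms of either partner lying within `cutoff` (Å, assumed value)
    of any atom of the other partner."""
    clashing_a, clashing_b = set(), set()
    for i, a in enumerate(coords_a):
        for j, b in enumerate(coords_b):
            if math.dist(a, b) < cutoff:
                clashing_a.add(i)
                clashing_b.add(j)
    return len(clashing_a) + len(clashing_b)

def least_clashing_pose(poses, partner, cutoff=3.0):
    """Return (pose index, clash count) for the pose with the fewest clashing atoms."""
    counts = [count_clashing_atoms(pose, partner, cutoff) for pose in poses]
    best = min(range(len(counts)), key=counts.__getitem__)
    return best, counts[best]

# Hypothetical toy coordinates (Å); a real scan would use all atoms of both structures.
partner = [(1.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
pose_0 = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]   # one atom pair closer than 3 Å
pose_1 = [(0.0, 8.0, 0.0), (10.0, 8.0, 0.0)]   # shifted away, no close contacts

print(count_clashing_atoms(pose_0, partner))
print(least_clashing_pose([pose_0, pose_1], partner))
```

A scan of this kind reports a lower bound: if even the best pose retains many clashing atoms, as described above for the Rv2067c-NCP models, no rigid-body orientation is sterically compatible with binding.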
Finally, the human methyltransferase-like protein 13 (METTL13, aka eEF1A-KNMT and FEAT) is relevant for this review because of its unique feature of having a double substrate specificity. METTL13 was found to target the α-amino group of the N-terminus (Nt) of proteins and the side chain ε-amino group of K55 in the eukaryotic translation elongation factor 1 alpha (eEF1A) (Jakobsson 2021b). This dual reactivity is based on the presence of two independent 7BS domains, one at the N-terminal part (MT13-N) and one at the C-terminal part (MT13-C). In vitro experiments with purified MT13-N and MT13-C domains have demonstrated that MT13-N is responsible for dimethylation of K55 (Jakobsson et al. 2018; Liu et al. 2019) and MT13-C is responsible for trimethylation of the protein Nt (Jakobsson et al. 2018). Methylation of a lysine side chain and the protein Nt are biochemically similar, both occurring on primary amino groups. However, the protein Nt has a pKa close to physiological pH whereas a lysine side chain typically has a pKa above 10 (Grimsley et al. 2009). Consequently, the Nt is chemically more reactive under physiological conditions, suggesting some mechanistic differences between both reactions which still need to be investigated in more detail. Moreover, the functional and biological relevance of combining two active 7BS domains with different specificities in one enzyme in the case of METTL13 needs further investigation.
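The consequence of this pKa difference can be made explicit with the Henderson-Hasselbalch relation, which gives the fraction of an amine present in its deprotonated, nucleophilic form as 1 / (1 + 10^(pKa - pH)). The short Python sketch below is illustrative only: the pH of 7.4 and the pKa values of 7.7 (N-terminal α-amino group) and 10.5 (lysine ε-amino group) are assumed round numbers consistent with "close to physiological pH" and "above 10", and they describe free groups in solution rather than groups bound in an enzyme active site.

```python
def fraction_deprotonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of an amine in its neutral (deprotonated) form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))

PH = 7.4         # assumed physiological pH
PKA_NT = 7.7     # assumed illustrative pKa of a protein N-terminal alpha-amino group
PKA_LYS = 10.5   # assumed illustrative pKa of a lysine epsilon-amino group

for label, pka in (("N-terminus", PKA_NT), ("lysine side chain", PKA_LYS)):
    frac = fraction_deprotonated(pka, PH)
    print(f"{label}: {frac:.4f} deprotonated at pH {PH}")
```

Under these assumptions, roughly one third of the N-terminal amino groups but fewer than 0.1% of the lysine side chains are deprotonated at any time, which illustrates why lysine methylation depends on an active deprotonation mechanism or a lowered pKa in the active site.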

Conclusions
Protein lysine methyltransferases have essential regulatory roles in biology. Understanding their cellular functions critically depends on understanding the underlying mechanistic principles of their catalytic machinery. This includes substrate and product specificity, processivity, autoinhibition and other regulatory principles including the formation of complexes with other proteins. Major achievements have been made in understanding these processes, facilitated by a combination of biochemical, structural and molecular simulation studies (MD and QM/MM). As described in many examples in this review, simulation science benefits from an increasing amount of structural data and computational power. Conversely, biochemists can use simulation predictions to design new experiments, leading to a powerful synergy. Moreover, simulations can identify dynamic conformations at an atomic resolution, which is not possible by any other technology. The resulting interdisciplinary approach considers multiple scales and conditions and is capable of precisely describing how PKMTs transfer methyl groups to their targets and thereby influence the cellular metabolism. This methodological interplay provides a promising avenue for the mechanistic understanding of PKMTs and other families of related enzymes.

Figure 1. A| Protein lysine methyltransferases (PKMTs) transfer up to three methyl groups to specific lysine residues in proteins. The cofactor S-adenosyl-L-methionine (SAM) provides the methyl group. It is released after the transfer as S-adenosyl-L-homocysteine (SAH). B| The protein substrate (cyan) and SAM (orange, methyl group is colored black) bind at opposing sites of the SET domain (grey) in SET domain PKMTs. The target lysine (pink) is inserted into a narrow tunnel, where the lysine is deprotonated and oriented for the methyl group transfer (image created using simulation results of PDB 6VDB (Schuhmacher et al. 2020)). C| The methyl group is transferred using a bimolecular nucleophilic substitution (SN2) mechanism, in which multiple geometric criteria need to be fulfilled to reach the transition state.

Figure 3. Cartoon representation of SET and 7BS domain PKMT architectures. A| SET domain-containing PKMT G9a complexed with the H3K36 substrate peptide (cyan with the target lysine in pink), and cofactor SAM (orange, PDB 5JIY (Jayaram et al. 2016)). SET domain-containing PKMTs incorporate Zn-ions for structural stability in their AWS (associated with SET, magenta), post-SET (yellow) or MYND (rose) domain depending on the enzyme (Dillon 2005; Wu et al. 2011). However, the Zn-ions are not involved in catalysis or conformational changes and they are not shown explicitly in protein structures presented in this review. B| SETD2 complexed with the H3K36 substrate peptide, and cofactor SAM (PDB 5JLB (Yang et al. 2016)). The autoinhibitory loop (rose) is in an open position to accommodate the protein substrate. C| MLL1 SET domain (white) associated with WDR5 (green), RbBP5 (light blue), ASH2L (rose) and DPY30 (cyan) bound to a nucleosome core particle (PDB 6PWV (Park et al. 2019)). MLL1 SET domain complexed with the H3K4 peptide (cyan, with the target K in pink) (PDB 6UH5 (Hsu et al. 2019)). D| SMYD2 complexed with the ERα substrate peptide (PDB 4O6F (Jiang et al. 2014)) showing the bilobal or clamshell-like structure and the MYND domain. E| 7BS PKMT DOT1L and cofactor SAM (PDB 1NW3 (Min et al. 2003)). The architecture consists of a DOT1L-specific region (magenta), a seven-stranded β-sheet Rossmann fold (white) and a ubiquitin interaction region (green).

Figure 4. A| Structure representation of lysine and lysine analogues and their capability to function as PKMT methyl group acceptors. B| Target lysine deprotonation is obligatory for the PKMT-catalyzed methyl group transfer. The protonated target lysine (pink, sticks) is oriented by SET7/9 Y335 (white, sticks), and the water channel is already present (red spheres, prepared using PDB 1XQH (Chuikov et al. 2004)). C| The lysine proton is transferred to the nearby water molecule. D| After lysine deprotonation, the SAM methyl group is rapidly transferred to the deprotonated target lysine, thereby preventing the reprotonation. The excess proton is transferred into the bulk solvent.

Figure 5. PKMTs transfer a defined number of methyl groups to their lysine target (pink, sticks). A| Schematic representation of a free energy profile showing a possible first methyl transfer (black line) and an energetically unfavorable second transfer (red line). B| Restricted second methylation because of a disrupted water channel and blocked lysine deprotonation of monomethylated target lysine (green, sticks, PDB 1XQH). C| The SN2 TS cannot be adopted if a monomethyl substrate is present. D| The F/Y-switch position controls the product specificity of certain PKMTs. Phenylalanine (white, sticks) provides more space in the active site, allowing accommodation of a dimethyl product (pink, sticks), while the additional hydroxyl group of a tyrosine leads to steric clashes preventing formation of the dimethylated product. E| Position of the tyrosine residues 245, 305, 335 in the SET7/9 active site discussed in the main text.

Figure 6. SETD2 has a ~100-fold higher activity towards an artificially designed ssK36 peptide substrate. A| SPOT peptide array with the 15-residue long H3K36 peptide sequence as starting sequence, incubated with SETD2 and radioactively labeled SAM. Positions are individually mutated to any other amino acid except tryptophan and cysteine. At several positions non-natural amino acids are preferred in the substrate peptide. B| Quantification of the peptide array methylation data generates a PKMT-specific specificity profile showing the preference for each position. C| Combination of preferred residues led to the generation of a super-substrate (ssK36) peptide (black, sticks) sequence, which differs at 4 positions (orange, sticks) from the canonical H3K36 peptide sequence (cyan, sticks). D-G| MD and sMD simulation snapshots of SETD2 (white, cartoon) interacting with the ssK36 peptide (black, cartoon). D| Crystal structures show the ssK36 peptide in the SETD2 binding cleft, where special interactions of SETD2 with ssK36 residues are observed (prepared using PDB 6VDB as template (Schuhmacher et al. 2020)). E| MD simulation of the H3K36 and ssK36 peptides in solution with subsequent backbone conformation-based clustering shows that the ssK36 peptide preferably adopts a hairpin-like conformation with the target lysine facing outwards. The H3K36 peptide (cyan, cartoon) prefers an extended conformation. F| In sMD simulations, hairpin-like shaped peptides docked more often successfully in the SETD2 binding cleft, establishing more TS-like conformations. G| After binding, the hairpin conformations unfold into an extended conformation with contacts spreading gradually from the middle section. A-C| Taken from (Schuhmacher et al. 2020; Schnee et al. 2022) with permission.

Figure 7. In the binary PKMT-SAM structure, the placeholder residue occupies the target lysine channel. A| Alignment of the autoinhibitory loops (AL) of PKMTs with known placeholder residue (bold). B| Cartoon and sticks representation of ASH1L (PDB 4YNM (Rogawski et al. 2015)) with S2259 acting as the placeholder residue and the AL in closed position. A target lysine taken from a superposition is shown in faded violet in the target lysine channel to visualize the overlap with the placeholder residue. SAM is shown in orange and the methyl group in black. C| In SETD2, the placeholder residue R1670 (rose, sticks) can adopt multiple conformations. If no peptide is bound, the AL is in a closed position and R1670 occupies the target lysine channel (PDB 4H12 (Zheng et al. 2012)). D| In a half-open position, the AL starts to lift and R1670 turns outwards (PDB 5JLE (Yang et al. 2016)). E| If a peptide substrate (cyan, target lysine in pink) is bound, the AL is in an open position and R1670 is solvent exposed. The post-SET loop (yellow, cartoon) is closed on top of the bound peptide (PDB 5JLB).

NSD2
The NSD2 E1099K mutation has been found frequently in patients with acute lymphoblastic leukemia and other types of cancer (Jaffe et al. 2013; Oyer et al. 2014). NSD2

Figure 8. Possible mechanism of SETD2 histone tail binding. A| Hypothetical structure of the histone tail (modified from PDB 7EA5 (Liu et al. 2021)) interacting with the DNA and forming a hairpin-like conformation with the target lysine facing outside. B| First contacts between H3K36 and H3V35 and the SETD2 AL (modified from PDB 6VDB) could trigger conformational changes in the AL and R1670, opening the SETD2 binding cleft.

Figure 9. NSD2 cancer mutants E1099K and T1150A result in an increased methylation activity. A| Crystal structure of NSD2 E1099K, T1150A (sticks, red) complexed to a nucleosome binding the H3 tail (cartoon, cyan, PDB 7E8D (Sato et al. 2021)). E1099K binds the nucleosome DNA backbone (cartoon, orange), which was suggested to be the reason for the increased activity. B-C| Mechanism proposed for the increased activity of T1150A and E1099K by remodeling the H-bonds around the AL (cartoon and sticks, rose), resulting in an easier uplifting of the AL. The placeholder residue C1181 is in an open position. B| H-bond network in NSD2 WT (PDB 5LSU (Tisi et al. 2016)). C| H-bond network in NSD2 T1150A, in which the contact with D1182 is lost and the AL is lifted (image created using simulation results as described in Sato et al. (2021)). D| H-bond network in NSD2 E1099K, where K1099 forms a novel salt bridge with D1125, which is accompanied by decreased salt bridge formation between D1123 and K1152, since D1123 is in a polar connection with K1124 (image created using simulation results as described in Sato et al. (2021)).

Figure 11. A| Alignment of the different domains in SMYD enzymes. B| Cartoon representation of SMYD2 complexed with the 10 amino acid long ERα peptide in a U-shape conformation (PDB 4O6F (Jiang et al. 2014)). Different domains are colored as in the alignment. C| Cartoon representation of SMYD enzymes 1-5. The color code is as in the domain alignment; SAM is colored orange and shown as sticks. SMYD1 (PDB 3N71 (Sirinupong et al. 2010)) is shown in an open CTD (pink) conformation. SMYD2 (PDB 3RIB (Xu, Zhong, et al. 2011)) is shown in an intermediate CTD (magenta) position. SMYD3 (PDB 3PDN (Sirinupong et al. 2011)) is shown in a closed CTD (red) conformation. Black arrows indicate the directions the CTD is expected to move based on the starting conformation displayed. SMYD4 (AlphaFold model based on UniProt Q8IYR2) with the characteristic TPR domain (white). SMYD5 (AlphaFold model based on UniProt Q6GMV2) in which part of the CTD is replaced by a poly-E tail (red). D| Results of crystal structures with SMYD3. Binding of SAM (orange, sticks) into its binding pocket in SMYD3 causes residues F183, S202 and Y239 (green, sticks) to have an increased solvent-accessible area (PDB 3RU0 (Foreman et al. 2011)). F183, S202 and Y239 are part of the target lysine channel. E| SMYD3 shows an increased activity towards the MAP3K2 peptide (cyan) compared to the VEGFR1 peptide (dark green, PDB 5HQ8 (Van Aller et al. 2016) and 5EX3 (Fu et al. 2016), modified). The molecular mechanism for this focuses on the -2 position of the peptide (in relation to the K260 or K831 target sites in MAP3K2 or VEGFR1, respectively), where MAP3K2 carries F258 (cyan, sticks), which creates a hydrophobic patch (blue circle) with SMYD3 residues (blue, sticks). This effect is reduced for VEGFR1, which has an L258 (dark green, sticks) at this position. Moreover, MAP3K2 T263 and Y264 (cyan, sticks) interact with SMYD3 D332 (red, sticks), causing the peptide to bend into a hairpin conformation, while VEGFR1 contains G834, which does not form a contact with SMYD3.

Figure 12. Structural organization of the MLL1 SET-WRAD complex. A| Structure of the MLL1 SET domain (white) associated with WDR5 (green), RbBP5 (light blue), ASH2L (rose) and DPY30 (cyan) bound to a nucleosome core particle (NCP, purple) (PDB 6PWV). B| Distinct features of the interaction between RbBP5 and the nucleosome like the quad-R residues, A-loop, I-loop. C| The ASH2L IDR interacts with the NCP, thereby facilitating the binding of the complex. D| MLL1 SET-WRAD binds to the NCP (grey) in two distinct binding modes. In the first binding mode, MLL1 SET (dark green) is positioned at the dyad, barely interacting with the NCP (PDB 6PWV). The main interaction sites between the MLL1 SET-WRAD complex and the NCP are established by RbBP5 and ASH2L (WRAD colored green). In the second binding mode, RbBP5 (WRAD colored red) rotates clockwise and MLL1 SET (magenta) is positioned on top of the NCP disk, binding to the H3 histone tail (spheres, cyan, PDB 6KIX (Xue et al. 2019)). E| Ubiquitination (yellow) of H2BK120 leads to a shift towards the second binding mode of the MLL1 SET (white)-WRAD complex (red, PDB 6KIU (Xue et al. 2019)). F| The yeast MLL1 homologue Set1 (white) binds an H2BK120-ubiquitinated NCP in the second binding mode (PDB 6UH5). Histone tail residues H3A1-H3R8 (cyan) are resolved and the target lysine K4 (pink) is correctly bound in the SET domain active site. The H3 tail reaches out from the NCP and is not resolved due to its high flexibility and dynamic behavior. Potentially, it wraps around the nucleosomal DNA to establish a stable binding in the Set1 SET domain.

Figure 13. Different positioning of SAM in the cofactor binding sites of SET domain and 7BS PKMTs. A| In SET domain-containing PKMTs, SAM is positioned in a bent conformation, with the sugar ring hydroxyl groups solvent exposed (PDB 3F9W (Couture et al. 2008)). B| In the 7BS PKMT DOT1L, SAM is positioned in an extended conformation, with the nicotinamide ring solvent exposed (PDB 1NW3). C| DOT1L requires interaction with H2BK120ub1 (yellow) and the H4 tail (purple) for full catalytic efficiency (PDB 6NQA (Worden et al. 2019)). Shown are the residues of DOT1L (green, sticks) interacting with the ubiquitin (yellow, sticks), as well as the residues of DOT1L (white and magenta, sticks) interacting with the H4 tail (purple, sticks). The H3 protein with the target K79 is shown as sticks in cyan, whereas the other histone proteins are colored purple.