Automated flow synthesis of algorithmically designed ferroelectric nematogens

ABSTRACT The synthesis of the ferroelectric nematic material RM734 is telescoped into a single continuous flow process, utilising inline liquid–liquid separation to remove reactants and by-products from the reaction stream. Following the final synthetic step, we subject the semi-crude material to offline automated chromatography. Materials can be prepared on a gram scale (12 minutes) and chromatographically purified (8 minutes) in around 20 minutes. Over a given time interval the system described herein can prepare a far larger number of materials (on gram scale) than an experienced chemist. We also implement an established process for generation of new molecular structures via a genetic algorithm, which generates many new candidate structures per iteration. A filter removes undesirable structures, and a fast semi-empirical QM calculation is performed on those that remain. A neural network trained against a library of nematic and ferroelectric nematic materials is used to rank the candidate structures based on the QM output and a fingerprinting process, with the highest scoring candidates then used to seed a new round of mutations. We combine these two approaches: we generated and ranked several tens of thousands of variants of RM734 and synthesised a small library of materials using our continuous flow protocol.


Introduction
The ferroelectric nematic (NF) phase is the most recently discovered nematic polymorph, in which the orientational order of the 'classical' nematic phase is augmented with polar order [1][2][3][4][5], endowing this phase and the materials that exhibit it with remarkable properties not easily replicated by other systems [6][7][8][9].
The design of new molecules and materials that exhibit the NF phase is in its infancy [10][11][12][13][14][15][16][17][18][19][20][21][22]; we know little about how molecular structure generates this phase, and the majority of materials known to exhibit it are derivatives of either RM734 [11] or DIO [23]. The pool of known materials remains small and grows slowly, limiting the pull towards applications.
We considered that simply increasing the production rate of new materials is a logical way to increase the rate of discovery of NF materials (accelerated serendipity). The productivity of the synthetic chemist is constrained not by their creativity, nor their capacity to design materials, but rather by the sheer time taken to set up, execute, and purify a chemical reaction (typically 1-48 h). This time can be extended greatly by troublesome emulsive liquid-liquid extractions, complex chromatographic separations, and the need to maintain inert (e.g. argon, nitrogen) or reactive (e.g. hydrogen, oxygen) atmospheres, among many other variables. Most (if not all) of these reactions are conducted in batch, with multiple synthetic steps each being subjected to isolation and some form of purification. Together, these factors conspire to present many challenges in scaling up the synthesis of larger quantities of materials.
Continuous flow systems offer a simplified route to scale up as well as simplifying the synthesis of libraries of materials. As such, they are widely employed in academic and industrial chemistry, yet are largely absent from the liquid crystal literature save for a few examples [24][25][26][27].
Then, we come to the topic of the chemical libraries themselves. For a multitude of reasons, some benign and some not, the synthesis of libraries of liquid crystals often resorts to the same 'tricks'; a fluorine here, a nitrogen there, methyl to ethyl to butyl to futile. These variations are often a logical progression based on prior art and are sometimes born out of habit. Algorithmic methods for generating new chemical structures are an intensive area of research [28][29][30][31][32], and offer a route to generate new candidate structures from some initial starting point. Algorithmic generation of molecular structure could potentially reduce or even avoid human biases, suggesting new or underexplored chemical spaces for experimental study. However, as with continuous flow synthesis, the deployment of such methods is as yet essentially absent from the liquid crystal literature.
Herein we combine these two approaches. We first report a series of continuous flow reactors configured in such a way as to produce useful quantities of liquid crystalline materials in a short timeframe, as well as deploying an algorithmic method to design new materials which can then be synthesised in flow.

Flow-chemistry
RM734 and related materials were synthesised here by using a continuous flow protocol which performs sequential esterification/debenzylation/esterification (Figure 1(a)).
Our flow system consists of what we term 'general reaction modules' (GRM, Figure 1(b)) and an off-the-shelf H-cube system for high-pressure continuous flow hydrogenation. A GRM consists of two programmable push/pull dual syringe pumps (New Era), i.e. a total of four outputs. Outputs 1 and 2 of pump 1 are dedicated to reagents; they are combined by a tee connector which flows into a PTFE reaction loop (vol = 2800 µl). The outputs of pump 2 deliver the aqueous quench/washing solution to the output of the reaction loop via a tee connector; a small PTFE reactor loop (vol = 250 µl) ensures thorough mixing. In-line liquid-liquid extraction is performed by a membrane-based separator (Zaiput) with the aqueous waste being discarded and the organic portion being passed directly to the next reactor (or to automated chromatography).
The H-cube system employs an HPLC pump to deliver reagents into the system and performs hydrogenolysis of the benzyl protecting unit using a 10% Pd/C catalyst bed and hydrogen gas at a pressure of up to 100 bar and a temperature of up to 100°C. The H-cube system contains an inbuilt back pressure regulator.
The post-reaction solution is subjected to automated off-line chromatographic purification by a Combiflash NextGen 300+ system (Teledyne Isco); the chromatographed material is then filtered through a 0.2 micron PTFE filter and finally recrystallised from ethyl acetate/hexanes.

Algorithmic molecular design
Algorithmic generation of candidate structures was performed via random mutations to the structure of RM734. The input molecular structure is first converted to a text string under the group-SELFIES (group self-referencing embedded strings) representation protocol [33]. We use a custom fragment library which contains only 'liquid crystal-like' units and combinations thereof; the generation of this library is detailed in the ESI. Looking forward, the ability to construct most liquid crystalline materials from a relatively small library of fragments suggests the possibility of creating a semantically correct language, with clear advantages for machine-led material discovery.
This first representation undergoes a random mutation in which a fragment (e.g. ring, functional group) is inserted, or an existing fragment is replaced or deleted, giving a library of variant structures. We generate additional structural diversity by using the STONED algorithm [34], which again applies random mutations, this time with the molecule represented as a SELFIES string [35].
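The insert/replace/delete mutation described above can be illustrated with a minimal sketch. This is not the group-SELFIES implementation used in this work: the fragment names, the token-list encoding of the parent molecule and the function names are all hypothetical stand-ins, and unlike a real SELFIES-type grammar the sketch does nothing to guarantee chemical validity.

```python
import random

# Hypothetical fragment vocabulary (the real fragment library is described in the ESI).
FRAGMENTS = ["benzene", "ester", "nitro", "methoxy", "fluoro", "pyridine"]

# Crude, illustrative token list standing in for an encoded parent molecule.
PARENT = ["methoxy", "methoxy", "benzene", "ester", "benzene", "ester", "benzene", "nitro"]


def mutate(tokens, rng):
    """Return a copy of `tokens` with one random insertion, replacement or deletion."""
    tokens = list(tokens)
    op = rng.choice(["insert", "replace", "delete"])
    if op == "insert" or not tokens:
        # Insert a random fragment at a random position (also the fallback for empty input).
        tokens.insert(rng.randrange(len(tokens) + 1), rng.choice(FRAGMENTS))
    elif op == "replace":
        tokens[rng.randrange(len(tokens))] = rng.choice(FRAGMENTS)
    else:
        del tokens[rng.randrange(len(tokens))]
    return tokens


def generate_variants(parent, n, seed=0):
    """Apply `n` independent single mutations to `parent`; return the unique variants."""
    rng = random.Random(seed)
    variants = {tuple(mutate(parent, rng)) for _ in range(n)}
    variants.discard(tuple(parent))  # a replacement may reproduce the parent
    return sorted(variants)


variants = generate_variants(PARENT, 50)
```

Each variant differs from the parent by exactly one fragment-level edit, so the library size grows with the number of mutation attempts until duplicates dominate.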
We perform several sequential rounds of mutations, each generating many thousands of candidate structures. After each round of mutation, we remove molecular structures that: are duplicates; contain rings which have fewer than five or more than seven atoms; have more than 3 contiguous rotatable bonds; have no rings. A second filter removes molecules that do not have two ester groups; while there is no a priori reason we cannot synthesise these, they would require a different configuration of the flow-reactor setup described above, and fall outside the scope of this paper as a result. After the final iteration, a visual inspection removes structures which are likely to be unstable, are probably unsynthesisable (e.g. the phosphaalkyne in Figure 2) and/or are unstable under atmospheric conditions (e.g. silabenzenes, benzynes and such).
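The automated filtering rules listed above can be sketched as follows. To keep the example self-contained it operates on precomputed descriptors rather than on molecular structures; the `Candidate` record, its field names and the example entries are hypothetical, and the two filters described in the text are merged into a single pass for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    key: str                 # unique identifier, e.g. a canonical SMILES string
    ring_sizes: tuple        # number of atoms in each ring
    max_rotatable_run: int   # longest run of contiguous rotatable bonds
    n_esters: int            # number of ester groups


def passes_filters(c):
    """Apply the structural rules: ring sizes 5-7, at least one ring,
    at most 3 contiguous rotatable bonds, exactly two ester groups."""
    if not c.ring_sizes:
        return False
    if any(size < 5 or size > 7 for size in c.ring_sizes):
        return False
    if c.max_rotatable_run > 3:
        return False
    return c.n_esters == 2


def filter_candidates(candidates):
    """Drop duplicates (by key), then drop anything failing the structural rules."""
    seen, kept = set(), []
    for c in candidates:
        if c.key in seen:
            continue
        seen.add(c.key)
        if passes_filters(c):
            kept.append(c)
    return kept


# Hypothetical descriptor records, one per rule.
candidates = [
    Candidate("mol-1", (6, 6, 6), 3, 2),  # satisfies every rule
    Candidate("mol-1", (6, 6, 6), 3, 2),  # duplicate
    Candidate("mol-2", (6, 4, 6), 2, 2),  # four-membered ring
    Candidate("mol-3", (), 2, 2),         # no rings
    Candidate("mol-4", (6, 6), 5, 2),     # too many contiguous rotatable bonds
    Candidate("mol-5", (6, 6, 5), 3, 1),  # only one ester group
]
kept = filter_candidates(candidates)
```

The final visual inspection for stability and synthesisability is a human judgement and is deliberately not represented here.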
For these plausible structures, we then generate starting 3D coordinates for each structure with RDKit and perform an initial optimisation with the MMFF force field. The MMFF-optimised geometry is then passed to Gaussian G09 [36] for further optimisation with the semi-empirical AM1 method. We then calculate a series of molecular properties at the AM1 level (dipole tensor, polarisability tensor, hyperpolarisability tensor, molecular dimensions, molecular volume, and shape anisotropy parameter). Additionally, we use RDKit to compute a 'fingerprint' for each molecule which counts the number of instances of various structural features which are found in one or more NF materials (e.g. number of nitro groups, aryl fluorines, contiguous rotatable bonds, pi electrons, and so on; over 100 properties in total). Scoring is then performed by a trained multilayer perceptron feedforward neural network (using PyTorch). The input to the neural network is the matrix of molecular properties (from AM1 calculations) and fingerprint data (from RDKit), which is used to predict the scaled NF-N(Iso) transition temperature, which we define here as: The training set comprises NF materials that have been publicly disclosed, unpublished NF materials synthesised in our group, and closely related molecules which do not exhibit the NF phase.
The top scoring molecules from each round were then each used to seed another round of mutations. We performed eight rounds of mutation all in all, and we selected the top 100 molecules from all rounds based on their scores. A visual inspection was then performed, and we selected the highest ranking molecules that could be synthesised using the continuous flow platform described in this paper using readily available building blocks.
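The seed, mutate, score and reseed cycle described above can be sketched generically. This is a minimal illustration rather than the code in the repository: `evolve`, `children_per_parent` and `n_seeds` are hypothetical names and values, the toy `toy_mutate` and `toy_score` functions merely stand in for structure mutation and for the neural network score, and only the eight rounds and the final top-100 selection are taken from the text.

```python
import random


def evolve(seeds, mutate, score, rounds=8, children_per_parent=20,
           n_seeds=5, n_final=100, rng=None):
    """Run `rounds` of mutation; after each round the top `n_seeds` candidates
    of that round seed the next. Returns the `n_final` best from all rounds."""
    rng = rng or random.Random(0)
    seeds = list(seeds)
    all_scored = {}
    for _ in range(rounds):
        this_round = {}
        for parent in seeds:
            for _ in range(children_per_parent):
                child = mutate(parent, rng)
                this_round[child] = score(child)
        all_scored.update(this_round)
        # The best candidates of this round become the parents of the next round.
        seeds = sorted(this_round, key=this_round.get, reverse=True)[:n_seeds]
    return sorted(all_scored, key=all_scored.get, reverse=True)[:n_final]


# Toy stand-ins: candidates are strings, a mutation changes one character,
# and the "score" simply counts occurrences of the letter "a".
ALPHABET = "abcd"


def toy_mutate(s, rng):
    i = rng.randrange(len(s))
    return s[:i] + rng.choice(ALPHABET) + s[i + 1:]


def toy_score(s):
    return s.count("a")


best = evolve(["bbbbbb"], toy_mutate, toy_score, n_final=3)
```

In practice the filtering step would sit between mutation and scoring, so that only plausible structures incur the cost of a semi-empirical calculation.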
The GitHub repository (https://github.com/RichardMandle/AlgMol1) contains Python code for structure modification via SELFIES/STONED [34] and group-SELFIES. A Jupyter Notebook is provided in the GitHub repository that provides a 'walk-through' for generation of 5CB variants using SELFIES/STONED, fragment-based generation of RM734 variants via group-SELFIES and genetic algorithms which utilise fingerprint-based scoring regimes. Functions are provided for filtering, substructure searching, scoring, initial geometry creation and optimisation using MMFF/RDKit, and creating input files for Gaussian (and Mopac, although this is not used here).

Flow-reactor optimisation
We considered that, in the present case, a continuous flow approach is only beneficial if it can offer a significant advantage in throughput. The most person-time consuming portion of organic synthesis is chromatographic purification; we therefore purify synthetic intermediate steps with an in-line liquid-liquid extraction only, with chromatographic separation reserved for the final synthetic step. We sought to benchmark our system on a model synthesis, in this case the well-known ferroelectric material RM734, targeting the highest possible throughput that gives >95% conversion to the target material at each step.
... and corresponding representation as SMILES and group-SELFIES, with ring fragments indicated. Bottom: Exemplar molecules from a single round of mutation to RM734; we filter molecules using the rules shown, and score using a neural network. Materials which score highly and were later synthesised using our continuous flow apparatus (vide infra).
We utilise a single general reaction module (GRM) for the initial esterification between 2,4-dimethoxybenzoic acid and benzyl 4-hydroxybenzoate. The time for the reaction to complete depends upon the flow rate and also on the size of the reaction loop; we fixed the reaction loop size at 2800 µl as an arbitrary first choice (giving a residence time of 2.8 minutes at a flow rate of 1 ml min⁻¹). We then explore simultaneous variations in both reagent concentration and flow rate, the latter being the sum of the flows from both reagent pumps. For each run, we adjust the flow rate of the quench pump to match the flow rate of the organic stream.
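The residence time quoted for the 2800 µl loop is simply the loop volume divided by the total flow rate. A short sketch of that arithmetic is given here; the function name is illustrative, and the calculation assumes ideal plug flow with no volume change on mixing.

```python
def residence_time_min(loop_volume_ul, flow_rate_ml_min):
    """Residence time (min) = loop volume (ml) / total flow rate (ml/min)."""
    return (loop_volume_ul / 1000.0) / flow_rate_ml_min


# 2800 µl loop at a few total flow rates within the explored range.
for flow in (0.25, 0.50, 0.75, 1.00):
    print(f"{flow:.2f} ml/min -> {residence_time_min(2800, flow):.2f} min")
```

Because the loop volume is fixed, halving the flow rate doubles the residence time, which is why flow rate and concentration are varied together when searching for the highest throughput.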
We explored flow rates in the range of 0.05-1.00 ml min⁻¹, with concentrations ranging from 0.1 M to 1.50 M. In Table 1 we focus only on the limiting cases: at low flow rates (≤0.75 ml min⁻¹) we observe quantitative (i.e. >95%) conversion at all but the highest concentration. Similarly, at low concentration (≤0.10 M) we observe quantitative conversion at all but the highest flow rates. At high concentrations (1.00 M) we encounter problems with the solubility of the acid-EDC complex, which has a deleterious effect upon conversion, and so subsequent experiments with high flow rates were not performed.
Concentrations in the range of 0.25-0.75 M give quantitative conversion at certain flow rates (e.g. 0.25 M at 1.00 ml min⁻¹), and incomplete conversion at higher rates (e.g. 0.75 M at 1.50 ml min⁻¹). Our objective is to obtain the highest throughput possible with near-quantitative conversion, enabling us to pass the reactor output straight into the next reaction without purification and so eliminating the need for time-consuming chromatography and solvent removal. A flow rate of 0.75 ml min⁻¹ at a concentration of 0.50 M is the highest throughput that gives quantitative conversion [37].
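The selection criterion used here (the highest throughput among conditions that still give quantitative conversion) can be written down explicitly. The sketch below uses only the three conditions mentioned in the text, not the full contents of Table 1, and it assumes that throughput can be taken as concentration multiplied by total flow rate; the function names are illustrative.

```python
# (concentration / M, total flow rate / ml min^-1, quantitative conversion?)
# Only the three conditions quoted in the text; Table 1 holds the full data set.
conditions = [
    (0.25, 1.00, True),
    (0.75, 1.50, False),
    (0.50, 0.75, True),
]


def throughput_mmol_min(concentration_m, flow_ml_min):
    """Assumed throughput: mmol/ml multiplied by ml/min gives mmol/min."""
    return concentration_m * flow_ml_min


def best_condition(conditions):
    """Highest-throughput condition among those giving quantitative conversion."""
    viable = [(c, f) for c, f, quantitative in conditions if quantitative]
    return max(viable, key=lambda cf: throughput_mmol_min(*cf))


best = best_condition(conditions)
print(best, throughput_mmol_min(*best))
```

Under this assumption the 0.50 M, 0.75 ml min⁻¹ condition corresponds to 0.375 mmol min⁻¹, compared with 0.25 mmol min⁻¹ for 0.25 M at 1.00 ml min⁻¹.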
We next sought to explore the hydrogenolysis step. Here, the concentration is determined by the (optimised) output of the preceding step, while the flow rate is required to be equal to the outgoing flow rate of the preceding step (i.e. 0.75 ml min⁻¹). Our variables to optimise here are therefore temperature and pressure: the former increases the rate of reaction (at the cost of reduced hydrogen gas solubility), while the latter partially dictates the solubility of hydrogen in the solvent stream (Henry's law). Initial investigations employed the flow rate (0.75 ml min⁻¹) and concentration (0.50 mmol ml⁻¹) which gave the highest throughput in the preceding step; however, we were unable to attain quantitative debenzylation under any of the explored conditions. Moreover, the liberated benzoic acid is not entirely soluble in DCM at these concentrations, which presented additional challenges (e.g. precipitation in the high-pressure hydrogen cell, blocked output lines, and fouling of the catalyst cartridge; Table 2). We therefore selected the second-highest throughput conditions from Table 1, using a flow rate of 0.75 ml min⁻¹ and a concentration of 0.25 M.
DCM is hardly a widely encountered solvent for hydrogenation, yet it performs adequately here. At ambient temperature (internal temperature set at 20°C) there is incomplete conversion at both atmospheric (1 bar) and elevated (100 bar) pressure. We considered that the rate of reaction is probably not limited by the solubility of hydrogen (which is higher in DCM than in common hydrogenation solvents such as THF or ethanol) [38] but rather by the rate of adsorption/desorption onto the catalyst itself, which will be temperature dependent. Taking advantage of the backpressure regulation within the H-cube system, we deployed elevated pressure (up to 100 bar), enabling us to work at temperatures far beyond the boiling point of DCM at atmospheric pressure (up to 100°C here). A temperature of 40°C is sufficient to attain quantitative debenzylation at a concentration of 0.25 M and a flow rate of 0.75 ml min⁻¹.

Table 3. Transition temperatures (T, °C) and associated enthalpies (ΔH, kJ mol⁻¹) for a family of RM734 analogues. # The clearing point was above 250°C, and we were unable to measure the enthalpy or temperature for this phase transition owing to the sample decomposing. * An additional phase transition is seen at ~160°C by microscopy which is presumably an antiferroelectric/splay nematic; we omit this from the table for clarity. Transition temperatures for several compounds are in keeping with prior reports [11,16,22].

For the final esterification, the flow rate and concentration of carboxylic acid are determined by the output of the hydrogenolysis step (0.75 ml min⁻¹, ~0.25 mmol ml⁻¹). The output of the H-cube is fed into a T-mixer along with a DCM solution of EDC.HCl (0.5 mmol ml⁻¹), 4-nitrophenol (0.28 mmol ml⁻¹), and DMAP (0.01 mmol ml⁻¹) at a flow rate of 0.75 ml min⁻¹. The streams are then fed through a 2.8 ml reaction loop as described elsewhere. Following inline liquid-liquid extraction this provides >95% conversion to RM734, and so there are no parameters to optimise here.
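The stoichiometry implied by the two feed streams in the final esterification can be worked out directly from the stated concentrations and flow rates. The sketch below assumes that the hydrogenolysis is quantitative with no loss of material (so the acid enters at ~0.25 mmol ml⁻¹), that the reagent stream flow rate is 0.75 ml min⁻¹, and that volumes are additive on mixing; the variable names are illustrative.

```python
ACID_CONC = 0.25      # mmol/ml, approximate output of the hydrogenolysis step
ACID_FLOW = 0.75      # ml/min
REAGENT_FLOW = 0.75   # ml/min, flow rate of the reagent stream
REAGENT_CONC = {      # mmol/ml in the DCM reagent stream
    "EDC.HCl": 0.50,
    "4-nitrophenol": 0.28,
    "DMAP": 0.01,
}
LOOP_VOLUME_ML = 2.8

# Molar delivery rate of the carboxylic acid (mmol/min).
acid_rate = ACID_CONC * ACID_FLOW

# Equivalents of each reagent relative to the acid.
equivalents = {
    name: conc * REAGENT_FLOW / acid_rate for name, conc in REAGENT_CONC.items()
}

# Residence time in the loop for the combined stream (min).
residence_min = LOOP_VOLUME_ML / (ACID_FLOW + REAGENT_FLOW)

for name, eq in equivalents.items():
    print(f"{name}: {eq:.2f} equiv.")
print(f"residence time: {residence_min:.2f} min")
```

Because the two streams run at the same flow rate, the equivalents reduce to the ratio of the stream concentrations.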
Next, a confirmatory experiment was performed in which the optimised reaction conditions were applied to each step in the sequence. After achieving a steady state, we collected the output of the final flow reactor for 12 minutes, and subjected this to automated offline chromatography (~8 minute run time). The chromatographed material was concentrated to ~10 ml volume, filtered through a 0.2 micron PTFE filter and triturated with hexanes to induce crystallisation. The solid material was then collected and dried via suction filtration. This afforded 0.85 grams of isolated material (1.94 mmol, ~85% yield), giving a throughput of 11.3 mmol hr⁻¹ (~5 g hr⁻¹).
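The yield and throughput figures can be checked against the feed conditions. The sketch below assumes that the limiting reagent enters the telescoped sequence at 0.25 mmol ml⁻¹ and 0.75 ml min⁻¹ (the conditions selected above), and that the quoted throughput refers to the molar feed rate rather than to the isolated mass; both readings are assumptions, and the variable names are illustrative.

```python
FEED_CONC = 0.25        # mmol/ml of limiting reagent
FEED_FLOW = 0.75        # ml/min
COLLECTION_MIN = 12     # collection time at steady state (min)
ISOLATED_MMOL = 1.94    # isolated RM734 (mmol)

# Molar feed rate and the theoretical amount delivered during collection.
feed_rate = FEED_CONC * FEED_FLOW                  # mmol/min
theoretical_mmol = feed_rate * COLLECTION_MIN      # mmol

# Isolated yield over the three telescoped steps.
isolated_yield = ISOLATED_MMOL / theoretical_mmol

# Throughput expressed per hour, on a feed basis and on an isolated basis.
feed_throughput = feed_rate * 60                          # mmol/h
isolated_rate = ISOLATED_MMOL / COLLECTION_MIN * 60       # mmol/h

print(f"theoretical: {theoretical_mmol:.2f} mmol")
print(f"isolated yield: {isolated_yield:.1%}")
print(f"feed throughput: {feed_throughput:.2f} mmol/h")
print(f"isolated rate: {isolated_rate:.2f} mmol/h")
```

On these assumptions the feed-based figure reproduces the quoted ~11.3 mmol hr⁻¹ and the yield is consistent with the quoted ~85%, while the rate of isolated material is somewhat lower than the feed-based throughput.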

Mini-library synthesis
With optimised conditions at hand, we now apply our continuous flow platform to the synthesis of some RM734-like materials. We algorithmically generated many variants of RM734 through systematic replacement of molecular fragments under the group-SELFIES representation. Variants were scored using a neural network, and the top 100 were visually inspected to select those that could be synthesised using the flow platform, i.e. those constructed by sequential esterification-hydrogenolysis/debenzylation-esterification. Clearly, the system could be modified to permit the synthesis of many other things, but this goes beyond the scope of this paper. We also restricted our synthetic endeavours to targets whose building blocks could either be purchased, gifted to us or were available in-house.
Some high-scoring materials have been reported previously and were resynthesised here, with excellent agreement between the transition temperatures in this work and the original reports (Table 3). One notable consequence of using the neural network is that it appears to overestimate the importance of polarisability, biasing our selection towards highly polarisable materials by scoring these highly. Some suggested functional groups lack the polarity to generate the NF phase (e.g. RM734-S), yet others (RM734.dma, RM734.pyr) exhibit enhancements to the stability of the NF phase relative to RM734, albeit with increased melting points. Relative to the parent material, the biphenyl-containing RM734.P shows large increases in the melting point, the onset temperature of the NF phase, and the clearing point. Presumably the increased aspect ratio helps to arrest head-to-tail flipping.
A consequence of our use of a neural network to score the algorithmically designed molecules, especially one trained on such a small set of materials, is that molecules that are very similar to RM734 score strongly, whereas more radical deviations (e.g. Figure 2) generally score poorly or are filtered out. The neural network is probably significantly less adept at discerning the likelihood of any given structure being worth synthesising (or not!) than one skilled in the art; however, the advantage is that many thousands of molecules can be evaluated simultaneously. It is difficult to see how (or why) a human should go about sifting through the huge number of structures that can be generated per minute. It is to be expected that our ability to score candidate molecules will improve significantly over time, both as more sophisticated networks are trained and deployed and also as we can start to draw on an ever-growing pool of experimental data. The possibility stands that the generated molecules which we currently filter out, either via the neural network or by human intervention, may in fact be the roughest diamonds in the mine.

Conclusions
Here, we outline the use of a continuous flow reactor to telescope multiple chemical transformations into a single process, which vastly increases the number of materials we can synthesise in a given time. By telescoping multiple optimised reactions into a single continuous flow process, the bottleneck is no longer synthesis but characterisation, chemical analysis and so on. We also deploy an algorithmic method for designing new chemical matter from an initial starting molecule. This method reduces human bias, and produces output structures which range from the obvious to the bizarre, from trivial to unphysical. The output of this algorithmic method is scored using a neural network which is trained to predict the scaled onset temperature of the NF phase. We then synthesise several of the high-scoring candidate structures using our continuous flow system, providing a mini-library of RM734-like materials.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
The work was supported by the UK Research and Innovation [MR/W006391/1].