Machine learning aided evaluation and design based on polymer membrane materials

Abstract
With the acceleration of global industrialisation and increasingly serious environmental problems, the development of low-energy, high-efficiency electrochemical energy conversion equipment and separation systems has become a research hotspot in scientific and industrial circles. Machine learning has become an important research method for exploring and expanding the family of two-dimensional materials. Traditional experimental and computational methods have low fault tolerance when applied to two-dimensional materials and therefore demand a great deal of time and research and development cost. Machine learning, owing to its powerful data-processing capability and flexible algorithmic models, can help reduce the time and cost of discovering and understanding two-dimensional materials, can effectively predict and expand two-dimensional material systems from data, and can explore their potential for experimental synthesis and application. This paper focuses on the methods of machine learning, the use of machine learning in 2D material design and synthesis, and the use of machine learning in studying 2D material properties and applications. Finally, this paper uses an ML algorithm to test synthesised polymers. The experimental and predicted data points are in relatively good agreement, which indicates that the ML model can be used as a prediction tool to identify undeveloped polymers for gas separation.


Introduction
In 2021, the National Development and Reform Commission proposed six key work directions, including increasing the proportion of non-fossil energy in China's energy structure, popularising low-energy chemical production, and strengthening waste recycling and discharge [1]. In this process, the development of membrane technology is considered one of the strategic emerging industries [2]. Whether it is the ion exchange membrane of an electrochemical energy conversion device or a separation membrane at any level of a separation system, the performance of the membrane depends mainly on its chemical properties and internal microscopic morphology [3]. Functionalisation and functional micro-nano structuring will be the development direction for separation and exchange membranes [4,5]. Owing to the special properties of their crystal structures, different two-dimensional materials have different electrical or optical properties, including anisotropy in the Raman spectrum, photoluminescence spectrum, second-harmonic spectrum, optical absorption spectrum, thermal conductivity, electrical conductivity and other properties. They have great development potential in polarised optoelectronic devices, polarised thermoelectric devices, bionic devices, polarised light detection and related fields [6]. Researchers have conducted many theoretical and experimental studies on the electronic properties of 2D nanomaterials. In the past decades, traditional first-principles calculation has become a powerful tool for calculating the structure and properties of materials [7]. For example, it has been used to study the structure and electronic properties of doped and defective 2D nanomaterials, the interaction between the substrate and 2D nanomaterials, and the interaction between contacts and 2D nanomaterials [8][9][10].
Although the results of first-principles calculations are usually consistent with experimental results, the method is expensive and time-consuming. Machine learning provides a more efficient alternative: once trained, a machine learning model can quickly predict the performance of materials. Therefore, the application of machine learning in the materials field has important practical significance.
Machine learning algorithms use the crystal structure, electronic band structure, mechanical properties, chemical characteristics, magnetic properties and other data of two-dimensional materials, obtained from experimental characterisation or theoretical calculation, to mine and explain the complex internal correlations between these features and to predict the unknown features of new two-dimensional materials [11][12][13]. In addition, once a model has been trained, machine learning algorithms can make very fast predictions, which makes machine learning a powerful tool for exploring the physical properties and applications of a large number of two-dimensional materials [14][15][16]. Therefore, the design, synthesis, physical property exploration and application of two-dimensional materials can be studied extensively using machine learning strategies.
Xu Yonglin predicted the band gap of diamond-like carbon compounds using high-throughput computing and machine learning. Three basic machine learning regression algorithms, Lasso, SVR and GBDT, were used and then combined into a more powerful and stable ensemble algorithm to predict the band gap of materials [17]. The experimental results show that the prediction model has an accuracy of 77.73% and is robust and stable enough to be widely used in thermoelectric materials research that requires band gap prediction in large batches. Michael Fernandez trained a machine learning model on the structural characteristics of graphene nanosheets to predict Fermi energy levels and band gaps, with accuracies of 94% and 88%, respectively. This method makes the screening of nanomaterials faster and more efficient, and can be extended to other materials. Gabriel R. Schleder et al. used machine learning techniques to identify thermodynamically stable 2D nanomaterials. According to the convex hull energy and the material surface, they divided the materials into low, medium and high stability classes. This method can be used to evaluate the stability of new two-dimensional compounds so that promising candidates can be studied in more detail.
In recent years, although researchers have used machine learning methods to explore some properties of 2D materials, the following challenges remain. (1) The data used in current research are mostly obtained from first-principles calculations, with models trained and tested on the same calculated data, so the results are not universal. (2) In the materials field the amount of data is often very limited, so selecting highly correlated features is the key difficulty in improving prediction accuracy with limited data.
Machine learning, owing to its powerful data-processing capability and flexible algorithmic models, can help reduce the time and cost of discovering and understanding two-dimensional materials, can effectively predict and expand two-dimensional material systems from data, and can explore their potential for experimental synthesis and application. This paper focuses on the methods of machine learning, the use of machine learning in 2D material design and synthesis, and the use of machine learning in studying 2D material properties and applications. Finally, this paper uses an ML algorithm to test synthesised polymers.

Introduction to machine learning methods
The general process of applying machine learning in materials research is as follows (Figure 1):
1. Collect and pre-process the data. Data are the cornerstone of machine learning results [18]. When the original data are poorly distributed, the amount of data is small, the data set contains outliers or individual entries are missing, applying machine learning directly will have undesirable effects on the results. Data cleaning or supplementation is then required, together with pre-processing that standardises the data set to obtain standard normally distributed data suitable for machine learning [19].
2. Select appropriate descriptors (features) to construct a model. Descriptors are the bridge between the input data and the target properties, and the choice of descriptors largely determines the direction of the final results for a given material. Although many descriptors may be available for a material, their number must be reasonable and should be smaller than the number of materials in the data set [20].
3. Use different algorithms to learn from the data. Classification algorithms assign the data to classes, regression algorithms predict the mapping function between descriptors and target properties, and clustering algorithms automatically group similar objects into one category [21][22][23].
4. Evaluate the model. Whether the final model and its parameters are successful depends not only on whether the model fits the data in the training set but, more importantly, on whether it is suitable for other material systems outside the training set. Because known data are lacking for some properties in the materials field, the resulting machine learning model is likely to be over-fitted, which produces a large gap between training and validation errors. Therefore, it is very important to test the model [24,25].
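Step 1 of this workflow can be sketched in a few lines. The example below is a minimal illustration rather than the pipeline used in this paper: the function names `impute_missing` and `standardise` are hypothetical, missing entries are filled with the column mean (one simple choice among many), and standardisation uses the z-score.

```python
import math


def impute_missing(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def standardise(train_column, other_column):
    """Scale both columns with the mean and std of the training column."""
    n = len(train_column)
    mean = sum(train_column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in train_column) / n)
    if std == 0.0:
        std = 1.0  # constant feature: avoid division by zero

    def scale(column):
        return [(v - mean) / std for v in column]

    return scale(train_column), scale(other_column)


if __name__ == "__main__":
    # Hypothetical descriptor values; None marks a missing entry.
    raw = [1.0, None, 3.0]
    cleaned = impute_missing(raw)
    train_scaled, test_scaled = standardise(cleaned, [2.5])
    print(cleaned, train_scaled, test_scaled)
```

The scaling statistics are taken from the training column only and then reused for any other data, so that the evaluation in step 4 is not influenced by information from outside the training set.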

Method definition and introduction
Polymer membranes are used for various gas separations. The permeability and selectivity of polymer gas separation membranes are usually negatively correlated. The design of polymer membranes is mainly based on empirical observation, which limits the discovery of new materials that can separate specific gas pairs. Therefore, the challenge in synthesising a new generation of polymer gas separation membranes is to design materials that combine high permeability with high selectivity. It is very expensive and time-consuming to synthesise and test every possible polymer structure and its potential chemical modifications. ML offers a new way to explore the design of polymer membranes by training on smaller experimental data sets. The transport relations for gas separation are as follows.
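The permeability $P$ of a gas is the product of its diffusivity $D$ and its solubility $S$:
$$P = D \times S$$
The ideal selectivity between two gases A and B is the ratio of their permeabilities:
$$\alpha_{A/B} = \frac{P_A}{P_B} = \frac{D_A}{D_B} \cdot \frac{S_A}{S_B}$$
where $D_A/D_B$ and $S_A/S_B$ are the diffusivity selectivity and the solubility selectivity, respectively. Permeability and selectivity are increasingly used to evaluate membranes, yet data for them remain scarce, so they are discussed in this paper.
When both quantities are considered, it is generally observed that they are inversely related. A diagram of various polymers for CO2/CH4 separation is shown in Figure 2.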
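The selectivity ratio and the trade-off against permeability can be checked numerically. The sketch below computes the ideal selectivity from two permeabilities and tests whether a membrane lies above an upper bound of the power-law form P = k·α^n. The function names are illustrative, and the default constants (k = 5,369,140 Barrer, n = -2.636) are the values commonly quoted for the 2008 CO2/CH4 upper bound; they are an assumption here, not taken from this paper, and should be checked against the original correlation before use.

```python
def ideal_selectivity(perm_a, perm_b):
    """Ideal selectivity alpha_A/B = P_A / P_B (both in the same units)."""
    if perm_b <= 0:
        raise ValueError("permeability of gas B must be positive")
    return perm_a / perm_b


def upper_bound_selectivity(perm_a, k=5_369_140.0, n=-2.636):
    """Selectivity on the bound P_A = k * alpha**n at permeability perm_a.

    k and n are ASSUMED values for the 2008 CO2/CH4 bound (see lead-in).
    """
    return (perm_a / k) ** (1.0 / n)


def exceeds_upper_bound(perm_a, alpha, k=5_369_140.0, n=-2.636):
    """True if the (permeability, selectivity) point lies above the bound."""
    return alpha > upper_bound_selectivity(perm_a, k, n)


if __name__ == "__main__":
    # CO2 permeability (Barrer) and CO2/CH4 selectivity reported later in
    # this paper for the 3-SCPIM-6/24 membrane.
    co2_permeability, co2_ch4_selectivity = 4008.0, 40.9
    print(upper_bound_selectivity(co2_permeability))
    print(exceeds_upper_bound(co2_permeability, co2_ch4_selectivity))
```

Solving the bound for selectivity at a fixed permeability, as done here, matches the way the text compares membranes with the upper limit "under the same permeability value".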
This paper presents a machine learning (ML) algorithm for polymers. Experimental permeability data for six different gases across about 700 polymers were used to predict the gas separation behaviour of more than 11,000 untested homopolymers. To test the accuracy of the algorithm, the two most promising polymer membranes predicted by the method were synthesised and found to exceed the upper limit of CO2/CH4 separation performance.

Results and discussion
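Scikit-learn's Gaussian Process Regressor (GPR) allows 'prediction without prior fitting (based on the GP prior)'. The method is as follows. The dependence of y on x is represented by a Gaussian kernel function of the inputs, and the joint distribution of the outputs is described by a covariance matrix built from that kernel.
For a set of features $X = \{x_1, x_2, \ldots, x_t\}$, the function values follow a multivariate Gaussian distribution:
$$f(x_{1:t}) \sim \mathcal{N}(0, K), \qquad K_{ij} = k(x_i, x_j)$$
When the observations are noisy, a noise term is added to the diagonal, $K \rightarrow K + \sigma^2 I$.
The posterior distribution of $f(x_{t+1})$ is
$$f(x_{t+1}) \mid f(x_{1:t}) \sim \mathcal{N}\left(\mu_{t+1}, \sigma_{t+1}^2\right), \qquad \mu_{t+1} = \mathbf{k}^{\top} K^{-1} f(x_{1:t}), \qquad \sigma_{t+1}^2 = k(x_{t+1}, x_{t+1}) - \mathbf{k}^{\top} K^{-1} \mathbf{k}$$
with $\mathbf{k} = [k(x_{t+1}, x_1), \ldots, k(x_{t+1}, x_t)]^{\top}$.
For the choice of Gaussian kernel, we use the RBF kernel
$$k(x, x') = \exp\left(-\frac{1}{2} \lVert x - x' \rVert^2\right)$$
We can add a hyperparameter $h$ to control the width of the kernel:
$$k(x, x') = \exp\left(-\frac{\lVert x - x' \rVert^2}{2h^2}\right)$$
For the more general case, a separate width is used for each feature dimension. In GPR, this corresponds to a set of automatic relevance determination (ARD) hyperparameters $h_d$ on the diagonal, as shown in Figure 3:
$$k(x, x') = \exp\left(-\frac{1}{2} (x - x')^{\top} \mathrm{diag}(h)^{-2} (x - x')\right)$$
For the Matérn kernel, we use the following formula, with $r = \lVert x - x' \rVert$:
$$k(x, x') = \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{\sqrt{2\nu}\, r}{h}\right)^{\nu} K_{\nu}\left(\frac{\sqrt{2\nu}\, r}{h}\right)$$
where $\Gamma$ is the gamma function and $K_{\nu}$ is the modified Bessel function of the second kind.
The target quantity is the base-10 logarithm of the permeability in Barrer. The function $\delta(t)$ is defined such that $\delta(t) = 0$ when $t \neq 0$.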
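The posterior of a Gaussian process with an RBF kernel can be written out directly, which may help when reading the kernel definitions in this section. The sketch below is a dependency-free illustration and not scikit-learn's implementation: the function names, the single shared width `h` and the default noise level are assumptions made for this example.

```python
import math


def rbf(x1, x2, h=1.0):
    """RBF kernel exp(-||x1 - x2||^2 / (2 h^2)) for equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-sq_dist / (2.0 * h * h))


def cholesky(matrix):
    """Lower-triangular L with L L^T = matrix (symmetric positive definite)."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_cholesky(low, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_posterior(train_x, train_y, x_new, h=1.0, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at x_new."""
    n = len(train_x)
    # Covariance matrix with the noise term added on the diagonal.
    cov = [[rbf(train_x[i], train_x[j], h) + (noise if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    k_star = [rbf(x, x_new, h) for x in train_x]
    weights = solve_cholesky(low, train_y)   # K^{-1} y
    solved = solve_cholesky(low, k_star)     # K^{-1} k
    mean = sum(k * w for k, w in zip(k_star, weights))
    var = rbf(x_new, x_new, h) - sum(k * s for k, s in zip(k_star, solved))
    return mean, max(var, 0.0)


if __name__ == "__main__":
    # Hypothetical one-dimensional training data.
    xs, ys = [[0.0], [1.0], [2.0]], [0.2, 1.0, 0.5]
    print(gp_posterior(xs, ys, [1.5]))
```

Near a training point the posterior variance shrinks towards the noise level, while far from all training points the prediction falls back to the prior (mean 0, variance 1). This is why the predicted variance is useful as an indication of how far a new polymer lies from the training data.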
The films were prepared from dimethylacetamide, and the CO2 and CH4 permeabilities of the synthesised polymers were measured as follows. The film was mounted on a 47 mm brass disc of known inner diameter and sealed in place with epoxy. This defines the membrane area used in the experiments.
Machine learning can simulate the way the human brain learns. Machine learning, on the other hand, takes time to code and requires learning from a large amount of known data ahead of time (for example, showing the computer a great many apples and pears) before it can make correct classification decisions. Once the learning process is complete, however, automation and mass adoption are easy: a machine can classify millions of images quickly and without fatigue, something a human brain cannot do.
Here V_d is the volume (calibrated to within 0.001 cm³ using Burnett's helium gas-expansion method).

Experimental results
The results of the traditional method
The gas separation performance of the original and modified membranes was determined using the swing method (test temperature, 25 °C). The permeability P_a of different pure gases was measured, and the permselectivity of each gas pair was obtained as the ratio of the permeability of the fast gas to that of the slow gas. Permeability is given by Formula (15).
The conventional membranes were tested for pure-gas permeation using the swing method. The selectivities of this series of membranes are higher than those of the original membrane, which indicates that shrinkage of the micropores enhances the size-sieving ability of the membrane towards gas molecules. While the gas separation performance improved, the carbon dioxide permeability of the x-SCPIM-y/z membranes also increased gradually with longer reaction time, higher treatment temperature and higher trifluoromethanesulfonic acid content. After screening, the 3-SCPIM-6/24 membrane showed the best CO2 separation performance: its CO2 permeability reaches 4008 Barrer, its CO2/methane permselectivity is 40.9 and its CO2/nitrogen permselectivity is 58.1, so its overall performance far exceeds the 2008 Robeson upper limit for gas separation. In long-term gas permeability tests (Figure 4), after an initial rapid decrease in permeability caused by severe polymer ageing, the 3-SCPIM-6/24 membrane shows good permeation stability (the permeability is maintained at about 50% of its initial value) and improved, stable permselectivity (CO2/methane and CO2/nitrogen permselectivity reach 56.4 and 78.6, respectively). In other words, the x-SCPIM-y/z membranes combine excellent gas separation performance with stability, so superacid-catalysed self-crosslinking is considered a potential modification route for intrinsically microporous polymer membranes. Compared with the various reported PIM-1 thermal crosslinking processes (thermal self-crosslinking, thermal decarboxylation crosslinking, thermal oxidation crosslinking, etc.; Figure 5), superacid-catalysed self-crosslinking has clear advantages: (1) the temperature required for crosslinking is low, so the energy consumption of the whole process is low; (2) the carbon dioxide permeability of the x-SCPIM-y/z membranes is generally high, and the permselectivity of the carbon dioxide gas pairs is higher.
The results of the method proposed in this paper
This work is based on a literature database containing diffusivity, solubility and permeability data. For each gas, we randomly divided this data set into two subsets, one of which was used to train the models. The training set for each gas comprises 75% of the polymers with data for that gas. The trained model was then applied to the remaining 25% of the polymers (the test set), and these data were used to validate the model's accuracy. The agreement between the ML predictions and the measured values improves with larger data sets (see Table 1).
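The hold-out validation described here can be sketched as follows. This is a minimal illustration, assuming a 25% test fraction as stated in the text; the function names and the fixed random seed are hypothetical choices for this example, and any regression model could supply the predictions passed to the error function.

```python
import random


def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and split off a test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled records form the test set.
    return shuffled[n_test:], shuffled[:n_test]


def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical polymer identifiers standing in for database entries.
    polymers = [f"polymer_{i}" for i in range(100)]
    train, test = train_test_split(polymers)
    print(len(train), len(test))
```

Repeating the split separately for each gas, as the text describes, only requires calling `train_test_split` on the subset of polymers that have data for that gas.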
Fingerprinting is used to represent numerically the chemical connectivity within the polymer repeat unit. In the group contribution method, all building blocks must be defined in advance and remain unchanged. Fingerprinting, by contrast, is dynamic in nature and can evolve to include newly synthesised materials. The fingerprint method also accounts for the chemical connectivity between different units. We use the Daylight-like fingerprint algorithm implemented in RDKit to transform each polymer into a binary 'fingerprint'. This topology-based method analyses each fragment containing up to a certain number of atoms and then hashes each fragment to set bits in a binary fingerprint, which represents the molecule computationally, as shown in Figure 6.
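The idea behind a hashed, path-based fingerprint can be illustrated without any chemistry toolkit. The sketch below is a simplified stand-in, not RDKit's algorithm: it treats a molecule as a list of element symbols plus a list of bonded index pairs, ignores bond orders, and uses SHA-256 only to obtain a hash that is stable between runs. The function names, the 64-bit length and the four-atom path limit are assumptions for this example.

```python
import hashlib


def path_fingerprint(atoms, bonds, max_atoms=4, n_bits=64):
    """Hashed path fingerprint of a molecular graph.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) index pairs into atoms
    """
    neighbours = {i: set() for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].add(j)
        neighbours[j].add(i)

    paths = set()

    def walk(path):
        labels = [atoms[i] for i in path]
        forward, backward = "-".join(labels), "-".join(reversed(labels))
        # A path and its reverse describe the same fragment; keep one form.
        paths.add(min(forward, backward))
        if len(path) < max_atoms:
            for nxt in neighbours[path[-1]]:
                if nxt not in path:
                    walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])

    bits = [0] * n_bits
    for fragment in paths:
        digest = hashlib.sha256(fragment.encode("utf-8")).digest()
        bits[int.from_bytes(digest[:4], "big") % n_bits] = 1
    return bits


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two equal-length bit lists."""
    both = sum(a & b for a, b in zip(fp_a, fp_b))
    either = sum(a | b for a, b in zip(fp_a, fp_b))
    return both / either if either else 1.0


if __name__ == "__main__":
    ethanol = path_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
    ethane = path_fingerprint(["C", "C"], [(0, 1)])
    print(sum(ethanol), sum(ethane), tanimoto(ethanol, ethane))
```

Because every path in a substructure also occurs in the larger molecule, each bit set for the fragment is also set for the whole, which is the property that lets such fingerprints capture shared chemical connectivity. Different paths can collide on the same bit, so a real application would use a much longer bit vector than the 64 bits chosen here.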
The large synthesis toolbox that can be used to create new polymers is simulated by converting polymers into binary 'fingerprints' that are fed into ML algorithms.The model was trained using a random subset of polymers from our literature database and then tested against the remaining polymers.The model is then applied to extensive literature data to discover high-performance polymers, thereby facilitating machine-aided design.
We found that the mean squared error only began to decrease for training sets larger than about 400 polymers, and that beyond this size the error decreases monotonically with the size of the training set. This explains why previous studies, which often used about 100 polymers for ML research, lacked insight. A sufficiently large training set is therefore needed to predict new polymers that have never appeared in the training set.
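The dependence of test error on training-set size is usually inspected with a learning curve. The toy sketch below shows the mechanics only: it uses a synthetic one-dimensional target (a sine function) and a nearest-neighbour predictor as stand-ins, so neither the data nor the model correspond to those used in this paper, and all names are hypothetical.

```python
import math


def nearest_neighbour_predict(train_x, train_y, x):
    """Return the target of the training point closest to x."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]


def learning_curve(sizes, target=math.sin, lo=0.0, hi=2.0 * math.pi, n_test=200):
    """Test MSE of a 1-nearest-neighbour model for each training-set size.

    Each size must be at least 2; training points are spread evenly on [lo, hi].
    """
    test_x = [lo + (hi - lo) * (i + 0.5) / n_test for i in range(n_test)]
    test_y = [target(x) for x in test_x]
    curve = []
    for n in sizes:
        train_x = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
        train_y = [target(x) for x in train_x]
        preds = [nearest_neighbour_predict(train_x, train_y, x) for x in test_x]
        mse = sum((p - t) ** 2 for p, t in zip(preds, test_y)) / n_test
        curve.append((n, mse))
    return curve


if __name__ == "__main__":
    for size, mse in learning_curve([5, 20, 80]):
        print(size, mse)
```

With real polymer data the same loop would draw random training subsets of increasing size from the database and evaluate each model on a fixed held-out test set, which is how a threshold such as the one reported in the text could be located.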
It can be seen from Figure 7 that examining the materials above the CO2/CH4 upper limit and their common characteristics makes it possible to understand which physical quantities are critical for improving gas permeability and selectivity. Among the 11,325 polymers in the data set, polysulfides accounted for only 7.00%. However, they account for the majority (53.00%) of the polymers that exceed the CO2/CH4 upper limit. In addition, among the groups that broke the upper limit, polysulfone (5.30% in total, 18.00% of those exceeding the upper limit) and polyimide (17.65% in total, 35.00% of those exceeding the upper limit) accounted for a larger percentage. Aromatic polyethers accounted for 30.78% of the total predicted data set, but only 21.00% of the polymers exceeding the upper limit. A two-dimensional histogram was created to further analyse the polymers that break the upper limit. It was found that 18.00% belonged to both polysulfone and polyimide, and 17.00% belonged to both polysulfone and polyether. Therefore, in this case, materials with sulfur groups, oxygen atoms or nitrogen rings along the main chain perform best.
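The over- and under-representation described here can be expressed as a simple enrichment ratio: the share of a polymer class among the polymers above the upper limit divided by its share in the whole data set. The short calculation below uses only the percentages quoted in this paragraph; the function name and the dictionary layout are illustrative.

```python
def class_enrichment(shares):
    """Map each class to (share above the limit) / (share in the data set)."""
    return {name: above / overall for name, (overall, above) in shares.items()}


# (share of the whole data set in %, share of polymers above the limit in %),
# as quoted in the text.
SHARES = {
    "polysulfide": (7.00, 53.00),
    "polysulfone": (5.30, 18.00),
    "polyimide": (17.65, 35.00),
    "aromatic polyether": (30.78, 21.00),
}

if __name__ == "__main__":
    for name, ratio in class_enrichment(SHARES).items():
        print(f"{name}: {ratio:.2f}")
```

A ratio above 1 means the class is over-represented among the polymers that exceed the upper limit, and a ratio below 1 means it is under-represented. The shares above the limit need not sum to 100%, since a polymer can belong to more than one class, as the two-dimensional histogram indicates.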
Figure 8 shows the polymer candidates for CO2/CH4 gas transport identified by ML, together with their experimental properties. P432095 and P432092 were synthesised, and their CO2/CH4 transport performance was tested to verify the ML predictions. As predicted by the ML model, both polymers exceeded the 2008 Robeson upper limit for this gas pair; at the same permeability, the selectivities of P432092 and P432095 were about 7 times and 5.5 times the upper limit, respectively. In addition, the experimental and predicted data points are in relatively good agreement, which indicates that the ML model can be used as a prediction tool to identify undeveloped polymers for gas separation.
The design goal of our ML approach is to rapidly characterise properties such as gas transport. Much experimental work in the past has likewise focused on the effect of various polymer backbone properties on solubility or diffusion, e.g. the effect of sulfur groups on CO2 solubility and, more often, the effect of polymer backbone stiffness on gas diffusion constants [29-31]. Our method focuses on predicting polymer permeability, since less literature data is available for decomposing permeability into solubility and diffusion. If a more complete database becomes available, this method could potentially be used in the future to probe the structural features that lead to improved solubility and diffusion, respectively.

Conclusions
The features of two-dimensional materials obtained from material databases, experiments and calculations are used as input parameters to train various machine learning algorithms, explore the intricate interrelationships between input features, and associate input features with output targets, which supports better understanding, discovery, synthesis and application of two-dimensional materials across very large material systems. This paper first discusses the methods of machine learning, the use of machine learning in 2D material design and synthesis, and the use of machine learning in studying 2D material properties and applications. Finally, synthesised polymers were tested using an ML algorithm. Experimental permeability data for six different gases across about 700 polymers were used to predict the gas separation behaviour of more than 11,000 untested homopolymers. The experimental and predicted data points are in relatively good agreement, which indicates that the ML model can be used as a prediction tool to identify undeveloped polymers for gas separation. As large numbers of researchers cooperate to build large and accurate two-dimensional material databases, the problems and challenges facing machine learning in two-dimensional material research are expected to be resolved in the near future. In addition, better machine learning methods are incorporating physical, chemical and other constraints into algorithms and models to achieve more reasonable theoretical predictions.

Figure 1. Schematic diagram of the application process of machine learning methods in materials research.

Figure 2. A Robeson plot of selectivity versus permeability for CO2/CH4 separation.

Figure 4. CO2/methane gas separation performance of a series of polymer membranes.

Figure 5. Comparison of gas separation performance between x-SCPIM-y/z membranes and thermally cross-linked PIM-1 membranes.

Figure 6. Assisted design of high-performance polymer membranes.

Figure 7. Identification of polymer structures from a machine learning-aided design.

Table 1. Evaluation of the model performance on the reserved test set.