Design of substitution nodes (S-Boxes) of a block cipher intended for preliminary encryption of confidential information

Abstract This paper considers a new method for obtaining an S-box, one of the nonlinear transformations used in modern symmetric block cipher systems. The method is based on modular arithmetic, namely exponentiation modulo a polynomial in the extended Galois field $GF(2^8)$. The efficiency indicators and criteria of the obtained S-box (balance, Hamming distance, distribution criteria, autocorrelation, algebraic immunity, and the cyclic structure of the S-box) are analyzed. Its cryptographic characteristics are compared with those of the substitution boxes of well-known modern block ciphers. In addition, the resulting S-box is examined using linear and differential cryptanalysis. The proposed S-box is intended for use in an encryption algorithm, currently under development, for the preliminary encryption of confidential information.


ABOUT THE AUTHOR
Ardabek Khompysh, Research Scientist at the Institute of Information and Computational Technologies, PhD in the specialty «Information Security Systems». Research interests: the development and study of reliable algorithms for generating pseudorandom sequences used as key sequences for symmetric block and stream encryption, and the development of cryptographic information protection algorithms based on modular arithmetic. Author of 20 works published in foreign and domestic publications.

PUBLIC INTEREST STATEMENT
To protect algorithms against various cryptanalysis methods, S-boxes must possess a number of cryptographic properties and satisfy a number of criteria. Currently, the main attacks for which the properties of S-boxes are important are linear, differential, and algebraic cryptanalysis.

Introduction
Symmetric cryptographic primitives are widely used due to their high performance and low implementation complexity (Schneier, 1996). Besides ensuring confidentiality with block ciphers, symmetric primitives are used to ensure the integrity of information through message authentication codes and hash functions, as components of electronic digital signatures to protect authenticity, to generate pseudo-random sequences, in authentication protocols (Menezes et al., 1997), etc. In accordance with Shannon's well-known principles (Shannon, 1949), such algorithms use nonlinear operations for confusion and linear transformations for diffusion. Repeated, successive application of confusion and diffusion provides a high level of cryptographic strength. In modern symmetric primitives, nonlinear substitution nodes are typically implemented as substitution tables, or S-boxes. Since most modern block algorithms (AES, Kuznyechik, Kalyna, BelT, etc.) use only the linear operation of addition modulo 2 to introduce round keys and combine inter-round values, the S-box is the only element that determines the nonlinearity of the encryption transformation and its level of resistance to cryptanalytic attacks (Kuznetsov et al., 2015). The required number of rounds of a block cipher is calculated to ensure resistance to known types of cryptanalysis, given the specified properties of its nonlinear substitution nodes. Many stream algorithms, cryptographic hash functions, and pseudo-random sequence generators are based on block ciphers or their structural elements. Thus, the cryptographic strength of most modern symmetric primitives largely depends on the properties of the S-boxes used (Kazimirov & Oleinikov, 2010).
The most advanced symmetric-key cryptosystems are based on the now-traditional idea of a product cipher (Shannon, 1949): they form a class of cryptosystems that repeatedly apply a complex operation transforming plaintext into ciphertext. Each such repetition (iteration) is called a cipher round (Connor, 1994). The complex (compound) operation performed in each round is usually a combination of primitive operations such as shift, linear transformation, modular addition, and substitution. Suitably combined, these transformations realize the underlying design principle of such ciphers: a combination of individually weak permutation and substitution operations can yield a cryptographically strong nonlinear transformation if it is applied a sufficient number of times. In many ciphers, substitution operations serve as the main nonlinear element of the round transformation (Dolgov et al., 2009).
The nonlinearity of the elements that make up symmetric block cipher schemes has a great impact on the strength of the information security subsystem of modern telecommunication systems. Selecting high-quality substitutions (S-boxes) is considered the most difficult aspect of block cipher design. To date, the existing tools for assessing the strength of substitutions provide no method for determining the best S-box, either in terms of resistance to various cryptanalytic attacks or in terms of its hardware and/or software implementation (Panasenko, 2009; Zenzin & Ivanov, 2002). In the modern algorithms AES and Kuznyechik, instead of fixed substitutions, linear transformations are used to map an established S-box (often the only one) to another S-box from a sufficiently large set of equivalent ones (Grigoryan & Ngi, 2019). Consequently, significant research effort is aimed at studying the properties of substitutions and constructing substitutions with high cryptographic indicators. One of the most popular tools for describing and studying the properties of S-boxes is the mathematical apparatus of linear algebra and, in particular, the apparatus of Boolean functions.
The article presents a novel way to design an S-box based on modular arithmetic, namely exponentiation modulo a polynomial in the extended Galois field $GF(2^8)$.

General information about substitution boxes
An S-box (Substitution-box) underlies any block cipher and is a source of nonlinearity. The study of a cipher for strength, as a rule, begins with a study of the properties of its S-boxes.
In general, an $m \times n$ S-box is a mapping $\mathbb{Z}_2^m \to \mathbb{Z}_2^n$. Kuznyechik (GOST R 34.12-2015) and AES use 8 × 8 S-boxes, and DES uses 8 different 6 × 4 S-boxes. Tables are convenient for software implementation, but restrictions are imposed on their size.
There are many competing approaches to the selection of S-boxes, among them four main ones can be distinguished (Babenko & Ischukova, 2006).
Random sampling. Small random S-boxes are clearly not reliable, but large random S-boxes can be good enough. Random S-boxes with eight or more inputs can be quite strong, and their strength increases further when they are both random and key-dependent.
Sampling followed by testing. In some ciphers, random S-boxes are generated first, and then their properties are tested for compliance with requirements.
Manual development. Here, the mathematical apparatus is rarely used: S-boxes are created using intuitive techniques. Bart Preneel stated that " . . . theoretically interesting criteria are not sufficient (for choosing Boolean S-box functions) . . . " and " . . . special design criteria are needed." (Preneel, 1993).
Mathematical development. S-boxes are constructed according to mathematical principles; therefore, they have guaranteed resistance to differential and linear cryptanalysis and good diffusion properties (Kapalova et al., 2020). There have been proposals to combine the "mathematical" and "manual" approaches, but in practice, randomly selected S-boxes and S-boxes with specified properties compete. The advantages of the latter approach include optimization against known attack methods: differential and linear cryptanalysis. In recent years, many approaches to obtaining substitution tables have appeared, for example, the linear fractional transformation (LFT), the cubic fractional transformation (CFT), heuristic approaches, modular approaches, and others. Also worth noting is the work on obtaining high-quality dynamic S-boxes (Chew & Ismail, 2020; Hussain et al., 2012; Naseer et al., 2019; Ozkaynak, 2017; Shah et al., 2011; Shah & Shah, 2018; Zahid et al., 2020; Zahid & Arshad, 2019; Zahid et al., 2019; Y. Zhang, 2018). Naseer et al. (2019) show a way to obtain a dynamic S-box by applying substitution-permutation transformations to the input value of the S-box. That S-box is structurally different from the one developed here: our proposed algorithm is a method for obtaining a fixed S-box.
In other words, an S-box is a mapping of m-bit inputs to n-bit outputs. S-boxes are part of the transformation function and are important for the strength of the encryption algorithm. Any change in the S-box input should produce a random-looking change in the output. The output values must not depend linearly on the input or be easily approximated by linear functions (this is the property exploited by linear cryptanalysis; Kapalova & Haumen, 2018). S-boxes are currently used in many symmetric encryption algorithms, such as AES, GOST 28147-89, DES, and Twofish (Biyashev et al., 2021).

The proposed method for obtaining nonlinear substitution nodes for symmetric cryptoalgorithms
Let us consider a three-step method for obtaining an S-box.
At the first step, we choose an irreducible polynomial $P(x)$ that generates the multiplicative group of the Galois field $GF(2^8)$, and an irreducible polynomial $A(x)$ called the base. The modular exponentiation operation (formula (1)) is then performed modulo the selected polynomial $P(x)$, where $A(x)$ is the base and $P(x)$ is the modulus (both are irreducible polynomials).
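The core operation of the first step, raising a base polynomial to a power modulo $P(x)$, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the concrete $P(x)$, $A(x)$, and exponents of formula (1) are not given here, so the example assumes the AES field polynomial $x^8 + x^4 + x^3 + x + 1$ (0x11B) and the base $A(x) = x + 1$ (0x03), both irreducible, purely for illustration. Polynomials over GF(2) of degree below 8 are packed into bytes, with bit $i$ holding the coefficient of $x^i$.

```python
def gf_mul(a: int, b: int, poly: int) -> int:
    """Multiply a(x) * b(x) modulo poly(x) in GF(2^8).

    a and b are bytes (polynomials of degree < 8, bit i = coefficient of x^i);
    poly is a degree-8 polynomial such as 0x11B.
    """
    result = 0
    while b:
        if b & 1:          # current coefficient of b(x) is 1: add a(x)
            result ^= a
        b >>= 1
        a <<= 1            # a(x) <- x * a(x)
        if a & 0x100:      # degree reached 8: reduce modulo poly(x)
            a ^= poly
    return result


def gf_pow(base: int, exponent: int, poly: int) -> int:
    """Compute base(x)^exponent mod poly(x) by square-and-multiply."""
    result = 1
    while exponent:
        if exponent & 1:
            result = gf_mul(result, base, poly)
        base = gf_mul(base, base, poly)
        exponent >>= 1
    return result


if __name__ == "__main__":
    P = 0x11B  # assumed modulus x^8 + x^4 + x^3 + x + 1 (illustrative only)
    A = 0x03   # assumed base x + 1 (illustrative only)
    print(hex(gf_pow(A, 2, P)))  # (x + 1)^2 = x^2 + 1 -> 0x5
    # for this P, x + 1 generates the multiplicative group: 255 distinct powers
    print(len({gf_pow(A, e, P) for e in range(255)}))  # 255
```

Square-and-multiply keeps the cost logarithmic in the exponent. If a base generates the multiplicative group, its powers with exponents 0, 1, ..., 254 run through all 255 nonzero field elements; this is a general property of generators, not a statement about the exponents used in formula (1).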
At the second step, we treat the coefficients of the polynomial $S_0(x)$ obtained at the first step as vectors $S_i$ of length 8. Each vector $S_i$ is then added modulo 2 (XOR) to a fixed vector $B$:
$$S_i \oplus B. \qquad (2)$$
At the third step, the result is multiplied by the matrix $M$:
$$M \cdot (S_i \oplus B). \qquad (3)$$
Thus, the result of the calculation by formula (3) is the vector of S-box values.
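Steps 2 and 3 amount to the map $S_i \mapsto M \cdot (S_i \oplus B)$ on 8-bit vectors. The sketch below assumes that the matrix product is taken over GF(2) and that $M$ is stored row by row as bytes; the actual $B$ and $M$ of the proposed S-box are not reproduced here, so the example uses hypothetical stand-ins (the AES affine matrix and $B$ = 0x63). The helper names are our own.

```python
from typing import List


def parity(v: int) -> int:
    """XOR of all bits of v (the GF(2) inner product when v = row & vector)."""
    return bin(v).count("1") & 1


def affine_step(s: int, b: int, m_rows: List[int]) -> int:
    """Return M * (s XOR b) over GF(2) for 8-bit vectors (assumed convention).

    m_rows[i] is row i of the 8x8 binary matrix M packed into a byte;
    output bit i is the mod-2 inner product of that row with (s XOR b).
    """
    t = s ^ b                          # step 2: add the fixed vector B
    out = 0
    for i, row in enumerate(m_rows):   # step 3: multiply by M
        out |= parity(row & t) << i
    return out


def rotl8(v: int, k: int) -> int:
    """Rotate an 8-bit value left by k positions."""
    return ((v << k) | (v >> (8 - k))) & 0xFF


if __name__ == "__main__":
    # Hypothetical stand-ins: the AES affine matrix (row i = 0xF1 rotated by i)
    # and B = 0x63; these are NOT the parameters of the proposed S-box.
    M_ROWS = [rotl8(0xF1, i) for i in range(8)]
    B = 0x63
    table = [affine_step(s, B, M_ROWS) for s in range(256)]
    print(len(set(table)))  # 256: an invertible M keeps the map bijective
```

Because XOR with $B$ is a bijection, the whole step preserves bijectivity of the table exactly when $M$ is invertible over GF(2).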
Since the developed S-box is computed only once and is not dynamic, its computational cost is considered insignificant. Generating an S-box by the proposed method requires, on average, 97,921 mathematical operations.

Software implementation of the method for designing substitution nodes of a block cipher
The proposed method for designing block cipher substitution nodes was implemented in software for the encryption algorithm under development. Figure 1 shows the application window of the program for obtaining an S-box.

Evaluation of the strength of nonlinear substitution nodes of an S-box
The use of the mathematical apparatus of vector Boolean functions makes it possible to simplify the description of the basic elements of symmetric algorithms. Such a representation allows a set of criteria, including those applied to substitutions, to be generalized, and at the same time enables us to evaluate the correlation, algebraic, and other properties of S-boxes (Maier & Staffelbach, 1990; Sergienko & Moskovchenko, 2007).
Briefly, the basis of this approach is to represent the S-box as a set of component Boolean functions and then to study their properties. These criteria, however, had to be supplemented with additional restrictions on the maximum permissible values of the entries of the difference and linear approximation tables; such restrictions are also present when the apparatus of Boolean functions is used (Millan et al., 1998).
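The difference and linear approximation tables mentioned here can be computed directly from an S-box given as a lookup list. The following minimal sketch computes the difference distribution table (DDT), the differential uniformity (the largest DDT entry for a nonzero input difference), and the linearity (the largest absolute Walsh coefficient over all nonzero component functions), from which the S-box nonlinearity is $2^{n-1}$ minus half the linearity. The function names are our own, an $n \times n$ S-box is assumed, and the 4-bit PRESENT S-box serves only as a small test vector.

```python
from typing import List


def parity(v: int) -> int:
    return bin(v).count("1") & 1


def walsh_spectrum(f: List[int]) -> List[int]:
    """Fast Walsh-Hadamard transform: W(a) = sum_x (-1)^(f(x) XOR a.x)."""
    w = [1 - 2 * bit for bit in f]          # 0 -> +1, 1 -> -1
    h = 1
    while h < len(w):
        for i in range(0, len(w), 2 * h):
            for j in range(i, i + h):
                u, v = w[j], w[j + h]
                w[j], w[j + h] = u + v, u - v
        h *= 2
    return w


def ddt(sbox: List[int]) -> List[List[int]]:
    """ddt[a][b] = #{x : S(x) XOR S(x XOR a) = b} for an n x n S-box."""
    size = len(sbox)
    table = [[0] * size for _ in range(size)]
    for a in range(size):
        for x in range(size):
            table[a][sbox[x] ^ sbox[x ^ a]] += 1
    return table


def differential_uniformity(sbox: List[int]) -> int:
    """Largest DDT entry over nonzero input differences a."""
    return max(max(row) for row in ddt(sbox)[1:])


def linearity(sbox: List[int]) -> int:
    """max |W_{b.S}(a)| over nonzero output masks b (n x n S-box assumed)."""
    best = 0
    for b in range(1, len(sbox)):
        component = [parity(b & y) for y in sbox]
        best = max(best, max(abs(c) for c in walsh_spectrum(component)))
    return best


def sbox_nonlinearity(sbox: List[int]) -> int:
    """NL(S) = 2^(n-1) - linearity / 2."""
    return len(sbox) // 2 - linearity(sbox) // 2


if __name__ == "__main__":
    PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
               0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
    print(differential_uniformity(PRESENT), linearity(PRESENT))  # 4 8
```

Computing the linear approximation table through one Walsh transform per output mask keeps an 8-bit S-box cheap to analyze (255 transforms of length 256) compared with a triple loop over input mask, output mask, and input.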
Let $\mathbb{F}_2^n$ be the vector space of all binary vectors of length $n$, where $\mathbb{F}_2$ is the Galois field with two elements $\{0, 1\}$. Let $n$ and $m$ be two natural numbers; then an $(n, m)$-function is a vector Boolean function $F: \mathbb{F}_2^n \to \mathbb{F}_2^m$. Such functions are used in cryptography as nonlinear mappings in pseudo-random generators (stream ciphers) or as substitutions (S-boxes) in symmetric block ciphers. Obviously, for $m = 1$ a vector Boolean function has one output bit and is equivalent to an ordinary Boolean function. To give it an algebraic structure, the vector space is often identified with the finite field $\mathbb{F}_{2^n}$ defined by some irreducible polynomial.
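Viewing an $(n, m)$-function as a lookup table makes its coordinate and component Boolean functions easy to extract. In the sketch below (helper names are illustrative), `coordinate_function` takes output bit $i$, and `component_function` takes the mod-2 inner product $b \cdot F(x)$ for a nonzero mask $b$. The balance check relies on a known fact: an $n \times n$ S-box is a bijection exactly when all of its nonzero component functions are balanced.

```python
from typing import List


def parity(v: int) -> int:
    return bin(v).count("1") & 1


def coordinate_function(sbox: List[int], i: int) -> List[int]:
    """Truth table of the i-th output bit of the S-box."""
    return [(y >> i) & 1 for y in sbox]


def component_function(sbox: List[int], b: int) -> List[int]:
    """Truth table of x -> b . S(x), the XOR of the output bits selected by b."""
    return [parity(b & y) for y in sbox]


def is_balanced(f: List[int]) -> bool:
    """Balanced: the function outputs 1 on exactly half of its inputs."""
    return 2 * sum(f) == len(f)


def all_components_balanced(sbox: List[int], m: int) -> bool:
    """Check every nonzero component function of an (n, m)-function."""
    return all(is_balanced(component_function(sbox, b)) for b in range(1, 1 << m))


def is_bijective(sbox: List[int]) -> bool:
    return sorted(sbox) == list(range(len(sbox)))


if __name__ == "__main__":
    PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
               0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
    print(is_bijective(PRESENT), all_components_balanced(PRESENT, 4))  # True True
```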
Let $f(x): \mathbb{F}_2^n \to \mathbb{F}_2$, where $x = (x_0, x_1, \ldots, x_{n-1})$, be a Boolean function of $n$ variables. Its Hamming weight is
$$wt(f) = \sum_{x \in \mathbb{F}_2^n} f(x).$$
Let $f(x)$ and $g(x)$ be Boolean functions of $n$ variables. The Hamming distance between the two functions is
$$d(f, g) = wt(f \oplus g) = \sum_{x \in \mathbb{F}_2^n} \big(f(x) \oplus g(x)\big).$$
The algebraic normal form (ANF) of a Boolean function is
$$f(x) = \bigoplus_{u \in \mathbb{F}_2^n} a_u \prod_{i=0}^{n-1} x_i^{u_i}, \qquad a_u \in \mathbb{F}_2.$$
The algebraic degree of a Boolean function, denoted $\deg(f)$, is the maximum degree of a monomial with a nonzero coefficient (Oliynykov, 2011).
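These definitions translate directly into code on truth tables. The sketch below assumes the truth table is a list indexed by the integer $x = \sum_i x_i 2^i$ and computes the ANF coefficients $a_u$ with the binary Möbius transform, a standard technique; the function names are our own, and the degree of the all-zero function is returned as 0 by convention here.

```python
from typing import List


def hamming_weight(f: List[int]) -> int:
    """wt(f): number of inputs on which f equals 1."""
    return sum(f)


def hamming_distance(f: List[int], g: List[int]) -> int:
    """d(f, g) = wt(f XOR g)."""
    return sum(a ^ b for a, b in zip(f, g))


def anf(f: List[int]) -> List[int]:
    """Binary Moebius transform: a[u] is the ANF coefficient of x^u."""
    a = list(f)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j + h] ^= a[j]      # a_u = XOR of f(x) over all x below u
        h *= 2
    return a


def algebraic_degree(f: List[int]) -> int:
    """Largest Hamming weight of u with a nonzero ANF coefficient a_u."""
    return max((bin(u).count("1") for u, c in enumerate(anf(f)) if c), default=0)


if __name__ == "__main__":
    f_and = [0, 0, 0, 1]            # x0 AND x1, indexed by x = x0 + 2*x1
    print(anf(f_and))               # [0, 0, 0, 1]: only the monomial x0*x1
    print(algebraic_degree(f_and))  # 2
```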
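The correlation between a Boolean function $f(x)$ and the set of all linear functions is given by the Walsh transform:
$$W_f(\omega) = \sum_{x \in \mathbb{F}_2^n} (-1)^{f(x) \oplus \omega \cdot x},$$
where $\omega \cdot x = \omega_0 x_0 \oplus \omega_1 x_1 \oplus \cdots \oplus \omega_{n-1} x_{n-1}$. The nonlinearity $NL(f)$ of a Boolean function is the minimum Hamming distance from $f$ to all affine functions of $n$ variables. It is related to the Walsh transform by
$$NL(f) = 2^{n-1} - \frac{1}{2} \max_{\omega \in \mathbb{F}_2^n} |W_f(\omega)|.$$
The autocorrelation (AC) function $r_f(\alpha)$ of the truth table of $f(x)$ is built from the derivative of $f$ in the direction $\alpha \in GF(2^n)$, that is, $f(x) \oplus f(x \oplus \alpha)$, and is given by
$$r_f(\alpha) = \sum_{x \in \mathbb{F}_2^n} (-1)^{f(x) \oplus f(x \oplus \alpha)}.$$
Let $|AC|_{\max}$ be the maximum absolute value of the autocorrelation function over nonzero directions:
$$|AC|_{\max} = \max_{\alpha \neq 0} |r_f(\alpha)|.$$
Let $\sigma$ denote the "sum-of-squares indicator" of the Global Avalanche Characteristics (GAC):
$$\sigma = \sum_{\alpha \in \mathbb{F}_2^n} r_f^2(\alpha).$$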
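A minimal sketch of these spectral indicators, assuming truth tables indexed by integers (bit $i$ of $x$ is $x_i$): the Walsh spectrum is computed with the fast Walsh-Hadamard transform, the nonlinearity follows from $NL(f) = 2^{n-1} - \frac{1}{2}\max_\omega |W_f(\omega)|$, and the autocorrelation is evaluated directly from its definition, which is adequate for $n \le 8$ (at most $2^{16}$ terms). Function names are illustrative.

```python
from typing import List


def walsh_spectrum(f: List[int]) -> List[int]:
    """Fast Walsh-Hadamard transform: W(a) = sum_x (-1)^(f(x) XOR a.x)."""
    w = [1 - 2 * bit for bit in f]          # 0 -> +1, 1 -> -1
    h = 1
    while h < len(w):
        for i in range(0, len(w), 2 * h):
            for j in range(i, i + h):
                u, v = w[j], w[j + h]
                w[j], w[j + h] = u + v, u - v
        h *= 2
    return w


def nonlinearity(f: List[int]) -> int:
    """NL(f) = 2^(n-1) - max|W_f| / 2 (Walsh values are always even)."""
    return len(f) // 2 - max(abs(w) for w in walsh_spectrum(f)) // 2


def autocorrelation(f: List[int]) -> List[int]:
    """r_f(alpha) = sum_x (-1)^(f(x) XOR f(x XOR alpha)) for every alpha."""
    size = len(f)
    return [sum(1 - 2 * (f[x] ^ f[x ^ alpha]) for x in range(size))
            for alpha in range(size)]


def ac_max(f: List[int]) -> int:
    """|AC|_max: largest |r_f(alpha)| over nonzero alpha."""
    return max(abs(r) for r in autocorrelation(f)[1:])


def sum_of_squares(f: List[int]) -> int:
    """GAC sum-of-squares indicator: sigma = sum over all alpha of r_f(alpha)^2."""
    return sum(r * r for r in autocorrelation(f))


if __name__ == "__main__":
    # bent function x0*x1 XOR x2*x3 on 4 variables
    bent = [((x & 1) & ((x >> 1) & 1)) ^ (((x >> 2) & 1) & ((x >> 3) & 1))
            for x in range(16)]
    print(nonlinearity(bent), ac_max(bent), sum_of_squares(bent))  # 6 0 256
```

A bent function is a convenient sanity check: it attains the maximum nonlinearity $2^{n-1} - 2^{n/2-1}$ and has zero autocorrelation in every nonzero direction.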

Figure 1. Application window for obtaining the S-box.
A Boolean function $f(x)$ is said to satisfy the Strict Avalanche Criterion (SAC) if the following equation holds for every vector $s$ with $wt(s) = 1$ (X.-M. Zhang, 1995):
$$\sum_{x \in \mathbb{F}_2^n} \big(f(x) \oplus f(x \oplus s)\big) = 2^{n-1}.$$
A Boolean function $f(x)$ satisfies the propagation criterion of order $k$, $PC(k)$, if and only if for every nonzero vector $\alpha \in GF(2^n)$ with $1 \le wt(\alpha) \le k$:
$$\sum_{x \in \mathbb{F}_2^n} \big(f(x) \oplus f(x \oplus \alpha)\big) = 2^{n-1}.$$
A Boolean function $f(x)$ is correlation immune of order $m$, $CI(m)$, if the following holds for every $w$ with $1 \le wt(w) \le m$:
$$W_f(w) = 0.$$
If a Boolean function $f(x)$ is balanced and at the same time correlation immune of order $t$, it is called $t$-resilient (Kazimirov, 2013).
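These criteria can be checked from the autocorrelation and Walsh spectra: $\sum_x \big(f(x) \oplus f(x \oplus \alpha)\big) = 2^{n-1}$ is equivalent to $r_f(\alpha) = 0$, so SAC coincides with $PC(1)$, and correlation immunity of order $m$ means $W_f(w) = 0$ for $1 \le wt(w) \le m$ (the Xiao-Massey characterization). The helpers below are an illustrative sketch with names of our own choosing.

```python
from typing import List


def weight(v: int) -> int:
    return bin(v).count("1")


def walsh_spectrum(f: List[int]) -> List[int]:
    """Fast Walsh-Hadamard transform: W(a) = sum_x (-1)^(f(x) XOR a.x)."""
    w = [1 - 2 * bit for bit in f]
    h = 1
    while h < len(w):
        for i in range(0, len(w), 2 * h):
            for j in range(i, i + h):
                u, v = w[j], w[j + h]
                w[j], w[j + h] = u + v, u - v
        h *= 2
    return w


def autocorrelation(f: List[int]) -> List[int]:
    """r_f(alpha) = sum_x (-1)^(f(x) XOR f(x XOR alpha))."""
    size = len(f)
    return [sum(1 - 2 * (f[x] ^ f[x ^ a]) for x in range(size)) for a in range(size)]


def satisfies_pc(f: List[int], k: int) -> bool:
    """PC(k): r_f(alpha) = 0 for every alpha with 1 <= wt(alpha) <= k."""
    r = autocorrelation(f)
    return all(r[a] == 0 for a in range(1, len(f)) if weight(a) <= k)


def satisfies_sac(f: List[int]) -> bool:
    """SAC = PC(1): flipping any single input bit flips half of the outputs."""
    return satisfies_pc(f, 1)


def is_correlation_immune(f: List[int], m: int) -> bool:
    """CI(m): W_f(w) = 0 for every w with 1 <= wt(w) <= m."""
    w = walsh_spectrum(f)
    return all(w[u] == 0 for u in range(1, len(f)) if weight(u) <= m)


def is_resilient(f: List[int], t: int) -> bool:
    """t-resilient: balanced (W_f(0) = 0) and CI(t)."""
    return walsh_spectrum(f)[0] == 0 and is_correlation_immune(f, t)


if __name__ == "__main__":
    xor2 = [0, 1, 1, 0]  # x0 XOR x1
    print(is_resilient(xor2, 1), is_correlation_immune(xor2, 2))  # True False
```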
Let a nonzero function $g(x)$ be an annihilator of $f(x)$, that is, $f(x) \cdot g(x) = 0$. The minimal algebraic degree of a nonzero annihilator of $f(x)$ or of $f(x) \oplus 1$ is called the algebraic immunity of $f(x)$ and is denoted
$$AI(f) = \min\{\deg(g) : g \neq 0,\ f \cdot g = 0 \text{ or } (f \oplus 1) \cdot g = 0\}.$$
Work is currently underway to establish a connection between these approaches, especially since research into the properties of reduced-round versions of symmetric block ciphers has recently intensified.
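For small $n$, the algebraic immunity can be computed by linear algebra over GF(2). The condition $f \cdot g = 0$ means that $g$ vanishes on the support of $f$, so a nonzero annihilator of degree at most $d$ exists exactly when the matrix whose rows evaluate all monomials of degree $\le d$ at the points of the support has rank smaller than the number of those monomials. The sketch below (function names are our own) applies this test to $f$ and $f \oplus 1$ for increasing $d$; it is practical for $n \le 8$ but is not an optimized algorithm.

```python
from typing import List


def gf2_rank(rows: List[int]) -> int:
    """Rank over GF(2) of row vectors packed as integers (Gaussian elimination)."""
    pivots = {}  # leading bit position -> stored row with that leading bit
    for v in rows:
        while v:
            top = v.bit_length() - 1
            if top in pivots:
                v ^= pivots[top]
            else:
                pivots[top] = v
                break
    return len(pivots)


def has_annihilator(f: List[int], n: int, d: int) -> bool:
    """True if some nonzero g with deg(g) <= d satisfies f(x) * g(x) = 0 for all x."""
    monomials = [u for u in range(1 << n) if bin(u).count("1") <= d]
    rows = []
    for x in range(1 << n):
        if f[x]:  # g must vanish wherever f(x) = 1
            row = 0
            for k, u in enumerate(monomials):
                if (x & u) == u:  # monomial x^u evaluates to 1 at x
                    row |= 1 << k
            rows.append(row)
    # a nonzero coefficient vector exists iff the columns are dependent
    return gf2_rank(rows) < len(monomials)


def algebraic_immunity(f: List[int]) -> int:
    """Smallest d with a nonzero degree-<=d annihilator of f or of f XOR 1."""
    n = len(f).bit_length() - 1
    complement = [1 - bit for bit in f]
    for d in range(n + 1):
        if has_annihilator(f, n, d) or has_annihilator(complement, n, d):
            return d
    return n  # not reached: an annihilator of degree <= n always exists


if __name__ == "__main__":
    maj3 = [1 if bin(x).count("1") >= 2 else 0 for x in range(8)]
    print(algebraic_immunity(maj3))  # 2, the maximum ceil(3/2) for n = 3
```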
We consider S-boxes of common symmetric block algorithms, including the S-box obtained by our proposed method.

Conclusion
This article describes a developed and software-implemented method for designing block cipher substitution nodes, which can be used to create the substitution boxes of an encryption algorithm for the pre-encryption of confidential information. The method can be used in the design and implementation of cryptographic information protection tools that protect unclassified information of limited access transmitted over open communication channels. The use of the mathematical apparatus of vector Boolean functions, together with the indicators and criteria of the efficiency of the obtained S-box, makes it possible to simplify the description of the main elements of symmetric algorithms. The studies carried out have shown that the theoretical approach does not always meet practical needs, in particular, when generating substitutions with specified unsaturated values. Thus, despite the many existing solutions in symmetric cryptography, the search for approaches that provide protection against existing and proposed types of attacks on encryption algorithms remains relevant. It is necessary to substantiate the criteria for developing and improving methods of constructing nonlinear transformation nodes, and to conduct research on cryptanalysis methods and the theory of vector Boolean functions.
To date, there is no unambiguous set of criteria for an ideal S-box, and many studies indicate that perfect substitutions probably do not exist. Therefore, the term "optimal substitution" is introduced: its criteria are defined for a specific encryption algorithm (or group of algorithms) and are optimal with respect to protection against existing types of attacks. Based on this research, a method was proposed for obtaining substitutions for use in cryptographic primitives.
The studies showed that the efficiency characteristics of the obtained S-box are not inferior to those of known algorithms. It was also shown that the obtained numerical values correspond to those of the S-boxes of algorithms such as Kuznyechik, Camellia, and others.