Mathematical modelling for TM topology under uniform and hotspot traffic patterns

ABSTRACT Interconnection networks are introduced when a significant number of processors must be connected in massively parallel systems. TM topology is one of the latest interconnection networks designed to solve the deadlock problem and achieve high performance in massively parallel systems. This topology is derived from a Torus topology by removing cyclic channel dependencies. In this paper, we derive a mathematical model for TM topology under uniform and hotspot traffic patterns to compute the average delay. The average delay is formulated from the average delay of the network, the average waiting time at the source node and the average degree of virtual channels. The results obtained from the mathematical model are in close agreement with those predicted by simulation. In addition, simulation results are presented to revisit the performance of TM topology under various traffic patterns.


Introduction
High-performance computing is needed to solve a variety of well-known problems in many research areas, such as the development of new materials and sources of energy, the development of new medicines and improved health care, weather forecasting, and scientific research into the origins of matter and the universe. Many studies have been conducted on the critical topic of parallel computing, which provides a solution for increasing processing power and computation speed [1]. Parallel computing is employed in advanced computing by integrating multiple computers through an interconnection network. Because processors communicate by passing messages through the interconnection network, this network is significant in massively parallel computers consisting of tens or hundreds of processors [2][3][4][5].
Over the last decade, interest has increased in the interconnection network as an essential component, so it is important to have a better understanding of its performance. In such computers, with millions of nodes, the performance of the entire system is also determined by the main interconnection network [6]. The performance of an interconnection network depends on the characteristics of the network topology and the routing employed within it. In previous studies [7][8][9][10][11][12][13][14][15], different interconnect topologies have been used, among which k-ary n-cubes are popular. Figure 1 illustrates the taxonomy of interconnection networks. Mesh and Torus, as k-ary n-cube topologies, belong to the strictly orthogonal topologies because each node has at least one link in each dimension.
An interconnection network comprises a complex arrangement of switches and links through which the processors/cores and the switches themselves communicate, and this must be taken into account when designing a new network topology. Moreover, developing fast routers is a significant approach to providing low delay and high throughput. A message selects its network path from source to destination according to its routing algorithm. As the routing decision is significant to the performance of an interconnection network, the routing decision process must be as fast as possible to reduce network delay.
One problem related to interconnection topologies and their routing schemes is deadlock. The three strategies for handling deadlock are deadlock prevention, deadlock avoidance and deadlock recovery [16][17][18]. Previous studies [19][20][21][22] used two of these three strategies to propose new schemes that reduce resource utilization and remove deadlock. Recently, Wang et al. [23] introduced a new and simple topology for interconnection networks, TM. It combines some of the advantages of the Mesh and Torus topologies by removing one link in each row (column) of the Torus to split the cycles. The cyclic channel dependencies of the Torus make some deadlock-free algorithms unavailable. Since efficient routing is significant for topology performance, removing one link in each row and column eliminates the cycles in TM, and by using virtual channels (VCs), the routing algorithm can be made deadlock-free. The goal of this paper is to present a mathematical model for TM topology and provide a detailed comparative evaluation of the TM, Torus and Mesh topologies under different traffic patterns. TM topology is validated mathematically, considering the deterministic routing algorithm, in a different simulation environment. We found a similar result with a difference in the packet injection rate. Finally, the accuracy of the proposed mathematical model for TM topology is confirmed by the simulation results.
CONTACT Mohamed Othman mothman@upm.edu.my

TM topology
In [23], the idea is to remove the cycles in each dimension of the Torus to obtain TM topology. TM is derived from a Torus topology in which there is a link between node a = (x, y) and node b = ((x + 1) mod k, y), and between a = (x, y) and b = (x, (y − 1 + k) mod k). The links (a, b) and (b, a) along dimension X are removed if (x + y + 1) mod k = 0. The links (a, b) and (b, a) along dimension Y are removed if (x + y) mod k = 0. If the original topology is rearranged, the node coordinates change: a node a = (x, y) in the original topology is mapped to (x, y) in the rearranged topology if (x + y) < k, or else to (x, y − k). The TM topology can be divided into two subnetworks, UpTriangle and DownTriangle. A node belongs to the UpTriangle if its rearranged coordinate along dimension Y is no less than 0. A node belongs to the DownTriangle if its rearranged coordinate along dimension Y is no more than 0. Figure 2 shows TM topology with the two subnetworks.
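The link-removal rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the construction code of [23]: the function names `tm_links` and `rearrange` are hypothetical, and each bidirectional pair (a, b), (b, a) is counted as one undirected link.

```python
def tm_links(k):
    """Undirected links of a k x k TM, built by pruning a 2D Torus (k >= 3)."""
    links = set()
    for x in range(k):
        for y in range(k):
            a = (x, y)
            # Torus link along X, kept unless (x + y + 1) mod k = 0.
            if (x + y + 1) % k != 0:
                links.add(frozenset((a, ((x + 1) % k, y))))
            # Torus link along Y, kept unless (x + y) mod k = 0.
            if (x + y) % k != 0:
                links.add(frozenset((a, (x, (y - 1 + k) % k))))
    return links


def rearrange(x, y, k):
    """Map an original coordinate to the rearranged TM coordinate."""
    return (x, y) if x + y < k else (x, y - k)


if __name__ == "__main__":
    k = 8
    print(len(tm_links(k)))    # number of undirected links in the 8 x 8 TM
    print(rearrange(5, 6, k))  # (5, -2)
```

For k = 8 the rule removes exactly one link per row and one per column, so 2k^2 − 2k = 112 of the 128 undirected Torus links remain.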

Deadlock avoidance scheme
A deadlock-free routing scheme is important for a new interconnect topology. The deadlock avoidance technique proposed in [23] is VC partitioning. In this technique, the physical network is divided into a set of independent virtual networks that are free of directed cycles. Two VCs are required to direct a channel from left (right) to right (left) and another channel from down (up) to up (down). The only difference between the Mesh and TM is the VC selection for a packet, which is made according to the coordinates of the source and destination.
According to the coordinates of the source node (x_s, y_s) and the destination node (x_d, y_d), the four partitions of TM topology in Figure 3 are (1) x+y+, (2) x+y−, (3) x−y+, and (4) x−y−. To avoid deadlock and deliver the packets in TM topology when the two nodes belong to the same triangle, partition x−y+ is selected for x_s less than x_d and partition x+y− for x_s greater than x_d. If the two nodes do not belong to the same triangle, there are two options. When y_s is not greater than y_d, partition x−y+ is selected while the source belongs to the UpTriangle and the destination belongs to the DownTriangle, or else partition x−y− is selected. Partition x+y+ is selected if the source belongs to the DownTriangle and the destination belongs to the UpTriangle, or else partition x+y− is selected. After a compatible deadlock-free partition has been selected, the routing algorithm is employed in TM topology to identify the path that a message takes from its source to its destination. The routing algorithms presented in [23] are a deterministic algorithm and a fully adaptive algorithm. In the deterministic routing algorithm, a single path is provided from the source to the destination, while an adaptive routing algorithm offers different paths for packets from the source to the destination. The fully adaptive routing algorithm can use any possible minimal path. In this article, we apply deterministic routing in the Mesh, Torus and TM. It is a fast algorithm and performs well under the uniform traffic assumption.
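To make the idea of a single deterministic path concrete, the following is a minimal sketch of dimension-order (X then Y) routing on a plain Mesh. It is an illustration only and not the TM routing algorithm of [23]: it ignores VC partitions and the links removed in TM, and the function name `xy_route` is hypothetical.

```python
def xy_route(src, dst):
    """Return the node sequence of X-then-Y dimension-order routing on a Mesh."""
    x, y = src
    path = [(x, y)]
    # First correct the X coordinate, one hop at a time.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Then correct the Y coordinate.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because the path depends only on the source and destination coordinates, every packet between the same pair of nodes follows the same route, which is what keeps the routing decision fast.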

Mathematical model
Validating the performance of TM topology is the key reason that led us to propose a mathematical model of its average delay, which is the goal of this paper. The average delay for interconnection networks is the sum of the average delay of the network (S) and the average waiting time (W) at the source node. In addition, the average delay is scaled by a parameter (V), the average degree of VCs at each physical channel [24]. Thus, Equation (1) gives the average delay, D:
D = (S + W) V    (1)

Assumptions
The mathematical model for TM topology relies on the following assumptions:
(a) The model is restricted to k-ary n-cube networks, where n is the dimension of the cube and k is the radix (N = k^n, or n = log_k N).
(b) The network size is N = 8 × 8 (k = 8 and n = 2).
(c) The message length is L flits, and each flit is transmitted from source to destination in one cycle across the network.
(d) Traffic is generated by the network nodes independently and follows a Poisson process with mean rate λ.
(e) A generated message is directed to the hotspot node with probability h = 20% and to the other nodes with probability (1 − h); the latter are referred to as regular messages.
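The traffic assumptions can be illustrated with a small generator. This is a sketch only: the function name `generate_traffic` and its parameters are hypothetical, exponential inter-arrival times are used to realise the Poisson process, and regular messages are assumed to be spread uniformly over the nodes other than the source and the hotspot, which the assumptions above do not state explicitly.

```python
import random


def generate_traffic(k, lam, h, hotspot, src, n_messages, seed=0):
    """Return (arrival_time, destination) pairs for one source node.

    Arrivals form a Poisson process of rate lam (messages per cycle).
    Each message goes to `hotspot` with probability h; otherwise it goes
    to a node chosen uniformly among the remaining nodes (assumption).
    `src` is expected to differ from `hotspot`.
    """
    rng = random.Random(seed)
    others = [(x, y) for x in range(k) for y in range(k)
              if (x, y) not in (src, hotspot)]
    t = 0.0
    messages = []
    for _ in range(n_messages):
        t += rng.expovariate(lam)  # exponential gap between arrivals
        dest = hotspot if rng.random() < h else rng.choice(others)
        messages.append((t, dest))
    return messages


if __name__ == "__main__":
    msgs = generate_traffic(k=8, lam=0.001, h=0.2, hotspot=(4, 4),
                            src=(0, 0), n_messages=10000)
    share = sum(dest == (4, 4) for _, dest in msgs) / len(msgs)
    print(round(share, 2))  # fraction of hotspot messages, close to h
```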

Model derivation
In general, topological properties such as degree, diameter, average distance and cost are used to analyse the performance of an interconnection network. Degree is the number of links that are connected to a node. In a tradeoff with degree, diameter is the maximum distance between any two nodes, and the average distance is the mean, over all pairs of nodes, of the minimum number of links between them. Cost is the number of communication links that are required to build the network. The properties of the Mesh, Torus and TM topologies are presented in Table 1.
Considering degree, diameter, average distance and cost, TM topology provides better properties than the Mesh. However, the Torus has a better average distance than TM under the same evaluation, owing to its wraparound links. Furthermore, TM can be regarded as a cost-effective topology because several links at specific locations of the Torus are removed. Consequently, TM topology is the one for which the mathematical model is proposed in this paper.
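The four properties can be computed directly from the link definitions with a breadth-first search. The sketch below is illustrative: the helper names `build`, `distances_from` and `properties` are hypothetical, degree is taken as the maximum node degree, links are counted as undirected, and the TM links follow the removal rule quoted in the TM topology section.

```python
from collections import deque


def build(kind, k):
    """Adjacency sets for a k x k 'mesh', 'torus' or 'tm' (k >= 3)."""
    adj = {(x, y): set() for x in range(k) for y in range(k)}
    for x in range(k):
        for y in range(k):
            if kind == "mesh":      # no wraparound links
                keep_x, keep_y = x + 1 < k, y > 0
            elif kind == "torus":   # all wraparound links present
                keep_x = keep_y = True
            elif kind == "tm":      # one link removed per row and column
                keep_x = (x + y + 1) % k != 0
                keep_y = (x + y) % k != 0
            else:
                raise ValueError(f"unknown topology: {kind}")
            neighbours = []
            if keep_x:
                neighbours.append(((x + 1) % k, y))
            if keep_y:
                neighbours.append((x, (y - 1 + k) % k))
            for b in neighbours:
                adj[(x, y)].add(b)
                adj[b].add((x, y))
    return adj


def distances_from(adj, src):
    """Hop counts from src to every reachable node (breadth-first search)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def properties(adj):
    """Degree, diameter, average distance and cost of a connected network."""
    n = len(adj)
    total, diameter = 0, 0
    for src in adj:
        dist = distances_from(adj, src)
        total += sum(dist.values())
        diameter = max(diameter, max(dist.values()))
    return {
        "degree": max(len(nb) for nb in adj.values()),
        "diameter": diameter,
        "average_distance": total / (n * (n - 1)),  # over distinct pairs
        "cost": sum(len(nb) for nb in adj.values()) // 2,
    }


if __name__ == "__main__":
    for kind in ("mesh", "torus", "tm"):
        print(kind, properties(build(kind, 8)))
```

Running it for k = 8 prints one dictionary per topology, which can be compared against Table 1. Here the average distance is taken over distinct node pairs; a different convention, such as including a node's distance to itself, changes the value slightly.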
The average number of nodes that a message crosses along one dimension is denoted k̄; multiplying it by the number of dimensions, n, gives the value for the whole network, denoted d̄. Following [25], this average number of nodes is referred to as the average distance, and it depends on the topology used. For TM topology it can be computed according to Equation (2), where k̄ is the average distance of TM:

Under the uniform traffic pattern, the rate of messages received by each channel, λ_u, can be found using:

In the presence of the hotspot traffic pattern, the injection rate is divided into the rate of regular messages and the rate of hotspot messages, because messages are not distributed uniformly across the network. The injection rate for regular messages, λ_r, is given by:

For hotspot messages, hλ messages are generated in a cycle, and each uses a given channel to reach the hotspot node with probability ph_j. To compute ph_j, combinatorial theory is applied through the following theorem.
Theorem 4.1: [24] The number of channels that are j nodes away from a given node in the k-ary n-cube is:

Theorem 4.1 is used to find the probability of using messages:

where C_j is the total number of channels, l is the omitted channel and t is the number of nodes without the channel. The rate of hotspot traffic for each channel that is j nodes away from the hotspot node is λ_h. Therefore, the injection rate for hotspot messages is given by:

Finally, under the uniform traffic pattern, the injection rate of messages along the network at each channel is equal to the message length at the beginning. Under hotspot traffic, it is equal to the sum of the rates for regular and hotspot messages. The average delay of the network is the sum of the average network delays S_r and S_h for regular messages and hotspot messages, respectively. The notation used to determine the quantities S_r and S_h is provided in [24]. Considering the average delay for messages, the average delay of the network is:

The waiting time (W) for a message under a given traffic pattern is computed as:

W is a function of the average waiting time over the different possible values of j (1 ≤ j ≤ n(k − 1)) and is used to compute the average delay. Under the hotspot traffic pattern, however, each node has a probability of being j hops away from the hotspot node, and the waiting time is multiplied by this probability to find W.
The probability that a node is j hops away from the hotspot node should also be considered for the average degree of virtual channels, V. Note that pv_j is the probability that the VCs are busy at the physical channel, determined by using a Markovian model [26]. The bandwidth of each physical channel is shared among multiple VCs. The average over all possible values at a given physical channel is:

This completes the equations of the mathematical model, which is used to evaluate the performance of TM topology for the minimum number of VCs.

Performance evaluation
In this section, we present mathematical and simulation results to evaluate the performance of TM topology. Attention is paid to TM topology, which is validated with the mathematical model. This topology has considerable advantages for transferring packets and is better than the Mesh in average delay and throughput in the presence of different traffic patterns. The traffic patterns used are uniform and hotspot. In uniform traffic, a node sends a packet to every other node with the same probability. In hotspot traffic, a source node directs a packet to the hotspot node with probability h. The mathematical and simulation results are discussed in the following sections.

Mathematical results
The mathematical model can serve as a practical evaluation tool because of its simplicity. With the mathematical model, we analyse the Torus and TM topologies under different traffic patterns, which is the main goal of this research. The mathematical model in this paper has been validated through simulation. Similar assumptions are used for both the simulation and the model.
The mathematical model is used to assess the accuracy of the simulated performance of TM topology. The obtained results in Figure 4

Simulation results
We simulate the Mesh, Torus and TM under uniform and hotspot traffic patterns. For the router design, the dimension-order routing algorithm has been selected as the deterministic routing algorithm. It is popular in massively parallel computers because of its minimal hardware requirements and its speed, since the routing decision must be as fast as possible to reduce the network delay. Network throughput and average delay are the two performance metrics used in the simulation, given in Equations (10) and (11):
Th = Total received flits / (Number of nodes × Total cycles)    (10)
Throughput is measured in flits/node/cycle and average delay in cycles for each traffic pattern. In the figures, the x-axis stands for the packet injection rate in flits/node/cycle and the y-axis stands for the average delay (D) in cycles or the throughput (Th) in flits/node/cycle.
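As a worked instance of Equation (10), the throughput can be computed from three simulator counters. The function and argument names below are hypothetical, and the numbers in the example are illustrative values rather than results from this study.

```python
def throughput(total_received_flits, num_nodes, total_cycles):
    """Network throughput in flits/node/cycle, following Equation (10)."""
    if num_nodes <= 0 or total_cycles <= 0:
        raise ValueError("num_nodes and total_cycles must be positive")
    return total_received_flits / (num_nodes * total_cycles)


if __name__ == "__main__":
    # Illustrative numbers: an 8 x 8 network observed for 10,000 cycles.
    print(throughput(32000, 64, 10000))  # 0.05
```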
We revisit the simulation of TM topology under different traffic patterns using Booksim2.0 [27], a cycle-accurate interconnection network simulator that supports a wide range of topologies. The inputs of the simulator are listed in Table 2. Figures 5 and 6 show the results obtained with a deterministic routing scheme implemented in the Mesh, Torus and TM topologies. The simulation results in Figure 5(a) are obtained under uniform traffic. The average delay is reduced in the TM to less than 10% compared to the Mesh and Torus. TM topology saturates at a packet injection rate similar to or lower than that of the Mesh and Torus. Figure 6(a) shows that the average throughput values of the topologies are extremely close to each other and that the Mesh network performs poorly compared with the Torus and TM.
In hotspot traffic, a particular link experiences a much greater number of requests than the rest of the links. The results obtained under hotspot traffic are shown in Figures 5(b) and 6(b). The improvement in average delay and throughput for TM topology is clear. Under this traffic, TM has comparable average delay when the packet injection rate is 0.001 and higher, and it performs better than the Mesh at this injection rate. This improvement comes mainly from the short average distance and diameter, and from the easing of congestion in the hotspot area.

Conclusion
This paper presents a mathematical model for a new interconnection topology, TM. It is derived from a Torus by removing the cycles in each dimension, together with the virtual network partitioning scheme. The mathematical model is validated against the simulation results of TM topology in terms of the average delay under uniform traffic and under non-uniform traffic, represented here by hotspot traffic. TM was shown to be an attractive deadlock-free topology for interconnection networks. In this direction, we also revisited TM topology with a lower injection rate in a new simulation environment. The evaluation showed low average delay and improved throughput for the topology.
This research has been carried out to show the performance improvement of the network obtained by reducing the average delay at a lower injection rate. The findings enhance our understanding of the benefits of a low diameter in interconnection networks. For further research, a new topology is proposed to improve on the performance of the Torus and TM topologies, together with an efficient routing scheme. We will explore such a topology by sharing a number of nodes, and we strongly believe that it will improve network performance. For evaluation, a mathematical model will again be used to validate the performance improvement in terms of the average delay.

Disclosure statement
No potential conflict of interest was reported by the authors.