Advances in spatiotemporal graph neural network prediction research

ABSTRACT As a kind of non-Euclidean data, spatiotemporal graph data is ubiquitous, appearing in domains ranging from traffic flow and air quality indices to crime cases. Unlike raster data, spatiotemporal graph data is irregular and disordered, characteristics that have attracted the research interest of scholars, with the prediction of spatiotemporal graph data being one of the research hot spots. The emergence of spatiotemporal graph neural networks (ST-GNNs) offers new insight into capturing spatial correlation for spatiotemporal graph data prediction while achieving state-of-the-art performance. In this paper, a comprehensive survey of research on the ST-GNN prediction domain is presented: the background of ST-GNNs is introduced before the computational paradigm of ST-GNNs is thoroughly reviewed. From the perspective of model construction, 59 well-known models from recent years are classified and discussed, and some of them are further analyzed in terms of performance and efficiency. Subsequently, the categories and application fields of spatiotemporal graph data are summarized, providing a clear guide to technology selection for different applications. Finally, the evolution history and future directions of ST-GNNs are summarized, to help future researchers quickly understand the current state of prediction research with ST-GNNs.


Introduction
Numerous kinds of non-Euclidean data emerge in our daily life, from transportation networks to social networks. The graph structure, naturally suited to describing non-Euclidean data, has attracted increasing attention from scholars (Zhou et al. 2020). As a classic type of non-Euclidean data, the spatiotemporal graph focuses on characteristic changes in the time and space dimensions, represented in a graph data structure. Its irregularity and disorder present obstacles for traditional Euclidean neural networks. Converting such data into traditional Euclidean data (generally raster data) before processing leads to the loss of information, such as the topological relationships or the connectivity between points.
Thus, it is challenging to obtain good prediction performance without modeling spatial dependencies. Previously, the prediction of spatiotemporal graph data was treated only as a time-series prediction task focusing on the temporal dimension, using statistical models such as HA (Liu and Guan 2004), ARIMA (Williams and Hoel 2003), and VAR (Tetteh-Bator et al. 2018), or machine learning models such as SVR (Smola and Schölkopf 2004), KNN (Zhang, He, and Lu 2009), and Bayesian networks (Sun, Zhang, and Yu 2006). Thanks to the advantage of deep learning in dealing with high-dimensional features, RNN and CNN architectures have been used to build models specifically for time-series prediction, such as RNN (Van Lint, Hoogendoorn, and van Zuylen 2002) and its variants LSTM (Tian and Pan 2015) and GRU (Chung et al. 2014), FC-LSTM (Sutskever, Vinyals, and Le 2014), TCN (Bai, Kolter, and Koltun 2018), and WaveNet (van den Oord et al. 2016). Although these models can capture the temporal dependency of an individual sensor, they ignore the dependencies between multiple sensors. To incorporate spatial dependencies, CNN-based models have been used to mine them from raster data transformed from graphs, including Conv-LSTM (Shi et al. 2015), CLTFP (Wu and Tan 2016), and ST-ResNet (Zhang et al. 2018). However, since CNN is inherently a Euclidean convolution and thus incompatible with this kind of data, part of the spatial dependency is lost in the data transformation, so these models still fail to solve the problem fundamentally.
With the rapid improvement of computer hardware, many deep learning methods have been extended to graph data (Scarselli et al. 2009; Henaff, Bruna, and LeCun 2015; Li et al. 2018b). On this basis, a new class of aggregation-based neural network models dedicated to sampling graph data has been developed, namely graph neural networks (Wu, Pan, Chen, et al. 2020; Xu et al. 2020). In recent years, graph neural networks have contributed greatly to many research fields, such as computer vision (Chen et al. 2019; Guo, Zhang, and Lu 2019), natural language processing (Guo, Zhang, and Lu 2019), biology (Duvenaud et al. 2015), and recommender systems (Ying et al. 2018). Scholars subsequently found that ST-GNNs built on graph neural networks can focus on the information transfer between nodes, which is beneficial for extracting spatial dependencies. These models have been demonstrated to achieve better performance on spatiotemporal graph data prediction, especially compared with non-GNN-based prediction models. Moreover, with the increasing number of prediction tasks, ST-GNNs have become the mainstream prediction model for spatiotemporal graph data.
However, ST-GNNs in the prediction domain have been little surveyed in recent years, while there are reviews on other application aspects of graph neural networks, such as recommendation systems (Wu et al. 2022), power systems (Liao et al. 2022), knowledge graphs (Arora 2020), and natural language processing (Wu et al. 2023). Therefore, a comprehensive review of ST-GNNs in the prediction domain is needed; this study provides one, which should help readers understand the current state of research on ST-GNNs in a timely manner and avoid duplicated research as much as possible. Our framework, as shown in Figure 1, is divided into three parts: the ST-GNN prediction paradigm, the ST-GNN evolution history, and ST-GNN future directions. Each part is discussed and analyzed in detail in the following sections.
An in-depth review of the development and trends of ST-GNNs in the field of prediction is provided, with the following contributions:
- Firstly, the definition of graph notation, the introduction of graph convolution, and the paradigm of ST-GNNs are comprehensively introduced.
- Secondly, 59 commonly used ST-GNN frameworks, from the birth of ST-GNNs in 2017 to now, are reviewed. These frameworks, together with their datasets, are summarized and classified from both the construction perspective and the application perspective, for the convenience of future researchers.
- Thirdly, the evolution history and future directions of spatiotemporal graph convolutional prediction models are summarized and analyzed, to support the investigation of new schemes and help avoid duplicated research.
- Finally, different from other reviews, not only is a classification of models given from the data and technical perspectives, but changes in research directions are also analyzed and discussed with examples. In addition, to enrich our review, the performance and efficiency of some models are reported.

Definition
In this section, definitions of the common symbols used in the paper are provided (Table 1).
A spatiotemporal graph is a special kind of attributed graph, defined as G = (V, E, A, X), where V (with |V| = N) is the set of nodes; E is the set of edges between spatiotemporal graph nodes; and A ∈ R^(N×N) is the adjacency matrix. The graph G represents the relationships between nodes in the spatial dimension, which can be directed or undirected, as shown in Figure 2. The graph feature matrix X_t ∈ R^(N×F) holds the F feature values detected at each of the N nodes of G at time t. As time passes, the spatiotemporal features in X change dynamically. Suppose we have h historical time intervals with a feature matrix X_t for each time t; the historical data can be denoted as H = {X_(t-h+1), X_(t-h+2), …, X_t}, and the data to be predicted as P = {X_(t+1), X_(t+2), …, X_(t+p)}.
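As an illustration, the notation above can be laid out as array shapes in a short NumPy sketch. All sizes and values below are hypothetical toy choices, not taken from any dataset:

```python
import numpy as np

# Hypothetical toy dimensions: N nodes, F features per node,
# h historical steps, p prediction steps.
N, F, h, p = 5, 2, 12, 3

# Adjacency matrix A in R^{N x N} (here: a random undirected graph).
rng = np.random.default_rng(0)
A = (rng.random((N, N)) > 0.5).astype(float)
A = np.maximum(A, A.T)          # symmetrize -> undirected graph
np.fill_diagonal(A, 0.0)        # no self-loops in the raw graph

# Historical input H = {X_{t-h+1}, ..., X_t}: one feature matrix per step.
H = rng.standard_normal((h, N, F))
# Prediction target P = {X_{t+1}, ..., X_{t+p}}.
P = rng.standard_normal((p, N, F))

print(H.shape, P.shape)  # (12, 5, 2) (3, 5, 2)
```

In practice h, p, N, and F are fixed by the dataset and the prediction task rather than chosen freely.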

Spatiotemporal graph neural network evolution
In this section, the characteristics of spatiotemporal graph data and the evolution of ST-GNNs are briefly introduced. Non-Euclidean data is a class of data without translational invariance (Wu, Pan, Chen, et al. 2020; Xu et al. 2020): a given node may have different numbers of neighboring nodes. Therefore, unique convolution (sampling and aggregation) approaches may be required to extract features from those neighbors. Typical examples of such data are traffic networks, social networks, and chemical molecular structures. Unlike Euclidean data, they lose edge information if converted to raster data, and are thus more suitably expressed as graphs (Bruna et al. 2014; Defferrard, Bresson, and Vandergheynst 2016; Kipf and Welling 2016a). Since it has a temporal dimension in most cases, such data is also called spatiotemporal graph data.
In the past, extracting temporal correlation was the core of prediction models for such data. With the advent of the deep learning boom, many temporal prediction models appeared (Chung et al. 2014). They capture temporal correlation well and show different advantages in performance and efficiency. In subsequent prediction work, the importance of spatial correlation was gradually realized and attempts were made to capture it. Models such as Conv-LSTM (Shi et al. 2015) added the capture of spatial features and improved prediction accuracy to a certain extent (Liu et al. 2017). However, as these deep learning methods are designed for Euclidean space and ill-suited to non-Euclidean data, their capturing results are not satisfying (Zhao et al. 2020).
It was not until the introduction of graph neural networks that spatial correlation could be captured in its entirety. Since their emergence, ST-GNNs have addressed this difficulty by learning more complete latent patterns from spatiotemporal graph data. They combine the above time-series models with graph neural networks to extract complete temporal and spatial correlations from such data, respectively. In the spatial dimension, thanks to the rapid development of graph neural networks, the available methods have become more diverse. Scholars apply diverse graph neural network methods in the prediction domain, yet they can generally be divided into two types: spectral domain convolution and spatial domain convolution. Spatial domain models, also known as message-passing neural networks (MPNNs) (Gilmer et al. 2020; Battaglia et al. 2018), perform convolution by aggregating neighborhood features along graph structures. In contrast, spectral domain models use spectral theory to transform graph convolutions into products in the frequency domain (Chen, Chen, Zhang, et al. 2020). Spectral domain convolution is proposed, for example, in the Spectral Graph CNN (Bruna et al. 2014) (SCNN), which turns the convolution into a product of signals via the graph Fourier transform before converting back to the original spatial representation with the inverse transform, solving the problem of convolving on non-Euclidean data. The Chebyshev Spectral CNN (Defferrard, Bresson, and Vandergheynst 2016) (ChebNet) uses Chebyshev polynomials to approximate the graph Laplacian filter of SCNN, avoiding the costly eigendecomposition. The Graph Convolutional Network (Kipf and Welling 2016a) (GCN) simplifies this approach further by restricting the convolution kernel to the first-order neighborhood of each node. GCN (Kipf and Welling 2016a) is the most widely used graph convolution in ST-GNNs, with representative spatiotemporal prediction models such as STGCN (Yu, Yin, and Zhu 2018) and T-GCN (Zhao et al. 2020). GCN has become the first choice of most models for capturing spatial features, thanks to its low training cost and relatively good performance.
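The first-order GCN propagation rule, H' = ReLU(D^(-1/2) (A + I) D^(-1/2) X W), can be sketched in a few lines of NumPy. The toy graph and weights below are invented for illustration:

```python
import numpy as np

def gcn_layer(A, X, W):
    """One first-order GCN propagation step (Kipf & Welling style):
    H' = ReLU(D^{-1/2} (A + I) D^{-1/2} X W)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)                     # degrees including self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
    return np.maximum(A_norm @ X @ W, 0.0)    # aggregate, project, ReLU

# Toy line graph 0-1-2, mapping 2 input features to 4 hidden features.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
X = np.arange(6, dtype=float).reshape(3, 2)
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 4))
H = gcn_layer(A, X, W)
print(H.shape)  # (3, 4)
```

Each output row mixes a node's own features with those of its immediate neighbors; stacking k such layers expands the receptive field to k-hop neighborhoods.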
In contrast to spectral domain graph convolution, spatial domain graph convolution has become a focus of interest because of its flexibility: sampling and aggregation are performed directly on the graph. The Graph Attention Network (Veličković et al. 2017) (GAT) uses an attention mechanism to compute the hidden value of each node by aggregating the signal values of neighboring nodes. Compared with GCN, GAT reduces the dependence on the adjacency matrix, focuses more on the node information itself, performs better, and is also used by many scholars. However, it is less widely used because of its much larger memory and time cost compared with GCN. Other networks that, like graph attention networks, perform convolutional operations in the spatial domain include Graph CNN (Hechtlinger, Chakravarti, and Qin 2017), GraphSAGE (Hamilton, Ying, and Leskovec 2017), and graph embedding methods (Wang, Cui, and Zhu 2016; Cao, Lu, and Xu 2016; Kipf and Welling 2016b). Currently, they are more frequently employed in domains like recommendation systems and human skeleton-based action recognition than for prediction. In any case, their appearance has enabled the capture of non-Euclidean spatial features that were difficult to capture with Euclidean convolution in the past.
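The GAT aggregation step can be sketched as a masked softmax over each node's neighborhood. The following is a simplified single-head version with hypothetical toy inputs, not the full multi-head implementation:

```python
import numpy as np

def gat_layer(A, X, W, a_src, a_dst):
    """Single-head GAT-style aggregation (a sketch): attention scores are
    computed only over existing edges plus self-loops, softmax-normalized
    per node, then used to weight the neighbors' projected features."""
    H = X @ W                                   # project node features
    N = A.shape[0]
    mask = (A + np.eye(N)) > 0                  # attend to neighbors + self
    s, d = H @ a_src, H @ a_dst                 # per-node score halves
    e = s[:, None] + d[None, :]                 # e_ij = a_src.h_i + a_dst.h_j
    e = np.where(e > 0, e, 0.2 * e)             # LeakyReLU(0.2)
    e = np.where(mask, e, -np.inf)              # mask out non-edges
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)   # softmax over neighbors
    return alpha @ H                            # weighted aggregation

# Toy usage: two mutually connected nodes, 2 -> 3 features.
rng = np.random.default_rng(1)
A = np.array([[0., 1.], [1., 0.]])
X = np.eye(2)
W = rng.standard_normal((2, 3))
a_src, a_dst = rng.standard_normal(3), rng.standard_normal(3)
out = gat_layer(A, X, W, a_src, a_dst)
print(out.shape)  # (2, 3)
```

Because the weights alpha are recomputed from the current features, the effective "adjacency" changes with the data, which is the flexibility the text attributes to GAT.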
Since a large number of ST-GNN models have emerged recently, time-series models and graph neural networks, together with their advantages and limitations, are introduced in detail in the next section.

Composition classification of spatiotemporal graph neural network on prediction domain
This section summarizes the ST-GNN models of recent years, and analyzes and classifies them from the perspective of model construction.
A complete ST-GNN generally includes two parts: a temporal extraction part and a spatial extraction part. To give a quicker understanding of what these two parts contain in currently available frameworks, we classify them separately and discuss the advantages and drawbacks of each type of model.
The temporal extraction part is divided into three categories: RNN-based, CNN-based, and attention-based frameworks. The spatial extraction part is divided into two categories: spectral domain graph convolution and spatial domain graph convolution.

RNN-based model
RNN-based models extract temporal features in spatiotemporal graphs mainly with an RNN or one of its variants. The RNN (Medsker and Jain 2001) was first applied in the NLP field and migrated to temporal prediction due to its naturally strong sequence processing ability. However, due to its construction, an RNN iterates over much redundant temporal information, which degrades the stacked information over steps and eventually leads to the vanishing gradient problem (Hochreiter and Schmidhuber 1997). Its prediction performance is therefore unsatisfying.
The most common RNN variants, LSTM and GRU, solve the above problems. The LSTM (Hochreiter and Schmidhuber 1997) controls the retention or forgetting of information at each step through gating, avoiding the RNN's iterative redundancy. Moreover, it maintains a separate cell-state path alongside the hidden state to handle long-term and short-term information, which helps with long sequences. However, LSTM suffers from long training times and difficult convergence. The GRU (Chung et al. 2014) alleviates this by simplifying the LSTM's triple gating into two gates, the update gate and the reset gate. Compared with LSTM, GRU converges faster during training and is relatively easier to train, while its overall performance is nearly the same. Both can dynamically adapt the extracted spatiotemporal features to changing temporal patterns, thus capturing dynamic spatiotemporal characteristics. Among ST-GNNs, the most representative RNN-based work is DCRNN (Li et al. 2017).
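The two-gate GRU update can be sketched as a bare NumPy cell. The weights below are random stand-ins; real implementations also include bias terms and batching:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, Wz, Wr, Wh):
    """One GRU step with two gates: the update gate z decides how much of
    the new candidate replaces the old state; the reset gate r decides how
    much of the old state feeds the candidate."""
    xh = np.concatenate([x, h])
    z = sigmoid(Wz @ xh)                                # update gate
    r = sigmoid(Wr @ xh)                                # reset gate
    h_cand = np.tanh(Wh @ np.concatenate([x, r * h]))   # candidate state
    return (1.0 - z) * h + z * h_cand                   # blend old and new

# Toy run over a short sequence (dimensions are arbitrary).
rng = np.random.default_rng(0)
d_in, d_h = 3, 4
Wz, Wr, Wh = (rng.standard_normal((d_h, d_in + d_h)) for _ in range(3))
h = np.zeros(d_h)
for x in rng.standard_normal((5, d_in)):                # 5 time steps
    h = gru_cell(x, h, Wz, Wr, Wh)
print(h.shape)  # (4,)
```

Because each new state is a convex combination of the old state and a tanh candidate, the hidden values stay bounded, which is part of why GRUs train more stably than plain RNNs.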
Furthermore, to handle information over longer horizons, Seq2Seq (Liu et al. 2017) is also used in multi-step prediction. Seq2Seq is an encoder-decoder model built from LSTM or GRU units, which handles mismatched numbers of input and output time steps. It encodes the historical data into a latent vector with the encoder, then feeds that vector into the decoder to generate future predictions.
Due to their excellent dynamic temporal processing capability, LSTM, GRU, and Seq2Seq have become the most popular choices among spatiotemporal graph prediction components; about one-third of the models choose RNN-based components, as shown in the table below. However, RNN-based models also have disadvantages. Since the cell parameters are shared across every cycle, they cannot discriminate according to the strength of variation in the data, which limits these models.

CNN-based model
CNN-based models extract temporal features in spatiotemporal graphs mainly with convolutional neural networks, which are widely used in image (Zheng, Yang, and Tian 2018) and speech (Khan et al. 2020) fields. 1D CNNs are used for time-series prediction because of their excellent and flexible feature extraction ability. Compared with the RNN class, 1D CNNs have also attracted scholars' attention owing to their simple structure, fast training, and freedom from redundant information (Kalchbrenner et al. 2016). However, plain 1D CNNs were found to perform poorly in practice due to their limited view of historical information (Kalchbrenner et al. 2016). For this reason, some scholars turned the 1D CNN into a causal convolution, so that predictions about the future are learned only from historical information (Bai, Kolter, and Koltun 2018). Then, to let the convolution kernel cover longer histories, scholars enlarged the receptive field of the causal convolution with dilated convolution (Bai, Kolter, and Koltun 2018). Moreover, since the features of a spatiotemporal graph vary in strength over time, WaveNet (van den Oord et al. 2016) designed a more flexible and lighter gating mechanism suitable for CNNs, inspired by RNN gating, to control the strength of the information flow. It is widely used and has proven more effective than LSTM/GRU in applications (Bai, Kolter, and Koltun 2018).
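The causal/dilated convolution and the WaveNet-style gated activation described above can be sketched as follows. Scalar channels and hand-picked toy weights are used purely for illustration:

```python
import numpy as np

def dilated_causal_conv(x, w, dilation):
    """1D causal convolution: the output at step t only sees inputs at
    t, t-d, t-2d, ... (left-padded with zeros), as in TCN/WaveNet."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])   # zero-pad the past
    return np.array([sum(w[j] * xp[t + pad - j * dilation] for j in range(k))
                     for t in range(len(x))])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# WaveNet-style gated activation: the tanh branch carries the signal,
# the sigmoid branch gates its strength.
x = np.sin(np.linspace(0, 4, 16))        # toy 16-step series
wf = np.array([0.5, 0.3, 0.2])           # filter weights (toy values)
wg = np.array([0.2, 0.2, 0.6])           # gate weights (toy values)
out = np.tanh(dilated_causal_conv(x, wf, dilation=2)) * \
      sigmoid(dilated_causal_conv(x, wg, dilation=2))
print(out.shape)  # (16,)
```

With a kernel of size k and dilation d, each output step sees (k-1)·d + 1 past steps, so stacking layers with doubling dilation grows the receptive field exponentially at constant cost per layer.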
Therefore, CNNs have gradually increased in ST-GNN applications as research advances. However, fewer than one-third of the models choose CNN-based components, which is less than expected. This is most likely due to two reasons: firstly, the attention mechanism performs better and attracts more researchers, shifting this balance; secondly, CNN parameters are fixed after training and do not follow changes in the data, so CNN models lack dynamic temporal characteristics.

Attention-based model
Attention-based models extract temporal features in spatiotemporal graphs mainly through the attention mechanism. Attention assigns weights over the input and usually lives inside an encoder-decoder: it obtains the input of the next layer by weighting the hidden states of all time steps of the encoder. Compared with CNNs and RNNs, it is more comprehensive, having a larger receptive field than CNNs and a faster convergence speed than RNNs. In particular, the Transformer (Vaswani et al. 2017) has emerged and surpassed nearly all temporal models in performance, so many models are built on its approach. The attention mechanism can follow the data in real time and express the strength of information as scores, making it well suited to tasks with real-time changes such as spatiotemporal graph prediction. Moreover, attention currently receives a great deal of interest across many fields, and there is a great variety of ways to compute attention scores (Vaswani et al. 2017; Feng et al. 2017; Du et al. 2017; Arnab et al. 2021), giving researchers in spatiotemporal graph prediction a broad base to select from or improve upon. However, attention is not perfect: it can cause over-fitting and excessive computational complexity, placing higher demands on hardware (Dong, Cordonnier, and Loukas 2021). Therefore, instead of using attention alone, many frameworks combine it with a CNN or an RNN so that they complement each other.
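The core weighting step, scoring all encoder time steps against a query and returning their softmax-weighted sum, can be sketched as scaled dot-product attention. This is a simplified single-query version with random toy values:

```python
import numpy as np

def temporal_attention(H_enc, q):
    """Scaled dot-product attention over encoder hidden states: score each
    of the h time steps against a query vector, softmax the scores, and
    return the weighted sum (the context vector) plus the weights."""
    d = H_enc.shape[1]
    scores = H_enc @ q / np.sqrt(d)        # (h,): one score per time step
    w = np.exp(scores - scores.max())      # numerically stable softmax
    w = w / w.sum()                        # attention weights sum to 1
    return w @ H_enc, w                    # context vector and weights

# Toy example: 6 encoder time steps, hidden size 4 (values arbitrary).
rng = np.random.default_rng(0)
H_enc = rng.standard_normal((6, 4))
q = rng.standard_normal(4)
ctx, w = temporal_attention(H_enc, q)
print(ctx.shape, round(float(w.sum()), 6))  # (4,) 1.0
```

The weights w are recomputed for every new query, which is the "real-time" adaptivity the text contrasts with fixed CNN kernels.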

Spectral domain graph convolution
Spectral domain graph convolution transforms the data into the spectral domain through graph theory and the convolution theorem (i.e. a Hadamard product after the graph Fourier transform) (Zhou et al. 2020; Chen, Chen, Zhang, et al. 2020). It expresses all node features as a weighted sum of eigenvectors via the graph Fourier transform, which naturally captures global information (Bo et al. 2023). Compared with the spatial domain, it obtains a larger receptive field but sacrifices some computational flexibility. In ST-GNN applications, local features are needed more than global features due to spatial auto-correlation, so spectral methods often need their receptive field restricted to achieve simple and efficient computation. In terms of interpretability, spectral domain models are more interpretable (Bo et al. 2023): their graph filters can directly reveal the frequency content most relevant to the label, such as low, medium, and high frequencies.
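The spectral pipeline described above (transform, filter per frequency, inverse transform) can be sketched with an explicit Laplacian eigendecomposition. The low-pass filter g and the 4-node cycle graph below are illustrative choices:

```python
import numpy as np

def spectral_filter(A, x, g):
    """Filter a graph signal in the spectral domain: project onto the
    Laplacian eigenbasis U (graph Fourier transform), scale each frequency
    component by g(lambda), then transform back."""
    D = np.diag(A.sum(axis=1))
    L = D - A                              # combinatorial graph Laplacian
    lam, U = np.linalg.eigh(L)             # eigenvalues = graph frequencies
    x_hat = U.T @ x                        # graph Fourier transform
    return U @ (g(lam) * x_hat)            # filter, then inverse transform

# A low-pass filter on a 4-node cycle graph damps an alternating signal,
# which is the highest-frequency mode of this graph.
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
x = np.array([1.0, -1.0, 1.0, -1.0])       # "high-frequency" signal
low_pass = lambda lam: np.exp(-lam)        # damp high frequencies
y = spectral_filter(A, x, low_pass)
print(bool(np.abs(y).max() < 1.0))  # True: the signal is attenuated
```

Methods like ChebNet and GCN exist precisely to avoid this explicit eigendecomposition, replacing g(lambda) with low-order polynomials of L.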
Representatives include SCNN (Bruna et al. 2014), ChebNet (Defferrard, Bresson, and Vandergheynst 2016), and GCN (Kipf and Welling 2016a). They have a solid theoretical foundation, but must load the whole graph structure during convolution, so their efficiency and flexibility on large graphs are relatively poor. The most widely used spatiotemporal graph convolutional networks are GCNs (Ye et al. 2020), due to their low training cost and relatively good performance. In prediction, however, their fixed and shared convolution kernels make them less capable of handling dynamic spatial information. Therefore, in subsequent research, ST-GNNs based mainly on the spectral domain almost always add techniques that allow dynamic changes of information or graph structure to overcome this defect.

Spatial domain graph convolution
Spatial domain graph convolution performs convolution-like operations directly in the spatial domain (Zhou et al. 2020). It is based on the idea of traditional convolutional operators, developed to sample and aggregate over graphs (Chen, Chen, Zhang, et al. 2020). Compared with spectral domain models, spatial domain models aggregate node features layer by layer; nodes capture features only within a fixed distance, emphasizing local information, and thus allow more freedom and flexibility in computation (Bo et al. 2023). Spatial domain models involve node-based updates in which gradients flow only between connected nodes, which places an upper limit on their expressiveness. In terms of interpretability, these models require a post-hoc interpretation strategy aimed at revealing the structures most relevant to the prediction, such as nodes, edges, or subgraphs (Bo et al. 2023).
Representatives include GAT (Veličković et al. 2017), GraphSAGE (Hamilton, Ying, and Leskovec 2017), DCNN (Atwood and Towsley 2016), and graph embedding methods (Wang, Cui, and Zhu 2016; Cao, Lu, and Xu 2016; Kipf and Welling 2016b). Compared with spectral domain graph convolution, convolution here is more flexible and intuitive. For example, GAT has been applied in several fields, including time-series prediction. Its attention scores change in real time with the data, which suits data that itself changes in real time (Guo and Yuan 2020). It also learns edge weights and hidden spatial heterogeneity from the data, thanks to its limited reliance on the graph structure (Li and Lasenby 2021). However, spatial domain graph convolutions are currently somewhat less common in prediction than spectral ones, as they require more computational resources and training time than GCN (Wu, Pan, Chen, et al. 2020), demanding higher computer performance, which has led some scholars to abandon them.
In addition, recent studies have demonstrated that a spatial domain model can also be transformed into a spectral domain model by approximation theory (Chen, Chen, Zhang, et al. 2020). This paper studies only the original classification, not the converted forms.

Other domains
In addition to the above frameworks, there are a small number of MLP-based (Oreshkin et al. 2021) and GCN-based models (Bai et al. 2019; Song et al. 2020; Li and Zhu 2021; Fang, Prinet, et al. 2021a), which are currently less common than the above frameworks. MLP-based models are similar to CNN-based ones but more densely connected, with more parameters and harder training. GCN-based models are more novel: they extract temporal and spatial features simultaneously with a GCN, so they can better handle feature extraction for inhomogeneous spatiotemporal variables, i.e. they consider spatial heterogeneity. However, there is still room for development, for example regarding the different degrees of feature association in time and space, and the difficulty of distinguishing different edges when the same GCN handles them all.
Most models combine the above temporal and spatial components in a stacked, parallel, or fused manner. Table 2 lists, for each model, its name, the time it was proposed, its temporal and spatial classification, the techniques applied, and the type of graph structure used.
It is worth mentioning that the above classification of graph structures refers to the types of graph variation during ST-GNN training. It is divided into four categories: static, dynamic, multi-scale, and adaptive graphs. A static graph is defined in advance, mostly based on distance or the number of connected edges in the graph. A dynamic graph changes during training and testing, combining the predefined graph with the influence of time-series changes. Since spatiotemporal graph data is dynamic in time and space, a dynamic graph structure is more suitable than a static one. A multi-scale graph denotes the joint use of several different graphs in one model; these may be graphs with different semantics, graphs at different spatial scales, or multiple subgraphs. This type of graph structure enables ST-GNNs to extract a comprehensive spatial pattern by obtaining more spatial features. Unlike the other types, an adaptive graph is a dynamic graph that requires no predefined adjacency matrix; it is trained directly from learnable node embeddings. Specifically, node embeddings are two-dimensional learnable matrices in which one dimension is the number of nodes and the other is the embedding dimension. The adaptive adjacency matrix is typically formed by matrix multiplication of two node embedding matrices, one representing the strength of each node's inflow and the other the strength of its outflow. Adaptive graphs mostly train these node-embedding parameters, so they can be used without predefined graph structures while achieving the same effect as dynamic graphs.
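The adaptive adjacency construction from two node embeddings can be sketched as follows, using one common formulation, softmax(ReLU(E1 E2^T)). The random embeddings below stand in for what training would learn:

```python
import numpy as np

def softmax_rows(M):
    """Row-wise softmax, so each node's outgoing weights sum to 1."""
    e = np.exp(M - M.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Adaptive adjacency from two learnable node embeddings: E1 plays the
# role of source (outflow) embeddings, E2 of target (inflow) embeddings.
# The random values here are stand-ins for trained parameters.
rng = np.random.default_rng(0)
N, d = 6, 3                                 # 6 nodes, embedding dim 3
E1 = rng.standard_normal((N, d))            # source embeddings
E2 = rng.standard_normal((N, d))            # target embeddings
A_adapt = softmax_rows(np.maximum(E1 @ E2.T, 0.0))  # softmax(ReLU(E1 E2^T))
print(A_adapt.shape, bool(np.allclose(A_adapt.sum(axis=1), 1.0)))
```

Because E1 and E2 are ordinary trainable parameters, gradients from the prediction loss shape A_adapt directly, with no predefined distance-based graph required.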
From the above studies, we find that early work focused on extracting complete spatiotemporal correlation from spatiotemporal graph data, mainly using GCN, RNN, and CNN. Later, more attention was paid to capturing dynamic spatiotemporal patterns, based on the real-time changes of the data, in order to improve prediction effectiveness and accuracy. To this day, dynamic spatiotemporal correlation remains the core of such research. In addition, the attention scores of attention mechanisms change in real time with the data, resulting in an increasing number of attention-based frameworks. Moreover, synchronous capture of spatiotemporal patterns has been gaining attention recently. Such frameworks capture dynamic changes while avoiding the asynchrony that occurs when temporal and spatial correlations are captured separately, and further capture the spatiotemporal heterogeneity in the data. To better understand the above, we illustrate each with an example in Section 3.1.

Spatiotemporal graph data classification
This section reviews the research hotspots of spatiotemporal graph data, summarizes and compares the research tasks with commonly used datasets, taking traffic flow prediction as an example, and finally explores the research hotspots of spatiotemporal graph prediction.
While summarizing the model construction methods, we also found that ST-GNN prediction research can be divided into three major categories from the application point of view: dense data prediction, sparse data prediction, and similar-dense data prediction. Details are shown in Table 3.

Dense spatiotemporal graph data
Dense spatiotemporal graph data prediction mainly includes traffic flow prediction, traffic speed prediction, PM2.5 prediction, air quality index prediction, and the like. These data are generally continuous over time and almost never zero, so they do not suffer from the prediction difficulties caused by zero-value inflation, and only the magnitude of prediction error needs to be considered. Across ST-GNN applications, these tasks apply the most mature and novel techniques, such as graph pooling (Yu, Yin, and Zhu 2019; Guo, Hu, et al. 2021b), spatiotemporal synchronous graph convolution (Bai et al. 2019; Song et al. 2020; Li and Zhu 2021; Fang, Prinet, et al. 2021a), and data-based adaptive graph generation (Bai et al. 2020; Wang et al. 2023; Wang and Jing 2022). In particular, most traffic prediction datasets are publicly available and well recognized, such as PeMS-Bay and METR-LA, and many scholars use them to demonstrate the superiority of their models against the latest baselines. We summarize the applied datasets to help future researchers make fair comparisons with other models.

Sparse spatiotemporal graph data
Sparse spatiotemporal graph data prediction mainly includes crime case prediction, infectious disease prediction, and the like. A common characteristic of such data is that the values are zero most of the time and non-zero only at some time points. Models of this type must be aware of the problems arising from zero-value inflation: because of the loss function and the model components, a conventionally built model tends to smooth over the non-zero points as abrupt changes, so its predictions for such data are essentially zero. Therefore, stronger learning of the non-zero points is called for. Currently, this is mostly addressed with reinforced loss functions (Zhang and Cheng 2020; Wang, Lin, et al. 2022; Sun et al. 2021) or more sensitive model components (Zhang and Cheng 2020; Wang, Lin, et al. 2022). We likewise summarize the datasets used, for the fairness of future comparisons. There are several publicly available crime datasets, such as the Chicago, New York City, San Francisco, Boston, Los Angeles, Brazil, and London crime datasets. Among them, the Chicago dataset is one of the most widely used in the prediction domain, as it provides data over a longer time span (from 2001 to now) and in more detail.

Similar dense spatiotemporal graph data
Similar dense spatiotemporal graph data mainly includes applications such as flood forecasting and power forecasting. This type of data differs from the two types above in that it is zero during some periods but exhibits the characteristics of dense data during others. Therefore, scholars commonly remove the zero segments and integrate the dense segments to predict the specifics of future dense periods. The techniques used are nearly the same as those for dense data, with no obvious differences.
In addition, ST-GNNs can be applied to many other scenarios, such as vehicle behavior classification (Mylavarapu et al. 2020), optimal dynamic electronic toll collection (DETC) schemes (Qiu, Chen, and An 2019), path availability (Li et al. 2019), and traffic signal control (Nishi et al. 2018). In the future, we believe more directions remain to be explored, such as the prediction of multi-year ground settlement changes, pollution dispersion rates, or airway traffic. Although the relevant datasets and models still need to be collected and explored, this shows that spatiotemporal graph data are widely available in our lives and the results may have a significant impact on people. Links to the above-mentioned datasets have been surveyed; please refer to Table A2 in the Appendix for details.

Performance and efficiency of spatiotemporal graph neural network
This section describes the performance and efficiency differences between the various models.

Performance comparison
Since different models use different validation datasets, direct comparison is difficult. We therefore selected one of the most widely used datasets, METR-LA, and collected the performance of the above models over different time spans (15, 30, and 60 min), as shown in Table 4. These results are cited from the literature (Kang et al. 2019; Li et al. 2022; Yu, Yin, and Zhu 2019; Chen, Chen, Lai, et al. 2020; Park et al. 2020). Because computing environments differ, fairness needs to be considered, so we adopt the following condition: if a paper's evaluation results for the benchmarks are the same as, or differ only slightly (no more than 5%) from, those in the original papers, its proposed model is assumed to have been compared fairly.
In terms of model performance, we found that models using a CNN-based temporal layer generally perform better than RNN-based ones, e.g. STGCN compared with GCRNN and DCRNN. Adding dilated convolution brings a significant further improvement, especially for long-term prediction: STGAT and Graph WaveNet outperform DCRNN, particularly at the 60 min horizon. Meanwhile, temporal attention mechanisms generally outperform both RNN-based and CNN-based models, though the effect depends on the type of attention used. For example, ST-GART performs better in short- and medium-term prediction, while GMAN and CDGNet perform better in medium- and long-term prediction.
Moreover, GCN and ChebNet are commonly used to capture spatial features, while the diffusion mechanism unique to DCNN turns out to be more suitable for spatiotemporal graph prediction than plain GCN; for instance, DCRNN outperforms GCRNN over all time spans. A few models instead use temporal or spatial attention mechanisms (such as GAT), which account for spatial heterogeneity in the model construction and thus provide better performance, e.g. ST-GAT, ST-MetaNet, ST-GART, GMAN, and AutoSTS.

Efficiency comparison
It is rather challenging to collect computing times for all models under a uniform standard, since the models do not use the same datasets and differences in computer performance can introduce large time differences. Therefore, we only report the computing time of some models, as shown in Table 5. These results are cited from (Lee et al. 2021), whose computing environment is an Intel Xeon 5120 CPU, 394 GB of RAM, and eight Nvidia Titan RTX GPUs.
We found that CNN-based models generally have the fastest inference, e.g. STGCN and Graph WaveNet. The next fastest are the GCN-based STSGCN and STG2Seq, which benefit from the short training time of GCN. Slower are the RNN-based models, DCRNN and ST-MetaNet: since RNNs are essentially fully connected layers with gating, their dense connections take longer to train. The slowest are the attention-based models, such as GMAN, because the attention mechanism must compute the score of each position relative to every other position, which is slower than an RNN. A more specific example is ASTGCN, which is CNN-based but incorporates partial attention and computes multiple time slices in the form of multiple components; this brings its training speed down to the RNN-based level.
Models without specific timing results are classified according to the complexity of their construction. Based on the efficiency analysis above, we classify CNN-based models as fast, RNN-based models as medium speed, and attention-based models as slow. To minimize possible mistakes, we also compare whether their constructions are more complex than those of DCRNN and STG2Seq to fine-tune their specific classifications. It is worth noting that the graph attention network is also considered a type of attention and is thus classified as medium or slow. In addition, models such as ASTGCN, GSTGCN, and Dynamic GRCNN, which employ multiple components, have substantially increased computing time and thus default to one level lower in the classification.
Table 6 lists the investigated models, to help future researchers understand the efficiency of current models and develop models that combine performance and efficiency.

Prediction processes of spatiotemporal graph neural network
This section details the application of ST-GNNs in the prediction domain, stage by stage, following the typical prediction workflow.
ST-GNN application in the prediction domain can generally be divided into four stages: the data pre-processing phase, the model construction phase, the model adjustment phase, and the model evaluation phase. The flow chart is displayed in Figure 3.

Preprocessing phase
Many problems arise in the process of collecting raw data from sensors, such as data anomalies and data loss. Therefore, pre-processing is required before the data are fed into the model for prediction. Pre-processing consists of outlier removal, filling of missing values, and normalization. In some models (Zhao et al. 2019), outliers are also deliberately added to verify the robustness of the network. Take the pre-processing of ASTGCN (Guo et al. 2019) as an example: first, the raw data are aggregated so that data that change in real time can be fed into the model; the processed data are aggregated from the raw data every five minutes, yielding a total of 288 data points per day. Then, to reduce unnecessary computation, redundant sensors are removed, ensuring that the distance between adjacent detectors is greater than 3.5 miles. Next, missing values are filled by linear interpolation, which also reduces abrupt changes in the curve. Finally, the data are normalized so that all values lie in [−1, 1] and the mean of the dataset is zero.
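The two core pre-processing steps above, gap filling by linear interpolation and scaling into [−1, 1], can be sketched for a single sensor series as follows; the function names and the min-max scaling scheme are illustrative, not ASTGCN's actual code.

```python
import numpy as np

def fill_missing_linear(x):
    """Linearly interpolate NaN gaps in a 1-D sensor series."""
    x = np.asarray(x, dtype=float).copy()
    nans = np.isnan(x)
    idx = np.arange(len(x))
    # np.interp fills each NaN position from its nearest valid neighbors.
    x[nans] = np.interp(idx[nans], idx[~nans], x[~nans])
    return x

def normalize_minmax(x):
    """Scale values into [-1, 1] (min-max scaling)."""
    lo, hi = x.min(), x.max()
    return 2.0 * (x - lo) / (hi - lo) - 1.0
```

In practice the scaler's parameters (here `lo`, `hi`) must be fitted on the training split only and reused on validation and test data to avoid leakage.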

Model construction phase
In the model construction phase, all networks consider how to extract the required spatiotemporal correlations from the spatiotemporal graph data. As the name implies, spatiotemporal correlation contains both temporal and spatial correlation. Generally speaking, it is extracted in two parts: a temporal layer extracts the temporal correlation and a spatial layer extracts the spatial correlation. Temporal correlation is mostly extracted with methods such as CNNs, RNNs, and attention mechanisms, while spatial correlation is mostly extracted with methods such as GCNs and graph autoencoders.
We classify the models as a whole into two categories according to their method of construction: phased stacking models and overall spatiotemporal block models.
In phased models, the temporal and spatial layers are stacked separately. The advantage is that the number of temporal and spatial layers can be flexibly controlled to obtain the optimal prediction model. However, careless stacking is likely to lead to the disconnection, or even fragmentation, of temporal and spatial features.
The overall spatiotemporal block model, in comparison, integrates the temporal and spatial blocks into one spatiotemporal block, and a certain number of such blocks are stacked to obtain high-level spatiotemporal features for optimal prediction. Its advantage is that the temporal and spatial layers alternate, which avoids disconnecting the spatial features from the temporal ones. Although this design offers less flexible control over the layer counts, overall spatiotemporal block models are somewhat more numerous, because the current demand for stacking temporal layers does not differ much from that for spatial layers.
Besides the above construction methods, researchers can also choose to extract temporal and spatial features simultaneously based on GCN. In this type of model, the relevance between temporal and spatial features needs particular attention, because extracting features separately in the temporal and spatial dimensions can cause the two types of features to mismatch, especially when temporal anomalies and spatial heterogeneity are pronounced. This pattern is also termed spatiotemporal heterogeneity (Song et al. 2020), i.e. patterns differ across locations and times.

Model training and adjustment phase
After constructing the model, the proposed model needs to be trained and tested. Thus, we give common pseudocode for ST-GNN training to help future researchers quickly build the training procedure for their own models.
Algorithm 1 The learning algorithm of ST-GNNs.
  Input:
    Historical spatiotemporal graph data: H = {X_{t−q+1}, X_{t−q+2}, …, X_t}
    The initialized temporal component of the ST-GNN model: T(·)
    The initialized spatial component of the ST-GNN model: S(·)
    Predicted step p, historical step q, adjacency matrix A, learning rate r
    Prediction labels: P = {X_{t+1}, X_{t+2}, …, X_{t+p}}
  Output:
    ST-GNN with learned parameters Θ; prediction results Y_out
  for each training iteration do
    compute the prediction Y_out from H and A via the temporal component T(·) and the spatial component S(·)
    compute the error between prediction and labels: L = loss(Y_out, P)
    update the model parameters Θ according to their gradients and the learning rate r
  end for
  repeat
    randomly select a batch of samples and optimize the parameters with the optimizer to minimize L
  until convergence

Most of the time, the initial training results are not satisfactory, and the model needs to be adjusted against the validation-set accuracy through multiple rounds of training. In the early stage of adjustment, the composition of the important modules and the adjacency matrix type (the graph type in Section 2.3) need the most attention. For example, on the temporal side, RNN-based or attention-based modules may capture longer temporal dependencies than CNN-based ones, resulting in better accuracy for long-term prediction (Zhao et al. 2019). On the spatial side, an adjacency matrix that changes dynamically over time may handle certain peaks or valleys better than a static one (Guo et al. 2019). Once the core architecture and algorithms have been determined, the hyperparameters of the model may need fine-tuning to achieve optimal results. These adjustments may include: (1) tuning the number of spatiotemporal blocks (layers) to maximize the depth and breadth of the extracted features; (2) optimizing the convolution kernel size and the loss function to ensure the model suits the specific application scenario; and (3) introducing additional algorithms to address potential limitations while preserving the core architecture. Ultimately, the purpose of model adjustment is to enhance the predictive ability of the model and achieve better performance.
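The training loop of Algorithm 1 can be exercised with a toy stand-in for the model; everything here (the linear map in place of T(·) and S(·), the shapes, the learning rate) is purely illustrative of the loop structure, not an actual ST-GNN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an ST-GNN: one linear map from q historical steps
# to p future steps, trained with the loop of Algorithm 1.
q, p, n_nodes = 4, 2, 3
X = rng.normal(size=(200, q * n_nodes))           # historical windows H
W_true = rng.normal(size=(q * n_nodes, p * n_nodes))
Y = X @ W_true                                    # prediction labels P

theta = np.zeros((q * n_nodes, p * n_nodes))      # parameters Theta
r = 0.01                                          # learning rate
for _ in range(500):                              # repeat ... until convergence
    batch = rng.choice(len(X), size=32, replace=False)
    Y_out = X[batch] @ theta                      # forward pass
    err = Y_out - Y[batch]
    L = np.mean(err ** 2)                         # L = loss(Y_out, P)
    grad = 2.0 * X[batch].T @ err / len(batch)
    theta -= r * grad                             # update Theta by gradient
```

A real implementation would replace the linear map with the stacked temporal and spatial components and hand the update step to an optimizer such as Adam.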

Model evaluation phase
After the model is adjusted, metrics are needed to evaluate it. The common metrics in the literature are root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), Accuracy, the coefficient of determination (R²), and the explained variance score (var), as shown in Eq. 1 to Eq. 6. The smaller the RMSE, MAE, and MAPE, the better the model; the larger the Accuracy, R², and explained variance score, the better the model.
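The regression metrics among Eq. 1 to Eq. 6 can be written down directly (these are the standard formulations; Accuracy is task-specific and omitted here):

```python
import numpy as np

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mae(y, yhat):
    return float(np.mean(np.abs(y - yhat)))

def mape(y, yhat):
    # Requires non-zero ground truth; sparse (zero-inflated) data needs a masked variant.
    return float(np.mean(np.abs((y - yhat) / y)) * 100.0)

def r2(y, yhat):
    return float(1.0 - np.sum((y - yhat) ** 2) / np.sum((y - np.mean(y)) ** 2))

def explained_variance(y, yhat):
    return float(1.0 - np.var(y - yhat) / np.var(y))
```

Note that a constant prediction offset lowers R² but leaves the explained variance score at 1, which is why the two are usually reported together.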
where Y_t denotes the ground truth and Ŷ_t denotes the prediction of the model.

Model evolution history
In the prediction task, how to capture temporal and spatial dependence is the core of the model. After years of research, many models have emerged. In pursuit of better prediction results, the research focus has gradually shifted from capturing complete spatiotemporal correlation, to dynamic spatiotemporal correlation, and then to synchronous spatiotemporal correlation. In this section, the evolution of models in recent years is explained through these changing ideas for capturing spatiotemporal correlation, before the advantages and disadvantages of the various models are analyzed, as listed in Table 7.

Capture of complete spatiotemporal correlation
By the time of the earlier ST-GNNs, there was already ample experience in capturing temporal correlation. Capturing spatial correlation had also been studied, but Euclidean convolution itself makes accurate mining of spatiotemporal graph data difficult. The birth of graph convolution made it possible; we take STGCN (Yu, Yin, and Zhu 2018) as an example to explain the capturing process, as shown in Figure 4.
In the STGCN framework, the temporal and spatial blocks are stacked in a 'sandwich' structure. The temporal block adopts a 1-D CNN with a GLU gating mechanism to capture temporal correlation while controlling the passage of effective information; the spatial block adopts GCN, transforming spatial-domain information into the spectral domain for the convolution operation to obtain effective spatial correlation. The overall design balances efficiency and performance, and STGCN is one of the most classic baseline models in spatiotemporal graph prediction research. However, the model still has shortcomings. For example, the small CNN convolution kernel restricts the receptive field, which limits long-term prediction. Likewise, the fixed parameters of the convolution kernel mean the extracted features are fixed at test time and cannot follow changes in the data, so the prediction results lack dynamics driven by the data itself. Considerable room for improvement therefore remains, and plenty of new frameworks await future exploration.
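The gated temporal convolution described above can be sketched as follows; the plain-loop implementation, shapes, and parameter layout are illustrative only, not STGCN's actual code.

```python
import numpy as np

def glu_temporal_conv(x, w, b, kernel=3):
    """Gated 1-D temporal convolution in the style of STGCN's temporal block.

    x: (T, C_in) sequence for one node
    w: (kernel * C_in, 2 * C_out), b: (2 * C_out,)
    The convolution output is split into two halves [P, Q], and the GLU
    gate returns P * sigmoid(Q), letting Q control which information passes.
    """
    T, c_in = x.shape
    c_out = w.shape[1] // 2
    out = np.empty((T - kernel + 1, c_out))
    for t in range(T - kernel + 1):
        z = x[t:t + kernel].reshape(-1) @ w + b          # 1-D conv at step t
        p_part, q_part = z[:c_out], z[c_out:]
        out[t] = p_part * (1.0 / (1.0 + np.exp(-q_part)))  # GLU: P * sigmoid(Q)
    return out
```

Note the output is `kernel - 1` steps shorter than the input, which is why stacking such blocks shrinks the temporal dimension unless padding is added.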
To see the enhancements clearly, we selected the two most classical frameworks of this type, STGCN and DCRNN, for a specific performance comparison. As can be seen from Table 8, compared with LSTM (which does not consider spatial features), DCRNN improves accuracy by approximately 12% and STGCN by approximately 14%. This confirms that it is necessary to consider spatial features in the prediction process.

Capture of dynamic spatiotemporal correlation
The prediction quality obtained via complete spatiotemporal correlation alone proved unsatisfying, which drove further study of the changing characteristics of the data, that is, of how its high-dimensional features evolve; more specifically, of the dynamic spatiotemporal correlation arising from the real-time changes in spatiotemporal graph data. This research continues to this day, and a large number of models have been born in the process. Among such models, the attention mechanism plays a pivotal role; we take ASTGCN (Guo et al. 2019) as an example to introduce the capture process in detail, as shown in Figure 5.
The ASTGCN framework (Guo et al. 2019) is stacked from spatiotemporal blocks and models three sets of temporally correlated data (recent-period, daily-period, and weekly-period correlation). An attention mechanism is added within each spatiotemporal block, in addition to the causal convolution and graph convolution. The attention mechanism (Feng et al. 2017) originated in the field of NLP for extracting relationships between words within a sentence or between sentences. In this framework, it extracts in real time the proportional weight of each moment or location relative to the whole, so that the acquired correlations act on the prediction results in a dynamic form. Validation on two highway datasets showed that adding dynamic correlation did improve performance, and the accuracy improvement is larger than that of ST-GNNs without dynamic correlation, demonstrating the need for dynamic temporal correlation capture.
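The per-moment weighting idea can be sketched as a dot-product temporal attention: each time step is scored against every other step and the sequence is reweighted by the resulting, data-dependent weights. This is a simplification; ASTGCN's actual temporal attention uses a different parameterization with additional learnable mixing matrices.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention(x, wq, wk):
    """Score each time step against every other and reweight the sequence.

    x: (T, C); wq, wk: (C, d) projection matrices (stand-ins for learned
    parameters). Because the (T, T) weights are recomputed from the input,
    the captured correlation is dynamic rather than fixed.
    """
    q, k = x @ wq, x @ wk
    scores = softmax(q @ k.T / np.sqrt(q.shape[1]))   # (T, T) dynamic weights
    return scores @ x                                  # reweighted sequence
```

Spatial attention is the same computation applied across nodes instead of time steps.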
To see the enhancements clearly, we also selected two popular dynamic frameworks, ASTGCN(r) and LSGCN, for a specific performance comparison. As can be seen from Table 8, compared with DCRNN and STGCN, ASTGCN(r) improves accuracy by approximately 12%, while LSGCN achieves a 4-9% improvement. This difference arises because the two models employ different dynamic-capture techniques: both add attention, but LSGCN builds on this by adding a gating mechanism, proposing a gated attention that controls the flow of information. LSGCN therefore achieves relatively better results.
It is thus clear from the above analysis that different ways of capturing dynamics still produce different results. This is why, although many such models already exist, further research continues to increase.

Capture of synchronous spatiotemporal correlation
In the latest research, we found that in addition to networks focused on acquiring dynamic spatiotemporal correlations, networks built around integrated graph convolution have emerged, such as STG2Seq (Bai et al. 2019) and STSGCN (Song et al. 2020). Usually, spatiotemporal correlations are acquired separately, which fragments the process of acquiring temporal and spatial correlations. These studies instead focus on a spatiotemporal block that synchronously captures local spatiotemporal correlations while considering spatial heterogeneity, further improving prediction accuracy. We take STSGCN (Song et al. 2020) as an example to introduce the capturing process in detail. As shown in Figure 6, the STSGCN framework stacks only a number of STSGCM blocks, apart from the fully connected layers that adjust dimensionality, and relies solely on them to obtain spatiotemporal correlations. Each STSGCM contains two graph convolution layers and an aggregation layer. Unlike the usual models that take the graph structure at a single moment as input, STSGCM unites the spatiotemporal graph data of three neighboring moments into one spatiotemporal embedding and uses it as input. Within this localized spatiotemporal graph, temporal and spatial features are learned simultaneously by a shared GCN, ensuring the synchronous acquisition of spatiotemporal correlation. To strengthen the association between adjacency matrices, STSGCN also proposes a spatiotemporal mask layer that adapts to the data. This learning mode strengthens the correlation between time and space within a region, thereby accounting for the spatial heterogeneity (i.e. the heterogeneous distribution of the data in space) present in spatiotemporal data. In addition, the loss function, activation function, etc. are improved and adjusted to ensure that the model works well on four real datasets.

To see the enhancements clearly, we also selected two popular synchronous frameworks, STSGCN and STFGNN, for a specific performance comparison. As can be seen from Table 8, compared with ASTGCN(r), STSGCN reduces the error by about 6% and STFGNN by about 10%. There are two reasons why STFGNN outperforms STSGCN. First, STFGNN obtains a larger receptive field by integrating WaveNet, considering a longer temporal dependence. Second, compared with STSGCN's block, STFGNN's spatiotemporal fusion block overlays multiple blocks and uses max pooling to obtain non-local spatiotemporal correlations, considering not only spatial heterogeneity but also global homogeneity.
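The "three neighboring moments as one input" construction amounts to assembling a block adjacency matrix over a localized spatiotemporal graph. The sketch below is in the spirit of STSGCN, where each node additionally connects to itself at the adjacent time steps; the real model further applies a learnable spatiotemporal mask to this matrix, which is omitted here.

```python
import numpy as np

def localized_st_adjacency(A):
    """Build the 3N x 3N adjacency of a localized spatiotemporal graph.

    Three consecutive time steps share the same spatial adjacency A
    (diagonal blocks), and identity blocks on the off-diagonal connect
    each node to itself at the neighboring time steps, so one shared
    GCN pass mixes spatial and temporal neighbors simultaneously.
    """
    n = A.shape[0]
    I = np.eye(n)
    Z = np.zeros((n, n))
    return np.block([
        [A, I, Z],
        [I, A, I],
        [Z, I, A],
    ])
```

Running an ordinary GCN layer on this matrix is what makes the spatiotemporal capture "synchronous": a node's receptive field covers both its spatial neighbors and its own past and future step in a single hop.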
However, there are fewer models in this category than in the second category, because they are more difficult to construct rationally. In particular, representing spatial connectivity and temporal connectivity differentially requires suitable algorithms to achieve the desired results.

Future directions
Although ST-GNNs have achieved good results in recent years, some problems remain to be solved, such as network deepening, the influence of other factors, adaptive graph structure, multi-scale information fusion, and model robustness. These problems have received little attention in existing models, and even fewer models can solve them. In this section, we therefore analyze the necessity and feasibility of each of these directions.

Deepen network and broaden feature
Capturing deep feature information is one way to improve prediction accuracy. Take ResNet (He et al. 2016) as an example: its emergence brought a qualitative change in CNN performance by partially solving the training 'degradation' problem, successfully increasing the network depth beyond 100 layers and improving model performance. Most ST-GNNs in current prediction studies use GCN to obtain spatial correlation. Since the GCN convolution-kernel design tends to over-smooth during stacking (Li, Han, and Wu 2018), the network can only be stacked 1-3 layers deep, with accuracy slowly decreasing from the fourth layer onwards. Future research may therefore replace the dominant GCN with other spatial convolutional networks. Among the classical graph neural network models, GAT (Veličković et al. 2017) and GraphSAGE (Hamilton, Ying, and Leskovec 2017) perform somewhat better with respect to over-smoothing (Li, Han, and Wu 2018); many new graph neural network models also target the over-smoothing problem, for example the Bayesian graph neural network (Hasanzadeh et al. 2020), proposed in the Graph DropConnect paper specifically to address over-smoothing and uncertainty. However, few ST-GNNs for prediction target this problem: only GAT (Veličković et al. 2017) has seen limited use, mainly because it consumes too many computational resources and has high hardware requirements. Perhaps with future hardware enhancement and algorithmic improvement, a graph neural network model more suitable for deepening ST-GNNs will appear.

Multivariate forecasting
Most current networks consider only the predicted data itself and ignore other factors that strongly affect feature capture, limiting prediction accuracy. These factors can be treated as variables and trained along with the model. Some studies have already paid attention to this (Fang, Prinet, et al. 2021a; Qi et al. 2019; Huang, Li, and Huang 2020; Fang et al. 2020; Zhao et al. 2020; Zhu et al. 2021; Zhou et al. 2021; Zhang et al. 2021; Wang, Lin, et al. 2022; Chen et al. 2021; Liu et al. 2021), about one quarter of the total. Overall, however, their number is still small, most likely due to the difficulty of obtaining multi-source data. We briefly summarize the influencing factors studied so far in different fields: for traffic prediction, weather and site properties (such as POIs) can greatly affect spatiotemporal patterns; for PM2.5 or PM10, changes in the wind field and other pollutants can cause sudden changes; for crime prediction, site properties can affect the likelihood of crime; and so on.
Currently, such models usually encode the multi-source data into features that are fused (mostly concatenated or multiplied) with the predicted quantities to achieve multivariate prediction. Such models mostly have better predictive performance but usually require additional training time, since the model must handle components from multiple data sources.
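The concatenation variant of this fusion can be sketched as follows; where the fusion happens (at the input, inside each block, or at the output) varies across the cited models, and the shapes and projection here are purely schematic.

```python
import numpy as np

def fuse_external(x, ext, w_ext):
    """Concatenate encoded auxiliary features with the target series.

    x:     (T, N, C)  target spatiotemporal data (T steps, N nodes, C channels)
    ext:   (T, F)     external factors shared by all nodes (weather, ...)
    w_ext: (F, C_e)   stand-in for a learned encoder of the externals
    Returns (T, N, C + C_e): each node's features extended with the
    encoded external signal for its time step.
    """
    T, N, C = x.shape
    e = ext @ w_ext                                   # (T, C_e) encoded externals
    e = np.broadcast_to(e[:, None, :], (T, N, e.shape[-1]))
    return np.concatenate([x, e], axis=-1)
```

Node-specific factors such as POIs would instead be shaped (N, F) and broadcast along the time axis.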
Relative to the other challenges, the difficulty here lies in data availability, namely the absence of reliable and relevant open-source data. Once open-source multi-source data become available, this will become an essential part of every model.

Self-adaptive adjacency matrix
In current ST-GNNs, many frameworks are difficult to migrate to other datasets due to their fixed graph structure. The emergence of adaptive graph structures has changed this status quo, solving the migration problem and eliminating the preprocessing of graph structures. Models such as Graph WaveNet (Wu et al. 2019), STGAT (Kong et al. 2020), AGCRN (Bai et al. 2020), and AST-InceptionNet (Wang et al. 2023) use adaptive adjacency matrices to learn the graph structure. Experimental results show that the adaptive adjacency matrix can fully learn the adjacency relations of the given graph structure, mine its latent adjacency relations, and can even yield slightly better predictions than directly inputting the original graph structure. For the vast majority of current networks, this is not yet possible.
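In Graph WaveNet's formulation, the self-adaptive adjacency is built from two learnable node embedding tables as softmax(ReLU(E1 E2ᵀ)). The sketch below uses random embeddings as stand-ins for the learned parameters; in the real model E1 and E2 are trained end-to-end with the rest of the network.

```python
import numpy as np

def adaptive_adjacency(n_nodes, emb_dim, rng):
    """Graph WaveNet-style self-adaptive adjacency matrix.

    Two node embedding tables E1, E2 (learnable in the real model) give
    A_adp = softmax(ReLU(E1 @ E2.T)), a dense, row-normalized graph that
    is learned from data instead of being supplied in advance.
    """
    e1 = rng.normal(size=(n_nodes, emb_dim))
    e2 = rng.normal(size=(n_nodes, emb_dim))
    logits = np.maximum(e1 @ e2.T, 0.0)               # ReLU sparsifies links
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)       # row-wise softmax
```

Because every entry of A_adp receives gradients, the model can discover latent adjacency relations that a distance-based graph misses.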

Multi-scale spatiotemporal neural networks
As forecasting research continues to develop, the importance of spatial heterogeneity has drawn the attention of more and more scholars. Considering spatial heterogeneity requires extracting information from local graphs, which is difficult to obtain from the global graph alone. Scholars therefore try techniques such as graph pooling (Yu, Yin, and Zhu 2019; Wang et al. 2023) or clustering (Guo, Hu, et al. 2021a) to obtain new subgraph structures and the latent local spatial correlations within them. Fusing local and global graph information has led to essential improvements in accuracy. In future prediction studies, local graph considerations will become an indispensable part of the model.

Temporal pattern mining
For typical spatiotemporal graph data (e.g. traffic flow, PM2.5, pedestrian flow), the data itself has certain temporal regularities: proximity, trend, and periodicity. Most networks focus merely on proximity and ignore trend and periodicity. Only a few (Guo, Lin, et al. 2021; Guo et al. 2019; Ge et al. 2020; Peng et al. 2020; Wang and Jing 2022; Wang, Jing, et al. 2022; Wang et al. 2023) employ a multi-component structure to learn temporal regularity and improve prediction accuracy. In future forecasting research, the mining of temporal regularity will be considered by more forecasting models.
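The multi-component idea boils down to feeding the model several windows aligned with different periodicities. A sketch in the style of ASTGCN's recent/daily/weekly components, assuming 288 samples per day (5-minute resolution); the window lengths are illustrative hyperparameters.

```python
import numpy as np

def periodic_components(series, t, recent=12, daily=288, weekly=288 * 7):
    """Slice recent, daily-periodic and weekly-periodic windows ending
    near index t, to be fed to separate model components whose outputs
    are later fused. Assumes the series is long enough for all offsets.
    """
    return (
        series[t - recent:t],                     # recent segment
        series[t - daily:t - daily + recent],     # same slot, one day earlier
        series[t - weekly:t - weekly + recent],   # same slot, one week earlier
    )
```

Each slice captures one regularity: the recent window models proximity, while the daily and weekly windows expose periodic patterns such as rush hours.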

Conclusion
In this paper, a comprehensive review of ST-GNNs in the prediction domain since their birth was presented. Most models were summarized from the perspective of model construction, categorized by application, and aggregated with the relevant datasets. A development history of these models was then presented, before the shifts of interest in current research were analyzed. In addition, the current shortcomings and future directions of ST-GNNs were illustrated with examples. On this basis, a thorough description of spatiotemporal graph data in the prediction domain was provided as a comprehensive reference for upcoming researchers, so that they may keep designing new techniques to address open challenges while avoiding duplicated research.
In the future, it may be possible to improve existing models by pursuing the directions proposed in this paper, so as to better support traffic management. Meanwhile, because graph convolution computes over neighborhood nodes in pairs, one pair at a time, it limits the collaborative consideration of multiple nodes simultaneously (Bai, Zhang, and Torr 2021; Wang and Zhu 2022), as needed for, e.g., trajectory data or metro passenger data. We therefore intend to explore related prediction research, such as higher-order relationship structures or the prediction of raster data combined with graph structures. ST-GNNs have already achieved notable results in the prediction domain, but some problems remain to be solved. In conclusion, we are optimistic that this technology will continue to improve and find increasing use across a broad range of applications, such as the prediction of waterway or airway traffic, multi-year ground settlement changes, and pollution dispersion.

Figure 2. The data of spatiotemporal graph.

Table 3. Application classification and corresponding datasets.

Table 4. Model performance on METR-LA.

Table 5. Comparison of specific computing time of some models on the METR-LA dataset.

Figure 3. The flow chart of prediction processes.

Table 7. Summary of model advantages and disadvantages.

Table 8. Specific performance comparison of different types of models.