Graph convolutional networks for street network analysis with a case study of urban polycentricity in Chinese cities

Abstract Graph theory effectively explains urban structures via street–street connectivity. However, systematic comparisons of street structures across cities remain challenging. This study employs graph convolutional networks (GCNs) to analyze street network structures. A two-branch GCN was used as the backbone to extract comparable features among street networks. The proposed approach was used to examine the structures of different urban road networks in a case study of polycentricity prediction across 298 Chinese cities. The model transformed approximately 4.5-million street segments into natural streets to create urban street graphs, which were subsequently analyzed to extract local and global embeddings. The extracted embeddings – with a portion labeled with a known urban polycentricity score – were used to predict the score for each city through a single-layer perceptron (SLP) model. Our results show consistency between the predicted polycentricity scores based on the derived street embeddings and those based on the population. Thus, the proposed GCN-based method can effectively predict the complexity and interconnection of street networks in different cities. This innovative integration of GCNs into urban studies demonstrates that deep learning techniques can analyze and comprehend the intricate patterns of street networks on a large scale.


Introduction
Streets enable the movement of people, goods and information, and shape infrastructure layout and land use patterns, thereby realizing various urban functions. The intersections and connections of streets form a 'network.' A well-developed street network supports economic growth, fosters social cohesion and facilitates emergency responses during crises (Sharifi 2019; Wang et al. 2019). Studying street networks is imperative for creating sustainable and livable spaces that cater to the needs of individuals and communities. Accordingly, various street network representations have been devised to scrutinize spatial configurations (Penn et al. 1998; Marshall et al. 2018), which establish a direct link between physical space and people.
With the increasing popularity of graph theory and complex networks, the network structure of urban streets from a topological perspective (street-street connectivity) has been of primary interest to scholars from various disciplines, who typically study it through network centrality-based measures such as betweenness (Crucitti et al. 2006; Kirkley et al. 2018). However, comparing these network centrality parameters for individual streets across cities is difficult because no geometrically or statistically uniform layout for each urban street network exists (Xue et al. 2022). Deep learning techniques have recently exhibited their ability to bridge this gap and have attracted significant attention from researchers in both the computer science and geospatial communities (Zhang et al. 2019). Graph convolutional networks (GCNs) and graph autoencoders (GAEs) can learn deep representations of street-street connections, employ graph convolutional layers to embed the graph into a latent vector space, and map the derived embeddings to a reconstruction of the input street network. The learned embedding from street networks can be viewed as a specific type of representation learning (Bengio et al. 2013) in urban studies that focuses on mapping urban configurations to continuous vector representations that can be nonlinearly fitted with related urban metrics. However, deriving embeddings from large-scale street networks that can balance high dimensionality, structural integrity and computational efficiency is challenging (Wang et al. 2016).
In this study, we propose a novel semi-supervised GCN framework that is designed to characterize urban structures across 298 prefectural-level Chinese cities for a comprehensive comparative analysis of large-scale urban street networks. Our framework leverages a two-branch graph neural network (GNN) structure. The first branch is based on an unsupervised autoencoder model that is used to learn the intrinsic patterns and embeddings from the input graph data. This autoencoder model enables us to explore the underlying structures of urban streets without requiring extensive labeled data. The second branch of the model incorporates embeddings from the first branch and is trained using a subset of labeled data to predict the morphological indices that fit the requirements of different urban analysis purposes. In addition, to ensure satisfactory performance and scalability, we rely on urban street graphs constructed from approximately 4.5 million street segments and employ a strategic subsampling approach to preserve the global graph structure and significantly reduce the computational load created by large street graphs. To showcase the feasibility of our framework, we conducted a case study predicting the index of urban polycentricity, i.e. a nuanced structural property describing the presence of multiple centers within a city (Meijers 2008), using embeddings extracted from street networks.
This study contributes to the literature in two key areas. Methodologically, the developed GCN-based framework highlights the potential of GNNs for street network analysis. It represents a significant advancement over traditional approaches by offering the unique capability to adaptively learn graph embeddings suitable for diverse tasks, all without altering the fundamental architecture of the network. Furthermore, this novel framework design addresses the challenge of preserving the structure and handling nonlinearity in conventional street network analysis approaches, e.g. centrality-based analysis. Empirically, our case study of polycentricity prediction not only demonstrated the efficacy of the framework but also illuminated the intricate relationship between the structure of street networks and urban polycentricity as informed by population. The predicted polycentric structures of Chinese cities revealed by our approach are potentially useful for policymakers because they enable a detailed comparison of the current urban landscape, facilitate the simulation of future developments and guide decisions based on existing or planned street configurations.
The remainder of this article is organized as follows: Section 2 reviews the related literature. The proposed methodological framework is introduced in Section 3. Section 4 presents the results, with a focus on the subsampled street graphs for 298 cities, the GCN model performance and the correlation between the measured and learned urban polycentricity scores. The proposed methods, derived results, and implications for urban studies are discussed in Section 5. Section 6 provides the conclusions and outlines future research directions.

Complex structure of urban street networks
A city is not a tree but a complex network that embodies many interconnected urban elements (Alexander 1965). A typical example is a network of urban streets that can be naturally modeled as graph representations. Dual graphs (with the nodes as streets and links as intersections; Porta et al. 2006) and their related centrality measures are the most common methods for street structural configurational analysis (Crucitti et al. 2006; Kirkley et al. 2018; Wang and Debbage 2021; Wu et al. 2024). Empirical studies have demonstrated that urban street networks tend to possess a similar structure to that of many other real-world complex networks (e.g. scale-free and/or small-world properties) in terms of the statistical distribution (e.g. long-tailed or power-law) of graph measures (Batty 2008; Boeing 2017). In addition to the statistics for a single quantity, previous studies have identified both nonlinear and linear relationships among the street structural complexity, urban socioeconomic and environmental factors (such as GDP, population and CO2 emissions) (Lu et al. 2016; Lan et al. 2019; Zhang et al. 2022) and urban life (Huang et al. 2022). Critics in the space syntax literature (e.g. Ma et al. 2019) have noted that the complex structure of street networks is hardly viewed in terms of geometric primitives, such as individual street segments or junctions that are conventionally adopted in geographical information system (GIS) models, because they have a similar number of neighboring nodes or links. Instead, the network is formed by natural streets, each of which denotes a group of adjacent segments with strong continuity (see details in Section 3.2). This enables the complex network structure to be grasped, which further aids in deriving street hierarchies that are useful for many applications, such as traffic monitoring (Liu et al. 2019) and map generalization (Yu et al. 2020).

GCNs and their applications in urban studies
Among the different deep learning architectures, convolutional neural networks (CNNs) are the most commonly used for analyzing regularly structured data such as images (Defferrard et al. 2016). However, the use of these networks in the analysis of irregularly structured data, such as graphs, remains challenging (Bronstein et al. 2017). Motivated by CNNs, two potential solutions, namely, spectral and spatial approaches, have been investigated. The conceptual basis of the spectral-based approach is the graph Fourier transform, which resembles the Fourier transform of a 1D signal (Zhu and Rabbat 2012). In contrast, the spatial approach performs convolution operations directly on the graph vertices by aggregating the information from neighboring nodes in a convolutional manner. The spectral and spatial GCN approaches have both been adopted to address various types of graph-structured spatial vector data owing to their impressive ability to learn discriminative features from input graph-structured data. The potential of GCNs for characterizing urban spatial patterns based on regular and irregular geometric configurations has been demonstrated.
In urban studies, spatial vector graphs can be constructed from connections that link different types of spatial units, such as equally partitioned grids, self-defined boundaries based on data clusters, and individual spatial objects. For example, Yao et al. (2021) predicted the spatial flow distribution among 1 km × 1 km grids, Zhu et al. (2020) connected a set of point-of-interest (POI) clusters and applied a GCN to infer place characteristics, and Yan et al. (2019, 2021) proposed a GCN approach for the pattern cognition and classification of a group of individual buildings. In addition to networks of point or polygonal units, networks of linear units (i.e. road networks) have been used extensively for urban applications such as traffic forecasting (Yu et al. 2018). However, adapting emerging complex network structures for a comprehensive understanding of the spatial configurations among various cities remains a challenge. Xue et al. (2022) predicted the spatial homogeneity using segment-based centrality measures from a grid-level road network across 30 cities and used this GCN-based measure to determine urban structure and socioeconomic performance. Despite this remarkable effort to link street structures with urban performance, it is worth discussing whether parts of streets in grids are sufficiently representative, because no direct visual or statistical proof of self-similarity between grid-level street segments and entire streets is available in the literature.
Another approach is to effectively sample the street network for each city to boost computational efficiency. Pooling or downsampling is commonly applied to grid-like image data in CNNs. Atwood and Towsley (2016) pioneered pooling with diffusion-based convolutions in GCNs, demonstrating the significance of pooling via graph coarsening for merging similar nodes. Subsequently, Ying et al. (2018) introduced a differentiable pooling operation for GCNs, which enables a hierarchical reduction in the graph size while preserving crucial data structures. Building on this concept, Lee et al. (2019) proposed a self-attention mechanism for graph pooling to identify and preserve the most salient nodes within a graph. However, whether the pooling strategy of existing methods can be directly applied to street data remains uncertain owing to the uniquely complex geometric and topological characteristics of such data.
Existing approaches usually require annotated datasets to train models. However, collecting dense semantic labels is usually very labor-intensive and time-consuming, and the quality and adequacy of the labeled dataset significantly affect the performance of the learned model. Several semi-supervised and unsupervised approaches have been proposed to address the demand for strong supervision. GAEs have been used extensively to learn the efficient coding of unlabeled graph data. Kipf and Welling (2016a, 2016b) introduced a GAE with a GCN encoder and a simple inner product decoder, which exhibited competitive link prediction performance in several citation network datasets compared to other non-deep-learning models. Inspired by the above research efforts, we propose a novel semi-supervised architecture to measure urban polycentricity from road networks using the GAE and single-layer perceptron (SLP) models.

Urban polycentricity
Over the past several decades, rapid urbanization has fostered the development of polycentric urban regions (PURs), which have been increasingly recognized in recent studies (Derudder et al. 2021; Harrison et al. 2022; Thomas et al. 2022; Derudder et al. 2022). The concept of urban polycentricity is often employed to assess the extent of such development quantitatively. It is a multifaceted concept that encompasses at least two dimensions: geometrical aspects (i.e. the morphological form) and topological relationships (i.e. functional links), as described by Burger and Meijers (2012). However, most of the existing related research has focused mainly on the morphological dimension of urban polycentricity. This approach typically involves examining the distribution and intensity of urban elements and activities, as highlighted in a recent study by Thomas et al. (2022).
Gridded data, particularly population grid data and nighttime imagery, have historically been the primary tools for assessing PURs.For example, Liu and Wang (2016) and Liu et al. (2018) used LandScan population grids to capture a holistic view of polycentric development in Chinese cities and their surrounding regions.In addition, location-based social media data have emerged as a novel source for analyzing urban polycentricity.Lv et al. (2021) demonstrated this application by performing a multiscale PUR analysis in Chinese cities, incorporating data on POIs at the city level and check-in densities of street blocks within city centers.
Although numerous studies have leveraged various geospatial data to deduce aspects of urban polycentricity (e.g. Liu and Wang 2016; Wang 2021; Taubenböck et al. 2017; Liu et al. 2018; Volgmann and Münter 2022), a notable gap remains in the literature regarding the use of large-scale street network data and their inherent network topologies. Therefore, this study aims to illustrate the potential of the proposed GCN-based methodology for analyzing and interpreting street networks. Specifically, we present a case study that employs this innovative approach to predict urban polycentricity using street network data.

Methodology
The proposed framework includes three main steps: subsampling urban street networks, encoding and decoding subsampled urban street features, and predicting urban polycentricity (Figure 1). The full urban street network across almost 300 cities, which is the input for the GCN, was too large to compute. Therefore, each urban street network needed to be generalized, and a fixed-size street network sample had to be used (Section 3.1). Thereafter, we implemented a GCN as a feature extractor to encode each subsampled street graph with a comprehensive embedding layer, resulting in a set of multidimensional vectors (embeddings) that could be restored by the decoder (Section 3.2). The extracted embeddings for each city further helped us to answer the question 'How polycentric are these cities?' The embeddings were divided into training and test sets. Based on the training set labeled with empirically measured urban polycentricity values, we applied the SLP model to predict the urban polycentricity for the test set (Section 3.3). Finally, the loss function was optimized for the proposed models (Section 3.4).

Subsampling city-scale street graph
Urban roads are naturally modeled as networks or graph representations based on their spatial relationships (e.g. intersections). Roads are conventionally stored as segments in GIS, in which they are separated at each street junction where at least three polylines intersect. Subsequently, a dual graph can be generated based on the segment-segment topology, in which the nodes are segments and the links are intersections. This type of topology has been widely adopted for many urban applications, such as navigation, and has been effectively integrated into mainstream GIS software for spatial analysis (e.g. the Network Analyst toolbox in ArcGIS). However, uncovering the underlying street structure from segments, which can be regarded as geometric primitives with little meaningful information in reality, may be challenging (Ma et al. 2019). As shown in Figure 1, the dual graph of street segments that is visualized in a circular layout is mechanistic or less vivid because the connectivity for each segment is similar. The segment-segment topology can be transformed into a street-street topology for an organic structure. We group and join a set of neighboring segments with two conditions, namely, the same name and good continuity (small deflection angles, e.g. less than 45°), into a 'stroke' (Thomson and Richardson 1999) or 'natural street' (Ma et al. 2019) to avoid the generation of meaningless streets. Specifically, we first merge neighboring segments with the same names. Because not all segments have name attributes in reality (database incompleteness), we further consider good continuity for the processed and unprocessed segments together for the final natural street generation. According to Gestalt psychology, a natural street (street) matches human visual cognition and is more meaningful than a segment alone.
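The joining rule described above can be illustrated with a minimal sketch. The function names `deflection_angle` and `can_join`, the coordinate tuples, and the exact way the name and continuity conditions are combined are assumptions made for illustration only; the 45° threshold is the example value quoted in the text, and the sketch is not the authors' implementation.

```python
import math

def deflection_angle(a, b, c):
    """Deflection angle (degrees) at the shared node b when moving a -> b -> c.

    A value of 0 means the second segment continues the first in a straight line.
    """
    v1 = (b[0] - a[0], b[1] - a[1])  # direction of the incoming segment
    v2 = (c[0] - b[0], c[1] - b[1])  # direction of the outgoing segment
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return abs(math.degrees(math.atan2(cross, dot)))

def can_join(a, b, c, name1=None, name2=None, threshold=45.0):
    """Decide whether two neighboring segments may belong to one natural street.

    Assumed rule: segments with identical, non-missing names are joined first;
    otherwise the decision falls back to good continuity (small deflection angle).
    """
    if name1 is not None and name1 == name2:
        return True
    return deflection_angle(a, b, c) < threshold

# A straight continuation is joined; an unnamed right-angle turn is not.
straight = can_join((0, 0), (1, 0), (2, 0))
right_turn = can_join((0, 0), (1, 0), (1, 1))
```

A complete stroke builder would also need to choose the best-aligned partner when several segments meet at one junction, which this sketch leaves out.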
Apart from human perception, street-street topology enables a complex network perspective to be adopted to explore the network of streets. Some street properties (e.g. street lengths and connectivities) appear to have a hierarchical structure. As illustrated in Figure 2, approximately 80% of the streets are less connected, whereas the other 20% are well connected, among which approximately 1% are very well connected. Thus, we can observe that the transformation from segments into streets also leads to a profound statistical change in the geometric and topological properties of the street, that is, from a normal distribution to a heavy-tailed distribution. Heavy-tailed distribution statistics imply an imbalanced, hierarchical structure that enables us to conduct street network subsampling effectively. Subsequently, well-connected streets can be selected to represent the essence of the entire network to a significant extent. In addition, drawing on the idea of pyramid representation in image processing (Adelson et al. 1984), we adopt the power of two as the scale factor to select streets with the highest connectivity values as the input to the GCN for training and testing, as denoted by Equation (1):

$$\text{Number of subsampled streets} = 2^{\text{level}} \tag{1}$$

To perform street subsampling, we first rank all streets in decreasing order according to connectivity and select the top $2^{\text{level}}$ streets as the subsample. An example is depicted in Figure 2.
A street network with reduced resolution can be obtained through simple ranking and selection. The resulting 'network pyramid' can be adopted as a data structure that efficiently supports the following convolutional operations through representation at multiple reduced scales. As an urban-scale street network usually consists of thousands of streets and the number of streets differs among cities, the level at which the street network should be subsampled must first be decided. This level applies to all cities because an equal-sized subsample for each city is required as the input to the GCN in the subsequent steps for computation and comparability.
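The ranking-and-selection step can be sketched as follows. The function name `subsample_streets`, the dictionary-of-sets representation of the dual graph, the tie-breaking by street identifier, and the choice to keep only links among retained streets are illustrative assumptions rather than details given in the text.

```python
def subsample_streets(adjacency, level):
    """Keep the 2**level most connected streets of a dual graph.

    adjacency maps each street id to the set of street ids it intersects.
    Returns the subgraph induced by the retained streets.
    """
    k = 2 ** level
    # Rank by connectivity (number of intersecting streets), highest first;
    # ties are broken by street id so the result is deterministic (assumption).
    ranked = sorted(adjacency, key=lambda s: (-len(adjacency[s]), s))
    kept = ranked[:k]
    kept_set = set(kept)
    # Drop links that point to streets outside the subsample (assumption).
    return {s: adjacency[s] & kept_set for s in kept}

# Toy dual graph with four natural streets.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
level_1 = subsample_streets(toy, 1)  # top 2 streets
level_2 = subsample_streets(toy, 2)  # top 4 streets, i.e. the full toy graph
```

Calling the function with decreasing levels yields the successive layers of the 'network pyramid'; with level 7 it returns a 128-street subsample of the kind used later in the case study.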

GAE model
Assume a graph $G = (V, A, X)$, where $V$ is the set of streets, $A \in \mathbb{R}^{N \times N}$ is the adjacency matrix, and $X$ is a feature matrix that consists of two features for each street, namely its length and its number of connected streets (both values are normalized). The Laplacian matrix of the specified graph-structured data can be represented as $L = D - A$, where $D$ is the diagonal node degree matrix and the diagonal element $D_{ii} = \sum_j A_{ij}$ is the sum of the elements of row $i$ in the adjacency matrix $A$. The graph Laplacian matrix $L$ can be further normalized as $L = I_N - D^{-1/2} A D^{-1/2} = U \Lambda U^{\top}$, where $U$ is the matrix of eigenvectors of the normalized graph Laplacian and $\Lambda$ is the diagonal matrix of eigenvalues. Assuming that $x$ is the signal vector that is defined on the nodes of graph $G$, the graph convolution can be defined as the multiplication of $x$ with a filter $g_\theta$ according to the convolution theorem in the frequency domain, as outlined in Equation (2):

$$g_\theta \star x = U g_\theta(\Lambda) U^{\top} x \tag{2}$$

where $g_\theta(\Lambda)$ is a function of the eigenvalues of $L$. However, this convolutional structure is impractical for large-scale graphs because the eigendecomposition of the normalized Laplacian matrix $L$ may be very computationally expensive. To address this problem, Hammond et al. (2011) introduced a fast approximation of $g_\theta(\Lambda)$, which is a truncated expansion in terms of the Chebyshev polynomials $T_k(x)$ with $K$ terms (i.e. up to order $K - 1$):

$$g_\theta \star x \approx \sum_{k=0}^{K-1} \theta_k T_k(\tilde{L}) x \tag{3}$$

In the approximation, $\tilde{L} = \frac{2}{\lambda_{max}} L - I_N$, where $\lambda_{max}$ denotes the maximum eigenvalue of $L$ and $I_N$ is an identity matrix of size $N$; $\theta \in \mathbb{R}^{K}$ is a vector of polynomial coefficients, which is to be learned in the training process; and $T_k(x)$ is recursively defined as $T_k(x) = 2xT_{k-1}(x) - T_{k-2}(x)$, with $T_0(x) = 1$ and $T_1(x) = x$. Kipf and Welling (2016a, 2016b) truncated the Chebyshev polynomial to the first order (i.e. $K = 2$ in Equation (3)) as a special variant, leading to the following simplified convolution:

$$g_\theta \star x \approx \theta \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} x \tag{4}$$

where $\tilde{A} = A + I_N$ and $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. Consequently, we can generalize the simplified graph convolution to an input layer of feature $X^{(l)} \in \mathbb{R}^{N \times C}$ (i.e. a $C$-dimensional feature vector for every node) and $F$ filters as follows:

$$X^{(l+1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X^{(l)} \Theta^{(l)}\right) \tag{5}$$

where $\Theta^{(l)}$ is a matrix of $C \times F$ parameters (i.e. a layer-specific trainable weight matrix), $X^{(l+1)} \in \mathbb{R}^{N \times F}$ is the convolved matrix, which is also the input feature for the next graph convolutional layer $l + 1$, and $\sigma$ denotes the activation function, such as $\mathrm{ReLU}(\cdot) = \max(0, \cdot)$.

Figure 2. The street network consists of 57 natural streets ($2^{5.87} \approx 2^{6}$), the spatial intersection relationship of which helps to derive the dual graph. We visualize the importance of each street using the connectivity measure with a color spectrum, where red, yellow, cyan and blue represent the most connected to the most poorly connected streets. Thereafter, subsampling of the network can be performed at a reduction ratio of 1/2 by selecting the top 32 ($2^{5}$), 16 ($2^{4}$) and 8 ($2^{3}$) most connected streets. Although the number of streets is continuously halved, the main street structure is maintained throughout the series. Source: Adapted from Figure 1
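The layer-wise graph convolution can be made concrete with a small NumPy sketch. It assumes the renormalized first-order propagation rule of Kipf and Welling (self-loops added to the adjacency matrix, followed by symmetric degree normalization) with a ReLU activation; the function names `normalized_adjacency` and `gcn_layer` and the toy matrices are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def normalized_adjacency(A):
    """Return D~^(-1/2) (A + I) D~^(-1/2) for a symmetric adjacency matrix A."""
    A_tilde = A + np.eye(A.shape[0])        # add self-loops
    degree = A_tilde.sum(axis=1)            # diagonal of D~ (always >= 1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Scale row i by d_i^(-1/2) and column j by d_j^(-1/2).
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_hat, X, Theta):
    """One graph convolutional layer: ReLU(A_hat @ X @ Theta)."""
    return np.maximum(A_hat @ X @ Theta, 0.0)

# Toy example: two streets that intersect each other, two features per street.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
X = np.array([[1.0, -1.0],
              [3.0, 1.0]])
A_hat = normalized_adjacency(A)
H = gcn_layer(A_hat, X, np.eye(2))          # identity weights for readability
```

With identity weights, each output row is simply the degree-normalized average of a street's own features and those of its neighbor, which is the neighborhood aggregation that stacked layers repeat over increasingly wide neighborhoods.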
Given the graph convolution operations that are defined in the frequency domain, we can establish an autoencoder comprising five parts: the input, encoder, embedding, decoder and output, as illustrated in Figure 3. In particular, the encoder receives the graph features and adjacency matrices and processes them through multiple stacked graph convolutional layers, as shown in Equation (5). This procedure facilitates information aggregation from interconnected nodes across the graph to generate features for every individual node, that is, local embeddings. Following the encoding process, a max pooling layer is employed to extract the overall features of a particular graph from the embedded node-level features. This approach is inspired by the objective of model permutation invariance; that is, different orders of nodes in the graph should produce identical embeddings. The max pooling layer, which is a straightforward symmetric function, accepts N vectors as the input and delivers an output vector that is unaffected by the input order to ensure order invariance. This concept of deriving global features has been used extensively in deep neural networks, such as PointNet (Qi et al. 2017) for point-cloud processing, given that point clouds typically consist of points in varying orders. After deriving the global features, the graph-level embeddings (i.e. global embeddings) are fed into the decoder to realize the reconstruction, which is expected to recover the graph structure in the original data. In this study, inspired by the work of Kipf and Welling (2016a, 2016b), the decoder recovers adjacency matrix A' as a product of embedding Z and its transpose. Owing to this architecture, the model can be optimized by minimizing the mean squared error (MSE) between adjacency matrix A and reconstructed adjacency matrix A'. The parameters of the convolution kernels and the biases of the activation functions are updated through backpropagation.
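The pooling and decoding steps can be illustrated with a short NumPy sketch. The function names `global_max_pool`, `inner_product_decoder` and `reconstruction_mse` and the toy matrices are assumptions for illustration; the sketch omits the learned layer that expands the global embedding back into Z, and shows only the order-invariant pooling, the inner product reconstruction and the MSE objective.

```python
import numpy as np

def global_max_pool(H):
    """Column-wise maximum over node embeddings (rows of H).

    The maximum is a symmetric function, so the result does not depend on
    the order in which the nodes are listed.
    """
    return H.max(axis=0)

def inner_product_decoder(Z):
    """Reconstruct an adjacency-like matrix as Z @ Z.T."""
    return Z @ Z.T

def reconstruction_mse(A, A_rec):
    """Mean squared error between the input and reconstructed adjacency."""
    return float(np.mean((A - A_rec) ** 2))

# Three nodes with two-dimensional local embeddings.
H = np.array([[1.0, 5.0],
              [4.0, 2.0],
              [3.0, 3.0]])
pooled = global_max_pool(H)
pooled_shuffled = global_max_pool(H[[2, 0, 1]])  # same nodes, different order

Z = np.array([[1.0, 0.0],
              [0.0, 1.0]])
A_rec = inner_product_decoder(Z)
```

Because `pooled` and `pooled_shuffled` coincide, two listings of the same street graph yield the same global embedding, which is the permutation invariance the text asks of the pooling layer.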

SLP-based prediction
We adopt an SLP model to predict urban polycentricities with graph-level embedding, which can be learned from the proposed autoencoder. As illustrated in Figure 4, the SLP model uses the embedding vector from the autoencoder as its input. Subsequently, the aggregate of the weighted embedding values is passed through a sigmoid function to map the prediction value to the range of 0 to 1. Notably, in this study, the SLP model is trained in a supervised manner, with the city polycentricity values calculated by Liu and Wang (2016) used as a reference for our training data. During the training process, the reference values of a subset of cities are used to refine the predictive model, while the autoencoder network is concurrently trained on all the graph data in a purely unsupervised manner. In this regard, the proposed framework aligns with the definition of a semi-supervised framework, given that all available graphs are used for the unsupervised training of the autoencoder model, whereas only a portion of the reference values is used to train the SLP prediction model. The concurrent optimization of both branches emphasizes the semi-supervised nature of the model. The SLP model parameters are fine-tuned by minimizing the MSE between the predicted and reference values.
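The forward pass of such a single-layer perceptron can be written in a few lines. The function name `slp_predict`, the optional bias term and the toy numbers are assumptions; the text specifies only a weighted aggregate of the embedding values followed by a sigmoid.

```python
import math

def slp_predict(embedding, weights, bias=0.0):
    """Weighted sum of the embedding values followed by a sigmoid.

    The bias term is an assumption; the output always lies in [0, 1].
    """
    z = sum(w * e for w, e in zip(weights, embedding)) + bias
    # Numerically stable sigmoid that avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Toy three-dimensional embedding instead of a full graph-level vector.
embedding = [0.2, -1.0, 0.5]
neutral = slp_predict(embedding, [0.0, 0.0, 0.0])   # zero weights
score = slp_predict(embedding, [1.0, 0.5, 2.0])     # z = 0.2 - 0.5 + 1.0 = 0.7
```

With all weights at zero the prediction sits at the midpoint of the score range, and training then moves the weights so that predictions approach the reference polycentricity values.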

Loss function
We address graph-based urban polycentricity prediction using a semi-supervised learning approach owing to the limited number of reference labels. In this study, we express the final loss for model training as the sum of two losses that are derived separately from the two proposed branches:

$$\mathcal{L} = \mathcal{L}_{GACE} + \lambda \mathcal{L}_{P} \tag{6}$$

where the unsupervised loss $\mathcal{L}_{GACE}$ is the MSE between adjacency matrix A and reconstructed adjacency matrix A' from the GAE model. The supervised loss $\mathcal{L}_{P}$ denotes the MSE between the predicted and available reference values. In addition, the trade-off parameter $\lambda$ is used to balance the supervised and unsupervised losses during training. In this study, the value of $\lambda$ is set to 0.0001 to prevent overfitting of the prediction branch.
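A minimal sketch of the combined objective is given below. It assumes that the trade-off parameter scales the supervised term, which is one reading of the statement that a small value prevents overfitting of the prediction branch; the function names `mse` and `total_loss` and the toy arrays are hypothetical.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error between two arrays of equal shape."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))

def total_loss(A, A_rec, y_pred, y_ref, lam=1e-4):
    """Unsupervised reconstruction loss plus a weighted supervised loss.

    Assumption: lam multiplies the supervised (prediction) term. y_pred and
    y_ref contain only the cities for which a reference score is available.
    """
    reconstruction = mse(A_rec, A)     # computed for every city graph
    prediction = mse(y_pred, y_ref)    # computed for labeled cities only
    return reconstruction + lam * prediction

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
perfect = total_loss(A, A, y_pred=[0.5], y_ref=[1.0])
imperfect = total_loss(A, np.zeros((2, 2)), y_pred=[0.5], y_ref=[1.0])
```

Under this reading, a perfect reconstruction leaves only the down-weighted prediction error, so the gradient signal from the relatively few labels stays small compared with the reconstruction signal from all graphs.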

Data sources and processing
Four datasets were used in this study: (1) the national street network, (2) 298 city administrative borders, (3) 2019 national GDP statistics and (4) population-based urban polycentricity scores.The national street dataset was sourced from OpenStreetMap (OSM) and originally included 4,419,603 segments that were strictly separated at the vertices where at least three segments intersected.Each city boundary was further adopted as the unit for data processing.We extracted city-level street segments and transformed them into natural streets.Note that, on average, more than 70% of the name information was absent in the raw segment data, and the situation improved to 50% or less for the top cities.The national GDP data for 2019 were sourced from the published editions of the China Statistical Yearbook 2019.The population-based polycentricity scores of all Chinese cities were adopted from Liu and Wang (2016) and calculated based on the formula in Green (2007), resulting in a polycentricity score ranging from 0 to 1 (a higher value indicates that the city is more polycentric, 0 denotes a total lack of urban polycentricity where there is only one urban center in a city, and 1 denotes complete urban polycentricity, with several urban centers of the same size in the city).

Subsampled street networks
Before discussing the experimental results, we briefly introduce the ht-index (Jiang and Yin 2014) to characterize the heavy-tailed distributions of the street measures. Data with a heavy-tailed distribution often exhibit an imbalance between large and small values, that is, very few large values versus many small values. Interestingly, this imbalance may recur within the data. The number of recurrences has been termed the 'ht-index' and further developed as an effective means for quantifying the extent of imbalance or complexity of data. A higher ht-index indicates a more complex street structure because more street hierarchies can be reflected. We computed the ht-index for each of the 298 cities in terms of the street connectivity using the generated natural streets. The ht-index values ranged from 1 to 9, and almost all the cities had an ht-index ≥ 3. As shown in Figure 5(a), urban streets with relatively high complexity (red dots) are distributed evenly across the western, central and eastern regions. In addition, a general pattern is observed in which important cities such as provincial capital cities tend to have higher ht-indices (above 5). Among these cities, Shanghai has the most complex street structure, with an ht-index of 9. Given this high complexity, a clear visual binary division of street connectivities could be observed (Figure 5(b)), that is, the scarcity of well-connected streets (in red and yellow) versus the abundance of poorly connected streets (in blue). This partitioning enabled us to identify the street hierarchies effectively, leading to successful subsampling of large-scale street networks.

We selected the 128 most highly connected streets as a subsample of each original urban street network. Taking the streets in Shanghai as an example, we first conducted simple power-law fitting for the connectivity values of the entire street network and the selected subset. By taking the logarithm of both the x- and y-axes, we observed that the distributions of the two datasets were comparable (nearly straight lines; Figure 5(c)). In addition, we counted the frequencies of each logged value and plotted them in a histogram to analyze the statistical patterns from one subset to another. As a further step, a trendline (using curve fitting) was created for each histogram, where the entire pattern appeared very right-skewed. Notably, the trendlines for the top 16, 32 and 64 values were distant from the pattern of the whole dataset; a similar pattern appeared only once the peak of the top 128 values occurred (green line in Figure 5(d)). This statistical pattern was also confirmed in other cities.

Table 1 and Figure 6 present the representative top 128 streets in 20 selected Chinese cities from statistical and geographical perspectives, respectively. Unlike a subsample of streets within equally partitioned grids, the subsample in this study (the most connected and longest roads) covered the entire urban area. Interestingly, of the 20 cities with different socioeconomic statuses, each subsample (only 1% of streets) had an ht-index above or equal to 3, accounting for no less than 50% of the structural complexity of all streets (in most cases). In addition, we observed that the power-law distributions of the subsampled and full street networks were similar across cities, highlighting the ability of the ht-index to delineate subtle differences in street hierarchy.

[Figure caption, beginning lost in extraction] ... Liu and Wang (2016). The hierarchy of the cities from top to bottom is aligned with the row sequence, e.g. the first row indicates the cities at the top hierarchy determined by urban GDP, and the fifth row is the fifth hierarchy. For each row (cities in the same hierarchy), we ordered the cities in ascending order from monocentric (left) to polycentric (right).
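The ht-index computation can be sketched with the head/tail breaks procedure: values are split around their mean, and the split is repeated on the above-mean 'head' for as long as the head remains a minority. The function name `ht_index`, the 40% minority threshold (a commonly used convention, not stated in this section) and the toy data are assumptions.

```python
def ht_index(values, head_limit=0.4):
    """Count how often the 'few large, many small' imbalance recurs, plus one.

    head_limit is the largest share of values that may lie above the mean
    for the split to count as imbalanced (assumed to be 40%).
    """
    data = [float(v) for v in values]
    index = 1
    while len(data) > 1:
        mean = sum(data) / len(data)
        head = [v for v in data if v > mean]
        # Stop when nothing exceeds the mean or the head is no longer a minority.
        if not head or len(head) / len(data) > head_limit:
            break
        index += 1
        data = head
    return index

# Toy connectivity values: many poorly connected streets, a few hubs.
connectivity = [1] * 16 + [4] * 4 + [20]
toy_ht = ht_index(connectivity)   # two recurring splits
flat_ht = ht_index([5, 5, 5, 5])  # no imbalance at all
```

In the toy data the first split separates the five streets with connectivity 4 or 20 from the sixteen streets with connectivity 1, and the second split isolates the single hub, so the imbalance recurs twice and the index is 3.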

Model training and parameter settings
The architecture and parameters of the autoencoder model are depicted in Figure 7(a). The encoder of the proposed framework contains five convolutional layers, with dimensions of 32, 64, 128, 256 and 512. After the fifth graph convolutional layer, the node-level graph features with dimensions of 128 × 512 are fed into the max pooling layer to derive a 512 × 1 vector, which is the global-level feature for the involved graph. For the decoder, the 512 × 1 graph coding is first scaled to a 65,536 × 1 vector and then reshaped to a 128 × 512 matrix Z, with dimensions identical to those of the output of the final graph convolutional layer in the encoder. Finally, adjacency matrix A' is reconstructed as the product of matrix Z and its transpose. Simultaneously, the 512-dimensional global embeddings are used as the inputs for predicting the urban polycentricity. As shown in Figure 7(b), a sigmoid function is used to obtain a predicted value from 0 to 1. This value represents the predicted urban polycentricity of the input street-street connectivity graph.
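The flow of tensor shapes through the encoder, max pooling layer, decoder and SLP branch can be traced with a small NumPy sketch. This is an illustration under stated assumptions rather than the authors' implementation: the weights are random and untrained, the ReLU activation and the symmetric normalization with self-loops are assumed, the input feature size `f_in` is hypothetical, and the demo uses reduced sizes (8 nodes, layer widths 4, 8 and 16) so that it runs quickly. In the paper's configuration the graph has 128 nodes, the layer widths are 32, 64, 128, 256 and 512, and the decoder expands the 512-dimensional code to 128 × 512 = 65,536 values.

```python
import numpy as np


def normalize_adjacency(a):
    """Symmetric normalization with self-loops: D~^(-1/2) (A + I) D~^(-1/2)."""
    a_tilde = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def forward(a, x, enc_weights, w_dec, w_slp, b_slp):
    """One forward pass through both branches (untrained, for shapes only)."""
    n = a.shape[0]
    a_hat = normalize_adjacency(a)
    h = x
    for w in enc_weights:
        h = np.maximum(a_hat @ h @ w, 0.0)    # graph convolution + ReLU (assumed)
    g = h.max(axis=0)                         # max pooling over nodes: global code
    z = (g @ w_dec).reshape(n, h.shape[1])    # scale the code up, reshape to n x d
    a_rec = z @ z.T                           # reconstructed adjacency A' = Z Z^T
    score = 1.0 / (1.0 + np.exp(-(g @ w_slp + b_slp)))  # SLP with sigmoid output
    return h, g, z, a_rec, score


rng = np.random.default_rng(0)
n, f_in, dims = 8, 2, [4, 8, 16]              # reduced, hypothetical sizes

# Random symmetric adjacency matrix without self-loops.
a = np.triu((rng.random((n, n)) < 0.3).astype(float), 1)
a = a + a.T
x = rng.random((n, f_in))

sizes = [f_in] + dims
enc_weights = [rng.normal(0, 0.5, (i, o)) for i, o in zip(sizes[:-1], sizes[1:])]
w_dec = rng.normal(0, 0.1, (dims[-1], n * dims[-1]))
w_slp = rng.normal(0, 0.1, dims[-1])

h, g, z, a_rec, score = forward(a, x, enc_weights, w_dec, w_slp, 0.0)
print(h.shape, g.shape, z.shape, a_rec.shape)
```

Because A' is a product of Z with its own transpose, the reconstruction is always symmetric, which matches an undirected street-street connectivity graph.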
In the experiment, the autoencoder and SLP models were simultaneously optimized using the Adam optimizer with a learning rate of 0.0001. All 298 city graphs were used for unsupervised training of the GAE model. A total of 159 cities, each of which had a referenced polycentricity score greater than 0, were used for training and testing (i.e. 104 (approximately 65%) for training and 55 (approximately 35%) for testing). Note that a total of 139 cities with referenced polycentricity scores of 0 were excluded from the training process. The minibatch size was set to 32, and the models were trained for 1000 epochs. The area under the receiver operating characteristic curve (AUC) was 0.934, and the average precision of the trained GAE model was 0.952.

Comparison with baseline models
We selected the following baseline models, which all produced 128-dimensional representations across cities, to verify the importance of the subsampled streets in relation to urban polycentricity and to compare the performance of our proposed GAE model.
• GraphSAGE, which was proposed by Hamilton et al. (2017), is a powerful method for learning low-dimensional vector representations of nodes in graphs. This is achieved by sampling neighboring nodes and aggregating their features to obtain node representations. In this study, we constructed an end-to-end model as a baseline with a similar structure to that of the proposed GAE. The mean aggregation technique was employed for information aggregation.
• Node2Vec is a powerful unsupervised graph representation technique that transforms nodes in graphs into low-dimensional vectors (Grover and Leskovec 2016). We employed Node2Vec to embed each street into an eight-dimensional feature and combined it with a two-dimensional (2D) street attribute feature, which yielded a 128 × 10 feature matrix for each city. Thereafter, principal component analysis (PCA) was used for dimensionality reduction to obtain 128-dimensional features for each city. The walk length and number of walks sampled for each node were set to 10, and the actual context size considered for the positive samples was five.
• The vanilla GCN was inspired by the foundational GCN approach introduced by Kipf et al. (2016a, 2016b). We implemented five GCN layers, with dimensions that were sequentially configured as 32, 64, 128, 256 and 1, for the encoder of the GAE. Notably, the final GCN layer in the encoder produced a 128 × 1 output dimension, which was an aggregated feature that was achieved by retaining only one feature per node in the graph. These aggregated features acted as global embeddings and were employed to rebuild the adjacency matrix during the decoder stage.
Street graphs should ideally exhibit a distribution pattern in which graphs of the same type are adjacent, whereas graphs of different types are separated in the coding space. We mapped the 128-dimensional coding representation of each city to the two principal axes that were derived via PCA. Figure 8 depicts the correlation between the distribution of cities in the encoding space and their predicted polycentricity scores for each trained network. The results of our proposed method show that cities with lower predicted polycentricity scores are positioned in the lower region of the plot, whereas those with higher scores are grouped in the upper part of the graph. This pattern signifies the effective distribution of cities according to their polycentricity.
Three regression-based metrics, namely, the R², mean absolute error (MAE) and root mean square error (RMSE), were adopted to determine the correlation between the predicted and observed polycentricity scores. We used data from only nonmonocentric cities (those with at least one subcenter) to perform the correlation test. As a result, 55 polycentric cities with diverse economic statuses and street complexities were selected. Table 2 presents the results for the metrics. Similar to the comparison of the embedding plots, our proposed approach outperforms the baseline models in the three measures. The R² values of GraphSAGE and Node2Vec are indicated in Table 2. Noticeable distinctions in performance among the different models emerged in terms of the MAE. The proposed GAE achieved the lowest MAE, at 0.14. In contrast, GraphSAGE and Node2Vec yielded higher MAE values of 0.21 and 0.18, respectively. The RMSE metric showed a comparable trend. The GAE achieved an RMSE of 0.17, whereas GraphSAGE and Node2Vec attained values of 0.27 and 0.24, respectively.
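For reference, the three metrics can be computed as in the following sketch. It assumes that R² denotes the coefficient of determination (one minus the ratio of the residual to the total sum of squares); the text does not say whether this or the squared correlation coefficient was used. The function name and the example scores are hypothetical.

```python
from math import sqrt


def regression_metrics(observed, predicted):
    """Return (R^2, MAE, RMSE) for paired observed and predicted scores."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    rmse = sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot   # coefficient of determination (assumed definition)
    return r2, mae, rmse


# Hypothetical polycentricity scores, not taken from the study.
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.3, 0.3, 0.7, 0.7]
r2, mae, rmse = regression_metrics(observed, predicted)
print(round(r2, 3), round(mae, 3), round(rmse, 3))
```

The RMSE is never smaller than the MAE for the same predictions, which is consistent with the reported pairs (for example, 0.14 and 0.17 for the GAE).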

Correlations between measured polycentricity scores and predicted scores
Figure 9 shows the correlation between the polycentricity predicted using the proposed GAE model and the previously measured scores. The correlation coefficient was 0.41 (Table 2), which was significant at the 0.01 level (two-tailed). We divided the 55 cities into three groups according to their urban GDP to conduct an in-depth investigation. Specifically, the cities in the top hierarchies had relatively low polycentricity scores owing to the existence of leading dominant centers. In contrast, the polycentricity profiles for the cities in the bottom class differed because the scores were distributed evenly along the value range. The correlation results varied across the different groups. High predictability was observed within cities in the top three hierarchies (approximately 0.5) for urban GDP and street complexity. In contrast, cities at lower levels were found to have very low predictability (below 0.2). We also visualized the ht-index values for each city. Note that the groups based on urban GDP and the ht-index were not the same but overlapped to a certain degree. A general trend between the two groups was that cities with a better economic status tended to have a more complex street structure (but not always).
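The reported significance can be checked approximately with a short calculation. The sketch assumes that the coefficient is a Pearson correlation computed over the 55 test cities and that significance was assessed with the usual t-test against zero; the text names neither the estimator nor the test, so both are assumptions. The function name is hypothetical.

```python
from math import sqrt


def correlation_t_statistic(r, n):
    """t statistic for testing a Pearson correlation r from n pairs against zero."""
    return r * sqrt(n - 2) / sqrt(1.0 - r * r)


# Values quoted in the text: r = 0.41 over 55 cities (53 degrees of freedom).
t_value = correlation_t_statistic(0.41, 55)
print(round(t_value, 2))

# Standard t-tables give a two-tailed 0.01 critical value of roughly 2.67
# for about 53 degrees of freedom (approximate, for orientation only).
```

Under these assumptions the statistic exceeds the approximate critical value, which agrees with the stated significance at the 0.01 level.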

Discussion
This study confirms the effectiveness of a deep learning framework for large-scale comparative urban analysis through network representation learning and reveals the underlying connection between street network structures and population-based polycentric urban structures. As with other graph-structured data, the difficulty of learning street network representations lies in structure preservation and high nonlinearity. To address this issue, the proposed architecture first preserves the inherently complex street structure by iteratively joining adjacent, homonymic and continuous segments into streets with a deep hierarchy; that is, numerous less-connected streets, few well-connected streets and some in between. In this way, the street network exhibits a form of self-similarity, in which parts reflect the characteristics of the whole network (Zhang et al. 2022). Namely, well-connected streets are representative of the entire network structure. This recognition of self-similarity is instrumental in the subsampling process and offers fresh perspectives on the concept of scale in geographic representation learning, particularly within spatial networks (Yuan and McKee 2022). Each city possesses its own geometrical, topological and societal properties. The graph embeddings of well-connected streets that are extracted through multiple convolutional layers can map these attributes in a highly nonlinear latent space (128-dimensional) and help characterize urban similarities across different street layouts.
The superior performance exhibited by the GCN-based approach compared with that of other popular GNN models, such as Node2Vec and GraphSAGE, can be attributed more to the specific downstream task than to the model itself. The desired embeddings are expected to accurately represent the structural complexity of the subsampled street graph and its related street features for predicting urban polycentricity. The underperformance of Node2Vec is due to its lack of consideration of node features. Like Node2Vec, which yields inferior results, the vanilla GCN is inherently designed for node-level (rather than graph-level) feature extraction. The selected 128 × 1 global embeddings that are derived by retaining only one feature per node from the final GCN layer in the encoder do not optimally represent the graphs examined in our study. These results also indicate the superior efficacy of the proposed max pooling layer in aggregating the global representation of the graph from embedded node-level attributes. However, GraphSAGE primarily emphasizes local structures by acquiring an aggregator that aggregates features from neighboring nodes (first or second order). Nevertheless, when the subsampled graph structure is known and representative, the GAE enables a more global perspective (higher-level order) by leveraging the relational information among the important nodes and their features in the graph. The characteristics of GCNs are more in line with the requirements of the task at hand and thus ultimately contribute to the superior performance of the proposed architecture.
Our analysis excludes cities that are characterized by a single population center during the prediction phase. This exclusion stems from the fact that monocentric cities inherently have a polycentricity score of 0 (Green 2007), which renders them unsuitable for correlation analysis. Although Liu and Wang (2016) assigned a polycentricity score of 0 to these cities based on population data, our findings indicate the potential for slight polycentricity (i.e. marginally above 0) in these areas. This discrepancy could be attributed to the limited resolution of the LandScan gridded population data (1 km × 1 km), which may not capture subtle urban centers as effectively as street network data. Street network data not only complement but also enhance our understanding of urban polycentricity by revealing more nuanced urban structures. Consequently, the integration of street- and population-based findings in this study offers a more detailed and comprehensive view of urban structural dynamics.
Cities with a higher GDP tend to have more complex street structures, and their predicted polycentricity scores are more consistent with the empirically measured population-informed urban polycentricity than those of other cities. Cities with better economies, such as those along the eastern Chinese seaboard (e.g. the Pearl River and Yangtze River Deltas; Figure 5), have experienced intensive decentralization and marketization processes (Liu et al. 2018); thus, they naturally have at least one populous location where people gather for various activities. As cities evolve, these concentrations of people and activities lead to more roads locally and various densities in the road network globally, resulting in the emergence of street complexity. The street complexity measured using the ht-index in this study reflects the variation in street-street connections, which is widely considered to be the driver of network community structure (Fortunato and Newman 2022). In other words, a larger ht-index indicates greater variation in connectivity values across streets and a greater chance of the street network having communities in which streets are strongly connected. Our results show that more developed cities tend to have more complex street structures (larger ht-index values). A higher correlation between the predicted and measured polycentricity values among these cities also strengthens the potential relevance of complex street configurations to the polycentric urban structure.

Conclusions
This study successfully demonstrated the efficacy of a GCN-based approach for analyzing cross-city street network structures using a two-branch, semi-supervised architecture. The methodological innovation of using natural streets underscores the evolving discourse in street network analyses (Marshall et al. 2018). The study demonstrated that the representation learning of the subsampled streets across cities, constituting approximately 1% of the total streets, can capture the global structural characteristics of the entire street network. The case study on predicting urban polycentricity informed by population data in 298 Chinese cities based on the proposed GCN-based approach relies on the unique integration of deep learning with urban street network analysis. A key observation was that more economically developed cities exhibit more intricate street network structures, reflecting their dynamic urban evolution and the resulting diverse densities in street networks. This complexity, as measured using the ht-index, correlates with variations in street-street connections and the likelihood of polycentric urban forms being developed. This study also highlights that street network data can complement and enhance the nuanced understanding of urban structures more than traditional data sources, such as the gridded population data of LandScan.
This study lays the foundations for future research. The observed alignment between the predicted polycentricity scores from street networks and population-based data opens new avenues for research in urban studies. This synergy invites further investigation into the interconnected dynamics of street layouts and population distributions. However, as Derudder (2021) noted, network analyses do not always provide clear-cut interpretations, potentially limiting their practical applicability in urban planning and policy formulation. To bridge this gap, future research could incorporate explainable AI techniques into the proposed methodology to enhance the transparency and comprehensibility of urban polycentricity predictions from street networks for urban policymakers, planners and designers. Future research could also expand this methodological framework by applying other urban metrics and investigating their potential in different geographical contexts, thereby further enriching the understanding of urban structure and dynamics and facilitating the development of more livable and sustainable urban environments.

Figure 1. (Color online) Overall framework of GCN-based analysis of urban street structure includes three modules: the gray dashed box indicates the creation of subsampled street graphs based on the street-street topology, the orange dashed box shows the extraction of graph embeddings through the GAE model and the blue dashed box represents the prediction of urban polycentricity via the SLP model.

Figure 2. Subsampling process of illustrative natural street representation and its dual graph layout. Note: The street network consists of 57 natural streets (2^5.87 ≈ 2^6), the spatial intersection relationship of which helps to derive the dual graph. We visualize the importance of the street using the connectivity measure with a color spectrum, where red, yellow, cyan and blue represent the streets from the most connected to the most poorly connected. Thereafter, subsampling of the network can be performed at a reduction ratio of 1/2 by selecting the top 32 (2^5), 16 (2^4) and 8 (2^3) most connected streets. Although the number of streets is continuously halved, the main street structure is maintained throughout the series. Source: Adapted from Figure 1 in Ma et al. (2019). Copyright © 2019 by Pion. Reprinted with permission from SAGE Publications, Ltd.
equivalent to adding self-loops to the original adjacency matrix of the graph and D̃ is the diagonal degree matrix of Ã

Figure 3. Architecture of the proposed GAE model.

Figure 4. (Color online) Architecture of the SLP-based model for predicting urban polycentricity.

Figure 5. (a) Ht-index of street connectivities across 298 Chinese cities, (b) complex structure of the Shanghai street network (ht-index = 9), (c) statistical similarity between the top 128 street connectivities and all the data examined by the power law and (d) histogram. Note: The nighttime image from 2020 of 298 cities in Panel (a) demonstrates the transition of moderate-to-intensive economic development from the western to the eastern regions in China to a certain extent.

Figure 6. Layout of the top 128 streets for 20 Chinese cities at different development levels. Note: Poly = population-based polycentricity scores previously measured by Liu and Wang (2016). The hierarchy of the cities from top to bottom is aligned with the row sequence, e.g. the first row indicates the cities at the top hierarchy determined by urban GDP, and the fifth row is the fifth hierarchy. For each row (cities in the same hierarchy), we ordered the cities in ascending order from monocentric (left) to polycentric (right).

Figure 7. Architecture of (a) GAE and (b) SLP models for urban polycentricity prediction.

Figure 8. Visualization of derived embeddings from (a) vanilla GCN, (b) Node2Vec, (c) GraphSAGE, and (d) proposed GAE model in a 2D space. Note: The dot color was rendered using the predicted polycentricity scores for 298 cities.

Figure 9. Scatterplot of the predicted polycentric values and previously measured values for 55 Chinese cities at the different levels defined by the urban GDP.

Table 1. Ht-index for subsample and all streets for 20 Chinese cities.

Table 2. Performance of our approach and baseline models.