A graph neural network framework for spatial geodemographic classification

Abstract Geodemographic classifications are exceptional tools for geographic analysis, business and policy-making, providing an overview of the socio-demographic structure of a region by creating an unsupervised, bottom-up classification of its areas based on a large set of variables. Classic approaches can require time-consuming preprocessing of input variables and are frequently a-spatial processes. In this study, we present a groundbreaking, systematic investigation of the use of graph neural networks for spatial geodemographic classification. Using Greater London as a case study, we compare a range of graph autoencoder designs with the official London Output Area Classification and baseline classifications developed using spatial fuzzy c-means. The results show that our framework based on a Node Attributes-focused Graph AutoEncoder (NAGAE) can perform similarly to classic approaches on class homogeneity metrics while providing higher spatial clustering. We conclude by discussing the current limitations of the proposed framework and its potential to develop into a new paradigm for creating a range of geodemographic classifications, from simple, local ones to complex classifications able to incorporate a range of spatial relationships into the process.


Introduction
Geodemographic classifications are crucial tools in quantitative geography, created through statistical analysis and machine learning to characterise neighbourhoods based on socio-demographic data such as the census. Their use is widespread in a range of public and private sectors to better understand the places we live and how they change, from social policy to urban planning, from business strategy to marketing. Historically, such classifications have largely been created as commercial products by specialised companies. Recent years have seen the development of several open classifications, including national-level classifications created as part of academic research, starting with the 2001 Output Area Classification developed by Vickers and Rees (2007).
The development of geodemographic classifications requires highly specialised expertise, not limited to the knowledge of statistical methodologies (e.g., clustering) but rather rooted in geodemographic theory and practice. In particular, preparing and selecting the attributes (e.g., census variables) to be included in the clustering process requires in-depth knowledge of the area and domain subject. Such expertise is irreplaceable, and this paper does not aim to suggest that it should be replaced.
However, there are cases where substituting part of the process with an algorithmic approach can be beneficial. First, an algorithmic approach could be used as a guideline for practitioners, providing a reasonable starting point or point of comparison. Second, an algorithmic approach could be used where resource constraints render it impossible to invest specialised expertise and time in performing the attribute selection and transformation process. For instance, that could be the case when developing local geodemographic classifications, as national-level ones can be inadequate for the study of a single city, because the detail of local variation is lost to optimise national patterns (Singleton and Longley 2015, Alexiou 2016). Third, an algorithmic approach could be used in cases where the number and heterogeneity of input attributes are so large as to render their manual manipulation arduous. The latter case is becoming more common due to the increased digitalisation of everyday life in many countries, along with the shift of many governments and institutions to open data approaches. While computational resources might be keeping pace with the increase in data availability, the capability of academics and practitioners to invest time in the manual steps usually required for geodemographic classification (e.g., attribute comparison, selection and transformation) might not.
Another key characteristic of classic geodemographic classifications is that, while the data used to create them are geographic (i.e., they describe something about a set of places represented as areal units), the methods used to create them are frequently a-spatial: they do not take into account spatial information (e.g., the spatial location of an areal unit, its spatial proximity to other areal units or mobility flows such as commuting) in their classification process but rather rely solely on the attribute information (e.g., the values of the census variables for each areal unit). That is based on the assumption that the areal units for which data are available are an effective representation of neighbourhoods. As a result, whichever way census boundaries are drawn, some houses next to one another will be cast into different areal units, although those houses might share more in common with one another than with the rest of the areal unit they have been assigned to. The overarching set of common practices, opportunities and behaviours generated by the spatial proximity among people, businesses, nature and the built environment might be lost in classic geodemographic approaches, as discussed by Birkin and Clarke (1998) and Feng and Flowerdew (1998). When such neighbouring effects are an important aspect of classification, an alternative approach is to render the classification of an areal unit dependent not only on its attribute information but also, at least partially, on the classification of its neighbours. That is usually achieved through fuzzy clustering (see, e.g., Grekousis 2021), as first suggested by Openshaw (1989, 1995).
In this paper, we present a groundbreaking, systematic study investigating whether deep neural networks might hold the key to ushering in a new landscape of spatial geodemographic classifications, which could be created efficiently and effectively through a more automated process, incorporating spatial relationships between areal units into the classification process. Webber and Burrows (2018) provide an extensive discussion of the development of geodemographic classification practices, which has its roots in the study of urban deprivation, dating back to Booth's poverty maps at the dawn of the twentieth century and the research conducted at the Chicago School of Sociology in the mapping of urban areas. Their technological background is rooted in work by Shevky and Williams (1949) and Shevky and Bell (1955), and the most common approach still used in current geodemographic classification, the k-means clustering algorithm, also dates back to the middle of the twentieth century (Jain 2010). Modern geodemographic classification, however, began in the late 1970s, with the work of Webber and Craig (1976, 1978) and commercial classifications such as CACI's Acorn in 1979 and Experian's Mosaic in 1985.

Related work
A key development of the last two decades has been the emergence of open geodemographic classifications developed through academic research. Vickers and Rees (2007) first developed the UK's Office for National Statistics 2001 Output Area Classification, based on the 2001 UK Census. Similar classifications based on earlier UK censuses existed (Webber 1977, Blake and Openshaw 1994, 1995) but were not released as open access. The subsequent 2011 UK Census was then used to create a wide range of classifications: Gale et al. (2016) developed the 2011 Output Area Classification (2011OAC); Singleton and Longley (2015) created the London Output Area Classification (LOAC); Hincks et al. (2018) focused on commuting flows; and Cockings et al. (2020) included it in the data used to create a classification of workplace zones. Brunsdon et al. (2018) used the 2011 Irish Census to create a geodemographic classification for the Republic of Ireland, while Spielman and Singleton (2015) developed a sociospatial classification of the United States based on the 2010 American Community Survey.
Overall, although quite different from one another, all those classifications have two aspects in common. First, the attributes in the source dataset had to be preprocessed, assessed and selected. Second, the classification process was a-spatial.

Attribute selection and dimensionality reduction
As in any analysis, selecting the input data is crucial in constructing a geodemographic classification. For instance, the 2011OAC (Gale et al. 2016) was developed for 232,296 Output Areas (OAs) in the UK, starting from an initial set of 167 prospective census variables from the 2011 UK Census: 86 were removed, 41 were retained as they were, and 40 were combined, resulting in a final set of 60 variables (attributes) used for the classification. The k-means clustering approach was then used to create 8 clusters or supergroups, 26 groups and 76 subgroups. The LOAC (Singleton and Longley 2015) employed the same 60 variables and, following the same procedure, generated 8 supergroups and 21 groups for the 25,053 OAs in Greater London.
Generating a smaller set of attributes to use as input in the clustering process is a necessary step in geodemographic classification. That is due both to the nature of the procedure, which can be disrupted by the presence of highly correlated attributes, and to its computational complexity. In previous decades, geodemographic classifications had explored the use of data reduction methods such as Principal Component Analysis (PCA) to solve the issue. PCA allows an algorithmic reduction of the number of attributes to a selected set of principal components, which capture the largest share of the variability in the data. Using PCA made geodemographic classification based on dozens, if not hundreds, of attributes possible when computational resources were much more limited than today. At the same time, relying on only the main principal components rather than a full set of attributes can have two significant downsides (Harris et al. 2005). First, it might render some interesting patterns undetectable. Second, it renders working with the dataset less straightforward for the practitioner, creating a classification based on 'arbitrary' principal components rather than 'understandable' input attributes. However, as current computing capabilities allow running clustering algorithms on a much larger set of attributes, the use of PCA declined as scholars and practitioners favoured completeness over computational speed.
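As a minimal illustration of this data reduction step, the following numpy sketch reduces a correlated attribute matrix to the principal components capturing 95% of its variance. The data are a toy stand-in for z-scored census attributes, not the actual variables:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a z-scored attribute matrix: 200 areal units x 12 attributes,
# where attributes 6-11 are noisy copies of attributes 0-5 (high correlation).
base = rng.normal(size=(200, 6))
X = np.hstack([base, base + 0.1 * rng.normal(size=(200, 6))])
X = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD of the centred matrix: the rows of Vt are the principal axes.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
explained = (S ** 2) / (S ** 2).sum()

# Keep the components needed to capture 95% of the variance.
k = int(np.searchsorted(np.cumsum(explained), 0.95)) + 1
scores = X @ Vt[:k].T   # reduced inputs for the clustering step

print(k, scores.shape)
```

With the duplicated attributes above, far fewer than 12 components are needed, which is exactly the redundancy-removal effect the text describes, at the cost of clustering on 'arbitrary' components.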
Revitalising the use of PCA in geodemographic classification, Liu et al. (2019) proposed an iterative procedure to filter attributes from an initial set based on their contributions to the main principal components. The principal components are not used to create the geodemographic classification but rather to identify a good set of non-correlated attributes to use as input. The authors also suggest that a further step in developing such approaches to attribute selection would be using Geographically Weighted PCA (Harris et al. 2011) to render the whole process more spatially explicit. That highlights again how most geodemographic classifications are largely a-spatial.

Carver (1998) first proposed adapting the classic fuzzy c-means approach by redefining the cluster membership (m_i) of an areal unit (i) after computation, based on the memberships assigned to its spatial neighbours (m_j), spatial weights (w_ij) and parameters (a, b, A), as illustrated in Equation 1. In the formulation by Carver (1998), the weights depended on the length of the shared boundary. Mason and Jacobson (2007) developed this idea further and proposed a geographically weighted fuzzy c-means, which integrates the classic fuzzy c-means clustering approach with a spatial interaction model. In their approach, cluster memberships are adjusted at each iteration of the algorithm rather than just at the end. Furthermore, adjusted memberships are based not only on memberships assigned to areal units that share a border (Rook's neighbourhood) but on memberships of all areal units, weighted by distance and population counts. Grekousis (2021) proposed a local version of the approach developed by Mason and Jacobson (2007), limiting the spatial interaction to a distance-based neighbourhood. This local approach effectively introduces a distance threshold beyond which areal units have no impact on each other's cluster memberships. Grekousis (2021) also abandons the spatial interaction aspect of previous approaches, suggesting that taking total population sizes into account in the weighting process is not theoretically sound and might be misleading, as the amount of interaction does not necessarily depend on the amount of population living within two areal units.
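A numpy sketch of this family of membership adjustments follows. The exact form of Equation 1 varies between formulations, so the blending parameters (a, b) and the Rook weights below are illustrative only, not the values used by Carver (1998) or Grekousis (2021):

```python
import numpy as np

# Four areal units in a row; Rook-style weights (shared border), row-standardised.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
W = W / W.sum(axis=1, keepdims=True)

# Fuzzy memberships of each unit over 2 clusters (rows sum to 1).
M = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.3, 0.7],
    [0.1, 0.9],
])

a, b = 0.7, 0.3  # illustrative blending parameters
M_adj = a * M + b * (W @ M)                       # blend own membership with neighbours'
M_adj = M_adj / M_adj.sum(axis=1, keepdims=True)  # renormalise rows to sum to 1

print(np.round(M_adj, 3))
```

Each unit's membership is pulled towards the average membership of its neighbours, which is the spatial smoothing effect these approaches rely on.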

Spatially-explicit classification
Alexiou (2016) takes a different approach by proposing to shift the incorporation of spatial dependency from clustering to preprocessing. In this approach, the attributes (x_i,a) are normalised (z_i,a) based not only on the global distribution (μ_N, σ_N) but also on a local one (μ_c, σ_c). The analyst is then able to manually tune a parameter (γ) to inject more globally-normalised or more locally-normalised scores into the final values, which are used as input to a k-means procedure to generate the final classification. Alexiou (2016) argues that this approach allows accounting for unseen local variables and spatial processes that can skew what should be considered normal at the local level compared to a national scale.
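A sketch of this blended normalisation on toy data follows; the two-region setup and the value of the blending parameter (here named g, playing the role of γ) are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=100)   # attribute values for 100 areal units
region = np.repeat([0, 1], 50)                 # two local regions
x[region == 1] += 4                            # region 1 has a higher local baseline

# Global and local (per-region) means and standard deviations.
mu_g, sd_g = x.mean(), x.std()
mu_l = np.array([x[region == r].mean() for r in region])
sd_l = np.array([x[region == r].std() for r in region])

g = 0.5  # analyst-tuned blending parameter
z = g * (x - mu_g) / sd_g + (1 - g) * (x - mu_l) / sd_l

print(np.round(z[:3], 3))
```

With g = 1 the result is the usual global z-score; with g = 0 each unit is judged only against its own region, so a value that is high nationally but typical locally scores near zero.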
Overall, the use of such approaches seems to be growing in creating purpose-specific classifications (e.g., Nasution et al. 2020, Majewska and Truskolaski 2019), but they have not been widely applied to general-purpose classifications so far.

Geographical artificial intelligence
Geospatial AI (GeoAI) can be conceptualised as the combination of recent, innovative Artificial Intelligence (AI) approaches, such as deep learning and high-performance computing technologies, with high-resolution spatial big data (Jiang and Shekhar 2017, Singleton and Arribas-Bel 2021, Janowicz et al. 2020). Recent years have witnessed the creation of GeoAI approaches capable of outperforming conventional ones commonly adopted in a wide range of GIScience research, from spatial prediction and interpolation (Murphy et al. 2017, Li et al. 2020) to cartography and mapping (Feng et al. 2019, Duan et al. 2020). Thus, GeoAI is now widely considered a promising approach to data-driven and compute-intensive spatial analytics research (Li et al. 2020).
However, most early approaches in GeoAI have been limited to the adoption or translation of AI methods from computer science with no or only minor modifications when applying them to geospatial problems, thus lacking a solid consideration of a critical element in GIScience, i.e., location (Janowicz et al. 2020, Li et al. 2020, Gao 2021, Liu and Biljecki 2022). More recently, scientific communities have adopted a two-way knowledge transfer, not only bringing AI into geography but also explicitly incorporating geography into AI (Li et al. 2020, Janowicz et al. 2020, Gao 2021). As a result, new methods have started to embed location as an integral component of neural network architectures, assimilating spatial heterogeneity, spatial dependence and other spatial characteristics. GeoAI has started developing spatially-explicit methods to address how spatial is special in the geographical domain (Gould et al. 1996, Gao 2021).
The first and crucial step in developing a spatially-explicit GeoAI method is to represent (or encode) spatial data (e.g., points, lines, polygons) into embeddings (dense numeric representations of data), which the neural networks can utilise. Mai et al. (2022) reviewed a wide range of location encoding methods in GIScience. They identified graph-based aggregation encoders, such as graph neural networks (GNNs), as one of the key approaches to encoding spatial data that are often represented in graph structures, where nodes represent areal units and links (edges) represent spatial proximity. GNN layers aggregate the information from each node and its connected neighbours and propagate it through the neural architecture, producing effective embeddings for nodes, edges or graphs, which can benefit further downstream tasks, such as node classification. Recent studies have showcased many successful applications of GNNs in various GIScience tasks, from traffic studies (Zhao et al. 2022) to point-cloud classification (Yue et al. 2022). In our previous work (Liu and De Sabbata 2021), we used graphs to encode the distance between geo-located (point-like) content and aid content classification.
In our own previous work (De Sabbata and Liu 2019), we also explored the use of autoencoders to create geodemographic classifications. In particular, we used deep embedding clustering (Xie et al. 2016) as a general approach to unsupervised clustering, combining a data reduction step with a clustering step. We first trained an autoencoder to create dense representations of the input census variables. The encoder component was then extracted, concatenated with the clustering layer defined by Xie et al. (2016) and trained to optimise the final geodemographic classification. In the same paper, we put forward the idea of geoconvolution to account for higher-level neighbouring effects and mitigate the impact of edge effects by integrating local attribute averages in the clustering approach. The proposed geoconvolution was conceptually akin to a simple GraphSAGE convolutional layer (Hamilton et al. 2017), as the average values from neighbouring OAs were concatenated to each OA's hidden features (here referring to inner numerical representations) at each layer, using geographically generated batches.
A similar approach was also adopted by Singleton et al. (2022), who used a convolutional autoencoder to learn dense representations from multispectral satellite imagery, which were then used as input to a k-means clustering procedure to create a classification of urban contexts. Kang et al. (2022) instead focused on the broader issue of multivariate spatial clustering in a geographic context, formulating it as a repeated geographic pattern discovery problem and proposing a spatial Toeplitz inverse covariance-based clustering approach. The latter accounts for spatial relationships by defining subregions that are clustered while handling attribute dependencies through a Markov random field whose structure is defined by an inverse covariance matrix restricted to follow the block Toeplitz form.


Methods
Kipf and Welling (2017) first defined a graph convolutional network (GCN) layer as illustrated in Equation 3. As with any neural network layer, a graph convolutional layer combines a series of input values (h_v^(l−1), from the previous layer l−1) with a set of weights (W^(l)), and it applies a non-linear activation function (σ) to generate an output (h_v^(l)), which is then pushed forward through the rest of the network. An optimiser is used to estimate the optimal weights (W^(l)) such that the difference between the network output and a target value, calculated through a loss function, is minimised. In a simple layer, the input values (h_v^(l−1)) are only related to the attributes of each case. The defining characteristic of a GCN layer is that the output of the layer for each node (v) depends not only on the attributes of the node (v) itself but also on the attributes of the neighbouring nodes (u ∈ N(v)), which commonly include the node itself through the addition of self-loops. In the formulation given by Kipf and Welling (2017), the attributes of the neighbouring nodes are simply averaged.
Hamilton et al. (2017) then proposed a generalisation of the approach above, where the average or any other relevant aggregation function (AGGREGATE) can be used to generate a contribution from the neighbouring nodes, which can then be combined (COMBINE) with the original input through operations such as concatenation, before the weights and the non-linear activation function (σ) are applied.
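The two formulations can be sketched in numpy as a single untrained layer each; the 4-node graph, feature sizes and random weights below are arbitrary illustrations, not the paper's architecture:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

rng = np.random.default_rng(2)

# A 4-node spatial graph with self-loops, as an adjacency matrix.
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
], dtype=float)
A_mean = A / A.sum(axis=1, keepdims=True)   # row-normalised: neighbourhood averaging
H = rng.normal(size=(4, 5))                 # input features, 5 per node
W = rng.normal(size=(5, 3))                 # learnable weights (here random)

# GCN-style layer: average over the neighbourhood, then apply weights and sigma.
H_gcn = relu(A_mean @ H @ W)

# GraphSAGE-style layer: AGGREGATE (mean over neighbours), then COMBINE by
# concatenating with the node's own features before applying the weights.
W_sage = rng.normal(size=(10, 3))
H_sage = relu(np.hstack([H, A_mean @ H]) @ W_sage)

print(H_gcn.shape, H_sage.shape)
```

In both cases each node's output mixes its own attributes with those of its spatial neighbours, which is the property the paper later exploits for geodemographic classification.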
The core intuition behind this paper is that Equation 4 can be understood as akin to Equation 1 used in spatial fuzzy c-means. Equation 1 applies a form of spatial convolution to the fuzzy cluster memberships, whereas Equation 4 applies a form of spatial convolution to the attributes, or to the hidden features derived from them through a neural network. That led us to hypothesise that GNNs can be used to create spatial geodemographic classifications. Here, we explore the use of the graph autoencoder (GAE) architecture proposed by Kipf and Welling (2016) to generate geo-spatially convoluted, dense representations of census variables, which can then be used in an a-spatial clustering process (e.g., k-means or c-means) to create a spatial geodemographic classification.
A key difference between current approaches to spatial fuzzy c-means and the approaches explored in this paper is that while the former use linear combinations and weights manually predetermined by the analyst, the latter use non-linear functions and the weights are learnt via self-supervised training.
In this section, we start by presenting the case study and related data, as well as the evaluation framework we used to assess the performance of the approaches we tested and a series of baselines based on current approaches to spatial geodemographic classification. We then introduce the GAE-based approaches and discuss how issues related to oversmoothing led us to test approaches based on combining GAEs with Correlational Neural Networks (CorrNet) and an approach based on node attributes-focused GAEs.

Data
This paper focuses on Greater London as the study area and uses the 2011 UK Census data. In particular, we use z-scores derived from the 167 prospective census variables originally considered for the creation of the 2011OAC before the attribute selection process (Gale et al. 2016) and the 60 variables (derived from those same 167 prospective census variables) used to create the LOAC (Singleton and Longley 2015). In the remainder of the paper, we refer to those as 167 z-scores and 60 k-vars, respectively.

Spatial graphs
Spatial weights are a common approach to representing the spatial relationship between geographic areas. Once a spatial relationship (e.g., sharing a border or being within a predefined distance) is selected, a spatial graph is constructed where each geographic area is represented as a node, and two nodes are linked (i.e., connected by an edge) if they satisfy the chosen spatial relationship. From the spatial graph, a spatial adjacency matrix is derived, which thus encapsulates the spatiality of the geographical data and can be used for subsequent statistical analysis (O'Sullivan and Unwin 2003).
Different ways of constructing spatial graphs may positively or negatively impact the output of spatially explicit deep learning models (Liu and De Sabbata 2021, Li et al. 2021), therefore introducing additional uncertainties into the geographical analysis (Bhattacharjee and Jensen-Butler 2013). In this paper, we tested three different spatial graph construction methods to explore how the different graphs affect the final geodemographic classification:
• Queen's contiguity: areal units are connected if they share a border or a vertex.
• K-nearest neighbours (KNN): each areal unit is connected to its K nearest neighbours (based on centroid distance). We set K equal to 8, the average number of neighbours for OAs in London based on Queen's contiguity.
• Maximum distance threshold (MDT): each areal unit is connected to all areal units within a set threshold distance (based on centroid distance). We set the threshold to 2,098 metres, the minimum threshold allowing all OAs in London to have at least one neighbour.
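The two centroid-based constructions can be sketched in numpy as follows (Queen's contiguity requires boundary polygons, so it is not sketched here; the centroids, K and the derivation of the threshold are toy illustrations of the rules described above):

```python
import numpy as np

rng = np.random.default_rng(3)
centroids = rng.uniform(0, 10, size=(30, 2))   # toy areal-unit centroids

# Pairwise centroid distances.
d = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)

# KNN graph: each unit linked to its K nearest neighbours (excluding itself).
K = 4
knn = np.zeros_like(d, dtype=bool)
order = np.argsort(d, axis=1)
for i in range(len(centroids)):
    knn[i, order[i, 1:K + 1]] = True   # order[i, 0] is the unit itself

# MDT graph: link all pairs within a threshold distance, with the threshold set
# to the smallest value guaranteeing every unit at least one neighbour
# (i.e., the largest nearest-neighbour distance).
threshold = np.sort(d + np.eye(len(d)) * d.max(), axis=1)[:, 0].max()
mdt = (d <= threshold) & ~np.eye(len(d), dtype=bool)

print(knn.sum(axis=1).min(), mdt.sum(axis=1).min())
```

Note that the KNN graph is directed (nearest-neighbour relations are not symmetric), whereas the MDT graph is symmetric by construction; this is one source of the graph-dependent variation discussed above.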

Evaluation framework
In order to assess and compare the quality of the classifications, we developed three main metrics. The first metric (Spatially clustered OAs) is based on join count, a measure of spatial autocorrelation for categorical attributes. Similarly to the approach used by Kang et al. (2022), a join count test was performed on each class of a classification, identifying the OAs that are part of a significant spatial cluster of OAs of the same class. The number of spatially clustered OAs for each class was counted and tallied, and the final percentage of spatially clustered OAs was used as a measure of the overall spatial autocorrelation of each classification.
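The logic of a join count test can be illustrated with a simplified, class-level permutation version on a toy grid; the paper's per-OA significance procedure is only approximated here, and the grid, class layout and number of permutations are invented:

```python
import numpy as np

rng = np.random.default_rng(4)

# A 10x10 grid of areal units with Rook adjacency, and one clustered class.
n = 10
idx = np.arange(n * n).reshape(n, n)
edges = [(idx[i, j], idx[i, j + 1]) for i in range(n) for j in range(n - 1)]
edges += [(idx[i, j], idx[i + 1, j]) for i in range(n - 1) for j in range(n)]
edges = np.array(edges)

labels = np.zeros(n * n, dtype=int)
labels[idx[:5, :5].ravel()] = 1        # class 1 occupies one contiguous quadrant

def join_count(lab):
    """Number of edges joining two units of class 1 (the 1-1 join count)."""
    return int(np.sum((lab[edges[:, 0]] == 1) & (lab[edges[:, 1]] == 1)))

observed = join_count(labels)

# Permutation test: shuffling the labels gives the join count expected
# in the absence of spatial clustering.
perm = np.array([join_count(rng.permutation(labels)) for _ in range(199)])
p = (np.sum(perm >= observed) + 1) / (199 + 1)
print(observed, p)
```

Because class 1 forms a contiguous block, the observed 1-1 join count far exceeds the permutation distribution, flagging the class as spatially clustered.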
The second and third metrics are based on the Squared Euclidean Distance (SED) between cases and the centroid of their assigned class (attributes-based cluster), a classic measure of class homogeneity, and thus of geodemographic classification quality, used to evaluate the 2011OAC and the LOAC. A lower SED score indicates a more coherent classification. However, the LOAC was calculated using the 60 k-vars, while most of the proposed approaches used the 167 z-scores. As such, we included in our evaluation two different SED measures, one for each set of inputs: SED 60 k-vars and SED 167 z-scores.
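The SED computation itself is straightforward; a minimal sketch on toy data (standing in for the k-vars or z-scores) is:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(50, 4))            # attribute matrix (toy stand-in)
labels = rng.integers(0, 3, size=50)    # a candidate classification into 3 classes

def sed(X, labels):
    """Sum of squared Euclidean distances to each assigned class centroid."""
    total = 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        total += float(((members - members.mean(axis=0)) ** 2).sum())
    return total

score = sed(X, labels)

# A perfectly homogeneous classification scores 0; lower is more coherent.
homogeneous = sed(np.ones((10, 4)), np.zeros(10, dtype=int))
print(round(score, 2), homogeneous)
```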
Furthermore, we also developed a fourth metric (Matching OAs), which is not a direct measure of quality but simply compares each classification with the LOAC, the official geodemographic classification for Greater London, which we decided to use as a point of reference. For each classification, we tested all possible pairings between a LOAC class i and a class j computed by the approach, counting the number of matching OAs, i.e., those assigned to i by the LOAC and to j by the approach under consideration. We then selected the pairing that produced the highest number of matching OAs and calculated the percentage over all OAs in the study area.
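One way to operationalise this comparison is to cross-tabulate the two classifications and greedily keep the best one-to-one pairing for each reference class; this reading of the procedure, and the noisy toy labels below, are illustrative assumptions rather than the paper's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(6)
loac = rng.integers(0, 3, size=200)            # reference classification (toy stand-in)
ours = (loac + (rng.random(200) < 0.2)) % 3    # a noisy, relabelled copy (~80% agreement)

# Cross-tabulate every (reference class i, candidate class j) pairing.
counts = np.zeros((3, 3), dtype=int)
for i, j in zip(loac, ours):
    counts[i, j] += 1

# Greedily keep the pairing with the most matching units, one pair at a time.
matched = 0
free_rows, free_cols = set(range(3)), set(range(3))
while free_rows:
    i, j = max(((i, j) for i in free_rows for j in free_cols),
               key=lambda ij: counts[ij])
    matched += counts[i, j]
    free_rows.remove(i)
    free_cols.remove(j)

print(round(matched / len(loac), 2))
```

The resulting share recovers most of the built-in 80% agreement despite the arbitrary relabelling, which is the point of pairing classes before counting matches.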

Baselines
As the LOAC was not developed as a spatially-explicit classification, we decided to create a series of spatial geodemographic classifications as baselines. We created three classifications using the three spatial graphs discussed above, the 60 k-vars and the spatial fuzzy c-means (SFCM) algorithm as proposed by Gelb and Apparicio (2021) and implemented by Gelb (2021). We also created three classifications utilising the same SFCM algorithm and one utilising k-means, but using as input the first 60 components obtained through a Principal Component Analysis (PCA) of the 167 z-scores, as a classic approach to automatic attribute selection. In both cases, we followed the procedure prescribed by Gelb (2021) to identify the best parameters, as named in their implementation: m = 1.20 and alpha = 1.80.

Graph neural networks
Our first step towards developing a robust framework to create spatial geodemographic classifications using GNNs was to test the effectiveness of common GAE architectures (Kipf and Welling 2016) based on the two earliest and most common approaches to graph convolution: GCN and GraphSAGE.
Common autoencoders (Kramer 1991) are neural networks designed to learn dense representations of input data through a process of encoding and decoding. First, an encoder transforms the input attributes (commonly referred to as features in deep learning) into a (commonly smaller) series of hidden features through a series of neural network layers. Then, the decoder tries to recreate the original input from the hidden features through another series of (commonly specular) neural network layers. In GAE architectures (Kipf and Welling 2016), the input is composed of the node attributes and the adjacency matrix (and, possibly, edge attributes). The encoder's layers include graph convolutional layers and possibly simple linear layers, and the decoder is most commonly an inner product that assesses the probability of the presence of an edge between two nodes based on the similarity of their hidden features. Kipf and Welling (2017) based their formulation of a GCN layer on spectral graph theory, thus extending the conventional Euclidean convolution to graphs. The core idea behind GCN is to mimic signal processing over graphs. That is, the graph convolution filter of the GCN performs the eigendecomposition of the graph Laplacian matrix (also referred to as a spectral-based convolution process; see, e.g., Balcilar et al. 2020). GraphSAGE (Hamilton et al. 2017) was instead developed from spatial graph theory, and it is thus rooted in the idea of message passing or information propagation through the graph network. More recently, Balcilar et al. (2020) demonstrated how convolution processes can be defined based on spectral or spatial graph theory independently of their original formulation.
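A minimal numpy sketch of a GAE forward pass follows (untrained, with random weights and a toy 4-node graph; a real GAE would optimise the reconstruction loss with gradient descent):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(7)

# Toy inputs: adjacency with self-loops (4 nodes) and 6 attributes per node.
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
], dtype=float)
A_norm = A / A.sum(axis=1, keepdims=True)
X = rng.normal(size=(4, 6))

# Encoder: two graph-convolutional layers producing 2-d hidden features.
W1, W2 = rng.normal(size=(6, 4)), rng.normal(size=(4, 2))
Z = A_norm @ relu(A_norm @ X @ W1) @ W2

# Inner-product decoder: edge probability from hidden-feature similarity.
A_hat = sigmoid(Z @ Z.T)

# Training would minimise this reconstruction loss (binary cross-entropy
# against the observed adjacency); here the weights are random.
loss = -np.mean(A * np.log(A_hat) + (1 - A) * np.log(1 - A_hat))
print(A_hat.shape, round(float(loss), 3))
```

The decoder's reliance on hidden-feature similarity to predict edges is exactly the assumption that, as discussed below, proves too strong for purely spatial graphs.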
We implemented a first set of approaches using StellarGraph (CSIRO's Data61 2018) as GAEs based on GCN and GraphSAGE layers, using each one of the three spatial graphs discussed in Section 3.1.1. The latter was implemented using a mean aggregator, which simply takes the element-wise mean of the nodes' features; this is the default setting for GraphSAGE and the one most similar to GCN. We also implemented classifications based on Attri2Vec (Zhang et al. 2019) and Node2Vec (Grover and Leskovec 2016).
However, early results clearly indicated that oversmoothing is a significant threat to the use of GNNs in geodemographic classification, as illustrated by the results discussed in Section 4. If not addressed, oversmoothing results in classifications strongly dominated by geographical proximity, which might be undesirable. Most approaches based on GAEs, which focus on reconstructing the adjacency matrix, are based on the intuition that nodes with similar encodings should be more likely to be connected in the graph. This is a valuable approach in many applications and could work well with some geographical networks, such as mobility flows, but it seems to be too strong an assumption for spatial networks such as those taken into account in this paper. To balance this effect, we explored two distinct approaches, outlined in the two following sections: the use of Correlational Neural Networks (CorrNet, Chandar et al. 2016) to create common representations and the implementation of a node attributes-focused graph autoencoder.

Graph neural networks with common representations
CorrNet (Chandar et al. 2016) is a machine learning approach for learning common representations from heterogeneous sources of data (i.e., multimodal data). Its architecture is similar to a conventional single-view deep autoencoder but includes one encoder-decoder pair for each modality of the input data, as shown in Figure 1. A primary advantage of CorrNet over the simple autoencoder model is its ability to learn a joint representation from two different data modalities (addressing the same task or phenomenon, e.g., images and their captions) such that the correlation between the two inputs is maximised when they are projected into a common subspace.
As such, we tested whether CorrNet could be used to counterbalance the oversmoothed embeddings created by the GNNs, generating joint representations that maximise the correlation between geographic location (the graph-based embeddings produced by a GNN) and original census variables.
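The correlation term that CorrNet maximises can be sketched in numpy as follows; the two toy views, the linear projections (which CorrNet learns jointly with its reconstruction losses) and their dimensions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(8)

# Two views of the same areal units: graph-based embeddings and census z-scores.
view_graph = rng.normal(size=(100, 8))
view_census = 0.6 * view_graph @ rng.normal(size=(8, 8)) + 0.4 * rng.normal(size=(100, 8))

# Linear projections into a shared 3-d subspace (in CorrNet these are learnt).
P1, P2 = rng.normal(size=(8, 3)), rng.normal(size=(8, 3))
h1, h2 = view_graph @ P1, view_census @ P2

def correlation_term(h1, h2):
    """Sum of per-dimension correlations between the two projections; CorrNet
    training maximises this alongside the per-view reconstruction losses."""
    a = h1 - h1.mean(axis=0)
    b = h2 - h2.mean(axis=0)
    return float(np.sum((a * b).sum(axis=0) /
                        np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))))

print(round(correlation_term(h1, h2), 3))
```

Maximising this term pulls the graph-based and attribute-based representations towards a common subspace, which is how the joint representation counterbalances oversmoothed graph embeddings.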
We implemented another set of approaches using StellarGraph, combining the embeddings generated by GAEs based on GCN and GraphSAGE with the original 167 z-scores as input to a CorrNet (referred to as GCN-CN and GS-CN, respectively, below). In developing the approaches based on GraphSAGE, we also experimented with two additional aggregation approaches beyond the mean aggregator. First, we tested the attentional operator (GAT) (Veličković et al. 2018), which implements the attention mechanism (Vaswani et al. 2017) in the node-value aggregation process. The attention mechanism works here by comparing every node in the graph to each of its connected neighbours, including itself, and reweighting the embeddings of each node to include contextual relevance (i.e., the neighbours). That is, the graph embeddings are produced by computing the importance of each node in the graph based on its neighbours. Second, we tested the pooling aggregator, which applies an element-wise pooling operation to generate aggregate values from the features of the neighbours: max-pooling calculates the maximum value for each hidden-layer output matrix, and mean-pooling calculates the average value.
To learn the node embeddings, GraphSAGE first performs a series of random walks (Pearson 1905) through the graph, generating a large set of "positive" pairs of nodes, where two nodes co-occur within a set context window. A set of "negative" pairs of nodes of the same size is also generated through random sampling according to the global node degree distribution of the graph. GraphSAGE then learns a simple binary classification model able to predict whether two arbitrary nodes are likely to co-occur in a random walk, which requires the creation of node embeddings in a high-dimensional vector space, where proximity equates to closeness in the graph structure and similarity of the represented values. As the learning process is affected by the length of the random walks, we also explored different walk lengths, observing that longer walks result in worse classifications.
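The positive-pair generation step can be sketched as follows (an illustrative Python toy, not the StellarGraph implementation; the graph, walk length and window size are hypothetical):

```python
import random

def random_walk(adj, start, length, rng):
    """A uniform random walk of `length` nodes over an adjacency dict."""
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(adj[walk[-1]]))
    return walk

def positive_pairs(adj, walk_length, window, rng):
    """Collect (node, context) pairs that co-occur within `window` steps
    of one random walk started from each node."""
    pairs = []
    for start in adj:
        walk = random_walk(adj, start, walk_length, rng)
        for i, u in enumerate(walk):
            for v in walk[i + 1 : i + 1 + window]:
                pairs.append((u, v))
    return pairs

# Hypothetical 4-node cycle graph: 0-1-2-3-0.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
pairs = positive_pairs(adj, walk_length=4, window=2, rng=random.Random(42))
print(len(pairs))  # 5 pairs per 4-step walk, one walk per node: 20
```

Longer walks enlarge the context window's reach across the graph, which is consistent with the observation above that longer walks blur locality and degrade the resulting classifications.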

Node attributes-focused graph autoencoder
In our final approach, we explored a different strategy for addressing the issue of oversmoothing by restructuring the GAE decoder to focus on node attributes instead of the adjacency matrix. This approach is akin to the GALA architecture proposed by Park et al. (2019), which is designed to reconstruct node attributes using a decoder that mirrors the encoder. Park et al. (2019) demonstrate how common graph convolutional layers are unsuitable for the decoding process, as they apply a Laplacian smoothing which disrupts the reconstruction of the node attributes. As such, they propose the use of Laplacian sharpening layers to counterbalance the Laplacian smoothing introduced in the encoding process by the graph convolutional layers. In this paper, we simplify the architecture through a compromise solution, adopting a decoder composed of simple linear layers, although we plan to explore the use of Laplacian sharpening layers in future work. Moreover, compared to the GALA approach, which utilises GCN convolutional layers, we decided to adopt the attention mechanism (Vaswani et al. 2017) through the use of GAT convolutional layers (Veličković et al. 2018).
To identify the best design for this architecture, inspired by the approach described by You et al. (2020), we conducted a systematic search of the design space as outlined below, implemented in PyTorch Geometric (Fey and Lenssen 2019).
• 1 or 2 post-processing linear layers with 60 output features.
• A decoder composed of 1, 2 or 4 linear layers with either 60 or 167 output features, where the last layer always had 167 output features.
All linear layers were followed by a leaky ReLU activation function with a 0.02 negative slope, except for the last layer of the encoder, which was followed by a batch normalisation layer (to obtain normalised hidden features for the clustering step), and the last layer of the decoder. The same 0.02 negative slope was applied to the GAT layers. We used mean squared error as the loss function and an AdamW optimiser with a 0.0001 learning rate for a maximum of 5000 epochs, with early stopping if there was no improvement for 300 epochs, and batches of 256, 512, 1024 and 2048 nodes generated through a GraphSAINT node sampler (Zeng et al. 2020). As the design space outlined above, including all three spatial graphs defined in Section 3.1.1, resulted in 10368 possible combinations, we followed an iterative process, sampling 100 random designs and removing the options that seemed to produce worse results. We then repeated the same process for another random sample of 100 of the 1152 remaining designs and finally tested the remaining 288 designs.
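The iterative search over the design space can be illustrated with the following Python sketch; the option lists are hypothetical stand-ins, as only part of the actual design space is reproduced in the list above (the full space yields 10368 combinations).

```python
import itertools
import random

# Hypothetical option lists standing in for part of the design space.
space = {
    "spatial_graph": ["Queen's", "KNN8", "MDT"],
    "post_layers": [1, 2],
    "decoder_layers": [1, 2, 4],
    "batch_size": [256, 512, 1024, 2048],
}
designs = [dict(zip(space, combo))
           for combo in itertools.product(*space.values())]
print(len(designs))  # 3 * 2 * 3 * 4 = 72 combinations in this toy space

# First step of the iterative search: evaluate a random sample of
# designs, then prune the options that appear in the worst performers.
rng = random.Random(0)
sampled = rng.sample(designs, 10)
print(len(sampled))  # 10
```

Each pruning round shrinks the Cartesian product, so the final exhaustive pass only needs to test the small number of surviving designs.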
Figure 1 illustrates the best-performing design of our node attributes-focused graph autoencoder (NAGAE) framework. The model (NAGAE-d1) is among the simplest in our design space, including a preprocessing layer with 60 output features, two GAT layers with 2 attention heads and a zero edge dropout rate, one post-processing layer and a single layer in the decoder, trained for 1167 epochs with batches of 1024 nodes and using the Queen's spatial graph. However, while NAGAE-d1 achieves the best overall performance, we notice a lower performance when using the KNN8 and especially the MDT spatial graphs. As such, in the following sections, we also showcase another model (NAGAE-d2), which is similar to NAGAE-d1 but uses 2 post-processing layers and a 0.5 edge dropout rate. NAGAE-d2 achieves a very similar, although slightly lower, performance compared to NAGAE-d1 when using Queen's and far better performance when using KNN8 and MDT.

Results
The outcomes of our evaluation are reported in Table 1, while Figure 2 showcases the key patterns emerging from those scores, illustrating how the SED measures and the number of matching OAs (on the y-axis) relate to the spatial clustering measure (on the x-axis). Both Figures 2a and 2b show similar patterns, including a string of cases starting at the bottom centre around the LOAC and ending in the middle right around the NAGAE models, indicating an overall consistent assessment of class homogeneity. Figure 3 further illustrates the results through a series of maps, including the LOAC (Figure 3a) and a selection of the classifications obtained through the baselines and the approaches presented in this paper. The colours have been chosen to maximise the correspondence with the LOAC's classes, thus visualising the matching score presented in the last column of Table 1 and on the y-axis of Figure 2c.
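Assuming SED denotes the mean squared Euclidean distance of each OA's attribute vector from its class centroid (the precise definition is given in Section 3.2 and not reproduced here), a minimal Python sketch of such a class homogeneity score would be:

```python
from collections import defaultdict

def sed(vectors, labels):
    """Class homogeneity sketch: mean squared Euclidean distance of each
    area's attribute vector from its class centroid (lower is better).
    This definition is an assumption standing in for Section 3.2."""
    groups = defaultdict(list)
    for vec, lab in zip(vectors, labels):
        groups[lab].append(vec)
    total = 0.0
    for members in groups.values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sum((x - c) ** 2 for x, c in zip(vec, centroid))
                     for vec in members)
    return total / len(vectors)

# Two tight toy clusters: labels matching the clusters score far lower
# (more homogeneous) than labels mixing them.
vectors = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.0, 5.2]]
print(sed(vectors, [0, 0, 1, 1]))  # ≈ 0.01
print(sed(vectors, [0, 1, 0, 1]))  # ≈ 12.51
```

Under this reading, a lower SED indicates classes whose member areas are more similar to one another, which is how the scores are interpreted throughout this section.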
In Figures 2a and 2b, a set of approaches is visible near the LOAC (in dark green), including the two approaches using Attri2Vec and Node2Vec (in purple) and all but one of the GraphSAGE-CorrNet approaches using the 167 z-scores and the KNN8 spatial graph (in dark red in Figure 2). The LOAC performs better than these approaches when measured against SED 60 k-vars, as the classification has been optimised towards that specific measure, but slightly worse when measured against SED 167 z-scores, which is the basis on which the other models have been built. Overall, this group of approaches does not seem to provide a sizable advantage compared to the LOAC regarding the final output. At the same time, it is important to highlight how these approaches provide classifications of similar quality to the LOAC without requiring the long manual preprocessing needed to create the 60 k-vars.
Moving further to the right in Figures 2a and 2b, we find a series of models that result in higher SED scores (thus lower classification homogeneity) than the LOAC but far higher spatial clustering scores. These include the classifications obtained using SFCM on the 60 k-vars (in dark yellow) and those obtained using PCA and SFCM on the 167 z-scores (in light yellow), which perform very similarly to one another: the former scores better for SED 60 k-vars, the latter for SED 167 z-scores, as expected. Close to those classifications, we also find the classification obtained using GCN-CorrNet with KNN8 on the 167 z-scores, which completely overlaps with the classification obtained using PCA and SFCM with Queen's on the 167 z-scores in Figure 2a. However, the latter classification shows better class homogeneity based on SED 167 z-scores and higher similarity to the LOAC. That is illustrated in Figures 3c (SFCM) and 3e (PCA and SFCM), which resemble the LOAC in Figure 3a more closely than Figure 3b (GCN-CN). For instance, GCN-CN shows much larger regions of OAs assigned to the same class, such as class 3 (red), which spans large parts of inner London, including areas classified both as class B "High Density & High Rise Flats" and class E "City Vibe" (red and green in Figure 3a), whereas the two are more distinguishable in Figures 3c and 3e (green and red).
Arriving at the middle right-most section of Figures 2a and 2b, we find the classifications based on the NAGAE models (dark and light blue), which achieve better spatial clustering than the classifications based on both SFCM and CorrNet, while maintaining similar class homogeneity scores. Looking at Figure 2a, we can see how the classification based on NAGAE-d1 with Queen's achieves similar SED 60 k-vars class homogeneity scores to the classification obtained using PCA and SFCM on the 167 z-scores and the one obtained using GCN-CN with KNN8 on the 167 z-scores (0.958 compared to 0.932 and 0.936) but a far higher spatial clustering score (93.05% compared to 82.94% and 83.21%). The classification obtained using SFCM on the 60 k-vars and Queen's still achieves better class homogeneity (0.888) but a lower spatial clustering score (85.72%). When looking at Figure 2b and SED 167 z-scores, the classification obtained using NAGAE-d1 achieves lower class homogeneity than the one obtained using PCA and SFCM (120.3 compared to 114.4) and is closer to the classifications obtained using SFCM on the 60 k-vars and GCN-CN (both scoring 118.2). In both Figures 2a and 2b, if we draw a line across the plane delimiting the border of the best results (lower SED scores and higher spatial clustering), we can see that the classification obtained using NAGAE-d1 is on the border in both cases, whereas the classification based on SFCM is on the border only for Figure 2a and the classification based on PCA and SFCM only for Figure 2b. Moreover, in Figure 2c, we can see how NAGAE-d1 provides a classification that is more similar to the LOAC than NAGAE-d2 while achieving overall the same scores in class homogeneity and spatial clustering. That seems to indicate that NAGAE-d1 offers the best overall compromise classification based on our evaluation framework.
Those results are further illustrated by Figure 3d, where the spatial distribution of classes very closely resembles Figures 3a, 3c and 3e while maintaining a higher spatial clustering than all of them. Moreover, we can see how classes 1 and 7 (red and green) in Figure 3d map more closely onto the LOAC class B "High Density & High Rise Flats" and class E "City Vibe" (red and green in Figure 3a) than the classification in Figure 3b based on the GCN and CorrNet model.
Finally, on the top right of Figures 2a and 2b, we can see the two approaches based only on GraphSAGE, resulting in far less homogeneous classes, seemingly focusing solely on spatial clustering.That is illustrated well by Figure 3f, which maintains some of the overall structure of the LOAC but shows a set of almost completely uniform patches of colour.

Discussion and conclusions
In the introduction, we set out to explore and examine a new perspective on geodemographic classification. We posed the question of whether the use of deep learning approaches could render the creation of such classifications easier by providing an alternative to the time-consuming process of attribute selection and transformation while also incorporating the spatial aspect of the geographic data. The results presented above are promising, but they also illustrate the drawbacks of using such approaches. A key challenge in developing new methods for geodemographic classification is that, compared to other fields of GeoAI, there is no ground truth or benchmark by which new approaches can be measured. Summary measures such as SED for class homogeneity and the number of spatially clustered OAs cannot ultimately assess whether a classification adequately represents the socio-economic situation on the ground. At the same time, the results outlined above do allow us to reach three main conclusions.
First, approaches based on GNNs can create geodemographic classifications with minimal attribute preprocessing. The use of classic dimensionality reduction approaches such as PCA has been criticised, as such approaches tend to maintain either the global or the local properties of the attributes. We demonstrated that approaches based on GNNs can produce classifications from a simple transformation of a set of prospective attributes (167 z-scores) that are of higher quality than classifications based on PCA and SFCM when measured on spatial clustering, and of comparable quality when measured on class homogeneity.
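The simple transformation referred to above is z-score standardisation of each prospective variable across all OAs; a minimal Python sketch (assuming the population form of the standard deviation, which may differ from the exact formulation used) is:

```python
import math

def z_scores(values):
    """Standardise one variable across all areas: subtract the mean and
    divide by the standard deviation (population form assumed here)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

# One hypothetical census variable observed across five areas.
column = [2.0, 4.0, 4.0, 4.0, 6.0]
standardised = z_scores(column)
print([round(v, 2) for v in standardised])  # [-1.58, 0.0, 0.0, 0.0, 1.58]
```

Applied column by column to the 167 prospective attributes, this places all variables on a common scale without the manual selection and transformation work required to build the 60 k-vars.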
Second, approaches based on GNNs can create geodemographic classifications that effectively incorporate neighbouring effects. Our NAGAE approach, based on a GAE with GAT convolutional layers designed to focus on the reconstruction of node attributes instead of the adjacency matrix, allows us to achieve a balanced geospatial convolution that avoids oversmoothing. The design presented here as NAGAE-d1 seems to achieve the best overall results, demonstrating the capability of improving the spatial clustering produced by approaches based on SFCM while also capturing the internal structure of Greater London in a manner closer to the LOAC than other GNN-based designs.
Third, classification results can vary starkly based on the spatial graph used and how that is combined with the GNN hyperparameters. How areal units are linked through a geospatial graph greatly impacts how the node attribute information (i.e., the census variables associated with each OA) is aggregated in the graph convolution process. For instance, the Queen's and KNN8 spatial graphs demonstrate an advantage over the distance-based graph MDT as defined in our experiments. The latter creates a graph linking each OA to all OAs within about 2 km, generating a large number of edges. As a result, the models seem either to oversmooth irreparably or to learn weights that completely disregard the information coming from the neighbours, depending on the nature and capabilities of the architecture. However, meaningful results can be obtained even with such dense geospatial networks by introducing a large dropout rate, which randomly ignores some of the edges, as shown by the NAGAE-d2 design.
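Edge dropout of the kind used by NAGAE-d2 can be sketched in Python as follows (an illustrative toy, not the PyTorch Geometric implementation; in training, the retained edge set is typically resampled at every epoch):

```python
import random

def drop_edges(edges, rate, rng):
    """Randomly ignore a fraction `rate` of the edges, as with the 0.5
    edge dropout rate applied by NAGAE-d2 to dense spatial graphs."""
    return [edge for edge in edges if rng.random() >= rate]

# A dense toy graph: all 15 pairs among 6 nodes, standing in for the
# many edges an MDT-style distance graph generates.
edges = [(i, j) for i in range(6) for j in range(i + 1, 6)]
kept = drop_edges(edges, 0.5, random.Random(1))
print(len(edges), len(kept))  # roughly half the edges survive
```

By thinning the neighbourhood at each pass, dropout limits how much information is averaged into each node, which is consistent with NAGAE-d2 resisting oversmoothing on the dense MDT graph.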
The proposed framework also presents three key challenges, which we aim to explore in our future research. First, developing such models requires expertise in deep learning and knowledge of the related libraries and development environments, and thus a higher level of technical expertise compared to classic models such as k-means. Moreover, deep learning approaches allow for a wide range of designs that require specialised expertise, and most AI fields and tasks have developed common practices, as GIScience did in the development of classic approaches to geodemographic classification. A similar development of shared expertise will be required before the time saved in the attribute selection step is no longer lost to the model design step. Second, our framework currently provides no specific procedure to assess the optimal number of classes. While the analysis presented here was based on the eight supergroups of the LOAC, the classic "elbow" approach used in creating classifications based on k-means could be adopted, although that might not be the optimal approach. Third, our models provide much lower explainability, as is generally the case for deep learning approaches. At the time of writing, explainable AI (XAI) is an active field of research in GNNs (see, e.g., Ying et al. 2019, Huang et al. 2022), as well as in GIScience (Xing and Sieber 2023), which will allow us to gain a better understanding of the inner workings of models developed through GNN architectures.
The key opportunity that the models presented in this paper open up is a new way to frame geodemographic classifications, which we aim to explore in our future research. The flexibility and complexity of GNNs might allow analysts to take better control over how neighbouring values are aggregated and combined, as well as how neighbouring areas are defined. First, the flexible nature of the general framework defined by Hamilton et al. (2017) allows for the implementation of a wide range of operators. For instance, Alexiou (2016) highlights the importance of integrating "geographic sensitivity" (Alexiou 2016, p. 123) into a geodemographic classification by incorporating a level of spatial dependency, which can be beneficial for policy-making where the local context is more relevant than a comparison to national-level norms. The design proposed here would not be able to produce a classification that incorporates spatial dependency in a fashion similar to the approach proposed by Alexiou (2016), as that would require the layers to incorporate the local standard deviation for each node into the weights, which are instead global. However, it might be possible to apply a similar comparison to local values as part of the graph convolution by integrating it into the aggregate-and-combine process. Second, the graphs used for the convolution process do not need to be based on distance, as is the case here. Other types of geospatial networks could be included, such as travel time or commuting patterns, or even non-(only-)spatial relationships, such as virtual interactions. Moreover, the design might not be limited to simple graphs but could be expanded to account for multi-graphs (Butler et al. 2023), including multiple edges between pairs of nodes that could encode a wide range of geospatial and non-spatial relationships.
Overall, our graph neural network framework for geodemographic classification has the potential to develop into a wide range of approaches. On the one hand, it can allow the simple, quick and effective creation of specialised spatial geodemographic classifications, which could be developed for and in collaboration with single local authorities and communities, thus benefiting a local understanding of our cities and counties and driving a positive impact on local policies and development. On the other hand, it can be expanded in its complexity to explore classifications incorporating a wide range of relationships between areal units, both within and beyond the geospatial realm.

Figure 1. An overview of the NAGAE framework, showcasing design 1 (NAGAE-d1). Map data source: CDRC LOAC Geodata Pack by the ESRC Consumer Data Research Centre; Contains National Statistics data Crown copyright and database right 2015; Contains Ordnance Survey data Crown copyright and database right 2015.

Figure 2. Visual representation of the results presented in Table 1.

Figure 3. On the left, the spatial distribution of the LOAC (Singleton and Longley 2015) supergroups (a) and two baselines (see Section 3.3): SFCM on the 60 k-vars using Queen's (c) and PCA and SFCM on the 167 z-scores using Queen's (e). On the right, the spatial distribution of the classes obtained using three GNN-based approaches: GCN-CN on the 167 z-scores using KNN8 (b, see Section 3.5), NAGAE-d1 on the 167 z-scores using Queen's (d, see Section 3.6) and GraphSAGE on the 167 z-scores using KNN8 (f, see Section 3.4). Data source: CDRC LOAC Geodata Pack by the ESRC Consumer Data Research Centre; Contains National Statistics data Crown copyright and database right 2015; Contains Ordnance Survey data Crown copyright and database right 2015.

Table 1. Results obtained by each classification for the four metrics described in Section 3.2. GS stands for GraphSAGE and CN stands for CorrNet.