Applications of federated learning in smart cities: recent advances, taxonomy, and open challenges

Federated learning (FL) plays an important role in the development of smart cities. With the evolution of big data and artificial intelligence, issues related to data privacy and protection have emerged, which can be solved by FL. In this paper, the current developments in FL and its applications in various fields are reviewed. With a comprehensive investigation, the latest research on the application of FL is discussed for various fields in smart cities. We explain the current developments in FL in fields, such as the Internet of Things (IoT), transportation, communications, finance, and medicine. First, we introduce the background, definition, and key technologies of FL. Then, we review key applications and the latest results. Finally, we discuss the future applications and research directions of FL in smart cities.


Introduction
In a smart city, different types of electronic Internet of Things (IoT) sensors are implemented to collect data (Mahmud et al., 2018) in urban environments. Although the existing urban internet architecture is complex, decision makers can use the insights gained from these data to effectively manage assets, resources, and services in urban areas. The operating model of a smart city involves the use of IoT sensors to collect data. Furthermore, it can realise effective applications in a series of fields, such as urban public services, resource allocation, and communications. Simultaneously, smart cities have provided efficient solutions for key issues, such as the development of IoT (Li, Zhao, et al., 2020), medical care (Rath & Pattanayak, 2019), transportation (Qiu et al., 2019), and communications (Guan et al., 2018).
In this large-scale information exchange process, sensors generate a large amount of data. These data are of great significance for improving the application of a programme and helping managers optimise their decisions. However, a large proportion of data is sensitive and involves user-generated private information (Khan et al., 2019). First, we must prevent the inclusion of private data of users during data processing. Additionally, problems related to low data resource utilisation and network congestion (McMahan, Moore, et al., 2017) exist in the process of data interaction.
Currently, the self-organisation theory (J. , machine learning (ML) (Li, Zhao, et al., 2020), edge computing nodes (Rahman et al., 2019), system simulation (Lv et al., 2019), and other computing implementations have large network bottlenecks in practical applications related to smart cities. In addition, the problem of low efficiency when using network resources still exists; therefore, a distributed learning paradigm is required. Distributed frameworks can reduce network bottlenecks. The problem of user privacy is solved through the collaborative sharing model of IoT devices. Some examples of distributed organisation computing are mentioned in current smart city applications. However, these practical applications have not adequately solved the problem of user privacy (Zhou et al., 2017).
FL can address the aforementioned problems. Under an FL framework, users can benefit from the data of other participants without obtaining their private information, because the related data are stored locally (McMahan, Moore, et al., 2017). Each client periodically uploads its local model gradients to the coordination server. The server organises the training and measures the contribution of all participants (Smith, Forte, et al., 2017). It constructs a global model by averaging the gradients received from the network at the server level. Subsequently, the coordination server distributes the new model update to all clients (Lim, Luong, et al., 2020). The users then download the updated model and run inference on their own devices. This cycle is the complete operating principle of FL algorithms.
FL has the advantages of distributed processing and effective privacy protection. Some common distributed communication devices, such as mobile phones, have communication transfer problems. To address the domain-transfer problem, a federated domain-adaptation method has been proposed that preserves data privacy and improves efficiency (Peng et al., 2019). Meanwhile, some scholars have implemented a blockchain FL (BlockFL) architecture in which local learning model updates are exchanged and verified. By considering communication and consensus delays, it characterises the optimal block generation rate (Kim et al., 2019). Research related to FL has been conducted in the fields of IoT, communications, and public services. These practices promote the updating and development of applications in smart cities. Figure 1 shows the applications of FL in fields related to smart cities.
Thus far, FL has not been widely used in smart city development, which motivated this comprehensive investigation. This study makes the following contributions.
• This work introduces the background, definition, and key technologies of FL.
• This work classifies and summarises the latest research on the application of FL in smart cities. Simultaneously, we review key technologies and the latest results in Figure 2.
• We discuss the future development of applications and research directions for FL in smart cities.
The remainder of this paper is organised as follows. Section 2 introduces the definition and key technologies of FL. Section 3 explains the applications of FL in the IoT system  of smart cities. Then, Section 4 discusses the applications of FL in transportation systems. Section 5 presents the applications of FL in the financial field of smart cities, and Section 6 introduces the applications of FL to the medical field of smart cities. Section 7 explains the communication of FL in smart cities. Section 8 discusses the future developments and directions for FL in smart cities. Finally, the conclusions are presented.

Definition and key technologies of federated learning
The concept of FL has been widely proposed (Konečný et al., 2016), implemented, and applied in various fields. Most of the existing large-scale studies have implemented distributed learning in the development of big data (Dayarathna et al., 2017) and cloud computing (Dean et al., 2012). FL builds an ML model from data distributed across multiple devices without collecting those data centrally, which addresses the problem of data privacy. Currently, the use of distributed computing agents is increasing rapidly, and FL has become an effective solution because it protects user privacy in the process of information and knowledge sharing (Smith, Chiang, et al., 2017). For example, mobile devices already exhibit smart behaviours: mobile phones and tablets use image classification to predict the category of pictures that have been previewed multiple times (Ji et al., 2019). FL uses such data and information processing to improve user experience. Moreover, several insurance companies are concerned about protecting their data, which they are unwilling to share with other entities (G. . In this case, multiparty data can be used in the FL framework because it solves the privacy problem of ML. Recent research improvements in FL have mainly focussed on statistical challenges (Smith, Chiang, et al., 2017) and security issues (Bonawitz et al., 2017). Simultaneously, research has made FL more personalised (F. Chen et al., 2018). This work involves factors such as data interaction among distributed mobile users, unbalanced data distribution, communication costs, and equipment reliability. It can inspire researchers to continue overcoming challenges related to data privacy, computational constraints, and communication costs. In addition, the concept of FL has been extended to include other collaborative learning programmes between organisations.
Here, we provide a preliminary explanation of the extension of the original concept of FL to other distributed collaborative ML. We further investigate the application of FL in smart cities and discuss its development status and future directions. In this section, we provide a more comprehensive overview of FL and consider its definition, privacy, training process, and classification structure.

Basic definition of FL
We define $N$ data owners as $F_1, F_2, \ldots, F_N$, who hope to train their own ML models by merging their respective data, $D_1, D_2, \ldots, D_N$. A conventional method combines all data and uses $D = D_1 \cup D_2 \cup \cdots \cup D_N$ to train and obtain the model $M_{SUM}$. FL is a systematic learning process in which the data owners jointly train a model $M_{FED}$, and data owner $F_i$ does not disclose its own data, $D_i$, to others. In addition, the accuracy of $M_{FED}$, denoted $V_{FED}$, should be close to the accuracy of $M_{SUM}$, denoted $V_{SUM}$. Formally, let $\varepsilon$ be a non-negative real number; if $|V_{FED} - V_{SUM}| < \varepsilon$, then we say that the FL algorithm has an accuracy loss of $\varepsilon$.

Privacy technologies of FL
Privacy management is one of the core elements considered in FL. Meeting this requirement calls for an analysis of the available security models. In this section, we briefly describe the privacy technologies that are currently used for FL.

Secure multi-party computation (SMC) model
SMC models involve data from multiple parties and provide security proofs within a well-defined simulation framework. They guarantee complete zero knowledge; that is, each party knows nothing except its own input and output. This zero-knowledge property is highly desirable, but it usually requires complex computation protocols. If the disclosure of part of the knowledge is acceptable, a security model with lower security requirements can be built on SMC in exchange for higher computing efficiency. In addition, multiparty computation protocols can be used for model training and verification without requiring users to disclose privacy-sensitive data (Kilbertus et al., 2018). However, SMC still has limitations. First, as a data privacy protection algorithm, it cannot cope with a curious server, although it gives the FL system an acceptable defense against privacy attacks from other clients. Second, because SMC is a four-round interactive protocol, the server does not learn the client data before the submission phase is completed, which wastes data and reduces the model accuracy.
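The idea that parties can compute a joint result while seeing only their own input and the output can be illustrated with additive secret sharing. The sketch below is a simplified stand-in, not the four-round protocol mentioned above; the modulus `PRIME` and the function names `share` and `secure_sum` are assumptions chosen for illustration.

```python
import secrets

# Field modulus (an assumption; any prime larger than the true sum works).
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split an integer into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_sum(private_values):
    """Compute the sum of private values while revealing only sums of shares."""
    n = len(private_values)
    # all_shares[i][j] is the share that party i sends to party j.
    all_shares = [share(v, n) for v in private_values]
    # Party j adds the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    return sum(partial_sums) % PRIME


values = [12, 7, 30]  # hypothetical private inputs of three parties
total = secure_sum(values)
```

Each individual share is uniformly random, so a single share reveals nothing about the value it came from; only the combination of all partial sums yields the total. A production protocol would also have to handle dropped clients and colluding parties, which this sketch ignores.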

Differential privacy
Existing research uses differential privacy technology to ensure data privacy. These methods process the data, for example by adding noise, and mask certain privacy-sensitive attributes until third parties can no longer distinguish individual users; thus, the data become unrecoverable and user privacy is maintained. However, the disadvantage is that the data must still be transferred to other locations, and the processing may affect the accuracy of the data. Therefore, a tradeoff must be made between accuracy and privacy. Currently, many applications have implemented this privacy-processing method. Some researchers have proposed a differential privacy method for FL that hides client contributions during the training process to protect client data (Geyer et al., 2017).
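One common way to hide an individual client's contribution is to bound each update and add noise to the aggregate. The following is a minimal sketch of that idea, assuming L2 clipping followed by Gaussian noise; the names `clip_update`, `dp_average`, `clip_norm`, and `noise_multiplier` are illustrative, and no formal privacy budget is computed here.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale an update down so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def dp_average(updates, clip_norm, noise_multiplier, rng):
    """Average clipped updates after adding Gaussian noise to their sum."""
    clipped = [clip_update(u, clip_norm) for u in updates]
    n = len(clipped)
    dim = len(clipped[0])
    # Noise scale grows with the clipping bound (the sensitivity of the sum).
    sigma = noise_multiplier * clip_norm
    summed = [sum(u[d] for u in clipped) for d in range(dim)]
    return [(s + rng.gauss(0.0, sigma)) / n for s in summed]


rng = random.Random(0)
updates = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]  # hypothetical client updates
noisy = dp_average(updates, clip_norm=1.0, noise_multiplier=0.5, rng=rng)
exact = dp_average(updates, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
```

Comparing `noisy` with `exact` shows the accuracy and privacy tradeoff described above: a larger `noise_multiplier` hides individual contributions better but moves the aggregate further from the noise-free average.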

Homomorphic encryption
Homomorphic encryption is an encryption mechanism that protects the privacy of user data by exchanging parameters in encrypted form during the ML process (Giacomelli et al., 2018). Unlike differential privacy, neither the data nor the models themselves are transmitted, and they cannot be inferred from the encrypted exchange. Therefore, its advantage is that the probability of leakage of the original data is minimal. In practice, the additive homomorphic encryption model (Acar et al., 2018) is widely used.
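To illustrate what additive homomorphism means in practice, the sketch below implements a toy version of the Paillier cryptosystem, a well-known additively homomorphic scheme. It is an illustration under assumptions: the primes are far too small for real use, updates are assumed to be small non-negative integers, and the function names are hypothetical rather than taken from any cited work.

```python
import math
import secrets


def keygen(p=1789, q=1847):
    """Toy key generation from two small primes (insecure, for illustration)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid because the generator is n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under the public key n."""
    n_sq = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(n, private_key, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


n, private_key = keygen()
client_updates = [15, 27, 8]  # hypothetical integer-encoded updates
ciphertexts = [encrypt(n, u) for u in client_updates]
aggregate = ciphertexts[0]
for c in ciphertexts[1:]:
    aggregate = add_encrypted(n, aggregate, c)
total = decrypt(n, private_key, aggregate)
```

The aggregator only ever handles ciphertexts, yet the key holder can decrypt the sum of all updates. Real-valued model parameters would first need a fixed-point encoding, which is omitted here.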

Typical architecture and training process of FL systems
In an FL training system, the data owners act as participants and jointly train a shared model that is aggregated at a central server. The basic premise of this architecture is that the data owners are honest and the data are accurate, which requires the participants to use real private data for training. After training is completed, the parameters of the locally trained model are submitted to the FL server.
Generally, the FL training process includes the three steps described below. We first define a local model as the model trained on each participating device, and the global model as the model obtained after aggregation by the FL server.
• Step 1. Initialise a task. The server determines the training task; in other words, the target application and the corresponding data requirements are determined. Meanwhile, the server specifies the global model and the training hyperparameters, such as the learning rate. The server initialises the global model parameter $W_G^0$ and assigns the training task to the participating users.
• Step 2. Train and update the local model. Training is performed based on the global model $W_G^t$, where $t$ is the current iteration index. Each participating client uses its local data and equipment to update the local model parameter $W_i^t$. The goal of participant $i$ in iteration $t$ is to find the optimal parameter $W_i^{t*}$ that minimises the loss function $L(W_i^t)$, namely $W_i^{t*} = \arg\min_{W_i^t} L(W_i^t)$.
• Step 3. Aggregate and update the global model. The server aggregates the local models of the participating users and sends the updated global model parameter $W_G^{t+1}$ back to the data owners. The server seeks to minimise the global loss function $L(W_G^t)$.
Steps 2 and 3 are repeated until the global loss function converges or the required training accuracy is reached.
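The three steps above can be sketched with a deliberately small example. The code assumes a one-parameter model $y = w x$, a squared-error loss, plain gradient descent on each client, and an unweighted average at the server; the function names and the toy data are illustrative only, and practical systems often weight the average by local dataset size.

```python
def local_update(w_global, data, lr, epochs):
    """Step 2: a client refines the global parameter on its own data (y ~ w * x)."""
    w = w_global
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_averaging(client_data, rounds, lr=0.05, epochs=5):
    """Run the initialise / local-train / aggregate loop for a fixed number of rounds."""
    w_global = 0.0  # Step 1: the server initialises the global parameter
    for _ in range(rounds):
        # Step 2: every client trains locally, starting from the global model.
        local_ws = [local_update(w_global, d, lr, epochs) for d in client_data]
        # Step 3: the server averages the local models into a new global model.
        w_global = sum(local_ws) / len(local_ws)
    return w_global


# Hypothetical private datasets; all follow y = 3x, but no client shares its points.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5)],
    [(3.0, 9.0)],
]
w = federated_averaging(clients, rounds=20)
```

Only model parameters travel between clients and server; the `(x, y)` pairs never leave the client lists, which is the property the architecture is designed to provide.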

Application of FL to IoT system in smart cities
The development of IoT and the application of FL provide technical support for the transformation and progress of smart cities. However, many user privacy and information security issues have been exposed. The "FL + IoT" framework has solved many of these problems. A scalable FL production system has been built for mobile devices (Bonawitz et al., 2019; Figure 3), which has improved the system architecture design. In addition, the combination of blockchain and FL constitutes the BlockFL architecture, which enables the performance of different terminals to be compared (Kim et al., 2019). The following factors must be considered in the process of realising these applications:
• Privacy. A core objective of FL is to protect the private information of users. Recent research has shown that malicious participants or FL servers may be present in the FL process, potentially resulting in privacy and security issues and generating corrupted global models. Malicious users can infer sensitive information, such as the gender, occupation, and location of users, from the models shared by other participants. For example, researchers trained a binary gender classifier on the FaceScrub dataset and found that, by inspecting the shared model, whether the input of a participant was included in the dataset could be inferred with an accuracy as high as 90% (Melis et al., 2019).
• Security. During the FL training process, the participating users train the learning model locally and share the training parameters with other participants, which can improve the forecast accuracy. However, this process is vulnerable to various attacks in which data and models go missing or become corrupted. In such attacks, malicious users may send incorrect parameters or corrupted models, so that the global model is updated incorrectly and the entire learning system is damaged. Simultaneously, loopholes in the FL protocol may result in data privacy issues. Lyu et al. (2020) analysed and investigated the threat models and attack methods under this behaviour.

Data application scenarios under IoT
Some scholars have proposed a novel FL framework for efficient communication and privacy protection that improves the performance of IoT. The framework stabilised the dynamics of TCP CUBIC flows (Transmission Control Protocol with CUBIC congestion control) over WiFi networks and finally obtained a well-trained model (Pokhrel & Choi, 2020b). A joint cloud video recommendation framework based on deep learning (DL) has been created for mobile IoT to meet user requirements; it uses quantisation methods to reduce the uplink communication cost and network bandwidth (Duan et al., 2019). In addition, FL enables resource-constrained edge computing devices to learn shared predictive models (Y. Zhao et al., 2018).

Blockchain federated learning (Block FL)
The development of blockchain technology has provided a new direction for IoT. The BlockFL architecture efficiently updates local learning models; it uses a consensus mechanism and supports effective performance analyses (Kim et al., 2019). In industrial IoT, some researchers have designed a secure data sharing architecture authorised by blockchain, which maintains data privacy effectively through a shared data model. Evaluated on real-world datasets, it achieves good accuracy, high efficiency, and safety (Lu et al., 2019). The existing FL methods are based on the semi-honest assumption that clients faithfully carry out SMC, which makes them vulnerable to attacks from malicious clients. Awan et al. (2019) proposed a blockchain-based privacy-preserving FL (BC-based PPFL) framework that relies on the immutability and decentralisation of blockchain. It ensures that the local model is safely updated and the data sources are reliable.

Federated learning for visual detection protection
In traditional security systems, video collected by cameras is often used as the basic data in urban communities. The video is processed by a computer system, while a monitoring room serves as the monitoring link, supplemented by manual detection of unsafe actions. This process is time-consuming, labour-intensive, and inefficient to manage.
In addition, the movement of people is unpredictable, and abnormalities in special populations (such as the elderly, drug users, and criminals) cannot be discovered in time. The existing definition of exceptions relies on manual rules, so early warnings contain errors and missed judgments. Moreover, the monitoring data are not related to each other, and data barriers exist between them.
Visual object detection based on FL provides a good solution to the above problems and has been used in many applications, such as fire hazard monitoring. FedVision is a computer vision application platform that supports FL and can provide security monitoring solutions for smart cities (Y. . The preset algorithm completes pre-warning by training the model. It carries out the following steps: (1) high-precision focussed shooting; (2) judging the location and identifying actions; (3) analysing behaviour and predicting user travel trajectories and abnormal trips. This greatly improves community safety and community management efficiency. FL uses multi-community data to build a security model and finally realises the interconnection and intercommunication of community information.

Federated learning for edge computing
FL has been combined with edge computing and has achieved good practical results (Valêncio et al., 2020). The use of edge and terminal computing can meet the requirements of cloud capacity and equipment at the edge of networks. Under this condition, FL has been successfully applied to a 4G/5G-based connected-vehicle edge computing platform, which completes edge collaborative learning on real datasets collected by large electric vehicle (EV) companies. This method has the advantages of driver personalisation, asynchronous execution, and security protection. In addition, personalised FL for intelligent IoT applications can alleviate the negative impact of heterogeneity from different perspectives. Simultaneously, a framework design based on FL can make use of limited bandwidth resources. Combining DL techniques and FL frameworks with mobile edge systems can accelerate the application of mobile edge computing (MEC).
In the existing implementation mode of FL, computing nodes synchronise only their local training models during distributed training. This results in an FL architecture that relies on a highly centralised model and a large server bandwidth. However, the network capacity between nodes is highly uniformly distributed and smaller than that of a data center. Jiang et al. (2020) therefore proposed using the bandwidth between nodes to accelerate communication. First, they performed worker selection through segmented gossip aggregation and bandwidth awareness of the network. Second, they exploited the bandwidth between the worker nodes themselves. This ultimately increased the convergence speed and reduced the number of communication rounds. General FL systems use a central parameter server to coordinate a large federation of participating workers: the workers train local models on their own datasets and periodically send the parameters to the server for synchronisation. Model updates of all nodes in a system are sent to other nodes; however, this process consumes a significant amount of bandwidth resources, thereby increasing costs. Therefore, the authors used a segment-level model synchronisation mechanism. First, they divided a model into a set of non-overlapping segments containing the same number of model parameters. Second, each worker aggregated every local segment with the corresponding segments of k other workers, thereby performing a segment-level update. Third, they selected these other workers so as to maximise the bandwidth capacity between workers. Thus, the communication cost was shared and the convergence speed was further accelerated.
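The segment-level synchronisation described above can be sketched as follows. This is a simplified illustration and not the procedure of Jiang et al. (2020): the peer-selection rule (rank peers by bandwidth and rotate through them per segment), the function names, and the toy models are all assumptions, and the parameter count is assumed to be divisible by the number of segments.

```python
def split_segments(params, n_segments):
    """Divide a flat parameter list into non-overlapping, equal-sized segments."""
    size = len(params) // n_segments
    return [params[i * size:(i + 1) * size] for i in range(n_segments)]


def segmented_aggregate(worker_id, models, bandwidth, n_segments, k):
    """Average each segment of one worker's model with the same segment from k peers."""
    own_segments = split_segments(models[worker_id], n_segments)
    # Rank peers by link bandwidth to this worker (illustrative selection rule).
    peers = sorted(
        (p for p in models if p != worker_id),
        key=lambda p: bandwidth[worker_id][p],
        reverse=True,
    )
    new_model = []
    for s, own in enumerate(own_segments):
        # Rotate through the ranked peers so that different segments are pulled
        # from different peers and the transfer load is spread across links.
        chosen = [peers[(s + j) % len(peers)] for j in range(k)]
        peer_segments = [split_segments(models[p], n_segments)[s] for p in chosen]
        for idx in range(len(own)):
            vals = [own[idx]] + [seg[idx] for seg in peer_segments]
            new_model.append(sum(vals) / len(vals))
    return new_model


# Hypothetical three-worker federation with four parameters per model.
models = {"a": [0.0] * 4, "b": [4.0] * 4, "c": [8.0] * 4}
bandwidth = {
    "a": {"b": 10.0, "c": 5.0},
    "b": {"a": 10.0, "c": 2.0},
    "c": {"a": 5.0, "b": 2.0},
}
updated_a = segmented_aggregate("a", models, bandwidth, n_segments=2, k=1)
```

Because each worker transfers only one segment per peer instead of a full model, the per-link traffic shrinks roughly in proportion to the number of segments.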
X.  proposed to combine deep reinforcement learning techniques and FL frameworks with mobile edge systems to optimise MEC. To this end, the In-Edge AI framework was designed. It intelligently uses the cooperation between devices and edge nodes to exchange learning parameters, and it can achieve dynamic system-level optimisation and application-level enhancement. The key difficulty is that computation offloading requires wireless data transmission; therefore, the integrated communication and computing system is optimised by jointly allocating the communication and computing resources of edge nodes. The framework is also applied to computation offloading and edge caching in MEC systems. In addition, FL (B.  has been introduced as a framework for training agents in a distributed manner. The effects of this method are as follows: (1) it reduces the amount of data that must be uploaded; (2) it responds to mobile communication environments and cellular network conditions; (3) it adapts well to heterogeneous user equipment; and (4) it protects personal data privacy.
Table 1 summarises the current studies that provide certain solutions for the development of practical applications under the framework of IoT. However, the realisation of FL involves various problems, such as computing power, heterogeneity, security, and resource integration (Li, Sahu, et al., 2020; Lim, Luong, et al., 2020). These problems have an adverse effect on the development of IoT. Thus, we propose possible solutions, which are described as follows:

Challenges and possible solutions
• Sparsification of FL: Factors such as wireless resource limitations and noisy datasets often affect the convergence of FL and the training of local models. We can construct a gradient-based sparsification scheme by integrating the available communication resources. Simultaneously, the dataset must be cleaned up and equipment with sufficient computing power should be selected for training.
• Heterogeneous clustering of FL: Currently, the datasets of numerous devices exhibit statistical heterogeneity, which significantly reduces the convergence performance of FL. We can select terminal devices with a certain degree of trust and cluster them into groups according to their datasets.
• Security of FL: Malicious terminal devices may be present during the training process. Their wrong local learning model parameters affect the accuracy and convergence time. Blockchain can be used to verify the updates of terminal equipment.
• Resource allocation of FL: Terminal equipment interferes with cellular users and occupies uplink communication resources in the process of FL. We can attempt to establish a resource allocation mechanism based on game theory. One-to-many matching theory can effectively integrate resources and link resource blocks with the terminal devices to which resources are allocated.

Table 1
Area | Reference | Framework | Contributions
 | (Y. Zhao et al., 2018) | FL | (1) Focused on the statistical challenges of FL when local data are non-IID. (2) Calculated the earth mover's distance (EMD) of each device distribution to quantify the weight difference. (3) Created a small portion of data that is globally shared between edge devices to improve training on non-IID data.
 | (Bonawitz et al., 2019) | FL | (1) Provided a scalable production system for FL in the field of mobile devices based on TensorFlow. (2) Introduced advanced designs for FL of mobile devices, such as on-device item ranking, content suggestions for on-device keyboards, and next word prediction.
 | (Duan et al., 2019) | JointRec | (1) Proposed the JointRec federated cloud video recommendation framework. (2) Reduced the uplink communication cost and network bandwidth.
 | | FL | (1) Proposed a deep network FL based on iterative model averaging. (2) Compared with synchronous stochastic gradient descent, the number of communication rounds required was considerably reduced.
Blockchain | (Kim et al., 2019) | BlockFL | (1) Used blockchain to propose a BlockFL framework. (2) Analysed the end-to-end delay model of BlockFL. (3) Described the optimal block generation rate from communication, calculation, and consensus delay.
 | (Lu et al., 2019) | BlockFL | (1) Proposed a secure data sharing architecture authorised by blockchain. (2) Converted data sharing problems into ML problems by incorporating privacy-preserving FL. (3) Used consensus calculations for training.
IoV | (Pokhrel & Choi, 2020b) | FL | (1) Proposed an efficient FL framework to improve the performance of the Internet of Vehicles. (2) Considered the TCP CUBIC flow on WiFi network for verification and stabilised its data flow dynamics.
Edge Computing | | FL | (1) Promoted a personalised FL framework for smart IoT applications in the cloud edge architecture. (2) Reduced the negative impact of heterogeneity in the training process. (3) Realised fast processing and low latency through edge computing.
 | (X. | In-Edge AI | (1) Designed the In-Edge AI framework to use the collaboration between devices and edge nodes to exchange learning parameters. (2) Realised dynamic system-level optimisation and application enhancement. (3) Reduced unnecessary system communication load.
IoT communication | | BACombo | (1) Proposed Bandwidth Aware Combination (BACombo) to solve network capacity issues between computing nodes. (2) Used the node-to-node bandwidth to accelerate communication.
 | | LFRLA | (1) Proposed a learning architecture for navigation in cloud robot systems: Lifelong Federated Reinforcement Learning Architecture (LFRLA). (2) Improved the efficiency of reinforcement learning for robot navigation.
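The gradient-based sparsification suggested among the challenges above can take several forms. The sketch below assumes one simple variant, top-k magnitude selection, in which a client uploads only its k largest gradient entries; the function names and the sample gradient are illustrative and are not taken from the cited studies.

```python
def top_k_sparsify(gradient, k):
    """Keep the k largest-magnitude entries and return them as (index, value) pairs."""
    ranked = sorted(
        range(len(gradient)), key=lambda i: abs(gradient[i]), reverse=True
    )
    keep = sorted(ranked[:k])
    return [(i, gradient[i]) for i in keep]


def densify(sparse, dim):
    """Rebuild a dense gradient on the server, filling dropped entries with zero."""
    dense = [0.0] * dim
    for i, v in sparse:
        dense[i] = v
    return dense


grad = [0.02, -1.5, 0.3, 0.0, 2.1, -0.01]  # hypothetical local gradient
sparse = top_k_sparsify(grad, k=2)         # what the client would upload
restored = densify(sparse, len(grad))      # what the server would reconstruct
```

Here only two of six values are transmitted, which reduces the uplink load at the price of discarding small gradient components. Practical schemes often accumulate the discarded residual locally for later rounds, which this sketch omits.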

Application of federated learning to intelligent transportation systems in smart cities
Transportation is an integral part of a smart city. As shown in Figure 4, we can solve various problems in transportation systems through FL, such as communication delays, calculation processing, and data privacy.

Vehicle communication
Emerging vehicle applications depend heavily on vehicle-to-vehicle communication. Therefore, ultra-reliable low-latency communication (URLLC) must be considered in vehicular networks when developing intelligent transportation systems (Ashraf et al., 2017; Pokhrel & Choi, 2020b). Samarakoon et al. (2018) formulated joint power control and resource allocation in vehicular communication networks as a network-wide power minimisation problem constrained by URLLC. The URLLC constraint is characterised by extreme value theory and modelled through the tail distribution of the network-wide queue lengths exceeding a predefined threshold. Vehicular users estimate this tail distribution locally with the help of roadside units (RSUs). This approach can effectively reduce delays and enhance reliability (Samarakoon et al., 2019). However, cloud-based learning methods are relatively slow. Pokhrel and Choi (2020b) proposed a systematic IoT network design approach that converts vehicles into mobile data centers and thereby accelerates the learning process of data transmission protocols (e.g. TCP).
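To make the tail-distribution constraint more concrete, the block below states the standard extreme value theory result on which threshold-exceedance models of this kind typically rely. It is a sketch under assumptions: the symbols $Q$ (queue length), $q_0$ (threshold), $\sigma$ (scale), and $\xi$ (shape) are introduced here for illustration, and the exact formulation used by Samarakoon et al. may differ.

```latex
% Excess of the queue length Q over a high threshold q_0 (symbols assumed here)
\[
X = Q - q_0 \quad \text{conditioned on} \quad Q > q_0
\]
% For a sufficiently high threshold, the excess is approximately
% generalised Pareto distributed with scale \sigma > 0 and shape \xi
\[
\Pr(X \le x) \approx G(x;\sigma,\xi) =
\begin{cases}
1 - \left(1 + \dfrac{\xi x}{\sigma}\right)^{-1/\xi}, & \xi \neq 0,\\[6pt]
1 - e^{-x/\sigma}, & \xi = 0.
\end{cases}
\]
```

Under this approximation, a vehicle only needs to estimate the two parameters $\sigma$ and $\xi$ from its locally observed exceedances, which is what makes a local or federated estimate of the tail practical.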

Electric vehicle
In the future, the large-scale use of EVs is inevitable and will generate an enormous energy demand. Therefore, maintaining effective energy demand forecasting services for charging station (CS) providers is an urgent problem. Owing to privacy protection requirements, CSs and vehicle companies cannot share data. X. Wang et al. (2021) used the features of the data on both sides, together with the cross features between them, to build a model through encrypted entity alignment, secure FL, and prediction. Introducing the cross features improved the area under the curve (AUC), and the results were almost lossless relative to centralised learning. Saputra et al. (2020) introduced a CS-based decentralised federated energy learning (DFEL) framework in which CSs learn from their local datasets to predict energy requirements accurately while markedly reducing the communication cost.

Autonomous vehicle
Autonomous vehicles (B. Liu, Wang, Liu, & Xu, 2019) must maintain their best ML models and the ability to make intelligent decisions. Pokhrel and Choi (2020a) proposed an autonomous blockchain-based FL (BFL) design that uses the consensus mechanism of blockchain to enable on-vehicle ML (oVML). Their approach provides a mathematical framework with controllable network and BFL parameters to investigate system-level performance effects. Zeng et al. (2021) proposed an FL framework, combined with a contract-theoretic incentive mechanism, for the collaborative learning and optimisation of the autonomous controller design under wireless link uncertainty and environmental dynamics.

Car insurance
The traditional car insurance pricing method is based on the quality of the car: the premium of a good car is much higher than that of an ordinary car. However, the usage of the car and the environment of the driving area also affect the compensation risk during the warranty period. Because data on people, vehicles, and behaviours are scattered across different companies, they cannot be exported or directly aggregated for modelling. A data model based on FL has a rich system of risk characteristics, which can effectively identify risks, predict compensation costs, and support personalised services.

Aircraft
Owing to the large amount of data generated by aviation systems and the lack of computing resources, aviation systems cannot predict faults in an aircraft. Moreover, the deployment of additional airborne resources is complicated and expensive. Therefore, Aussel et al. (2020) proposed using a confidence-based active online decision tree (De Rosa & Cesa-Bianchi, 2017) as the basic model for client learning. They classified standard samples with minimum computing power and established a mechanism for transmitting and identifying uncertain data within the communication budget.

Unmanned aerial vehicle (UAV)
Owing to the limited computing and power resources of UAVs, traditional centralised DL reduces network bandwidth and UAV energy efficiency. Brik et al. (2020) discussed the use of federated deep learning (FDL) to tackle challenges in wireless networks supported by UAVs, the critical technical challenges of FDL-based methods, and future research directions. Lim, Huang, et al. (2020) proposed FL-based sensing and collaborative learning solutions with a contract-matching incentive design, so that the lowest-cost UAVs can be matched to each partition. In addition, we analysed a few studies on the battery problems of FL for UAVs. Tang et al. (2021) adjusted the operating CPU frequency of each device to extend battery life and prevent devices from quitting FL prematurely. Using a strategy based on the deep deterministic policy gradient, they combined delay and energy consumption linearly to estimate the system cost. All devices could complete all FL tasks with limited batteries while reducing the system cost significantly.
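As a rough illustration of how delay and energy consumption can be combined linearly into a single system cost, the sketch below uses a simple textbook-style model that is not taken from Tang et al. (2021): the delay of a local pass is the number of CPU cycles divided by the CPU frequency, and its energy is proportional to the number of cycles times the frequency squared. The class `Device`, the constant `kappa`, the weights `w_delay` and `w_energy`, and the grid search over candidate frequencies (used here in place of a deep reinforcement learning policy) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Device:
    cycles_per_sample: float  # CPU cycles per training sample (assumed)
    num_samples: int          # size of the local dataset
    kappa: float              # effective capacitance coefficient (assumed)


def local_delay(dev: Device, freq: float) -> float:
    """Time for one local pass: total cycles divided by CPU frequency."""
    return dev.cycles_per_sample * dev.num_samples / freq


def local_energy(dev: Device, freq: float) -> float:
    """Energy for one local pass under the kappa * cycles * f^2 model."""
    return dev.kappa * dev.cycles_per_sample * dev.num_samples * freq ** 2


def system_cost(dev: Device, freq: float,
                w_delay: float = 0.5, w_energy: float = 0.5) -> float:
    """Linear combination of delay and energy (weights are hypothetical)."""
    return w_delay * local_delay(dev, freq) + w_energy * local_energy(dev, freq)


def best_frequency(dev: Device, candidates,
                   w_delay: float = 0.5, w_energy: float = 0.5) -> float:
    """Pick the candidate frequency with the lowest system cost (grid search)."""
    return min(candidates, key=lambda f: system_cost(dev, f, w_delay, w_energy))
```

Raising the frequency shortens the delay but increases the energy term quadratically, so the cost is minimised at an intermediate frequency; a learned policy would make this choice adaptively as battery and channel conditions change.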

Challenges and possible solutions
With the evolution of science, the characteristics of transportation systems are constantly changing. After investigating the applications of FL in transportation systems, we list the innovations and contributions of FL toward transportation systems in Table 2. The challenges and possible solutions are described below.

Communication and calculation costs
Equipment in a transportation system is mobile, and its communication and computing resources are limited (Aussel et al., 2020; De Rosa & Cesa-Bianchi, 2017). Therefore, it is challenging to reduce the communication and computing overhead of FL, successfully apply the data to other learning scenarios, and improve the accuracy and efficiency of the designed models without affecting the performance of the framework.

Table 2. Innovations and contributions of FL toward transportation systems.

| Category | Reference | Key contributions | Framework |
| --- | --- | --- | --- |
| | | (1) Proposed a distributed, FL-based joint transmit power and resource allocation framework. | FL |
| Privacy protection | Pokhrel & Choi (2020a) | (1) Developed a comprehensive mathematical analysis of system dynamics for end-to-end delay analysis. (2) Proposed an encrypted entity alignment method for different IDs from different platforms. | FL |
| Energy issue | Saputra et al. (2020) | (1) Reduced communication overhead and increased learning speed. (2) Developed an iterative energy contract algorithm. | Energy-efficient |
| | Tang et al. (2021) | (1) Proposed a resource allocation strategy for UAVs based on edge computing. (2) Proposed a resource allocation strategy based on deep reinforcement learning. | FL |

Privacy protection
As increasingly sophisticated computational methods are used to invade user privacy, the risk of privacy leakage increases significantly when miners attempt to verify a local model (Pokhrel & Choi, 2020a). The challenge is how to protect user privacy when large datasets are constructed. Risk analysis can provide a dynamic and scalable method for this purpose (X. Wang et al., 2021).

Energy issues
As conventional energy sources are depleted, clean energy has become the first choice. For EVs, UAVs, and other transportation equipment that runs on electric energy, the open problem is how to balance energy saving against FL performance.

Federated learning in the financial field of smart cities
The financial field includes banking, insurance, trust, securities, and leasing. However, in recent years, criminal activities have been observed in these sectors. Some financial crimes can involve up to several hundreds of millions of dollars, such as the mortgage crisis. These activities have led to crisis scenarios for families and society as a whole. While the financial industry expends a significant amount of resources annually to combat financial fraud, it is not very effective.
In recent years, it has become necessary to use FL to reduce losses to banks and consumers. In DL, the sample size must be sufficient to enable the training of a better model. A single bank cannot provide sufficient information about the consumption and credit cards of a person, which makes it difficult for that bank to detect fraud. FL provides the financial industry with a new approach to training DL models, whereby the owner of each dataset can collaborate on the model without sharing private customer information. The financial industry also encounters several challenges when reviewing user qualifications and screening quality customers. The combination of FL and finance can effectively address these challenges without disclosing private customer information. As shown in Figure 5, the collaborative group sends the key to the data source. Then, the exchanged data are encrypted to ensure data security. Finally, the learning model is updated in real time to obtain the model output for different application scenarios. FL can be effective in these areas. For example, banks can use cameras to identify suspicious transactions and prevent malicious multi-party lending (Y. Q. Yang et al., 2019).

Federated learning in the field of financial fraud
With the advent of the digital age, there have been many transnational financial crimes. Common sub-categories of financial crimes include financial theft, fraudulent loans, and money laundering. Credit card fraud causes large losses to both banks and consumers. C. […] provided satisfactory answers to some questions (model aggregation, data poisoning, and scaling-up issues). In addition to improving FL, some studies have combined FL with other algorithms. In recent years, the interest in privacy issues has significantly increased (X. […]). Wealthy individuals are concerned about the effective protection of their private financial information. Owing to the development of ML, an increasing amount of bank user data is analysed and used to train bank marketing models. Protecting the private information of financial users is therefore an important research direction. Feng et al. (2020) proposed a bilateral privacy-protected FL scheme that also protects the iterative parameters during the training process. Building on traditional FL, which considers the privacy of the client, this scheme further protects model parameters from being acquired by external attackers.

Federated learning in the field of insurance
Building a data service platform for the insurance industry requires the integration of financial, medical, and other data from multiple parties. If an insurance company wants to improve its risk management capabilities and business development level, it must consider the impact of multi-party data. The effective use of data without infringing on personal privacy is also an important issue in the insurance industry. Śmietanka et al. (2020) suggested that the key technologies promoting the reform of the insurance industry include FL and computable insurance contracts. Liang et al. (2020) proposed a configurable FL benchmark suite, FLBench. This suite can simulate various isolated data islands according to specific research requirements and covers areas such as insurance and securities.
When different insurance companies and multi-party data providers implement FL, quantifying the contribution of each participant becomes a practical problem. G. […] proposed that the service model can employ related models and data models to integrate information and obtain better feedback. B. Liu, Yan, Zhou, Wang, et al. (2020) proposed an online evaluation method that is more sensitive to the quality and quantity of data, and compared it with the results obtained using the Shapley value from game theory. Table 3 summarises the application and development of FL in the financial field.

Table 3. Application and development of FL in the financial field.

| Category | Reference | Key contributions | Framework |
| --- | --- | --- | --- |
| | | (1) Detection framework FFD. (2) Using real data for testing, the average test AUC of FDS based on joint learning reached 95.5%. | FL |
| | X. […] | (1) A new framework for secure multi-party learning. (2) A specific scheme was constructed by merging aggregated signature and proxy re-encryption technology. | SML |
| | Y. […] | (1) Asymmetrical federated model training. (2) Proposed a genuine-with-dummy approach to achieve asymmetrical federated model training. | FL |
| | | (1) The key exchange technology. (2) A double shielding protocol was used to ensure that user privacy is not leaked. | FL |
| Model fusion method | C. […] | (1) An intelligent aggregation method. | FL |
| | Gu et al. (2020) | (1) FL algorithms for vertically partitioned data. | FL |
| | | (1) A new federated learning framework, Fed+, was introduced. (2) Better handled the statistical heterogeneity inherent in the federated environment. | Fed+ |
| Financial applications | Suzumura et al. (2019) | (1) Proposed a method for sharing key information between institutions using a federated graph learning platform. (2) The performance of the constructed model was 20% higher than that of the original model. | FL |
| | | (1) Proposed an Automated Separated-Federated Graph Neural Network learning example. (2) Achieved better results in actual experiments. | ASFGNN |
| | Long et al. (2020) | (1) Proposed open banking. (2) Provided valuable discussions on statistical heterogeneity. | FL |
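To make the Shapley-value baseline for participant contribution more concrete, the following is a minimal sketch of the exact Shapley computation over a small set of participants. It is not the online evaluation method of B. Liu, Yan, Zhou, Wang, et al. (2020). The function name `shapley_values` and the utility callback `value` (which would, in practice, return the quality of a model trained on a coalition's data) are assumptions for illustration, and exact enumeration is only feasible for a handful of participants.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player.

    players: list of hashable participant identifiers.
    value:   function mapping a frozenset of players to a utility, e.g. the
             accuracy of a model trained on that coalition (hypothetical).
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for r in range(n):  # size of the coalition that p joins
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for coalition in combinations(others, r):
                s = frozenset(coalition)
                # marginal gain from adding p to coalition s
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi
```

With an additive toy utility, such as one proportional to the combined dataset size, each participant's Shapley value equals its own share, which is a convenient sanity check before plugging in a real model-quality measure.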

Challenges and possible solutions
The financial domain, which contains a large amount of structured data, provides a perfect foundation for the implementation of AI. However, owing to the particularity of the financial industry, it has stricter requirements for data security and privacy. In addition to the general problems of the FL environment, we believe that the following problems need to be resolved urgently for FL in the financial field.
Statistical challenges. Data distribution varies greatly between companies in the financial industry. For example, the size of companies varies significantly depending on the population and age group they serve. Most financial companies encounter data confusion and efficiency issues. The quality of data collected from multiple sources is uneven, and there is no uniform data standard and scale. Effectively determining the similarities and differences between data sources is still the key to FL in the financial field.
Incentive mechanism. Large financial companies hold more user information, while small and medium-sized financial companies hold more valuable information and are often unwilling to share data. Without a well-designed incentive mechanism, FL cannot be developed efficiently in the financial field. Therefore, establishing effective incentive measures to attract high-quality financial data into FL systems is an urgent problem that must be solved.

Application of federated learning to the medical field in smart cities
With the rapid spread of COVID-19 worldwide, the burden on medical staff has gradually increased. The treatment of patients using effective methods is a major problem. Smart medicine is an area of future medical development that is expected to benefit from the increased development of FL technology. In the past, owing to the independence of hospitals and the privacy of information, there was a lack of sufficient samples for ML. As shown on the right side of Figure 6, FL can unite previously independent hospitals into a collective population, thus significantly increasing the sample size for model training.

Combination of federated learning and medical assistant diagnosis
With the Precision Medicine Initiative in the United States and the emergence of a large amount of personal electronic health information, patient data are usually protected in localised silos. This makes it difficult to establish a reliable medical assistant diagnosis system, which is particularly important. The establishment of a horizontal federated learning model can greatly improve the accuracy of the model. This allows small hospitals to offer a better level of diagnosis, thereby reducing direct costs for patients. Moreover, there is an increasing desire to merge datasets from different medical systems (J. […]). Because the constraints of establishing a calibration model locally may limit the degree of improvement, Huang (2020) proposed an SMC method to establish a global isotonic regression calibration model. Ahamad and Khan Pathan (2020) proposed a secure mobile healthcare framework using a community cloud.
It is worth noting that Vepakomma et al. (2019) further demonstrated the minimisation of the distance correlation between the original data and the intermediary representation. This reduces the leakage of sensitive raw data patterns in the communication payload exchanged by clients, while maintaining the accuracy of the model. Chamikara et al. (2020) proposed a privacy-protected FL framework for multi-site functional MRI analysis. They used brain functional connectivity to classify autism spectrum disorder patients and healthy controls, and studied the communication speed and privacy protection of the approach. In addition, there are other methods that can effectively protect the fusion of medical data (Rizzi et al., 2020; Xiao et al., 2021).
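To make the distance-correlation criterion concrete, the sketch below computes the sample distance correlation between two sets of paired points, for instance raw inputs and their intermediate representations. It only evaluates the statistic; it does not reproduce the training procedure of Vepakomma et al. (2019), in which such a quantity would be added to the loss and minimised. The function names are hypothetical, and the pure-Python implementation is quadratic in the number of samples, so it is only suitable for small examples.

```python
from math import dist, sqrt


def _centered_distances(points):
    """Double-centred matrix of pairwise Euclidean distances."""
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    row = [sum(r) / n for r in d]  # the matrix is symmetric: row mean == column mean
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def _mean_product(a, b):
    """Mean of the element-wise product of two square matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    """Sample distance correlation in [0, 1] between paired samples x and y.

    x, y: equal-length lists of points, each point a tuple of floats.
    """
    a, b = _centered_distances(x), _centered_distances(y)
    dvar = _mean_product(a, a) * _mean_product(b, b)
    if dvar == 0:
        return 0.0  # one of the samples is constant
    return sqrt(max(_mean_product(a, b), 0.0) / sqrt(dvar))
```

A value close to 1 means the representation still carries most of the structure of the raw data, whereas a value close to 0 indicates that little of that structure can be recovered from it.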

Combination of federated learning and drug development
FL has revolutionised leading fields, including healthcare technology, resulting in outstanding achievements in areas such as drug discovery. The FL approach allows the pharmaceutical industry to use distributed data from different sources without leaking sensitive information. This emerging decentralised ML paradigm is expected to significantly improve the success of artificial intelligence (AI) drug discovery. S. Chen et al. (2020) verified the feasibility of applying horizontal FL (HFL). FL quantitative structure-activity relationship (FL-QSAR) modelling under the HFL framework provides an effective way to break the barriers between pharmaceutical institutions in QSAR modelling. The solution promotes the development of collaborative, privacy-preserving drug discovery, and it can be extended to other privacy-related biomedical fields. Xiong et al. (2020) demonstrated the application of FL in predicting drug-related properties. They also emphasised its potential role in solving the small and biased data dilemmas in drug discovery.

Federated learning combined with disease prediction
FL has been proven to be an effective way of helping the medical industry make decisions and predict diseases. FL can further expand the sample size and protect privacy. In the medical field, more accurate judgments are often required for disease prediction and decision-making. For example, in the detection of lung nodules, the nodules are often too small to be detected. If they are incorrectly assessed, the survival rate of the patient is likely to be reduced. FL helps to produce enhanced prediction results and can protect data privacy and security. Table 4 summarises the application and innovation of FL in the medical field.

Challenges and possible solutions
Thus far, we have compared the latest FL technologies and directions in the medical field. However, many challenges remain in the development of FL in the medical field. The most urgent problems that must be solved for the implementation of medical FL, according to this study, can be summarised as follows.
Heterogeneity of medical data. FL can fuse medical data; however, one of the most challenging problems is combining horizontal medical data from medical institutions in different regions with longitudinal medical data of the same patient in various hospitals.
Model accuracy and diversity. The development of AI has been mostly realised in closed scenarios. For example, AI challenges professional Go players, and AI game play is implemented in a known environment. However, in intelligent medical treatment, there are many unexpected symptoms.
Dirty data identification. In the medical model, doctors can misdiagnose, or malicious data can interfere. For example, Malekzadeh et al. (2021) and others introduced some methods to distinguish between benign and malicious models; however, these methods have strong limitations and can only process malicious data in a specific environment. At present, effectively identifying wrong data under the framework of federated learning remains a challenging direction.

Table 4. Application and innovation of FL in the medical field.

| Category | Reference | Key contributions | Framework |
| --- | --- | --- | --- |
| | Liu, Yan, Zhou, Yang, et al. (2020) | (1) Tested the performance of five models based on the FL framework. (2) The ResNet18 model performed best in the comparison. | FL |
| | Z. […] | (1) Proposed a more stable variation-aware FL (VAFL) framework. (2) Classification of prostate cancer data was better than with the FL framework. | FL |
| Medical data | Rajendran et al. (2021) | (1) The FL framework was tested on two data samples and its performance was measured. (2) The FL method generally does not improve the performance of logistic regression. | FL |
| | Vaid et al. (2021) | (1) The effectiveness of multiple models for mortality prediction was tested on data from five hospitals. | FL |
| Data processing | W. […] | (1) A dynamic fusion method was proposed, and model fusion was arranged according to the training time of participating clients. (2) The categories of medical diagnostic image datasets used for COVID-19 detection were summarised. | FL |
| | Sui et al. (2020) | (1) A strategy based on knowledge extraction was used to overcome the communication bottleneck in FL. (2) The study produced satisfactory results on three different medical datasets. | FL |

Application of federated learning to the communication field in smart cities
FL allows distributed machines or users to cooperate in training ML models with the help of parameter servers. Participants periodically send updates to the centralised server rather than raw data, which protects user privacy. However, with the development of DL, massive datasets are required to achieve better ML, and the participants and servers require several rounds of communication to achieve the target accuracy. Because a model may contain several million parameters, this process incurs a high communication cost (He et al., 2016). In addition, there are problems such as delays in IoT terminal equipment (Luping et al., 2019), instability of communication links, and lack of data links and computing resources in aviation communications. Transportation systems are inefficient and expensive. FL has led to significant breakthroughs in the field of multiple-access channel communication (Figure 7). One of the challenges faced by FL is the communication overhead caused by its iterative nature and large model size. One new method to alleviate the communication bottleneck of FL is to allow users to transmit simultaneously over the multiple-access channel, which may improve the use of communication resources. Another is to exploit the superposition property of wireless multiple-access channels to compute the required function of the distributed local updates (i.e. the weighted average) directly over the channel.
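The superposition idea can be illustrated with a toy simulation: if all users transmit their weighted updates at the same time, the channel itself adds them, and the server only has to rescale the received sum to obtain the weighted average. The sketch below assumes an idealised channel (unit gain, perfect synchronisation, additive Gaussian noise); the function name and the `noise_std` parameter are hypothetical and not taken from any cited work.

```python
import random


def over_the_air_average(local_updates, weights, noise_std=0.0, rng=None):
    """Simulate a weighted average computed by channel superposition.

    local_updates: list of equal-length lists (one update per user).
    weights:       one non-negative weight per user, e.g. local dataset size.
    noise_std:     standard deviation of additive receiver noise (assumed).
    """
    if rng is None:
        rng = random.Random(0)
    dim = len(local_updates[0])
    received = [0.0] * dim
    # Each user pre-scales its update; the channel sums all transmissions.
    for update, w in zip(local_updates, weights):
        for i, g in enumerate(update):
            received[i] += w * g
    # The receiver sees the superposed signal plus noise.
    received = [r + rng.gauss(0.0, noise_std) for r in received]
    total = sum(weights)
    return [r / total for r in received]
```

The server never observes an individual update in this scheme, only the noisy sum, which is why tight synchronisation between users matters for analogue transmission.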

Federated learning solution for multiple access channel problem
Previous work relieved the communication bottleneck by compressing the gradient before transmission. Two commonly used methods are (A) quantisation and (B) sparsification. Both follow the lossy compression idea of using a small number of bits to describe the gradient, and the resulting low-precision gradients are transmitted back to the parameter server. However, these independent compression techniques are not adapted to the underlying communication channel between the users and the parameter server, so channel resources may not be fully utilised. Other studies of FL over wireless channels consider a more general multiple-access channel. The superposition nature of the wireless channel allows gradients to be aggregated over the air and enables more effective training. These methods can be roughly classified as digital or analogue solutions depending on how the gradient is transmitted through the channel. In the analogue scheme, the local gradient is scaled and transmitted directly through the wireless channel. In the digital scheme, the gradients of the users are decoded separately, but transmission still occurs over the multiple-access channel. In terms of bandwidth, the analogue solution is better than the digital solution (Amiri & Gündüz, 2020b). However, digital solutions have the following advantages:
• Backward compatibility, i.e. they can be easily implemented on existing digital systems.
• They are less affected by slow users.
• They are more reliable because various error control codes can be used.
• They do not require the tight synchronisation required for analogue transmission.
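As a small illustration of gradient sparsification, one of the compression methods mentioned above, the following sketch keeps only the k largest-magnitude entries of a gradient and sends them as index-value pairs. The function names are hypothetical, and practical systems usually also accumulate the discarded residual locally, which is omitted here for brevity.

```python
def top_k_sparsify(gradient, k):
    """Keep the k entries with the largest magnitude as {index: value}."""
    if k <= 0:
        return {}
    order = sorted(range(len(gradient)),
                   key=lambda i: abs(gradient[i]), reverse=True)
    return {i: gradient[i] for i in sorted(order[:k])}


def densify(sparse, dim):
    """Rebuild a dense vector on the server; dropped entries become zero."""
    dense = [0.0] * dim
    for i, v in sparse.items():
        dense[i] = v
    return dense
```

Only k index-value pairs are transmitted instead of the full vector, at the cost of discarding the small-magnitude entries.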
Motivated by the above discussion, one line of work considered the application of FL over multiple-access channels. This study focussed on the design of a digital gradient transmission scheme in which the gradient of each user is first quantised, then transmitted over the multiple-access channel, and finally decoded separately at the parameter server. The quantisation is adapted to (a) the informativeness of the gradient of each user and (b) the underlying channel conditions. The authors proposed a stochastic gradient quantisation scheme that optimises the quantisation parameters according to the capacity region of the multiple-access channel. The results showed that channel-aware quantisation for FL outperforms channel-unaware quantisation schemes (for example, uniform allocation), particularly when users experience different channel conditions or have gradients with different degrees of informativeness. The difference between this scheme and the scheme in Suresh et al. (2017) is that it allows each user to have its own quantisation budget. First, a scheme for an arbitrary number of users M is proposed and its convergence speed is analysed. This leads to a general optimisation problem of quantisation budget allocation based on the multiple-access channel capacity. Then, the authors presented an example with M = 2 users and found the best quantisation budgets and communication rates. Accordingly, they analysed a channel-aware quantisation scheme that is superior to uniform quantisation and other existing digital schemes.
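A per-user quantisation budget can be illustrated with a simple unbiased stochastic quantiser. In the sketch below, each user quantises its gradient onto a uniform grid with 2^bits levels and rounds up or down at random so that the expected value equals the original entry. The per-user budget `bits` is taken as given here, whereas the scheme described above chooses it from the channel capacity; the function name and the min-max grid are assumptions for illustration.

```python
import random


def stochastic_quantize(gradient, bits, rng=None):
    """Unbiased stochastic quantisation onto 2**bits uniform levels.

    gradient: list of floats.
    bits:     per-user quantisation budget in bits per entry (>= 1).
    """
    if rng is None:
        rng = random.Random(0)
    lo, hi = min(gradient), max(gradient)
    if hi == lo:
        return list(gradient)  # nothing to quantise
    intervals = 2 ** bits - 1
    step = (hi - lo) / intervals
    out = []
    for g in gradient:
        pos = (g - lo) / step
        lower = int(pos)       # grid point just below g (pos >= 0)
        frac = pos - lower     # distance to that grid point, in steps
        # Round up with probability frac so that E[output] == g.
        idx = lower + (1 if rng.random() < frac else 0)
        out.append(lo + min(idx, intervals) * step)
    return out
```

A user with a good channel could be given a larger `bits` value and hence a finer grid, while a user with a poor channel sends a coarser but still unbiased version of its gradient.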

Challenges and possible solutions
Current studies offer obvious opportunities from the edge to the core network. Table 5 summarises some studies that proposed solutions for the application of FL in communication. However, there are several key challenges related to the application of federated servers, as described in Niknam et al. (2020).
Security and privacy. Although FL adopts a secure aggregation algorithm, information about an encrypted local model can still be revealed by analysing the global model. In FL, the model is trained using sensitive user data. The premise of FL is that users process their data locally without revealing private information. Ultimately, this can reduce the potential for data disclosure in the event of an attack. Additionally, FL may be subject to inference and adversarial attacks. Adversaries may embed carefully designed samples into the data, thereby poisoning the local training datasets and manipulating the results of the FL process. Therefore, it is necessary to explore how FL can improve its own defence mechanisms against these attacks. Considerations such as the optimal number of local learners participating in the global update, the grouping of local learners, and the frequency of local updates and global aggregation, which lead to a trade-off between model performance and resource preservation, are all application-dependent and merit further study. Moreover, for low-power devices such as IoT nodes, the scale of FL network updates may be enormous. Therefore, it is necessary to use sparse and compressed model parameters to reduce resource consumption. Some studies have designed FL-enabled intelligent fog radio access networks based on accuracy correction and model compression (Z. Zhao et al., 2020), which provides a feasible solution to this problem.

Table 5. Category, key contributions, and framework in applications of FL in communication.

| Category | Reference | Key contributions | Framework |
| --- | --- | --- | --- |
| | | (2) Provided a solution to the problem of data offloading in wireless networks. (3) Promoted intelligent network computing and reduced the high cost of model training. | F-RANs |

Future development and direction of federated learning in smart cities
FL has been continuously developed since it was proposed in 2016. In addition to the main issues discussed at this stage (asynchrony (Li, Sahu, et al., 2020), communication security, and privacy issues (Lim, Luong, et al., 2020)), the following key open directions remain to be explored.
Defense against attacks. Although FL can protect important information, a deliberate poisoning attack on distributed devices may still lead to the leakage of important information. For example, because gradients are exchanged during stochastic gradient descent in practical applications, leaked gradients may reveal information about the underlying data (Shiho Moriai, 2019).
Algorithm efficiency. The rapid growth of network traffic has become the main technical bottleneck in the development of IoT. Although FL can effectively connect distributed devices, optimisation algorithms are also required to realise practical applications. For example, to reduce time complexity, the FedAvg algorithm (K. Yang et al., 2020) is used for local calculation updates and aggregation; it is also used in client-side differential privacy-preserving federated optimisation algorithms. Given limited computing power and massive data, the algorithms related to FL still need to be optimised.
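The local-update-then-aggregate pattern of FedAvg can be sketched in a few lines. The example below is a minimal illustration on a linear model with squared error, written in plain Python; the function names, learning rate, and data layout are assumptions for illustration, and client sampling, differential privacy, and secure aggregation are deliberately left out.

```python
def local_update(weights, data, lr=0.1, epochs=1):
    """Run a few SGD steps on squared error for a linear model y ~ w . x.

    data: list of (x, y) pairs, where x is a tuple of floats.
    """
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def fedavg(client_weights, client_sizes):
    """Average client models, weighting each by its local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)]


def federated_round(global_weights, client_datasets, lr=0.1):
    """One round: every client trains locally, then the server aggregates."""
    local_models = [local_update(global_weights, d, lr) for d in client_datasets]
    return fedavg(local_models, [len(d) for d in client_datasets])
```

Only model weights cross the network in `federated_round`; the raw `(x, y)` pairs stay with each client, which is the property the surveyed applications rely on.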
Technology application. FL has widespread potential in smart city development. It can involve almost all aspects, especially the fields of finance, medical care, and transportation. FL can be used to perform model training on data associated with multiple standards. Taking smart healthcare as an example, FL can train models on data that hospitals cannot directly aggregate. FL can fuse sensitive information without revealing private data, thereby overcoming the data-island problem. Combining more data can significantly improve the accuracy of the model. The practical application of FL will also make cities smarter.
The gradual development of FL has introduced new opportunities in various aspects of life. This article introduces the application of FL to smart cities, including communications, life services, and IoT. It is expected that in the near future, the use of FL will lead to the further development of smart cities. FL will also be combined with all aspects of life to form a good ecological community to ensure that everyone can benefit from it.

Conclusion
FL has been widely used and developed in various fields. In this study, we investigated the development achievements of FL in the fields of IoT, transportation, communication, medical care, and finance. Consequently, we considered the future research direction of FL in other fields related to smart cities, including heterogeneous communication security and privacy issues. We also considered certain ideas and implementations in defense against attacks, potential privacy security, algorithm efficiency, and broader application scenarios. Finally, we proposed future prospects for technology development. We will continue to conduct in-depth research on the key technologies.