Understanding and evaluating visual guidance quality inside passenger terminals: a cognitive and quantified approach

Using findings from visual cognitive psychology and computer vision, this study proposes a method that predicts how passenger visual attention to indoor visual guidance elements affects the visual guidance quality inside passenger terminals. A saliency model is used to simulate human visual attention, so as to understand how the visual guidance elements and visual noise are cognitively perceived by passengers. For every possible origin and destination node combination, the length and probability of the path that passengers are most likely to take (LP) are compared with those of the respective shortest path (SP). The overall evaluation of the terminal’s visual guidance quality is expressed by the Extra Walking Index. The validity of the developed evaluation method is verified and it is then applied in a case study. The method can be used as a supporting tool for architects to identify relevant architectural features in the design phase and optimize them accordingly; in addition, it can provide existing passenger terminals with specific improvement suggestions.

ARTICLE HISTORY: Received 5 February 2019; Accepted 5 September 2019


Introduction
Psychological research shows that our visual and cognitive abilities are fundamental in the wayfinding process (Bosch and Gharaveis 2017). In a passenger terminal, a passenger uses the visual guidance elements around him/her to navigate. Traditionally, studies on passenger wayfinding performance are conducted through on-site surveys and interviews (Foljanty 2017; Hölscher et al. 2009). This kind of method not only requires tedious work and a tremendous time investment but is also impossible to perform before the building is completed. Therefore, recent studies propose predicting the passenger's wayfinding performance by evaluating the quality of the terminal's directional signage system, often using a 3D perception approach (Churchill et al. 2008; Lam et al. 2003; Becker-Asano et al. 2014). However, in addition to directional signage, what are the other important visual guidance elements in the wayfinding task? The visual information inside a passenger terminal can be divided into useful visual information and disturbing visual noise. How will the nearby visual noise (such as advertisement posters and shop boards) affect passenger visual cognition and wayfinding performance?
Recent findings in visual cognitive psychology revealed the relationship between wayfinding and visual attention (Wiener, DeCondappa, and Hölscher 2011). As is shown on the MIT Saliency Benchmark website (Borji et al. 2018), the latest developments in computer vision have made it possible to simulate and predict human visual attention with high precision using saliency models. Therefore, with the saliency model as leverage, this research not only aims to answer the questions raised above but also strives to provide an evaluation method to assess the overall visual guidance quality inside a passenger terminal using a cognitive, objective and quantified approach.
The necessity to make pedestrian guidance systems measurable was previously confirmed (Bauer et al. 2018). Compared to other public buildings, the design of the circulation route in a passenger terminal is much more complicated. The quality of transfer inside a passenger terminal imposes significant time and cost expenses on passengers (Guo and Wilson 2011). The efficiency of wayfinding is also crucial in evacuations during a fire or earthquake. Evacuees retrieve indications for their escape route from visual clues such as directional signage (Galea et al. 2017) and architectural spaces (Sun and de Vries 2006). To guide passengers to their destinations safely, efficiently and accurately in any situation, having an in-time, adequate and effective visual guidance system is of vital importance in a passenger terminal building.
Developments in technology and rapid urbanization cause growth in passenger numbers and building scales. Architects find it more and more difficult to use only their empirical knowledge to balance larger variations of design attributes. They often utilize computer analyses and simulations to help them predict the quality of their design work. This paper bridges the findings in visual cognitive psychology and computer vision with architectural design. It proposes a method that helps architects obtain a clearer understanding of the passenger terminal's visual guidance quality during the design stages and when a passenger flow simulation cannot yet be performed. Architects can use this information to improve their design when necessary. Terminal management teams can also use this method to identify the problems that currently exist in the best-case scenario (without passengers) and optimize the guidance system's setup.

Literature review
Wayfinding in architecture applies findings from the fields of cognition and environmental psychology to the design of built spaces. It articulates spatial features and circulation systems as well as environmental communication and uses these as main elements to facilitate the movement of people through individual buildings. Our navigation is realized largely with the help of visual information originating from the real 3D world. Signage, visual access and the design of the architecture itself are the visual guidance elements that mainly influence our wayfinding performance within built environments (Weisman 1981). Section 2.1 offers a closer examination of these elements separately. However, these visual guidance elements not only cooperate with each other to guide passengers to their destination, but also compete with the nearby visual noise for passenger visual attention. This cooperate-compete relationship for human visual attention can be visualized with the help of a saliency model, which will be explained in Section 2.2.

Visual guidance elements
The architectural space is the fundamental visual guidance element in a passenger terminal and the focus of architects. It was observed that symmetrical, regular and continuous (Canter 1974) or clearly organized spaces (Arthur and Passini 1992) have positive guidance effects on people in their wayfinding tasks. Direct visual contact with the destination point creates the most effective visual guidance (Carpman, Grant, and Simmons 1985). Raubal, Pfoser, and Tryfona (1997) proposed the "image schemata" to evaluate how architectural space facilitates people in wayfinding. The method reconstructed the built space by interviewing people regarding their spatial experiences as they were performing a wayfinding task. Surveys show that 78% of the interviewees consider their wayfinding performance to be related to architectural design (Foljanty 2015, 2017). In the space syntax approach, parameters such as visibility, viewshed and connectivity are often used to calculate the "reachability" between two locations and to predict passenger wayfinding performance (Ueno, Nakazawa, and Kishimoto 2009). This kind of method uses a purely geometric and quantitative approach and provides a very sound analysis of the relationships between different spaces. However, human visual and cognitive abilities are rarely considered in this approach. For instance, even if the visibility between two locations is theoretically unblocked, visual noise (such as advertisement posters or even unorthodox architectural spaces) will likely negatively affect the wayfinding performance.
Directional signage is an essential visual guidance element that assists passenger wayfinding within an architectural space. These two visual guidance elements cooperate closely with each other. It was determined that wayfinding, when assisted by proper signage, appeared more natural when the building function was also reasonably designed (Peponis, Zimring, and Choi 1990). Certain researchers found that during wayfinding, a reasonable signage design could not compensate for the difficulties caused by unreasonably organized architectural spaces (Arthur and Passini 1992). Reasonably designed directional signage was not only effective in reducing the number of wrong turns or requests for extra information, but its presence also significantly lowered the feeling of crowding, discomfort, anger, and confusion (Wener and Kaminoff 1983). There have been studies that examine the directional signage system as an independent visual guidance element. One study examined the relationship between the conspicuity and illuminance of the signage in Swiss railway stations and recommended 100 cd/m² as the optimal value (Lasauskaite and Reisinger 2015). Interviews and surveys were also conducted to quantify how the signage in the Japan Railway East railway system influences cognitive cost in wayfinding (Chang et al. 2018). There have also been studies that investigated directional signage together with architectural spaces. For instance, the influence of the signage on passenger path choice behavior was studied (Li and Xu 2019); by considering only the impact of the directional signage, the researchers reconstructed the path that the passenger was likely to choose, and then the signage was optimized via the Ant Colony Algorithm.
An agent-based simulation model was also used to assess the signage's visibility (Nassar 2011); humans were modeled as intelligent agents that move in spaces with predefined patterns, and their perceptual attention was modeled considering their field of view as well as the signage location and design.
Landmarks, as visual guidance elements, can also add convenience to the wayfinding process (Epstein and Vass 2014; Hamburger and Röser 2014). Landmarks are used as milestones to form a cognitive map inside the human brain. Researchers examined the verbal route instructions and sketch maps collected from participants (Schwering, Li, and Anacta 2013). They determined that the inclusion of global landmarks in both verbal instructions and sketch maps was supportive for wayfinding.
A fact that has been largely overlooked by past studies is that visual guidance elements are in constant conflict with the surrounding visual noise. Architectural spaces, directional signage and landmarks are intended to indicate the shortest path to destinations, while distractions such as advertisement posters and shop boards constantly attempt to lure passengers away from the shortest path. It is very difficult to quantify (or even identify) the impact of visual noise because visual noise cannot be defined, and its content changes constantly with each visual scene. The challenge is that human visual attention must be simulated to understand how visual guidance elements and visual noise are perceived by passengers. This difficulty is tackled by integrating a saliency model into the evaluation method, as is proposed in this paper.

Saliency model
To understand how visual guidance elements and visual noise are cognitively perceived by passengers, a saliency model is used as leverage in this study. Research on saliency models is a popular topic in computer vision, and its objective is to simulate human visual attention as accurately as possible. Since the saliency model can effectively predict human visual attention on visual scenes (Veale, Hafed, and Yoshida 2017), it has been widely used in areas such as scene recognition (Zhang, Du, and Zhang 2014; Hu et al. 2016), identification and tracking of people (Aguilar et al. 2017a, 2017b), navigation (Wang and Tian 2011), detection and recognition of road signage (Zahabi et al. 2017; Won, Lee, and Son 2008) and many more. The model was inspired by the following visual cognition mechanism: the visual world contains enormous amounts of information, and human visual attention is a scarce resource. To use this resource efficiently, the human eyes developed a selective attention mechanism to prioritize certain parts of the incoming visual information for further processing. First, the highest resolution in human eyes is limited to the central fovea, instead of having an equal resolution throughout the entire field of view; the acuity declines sharply into a low-resolution periphery. Second, the visual system has integrated attention mechanisms that rapidly preselect locations with certain qualities for further processing, such as areas with higher color contrast, and ignore the areas with less contrast (Wolfe and Horowitz 2004; Koch and Ullman 1985; Das, Bennett, and Dutton 2007; Boynton 2005). A saliency model is the mathematical and logical expression of this preselection of exterior information by human visual attention. The model calculates the conspicuity by comparing a particular image region to its surroundings, thus predicting the probability of this region attracting human visual attention.
Saliency maps are generated using saliency models, and saliency values are assigned to image regions in the range of 0 to 1, with higher values representing higher probabilities for the eyes to notice this region (Veale, Hafed, and Yoshida 2017; Itti, Koch, and Niebur 1998; Zhao and Koch 2011) (see Figure 1).
Itti, Koch and Niebur proposed the first saliency model in 1998 (Itti, Koch, and Niebur 1998), which divides an input image into three channels: color, intensity and orientation (of lines). Thereafter, filters are used to calculate the contrast between image regions within each channel, thus producing a series of feature maps. For each channel, one final conspicuity map is produced using these feature maps, and then the conspicuity maps across all channels are combined linearly into one final saliency map. This model is based on comparing contrasts between different image regions; Figure 2 shows a simple workflow. Its imitation of the early visual processing of the human visual system makes this model the basis of many newer saliency models, and it remains widely applied in many areas.
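The channel-wise contrast idea described above can be illustrated with a heavily simplified sketch. It is not the published Itti, Koch and Niebur implementation: the orientation channel is omitted, box filters stand in for the original filter bank, and the function names, scale pairs and equal channel weights are assumptions made for illustration only.

```python
import numpy as np

def box_blur(img, radius):
    """Mean filter with a (2*radius+1) square window; borders are reflected."""
    img = np.asarray(img, dtype=float)
    if radius == 0:
        return img
    size = 2 * radius + 1
    padded = np.pad(img, radius, mode="reflect")
    # Integral image with a leading zero row/column: window sums become lookups.
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    total = (integral[size:size + h, size:size + w]
             - integral[:h, size:size + w]
             - integral[size:size + h, :w]
             + integral[:h, :w])
    return total / size ** 2

def normalize01(m):
    """Rescale a map to [0, 1]; (near-)constant maps become all zeros."""
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 1e-12 else np.zeros_like(m)

def conspicuity(channel, pairs=((1, 4), (2, 8))):
    """Center-surround contrast: |fine blur - coarse blur| at assumed scale pairs."""
    maps = [normalize01(np.abs(box_blur(channel, c) - box_blur(channel, s)))
            for c, s in pairs]
    return normalize01(sum(maps))

def saliency_map(rgb):
    """Linear combination of intensity and color conspicuity (orientation omitted)."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    intensity = (r + g + b) / 3.0
    red_green = r - g                  # simple color-opponency channels
    blue_yellow = b - (r + g) / 2.0
    color = normalize01(conspicuity(red_green) + conspicuity(blue_yellow))
    return normalize01((conspicuity(intensity) + color) / 2.0)

# Toy scene: a bright square on a dark, uniform background.
img = np.full((48, 48, 3), 0.2)
img[20:28, 20:28, :] = 1.0
s = saliency_map(img)
print(s[20:28, 20:28].mean(), s[:8, :8].mean())
```

In this toy scene the bright square receives clearly higher saliency values than the uniform corner, which is the contrast-driven behavior the model relies on.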
Traditional saliency models use feed-forward neural networks, meaning that their processing method is not altered by the input data. The recently developed Saliency Attentive Model (SAM) (Cornia et al. 2018) is capable of processing serial data through the integration of Long Short-Term Memory (LSTM), a recurrent neural network, and the accuracy of its prediction results is greatly improved. The LSTM is inspired by the neural cell, which can "memorize" the previously processed input data and frequently update its means of processing based on this memory. The manner in which the data are processed can be called a "cell state", and three gates are used inside the model to abandon/add information from/to the cell state. These three gates are the forget gate, input gate and output gate. The forget gate decides which information the cell state should forget; the input gate decides which information to retain and generates data that can be appended to the new cell state; and the output gate and the new cell state decide the cell output.
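The interplay of the three gates can be sketched as a single step of a generic, fully connected LSTM cell. This is the textbook formulation, not the recurrent module of SAM itself, which works on image feature maps and may differ in detail; the weight layout and variable names are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    x: input vector (D,), h_prev: previous output (H,), c_prev: previous cell state (H,).
    W: weights of shape (4*H, D+H), b: bias of shape (4*H,) (assumed layout:
    forget, input, candidate, output blocks stacked in that order).
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0:H])            # forget gate: what the cell state should drop
    i = sigmoid(z[H:2 * H])        # input gate: what new information to retain
    g = np.tanh(z[2 * H:3 * H])    # candidate data to append to the cell state
    o = sigmoid(z[3 * H:4 * H])    # output gate
    c = f * c_prev + i * g         # new cell state
    h = o * np.tanh(c)             # cell output from output gate and new cell state
    return h, c

# Minimal run with random weights (values are arbitrary).
rng = np.random.default_rng(0)
D, H = 4, 3
W = rng.normal(size=(4 * H, D + H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, b)
print(h.shape, c.shape)
```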
The SAM saliency model focuses on the most salient regions of the input image to iteratively refine the predicted saliency map. Furthermore, one can use standard image datasets and the affiliated eyetracker data (obtained by tracking the real human eye fixations when watching images from the dataset) to train the model to achieve higher prediction accuracy. The SALICON dataset is the largest available, with 10,000 training images, 5,000 validation images and 5,000 testing images (Jiang et al. 2015). Experiments show that when observers examine the images, their gazes are biased towards the center of the images (Tatler 2007). This center-biased phenomenon is also included in the SAM model by learning certain priors. The accuracy of saliency models can be measured by comparing the prediction results with real human eye fixations using different metrics (Bylinskii et al. 2016). After training the SAM model with the image dataset, its accuracy was measured as one of the best in the MIT Saliency Benchmark (Borji et al. 2018) and LSUN2017 competition (Yu et al. 2017).
As a tool that can effectively predict human visual attention by processing input image data, saliency models can provide solutions to many visual cognition-related design problems and have therefore been widely applied in design fields. For instance, models have been applied in 3D animation modelling to determine which part of the computer model was most likely to attract human visual attention and therefore needed further refinement (Lee, Varshney, and Jacobs 2005). In the advertising sector, designers used saliency models to determine if their design projects would lead to the results they were hoping for (Wilson, Baack, and Till 2015). Dupont et al. (2016) introduced saliency models into landscape design; they provided a series of landscape photos to observers and recorded their eye movements using an eye tracker, and when comparing the record with the calculated saliency maps, they obtained satisfying results. Although saliency models are widely applied in design sectors, their application in architectural design remains limited. Xu et al. used saliency maps to evaluate the visual impact caused by the installation of Building Integrated Photovoltaics on existing buildings with an objective, quantified and cognitive approach (Xu and Wittkopf 2014, 2017; Xu 2016).
As can be concluded from the literature review, signage and landmarks are important visual guidance elements in the wayfinding task, but their effectiveness depends strongly on the organization of architectural spaces. Also, passenger visual attention towards visual guidance elements may be disturbed by the nearby visual noise. Due to the large variations of design attributes, it is often difficult for architects to use their empirical knowledge to estimate the overall visual guidance effect in passenger terminal designs. In this case, a saliency model can be integrated as a tool to simulate human visual attention and to examine the visual guidance quality.

The proposed evaluation method
This paper proposes a method that can evaluate the visual guidance quality inside passenger terminals. The Markov chain model has often been used to model human movement (Xia, Zeephongsekul, and Packer 2010) or route choice (Manley, Cheng, and Haworth 2013). Inspired by this notion, a passenger terminal can be modeled as a walkable network $G(V, E)$ consisting of decision nodes $X$ in the set $V$ and links in the set $E$. Figure 3(a) is an example of a passenger terminal's floor plan. $d_{i,j}$ is the distance between two connecting nodes $X_i$ and $X_j$. For the origin and destination node combination $X_a$ and $X_b$ connected by nodes $X_1, X_2, \ldots, X_n$, the classic Dijkstra method (Dijkstra 1959) can be used to determine the shortest path (SP). Figure 3(b) is the walkable network $G(V, E)$ transformed from the floor plan. $p_{i,j}$ is the transition probability from node $X_i$ to its connecting node $X_j$ under the influence of nearby visual guidance elements only. In other words, when a passenger is standing on node $X_i$, $p_{i,j}$ is the probability of him/her "discovering" the relevant visual information that will guide him/her to the connecting node $X_j$. For the origin and destination node combination $X_a$ and $X_b$, connected by intermediate nodes $X_1, X_2, \ldots, X_m$, the probability that the passenger is going to choose this path can be calculated as:

$$P(X_b \mid X_a) = p_{a,1} \, p_{1,2} \cdots p_{m-1,m} \, p_{m,b}$$

Each candidate path between $X_a$ and $X_b$ has its own $P(X_b \mid X_a)$ value. The path with the highest $P(X_b \mid X_a)$ is the path that passengers are most likely to choose (LP).
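As a minimal sketch of how SP and LP could be searched on such a network, the code below runs Dijkstra's algorithm twice: once on the distances, and once on the costs $-\log p_{i,j}$, since maximizing a product of probabilities is equivalent to minimizing the sum of their negative logarithms. The source does not state how the LP is searched, so this is one possible approach; the small network, its distances and its probabilities are hypothetical.

```python
import heapq
import math

def dijkstra(graph, source, target):
    """graph: {node: {neighbor: non-negative cost}}. Returns (cost, path)."""
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == target:
            path = [node]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return cost, path[::-1]
        for nxt, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    return math.inf, []

def shortest_path(dist, a, b):
    """SP: path with the smallest total length."""
    return dijkstra(dist, a, b)

def most_likely_path(prob, a, b):
    """LP: path with the largest product of transition probabilities."""
    costs = {i: {j: -math.log(p) for j, p in links.items() if p > 0}
             for i, links in prob.items()}
    cost, path = dijkstra(costs, a, b)
    return math.exp(-cost), path

# Hypothetical network: two routes from A to B via X1 or X2 (distances in metres).
dist = {"A": {"X1": 20, "X2": 30}, "X1": {"B": 16}, "X2": {"B": 41}, "B": {}}
prob = {"A": {"X1": 0.1, "X2": 0.9}, "X1": {"B": 1.0}, "X2": {"B": 1.0}, "B": {}}

sp_len, sp = shortest_path(dist, "A", "B")
lp_prob, lp = most_likely_path(prob, "A", "B")
print(sp, sp_len)
print(lp, lp_prob)
```

In this hypothetical network the shortest route runs via X1, while the visual guidance draws most passengers via X2, so SP and LP differ.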
For the origin and destination node combination $X_a$ and $X_b$, the visual guidance quality can be expressed with the Extra Walking Index, which is defined as:

$$w_{a,b} = \left(D_{LP(a,b)} - D_{SP(a,b)}\right) \times \frac{P_{LP(a,b)}}{P_{SP(a,b)}}$$

where $D_{LP(a,b)}$ stands for the length of LP and $D_{SP(a,b)}$ stands for the length of SP. $P_{LP(a,b)}$ is the probability of passengers choosing LP and $P_{SP(a,b)}$ is the probability of choosing SP. The SP is treated as the standard because it is supposed to be the optimal route. The length difference illustrates the extra walking effort that poor visual guidance quality yields. However, the length difference alone cannot describe whether only a small percentage or the vast majority of passengers are likely to prefer the LP over the SP. To fill this gap, the probability comparison $P_{LP(a,b)} / P_{SP(a,b)}$ is included. A high $P_{LP(a,b)} / P_{SP(a,b)}$ value indicates that at certain nodes, the visual guidance elements are providing incorrect wayfinding clues to passengers. Taking both the length difference and probability comparison into account, the closer $w_{a,b}$ is to 0, the better the visual guidance quality between the origin and destination node combination $X_a$ and $X_b$.
Figure 3. The floor plan (a) and the corresponding network (b) of a passenger terminal. $X$ stands for decision nodes, $d_{i,j}$ is the distance between two connecting nodes and $p_{i,j}$ is the transition probability between two connecting nodes.

The final evaluation for the passenger terminal's visual guidance quality is expressed by the overall Extra Walking Index:

$$EWI = \frac{\sum w}{N}$$

where $\sum w$ is the sum of $w$ over all origin and destination node combinations, and $N$ is the total number of origin and destination node combinations. The closer the EWI is to 0, the better the visual guidance quality inside the passenger terminal.
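The index can be sketched in a few lines, assuming that the per-pair value is the length difference between LP and SP multiplied by the ratio of their probabilities (this reading matches the arithmetic of the validation examples later in the text) and that the overall index is the mean over all pairs. The function names and the input values are hypothetical.

```python
def extra_walking(d_lp, d_sp, p_lp, p_sp):
    """Per-pair index w: extra length of LP over SP, weighted by P_LP / P_SP."""
    if p_sp <= 0:
        raise ValueError("P_SP must be positive for the ratio to be defined")
    return (d_lp - d_sp) * (p_lp / p_sp)

def overall_ewi(pairs):
    """Overall EWI: mean of w over all origin-destination combinations."""
    values = [extra_walking(*pair) for pair in pairs]
    return sum(values) / len(values)

# Hypothetical terminal with two origin-destination combinations,
# each given as (D_LP, D_SP, P_LP, P_SP) with lengths in metres.
pairs = [
    (50, 50, 0.8, 0.8),    # LP coincides with SP: no extra walking
    (60, 40, 0.5, 0.25),   # LP is 20 m longer and twice as likely as SP
]
w_values = [extra_walking(*pair) for pair in pairs]
ewi = overall_ewi(pairs)
print(w_values, ewi)
```

When the LP coincides with the SP, the length difference is zero and the pair contributes nothing to the index, which is why values close to 0 indicate good visual guidance.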
In the following, the selection of nodes $X$ as well as the determination of the transition probability $p_{i,j}$ will be explained in detail.
(1) The selection of the nodes $X$

The nodes are selected using the information provided by the architectural floor plans and the designed passenger circulation route. The main purpose of a passenger terminal is to let passengers access transportation vehicles. Therefore, entrances, exits from the metro station, railway platforms, etc. are treated as origin nodes; exits, entrances to the metro station, railway platforms, etc. are set as destination nodes. Important intersections, where passengers would confront direction choices, are treated as intermediate nodes. The intermediate nodes should be located on the main circulation route of the building, and the passenger should be able to reach at least two other nodes from each of them. The decision regarding where to set a node depends on the designed architectural floor plan and not on how visual guidance elements are located. For instance, passengers would not be confronted with a direction choice in the middle of a corridor; therefore, setting a node there would be unnecessary. Direct visibility between two connecting nodes is not necessary because wayfinding can be accomplished with the help of nearby visual guidance elements. An entrance node can also be an intermediate node or even an exit node simultaneously and vice versa.
(2) The transition probability $p_{i,j}$ between two connecting nodes

When a passenger is standing on the node $X_i$, the probability of him/her "discovering" the relevant visual information that will guide him/her to the connecting node $X_j$ is $p_{i,j}$. To obtain $p_{i,j}$, photos are first taken around the node $X_i$. Using these photos, respective saliency maps are generated and then manually processed to determine the probability of a passenger "discovering" the relevant visual guidance elements that will take him/her from node $X_i$ to node $X_j$. During manual processing of the saliency maps around node $X_i$, one should concentrate on these three visual guidance elements:

• The architectural spaces
The visual guidance effect of an architectural space can usually be expected from its outstanding brightness due to daylight or natural light and its geometry, such as the horizon line and the perspective vanishing point (see Figure 4); this will be reflected in the saliency values.

• Directional signage
This element mainly refers to signage that provides directional information to other nodes (see Figure 5). Other prominent but irrelevant signs, such as warning signs or advertisement posters, are not included.

• Landmarks
Landmarks are important for passengers in forming a cognitive memory map for navigation. In passenger terminals, landmarks may refer to help desks, meeting points or even an artistic sculpture (see Figure 6).
On each saliency map around node $X_i$, the top 10 local maximum values are selected to check if they overlap with visual guidance elements that provide information about node $X_j$. A high saliency value means either that this image region is very salient to the passenger, or that he/she is very likely to spend a relatively long time processing the information in this image region. Low saliency values mean either that a passenger is less likely to discover this region, or that a passenger will not spend much time processing the corresponding information. It needs to be emphasized that in saliency maps, when only parts of the visual guidance element are marked with high saliency values, it is assumed that the entire element is visually perceived by the passenger.
If none of the visual guidance elements overlap with the top 10 local maximum values on the saliency map, it is deemed that the passenger is unlikely to arrive at node $X_j$ from $X_i$. If there is an overlap, the highest saliency value across all the saliency maps around node $X_i$ will be chosen as the temporary visual guidance probability $o_{i,j}$ for the connection from node $X_i$ to $X_j$. To ensure that the sum of all the outgoing visual guidance probabilities on node $X_i$ equals 1, $o_{i,j}$ is normalized to $p_{i,j}$ as follows:

$$p_{i,j} = \frac{o_{i,j}}{\sum_{k} o_{i,k}}$$

where the sum runs over all nodes $X_k$ connected to $X_i$. The visual guidance probability $p_{i,j}$ is the normalized probability of the passenger arriving at node $X_j$ from its connecting node $X_i$ only with the help of nearby visual guidance elements. The higher the number of nodes that $X_i$ is linked with, the richer the visual guidance information that is contained on $X_i$. The normalization step then ensures that the corresponding $p_{i,j}$ will be lower, because more visual distractors will result in a longer reaction time to discover the right target (Wolfe, Palmer, and Horowitz 2010).
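The procedure above is described as manual, but its logic can be sketched in code under some stated assumptions: a local maximum is taken to be a positive pixel that is not exceeded by any of its eight neighbours, each guidance element is given as a boolean mask over the photo, and the temporary probability is read as the highest overlapping peak value across all views. These choices, and all function names, are illustrative interpretations rather than the procedure actually used in the study.

```python
import numpy as np

def top_local_maxima(saliency, k=10):
    """Return up to k (row, col, value) local maxima, highest value first."""
    s = np.asarray(saliency, dtype=float)
    h, w = s.shape
    padded = np.pad(s, 1, mode="constant", constant_values=-np.inf)
    is_max = s > 0  # ignore empty background (assumption)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            is_max &= s >= neighbour
    rows, cols = np.nonzero(is_max)
    order = np.argsort(-s[rows, cols], kind="stable")[:k]
    return [(int(rows[n]), int(cols[n]), float(s[rows[n], cols[n]])) for n in order]

def temporary_probability(views, k=10):
    """views: list of (saliency_map, element_mask) pairs taken around one node.

    Returns the highest top-k peak value that falls on a guidance element
    pointing to the target node, or 0.0 if no peak overlaps any element.
    """
    best = 0.0
    for saliency, mask in views:
        mask = np.asarray(mask, dtype=bool)
        for r, c, value in top_local_maxima(saliency, k):
            if mask[r, c]:
                best = max(best, value)
    return best

def normalize_outgoing(o):
    """Scale the temporary probabilities of one node so that they sum to 1."""
    total = sum(o.values())
    if total == 0:
        return {j: 0.0 for j in o}
    return {j: v / total for j, v in o.items()}

# Toy saliency map with two peaks; three candidate links a, b and c.
sal = np.zeros((5, 5))
sal[1, 1], sal[3, 3] = 1.0, 0.5
mask_a = np.zeros((5, 5), dtype=bool); mask_a[1, 1] = True   # element pointing to a
mask_b = np.zeros((5, 5), dtype=bool); mask_b[3, 3] = True   # element pointing to b
mask_c = np.zeros((5, 5), dtype=bool)                        # nothing points to c

o = {name: temporary_probability([(sal, m)])
     for name, m in (("a", mask_a), ("b", mask_b), ("c", mask_c))}
p = normalize_outgoing(o)
print(o, p)
```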

Validation of the proposed evaluation method
In the following, two validations of the proposed evaluation method will be made based on empirical results from wayfinding studies.

(1) Validation 1

In the wayfinding studies of Wiener et al. (2009, 2011, 2012), it was discovered that people displayed a gaze bias in the direction of the eventually chosen path option and had a tendency to choose the path option that featured the longer line of sight. Vilar et al. (2013) confirmed that people prefer brighter pathways at intersections. Validation 1 examines the proposed evaluation method based on these notions.
A very simple computational 3D model was created and its floor plan can be seen in Figure 7. There are three different paths connecting the origin node $X_A$ with the destination node $X_B$. All paths are of the same width. Path 1 is the longest (71 m); it is very bright and features a long line of sight. Path 2 has a length of 36 m, but it is very dark and has a twist, so its line of sight is shorter. Path 3 is 56 m long; it has the shortest line of sight but is very well lit. Based on the empirical results from the wayfinding studies mentioned above, the hypothesized path preference order from high to low is path 1, path 3 and path 2.

Note that although only part of the sign is highlighted by the saliency map, it is assumed that the passenger has "discovered" the visual guidance information.
To test this hypothesis, three renderings were generated around the origin node $X_A$ (see Figure 8). The viewpoints of the renderings are denoted with arrows and letters a, b, c in Figure 7. The resulting saliency maps in Figure 8 show the probabilities of passengers visually detecting each path (marked by red dotted boxes). Thanks to the longer line of sight and high brightness, the temporary visual guidance probability of path 1 is $o_{path1} = 1$. Due to the darkness and shorter line of sight, the temporary probability for passengers to see path 2 is $o_{path2} = 0.11$. Even though path 3 has the shortest line of sight, thanks to its high brightness, its temporary visual guidance probability is $o_{path3} = 0.58$. After normalization, the path probabilities are $p_{path1} = 0.60$, $p_{path2} = 0.07$ and $p_{path3} = 0.35$. These results are consistent with the hypothesis mentioned above. Path 1 is the LP and path 2 is the SP. The overall Extra Walking Index is $EWI = (71 - 36) \times (0.60 / 0.07) = 300.00$.
(2) Validation 2

In an experiment, O'Neill (1991) discovered that even though the addition of signage resulted in an improvement in wayfinding, the architecture's spatial configuration was found to exert a significant influence regardless of signage. Validation 2 examines the evaluation method based on this observation. This time, the rendering generated from viewpoint a in Figure 7 is equipped with a signboard or a landmark (see Figure 9). The respective saliency maps indicate that a signboard or a landmark will increase the probability of passengers seeing path 2. The renderings from viewpoints b and c in Figure 7 remain unchanged. Now, the temporary probabilities have become $o_{path1} = 1$, $o_{path2} = 1$ and $o_{path3} = 0.58$. After normalization, $p_{path1} = 0.39$, $p_{path2} = 0.39$ and $p_{path3} = 0.22$. The overall Extra Walking Index is $EWI = (71 - 36) \times (0.39 / 0.39) = 35$. After adding the signage or landmark, the decreased EWI value indicates an improvement in the visual guidance quality. Generally speaking, the impact of the architectural space is still very strong regardless of signage or landmark, which is consistent with the observation made by O'Neill (1991) as mentioned above. Even though the length difference between the LP and SP is the same (35 m) in both Validation 1 and Validation 2, the change in visual guidance elements results in a very different wayfinding environment for passengers. This change is reflected in the visual guidance probability comparison between path 1 and path 2.
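The arithmetic of the two validations can be retraced with a short script. It assumes that normalization divides each temporary probability by their sum and that the index is the length difference times the probability ratio. For Validation 1 the script takes the reported normalized probabilities as given, because normalizing 1, 0.11 and 0.58 directly yields roughly 0.59, 0.07 and 0.34, within 0.01 of the reported 0.60, 0.07 and 0.35 (presumably a rounding effect).

```python
import math

def normalize(o):
    """Divide each temporary probability by the sum over all outgoing paths."""
    total = sum(o.values())
    return {name: value / total for name, value in o.items()}

def extra_walking(d_lp, d_sp, p_lp, p_sp):
    """Length difference between LP and SP, weighted by P_LP / P_SP."""
    return (d_lp - d_sp) * (p_lp / p_sp)

# Validation 2: temporary probabilities as reported in the text.
p_v2 = normalize({"path1": 1.0, "path2": 1.0, "path3": 0.58})
rounded_v2 = {name: round(value, 2) for name, value in p_v2.items()}
print(rounded_v2)  # {'path1': 0.39, 'path2': 0.39, 'path3': 0.22}

# Path 1 (71 m) is the LP and path 2 (36 m) is the SP in both validations.
ewi_v1 = extra_walking(71, 36, 0.60, 0.07)  # reported probabilities, about 300
ewi_v2 = extra_walking(71, 36, 0.39, 0.39)  # ratio of 1, so 35
print(round(ewi_v1, 2), ewi_v2)
```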
In this section, the proposed evaluation method is applied on a simple 3D computational model. The model features an origin node and a destination node. The two nodes are linked by three different

Case study
The new Beijing South Railway Station was put into use in 2008 and covers an area of 420,000 m². This station is a transfer station for multiple transportation types: high-speed railway, buses, cars and the metro. In 2014, more than 100,000 passengers entered the station every day, and the station's capacity is to be expanded to 190 million passengers by 2020 (Chen et al. 2014). Public opinions (Huo 2018), private interviews with the railway station staff and private observations all indicate that the station's current visual guidance quality is problematic. The proposed evaluation method was used to test whether it can identify the causes behind the dissatisfaction. The station's overall visual guidance quality was also evaluated.
The Beijing South Railway Station is a five-level modern railway terminal that includes two levels above ground and three underground. The station's architectural space is shown in Figure 10. Figure 11 shows a simplified floor plan of the Beijing South Railway Station's basement, which has a simple oval shape. As can also be seen in Figure 12, the entrances, exits and lifts to the second-floor waiting hall are located on the south and north ends of the floor plan. The metro station is situated at the heart of this floor and is surrounded by train arrival exits and commercial facilities on its four sides. Four access points for passengers arriving by private cars and taxis can be found on the west and east sides. Although the passenger circulation route is clearly designed, unobstructed visual contact cannot always be guaranteed due to numerous columns and indoor partitions. In addition, advertisement posters and shop boards also disturb the perception of the existing visual guidance elements. The proposed evaluation method was applied to see if it is possible to evaluate the current visual guidance quality and to identify problems.
There are 9 origin nodes, 13 destination nodes and 22 intermediate nodes in total. Origin nodes are exits from the 2nd floor, from the metro and from the parking garage, or entrances from the exterior. Destination nodes are entrances to the 2nd floor, to the metro and to the parking garage, or exits to the exterior. To reduce the number of unnecessary nodes, node points are not positioned directly at the entrances; instead, they are positioned where passengers must pass after entering the terminal. Origin and destination node combinations between node points of the same category are omitted. For instance, after a passenger exits the metro station, it is highly unlikely that his/her next move is to reenter it. Intermediate nodes are set at main junctions and wherever passengers face a choice of direction. In total, there are 92 origin and destination node combinations (see Figure 12).
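The omission rule for same-category node pairs can be illustrated with a minimal Python sketch. The node IDs and category labels below are hypothetical and are not the station's actual node table, which this section does not list; only the filtering rule itself is taken from the text.

```python
from itertools import product

# Hypothetical node tables: node id -> category (illustrative only).
origins = {1: "metro", 2: "upper_floor", 3: "parking", 4: "exterior"}
destinations = {
    10: "metro",
    11: "upper_floor",
    12: "parking",
    13: "exterior",
    14: "exterior",
}

def od_combinations(origins, destinations):
    """Return all (origin, destination) pairs whose categories differ."""
    return [
        (o, d)
        for (o, o_cat), (d, d_cat) in product(origins.items(), destinations.items())
        if o_cat != d_cat  # e.g. metro exit -> metro entrance is omitted
    ]

pairs = od_combinations(origins, destinations)
```

With these toy tables, 5 of the 20 possible pairs share a category and are dropped, leaving 15. Applied to the case study's 9 origin and 13 destination nodes, the same rule would yield the 92 combinations reported above, provided the real category assignments are used.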
All input photos for the saliency analysis were taken around midnight, when the passenger terminal was nearly empty, with only a few passengers present. Thus, the disturbing effect of crowds was minimized. Whenever possible, photos were taken at each node in all four cardinal directions. The SAM saliency model was used for this case study. To enhance the model's prediction accuracy, the SALICON image dataset (Jiang et al. 2015) was used for training.
The distance difference analysis is shown in Table 1 and the probability comparison analysis can be found in Table 2. The overall Extra Walking Index (EWI) of the Beijing South Railway Station is 9.08. Table 1 shows that the maximum distance difference (63 m) occurs for the origin and destination node combination 33 to 16, where LP = [33, 12, 14, 15, 16] and SP = [33, 34, 15, 16] (see Figure 13, left). The same distance difference is found for the combination 33 to 17, where LP = [33, 12, 14, 15, 17] and SP = [33, 34, 15, 17] (see Figure 13, right). This is caused by a construction site located between nodes 33 and 34 (see Figure 14), which blocks visual contact between these two nodes. In addition, the available signage is not very salient visually; passengers are therefore more likely to make a turn to arrive at their destination instead of following the originally intended straight line between the origin and final nodes.
Table 1. Different origin and destination node combinations and the related distance differences between the LPs and SPs.
The probability comparison analysis in Table 2 shows that in 37 out of 92 origin and destination node combinations, the LPs do not coincide with the SPs. The maximum value (4.23) is found for the combination node 28 to 32, with the LP node sequence being [28, 27, 25, 31, 32] and the SP being [28, 2, 30, 31, 32] (see Figure 15). This occurs because node 2 on the SP is an important junction node that is connected to 6 other nodes (nodes 1, 3, 5, 28, 29 and 30). The visual information around node 2 is very rich because it has to provide visual guidance information towards all 6 connecting nodes (see Figure 16). After normalization, the corresponding transition probability p towards each connecting node becomes quite low. This also corresponds to the phenomenon that passengers usually hesitate at an unfamiliar important junction node, because a lot of visual information needs to be scanned. In comparison, each node on the LP is linked to fewer connecting nodes; the corresponding transition probability p is therefore higher, making this path preferable.
Figure 17 presents a visual comparison of the origin and destination node combinations where the LPs and SPs are not identical. The darker the color of the connecting lines, the more likely it is that passengers prefer this path. In both LPs and SPs, the right aisle with the connections [18, 19, 20, 22, 23, 25, 27, 28] (blue line in Figure 18) is less preferable than the left aisle with the connections [14, 12, 11, 9, 7, 6, 5] (orange line in Figure 18). The reasons are the longer distances between nodes 19 and 20 and between nodes 23 and 25, as well as the smaller corridor width at certain locations. These factors worsen the condition of the existing visual pathway. Figure 19(a) shows an example for the connection node 23 to 25 in the right aisle, and Figure 19(b) presents an example for the connection node 9 to 7 in the left aisle.
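The junction effect described above, where normalization over many outgoing connections lowers each transition probability, can be reproduced on a toy network. The following Python sketch is an illustration under stated assumptions, not the authors' implementation: the raw saliency scores and edge lengths are invented, the node names (`O`, `J`, `A`, `D`, `X`, `Y`, `Z`) are hypothetical, and the most likely path is found by running Dijkstra's algorithm on negative log probabilities, a standard way to maximize a product of probabilities. The EWI itself is not computed here because its formula is defined elsewhere in the paper.

```python
import heapq
import math

def normalize(saliency):
    """Turn raw (positive) saliency scores per outgoing link into probabilities."""
    probs = {}
    for node, nbrs in saliency.items():
        total = sum(nbrs.values())
        probs[node] = {nbr: score / total for nbr, score in nbrs.items()}
    return probs

def best_path(weights, origin, destination):
    """Dijkstra over nonnegative additive edge weights; returns (cost, path)."""
    queue = [(0.0, origin, [origin])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == destination:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nbr, w in weights.get(node, {}).items():
            if nbr not in visited:
                heapq.heappush(queue, (cost + w, nbr, path + [nbr]))
    return math.inf, []

def path_length(lengths, path):
    """Sum the edge lengths along a node sequence."""
    return sum(lengths[a][b] for a, b in zip(path, path[1:]))

# Hypothetical toy network: junction J has four outgoing links, A has one.
saliency = {
    "O": {"J": 1.0, "A": 1.0},
    "J": {"D": 1.0, "X": 1.0, "Y": 1.0, "Z": 1.0},
    "A": {"D": 1.0},
}
lengths = {  # metres, invented for illustration
    "O": {"J": 10.0, "A": 12.0},
    "J": {"D": 10.0, "X": 10.0, "Y": 10.0, "Z": 10.0},
    "A": {"D": 14.0},
}

probs = normalize(saliency)
# Maximizing a product of probabilities equals minimizing the sum of -log(p).
neg_log = {n: {m: -math.log(p) for m, p in nbrs.items()} for n, nbrs in probs.items()}

lp_cost, lp_path = best_path(neg_log, "O", "D")   # most likely path (LP)
sp_len, sp_path = best_path(lengths, "O", "D")    # shortest path (SP)
lp_prob = math.exp(-lp_cost)
extra_distance = path_length(lengths, lp_path) - sp_len
```

In this toy case the SP runs through the junction (`O`, `J`, `D`, 20 m) with probability 0.5 × 0.25 = 0.125, whereas the LP avoids it (`O`, `A`, `D`, 26 m) with probability 0.5, so the LP is 6 m longer than the SP. This mirrors, in simplified form, why the LP from node 28 to node 32 bypasses node 2.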
To conclude the application of the proposed evaluation method to this case study, the following observations can be made:
• The length difference and probability comparison are as important as the final evaluation result EWI, because they illustrate the difference between the LP and SP and thereby identify the node connections with significant visual guidance problems (for instance, the connection node 33 to 34).
• Generally speaking, the evaluation method considers the station's architectural space to be a prominent factor in visual guidance. This effect is especially apparent in architectural space that provides a long line of sight or a clear horizon line (for instance, the architectural spaces displayed in Figure 16).
• Important junction nodes that are linked to many other nodes are physically very convenient for accessing different destinations. However, this type of node may also contain a large amount of visual information, so that passengers need a longer reaction time to find the information they need. For node 2 in this case study, the corresponding saliency maps were able to visualize the visual information overload, and the impact was reflected in the corresponding LP and SP.
• In the visual saliency analyses, the effect of signage and landmarks on wayfinding is equivocal. If located in a complex and cluttered image region, the signage or landmark will not have high visual saliency values (Figure 19(a), upper part). When situated in a monotonous image region, the signage or landmark will be considered visually salient (Figure 19(b), lower part).
• Advertisement posters and shop boards disturb the detection of the necessary visual guidance elements (see Figure 16(b) and (d); Figure 19(a)).
• An overall evaluation of the visual guidance quality based on visual guidance elements can be provided for the Beijing South Railway Station (EWI = 9.08).
Figure 16. Visual saliency analyses around node 2. (a) is the analysis for the connection node 2 to nodes 1, 3 and 29, (b) for the connection node 2 to 5, (c) for node 2 to 28 and (d) for node 2 to 30.
By taking only the LP and SP into account, the calculation of the evaluation result is rapid, convenient and efficient. This advantage will be more apparent when evaluating large-scale passenger terminals, where the numbers of nodes and links are larger.

Discussion and conclusion
This study bridges two very distinct disciplinary worlds, that of visual cognitive psychology and computer vision on the one hand and that of architectural design on the other hand. The former is concerned with understanding the human mind in a theoretical and quantified manner; the latter focuses on the design and analysis of physical places using an empirical approach.
This paper proposes a method that integrates a saliency model to quantitatively analyze the visual guidance quality for passengers inside a passenger terminal. The main novelty of the method is the use of a visual cognitive analysis approach to understand how architectural spaces, directional signage, and interior landmarks influence passenger path choice. For each origin and destination node combination, the path that passengers are most likely to choose (LP) is compared with the shortest path (SP). Using this information, an overall estimation of the terminal's visual guidance quality can be expressed by the overall Extra Walking Index (EWI). The validity of the proposed method is verified with reference to empirical results from wayfinding studies. Using the Beijing South Railway Station as a case study, the evaluation method and its intermediate analyses can reveal weaknesses in the original architectural design, as well as identify problems in the terminal's current setup. When compared to other widely used analysis methods, such as site surveys, tracking individuals with think-aloud protocols, passenger questionnaires or space syntax, the proposed method is characterized by the following qualities:
• Cognitive and quantified approach: By integrating a saliency model, it is possible to understand the passenger's visual perception and attention. There is no need for passengers to tell what is on their minds or for researchers to passively observe their behavior. Dynamic weights can be applied to visual guidance elements depending on each specific visual scene, instead of assigning identical and fixed weights to these elements throughout all the scenes.
• Pre-assessment and high efficiency: Surveys and on-site investigations can be very expensive and time consuming. In addition, these investigations can only be conducted after the building is completed. By using architectural renderings as input images for the proposed evaluation method, the terminal's visual guidance quality can be assessed quickly and conveniently in early stages, and design revisions can be made when necessary.
• Identifying problems in a terminal's current setup: The terminal's management team can use this method to identify problems that exist even in the best-case scenario (without disturbances from passenger flow and crowds). For instance, weakening the visual saliency of existing shop boards or advertisement posters can reduce visual noise and thus improve passenger wayfinding performance.
The implication of the case study is that visual guidance quality is a combined effect of all visual guidance elements plus visual noise. While one might expect that visual guidance elements only cooperate with each other in guiding a passenger to his/her destination, the evaluation analyses reveal that even visual guidance elements can compete against each other for passenger visual attention, for instance, at important junction nodes.
Figure 19. Visual saliency analyses for connection node 23 to 25 in the right aisle (a), and for node 9 to 7 (b) in the left aisle. Note that in (a), the highest saliency value is assigned to a shop board instead of useful visual guidance elements.
At this stage of the study, the impact of the passenger crowd was excluded from the evaluation because, although the crowd may guide passengers in wayfinding, it may also randomly interrupt the perception of visual guidance elements. Future work needs to develop a method that can appropriately model this complex "guidance" and "interruption" effect. Future development is also expected to integrate vertical movement via stairs, escalators and lifts into the visual guidance quality evaluation. Stairs/escalators/lifts could be treated as transition nodes between the current network $G_i(V_i, E_i)$ and the network $G_j(V_j, E_j)$ of another floor.
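One possible way to realize such transition nodes is sketched below in Python. It is only an illustration of the idea outlined above, not part of the evaluated method: the function name `merge_floors`, the floor labels, the node numbers and the link lengths are all hypothetical. Each floor network is stored as an adjacency dictionary, nodes are re-labelled as (floor, node) tuples so that IDs from different floors cannot collide, and each stair, escalator or lift becomes a bidirectional edge between two floors.

```python
def merge_floors(floor_graphs, transitions):
    """Combine per-floor networks into one graph keyed by (floor, node).

    floor_graphs: {floor: {node: {neighbor: length}}}
    transitions:  [((floor_a, node_a), (floor_b, node_b), length), ...]
    """
    merged = {}
    for floor, graph in floor_graphs.items():
        for node, nbrs in graph.items():
            merged.setdefault((floor, node), {})
            for nbr, length in nbrs.items():
                merged[(floor, node)][(floor, nbr)] = length
                merged.setdefault((floor, nbr), {})
    # Stairs/escalators/lifts: assumed traversable in both directions.
    for a, b, length in transitions:
        merged.setdefault(a, {})[b] = length
        merged.setdefault(b, {})[a] = length
    return merged

# Hypothetical two-floor example.
floor_graphs = {
    "B1": {1: {2: 30.0}, 2: {1: 30.0}},
    "F2": {1: {2: 25.0}, 2: {1: 25.0}},
}
transitions = [(("B1", 2), ("F2", 1), 8.0)]  # one lift link, invented length

network = merge_floors(floor_graphs, transitions)
```

The merged graph can then be passed to any path search that works on a single floor. Whether a lift or escalator should carry a length, a time cost or its own saliency-based transition probability is left open here, as the text does not specify it.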

Disclosure statement
No potential conflict of interest was reported by the authors.