Naval ship design-process analysis through dynamic social networks

ABSTRACT Modern naval ship design is increasing in complexity as more and more systems are incorporated into the design process, leading to an increase in the number (and interdependence) of tasks that designers need to complete and progress through the design stages. This work develops a dynamic bipartite social network representation, separating the nodes into activities and individuals, with the aim of analysing and drawing conclusions about the design process. A dynamic time-dependent graph (TDG) is constructed with the use of a presence function for the edges. Network properties (density, clustering, active tasks/designers) are expressed as functions of time, and node-wise metrics (centralities) reveal the key role of individuals in the naval ship design process, which is heavily crowded with respect to tasks. Furthermore, application of the model to naval ship design data reveals insights regarding the impact of COVID-19 and the design company's adopted hiring policy.


Introduction
Modern naval ship design is becoming increasingly complex with the advent of technological leaps, leading to some of the most convoluted man-made systems around (Andrews 2006). This complexity is due to an aggregate of usually conflicting design goals, operational criteria, complex system integration, and communication between a multitude of people executing multiple tasks or activities. Naval ship design aims to deliver a new class of vessels without the advantage of a prototype. A common step to mitigate the complexity of a naval ship's design is the expansion of the design teams; however, this brings extra complications in the design process due to an increase in the number of tasks and a requirement for a greater degree of team integration (Tudor and Bach 2016). Naval ship design has been evolving constantly from the classic notion of the Vee-model (Forsberg and Mooz 1992) and the design spiral approach (Harvey-Evans 1959), to parametric explorations of performance and Model Based Systems Engineering (MBSE) (Pearce and Hause 2012). Despite the different approaches, one remaining constant is the pair of sets that formulate the design process: design activities and designers.
In all processes in which humans are involved, the human element either aids or hinders the completion of a process. By using a social network-based representation of the naval ship design process, inherent aspects of the process (the human element), which would remain hidden in a task-to-task representation (e.g. a Gantt chart), can be captured. It is common to make the distinction between activities and people in design processes but, in the past, studies have treated them as separate systems and not as an interconnected network of information flows (Eppinger and Browning 2012). Piccolo et al. (2018) use a bipartite network (111 people, 148 activities, 926 edges) to analyse data from the design stage of a renewable energy power plant and demonstrate the central role that people play in the design of complex systems. This role is also recognised in Love et al. (2019), where rework in projects is seen as the by-product of human errors and pathogens of the work environment. Love (2002) defines rework as the 'unnecessary effort of re-doing a process or activity that was incorrectly implemented the first time'. This definition includes design errors leading to non-conformances during construction. Design errors are the main reason for schedule delays and cost overruns in the design and construction of complex systems (Han et al. 2013). Because of the complexity of naval ship design, rework-related delays are measured in years and cost overruns can reach up to 30%.
The negative impact of rework in naval ship design points towards the need for a process supervision tool, which can help the design process supervisors in their decision-making and resource allocation. The first step in the development of such a tool is the representation of the design process using a cohesive and comprehensible model that quantifies insightful metrics of the design process and accurately represents the current stage of the design. It is essential that the model is dynamic, in the sense that it depicts the reality that tasks and people are not always interconnected, but only in specific time ranges. Further requirements are that the model stores all information related to tasks and people, and that the model handler has a full picture of the examined regions as well as important data on neighbouring regions. In the next section, the data used for the model creation are presented, along with an introduction to the network properties and necessary graph theory concepts, and a description of the model development process. Section 3 presents the main results of the model from the example data and draws conclusions on the nature of the design process under study. Section 4 summarizes the results obtained from the case study and points out the next steps of this work and different avenues of exploration leading to a complete process supervision tool.

Data and handling
The data used for the development of the model are indicative data for the design of a modern surface combatant. They include activities that cover a period of 11 years (2017 to 2028). The activities/tasks are all assigned a codename (e.g. Activity 2PSS170M) to avoid displaying sensitive information. In the same manner, the people/individuals are assigned a number (e.g. Individual 98) to avoid disclosing personal data. After a necessary data wrapping/cleaning step the number of activities is ∼12700 and the number of individuals is ∼100. In a static network approach (all edges are active simultaneously) the number of edges is ∼12800. The dataset includes information on the original/planned start and completion dates and the actual start and completion dates. The planned labour units and the actual labour units (at task completion) are also included in the dataset. The data were available in June 2022, so the period from mid-2022 to early 2023 is uncertain in terms of the future projections and the accuracy of the data values (liable to change).

Bipartite networks
A network or a graph G is a representation of connections among a set of items: G = (V, E). The V items are called nodes or vertices and the E items are called connections or edges. A bipartite network B = (U, V, E) includes an extra set of U items, being a second set of nodes (Figure 1). In a bipartite graph, edges are only allowed across the two sets. In this setting, the activities are assigned to set U and the individuals are assigned to set V. The simple task-to-task model can be derived from the bipartite network of interconnected activities and individuals by applying the U-projection of B.
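As an illustration of the U-projection, the sketch below links two activities whenever they share at least one individual. The edge list and the names `act_A`, `act_B`, `act_C` are hypothetical and are not taken from the case-study data; the function `u_projection` is likewise an illustrative helper, not part of the original model.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy edges (activity in set U, individual in set V).
edges = [("act_A", 1), ("act_B", 1), ("act_B", 2), ("act_C", 2)]

def u_projection(edges):
    """Link two activities (set U) when they share at least one individual (set V)."""
    acts_of = defaultdict(set)  # individual -> activities they are connected to
    for activity, individual in edges:
        acts_of[individual].add(activity)
    projected = set()
    for acts in acts_of.values():
        for a, b in combinations(sorted(acts), 2):
            projected.add((a, b))
    return projected

task_to_task = u_projection(edges)
```

In this toy case the projection contains two task-to-task links, one through each individual, while the information about who carries each link is lost, which is the limitation of the task-to-task view noted in the introduction.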
The graph density δ_G measures the fraction of existing links to all possible ones and for bipartite graphs it is:

$$\delta_G = \frac{|E|}{|U|\,|V|} \qquad (1)$$

and the clustering coefficient for a pair of nodes (Latapy et al. 2008) is:

$$cc(u, v) = \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|} \qquad (2)$$

where N(i) is the neighbourhood set of node i. This definition depicts clustering as an overlap between neighbourhoods. In Figure 2 the number of activities changes and this results in pair clustering coefficients of 0.5 and 0.25, respectively. Latapy et al. (2008) define the clustering coefficient for a node as the average of the pair clustering coefficient with all the other nodes in the network:

$$cc(u) = \frac{\sum_{v \in N(N(u))} cc(u, v)}{|N(N(u))|} \qquad (3)$$

where N(N(u)) is the neighbourhood of node u at distance 2 (2nd order neighbours). An averaging process with respect to the two bipartite sets returns the global average clustering coefficient:

$$C(G) = \frac{1}{|U| + |V|} \sum_{i \in U \cup V} cc(i) \qquad (4)$$

which is termed Latapy clustering C(G). Just as Latapy clustering is an expansion of the average LCC, the Robins-Alexander clustering CC_4(G) (Robins and Alexander 2004) is an expansion of the notion of transitivity. Instead of triangles, the simplest cycle (Figure 3) in a bipartite graph is a square (C_4) and instead of open triads, paths of length three (L_3) are used:

$$CC_4(G) = \frac{4\,C_4}{L_3} \qquad (5)$$

Considering the example cases of Figure 2, for the first network there is only one square formation and 12 paths of length three. From Equation (5), the resulting transitivity value is 0.33. For the second and third networks, the paths of length three are 8 and 16 respectively, leading to Robins-Alexander clustering values of 0.5 and 0.25. In these examples, the Latapy clustering coefficient and the Robins-Alexander clustering are the same. This is not the case for networks that include few or no square topologies, such as the one depicted in Figure 4. In this case, the Latapy clustering is 0.387 and the Robins-Alexander clustering is 0 since there are no squares.
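The density and the two clustering measures described above can be sketched in a few lines. The toy network below (one square plus one pendant activity) is hypothetical and is not one of the Figure 2 cases; squares are counted through pairs of activities sharing two individuals, and paths of length three through their middle edge, which are assumptions about how one might count them rather than the procedure used in the paper.

```python
from itertools import combinations

# Hypothetical toy network: activities a1..a3 (set U), individuals p1, p2 (set V).
edges = [("a1", "p1"), ("a1", "p2"), ("a2", "p1"), ("a2", "p2"), ("a3", "p1")]

nbrs = {}
for u, v in edges:
    nbrs.setdefault(u, set()).add(v)
    nbrs.setdefault(v, set()).add(u)
U = {u for u, _ in edges}
V = {v for _, v in edges}

def density():
    """Fraction of existing edges over the |U||V| possible ones."""
    return len(edges) / (len(U) * len(V))

def pair_cc(i, j):
    """Overlap of the two neighbourhoods (pair clustering coefficient)."""
    return len(nbrs[i] & nbrs[j]) / len(nbrs[i] | nbrs[j])

def latapy_cc(i):
    """Average pair clustering of node i with its 2nd order neighbours."""
    second = set().union(*(nbrs[k] for k in nbrs[i])) - {i}
    if not second:
        return 0.0
    return sum(pair_cc(i, j) for j in second) / len(second)

def robins_alexander():
    """Four times the number of squares divided by the number of 3-paths."""
    squares = 0
    for a, b in combinations(sorted(U), 2):
        common = len(nbrs[a] & nbrs[b])
        squares += common * (common - 1) // 2
    # Each path of length three is identified by its middle edge (u, v).
    three_paths = sum((len(nbrs[u]) - 1) * (len(nbrs[v]) - 1) for u, v in edges)
    return 4 * squares / three_paths if three_paths else 0.0
```

For this toy network there is one square and six paths of length three, so the Robins-Alexander value is 4/6, while the pendant activity `a3` has a Latapy clustering of 0.5.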
Another important measure of the network is centrality. There are various centrality metrics that can be employed, each with its own use and insight into the network connectivity. In this paper, degree centrality, betweenness centrality, and closeness centrality are calculated in node-wise form. Borgatti and Halgin (2014) provide an analysis of centrality measures for bipartite social networks.
Degree centrality DC of a node is the fraction of nodes connected to it, the normalization being performed with the opposing set of the node under examination. The benefit of this normalization is the immediate depiction of whether a given individual is more central than a given activity, i.e. it allows for comparative analysis across sets.

$$DC(u) = \frac{d_u}{|V|}, \quad u \in U \qquad (6a)$$

$$DC(v) = \frac{d_v}{|U|}, \quad v \in V \qquad (6b)$$

where d_i is the degree of node i.
Closeness centrality CC of a node is based on the sum of the distances to all other nodes in the graph. It measures how close a node is to all the other nodes. Closeness centrality is normalized with the minimum distance possible. For a bipartite graph, the minimum distance is 1 for connections across sets and 2 for connections within the same set:

$$CC(u) = \frac{|V| + 2(|U| - 1)}{d_u}, \quad u \in U \qquad CC(v) = \frac{|U| + 2(|V| - 1)}{d_v}, \quad v \in V$$

where d_i is the sum of distances from i to all other nodes. This normalization results in higher CC values for nodes with high centrality.
Betweenness centrality BC of a node is the sum of the fraction of shortest paths that pass through the node. Its main advantage is that it accounts for nodes that have few neighbours (low DC) but connect different regions of the graph, i.e. nodes that act as connectors of distant clusters of nodes. Values of betweenness are normalized by the maximum possible value which, for bipartite graphs, is limited by the relative size of the two node sets.

$$b(i) = \sum_{j < k;\ j, k \neq i} \frac{g_{jk}(i)}{g_{jk}}$$

where b(i) is the betweenness value of node i, g_jk is the number of shortest paths from node j to node k, and g_jk(i) is the number of shortest paths from node j to node k passing through i. The normalizing quantities for set U are:

$$b_{max}(U) = \frac{1}{2}\left[\,|V|^2 (s+1)^2 + |V|(s+1)(2t - s - 1) - t(2s - t + 3)\right]$$

$$s = (|U| - 1) \ \mathrm{div}\ |V|, \qquad t = (|U| - 1) \bmod |V|$$

The normalizing quantity for set V is obtained by interchanging U and V in the relations above. After normalization the betweenness centrality is:

$$BC(i) = \frac{b(i)}{b_{max}}$$
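The normalizing quantity can be evaluated directly. The sketch below uses the form given for bipartite graphs in the analysis of Borgatti and Halgin (2014), which is assumed here to be the one intended; the function name `bmax` and its argument names are illustrative.

```python
def bmax(n_own, n_other):
    """Maximum betweenness of a node whose own set has n_own nodes
    and whose opposing set has n_other nodes (assumed bipartite form)."""
    s, t = divmod(n_own - 1, n_other)
    return 0.5 * (
        n_other**2 * (s + 1) ** 2
        + n_other * (s + 1) * (2 * t - s - 1)
        - t * (2 * s - t + 3)
    )

# Two sets of seven nodes each, as in a 7 + 7 example network.
bmax_equal_sets = bmax(7, 7)

# Swapping the arguments gives the quantity for the other set.
bmax_small_set = bmax(2, 5)
bmax_large_set = bmax(5, 2)
```

For two sets of seven nodes the maximum is 72, so a raw betweenness is divided by 72 to obtain BC. When the sets differ in size, the two calls return different maxima, which is why the normalization has to be applied per set.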

Time-dependent graphs
Activities and individuals involved in activities are not always active during the design process. For example, activities pertaining to detailed design and analysis are inactive during early design stages. Considering the time-evolution of the network from concept design to detailed design completion, an individual may have worked on several tasks sequentially. Using a 'static graph' approach as in Wang et al. (2019), this individual is connected to those tasks concurrently, so this limitation of the static model can lead to erroneous conclusions and poor decision-making. The conversion of the static graph to a time-dependent graph (TDG) solves these issues and simplifies the network, since the static graph is the superposition of all discrete TDGs. For creating the TDG, two methods are considered: variable weights and a presence function. In the variable weights method, depending on the design process stage, the edges of the graph are assigned a weight value which quantitatively represents the information flow between activity and individual. The advantage of this approach is that edges can be activated, deactivated, or semi-activated when an activity is put on hold. The main drawback of the variable weights approach is its heavy dependence on data, documented by the design company, that quantify the put-on-hold level, and subsequently on the formulation of the weight function of each edge.
Due to the above weakness, the presence function approach is used as described in Wang et al. (2019). Using the presence function, edges are either active or inactive and this can be deduced from the available data (example in Figure 5). The presence function takes the value 1 when the edge is active in the time domain under examination, and 0 when the edge is inactive.
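A minimal sketch of such a presence function is given below. It assumes that an edge is active whenever its start-to-end interval overlaps the time window under examination, with both bounds inclusive; the edge records, codenames and dates are hypothetical.

```python
from datetime import date

# Hypothetical edge records: (activity, individual, start, end).
edges = [
    ("ACT_1", 98, date(2020, 1, 6), date(2020, 3, 27)),
    ("ACT_2", 98, date(2020, 4, 1), date(2020, 6, 30)),
]

def presence(edge, window_start, window_end):
    """Return 1 if the edge interval overlaps the window, else 0."""
    _, _, start, end = edge
    return int(start <= window_end and end >= window_start)

def snapshot(edges, window_start, window_end):
    """Edges of the time-dependent graph for one time window."""
    return [
        (activity, individual)
        for (activity, individual, start, end) in edges
        if presence((activity, individual, start, end), window_start, window_end)
    ]

week_edges = snapshot(edges, date(2020, 2, 3), date(2020, 2, 9))
```

In the static graph, Individual 98 would be linked to both activities at once; in the snapshot for this week only the first activity is connected, which is the distinction the TDG is meant to capture.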
The use of TDGs allows the examination of the behaviour of the graph properties at different stages of the design process as all properties can be plotted as functions of time.

Naval ship design case study
Information for nodes and edges is retained in respective feature vectors. For example, a node feature vector and an edge feature vector are defined as:
• Node: ['Assigned number', 'Node name', 'Individual personal information (position, status in company etc.)']
• Edge: ['Codename', 'Activity description', 'Planned labour units', 'Labour units at completion', 'Planned start', 'Planned end', 'Start', 'End', 'Task operator', 'Edge directionality']
From the available data an undirected bipartite static network is created, the directionality (single or dual) of information flow being retained within the edge feature vectors. The static network is decomposed into dynamic time-dependent networks, based on the desired time step. The most refined decomposition is in days (available in the naval ship design indicative data). This refinement splits the static graph into 4193 time-dependent graphs. The refinement selected for this paper analyses the data per design week, resulting in 599 time-dependent graphs (two example cases are depicted in Figure 8). The static graph edges are assigned to the bipartite time-dependent graph based upon the value of the presence function at each time step. Network density, Latapy clustering, Robins-Alexander clustering, the total number of activities that are connected, and the total number of individuals involved with those activities are plotted as functions of time per design week (Figures 6 and 7).
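The daily and weekly counts quoted above are consistent with each other: 4193 days split into seven-day windows gives 599 windows. The sketch below generates such windows; the first day of the timeline is a hypothetical placeholder, since the actual start date is not stated.

```python
from datetime import date, timedelta

def windows(first_day, n_days, step_days):
    """Split n_days starting at first_day into consecutive windows of step_days."""
    out = []
    offset = 0
    while offset < n_days:
        start = first_day + timedelta(days=offset)
        end = first_day + timedelta(days=min(offset + step_days, n_days) - 1)
        out.append((start, end))
        offset += step_days
    return out

FIRST_DAY = date(2017, 1, 2)  # hypothetical start of the design timeline
daily = windows(FIRST_DAY, 4193, 1)
weekly = windows(FIRST_DAY, 4193, 7)
```

Each window can then be passed to the presence function to select the edges of the corresponding time-dependent graph.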
The density values (Figure 6), cross-referenced with the behaviour of the average degree of the nodes (Newman 2010), indicate that the time-dependent graph is sparse: it has far fewer edges than the possible number of edges.
In our case study the Latapy clustering coefficient generally lies between 0.03 and 0.06 (Figure 6), but peaks in November 2020 and early 2021, followed by a large drop-off after 2022. The values of Latapy clustering until summer 2020 are in keeping with values recorded for other social networks, as in Latapy et al. (2008). All metrics display low spikes (sudden decreases in activities and clustering), which are attributed to disruptions from concurrent projects, such as the design or construction of other vessels reaching critical milestones. In such cases, the activities are reduced because the focus is on accomplishing the tasks of the competing project (e.g. launch of a patrol vessel). In this respect, the allocation of resources to different projects immediately affects the design of the ship under study. A multiple-project scenario can provide guidance on how progression in one project affects the completion and efficiency of others (section 4). The number of concurrent activities at each time exhibits the same behaviour (ranging from 500 to 800 before the peak of November 2020, with the large drop-off after 2022) (Figure 6). Just before the peak of November 2020, there is a gradual decrease in the number of activities and a sudden drop to 480 activities. This is the impact that the COVID-19 pandemic had on the design process. This drop in activities covers the COVID-19 outbreak and the UK's first lockdown (26/03/2020). From May to September 2020, employees of the design company moved from personal computers to laptops, since a remote work policy was adopted by the company. The relocation from on-site to remote work, coupled with pandemic-related delays in the completion of activities, affected the design of the ship. The pressing deadlines and the delays from March to October 2020 led to the peak of November 2020, where a large increase in activities can be observed, in an effort to put the original timeline back on track. It is noteworthy that, in the period leading to the peak of the activity plot, the number of individuals (Figure 7) involved with those activities does not increase, but is reduced by one. At the peak values, the designers would have to be involved in 57% more activities than before the pandemic.
The behaviour of the number of individuals demonstrates a key aspect of the hiring policy of the company in terms of contractors. Of the total of ∼100 individuals involved with activities during the design process, the maximum number at any time is 13, i.e. one order of magnitude less. This reveals the company's contractor policy: contractors are hired to complete specific design tasks and then they are allowed to leave, their work being moved forward by a new team of contractors. The immediate effect of this is that very few individuals are present during the whole of the design process.
After 2022 all plots exhibit a drop-off, because the data have not been adjusted to reflect current information. This reveals that the company is not allocating resources (individuals or course-correcting activities), which implies that the term 'organisational zemblanity' of Giustiniano et al. (2016) applies to this case study. Zemblanity means that 'we make our own misfortune', or 'unpleasant unsurprise', as stated in Love et al. (2019). In essence, in normal operations, the company knows that the number of activities ranges between 500 and 800 and the number of individuals ranges between 9 and 13. The pandemic resulted in large increases in activities and strain on the individuals, yet the projections for the future are still in the range of 50-150 activities from ∼9 individuals. This is an indication that a design process analysis tool is necessary for those in charge of the design component of projects of this complexity and scale.
The Robins-Alexander clustering value is always 0 (Figure 6). The same is found when examining the static graph, which is the superposition of all dynamic graphs. In Piccolo et al. (2018), a similar static bipartite (activities-participants) network was created for an energy plant and the Robins-Alexander clustering was 0.3. A major difference between the two graphs is the size of the activity set, since activities in the naval ship design case range up to 1000 (dynamic setting) and 12,700 (static graph), while in the power plant design there are 148. A zero value for the Robins-Alexander clustering coefficient means that there are no cycles of length 4, C_4, in the network: two individuals involved in the same activity are not involved in any other activity together. The zero Robins-Alexander clustering and the relatively small number of people involved with the activities at each time indicate that the design process is heavily individual-based and hierarchical, i.e. no lateral communication (through a communication activity) between individuals takes place. This is also supported by the contractor hiring policy and the lack of communication across different contractors.
Since the naval ship design process is heavily individual-based, it is reasonable to assume that the nodes with higher degree centrality correspond to individuals. However, the use of the bipartite definition for degree centrality results in activities being more central than individuals. The reason for this is the great difference in the number of elements in the activity and individual sets. The individual set comprises ∼100 nodes and the activity set comprises ∼12700 nodes. Dividing the degree of each node by the size of the opposing set (Equations (6a)-(6b)) results in high degree centrality for activity nodes and low degree centrality for individuals. The bipartite degree centrality definition is particularly useful for comparative analysis between the bipartite sets; however, in this network a limitation is discovered. A way to avoid this issue would be to neglect, momentarily, the bipartite nature of the network and use the classic degree centrality (normalization with all the nodes). This approach produces centrality values that are higher for the individual nodes and much lower for the activity nodes, which is reasonable; however, it is not recommended. In this work, the bipartite nature is not neglected. Instead, the two sets are treated separately, the ranking is kept within sets, and no conclusions across sets are derived, mitigating the impact of the bipartite degree centrality definition.
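The effect of the two normalizations can be seen with a short worked example. The set sizes are the approximate figures quoted above; the two degrees (an activity linked to one individual, an individual linked to 100 activities) are hypothetical values chosen only to illustrate the reversal.

```python
N_ACTIVITIES = 12700   # approximate size of the activity set (from the text)
N_INDIVIDUALS = 100    # approximate size of the individual set (from the text)

def bipartite_dc(degree, opposing_set_size):
    """Degree normalized by the size of the opposing set."""
    return degree / opposing_set_size

def classic_dc(degree, n_nodes):
    """Degree normalized by all other nodes in the graph."""
    return degree / (n_nodes - 1)

activity_degree = 1      # hypothetical: activity linked to one individual
individual_degree = 100  # hypothetical: individual linked to 100 activities
n_nodes = N_ACTIVITIES + N_INDIVIDUALS

dc_activity_bip = bipartite_dc(activity_degree, N_INDIVIDUALS)      # 1 / 100
dc_individual_bip = bipartite_dc(individual_degree, N_ACTIVITIES)   # 100 / 12700
dc_activity_cls = classic_dc(activity_degree, n_nodes)
dc_individual_cls = classic_dc(individual_degree, n_nodes)
```

Under the bipartite definition the single-link activity (0.01) outranks the individual with 100 links (about 0.0079), whereas the classic normalization ranks the individual higher. This is why rankings are kept within each set in this work.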
Sorting the centrality values at any time-instance of the TDG returns the individuals who are most involved with the activities and the activities that experience the most information traffic (Table 1). In the COVID-19 lowest traffic region (28/9/2020) (Figure 8), the most central activities were:
• 2MFDA727: authorization of safety requirements for power distribution systems
• 2MEIT189: review and management of support systems
• MAWNC103: logistics for 2020
From the individual set, the most central are Individual 56 and Individual 2. Individual 2 is a supply chain manager whose main duty is the procurement of equipment. A noteworthy observation in the data is that Individual 56 is not currently employed by the design company; they were either hired as a contractor or have retired/moved to another company. Therefore, an individual who is central during the pandemic period of the design process is not participating in the design process as of May 2022.
In the November 2020 peak time-instance (Figure 8), the most central activities are:
• 2MRD3T45: review of metallic pipe fittings
• 2MSEN101: changes in system engineering management
• MPLHA575: delivery of design model and data set
From the individual set, the most central nodes are:
• Individual 25: engineering manager
• Individual 30: head of electrical engineering, reviewer of electrical plans

Discussion and research continuation
A time-dependent bipartite social network is developed for the representation and analysis of the design process of a modern surface combatant. The model is tested on indicative data for part of the design process of the naval ship, which comprise two main sets of nodes: activities and individuals engaged with activities. The analysis timeline spans 11 years and the time-instances are examined weekly (daily and monthly analysis is available). The main outputs of the model are plots of the network properties as functions of time for the complete design process (Figures 6 and 7) and the availability of node-wise centrality information and ranking. The case study reveals the impact COVID-19 had on the design process, both in terms of delays until the design company adapted and the resulting increase in work to meet certain deadlines and strain on the designers. The analysis of the number of individuals at each stage of the design and the topological structure of the network is entwined with the hiring/outsourcing policy of the design company, in which contractors and subcontractors are employed only for specific stages of the design process. Based on past and current data, all network properties exhibit a drop-off after 2022 that can be interpreted as an underestimation of the scale of the activities and subsequent clustering. The analysis of the bipartite network properties is the first step in developing a decision-making support and resource allocation tool for supervisors of complex design projects. In continuation of this work, a modern and interactive design process visualization platform and UI (user interface) will be developed, allowing more freedom for a design supervisor to explore the dataset, make changes in the interconnections of the network, and reassess the network behaviour. The next analysis step is the depiction and quantification of rework through task completion delays and labour units, and exploratory data analysis with the aim of reducing rework in naval ship design processes and projects of similar scale and properties. An intriguing case study is the application of the model to the more complicated design process of the simultaneous manufacture of three naval ships, where progress in the first design directly affects the other two and where it is expected that the experience will lead to quicker and more efficient design completion. With three concurrent projects, from a multi-layered projects point of view, the involvement of participants, and the effect this can have on the efficiency of the design and on rework reduction, is significant both in terms of process management and human adaptation.
In Equation (1), |(·)| is the number of elements in the (·) set. The clustering coefficient in classic graphs measures the prevalence of triadic closure in the network and there are two main measures: node local clustering coefficient (LCC) and transitivity. The graph global clustering coefficient can be derived by averaging LCC for all nodes. Transitivity measures the percentage of 'open triads' that are triangles in the network. In bipartite graphs the concept of triadic closure is not applicable since triangles cannot occur. In Latapy et al. (2008), an expansion of LCC is presented where, instead of node local clustering, a clustering coefficient for pairs of nodes is used (Equation (2)).

Figure 2. Pair clustering coefficient for different neighbourhood sizes.

Figure 3. Simple cycle and path of length three.

Figure 4 depicts a network that illustrates the importance of investigating network centrality with a variety of centrality metrics, each returning a different 'more central' node. Degree centrality ranks nodes with more connections as more central in the network. This would mean that nodes 2, 6, D and E are more central since they have the maximum number of connections, which is 3. The resulting centrality value is DC(2) = DC(6) = DC(D) = DC(E) = 0.4286. Closeness centrality measures how close a node is to all other nodes. By examining the example network, one would assume that nodes near the network centre, i.e. nodes D, E, 4, are going to have the highest centrality values. The resulting centrality values are CC(D) = CC(4) = 0.5758 and CC(E) = 0.5423. One would assume that nodes E and D would have the same value for closeness centrality. They have the same number of paths with length 5: E→4→D→2→C→1, D→4→E→6→G→7. However, node D has only one path of length 4, D→4→E→6→F, while node E has two such paths, E→4→D→3→A and E→4→D→3→B, which means that node E is less central than node D. Examining the betweenness centrality values, it is expected that nodes D, E, and 4 would be more central, since removal of node D results in 3 connected networks, removal of node E results in 2 connected networks and an unconnected node 5, and removal of node 4 results in 2 connected networks. The betweenness centrality values are BC(D) = 0.7083, BC(E) = 0.6111 and BC(4) = 0.5833. Betweenness centrality ranks nodes as the more important connectors of distant clusters of nodes. Therefore, evaluating the betweenness centrality of node 1 returns zero since it is not a connector, but a node on the periphery. For node 1, the centrality values are: DC(1) = 0.1489, CC(1) = 0.3016 and BC(1) = 0.
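The closeness and betweenness values quoted above can be checked on a small reconstruction. The edge list below is an assumption: it is inferred from the paths listed in the text, not read from Figure 4, and other details of the figure may differ. The normalizations are the bipartite forms of Borgatti and Halgin (2014), also assumed. Under these assumptions the sketch reproduces the quoted betweenness values for D, E, 4 and 1 and the quoted closeness values for D, 4 and 1.

```python
from collections import deque
from itertools import combinations

# ASSUMPTION: edge list inferred from the paths quoted in the text, not from Figure 4.
edges = [("1", "C"), ("2", "C"), ("2", "D"), ("3", "D"), ("3", "A"), ("3", "B"),
         ("4", "D"), ("4", "E"), ("5", "E"), ("6", "E"), ("6", "F"), ("6", "G"),
         ("7", "G")]
numbers = {n for n, _ in edges}
letters = {l for _, l in edges}
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

def bfs(src):
    """Distances and numbers of shortest paths from src to every node."""
    dist, sigma = {src: 0}, {src: 1}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y], sigma[y] = dist[x] + 1, 0
                queue.append(y)
            if dist[y] == dist[x] + 1:
                sigma[y] += sigma[x]
    return dist, sigma

paths = {n: bfs(n) for n in adj}

def sets_of(i):
    """Return (own set, opposing set) for node i."""
    return (numbers, letters) if i in numbers else (letters, numbers)

def closeness(i):
    own, other = sets_of(i)
    return (len(other) + 2 * (len(own) - 1)) / sum(paths[i][0].values())

def bmax(n_own, n_other):
    s, t = divmod(n_own - 1, n_other)
    return 0.5 * (n_other**2 * (s + 1) ** 2
                  + n_other * (s + 1) * (2 * t - s - 1)
                  - t * (2 * s - t + 3))

def betweenness(i):
    own, other = sets_of(i)
    dist_i, sigma_i = paths[i]
    raw = 0.0
    for j, k in combinations([n for n in adj if n != i], 2):
        dist_j, sigma_j = paths[j]
        if dist_j[i] + dist_i[k] == dist_j[k]:  # i lies on a shortest j-k path
            raw += sigma_j[i] * sigma_i[k] / sigma_j[k]
    return raw / bmax(len(own), len(other))
```

The betweenness here is computed by brute force over all node pairs, which is adequate for a 14-node example but not for the case-study network.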

Figure 6. Behaviour of density (top), Latapy and Robins-Alexander clustering (middle), and number of activities (bottom) and detail of example period.

Figure 7. Number of individuals involved with the examined activities.

Table 1. Maximum centrality values for two example cases (DC, BC, CC as introduced in section 2.2).