Automated layout of origin–destination flow maps: U.S. county-to-county migration 2009–2013

ABSTRACT Visualizing large movement datasets with flow maps is difficult because overlapping flows create significant graphical conflicts that make accurate interpretation difficult or impossible. Interactive flow mapping applications allow users to explore large movement datasets by automatically generating flow maps from subsets of the data in response to queries by the user. However, even a small number of flows can overlap and cross each other in a way that impedes accurate interpretation. We introduce an interactive flow map of migration in the United States from 2009 to 2013 that uses a force-directed method to automatically lay out migration flows at the county-to-county and state levels. This map, available at http://usmigrationflowmapper.com/, aims at improving readability by automatically creating origin–destination flow layouts according to identified cartographic design principles. Map users explore high-level state-to-state migration patterns as well as detailed county-to-county movements through a custom user interface and interactive map features. We show migration flows between counties of different states by representing other states as nodes with a circular arrangement around the selected state, and connect county flows to those nodes. This constrains the map layout to a smaller area, reducing clutter and the amount of interaction required to view flows.


Introduction
Flow maps are a powerful tool for communicating information about geographic movement (Tufte, 1983). Flow maps depict quantitative flow volumes as lines of varying width connecting points or areas, where arrows optionally indicate flow direction (Dent, Hodler, & Torguson, 2009; Slocum, McMaster, Kessler, & Howard, 2009; Thompson & Lavin, 1996). When the actual routes taken by the flows have no significance, these maps are also called origin-destination flow maps (Dent et al., 2009).
There are several key challenges to creating origin-destination flow maps. Visualizing large datasets of hundreds or thousands of movements in a map creates significant graphical conflicts that make accurate interpretation difficult or impossible. The readability of flows in origin-destination maps can be improved by reducing graphical conflicts such as overlaps and intersections between flows, but even a small number of flows can overlap and cross each other in a way that impedes accurate interpretation. Therefore, large datasets must undergo a great deal of generalization, and/or be represented as a series of maps that show selected subsets of flows (Tobler, 1987). High-quality flow maps are created manually, applying cartographic design principles to ensure flows are arranged optimally for both interpretation and aesthetics (Jenny et al., 2016). Unfortunately, it would be very time consuming to manually create enough flow maps to fully describe datasets with hundreds of thousands of movements.
Automated techniques for flow maps are not yet a common feature in interactive web maps, cartographic software, or in geographic information systems (GIS) software packages (Rae, 2011). As large datasets of movement information are becoming more accessible and are frequently updated, there is a growing need for new automated flow visualization methods. These methods should adhere to cartographic principles for flow maps to create maps that are easy to interpret correctly. Additionally, exploring large online flow data repositories requires web-based interactive visualizations. They should be designed for the general public and provide easy-to-use functionality, as well as an engaging and accessible user interface.
Our objective is to support the presentation and exploration of U.S. county-to-county migration through an interactive web-based flow map. The U.S. Census Bureau collects and publishes county-to-county migration flow data for the United States on the U.S. Census website (U.S. Census Bureau, 2015). These data are freely available as tables containing hundreds of thousands of rows of 5-year flow estimates between all U.S. counties and county equivalents, with origins, destinations, and migration volumes. Our aim was to create an automated, web-enabled method to produce multiple flow layouts that adhere to cartographic design principles for best readability. Users should be able to explore the data and create alternative flow map layouts to answer basic questions about U.S. county and state migration, such as: What were the main patterns of migration between all states, and between counties within a specific state? Where were the largest net and total migration flows between states and counties? How many people migrated to or from a specific state or county?
We present an interactive flow map for web browsers of the 2009-2013 U.S. Census county-to-county migration flows, available at http://usmigrationflowmapper.com/. Flow curves on our map are automatically generated on-the-fly from migration data tables according to established cartographic principles for improving the readability and aesthetics of origin-destination flow maps. This is the first web-based implementation of the force-directed flow map layout method developed by Jenny et al. (2017). We also introduce a new layout method developed for displaying county-level flows between states by representing other states as nodes with a circular arrangement. The automated layout method computes flow layouts in a few seconds in modern web browsers.

Flow curving
Automatically creating flow maps with straight flows is simple, but visual clutter is created by the inevitable occlusions that result as flows cross and overlap. In early attempts by Tobler (1987) to automate flow map layout using only straight flows, overlap between flows was such that all but the largest flows were partially or completely hidden, making detailed interpretation difficult or impossible. Flow maps created manually by cartographers usually contain curved flows (Jenny et al., 2016), which have several advantages over straight flows. Curved flows can be routed to avoid unnecessary crossings or overlaps, and they can be evenly distributed to make better use of the canvas space. Studies have also shown that people prefer curved visual objects (Bar & Neta, 2006; Silvia & Barona, 2009), prefer curved lines in flow maps specifically, and perform better when reading flow maps with curved flows (Jenny et al., 2016). To reduce clutter and improve the readability of flow maps, cartographers use the following design principles when curving flows (Jenny et al., 2016):
• Flows are routed to avoid intersections with other flows. Figure 1(A) compares a preferred flow layout with fewer intersections and reduced overlap to a layout with many intersections and significant overlap between the same flows.
• Single, symmetric, gradual curves are favored over sharp, asymmetric, or multiple curves. Figure 1(B) compares flows with preferred single, symmetric, and gradual curves to flows with multiple, asymmetric, and sharp curves.
• Wide angles at intersections between flows are preferred over acute angles. Figure 1(C) compares a preferred wide-angle intersection to an acute-angle intersection between flows.
• Flows are not allowed to touch or pass through nodes they are not connected to. Figure 1(D) compares a preferred flow that is routed around the unconnected node to a flow that passes through an unconnected node.
• Narrow angles between flows at shared nodes are avoided. Figure 1(E) compares a preferred layout with wider angles between flows connected to the same node to a layout with narrow angles between flows.
Several automated methods for producing flow maps with curved flows exist. Ho, Nguyen, Åström, and Jern (2011) apply an adjustable curvature to flows, relying on the map user to alter flow curves in the event of excessive clutter. Guo and Zhu (2014) apply the same curvature to all flows to show direction; flows are curvy near the origin and straight near the destination. However, Holten and Van Wijk (2009a) found that curvature is not an effective method for indicating direction. Furthermore, naively curving flows uniformly as in the study by Xu, Rooney, Passmore, Ham, and Nguyen (2012) can result in increased visual clutter. Edge bundling has been implemented to merge neighboring flows going in the same direction (Holten & Van Wijk, 2009b; Peng, Lu, Chen, & Peng, 2012), providing an improved view of overall flow patterns by emphasizing large volumes of flow between clusters of areas. As Ho et al. (2011) point out, flow bundling often makes it impossible to determine the amount of flow between two specific locations, a common task when studying migration movements. Edge bundling is well suited for high-level views of complex data, for flow maps with one or two origins and many destinations (or vice versa), or for flow maps where the quantity moved does not matter. However, it is less appropriate for the goals of our migration map.
Our migration map uses the force-directed method described by Jenny et al. (2017), which is the first method for automatically laying out origin-destination flow maps according to the cartographic design principles described above. Flows are modeled as quadratic Bézier curves, which differ from cubic Bézier curves by the number of control points. The location of the control points determines the shape of the curve. Quadratic Bézier curves have a single control point per curve segment, whereas cubic curves have two. While cubic Bézier curves are commonly available in interactive vector graphics editors, quadratic Bézier curves are often used for defining the outline of typographic glyphs and therefore widely supported for data exchange and rendering. For example, quadratic Bézier curves can be rendered with the Scalable Vector Graphics (SVG) and HTML5 Canvas web standards.
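To make the role of the single control point concrete, the following minimal Python sketch evaluates a quadratic Bézier curve from its start point, control point, and end point. The function names are illustrative and are not taken from the map's source code; sampling at equal parameter steps is a simplification that only approximates even spacing along the curve.

```python
def quadratic_bezier(p0, p1, p2, t):
    """Return the point at parameter t (0..1) on a quadratic Bezier curve.

    p0 is the start point, p1 the single control point, p2 the end point.
    B(t) = (1-t)^2 * p0 + 2(1-t)t * p1 + t^2 * p2
    """
    u = 1.0 - t
    x = u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0]
    y = u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1]
    return (x, y)


def sample_curve(p0, p1, p2, n):
    """Return n points (n >= 2) at equal parameter steps, endpoints included."""
    return [quadratic_bezier(p0, p1, p2, i / (n - 1)) for i in range(n)]


# A flow from (0, 0) to (2, 0) whose control point is pulled up to (1, 2).
print(sample_curve((0, 0), (1, 2), (2, 0), 5))
```

Moving the control point away from the line between the two end points increases the curvature; placing it on the midpoint of that line yields a straight flow.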
The force-directed method simulates physical forces that push flows apart. Flows emit repelling forces that push away the Bézier control point of neighboring flows, creating curved paths (Figure 2, left, control points are shown as square symbols). A spring connects the control point of each flow to the midpoint between the start and end point of the flow; this counteracts the repulsing forces of other flows, and prevents flows from curving too much (Figure 2, right). Repulsing forces are strongest against nearby flows, while spring forces strengthen as flow curvature increases. To generate the repulsing force of flows, evenly spaced points are first located along each flow (shown as circular symbols in Figure 2). A force vector is emitted by each of these points onto the points along all other flows and weighted using inverse distance weighting (Shepard, 1968). For each flow, all forces applied to each point along the flow are summed and divided by the number of points along the flow. This resulting force is then used to move the control point of the flow. The equilibrium state between all forces is computed through an iterative process, resulting in continuously curved flows that reduce the number of intersections and overlaps. Additional secondary forces are applied to further refine the layout and integrate additional design principles. For example, flows that pass through unconnected nodes (the endpoints of other flows) are moved off of the nodes when possible. Flows are also moved off the arrowheads of other flows when possible. The final result is a layout with fewer flow intersections, less overlap between flows, flows that tend to be symmetrically curved, and flows that avoid passing through the end points and arrowheads of other flows.
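The core loop described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation of Jenny et al. (2017): it covers only the repulsing and spring forces, omits the secondary forces for nodes and arrowheads, samples points at equal parameter steps rather than equal distances, and uses hypothetical parameter names and values (`repulsion`, `stiffness`, `power`).

```python
import math


def bezier_point(start, ctrl, end, t):
    """Point on a quadratic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    return (u * u * start[0] + 2 * u * t * ctrl[0] + t * t * end[0],
            u * u * start[1] + 2 * u * t * ctrl[1] + t * t * end[1])


def sample_points(flow, n):
    """n interior points at equal parameter steps (approximates even spacing)."""
    return [bezier_point(flow["start"], flow["ctrl"], flow["end"], (k + 1) / (n + 1))
            for k in range(n)]


def layout_step(flows, n_points=5, repulsion=1.0, stiffness=0.5, power=2.0):
    """One iteration: move each control point by averaged repulsion plus spring."""
    samples = [sample_points(f, n_points) for f in flows]
    moved = []
    for i, flow in enumerate(flows):
        fx = fy = 0.0
        for px, py in samples[i]:
            for j, others in enumerate(samples):
                if j == i:
                    continue
                for qx, qy in others:
                    dx, dy = px - qx, py - qy
                    dist = math.hypot(dx, dy)
                    if dist < 1e-9:
                        continue  # coincident points define no direction
                    weight = 1.0 / dist ** power  # inverse distance weighting
                    fx += weight * dx / dist
                    fy += weight * dy / dist
        # Sum of forces divided by the number of points along this flow.
        fx *= repulsion / n_points
        fy *= repulsion / n_points
        # Spring pulling the control point toward the start-end midpoint.
        mid_x = (flow["start"][0] + flow["end"][0]) / 2.0
        mid_y = (flow["start"][1] + flow["end"][1]) / 2.0
        cx, cy = flow["ctrl"]
        moved.append((cx + fx + stiffness * (mid_x - cx),
                      cy + fy + stiffness * (mid_y - cy)))
    # Apply all moves at once so every flow sees the same previous state.
    for flow, ctrl in zip(flows, moved):
        flow["ctrl"] = ctrl


def layout(flows, iterations=20, **kwargs):
    """Repeat the step a fixed number of times to approach equilibrium."""
    for _ in range(iterations):
        layout_step(flows, **kwargs)
    return flows
```

For two parallel, initially straight flows, each step pushes the control points away from each other, so the flows bow apart, while the spring term keeps the curvature bounded.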

Flow generalization
Simultaneously mapping all U.S. county-to-county flows would result in excessive clutter and render the map unreadable. Even a subset of flows, such as all the flows for n counties of one state, has the potential to contain flows numbering n times 3219 (the number of counties and county equivalents in the U.S. migration dataset minus 1), which would still result in excessive clutter. The flows must therefore be generalized to create a useful map. Deciding how to generalize flows usually involves finding a compromise between the number of flows required to see significant movement patterns, and the number of flows that can be shown without negatively impacting map readability. Tobler (1987) discusses several methods and guidelines for generalizing flows, including sub-setting, which only shows flows for a selected geographic area, thresholding, which displays only the largest flows above some threshold, and merging, which groups origins and destinations (e.g. by proximity or administrative boundaries), and then merges flows between the groups. Our map uses variations of each of these methods, described in turn below.
Sub-setting: Sub-setting flows by selected geographic areas is an effective way to reduce the number of flows shown on a map at once. Interactive maps can include tools for querying migration movements by location, areal units, or movement volume (Rae, 2009; Tobler, 1987). A downside of interactive sub-setting is that it relies on the map user to discover interesting spatial relationships through repeated querying and exploration. The advantage is that users can focus on specific areas of interest and glean finer details that would otherwise be hidden in a static summary of the overall trends. Following Shneiderman's (1996) information-seeking mantra (overview first, zoom and filter, then details-on-demand), we include the ability to 'drill down' into the data to enable users to answer questions about migration movements at a local level, such as how many people migrated to or from a specific location. Users can choose to view flows at the state-to-state or county-to-county scale. At the state-to-state scale, users can view flows between the 50 states, Puerto Rico, and the District of Columbia (Figure 3, top-left), or they can select a single state to view the incoming or outgoing flows for just that state (Figure 3, top-right). At the county-to-county scale, users can view flows between counties of a selected state (Figure 3, bottom-left), as well as flows to or from a single selected county (Figure 3, bottom-right). When displaying flows to and from a specific location, users can choose to view only the incoming or outgoing flows. Additionally, users can switch between viewing net flows or total flows at any time. This sub-setting functionality provides considerable flexibility for choosing which and what kind of flows to view.
Thresholding: In most cases, sub-setting the flows by state or county does not sufficiently reduce the number of displayed flows to avoid a cluttered and ineffective map. Thresholding only shows the largest flows based on some threshold. As Tobler (1987) points out, the distribution of flow values for a typical migration dataset resembles a Pareto curve, with a small percentage of the flows accounting for a majority of all migrations. He recommends displaying only flows with an above-average volume amongst all flows within the selected subset, which generally results in displaying less than 25% of all flows while including over 75% of the total flow volume (Tobler, 1987). In the case of the migration data used in our map, there are often hundreds of flows with an above-average flow volume (depending on the subset of flows being viewed), which usually still results in too much map clutter. Instead of assigning a constant threshold as Tobler suggests, our map provides controls for manually adjusting the maximum number of displayed flows (Figure 4). We have also included information in the legend about the percentage of the total flow volume currently represented on the map, which automatically updates to provide the user with information about migrations that are not shown.
Merging: Merging groups of end points and the flows that travel between them can reduce the number of flows displayed. Flows can be aggregated by administrative boundaries such as states, or with point-clustering techniques such as hierarchical clustering. We merge county flows by state when viewing flows at the state-to-state level. When viewing county-level flows for a selected state, we merge all flows between the same county in the selected state and all counties belonging to another state. For example, all flows traveling out of Monongalia County, West Virginia to any county in Pennsylvania are merged into a single flow from Monongalia County to Pennsylvania (Figure 5). While this does remove some detail about county-level flows between states, such as flows between neighboring counties separated by a state boundary, the benefit is that many small flows are replaced with a small number of larger flows. Also, flows that travel to counties in other states are often relatively small and would not be shown after thresholding. By merging these smaller flows together, migrations to other states are larger and more likely to be shown. Sometimes, the merged flows to other states obscure the flows between counties within the state. We therefore include the option to filter out the merged flows to other states.
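The two thresholding variants discussed above can be compared with a short Python sketch. The tuple layout `(origin, destination, volume)`, the function names, and the sample volumes are assumptions made for illustration; they do not reflect the map's actual data structures or Census values.

```python
def above_average(flows):
    """Tobler-style threshold: keep flows whose volume exceeds the subset mean."""
    mean = sum(volume for _, _, volume in flows) / len(flows)
    return [f for f in flows if f[2] > mean]


def top_n(flows, n):
    """Keep the n largest flows and report the share of total volume they carry."""
    ranked = sorted(flows, key=lambda f: f[2], reverse=True)
    shown = ranked[:n]
    total = sum(volume for _, _, volume in flows)
    share = 100.0 * sum(volume for _, _, volume in shown) / total if total else 0.0
    return shown, share


# Hypothetical volumes with a skewed, Pareto-like distribution.
flows = [("A", "B", 700), ("A", "C", 200), ("B", "C", 60), ("C", "A", 40)]
print(above_average(flows))   # only the single dominant flow exceeds the mean of 250
shown, share = top_n(flows, 2)
print(len(shown), share)      # two flows carrying 90.0 percent of the volume
```

The second function mirrors the user-adjustable flow count together with the legend's percentage of represented volume; the cutoff `n` stands in for the value chosen with the map controls.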
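As a rough illustration of this merging rule, the Python sketch below collapses out-of-state end points to their state while keeping counties of the selected state intact. The record layout, function name, and volumes are hypothetical, and the sketch assumes county names and state abbreviations never collide as dictionary keys.

```python
from collections import defaultdict


def merge_by_state(flows, selected_state):
    """Merge county flows that cross the border of the selected state.

    Each flow is (origin_county, origin_state, dest_county, dest_state, volume).
    End points inside the selected state stay counties; end points outside it
    are replaced by their state, and volumes with the same key are summed.
    """
    merged = defaultdict(int)
    for o_county, o_state, d_county, d_state, volume in flows:
        if o_state != selected_state and d_state != selected_state:
            continue  # the flow does not touch the selected state
        origin = o_county if o_state == selected_state else o_state
        dest = d_county if d_state == selected_state else d_state
        merged[(origin, dest)] += volume
    return dict(merged)


# Hypothetical volumes, following the Monongalia County example in the text.
flows = [
    ("Monongalia", "WV", "Greene", "PA", 120),
    ("Monongalia", "WV", "Fayette", "PA", 80),
    ("Monongalia", "WV", "Marion", "WV", 50),
    ("Allegheny", "PA", "Monongalia", "WV", 30),
    ("Greene", "PA", "Fayette", "PA", 10),
]
print(merge_by_state(flows, "WV"))
```

The two flows from Monongalia County to Pennsylvania counties become a single flow of 200, which is more likely to survive thresholding than either flow alone.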

Circular node layout to shorten long flows
Flows at the county-to-county scale can vary significantly in length, as some flows travel between neighboring counties in the selected state, while others travel to counties in other states. The map must be zoomed in to the selected state to clearly see the shorter flows, but this requires users to pan and zoom around the map extensively to see the longer flows to other states. To avoid this, we introduce a method for shortening these longer flows to create a flow map that is readable at a glance. We place circular nodes representing the other states around the selected state, and connect flows to those circles instead of the actual states (Figure 6). When viewing county-level flows for a selected state, circular nodes are added to the map for every other state that has a county flow traveling to or from it. The radius of each state node is determined by the number of migrations between the selected state and the state represented by the node. The state nodes are arranged in a ring around the selected state (Figure 7, left). The ring is large enough to contain the selected state and avoid overlap between the state and the circular state nodes. The location on the ring where each node is placed depends on the location of the state it represents; we calculate a line between the centroid of the selected state and the centroid of every other state. We tentatively place each state node on the intersection between the line to its state and the ring around the selected state (Figure 7, middle). This causes some nodes to overlap. To reduce this overlap, we spread nodes apart using a graph layout method included in the open-source D3 JavaScript library available at d3js.org (Bostock & Davies, 2013; Bostock, Ogievetsky, & Heer, 2011). This method uses velocity Verlet integration (Verlet, 1967) to generate repulsing forces between nodes and efficiently track node locations.
The D3 graph layout method is highly customizable; we add constraints to keep nodes on the ring around the state, and a collision-detection algorithm prevents nodes from overlapping (Figure 7, right). Larger nodes, which often have more flows to and from them, are given a stronger repulsing force to create extra space around them for additional flows.
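The tentative placement and the subsequent spreading can be sketched in Python as follows. The first function follows the description above (the direction from the selected state's centroid to each other state's centroid). The second function is a deliberately simple stand-in for the D3 force simulation: it pushes overlapping nodes apart along the ring pair by pair and is not the velocity Verlet method used in the map. All names and the iteration limit are assumptions.

```python
import math


def initial_angles(center, centroids):
    """Angle of the line from the selected state's centroid to each state."""
    return {name: math.atan2(y - center[1], x - center[0])
            for name, (x, y) in centroids.items()}


def node_position(center, ring_radius, angle):
    """Point on the ring around the selected state at the given angle."""
    return (center[0] + ring_radius * math.cos(angle),
            center[1] + ring_radius * math.sin(angle))


def spread_on_ring(angles, radii, ring_radius, iterations=200):
    """Push overlapping circular nodes apart while keeping them on the ring."""
    ang = dict(angles)
    names = list(ang)
    for _ in range(iterations):
        moved = False
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                # Smallest angular gap at which the two circles just touch
                # (chord length between centers equals the sum of the radii).
                ratio = (radii[a] + radii[b]) / (2.0 * ring_radius)
                need = 2.0 * math.asin(min(1.0, ratio))
                # Signed angle from a to b, wrapped to (-pi, pi].
                diff = math.atan2(math.sin(ang[b] - ang[a]),
                                  math.cos(ang[b] - ang[a]))
                if abs(diff) < need:
                    shift = (need - abs(diff)) / 2.0 + 1e-9
                    sign = 1.0 if diff >= 0 else -1.0
                    ang[a] -= sign * shift
                    ang[b] += sign * shift
                    moved = True
        if not moved:
            break  # no overlapping pair remains
    return ang
```

With many large nodes on a small ring, this pairwise relaxation may stop at the iteration limit with some overlap remaining, and it places no bound on how far a node drifts from its tentative position.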
This layout technique was inspired by work by Speckmann and Verbeek (2010), where they devise a method of producing 'necklace maps' with circular nodes similarly arranged around a group of polygons. Our method is different from theirs because the nodes represent areas outside the ring instead of inside, and a force-directed method is used to arrange the nodes instead of Speckmann and Verbeek's method. The result is similar to Speckmann and Verbeek's. Nodes occasionally overlap slightly using our method, and there is no hard restriction on how far nodes may be placed from their starting position. However, the layout is computed very quickly, and it is suitable for easy interpretation.

Symbolization
There are numerous other design elements aside from flow curvature, generalization, and layout that have a significant influence on flow map readability and aesthetics. This section describes the reasoning behind the selection of other important flow map design elements that are included in our map.
Flow width: Migration flows are represented as curved lines of varying widths. The width of each flow is linearly proportional to the number of migrations each flow represents. In order to avoid including flows that are too thin to see, flows that are below a minimum value are represented as dashed lines with a minimum width. Small flows are placed on top of large flows to increase the readability of the smaller flows, as suggested by Dent et al. (2009). We manually choose a maximum flow width for county flows in each state that ensures no flows are wider than they are long. The minimum flow width is always 10% of the maximum flow width.
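The width rule can be written compactly in Python. The sketch assumes that width scales with volume relative to the largest flow in the current subset and that the 10% minimum is applied as a floor with a dashed style; the function names and the returned tuple are illustrative only.

```python
def flow_style(volume, max_volume, max_width, min_ratio=0.1):
    """Return (width, dashed) for a flow.

    Width is linearly proportional to volume. Flows that would be thinner
    than min_ratio * max_width are drawn dashed at that minimum width.
    """
    width = max_width * volume / max_volume
    min_width = min_ratio * max_width
    if width < min_width:
        return (min_width, True)
    return (width, False)


def draw_order(flows):
    """Sort (name, volume) pairs so large flows are drawn first, small on top."""
    return sorted(flows, key=lambda f: f[1], reverse=True)


print(flow_style(500, 1000, 20.0))  # half the maximum volume, solid line
print(flow_style(50, 1000, 20.0))   # below the minimum, dashed at minimum width
print(draw_order([("small", 50), ("large", 1000), ("medium", 500)]))
```

Here `max_width` corresponds to the manually chosen maximum flow width for the selected state.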
Flow color: We vary the color of flows slightly. Flow color is darker for large flows, and is brightened linearly for smaller flows. We have not come across any studies that show altering flow color with flow volume to be beneficial, but we hypothesize that it improves the visibility of overlapping flows (for a comparison between constant and altering flow color, see Figure 9 in Jenny et al., 2016). We chose to make wider flows darker because it has the added effect of making the wider flows stand out more against the background features, which is appropriate for the larger, more significant flows.
Arrows: We use arrows to indicate flow direction. Tapered line widths have been recommended for indicating direction in non-geographic node-link diagrams (Holten & Van Wijk, 2009a); however, Jenny et al. (2016) found that arrows result in better flow map interpretation and are the preferred method for indicating direction amongst cartographers and map users compared to tapered lines. Koylu and Guo (2016) also found that the direction and magnitude of flow lines are faster and more accurate to read when using arrowheads instead of color gradients or only tapered line width.
We use linear interpolation to determine arrow size in the same manner as flow width, though we increase the size of smaller arrows slightly to improve visibility (Jenny et al., 2016).
Flow end points: The precise placement of each flow's start and end point is determined in several ways. At the state-to-state scale, end points are placed at the centroids of state polygons, except in some cases where we manually moved the point. We manually moved points only for state polygons when the centroid placement was far from optimal, which is the case for some irregularly shaped states, such as Florida. End points within counties are always placed at the centroid of county polygons. Flow end points are modeled as circles in the force-directed method for curving flows (Section 2.1). We connect flows to the outer edge of the circles at each of the flow's end points, rather than the centers of the circles. This is to avoid some of the overlap that occurs when many flows converge on the same end point (Figure 8). The end point circles are not shown to prevent them from being confused for city locations. End points for states are shown only when used in the circular layout scheme described in Section 2.3. In that case, the radius of each circle is determined by the total flow to and from the state it represents.
Choropleth map: We include a choropleth map of U.S. states and counties showing population density. Significant migration flows frequently occur between densely populated counties and states, which often contain major cities. A visual representation of population density helps to explain some of these major migration flow patterns. However, this information should be treated with care as population density can vary widely across counties and states.

User interaction
A key advantage of automating flow maps is the opportunity to create dynamic user interaction. Users can interact with the map to display different subsets of flows. States and counties can be clicked to see flows to and from those locations. We include buttons for switching between state- and county-level flows, switching between viewing net flows, total flows, incoming flows, or outgoing flows, and a button for hiding flows between other states when viewing county flows for a selected state. Layouts with 50 flows are generated in about 4 s on average (including data downloading and layout processing), using the Firefox web browser on a laptop computer with a 2.3 GHz Intel Core i7 CPU, with an internet connection speed of approximately 2 MB per second.
When users hover the mouse cursor over flows, states, or counties, a tooltip appears that provides details about how many migrations the flow represents, or how many people entered or exited the state or county (Figure 9).

Discussion
We set out to create an interactive flow map of 2009-2013 U.S. county-to-county migration estimates that automatically lays out flows according to identified cartographic principles for flow map readability and aesthetics. The force-directed layout method we use is shown by Jenny et al. (2017) to reduce intersections and overlap between flows, improving the map, and is fast enough to quickly create new layouts in response to user interactions and setting adjustments. Thousands of unique migration flow configurations are possible through sub-setting, which enables the exploration of detailed migration patterns between specific areas of interest at the state or county scale.
Some design variables used in our map remain challenging to automate, and are therefore determined manually. We manually set the maximum flow width for each state because the shapes and sizes of counties differ significantly between states. Tobler (1987) suggested that maximum flow width could be automatically determined based on the minimum distance between flow end points, but this method, when applied to our map, sometimes produces flows that are too thin to see easily. Also, the maximum number of flows that can be shown at once without introducing excessive visual clutter differs between flow subsets. This is mostly because the number of counties in each state varies widely, ranging from 3 (Delaware) to 254 (Texas). We allow users to manually adjust the number of displayed flows to help account for this variation. Alternatively, it may be possible to automatically determine this setting based on map variables, such as the number of counties in the selected state.
Another issue concerns the aggregation of flows. As pointed out by Guo and Zhu (2014), aggregating flows by arbitrary administrative units such as counties and states may not be the most appropriate way to describe and compare migration patterns. Other forms of aggregation such as hierarchical clustering may provide better insight into migration movements. Including additional flow aggregation options, such as hierarchical clustering, may help map users discover more detailed and meaningful migration patterns. Tools could also be provided to allow users to manually merge groups of selected counties to create highly customized flow maps. Our map is likely to be suitable primarily for data exploration and hypothesis development. Further utility and analysis capability could be added to the map by including additional filtering options based on census-recorded demographic information. For example, viewing the top 25 state-to-state net flows reveals that a disproportionately large number of people appear to be moving from the north-eastern U.S. to Florida. Adding the ability to filter or symbolize flows by age group or other demographics, as done by Guo (2009), could help map users address hypotheses they may develop about the causes of the patterns they see.

Conclusion
Automating the creation of origin-destination flow maps for large datasets like U.S. county-to-county migration flows presents many challenges, most notably the graphical clutter resulting from overlap and intersections. The method we use to curve flows according to cartographic principles reduces clutter and improves map readability. We manage the massive quantity of flows through multiple types of generalization, and provide interactive tools for exploring overall movement patterns as well as finer details for selected sub-regions. The map at http://usmigrationflowmapper.com/ is the first web-based interactive flow map to automatically lay out origin-destination flows according to established cartographic design principles. This demonstrates how new automated techniques can be used in web browsers to provide an interactive environment for exploring large movement datasets using flow maps.

Software
Version 3.5.17 of the D3.js JavaScript library was used for map creation and node layout (Bostock et al., 2011; Bostock & Davies, 2013). D3.js is available at d3js.org. The algorithm described by Jenny et al. (2017) was ported to JavaScript. Developed code is available at github.com/stephdan/MigrationFlowMapper-US. The code is designed for the specific scenario of U.S. county migration. Adapting the application to visualize additional county migration datasets for the United States is possible, though adapting it for other geographic regions would require significant restructuring of the code. Excel and Python scripts were developed for formatting U.S. Census data.