Evaluation of automated airway morphological quantification for assessing fibrosing lung disease

Abnormal airway dilatation, termed traction bronchiectasis, is a typical feature of idiopathic pulmonary fibrosis (IPF). Volumetric computed tomography (CT) imaging captures the loss of normal airway tapering in IPF. We postulated that automated quantification of airway abnormalities could provide estimates of IPF disease extent and severity. We propose AirQuant, an automated computational pipeline that systematically parcellates the airway tree into its lobes and generational branches from a deep learning based airway segmentation, deriving airway structural measures from chest CT. Importantly, AirQuant prevents the occurrence of spurious airway branches by thick wave propagation and removes loops in the airway-tree by graph search, overcoming limitations of existing airway skeletonisation algorithms. Tapering between airway segments (intertapering) and airway tortuosity computed by AirQuant were compared between 14 healthy participants and 14 IPF patients. Airway intertapering was significantly reduced in IPF patients, and airway tortuosity was significantly increased when compared to healthy controls. Differences were most marked in the lower lobes, conforming to the typical distribution of IPF-related damage. AirQuant is an open-source pipeline that avoids limitations of existing airway quantification algorithms and has clinical interpretability. Automated airway measurements may have potential as novel imaging biomarkers of IPF severity and disease extent.


Introduction
We present a clinical tool for the comprehensive evaluation of airway structure on volumetric Computed Tomography (CT) imaging of the lungs.We apply this tool to quantify traction bronchiectasis in idiopathic pulmonary fibrosis.Taking as input the CT image and a detailed airway segmentation, the tool outputs quantitative metrics on each airway segment, indexed by lung lobe and airway generation.

Bronchiectasis
Airways are tubular branching structures originating centrally from the trachea and extending to the lung periphery.Airways transport gases between the external air and the alveolar sacs at the airway terminus where gas exchange into the alveolar capillaries occurs.An airway segment is defined as a continuous tube running between two airway branching points.From the major bronchi that arise from the trachea, each new division of airway branches can be considered a new airway generation.In a healthy individual, airway segments narrow or taper in diameter as they run from the central to the peripheral lung.Tapering occurs both along an airway segment and with respect to the segment of the preceding airway generation [1].
Bronchiectasis describes a structural airway disease in which the airways lose their healthy tapered structure and become abnormally dilated within a segment.Various lung diseases are associated with bronchiectasis, including those driven by infection and or inflammation in the airway wall.In fibrosing lung diseases, of which idiopathic pulmonary fibrosis (IPF) is the hallmark fibrosing lung disease [2] the airways are pulled open by fibrosis and contraction of the surrounding connective tissue and airway dilatation is termed 'traction bronchiectasis'.
Bronchiectasis is typically evaluated by a radiologist following visual inspection of a chest CT scan.Evaluation of the extent and degree of dilatation of bronchiectatic airways allows the characterisation of lung disease severity and extent.In IPF for example, the prognostic importance of airway dilatation has influenced current diagnostic guidelines with the presence of traction bronchiectasis in an appropriate distribution now being classified as a probable usual interstitial pneumonia pattern [3].
The classical morphological signs of bronchiectasis include a) the visualisation of airways within 1 cm of the lung periphery, b) a lack of tapering of the airway, c) an airway diameter greater than the diameter of the accompanying pulmonary artery [4].However, solely relying on the comparison of the airway to its adjacent pulmonary artery can be misleading.Living at high altitude [5] and normal ageing [6] can result in non-pathological airway dilation which can be confused with bronchiectasis.Furthermore, there are pathological mechanisms that can result in changes to the size of the pulmonary artery such as smoking [7] and hypoxia-induced pulmonary vasoconstriction as a result of chronic lung disease [8].Typically the visual markers of bronchiectasis are assessed for severity on an ordinal scale and on a lobar basis [9].However, these scoring systems lack sensitivity, are time consuming to apply and are associated with disagreement between radiolo-gists.Consequently they are not used in routine clinical practice or research.Moreover, the ordinal scores conflate disease extent (number of involved lobar segments) and severity (degree of abnormal airway dilatation) which could potentially dilute the prognostic signal attributable to disease extent or severity individually.
Computational image analysis of lung CT imaging may allow the derivation of objective robust quantitative measures of abnormal airway dilatation extent and severity by quantifying dilatation to the nearest millimetre.The precise quantification of airway damage, may identify IPF patients at risk of rapid disease progression.Identification of such patients would be an important cohort enrichment strategy for recruitment into clinical trials of novel IPF therapies [10,11].

Idiopathic Pulmonary Fibrosis
IPF is a lung disease characterised by excessive fibrosis in the structural framework of the lung.The fibrotic process characteristically causes traction bronchiectasis in the lung periphery.There are 80,000 new cases of IPF diagnosed every year across Europe and the United States [12] with patients typically surviving only 3-5 years following diagnosis [13].

Previous Works
Acquiring tapering measurements involves executing an airway measurement algorithm at a perpendicular plane to the airway centreline at regular intervals, this is demonstrated in figure 1(a).To date, two methods have been used to measure the tapering gradient of airways on CT imaging.Both methods require an airway lumen segmentation and airway centreline.The first utilises the full width at half maximum edgecued segmentation limited (FWHM ESL ) technique, where a one dimensional profile through the airway wall from inside the airway lumen to outside the airway wall would trace a bell shaped curve.The boundary edges lie at the half-maximum value [14,15,16].The second method of measuring an airway tapering gradient aims to optimise an initial airway lumen segmentation to better fit the luminal airway surface on the image and the surface of the outer airway wall [17].
The methodology used in this manuscript utilises and expands on methods laid out in [15].The paper by Quan et al demonstrated a pipeline that calculated the airway tapering gradient from the carina (where the major bronchi split) to the distal-most point of an airway branch segmentation.The pipeline was validated following experiments demonstrating accurate tapering measurements on bespoke CT phantom tubes which encompassed airway tapering gradients in the range of 4% to 16% down to 2.5mm diameter [15].The method by [18] was shown to be able to identify differences in airway tapering in a small cohort of paediatric cystic fibrosis patients.The methodology allowed for global airway analysis, using the intertapering measurement.However, for successful execution of the method, time-consuming manual airway segmentations and skeletonisations were required.shows corresponding diameter measured using the presented pipeline, AirQuant.The ten percent trimmedmean, i.e. uses the central 90% of diameter measures to derive the mean, highlighted in red.

Proposed airway measurement pipeline
In this paper, we adapt existing segmentation and skeletonisation methods to reliably extract airway centrelines and branching tree-like structures.To replicate lung lobar classification systems used in standard radiological assessment, we have developed an automated lobar classification system that delineates the lingula (nominally part of the left upper lobe) as a distinct lobe.
Our methodology results in the first automated pipeline capable of measuring inter-branch tapering and tortuosity of all segmented airways on a lung CT.Inter-branch tapering, known as intertapering is the relative percentage change in average diameter of an airway segment compared to its parent segment.This measure was first considered by [18], though their method relied on manually drawn centrelines, with a heavy labour cost on the user, when deriving tapering measures.
We propose an end-to-end pipeline with the motivation of providing an objectively derived measure of airway morphology.It requires little to no manual input, making it a feasible clinical tool to measure airways in order to support clinical assessment.
The lungs are classified by lobe and by airway generation allowing easy localisation of focal airway abnormalities.We describe various quantitative expressions of airway morphological abnormality in IPF patients that delineate the extent and severity of fibrosis-related airway damage.
We also show that a deep learning regression approach to calculate airway inter-branch tapering and tortuosity directly from CT images failed to generalise, thereby validating our end-to-end pipeline approach.The comparison evinced an application where modern end-to-end deep neural networks, that are perceived to be more powerful, could not and in all likelihood should not replace the proposed hand-engineered solution.Our hand-engineered solution generalised more robustly, and demonstrated superior clinical performance on unseen data.
We contrast findings in IPF with airway metrics in healthy volunteers.Notably, we measure and report airway tortuosity for the first time.

Methods
Following automated airway segmentation based on a trained dilated 2D-UNet combined with region-growing, the proposed overall pipeline to process a chest CT is shown in figure 2. The centreline is first extracted by thinning (figure 2(a)) and a network graph is derived.The airway tree is parcellated into its individual branching segments where graph edges and splitting/end points are represented by nodes.(figure 2(b)).Cubic splines are then fitted to each segment's associated centreline points, allowing sub-voxel interpolation along the airway segment ((figure 2(c)).By taking the tangent to the spline at intervals, CT patches in the plane perpendicular to the airway centreline can be cubically interpolated (figure 2(d)).From these patches, the airway luminal diameter is measured using the (FWHM ESL ) technique by [14] (figure 2(e)).Using the set of luminal diameter measurements and total arc-length by spline fitting, metrics can be derived that describe each individual airway segment.
All code was written and executed in MATLAB † .

Study Data
We consider 14 healthy never-smokers (from Mayo Clinic, Rochester, Minnesota, USA) and 14 diverse IPF patients from  two centres (11 from St Antonius Hospital, Utrecht, Netherlands and 3 from Ege University Hospital, Izmir, Turkey).
Further technical details of the study data are included in table 1.None of the cases analysed in this study were used to train or test the airway segmentation algorithm mentioned in section 2.2.1.

Automated Airway Segmentation
As an airway segmentation is required to measure airway tapering, we implement a trained 2D dilated UNet model [19] and a region growing algorithm with explosion control using the software tool, Pulmonary ToolKit (PTK) ‡ .The two methods are executed on each case in parallel, with the results combined.Any objects unconnected to the airway tree are discarded.

2D UNet Segmentation
The dilated U-Net model is an improved version of the original U-Net model that replaces standard convolution layers with dilated convolution layers [19].The addition of dilated convolution layers allows the performance of local convolutional operations on a larger region without an increased computational cost but importantly maintains image resolution.Using this methodology provides greater pixel-wise context during training and at inference.Our model was trained on manually segmented airway trees performed in-house under the supervision of an experienced chest radiologist (J.Jacob Training was implemented using Adam optimiser [21], minimising the combined binary cross entropy [22] and dice similarity coefficient ‡ https://github.com/tomdoel/pulmonarytoolkitloss functions [23].The learning rate was initially set to 1e-5, reducing to 1e-6 upon loss plateau over three consecutive training epochs.The model achieved a training and validation accuracy of 88.5% and 87.2%, respectively.Implementation was in Tensorflow [24] and Keras [25] with Python.Data were preprocessed by limiting intensity levels to a standard lung window, level -500 Hounsfield Units (HU) and width 1500 HU, and then normalising to a range from 0 to 1.The slice size was limited to 512 × 512 pixels.Larger raw CT slices were downsampled by cubic interpolation to pass through the fixed-size model and inferred labels upsampled to the original size by nearest-neighbour interpolation.All analyses were conducted on the original image size.

Pulmonary Toolkit Segmentation
The Pulmonary Toolkit (PTK) [26] implements airway segmentation by region growing from a tracheal seed.The algorithm starts by thresholding the CT to air voxel density.A wave-front propagates from the tracheal seed and travels through the thresholded mask, classifying the volume traversed as being part of the airway tree.The wave-front maintains a thickness of multiple voxels in the larger airways, which helps prevent spurious splitting of the wave-front due to noisy interior voxels.Complete splitting of a wave-front indicates new branching.Sudden increases in volume of wave growth indicates parenchymal leakage of the airway segmentation.Leakage is controlled by an explosion multiplier, set such that if the number of new voxels in a wave front are more than the factor of the explosion multiplier of the previous wavefront, the algorithm defaults to its previous iteration.The default PTK settings used to avoid parenchymal leakage were implemented i.e. maximum number of generations was set to 15 and the explosion multiplier to 7.

Centreline Extraction
PTK's full skeletonisation method [27] is used here on the final airway segmentation.This is a thinning algorithm based on [28].It first identifies airway endpoints by re-running the same wave propagation step as the airway segmentation algorithm described above 2.2.2, without explosion control or generational limit.The thick wave-propagating component helps to delineate false airway branches from real branches, thereby identifying true airway endpoints.The trachea is marked as an airway endpoint from the outset.The binary object bounded by all endpoints is then iteratively reduced to its topological centreline.Although false branches are robustly removed, airway loops may have occurred at this stage, as topological thinning does not reflect the true anatomical tree structure of the airways.
A key innovation of PTK's skeletonisation method is a post-processing step which removes inner loops by retracing out the whole skeleton voxel-by-voxel, parsing branch segments in a depth-first-search style.Starting from the top of the trachea, it considers the 27-connected neighbour skeleton voxels of the last traced voxel to determine which voxel should be added to grow the airway segment.Upon reaching the end of a segment, i.e. a bifurcation point, it has fully parsed the current segment and starts parsing one of two (or more) new child-branches.In doing so, it acknowledges the discovery, though does not immediately parse that child-branch's sibling(s).It holds in memory the first voxel of every new segment it discovers.If a candidate skeleton voxel is found to match the first voxel of a discovered non-parsed segment, then it identifies that the whole candidate segment forms a loop.The offending segment is then terminated and removed from the final skeleton.

Airway Parcellation
To facilitate analysis the airway tree is divided into airway segments.This definition enables direct conversion of the airway centreline into a graphical network representation [29], where graph edges become airway segments and airway division points or airway endpoints become graph nodes (Figure 2(b)).The carina (point of division of the main bronchi from the trachea) can be identified by graph centrality.Conversion to a directional graph, setting the trachea as the origin with airways directed in the graph means that airway segments are directed from the central lungs outwards.Setting a direction in the graph facilitates subsequent steps in the pipeline, dictates the direction in which the spline is sampled and supports airway generation classification and lung lobe classification.Figure 3 demonstrates the capabilities of this graph representation.
Each individual airway segment has a cubic spline (piecewise polynomial function) fitted to its collective centreline points which is smoothed by a moving average along the segment starting from the proximal segment and moving distally.The spline is sampled at equidistant intervals, tracking the tangent of the spline at these points to calculate the airway's perpendicular plane for diameter measurements.The limit of resolution for the change in airway diameter is considered to be no less than half the shortest voxel diameter.All interpolation sampling sizes are set individually for each given CT image at half the shortest voxel diameter.

Airway Measurement
The CT image is interpolated at spline sample points perpendicular to the tangent of the spline such that the resultant image produces a slice along the natural long axis of the airway (Figure 2(d)).The interpolation pixel size is dynamically set to half the shortest voxel diameter of the given CT.Diameter measurements are made on these airway-perpendicular slices at spline sampling intervals.
On the airway-perpendicular slices, several radial density profiles that are uniformly spaced originate from the lumen centre, sampling the change in HU; a technique known as raycasting.These radial profiles of the airway wall typically appear as a Gaussian curve, due to the nature of CT imaging of thin structures [30].It is approximated that the wall centre falls at the Gaussian maximum and that the inner and outer boundary fall at the FWHM points, i.e. the half-intensity points either side of the curve.However, due to the nature of CT imaging and proximity of lung vessels to the airways, the radial profile does not always appear as a smooth Gaussian.In reality, these HU density profiles can appear noisy with several maxima.To improve robustness, the airway segmentation is also interpolated at the same points as the CT,   the nearest local maxima to the boundary of the airway segmentation is considered the wall peak.This method is known as the FWHM ESL technique as described by [14], and implemented and validated by [15] on phantoms down to 2.5mm.The inner airway lumen boundary is therefore identified as the first half-maximum.The second half-maximum relating to the outer wall boundary is often found to be more susceptible to noise, as the outer airway wall may have a structure with similar contrast and density obscuring its boundary, e.g. a blood vessel.Thus we only consider the accurately determined inner boundary (Figure 2(e)).As a perpendicular section through an airway forms an ellipse, an ellipse is fitted [31] to the inner wall boundary points.The ellipse total area, A E is calculated and a generalised diameter D G derived for that point in the airway segment.
Where our methodology differs from [15] is that we employ an outlier detection mechanism improving robustness, demonstrated in figure 4. The distance of each inner lumen point to the centre point is calculated, if this spans greater or less than 3 times the Mean Absolute Deviation (MAD) compared to all other points then it is considered an outlier.To support this, we choose to increase the number of raycast radial density profiles from 50 (every 7.2 degrees of rotation) to 180 (every 2 degrees of rotation).The greater the number of raycasts, the more robust outlier detection becomes, particularly for larger airways where the distance between lumen boundary points will be larger for a given number of raycasts.
Ultimately, an accurate intertapering value relies on accurate average measurements of the current and parent airway segment.By sampling the airway diameter at less than half the smallest CT voxel, we are ensuring that we get the most accurate diameter measurements these methods can obtain from the CT image.Figure 1(b,c) demonstrates a series of diameter measurements on a single airway segment.

Airway Lobe classification
Airway lobe classification is based on the method described and successfully tested on 300 COPD patients by [32].The classification algorithm takes the graphical representation of the airway tree described in section 2.4.As decribed in algorithm 1, it considers the positional coordinates of nodes (see Figure 5) starting from the carinal node to identify the main left and right bronchi as well as the left upper (LUL) and lower lobes (LLL).The lingula lobe (left middle lobe (LML)) was identified as the lower branch in the left upper lobe thereby mimicking the lobar classification used in clinical radiological lung assessment.Edges (airway segments themselves) are immediately classified as off-spring of classified nodes.
The right upper lobe (RUL) is the first division of the right main bronchus.The processing of classifying the right middle lobe (RML) and right lower lobe (RLL) is described in algorithm 2. The remaining non-classified right lung is taken as a subset of the overall airway graph and the difference in axial to lateral position of every end node is calculated.The most extreme nodes are considered to originate from the RML and the least extreme from the RLL.Tracing back the paths of these two nodes to the carina identifies the set of branches belonging to the RML.Finally the remaining unclassified airways are assigned to the RLL.
We suggest a quality control check where the reclassification of airways to specific lobes may be required because of anatomical variations of the airways tree.For example the presence of a tracheal bronchus, the most common airway tree variant with a prevalence of 1 percent in the general population [33] would result in a simple manual reclassification of the right upper lobe bronchi.Since only the final analysis is dependent on the lobe classification, reclassification can be done at the end without consequence to the pipeline.We implemented high-level functions that allows easy reclassification by indicating the most central airway segment or node for a given lobe.

Metrics
The two key airway-based metrics that have been computed from the derived data using the analytic pipeline are the airway intertapering gradient and airway tortuosity.Intertapering (equation 1) describes the difference in average diameter, d to the average of the parent airway dp compared to the parent airway.
Where average diameter dp is derived from the consecutive diameter measurements from the airway segment described in section 2.5.Algorithm 1 Lobe classification for airways adapted from [32].For upper, middle and lower left lobe and right upper lobe.

(a) (b)
Given directed graph of airways, G(N, A) with N nodes and A edges, n ∈ N and e ∈ A. Arranged in tree like structure from the carina node n C .N has positional information associated (x,y,z) in ascending order towards (left, posterior, superior) orientation respectively. [ Right upper lobe node else if n 4 (z) > n 3 (z) then n RU L ← n 8 end if Algorithm 2 Lobe classification for airways from [32] for the right middle and lower lobes.
Given that upper, middle and lower left lobe and right upper lobe nodes have been classified make subgraph of remaining nodes, G s (N s , A s ) with N s nodes and A s edges,  Airway tortuosity (equation 2) describes the arc-length of an airway segment, L a expressed as a ratio of its euclidean segmental length, L e .
Where euclidean segmental length L e is derived from taking the euclidean (straight line) distance between the start and end nodes of the airway segment.Both measures are dimensionless.

Normal generational range
The first study analysis aimed to distinguish the assessment of disease severity from disease extent.This is done by truncating airway data to the middle 50% of airway generations discovered across all healthy volunteers for each lobe.Henceforth, this range of airway generations from the 25th to 75th percentile is referred to as the 'normal generation range', exact values shown in table 2.
The airways conforming to the normal generation range were evaluated in IPF patients and healthy volunteers in two ways: namely at the airway level and then the patient level.For the airway level analysis, all individual lobar airways within the normal generation range were compared between IPF patients and healthy volunteers for both the airway intertapering gradient (Table 3) and airway tortuosity (Table 4).
For the patient level analysis, the median lobar intertapering gradient (Table 5) and airway tortuosity values (Table 6) were calculated at an individual patient level and compared between IPF patients and healthy volunteers.

Central and peripheral ranges
Two final analyses compared airways at the patient level, but categorised according to the generational level of the airways.The first analysis compared the median lobar airway intertapering gradient (Table 7) and airway tortuosity (Table 8) for airway generations 2-6.Airways in generation 2-6 can be segmented by most lung airway segmentation algorithms making the results of our airway analyses relatively independent of the quality of the segmentation tool.The second analysis eschewed the normal generation range and compared at a patient level the median lobar airway intertapering gradient (Table 9) and airway tortuosity values (Table 10) for airway generations 7 and beyond.

Deep learning Regression
We also considered whether an end-to-end deep learning model could learn and therefore predict the median intertapering and tortuosity value of each lobe for the 2nd -6th generations in IPF patients.
We ran AirQuant on 183 IPF cases from the same hospitals as the main data in section 2.1 to generate a training dataset.We considered reference-quality deep learning models Alexnet [34] and VGG16 [35] for our network architecture, adapting the final layer to a linear function to suit the regression task.Implemented in PyTorch [36] with mean squared error loss and AdamW [37] optimiser.We fed in 3D CT images of the lungs for the model to regress to six lobar measures of airway intertapering and six lobar measures of airway tortuosity.
Each image was preprocessed by cropping to a lung mask generated using the trained model by [38] and cubic resampling of all images to achieve a voxel size of 1x1x1mm.A centre crop is then applied to achieve a 256x256x256 3D input block.CT Hounsfield units are standardised such that the upper and lower limits of a lung window at 250 and -1250 are scaled to 1 and -1.Each regression metric, x is also z-standardised by the training sample's mean and standard deviation.These standardisation strategies as well as the implementation of batch normalisation [39] were implemented to support faster and more stable training and model convergence.Data augmentation was applied on the CT input with random Gaussian noise and random affine transformation with the exception of scaling.Methods were implemented using the torchio library for medical image deep learning [40].
All training was implemented on 32 GB NVIDIA Tesla V100-DGXS cards.

Statistical Analysis
Airway values for healthy volunteers and IPF patients are compared using the non-parametric Mann-Whitney U test.The median values for the various airway metrics are compared across equivalent lobes and airway generational ranges between the two study groups.As this is a pilot study of 28 individuals, multiple comparison corrections are not considered in the main text.These results, which do not alter any conclusions in the paper are shown in the supplementary material.

Results
Following successful segmentation, AirQuant analyses were fully executed on all airway segments identified.Failure in the pipeline only occurred at the lobe classification phase.All lobe classifications were visually checked and manually corrected if necessary before further analysis.To summarise, two IPF and two normal cases had LUL airways mislabelled as lingular airways.One IPF case had RLL airways mislabelled as RUL airways.One healthy volunteer exhibited an anatomical variant whereby two RUL branches originated from the right main bronchus and these were mislabelled.Additional lobe results are included in the supplement.Each case takes 4-6 hours to process in MATLAB on a 3 GHz Intel-i7 9th gen.processor with 16GB memory workstation.The only manual involvement was of very brief lobe classification corrections to six anomalous cases.

Quantitative Airway Analyses
When the intertapering gradient was evaluated at the airway level across the normal generation range, a significant reduction (p ≤ 0.004) in intertapering gradient was found in IPF patients compared to healthy volunteers.Airways from patients with IPF were shown to taper less than airways in healthy volunteers (Table 3, Figure 6(a)).Airway tortuosity was significantly greater (p ≤ 0.034) in IPF patients than healthy volunteers (Table 4, Figure 6(b)).Differences were most marked within the lower lobes, in keeping with the typical distribution of IPF airway abnormalities.
For airway changes examined at the patient level, significant reductions in intertapering were seen in IPF patients in the middle (p ≤ 0.004) and lower lobes (p ≤ 0.002) compared to healthy volunteers (Table 5, Figure 6(c)).Airway tortuosity was similarly significantly increased in the LML (p ≤ 0.035) and both lower lobes (p ≤ 0.002) (Table 6, Fig-ure 6(d)).The results reflect the lower zone distribution of disease typically seen in IPF patients.

Clinical Demonstration on Patients
For airway metrics examined at a patient level for generations 2-6, the segmental intertapering value was significantly reduced (p ≤ 0.044) in IPF patients in all lobes.Differences were most marked within the lower lobes (p ≤ 0.003) (Table 7, Figure 7(a)).Segmental airway tortuosity was significantly increased in left-sided lung lobes (p ≤ 0.009) and the RLL (p ≤ 0.001)(Table 8, Figure 7(b)).
Airways at generations 7 and beyond were not routinely seen in the middle lobes.Accordingly airway metrics were only examined in the upper and lower lobes.Significant differences in segmental intertapering (p ≤ 0.004) (Table 9, Figure 7(c)) and segmental airway tortuosity (p ≤ 0.023)(Table 10, Figure 7(d)) were seen in the lower lobes between IPF patients and healthy volunteers.

Deep learning Regression
Five fold training and validation curves shown in Figure 8 for an example case with VGG16 model hyperparameters set to learning rate 0.0001, batch size 8, dropout 0.1 and weight decay 0.5.The average R-squared values across the considered 12 variables, averaged across the 5 fold validation sets was (0.00 ± 0.07).Though the training loss was found to converge, this did not occur in the validation dataset with varying degrees of loss regularisation.Therefore the model was deemed unable to generalise beyond the training dataset.

Discussion
Our results highlight the potential for CT-based morphological airway measurements to identify differences in disease extent and severity of traction bronchiectasis in patients with IPF.To date airway damage in IPF has been thought to be a primarily distal airway phenomenon.Our results have supported this by showing that the largest distinction in intertapering and tortuosity between healthy volunteers and IPF patients was seen in the most distal airway generations.Yet the identification of a reduction in airway intertapering and increase in tortuosity in proximal airway branches suggests that AirQuant could provide valuable insights when applied across a wide range of the airway tree, meaning that it can be usefully applied to most airway segmentation methods.
Our findings are comparable to previous analyses of intertapering gradients of proximal airways in patients with idiopathic bronchiectasis and healthy controls [18].Our results diverged when more distal airways were evaluated, reflecting the ability of our 2D-UNET segmentation to identify larger numbers of more distal airway branches.The primary advantage of our method lies in its automation which allows hundreds of airways in a single patient to be analysed, and therefore makes evaluation of large patient cohorts feasible.
The concept that traction bronchiectasis scored visually on CT imaging could be a prognostic variable in IPF patients was first described almost 15 years ago [41,42].More recently, change in visual traction bronchiectasis scores has  [43].The challenges associated with employing visual traction bronchiectasis estimation however are many.These include the time consuming nature of visual analysis, the requirement for expert reads of the images, where experts are typically in short supply, are expensive to employ and are prone to interobserver variation [44].The potential that automating airway tapering assessments might alleviate some of the challenges associated with visual CT estimation of traction bronchiectasis by providing an objective, rapid and sensitive measure of lung disease severity in IPF was the mo-tivation behind the current study.
AirQuant is reliant on an airway lumen segmentation.Our pipeline can be implemented following any airway segmentation method.We chose to use automated segmentation methods, specifically to demonstrate that meaningful measures can be derived using AirQuant without recourse to labour intensive manual labelling of the extensive airway network.Our results are influenced by the airway segmentation algorithm and cannot be isolated from the underlying segmentation.Yet the statistically significant results shown for airway generations 2-6 indicates that disease severity metrics can be assessed using     predicting these variables in limited data.Furthermore, a naive 3D CT input network would only produce those final median values, losing the detailed information of the airway bronchial tree.Based on these results, we argue for the validity of our end-to-end proposed AirQuant pipeline.
Our deep learning regression approach to predict airway inter-tapering and tortuosity was not able to generalise.This may be a consequence of the sheer complexity of the diverse data being analysed.A lobe as described in our experiments comprises 2nd-6th segmental generation airways.In total, approximately 31 airway segments are contained within this generational range (assuming there are only airway bifur- cations rather than trifurcations).The 31 airway segments could comprise a variety of morphologies which may yet produce the same median morphological measure.Accordingly it would be extremely difficult for a model to train without additional contextual information being provided to the network, for example, registration of all images to a common space or paired in a multi-task approach with airway segmentation.
A further constraint lies in the heterogeneity of the input data which originated from two different centres where the CT imaging was produced on different scanners and reconstructed using a range of kernels.Table 1 highlights the various reconstruction kernels that were used in the training/validation data.Our findings highlight the challenges in applying a deep learning framework to airway morphological analysis and emphasise the utility of the AirQuant pipeline.
Our airway lobe classification algorithm can be affected by anatomical variants although these are relatively uncommon [32].In mitigation of this, the lobe classification of individual airways can be easily retrospectively corrected in our pipeline.As future work we aim to improve the robustness with which our lobar classification system deals with anatomical variants which will likely be present in very large data sets (>> 1000).
In the latter airway generations, an AirQuant segmental branch may in fact consist of multiple anatomical segments.This may occur if sibling branches are not identified in the segmentation and therefore not acknowledged as bifurcations.Though unlikely in the central airways, it can affect metrics in higher generations.One idea to tackle this is by a manual review of segments over a particular length, with methods to manually add in nodes to divide up segments.Though this could be a labourious task that would need a well developed suite of tools to facilitate.Despite having a relatively small sample size of 14 IPF cases we have demonstrated in a proof of concept study that our measurements can distinguish patients with traction bronchiectasis from those without.It will be important to validate the clinical impact of our measurements on larger IPF datasets.AirQuant was successful in examining every case, but relies on a good lumen segmentation.Efforts have been made to make the centreline extraction and graph conversion stage robust to loops.However when coming across branches that cause loops, it is difficult to determine which graph edge is the valid airway branch and which is the anomaly.It is likely that the wrong offending graph edge may have been re-moved in some cases, as it is not trivial to identify the anomaly in an automated way.
In conclusion, we have demonstrated that the airway intertapering gradient is reduced and airway tortuosity enhanced in IPF patients compared to healthy participants.The findings were accentuated in the lower lobes which is consistent with the typical distribution of traction bronchiectasis in IPF.Our pilot analyses suggest that automated airway analyses show great promise for the assessment of disease severity and extent in IPF trials and clinical care.

Figure 1 :
Figure1: Example of how tapering measures are derived for a given airway, here the left major bronchus is demonstrated.(a) The airway is outlined with a solid line with its segmentation-derived centreline represented by a dashed line.An airway segment is typically bounded by nodes which represent splitting of airways or end points.The airway diameter is measured at regular intervals between nodes along the segment, shown here as bold rings.(b) Our technique uses the reformatted CT interpolated along the airway segment's centreline between nodes.(c) shows corresponding diameter measured using the presented pipeline, AirQuant.The ten percent trimmedmean, i.e. uses the central 90% of diameter measures to derive the mean, highlighted in red.

Figure 2 :
Figure 2: Demonstrating the AirQuant pipeline graphically from left to right for a given airway lumen segmentation through to the end where the lumen boundary is established.
).The training and validation/testing dataset used in the development of the dilated UNET model comprised six normal CTs in healthy never-smoker volunteers, two normal cases from the EXACT09 competition data set [20] and 17 IPF cases, totalling 25 volumetric CTs.The axial slices of all CTs were amalgamated, randomised and split 80-20 into training and validation datasets, respectively.None of the images used to train/validate/test the dilated U-NET model were analysed in the current clinical study.As a 2D input model, the dilated U-NET only considered one axial CT slice at a time, without the context of the rest of the CT volume. 0

Figure 3 :
Figure 3: Example airway graph representation of an IPF patient with traction bronchiectasis, derived from volumetric computed tomography imaging using the pipeline presented in this manuscript.Airway divisions are represented by nodes and airway segments by edges.Edge thickness is proportional to average luminal diameter.The edge label represents the lobar generation of each airway segment: 0 for the trachea; 1 for the main and intermediate bronchi (B); 2 for the first intralobar airway; 3 onwards for each subsequent airway division.Edge colour is coded to lobe classification, RUL=right upper lobe, LUL=left upper lobe, RML=right middle lobe, LML=left middle lobe, RLL=right lower lobe, LLL=left lower lobe.

Figure 4 :
Figure 4: Airway measurement slices of the same perpendicular airway slice.Dots represent inner airway boundary detection of each raycast point.The ellipse is fitted to these points to derive final diameter measurements.(a) original method by [15] and (b) with our increased number of ray points and addition of outlier removal mechanism.Note that in (a) the ellipse is poorly fitted to the boundary due to raycast points that prematurely stop due to image noise.In (b) These false boundary points are classified as outliers and are therefore removed before fitting the ellipse.

Figure 5 :
Figure 5: Schematic showing typical airway lobe structure used to automatically classify airways into their lobes.Key airway-tree nodes that are identified in the automated lobe classification algorithm are labelled.The airways arising after a given lobe node are classified into that lobe.Hollow nodes are end-points at the extreme of the airway-tree.Solid lines indicate two nodes are connected by one airway segment.Dashed lines indicate multiple airway segments between two given nodes.Arrows indicate airways that extend beyond the boundaries of the schematic.G s represents the airways considered in Algorithm 2 to identify the right middle and right lower lobes.nC , carina node; nR, right lung node; nRML, right middle lobe node; nRMLep, right middle lobe end point; nRLLep, right lower lobe end point; nL, left lung node; nLUA, left upper area node; nLUL, left upper lobe node; nLML, left middle lobe node; nLLL, left lower lobe node.

Figure 6 :
Figure 6: Violin-box plots comparing normal healthy participants and idiopathic pulmonary fibrosis (IPF) patients.The level of significance for the Mann-Whitney U tests for each lobe is shown above the respective plot.(a) and (b) considers every airway segment from every subject in each group.(c) and (d) considers the median lobar values on a per patient basis within the normal generational range.RUL=right upper lobe, LUL=left upper lobe, RML=right middle lobe, LML=left middle lobe, RLL=right lower lobe, LLL=left lower lobe.ns, not significant, *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001 in Mann-Whitney U comparison tests.

Figure 7 :
Figure 7: Violin-box plots comparing airways on a per patient basis in normal healthy participants and idiopathic pulmonary fibrosis (IPF) patients.The level of significance of the Mann-Whitney U test for each lobe is shown above the respective plot.(a) and (b) consider airway segments between generations two and six.(c) and (d) only consider airway segments from generation 7 onwards.RUL=right upper lobe, LUL=left upper lobe, RML=right middle lobe, LML=left middle lobe, RLL=right lower lobe, LLL=left lower lobe.ns, not significant, *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001 in Mann-Whitney U comparison tests.

Figure 8 :
Figure 8: Mean squared error (MSE) training and validation loss curves for the deep learning regression task to AirQuant computed variables.Training curves are shown for each data fold in cross validation training of the VGG16 model, demonstrating that though the models converge in training they do not generalise in validation.

Table 1 :
Technical details of the CT data used in this study.Values shown as median (inter quartile range) across cases where applicable.Significance shown for comparison of number of lobe segments by Mann-Whitney U test * p < 0.05 and *** p < 0.001.

Table 2 :
Lower and upper quartile of lobar generation occurrence across healthy participants used to inform the 'normal generation range'.

Table 3 :
Intertapering of every investigated airway within the normal generation range

Table 4 :
Tortuosity of every investigated airway within the normal generation range

Table 5 :
Median lobar intertapering per case within the normal generation range

Table 6 :
Median lobar tortuosity per case within the normal generation range

Table 8 :
Median lobar tortuosity per case within generations 2-6 The failure of the deep learning regression models to gen-eralise to these variables derived from AirQuant suggests that a straight-forward deep learning approach is not suitable for