Comparison of visual, automatic and semiautomatic methods to determine ventilatory indices in 50 to 60 years old adults

The aim of this study was to compare different methods of detecting ventilatory indices (VI) and to investigate the impact of cardiorespiratory fitness (CRF) level on VI detection. Fifty females and fifty males completed a graded exercise test until volitional exhaustion with continuous gas-exchange measurement. The first and second ventilatory indices (VI-1, VI-2) were detected through different single automatic methods and through a semiautomatic method which combines visual and automatic detection methods. Additionally, the VIs were detected visually by two experts, which served as the study-specific gold standard. When comparing the semiautomatic method at VI-1 (intraclass correlation coefficient (ICC) 0.88 [0.81, 0.92], Bland-Altman bias ± limits of agreement (LoA) 55 ± 334 ml O2 · min−1) and VI-2 (ICC 0.97 [0.96, 0.98], LoA 1 ± 268 ml O2 · min−1) to the visually detected VI, high levels of agreement and no significant differences were found. This was not the case for any of the other automatic methods. Additionally, we could not find any relevant differences regarding the CRF level. We therefore concluded that the semiautomatic detection method should be used for VI detection, as its results are more accurate than those of any of the single automatic methods.

Abbreviations: CPET: cardiopulmonary exercise test; CRF: cardiorespiratory fitness; VO2peak: peak oxygen uptake; VI-1: first ventilatory indices; VI-2: second ventilatory indices; LoA: Bland-Altman bias ± limits of agreement; ICC: intraclass correlation coefficient.

ARTICLE HISTORY Accepted 27 January 2020


Introduction
Incremental cardiopulmonary exercise testing (CPET) is used in research to provide prognostic values for populations (e.g. VO 2peak ), to assess functional capacity or impairment, to demarcate exercise-intensity domains, and to design training programs for athletes as well as for therapeutic purposes (Ross, 2003). The gold standard of measuring aerobic power, which serves as an index of cardiorespiratory fitness, is achieved by assessing maximal oxygen uptake (VO 2max ) (Ross, 2003). However, submaximal indices (e.g. anaerobic threshold, respiratory compensation point) have been shown to provide an alternative to VO 2max and have shown their prognostic value in assessing the effectiveness of training, rehabilitation, and classifying the aerobic capacity of individuals (Meyer, Lucía, Earnest, & Kindermann, 2005). Furthermore, when utilising CPET it has been shown that people with a low exercise tolerance may not reach the maximal workload needed to achieve VO 2max . Physical symptoms such as hypertension, dyspnoea or skeletal muscle soreness can limit exercise and therefore submaximal indices are highly valuable for interpreting aerobic capacity (Meyer et al., 2005).
Different submaximal ventilatory indices have been identified in previous research. A three-phase model with two ventilatory indices (VI-1 and VI-2; also called ventilatory threshold 1 and 2) is generally applied to detect the changes of metabolism, gas exchange variables, and ventilation during an incremental CPET (Binder et al., 2008;Meyer et al., 2005;Westhoff et al., 2013).
As shown in previous studies, the ability of interpreters to detect VI is strongly dependent on their experience and the use of a systematic approach. Dolezal et al. (2017) reported agreement levels between interpreters using visual detection methods on VO 2 at VI-1 ranging from ± 195 ml · min −1 (expert) to ± 790 ml · min −1 (novice). Similar levels of agreement (± 100 ml O 2 · min −1 for VI-1 and ± 130 ml O 2 · min −1 for VI-2) between experienced interpreters were reported by Santos and Giannella-Neto (2004). Meyer et al. (2005) mentioned that false detection of VI (e.g. mistakenly identifying VI-2 as VI-1) is still common in clinical exercise testing.
To increase observer reliability of VI detection, automatic methods have been developed that commonly use piecewise linear regression models (e.g. Beaver et al., 1986). Here, an assumption about the total number of "breakpoints (i.e. VI)" must be made beforehand, utilising bi-segmental or tri-segmental models. Ekkekakis, Lind, Hall, and Petruzzello (2008) compared nine automatic methods based on linear regression and found significant differences between the methods at VI-1 ranging from ± 790 ml O 2 · min −1 to ± 1.730 ml O 2 · min −1 . A difference in VI-1 of ± 830 ml O 2 · min −1 between an automatic method using linear regression models and a visual detection by experienced interpreters was reported by Dolezal et al. (2017). A possible explanation for these large differences is that these linear regression models can identify unreliable VI, especially in subjects with a low or undetectable VI-2 (Sherrill, Anderson, & Swanson, 1990).
Another VI-detection method consists of polynomial regression analysis, which does not require assumptions about the shape of the curves or the number of "breakpoints". Moreover, the shape of the polynomial function and its first and second derivatives (slope and curvature of the regression) yield some additional information concerning the characteristics of the curves (Sherrill et al., 1990). Santos and Giannella-Neto (2004) reported good agreement on VO 2 between visual and automatic detection methods (using polynomial regression and calculation of the first and second derivatives of EQO 2 , EQCO 2 , PETO 2 , PETCO 2 , VE vs. VO 2 , and VE vs. VCO 2 curves) ranging from ± 320 to ± 430 ml O 2 · min −1 for VI-1, and ± 360 to ± 440 ml O 2 · min −1 for VI-2 (32 participants aged 17-45 years). Wisén and Wohlfart (2004) used polynomial regression models to automatically determine various VI in VO 2 , VCO 2 , EQO 2 , and EQCO 2 time curves by calculating the first and second derivatives in 19 male participants aged 20-48 years. They had to manually adjust 34% of VI-1 and 40% of VI-2 determinations after a visual inspection of the graphical representations because of implausible values. These high percentages demonstrate the difficulty of automatic VI detection. In their analysis, Wisén and Wohlfart (2004) also detected two patterns of VI related to the cardiorespiratory fitness (CRF) levels (i.e. aerobic power, VO 2peak ) of the participants, which may reflect different strategies of individuals to cope with the exercise strain indicated by the onset of lactate accumulation, isocapnic buffering or respiratory compensation. These different patterns may be one of the reasons for implausible values of VI using visual as well as automatic detection methods. Mixing up VI-2 with VI-1 is one of the most common methodological errors in the literature, and the values reported for VI-1 are usually too high (McLellan, 1987).
This false detection can occur especially in subjects with low CRF levels, who are often unable to reach VI-2 during exercise testing or who do not show the expected ventilatory response curves (Meyer et al., 2005). To account for these different VI patterns and ventilatory response curves, detection methods should be tested on subjects with different CRF levels to ensure their accuracy.
To conclude, the results from previous studies suggest that automatic VI detection still needs to be visually inspected and corrected due to the high number of false VI detections (Wisén & Wohlfart, 2004). Therefore, automatic detection methods should be further improved because precise identification of VI is essential for its utility in research and clinical practice.
Moreover, regarding an ageing population there is a need for reliable detection methods especially concerning sedentary and elderly subjects. To our knowledge, there are no studies comparing different automatic methods using polynomial regression in people older than 50 years and with different CRF levels.
Therefore, the aims of this study were: 1) to compare different methods (visual, automatic and semiautomatic) used to detect VI-1 and VI-2 in a group of 50-60 year old males and females from the Paracelsus 10.000 study (Salk, 2016), 2) to investigate the impact of the CRF level on the detection of VI-1 and VI-2, and finally 3) to create a recommendation for a systematic VI-detection method with clearly defined confidence levels.

Data collection and processing
After a 2-minute stationary phase and a 2-minute warm-up phase at 10 W, each participant performed an incremental cycle ergometer task (ergo select 200P, ergo line GmbH, Bitz, Germany) until volitional exhaustion at a pedalling rate of 60 rpm. The test started at a workload of 40-60 W for females with an increase of 10-15 W every minute until exhaustion, while males started at 50-110 W with an increase of 10-15 W every minute. Starting workload and the consecutive work rates were dependent on sex and body mass to ensure that volitional exhaustion was reached after 8 to 12 minutes of test duration (American College of Sports Medicine, 2013). A 5-minute recovery phase at 10 W was completed after test termination.
Attainment of volitional exhaustion was confirmed if at least two of the following criteria were met: (1) a plateau in VO 2 , (2) EQO 2 > 30, (3) respiratory exchange ratio (RER) > 1.1, (4) reaching 90% of age-predicted maximum heart rate, (5) pedalling rate below 50 rpm. The exercise testing was terminated if any complications or contraindications occurred (Ross, 2003). Datasets were excluded from further analysis if the participants could not complete at least five minutes of the incremental exercise test, i.e. if they had not completed at least five load levels.
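The "at least two of five" rule can be written as a simple count. The following is a minimal sketch in Python, not the authors' procedure: the function and argument names are hypothetical, and the VO 2 plateau is passed in as an already-judged boolean because the text does not state how a plateau was operationalised.

```python
def exhaustion_confirmed(vo2_plateau, eqo2, rer, hr_peak, hr_max_predicted, cadence_rpm):
    """Return True if at least two of the five listed criteria are met.

    All argument names are illustrative assumptions, not taken from the study.
    """
    criteria = [
        bool(vo2_plateau),                  # (1) plateau in VO2 (judged elsewhere)
        eqo2 > 30,                          # (2) EQO2 > 30
        rer > 1.1,                          # (3) RER > 1.1
        hr_peak >= 0.9 * hr_max_predicted,  # (4) at least 90% of age-predicted maximum HR
        cadence_rpm < 50,                   # (5) pedalling rate below 50 rpm
    ]
    return sum(criteria) >= 2
```

For example, a test ending with EQO 2 = 32 and RER = 1.15 would be accepted even without a plateau, a sufficient heart rate, or a drop in cadence.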
Continuous respiratory gas analysis and volume measurements were obtained breath-by-breath (Master Screen CPX, Jaeger, Hoechberg, Germany). Data from the stationary, warm-up and recovery phases were excluded from further analysis. 10-second time-based averages were calculated and the mean of the three successive highest 10-second VO 2 values was taken as VO 2peak . Peak power output (PPO) was determined as the mean power output during the last minute of the exercise test (Merry, Glaister, Howatson, & van Someren, 2016; Robergs & Burnett, 2003).
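The averaging step can be sketched as follows. This is an illustrative Python sketch under two assumptions that the text does not confirm: bins are aligned to the start of the incremental test (t = 0 s), and "the three successive highest values" is read as the highest mean over any three consecutive 10-second bins. The function names are hypothetical.

```python
from statistics import mean

def ten_second_averages(times_s, values, bin_s=10.0):
    """Average breath-by-breath values into consecutive time bins of bin_s seconds."""
    bins = {}
    for t, v in zip(times_s, values):
        bins.setdefault(int(t // bin_s), []).append(v)
    # One mean per non-empty bin, ordered by time (empty bins are skipped)
    return [mean(bins[k]) for k in sorted(bins)]

def vo2_peak(binned_vo2):
    """Highest mean of three successive binned VO2 values."""
    if len(binned_vo2) < 3:
        raise ValueError("need at least three bins")
    return max(mean(binned_vo2[i:i + 3]) for i in range(len(binned_vo2) - 2))
```

Taking the maximum over a sliding window, rather than over the last three bins only, tolerates tests in which VO 2 drops slightly just before termination.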

Visual method for VI detection
The visual detection of VI was conducted by two experts independently. VI-2 was determined by finding the disproportional increase in a VE vs. VCO 2 plot (Wasserman et al., 1981) and VI-1 was determined by finding the disproportional increase in a VCO 2 vs. VO 2 plot (v-slope method; Beaver et al., 1986). The ventilatory equivalents method (Meyer et al., 2005) was then used to verify VI determination. The experts had no prior knowledge of any results or identity of the participants. In the presence of a difference greater than ±300 ml O 2 · min −1 in VI detection between the two experts (Gaskill et al., 2001), a third expert identified the respective VI, after which the two closest VO 2 values were used for further analysis. As the primary outcome, the mean VO 2 of the two detected values was used to denote VI-1_visual and VI-2_visual (Santos & Giannella-Neto, 2004). The visual detection method served as our reference method and was compared to the automatic and semiautomatic detection methods (Dolezal et al., 2017; Novais et al., 2015).
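The rule for combining the expert readings into one reference value can be sketched in a few lines. This is an illustrative Python sketch with hypothetical names; how a tie between two equally close pairs was resolved is not stated in the text, so the lower pair is chosen here by assumption.

```python
def visual_reference(expert_a, expert_b, third_expert=None, tolerance=300.0):
    """Combine expert VO2 readings (ml O2/min) at one VI into a reference value."""
    if abs(expert_a - expert_b) <= tolerance:
        return (expert_a + expert_b) / 2
    if third_expert is None:
        raise ValueError("experts differ by more than the tolerance; a third reading is needed")
    readings = sorted([expert_a, expert_b, third_expert])
    # The two closest of three values are always neighbours once sorted
    low_pair, high_pair = readings[:2], readings[1:]
    # Assumption: on a tie, the lower pair is used
    pair = low_pair if (low_pair[1] - low_pair[0]) <= (high_pair[1] - high_pair[0]) else high_pair
    return sum(pair) / 2
```

With readings of 1000 and 1500 ml O 2 · min −1 and a third reading of 1400, the sketch averages 1400 and 1500.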

Semiautomatic method for VI detection
A semiautomatic method (semiauto) with a systematic approach using automatic and visual detection methods (conducted by an expert as depicted in Table 1) was developed to detect VI-1 and VI-2. For this purpose, different plots of the ventilatory variables together with the automatically detected VI (as described above) were reviewed and the expert visually chose the most appropriate value (Figure 2). To avoid false identification of VI-1 as VI-2, the VI-2 was detected first and then shown in the plots to help VI-1 detection. Finally, the expert assigned a level of confidence for VI-1 and VI-2 detection. The confidence levels were defined identically for VI-1 and VI-2 detection as follows (method A and B refer to the description in Table 1):
Confidence Level 1 (two of the following three criteria had to be fulfilled): (1) The difference in the selected time point for the VI between the first (A) and the second (B) method was ≤30 s. (2) The selected VI could be clearly identified visually in both methods (A and B). (3) At least one of the automatically detected VIs was in agreement with the visually selected VI within a limit of ±30 s.
Confidence Level 2 (two of the following three criteria had to be fulfilled): (1) The difference in the selected time point for the VI between the first (A) and the second (B) method was ≤60 s. (2) The selected VI could be clearly identified visually with one of the two methods (A and B). (3) At least one of the automatically detected VIs was in agreement with the visually selected VI within a limit of ±60 s.
Confidence Level 3: all cases which didn't fulfil the criteria for confidence level 1 and 2 were defined as indeterminable.
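The three confidence levels form a small decision rule that can be written down directly. The sketch below is illustrative Python with hypothetical argument names; it assumes that "clearly identified with one of the two methods" means at least one, and that the visual clarity judgements are supplied by the expert as booleans.

```python
def confidence_level(t_a, t_b, t_selected, auto_times, clear_in_a, clear_in_b):
    """Assign confidence level 1, 2 or 3 to one VI selection (times in seconds).

    t_a, t_b    : time points selected with method A and method B
    t_selected  : finally selected time point
    auto_times  : time points of the automatically detected VIs
    clear_in_a/b: expert's judgement that the VI is clearly visible in A / B
    """
    def meets(limit_s, n_clear_required):
        criteria = [
            abs(t_a - t_b) <= limit_s,                                # criterion (1)
            (bool(clear_in_a) + bool(clear_in_b)) >= n_clear_required,  # criterion (2)
            any(abs(t - t_selected) <= limit_s for t in auto_times),  # criterion (3)
        ]
        return sum(criteria) >= 2

    if meets(30, 2):
        return 1
    if meets(60, 1):
        return 2
    return 3  # indeterminable
```

Checking level 1 before level 2 matters, because every selection that satisfies the stricter 30-second criteria also satisfies the 60-second ones.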
The VO 2 values corresponding to the selected time points of VI-1 and VI-2, detected with the different automatic and semiautomatic methods described above, were then obtained by fitting a 6th-order polynomial (method: linear least squares; robust fitting option: bi-square) to the VO 2 time series data.
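Reading a VO 2 value off a fitted polynomial at a selected time point can be illustrated as follows. This Python sketch uses ordinary least squares only and deliberately omits the bi-square robust weighting named in the text, so it is a simplified stand-in rather than the fitting procedure used in the study. The function name is hypothetical, and time is rescaled to [-1, 1] purely for numerical stability.

```python
def fit_polynomial(times, values, degree=6):
    """Ordinary least-squares polynomial fit; returns a function t -> fitted value."""
    t_min, t_max = min(times), max(times)

    def scale(t):
        # Map time to [-1, 1] so that high powers stay well conditioned
        return 2.0 * (t - t_min) / (t_max - t_min) - 1.0

    m = degree + 1
    rows = [[scale(t) ** p for p in range(m)] for t in times]
    # Normal equations (A^T A) c = A^T y, stored as an augmented matrix
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(m)]
           + [sum(r[i] * y for r, y in zip(rows, values))] for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(col + 1, m):
            factor = aug[k][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[k][c] -= factor * aug[col][c]
    # Back substitution
    coeffs = [0.0] * m
    for i in range(m - 1, -1, -1):
        known = sum(aug[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (aug[i][m] - known) / aug[i][i]
    return lambda t: sum(c * scale(t) ** p for p, c in enumerate(coeffs))
```

Given the 10-second averaged time series, `fit_polynomial(times, vo2)(t_vi)` would return the smoothed VO 2 at the selected time point `t_vi`. A robust variant would reweight the residuals iteratively, which is not shown here.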

Statistical analysis
Data are given as mean ± standard deviation. Two-way ANOVA and post-hoc tests with Bonferroni-adjusted α were conducted to compare CRF subgroups separated by sex (Table 2). The agreement between the experts for visual detection was assessed by calculating intraclass correlation coefficients (ICC [95% confidence interval]) based on a single-rater, absolute-agreement, two-way mixed-effects model (Koo & Li, 2016).
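For readers unfamiliar with this ICC form, the point estimate for a single-rater, absolute-agreement, two-way model can be computed from the ANOVA mean squares as (MSR − MSE) / (MSR + (k − 1)·MSE + (k/n)·(MSC − MSE)), where MSR, MSC and MSE are the mean squares for subjects, raters and error, n is the number of subjects and k the number of raters. The Python sketch below implements only this point estimate; the function name is hypothetical and the 95% confidence interval reported in the study is not computed.

```python
def icc_absolute_single(ratings):
    """ICC, single rater, absolute agreement, two-way model (point estimate only).

    ratings: list of rows, one row per subject, each holding k ratings.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Because the rater mean square enters the denominator, a constant offset between two raters lowers this coefficient even when their rankings of the subjects are identical, which is why the absolute-agreement form suits method comparison.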
VO 2 values between detection methods and CRF subgroups were compared separately for VI-1 and VI-2 using two-way repeated-measures ANOVA. Post-hoc tests with Bonferroni-adjusted α were conducted to identify the differences between the detection methods.
The level of absolute agreement between the visual and the other two methods (automatic and semiautomatic) was evaluated by calculating ICC as described above and mean biases ± 95% limits of agreement (LoA) according to Bland and Altman (2010) for VI-1, VI-2, and each CRF subgroup.
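The Bland-Altman quantities reduce to the mean and standard deviation of the paired differences, with the 95% limits of agreement given by bias ± 1.96 · standard deviation. A minimal Python sketch follows; the function name is hypothetical, and the sign convention (method minus reference) is an assumption, since the text does not state the direction of the difference.

```python
from statistics import mean, stdev

def bland_altman(method, reference):
    """Return (bias, half_width, (lower, upper)) for paired VO2 values.

    Differences are computed as method minus reference (assumed convention).
    """
    diffs = [m - r for m, r in zip(method, reference)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample standard deviation of the differences
    return bias, half_width, (bias - half_width, bias + half_width)
```

A result reported as "55 ± 334 ml O 2 · min −1" corresponds to `bias = 55` and `half_width = 334` in this notation.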
A statistical power analysis was performed a priori for sample size estimation using G*Power version 3.1.9.2 (Faul, Erdfelder, Lang, & Bucher, 2007). With an assumed Type I error of .05 and a Type II error of .20 for finding a large effect (f value = .4 (Cohen, 1969)) between CRF level groups, the projected sample size is approximately 26 participants per group. Thus, our proposed sample size of 50 subjects per CRF level group should be more than adequate for the main objective of this study. The level of significance was set at α = .05. The statistical analyses were performed using RStudio version 1.1.383 (RStudio Inc., Boston, Massachusetts, USA).

Table 2 lists age, anthropometric characteristics and results of CPET for all CRF subgroups. There were no significant differences between the CRF subgroups in relation to age, height, VI-1_visual and VI-2_visual. For body mass and body mass index (BMI) we found significantly higher values in participants of the low CRF subgroups. As expected, PPO, VO 2peak , peak heart rate (HR peak ) and number of one-minute steps to reach volitional exhaustion during CPET (step max ) were significantly higher in participants of the high CRF subgroups.

Agreement between researchers of the visual detection method
Agreement between the two researchers of the visual detection of 100 cases showed good to excellent reliability for VI-1 (ICC = .93 [.80, .97]) and excellent reliability for VI-2 (ICC = .98 [.98, .99]). Because of the high expert agreement and in accordance with others, the visually detected VI served as our reference method for further comparisons with the automatic and semiautomatic methods (Higa et al., 2007;Novais et al., 2015;Santos & Giannella-Neto, 2004).
Table 1 (partial).
(B1) To verify VI-2 selection, EQCO 2 , PETCO 2 , EQO 2 , and PETO 2 time plots were drawn and the identified inflection point from (A1) as well as the calculated VI-2s (min_EQCO 2 , maxcurv_EQCO 2 , max_PETCO 2 and maxcurv_PETCO 2 ) were shown on the plots for guidance. Based on the displayed information, a final visual selection of VI-2 was made. If the results of method (A1) and (B1) did not agree, the result of method (A1) was preferred (Figure 2(b,c)).
(C1) A confidence level for VI-2 selection was given. #
2nd step: VI-1 detection
(A2) The first inflection point in the VCO 2 vs. VO 2 plot was visually identified, with the calculated VI-1 (maxcurv_VCO 2 /VO 2 ), the line of identity and the previously selected VI-2 shown on the plot for guidance (Figure 2(d)).
(B2) To verify VI-1 selection, EQO 2 , PETO 2 , EQCO 2 and PETCO 2 time series plots were drawn and the identified inflection point from (A2), as well as the calculated VI-1s (min_EQO 2 , maxcurv_EQO 2 , min_PETO 2 and maxcurv_PETO 2 ) and the previously selected VI-2 (1st step) were shown on the plots for guidance. Based on the displayed information, a final visual selection of VI-1 was made. If the results of method (A2) and (B2) did not agree, the result of method (A2) was preferred (Figure 2(e,f)).
(C2) A confidence level for VI-1 selection was given. #
# The confidence levels were defined identically for both VI-1 and VI-2 detection and are described in the text.

VO 2 obtained via different detection methods at VI-1 and VI-2
The absolute values (VO 2 in ml O 2 · min −1 ) corresponding to VI-1 and VI-2 detected via the different methods are shown in Tables 3 and 4. The estimates for VI-1 ranged from 977 ± 319 ml O 2 · min −1 (min_EQO 2 ) to 1334 ± 537 ml O 2 · min −1 (maxcurv_PETO 2 ) and for VI-2 from 1418 ± 457 ml O 2 · min −1 (max_PETCO 2 ) to 1735 ± 597 ml O 2 · min −1 (visual). The CRF− subgroup generally showed lower estimates for VI-1 and VI-2 than the CRF+ subgroup.
Post hoc testing revealed no significant differences in VO 2 from the visual method for maxcurv_EQO 2 , maxcurv_PETO 2 , and maxcurv_VCO 2 /VO 2 . Additionally, no significant differences were found for the semiauto method for VI-1 detection when comparing the total sample or the CRF+ subgroup. In the CRF− subgroup, none of the automatic methods differed significantly from the visual method (Table 3). For VI-2, no significant differences from the visual method were found for the maxcurv_PETCO 2 and semiauto methods when comparing the total sample or the CRF subgroups.

Agreement between methods
The highest level of absolute agreement with the visual method was found for the semiauto method at VI-1 (ICC = .88 [.81, .92]) and VI-2 (ICC = .97 [.96, .98]) in the total sample. Similar results were found for both CRF subgroups (see Tables 3 and 4). The lowest levels of agreement were found for the maxcurv_EQO 2 , maxcurv_PETO 2 and maxcurv_VCO 2 /VO 2 methods for VI-1, and for the max_PETCO 2 and maxcurv_VE/VCO 2 methods for VI-2 in the total sample as well as in both CRF subgroups.

Figure 2. Sample ventilatory indices (VI) determination using the semiautomatic method for one subject. The left panels (a-c) illustrate the first step of the detection procedure (VI-2), the right panels (d-f) illustrate the second step (VI-1). The data points (open and closed symbols) represent the 10-second time-based averaged breath-by-breath results of the CPET. Open symbols = PETO 2 (2b/2e) and EQO 2 (2c/2f); Closed symbols = PETCO 2 (2b/2e) and EQCO 2 (2c/2f); The solid lines along the data points represent the fitted 6th-degree polynomial with the 95% confidence interval (dark grey area). The dotted vertical lines represent the automatically calculated VIs, and the solid vertical lines represent the visually selected VIs. The vertical dashed thick lines in the panels 2b/2c and 2e/2f represent the visually selected VI-2 from panel 2a and the visually selected VI-1 from panel 2d, respectively. The dotted and dashed vertical lines in panels 2d-2f represent the finally selected VI-2 from the 1st step (panels 2a-2c).
The mean biases ± LoA for the visually detected VI are shown in Table 3 and Figure 3 for VI-1 and in Table 4 and Figure 4 for VI-2. Although the mean bias of comparison between the visual and the various automatic methods was small (ranging from −292 to −216 ml O 2 · min −1 ), the dispersion of agreement was large with the LoA ranging from ±268 to ±777 ml O 2 · min −1 in the total sample. The smallest LoA were found for min_EQO 2 , min_PETO 2 and semiauto at VI-1, and for semiauto at VI-2 in the total sample and in both CRF subgroups. The confidence levels for the semiautomatic detection method can be seen in Table 5.

Discussion
It has been shown previously that changes in gas exchange, analysed visually or automatically, and blood lactate accumulation correspond with each other. Therefore, the detection of VI from gas exchange and ventilatory variables is an appropriate non-invasive method to describe metabolic changes during incremental CPET (Binder et al., 2008; Meyer et al., 2005; Westhoff et al., 2013). The purpose of this study was to extend these findings by providing additional information about the accuracy of different methods used to detect VI-1 and VI-2 in a group of 50- to 60-year-old males and females, and to investigate the impact of CRF level on VI detection. Further, we created a recommendation for a systematic VI detection method with clearly defined confidence levels.

Participant characteristics
Body mass and BMI, as well as the results of CPET (PPO, VO 2peak , HR peak , step max ), showed significant differences between the CRF subgroups (Table 2). Compared to reference values for VO 2peak published by Rapp et al. (2018), the mean VO 2peak of the CRF− and CRF+ subgroups corresponded approximately to the 10th and 80th percentile, respectively.
In the current study, the visual detection method (the reference method in this study) revealed a VI-1 of 54 ± 10% of VO 2peak and VI-2 of 83 ± 10% of VO 2peak in the total sample. No significant differences for VI-1 and VI-2 could be found between the CRF subgroups. These findings are in agreement with other studies using automatic as well as visual detection methods (Gaskill et al., 2001;Meyer et al., 2005;Santos & Giannella-Neto, 2004;Wisén & Wohlfart, 2004).

Agreement between the visual and the various automatic methods
For VI-1 detection, we found higher agreement (expressed as ICC and mean bias ± LoA) between the visual method and the min_EQO 2 and min_PETO 2 methods than for the "maxcurv" detection methods (maxcurv_EQO 2 , maxcurv_PETO 2 and maxcurv_VCO 2 /VO 2 ) in the total sample as well as in both CRF subgroups. As described by Meyer et al. (2005), the EQO 2 time series can produce two inflection points, a first corresponding to VI-1 and a second caused by hyperventilation at VI-2. When using automatic as well as visual detection methods, the second inflection point could mistakenly be identified as VI-1 (e.g. as a consequence of irregularities in breathing pattern at VI-1 or a greater disproportional increase at VI-2), which is "[. . .] the single most common methodological error in the literature" as stated by McLellan (1987). This second inflection point could also be a reason for the slightly smaller inter-rater reliability for visual detection of VI-1 than VI-2. This smaller inter-rater reliability for VI-1 detection is substantiated by the high number of cases (17 out of 100) where the disagreement between the two experts during the visual detection was greater than 300 ml O 2 · min −1 and therefore had to be verified by a third expert. For VI-2 detection, agreement was similar throughout all applied automatic methods in the total sample with ICC ranging from .74 to .85, mean biases ranging from −292 to −69 ml O 2 · min −1 , and LoA ranging from ±511 to ±727 ml O 2 · min −1 . Although the ICC results showed moderate to good reliability, the result of the LoA analysis indicates substantial differences between the visual and the different automatic detection methods. Similar conclusions can be drawn from the analysis of the CRF subgroups for VI-2 detection. A thorough examination of the Bland-Altman plots for all automatic methods (Figures 3 and 4) revealed similar inter-method differences (y-axis) throughout the full range of their mean estimates (x-axis). It is noteworthy that for nearly every automatic method we found subjects with values far outside the concordance intervals. This was found for the total sample as well as for both CRF subgroups (empty vs. filled symbols). Therefore, CRF level does not seem to have an impact on VI detection in 50 to 60 years old adults.

Table 4. Mean ± standard deviation of oxygen uptake (VO 2 ) at ventilatory indices 2 determined with visual and automatic detection methods. Intraclass correlation coefficients (ICC) and mean bias ± 95% limits of agreement (LoA) for comparisons between visual and the different automatic detection methods (5 single-automatic and 1 semiautomatic).
Despite the high ICC found for some of the automatic detection methods, taking into account the high LoA and the number of subjects with values outside the concordance intervals, none of the automatic detection methods used in our study can be recommended for VI-1 as well as VI-2 detection in 50 to 60 years old adults.

The semiautomatic detection method
It has been shown by Dolezal et al. (2017) that a systematic approach combining detection methods based on different ventilatory and gas exchange response variables can increase agreement on VO 2 at VI-1 between interpreters. However, the agreement is strongly dependent on the experience of the interpreters and the characteristics of the subjects investigated. In the referred study, CPET data from 10 healthy, recreationally active young men (25 ± 5 years) were analysed (Dolezal et al., 2017). Gaskill et al. (2001) used CPET data from three different populations (athletes, active, and sedentary males and females aged 15 to 52 years) to show improved agreement on VO 2 at VI-1 up to 11 ± 300 ml O 2 · min −1 by combining three different visual detection methods. We found no studies which looked at combined detection methods for VI-2.

Figure 3. Bland-Altman plots of the difference between the determinations of ventilatory indices 1 (VI-1) by visual, automatic (min_EQO 2 , min_PETO 2 , maxcurv_EQO 2 , maxcurv_PETO 2 and maxcurv_VCO 2 /VO 2 ) and semiautomatic (semiauto) methods vs. the mean of their determinations. The solid horizontal line represents the mean bias between the two methods, and the top and bottom dashed lines represent the 95% limits of agreement [± 1.96 · standard deviation]. Open symbols = low cardiorespiratory fitness subgroup; Closed symbols = high cardiorespiratory fitness subgroup; Circles = females; Squares = males.
In our study, the semiautomatic detection method yielded the highest agreement for VI-1 and VI-2 in 50 to 60 years old adults, shown by the lowest mean bias ± LoA and the highest ICC (Tables 3 and 4) in the total sample as well as in both CRF subgroups.
These improvements in inter-method agreement are similar to those reported in the studies mentioned above. This is the first study to show an improved accuracy of VI detection by combining different automatic procedures along with visual judgement. The high confidence levels rated by the expert for the semiautomatic method (VI-1: level 1 = 63% and level 2 = 34%; VI-2: level 1 = 71% and level 2 = 25%) indicate that the semiautomatic systematic approach provides useful support for VI detection. Especially in clinical circumstances, where false VI detection can lead to harmful consequences, e.g. by misclassifying surgical candidates into inappropriate risk categories (Vainshelboim et al., 2017), the semiautomatic method with clearly defined confidence levels can support clinical personnel in their decision making. However, further research is needed to improve confidence level definitions for specific populations (e.g. age classes, athletes, sedentary or clinical populations).

Figure 4. Bland-Altman plots of the difference between the determinations of ventilatory indices 2 (VI-2) by visual, automatic (min_EQCO 2 , max_PETCO 2 , maxcurv_EQCO 2 , maxcurv_PETCO 2 and maxcurv_VE/VCO 2 ) and semiautomatic (semiauto) methods vs. the mean of their determinations. The solid horizontal line represents the mean bias between the two methods, and the top and bottom dashed lines represent the 95% limits of agreement [± 1.96 · standard deviation]. Open symbols = low cardiorespiratory fitness subgroup; Closed symbols = high cardiorespiratory fitness subgroup; Circles = females; Squares = males.
The lack of blood lactate measures to validate the visually detected VI can be seen as a limitation. However, previous reports have shown good agreement between visually detected VI based on respiratory and gas exchange measures, and lactate responses (Binder et al., 2008; Meyer et al., 2005; Westhoff et al., 2013). Therefore, it can be considered appropriate to use the visually detected VI as our reference method. Because of the high number of undetectable cases in the various automatic methods, only 43/100 complete cases for VI-1 and 64/100 complete cases for VI-2 could be used to calculate the two-way repeated-measures ANOVA. Therefore, results should be interpreted with caution.

Conclusion
The results of this study demonstrate that automatic methods based on polynomial regression are useful for detecting VI in 50 to 60 years old adults. In this study we could not find any differences between the CRF subgroups regarding the accuracy of VI detection. However, as different patterns in the gas exchange and ventilatory response curves may exist, the detection of VI based on a single automatic method can lead to incorrect VI detection and therefore false classification of aerobic capacity or inadequate training recommendations. The proposed semiautomatic detection method, using a systematic approach and clearly defined confidence levels, seems promising in overcoming these limitations.