Validation of spectral simulation tools in the context of ipRGC-influenced light responses of building occupants

With the growing awareness about ipRGC-influenced light (IIL) responses, design applications related to these responses are flourishing. To compare design options and optimize lighting conditions for building occupants, lighting simulations are typically used. However, as our IIL responses depend on the spectral characteristics of light, spectral simulations are required. The goal of this study is to validate two spectral simulation tools, ALFA and Lark, for the study of indoor spaces in relation to occupants’ IIL responses. Indicators associated with IIL responses derived from ALFA- and Lark-simulated data are compared against indicators derived from data measured under indoor daylighting and electric lighting conditions. The results show that Lark outperforms ALFA in most cases, with a simulation error in the ±20% range for point-in-time indicators. When accounting for time dynamics of light exposure, at least 9% of the daylight exposures simulated for a 6-h period in Lark lead to a significant error.


Introduction
In the early 2000s, researchers identified a new photoreceptor in the mammalian retina, called the intrinsically photosensitive Retinal Ganglion Cell (ipRGC) (Brainard et al. 2001; Thapan, Arendt, and Skene 2001). This discovery led to the development of an entirely new field of research, that of the ipRGC-influenced light (IIL) responses, also referred to as non-visual responses to light or non-image forming effects of light. When exposed to light, the ipRGC induces behavioural and physiological responses in humans (the so-called IIL responses, which include but are not limited to circadian rhythms) (CIE 2009), even in the absence of rod and cone photoreceptors (Dacey et al. 2005; Dkhissi-Benyahya et al. 2007; Güler et al. 2007). While the rods and cones play a major role in our visual system, the ipRGCs play a leading role in our non-visual system.
More specifically, the human retina contains a population of ipRGCs, which, in the presence of light, express the photopigment melanopsin (or 'OPN4') (Foster 2021). Through their own neural connections, the ipRGCs send light-induced signals to the suprachiasmatic nucleus (SCN), the site of the central pacemaker driving circadian rhythms, and to other areas in the brain, such as those implicated in the regulation of arousal (Cajochen 2007; Zhang et al. 2021). There seem to be two main pathways through which a signal sent by the ipRGCs induces IIL responses (Amundadottir 2016; Soto Magán 2021): the indirect pathway, through which light can shift the timing of our circadian phase, i.e. our internal body clock, and for which the effect is not immediate; and the direct pathway, through which light can have more immediate effects on, for instance, our alertness, melatonin level, or pupil size.
Consequently, our ocular light exposure affects our health, well-being, and performance through the action of these ipRGCs.
CONTACT Clotilde Pierson clotilde.pierson@oregonstate.edu 1501 SW Campus Way, Corvallis, OR 97331, USA. Supplemental data for this article can be accessed at https://doi.org/10.1080/19401493.2022.2125582
From the evolutionary standpoint, outdoor daylight exposure would be the most suitable exposure for optimal IIL responses since our non-visual system has evolved to it for several millions of years. Over the course of the past 200 years, however, our life and work styles have changed drastically to the point that we now spend more than 90% of our time indoors (Klepeis et al. 2001). This change has had a strong impact on our light exposure, significantly reducing our daytime exposure and increasing our nighttime exposure by electric lighting and screens. As sleep disruption and other circadian rhythm-related health issues (Xie et al. 2019) affect a large portion of the population (Heschong 2021), and as occupants' well-being and performance are key concerns of building design nowadays (Hensen 2018), there is a need to improve building occupants' light exposure in relation to their IIL responses. Although there is not yet a consensus on the optimal light exposure that building occupants should get daily (if such optimal light exposure even exists, as there are large interindividual differences in the sensitivity to light of our non-visual system (Phillips et al. 2019)), recommendations for lighting design in relation to IIL responses have started to sprout (Stefani and Cajochen 2021).
Whether to implement such recommendations or to compare different building design options based on their potential for occupants' IIL responses, we need to predict the occupants' light exposure in the building. Lighting simulations are typically used for this purpose. Our IIL responses depend on various aspects of the light exposure, including (retinal) irradiance, timing (relative to the circadian rhythmicity), temporal characteristics (i.e. duration and variability over time), prior light history, directionality (although this aspect is still discussed (Prayag et al. 2019)), and spectrum (Khademagha 2021). Regarding the spectrum, the spectral sensitivity of the non-visual system differs from that of the visual system, and peaks in the short-wavelength portion of the visible spectrum due to the contribution of the ipRGCs (Lucas et al. 2014). Knowledge concerning the extent to which rods and cones also play a role in driving IIL responses is still growing (Rahman et al. 2021). Unlike the visual responses, for which light is described based on the photopic action spectrum and its related photometric units, there is no single action spectrum for IIL responses (Amundadottir, Lockley, and Andersen 2017a; Schlangen and Price 2021). To provide guidance for the description of light in the context of IIL responses, the CIE therefore recommends the use of a framework based on five α-opic spectral sensitivity functions (CIE 2018). Applying this framework requires light to be evaluated radiometrically rather than photometrically.
As most current lighting simulation platforms are usually based on a three-dimensional (RGB) colour space and photometric quantities (Ayoub 2019), such as the ubiquitous Radiance (Ward 1987), they are not appropriate per se to study the impact of different building design options and electric lighting decisions on building occupants' light exposure in relation to their IIL responses. A numerical study demonstrated that calculating the α-opic quantities from the CIE framework using a 3-band (RGB) spectral resolution-i.e. that of typical lighting simulation tools-produced considerable errors compared to an 81-band spectral resolution (Abboushi, Safranek, and Davis 2021). This is because the use of three values to summarize a spectral power distribution (SPD) affects the radiometric accuracy of the simulated light. Several workflows that rely on such three-dimensional lighting simulation platforms have been developed to study lighting or building design in the context of IIL responses (Gkaintatzi-Masouti, van Duijnhoven, and Aarts 2021), but while these workflows can be useful in some cases (e.g. for neutral indoor environments with only one type of light SPD), they do not take into consideration the interactions between the light sources and the materials from a spectral perspective. Spectral simulations, defined here as simulations for which the light transport equations are solved for more than three bands over the visible spectrum, are required to alleviate this problem.
A recent literature review listed 17 lighting simulation tools offering a spectral resolution (Gkaintatzi-Masouti, van Duijnhoven, and Aarts 2021). Among them, only four are able to output spectral irradiance and α-opic weighted quantities and can thus be used to study indoor spaces in relation to their occupants' IIL responses. In particular, ALFA (Solemma and Alertness 2018) and Lark (Inanici and Architects LLP 2015) emerged as the most commonly used for building design in relation to IIL responses, given that they were specifically developed for that purpose. Both tools rely on the physically accurate Radiance rendering engine, but Lark offers a 9-band spectral resolution while ALFA allows for an 81-band resolution. More detailed information about the two tools is available elsewhere (Balakrishnan and Jakubiec 2019; Inanici, Brennan, and Clark 2015). To date, no validation study can be found for these tools, i.e. no study that determines whether the tools provide outputs within a satisfactory range of accuracy for their intended purpose over the domain of their intended applicability (Sargent 2013). One study did compare the outputs of ALFA, Lark, and Radiance simulations to one another (Balakrishnan and Jakubiec 2019). However, the scenes in that study did not represent the full domain of the tools' intended applicability, since these were outdoor urban scenes under daylight only. Moreover, the comparative analysis did not focus on the intended purpose of the tools, since no indicators associated with IIL responses were included.
With the fast-growing interest in design applications related to IIL responses, the validation of these spectral simulation tools is urgent. In two previous studies, the authors investigated the performance (i.e. the accuracy and speed) of ALFA and Lark by simulating indoor daylight and electric light exposure based on spectral irradiance (Pierson, Aarts, and Andersen 2021a, 2021b). Besides ALFA simulations being almost three times faster than Lark simulations, the results showed that Lark produces more accurate results for daylight-only simulations (most errors within the ±20% range), whereas ALFA was shown to be more accurate for electric-light-only simulations (errors between −28.6% and 33.4%). From these results, it remains unclear whether such a range of errors in terms of spectral irradiance is acceptable for the evaluation of building and lighting performance in relation to IIL responses. Therefore, the objective of the present study is to validate those two spectral simulation tools so that they can be used for the prediction of light exposure indoors and the optimization of the built environment in relation to IIL responses.

Methodology
To validate a simulation tool that aims to represent an observable system, Sargent recommends comparing the simulation outputs to the system outputs and checking the range of accuracy (Sargent 2013). Additionally, this comparison needs to be done considering the simulation tool's purpose and domain of applicability.
In this case, the system that is simulated is light emission and transport (or more specifically, the spectral irradiance within the visible range) in a built environment. To take into account the simulation tools' purpose, which is to evaluate IIL responses of building occupants and use this information to optimize the building's (lighting) design, the measured and simulated spectral irradiance data are not compared directly but are used to derive indicators associated with IIL responses, which are then compared. Two indicators associated with IIL responses are selected for the comparison: the melanopic equivalent daylight (D65) illuminance (mel. EDI or $E_{v,mel}^{D65}$) (CIE 2018) and the relative non-visual direct response ($r_D$) (Amundadottir 2016). The four other α-opic EDI of the CIE framework have also been compared, and the analyses are reported in the supplementary materials. To match the simulation tools' domain of applicability, the comparison between the indicators associated with IIL responses derived from measured and simulated spectral irradiance is done for data collected in two experimental setups: one under indoor daylighting conditions and the other under indoor electric lighting conditions. This comparison is made through visual analyses of graphical displays and statistical tests. The simulation time, being another relevant factor of any simulation tool, is also included in the comparison. The experimental setups, measurements, simulations, indicators associated with IIL responses, and comparative analyses are described below.

Experimental setups
For this study, data were collected in two different experimental setups to comprise a representative range of the simulation tools' domain of applicability.
The first experimental setup considered indoor daylight exposure only. Spectral irradiance was measured vertically, at eye level, in seated position (i.e. 1.2 m above the floor), for three desk positions, facing a computer screen, in two office-like test rooms located in Eindhoven (51.45°N, 5.48°E) and in Lausanne (46.52°N, 6.56°E). Each test room had only one glazed facade, facing west in Eindhoven and south in Lausanne. The desks were set up as in Figure 1 to generate three distinct lighting conditions. Light measurements, further detailed in section 2.2, were taken every 6 min between 8am and 6pm over the period between 30 January and 17 March 2020 in Eindhoven, and over the period between 21 August and 1 September 2020 in Lausanne. In total, this first dataset contains around 9000 datapoints, including 13% of clear skies, 34% of hazy/intermediate skies, 32% of overcast skies, and 21% of rainy skies. The skies occurring during the measurements were classified based on the sky's clearness coefficient, ε, as clear (ε ≥ 4.5), intermediate/hazy (1.065 < ε < 4.5), or overcast (ε ≤ 1.065) (Perez et al. 1990). In situ pictures of the sky and precipitation data from local weather stations were also collected to subdivide the skies in the overcast category between the heavy rain clouds (i.e. rainy) and overcast categories of ALFA.
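The sky classification described above can be sketched in a few lines of Python. The thresholds on ε are those quoted in the text; the function name and the `rain_observed` flag, which stands in for the manual check against sky pictures and precipitation data, are hypothetical and only serve the illustration.

```python
def classify_sky(epsilon: float, rain_observed: bool = False) -> str:
    """Classify a sky from its clearness coefficient (thresholds from the text).

    rain_observed is a hypothetical flag standing in for the in situ pictures
    and weather-station precipitation data used to split the overcast class.
    """
    if epsilon >= 4.5:
        return "clear"
    if epsilon > 1.065:
        return "intermediate/hazy"
    # epsilon <= 1.065: overcast, optionally refined into the rainy class
    return "rainy" if rain_observed else "overcast"


if __name__ == "__main__":
    for eps, rain in [(6.2, False), (2.0, False), (1.0, False), (1.0, True)]:
        print(eps, rain, classify_sky(eps, rain))
```

Keeping the rain refinement as a separate input reflects that it comes from a different data source than ε itself.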
The second experimental setup considered indoor electric light exposure exclusively, and applies only to the Eindhoven location. Blackout blinds on the facade prevented daylight from entering the room. The luminaires were set up to generate distinct lighting conditions: Desk 1 with indirect light, Desk 2 with mainly direct light, and Desk 3 with both direct and indirect light, as illustrated in Figure 2. As the situation was essentially static (as opposed to dynamic over the day when daylight is involved), data were gathered every 6 min over one hour (to ensure that fluctuations were minimal) for three different electric lighting scenarios:
• the 'Fluorescent' scenario, with two luminaires Philips TBS600 1xTL5 49W HFP (CCT: 3030 K)
• the 'Warm LED' scenario, with two luminaires Philips PowerBalance recessed tuneable RC464B PSD W60L60 1xLED80S TWH (CCT: 2795 K)
• the 'Cool LED' scenario, with two luminaires Philips PowerBalance recessed tuneable RC464B PSD W60L60 1xLED80S TWH (CCT: 6436 K)
The measured SPD of these electric light sources are displayed in Figure 3. For each scenario, the lighting was turned on at least one hour before the start of the measurements. Consequently, the measured spectral irradiance at each desk was constant throughout the measurement hour. The electric lighting dataset contains 99 datapoints, i.e. 33 under each electric lighting scenario.

Measurements
In both experimental setups, spectral irradiance was measured simultaneously for the three desk positions. The measurements were done with Ocean Insight spectrometers (the USB4000 model in Eindhoven and the Jaz model in Lausanne) that had all been previously calibrated with a radiometrically calibrated light source. Additionally, a Hagner SD2 luxmeter was located next to each spectrometer in both experimental setups (except for Desk 3 in Lausanne) to simultaneously measure vertical illuminance. Vertical illuminance values were used to check the validity of the simulation models and to scale the measured spectral irradiance if needed.
To anticipate the modeling needs of the spectral simulations, additional measurements were collected, including the dimensions (and precise location) of the office-like experimental modules in Eindhoven and Lausanne and their furniture, as well as the spectral reflectance or transmittance of most materials found in the respective rooms. The dew point temperature, the diffuse horizontal (DHI) and direct normal (DNI) irradiance, and High Dynamic Range (HDR) images of the sky were collected simultaneously with the test room measurements to be able to realistically simulate the sky for the setup under daylighting conditions. While Lark allows a custom sky spectral power distribution (SPD) to be input to generate the sky model, and although the sky SPD had been measured simultaneously with the other measurements, it was decided to use the CIE Standard Illuminant D65 SPD for each Lark-simulated time step. This decision was made to provide a fair basis of comparison (a measured SPD cannot be input in ALFA) and to reproduce the conditions in which future users would run the simulations (designers typically do not have access to a measured sky SPD). Finally, the SPD of each electric light source was also measured to spectrally simulate the light sources for the setup under electric lighting conditions. More details about these additional measurements are available elsewhere (Pierson, Aarts, and Andersen 2021a, 2021b).

Simulations
Before running the spectral simulations, the simulation models-i.e. the 3D geometry, and the material and light source definition-of both experimental setups were checked. This was done by comparing rendered images from the simulation models to images collected on site (Figure 4), and by computing the relative error between the vertical illuminance outputs of basic Radiance simulations and the vertical illuminance measurements (Pierson, Aarts, and Andersen 2021a, 2021b). There was no bias error between Radiance simulations and the associated measured values, and most relative errors in vertical illuminance were within the ±17.5% range suggested in the literature (Mardaljevic 1995). These results demonstrate that the simulation models for both experimental setups are valid, and the spectral simulations can be run in ALFA and Lark.
For both experimental setups, spectral irradiance was simulated with both ALFA and Lark for a seated person looking straight ahead at each desk position. Regarding the experimental setup under electric lighting conditions, an IES photometric file of each luminaire was obtained from the manufacturer and uploaded in ALFA and Lark to simulate the electric light sources. It should be noted that Lark was originally developed to run daylight spectral simulations only. To conduct spectral simulations under electric lighting conditions, Lark developers shared with us a custom Grasshopper sheet based on existing Lark components. The method implemented in the sheet is described elsewhere (Pierson et al. 2021b). Besides the IES photometric file, a Light Loss Factor (LLF) of 0.8 was defined for the fluorescent light sources and of 1.1 for the LED light sources (Pierson et al. 2021b). As there is no option to input an LLF in ALFA, and as Lark does not allow for an LLF higher than one, the LLF of the LED light sources (as well as that of the fluorescent light sources in the case of ALFA) was applied to the simulation outputs during the postprocessing phase. Finally, the measured SPD of the electric light sources was input in the custom Grasshopper sheet for Lark simulations. Although ALFA does not offer the option to upload a custom SPD of an electric light source in its graphical interface, it is possible to add a custom SPD to the ALFA library by copying it to the ALFA Luminaires folder on the file system. The three measured SPDs (Figure 3) were added to the ALFA library in this way, so that they could be used for the spectral simulations.
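The LLF handling described above amounts to deciding, per tool and light source, which factor still has to be applied after the simulation. The following is a minimal Python sketch of that bookkeeping, not the authors' postprocessing code. The LLF values come from the text; the function names are hypothetical, and it is an assumption that Lark was run with a neutral factor of 1.0 whenever the LLF exceeded one.

```python
# LLF values quoted in the text.
LLF = {"fluorescent": 0.8, "led": 1.1}


def postprocess_factor(tool: str, source: str) -> float:
    """Return the factor still to be applied to a simulated output."""
    llf = LLF[source]
    if tool == "alfa":
        # ALFA has no LLF input, so the whole factor is applied afterwards.
        return llf
    if tool == "lark":
        # Lark accepts an LLF up to one inside the simulation. Assumption:
        # a larger LLF is simulated as 1.0 and applied in full afterwards.
        return llf if llf > 1.0 else 1.0
    raise ValueError(f"unknown tool: {tool}")


def apply_llf(spectral_irradiance, tool, source):
    """Scale every band of a simulated spectral irradiance by the factor."""
    factor = postprocess_factor(tool, source)
    return [value * factor for value in spectral_irradiance]


if __name__ == "__main__":
    print(apply_llf([0.010, 0.020], "alfa", "fluorescent"))
    print(apply_llf([0.010, 0.020], "lark", "led"))
```

Because the LLF is a scalar, applying it before or after the light transport gives the same spectral irradiance, which is why postprocessing is a valid workaround.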
Regarding the experimental setup under daylight conditions, the spectral sky model is defined differently in both tools. In Lark, the spectral sky model is generated based on a spectral sky and a non-spectral, equal-energy white sun. Although the sky luminance distribution in Lark is originally defined using the CIE standard sky models, the script of Lark was edited to use the Perez sky model (Perez et al. 1990), which offers a luminous efficacy model, as recommended in the literature (Balakrishnan and Jakubiec 2019). In ALFA, the spectral sky model is generated based on a spectral sun and sky that are precomputed in a radiative transfer library, called libRadtran, that uses measured atmospheric profiles. Therefore, the only additional input required is the sky condition, and it is not possible to scale the sky model according to measured data. To overcome this issue (that would have made the comparison with Lark unfair), ALFA developers provided to us a modified version of ALFA 0.5.6.99 (that was not officially released), in which the irradiance of the sky model could be scaled based on the measured global sky horizontal irradiance (GHI).
The simulation process was automated for every time step in both Lark and ALFA. First, the constant inputs, such as the materials' spectral reflectance, the location, the D65 SPD of the sky, or the SPD and IES file of the luminaires, were provided to each tool in the required format. Then, both simulation tools were set up to loop through all the time steps, so that the simulations were automatically run one after the other for the three desks with the necessary inputs at each time step. Since Lark runs on the Grasshopper plugin, the automation for Lark was implemented through the component Fly. The Radiance parameters and the software versions used for the simulation in Lark are detailed in Tables 1 and 2, respectively. The RGB values at each desk resulting from each simulation run in Lark were then extracted. The average spectral irradiance over the waveband corresponding to each RGB value of each simulation run was derived based on the wavebands defined in Lark (Inanici, Brennan, and Clark 2015). In contrast, the latest version of ALFA used in this study did not allow any automation in its interface. Therefore, robotic process automation was used: a Python script leveraging the PyAutoGUI package controlled the mouse and keyboard to automate the interactions with ALFA. ALFA was configured to run 75 passes and the parameters were set as in Table 1. It should be noted that not all Radiance simulation parameters can be modified in ALFA, as some are not accessible in the interface. For this reason, the parameters in both tools might not have had the same value. Nonetheless, the accessible parameters were set up in such a way that ten runs of the same simulation converged towards the same output in each tool, while minimizing the simulation time. Finally, ALFA outputs the average spectral irradiance over each consecutive 5 nm waveband.
More details about the simulations and simulation inputs are available in our previous studies (Pierson, Aarts, and Andersen 2021a, 2021b).

Indicators associated with IIL responses
For the validation of both spectral simulation tools, indicators associated with IIL responses derived from the measured and simulated spectral irradiance are compared. Such indicators, whether a metric or a mathematical model, support the objective interpretation of a light exposure in relation to the IIL responses that may be related to it. Because research on the IIL responses is still in its infancy, there exist only a few such indicators that have been either approved by experts internationally or endorsed by field studies. Consequently, only two indicators have been selected for the comparison in this study: the mel. EDI metric (CIE 2018), and the $r_D$ output of the non-visual direct response ($nvR_D$) model (Amundadottir 2016; Amundadottir et al. 2017b). The mel. EDI is one of the metrics of the α-opic framework that has been formalized as an SI-compliant system of metrology (CIE 2018). It considers the spectral sensitivity of the ipRGCs in its equation and therefore indicates how different light spectra can impact IIL responses. It takes the spectral irradiance at some point in time as an input, and outputs a value in lux. A group of experts has recently recommended a mel. EDI of at least 250 lux during daytime for healthy indoor light exposure (Brown et al. 2022). For all the time steps in the two datasets used in this study, the mel. EDI was computed as in Equation 1, based on the measured and simulated spectral irradiance.
$$E_{v,mel}^{D65} = \frac{\int E_{e,\lambda}(\lambda)\, s_{mel}(\lambda)\, \mathrm{d}\lambda}{0.0013262} \qquad (1)$$
where $E_{e,\lambda}(\lambda)$ is the spectral irradiance in W/m²/nm; $s_{mel}(\lambda)$ is the melanopic action spectrum; 0.0013262 is the melanopic efficacy of luminous radiation for daylight (D65) in W/lm.
The $nvR_D$ model, on the other hand, is one of the two models that showed a reasonable positive correlation with measured values of alertness in a semi-controlled field study in realistic daytime conditions (Soto Magán 2021). The $nvR_D$ model requires a time series of spectral irradiance as input (for a period of maximum 24 h), and outputs a time series of relative non-visual direct responses ($r_D$), which can be integrated (over time) to generate a cumulative response ($R_D$). The $r_D$ values, expressed on a scale from 0 to 1, represent the direct driving force of the light exposure on the non-visual system, and are used for the comparison between system outputs and simulation outputs in this study. The $r_D$ was automatically computed via a MATLAB (v. 9.8.0) code written by the author of the model (Amundadottir 2016) for all the time steps (one day at a time) in the dataset under daylighting conditions, and for all the time steps over one hour in the dataset under electric lighting conditions.
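As an illustration of the mel. EDI computation referred to as Equation 1, the sketch below integrates a melanopically weighted spectral irradiance with the trapezoidal rule and divides by the D65 constant quoted in the text. It is a minimal sketch under stated assumptions: the function names are hypothetical, the integration scheme is a choice made here, and the spectrum in the usage example is a toy placeholder rather than the tabulated CIE melanopic action spectrum, which would have to be supplied by the user.

```python
K_MEL_D65 = 0.0013262  # W/lm, melanopic efficacy of luminous radiation for D65 (from the text)


def trapezoid(x, y):
    """Integrate y over x with the trapezoidal rule."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0 for i in range(len(x) - 1))


def melanopic_edi(wavelengths_nm, irradiance, s_mel):
    """Return the melanopic EDI in lux.

    wavelengths_nm: sampling wavelengths in nm
    irradiance:     spectral irradiance in W/m2/nm at those wavelengths
    s_mel:          melanopic action spectrum sampled at the same wavelengths
    """
    if not (len(wavelengths_nm) == len(irradiance) == len(s_mel)):
        raise ValueError("all inputs must have the same length")
    weighted = [e * s for e, s in zip(irradiance, s_mel)]
    return trapezoid(wavelengths_nm, weighted) / K_MEL_D65


if __name__ == "__main__":
    # Toy input only: a flat spectrum and a flat placeholder weighting.
    wl = [480.0, 490.0, 500.0]
    edi = melanopic_edi(wl, [0.001, 0.001, 0.001], [1.0, 1.0, 1.0])
    print(f"{edi:.1f} lux")  # 0.02 W/m2 divided by 0.0013262 W/lm, about 15.1 lux
```

The same function can be applied to measured and simulated spectra alike, provided both are resampled onto the wavelength grid of the action spectrum first.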

Analyses
To compare the indicators associated with IIL responses derived from measured and simulated spectral irradiance, it is recommended to apply two different types of analysis: visual analyses of graphs for a subjective comparison and hypothesis tests for a more objective one (Sargent 2013). In this study, the comparison is therefore made first through a visual analysis of scatterplots of indicators associated with IIL responses derived from measured versus simulated data for both ALFA and Lark. These scatterplots help detect tendencies of under- or overestimation by the spectral simulations. Secondly, boxplots of the relative bias error (RBE) between the mel. EDI derived from measured and simulated data and of the bias error (BE) between the $r_D$ derived from measured and simulated data are also visually analyzed. The RBE and BE are calculated using Equations 2 and 3. Via these boxplots, the bias and distribution of the simulation errors for both tools can be analyzed.
$$RBE = \frac{E_{v,mel,sim}^{D65} - E_{v,mel,meas}^{D65}}{E_{v,mel,meas}^{D65}} \times 100\% \qquad (2)$$
$$BE = r_{D,sim} - r_{D,meas} \qquad (3)$$
where $E_{v,mel,sim}^{D65}$ is the mel. EDI derived from simulated data, in lux; $E_{v,mel,meas}^{D65}$ is the mel. EDI derived from measured data, in lux; $r_{D,sim}$ is the relative non-visual direct response derived from simulated data; $r_{D,meas}$ is the relative non-visual direct response derived from measured data.
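The two error measures named above can be sketched as follows. The sign convention (simulated minus measured, so that a positive value means overestimation) is an assumption chosen to match the way positive bias is discussed in the results; the function names are hypothetical, as is the helper that reports the share of datapoints at or above a given RBE.

```python
def relative_bias_error(sim: float, meas: float) -> float:
    """Relative bias error in percent (assumed convention: simulated minus measured)."""
    if meas == 0:
        raise ValueError("RBE is undefined for a null measured value")
    return 100.0 * (sim - meas) / meas


def bias_error(sim: float, meas: float) -> float:
    """Bias error for r_D, which is already expressed on a 0 to 1 scale."""
    return sim - meas


def share_at_or_above(errors, threshold: float) -> float:
    """Fraction of datapoints whose error is at least `threshold`."""
    return sum(1 for e in errors if e >= threshold) / len(errors)


if __name__ == "__main__":
    print(relative_bias_error(300.0, 250.0))  # 20.0, i.e. a 20% overestimation
    print(bias_error(0.5, 0.75))              # -0.25, i.e. an underestimation
    print(share_at_or_above([60.0, 10.0, -5.0, 50.0], 50.0))  # 0.5
```

A relative error suits the mel. EDI, which spans several orders of magnitude, whereas an absolute error is sufficient for the bounded $r_D$.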
Finally, hypothesis tests are applied to objectively compare the means of the indicators associated with IIL responses derived from measured and simulated spectral irradiance. Since there is more than one comparison (i.e. two spectral simulation tools), an analysis of variance (ANOVA) is recommended (Roungas, Meijer, and Verbraeck 2018). First, the assumptions (homogeneity of variance and normality of distributions) to apply an ANOVA are checked. Then, a normal ANOVA (Chambers, Freeny, and Heiberger 1992) or a robust ANOVA (Mair and Wilcox 2020) is applied depending on whether the assumptions are met (Field, Miles, and Field 2012).
It should be noted that when the direct simulation output-i.e. the spectral irradiance-is analyzed, no RBE or BE can be computed. The spectral irradiance consists of a large number of values, and not just one like the mel. EDI or the r D , making it impossible to apply Equations 2 or 3. In such cases, we calculated instead the Normalized Root Mean Squared Error (NRMSE) between the simulated and measured spectral irradiance using Equation 4.
where $SI_{sim,i}$ is the simulated spectral irradiance at the wavelength $i$ in W/m²; $SI_{meas,i}$ is the measured spectral irradiance at the wavelength $i$ in W/m². All the graphs and analyses were done in R (v. 3.6.1) with the packages data.table (v. 1.14.2), ggplot2 (v. 3.3.5), readxl (v. 1.3.1), stringr (v. 1.4.0), and car (v. 3.0-11), and were applied separately to the two datasets. A separate analysis of the two datasets seems relevant as the data present different characteristics in terms of distribution and sample size, and due to the pertinence of evaluating the accuracy of the tools separately for daylight and electric light.
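For readers who want to reproduce an NRMSE on their own spectra, a minimal Python sketch is given below. The normalization is an assumption: the root mean squared error is divided here by the mean of the measured spectral irradiance, which is one common convention and not necessarily the one used in this study. The function name is hypothetical.

```python
import math


def nrmse(sim, meas) -> float:
    """Normalized RMSE between simulated and measured spectral irradiance.

    Assumption: the RMSE is normalized by the mean of the measured values.
    Other conventions (range, maximum) exist and give different numbers.
    """
    if len(sim) != len(meas) or not meas:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(meas)
    rmse = math.sqrt(sum((s - m) ** 2 for s, m in zip(sim, meas)) / n)
    mean_meas = sum(meas) / n
    if mean_meas == 0:
        raise ValueError("NRMSE is undefined for a null measured spectrum")
    return rmse / mean_meas


if __name__ == "__main__":
    print(nrmse([2.0, 2.0], [1.0, 3.0]))  # RMSE of 1.0 over a mean of 2.0 gives 0.5
```

Both spectra need to be expressed on the same wavebands before the comparison, for example by averaging the measured data over the bands output by each tool.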

Spectral irradiance
Although not a direct indicator associated with IIL responses, the spectral irradiance measured and simulated in ALFA and Lark is compared in Figure 5, in order to link the findings of the present study to those from our previous studies (Pierson, Aarts, and Andersen 2021a, 2021b). The median and the 25th and 75th percentiles of the NRMSE between the simulated and measured spectral irradiance were plotted against the mean simulation time for one run (Figure 5).
Despite being around three times more time-consuming than ALFA to run when the parameters are set as in Table 1 (and not accounting for the time required to set up the simulation, which is much more substantial in Lark), Lark provides the more accurate results in spectral irradiance under daylighting conditions. Conversely, ALFA appears to provide more accurate results than Lark under electric lighting conditions. Also, the range of NRMSE is much larger for the electric lighting scenes than for the daylighting scenes. As observed in our previous study (Pierson et al. 2021b), the more continuous the light source SPD is (i.e. the fewer peaks it contains), the better it can be spectrally simulated with ALFA and Lark.

Melanopic equivalent daylight illuminance
The mel. EDI derived from the simulated data was plotted against the mel. EDI derived from the measured data in Figure 6 for both daylighting and electric lighting scenes. In these scatterplots, it can be observed that the mel. EDI derived from simulated data generally aligns with the mel. EDI derived from measured data, despite a tendency towards overestimation for ALFA simulations under daylighting conditions. Indeed, more than 12% of the mel. EDI values derived from ALFA-simulated data under daylighting conditions are overestimated by 50% or more, compared with less than 3% of the mel. EDI values derived from Lark-simulated data.
In Figure 7, the boxplots of the RBE in terms of mel. EDI show that, under daylighting conditions, there is a larger distribution for ALFA than for Lark with a slight positive bias for ALFA and a slight negative one for Lark. Under electric lighting conditions, ALFA shows a larger distribution and a larger negative bias than Lark.
It should be noted, however, that the distribution of RBE in mel. EDI for ALFA is not the same under different sky types (Figure 8). Indeed, it is mostly the mel. EDI derived for the clear sky type that suffers from a large positive bias. Additional results based on the same analyses using photopic illuminance are similar to those presented here. Since the error pattern is similar for melanopic and photopic quantities, even though these quantities are based on two spectral sensitivity functions with different peak sensitivities, it can be hypothesized that ALFA tends to overestimate the irradiance of clear skies, although the simulated spectral characteristics might be appropriate.
This hypothesis can easily be tested by analyzing the ALFA-simulated spectral irradiance under clear sky conditions against the measured one. Figure 9 displays one example of such a comparison, which is representative of other cases. While the D65 spectral characteristics used for Lark simulations do not match well with those measured, the ALFA-simulated spectral characteristics appear similar to those measured, albeit with an overestimation of the ALFA-simulated irradiance, which is relatively evenly distributed over all wavelengths.
Regarding the hypothesis tests, since the assumptions of homogeneity of variance (checked through a Levene's test) and of normal distribution (checked through a Shapiro-Wilk test) were not met, a robust ANOVA was applied for both the daylight and electric light datasets (Mair and Wilcox 2020). The hypothesis test reveals that there is a statistically significant difference in mel. EDI under daylighting conditions, but with a very small effect size. A robust post-hoc analysis, for which a critical p-value of 0.017 is used to control for family-wise error, shows that the mel. EDI values derived from ALFA-simulated data are significantly greater than those derived from Lark-simulated data. However, there is no statistically significant difference between mel. EDI derived from measured data and those derived from the simulations.
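The critical p-value of 0.017 quoted above is consistent with a Bonferroni-type correction over the three pairwise comparisons between measured, ALFA-simulated, and Lark-simulated data. The sketch below works through that arithmetic; both the Bonferroni reading and the nominal significance level of 0.05 are assumptions, since the text only states the resulting threshold, and the function name is hypothetical.

```python
from itertools import combinations


def bonferroni_threshold(groups, alpha: float = 0.05) -> float:
    """Per-comparison critical p-value for all pairwise comparisons.

    Assumption: a Bonferroni correction with a nominal alpha of 0.05.
    """
    n_pairs = len(list(combinations(groups, 2)))
    return alpha / n_pairs


if __name__ == "__main__":
    groups = ["measured", "ALFA", "Lark"]
    print(list(combinations(groups, 2)))  # the three pairwise comparisons
    print(round(bonferroni_threshold(groups), 3))  # 0.05 / 3 rounds to 0.017
```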
Additionally, in view of Figure 8, a robust ANOVA was applied for each sky type separately. The tests show that there is a significant difference between mel. EDI derived from measured and simulated data under clear and hazy sky conditions, with a medium effect size for the clear sky conditions and a very small one for the hazy sky conditions. Robust post-hoc analyses reveal that ALFA leads to overestimated mel. EDI values under clear skies. On the contrary, there is no statistically significant difference and a very small effect size between mel. EDI derived from measured and simulated data under hazy, overcast, and rainy sky conditions. These results agree with those presented in Figure 8.
Regarding the electric light dataset, the robust ANOVA indicates no significant difference in mel. EDI and a very small effect size when comparing mel. EDI derived from measured and simulated data. The results for all these hypothesis tests are reported in Table 3.

Relative non-visual direct response
The r D derived from the simulated data was plotted against the r D derived from the measured data in Figure 10 for both daylighting and electric lighting scenes. In these scatterplots, it can be observed that, contrary to mel. EDI analyses, there is a tendency for r D derived from ALFA-simulated data to underestimate r D derived from measured data. The other r D derived from simulated data tend to follow those derived from measured data, except for some outliers.
For the daylighting scenes in Figure 10, the outliers having a null r D derived from measured data are due to the spectrometers in the Eindhoven test room that stopped recording when the spectral irradiance became too low. Instead of being recorded as a very small value, which could have resulted in a notable r D , the measured spectral irradiance and the r D derived from it were considered as null. A similar explanation applies to the outliers having a null r D derived from simulated data, as the pyranometer in Eindhoven stopped recording when the sky irradiance became too low. As for the line of outliers for ALFA-simulated data under electric lighting, it corresponds to the 1-hour time series of datapoints under the warm LED scenario at Desk 1. This was indeed the scene with the largest negative bias in spectral irradiance (Pierson et al. 2021b), hence having an underestimated mel. EDI. As explained later in the discussion, an underestimated mel. EDI will lead to a larger error in r D than an overestimated mel. EDI.
In Figure 11, the boxplots of the BE in terms of r D show that, under daylighting conditions, the distribution is wider for ALFA than for Lark. There is no notable bias in either condition. When looking at the distribution of BE in terms of r D for the different sky types (Figure 12), it can be observed that the distributions for ALFA simulations vary between the four sky types. Indeed, it is mostly the r D derived for the overcast and rainy sky types that suffer from a large distribution range, a tendency that is similar for Lark simulations, albeit to a smaller extent.
Also, since the assumptions of homogeneity of variance (checked through a Levene's test) and of normal distribution (checked through a Shapiro-Wilk test) were not met, a robust ANOVA was applied for both the daylight and electric light datasets (Mair and Wilcox 2020). There is a significant difference in r D under daylighting conditions, although the effect size is very small. A robust post-hoc analysis, for which a critical p-value of 0.017 is used to control for family-wise error, reveals that the r D derived from ALFA-simulated data are significantly smaller than those derived from measured data. However, there is no significant difference between the r D derived from Lark-simulated data and those derived from measured data.
Table 4. Results of the hypothesis tests on the difference between simulated and measured r D .
Additionally, a robust ANOVA was applied for each sky type separately. There is a significant difference between r D derived from measured and simulated data under clear, hazy, and overcast sky conditions, all of which have a small effect size. There is no significant difference under rainy sky conditions. Although these results seem counterintuitive when compared to the distribution of BE per sky type in Figure 12, robust post-hoc analyses reveal that the significant difference is typically between r D derived from measured and ALFA-simulated data, and that the difference is only 0.004 under clear sky conditions and −0.02 under hazy sky conditions, while it increases to −0.056 under overcast sky conditions.
Regarding the difference in r D under electric lighting conditions, the robust ANOVA indicates no significant difference and a very small effect size when comparing r D from measured and simulated data. The results for all these hypothesis tests are reported in Table 4.

Discussion
Under daylighting conditions, Lark proved to be a reliable tool to simulate spectral irradiance and derive indicators associated with IIL responses such as mel. EDI and r D . The indicators derived from Lark-simulated data demonstrate a reasonable accuracy, i.e. a relative error typically within the ±20% range for mel. EDI, which is consistent throughout the different sky conditions. One disadvantage of Lark is the need to input a user-defined SPD for the spectral sky model. The results of this study, in which a constant D65 SPD was used, and previous ones (Pierson, Aarts, and Andersen 2021a) demonstrate that, in those conditions, the sky irradiance has a much greater impact on the simulation accuracy than the sky spectrum. Nonetheless, the sky SPD (and related CCT) can vary largely throughout the day and the year (Inanici, Abboushi, and Safranek 2022). A spectrally accurate model of the sky, when available, is therefore expected to reduce the ±20% range of error in mel. EDI of Lark.
ALFA presents a more mixed picture under daylighting conditions. On one hand, ALFA appears to overestimate the irradiance of the sky for clear skies, leading to overestimated mel. EDI in these conditions. On the other hand, the r D derived from ALFA-simulated data generate larger errors for overcast and rainy skies than for clear and hazy skies. These results might be surprising as they seem to contradict each other. However, this apparent contradiction can be explained: the nvR D model contains, among other functions, an intensity-response curve that returns values on a zero to one scale. This intensity-response curve is defined from a 4-parameter logistic function based on previous studies (Cajochen et al. 2000; Zeitzer et al. 2000). In the model, a feedforward term has been added to the intensity-response function to account for the adaptation of the non-visual system to prior light intensities (Amundadottir 2016; see Note 2). Therefore, if we look at the intensity-response functions that are applied to the daylight dataset in this study (i.e. accounting for the adaptation to prior light intensities), we can observe that the range of mel. EDI for which it is critical to simulate the spectral irradiance as accurately as possible seems to be between 150 and 1500 lux (Figure 13). A slight difference in simulated spectral irradiance over this range (meaning a slight difference in mel. EDI derived from simulated spectral irradiance) would cause a significant difference in r D .
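The sensitivity argument above can be illustrated with a generic four-parameter logistic curve. The sketch below is only an illustration: the half-maximum constant (500 lx) and the slope (1.5) are hypothetical values chosen to place the steep part of the curve inside the 150 to 1500 lx range, not the parameters of the nvR D model, and the function names are introduced here for convenience.

```python
def intensity_response(mel_edi, half_max=500.0, slope=1.5, lower=0.0, upper=1.0):
    """Generic four-parameter logistic response on a lower-to-upper scale.

    half_max and slope are hypothetical illustration values,
    not the parameters used in the nvR_D model.
    """
    if mel_edi <= 0:
        return lower
    return upper + (lower - upper) / (1.0 + (mel_edi / half_max) ** slope)


def response_shift(mel_edi, relative_error, **params):
    """Change in response when mel. EDI is off by a given relative error."""
    biased = mel_edi * (1.0 + relative_error)
    return intensity_response(biased, **params) - intensity_response(mel_edi, **params)


# Effect of a -20% and a +20% error in mel. EDI at several light levels.
for level in (50, 500, 5000, 50000):
    under = response_shift(level, -0.2)
    over = response_shift(level, 0.2)
    print(level, round(under, 4), round(over, 4))
```

With these illustrative parameters, the same relative error in mel. EDI shifts the response far more near the half-maximum constant than on the upper plateau of the curve, which is the behaviour described for the critical range.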
If we compare this range to the distribution of mel. EDI by sky type for the daylight dataset (Figure 14), it appears that, because most of the mel. EDI under overcast sky conditions lie in this range, there is a greater risk of making considerable errors in r D . Since ALFA tends to lead to underestimated mel. EDI under overcast sky conditions (Figure 8), there is a considerable negative bias in r D derived from the ALFA-simulated data.
From Figure 14, it could also be argued that a large number of mel. EDI datapoints lie in that range for the hazy sky type, whereas the errors in r D seem more negligible for that sky type. However, we should keep in mind that, on one hand, ALFA tends to lead to overestimated mel. EDI under hazy sky conditions (i.e. towards the upper flattening part of the intensity-response function). On the other hand, the implemented function for the intensity-response curve to adapt to prior light intensities relies on a logarithmic term (i.e. if we have a half-maximum constant at 53 lx for a prior light intensity of 1 lx, we will have a half-maximum constant at 106 lx for a prior light intensity of 10 lx; a half-maximum constant at 212 lx for a prior light intensity of 100 lx; and a half-maximum constant at 424 lx for a prior light intensity of 1000 lx (Amundadottir 2016)), while the sky intensities vary rather linearly over one day. Therefore, the difference between the light intensity needed to reach the half-maximum constant of the relative response and the ambient light intensity will typically be negative under hazy sky conditions, while it will typically be positive under overcast sky conditions. This means that the ambient light intensity under hazy sky conditions will typically be located again towards the upper flattening part of the intensity-response function.
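The example values quoted above (53, 106, 212, and 424 lx for prior light intensities of 1, 10, 100, and 1000 lx) correspond to a half-maximum constant that doubles for each tenfold increase in prior light intensity. The sketch below reproduces only those four values; the function name and the closed form are assumptions made for illustration, and the actual adaptation term in the nvR D model may be formulated differently.

```python
import math


def half_max_constant(prior_lx, base_lx=53.0):
    """Half-maximum constant (lx) that doubles per tenfold increase in prior light.

    Hypothetical closed form fitted to the four example values quoted in the text;
    it is not taken from the nvR_D model itself.
    """
    return base_lx * 2.0 ** math.log10(prior_lx)


for prior in (1, 10, 100, 1000):
    print(prior, round(half_max_constant(prior), 1))
```

Because the constant grows with the logarithm of the prior light intensity while the daylight level itself can change by orders of magnitude, the ambient light level can move well above or below the half-maximum constant over a day, as argued above.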
Since the light intensities under clear sky conditions are mainly outside of the critical range (Figure 14), errors in r D are minimal (Figure 12). Although the ALFA-simulated spectral characteristics of the sky under clear sky conditions were found to be more appropriate than the Lark-simulated ones in a previous study (Balakrishnan and Jakubiec 2019), the impact of the light irradiance appears to be greater than that of the spectral characteristics when computing indicators associated with IIL responses. Therefore, the version of Lark tested in this study is more relevant than the tested version of ALFA to derive such indicators for daylighting scenes.
These results suggest that there is a range of light intensities within which high simulation accuracy is required for spectral simulation tools to be used to optimize the built environment in relation to IIL responses. The simulation accuracy for light intensities outside of this range would not matter much, as demonstrated by the results under clear sky conditions. This finding adds nuance to the argument that 'great precision is not needed when designing lighting for melatonin suppression or to elicit an acute alerting effect because the physiological responses are modulated by large relative changes, not small fractional ones' (Houser 2021). Nonetheless, it is important to remember that the intensity-response function implemented in the nvR D model was developed from experimental studies conducted at night under electric lighting conditions (Cajochen et al. 2000; Zeitzer et al. 2000). Moreover, the logarithmic term used to represent the adaptation of the intensity-response function to prior light intensities was an assumption made during the development of the nvR D model and is not based on experimental results (Amundadottir 2016). Therefore, it is very probable that our non-visual system presents a different intensity-response mechanism during daytime and under daylighting conditions, which would then impact the range of critical light intensities for spectral simulations, if such a critical range exists at all.
For electric lighting scenes, ALFA was more accurate than Lark in simulating spectral irradiance (Pierson et al. 2021b). This advantage of ALFA over Lark vanishes when the spectral irradiance is integrated to derive an indicator associated with IIL responses. Lark seems to do a better job than ALFA in simulating the right amount of energy in the short-wavelength channels (despite its low spectral resolution) as it leads to a smaller negative bias in mel. EDI. It could be argued that this result highly depends on the applied LLF. However, some mel. EDI derived from ALFA simulations are also overestimated, as shown in Figure 6. Overall, there is a wider distribution of RBE in mel. EDI from ALFA than from Lark, but the difference in mel. EDI is never statistically significant. While these results are difficult to explain since both simulation tools use Radiance in the background and have the same inputs, they are based on a limited number of datapoints and should be interpreted carefully.
Most errors in mel. EDI derived from Lark-simulated data under electric light are within the ±20% range, which is similar to the range of errors for Lark under daylight. Regarding this ±20% range of errors, it should be noted that the range of errors actually due to the Lark simulation process is most probably smaller. Indeed, Abboushi et al. demonstrated that calculating lighting quantities such as α-opic values from a 9-band resolution spectrum compared to an 81-band resolution one led to a mean absolute percent error of 4% (Abboushi, Safranek, and Davis 2021). As this 4% error is included in the ±20% range of errors calculated in this study, the actual range of errors due to Lark simulations would be smaller.
Even though the simulation error is expressed in indicators associated with IIL responses in this study (i.e. a ±20% range of errors in mel. EDI and r D values derived from Lark-simulated data), it remains difficult to interpret such a range of errors. Would a 20% difference in mel. EDI have a significant effect on occupants' IIL responses, hence making the comparison of different building/lighting design options through such spectral simulation tools irrelevant? To answer this question, we found only two studies that have investigated, under real daytime conditions (i.e. semi-controlled, or quasi-experimental, field studies), what difference in light exposure is related to a significant difference in the actual IIL responses of occupants, be it in terms of subjective or objective alertness, melatonin level, etc.
In the first one, the difference in participants' non-visual direct responses to a high and a low electric light scenario in a real office environment with daylight access was analyzed (Peeters et al. 2021). No significant difference in alertness or vitality was found in the study, but the authors reported a low statistical power and accuracy issues for the light logger devices, which prevented the verification of participants' exposure to significantly different light levels (Peeters et al. 2021). In the second study, the difference in participants' non-visual direct responses to different daylight scenarios (having varied light spectra and intensities controlled through window filters) in a real workspace was analyzed (Soto Magán 2021). The light exposure was defined in terms of cumulative response, R D , which represents the capacity of the light exposure over time to influence occupants' non-visual direct responses. By comparing the results between two daylight scenarios (a neutral and a blue filtered daylight), it was observed that a difference of around 0.14 in R D after six hours (between 9am and 3pm) of indoor daylight exposure was related to statistically significant differences in self-reports of alertness and in objective reaction times. Since this R D value appears to be the most relevant one to define a threshold of accuracy for the validation of the spectral simulation tools, the R D was derived from the measured and simulated data in this study for each day available in the daylight dataset from 9am until 3pm. If the difference between the R D derived from simulated data and that derived from measured data was larger than 0.14, the simulation error was considered unacceptable, as it would create a significant difference in building occupants' IIL responses.
However, if the difference was smaller than 0.14, we could not conclude that it would have no significant effect on building occupants' IIL responses, as we do not know the minimal threshold at which a significant difference is observed.
An example of R D evaluated over a six-hour period is available for one day of the daylight dataset and two desk positions in Figure 15. Desk 1 is the position facing the window, hence with high mel. EDI, whereas Desk 3 is the position facing the wall, hence with low mel. EDI. As discussed before, it can be observed that, when the mel. EDI is not in the critical range (i.e. for Desk 1), accurately simulating the spectral irradiance matters little, as it will not impact r D , which is used to derive R D . Conversely, when light intensities are in the critical range (i.e. for Desk 3), small errors in simulated spectral irradiance can lead to notable errors in r D , which then accumulate to a difference in R D larger than 0.14 in this case.
Over the 29 days and 3 desk positions included in the daylight dataset (summing up to a total of 87 R D ), 8 R D (9.2%) derived from Lark-simulated data presented a difference larger than 0.14 with the R D derived from measured data, against 23 R D (26.4%) derived from ALFA-simulated data. This means that, under similar conditions, at least 9% and 26% of the simulated daily light exposures in Lark and ALFA respectively would lead to wrongly predicted building occupants' non-visual direct responses. These findings need, however, to be interpreted carefully, as these numbers will highly depend on the range of light intensities in the simulated scenes and on the implemented intensity-response mechanism, as discussed above.
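The shares reported above follow directly from the counts. The sketch below reproduces that arithmetic and shows how a single simulated exposure could be checked against the 0.14 threshold. The function names are hypothetical, and comparing the absolute difference in R D is an assumption, since the text does not state whether the sign of the difference is considered.

```python
# Threshold in R_D related to significant differences in responses (Soto Magán 2021).
THRESHOLD = 0.14


def exceeds_threshold(rd_simulated, rd_measured, threshold=THRESHOLD):
    """True if the simulated R_D deviates from the measured one by more than the threshold.

    Assumption: the absolute difference is compared against the threshold.
    """
    return abs(rd_simulated - rd_measured) > threshold


def share_exceeding(n_exceeding, n_total):
    """Percentage of six-hour exposures whose R_D error exceeds the threshold."""
    return 100.0 * n_exceeding / n_total


n_total = 29 * 3  # 29 days x 3 desk positions
print(n_total)                                 # 87
print(round(share_exceeding(8, n_total), 1))   # Lark: 9.2
print(round(share_exceeding(23, n_total), 1))  # ALFA: 26.4
```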

Limitations
This study presents some limitations, which should be considered when interpreting its results. The main limitation lies in the range of visual scenes used in the study; these scenes typically contain only neutral colours without strong specular reflections. Although they might be representative of typical offices, these scenes are fairly simple to simulate spectrally and did not allow us to test the two tools under more extreme conditions. Additionally, only one light source type is considered for each different visual scene. Besides being less realistic (i.e. few offices today operate under daylight or electric light only), this setup did not allow us to test how the tools perform when different light source types interact in a scene. In future studies, it would be interesting to take these limitations into consideration and validate the tools under more extreme (with coloured materials or specular reflections) and more realistic (with a mix of daylight and electric light) scenarios.
Another limitation lies in the fact that the system outputs, i.e. the indicators associated with IIL responses, are derived from measurements. In the literature, it is argued that measurements are only estimates of the actual truth as there always is a measurement error involved (Houser 2021). In this study, it was assumed that, since all measuring equipment had been previously calibrated according to the manufacturer recommendations, these measurements could be used as representatives of the truth.
Additionally, it should be noted that other indicators associated with IIL responses exist, with some offering a more complete model (i.e. accounting for additional aspects of the light exposure in their equations) of our non-visual system. This is the case for the arousal dynamics model that considers the timing of the light exposure on top of the light spectrum, intensity, and temporal characteristics (Tekieh et al. 2020). However, we chose to base the comparison on only two indicators, the mel. EDI and the r D , as there was either an international approval or an existing study in realistic daytime conditions endorsing the validity of these two indicators.
Finally, the number of electric lighting scenes in this study was rather limited, which reduces the statistical power of the analyses. To confirm these results, the validation of these spectral simulation tools should be repeated on a larger sample of electric lighting scenes.

Conclusion
In this study, we aimed at validating two spectral simulation tools, ALFA and Lark, for the study of building occupants' IIL responses. The validation methodology consisted in comparing indicators associated with IIL responses derived from simulated spectral irradiance to indicators derived from measured spectral irradiance. To this end, spectral irradiance was collected at three desk positions over several days in two test rooms facing south and west to create a dataset of indoor daylighting scenes, as well as over one hour under different luminaires in one test room to create a dataset of indoor electric lighting scenes. In total, the daylight dataset contains around 9000 measured, ALFA-, and Lark-simulated spectral irradiance datapoints, whereas the electric light dataset contains 99 measured, ALFA-, and Lark-simulated spectral irradiance datapoints. The melanopic equivalent daylight illuminance (mel. EDI) and the relative non-visual direct response (r D ) were derived from these data and compared through visual analyses of scatterplots and boxplots and hypothesis testing.
The results show that Lark, which is approximately three times slower than ALFA, leads to more accurate indicators associated with IIL responses under daylight and slightly more accurate indicators under electric light (i.e. errors in mel. EDI within the ±20% range). More specifically, there were no statistically significant and practically relevant differences between the point-in-time indicators associated with IIL responses derived from measured and Lark-simulated data. However, when accounting for time dynamics of daylight exposure and based on a threshold defined in a former semi-controlled field study (Soto Magán 2021), it was shown that at least 9% of the daylight exposures simulated for a 6-hour period in Lark lead to a significant error in the indicator associated with IIL responses, R D . When using Lark to optimize design decisions related to occupants' IIL responses, it is recommended to take this last result into consideration.
ALFA led to more mixed results. Despite the fact that the relative SPD of the simulated sky under clear sky conditions in ALFA might be more accurate than that in Lark (Balakrishnan and Jakubiec 2019), ALFA tends to overestimate the irradiance of the sky under clear sky conditions. This leads to overestimated mel. EDI under these conditions. Additionally, it was observed that r D is more sensitive to underestimation errors than overestimation ones, especially for light intensities in the critical range of the intensity-response curve. Since ALFA led to a small but still notable negative bias in mel. EDI under overcast skies, the r D derived from ALFA-simulated data in these conditions suffered from a moderate to large error. In addition, when accounting for time dynamics of daylight exposure and comparing R D derived from ALFA-simulated data over a 6-hour period against that from measured data, it was shown that at least 26% of the simulated daylight exposures led to a significant error in R D .
In view of these results, it can be concluded that improvements still need to be made on these two spectral simulation tools before they can be used as reliable and efficient decision-making tools for the optimization of the built environment with respect to the occupants' IIL responses. Moreover, until we have a better understanding of our non-visual system, and especially its operation under realistic daytime conditions, we will not be able to define with a reasonable level of confidence what difference in light exposure matters for our IIL responses, nor what level of accuracy should be aimed for in these spectral simulation tools.

Notes
1. When mentioned alone in the text, illuminance refers to photopic illuminance.
2. The term 'light intensity' comes from the intensity-response curve (Cajochen et al. 2000; Zeitzer et al. 2000), and was originally related to photopic illuminance. As more relevant quantities related to IIL responses have been developed (i.e., ipRGC effective irradiance or mel. EDI), the intensity-response curve has been adapted to these new quantities. The term 'light intensity' is used to encompass these variations.