Using correlations between observed equivalent black carbon and aerosol size distribution to derive size resolved BC mass concentration: a method applied on long-term observations performed at Zeppelin station, Ny-Ålesund, Svalbard

Abstract
The aim of this study was to explore particle size dependent properties by combining long-term observations of equivalent black carbon (eBC) and number size distributions and investigating their correlation as a function of particle size. The work was conducted in two main parts. The first part consisted of a short laboratory experiment comparing the observed total particle light absorption (σabs) with that observed as a function of particle size, using a combination of a Differential Mobility Analyzer (DMA) and a Particle Soot Absorption Photometer (PSAP). The laboratory study confirmed strong similarities between the observed and derived σabs. In the second part, the statistical approach, which uses the correlation between σabs and the dN of each bin of the number size distribution, was tested on long-term data from 2002 to 2010 observed at Zeppelin station, Ny-Ålesund, Svalbard. The data were clustered according to the number size distribution and grouped into four major categories: Washout, Nucleation, Intermediate and Polluted. Each category presented different features with respect to the derived eBC mass distributions; the Intermediate category showed similarities to the few available Single Particle Soot Photometer (SP2) observations in the Arctic. Overall, the statistical distribution of eBC, according to particle size, presented a larger dynamical range in the location of the mode(s). To check for consistency, the eBC mass distributions were transformed into number based eBC size distributions and compared to the observed total number size distribution. Whereas the Washout, Nucleation and Intermediate categories presented plausible number distributions, the Polluted category displayed a mode at small sizes (about 50 nm) that was significantly exaggerated.


Introduction
Black carbon (BC) is a notoriously difficult aerosol species to characterize and quantify (e.g. Andreae et al. 2006; Petzold et al. 2013), which is why each property reported about these particles is primarily defined by the measurement technique used. Research surrounding these particles has been conducted with respect to air quality and health related problems since the 1950s (Novakov and Rosen 2013), but BC has also been studied with respect to post nuclear war scenarios of so-called Nuclear winter (e.g. Crutzen and Birks 1982). In this extreme case it is predicted that smoke from extensive fires will block the incoming light from the sun, which will cool the surface of the Earth. However, in the ambient atmosphere the mass fraction of BC is typically small compared to other aerosol species such as sulfates, nitrates, sea salt, dust, or non-black organics (Seinfeld and Pandis 2016). Nevertheless, despite its minute relative contribution, it has been proposed that BC is second only to CO₂ in contributing to global climate change (Hansen and Nazarenko 2004; Johnson et al. 2019; Jones et al. 2018). The reason for this is the many possible feedback mechanisms that can be activated if BC changes the surface albedo of snow and ice. In contrast to the Nuclear winter scenarios, BC in the present atmosphere is expected to have a net warming effect on the Earth's climate system (IPCC: Climate Change 2014). Characterizing the ratio between scattering and absorbing aerosols and its evolution over time from long-term observations provides necessary knowledge for constraining the radiative forcing of aerosols in General Circulation Models (GCMs).
BC is formed during combustion of carbon fuels, and freshly emitted particles are typically chain aggregates of small spheres, called primary spheres. These primary spheres are often in the range 20-50 nm in diameter. In conditions of high BC concentrations, such as in vehicle tailpipes, the aggregates will grow rapidly and reach characteristic sizes of several hundred nanometers in diameter (Kittelson 1998). Because these particles are different from perfect spheres, they are often described by their fractal dimension (e.g. Wang et al. 2017), which characterizes the fluffiness of the particle compared to a compact sphere. Traditionally, the strong light absorption by BC has been used to quantify the amount present in exhaust, in the atmosphere, or in snow and ice (Clarke and Noone 2007; Hansen et al. 1984). The measurement principle is to collect particles on a filter substrate that will be stained by BC particles (and other light absorbing particles). The sample spot is compared to a clean reference surface of the filter, and the blackness of the sample area, or rate of blackening, is converted to equivalent BC (eBC) (Petzold et al. 2013). In recent years, advances in aerosol technology have allowed for single particle analysis of aerosols, and in particular the introduction of the SP2 (Single Particle Soot Photometer) instrument (Schwarz et al. 2006) has brought new insight into the properties of BC. The SP2 uses incandescence to determine the amount of BC in each particle. Since the emitted infrared light is due to the remaining refractory particles, this BC is often referred to as rBC (Petzold et al. 2013). The main advantages of these measurements are that the total BC mass can be distributed as a function of particle size and that the state of mixing with non-refractory material can be determined.
Additionally, characterizing the size-dependent ratio between scattering and absorbing aerosols and its evolution over time from long-term observations provides necessary knowledge for constraining the radiative forcing of aerosols in GCMs.
Based on results comparing long-term data from remote locations and numerical transport models, it is apparent that there is a lack of process understanding concerning the factors that control even the magnitude and seasonal variation of the first order metric, the eBC mass concentration in the Arctic (Korhonen et al. 2008). Hence, more observed metrics that describe BC in several dimensions could be helpful in order to better understand the interactions between BC and the environment. This includes improved BC source characterization and BC interaction with clouds and precipitation.
Currently, size-resolved BC information from state-of-the-art instrumentation does not extend as far back in time and does not have the same spatial coverage as traditional filter-based light absorption measurements. In an attempt to alleviate this lack of multi-dimensional data, this study explores the use of a statistical relationship between aerosol size distributions observed using a DMPS (Differential Mobility Particle Sizer) system and light absorption measured using a custom-built PSAP (Particle Soot Absorption Photometer). In this study, the bulk aerosol absorption measured using the PSAP is distributed over the size distribution based on the correlation between individual size bins and the total absorption signal. This approach is tested in the laboratory and applied to long-term data from the Zeppelin station, Ny-Ålesund, Svalbard, in order to explore this derived additional dimension in the data.

Rationale
The study is divided into two main parts. The first part of the study is a presentation of a laboratory experiment conducted at Stockholm University, which was aimed at comparing observed size resolved eBC data with the same type of data, but inferred from using correlations between observed size distribution and observed total eBC concentrations. This first part will serve as a proof-of-concept of the statistical approach. In the second part of the study, the statistical approach is applied on long-term data to explore the potential of the added value from combining the data and to test the integrity in the results.
In section three, a brief description is given of the instrumentation used in the laboratory experiment and the monitoring program. In section four, the laboratory experiment and results are presented. In section five, the statistical approach is applied to the long-term data and the results are presented. For analyzing the data we used k-means clustering on observed size distributions. A brief description of this methodology is included in section five. A flowchart outlining the methodological steps taken to derive a size resolved BC number distribution is shown in Figure S1. The results from the statistical approach are compared to reported Arctic observations using the SP2 instrument.
Statistical relationships between size distributions and BC have been reported before (Krecl et al. 2017; Olivares et al. 2007); however, it is our opinion that the statistical approach tested in this study adds value to existing long-term data sets. This method of characterizing BC will be advantageous in studies of the BC life cycle and the processes controlling it.

PSAPs and MAAP
To measure eBC at the Zeppelin station and during the laboratory experiment, custom-built PSAPs were used. The PSAP instruments measure the rate of change in light transmission (wavelength of 526 nm) as particles are collected on a filter medium (Krecl et al. 2007). For reference, transmission is also measured over a clean filter area to compensate for any variations in the light source intensity. This is achieved by splitting the light from a single light source into two by using light pipes in combination with two light detectors, one beneath the unloaded area of the filter and one beneath the particle loaded area of the filter. The PSAPs used in this study (one instrument at the Zeppelin station and two during the laboratory experiment) are essentially identical with respect to the sensing part of the instrument. The main difference is that the instrument at the Zeppelin station required a manual change of the filter once the transmission of light reached 50%. The two instruments used in the laboratory study were designed to change filter area automatically once the transmission threshold is reached. One feature of the custom-built PSAP, compared to the commercial versions, is the option to change the sample flow rate as the concentration of light absorbing particles changes in the ambient air. Every hour, the instrument checks whether the ambient concentration has increased or decreased, and the sample flow is adjusted accordingly to maintain a small variation in the signal to noise ratio (Krecl et al. 2007). This allows for a larger dynamical range in the observed concentrations compared to operating with a fixed sample flow rate. This feature was not used when the PSAP was connected to the DMPS system, where the flow rate was 1 L min⁻¹.
The primary absorption signal was corrected for the enhancement in the signal from the filter medium, the loading factor, and the influence from embedded light scattering aerosols following Bond et al. (1999). This correction factor depends on the scattering coefficient of the ambient aerosol. Ideally, an independent instrument such as a Nephelometer (Anderson and Ogren 1998) is used. The Zeppelin PSAP data are corrected for scattering at three levels depending on data availability. When the scattering coefficient from an integrating Nephelometer (TSI 3563) is available, these data are used. If these are not available, a scattering coefficient is calculated based on the observed size distribution and an assumed chemical composition. If neither the Nephelometer data nor the size distribution data are available, a constant single scattering albedo (ω) is assumed. For the Zeppelin data, the correction uses a value of ω = 0.925. For the laboratory studies, only a constant ω was used, and in this case it was chosen to be 0.85. During the laboratory experiment a MAAP (Multi Angle Absorption Photometer) was also used to compare with the PSAP results. The MAAP is a commercial instrument that collects particles on a filter and determines the rate of change in light transmission (wavelength of 637 nm, Müller et al. 2011), which is reported as a mass concentration of eBC (Petzold et al., 2002; Petzold and Schönlinner, 2004). The particular advantage of this instrument is that it internally corrects for the scattering enhancement effects by measuring the scattering from the sample at multiple angles in real time.

DMPS
The DMPS systems used at the Zeppelin station (Ström et al. 2003; Tunved et al. 2013) and the DMPS system used in the laboratory experiment (Salter et al. 2015) share many similarities. Both systems are built around so-called Vienna type DMAs (Differential Mobility Analyzers) with a closed loop system, where the sheath flow is controlled using critical orifices in a returning flow of air. The sample air to sheath air ratio is around 1:5, and this fairly low ratio is a compromise that increases the counting efficiency at the expense of the precision of the particle size measurement. Before entering the DMA, the sample flow passes through a Ni-63 bipolar charger in order to neutralize the charge distribution on the aerosols. The assumed charge probability distribution is used in the inversion routine to correct for the fraction of singly and doubly charged particles (Wiedensohler 1988). Both DMAs are 28 cm long. A TSI (Thermal System Inc., USA) model 3010 CPC (Condensation Particle Counter) was used to count particles at the Zeppelin station and a TSI model 3272 was used in the laboratory experiment. To check the sizing consistency of the laboratory system, two sizes of latex spheres (100 nm and 200 nm diameter) were nebulized in deionized water. The agreement was within 2 percent in size. Particles in the range 20 to 630 nm diameter were used for the Zeppelin data and the range 13 to 406 nm was used for the laboratory study.

Zeppelin data
The data from Zeppelin Observatory are available at the EBAS database (http://ebas.nilu.no/) as hourly averages. We have selected the period between 2002 and 2010 as this represents the start of PSAP measurements and the period is well characterized with respect to number size distribution (NSD) statistics (Tunved et al. 2013). In order to emphasize measurements when the station is out of cloud, only data points when the relative humidity (RH) was below 95% are used. A total of 34432 hourly averages with concurrent PSAP and DMPS data below an RH of 95% are available for the period in question, which corresponds to about 43% data coverage of the entire 9 years.

Feasibility study
The laboratory experiment served three specific purposes. One, to show that the two PSAP instruments used in the experiment operated identically. Two, to show that the two PSAP instruments could readily be compared with an independent instrument, in this case the MAAP. Three, to show that distributing the total PSAP signal over the size distribution, based on correlations between individual size bins and the signal from the PSAP, gives results consistent with direct measurements of individual particle sizes. The data for this study were collected during the winter 2018/2019 at Stockholm University (59.37° N, 18.06° E), approximately 5 km north of the city center of Stockholm and close to a highway.

Instrument intercomparison
The laboratory experiments took place during the winter period 2018/2019 and were divided into two phases. The two experimental setups are illustrated in Fig. 1. In the first phase, the two PSAPs and the MAAP were simply connected to the same ambient air inlet and operated in parallel for approximately one week.
During the second phase, PSAP2 was connected to the aerosol outlet of the DMA of the DMPS system. Figure 2a shows the time series of the light absorption coefficient, σabs (m⁻¹), observed by the three instruments during the first phase of the experiment. The primary mass concentration values from the MAAP are multiplied by the mass absorption coefficient given by the manufacturer for the operating wavelength of 637 nm (MAC_MAAP = 6.6 m² g⁻¹) in order to arrive at an absorption coefficient for the MAAP instrument. The σabs observed by the MAAP is further adjusted to correspond to the same wavelength as the PSAP instruments by using the ratio of the wavelengths, 637/526 (Bergstrom et al., 2002). Figures 2b-d present the corresponding scatter plots between the PSAPs and the MAAP.
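The conversion of the MAAP output described above can be sketched as follows. This is a minimal illustration, assuming that scaling by the ratio 637/526 corresponds to an inverse-wavelength dependence of absorption; the function name and the example concentration are hypothetical and not taken from the study.

```python
MAC_MAAP = 6.6        # m^2 g^-1, manufacturer value at 637 nm (from the text)
LAMBDA_MAAP = 637.0   # nm, MAAP operating wavelength
LAMBDA_PSAP = 526.0   # nm, PSAP operating wavelength

def maap_mass_to_sigma_abs_526(ebc_ug_m3):
    """Convert MAAP eBC mass (ug m^-3) to sigma_abs (m^-1) at 526 nm."""
    ebc_g_m3 = ebc_ug_m3 * 1e-6                      # ug m^-3 -> g m^-3
    sigma_637 = ebc_g_m3 * MAC_MAAP                  # absorption at 637 nm
    return sigma_637 * (LAMBDA_MAAP / LAMBDA_PSAP)   # assumed 1/lambda scaling

# Hypothetical example value: 0.5 ug m^-3 of eBC reported by the MAAP
sigma_526 = maap_mass_to_sigma_abs_526(0.5)
print(f"sigma_abs at 526 nm: {sigma_526:.3e} m^-1")
```

Because absorption increases towards shorter wavelengths under this assumption, the adjusted value at 526 nm is about 21% larger than the value at 637 nm.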
From the short laboratory test, we can establish that all three instruments agree well and that in particular PSAP1 and PSAP2 behave almost identically, with slopes very close to the 1:1 line. If we use the primary output from the MAAP, which is given as eBC mass concentration, we can derive a MAC value for the PSAP that harmonizes the PSAP and MAAP mass concentration observations. Fig. 3 presents the statistical analysis of this MAC value, which centers around 9.4 m² g⁻¹. This is within the range of literature values, for example those of the European survey performed by Zanatta et al. (2016).
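One way to arrive at such a harmonizing MAC value is to take the ratio of the PSAP absorption coefficient to the MAAP mass concentration for each paired hour and summarize the ratios. The sketch below assumes the median as the summary statistic, which the text does not specify; the input values are synthetic and the function name is hypothetical.

```python
from statistics import median

def psap_mac(sigma_abs_psap, ebc_mass_maap_ug_m3):
    """Median MAC (m^2 g^-1) from paired PSAP sigma_abs (m^-1) and MAAP eBC (ug m^-3)."""
    ratios = [
        sigma / (mass * 1e-6)   # m^-1 / (g m^-3) = m^2 g^-1
        for sigma, mass in zip(sigma_abs_psap, ebc_mass_maap_ug_m3)
        if mass > 0             # skip hours without a usable MAAP mass
    ]
    return median(ratios)

# Synthetic paired hourly values, for illustration only
sigma_psap = [4.7e-6, 9.4e-6, 1.9e-6]   # m^-1 at 526 nm
ebc_maap = [0.5, 1.0, 0.2]              # ug m^-3
mac = psap_mac(sigma_psap, ebc_maap)
print(f"MAC for the PSAP: {mac:.2f} m^2 g^-1")
```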

Comparing size resolved σabs with statistical correlation derived σabs
The second phase was performed between 11-01-2019 and 25-02-2019. In Fig. 4 the total σabs observed by PSAP1 is presented as hourly averages (blue line) together with the hourly integrated value for PSAP2 (red dots). An interesting feature in the time series is that from 24-01-2019 23:54 to 25-01-2019 08:50 a smoke plume from a nearby fire (about 5 km distance) reached the inlet and resulted in a strong signal.
The DMPS was configured to scan 15 positions in 60 minutes (4 minutes per voltage setting). Due to the parameter settings used, the smallest voltage (first bin) was always 0 V and the smallest size measured (second bin) was 13 nm. The last bin was near the maximum voltage of the infrastructure and the setting did not provide a stable measurement. Therefore, the largest size reported here is 406 nm. Hence, the thirteen size bins used are: 13, 17, 23, 31, 41, 55, 73, 97, 129, 172, 229, 305, and 406 nm mobility diameter.
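The thirteen diameters listed above appear consistent with bins that are equally spaced in log Dp between 13 and 406 nm. The short check below rests on that assumption, which is not stated in the text; the helper name is hypothetical.

```python
# Bin diameters (nm) as listed in the text
reported_bins_nm = [13, 17, 23, 31, 41, 55, 73, 97, 129, 172, 229, 305, 406]

def log_spaced_bins(d_min, d_max, n_bins):
    """Diameters with a constant ratio between neighbouring bins."""
    ratio = (d_max / d_min) ** (1.0 / (n_bins - 1))
    return [d_min * ratio ** i for i in range(n_bins)]

computed = [round(d) for d in log_spaced_bins(13.0, 406.0, 13)]
print(computed == reported_bins_nm)

# Scan timing from the text: 15 voltage positions at 4 minutes each
scan_minutes = 15 * 4
print(scan_minutes)
```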
Due to the time lag between setting the voltage in the DMA and the detection of particles in the sensors, the initial task was to adjust the time series accordingly. This was readily performed by visually inspecting the time series and harmonizing the oscillation due to the scanning of the DMA voltage with the CPC and PSAP2 signals. In this process, it became apparent that the zero-voltage bin presented a small but persistent negative average signal in σabs; an example is shown in Fig. S2. The exact reason for this effect is not resolved (see Supplement for more discussion). To reduce this small effect, an average absorption coefficient (1.096 × 10⁻⁸ m⁻¹) for all zero-voltage data was added to the PSAP2 signal. A total of 892 scans (hourly time stamps) were available from the second phase.
For each size bin scanned (Dp), the raw counts ΔN from the CPC in the DMPS are used by the inversion software to calculate the aerosol number size distribution (dN/dlogDp). The calculations take into account the geometric dimensions of the DMA, the sample and sheath flows, the transfer function, and single and multiple charge statistics. By multiplying the derived dN/dlogDp by dlogDp we can directly compare the raw ΔN input with the inverted dN output for each size bin. The ratio dN/ΔN replaces the complete inversion routine with a single transformation factor as a function of particle size. Close inspection of this factor shows that it changes only slightly for a given size over the measuring period. The median, mean and quartiles for this factor are presented in Fig. 5.
Hence, by using the factor presented in Fig. 5, it is possible to directly approximate the ambient dN based on the ΔN observed behind the DMA without using the inversion routine. To our knowledge there is no specific inversion routine for absorption coefficients measured behind a DMA using ambient aerosols. We therefore simply assume that the particles responsible for the light absorption in PSAP2 can be scaled using the same transformation factors as presented in Fig. 5. This is based on the assumption that particles in a specific mobility bin can be corrected using the same factor irrespective of how the light absorbing material is distributed among these particles (as cores, as individual particles, etc.). Hence, particles with or without light absorbing material are corrected in the same manner, as long as they have the same mobility size. In theory, this would yield dσabs for each size bin, and integrating over the sizes would give a σabs value similar to that observed by PSAP1. The integrated PSAP2 data are compared with the PSAP1 signal in Fig. 4. From Fig. 4, it is clear that the variation in the two data sets is very similar, but there is a significant difference in magnitude, where the integrated dσabs value is about a factor of 2-3 greater than the total measurement by PSAP1. The offset has a tendency to be smaller in the first half of the experiment and to increase towards the end.
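The scaling and integration described above can be sketched in a few lines. The sketch assumes that the median of dN/ΔN over all scans is used as the transformation factor (the text presents median, mean and quartiles without stating which is applied); all names and numbers are illustrative.

```python
from statistics import median

def transformation_factor(dN_scans, raw_count_scans):
    """Median dN/deltaN per size bin over all scans (one list per scan)."""
    n_bins = len(dN_scans[0])
    factors = []
    for b in range(n_bins):
        ratios = [
            dN[b] / raw[b]
            for dN, raw in zip(dN_scans, raw_count_scans)
            if raw[b] > 0   # a bin without counts carries no information
        ]
        factors.append(median(ratios))
    return factors

def integrated_sigma_abs(sigma_behind_dma, factors):
    """Scale per-bin absorption behind the DMA to ambient and sum over bins."""
    return sum(s * f for s, f in zip(sigma_behind_dma, factors))

# Synthetic example with three size bins and two scans
dN_scans = [[20.0, 30.0, 8.0], [40.0, 60.0, 16.0]]     # inverted dN per bin
raw_scans = [[10.0, 10.0, 4.0], [20.0, 20.0, 8.0]]     # raw deltaN per bin
factors = transformation_factor(dN_scans, raw_scans)

sigma_psap2 = [1e-8, 2e-8, 5e-9]                       # m^-1 per bin behind the DMA
total = integrated_sigma_abs(sigma_psap2, factors)
print(factors, total)
```

The resulting total is the quantity that is compared with the PSAP1 signal in Fig. 4.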
The reason for this offset is not known. For scaling the observed absorption for each size bin to ambient conditions, the transformation factor dN/ΔN (see Fig. 5) is derived, and it follows the correction used in the DMA system. However, if the charge probability distribution for a given mobility size differs between light absorbing particles and other particles, this might introduce a bias in the results. Studies show that fractal particles may differ by some 30% in charge probability, but certainly by less than a factor of two (Lall et al. 2006). The process above essentially calculates an average absorption coefficient per particle as a function of size. This single averaged absorbing particle is multiplied by the ambient dN for the same size, and finally integrated over the size range. Additionally, the factor dN/ΔN takes only doubly charged particles into consideration; particles carrying more charges are not accounted for, which could contribute to the skewed absorption signal of PSAP2 (Cotterell et al. 2020). Even a few large particles could lead to an overestimation of the absorption on the filter. This is underlined by the data collected during and after the fire event (starting 23.01.), during which many small, newly formed particles were measured (see Fig. 4). Another possible source of error is that size dependent optical properties are not considered. In our sequence, the correction for enhancement in absorption due to light scattering particles was performed on the bulk signal and not specifically for each particle size. Based on theoretical Mie calculations, light absorption is dependent on the size of the BC particle and the state of mixing with light scattering material. Hence, if the fraction of internally vs externally mixed particles changes with particle size, this might introduce a difference between the integrated σabs based on PSAP2 behind the DMA and the ambient σabs measured by PSAP1.
Lacking further information, we can only offer the speculations above for the observed offset, together with the additional analysis provided in the Supplement.
The overall hypothesis is that the distribution of the total absorption signal can realistically be attributed to different particle sizes according to the correlation between specific DMPS size bins and the observed total absorption. In Fig. 6, the normalized average distribution observed by PSAP2 downstream of the DMA is compared with the distribution derived from statistical methods using PSAP1 and the number size distributions. The statistical distribution is generated through four steps. One, the Pearson correlation coefficient (r) is calculated for PSAP1 and each size bin of the DMPS data. Two, negative values are set to zero (r ≥ 0). Three, the correlation coefficient is transformed to a scaling factor, f_s, that is related to r² (Hull 1927). The reason for this factor is that the correlation by itself does not serve well as a prognostic probability variable. Hull (1927) investigated the predictability of the correlation coefficient and found that the adjustment in Eq. (1) provided a better predictive skill (see also Supplement). Four, the distribution is normalized to one. As can be seen in Fig. 6, the normalized f_s distribution (based on the Pearson correlation using PSAP1) and the median size resolved observations (using PSAP2) match well with respect to the location of the main mode, but the mode is less pronounced in the distribution based on the correlations. The correlation-based distribution therefore suggests a larger contribution from smaller particles. This may reflect a causal relationship, but it may also arise because the absorption signal per particle in each size bin decreases strongly with decreasing particle size. Hence, the signal to noise ratio for PSAP2 behind the DMA is too small for small particles unless there are very many of them.
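The four steps can be sketched as below. Eq. (1) is not reproduced in this text, so the sketch assumes that the scaling factor takes the form of Hull's index of forecasting efficiency, f_s = 1 − sqrt(1 − r²); this form, the function names and the synthetic data are assumptions made for illustration only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0   # a constant series carries no correlation information
    return sxy / math.sqrt(sxx * syy)

def scaling_factor(r):
    """ASSUMED form of Eq. (1): Hull's forecasting efficiency, 1 - sqrt(1 - r^2)."""
    r = max(0.0, min(1.0, r))   # step two: negative correlations are set to zero
    return 1.0 - math.sqrt(1.0 - r * r)

def absorption_size_distribution(sigma_abs, dN_by_bin):
    """Steps one to four: correlate, truncate, transform, normalize to one."""
    f_s = [scaling_factor(pearson_r(series, sigma_abs)) for series in dN_by_bin]
    total = sum(f_s)
    return [f / total for f in f_s] if total > 0 else f_s

# Synthetic example: four hours of total absorption and three size bins
sigma_abs = [1.0, 2.0, 3.0, 4.0]
dN_by_bin = [
    [1.0, 2.0, 3.0, 4.0],   # perfectly correlated bin (r = 1)
    [4.0, 3.0, 2.0, 1.0],   # anti-correlated bin (r = -1, truncated to 0)
    [1.0, 3.0, 2.0, 4.0],   # partly correlated bin (r = 0.8)
]
dist = absorption_size_distribution(sigma_abs, dN_by_bin)
print(dist)
```

With this assumed transformation, a bin with r = 0.8 receives a weight of 0.4 rather than 0.8 before normalization, so weakly correlated bins are suppressed more strongly than they would be by r alone.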

Statistical approach applied to long-term DMPS and eBC data from Svalbard
The first step in this process was to merge the DMPS data and the PSAP data together with observations of ambient relative humidity (RH). The last variable was used to screen out data from high RH conditions. This was done because the aerosol inlet used during the sampling period was not a true whole air inlet. High ambient RH is indicative of current or recent cloud processed air that may influence the results. We selected a threshold of 95% hourly averaged RH to screen the data from possible significant interference by clouds. A second step was to exclude the biomass burning plume from early May 2006 (Stohl et al. 2007). This plume gave such record concentrations that it displayed anomalies in very many observed variables, and thus this period (26 April to 5 May) was removed from the data set. After data reduction, a total of 34240 hourly averages are available, which represents an additional data reduction of approximately 0.6%.
Fig. 6. Comparison of the truncated and normalized Pearson correlation scaling factor f_s (see text for details) and the normalized median σabs distribution based on DMPS and PSAP2 measurements. The dotted lines are lognormal fitted distributions (Heintzenberg 1994) and the y-axis is normalized to 1. The geometric mean diameter and geometric standard deviation are 0.22 µm and 1.67 for the observed distribution, and 0.23 µm and 2.18 for the distribution derived from correlations.
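The two screening steps for the long-term data (RH below 95% and removal of the 2006 plume period) can be sketched as a simple record filter. The sketch assumes that the plume period is removed inclusive of both end dates; the function and constant names are hypothetical.

```python
from datetime import datetime

RH_MAX = 95.0                        # % RH threshold from the text
PLUME_START = datetime(2006, 4, 26)  # first removed day (assumed inclusive)
PLUME_END = datetime(2006, 5, 6)     # exclusive bound, so 5 May is removed in full

def keep_record(timestamp, rh):
    """Return True if an hourly record passes both screening steps."""
    if rh >= RH_MAX:
        return False                 # possible in-cloud or cloud processed air
    if PLUME_START <= timestamp < PLUME_END:
        return False                 # biomass burning plume period
    return True

# Consistency check of the record counts quoted in the text
before, after = 34432, 34240
reduction = 1.0 - after / before
print(f"{before - after} hours removed, {100 * reduction:.2f}% reduction")
```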

Clustering of DMPS data
As an alternative to grouping the remaining data according to the months of the year, we instead grouped the size distributions using the technique of clustering. This has proven a useful tool in sorting the size distributions according to the stage of the aerosol life cycle, rather than the date of the year (Tunved and Ström 2019). Of course, some clusters are very much linked to particular seasons due to available sunlight or transport patterns and the associated meteorological history and source profiles. In this study, we have used a MATLAB version (R2018b, Statistics and Machine Learning Toolbox) of k-means clustering (kmeans.m) to perform a clustering of the hourly size distributions. The mathematical function will maximize the inter-cluster variability while at the same time minimizing the intra-cluster variability. Although there are tools available to aid in the selection of the number of clusters, it will in the end be chosen subjectively by the user. Larger data sets can potentially resolve more clusters that are unique and meaningful, and the best mathematical solution for the optimal number of clusters is not necessarily the best in terms of providing useful information. For this data set we initially work with 12 clusters (both fewer and more clusters were explored, but 12 clusters proved the best balance between cluster number and cluster information). This concerns not only the actual number size distribution properties, but also associated parameters such as diurnal variability and seasonal variability of occurrence.
Fig. 7. Clustering of hourly averaged DMPS NSD data from the available data after data reduction. Except for cluster 1, the distributions are organized according to geometric diameter from cluster 2 to cluster 12. Shaded area represents the quartile range for the full available data set. The solid lines represent medians and the error bars show the upper and lower quartiles.
For a more detailed description of the rationale for deriving the optimal number of clusters, cf. Tunved and Ström (2019). The clustering was performed on hourly averaged data, using "max iterations" of 10 000 and "number of replicates" set to 10 in MATLAB. The distance function applied was the squared Euclidean distance, calculated from the centroids defined as the mean of the points in each cluster, d(x, c) = (x − c)(x − c)′, where x is an observation (i.e. the size distribution vector) and c is the centroid. The squared Euclidean distance emphasizes extreme situations more strongly than other measures of distance. Therefore, sporadic events such as new particle formation could be easily accentuated. For more information on the clustering approach see Supplement.
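The clustering settings described above (squared Euclidean distance, mean centroids, an iteration cap, and several replicates from which the best solution is kept) can be illustrated with a small pure-Python sketch. It is not the MATLAB routine used in the study: it starts each replicate from randomly sampled observations, whereas MATLAB's default initialization differs, and the data are synthetic three-bin "size distributions".

```python
import random

def sq_euclid(x, c):
    """Squared Euclidean distance d(x, c) = (x - c)(x - c)'."""
    return sum((a - b) ** 2 for a, b in zip(x, c))

def kmeans(data, k, max_iter=10000, replicates=10, seed=0):
    """Plain k-means; returns labels and centroids of the lowest-cost replicate."""
    rng = random.Random(seed)
    best = None
    for _ in range(replicates):
        centroids = [list(p) for p in rng.sample(data, k)]   # random start
        labels = None
        for _ in range(max_iter):
            new_labels = [
                min(range(k), key=lambda j: sq_euclid(x, centroids[j]))
                for x in data
            ]
            if new_labels == labels:
                break                                        # converged
            labels = new_labels
            for j in range(k):
                members = [x for x, lab in zip(data, labels) if lab == j]
                if members:                                  # keep old centroid if empty
                    centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        cost = sum(sq_euclid(x, centroids[lab]) for x, lab in zip(data, labels))
        if best is None or cost < best[0]:
            best = (cost, labels, centroids)
    return best[1], best[2]

# Synthetic data: three "nucleation-like" and three "accumulation-like" spectra
data = [
    [100.0, 10.0, 1.0], [110.0, 12.0, 1.0], [90.0, 9.0, 2.0],
    [5.0, 50.0, 80.0], [6.0, 55.0, 90.0], [4.0, 45.0, 70.0],
]
labels, centroids = kmeans(data, k=2)
print(labels)
```

Running several replicates and keeping the solution with the lowest total within-cluster distance reduces the chance of ending in a poor local minimum, which is the purpose of the "number of replicates" setting.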
The 12 different clusters are presented in Fig. 7. Cluster 1 is a common cluster that represents the lowest number densities with a dominating accumulation mode. This is interpreted as an aged and cloud processed air mass. Clusters 2 through 6 are dominated by the smallest particles and are interpreted as the evolution of new particle formation and subsequent growth. Clusters 7 through 9 are dominated by Aitken mode particles, but the integral number density is less than in clusters 2 through 6 and the distributions show a more developed accumulation mode. Clusters 10 through 12 are dominated by the accumulation mode. In this context, these clusters are interpreted as long range transport of polluted air corresponding to Arctic haze situations. To simplify the analysis and to increase the statistical material in each group, clusters 2 through 7 are grouped into one category referred to as "Nucleation", clusters 8 through 9 are grouped into one category referred to as "Intermediate", and clusters 10 through 12 are grouped into one category referred to as "Polluted". Cluster 1 is kept as it is and referred to as "Washout". These four categories are presented in Fig. 8 (see Table 1 for details).
The seasonal variation of the four categories is presented in Fig. 9. The Washout category is present throughout the year, with a decrease during the summer and an enhanced peak during the cleanest months of the year, September through November. The Nucleation category is the least frequent of the four categories and is essentially only present during the most sunlit period of the year between May and August. The Intermediate category is also mainly present during the summer, but with a somewhat broader distribution compared to the Nucleation category. Finally, the Polluted category is a winter and spring phenomenon that peaks in March and April. A similar breakdown of each group over time of day revealed a strong diurnal preference for Group 2 (Nucleation), with a clear maximum of members found between 12:00 and 18:00 UTC (see Fig. S5). This in turn is suggestive of local processes linked to the intensity of solar radiation. The other three groups showed small (Intermediate group) or insignificant (Washout and Polluted groups) diurnal preference. We can view the Washout category as something of a baseline distribution and the other three as superimposed perturbations. The Nucleation and Intermediate categories are attributed to the seasonal variation of sunlight in the Arctic region, whereas the Polluted category is attributed to the seasonal pattern of long-range transport. Hence, these two main processes serve as complements to the Washout category. In fact, the Nucleation category is probably more washed out than the Washout category itself, as removing condensational aerosol surface area by precipitation sets the stage for new particle formation (Tunved et al. 2013).

Statistical distribution of integral absorption signal as function of particle size
For each category presented in Fig. 8, the correlation between the integral PSAP signal and dN/dlogDp is calculated as described in Section 4 and illustrated in flowchart S1. This procedure results in a size-dependent normalized absorption signal for each category. The normalized correlation is then multiplied by the median signal for the entire category in units of m⁻¹ and divided by the MAC value of 9.4 m² g⁻¹ derived in Section 4. Hence, the correlations are transformed into dMeBC/dlogDp mass distributions, which are presented in Fig. 10 normalized to the maximum value. In deriving the normalized dMeBC/dlogDp distributions, the correlations were calculated both for the complete dataset and for bootstrapped datasets. The bootstrapping procedure was performed on 5% subsections of paired NSD and PSAP data, and the resulting dMeBC/dlogDp distributions were pooled, giving the range of distributions indicated by the error bars in Fig. 10.
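The chain of steps in this paragraph can be illustrated with a minimal Python sketch. It is not the authors' implementation: Eq. (1) is not reproduced in this section, so the weighting used here (Pearson correlation per size bin, negative values clipped to zero, weights scaled to sum to one) is an assumption, as are the function names and the choice of disjoint random 5% subsets for the bootstrap. Only the MAC value of 9.4 m² g⁻¹ and the use of the category median of the absorption signal come from the text.

```python
import numpy as np

MAC = 9.4  # m^2 g^-1, value quoted in the text


def derive_mass_distribution(nsd, sigma_abs, dlogdp, mac=MAC):
    """Sketch: per-bin correlation -> dM_eBC/dlogDp in g m^-3.

    nsd       : (n_samples, n_bins) array of dN/dlogDp
    sigma_abs : (n_samples,) integral absorption coefficient in m^-1
    dlogdp    : (n_bins,) logarithmic bin widths
    """
    nsd = np.asarray(nsd, dtype=float)
    sigma_abs = np.asarray(sigma_abs, dtype=float)
    # Pearson correlation between each size bin and the integral signal.
    x = nsd - nsd.mean(axis=0)
    y = sigma_abs - sigma_abs.mean()
    r = (x * y[:, None]).sum(axis=0) / np.sqrt((x**2).sum(axis=0) * (y**2).sum())
    # ASSUMPTION (stand-in for Eq. 1): clip negative correlations and
    # scale the weights so that they sum to one over all bins.
    w = np.clip(r, 0.0, None)
    w = w / w.sum()
    # Median absorption divided by MAC gives the total eBC mass,
    # which is distributed over the bins according to the weights.
    return w * np.median(sigma_abs) / mac / np.asarray(dlogdp, dtype=float)


def bootstrap_quartiles(nsd, sigma_abs, dlogdp, n_subsets=20, seed=0):
    """ASSUMPTION: 20 disjoint random subsets of 5% each; returns the
    25th and 75th percentiles of the pooled distributions per bin."""
    nsd = np.asarray(nsd, dtype=float)
    sigma_abs = np.asarray(sigma_abs, dtype=float)
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(sigma_abs)), n_subsets)
    pooled = np.array(
        [derive_mass_distribution(nsd[p], sigma_abs[p], dlogdp) for p in parts]
    )
    return np.percentile(pooled, [25, 75], axis=0)
```

With weights that sum to one, integrating the returned distribution over logDp recovers the median absorption divided by MAC. A size bin with constant values would give an undefined correlation and would need to be masked before use.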
To date, only a few size-resolved BC observations from Svalbard are available. Ohata et al. (2019) presented an average distribution over an approximately two-week period in March 2017 based on SP2 observations conducted at the Zeppelin station. The SP2 mass median diameter (MMD) derived from the incandescence signal was 228 nm and the geometric standard deviation (σg) was 1.74. The average integral rBC mass was 28 ng m⁻³. Zanatta et al. (2018) presented average parameters for a short campaign conducted at the Zeppelin station in 2012 between 22 March and 11 April. They reported an MMD of 251 nm, a σg of 1.22 and an average rBC of 39 ng m⁻³ (median 37 ng m⁻³). With respect to the time of the year, both these observations would best fit the Polluted category (cf. Fig. 9), but the relatively low integral values would better fit some of the other categories. Raatikainen et al. (2015) reported observations from Pallas (in northern Finland) during the winter of 2011-2012. For this season the geometric mean diameter was 199 nm with σg = 1.7, and the integral value was 27 ng m⁻³. Other SP2 observations related to the Arctic were reported by Liu et al. (2015), based on aircraft data collected in March 2013 between northern Norway and Svalbard. They observed an MMD range of 190-210 nm and a σg range of 1.55-1.65, and the integral values varied greatly, from about 20 to above 100 ng m⁻³, depending on the origin of the air mass. Lower integral values (typically below 10 ng m⁻³) were observed by Taketani et al. (2016) during a cruise near the Bering Strait in September 2014. However, the modal parameters were similar to those of the studies mentioned above: the MMD ranged from about 170 to 190 nm and σg was about 1.8.
Despite different areas, different integral concentrations, different altitudes, and different times of year, the mass distributions observed by the SP2 are rather similar in the different studies, with an MMD around 200 nm and a σg around 1.7. This typical SP2 dM/dlogDp distribution is included in Fig. 10, normalized to a peak value of 1, for comparison with the statistically derived dMeBC/dlogDp. Statistics on the derived eBC mass are given in Table 1. As can be seen, the calculated mass varies substantially between the groups, from a median of about 9.6 ng m⁻³ (quartile range 3.8-19.4 ng m⁻³) in the Washout group to about 50 ng m⁻³ (18.0-110.4 ng m⁻³) in the Polluted group. These values are in the same range as the SP2 observations reported above.
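As an illustration of the reference curve described here, the following minimal sketch evaluates a log-normal dM/dlogDp normalized to a peak value of 1. The function name is hypothetical, and the default parameters are the typical SP2 values quoted in the text (200 nm, 1.7). For a log-normal mass distribution the MMD coincides with the geometric mean diameter, which is why a single diameter parameter is sufficient.

```python
import numpy as np


def sp2_reference_distribution(dp_nm, dg_nm=200.0, sigma_g=1.7):
    """Log-normal dM/dlogDp normalized so that the peak value equals 1.

    dp_nm   : particle diameter(s) in nm
    dg_nm   : geometric mean (= mass median) diameter in nm
    sigma_g : geometric standard deviation (dimensionless)
    """
    dp_nm = np.asarray(dp_nm, dtype=float)
    # Distance from the mode in units of log(sigma_g).
    z = np.log(dp_nm / dg_nm) / np.log(sigma_g)
    return np.exp(-0.5 * z**2)
```

Evaluated on the DMPS diameter grid, this curve can be overlaid on the derived distributions once these are also scaled to a maximum of 1.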
The characteristic modal parameters from the SP2 studies reported above best fit the Intermediate category. From Fig. 10, it is evident that the majority of the derived dMeBC/dlogDp distributions have a shape that agrees well with the SP2 data. The derived mass distributions for the Washout and Polluted groups are situated on either side of the SP2 range: the Washout group peaks at around 220-250 nm and the Polluted group at around 180 nm. The normalized eBC distributions for the Nucleation and Intermediate groups are located below the typical SP2 range (70 nm and 130 nm, respectively). It should be noted that 77% of the data belong to the Washout and Polluted groups.
To test how consistently the derived eBC mass distributions relate to the DMPS size distributions, we convert the dMeBC/dlogDp distributions to dNeBC/dlogDp distributions. For this test we assume that eBC is externally mixed and that the density decreases with increasing particle size. The density function assumes that BC particles consist of agglomerates of primary spheres with a diameter (a) of 20 nm and a density (ρprim) of 2 g cm⁻³, and a fractal dimension (Df) of 2.5. A value of 2.5 is chosen to represent a generally aged aerosol type. Fresh BC particles are very fluffy, with Df around 1.8, but with aging these particles become more compact and Df can increase well above 2 (e.g. Colbeck et al. 1990; Khalizov et al. 2013). For instance, at the remote Pico Mountain observatory in the Azores, approximately 70% of the BC particles were found to be highly compact, with Df > 2.67 (China et al. 2015). The Df = 2.5 used in this study is an assumed property.
Using a different Df will mainly influence the number density of the accumulation mode particles. The effect of using different Df values is illustrated in Fig. S6. The number of primary spheres (N) is related to the fractal dimension through the so-called radius of gyration (Rg), as expressed in Eq. (2).
Fig. 10. Derived dMeBC/dlogDp (MSD) of each major cluster group normalized to 1 (black line). The error bars indicate the 25th-75th percentile range resulting from the bootstrapping procedure on 5% subsets of the data (20 subsets in total). A normalized dMeBC/dlogDp from a typical log-normal SP2 distribution (Dg = 200 nm, σg = 1.7) is added for comparison (red line).
It has been shown that the electrical mobility diameter measured by the DMPS is reasonably well represented by Rg for fractal agglomerates (Sorensen 2011). Hence, we simplify the size-dependent density by comparing the mass of N primary spheres for agglomerates of size 2Rg with the mass of compact spheres of diameter Dp. This is simply Eq. (2) divided by (Dp/(2a))³. Assuming that 2Rg = Dp, the size-dependent density ρBC(Dp) can take the simple form of Eq. (3). For comparison, this incidentally gives a size-dependent effective density similar to that observed for a heavy-duty engine at idling conditions by Rissler et al. (2013). Using the eBC mass distributions presented in Fig. 10 and Eq. (3), dNeBC/dlogDp distributions can be inferred. These are presented in Fig. 11 together with the DMPS-observed size distributions (as in Fig. 8) for comparison.
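Equations (2) and (3) are not reproduced in the text as it appears here. A plausible reconstruction from the surrounding description is sketched below. It assumes the standard fractal scaling law with a prefactor of unity, which is an assumption and not confirmed by the text; the published equations may include a different prefactor. The symbol a enters only through the ratio Dp/(2a), exactly as stated in the paragraph above.

```latex
\begin{align}
N &= \left(\frac{R_g}{a}\right)^{D_f} \tag{2} \\
\rho_{BC}(D_p) &= \rho_{prim}\,\frac{N}{\left(D_p/(2a)\right)^{3}}
  = \rho_{prim}\left(\frac{D_p}{2a}\right)^{D_f-3},
  \qquad 2R_g = D_p \tag{3}
\end{align}
```

Under this form, the effective density decreases with increasing Dp whenever Df < 3, which is consistent with the stated assumption that the density decreases with increasing particle size.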
It is important to emphasize that the comparison presented in Fig. 11 is much simplified, that the amplitude of dNeBC/dlogDp is directly proportional to the MAC value used, and that it depends on the assumptions made in Eqs. (2) and (3). The assumption of an external mixture is not made to capture the actual conditions at a remote location such as Svalbard, but rather to represent the limiting case of the smallest number of particles required to equal the eBC mass in each size bin. Ideally, the inferred eBC NSD should be less than or, at most, equal to the size distribution observed by the DMPS.
Fig. 11. Comparison between observed particle size distributions and inferred dNeBC/dlogDp distributions assuming external mixtures and the parameters described in the text. The solid red line is based on the median absorption coefficient for each category of clusters and the shaded area is based on the upper and lower quartiles. The DMPS data are identical to those in Fig. 8.
From Fig. 11 it is clear that the Polluted category does not always satisfy this condition: the eBC NSD can significantly exceed the DMPS distribution for particles smaller than about 100 nm. The other three categories are within bounds, but present very different features. The main eBC modes in the Washout and Nucleation categories are located above and below 100 nm, respectively. Whereas the Washout dNeBC/dlogDp mode is essentially co-located with the main mode of the DMPS observations, the Nucleation category presents its eBC mode between the main modes observed by the DMPS. The overall shape of the Intermediate category eBC NSD resembles the shape of the DMPS NSD, with an enhanced peak of small particles around 40 to 50 nm.
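The conversion and the consistency check discussed above can be sketched as follows. This is an illustrative sketch, not the authors' code: the density function uses a power law with unit prefactor, ρprim (Dp/(2a))^(Df-3), which is an assumed form since Eq. (3) is not reproduced here, and it is capped at ρprim for small particles as a further assumption. The function names and the unit choices (ng m⁻³ for mass, cm⁻³ for number) are hypothetical. The parameter values a = 20 nm, ρprim = 2 g cm⁻³ and Df = 2.5 are those stated in the text.

```python
import numpy as np


def effective_density(dp_nm, a_nm=20.0, rho_prim=2.0, d_f=2.5):
    """Size-dependent density in g cm^-3.

    ASSUMPTION: rho = rho_prim * (Dp / (2a))**(Df - 3) with unit prefactor,
    capped at rho_prim so that small particles are never denser than
    the primary spheres.
    """
    dp_nm = np.asarray(dp_nm, dtype=float)
    rho = rho_prim * (dp_nm / (2.0 * a_nm)) ** (d_f - 3.0)
    return np.minimum(rho, rho_prim)


def mass_to_number(dm_ng_m3, dp_nm, **density_kwargs):
    """Convert dM_eBC/dlogDp (ng m^-3) to dN_eBC/dlogDp (cm^-3),
    assuming externally mixed spherical-equivalent eBC particles."""
    dp_cm = np.asarray(dp_nm, dtype=float) * 1e-7  # nm -> cm
    # Mass of one particle in g: density times sphere volume.
    particle_mass_g = effective_density(dp_nm, **density_kwargs) * np.pi / 6.0 * dp_cm**3
    number_per_m3 = np.asarray(dm_ng_m3, dtype=float) * 1e-9 / particle_mass_g
    return number_per_m3 * 1e-6  # m^-3 -> cm^-3


def exceeds_dmps(dn_ebc, dn_dmps):
    """Flag size bins where the inferred eBC number exceeds the DMPS number."""
    return np.asarray(dn_ebc, dtype=float) > np.asarray(dn_dmps, dtype=float)
```

Because the eBC mass scales inversely with MAC, the inferred number distribution does as well, so any bins flagged by the last function depend directly on the MAC value and on the density assumptions.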
The Polluted category is particularly interesting as it displays two eBC NSD modes of comparable amplitude. The larger mode peaks just above 100 nm, which is between the two DMPS modes at 150 nm and 40 nm, whereas the smaller eBC mode is around 50 nm. This smaller eBC mode stands out as its range significantly exceeds the smaller DMPS mode. Whereas the three other categories present plausible derived dNeBC/dlogDp distributions, the small mode of the Polluted category is not realistic. Despite this, it is an interesting mode since it is somehow linked to the observed variability in particle light absorption and polluted air masses. Further research, outside the scope of this study, is needed to resolve this feature.

Summary and conclusions
This study was conducted in two parts: a laboratory experiment to compare size-resolved measurements of σabs with independent integral observations of σabs, and a statistical application to derive size-resolved eBC from long-term observations of σabs and aerosol size distributions in the Arctic. In the laboratory study it was first established that the two PSAP instruments used in the study agreed very well, and that, with a scaling factor MAC = 9.4 m² g⁻¹, the two PSAP instruments agreed closely with the MAAP measurements in both numerical value and temporal variation when all instruments were operating in parallel.
One of the PSAP instruments was placed downstream of the DMA in parallel with the CPC in the DMPS system, while the second PSAP instrument continued to measure the ambient air directly. By assuming that the σabs signal measured behind the DMA could be inverted with the same factor as the CPC (the combined effect of charge probability and transfer function), a direct measurement of the size-dependent σabs was calculated. This size-dependent incremental light absorption was integrated and compared with the total measurement. Overall, the two measurements correlated well over time, but the integrated value was more than a factor of two greater. The exact reason for this is not known and we can only speculate on the cause. One possibility is that the assumption that light-absorbing particles can be corrected in the DMPS system in the same way as other particles detected by the CPC is not valid, in which case the problem is related to the sampling of light-absorbing particles. The other possibility is that the optical response of the particles on the filter is different when the measurements are made for limited size ranges rather than for bulk observations. The observed size-dependent σabs was normalized to unity and compared with the statistically derived distribution based on about 892 hourly averages. The derived distribution is essentially based on the correlation between individual size bins of the particle size distribution and the total observed σabs. The locations of the modes were similar, between 220 and 250 nm diameter, but the widths of the distributions differed. The σg was approximately 1.8 for the observed distribution and 2.8 for the derived distribution. Either the observed distribution is narrower because of measurement limitations above and below the mode (at small sizes very little particle material is available downstream of the DMA), or the derived distribution is broader because the distribution of correlation coefficients is not accentuated enough by Eq. (1).
Encouraged by the co-location of the observed and derived modes of the normalized σabs distributions (Fig. 6), the statistical method was applied to a large data set from the Zeppelin station, Svalbard. The data cover the years 2002-2010 and were screened for RH above 95% to reduce the effect of in-cloud measurements. A total of 34240 hourly averaged data points of concurrently measured size distributions, σabs, and RH were available. The data were initially clustered into 12 groups based on the particle size distributions. These were further grouped into four categories named Washout, Nucleation, Intermediate, and Polluted.
Each of the categories represents a unique derived dMeBC/dlogDp with different characteristics, which in shape resembles available SP2 observations. Whereas the Washout, Nucleation and Intermediate categories present plausible derived dNeBC/dlogDp distributions, this is not the case for the small-particle eBC mode of the Polluted category. The eBC NSD mode around 50 nm is often overestimated compared to the DMPS NSD, which is an intriguing observation. The mode clearly shows up as a result of the linkage to variations in light-absorbing particles, but requires further research. It is important to emphasize that the amplitude of dNeBC/dlogDp is directly proportional to the MAC value used. A greater MAC value will decrease the dNeBC/dlogDp. A systematic over- or under-estimation of σabs will directly affect the result, as will the assumptions made about the size-dependent density or light-absorbing properties. Because this calculation is based on the assumption that eBC is externally mixed, the inferred dNeBC/dlogDp represents the minimum number of particles containing eBC. Based on the investigations above we can draw the following conclusions:
The comparison between the PSAP and MAAP instruments used in this study shows excellent agreement, and the scaling factor MAC between PSAP and MAAP was 9.4 m² g⁻¹.
The location of the mode of the size-dependent σabs agrees very well (within about 10%) between the statistically derived distribution and that observed using a combination of PSAP and DMA. However, the σg is larger for the statistical distribution than for the observed distribution, and the amplitude of the integrated observed distribution is 2-3 times larger than the observed total σabs. The latter discrepancy is not resolved and needs further investigation.
The statistical approach applied to long-term Arctic data presents more variability in the derived dMeBC/dlogDp between the different groups than has been reported for rBC observed using the SP2 instrument. On the other hand, both the Washout and Polluted groups (76% of the data) are associated with dMeBC/dlogDp distributions that agree very well with typical SP2 distributions.
The category labelled Polluted is particularly interesting because the derived eBC NSD does not agree well with the DMPS NSD for small particles around 50 nm. This indicates that, for clusters belonging to the Polluted category, at least some cases show a correlation with particles around 50 nm that are not necessarily light-absorbing eBC particles. A more in-depth analysis of this category is of particular interest. It is nevertheless concluded that the shape of the derived eBC mass distribution still agrees very well with observed SP2 data.
This study demonstrated the feasibility of using the statistical relation between the observed size distribution and the light absorption to gain insight into particle size dependent properties where direct observations are not available. This is especially useful for the analysis of historical data. Here, we used clustered distributions merged into four categories, but many other ways of grouping the data are possible, e.g. by season, by optical properties, or by linking with trajectories and transport patterns.