Search Engine Use for Health-Related Purposes: Behavioral Data on Online Health Information-Seeking in Germany

ABSTRACT Internet searches for health-related purposes are common, with search engines like Google being the most popular starting point. However, results on the popularity of health information-seeking behaviors are based on self-report data, often criticized for suffering from incomplete recall, overreporting, and low reliability. Therefore, the current study builds on user-centric tracking of Internet use to reveal how individuals actually behave online. We conducted a secondary analysis of passively recorded Internet use logs to examine the prevalence of health-related search engine use, the types of health information searched for, and the sources visited after the searches. The analysis revealed two key findings. 1) We largely support earlier survey-based findings on the prevalence of online health information seeking with search engines and the relatively minor differences in information-seeking behaviors between socio-demographic groups. 2) We provide a more granular picture of the process of health information-seeking behavior (HISB) using search engines by identifying different selection patterns depending on the scope of the searches.


Introduction
Searching the Internet for health information is a common practice. For example, more than 72% of Germans searched online for health-related purposes in 2019, the year of our study (Link & Baumann, 2020). Since then, the share of online health information seekers has been steadily increasing; however, the COVID-19 pandemic has not led to the expected boom in digitalization in Germany (Link, Czerwinski, et al., 2022). If we look at how most individuals get health-related online information, search engines are the primary access point (Brkic et al., 2021; Hassan & Masoud, 2021; Macias et al., 2018; Ofran et al., 2012) and an essential asset in locating health information (Cartright et al., 2011). By offering a rapid, easily accessible, low-cost opportunity to get individually tailored information on health conditions and preventive measures (Choudhury et al., 2014; Hassan & Masoud, 2021; Macias et al., 2018), search engines aid users in making informed health decisions (Ayers & Kronenfeld, 2007), improving their self-care skills, and coping with uncertainties (Lambert & Loiselle, 2007).
Understanding how individuals acquire online health information is a scientifically and practically relevant endeavor for various reasons: First, insights into online health information-seeking behaviors (HISB) are foundational to assessing the extent to which the Internet fulfills its ascribed functions as a means for health-related coping, attitude formation, and decision-making (Choudhury et al., 2014; Powell et al., 2011; Quinn et al., 2017) and in reducing gaps in health knowledge (Bach & Wenz, 2020). Second, search queries are patient-generated information on public attention (Phillips et al., 2018; Qin & Peng, 2016). Since they mirror health-related needs, interests, and concerns (Agarwal et al., 2016; Choudhury et al., 2014), they can be used to tailor health communication efforts (Ofran et al., 2012). Third, in the tradition of infodemiology, search engine queries yield population health indicators (Eysenbach, 2009; Hochberg et al., 2020; Lee et al., 2023) and are used for discovering and addressing adverse events and behaviors (Agarwal et al., 2016). For example, influenza and dengue fever have been tracked by the frequency of search queries for their symptoms (e.g., Chan et al., 2011; Ginsberg et al., 2009).
While an accurate description of health-related search engine use and its users is necessary for such endeavors, the process from information needs to search behavior and source selection is seldom examined in depth. Although information needs, types of health information sought, and sources selected are vital components of HISB (Zare-Farashbandi & Lalazaryan, 2014), they are rarely studied in a theory-driven manner. The widely used information-seeking models, such as the Comprehensive Model of Information Seeking (Johnson & Meischke, 1993) or the Risk Information Seeking and Processing Model and its iterations (Griffin et al., 1999; Kahlor, 2010; Yang et al., 2014), focus only on the antecedents of seeking intentions or behaviors instead of the process of HISB (see also Ou & Ho, 2022; X. Wang et al., 2021).
In addition, most empirical research on online HISB relies on survey self-reports. False recall and self-presentation can result in unreliable and inaccurate self-reports of media and Internet use (Prior, 2009; Scharkow, 2016, 2019). Direct and unobtrusive observation of online HISB promises to overcome such limitations. By collecting every query searched for and every link followed, user-centric tracking of Internet use reveals how individuals behave online (Christner et al., 2022; Freelon, 2014; Wieland et al., 2018), promising more extensive and fine-grained information on individuals' online HISBs than survey data. In contrast to fields like political communication (e.g., Dvir-Gvirsman et al., 2016; Stier et al., 2020), health communication still needs to integrate this new data source. The few studies that investigated online HISB with user-centric tracking data were limited by small numbers of participants (e.g., Quinn et al., 2017), narrow measurements of HISB (Bach & Wenz, 2020), or a narrow topical focus (e.g., on vaccine-related information, Guess et al., 2020). Aggregate Google Trends data were more commonly used, for example, for public health monitoring and to investigate search engine use for specific topics such as breast cancer, pregnancy and childbirth, influenza, COVID-19, or vaccines (e.g., Asseo et al., 2020; Brkic et al., 2021; Cervellin et al., 2017; Ginsberg et al., 2009; Jun et al., 2018; see Mavragani & Ochoa, 2019, for an overview). However, these studies have limited informative value regarding individual online HISBs of typical health information seekers because the aggregate trends hide individuals and are driven mainly by extreme users.
Against this background, we aim to deepen the understanding of who searches for health information online and shed light on what individuals actually do. Using user-centric tracking data over three months in 2019 from 994 German Internet users, we study individuals' health-related search engine use, how common searches are, what individuals search for, and which sources they select. We examine the prevalence and frequency of individuals' health-related searches, the types of health information sought, and the sources visited after the searches.
According to survey research, search engines are the most common tool for finding and navigating health-related information online (Brkic et al., 2021; Hassan & Masoud, 2021; Jadhav et al., 2014; Macias et al., 2018). As such, their use requires scholarly attention in its own right. Moreover, examining search engine use provides a dynamic and holistic view of online HISB. The data identifies how information needs are transformed into actual seeking behaviors, covering the magnitude of sources individuals can choose from (Chi et al., 2020; Pang et al., 2014). Bach and Wenz (2020) investigated online HISB with user-centric tracking data of Internet use over four months. They found that only 46.1% of German search engine users searched for health-related information at least once. However, their identification of health-related searches relied on a restrictive operationalization, primarily including medical terms related to illness and diseases, which might explain the relatively low prevalence estimate compared to survey results. Our study, in contrast, builds on the WHO's (1946) integrative understanding of health as "a state of complete physical, mental, and social well-being and not merely the absence of disease or infirmity" when detecting health-related searches. Given this definition, we ask the following research question: RQ1: How common is search engine use for health-related information seeking?
The prevalence of health-related search engine use is one important matter. How the phenomenon is distributed across different strata is another. Although primarily based on surveys, previous research has identified relatively nuanced yet consistent differences. Women, younger individuals, and individuals with higher socio-economic status were more likely to engage in online HISB (Alvarez-Galvez et al., 2020; Bach & Wenz, 2020; Bachl, 2016; Baumann et al., 2017; Jung, 2015; Link, Baumann, Linn, et al., 2021; Link, Baumann, et al., 2022; X. Wang et al., 2021; Zimmerman & Shaw, 2020). In this light, we derived the following hypotheses: H1a-c: Search engine use for health-related information seeking is more frequent (a) among women than men, (b) individuals of younger age compared to higher age, and (c) individuals possessing higher levels of education compared to lower levels of education.

The process of online health information seeking
HISB is a multi-stage process. Based on the model for the selection of online health information (Sbaffi & Zhao, 2020; Zhang, 2013; Zhang et al., 2017), the process can be separated into four components: establishing an information need; identifying and accessing information sources; examining and evaluating information; and interpreting information. The current study will focus on information needs and source selection, considering that different information needs lead to differences in which types of information are sought and which kinds of sources are selected (Galarce et al., 2011; Zhang, 2013).
Information needs arise when individuals realize that their existing knowledge is inadequate to satisfy their goals (Case, 2002) or they feel uncertain as a situation is perceived as ambiguous, complex, and unpredictable (Brashers, 2001). Online HISB based on problem- or interest-driven information needs often starts with visiting a search engine and translating the information need into a search query (Jadhav et al., 2014). The information needs that lead to using a search engine can be related to a wide variety of questions, from information on diseases, symptoms, medical procedures, and treatments to questions regarding public health, health systems, unhealthy and healthy lifestyles, prevention, and wellbeing (Cao et al., 2016; Goldner, 2006). Survey research indicates that the Internet is used for HISB regarding various issues (Link, Baumann, Linn, et al., 2021; Powell et al., 2011; Zschorlich et al., 2015). For example, Link and colleagues (2021) found that the most frequently searched-for issues were as diverse as symptoms and causes of diseases on the one hand and questions about a healthy lifestyle, fitness, and wellness on the other hand.
Few studies specifically examined health-related queries to search engines beyond specific diseases. Cartright et al. (2011) found that queries referring to symptoms of diseases were the most common. In contrast, Palotti et al. (2016) indicated that search queries foremost revolved around diseases rather than symptoms. However, other research found symptoms, causes, treatments, and drugs to be the most common categories of search queries (Jadhav et al., 2014). Again, the sparse literature using behavioral data is limited by its narrow focus on diseases. We aim to examine the types of information searched for against a broader understanding of health and concerning differences based on the socio-demographic characteristics of the information seekers: RQ2a: How common is it to seek different types of health-related information with search engines? RQ2b: How does search engine use for seeking different types of health-related information differ by gender, age, and education?
After submitting a query to a search engine, the next step is to select and visit one (or more) search results from various sources to satisfy the information needs (Jadhav et al., 2014). Although source selection is a defining component of HISB (e.g., Johnson & Meischke, 1993), source-selection behaviors are seldom deconstructed by identifying which need is followed by which source-visiting behaviors (Chi et al., 2020). The current study covers all online sources selected by the participants to provide a holistic view of online HISB (Chi et al., 2020). Online sources for health-related information are highly diverse and can be classified in different ways (Fox & Duggan, 2013; Link, Baumann, Linn, et al., 2021; Zhang et al., 2017). One differentiation is between sources of health content and sources of health community (Gitlow, 2000; Rossmann & Karnowski, 2014). Sources of health content vary in domain focus (i.e., specialized websites for health information or general sources that cover health information, among other kinds). They are provided by different actors and entities, such as media publishers, NGOs, governmental authorities, insurance companies, and health professionals. Sources of health community include online social networks, health communities, and question-and-answer services.
Survey research into online HISB indicates that, in general, health websites, online encyclopedias, online pharmacies, and websites of health professionals are the most popular types of sources (Link, Baumann, Linn, et al., 2021; see also Beck et al., 2014; Ratcliff et al., 2021; L. Wang et al., 2012). Bach and Wenz (2020) give the first insights into source selection using behavioral data. Most respondents used websites covering health and fitness, exercise and weight loss, alternative medicine, psychology, vitamins and food supplements, and women's health. However, these results must be considered with the reservation that they rested on automatic domain classifications by the commercial provider Webshrinker. Above and beyond restrictions in research transparency due to the company's proprietary methodology and the risk of classifications incompatible with scholarly definitions, the domain-level approach has the inherent drawback that we may miss out on essential parts of health-related Internet content. For example, a visit to https://en.wikipedia.org/wiki/headache would be classified as education despite the visit's apparent health-related nature. We aim to provide a more comprehensive picture of the sources accessed after health-related searches. In addition, we aim to identify the determinants of source selection considering the characteristics of the individuals and their search queries: RQ3a: Which kind of sources are accessed after using a search engine? RQ3b: How are individual characteristics (gender, age, education) and types of information sought related to the kind of sources accessed after a search?

Methods
We conducted a secondary analysis of the German sample (n = 994) of a comparative study on news use conducted in 2019 (Stier et al., 2022). The data integrate participants' passively collected individual web-browsing histories with the same participants' survey responses. They consist of a total of 16 million website visits collected over three months. The secondary analysis's main empirical contribution was identifying, classifying, and entangling health-related searches and website visits to investigate online HISB comprehensively. All steps are described in the subsequent sections. Additional details, including R code, replication data, and materials from the content analyses, are available from the OSF.¹

Participant sample and tracking of internet use
The data was collected by the survey company Netquest (an affiliate of GfK) in compliance with the EU General Data Protection Regulation. Panelists in the Netquest online access panel are incentivized to participate in online surveys and install tracking software that hooks into their browsers and logs their website visits, including the complete Uniform Resource Locator (URL), date, time, and duration. The Oxford Internet Institute's Departmental Research Ethics Committee at the University of Oxford approved the data collection. Participants were informed about the nature of the data collection and asked for their consent to participate in surveys and Web tracking.
Data from 994 participants for whom tracking data had been collected on at least four days were used to answer RQ1, H1, and RQ2a/b. Participants selected themselves into the online access panel and subsequently into the web tracking. Despite the non-probabilistic recruitment, the sample is diverse in terms of age, gender, and education. The average age was 47 (SD = 14; 18 to 35 years: 23%, 36 to 55: 47%, 56 to 84: 30%). 48% of the participants identified as male, 52% as female.² We distinguished between lower (28%; up to Mittlere Reife, ≈ GED), middle (38%; Mittlere Reife and vocational training or limited university entrance certificate), and higher (34%; full university entrance certificate or university degree) education. The sample distributions reasonably matched the German population in 2019: 49% were male, and 51% were female. The average age of the German population was 44.5 years. 32.8% had a low, 30% a middle, and 33.5% a high level of education (destatis.de). The analyses of source visits (RQ3a/b) are based only on data from 618 participants who searched at least once for health-related information and for whose searches at least one relevant follow-up visit could be identified. The subsample does not differ meaningfully in terms of age (M = 47 years, SD = 14; 18 to 35: 24%, 36 to 55: 47%, 56 to 84: 29%), education (lower: 27%, middle: 39%, higher: 35%), and gender (54% women, 46% men).

Identification of health-related internet searches
Column A in Figure 1 summarizes the multi-step process used for identifying health-related search queries from the collected URLs. The aim was to identify HISB using search engines that referred to health as understood by the WHO's definition. We started by identifying all requests to the eight most popular search engines during the study period in Germany. The URL of a search result contains the search query, which was extracted and decoded. We constructed an extensive dictionary consisting of eleven health-related categories and one category for negative terms (i.e., terms often used with a health-related term but indicating that the search query was unrelated to human health). The dictionary was based on three sources: health-related glossaries,³ domain knowledge of the authors, and exploring the current data set. The dictionary was optimized for recall; that is, we tried to identify as many health-related search queries as possible. In the final step, two authors and a trained student assistant manually selected the truly health-related search queries. Intercoder reliability was sufficient in a test before the selection task (Krippendorff's α = .82, n = 553 search queries). The procedure yielded 15,532 health-related search queries, which were matched with the participant data to derive measures of an individual's online HISB. As primary outcomes, dichotomous variables indicated whether an individual had searched for health-related information at all, both in general and in each of the 11 categories during the three-month study period. In addition, the frequency of search engine use was captured by the number of days on which an individual had searched for health-related information, again both in general and in each of the categories. The reported findings in the main text focus on the primary outcomes. The results for the frequency measure are provided as supplementary materials. Table A1 in the Online Appendix shows the sample statistics for the individual-level measures.
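The first automated steps described above (extracting and decoding the query from a search URL, then flagging candidates with a recall-oriented dictionary) could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the authors' R code: the search-engine list, the query parameter names, the two toy dictionary categories, and the rule that a negative term discards a query outright are all hypothetical. In the study, the dictionary output was additionally screened by human coders.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical subset of search engines and their query parameters
# (the study covered eight engines; this list is an assumption).
QUERY_PARAMS = {
    "google.": "q",
    "bing.com": "q",
    "duckduckgo.com": "q",
    "search.yahoo.com": "p",
}

# Toy stand-ins for the eleven health-related categories (assumed terms).
HEALTH_TERMS = {
    "diseases_symptoms": {"kopfschmerzen", "grippe", "virus"},
    "nutrition_exercise": {"ernährung", "yoga"},
}

# Toy stand-in for the negative-term category, e.g. "computer virus".
NEGATIVE_TERMS = {"computer"}


def extract_query(url):
    """Return the decoded, lower-cased search query, or None if the URL
    is not a request to one of the listed search engines."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    for marker, param in QUERY_PARAMS.items():
        if marker in host:
            # parse_qs decodes percent-encoding and '+' as space.
            values = parse_qs(parts.query).get(param)
            return values[0].strip().lower() if values else None
    return None


def candidate_categories(query):
    """Return the dictionary categories a query matches; an empty set
    means the query is not a candidate for manual coding."""
    tokens = set(query.split())
    if tokens & NEGATIVE_TERMS:  # assumed handling of negative terms
        return set()
    return {cat for cat, terms in HEALTH_TERMS.items() if tokens & terms}
```

Token-level matching keeps the sketch short; a recall-oriented dictionary would more plausibly use substring or stem matching so that inflected and compound German words are also caught.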

Identification and categorization of subsequently visited sources
The identified health-related searches were the starting point for studying source selection (Figure 1, Column B). First, the subsequently visited URL was extracted and paired with the search query for each health-related search. As this simple rule resulted in many visits substantially unrelated to the previous query, we further scrutinized the search visit pairs with three checks: 1) We matched the search terms with the URL of the visit to probe whether at least one of the search terms with a length of at least four characters was contained in the URL. If so, we concluded that the visit was related to the search. 2) Two authors coded the remaining domains as being likely not a result of a health-related search (e.g., banking, e-mail), likely the result of a health-related search (popular websites focusing on health issues or with a health-related label), or as impossible to judge solely based on the domain. The first group of pairs was removed, the second group was included in the source categorization, and the third group was subjected to a final check. 3) Two authors and three trained student assistants manually checked all remaining search visit pairs. Intercoder reliability was sufficient in a test before the selection task (Krippendorff's α = .78, n = 300 search visit pairs).
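The first of the three checks, whether any search term of at least four characters appears in the visited URL, is simple enough to sketch in Python. The function name, the tokenization on non-word characters, and the decoding of the URL before matching are assumptions made for illustration; the source specifies only the four-character threshold and the containment rule.

```python
import re
from urllib.parse import unquote


def visit_matches_query(query, visited_url, min_len=4):
    """Return True if at least one search term with min_len or more
    characters is contained in the (decoded, lower-cased) visited URL."""
    url = unquote(visited_url).lower()
    # Split on non-word characters; short terms are ignored because
    # they would match too many unrelated URLs.
    terms = [t for t in re.split(r"\W+", query.lower()) if len(t) >= min_len]
    return any(term in url for term in terms)
```

A pair that passes this check would be kept as related; pairs that fail are not discarded but move on to the domain-level coding and the manual check described above.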
In the final step, all sources associated with a health-related search were sorted into one of 15 categories. The categorization was adapted from survey measurements (Link, Baumann, Linn, et al., 2021) and designed to reflect the diversity of online sources while reducing complexity to enable quantitative description. It captured different facets of the actors responsible for the sources and the importance of health-related content for the source (see Table 1). Again, intercoder reliability was sufficient (Krippendorff's α = .66, n = 161 sources).

Statistical analysis
We used Bayesian estimation with (mildly) regularizing priors for all statistical analyses. This approach has two main advantages over frequentist estimation for our analysis. First, the results of both approaches are generally similar if there is sufficient information in the data, but the Bayesian estimator is more robust to the influence of smaller irregularities (McElreath, 2020). Second, the approach enables the reliable estimation of complex models with locally sparse data even when Maximum Likelihood estimation fails, as is the case for our last analysis (RQ3b).
The results for RQ1, H1, and RQ2a/b are from (generalized) linear models with mildly regularizing priors. The outcomes were regressed on age categories, education categories, and gender. All interactions between the regressors were included in the models. The reported estimates are average adjusted predictions from these models, i.e., predictions were averaged over the sample distribution of the other regressors. The estimates are adjusted for differences in the distributions of the other variables within the comparison groups. The results for RQ3a are based on generalized linear models estimating the proportions of health information seekers using each source. The results for RQ3b are based on regularized generalized linear mixed-effects models. The outcomes, dichotomous indicators of whether a visited source was in a given category, were regressed on dichotomous indicators of the search query categories and characteristics of the individual participants (i.e., age, gender, and education groups). We report log(odds) coefficients to quantify the partial association of a search or individual characteristics with the probability of selecting a source from a category. All inferential quantities reported in the results section are posterior means with 95% posterior intervals. The Online Appendix reports all data analysis software.

[Table note: The first number is the overall occurrence in the sample; the second number in parentheses is the share of online health seekers in the sample who visited a type of source at least once.]
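To make the idea of an average adjusted prediction concrete, the following Python sketch computes one from a logistic model. Everything numeric here is hypothetical: the coefficients and the four-person sample are invented, and the model omits the interactions and the education variable that the actual models included. In a Bayesian analysis, the same computation would typically be repeated for each posterior draw and then summarized; that step is left out.

```python
import math


def inv_logit(eta):
    """Convert a value on the log(odds) scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical log(odds) coefficients (not estimates from the study).
COEF = {"intercept": 0.4, "female": 0.3, "age_36_55": 0.2, "age_56_84": -0.3}


def predict(row):
    """Predicted probability of at least one health-related search."""
    eta = COEF["intercept"] + sum(COEF[name] * row[name] for name in row)
    return inv_logit(eta)


def average_adjusted_prediction(sample, variable, value):
    """Set `variable` to `value` for every participant, predict, and
    average over the sample distribution of the other regressors."""
    preds = [predict({**row, variable: value}) for row in sample]
    return sum(preds) / len(preds)


# Invented mini-sample with dummy-coded regressors.
SAMPLE = [
    {"female": 1, "age_36_55": 0, "age_56_84": 0},
    {"female": 0, "age_36_55": 1, "age_56_84": 0},
    {"female": 1, "age_36_55": 0, "age_56_84": 1},
    {"female": 0, "age_36_55": 0, "age_56_84": 1},
]

aap_women = average_adjusted_prediction(SAMPLE, "female", 1)
aap_men = average_adjusted_prediction(SAMPLE, "female", 0)
```

Because both predictions are averaged over the same age distribution, the difference between `aap_women` and `aap_men` reflects gender alone rather than differences in age composition between the groups.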

Health-related searches in general (RQ1, H1)
The first research question addresses the prevalence of health-related search engine use (RQ1), whereas H1 postulates that online HISBs differ by gender, age, and education (H1a-c). Overall, we estimate that about two-thirds of the population (P = .68, 95% CI: [.65, .71]) searched for health-related information at least once during the study period. On average, these individuals searched for health-related information on 6.3 (95% CI: [5.7, 6.9]) out of the 94 recorded days (RQ1). The differences between sociodemographic groups were mostly nuanced (H1). The share of online health seekers was lowest in the oldest age group (56 to 84 years) and in the group with lower education. The differences in search frequency were similarly minor. Men searched somewhat less often than women. Among the age groups, the 36-to-55-year-olds were most active, and the oldest age group was least active (see Online Appendix, first panel of Figure A2). Therefore, the findings support that search engine use for health-related information seeking was more pronounced in females (H1a), individuals of younger age (H1b), and higher education (H1c).

Health-related searches by types of information sought (RQ2a/b)
The second research question addresses the various types of health-related information searched via search engines overall (RQ2a) and for the socio-demographic groups (RQ2b). Half the population (P = .50, 95% CI: [.47, .53]) searched for health professionals, medical specialties, or health organizations. About one-third of the population used search terms related to diseases, pathogens, or symptoms (P = .38, 95% CI: [.35, .41]) and the anatomy of the body (P = .32, 95% CI: [.30, .35]). Specific medical topics, such as vaccination, contraception, or diagnostics, were searched for by relatively few people. Searches beyond the issues of illness and medical needs were also identified, as about one-fifth of the population used search engines to find information on nutrition, exercise, or self-care (P = .22, 95% CI: [.20, .25]). The frequency of searching for the different types of information showed a similar rank order (see Online Appendix, Figure A1).
To answer RQ2b, Figure 2 presents the prevalence of health-related searches by type of information sought and individual characteristics. The fine-grained breakdown makes some meaningful differences visible. Men were less likely to search for all specific categories, with the notable exception of recreational drugs, where their share (P = .16, 95% CI: [.13, .20]) was double that of women (P = .08, 95% CI: [.06, .11]). Differences between the educational tiers were overall minor and of inconsistent direction. Individuals with lower education were less likely to search for information about nutrition, exercise, and self-care, as well as drugs and pharmaceutical ingredients, and more likely to search for diagnostics. Larger proportions of those with a medium and high level of education were interested in nutrition, exercise, and self-care, as well as in disease, pathogens, and symptoms. Searches for health professionals, medical specialties, and health organizations, as well as drugs and pharmaceutical ingredients, were most common among those with intermediate levels of education. Individuals with higher education were more likely to search for information about the anatomy of the body, therapies, and vaccines.
We also identified some age-related patterns. The youngest age group, 18-to-35-year-olds, were less likely to search for health professionals and more likely to use search terms related to conception and contraception as well as nutrition, exercise, and self-care. The overall lower likelihood for health-related searches in the oldest age group, 56-to-84-year-olds, was particularly pronounced in the categories of diseases, pathogens, and symptoms, as well as therapies. They were also less interested in nutrition, exercise, and self-care, as well as conception and contraception. In contrast, they were more likely to search for information about diagnostics. The respondents aged 36 to 55 years, characterized by the overall highest likelihood for health-related searches, were most likely to search for information about disease, pathogens, and symptoms, the anatomy of the body, therapies, as well as health professionals, medical specialties, and health organizations.

Source selection after health-related searches (RQ3a/b)
The third research question asked about source selection after submitting a query to a search engine. Figure 4 shows the shares of online health information seekers (i.e., individuals with at least one health-related search) who selected a source type at least once after a search (RQ3a).
Specialized health websites were the most popular sources (P = .58, 95% CI: [.54, .62]). Websites of medical doctors, hospitals, and nursing services were also commonly selected. About a quarter to a fifth of health information seekers selected websites from other health-related actors, such as pharmacies, health-related NGOs, health insurance companies, and other health professionals (e.g., physiotherapists, health coaches), and used health-specific directories. Online shops and product presentations were the most common not-health-specific source categories. Content from special-interest, non-health media outlets and portals, as well as from general-interest media outlets and portals, was also regularly selected from the search results. Online communities, including general-interest and topic-specific community platforms, were chosen only by about 17%. Finally, it is noteworthy that while relatively few individuals selected sources from the category general-interest encyclopedias compared to other categories, wikipedia.org was the second-most-popular single domain.
RQ3b asks how individual characteristics and the types of information sought were related to the kind of sources accessed after a search. Because of the large scope of the models, the presentation focuses on the seven most common types of information sought (Figure 5) and the individual characteristics (Figure 6). The Online Appendix (Figure A3) provides a complete overview of the results.
Focusing on the relationship between the types of information sought and the selected sources, searches for health professionals, medical specialties, and health organizations were most likely followed by visits to the websites of actors of the healthcare system (e.g., health professionals), health-specific directories (e.g., jameda.de), and general directories (e.g., Google Maps). A similarly distinctive pattern emerged for searches for drugs and pharmaceutical ingredients, most likely leading to visits to (online) pharmacies or specialized health information websites (see Figure 5).
Searches for diseases, pathogens, or symptoms were more likely followed by the selection of expert sources such as specialized health information websites and websites of health-related NGOs, as well as general-interest media outlets and Wikipedia (encyclopedias).
Searches related to therapies led mainly to visiting those who provided the therapies: medical doctors, hospitals, and nursing services; other health professionals; and information provided about these institutions and professionals in general directories.Notably, online communities were also among the more likely follow-up visits.
Queries about nutrition, exercise, and self-care were more likely followed by visiting online shops, product information websites, and special-interest, not health-related media outlets.Information from other health professionals (e.g., fitness coaches or nutritionists) was also more frequently selected from the search results.
Searching for recreational drugs primarily led to visits to shops and product information. Search terms that referred to the anatomy of the human body were not strongly related to the selection of sources in specific categories but led to a somewhat more likely selection of general directories, health insurance companies, and general media and portals.
When accounting for the search interests, the sociodemographic characteristics of age, gender, and education had relatively weak relationships with individuals' source selection (see Figure 6). Older individuals were less likely to visit general directories and online communities than younger individuals. Younger individuals were less likely to use health-specific directories and websites of pharmacies but more likely to turn to encyclopedias than both older age groups. Lower education was associated with a higher probability of selecting general directories, health-specific directories, and general media and portals and a lower probability of selecting encyclopedias. Individuals with a higher level of education were further characterized by a higher probability of turning to other health professionals than individuals with a low level of education. Men, compared to women, were more likely to use encyclopedias but less likely to turn to other health professionals and general directories.

[Figure note: The results are grouped by the type of sought information. The coefficients of the issues of vaccination, diagnostics, contraception, and other terms were omitted from the figure. The coefficients of the individual characteristics are depicted in Figure 6. The complete models are reported in the Online Appendix, Figure A3. Note that the source type is a categorical variable, i.e., each source visit was sorted into one category. Consequently, the coefficients tend to balance each other in their absolute amounts. If, for example, searches for recreational drugs made visits to online shops and product presentations more likely, they also made visits to sources in other categories less likely, per the logic of the categorization. We, therefore, focus our description on the positive coefficients (i.e., which type of sources are more likely to be selected) and treat the meaning of negative coefficients (i.e., which type of sources are less likely to be selected) as implied.]

Discussion
We analyzed user-centric Internet tracking data to provide an accurate and detailed characterization of health-related search engine use and its users. Our study revealed two key findings. First, the analysis of passively measured behavioral data showed patterns similar to those of earlier studies based on survey self-reports, instilling confidence in the prior literature on online HISB. Second, the new data source enabled a more granular picture of the process of HISB with search engines, opening new opportunities for future theorizing.

Towards a more robust picture of health-related search engine use
In contrast to research on general Internet use (Scharkow, 2016) or political news use (Prior, 2009), which identified discrepancies between self-reports and tracking data, the patterns found in the passive observations of health-related search engine use showed pronounced commonalities with the patterns identified in previous studies using survey self-reports to examine online HISB. This strengthens confidence in the top-level findings from earlier research, which is crucial for the entire field of health communication using survey data to describe and understand online HISB. From a methodological perspective, the higher concordance between self-reports and behavioral data aligns with evidence that more specific forms of media use suffer less from reporting errors than more general media use patterns (Scharkow, 2016). In addition, the emotional importance of health-related issues (Brashers, 2001) might also improve recall. Two similarities between our findings and research based on survey data are particularly noteworthy.
First, the share of online health information seekers, the frequency of online HISB, and the most common sources resembled current findings from survey data (e.g., Finney Rutten et al., 2019; Link, Baumann, Linn, et al., 2021; Ratcliff et al., 2021). Over two in three Internet users searched for health information and services using general-purpose search engines, which directed them to websites for specialized health information and websites of health professionals and other health actors (Beck et al., 2014; Link, Baumann, Linn, et al., 2021; Ratcliff et al., 2021; L. Wang et al., 2012). However, we found a significantly higher share of online seekers than in an earlier tracking study from Germany (Bach & Wenz, 2020).
The differences highlighted the need to consider health information's various facets beyond illness and disease. Based on our findings, search engines can thus be regarded as an established tool for HISB (Brkic et al., 2021; Hassan & Masoud, 2021; Jadhav et al., 2014; Macias et al., 2018). Meanwhile, the usage frequency indicates that health-related search engine use is not an everyday behavior but is triggered by interest- or problem-oriented needs (Galarce et al., 2011; Zhang, 2013).
Second, there were only minor socio-demographic differences in the prevalence of search engine use (H1a-c), the different types of health-related information (RQ2b), and source selection (RQ3b). In line with prior research, we found that women and individuals with higher education are more likely to engage in online HISB for various interest-driven as well as problem-driven purposes (Bach & Wenz, 2020; Baumann et al., 2017; Link, Baumann, Linn, et al., 2021; Link, Baumann, et al., 2022; Zimmerman & Shaw, 2020). With age in focus, we can further differentiate extant research. Instead of pointing to a higher prevalence of online HISB at younger ages (Link, Baumann, Linn, et al., 2021), our study showed that search engine use for health purposes was most prevalent among individuals aged between 36 and 55. This is an age group in which health complaints tend to increase. Problem-oriented searches, e.g., for consultations and therapies, add to the interest-oriented searches already common at a younger age, like searching for nutrition, exercise, and self-care. In addition, with a view to media socialization, it can be assumed that, on average, Internet use is more routine in this age group than among those older than 56. Media socialization might also explain the minor differences in source selection, particularly the lower openness to and experience with online communities and encyclopedias shown in the older age group. Older individuals also seemed less interested in using Internet searches for self-diagnosis and self-treatment, as indicated by their lower likelihood of searches about disease, pathogens, symptoms, and therapies.
In sum, the relatively nuanced differences related to participants' age, gender, and education strengthen the evidence about the limited influence of socio-demographic factors on online HISB, suggesting that socio-psychological and situational factors are more crucial than individual characteristics (Kahlor, 2007, 2010). In addition, the minor differences speak against worries about digital divides regarding the use of health information within the population of (routine) Internet users. However, as more and more people get accustomed to searching for health information online, it is to be feared that the divide between individuals who have and who have not integrated the Internet into their lives will grow even wider (Bachl, 2016; Viswanath & Kreuter, 2007).

Towards a granular picture of HISB processes with search engines
Above and beyond supporting the top-level findings from survey-based research, the integration of user-centric Internet tracking data offered novel insights into the process of online HISB. Our findings are more granular than is possible with survey-based methods because they are built on categorizing all original search queries and source visits instead of only asking about a few categories (Pang et al., 2014). Moreover, the findings are process-oriented, in contrast to the static snapshots of survey research. We were able to link search queries and subsequently visited sources within search processes and combine them with person-level characteristics. Overall, we got a clearer picture of the actual HISB, which in turn allows us to make inferences about likely information needs and how they were fulfilled using online health information (Chi et al., 2020).
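To illustrate what linking search queries to subsequently visited sources can look like in practice, the following minimal Python sketch pairs each source visit with the most recent search by the same person. It is not the procedure used in this study: the `LogEvent` record layout, the function name `link_searches_to_sources`, the five-minute window `max_gap_seconds`, and the example domains are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LogEvent:
    """One entry of a (hypothetical) tracking log."""
    user_id: str
    timestamp: float  # seconds since the start of observation
    kind: str         # "search" or "visit"
    value: str        # query text for searches, domain for visits


def link_searches_to_sources(
    events: List[LogEvent], max_gap_seconds: float = 300.0
) -> List[Tuple[str, str, str]]:
    """Return (user_id, query, domain) pairs.

    Assumption: a visit belongs to the most recent search by the same
    user if it occurs within max_gap_seconds of that search.
    """
    pairs: List[Tuple[str, str, str]] = []
    last_search: Dict[str, LogEvent] = {}
    # Process each user's events in chronological order.
    for ev in sorted(events, key=lambda e: (e.user_id, e.timestamp)):
        if ev.kind == "search":
            last_search[ev.user_id] = ev
        elif ev.kind == "visit":
            search = last_search.get(ev.user_id)
            if search is not None and ev.timestamp - search.timestamp <= max_gap_seconds:
                pairs.append((ev.user_id, search.value, ev.value))
    return pairs


# Invented example data, not taken from the study.
example_events = [
    LogEvent("u1", 0.0, "search", "zahnarzt hannover"),
    LogEvent("u1", 20.0, "visit", "dentist.example"),
    LogEvent("u1", 1000.0, "visit", "news.example"),  # too late to be linked
    LogEvent("u2", 5.0, "visit", "shop.example"),     # no preceding search
]
pairs = link_searches_to_sources(example_events)
```

The resulting search-source pairs could then be merged with person-level characteristics via `user_id`. A time window is only one possible linking rule; referrer information, where available, would be an alternative.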
The fine-grained recordings of the search queries allowed more precise descriptions of the health-related functions of search engines, which enriches the prior literature on health-related information needs, sought health issues, and their fulfillment using online HISB (e.g., Cao et al., 2016; Link, Baumann, Linn, et al., 2021; Palotti et al., 2016). The most common category included search queries about health professionals, medical specialties, and health organizations. The finding indicates search engines' essential role in navigating the modern healthcare system because they support action planning and health provision. The common inquiries about diseases, pathogens, and symptoms suggest that people also turn to search engines to increase their understanding of health concerns. On a more general level, the findings suggest that health-related searches are triggered by both information (e.g., understanding disease, pathogens, or symptoms) and instrumental needs (e.g., navigating the healthcare system). In addition, more searches were problem- rather than interest-driven. They referred to specific health-related burdens or uncertainties, including searches for health professionals and search queries related to disease, pathogens, or symptoms. Interest-driven HISB, such as queries related to information about nutrition, exercise, or self-care (Brashers, 2001; van der Rijt, 2000), was less common.
This distinction between patterns of HISB related to various needs and sought issues should be considered when modeling the process of online HISB. Regarding the process from information needs to search queries and source selection behaviors, our findings revealed two general patterns. First, some search-selection processes showed a close correspondence between the type of information searched for and the selected sources. The pattern is characterized by a narrow scope and selection from a rather small set of information sources. This pattern was most notable when the queries looked for instrumental support or very specific informational support, for example, queries on health professionals, drugs and pharmaceutical ingredients, and recreational drugs. Particular sources, such as health professionals' and pharmacies' websites or online shops, offer these types of support. For example, particular needs like recommendations for a nearby dentist, searches for drugs, or the range of therapies offered by a physiotherapist indicate specific instrumental and information needs, which online information and services can quickly fulfill.
Second, medical and disease-related informational support needs, as inferred from the search queries, led to selecting a broader, more diverse set of sources. For example, individuals were most likely to turn to general media, health insurance companies, and general directories when their searches referred to the anatomy of the human body. Searches for diseases, pathogens, and symptoms were associated with a higher probability of selecting specialized health information, health-related NGO websites, general media, and encyclopedias. In general, expert sources were among the most likely selected, suggesting a preference for high-quality information. The attempt to evaluate information quality is reflected in the preference for sources of health information over sources of health community (Gitlow, 2000; Rossmann & Karnowski, 2014).
In summary, two selection patterns emerged. The first pattern is characterized by a narrow scope and close correspondence of searches and sources. Search engines are used as instrumental support tools for navigating the healthcare system. The second pattern is characterized by a broader scope and greater source type variety, corresponding to various medical and disease-related information needs. Of course, these prototypical descriptions are only a first attempt at understanding the individual and situational processes of online HISB in detail. They call for developing theories that focus on the process of online HISB and explain how individual differences and situational needs impact search processes and their outcomes. Such theories must build on established health communication models explaining information-seeking intentions (Griffin et al., 1999; Kahlor, 2010). They must be supplemented with a dynamic perspective on the mutual influences of seeking behaviors and fulfilled and newly emerging needs during a search process and its health-related outcomes. Generic models from the information science literature (e.g., Kuhlthau, 1991; Zhang, 2013) can provide some guidance, but they need reflection from a dedicated health communication perspective.

Limitations and tasks for further research
The study has several limitations that provide starting points for further research. First, we must assume that participants in an opt-in web tracking panel are usually experienced Internet users who might be more inclined to health-related online behaviors than the average user. Second, several blind spots of tracking data must be considered, such as omissions and measurement errors that arise when respondents turn off the tracking, when only some devices are tracked, or when technical or user errors occur (Scharkow, 2016). Third, we were able to analyze individuals' online behaviors but not their motives. Our descriptive and explorative approach, which was necessary because of the limitations of the secondary analysis, had to neglect the theoretically derived motivational factors of information behaviors subsumed, for example, in models such as the Planned Risk Information Seeking Model (Kahlor, 2010; Link, Baumann, & Klimmt, 2021; X. Wang et al., 2021). Fourth, we focused on types of sources but did not analyze the actual content of each visit. Work on research infrastructures is needed to remedy the first and the second limitations. The communication research community desperately needs high-quality respondent panels in which user-centered Internet use tracking is conducted across devices and with better coverage while preserving the respondents' privacy. To address the third and fourth shortcomings, future research should, among other things, expand the survey section with dedicated health-related questions and consider the content of the visited pages. Fifth, the data were collected before the COVID-19 pandemic, which raises the question of whether the results still hold today. Even though COVID-19 may have displaced many other health topics and contributed to an increase in online use, studies from Germany before, during, and after the pandemic suggest no sustained boost in digitalization (Link, Czerwinski, et al., 2022). We, therefore, assume that the general patterns of search engine use identified in this study persist as society enters a post-pandemic state. Of course, this assumption needs to be tested in future studies.

Conclusion
The similarities of the findings based on user-centric Internet tracking data and prior survey research support the viability of existing findings. Furthermore, we provided a granular picture of the search processes showing either specific needs for instrumental support or a broader search for information related to health challenges. These findings also show which health information needs could be addressed by online health communication interventions, and they suggest appropriate sources for health campaigns. In addition, the function of search engines to prepare for and supplement consultations became apparent. They serve a crucial navigation function within the healthcare system, highlighting the need for search engine optimization of expert sources such as health professionals.

Notes
1. https://osf.io/4hnfv/.
2. One person who identified as non-binary had to be excluded from the quantitative analysis. While this was necessary because of the statistical approach, it also highlights the lack of research effort and knowledge on non-binary and genderqueer groups in health communication. Research including and addressing these groups is needed for broader inclusivity.
3. From netdoktor.de, gesundheitsinformationen.de, and de.wikipedia.org.

Figure 1. Extraction of measures from Internet use tracking data.

Figure 2. Prevalence of health-related internet searches by individual characteristics. Each panel shows the estimated proportions of the population who searched at least once for (a specific type of) health-related information based on n = 994 participants. Note the different scaling of the x-axes. See Figure 3 for a comparison of the types of information on the same scale.

Figure 3. Prevalence of health-related internet searches by type of information. The figure shows the estimated proportions of the population who searched at least once for (a specific type of) health-related information based on n = 994 participants.

Figure 4. Sources selected after health-related searches. The x-axis shows the estimated proportions of online health information seekers (i.e., individuals with at least one health-related search) who selected a specific source at least once after a search based on n = 618 participants.

Figure 5. Source selection by type of information sought. The figure summarizes the results of 15 models predicting the source selected from the results of a health-related search by individual characteristics and the type of information. The models are based on 8,521 observations (i.e., search-source pairs) from n = 618 participants. The results are grouped by the type of sought information. The coefficients of the issues of vaccination, diagnostics, contraception, and other terms were omitted from the figure. The coefficients of the individual characteristics are depicted in Figure 6. The complete models are reported in the Online Appendix, Figure A3. Note that the source type is a categorical variable, i.e., each source visit was sorted into one category. Consequently, the coefficients tend to balance each other in their absolute amounts. If, for example, searches for recreational drugs made visits to online shops and product presentations more likely, they also made visits to sources in other categories less likely, per the logic of the categorization. We, therefore, focus our description on the positive coefficients (i.e., which type of sources are more likely to be selected) and take the meaning of negative coefficients (i.e., which type of sources are less likely to be selected) implicitly as granted.
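The balancing logic described in the note to Figure 5 can be made concrete with a small Python sketch. It does not reproduce the models reported here; it uses raw differences in category shares between two invented sets of source visits as a simplified stand-in for model coefficients. The function names and the example data are hypothetical. Because every visit falls into exactly one category, the shares in each set sum to one, so the differences across categories sum to zero: a gain in one category must be offset by losses elsewhere.

```python
from collections import Counter
from typing import Dict, List


def share_by_category(visits: List[str], categories: List[str]) -> Dict[str, float]:
    """Share of visits falling into each source category."""
    counts = Counter(visits)
    total = len(visits)
    return {c: counts[c] / total for c in categories}


def share_differences(with_topic: List[str], without_topic: List[str]) -> Dict[str, float]:
    """Difference in category shares: searches on a topic vs. all other searches."""
    categories = sorted(set(with_topic) | set(without_topic))
    p_with = share_by_category(with_topic, categories)
    p_without = share_by_category(without_topic, categories)
    return {c: p_with[c] - p_without[c] for c in categories}


# Invented example: source categories visited after searches on one topic
# versus after all other searches.
visits_topic = ["shop", "shop", "shop", "encyclopedia"]
visits_other = ["shop", "encyclopedia", "media", "media"]

diffs = share_differences(visits_topic, visits_other)
total_difference = sum(diffs.values())  # zero up to floating-point error
```

In this toy example the share of shop visits is higher for the topic searches, the share of media visits is lower by the same amount, and the encyclopedia share is unchanged.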

Figure 6. Source selection by individual characteristics. The figure summarizes the results of 15 models predicting the source selected from the results of a health-related search by individual characteristics and type of information. The models are based on 8,521 observations (i.e., search-source pairs) from n = 618 participants. The results are grouped by the individual characteristics predictors in the facets. The coefficients of the type of information predictors are depicted in Figure 5. The complete models are reported in the Online Appendix, Figure A3.

Table 1. Categorization and frequency of selected online sources.