Toward more rigorous and informative nutritional epidemiology: The rational space between dismissal and defense of the status quo

ABSTRACT To date, nutritional epidemiology has relied heavily on relatively weak methods including simple observational designs and substandard measurements. Despite low internal validity and other sources of bias, claims of causality are made commonly in this literature. Nutritional epidemiology investigations can be improved through greater scientific rigor and adherence to scientific reporting commensurate with research methods used. Some commentators advocate jettisoning nutritional epidemiology entirely, perhaps believing improvements are impossible. Still others support only normative refinements. But neither abolition nor minor tweaks are appropriate. Nutritional epidemiology, in its present state, offers utility, yet also needs marked, reformational renovation. Changing the status quo will require ongoing, unflinching scrutiny of research questions, practices, and reporting—and a willingness to admit that “good enough” is no longer good enough. As such, a workshop entitled “Toward more rigorous and informative nutritional epidemiology: the rational space between dismissal and defense of the status quo” was held from July 15 to August 14, 2020. This virtual symposium focused on: (1) Stronger Designs, (2) Stronger Measurement, (3) Stronger Analyses, and (4) Stronger Execution and Reporting. Participants from several leading academic institutions explored existing, evolving, and new better practices, tools, and techniques to collaboratively advance specific recommendations for strengthening nutritional epidemiology.

Guidelines relying on circumstantial evidence can be little more than educated guesses (Teicholz and Taubes 2018). Some academicians even suggest abolishing nutritional epidemiology. As Scott Lear, a professor of Health Sciences at Simon Fraser University, posited, "One may wonder if we should stop nutritional research altogether until we can get it right" (Lear 2019). John P.A. Ioannidis stated, "Nutrition epidemiology is a field that's grown old and died. At some point, we need to bury the corpse and move on to a more open, transparent sharing and controlled experimental way" (Belluz 2018).
But the field, and the status quo, also has its defenders. Ambika Satija and colleagues asserted in Advances in Nutrition:
"Nutritional epidemiology has recently been criticized on several fronts, including the inability to measure diet accurately, and for its reliance on observational studies to address etiologic questions.… These criticisms, to a large degree, stem from a misunderstanding of the methodologic issues.… Misunderstanding these issues can lead to the non-constructive and sometimes naïve criticisms we see today." (Satija et al. 2015)
Rightly or wrongly, nutritional epidemiology's research findings have played a large role in shaping how we perceive relationships between food or nutrients and disease and how national policy guidelines are determined. These research findings also shape public opinion and affect public health. But nutritional epidemiology's research outcomes are all too often derived by study modalities that yield only low-level evidence.
Despite limitations in the study designs commonly used, not to mention poor execution of analyses and misreporting of subsequent results, researchers often make claims of causality when reporting nutritional associations (Cofield, Corona, and Allison 2010). In particular, the discipline has relied heavily on simple observational studies and meta-analyses of these simple observational studies. But these ordinary association tests alone cannot determine causality. At best, simple observational studies may, as part of a larger body of evidence, result in collective evidence of causation sufficient for some standards (Hill 1965).
Moreover, current methods of measuring dietary intake, food composition, and environmental "exposome" covariates arguably fall short of both the accuracy and the precision necessary to confidently detect causal risk relationships or their magnitude, and do not meet the standards of quality often held in other research domains (Schoeller and Westerterp 2017; Patel and Ioannidis 2014).
Nutritional epidemiology can, and must, do better by pursuing greater scientific rigor, academic honesty, and intellectual integrity. And the time is right to do so. Some academicians, believing such change is impossible, wish to jettison the tools upon which nutritional epidemiologists historically have relied. Still others advocate for only normative refinements. But neither abolition nor minor tweaks are appropriate. Nutritional epidemiology, in its present state, offers utility and substantial room for improvement. To change the status quo will require an ongoing, collective examination of nutritional epidemiology's research questions, practices, and reporting, and a willingness to admit that "good enough" is no longer good enough.
In the spirit of strengthening the field of nutritional epidemiology, an online event, "Toward more rigorous and informative nutritional epidemiology: the rational space between dismissal and defense of the status quo," was held from July 15 to August 14, 2020. The symposium comprised 15 prepared research talks, several moderated panel discussions, and small-group, open-forum sessions related to the need for reforms in four areas of focus: (1) Stronger Designs, (2) Stronger Measurement, (3) Stronger Analyses, and (4) Stronger Execution and Reporting.
Invited participants from several leading academic institutions explored new best practices, tools, and techniques to strengthen the nutritional epidemiology field. Following small-group discussions, the working groups presented their findings, considered the various perspectives offered, and then collaboratively worked through specific recommendations. For each of the four focus areas, we first summarize the recommendations that resulted from the prepared talks and discussions. We then summarize and expand on the content of the prepared talks and discussions. Finally, we provide some concluding comments.

Recommendations
• Begin with the research question to be answered, consider which measurements could most effectively answer this question, and develop a study design best-suited to delivering these data. Researchers can strengthen the quality of observational research by expanding study design options beyond traditional observational methods or by combining traditional observational methods with more objective measurements.
• First consider what a design can and cannot accomplish (to the extent this is known). Such consideration is essential for understanding a study design's strengths and limitations and systematically addressing any assumptions and limitations, such as through sensitivity analyses, falsification tests, or constraints on conclusions.
• Designs exist both for generalizability of results and for making inferences to a specific individual (e.g., pragmatic trials versus repeated N-of-1 trials, respectively), and investigators should be precise about the population or individual to which inferences are appropriately made.
• Before declaring a randomized trial to be impossible, impractical, or unethical, researchers should thoroughly review all available options. Myriad design options are employed across academic disciplines, including conventional trials, unconventional trials, and emerging approaches. Creative solutions may make a randomized design possible, practical, and ethical, thereby adding stronger causal inference to some nutritional epidemiological questions.

Discussion summary
Beyond the dichotomy of ordinary association tests versus randomization. Much of the nutritional epidemiological literature has relied on simple observational studies, specifically ordinary association tests (OATs). These have been defined as:
"Observational studies on samples of individuals in which the sole or primary means of controlling for potential confounding factors is inclusion of measures of some potential confounding factors as covariates in statistical models (or stratifying by measures of such factors). OATs are heavily relied upon in thinking about plausible effects of policies, but have also been heavily criticized in general and in the obesity and nutrition domains in particular for multiple reasons." (Richardson et al. 2017)
The limitations of OATs as a means of reliably determining causation are well established (Jepsen et al. 2004), and these limitations are not necessarily specific to nutritional epidemiology. Indeed, Hernán, in discussing the use of causal language for epidemiological research, succinctly titled a section, "Of course 'association is not causation,'" and outlined the importance of articulating better causal questions than those calculated by simple associations (Hernán 2018). Such tests do, however, have some merit. At a minimum, they assist in the generation of hypotheses, even if OATs themselves are poor tests of the hypotheses they help to generate. But unless a study is designed to strengthen an inference, and not simply to rehash others' findings, it will do little to advance nutritional epidemiology's body of knowledge and may merely amplify bias. The literature remains replete with OATs, sometimes repeating different iterations of the same association dozens upon dozens of times (e.g., Brown, Bohan Brown, and Allison 2013).
The field needs to use more robust modes of inquiry, some of which we discuss below, especially when it becomes clear that yet another OAT will not advance the causal, or even associational, understanding of a nutrition-health relation (Brown et al. 2014).
While randomization may be the gold standard, in some cases it is not feasible or ethical. As Randomistas author Andrew Leigh (2018) explains:
"Not every intervention can, or should, be randomized. A famous article in The British Medical Journal searched the literature for randomized trials of parachute effectiveness. Finding none, the researchers concluded (tongue-in-cheek): 'the apparent protective effect of parachutes may be merely an example of the "healthy cohort" effect… The widespread use of parachutes may just be another example of doctors' obsession with disease prevention.' Using similar critiques to those leveled at non-randomized studies in other fields, the article pointed out the absurdity of expecting everything to be randomly trialed." (Leigh 2018)
Others have similarly outlined the challenges to implementing conventional randomized designs for questions related to diet (Hébert et al. 2016). The parachute analogy, or those like it at the logical extreme (Katz 2019), is not directly comparable to questions of how nutrition relates to chronic disease. Nutrition can impact chronic disease outcomes in multiple ways and with generally small effect sizes. A lack of a parachute, by contrast, has one causal pathway and large, clearly observed acute effects (Hayes et al. 2018). Nonetheless, other biomedical disciplines have falsely claimed that their interventions were akin to a parachute, even when randomized trials were conducted to test the intervention (Hayes et al. 2018).
Still, challenges with applying randomization need not justify acceptance of the status quo. Modifications can be made. As Leigh continues, "The parachute study has been widely quoted by critics of randomized evaluations. But it turns out that experiments on parachute efficacy and safety are widespread" (Leigh 2018). Crash test dummies have been used in impact testing of jumps from various altitudes, and paratroopers were randomized to protective ankle braces which were found to reduce parachuting-related ankle sprains by a factor of six (Leigh 2018).
There is important and worthwhile middle ground between the position that conventional randomization is the only valid avenue and the notion that, for cases in which conventional randomization is not possible, any nonrandomized study is equally acceptable and valid. In actuality, a mix of conventional and unconventional interventional approaches and quasi-experimental designs can be leveraged to great effect. As such, it is time for scholars to broaden their research modalities and to employ creativity during a study's initial design.
Researchers also need to stay current on the growing number of available novel designs when simpler designs cannot answer the research question, and understand how to appropriately select, execute, and analyze results from them. Many textbooks on clinical and experimental research design in the behavioral health sciences emphasize "conventional" experimental or quasi-experimental designs, including (but not limited to) two-group parallel arm, factorial, withdrawal, crossover, and pragmatic trial designs (Friedman et al. 2010; Shadish, Campbell, and Cook 2001; Windsor et al. 2001; Meinert 1986). While such designs are essential to the experimentalist's toolkit, novel or less well-known variations on these designs may be useful. Stepped-wedge cluster randomized designs (Hemming et al. 2015); within-cohort randomized trials (also called Trials within Cohorts) (Kim, Flory, and Relton 2018); Randomization to Randomization (R2R); packet-randomized experiments (Pavela et al. 2015); multiphase optimization strategy trials (MOST) (Collins 2018); sequential, multiple assignment, randomized trials (SMART) (Almirall et al. 2014; Lei et al. 2012); and repeated N-of-1 trials (Duan, Kravitz, and Schmid 2013) each extend or provide alternatives to the standard randomized design by varying design features including the planned timing of treatment assignment, treatment optimization criteria, participant expectancies, the consent process, and nature of the treatment. Increased familiarity with these designs and other novel designs, as well as the reasons for using them, will make it more likely that researchers will select an appropriate randomized design.
The question remains as to which randomized designs provide the most relevant evidence for health care providers versus policymakers. A particularly salient example may be what Sacristán and Dilla (2018) call the "contradiction" of the pragmatic design:
"The root of the contradiction is that the same model that considers that a pragmatic attitude aims to inform clinical decision-making assumes that health care decision-makers speak the language of populations. In reality, while historically decisions made by policy-makers have been population based, clinical decisions are always individual based." (Sacristán and Dilla 2018)
This "contradiction" has become more evident with growing interest in precision medicine and the possibility of making inferences to individuals rather than to populations by using N-of-1 trials, which resemble the conventional crossover design in that they are multiple-period crossover experiments comparing two or more treatments, but within individual patients. It is encouraging to see debates about the relevance of different randomized designs to decision-making (Pavela et al. 2015).
Nonrandomized studies, too, have their place, provided that their designs' potential limitations and assumptions are systematically evaluated and addressed accordingly. Furthermore, not all nonrandomized designs are the same, ranging from controlled (but nonrandomized) interventions to OATs. At a minimum, a thoughtful consideration regarding the goal of nonrandomized research, such as causal inference, should be articulated (cf. Chiu et al. 2021; Tobias and Lajous 2021). Error can influence study results in any direction, and even some stronger nonrandomized designs may, in practice, inadvertently exacerbate some bias.

Example: Family-based designs.
Family-based designs, like all designs, are susceptible to threats to internal validity and have other assumptions that may or may not be met in a given application. To compare outcomes among siblings who do and do not experience an exposure or intervention, for example, a family-based design exploits familial relatedness to enhance confounder control. Yet, the design does not obviate the need for longitudinal data, measured at a suitable timescale, to rule out reverse causation and provide an appropriate test of the hypothesized effect (McGue, Osler, and Christensen 2010). Neither will the design, in and of itself, resolve bias from nonrandom measurement error. Moreover, the design can be especially vulnerable to bias from random measurement error and unmeasured confounders that are not shared by family members (Frisell et al. 2012). Further limitations stem mostly from the requirement of discordance, that is, the use of within-family variation in exposures and outcomes to estimate associations (D'Onofrio et al. 2016).
Even so, sibling comparisons, especially comparisons of discordant twins, were applied successfully as early as the late 18th century. Considered a health hazard at the time, coffee consumption was banned in Sweden. King Gustav III ordered a study on a pair of identical twins: One twin agreed to drink three pots of coffee for the rest of his life, and the other one a similar amount of tea. Two prominent physicians were monitoring their health. Both physicians died before the experiment completed, one dying before the other. Gustav III himself was assassinated in 1792, while both twins lived healthily for a long time. Eventually, the tea consumer twin died at the age 83 years, and coffee won! (Afshari 2017) Sweden's coffee ban would be reversed in the 1820s, but today science demands stronger evidence than comparisons between only two twins. By the mid-20th century, some epidemiological uses of twin studies would include ruling out genetic confounding in associations of tobacco smoking with mortality (Lichtenstein et al. 2002; Kaprio and Koskenvuo 1989), studying correlates of obesity in small samples of well-characterized discordant twins and diet and mortality in larger samples linked to national health registers (Naukkarinen et al. 2012; Granic et al. 2013), and exploring the association of exposure to breastfeeding with obesity in childhood and adolescence using multiple sibling comparisons (Metzger and McDade 2010; Colen and Ramey 2014). There is potential for extending family-based designs to omics research, for instance by examining identical twins discordant for dietary factors (Pallister, Spector, and Menni 2014; Barron et al. 2016).
While family-based approaches can enable researchers to control for sources of familial similarity (e.g., genetic factors, educational background, home environment, parenting practices) by design (i.e., without the need for measured covariates), the benefits of such study designs to nutritional epidemiology are bound by some conditions. Family-based designs are likely to be used to greatest effect in nutritional epidemiological study when (1) constructs, such as nutritional exposures, can be measured well, (2) unmeasured confounders are likely shared among siblings or other family members, (3) exposures vary within the family members studied (twins, siblings, etc.), and (4) limitations and assumption violations can be identified and assessed, including through the use of multiple family-based designs and systematic sensitivity analyses (D'Onofrio et al. 2016).
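To illustrate why discordance matters, the following is a minimal sketch of a within-pair comparison, assuming sibling pairs with one exposure and one outcome each. Differencing within a pair cancels any factor the two siblings share, and pairs with identical exposure contribute nothing to the estimate. The function name and all numbers are hypothetical and are not drawn from any cited study.

```python
def within_pair_slope(pairs):
    """Estimate the exposure-outcome slope from within-pair differences.

    pairs: list of ((exposure_a, outcome_a), (exposure_b, outcome_b)).
    Hypothetical helper: regression through the origin of outcome
    differences on exposure differences.
    """
    numerator = 0.0
    denominator = 0.0
    for (x_a, y_a), (x_b, y_b) in pairs:
        dx = x_a - x_b  # shared family factors cancel in these differences
        dy = y_a - y_b
        numerator += dx * dy
        denominator += dx * dx
    if denominator == 0:
        raise ValueError("no exposure-discordant pairs; slope is undefined")
    return numerator / denominator


# Made-up data: outcome = family effect + 2 * exposure.
pairs = [
    ((1, 12), (3, 16)),  # family effect 10, discordant exposure
    ((2, 54), (2, 54)),  # family effect 50, concordant: contributes nothing
    ((0, 0), (1, 2)),    # family effect 0, discordant exposure
]
slope = within_pair_slope(pairs)
```

In this toy data the family effects differ widely across pairs, yet the within-pair slope recovers the constructed coefficient because those effects are shared within each pair. Confounders that are not shared by siblings, and measurement error, would not cancel in this way.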

Challenges moving forward.
No matter which study designs they use, researchers should be willing to tolerate and openly share statements of uncertainty and to publish their results, regardless of characteristics like statistical significance or consistency with the present zeitgeist. Unfortunately, doing so can present challenges for publishing results.
After conducting a systematic review of the literature to examine the relationship between built environments and physical activity or obesity rates, a group of researchers (including authors on this paper) identified the need for higher-quality evidence, noting:
"Recognizing that experimental studies are potentially not feasible in many situations, researchers should look for opportunities to employ quasi-experimental designs. One example of such designs is the difference-in-difference approach that seems particularly applicable for the study of environmental changes such as the addition of a greenway to a neighborhood." (Ferdinand et al. 2012)
Upon learning that a new park would be erected in downtown Birmingham, AL, USA, the same group of researchers decided to study its impact on the body mass index (BMI) of children living nearby (Goldsby et al. 2016). They extracted changes in BMI from electronic health records collected by downtown Birmingham clinics and tested whether children living closer to the new park exhibited changes in BMI, pre- and post-park, relative to those of children who lived farther away from the park.
Using difference-in-difference statistical modeling, they investigated park proximity and its association with children's BMI. The main takeaway? "Proximity to a park was not associated with reductions in BMI z-score" (Goldsby et al. 2016).
The researchers were forthcoming about their study's limitations:
"The sample sizes of the near groups were relatively small, potentially limiting the power to identify significant differences between groups. Having more children in the near groups would have been ideal, but being able to examine BMI longitudinally, even in a small group of children, provides valuable information for other obesity researchers and policymakers working to address the U.S. obesity epidemic." (Goldsby et al. 2016)
However, it took myriad attempts for the researchers to find a journal willing to publish their findings. This difficulty may have been in part because the results ran counter to conventional thinking about the potential health benefits of proximity to green spaces. It may also have been in part because, although common to other fields of study, the researchers' more rigorous, quasi-experimental approach was generally unfamiliar to reviewers.
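As a minimal sketch of the difference-in-difference logic, the estimate is the pre-to-post change in the group near the park minus the pre-to-post change in the group farther away. The function name and the BMI z-scores below are hypothetical illustrations, not data or code from Goldsby et al. (2016), whose analysis used statistical modeling rather than this simple comparison of group means.

```python
from statistics import mean


def did_estimate(near_pre, near_post, far_pre, far_post):
    """Difference-in-difference of group means (hypothetical helper)."""
    near_change = mean(near_post) - mean(near_pre)
    far_change = mean(far_post) - mean(far_pre)
    # Change in the exposed group beyond the change seen in the comparison group.
    return near_change - far_change


# Made-up BMI z-scores, for illustration only.
near_pre = [0.9, 1.1, 1.0]
near_post = [1.0, 1.2, 1.1]
far_pre = [0.8, 1.0, 0.9]
far_post = [0.9, 1.1, 1.0]

estimate = did_estimate(near_pre, near_post, far_pre, far_post)
```

Here both groups change by the same amount, so the estimate is approximately zero, which is the pattern a null finding would show. The approach assumes that, absent the park, the two groups would have followed parallel trends.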
Nonetheless, null results still offer value. According to Reproducibility and Replicability in Science, produced by the National Academies of Sciences, Engineering, and Medicine (NASEM), "The advent of new scientific knowledge that displaces or reframes previous knowledge should not be interpreted as a weakness in science" (2019). On the contrary, such occurrences are a function of science's "continuous process of refinement to uncover ever-closer approximations to the truth." Rather than put forward yet another simple observational study, the group of researchers highlighted a different method, raised new questions, and created a roadmap for continued exploration. It is in these ways that the body of nutritional epidemiological knowledge can be made to move forward.

Recommendations
• Self-reporting tools have utility for some uses; however, when possible, self-report should be used in conjunction with additional, objective means of evidence validation, and should be avoided when invalid or unfit for a particular use. Increasing the accessibility of information about available objective measurements, their appropriate uses, and their relative costs could facilitate their increased use.
• Blending varying degrees of automation with traditional, observational studies can improve the quality of self-reported data. Ideally, investigators should have access to completely independent methods of determining food intakes, comprehensive analyses of the chemical compositions of foods that include ranges of nutrient variability, and fully independent methods of assessing specific nutrient intakes.
• Researchers should continually seek additional biomarkers and other new technologies and methods for collecting objective data. Multi-disciplinary partnerships and interactions may be able to hasten improvements in available objective measurements. Institutions and funders should prioritize the development, training, and use of such improvements.

Discussion summary
Status quo: Self-report. The field's continued reliance on substandard measurements has hindered progress in nutritional epidemiology. In particular, traditional observational methods such as self-reporting remain mainstays, despite their potential for inaccuracy.
Consider, for instance, the assessment of energy intake using self-reporting. Self-reported energy intake was first compared with doubly-labeled water (a biomarker of habitual energy intake) in 1986. The self-reported measure underestimated energy intake by 34% in women with obesity (Prentice et al. 1986).
Moreover, systematic reviews have since identified 59 studies, including 6,298 adults, and 15 studies in 664 children that compared energy intake from food diaries, 24-hour recalls, or food frequency questionnaires (FFQs) with doubly-labeled water energy expenditure (Burrows et al. 2019; Walker, Ardouin, and Burrows 2018). Underreporting of energy intake averaged about 20%, but varied from 1% to 67% across these studies. Underreporting was common in participants older than eight years, increased with body mass index, and was found in countries at all stages of economic development. Attempts to reduce bias in energy intake estimates by excluding extreme values, for example, the use of "Goldberg cutoffs," have been shown to be unreliable (Ejima et al. 2019). These problems with self-report-based estimates of energy balance have resulted in calls to discontinue their use in the calculation of actual energy balance (Dhurandhar et al. 2015). Yet, their use continues.
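The underreporting percentages above can be read as the shortfall of reported intake relative to energy expenditure measured by doubly-labeled water. The sketch below assumes weight-stable participants, for whom expenditure approximates habitual intake; the function name and the example values are hypothetical.

```python
def percent_underreporting(reported_intake_kcal, dlw_expenditure_kcal):
    """Shortfall of self-reported intake relative to DLW expenditure, in percent.

    Assumes weight stability, so that measured expenditure stands in for
    true habitual intake (an assumption of this sketch).
    """
    if dlw_expenditure_kcal <= 0:
        raise ValueError("expenditure must be positive")
    shortfall = dlw_expenditure_kcal - reported_intake_kcal
    return 100.0 * shortfall / dlw_expenditure_kcal


# Hypothetical participant: reports 2000 kcal/day, expends 2500 kcal/day.
example = percent_underreporting(2000, 2500)
```

For these made-up values the shortfall is 500 of 2500 kcal, or 20%. A negative result would indicate overreporting.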
Studies using biomarkers of protein, potassium, and sodium intake have found that underreporting occurs most for foods characterized by lower protein and sodium content, which is consistent with selective underreporting of high-fat, high-sodium snack and savory food items. These deficiencies were first reported over 35 years ago and have been confirmed in multiple studies. Yet the use of self-report continues.
The FFQ, which is most commonly applied in large cohort studies, does not accurately estimate frequency of intake or gauge serving size (Willett et al. 1987). Correlations of intake data with a limited number of plasma biomarkers suggest mostly weak to moderate associations among large groups (Cade et al. 2004). Correction for energy intake might account for different energy needs. However, energy intake from FFQs is invalid (Dhurandhar et al. 2015), and thus correcting for it calls into question all extrapolated nutrient data, especially for extrapolation of risk for individuals (Krall and Dwyer 1987). There do not appear to be any modifications of the FFQ that would produce accurate and precise data (Kipnis et al. 2002). The clearest benefit of the use of FFQs is the collection of some degree of dietary information before disease occurs, precluding reverse causality by accounting for a temporal relationship.
Dietary intake assessments frequently depend on conscious recall of the foods being studied. The data on the contents of all of the nutrients and bioactive substances present in the reported foods being consumed are also a source of error. And so are the data on the variability of contents due to individual cultivars, harvesting and storage conditions, food preparation and cooking methods, and the relationships of these to the lifestyle, behavioral, and environmental variables (sometimes referred to as an "exposome") that co-vary with dietary intakes.
Despite such well-documented but frequently unaddressed shortcomings, the use of self-reported dietary assessment instruments remains commonplace. Proponents of the continued use of self-reported nutritional data often surmise that using such methods is better than nothing, given the importance of diet in the maintenance of health and development of disease (Satija et al. 2015). Unfortunately, when used in isolation, self-reporting is a rather blunt instrument, one that can limit the scope of a study's design and negatively impact the nature and quality of the research questions pursued. Indeed, in the example of self-reported energy balance, the direct use of self-report has been worse than nothing (Dhurandhar et al. 2015).

Utility of self-report.
That is not to say there is no longer any room for self-reporting. Two of the most common methods for assessing dietary intake, 24-hour recalls and FFQs, can be implemented in many ways, and some of these implementations, like the Automated Multiple Pass Method (AMPM) and the ASA24 (adapted version of AMPM for self-administration), perform better than others (Moshfegh et al. 2008).
Originally developed by the U.S. Department of Agriculture (USDA)/Agricultural Research Service (ARS) for the National Health and Nutrition Examination Survey (NHANES), the AMPM puts energy intake within 3% of estimates from doubly-labeled water in people with BMI < 25 (Moshfegh et al. 2008). However, with increasing degree of overweight, reported intake decreases to 80% of true intake, so the overall population estimate is 89% of actual energy requirement. Additional validation of sodium intake, which correlates strongly with energy in the diet, showed 90% recovery in urine samples (Rhodes et al. 2013). Even these levels may not be adequate for usual weight changes but may be enough for macronutrients.
NHANES uses two nonconsecutive recalls to estimate intake. The first of these, a face-to-face meeting with trained personnel, uses three-dimensional models to estimate food serving sizes. The second interaction occurs via telephone and uses actual-size, two-dimensional food images in booklets provided to participants. It is well understood that older-style, 24-hour recalls must be administered multiple times to estimate nutrient intake in individuals, whereas a much smaller number of recalls is needed to approximate population averages, which is the proper use of NHANES data (Basiotis et al. 1987). NHANES data are cross-sectional, and thus provide weak causal evidence, even when repeated over time.
Because it takes 30 minutes to complete and is administered in the NHANES setting, the 24-hour AMPM may be ill-suited to large cohort studies due to time and cost. But the National Cancer Institute (NCI) has developed an internet-based, self-administered version of the AMPM called the Automated Self-Administered 24-hour (ASA24) Dietary Assessment Tool. ASA24 similarly may afford more standardization than other, traditional observational methods (Subar et al. 2012). Investigators can also rely on limited biomarkers for other nutrients to assess the performance of these diet assessment tools.

Moving beyond self-report in isolation.
Combining traditional observational methods with more objective measurements can greatly boost a study's utility, but in deciding which objective measurements to include in a design, researchers across scientific disciplines face the same trilemma. The ideal tool would provide measurements that are 1) accurate and precise, 2) detailed, and 3) frequent. However, the fundamental nature of a trilemma is that it is impossible to secure all three equally and simultaneously.
This notion certainly applies to dietary measurements. Case in point: while food diaries, 24-hour recalls, and FFQs all focus on dietary detail, that emphasis on detail inherently reduces both the accuracy and frequency of measurement. On the other hand, a tool such as doubly-labeled water provides accurate measures of metabolizable energy, but little in the way of detail (Speakman et al. 2021). For some diet-disease relationships, knowledge of specific nutrient intake is important. However, for obesity, the most prevalent diet-related condition in the United States, no clear dietary intervention prevails for long-term efficacy. For obesity, measurement tools need to focus on accuracy and frequency of measurement of energy intake.
Some of the tools nutritional epidemiologists can leverage to collect data more accurately and with more frequency include wearable devices developed to detect and measure consumption (Hoover and Sazonov 2016). These devices can reduce underestimation of energy intake by providing an objective measure that does not rely on self-reporting (Salley et al. 2016).
For example, in one study of automated bite counting that used wrist-motion tracking, estimates of energy intake were significantly more accurate than a guess, and automated bite counting was comparable to human estimates using a detailed menu. By automating the measurement process, these devices also reduce cognitive load and, hence, user burden (Weathers, Siemens, and Kopp 2017). This, in turn, helps to increase frequency of measurement.
Although these wearables may be designed to estimate energy intake accurately, they can offer poor precision, in some cases because they treat all foods equally. One study of 30 subjects that used a sensor to measure chews and swallows found an average error of approximately 30% of energy intake per validation meal and 16% for training meals, compared with investigator-measured intake (Fontana et al. 2015). The study also used estimates from photographic food records, which showed approximately 20% error in both cases. Another study of 77 people compared automatic bite count with kilocalories measured by use of 24-hour recall over a two-week period (2,975 meals/snacks) and found a per-meal correlation of 0.53 (Scisco, Muth, and Hoover 2014). While the self-reported intake in the latter study was subject to the issues already discussed with self-report, in both of these studies, the average per-meal accuracy was high, at the expense of lower per-meal precision.
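The distinction between average accuracy and per-meal precision can be made concrete with a small simulation. The sketch below is illustrative only: the function name, the zero bias, and the 30% noise level are assumptions chosen for the example, not values taken from the cited studies.

```python
import random
import statistics


def simulate_relative_errors(n_meals, bias, noise_sd, seed=0):
    """Simulate per-meal relative errors, (estimate - truth) / truth.

    bias and noise_sd are hypothetical fractions of true intake.
    """
    rng = random.Random(seed)
    return [bias + rng.gauss(0.0, noise_sd) for _ in range(n_meals)]


# Hypothetical device: unbiased on average, but noisy for any single meal.
errors = simulate_relative_errors(n_meals=3000, bias=0.0, noise_sd=0.30)

# Accuracy: signed errors nearly cancel when averaged over many meals.
mean_signed_error = statistics.fmean(errors)

# Precision: the typical size of the miss for one meal stays large.
mean_absolute_error = statistics.fmean([abs(e) for e in errors])

print(f"mean signed error:   {mean_signed_error:+.3f}")
print(f"mean absolute error: {mean_absolute_error:.3f}")
```

Under these assumptions the averaged error is close to zero while the typical single-meal error remains a sizable fraction of intake, which is the pattern described above.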
Although these examples focused on energy intake, other technologies exist (e.g., photogrammetric approach, continuous blood glucose monitoring, metabolomic profiles). Each case sacrifices parts of the trilemma. A bite counter gives (on average) accurate energy intake, details on consumption and chewing patterns, and frequent measurement, but lacks information on other characteristics of the food or meal. Photogrammetric approaches provide a richer detail of meal content and context, but do not provide the same information on eating rate and may require user input to calibrate information for specific meals. Despite their limitations, however, the newer technologic and biomarker approaches provide objective measurements that are not dependent on self-report.
Addressing nutritional epidemiology's data-related challenges may entail developing entirely new ways of measuring food consumption, assessing specific nutrient intakes, and analyzing the chemical compositions of foods more accurately. Collecting data via a mix of more objective measurements and using increasingly robust study designs will help to advance nutritional epidemiology. But for these changes to be truly effective, additional reforms related to the analysis of these data are also needed.

Recommendations
• The relationship of dietary factors to numerous potential confounders, such as age, sex, education, and income, should be determined, and uniform standards developed to include and address these.
• Investigators should use multiple analytical methods, including appropriately robust and sometimes novel statistical tools, to mitigate biases common to simple observational studies.
• To resolve the complex problem of innumerable interacting variables in the exposome, investigators should seek information technology approaches to the investigation, reduction, and interpretation of data.

Discussion summary
The factors discussed in "Stronger Designs" and "Stronger Measurements" allow the field of nutritional epidemiology to employ strategies that result in stronger prediction and causal inference (Imbens and Rubin 2015; Rosenbaum and Rubin 1983; Morgan and Winship 2015). Yet many investigations still use OATs to explore the relationship between X and Y, in which an investigator might speculate that X causes Y. After observing the association of X and Y, a potential confounder, Z, arises. So the investigator measures and controls for the presumed confounder, Z. What is the problem with this method? In addition to the quality of inference depending critically on the quality of the measurement (discussed above), failure to model the functional form correctly can at best only partially reduce bias and at worst contribute to it (Westfall and Yarkoni 2016). Reliance on just one approach for ruling out alternative explanations can leave invalidities or biases behind. When multiple methods are incorporated, they can greatly reduce the number of alternative explanations. In this section, we touch on three potential concerns for inference: within- versus between-subject effects, incorporating complexities of new measurement methods, and stronger inferential analyses.

Within- versus between-subject effects.
Analysis of "between-person" and "within-person" data can provide clarity to research questions. To the extent observational data are used to support causal inference, assumptions must be considered for how X (the independent variable) was "assigned" in the population, such as the level of analysis at which the variables are covarying.
Often, X, Y, and Z data are collected on a sample of people and, so, the covariation matrix represents the between-person covariation. However, if the variables were collected as repeated measures on an individual person, then the covariation represents correlated changes within the person. This distinction of between- and within-person covariance is not trivial. The two levels of analysis are fundamentally different, and only under special conditions can inferences at one level be extended to the other.
People are complex, dynamic biological systems that evolve over time. Nutritional epidemiology is interested in the covariation of certain variables relevant to health. However, the data usually represent the between-person covariation of various factors, yet the inferences are usually intended to be at the within-person level. The implication is that changes in one variable might lead to (that is, cause) changes in another variable within a person. But the covariance matrix of X, Y, and Z can look very different at the between- and within-person levels. For many biological processes, there is abundant homogeneity in the human population. But individual variation and path dependencies exist as well. Growth, development, and learning are all non-stationary processes (Molenaar 2004).
To the extent that nutrition-related exposures are analyzed to show correlations, the analysis takes place at the between-person level and does not necessarily capture within-person correlations. As one example, Forbes (1984) re-analyzed food diary data and body weight data that seemed perplexing because of an apparent lack of relationship between the two. But, when he plotted within-person change in intake against within-person change in body mass, he observed a nearly perfect correlation.
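The contrast between the two levels of analysis can be illustrated with simulated data. The sketch below is a toy example under stated assumptions, not a reconstruction of the Forbes (1984) data: each hypothetical person has a usual intake and a usual body weight drawn independently of one another, while within each person changes in intake drive changes in weight. All names and parameter values are invented for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_people(n_people=200, n_periods=10, seed=1):
    """Return one (intake series, weight series) pair per simulated person."""
    rng = random.Random(seed)
    people = []
    for _ in range(n_people):
        usual_intake = rng.gauss(2200, 300)  # kcal/day, hypothetical
        usual_weight = rng.gauss(75, 12)     # kg, independent of usual intake
        intake, weight = [], []
        for _ in range(n_periods):
            d_intake = rng.gauss(0, 200)
            # Within a person, weight tracks changes in intake.
            d_weight = 0.01 * d_intake + rng.gauss(0, 0.3)
            intake.append(usual_intake + d_intake)
            weight.append(usual_weight + d_weight)
        people.append((intake, weight))
    return people


people = simulate_people()

# Between-person level: correlate each person's average intake and weight.
mean_intake = [sum(i) / len(i) for i, _ in people]
mean_weight = [sum(w) / len(w) for _, w in people]
between_r = pearson(mean_intake, mean_weight)

# Within-person level: correlate deviations from each person's own mean.
dev_intake, dev_weight = [], []
for (intake, weight), mi, mw in zip(people, mean_intake, mean_weight):
    dev_intake.extend(v - mi for v in intake)
    dev_weight.extend(v - mw for v in weight)
within_r = pearson(dev_intake, dev_weight)

print(f"between-person r = {between_r:.2f}")
print(f"within-person r  = {within_r:.2f}")
```

With these assumed parameters the between-person correlation is near zero even though the within-person correlation is strong, so a cross-sectional analysis of the same data would miss the relationship.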
Although worthwhile, within-person data can be more expensive to gather and more complex to analyze. They sometimes necessitate intensive longitudinal measures sufficient to estimate the covariance structure. They also require statistical methods such as time series, vector auto-regressive models, and hidden Markov models, which are not taught in many graduate programs.
Additionally, conducting within-person analyses can involve feedback among the X, Y, and Z variables. Should this take place at the within-person level, such feedback can dampen correlations observed at the between-person level.
Although within-person studies may require substantial effort to conduct, it is important to collect the appropriate data to support the intended inferences. To be clear, there are many questions that are well-answered with between-person data, such as any health factor where human response is highly uniform. However, these cases are the ones in which the conditions of homogeneity of process are best met. The relevance of between-person covariance to within-person inference is an assumption that must be considered to support more robust research claims.

Incorporating complexities of new measurement methods.
Whether OATs or more advanced analyses are used, automated measures such as wearable physical activity monitors can introduce assumptions that also must be considered and corrected for. While the use of newer automation tools in tandem with more traditional observational study methods can help to provide researchers with greater accuracy and clarity, the potential presence of measurement error cannot be discounted.
Nutritional epidemiology lacks uniform approaches to handling such a mountain of exposome data and their relationship to nutrition or health outcomes. Attributing a health risk to a single food or nutrient, as nutritional epidemiology does in studies that often dominate the list of most 'popular' nutrition articles, is no longer entirely defensible given the food-to-exposome, food-to-food, and food-to-other relations among variables. Relying solely on self-reporting as the method of observation is even less defensible, but some of the same or related complexities arise in more objective measurements.
Consider, for example, the complex, high-dimensional data collected from physical activity monitors. Ideally, researchers would be able to explore the full algorithms used to generate the end-user data that physical activity monitors provide. But, because manufacturers of commercial-grade wearable technology largely deem these equations proprietary, these algorithms are difficult to obtain except from research-grade devices. Nevertheless, it is possible to account for and resolve measurement error associated with wearable devices, but classic regression methods may be inadequate for this purpose.
Recently, researchers developed and applied novel statistical modeling to correct for wearable device measurement error in a childhood obesity study (Tekwe et al. 2019). Those authors note: "In this setting, we considered a scalar valued outcome with a functional covariate that was corrupted by measurement error. Most existing methods either implicitly assume the measurement errors are independent over time, or the measurement error covariance is known or can be estimated. However, the measurement errors are likely to be correlated over time. In addition, the measurement error variances are never known and estimates are seldom available. In this paper, we took advantage of the additional information provided in an instrument variable and developed a generalized method of moments-based approach to identify and consistently estimate the functional regression coefficient" (Tekwe et al. 2019). The researchers illustrated that ignoring measurement error can lead to biased estimates. They add, "We successfully applied our proposed model to conclude that the estimated association between baseline measures of energy expenditure and the 18-month change in BMI was sometimes significant. This association indicated that school programs and policies that increase physical activity among students might have some beneficial impact… Our developed methods improve on the current statistical approaches used to evaluate the effectiveness of such policies" (Tekwe et al. 2019).
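Why ignoring measurement error biases estimates can be seen in a much simpler setting than the functional model of Tekwe et al. The sketch below is not their method. It assumes a single scalar exposure, classical measurement error that is independent across the two measurements, and a second error-prone measurement used as an instrument. All variable names and values are hypothetical.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


rng = random.Random(2)
n = 5000
true_slope = 0.5  # assumed effect of the true exposure on the outcome

true_exposure = [rng.gauss(0, 1) for _ in range(n)]
# Device reading: true exposure plus independent measurement error.
device = [x + rng.gauss(0, 1) for x in true_exposure]
# Instrument: a second, independently error-prone measure of the exposure.
instrument = [x + rng.gauss(0, 1) for x in true_exposure]
outcome = [true_slope * x + rng.gauss(0, 1) for x in true_exposure]

# Naive regression of the outcome on the device reading is attenuated
# toward zero because the error inflates the variance of the regressor.
naive_slope = covariance(device, outcome) / covariance(device, device)

# Instrumental-variable estimate: the instrument's error is independent of
# the device's error, so the ratio of covariances recovers the slope.
iv_slope = covariance(instrument, outcome) / covariance(instrument, device)

print(f"true slope  = {true_slope}")
print(f"naive slope = {naive_slope:.2f}")
print(f"IV slope    = {iv_slope:.2f}")
```

In this setup the naive slope is roughly half the true value, because the error variance equals the exposure variance, while the instrumental-variable estimate is close to the true slope. Errors that are correlated over time, as the quoted authors describe, require the more elaborate machinery they developed.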
Multiple analytical strategies used together, including appropriately robust statistical tools, can also mitigate residual confounding, reverse causation, and other biases that commonly plague observational studies (Davey Smith and Ebrahim 2003). Some of these tools have been popularized in other disciplines, and nutritional epidemiology can learn from the groundwork laid by others. We introduce a few such approaches below.

Stronger inferential analyses.
Widespread genetic testing in large-scale cohorts promises statistical power sufficient for generating stronger (and often polygenic) analyses for nutritional exposures. These genetically informed analyses are useful for investigating genetic associations and producing more individualized nutrient-outcome predictions. More important to the discussion of causation, Mendelian randomization (MR) using genetic information can be used in these large-scale cohorts to link nutritional exposures to health outcomes.
An adaptation of the instrumental variable approach, MR relies on the genotype as a valid proxy for nutritional (or other types of) exposure and quantifies the causal effect of this proxy on the outcome of interest. Because genotype necessarily precedes any disease outcome, MR conclusions are impervious to reverse causality. Furthermore, because of Mendel's law of independent assortment, both (unlinked) measured and unmeasured confounders are, on average, similarly distributed across genotype/exposure groups, thereby reducing the likelihood of bias due to confounding.
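The logic of using genotype as an instrument can be sketched with the simplest MR estimator, the Wald ratio: the genotype-outcome association divided by the genotype-exposure association. The simulation below is a toy illustration under assumed values (allele frequency, effect sizes, and a single unmeasured confounder are all hypothetical), and it presumes the MR assumptions hold by construction.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


rng = random.Random(3)
n = 40000
true_effect = 0.3  # assumed causal effect of exposure on outcome

genotype, exposure, outcome = [], [], []
for _ in range(n):
    # Number of effect alleles (0, 1, or 2), assigned independently of
    # the confounder, mimicking random assortment at conception.
    g = int(rng.random() < 0.3) + int(rng.random() < 0.3)
    u = rng.gauss(0, 1)  # unmeasured confounder
    x = 0.5 * g + u + rng.gauss(0, 1)
    y = true_effect * x + u + rng.gauss(0, 1)
    genotype.append(g)
    exposure.append(x)
    outcome.append(y)

# Observational slope: biased upward because u raises both x and y.
naive_slope = covariance(exposure, outcome) / covariance(exposure, exposure)

# Wald ratio: genotype-outcome association over genotype-exposure association.
wald_ratio = covariance(genotype, outcome) / covariance(genotype, exposure)

print(f"true effect = {true_effect}")
print(f"naive slope = {naive_slope:.2f}")
print(f"Wald ratio  = {wald_ratio:.2f}")
```

Even with 40,000 simulated participants the Wald ratio is noticeably noisier than the naive slope, which reflects how a modest genetic effect on the exposure limits statistical power.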
To date, MR has been successfully used to investigate causal effects of exposures related to alcohol and obesity (Au Yeung et al. 2012; Winter-Jensen et al. 2020). In other studies, the causal effect of dairy consumption on a variety of cardiometabolic outcomes was successfully estimated by using lactase persistence polymorphism (LCT-12910C > T) (Mendelian Randomization of Dairy Consumption Working Group 2018; Vissers et al. 2019). LCT-12910C > T has been shown to be a reliable proxy for dairy intake, although its effectiveness as such may vary by population (Chin et al. 2019).
The validity of such MR conclusions generally is predicated on three assumptions. First, a genotype must serve as a strong proxy for the exposure that it is purported to represent. This assumption is often tenuous in nutritional epidemiology, especially for the most controversial exposures such as red meat, eggs, industrially-processed food, or sugar-sweetened beverages. Because of the limited magnitude of genetic effects, MR studies often require very large samples to achieve sufficient statistical power. Yet, even with well-powered studies, confounding by total energy intake remains an almost intractable possibility, threatening the validity of the resulting findings.
The second assumption precludes any horizontally pleiotropic effects of the genotype on the outcome. This condition may be tested using commonly implemented statistical methods and replaced by more lenient assumptions in some MR models (Haycock et al. 2016).
The third assumption excludes any confounding of the relationship between the genetic proxy and the disease outcome and is not directly verifiable. Even with its caveats, MR is another potentially useful analytical tool for nutritional epidemiology. Training in best practices should include selecting appropriate nutritional exposures and their genetic proxies, testing of MR assumptions, choosing appropriate statistical models, and establishing reproducibility of MR findings.
Another approach, used more frequently by econometricians to uncover previously unmeasured biases, uses statistical analyses to create an empirical distribution of non-causal associations. That is, a model is run on the exposure-outcome relationship of interest (e.g., a food's relationship to cardiovascular disease), and on relationships that are not expected to be causally related, which are treated as controls. Finding an association in these presumed non-causal, control relationships may indicate that a common bias explains the association in both the relationship of interest and the control relationships. A related comparison has been used to discuss the causal evidence behind a diet-mortality association (Klurfeld 2015), and investigators on the present paper are involved in using a generalized method sometimes referred to as empirical p-value calibration (Schuemie et al. 2014) to investigate nutrient-disease relationships.
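The idea of judging an estimate against an empirical distribution of presumed non-causal associations can be sketched as follows. This is a deliberately crude stand-in, not the calibration procedure of Schuemie et al. (2014), which models systematic error while accounting for each estimate's standard error. Here the empirical p-value is simply the share of negative-control estimates at least as extreme as the estimate of interest. All numbers are hypothetical.

```python
from statistics import NormalDist


def traditional_p(log_rr, se):
    """Two-sided p-value against a null centered on zero (no bias assumed)."""
    z = abs(log_rr) / se
    return 2 * (1 - NormalDist().cdf(z))


def empirical_p(log_rr, negative_control_log_rrs):
    """Share of negative-control estimates at least as extreme in magnitude.

    The +1 terms keep the p-value away from exactly zero.
    """
    extreme = sum(1 for nc in negative_control_log_rrs if abs(nc) >= abs(log_rr))
    return (extreme + 1) / (len(negative_control_log_rrs) + 1)


# Hypothetical log relative risks for exposure-outcome pairs believed to be
# non-causal. They center above zero, suggesting a shared upward bias.
negative_controls = [
    0.05, 0.22, 0.18, -0.03, 0.30, 0.12, 0.25, 0.09, 0.27, 0.15,
    0.20, 0.33, 0.02, 0.17, 0.24, 0.11, 0.29, 0.08, 0.21,
]

# Hypothetical estimate of interest and its standard error.
estimate, standard_error = 0.26, 0.10

p_trad = traditional_p(estimate, standard_error)
p_emp = empirical_p(estimate, negative_controls)

print(f"traditional p = {p_trad:.4f}")
print(f"empirical p   = {p_emp:.2f}")
```

In this invented example the estimate looks statistically significant against a zero-centered null, yet it is unremarkable relative to associations that are presumed to be non-causal, which would point toward a common bias.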
Yet another approach relates to the flexibility in choices in OATs, in which selection biases (intentional or unintentional) in choosing covariates (e.g., age, sex) and operationalization of dietary variables (e.g., dichotomous, continuous) may result in substantially different results. Rather than trying to identify one "best" model, investigators can test the robustness of the analysis across many different legitimate analytical choices. In the simplest form, this is done in OATs by modeling a bivariate relationship, the selectively adjusted model, and the "kitchen sink" (or inclusion of all covariates) model. A multiverse of analyses (Steegen et al. 2016), also called vibration of effects (Patel, Burford, and Ioannidis 2015) or specification curve analyses (Simonsohn, Simmons, and Nelson 2019), extends this to test many different model specifications. If the results are not robust to these choices, the nature of any causal relationship between the exposure and outcomes of interest comes into question.
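A minimal version of this robustness check fits the same exposure-outcome model under every subset of candidate covariates and compares the exposure coefficient across specifications. The sketch below uses simulated data in which the exposure has no true effect and one covariate is a confounder. The covariate names, effect sizes, and the small least-squares helper are all assumptions made for illustration, and published multiverse or specification curve analyses vary many more choices than covariate sets.

```python
import itertools
import random


def ols(columns, y):
    """Least-squares coefficients of y on the columns, intercept first."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [v - factor * w for v, w in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


rng = random.Random(4)
n = 2000
age = [rng.gauss(0, 1) for _ in range(n)]     # confounder (standardized)
income = [rng.gauss(0, 1) for _ in range(n)]  # unrelated covariate
exposure = [0.8 * a + rng.gauss(0, 1) for a in age]
outcome = [1.0 * a + rng.gauss(0, 1) for a in age]  # no exposure effect

covariates = {"age": age, "income": income}

# Fit every covariate subset and record the exposure coefficient.
curve = {}
for size in range(len(covariates) + 1):
    for names in itertools.combinations(covariates, size):
        columns = [exposure] + [covariates[name] for name in names]
        curve[names] = ols(columns, outcome)[1]  # index 0 is the intercept

for names, coefficient in curve.items():
    label = ", ".join(names) if names else "unadjusted"
    print(f"{label:>12}: {coefficient:+.3f}")

spread = max(curve.values()) - min(curve.values())
```

Here the exposure coefficient is clearly positive in specifications that omit the confounder and close to zero in those that include it, so the spread across specifications signals that the association is not robust.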
Combining these stronger analytical approaches, considering the appropriate level of inference (e.g., between-person and within-person designs), and applying novel statistical approaches to datasets are just some of the many supplemental methods nutritional epidemiology could, and should, be using to move beyond OATs. Doing so is integral to the management and mitigation of alternative explanations and can serve to strengthen nutritional epidemiology's contributions to science.

Recommendations
• Nutritional epidemiology should adhere to reporting guidelines (e.g., CONSORT and STROBE-nut).
• To prevent selective non-reporting of studies and results, investigators should register research prospectively (e.g., on ClinicalTrials.gov) and report results for all outcomes and analyses.
• To improve transparency and openness, investigators should share research materials, data, and code.
• To promote scientifically appropriate interpretations, researchers should avoid "spin" in scientific reports and press releases and identify limitations associated with their findings.

Discussion summary
Considering causal inference in nutrition.
What it means to have cause and effect is the same whether an investigator is considering chemical reactions in a tube, pharmaceuticals in people, or social determinants of health. What differs is the ability to probe those questions with gold-standard, causal methodology. The difficulty in answering causal questions in nutrition has resulted in some authors proposing lowering the field's standard of evidence (Schwingshackl et al. 2016). This includes elevating or disregarding problems with limited-quality assessments, such as FFQs; trusting nonrepresentative, qualitative evidence as causal evidence; and assigning arbitrary point values to (misinterpreted) heterogeneity across studies in nutrition science.
One approach, NutriGrade, suggested down-weighting evidence based on disclosure statements or affiliations (Schwingshackl et al. 2016). However, evaluating evidence based on disclosures is untenable given inherent biases in the field regardless of funding. Interpreting science should be limited to the data, the methods, and the logic connecting the data and methods to the results (Brown, Kaiser, and Allison 2018). Indeed, members of the Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group rebutted incorporation of funding bias in a nutrition-specific grading scheme, stating, "There is no plausible rationale or supporting evidence to justify their approach to include funding bias as a separate item" (Meerpohl et al. 2017). Evidence should be evaluated based on the science, not the scientists.
One challenge in evaluating evidence in a "hierarchy of evidence" is the implicit or explicit assumption that studies are addressing the same question, but often they may not be. Any description of the strength of evidence must include a clear delineation of the research question being asked. In evidence-based medicine, this is often represented by PICOTS elements. These include the Population being studied; the Intervention or exposure thought to cause an effect; the Comparator or Control, which is an alternative to the exposure; the Outcome, which is the health state being assessed; the Time at which the outcome is being assessed, and the Setting or Study design, which includes description of an experimental setting and type.
Frequently, a "hierarchy of evidence" is constructed with meta-analyses of randomized trials perched at the top and with observational evidence, animal studies, and in vitro studies descending the pyramid. Yet, a randomized trial will frequently investigate a well-characterized exposure (the I in PICOTS), such as a defined or specified nutrient or diet, but the trial will evaluate an intermediate outcome, such as blood cholesterol rather than atherosclerosis.
On the other hand, some of the more useful observational studies may use poorly characterized exposures-such as self-reported food frequency estimates extrapolated to actual nutrient quantities-but measure the actual outcome of interest, such as ischemic stroke. In those examples, the randomized controlled trial has a high-quality exposure and inferential design, but it fails to evaluate directly the outcome of interest.
Meanwhile, the observational study has a low-quality exposure and inferential design, but it does study the outcome of interest.
Systems such as GRADE explicitly consider the strength of evidence in causal health claims (Guyatt et al. 2006). Meta-analyses of high-quality randomized controlled trials often lead to the greatest certainty ratings, given their key importance for average causal effects.
Te Morenga and colleagues (2012) communicated the challenge of evaluating nutrition science using approaches like GRADE, by noting that nutrition research may be subject to "potential bias, inconsistency, indirectness, imprecision or reliance on study type other than randomized trials", which results in the downgrading of evidence. They suggested that "formally identifying effects which are regarded as important and based on high quality evidence using the GRADE system may be unattainable in the context of nutritional determinants of chronic disease" (Te Morenga, Mallard, and Mann 2012).
This sentiment was echoed by others in response to the publishing of NutriGrade (a nutrition-specific alternative to, rather than an extension of, GRADE), in which members of the GRADE Working Group remarked that "…lack of blinded randomized controlled trials and the resulting sparse bodies of randomized evidence is not a methodologic shortcoming of the GRADE approach but a limitation of the evidence base" (Meerpohl et al. 2017). The strength of causal evidence is therefore understood to be a property of the science, and nutritional epidemiology is not in some way exceptional.
In one argument against approaches such as GRADE, the authors of NutriGrade state their group is comprised of nutrition scientists, "whereas GRADE is historically composed of mostly clinical research scientists" and that other disciplines have "found that processing evidence in the clinical research compared with the public health research areas follows slightly different approaches" (Schwingshackl et al. 2017).
This seems to imply that nutrition should not be a clinical science and that public health should be held to lower standards of causal evidence. It is true that the strength of evidence in public health and nutritional epidemiology is, frequently, of lower causal strength compared with some other health-related fields. Rather than adopt low standards of evidence, individual scientists, journals, and scientific societies should embrace transparency and communicate the strengths and limitations of various approaches to nutrition research with greater clarity and nuance.
Example of applying GRADE to nutrition.
GRADE approaches have been successfully used to evaluate and communicate nutrition evidence. NutriRECS (Nutritional Recommendations and accessible Evidence summaries Composed of Systematic reviews), for instance, has evaluated multiple questions in the domain of nutrition using the GRADE approach, including named dietary patterns and weight or cardiovascular disease risk factors; red and processed meat and health outcomes; probiotics to prevent Clostridium difficile infection; and others (Ge et al. 2020; Johnston et al. 2019; Goldenberg, Mertz, and Johnston 2018). The approach demonstrates that guidance consistent with internationally-accepted standards can be achieved in the domain of nutrition.
However, recommendations coming from evidence summaries can sometimes be confusing, even if based on strong methodology at the core. For example, NutriRECS uses the GRADE "evidence to decision" framework (Alonso-Coello, Schünemann, et al. 2016; Alonso-Coello, Oxman, et al. 2016), which includes factors like how much stakeholders value the outcome and whether the intervention would be acceptable. The red and processed meat example mentioned earlier received considerable public attention. Although the work has been characterized as a successful use of the approach, the authors establish weak evidence, yet make an active recommendation to continue current levels of consumption based on the "evidence to decision" framework. The recommendation process thus risked conflating conclusions derived from the science with the decisions based on what the target population may prefer. An active recommendation to continue current practice might imply to the audience that changing consumption levels, in either direction, would be deleterious for health, as opposed to being most consistent with preferences.
Nonetheless, much of the public criticism of the NutriRECS conclusion was based on nutrition exceptionalism (i.e., that evidentiary standards should be different in nutrition), or perceived conflicts of interest (i.e., factors unrelated to the data, methods, or conclusions) (Qian et al. 2020; Neuhouser 2020; Leroy and Barnard 2020; Rubin 2020; Oreskes 2021; Vernooij et al. 2021).
Improving research reporting.
The strengths of a body of evidence cannot properly be evaluated if the collection of evidence is inadequately reported. All too often nutritional epidemiology is discredited through multiplicity and the selective non-reporting of studies and results. When combined, these may be the greatest contributor to the falsity of scientific claims (Goodman, Fanelli, and Ioannidis 2016).
Studies might evaluate the effect of a nutrient by calculating multiple primary and secondary outcomes. The number of results estimated in a study is a function of both the number of outcome definitions and the number of methods used to analyze those outcomes. Even more results are calculated when studies also evaluate the effects of multiple exposures (e.g., nutrients). Because some results will appear to be both clinically and statistically important by chance alone (i.e., false positives), conducting multiple studies and calculating multiple results leads to both true discoveries and false discoveries (Tannock 1996; Greenland 2008). One potential solution is, within a dataset, to make public the number of possible independent variables after accounting for their correlations, and the typical correlations between exposures and outcomes to place new results in their context (Patel and Ioannidis 2014).
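How quickly chance findings accumulate can be shown with a short calculation. The sketch below assumes that every test is independent and that every null hypothesis is true, which real outcomes and exposures rarely satisfy exactly. The counts of outcomes, methods, and exposures are hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test threshold that holds the family-wise rate at or below alpha."""
    return alpha / n_tests


# Hypothetical study: 3 outcome definitions x 4 analysis methods x 5 exposures.
n_results = 3 * 4 * 5

fwer = familywise_error_rate(n_results)
threshold = bonferroni_threshold(n_results)

print(f"results calculated:          {n_results}")
print(f"P(at least one false positive): {fwer:.2f}")
print(f"Bonferroni per-test threshold:  {threshold:.5f}")
```

Under these assumptions, a study reporting 60 results at the conventional 0.05 threshold would be expected to produce at least one false positive about 95% of the time. Bonferroni correction is only one of several ways to account for this and is conservative when results are correlated.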
Additional methods, as discussed previously, have been developed to assess results from the spectrum of model specifications simultaneously to test the robustness of reasonable analytical choices (Steegen et al. 2016; Simonsohn, Simmons, and Nelson 2020; Patel, Burford, and Ioannidis 2015). Such methods may mitigate bias from the numerous researcher choices during the analysis phase (Gelman and Loken 2013).
Results in journal articles might be systematically biased if they include a disproportionate number of "positive" results, and if the "negative" (e.g., non-significant) results are disproportionately represented in investigators' file drawers (Rosenthal 1979). That notion is supported by direct evidence of study non-publication and by evidence that "primary outcomes" reported in journal articles differ systematically from those reported in study protocols, both of which are related to the significance of results (Chan et al. 2004; Hahn, Williamson, and Hutton 2002; Cooper, DeNeve, and Charlton 1997). Reviews and meta-analyses (Goodman and Dickersin 2011) and scientific theories might be incorrect if they depend on a biased subsample of results and on hypothesizing after the results are known ("HARKing") (Kerr 1998).
The selective non-reporting of studies and results, known as "publication bias" and "outcome reporting bias," respectively, is prevalent in health research (Dwan et al. 2013). Underreporting research has been proposed to be a form of scientific misconduct (Chalmers 1990; Wallach and Krumholz 2019), and some investigators withhold data because of competing interests (Blumenthal et al. 1997). Others fail to submit null findings for publication because they believe their results are uninteresting or unimportant, or that publishers simply will not wish to print them (Chan and Altman 2005; Franco, Malhotra, and Simonovits 2014; Dickersin 1990).
The ability to reproduce results from previous studies is often a hallmark of their truthfulness (Goodman, Fanelli, and Ioannidis 2016). Both multiplicity and selective non-reporting have contributed to irreproducibility in nutritional epidemiology, as has the field's dearth of data transparency.

Reproducible workflows and open science.
Large-scale statistical modeling, simulation, and data analytics are hindered by a lack of uniformity in software workflows. This has further contributed to the ongoing "reproducibility crisis" in several science domains, including nutritional epidemiology (n.b., we recognize disagreement over calling it a "crisis") (Sweedler 2019). Computational platforms, data sharing frameworks, and archiving of computing environments support reproducibility by lowering barriers to scientific sharing and information preservation (Huo, Nabrzyski, and Vardeman 2015; Open Science Collaboration 2015; Baker 2016). Nutritional epidemiology can take advantage of scientific workflows in order to process large-scale scientific computations in distributed systems. Workflows and distributed systems have been adopted across scientific domains and have underpinned some of the most significant discoveries of the past several decades (Deelman et al. 2015; Klimentov et al. 2015).
Nutritional epidemiology can also leverage open-source software and open science. The datasets and code used in nutritional epidemiology are rarely made public, hindering reproducibility efforts. Because scientific computing has moved toward the adoption of such tools to perform analyses, data (including input and output datasets, graphs, and intermediate results) are increasingly made available as part of the scientific outcome. Some initiatives have proposed systems, such as RunMyCode.org, Research Compendia, Research Objects, and myExperiment, which facilitate the reproducibility of analyses across processing environments (Stodden, Hurlin, and Pérignon 2012; Nüst et al. 2017; Bechhofer et al. 2010). Other initiatives have published online, open-source books that share data, code, software versions, or archived computational environments to foster reproducible practices (Kitzes, Turek, and Deniz 2017). These open-source items could then be used as "proof-of-reproducibility" elements in scientific publications or as executable receipts to assist others as they attempt to reproduce equivalent environments.
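One lightweight way to record software versions and input data alongside an analysis is a small machine-readable manifest. The sketch below is only an illustration of the idea and is not the format used by any of the systems named above. The function name, manifest fields, and example package names are hypothetical.

```python
import hashlib
import json
import platform
import sys
from importlib import metadata


def build_manifest(input_bytes, package_names):
    """Describe the computing environment and input data for an analysis."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed here
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        # A checksum lets others confirm they analyzed identical input data.
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
        "packages": versions,
    }


# Hypothetical input: in practice these bytes would be read from the dataset.
example_data = b"participant_id,energy_kcal\n1,2100\n2,1850\n"

manifest = build_manifest(example_data, ["pip", "hypothetical-missing-package"])
manifest_json = json.dumps(manifest, indent=2, sort_keys=True)
print(manifest_json)
```

A manifest like this could be archived with the code and results so that a later reader can check whether a failed reproduction stems from different data or from a different software environment.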
Registering the details of one's study also would raise the bar for nutritional epidemiology. First proposed for clinical trials (Chalmers and Nadas 1977; Simes 1986; Meinert 1988), study registration is now a widely used method for recording basic details of both trials and observational studies (Nosek et al. 2015), and is a scientific and ethical imperative (World Medical Association 2001; De Angelis et al. 2004). To register a study, investigators enter information about study design and procedures in a public, independently-controlled register. By defining outcomes completely (Zarin et al. 2011; Cybulski, Mayo-Wilson, and Grant 2016) and by registering studies prospectively, sometimes called "preregistration" (Rice and Moher 2019), investigators can improve trust in their findings, link multiple reports about the same study (Mayo-Wilson, Li, et al. 2018), and increase access to their results (Chan et al. 2014). The World Health Organization has defined a minimum dataset and maintains an international list of study registers (De Angelis et al. 2005). The largest register, ClinicalTrials.gov (Zarin et al. 2017), is maintained by the U.S. National Institutes of Health (NIH) and includes both trials and observational studies from around the world (Williams et al. 2010). Because registering and updating registrations requires time and expertise, and because institutions may be ethically and legally responsible for ensuring that studies are registered, universities should support investigators in this process (Mayo-Wilson, Heyward, et al. 2018).
In addition to registers, detailed methods can be published in study protocols (Chan et al. 2013) and statistical analysis plans (SAPs) (Gamble et al. 2017). Protocols and SAPs are useful for minimizing and identifying multiplicity and selective non-reporting. That is, a well-defined outcome can be analyzed using many statistical methods, which will produce different numerical results (Mills 1993;Simmons, Nelson, and Simonsohn 2011). Publishing protocols and statistical analysis plans can help investigators to clarify their hypotheses in advance (Nosek et al. 2018), avoid the temptation to conduct inappropriate analyses (Wang, Yan, and Katz 2018), and identify differences between planned results and the results in their final reports (Pc et al. 2010). These documents are critical components of a system needed to promote rigorous design, measurement, and reporting.
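The point that one well-defined outcome can yield different numerical results under different analyses can be made concrete with a small sketch. The data below are invented for illustration and do not come from any study. The three summaries (difference in means, difference in medians, and difference in means after dropping each group's lowest and highest value) are assumptions chosen for simplicity, as is the hypothetical `PRIMARY_ANALYSIS` setting standing in for a choice that a statistical analysis plan would fix in advance.

```python
from statistics import mean, median
from typing import Callable, Dict, List


def trimmed_mean(values: List[float]) -> float:
    """Mean after dropping the single lowest and single highest value."""
    ordered = sorted(values)
    return mean(ordered[1:-1])


# Three defensible ways to summarize the same outcome in each group.
ANALYSES: Dict[str, Callable[[List[float]], float]] = {
    "mean": mean,
    "median": median,
    "trimmed_mean": trimmed_mean,
}


def group_differences(a: List[float], b: List[float]) -> Dict[str, float]:
    """Difference (group a minus group b) under each analysis."""
    return {name: fn(a) - fn(b) for name, fn in ANALYSES.items()}


# Invented outcome values (for example, weight change in kg) for two groups.
DIET_A = [1.0, 2.0, 2.0, 3.0, 12.0]
DIET_B = [1.0, 1.0, 2.0, 2.0, 4.0]

# Hypothetical prespecified choice, as a statistical analysis plan would record.
PRIMARY_ANALYSIS = "mean"

if __name__ == "__main__":
    results = group_differences(DIET_A, DIET_B)
    for name, diff in results.items():
        label = "primary" if name == PRIMARY_ANALYSIS else "secondary"
        print(f"{name:>12} ({label}): {diff:.2f}")
```

With these invented values the three analyses give three different group differences. Without a plan published in advance, a reader cannot tell whether the reported analysis was chosen before or after seeing the results.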
Developing a core outcome set, the minimum group of outcomes to include in studies of a health condition (Boers et al. 2014), can also promote consistency across studies and better interpretation of multiple results within studies (e.g., ADOPT standards for obesity) (MacLean et al. 2018). Nutritional epidemiology will benefit when individual researchers willingly commit to a degree of similarity across studies, such as harmonizing experimental definitions and measures of exposures of interest.
Nutritional epidemiology must also communicate the limitations of approaches more clearly and for a broader audience. Traditional news outlets, social media, and others over-interpret weak evidence shared by researchers, journals, and institutions' press offices (Brown, Bohan Brown, and Allison 2013). The spread of conflicting information itself may be problematic by making future communications about nutrition and health difficult to accomplish (Clark, Nagler, and Niederdeppe 2019). Here, too, nutrition is not unique (Selvaraj, Borkar, and Prasad 2014;Haber et al. 2018). Fortunately, a growing number of investigators in nutrition and related fields are not only motivated, but also well-positioned to bring about each of these much-needed reforms.

Discussion
Should such widespread reforms finally begin to take hold, there may be winners and some "losers" in the short term. For instance, academicians making a switch from inexpensive, easily implemented observational study methods to a mix of stronger, more intensive methods may find themselves with fewer opportunities to publish and fewer awarded grants, particularly during the transition. But, over the long term, the field of nutritional epidemiology would have much to gain. Bringing nutritional epidemiology into the realm of more rigorous science would boost the discipline's credibility. It could also help to further strengthen the field by attracting top, new talent. And employing stronger study designs, including a mix of more accurate measures and analyses and more transparent reporting overall, would add value to science as a whole. Armed with more trustworthy results, health care providers and policymakers potentially could make a real and more lasting impact on public health.
Accomplishing such sweeping change will take dedication, time, and patience, but seeking allies with shared interests could help to facilitate nutritional epidemiology's transition. Investigators could take a multi-disciplinary approach, with nutritional epidemiologists leveraging the expertise of engineers, computational analysts, geneticists, and other outside investigators. Such multi-disciplinary collaboration could engender study designs that are not only more rigorous but also more creative.
Academic journal editors, editorial boards, and peer reviewers can also help drive essential changes to nutritional epidemiological investigation. They can choose to elevate the visibility of more complex studies that make clear contributions to science. They can also enact journal-wide policies requiring study reproducibility, design preregistration, the availability of data repositories, rigorous methods, and more. Both seasoned investigators and those who are new to the field will be incentivized to apply greater scientific rigor to their research efforts. Meanwhile, academicians failing to adapt to journal changes in policy risk being left out of the literature.
By signaling that they, too, demand more scientific rigor, grant-funding agencies could act as drivers of change. Agencies could request specific project types that incorporate the stronger designs, measurements, analyses, and reporting recommendations outlined herein.
Taking on so much change is never easy. But neither is this particular field of study. As author Stuart Ritchie notes: "Rather like psychology, nutritional epidemiology is hard. An incredibly complex physiological and mental machinery is involved in the way we process food and decide what to eat; observational data are subject to enormous noise and the vagaries of human memory; randomised trials can be tripped up by the complexities of their own administration…. Perhaps the very scientific questions that the public wants to have answered the most – what to eat, how to educate children… and so on – are the ones where the science is the murkiest, most difficult, and most self-contradictory. All the more reason that scientists need to take more seriously the task of sensibly communicating their findings to the public." (Ritchie 2020) And all the more reason to give more than lip service to the need for reform. It is time to act: to introduce, teach, promote, and normalize stronger methods of study. And it is time to elevate nutritional epidemiology to the highest standard.