Bayesian benefits with JASP

Abstract: We illustrate the Bayesian approach to data analysis using the newly developed statistical software program JASP. With JASP, researchers are able to take advantage of the benefits that the Bayesian framework has to offer in terms of parameter estimation and hypothesis testing. The Bayesian advantages are discussed using real data on the relation between Quality of Life and Executive Functioning in children with Autism Spectrum Disorder.

However, one major obstacle prevents pragmatic researchers from taking full advantage of the possibilities that the Bayesian framework has to offer: for most researchers it takes a prohibitive investment of effort, time and patience to derive and programme even the most basic Bayesian analysis. Thus, an important impediment to the widespread use of the Bayesian approach to data analysis is the lack of easy-to-use software that supports Bayesian methods for common statistical tests.
To overcome this obstacle, we have recently developed the free and open-source statistical software program JASP (JASP Team, 2016; jasp-stats.org). JASP features both classical and Bayesian implementations of the most popular tests in psychological research, that is, t-tests, ANOVAs, correlation tests, linear regression and tests for contingency tables. Importantly, JASP is intuitive and comes with a simple and attractive graphical user interface, shown in Figure 1. JASP includes the ability to annotate analyses, to share data and analyses on the Open Science Framework (osf.io), to copy-paste APA-formatted tables into popular text editors such as Microsoft Word, and to generate informative and publication-ready figures.
In this article, we use a real-data example of a correlation analysis to showcase JASP and some of the advantages that a Bayesian analysis has to offer. Specifically, we will demonstrate how to use JASP for Bayesian parameter estimation, Bayesian hypothesis testing and Bayesian sequential analyses. An annotated .jasp-file is available at the Open Science Framework (osf.io/m2quv), which can be viewed without having JASP installed.

Example: Quality of Life and Executive Functioning for children with ASD
Throughout this article we use data from de Vries and Geurts (2015) to illustrate several Bayesian analyses implemented in JASP. De Vries and Geurts studied the relation between Quality of Life (QoL) and Executive Functioning (EF) in children with and without Autism Spectrum Disorder (ASD). We focus on a subset of their data and analyze the correlation between QoL and EF in n = 119 children with ASD. Both QoL and EF were assessed using standardized questionnaires (see Bastiaansen, Koot, Bongers, Varni, & Verhulst, 2004; de Vries & Geurts, 2015; Gioia, Isquith, Guy, & Kenworthy, 2000; Smidts & Huizinga, 2009; Varni, Seid, & Kurtin, 2001, for further details). Figure 2 shows a scatterplot of the QoL and EF scores for the children with ASD. The observed Pearson correlation coefficient is r = .451, and a classical analysis reveals that the correlation is significant (p < .001), suggesting that the null hypothesis H0: ρ = 0 can be rejected. Below we showcase three different Bayesian analyses from JASP that arguably provide a more complete statistical assessment.

Analysis I: Bayesian parameter estimation
In our first analysis we wish to use the observed data to update our knowledge about the latent correlation ρ. The possibility that ρ = 0 is not of special interest: our goal is to estimate the size of the correlation, not to test whether the correlation is present or absent. Before the data can be used to update our knowledge about ρ, we need to specify our knowledge about ρ before any data are observed. Bayesians do this by specifying a prior distribution, which expresses advance knowledge, uncertainty, belief, or the relative plausibility of the possible values of ρ. Here we specify a prior distribution which asserts that every value of ρ between −1 and +1 is equally plausible a priori (Jeffreys, 1961). This uniform specification is the default option in JASP.
The prior distribution is updated using the information in the data to yield a posterior distribution. The posterior distribution expresses our uncertainty about the unknown ρ after having seen the data. Figure 3 shows the prior distribution and the posterior distribution for ρ obtained after updating using the example data from the ASD children. Figure 3 reveals that the posterior distribution for ρ is more peaked than the prior distribution, which indicates that we have learned about ρ from the observed data.
When we compare the prior distribution to the posterior distribution, we see that the prior distribution assigns considerable plausibility (75%, to be exact) to values lower than .20 and values higher than .70, whereas the posterior distribution assigns only little plausibility to these values. Furthermore, the central 95% credible interval ranges from .29 to about .58, which implies that we can be 95% confident that the true value of ρ lies between .29 and .58. Note that such an intuitive statement cannot be obtained within the classical statistical framework (e.g. Berger & Wolpert, 1988; Pratt, 1961). Finally, it should be stressed that Figure 3 is obtained in JASP by simply dragging and dropping the relevant variables in the graphical user interface. No programming or mathematical derivation is required.[1]

A characteristic advantage that the Bayesian approach has to offer is the ability to incorporate prior information. For instance, previous results and theory described by de Vries and Geurts (2015) anticipated a positive correlation between QoL and EF for children with ASD, suggesting the one-sided hypothesis H+: ρ > 0. This information can be incorporated by updating the prior distribution through the order constraint ρ ≥ 0, such that we assume that each value of ρ between 0 and 1 is equally plausible a priori (Hoijtink, 2011; Hoijtink et al., 2008; Klugkist, Laudy, & Hoijtink, 2005). Adding this order restriction requires setting a single tick mark in the JASP input panel.
The results based on this new prior distribution are shown in Figure 4. Since the prior distribution needs to integrate to unity (i.e. the area under the curve equals 1), the prior density in Figure 4 is twice as high as the prior density in Figure 3. A comparison of the posterior distributions in Figures 3 and 4 reveals that the order restriction did not alter the posterior distribution in a meaningful way. The reason for this robustness is that most of the posterior mass was already consistent with the restriction.

[1] For an indication of the mathematics that have been derived 'behind the scenes', see http://arxiv.org/abs/1510.01188.
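JASP performs this updating analytically behind the scenes, but the logic can be sketched in a few lines of Python. The sketch below is an approximation, not JASP's exact computation: it replaces the exact likelihood of a Pearson correlation with the well-known Fisher-z approximation, atanh(r) ~ Normal(atanh(ρ), 1/(n − 3)), and evaluates the posterior on a grid. With the values reported in the text (r = .451, n = 119), this approximation reproduces the published 95% credible interval of roughly [.29, .58].

```python
import numpy as np

# Grid approximation to the posterior of a correlation rho, using the
# Fisher-z likelihood: atanh(r) ~ Normal(atanh(rho), 1/(n - 3)).
# This is a sketch of the idea, not JASP's exact analytic posterior.
r_obs, n = 0.451, 119                     # values reported in the article
z_obs, se = np.arctanh(r_obs), 1.0 / np.sqrt(n - 3)

rho = np.linspace(-0.999, 0.999, 20001)   # grid over the correlation
drho = rho[1] - rho[0]
prior = np.ones_like(rho)                 # uniform prior on (-1, 1)
lik = np.exp(-0.5 * ((z_obs - np.arctanh(rho)) / se) ** 2)

post = prior * lik
post /= post.sum() * drho                 # normalise to a proper density

# Central 95% credible interval from the posterior's cumulative distribution.
cdf = np.cumsum(post) * drho
lo = rho[np.searchsorted(cdf, 0.025)]
hi = rho[np.searchsorted(cdf, 0.975)]
print(f"95% credible interval: [{lo:.2f}, {hi:.2f}]")  # roughly [.29, .58]

# The one-sided prior of Figure 4 is uniform on (0, 1): after normalisation
# its density is 1.0, twice the density of 0.5 for the two-sided prior.
```

Because the posterior is concentrated well above zero, adding the order restriction ρ ≥ 0 barely changes the interval, mirroring the comparison of Figures 3 and 4.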

Analysis II: Bayesian hypothesis testing
As mentioned earlier, the Bayesian parameter estimation approach presupposes that the correlation is relevant, that is, that ρ ≠ 0; we have not yet seriously considered the situation in which the correlation is irrelevant, that is, that ρ = 0. If the goal is hypothesis testing, we need to assess the predictive adequacy of the null hypothesis H0: ρ = 0, which stipulates that the correlation is absent, and contrast it against the predictive adequacy of an alternative hypothesis H1, which stipulates that the correlation exists. Without explicitly taking H0 into consideration, it is not possible to make meaningful statements about the presence or absence of an effect (e.g. Wrinch & Jeffreys, 1921).
In order to compare the predictive adequacy of H0 against H1, the hypotheses have to be translated into statistical models, which requires the specification of prior distributions for ρ under each hypothesis. Under H0, the prior on ρ is a point mass at zero (but see Morey & Rouder, 2011, for an alternative specification); under H1, ρ is assigned a prior distribution that relaxes the restriction to zero. Here we employ the default uniform prior distributions that we also used to estimate the parameters in Figures 3 and 4.
With the models relating to H0 and H1 fully specified, we may compare their relative predictive performance using the Bayes factor (Jeffreys, 1961):

BF10 = p(data | H1) / p(data | H0).

The subscripts '10' in BF10 indicate that the model associated with H1 is in the numerator and that the model associated with H0 is in the denominator; that is, BF10 = 1/BF01. Similarly, we use BF+0 to express the comparison of H+ to H0. When the Bayes factor BF10 equals 20, the data are 20 times more likely under H1 than under H0. Similarly, when the Bayes factor BF10 equals .05, the data are 20 times more likely under H0 than under H1.
The Bayes factors for our example are shown in the top part of Figures 3 and 4, and reveal overwhelming evidence against H0. Specifically, the Bayes factor BF10 shown in Figure 3 indicates that the observed data are over 50,000 times more likely under H1 (i.e. the unconstrained hypothesis that the correlation is either positive or negative) than under H0. Similarly, the Bayes factor BF+0 shown in Figure 4 indicates that the observed data are over 100,000 times more likely under H+ (i.e. the constrained hypothesis that the correlation can only be positive) than under H0. Both Bayes factors indicate that the observed data provide overwhelming support for the existence of a positive correlation between QoL and EF for children with ASD. Note that in classical statistics the nature of the support in favour of the alternative hypothesis is much less direct: the p-value is based on what can be expected if the null hypothesis were true, and ignores what can be expected if the alternative hypothesis were true (e.g. Berkson, 1938).
Given that the posterior distributions shown in Figures 3 and 4 are located away from ρ = 0, it should not come as a surprise that the evidence against H0 is overwhelming. However, it may come as a surprise that the evidence is almost twice as strong in favour of H+ as it is in favour of H1; after all, the posterior distributions for ρ are virtually identical under both hypotheses. Both regularities can be clarified using the Savage-Dickey density ratio method, a useful shortcut that allows the Bayes factor to be easily computed and visualized (Dickey & Lientz, 1970; Wagenmakers, Lodewyckx, Kuriyal, & Grasman, 2010). Assume we wish to compare a point hypothesis H0 (e.g. ρ = 0) to an encompassing alternative H1 (e.g. ρ ~ Uniform(−1, 1)). Then the Savage-Dickey identity holds that the Bayes factor BF10 can be obtained from a consideration of the prior and posterior distribution under H1 alone. Specifically, the Bayes factor BF10 is given by the ratio of the height of the prior distribution for ρ evaluated at ρ = 0 against the height of the posterior distribution for ρ evaluated at ρ = 0. These points are shown as the gray dots in Figures 3 and 4.
The intuitive result that posteriors located away from zero correspond to overwhelming evidence against H0 follows from the Savage-Dickey identity: when the posterior is located away from zero, the height of the prior at ρ = 0 will be much higher than the height of the posterior at ρ = 0.
The counter-intuitive result that the evidence is almost twice as strong in favour of H+ as it is in favour of H1 also follows from the Savage-Dickey identity: whereas the posterior distributions of ρ under H1 and H+ hardly differ, the prior density at ρ = 0 under H+ is twice as high as the prior density at ρ = 0 under H1. Conceptually, the restricted hypothesis H+ makes more specific and daring predictions than H1, which hedges its bets and distributes its predictions more widely. When the data are consistent with the restriction, the more daring hypothesis should be rewarded, receiving a bonus for parsimony (e.g. Jefferys & Berger, 1992; Klugkist, van Wesel, & Bullens, 2011; Lee & Wagenmakers, 2013, Chapter 7; Myung & Pitt, 1997; Vanpaemel, 2010). The prior distribution directly affects the model predictions (i.e. model parsimony), and consequently the prior distribution also directly affects the Bayes factor. Therefore, Bayes factor hypothesis testing requires that prior distributions are selected with special care. Fortunately, much development in 'objective Bayesian statistics' has concerned the construction of prior distributions that obey a series of desiderata, such that these priors are suitable for a reference-style analysis that may be refined when additional information is available (e.g. Bayarri, Berger, Forte, & García-Donato, 2012).
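The Savage-Dickey shortcut can also be sketched numerically. The code below reuses the Fisher-z grid approximation described above, so the resulting Bayes factor is of the same order as, but not identical to, the exact value JASP reports (> 50,000). It also verifies the parsimony bonus: because the one-sided prior has twice the density at ρ = 0, BF+0 comes out at roughly twice BF10 when virtually all posterior mass is positive.

```python
import numpy as np

# Savage-Dickey sketch: BF10 = (prior density at rho = 0) divided by
# (posterior density at rho = 0), both evaluated under H1.  Approximate
# only; JASP's exact analytic Bayes factor will differ somewhat.
r_obs, n = 0.451, 119
z_obs, se = np.arctanh(r_obs), 1.0 / np.sqrt(n - 3)

rho = np.linspace(-0.999, 0.999, 20001)
drho = rho[1] - rho[0]
lik = np.exp(-0.5 * ((z_obs - np.arctanh(rho)) / se) ** 2)
i0 = np.argmin(np.abs(rho))               # grid point closest to rho = 0

# Two-sided H1: rho ~ Uniform(-1, 1), prior density 0.5 at rho = 0.
post = lik / (lik.sum() * drho)
bf10 = 0.5 / post[i0]

# One-sided H+: rho ~ Uniform(0, 1), prior density 1.0 at rho = 0.
lik_plus = np.where(rho >= 0, lik, 0.0)
post_plus = lik_plus / (lik_plus.sum() * drho)
bf_plus0 = 1.0 / post_plus[i0]

print(f"BF10 ~ {bf10:,.0f}")
print(f"BF+0 ~ {bf_plus0:,.0f} (about twice BF10)")
```

Because the posterior barely changes under the order restriction while the prior height at zero doubles, the ratio BF+0/BF10 lands very close to 2, matching the 'almost twice as strong' pattern described above.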

Analysis III: Bayesian sequential analysis
Another practical advantage of the Bayesian approach to data analysis is that researchers are free to monitor the evidence as the data accumulate, for example in the form of a Bayes factor or a posterior distribution (e.g. Berger & Berry, 1988; Edwards, Lindman, & Savage, 1963; Kadane, Schervish, & Seidenfeld, 1996; Rouder, 2014; Schönbrodt et al., in press; Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit, 2012). As summarized by Anscombe (1963, p. 381): 'So long as all observations are fairly reported, the sequential stopping rule that may or may not have been followed is irrelevant. The experimenter should feel entirely uninhibited about continuing or discontinuing his trial, changing his mind about the stopping rule in the middle, etc., because the interpretation of the observations will be based on what was observed, and not on what might have been observed but wasn't.'
In other words, a researcher using Bayesian statistics may terminate data collection when the evidence is overwhelming and further data collection would be a waste of time, energy and money. Similarly, that researcher may decide to collect additional observations in case the interim results are not sufficiently compelling. This flexibility in data collection releases researchers from the straitjacket imposed by classical statistics (e.g. Armitage, McPherson, & Rowe, 1969; Feller, 1940, 1970) and makes for experimentation that is both efficient and ethical. In JASP, a graph of the evidential trajectory can be obtained by setting a single tick mark. Figure 5 illustrates the evidential trajectory in favour of H1 over H0 and shows that the evidence (shown on the y-axis) increases with the number of data points (shown on the x-axis). From Figure 5 we observe that after about 60 observations the Bayes factor equals 10 and after about 90 observations the Bayes factor equals 1000. Note that initially, for small values of n, the Bayes factor indicates modest evidence for H0; in other words, when little information is available the Bayes factor prefers the more parsimonious model, as is desirable.

Figure 5. JASP graphical output for the sequential analysis that displays the flow of evidence for H1: ρ ~ Uniform(−1, 1) vs. H0: ρ = 0 as the data accumulate.
Note: the sequence of data points n is shown on the x-axis and the associated Bayes factor is shown on the y-axis.
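An evidential trajectory of this kind can be sketched numerically. Because the original QoL/EF observations are not reproduced in this article, the sketch below simulates bivariate normal data with a true correlation of .45 (close to the observed r = .451) and tracks the approximate Bayes factor as observations accumulate, again via the Fisher-z/Savage-Dickey shortcut; it illustrates how evidence flows with n but will not match Figure 5 exactly.

```python
import numpy as np

# Sequential-analysis sketch with simulated data (the original data are
# not bundled here).  True correlation set to .45; BF10 is computed with
# an approximate Fisher-z likelihood and the Savage-Dickey ratio.
rng = np.random.default_rng(2016)
cov = [[1.0, 0.45], [0.45, 1.0]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=119).T

def bf10(r_obs, n, grid=np.linspace(-0.999, 0.999, 4001)):
    """Approximate BF10 for H1: rho ~ Uniform(-1, 1) vs H0: rho = 0."""
    se = 1.0 / np.sqrt(n - 3)
    lik = np.exp(-0.5 * ((np.arctanh(r_obs) - np.arctanh(grid)) / se) ** 2)
    post = lik / (lik.sum() * (grid[1] - grid[0]))  # posterior on the grid
    return 0.5 / post[np.argmin(np.abs(grid))]      # prior / posterior at 0

# Evaluate the running Bayes factor at interim sample sizes up to n = 119.
sizes = list(range(10, 120, 10)) + [119]
trajectory = [(n, bf10(np.corrcoef(x[:n], y[:n])[0, 1], n)) for n in sizes]
for n, bf in trajectory:
    print(f"n = {n:3d}   BF10 = {bf:,.1f}")
```

As in Figure 5, small samples typically yield Bayes factors near or below 1 (a preference for the parsimonious H0), with the evidence for H1 growing as data accumulate, though the exact trajectory depends on the simulated sample.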

Concluding comments
Using data on QoL and EF in children with ASD, we showed how JASP can be used to obtain a series of informative Bayesian results. Specifically, we indicated how researchers can adopt the Bayesian framework to estimate an unknown correlation, to test for the presence of that correlation, and to monitor the evidential flow as the data accumulate. Whether viewed as an alternative to classical inference procedures or as a wholesale replacement for them, the Bayesian approach provides distinctive benefits and encourages a flexible and intelligent approach to data analysis. Armed with JASP, these Bayesian benefits are only a mouse click away.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This work was supported by the European Research Council [grant number 283876].