Self-administered online test of memory functions

Abstract Online cognitive tests have gained popularity in recent years, but their utility needs evaluation. We reviewed the available information on the reliability and validity of tests designed to be performed online without supervision. We then compared a newly developed web-based, self-administered memory test to traditional neuropsychological tests. We also studied whether familiarity with computers affects the willingness to take the test or test performance. Five hundred thirty-one healthy individuals with a history of perinatal risk, followed up since birth for potential long-term consequences, participated in a traditional comprehensive neuropsychological assessment at the age of 40. Of them, 234 also completed an online memory test developed for follow-up. The online assessment and traditional neuropsychological tests correlated moderately (total r = .50, p < .001; subtests r = .21–.45). The mean sum scores did not differ between presentation methods (online or traditional), and there was no interaction between presentation method and sex or education. Experience in using computers did not affect performance, but subjects who used computers often were more likely to take part in the voluntary online test. Our self-administered online test is promising for monitoring memory performance in the follow-up of subjects who have no major cognitive impairments.


Introduction
Neuropsychological tests using computers and hand-held devices are gaining popularity, especially in the field of cognitive decline and dementia (Shah et al., 2011). Traditional pen-and-paper tests still dominate clinical practice as well as research, even though they are labor-intensive and may not detect theoretically important new findings in neuroscience (Bilder & Reise, 2019). An advantage of computerized tests in comparison to pen-and-paper tests is cost efficiency and better accessibility for individuals who may have difficulty traveling to laboratory or healthcare settings due to geographical distances (Casaletto & Heaton, 2017). Computerized tests, both on-site and online, may provide more standardized presentation methods and results with automated scoring (Iverson et al., 2009), and the measurement of reaction times is more accurate (Moore et al., 2016). Computerized bedside and outpatient online tests may be flexibly linked to patient and research databases (Bilder & Reise, 2019). There can also be situations, such as recent pandemic restrictions, where remote online testing becomes a necessity.
Several validated tests for screening and evaluation of circumscribed cognitive problems are available, e.g., Immediate Post-Concussion Assessment and Cognitive Testing (ImPACT) (Iverson et al., 2003; Mielke et al., 2015), Automated Neuropsychological Assessment Metrics (ANAM) (Levinson et al., 2005), and CNS Vital Signs (Gualtieri & Johnson, 2006). These tests have been successfully used for screening and follow-up of the effects of traumatic brain injury, multiple sclerosis, human immunodeficiency virus, or psychiatric disorders (Arrieux et al., 2017; Biagianti et al., 2019; Kamminga et al., 2017; Lapshin et al., 2012; Levy et al., 2014). Commercial tablet or computer versions of many of the common neuropsychological tests also exist, and some authors have suggested that we are very close to the point at which administration of all subtests of the Wechsler Adult Intelligence Scale (WAIS) could be performed by a computer (Vrana & Vrana, 2017).
Most of the existing computerized cognitive test instruments are designed to be used in healthcare and research requiring trained personnel resources (Alsalaheen et al., 2016; Farnsworth et al., 2017), but online tests marketed for self-administered screening, especially of the aging population, have also emerged (Hansen et al., 2015, 2016; Kluger et al., 2009; Trustram Eve & de Jager, 2014). Some commercial internet sites for cognitive self-evaluation, e.g., The NeuroCognitive Performance Test (NCPT) (www.lumosity.com) and TestMyBrain (www.testmybrain.org), have shown reliability in initial evaluations (Germine et al., 2019; Morrison et al., 2015). The newest addition is the Great British Intelligence Test that, although not originally designed for the purpose, has been used in monitoring the consequences of the COVID-19 pandemic (Hampshire et al., 2021).
Computerized tests, and especially online tests of cognition, must have reliability and validity comparable to a gold-standard method to be suited for clinical work. Reviews and meta-analyses mainly focus on the suitability of computerized tests for the follow-up of deviations from baseline measurements, especially in sports-related brain injuries and progressive diseases (Clionsky & Clionsky, 2014; Farnsworth et al., 2017; Maerlender et al., 2010; Zygouris & Tsolaki, 2015). The results have been contradictory. For example, in a meta-analysis of sports-related injuries (Farnsworth et al., 2017), reliability was only moderate in 53% of the computerized tests. In a review of 17 off-line and online tests (Zygouris & Tsolaki, 2015), most of the reviewed tests were sufficiently valid and reliable for differentiating normal from abnormal performance. However, the psychometric properties of the reviewed tests were only partly documented, making comparisons difficult. Tasks of computerized tests that evaluate memory and executive functions are prone to low repeatability (Hansen et al., 2016; Resch et al., 2018; Rijnen et al., 2018), but a learning effect has also been observed in tasks that measure processing speed and reaction time, especially at the first repeated measurement (Fredrickson et al., 2010; Jongstra et al., 2017; Rijnen et al., 2018). Table 1 summarizes the reliability and validity measures of tests that were designed to be performed online without supervision. Two of these tests were designed for screening or short-term follow-up of the aged (Assmann et al., 2016; Jongstra et al., 2017), and several for use in a wider range of ages (Biagianti et al., 2019; Morrison et al., 2015). One test is designed for use in long-term follow-up (Ruano et al., 2016). Validity and reliability studies against standard neuropsychological tests have produced correlation coefficients ranging from r = 0.11 to r = 0.82 (Mielke et al., 2015; Wallace et al., 2017; Zakzanis & Azarbehi, 2014).
Repeatability has mostly been acceptable, with intra-class correlations (ICC) within 0.29–0.89 and Cronbach's alpha within 0.73–0.93 (Feenstra et al., 2018; Hansen et al., 2016). The tests have been found to differentiate patients with progressive memory impairment from healthy individuals (Jacova et al., 2015; Mackin et al., 2018; Mielke et al., 2015; Morrison et al., 2015). Several hurdles must be overcome to ensure competent and valid application of online assessment, such as the quality of internet connectivity, examinee difficulties with comprehension of instructions, inconsistent effort, or distractions during the test (Casaletto & Heaton, 2017). There has been some research on the effect of familiarity with computer use on test performance, but the results are mixed. Among healthy people, or adults with somatic diseases such as cancer, those who had used computers regularly performed better in tests of psychomotor speed, visual search, complex attentional regulation, and cognitive flexibility. They also had faster reaction times and more efficient keyboard use than those who only used the computer occasionally (Iverson et al., 2009; Zakzanis & Azarbehi, 2014). In contrast, studies of the elderly found that computer experience was not directly related to computer-assisted or online test performance when age and education level were controlled (Feenstra et al., 2018; Hansen et al., 2016; Rentz et al., 2015).
We developed a self-administered online memory test for research use in a cohort of Finnish-speaking middle-aged individuals who have a history of perinatal risk and who have been followed up from birth for potential long-term consequences. In order to study convergent validity, we compared the test with standard neuropsychological tests, likewise administered in Finnish. We also assessed to what degree demographic variables and reported use of information technology affected participation or performance in the online test.

Subjects
The subjects are part of the Perinatal Adverse Events and Special Trends in Cognitive Trajectory (PLASTICITY) study, a prospective follow-up cohort of healthy adults born in a single maternity unit in 1971–1974 (Hokkanen et al., 2013; Launes et al., 2014; Michelsson et al., 1978). This cohort of initially 1196 newborns with predefined perinatal risks, which typically caused no marked disability, is described in detail elsewhere (Launes et al., 2014). Inclusion required at least one of the following criteria: hyperbilirubinaemia (bilirubin ≥ 340 µmol/L or transfusion), Apgar score < 7 at 5 or 15 min, birthweight < 2000 g, maternal diabetes, marked neurological symptoms (e.g., rigidity or apnoea), hypoglycaemia (blood glucose ≤ 1.67 mmol/L for full-term and ≤ 1.21 mmol/L for preterm infants), mechanical ventilation due to poor oxygenation, or severe infection. From 5 years onwards, 845 cohort subjects participated in the follow-up, and a control group without perinatal risks (n = 199) has also been followed from childhood. The subjects have been followed in order to study the long-term consequences of the perinatal risks (Immonen et al., 2020; Michelsson et al., 1984; Michelsson & Lindahl, 1993; Schiavone et al., 2019). Individuals with severe disabilities, e.g., cerebral palsy, brain malformations, sensory deficits, and intellectual disabilities, were excluded from further follow-up by the age of five years.
Around the age of 40, a total of 531 subjects participated in cognitive testing and a detailed neurological (JL) and neuroradiological evaluation. They were all community-dwelling adults with a normal work history and an education level corresponding to the general population of Finland (Official Statistics of Finland (OSF), 2020). Of them, 234 subjects completed both the online test and a comprehensive neuropsychological test battery, 273 completed the traditional neuropsychological tests only, and 24 completed the online test only. The subjects also filled in an extensive questionnaire concerning somatic and mental health, cognition, substance use, occupation, leisure activities, social background, education, social media activity, and information technology skills. The age of the subjects was 42.1 ± 1.3 years. Table 2 gives a description of the whole sample and the subgroups.
The Ethical Review Board of the Hospital District of Helsinki and Uusimaa has approved the project (journal number: 147/13/03/00/13). Written informed consent was obtained from participants. No sensitive identifying information was stored in the online server during the test.

Online test
The online test (OLT) was designed to be completed at home, without supervision, using written instructions given on the screen. The test used the SoSci Survey platform (www.soscisurvey.de), which is scalable from smartphones to desktop computers. Desktop, laptop, and tablet computers as well as smartphones were allowed. Subjects were instructed to take the test without interruption in a quiet environment, but the conditions were not monitored. The SoSci Survey platform logged the progression times. The online test consisted of four tasks focusing mainly on verbal memory: a twelve-item word list learning task with (1) immediate and (2) delayed recognition, (3) a story recognition task, and (4) a visuospatial mental rotation and working memory task. The test required no motor activity, except the use of a mouse or a keyboard. Completing the four tasks took approximately 15 minutes (range 9–23 minutes). Figure 1 presents the OLT flow.

The words of the word recognition task were unrelated two-syllable nouns from the 1000 most common words in the Finnish corpus (Saukkonen, 1979). Words were shown one at a time for two seconds, after which they were automatically replaced by the next one. The twelve words were presented one at a time, three times in the same order. Immediate recognition was measured by selecting the correct word from a list of four words after each trial. The recognition order and the order of the alternatives were randomized. After completion, the subject was informed that delayed recall would be measured later. Delayed recognition was tested as the last task of the OLT, and the delay ranged from 7 to 22 minutes depending on the subject's overall speed. See Appendix 1 for the list of words and their alternatives.
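The randomization scheme described above (recognition probes in a random order, each with the correct word placed among four randomly ordered alternatives) can be sketched as follows. This is an illustrative sketch only: the word and foil labels are hypothetical placeholders, since the actual Finnish stimuli and their alternatives are those listed in Appendix 1.

```python
import random

# Hypothetical placeholder stimuli; the real test used twelve Finnish
# two-syllable nouns with matched foil words (see Appendix 1).
TARGETS = [f"target{i:02d}" for i in range(12)]
FOILS = [f"foil{i:02d}" for i in range(36)]  # assumes three unique foils per probe

def build_recognition_round(targets, foils, rng):
    """One immediate-recognition round: the order of the probes and the
    position of the correct word among the four alternatives are both
    randomized, as in the OLT word list task."""
    trials = []
    foil_iter = iter(rng.sample(foils, len(foils)))   # shuffled, no reuse
    for target in rng.sample(targets, len(targets)):  # random probe order
        options = [target, next(foil_iter), next(foil_iter), next(foil_iter)]
        rng.shuffle(options)  # random position of the correct word
        trials.append({"correct": target, "options": options})
    return trials

trials = build_recognition_round(TARGETS, FOILS, random.Random(0))
assert len(trials) == 12 and all(t["correct"] in t["options"] for t in trials)
```

Seeding the generator per participant would make each subject's trial order reproducible from the log.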
The story recognition test consisted of five short paragraphs (32–45 words each), displayed on the screen, 189 words in total (see Appendix 2). No alternative forms were used; all subjects read the same story. The story was about shipwrecks in the Baltic Sea, and the subjects were informed that parts of the story were factual, while others were imaginary. The Baltic Sea was chosen as the topic because, at the time, pollution of the Baltic Sea was widely discussed in the media. Also, a well-preserved wreck of the brig Vrouw Maria ("Vrouw Maria," 2018) had recently been discovered near the Finnish coast, equally prominently reported in the media. We aimed to reduce the effect of good general knowledge by selecting current themes.
Following the story, the mental rotation (stickmen) task was presented as a filled delay. After this delay, twenty multiple-choice questions about the story were presented. The subject was asked to select one correct statement out of three alternatives, one of which in all cases was "information not given in the story".
The visuospatial stickmen task was a modification of the Manikin Test by Ratcliff (1979). There were two presentation types in the task. First, 10 pairs of stick figures were displayed on the screen, one pair at a time. The task was to assess whether the gray ball was held in the same hand by both figures, and the view changed automatically after the response. The subject was told that this was a speed task and should be done as quickly as possible. An example of a stick figure is displayed in Figure 2. In presentation type two, the subject was informed that the test would now require remembering the figures and was shown 10 new pairs for 1 s at a time, each followed by a separate 5-s response screen. The scores of the two presentation types were added (maximum of 20 points).

Neuropsychological evaluation
The traditional, face-to-face neuropsychological assessment (NPS) involved a comprehensive battery of tests and lasted approximately three hours. For the present study, we analyzed the tests that corresponded best to the online tests. The OLT word list and story recognition tasks were compared with the Wechsler Memory Scale, 3rd edition (WMS-III; Wechsler, 1997) Word list learning (immediate and delayed conditions) and Logical memory (story A) immediate recall. The visuospatial stickmen test was compared with the immediate recall of the Rey-Osterrieth Complex Figure Test (ROCFT) (Osterrieth, 1944; Rey, 1941). The full-scale intelligence quotient (FSIQ) of the Wechsler Adult Intelligence Scale, version IV (WAIS-IV; Wechsler, 2012) was used as a measure of general cognitive ability.

Familiarity with computers and information technology
The frequency of social media use (including a social media addiction questionnaire) and proficiency in using information technology were evaluated using a questionnaire. We used three items to assess the subject's frequency of computer use: social media use, computer gaming, and computer use including programming as a hobby. Items were presented on a five-step Likert scale (1-5, from "not at all" to "every day"), and the three items were combined into a total sum score with a maximum of 15 points. The frequency of computer use was re-categorized as low (0-3 points), medium (4-6 points), and high (7 points and above).
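A minimal sketch of this scoring, using the cut-offs stated above. (Note that with three items coded 1-5 the lowest attainable total is 3, so the "low" band in practice captures only the minimum score; the function names are our own illustrative choices.)

```python
def computer_use_score(social_media, gaming, hobby_use):
    """Total computer-use frequency: sum of three five-step Likert items
    (each 1 = 'not at all' ... 5 = 'every day'), maximum 15 points."""
    for item in (social_media, gaming, hobby_use):
        if not 1 <= item <= 5:
            raise ValueError("each item must be on the 1-5 Likert scale")
    return social_media + gaming + hobby_use

def frequency_category(total):
    """Re-categorization used in the paper: low 0-3, medium 4-6, high >= 7."""
    if total <= 3:
        return "low"
    return "medium" if total <= 6 else "high"

assert frequency_category(computer_use_score(1, 1, 1)) == "low"
```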
The educational level of the participants was also collected using the questionnaire. The answers were categorized into four levels: 9 years or below (corresponding to compulsory education), 9.5-11 years (corresponding to vocational school or part of college), 12-15 years (corresponding to college and some years of higher education), and 16 or more years (corresponding to a completed university or other higher education degree).
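The four-level education categorization follows directly from the year cut-offs given above; a small sketch (the function name is our own):

```python
def education_level(years_of_education):
    """Maps years of education to the four levels used in the questionnaire."""
    if years_of_education <= 9:
        return 1  # compulsory education only
    if years_of_education <= 11:
        return 2  # vocational school or part of college
    if years_of_education <= 15:
        return 3  # college and some years of higher education
    return 4      # completed university or other higher education degree

assert [education_level(y) for y in (9, 9.5, 12, 16)] == [1, 2, 3, 4]
```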

Statistical methods
For comparison, all cognitive subtest scores were standardized into z-scores, and a sum score was created as the sum of the z-scores of the four subtests. Correlations were estimated by calculating Pearson correlation coefficients. For interpretation, we use the effect size conventions: a coefficient of .10 represents a weak or small association, a coefficient of .30 a moderate one, and a coefficient of .50 or above a strong or large correlation (Cohen, 1988). Sum score differences between presentation methods (NPS or OLT) and the interaction effects of sex and education were evaluated using a general linear model for repeated testing. Ceiling and floor effects were evaluated by comparing the means and standard deviations to the maximum and minimum values, respectively. For classification of the relative performance level in NPS and OLT, three groups corresponding to low, medium, and high performance were created using the cut-off points of the 25th and 75th percentiles of the participants' performance. As the distribution of the information technology experience measures was skewed, we used three categories formed from the total score and compared the groups with non-parametric (Kruskal-Wallis) tests.
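The core computations of this section (z-standardization, the sum score, the Pearson coefficient, and the 25th/75th-percentile performance groups) can be sketched with the standard library as follows. This is an illustrative implementation under our own conventions (e.g., nearest-rank percentiles), not the scripts actually used in the study.

```python
from statistics import mean, stdev

def zscores(values):
    """Standardize a list of raw scores to z-scores."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def sum_score(subtest_columns):
    """Sum of per-subtest z-scores, one value per participant."""
    z_cols = [zscores(col) for col in subtest_columns]
    return [sum(vals) for vals in zip(*z_cols)]

def pearson_r(xs, ys):
    """Pearson correlation as the mean product of paired z-scores."""
    zx, zy = zscores(xs), zscores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

def performance_group(score, all_scores):
    """Low / medium / high classification at the 25th and 75th percentiles
    (nearest-rank percentiles, one simple convention among several)."""
    ordered = sorted(all_scores)
    q1 = ordered[int(0.25 * (len(ordered) - 1))]
    q3 = ordered[int(0.75 * (len(ordered) - 1))]
    return "low" if score <= q1 else ("high" if score > q3 else "medium")

assert abs(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]) - 1.0) < 1e-9
```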

Results
Raw score means in the subgroup that participated in both test presentation methods (n = 234) are given in Table 3. The online tests correlated significantly with the corresponding traditional neuropsychological tests (r = .21–.45 depending on the subtest), see Table 4. The strongest correlations were found between the OLT word list recognition and the NPS WMS-III Word list, both immediate (r = .45, p < .001) and delayed (r = .41, p < .001).
The NPS sum score correlated with the OLT sum score (r = .50, p < .001). In the OLT word list task, the delayed recognition mean was within one standard deviation of the maximum, indicating a ceiling effect. No other indications of floor or ceiling effects were found. The correlation between the OLT sum score and the FSIQ (r = .45, p < .001) was similar to the correlation between the NPS sum score and the FSIQ (r = .49, p < .001), see Figure 3. The presentation method (OLT or NPS) did not affect cognitive performance, and there was no interaction between the presentation method and sex or education. When categorized for individual performance, subjects with low performance in the NPS tests also tended to show low performance in the OLT, and high performers tended to score high in both (χ²(4) = 48.62, p = 0.001), see Table 5. Two subjects had high performance in the OLT but low performance in the NPS tests. They had average intelligence (FSIQ 108 and 98), and neither reported high experience in computer use in the questionnaire (2/12 and 5/12 points). Seven subjects had high performance in NPS but low performance in OLT. They had average or high FSIQ (mean 111, range 91–129). In all of these seven, fluctuation of performance was observed within the OLT subtests, and all but one spent less time on the test than the average. These subjects had low experience in computer use (0–4/12 points). Seven of the nine subjects with inconsistent performance had a mood or anxiety disorder.
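The ceiling criterion applied here (a mean within one standard deviation of the maximum attainable score) is straightforward to express as a check; a small illustrative sketch, with our own function names:

```python
from statistics import mean, stdev

def ceiling_effect(scores, max_score):
    """True when the mean lies within one SD of the maximum attainable score."""
    return mean(scores) + stdev(scores) >= max_score

def floor_effect(scores, min_score=0):
    """Mirror-image check against the minimum attainable score."""
    return mean(scores) - stdev(scores) <= min_score

# e.g., scores bunched near a 12-point maximum trip the ceiling check
assert ceiling_effect([11, 12, 12, 10, 12], max_score=12)
```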
When the three aspects of computer use were analyzed separately, social media use (H(2) = 7.76, p = 0.021) and computers as a hobby (H(2) = 16.92, p < 0.001) influenced participation in the OLT. Frequent gaming had no association with OLT participation (H(2) = 4.30, p = 0.117). Computer use frequency had no association with performance in the NPS, F(2,446) = 0.020, p = .980, or the OLT, F(2,240) = 1.370, p = 0.256. There were also no differences between the groups in models where sex and education were controlled for.

Discussion
Performance in the online test correlated moderately with performance in the corresponding standard neuropsychological tests, suggesting convergent validity for the online and traditional memory testing. The online test also correlated with the Full-Scale Intelligence Quotient. The correlations between all subtests of the online test and the corresponding traditional tests were statistically significant. The mean sum scores did not differ between presentation methods, and there was no interaction between presentation method and sex or education. Familiarity with computers did not affect performance, but subjects who used computers often were more likely to take part in the voluntary online test.
Overall, performance in the online test corresponded to performance in the traditional neuropsychological tests and was not influenced by gender or education any more than the standard neuropsychological tests are. The correlations were only moderate but consistent with previous validity studies of online tests: 0.30–0.75 (Trustram Eve & de Jager, 2014), 0.49–0.63 (Hansen et al., 2015), 0.40–0.70 (Jacova et al., 2015), and 0.17–0.51 (Rentz et al., 2015). Stronger correlations have been found in online tests that most closely resembled the traditional neuropsychological tests (Kluger et al., 2009; Morrison et al., 2015; Wallace et al., 2017). The correlation of both the OLT and NPS tasks with the FSIQ suggests that memory is also associated with general cognitive ability, potentially through the association between working memory and intelligence or the general g-factor (Colom et al., 2004; Süß et al., 2002).
The online word list learning task had the highest correlation coefficient with the corresponding WMS-III item, and also moderate correlations with most other neuropsychological tasks. The only non-significant correlations for the OLT word list learning and delayed recognition tasks emerged with the NPS Rey visual memory task, which suggests an at least partially specific verbal memory element. Previous studies have also found that computerized word list learning tasks agree best with standard neuropsychological tests (Feenstra et al., 2018; Hansen et al., 2016; Morrison et al., 2018). This suggests that word-learning tasks are useful when presented online. In our study, the online 12-word learning task had a ceiling effect, especially in delayed recognition, suggesting that the online task should have been more demanding to enhance sensitivity. Other studies have also found ceiling effects in online word list recognition tasks (Feenstra et al., 2018; Hansen et al., 2016; Trustram Eve & de Jager, 2014). In a longitudinal follow-up of a gradually ageing population, however, memory performance tends to decrease rather than increase (Rönnlund et al., 2005; Salthouse, 2019; Schaie, 2005), and despite the potential learning effects of repeated testing, performance that was initially at ceiling may not be so 10 or 20 years later. No floor effects were found. Increasing the length of the list would be an option, but more difficult and complex online tests might not optimally solve this problem, as subjects may be tempted to discontinue difficult tasks. Other solutions include presenting the words in an auditory manner. Word recall, in turn, could be accomplished online by writing or by using a speech recognition program instead of recognition to increase difficulty (Feenstra et al., 2018; Morrison et al., 2015).
However, more technically advanced methods of presentation and input may demand costly, complex, and difficult-to-operate equipment, which can be an obstacle to the self-administration of tests.

The online story recognition task correlated significantly with the WMS-III Logical memory task, but had an even higher correlation coefficient with the WMS-III Word list learning task. As its correlation coefficient was lowest with the NPS Rey immediate recall, the task appears to measure verbal rather than non-verbal memory. The relatively weak correlation between the two story recall tasks may be caused by the different presentation (reading versus auditory) and recall (recognition after a slight delay versus spontaneous immediate recall) modes used in the online test. Reading time was not controlled in the online test, and it cannot be excluded that some subjects re-read the paragraphs before moving forward. A time limit would increase the similarity to the standard test. However, as subjects progressed at their own pace, the situation more closely resembled a natural situation where reading material needs to be memorized. Because we anticipated the online test to be easier than the WMS-III due to the way it was presented, we added questions that could not be answered based on the text, as well as the response option "information not given" in all items. The purpose was to complicate the task and to make room for the false memory phenomenon.
Another relatively low correlation between the online test and the standard neuropsychological tests was found for the stickmen mental rotation task, which correlated only modestly with the Rey-Osterrieth Complex Figure task and not at all with the other tests. The stickmen task has similar cognitive, especially visuospatial, elements as the ROCFT, but they are not directly comparable. In the ROCFT, executive functions, visual perception, and figural memory have a greater role than in our stickmen task, which requires more visual mental rotation and working memory. Mental rotation ability has been linked to right-hemisphere functions with a strong parietal involvement (Corballis, 1997; Hattemer et al., 2011; Morton & Morris, 1995), but our version added a working memory component that was not included in the original Manikin Test (Ratcliff, 1979), increasing and modifying the task demands. In previous studies of self-administered tests, visual memory has often been measured with tasks more closely resembling traditional neuropsychological tests, which likely explains their higher correlations (Jacova et al., 2015; Wallace et al., 2017).
We observed that experience with computers did not affect performance in either the online or the standard paper-and-pencil tests, with or without sex and education taken into account. A similar observation was made in previous studies among the elderly (Fazeli et al., 2013; Hansen et al., 2016; Jacova et al., 2015). Among healthy individuals, experience with computer use has been found to affect reaction times and processing speed (Iverson et al., 2009; Lee Meeuw Kjoe et al., 2021), but since our test focused on memory, speed was not directly evaluated. In individual cases, the discrepancy between high performance in the standard tests but low performance in the online tests could perhaps be explained by low computer use. In previous studies, attitudes toward information technology have been found to affect task performance even more than the use of technology itself (Fazeli et al., 2013; Ruano et al., 2016). This was not studied here.
Unsurprisingly, the use of computers seemed to explain whether a subject was willing to participate in an online test. Subjects who used social media or programmed computers as a hobby were more likely to participate in the voluntary online test. Computer experience and education were associated, and higher education was also linked to higher online participation. The effect of age was not assessed in this study, as all the subjects in the cohort were roughly the same age, around 42. The results of our study may therefore not be generalizable to younger or older generations. Our subjects differ from present-day adolescents and young adults, sometimes referred to as "digital natives," as well as from older subjects who often have only little computer experience.
Computerized tests for assessing cognitive performance face many challenges that may lead to erroneous or completely invalid results, e.g., with regard to internet connectivity, inconsistent effort, distractions during the test, test standardization, and how instructions are given (Casaletto & Heaton, 2017; Gates & Kochan, 2015). In the present study, nine subjects were found to have inconsistent performance when the overall online and traditional performance levels were compared, and some of the relatively poor online performance may have been due to fluctuating attention. The reason could not be investigated because no monitoring of the environment or test-taking conditions was applied. Comparing test performance between supervised and non-supervised conditions has yielded conflicting results. Performance in non-supervised tests has been found to be better (Mielke et al., 2015), worse (Feenstra et al., 2018), or similar (Rentz et al., 2015) compared with supervised conditions. Distracting factors like noise, interruptions, and fatigue cannot be controlled without observation. A subject's inability to understand instructions may go unnoticed. The use of independently completed tests has, however, been reported to be mostly successful, with a low frequency of discontinuations or failures to complete the test due to lack of skill or technical problems (Feenstra et al., 2018; Jongstra et al., 2017; Levy et al., 2014; Rentz et al., 2015; Tierney & Charles, 2014). A questionnaire confirming successful completion would be a helpful addition after the online test.
There are limitations in the study that warrant consideration. No test-retest reliability analyses have been conducted, as we had no repeated measurements with the OLT; for a longitudinal study design, this should be evaluated separately. Also, in order to use the method for diagnostic or other purposes beyond the follow-up of a single longitudinal cohort, normative studies should be done. Our study was aimed at finding comparable traditional tests to compare the OLT with, but for a more detailed discriminant validity analysis a wider variety of traditional test methods could be used. As the OLT tasks were memory oriented, the traditional methods were memory oriented as well. The OLT stickmen task had very low correlations with the other tasks, suggesting that it, especially in its first presentation type, partly measured something different. Still, we cannot be certain whether other, non-memory tests would have confirmed the memory specificity of the measure. Adding more measures and then conducting a factor analysis to confirm the structural elements of the test could be useful. Finally, the study population was a well-defined cohort based on consecutive deliveries in a single maternity unit, with data collected prospectively since birth. We did not analyze the differences between risk and control subjects because that was not the focus of the present paper.
The speed of development of new hardware, and consequently the speed at which hardware becomes obsolete, presents a hard challenge for software developers (Germine et al., 2019; Mielke et al., 2015). A desirable computerized test is generalizable to many ages (Casaletto & Heaton, 2015; Gates & Kochan, 2015). Almost all information technology hardware is currently designed to operate the online graphical user interface in a fairly uniform manner, reducing the distraction caused by hardware. Our online test was designed for a long-term follow-up study to be repeated at five- to ten-year intervals. The test is easy to present using several online platforms, and it measures memory and cognition, producing results comparable with traditional tests. While test-retest reliability needs to be evaluated separately, based on the results so far the method should be useful for monitoring memory performance when applied longitudinally in a healthy middle-aged sample with no major cognitive impairments. Computer experience, while associated with the willingness to participate in the online test, did not affect test performance, but a wider evaluation of attitudes toward information technology should be included in future studies. Also, a questionnaire or other method to confirm the test-taking conditions should be added.

Disclosure statement
No potential conflict of interest was reported by the author(s).
Molemmat laivat upposivat vuonna 1631 Loviisan ja Haminan välillä olevan Pyhtään edustalla käydyssä meritaistelussa. Hyvä säilyminen johtuu osittain siitä, ettei puuta ravinnokseen käyttävää laivamatoa ole alhaisen suolapitoisuuden vuoksi esiintynyt Itämeressä. Myöskään merkittäviä leväkerrostumia ei ole hylkyjen päällä.

Both ships sank in 1631 in the sea battle off Pyhtää, between Loviisa and Hamina. The good preservation is partly due to the fact that the shipworm, which feeds on wood, has not been present in the Baltic Sea due to its low salinity. There are also no significant layers of algae on the wrecks.
In addition to the lack of shipworms, wrecks in the Baltic Sea also survive because of the lack of strong ocean currents and tidal effects. Even iron-bodied wrecks in the deep waters of the Baltic Sea survive longer than elsewhere, as they corrode slowly due to environmental hypoxia in the low-oxygen water. However, wrecks sunk in shallow water are at risk of breaking up, due to shipping, sea traffic and, especially in the northern Baltic, the freezing of the shores in winter.
The risk of destruction of wooden wrecks in the Baltic Sea increased with the arrival of shipworm in 1993 due to a strong salt pulse. The winter storms pushed an exceptional amount of strongly salinated water of the North Sea through the Danish Straits into the Baltic Sea, and the shipworm became part of the biota at least in the southern Baltic Sea.