Type 2 diabetes and cognitive performance in middle age: a cross-sectional study

ABSTRACT Introduction Type 2 diabetes has been associated with cognitive decrements already in middle age. However, the sample sizes of the studies have been small and the neuropsychological tests used have been heterogeneous. In addition, only a few studies have matched the groups in terms of age, education and gender. In this cross-sectional matched pairs study, we investigated the cognitive performance of Finnish middle-aged type 2 diabetes patients compared to healthy individuals. Method A neuropsychological test battery consisting of 16 tests and 21 outcome measures was applied to 28 patients and 28 age-, education- and gender-matched healthy individuals. Various exclusion criteria were applied to minimize the risk of cognitive dysfunction due to factors other than diabetes. Results We did not find between-group differences in any of the neuropsychological tests measuring attention, concept formation and reasoning, construction and motor performance, executive functions, memory, processing speed or working memory. In addition, there were no group differences in the frequency or severity of subjective cognitive symptoms, or in anxiety, depression, burnout, fatigue or alcohol use disorder symptoms. The effect sizes in this study were mostly negligible or small, with the mean effect size being −0.12. Conclusions In a carefully matched sample of middle-aged type 2 diabetes patients and healthy individuals, we found no significant effects and no meaningful evidence of cognitive differences between the groups.


Introduction
Type 2 diabetes mellitus (T2DM) is a disease characterized by chronic high blood glucose levels (hyperglycemia) that result from variable degrees of insulin resistance and secretory defect (e.g., Alberti & Zimmet, 1998). In 2021, approximately 537 million people worldwide had diabetes, with T2DM accounting for over 90% of all cases, and the prevalence of the disease is predicted to rise to 643 million by 2030 (Magliano et al., 2021). Globally, diabetes is one of the top 10 causes of death, with over 6.7 million adults (aged 20 to 79) predicted to die from diabetes-related causes in 2021 (Magliano et al., 2021). Long-term complications include retinopathy, nephropathy, neuropathy and autonomic dysfunction, as well as a risk of cardiovascular, peripheral vascular and cerebrovascular disease (e.g., Alberti & Zimmet, 1998). In older people with diabetes, risk ratios for developing vascular dementia or Alzheimer's disease are approximately 2.3 and 1.6, respectively (Gudala et al., 2013).
In most meta-analyses the mean age of patients has been around 70 years, and only a few studies have investigated the presence and nature of cognitive deficits in middle age. To our knowledge, one meta-analysis exists where the neuropsychological performance of middle-aged people with T2DM is compared to healthy individuals (Pelimanni & Jehkonen, 2019). Patients performed worse in most of the cognitive domains assessed, with medium effect sizes found for processing speed, attention/concentration, working memory and executive functions; small effect sizes for verbal memory, language and perception/construction; and a negligible and statistically nonsignificant effect size for visual memory. Only 12 original studies on middle-aged patients were identified for the meta-analysis and most of them had small sample sizes ranging from 13 to 50 patients, except for one major study that included 1779 patients (Rawlings et al., 2014). However, that study contributed to only three cognitive domains, with one test used per domain. Some major cognitive domains in which group differences have been observed in older patients, namely attention and working memory, have been studied in only five original studies each, and these have included a maximum of 50 patients. Many of these studies have not controlled for age, education and gender either by matching or by using statistical adjustments. In addition, few original studies have assessed subjective cognitive or psychosocial symptoms. The small number of studies assessing the cognitive performance of middle-aged T2DM patients using multiple neuropsychological tests, adjusting for confounders and examining subjective cognitive and psychosocial symptoms, serves as a motivation for this study.
In this cross-sectional matched pairs study we examine which cognitive domains, if any, are affected in middle-aged Finnish T2DM patients as compared to age-, education- and gender-matched healthy individuals. We use a neuropsychological test battery, emphasizing the domains where previous studies have found differences between patient and healthy comparison groups. Based on the meta-analysis by Pelimanni and Jehkonen (2019), we hypothesize that middle-aged T2DM patients exhibit poorer cognitive performance than healthy individuals in the following cognitive domains as classified by Lezak et al. (2012): attention, concept formation and reasoning, construction and motor performance, executive functions, memory, processing speed and working memory. Our secondary hypothesis is that the patient group will report more subjective cognitive symptoms than the healthy comparison group. This hypothesis is based on a study by Faiz et al. (2021), in which middle-aged T2DM patients had more subjective memory complaints than healthy individuals, although a smaller study did not find a group difference in subjective cognitive symptoms (Aberle et al., 2008).

Participants
T2DM patients were recruited from the City of Tampere Diabetes Outpatient Clinic during a four-year period from 1 June 2018 to 31 May 2022. We included male and female T2DM patients and healthy individuals aged 35-65. The exclusion criteria for patients were as follows: type 1 diabetes, diabetes diagnosed less than a year ago, no diabetes medication, diabetes medication for less than one year, pregnancy, hypothyroidism, neurological or psychiatric disorder, developmental neuropsychiatric disorder, severe obstructive sleep apnea, severe late complication of diabetes (dialysis treatment or severe visual impairment), severe sugar imbalance resulting in HbA1c ≥100 mmol/mol, and substance abuse. Every patient who met the inclusion criteria upon visiting the clinic during the recruitment period received written and oral information about the study. The healthy comparison group was recruited by a newspaper advertisement published in a local free distribution magazine on 25 September 2021. The exclusion criteria for the healthy comparison group were type 1 or type 2 diabetes, pregnancy, hypothyroidism, neurological or psychiatric disorder, developmental neuropsychiatric disorder, severe obstructive sleep apnea and substance abuse. The exclusion criteria were chosen to minimize the risk of cognitive dysfunction due to dementia or factors other than T2DM. Pregnancy was selected as an exclusion criterion because during pregnancy, blood sugar balance is regulated very precisely and blood sugar values can be even better during pregnancy than at other times. Fatigue and nausea can also occur during pregnancy, which can impair performance in cognitive tests. In addition, follow-up during pregnancy is organized in a different unit than the unit recruiting for the study.
To overcome bias caused by potential confounding variables, we applied one-to-one Mahalanobis distance matching without replacement, using age, education and gender as confounders. This method was chosen because propensity score matching methods would have resulted either in worse balance or an unnecessarily small sample size. A caliper of 2.5 years was used for education and a caliper of 5 years for age. The caliper for education was chosen to ensure that people with no more than compulsory basic education (9 years) could not be matched with people with a secondary level education (12 years) or higher, and to ensure that people with an academic degree could only be matched with each other. The caliper for age was chosen to minimize the age difference between each individual matched pair without leaving an unreasonably small sample. After applying the eligibility criteria and matching, we had a total sample size of 28 matched pairs. Twenty patients and 24 healthy individuals were left without a match and excluded from all analyses. In the matched data, standardized mean differences for the covariates were below 0.1. The average age was approximately 55 years and the average years of education approximately 13 years in both matched groups. Table 1 presents descriptive statistics for the background variables of the matched and unmatched patients and healthy individuals. All participants were of Finnish ethnic origin.
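The matching step described above can be illustrated with a minimal sketch. It is not the MatchIt implementation used in the study: it assumes a greedy nearest-neighbour search in the order the patients are listed, a covariance matrix pooled over both groups, gender coded as 0/1, and calipers applied in raw years. The function names and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, k = len(rows), len(rows[0])
    means = [mean(r[j] for r in rows) for j in range(k)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(k)] for a in range(k)]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (adequate for a 3x3 matrix)."""
    k = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def mahalanobis(x, y, inv_cov):
    """Mahalanobis distance between two covariate vectors."""
    d = [a - b for a, b in zip(x, y)]
    k = len(d)
    q = sum(d[i] * inv_cov[i][j] * d[j] for i in range(k) for j in range(k))
    return sqrt(max(q, 0.0))


def match_pairs(patients, controls, calipers=(5.0, 2.5)):
    """Greedy 1:1 matching without replacement.

    Each unit is (age, education_years, gender). `calipers` holds the largest
    allowed absolute difference in age and in education, both in years.
    Returns a list of (patient_index, control_index) pairs.
    """
    inv_cov = invert(covariance_matrix(patients + controls))
    available = set(range(len(controls)))
    pairs = []
    for i, p in enumerate(patients):
        candidates = [j for j in sorted(available)
                      if abs(p[0] - controls[j][0]) <= calipers[0]
                      and abs(p[1] - controls[j][1]) <= calipers[1]]
        if not candidates:
            continue  # no control within the calipers: patient stays unmatched
        best = min(candidates, key=lambda j: mahalanobis(p, controls[j], inv_cov))
        available.remove(best)  # without replacement
        pairs.append((i, best))
    return pairs


# Toy data (hypothetical): (age, years of education, gender coded 0/1)
patients = [(54, 12, 1), (60, 9, 0), (45, 17, 1), (38, 12, 0)]
controls = [(55, 12, 1), (58, 9, 0), (47, 16, 0), (62, 17, 1), (50, 12, 0)]
print(match_pairs(patients, controls))
```

In this toy example the 38-year-old patient has no control within five years of age and is therefore left unmatched, which mirrors how unmatched participants were dropped from the analyses.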

Sampling procedure
All patients and healthy individuals who met the inclusion criteria and who were willing to participate were recruited in this study. A non-probability, purposive, self-selection sampling method was used. The healthy comparison group underwent a medical examination with laboratory tests to rule out diabetes and hypothyroidism. For patients, corresponding medical data were drawn from medical records based on their most recent annual diabetes checkup. Both groups underwent a neuropsychological assessment. The medical examination was conducted at the City of Tampere Diabetes Outpatient Clinic, the laboratory tests were taken at Fimlab Laboratories Oy Ltd and the neuropsychological assessment was carried out at Tampere University. All assessments were conducted between 13 June 2018 and 1 June 2022. Participants were given no compensation or reward for their participation. The research project was approved by the Tampere University Hospital Ethics Committee (ETL: R16126, date: 14.10.2016). The Declaration of Helsinki was followed throughout the study and the participants were recruited voluntarily based on their informed consent.

Power analysis
An a priori power analysis using G*Power software version 3.1.9.6 (Faul et al., 2007) indicated that the required sample size was 51 matched pairs for a paired-samples t-test to achieve 80% power for detecting a medium effect size (d = 0.50 as classified by Cohen [1988]) at a significance criterion of α = .01. Due to financial and time constraints we stopped data collection before this requirement was fulfilled. Post hoc power analyses with G*Power revealed that a power of 46% was achieved (α = .01) in the study.
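As a rough cross-check of these figures, power for a two-sided paired-samples test can be approximated with the normal distribution. This is only a sketch: G*Power uses the noncentral t distribution, so the approximation below runs a few percentage points higher than the values reported above (roughly 0.84 for 51 pairs and 0.53 for 28 pairs). The function name is hypothetical, and d is assumed to be the standardized mean of the pair differences.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_paired(d, n_pairs, alpha=0.01):
    """Normal-approximation power of a two-sided paired-samples test.

    d is the standardized mean of the pair differences; the approximation
    ignores the extra uncertainty from estimating the standard deviation.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = d * sqrt(n_pairs)           # expected test statistic under H1
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


for n in (28, 51):
    print(n, round(approx_power_paired(0.5, n), 2))
```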

Diabetes status and medical measures
The primary independent variable in this study was diabetes status (T2DM or no diabetes) as determined by two endocrinologists (E.P & J.L) based on HbA1c levels and medical history. HbA1c levels higher than 48 mmol/mol (6.5%) were considered to indicate the presence of diabetes. The laboratory tests performed were complete blood count, sodium, potassium, creatinine, thyroid-stimulating hormone, thyroxine, glucose, HbA1c and lipids. Patients were asked to measure their blood glucose at the beginning and end of the neuropsychological assessment.
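For readers more used to percentage units, the correspondence between the two HbA1c scales quoted above can be sketched with the IFCC-NGSP master equation. The equation is not stated in the text and is used here as an assumption; the function name is hypothetical.

```python
def hba1c_mmol_mol_to_percent(ifcc_mmol_mol):
    """Convert HbA1c from IFCC units (mmol/mol) to NGSP units (%).

    Uses the IFCC-NGSP master equation (an assumption; not given in the text).
    """
    return 0.09148 * ifcc_mmol_mol + 2.152


# Diagnostic threshold and the exclusion limit for severe imbalance
for value in (48, 100):
    print(value, "mmol/mol is about", round(hba1c_mmol_mol_to_percent(value), 1), "%")
```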

Background variables and confounders
The research team constructed a questionnaire that included closed and open-ended questions regarding education, school success, medical history, cognition, occupation and life situation, family situation, social relationships, exercise, sleep habits, tobacco and substance use, diet and hobbies. Patients were asked additional questions regarding diabetes anamnesis, form of treatment, frequency of blood sugar testing and history of hypoglycemia. The purpose of the questionnaire was to further confirm the eligibility of the participants and to collect background variables relevant to our research questions. Age, education in years and gender were considered confounders because there are well known associations between these variables and cognitive performance.

Questionnaires and neuropsychological outcome measures
All the neuropsychological tests used, their cognitive domains and performance measures are presented in Table 2. The neuropsychological assessment consisted of Beck Anxiety Inventory (A. T. Beck et al., 1988), Beck Depression Inventory-II (A. Beck et al., 1996), Fatigue Impact Scale (Fisk et al., 1994), Bergen Burnout Indicator 15 (Näätänen et al., 2003), Alcohol Use Disorders Identification Test (Babor et al., 1992), the aforementioned research questionnaire and standardized neuropsychological tests that were assigned to cognitive domains and their subdomains according to a widely used classification by Lezak et al. (2012). Three tests, namely d2-R (Brickenkamp et al., 2016), Four Word Short-Term Memory Test (STMT [Butters & Cermack, 1980]) and Vilkki Dual Task (Vilkki et al., 1996), are not explicitly mentioned by Lezak et al. (2012), and they were classified by two specialized neuropsychologists following the principles of Lezak et al. (2012). STMT is a variant of the Brown-Peterson task (Brown, 1958; Peterson & Peterson, 1959), which is considered a working memory task by Lezak et al. (2012); d2-R is a cancellation test with minimum demands on attentional capacity or divided attention, classifying it as a focused attention measure; and Vilkki Dual Task requires a person to focus on two tasks simultaneously, backward counting and dot cancellation, making it a test of divided attention. To avoid excessive multiple comparisons, the outcome measures were chosen a priori by two authors (T.S & M.J) with the requirement that the score chosen is the primary measure of the domain that the test is measuring.

Measures of subjective cognitive symptoms
All participants were asked whether they had experienced cognitive decrements and to rate their severity on a scale from 0 to 5, where 0 indicated barely noticeable symptoms and 5 indicated symptoms that severely affect daily functioning. If a participant reported cognitive symptoms, they were asked to describe them in detail. A licensed neuropsychologist (T.S.) grouped the symptoms based on the participants' narratives.

Quality of measurements
The neuropsychological assessment was carried out by a specialized neuropsychologist (T.S), by a psychologist, or by a master's student in psychology under the supervision of a specialized neuropsychologist (T.S). All questionnaires and neuropsychological tests were first scored by the person who did the assessment and then reviewed and coded by two specialized neuropsychologists (T.S & M.J). Medical data were collected and coded by an endocrinologist (E.P).

Data collection
After receiving the information sheet and having the opportunity to ask questions, participants gave informed consent. Thereafter, a medical examination was performed by an endocrinologist (E.P), laboratory tests were performed and participants filled in the questionnaires used in this study. Patients' medical history was reviewed through medical records. If the eligibility criteria were still met based on medical data, participants were contacted to make an appointment for the neuropsychological assessment. They were asked to bring completed questionnaires to the neuropsychological assessment. Participants arrived at the neuropsychological assessment individually and were reminded that it would last two to three hours. Water, juice and snacks were provided throughout the assessment and participants were given the opportunity to take breaks if necessary. The person carrying out the assessment ensured that all the questionnaires had been completed and asked for clarification if needed. Patients were asked to measure their blood glucose levels at the beginning and at the end of the assessment. All participants rated their level of fatigue on a scale from 0 (not at all tired) to 10 (very tired) at the beginning and at the end of the assessment. The neuropsychological tests were presented according to the instructions provided in test manuals or original research articles describing the test, and in the same order for all participants. The order was as follows: ROCF, RAVLT, WAIS-IV Block Design, WAIS-IV Similarities, WAIS-IV Coding, Stroop, phonemic fluency: P-A-S, semantic fluency: animals, STMT, ToL, d2-R, Vilkki Dual Task, TMT A, TMT B, WAIS-IV Digit Span, and CPT 3. Delayed recall tasks were presented after a one-hour delay. All data were stored in a locked cabinet.

Data processing and diagnostics
Participants were excluded from analyses after data collection if they did not complete the neuropsychological assessment or if they could not be matched with another participant. Missing data was handled by replacing each individual missing score with the score of the closest match in the same group, or with the mean score of closest matches if there were multiple exact matches. Values that were over three standard deviations from the group mean were considered outliers. Outliers were not removed since they were minimal and not considered errors by the research group.
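The outlier rule described above (values more than three standard deviations from the group mean) can be sketched as follows. The use of the sample standard deviation (n − 1 denominator) is an assumption, and the function name and scores are hypothetical. In line with the text, the sketch only flags values and does not remove them.

```python
from statistics import mean, stdev


def flag_outliers(scores, threshold=3.0):
    """Return one boolean per score: True if it lies more than `threshold`
    sample standard deviations from the group mean."""
    m, s = mean(scores), stdev(scores)
    return [abs(x - m) > threshold * s for x in scores]


# Hypothetical group scores with one extreme value at the end
scores = [50] * 10 + [52] * 10 + [120]
flags = flag_outliers(scores)
print([x for x, is_out in zip(scores, flags) if is_out])
```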

Data analysis strategy
Statistical analyses were performed using R statistical software version 4.1.3 within RStudio software version 2022.02.1 (R Core Team, 2022). The "MatchIt" package was used to match participants (Ho et al., 2011). Differences in outcome measures between patient and healthy comparison groups were assessed using paired-samples t-tests. A parametric test was employed because visual examination of Q-Q plots showed evidence of normal or near-normal distribution for all outcome measures. To control for Type 1 error inflation, an alpha level of .01 was used for all statistical tests.
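The paired-samples t-test applied to each outcome measure can be sketched in plain Python. This is an illustration only and not the R code used in the study: the p-value is obtained by numerically integrating the Student t density with Simpson's rule, and the function names and example scores are hypothetical.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on the t density from 0 to |t|.

    `steps` must be even.
    """
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                 # P(0 < T < |t|)
    return max(0.0, 1 - 2 * area)


def paired_t_test(patient_scores, control_scores):
    """Return (t, df, p) for matched pairs given in the same pair order.

    Raises ZeroDivisionError if all pair differences are identical.
    """
    diffs = [p - c for p, c in zip(patient_scores, control_scores)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1, two_sided_p(t, n - 1)


# Hypothetical scores for four matched pairs
t, df, p = paired_t_test([1, 2, 3, 4], [0, 0, 1, 1])
print(round(t, 3), df, p < 0.01)
```

With 28 pairs the test has 27 degrees of freedom, so at α = .01 a difference is declared significant only when |t| exceeds roughly 2.77.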

Participants
Patients were recruited from the City of Tampere Diabetes Outpatient Clinic between 1 June 2018 and 31 May 2022. Healthy individuals were recruited by a newspaper advertisement published on 25 September 2021. Out of the 58 patients recruited, seven withdrew their participation and three could no longer be contacted. Out of the 60 healthy individuals recruited, eight did not meet the inclusion criteria due to hypothyroidism or diabetes. We were thus left with 48 patients and 52 healthy individuals, out of which 28 matched pairs were formed and included in all analyses.

Missing data
The "Tower of London Test -total move score" was missing for one healthy individual and the "semantic fluency: animals -total raw score" for one healthy individual. Both missing values were due to a scoring error by the person carrying out the assessment. The missing scores were replaced with the corresponding score of the closest match in the healthy comparison group. Men were overrepresented in the group of unmatched patients, χ2(1, N = 48) = 15.45, p < .001. There were no differences between the groups regarding age and years of education. Unmatched patients performed worse than matched patients in RAVLT Trial 1-5 total, t(42.61) = −2.78, p = .008, Hedges' g = −0.81. No between-group differences were found for other tests or subjective symptoms.

Medical variables and questionnaires
For medical background variables, there were statistically significant group differences in body mass index, t(27) = 4.29, p < .001, Hedges' g = 1.12, diastolic blood pressure, t(27)

Cognitive performance
The neuropsychological test results are presented in Table 3. Raw scores were used in all analyses. There were no statistically significant differences between the age-, education- and gender-matched patient and healthy groups in any of the 21 neuropsychological test scores analyzed. The effect sizes ranged from −0.73 to 0.40, with the mean effect size being −0.12 and the median effect size being −0.14. All effect sizes were negligible or small, with the exception of WAIS-IV Digit Span Forward, in which a medium negative effect size was obtained.
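The verbal labels used for the effect sizes can be made explicit with a small sketch. The cut-offs below are the conventional benchmarks attributed to Cohen (1988), that is 0.2, 0.5 and 0.8 in absolute value. The text does not state the exact cut-offs applied, so they are an assumption, as is the function name.

```python
def label_effect_size(g):
    """Map a standardized effect size to a conventional verbal label.

    Cut-offs (0.2, 0.5, 0.8 on the absolute value) are assumed conventions.
    """
    size = abs(g)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Smallest, largest and mean effect sizes reported above
for g in (-0.73, 0.40, -0.12):
    print(g, label_effect_size(g))
```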

Discussion
We did not find statistically significant differences between the groups in any of the 21 neuropsychological test scores analyzed. Most of the effect sizes obtained were negligible or small, although mostly in the direction of patients performing more poorly than healthy individuals. Furthermore, there was no between-group difference in the number of individuals reporting subjective cognitive symptoms or in the severity of the symptoms.
Our results are not in line with the meta-analysis in which the middle-aged T2DM patients performed worse than healthy individuals in all cognitive domains except visual memory (Pelimanni & Jehkonen, 2019). However, many of the original articles included in the meta-analysis did not control for age, gender and education. The studies that did control for all of these confounders were small, with sample sizes ranging from 13 to 38 patients (Aberle et al., 2008; Biessels et al., 2001; García-Casares et al., 2014; Kálcza-Jánosi & Anett, 2016; Mehrabian et al., 2012; Yau et al., 2009). The largest of these studies, in which crystallized intelligence, fluid intelligence, verbal memory, visual memory, executive functioning and psychomotor speed were assessed, found no difference between the groups with all effect sizes being negligible (Aberle et al., 2008). The participants were on average nine years older and had three years less education than the participants in our study. In the five other studies, patients performed worse than healthy individuals in at least one cognitive domain, with all of the studies reporting poorer memory functioning in the patient group than in the healthy comparison group (Biessels et al., 2001; García-Casares et al., 2014; Kálcza-Jánosi & Anett, 2016; Mehrabian et al., 2012; Yau et al., 2009). The effect sizes were mostly medium or large.
In some of these studies where between-group differences were observed, the participants were roughly the same age, slightly more educated, and the duration of diabetes was a few years shorter than in our study (Mehrabian et al., 2012; Yau et al., 2009), making these factors unlikely to explain the differences in the results. The different findings cannot be explained solely by the neuropsychological tests selected either, since in some of the studies the patient group performed worse than the healthy group in tests that were used in our study as well (García-Casares et al., 2014; Mehrabian et al., 2012). In these tests, the mean scores of the healthy individuals were approximately the same as in our study, but the patients performed surprisingly poorly, with mostly large effect sizes obtained. It is possible that some of the previous studies have included a few particularly poorly performing patients, which could have lowered the overall group mean given the small sample sizes. This may be due, for example, to the fact that hypothyroidism, severe obstructive sleep apnea or diabetes complications were not mentioned as exclusion criteria in most of the other studies. For example, in a study with only 13 patients, one patient was reported to have background retinopathy and two patients a history of ischemic heart disease (Biessels et al., 2001). In a small dataset, these factors alone could explain the differences in cognitive performance between the groups. In one study, the exclusion criteria were only related to the age and education level of the participants (Kálcza-Jánosi & Anett, 2016). Considering the small number of published studies and the small sample sizes, it is also possible that publication bias distorts the results published so far.

Subjective cognitive symptoms
Similar numbers of patients and healthy individuals reported subjective cognitive symptoms, and there was no difference in the subjective severity of the symptoms reported. A previous study with a larger sample size found that middle-aged T2DM patients had more memory complaints than healthy individuals (Faiz et al., 2021). Aberle et al. (2008) did not find differences in subjective cognitive symptoms between middle-aged T2DM and healthy comparison groups in a study of similar size to ours. In addition to the statistical power of the studies, the differing results could be due to the heterogeneity of the participants studied or the differences in the methods used to investigate the symptoms.

Strengths and limitations
A general weakness of observational studies is that due to unknown confounding variables, they cannot establish causal relationships. In our study we achieved excellent matching regarding age, education and gender, and found no group differences in any of the psychosocial symptom or alcohol use measures.
The participants of our study were recruited through self-selection, which is a potential source of sampling bias. The healthy individuals in our study were more highly educated and more often female than the patients. When the groups were matched, this resulted in the unmatched healthy individuals performing better than the matched individuals in some of the measures used. Men were overrepresented in the group of unmatched patients and this group performed worse than the matched patient group in one memory measure. The person conducting the neuropsychological assessment was aware of the participants' diabetes status, which may have introduced performance bias. Although we followed an a priori research plan throughout the study and our hypotheses were based on previous research, we did not pre-register our research protocol, which can be considered a limitation.
To our knowledge, this is the first study in middle-aged T2DM patients that uses a careful matching procedure and a neuropsychological test battery covering most cognitive domains where group differences have been observed in older age groups. In addition to cognitive performance, we studied subjective cognitive symptoms and investigated several medical and psychosocial factors that could potentially confound the results. We believe that this study, despite its limitations, brings valuable information about the cognitive performance of middle-aged people with T2DM.

Conclusions
T2DM is a fast-growing worldwide epidemic that has been associated with cognitive decrements already in middle age (Biessels et al., 2001; García-Casares et al., 2014; Kálcza-Jánosi & Anett, 2016; Mehrabian et al., 2012; Yau et al., 2009). However, all of the studies that have controlled for the effects of age, education and gender have been small and heterogeneous with regard to sampling methods, neuropsychological tests used and participant characteristics.
In our carefully matched sample, to which strict exclusion criteria were applied, there were no differences in cognitive performance between the patient and healthy groups. In addition, the effect sizes obtained were mainly negligible. We consider it unlikely that working-age people with T2DM without other medical conditions, such as hypothyroidism, severe obstructive sleep apnea or late complications of diabetes, would, as a group, perform significantly worse than healthy controls on the neuropsychological tests used in our study. This does not mean that an individual with type 2 diabetes is not at greater risk of developing cognitive symptoms. We encourage readers to investigate this topic further, preferably using a longitudinal design, to better identify factors associated with poorer cognitive performance or decline in cognitive function with aging in people with T2DM.