Motor Neuron Disease Register for England, Wales and Northern Ireland—an analysis of incidence in England

Abstract
Introduction: Amyotrophic lateral sclerosis (ALS) has a reported incidence of 1–2/100,000 person-years. It is estimated that there are 5000 people with ALS in the UK at any one time; however, the true figure and geographical distribution are unknown. In this study, we describe the establishment of a population register for England, Wales, and Northern Ireland and report estimated incidence. Methods: People with a diagnosis of ALS given by a consultant neurologist and whose postcode of residence is within England, Wales, or Northern Ireland were eligible. The catchment area was based on six data contributors that had been participating since 2016. All centres included in this analysis were in England, and therefore Wales and Northern Ireland are not included in this report. Crude age- and sex-specific incidence rates were estimated using population census records for the relevant postcodes from Office for National Statistics census data. These rates were standardized to the UK population structure using direct standardization. Results: There were 232 people in the database with a date of diagnosis between 2017 and 2018; when missing data were imputed, there were an estimated 287–301 people. The denominator population of the catchment area is 7,251,845 according to 2011 UK census data. Age- and sex-adjusted incidence for complete cases was 1.61/100,000 person-years (95% confidence interval 1.58, 1.63), and for imputed datasets was 2.072/100,000 person-years (95% CI 2.072, 2.073). Discussion: We found incidence in this previously unreported area of the UK to be similar to other published estimates. As the MND Register for England, Wales, and Northern Ireland grows, we will update incidence estimates and report on further analyses.


Introduction
Motor neuron disease, also known as amyotrophic lateral sclerosis (ALS), is an adult-onset neurodegenerative disease affecting upper and lower motor neurons. Estimated global incidence of ALS is 1-2/100,000 person-years; because of geographical variation, the range of estimated incidence by subcontinent is 1-3/100,000 person-years (1)(2)(3). ALS causes progressive weakness and paralysis, with death from respiratory failure usually between 2 and 3 years after diagnosis, but clinical presentation and disease progression are highly variable (4). There is currently no cure for ALS, although riluzole and, more recently in some countries, edaravone are approved drugs that modestly extend survival for some people (5,6). Since the initial discovery that mutations in the SOD1 gene can cause ALS, there has been considerable progress in the identification of genetic risk factors (7). Despite these advances, disease etiology in the majority of cases is not understood. Heritability estimates are compatible with the possibility that non-genetic factors such as stochastic biological events in aging, environmental exposures, or lifestyle choices contribute to disease risk, but there is no consensus on what these factors are (8).
Population registers collect information about every person diagnosed with a given condition in a defined geographical area, providing a source of representative data that can be used by researchers and authorities responsible for healthcare funding and organization (9).
ALS is highly variable in its presentation and clinical course. Collection of population-level data about the clinical features in ALS, including cognitive impairment, has led to a greater understanding of prognostic significance of phenotypic subgroups of ALS (10,11). Data from several European population registers have been used to create an accurate disease progression model which has helped inform care planning and communication with patients about prognosis (4). Population register data were used to show that a multistep model of disease may be relevant to ALS etiology, and to estimate the number of "steps" likely to be involved (12)(13)(14). Information from population registers has also been used to estimate the projected number of people with ALS in the future if current population demographic trends persist, as well as for modeling the potential effects of future disease-modifying treatments (15,16).
Population-based datasets eliminate the inherent ascertainment bias of intervention and case-control studies based on referral cohorts (17), providing unbiased estimates of the effects of exposure to risk factors associated with ALS (18). Comparisons of clinical characteristics of patients enrolled in drug trials and population-based data from the same recruiting area show large differences that might help explain the lack of generalisability of results from intervention trials.
Some trends in ALS are consistently reported between countries; for example, most people present with symptoms in the limbs (19). However, the phenotypic spectrum differs between countries, as shown by studies quantifying incidence, peak age of onset of symptoms, proportions of people with different sites of presentation, and survival time (19,20). These differences are probably due to demographic and healthcare factors, so it is important to collect information at a local level to inform patients and healthcare professionals. In the UK there are five regional population registers for ALS: the South East ALS (SEALS) Register, Peninsula Network, South Wales Register, Northern Ireland register, and MND Care in Scotland, as well as many long-standing databases that document the attendees of specialist ALS clinics (21)(22)(23)(24)(25). All have provided insight into the overall picture of ALS in the UK and have contributed to estimates of UK-wide incidence, prevalence, and lifetime risk. There are an estimated 5000 people living with ALS in the UK at any one time, but whether this is the true figure and how people with ALS are geographically distributed are not known.
In this paper, we describe the creation of the MND Register for England, Wales, and Northern Ireland through the incorporation of local population registers, use of data collected routinely to organize ALS clinics, and involvement of people with ALS directly through a self-registration website. We also report initial findings on incidence for areas with complete case ascertainment.

Patient eligibility
Eligible individuals were defined as having been diagnosed with ALS, Primary Lateral Sclerosis (PLS), or Progressive Muscular Atrophy (PMA) by a consultant neurologist. Where motor neuron involvement appeared to be restricted to the upper or lower motor neurons (including flail limb variants), but time since diagnosis was less than 4 years, the diagnostic category was recorded as "upper motor neuron predominant ALS" or "lower motor neuron predominant ALS", with a free text box available for provision of more detail if needed. Site of onset of first focal weakness, El Escorial category, and co-existing dementia were considered as phenotypic modifiers and recorded as separate variables (26). People with cognitive impairment, including those with frontotemporal dementia, were eligible. People with Kennedy's disease were not included.
People with ALS provided informed consent for the inclusion of identifiable data in the register to their healthcare professional or via our website (details below). As an informed consent discussion may not always be appropriate during a clinic appointment, or progression may be so rapid as to preclude an approach for informed consent, an anonymised data capture protocol was devised.

Identifying data sources
It is estimated that 90% of people with ALS will visit a specialist ALS service as part of their pathway of care in the UK; many of these services are funded in part by the ALS charity the Motor Neurone Disease Association and are referred to as MND Care and Research Centers or Networks (90% figure from internal report from Motor Neurone Disease Association). Therefore, we specifically invited all of these services to contribute data. To ensure complete case ascertainment, we identified other services, including general neurology clinics, community services, clinical nurse specialists, and hospices where people with ALS also receive care.

Catchment areas
Specialist centres generally oversee a defined geographic area of the country. We asked each site to identify the areas in which every incident case of ALS would be referred to them to map areas of complete ascertainment. This information was generally provided in the form of UK postcode districts (for example, SE22 or SE5), unitary authorities, or counties. Many areas were overlapping between centres so cases were sometimes reported more than once.
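To make the postcode levels concrete, the following is a minimal sketch of how a postcode might be reduced to its district (for example, SE22) or area (for example, SE) and checked against a catchment list. The function names, the regular expression, and the example postcodes are illustrative assumptions and are not taken from the Register's own tooling; the pattern is a simplification that does not validate every real postcode format.

```python
import re

# Simplified pattern: area letters, optional district suffix, optional inward code.
_POSTCODE = re.compile(r"([A-Z]{1,2})(?:([0-9][0-9A-Z]?)([0-9][A-Z]{2})?)?")


def split_postcode(text):
    """Split a UK postcode (full, district-only, or area-only) into its parts."""
    compact = re.sub(r"\s+", "", text).upper()
    match = _POSTCODE.fullmatch(compact)
    if match is None:
        raise ValueError(f"not a recognisable UK postcode: {text!r}")
    area, district_suffix, inward = match.groups()
    district = area + district_suffix if district_suffix else None
    return {"area": area, "district": district, "inward": inward}


def in_catchment(text, catchment_districts, catchment_areas):
    """True if the postcode falls in a listed district or a listed area."""
    parts = split_postcode(text)
    if parts["district"] is not None and parts["district"] in catchment_districts:
        return True
    return parts["area"] in catchment_areas


# Hypothetical usage with made-up postcodes.
print(split_postcode("se22 8xx"))
print(in_catchment("SE5 8XX", {"SE5", "SE22"}, set()))
```

Records supplied only at area level cannot be matched to a district-level catchment, which is one reason a denominator defined by whole postcode areas is simpler to work with.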

Data collection and transfer
The project has been designed to avoid duplication of data collection efforts for health professionals and researchers. Where there was a local population register or long-standing clinical database already in use, the local dataset was aligned with the agreed Register dataset. Where there was no database in use we provided a Microsoft Access template with data export functionality. There are many pieces of information that are collected to facilitate routine care organization and some of these, such as postcode, name, hospital identifier, and sex are also part of our dataset. The template database was designed to be compatible with use as part of routine care to avoid duplication of the data collection effort. A minimum dataset of name, date of birth, unique national health service identifier, date of diagnosis, diagnosis (subtype of ALS), date of first weakness, site of first weakness, sex, and postcode of residence was requested to ensure the ability to estimate incidence and identify duplicate records.
People with ALS in the UK will encounter a variety of services, including tertiary referral centres, general neurology clinics, general practitioners, clinical nurse specialists, and local therapy teams. ALS is a clinical diagnosis which needs to be monitored over time, and people often see more than one consultant neurologist to confirm the diagnosis. As a consequence, duplicate records could be generated for the same patient by different data contributors. We used pseudonymisation to differentiate duplicate records while maintaining confidentiality of participants.
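The text does not specify how pseudonyms were generated. As an illustration only, one common approach is a keyed hash of stable identifiers, so that records for the same person from different contributors collapse to the same token without the identifiers being shared. The choice of HMAC-SHA256, the fields used, and all names below are assumptions rather than the Register's actual procedure.

```python
import hashlib
import hmac


def pseudonym(health_id, date_of_birth, secret_key):
    """Return a keyed hash of normalised identifiers (hypothetical scheme)."""
    digits = "".join(ch for ch in health_id if ch.isdigit())
    message = f"{digits}|{date_of_birth.strip()}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def find_duplicates(records, secret_key):
    """Group contributing sources by pseudonym; keep groups with more than one record."""
    groups = {}
    for record in records:
        token = pseudonym(record["health_id"], record["date_of_birth"], secret_key)
        groups.setdefault(token, []).append(record["source"])
    return {token: sources for token, sources in groups.items() if len(sources) > 1}


# Made-up records: the first two differ only in identifier formatting.
records = [
    {"health_id": "000 000 0001", "date_of_birth": "1950-01-01", "source": "centre_a"},
    {"health_id": "0000000001", "date_of_birth": "1950-01-01", "source": "centre_b"},
    {"health_id": "000 000 0002", "date_of_birth": "1948-06-30", "source": "centre_a"},
]
duplicates = find_duplicates(records, secret_key=b"example-key-not-for-real-use")
print(list(duplicates.values()))
```

A keyed hash, rather than a plain hash, is used in this sketch because identifiers with a small search space could otherwise be recovered by enumeration.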

Website for patient self-registration
A website was developed to allow people living with MND to self-register for inclusion in the database and to provide consent for access to their clinical information (https://mndregister.ac.uk) with the aim of increasing direct patient participation and case ascertainment. At registration, participants were asked to indicate the neurologist who provided their diagnosis or ongoing care, to facilitate confirmation of clinical details from the medical record.

Statistical analysis
Data were extracted from local databases between September and October 2019. The data included complete records from people who had provided consent, as well as de-identified records from individuals who could not be approached for informed consent.
We used disease diagnosis to estimate incidence, focusing on the years 2017 and 2018 to include the most complete dataset based on available records.
Patients were grouped by age at diagnosis and sex in five-year age bands. We had an open-ended cohort for individuals over 85 years at the time of disease diagnosis, so these cases were analyzed together. Crude age- and sex-specific incidence rates were estimated using age- and sex-specific 2011 population census records for the relevant postcodes from Office for National Statistics (ONS) census data; the estimates are reported in person-years (27). These rates were standardized to the UK population structure as measured by the 2011 UK census, the US population structure as measured by the US 2010 census, and the European standard population using the direct standardization method (28-31). To ensure anonymity, we received residential data at postcode area level (e.g. SE) for people who had not provided consent for transfer of identifiable data. Our denominator population was made up of postcode areas where we had 100% capture, which was a subset of our total catchment area (darker gray areas in Figure 1).
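Direct standardization weights each stratum-specific rate by that stratum's share of a chosen standard population. The sketch below shows the calculation in general form; the function names and the two-stratum toy numbers are illustrative assumptions and are not data from this study.

```python
def crude_rate(cases, person_years, per=100_000):
    """Crude incidence rate per `per` person-years."""
    return per * cases / person_years


def directly_standardised_rate(strata, standard_population, per=100_000):
    """Weighted sum of stratum rates, with weights from the standard population.

    strata: {label: (cases, person_years)} for the study population
    standard_population: {label: count} for the reference population
    """
    total_standard = sum(standard_population[label] for label in strata)
    weighted = 0.0
    for label, (cases, person_years) in strata.items():
        weight = standard_population[label] / total_standard
        weighted += weight * cases / person_years
    return per * weighted


# Toy example (made-up numbers): an older study population than the standard.
study = {"under_65": (10, 100_000), "65_plus": (40, 50_000)}
standard = {"under_65": 800_000, "65_plus": 200_000}

total_cases = sum(cases for cases, _ in study.values())
total_py = sum(py for _, py in study.values())
print(crude_rate(total_cases, total_py))
print(directly_standardised_rate(study, standard))
```

When the standard population has the same age and sex structure as the study population, the standardized rate equals the crude rate; the two diverge as the structures differ, which is why the same data yield different values against the UK, US, and European standards.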
Confidence intervals for crude rates were estimated using the exact method for Poisson intervals (32). Confidence intervals for overall age- and sex-adjusted incidence rates were estimated at the 95% level using an approximation of the standard error for a binomial proportion (33).
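For readers unfamiliar with exact Poisson intervals, the following is a minimal sketch using only the standard library. It finds the limits by bisection on the Poisson cumulative distribution, which is equivalent to the usual chi-square formulation; the function names are illustrative, and this is not necessarily the implementation used for the analysis.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return min(total, 1.0)


def _solve_lambda(k, target, low, high):
    """Find lam in [low, high] with poisson_cdf(k, lam) == target (cdf falls as lam rises)."""
    for _ in range(200):
        mid = (low + high) / 2
        if poisson_cdf(k, mid) > target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def exact_poisson_ci(count, alpha=0.05):
    """Exact two-sided confidence limits for a Poisson mean given an observed count."""
    upper_bound = count + 10 * math.sqrt(count + 1) + 10
    lower = 0.0 if count == 0 else _solve_lambda(count - 1, 1 - alpha / 2, 0.0, float(count))
    upper = _solve_lambda(count, alpha / 2, float(count), upper_bound)
    return lower, upper


def rate_ci(count, person_years, per=100_000, alpha=0.05):
    """Confidence limits for a crude rate per `per` person-years."""
    lower, upper = exact_poisson_ci(count, alpha)
    return per * lower / person_years, per * upper / person_years


print(exact_poisson_ci(10))
```

The summation approach is adequate for the stratum-level counts typical of a register; for very large counts a chi-square quantile function from a statistics library would be more robust.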

Multiple imputation
To impute diagnostic delay we used predictive mean matching to generate 22 datasets (as 22% of cases were missing diagnostic delay) and calculated incidence for all datasets (34,35). The model for predictive mean matching included data collection center, age of onset, gender, diagnosis subtype, site of onset, and family history. We pooled estimates of incidence and 95% confidence intervals using Rubin's rules (36). Imputed datasets were generated using the R package "mice" (37,38).
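Rubin's rules combine the per-dataset estimates into a single estimate whose variance reflects both the sampling variance within each imputed dataset and the variation between datasets. The sketch below shows the pooling step only, in Python rather than R; the function name and the example values are illustrative assumptions, and the interval uses a normal quantile as a simplification of the t reference distribution normally used.

```python
import math


def pool_rubin(estimates, variances):
    """Pool m point estimates and their variances using Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    pooled = sum(estimates) / m
    within = sum(variances) / m                                      # mean sampling variance
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)    # variance across datasets
    total = within + (1 + 1 / m) * between
    se = math.sqrt(total)
    # Simplification: normal quantile instead of Rubin's t reference distribution.
    return {
        "estimate": pooled,
        "within": within,
        "between": between,
        "total_variance": total,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
    }


# Made-up example with three imputed datasets.
result = pool_rubin([2.0, 2.1, 2.2], [0.01, 0.01, 0.01])
print(result["estimate"], result["total_variance"])
```

If the between-dataset variance is small relative to the within-dataset variance, the pooled interval is close to that of a single dataset; a large between-dataset component widens it.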

Results
We defined an area of complete data capture by combining the catchment areas of six data collection centres that had been participating continuously since 2016. The complete postcode area for the catchment zone, the denominator population, represents a population of over seven million people and is indicated by the darker gray areas in Figure 1.
As of October 2019, there were 5066 records in the MND Register, obtained through data transfers from 17 centres and including 426 people who had signed up online. We extracted data based on postcode area of residence at diagnosis. After data cleaning there were 1748 records. Of these, 232 recorded a date of diagnosis between 2017 and 2018; these are referred to as the complete case dataset. A further 312 people had no date of diagnosis recorded, but all had a date of onset. We used imputed values of diagnostic delay to estimate date of diagnosis (Figure 2).
The casemix of the complete case dataset is shown in Table 1.
Counts of people in the catchment area by age and sex (from the complete case dataset) are reported in Table 2.
After multiple imputation, estimated numbers of people diagnosed between 2017 and 2018 ranged from 287 to 301. The pooled and complete case incidence estimates are presented in Table 3. The estimated age- and sex-adjusted incidence for the UK is 1.61/100,000 person-years (95% confidence interval 1.58, 1.63) based on complete case analysis and 2.07/100,000 person-years (95% CI 2.072, 2.073) based on the imputed dataset.
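As a rough cross-check on scale, the crude complete-case rate can be worked out from the reported case count and the 2011 census denominator of 7,251,845. This assumes that person-years are taken as twice the census denominator to cover 2017 and 2018, which is an assumption about the calculation rather than a figure quoted in the results:

```latex
\[
\text{crude rate}
= \frac{232}{2 \times 7{,}251{,}845} \times 100{,}000
= \frac{232}{14{,}503{,}690} \times 100{,}000
\approx 1.60 \text{ per 100,000 person-years}
\]
```

Under the same assumption, the 287 to 301 cases in the imputed datasets correspond to crude rates of about 1.98 to 2.08 per 100,000 person-years.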

Discussion
We have established a population register that identifies records from multiple sources and uses data that are often available as part of routine data collection for care. The base population for our incidence calculation is over seven million people and therefore represents a large register compared to others globally. Once the MND Register includes data from all areas of the UK, which is the aim of the project, it will represent a database of a scale not yet reported. There are significant challenges in organizing and maintaining a database for a population this size, including the coordination of data collection across independent hospital systems. Organizing a population register as a federated database can result in selection bias because of boundary effects. Through annexation of areas of complete ascertainment over time, the Register will provide a more complete, unbiased picture of the disease. Although the catchment area is constructed from six centres, extracting data by postcode of residence meant including data from eight centres rather than six; 25 cases were transferred by the two extra centres.
UK geography is organized into many partially overlapping administrative units. Postcode is ubiquitously recorded, but hospital catchment areas and population estimates are built from county or unitary authority boundaries that are not always congruent with postcodes. This, together with the transfer of anonymised data that include only the high-level postcode rather than the full postcode, has made estimating incidence challenging while few centres are participating. This is expected to improve as the MND Register includes more data contributors over time. Our study is part of the UK Clinical Research Network, so other services not included in this mapping effort will potentially be notified of the project and be incentivised to participate. The MND Register team regularly attends symposia and local conferences, and uses social media to raise awareness about the project, including the self-registration website. There are regular campaign efforts from the MND Association, including a spread about the Register in their quarterly magazine and videos to help people self-register.
The advantages of collecting clinical data from existing databases are that it reduces the burden on healthcare professionals, who may have to collect similar data for a range of different reporting processes and care tasks, and that it is relatively inexpensive. It is a system that is successfully used by other population registers in the UK. The disadvantages are that there is less control over the format of data collected and that it cannot be easily modified to incorporate other data collection. Although centralized databases have worked successfully in smaller areas such as Scotland and Northern Ireland, the scale of NHS services in England and Wales means this is unlikely to be possible at present. Through establishment of contributing centres in many different locations throughout England, Wales, and Northern Ireland we have encountered variation in care processes locally.

Table 2. Counts of people and census population in the catchment area by age band: both sexes combined, followed by each sex separately (sex labels for the last two column pairs and the first two cells of the first row are missing from the source).
Age band   Cases   Population   Cases   Population   Cases   Population
16-44      …       …            5       1746724      21      1760819
45-49      8       664362       3       334775       5       329587
50-54      17      572399       7       286996       10      285403
55-59      32      498531       8       252548       23      245983
60-64      31      530802       13      271113       18      259689
65-69      31      423035       14      217876       17      205159
70-74      41      339173       16      177671       24      161502
75-79      25      281413       13      153323       12      128090
80+        31      434587       21      272241       10      162346
Ages 16-44 are shown as one category but were analyzed in 5-year age bands (except 16-19, which was 4 years). Ages 80-84 and 85+ were also analyzed separately but are displayed in aggregate. The local population numbers were multiplied by 2 to calculate person-years. Imputed rates are shown to three decimal places to reflect the accuracy needed to display the pooled 95% confidence intervals.
The National Amyotrophic Lateral Sclerosis Registry in the US and the TREAT-NMD neuromuscular network utilize patient-reported data (39,40). In addition, the US Registry collects clinical data from administrative datasets, a method we plan to use to supplement data collection in the future. Analysis of case ascertainment of the US Registry shows variation by race and insurance use (41). We may not be counting privately treated patients who prefer not to register online, although this is expected to be a small number of people as NHS services provide high-quality multidisciplinary care.
Using this new register, we have estimated the incidence of ALS for previously unreported areas of England. We estimate that age- and sex-adjusted UK incidence is 1.61/100,000 person-years from complete cases and 2.07/100,000 person-years using imputed data. The comparison of rates focuses on the imputed incidence because it is less likely to be an underestimate. The imputed incidence is slightly lower than that reported in some smaller population registers in the UK, for example 2.52/100,000 in Devon and Cornwall and 2.1/100,000 in the South East ALS Register, and higher than the 1.76/100,000 reported in Lancashire (16,24,25).
Standardizing incidence to the European standard population resulted in an estimate of 2.26/100,000 person-years, higher than the estimate of 1.4/100,000 reported in Northern Ireland (22). Our imputed incidence standardized to the US population is lower at 1.87/100,000 person-years, which is comparable to the 1.89/100,000 person-years reported for Northern Europe in 2017, but lower than the 3.83/100,000 person-years reported in Scotland (2,42).
The EURALS consortium reported an average crude incidence rate of 2.16/100,000 person-years, while the crude rate in this study was 2.06/100,000 person-years (43). Our crude imputed incidence rate lies between the 2.40 and 1.49/100,000 person-years reported for Northern and Southern Europe in a recent global incidence study (3).
Our estimates are based on data from areas that have not been sampled before, so the results may reflect true lower incidence in these parts of the UK. It is also possible that there are areas of low case ascertainment in our sample. The MND Register as a federated database is relatively new, and the collection of data was initiated at different times by individual participating sites. Detailed reports from population registers in the Republic of Ireland and Scotland have shown that data quality and ascertainment improve over time (42,44,45). As more centres contribute data, we will be able to perform capture-recapture analysis of overlapping areas, allowing more accurate incidence estimates.
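Capture-recapture analysis uses the overlap between independent sources to estimate how many cases were missed by all of them. The text does not state which estimator is planned; the sketch below uses the two-source Chapman estimator as one common choice, with made-up counts. It assumes the two sources capture cases independently and that duplicates are matched correctly.

```python
def chapman_estimate(n_source_a, n_source_b, n_both):
    """Chapman's nearly unbiased two-source estimate of the total number of cases."""
    if n_both > min(n_source_a, n_source_b):
        raise ValueError("overlap cannot exceed either source count")
    return (n_source_a + 1) * (n_source_b + 1) / (n_both + 1) - 1


def ascertainment(n_source_a, n_source_b, n_both):
    """Share of the estimated total that appears in at least one source."""
    observed = n_source_a + n_source_b - n_both
    return observed / chapman_estimate(n_source_a, n_source_b, n_both)


# Made-up example: two overlapping centres report 50 and 40 cases, 20 in common.
estimated_total = chapman_estimate(50, 40, 20)
print(round(estimated_total, 1), round(ascertainment(50, 40, 20), 2))
```

In this toy example 70 distinct cases are observed against an estimated total of roughly 99, suggesting about 71% ascertainment; sources that share referral pathways would violate the independence assumption and bias the estimate.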
In the future we will estimate prevalence and lifetime risk, as well as mapping incidence compared to healthcare provision. Collecting large, national datasets has helped improve care and understanding of other diseases and we have laid the groundwork and generated the momentum to do this for ALS as well.