Home pregnancy tests in the hands of the intended user

The objectives of this study were to investigate the usability and performance of seven visual home pregnancy tests available in Europe. Part one of the study was home-based and involved volunteers testing a selection of four home pregnancy tests. The tests used and order of use were randomized. Part two, performed at a study site, involved volunteers reading and interpreting the results of the same selection of home pregnancy tests used in part one, but using urine standards representing early pregnancy (25 mIU/mL human chorionic gonadotropin) or a ‘not pregnant’ (0 mIU/mL human chorionic gonadotropin) sample. The volunteers completed a questionnaire after each test in both parts. Three of the seven tests met their accuracy/reliability claims: tests A (99.8%), B (100%), and F (97.6%) (not statistically different from the claimed 99% accuracy). The remaining four tests had accuracies/reliabilities of <99% at 81.6% (C), 89.0% (E), 92.5% (D), and 95.9% (G), respectively. Test A was the highest-rated test for each attribute tested in both settings. Test D was ranked the lowest in part one and test C was ranked lowest overall for part two. Home pregnancy tests vary in performance and usability, therefore requiring better standardization and performance evaluation in Europe. Clinical Trials Reference Number: NCT03589534


Introduction
When a woman suspects that she may be pregnant, it is very important that an accurate pregnancy status is confirmed as early as possible to enable her to make informed decisions on her on-going health and well-being. [1] Home pregnancy tests have advanced significantly since their launch in 1976 and, as a result, women are able to confirm their status in private, in the comfort of their own homes. [2,3] These tests confirm the presence of the pregnancy hormone, human chorionic gonadotropin (hCG), which is produced at a very early stage of the pregnancy by trophoblast cells. [3,4] The hormone hCG has also been shown to be detectable in the maternal circulation 8 days after conception and a concentration of approximately 10 mIU/mL is observed in serum between 9 and 10 days after follicular rupture. [5,6] The pattern of appearance and concentration increase of hCG that occurs in the maternal circulation has similarly been shown in the urine of pregnant women. [3,7,8] There are numerous home pregnancy tests currently available on the market that claim accuracy of >99% when used from the day of the expected period. [9] However, for some products, these claims cannot be replicated when tested either on clinical samples in the laboratory or when lay users perform the test. [3] Additionally, studies have shown that when participants perform pregnancy tests on their own urine compared with testing samples in a laboratory setting, a lower sensitivity is reported with lay user testing. [10] It is important that home pregnancy tests are easy to use and also provide an accurate result in the hands of users. [11] In the USA, the home pregnancy tests available on the market are required to provide objective evidence of product performance according to specific definitions of test sensitivity and include lay user testing for products designed for consumer use. [12] However, in the European market, such strict requirements do not currently exist.
Manufacturers who market these tests in both Europe and the USA tend to conform to these definitions across both markets, [3] but products intended for marketing solely in the European market may not have followed US Food and Drug Administration guidelines for performance evaluation.
In the absence of any available data on test performance and the lack of standardization for evaluating test credentials, any declaration of test accuracy on the package labeling is potentially misleading. [3] European guidelines, as per the in vitro diagnostic directive implemented in 1998, require manufacturers to do their own risk assessments; [13] however, specific performance requirements are not defined in this directive and, due to the evolution of the medical device industry, it has become outdated. [3,14] More prescriptive European guidelines would be beneficial to ensure that similar risks are taken into consideration by all manufacturers, and this is a key aim of the in vitro diagnostic regulations (IVDR) that entered into force in May 2017. [3,15] The IVDR are a set of requirements that include the necessity for comprehensive validation data (including data on customer usability) on home pregnancy tests and other in vitro devices, and will replace the European Union's current Directive on in vitro diagnostic medical devices after a transition period of 5 years (2022) when the new requirements are enforced in Europe. [14,16] The objective of this study was to investigate the usability and performance of seven visual home pregnancy tests available in Europe among a group of volunteers, representative of lay users. Performance was defined as the device accuracy/reliability and usability was defined by the attributes on the 7-point Likert scale used to gain feedback from the volunteers.

Materials and methods
This study of 250 volunteers from the UK was conducted between May 29th and August 10th, 2018. A total of 246 volunteers completed the study; one was lost to follow-up and three were withdrawn. All volunteers gave written, informed consent before entering the study. The key inclusion criteria comprised women aged 18-45 years who were willing to provide informed consent, conduct a personal home pregnancy test, and reveal their pregnancy status. The key exclusion criteria included the following: current or previous employees or immediate relatives of Abbott, Procter & Gamble, and their affiliates, and those who had taken a hormonal preparation containing hCG in the last month (e.g. Pregnyl, among others). This study was a formal assessment of the usability and performance of seven visual home pregnancy tests available in Europe among volunteers who were representative of lay users (tests detailed in Table 1). The tests selected were market-leading or second to market-leading tests in France, Italy, Spain, Germany, or the UK, and therefore represent tests used by a large number of women. Six of the seven tests claimed an accuracy or reliability of >99%, and one test claimed to have a sensitivity of 20 mIU/mL, a concentration of hCG that corresponds to 2 days of pregnancy.
The investigation comprised two parts, taking place at different sites. The first part of the study examined the home experience of pregnancy testing, where volunteers were required to test four out of the seven pregnancy tests at home. Reviewing four tests was deemed the maximum number that could feasibly be compared while preventing bias as a result of volunteer fatigue. The products and order of testing were randomized, ensuring that the effect of being exposed to other products was balanced for the products being analyzed; each volunteer had test A among their four products to allow cross-comparison of data.
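The allocation described above can be illustrated with a minimal sketch. It assumes simple randomization: every volunteer receives test A plus three tests drawn at random from B to G, used in a shuffled order. The balancing scheme actually used in the study is not described in detail, so the function name `allocate`, the seed, and the sampling approach are illustrative assumptions only.

```python
import random

# Test labels as used in the study; everything else here is hypothetical.
TESTS = ["A", "B", "C", "D", "E", "F", "G"]


def allocate(rng):
    """Return four tests in randomized order, always including test A.

    Assumption: simple random sampling of three tests from B-G, not the
    balanced design the study may have used.
    """
    others = rng.sample(TESTS[1:], 3)  # three distinct tests from B-G
    allocation = ["A"] + others
    rng.shuffle(allocation)            # randomize the order of use
    return allocation


rng = random.Random(2018)  # arbitrary seed so the schedule is reproducible
schedule = {volunteer_id: allocate(rng) for volunteer_id in range(1, 251)}
```

With simple randomization, the number of volunteers assigned to each of tests B to G is only approximately equal; a balanced design would fix those counts in advance.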
Volunteers were required to read the appropriate instructions and then test the products at specified times of the day using either the 'in-stream' method or a container to collect a urine sample. The 'how to use' and 'how to ...
Not all volunteers who completed part one of the study went on to the second part of the study. The second part of the study examined the experience of reading the same home pregnancy tests run with urine hCG standards that were representative of an early pregnancy result (25 mIU/mL) and a 'not pregnant' result (0 mIU/mL) at a study site. This assessment was conducted by trained technicians and appointments were scheduled soon after the at-home testing conducted in part one. Each test was dipped by the technician, following each product's respective instructions for use, into either of the two urine standards (pre-randomized). One at a time, each volunteer read the results from a total of eight tests (two urine standards tested with each of the four products).
The volunteers verbally stated their interpretation of the results to the study technician who recorded these on a results sheet. The technician then interpreted and recorded the result for each test. The technicians and volunteers were blinded to the hCG concentration of each sample for the duration of the study. Once testing was complete, the volunteers completed a second usability questionnaire (also using the 7-point Likert scale) on each product, which included the following questions:
• How CERTAIN were you that you read the test results correctly?
• How CLEAR did you think the test results were?
• How EASY did you think the test results were to read?
• How ACCURATE did you believe this test was?
• How much would you TRUST this test?
The at-home and on-site usability questionnaires were summarized for each product using the percentage of volunteers scoring either 1 or 2 on the Likert scale. Agreement between the volunteer-read results and the expected result at the study site (accuracy) was calculated for each product along with exact binomial 95% confidence intervals. The data collected from the study were analyzed in SAS data format by qualified SPD statisticians who were not directly involved in the conduct of the study.
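The two summaries described above can be sketched as follows. The study analysis was run in SAS; this Python version is only an illustration. It assumes that "exact binomial" refers to the Clopper-Pearson interval, which is the usual meaning of the term, and the function names are hypothetical. The bounds are found by bisection on the binomial distribution so that no third-party library is needed.

```python
from math import comb


def top_two_box(scores):
    """Percentage of 7-point Likert scores equal to 1 or 2."""
    return 100.0 * sum(1 for s in scores if s in (1, 2)) / len(scores)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); suitable for n up to a few hundred."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(f, target, iterations=100):
    """Bisection on [0, 1] for f(p) = target, where f is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided interval for x agreements in n reads."""
    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


# Illustrative counts only, NOT study data: 244 correct reads out of 250.
low, high = clopper_pearson(244, 250)
claim_is_consistent = low <= 0.99 <= high  # is a 99% claim inside the interval?
```

Checking whether the claimed accuracy lies inside the interval is one simple way to read a statement such as "not statistically different from the claimed 99% accuracy"; the exact test used in the study is not stated.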
The protocol was approved by the SPD Ethics Committee. All subjects gave written informed consent in accordance with the Declaration of Helsinki.

Demographics: ethnicity, education, and occupation
In total, 250 volunteers were recruited to the study and the mean age was 35 ± 6 years (age range: 19-45 years; median age 36 years). The majority of the volunteers (99.2%) were Caucasian and of Western European descent. A total of 99 (39.6%) volunteers had attained at least Advanced Levels, vocational level 3, and equivalents, which was the highest level attained by the majority. With regard to occupations, 41.2% of volunteers had either a professional occupation (n=51) or an administrative/secretarial occupation (n=52).

At-home testing
The results from the at-home testing of the seven home pregnancy tests showed that test A was ranked highest for each of the usability attributes. This was highlighted by the percentage of volunteers scoring each attribute as either 1 or 2 for ease of use (95.9%), certainty of test use (96.7%), clarity of results (92.7%), certainty of correctly reading test results (95.5%), hygiene of the test (81.3%), perceived accuracy of results (95.5%), perceived level of trust in at-home use based on their overall experience (93.5%), and how much the test was liked by the volunteers (92.7%) (Table 2). The percentage of volunteers scoring a 1 or 2 on the questionnaire was lowest for test D for the following usability attributes: ease of use (65.1%), clarity of results (69.8%), certainty of results (83.3%), perceived level of trust (65.9%), perceived level of accuracy (74.6%), and how much the test was liked by the volunteers (40.5%). Furthermore, the overall scores for hygiene were low, ranging from 55.6% to 81.3%. All seven tests were rated highly (over 80%) for the level of certainty in correctly read results. For other attributes, there were broad differences between the highest and lowest scores; for example, for the question 'How much do you LIKE this test?', test A was scored at 92.7% whereas test D was scored at 40.5%.

On-site testing
The results from the on-site testing showed that for the usability attributes, as per the results for part one, the highest-rated test was A. This was shown by over 90% of volunteers scoring the following attributes as either 1 or 2 on the Likert questionnaire: certainty of correctly reading the test results (99.2%), clarity of results (98.4%), ease of reading the results (97.6%), perceived accuracy (98.8%), and perceived level of trust (99.2%). Test B scored highly in part two for perceived accuracy (84.8% on-site vs 86.4% at home) and the level of trust (78.4% on-site vs 76.8% at home). The lowest-rated test from the on-site testing was test C for the following usability attributes: certainty of correctly read results (44.1%), clarity of results (33.1%), ease of reading the results (31.4%), perceived accuracy (50.0%), and perceived level of trust (39.0%) (Table 3).
For tests C to G, the number of volunteers scoring either 1 or 2 on the questionnaire was lower in the on-site testing compared with the at-home testing for each usability attribute, whereas for test A, the percentages were higher for the on-site testing compared to the at-home testing. For example, for test A, for the attribute 'How CERTAIN were you that you   (Table 4).

Discussion
Home pregnancy tests have progressed considerably since their launch, [2] enabling women to confidently determine their pregnancy status on their own. [2,3] It is important for women to ascertain their pregnancy status as early as possible because this could empower them to make the necessary lifestyle changes that come with a pregnancy confirmation, such as seeking medical advice in a timely manner and adjusting behaviors, such as drinking, smoking, and/or eating habits. [11] In light of this, a false-negative/positive test result due to the inaccuracy of the home pregnancy test used can be detrimental to women; this emphasizes the importance of the usability and performance of home pregnancy tests available on the market. The results obtained from this current study assessing the usability and performance of seven home pregnancy tests available in Europe showed that only three of the seven tests evaluated (tests A, B, and F) met their claimed accuracy/reliability of >99% under controlled conditions during the on-site testing. This was demonstrated by their agreement with the expected results when testing with standards. In addition, test A was the highest-performing test in both parts one and two for each attribute, and its ratings also increased in part two compared with part one. However, tests C, D, E, F, and G all showed lower scores in part two, compared with part one. Test B showed an increase in scores for some attributes (level of certainty and how trustworthy the results were), whereas other attributes scored lower than in part one. The change in scoring is likely because the women participating in the study all knew their pregnancy status. When testing their own sample at home, they already knew what the result should be, so provided the result matched this knowledge, they judged the test acceptable.
However, during on-site testing, volunteers were unaware of what the result should be and therefore had to take more care with result interpretation, thus highlighting readability differences that were not as apparent when testing their own sample. This study highlights the discrepancies that exist between home pregnancy tests in terms of both usability and performance. This is evident since only tests A, B, and F met their >99% accuracy claims when tested against standards. Across both part one and part two, the usability questionnaires ranked the tests in a broadly consistent way, with test A consistently rated highest, and tests D and C rated as the lowest in part one and part two, respectively. Tests B and F, which along with test A showed high agreement with the claimed accuracy results, were consistently rated in the middle of the seven tests in the usability questionnaires. These results strongly emphasize that many of the visually read home pregnancy tests available on the market in Europe are not as accurate as their packaging claims, and usability can vary considerably, independent of accuracy.
A previous study highlighted the importance of readability, stating that volunteers expressed concern over the lack of readability of the faint lines on many of the tests used, making the results difficult to interpret. [17] In the study outlined here, volunteers rated a number of attributes relating to readability, such as the certainty of reading results correctly, the clarity of results, and the ease of reading the results. The variability seen across these attributes (31.4% for test C and 97.6% for test A) further emphasizes the importance of readability to users when choosing to use a home pregnancy test. This study also highlighted variability between the tests in other important areas of usability, such as trust in the results obtained, ease of use, and how hygienic the tests are to use. These results emphasize that there are differences in many attributes among home pregnancy tests available on the European market.
To ensure only robust pregnancy tests are available, manufacturers should design the product with the end-user in mind, ensure that it has undergone robust validation and verification testing that includes actual usage by lay users, maintain excellent quality control procedures in manufacturing, and operate a good post-market surveillance system to gather in-market feedback.
The hCG standards used in the on-site testing are representative of early pregnancy, which is the period of time where most women would be taking the test if they suspect pregnancy; therefore, this represents a realistic assessment of pregnancy status for home pregnancy kits. At present, there is no agreed or consistent validation method in Europe for medical devices, [3] as there is in the USA.
Home pregnancy test labeling usually only provides simple information on product accuracy and in some cases the ability to detect pregnancy early, so a user has little information on which to base her selection. However, the addition of further information would probably not be helpful, as a woman with a need to test for pregnancy should not have to decipher complicated technical information to determine whether the test is suitable for her; the onus should be on the manufacturer to provide robust products. If a woman wishes to make a better-informed choice, selecting a product that is also sold in the USA, as well as Europe, would at least guarantee she has purchased a test that has met the more robust requirements of the FDA.
This study supports the need for the new standardization that the in vitro diagnostic regulation aims to achieve. The new regulations, which will be enforced in Europe in 2022, [15,16] aim to set high standards of quality and safety for in vitro medical devices in order to meet common safety concerns. [15] Our findings further highlight the importance and need for robust validation testing criteria to be included in the in vitro diagnostic regulation, in order to help standardize home pregnancy tests across Europe.

Conclusion
The study highlighted that there are discrepancies between home pregnancy tests available in Europe, in terms of performance and usability, with accuracy only matching the product labeling in three out of seven tests. Furthermore, tests that meet their accuracy claims are not always scored highly for usability, with only one test scoring consistently highly across all parameters. Such variation between tests might have a significant impact on user confidence. Therefore, to provide the assurance required by users at such an important time, there is a need for better standardization and performance evaluation requirements for home pregnancy test products in the European market. The new in vitro diagnostic regulations, due to come into force in 2022, will help to provide this much needed standardization.

Funding
The study was funded by SPD Development Company Limited (Bedford, UK), a wholly owned subsidiary of SPD Swiss Precision Diagnostics GmbH (Geneva, Switzerland).

Author Contribution
This study was designed by SJ and SW. The study was prepared and conducted by JB and DB. Statistical analysis was conducted by CH. SJ and SW were involved in preparation of the manuscript.

Disclosures
JB, SW, DB, CH, and SJ are employees of SPD Development Company Limited.
