Non-repudiation and privacy-preserving sharing of electronic health records

Abstract In this work, we present a framework that facilitates the sharing of electronic health records (EHRs) among the community of health-care providers (HCPs). Such sharing might be obstructed by patients' privacy concerns and the governing legislation. Our sharing scheme therefore strives to respect patients' privacy and comply with the relevant legislative guidelines, e.g., HIPAA. The proposed work introduces two services while sharing EHRs: privacy and non-repudiation. To this end, we introduce the partners and the role of each during the exchange of EHRs. The principle of sharing EHRs among HCPs has to be reinforced to save patients' lives, and cryptographic primitives have to be employed to serve this purpose. In this paper, we introduce the notion of the non-repudiation private membership test (NR-PMT). In NR-PMT, we help patients receive medical services with great flexibility while maintaining their privacy and thwarting threats that would disclose their identities. In addition, a formal security analysis based on Kailar's accountability framework is used to analyze the proposed framework, and a complexity analysis is conducted.



Introduction
The main focus of this work is on the Electronic Health Record (EHR), which is the aggregate of a patient's health-related information. An EHR includes the patient's medical history, laboratory data, radiology images, progress reports, immunizations, and medications. EHRs have been adopted by policymakers to replace the paper-based system and to enhance the service provided at hospitals and doctors' offices through the elimination of manual handling of patients' data. In addition, EHRs could help in preventing dishonest claims and achieving better coordination between health-care service providers. The EHR is a promising technology, but it comes with a financial cost and potential concerns. On the one hand, the cost of deploying EHRs to the cloud has already been estimated and could fit or marginally exceed the budget, but eventually it will be within acceptable limits. On the other hand, sharing EHRs among caregivers raises significant concerns, such as privacy and authentication. Therefore, a large body of research focuses on achieving these security properties. To this end, this work aims at promoting a privacy-preserving, accountable, and non-repudiable collaboration between health-care providers. However, such collaboration may incur a leakage of sensitive personal information that violates patients' privacy.
To determine what may explicitly identify users, we follow the officially enacted guidelines on personally identifiable information (PII; McCallister et al., 2010). PII can be used to uniquely identify an individual and is considered highly sensitive. Examples of attributes and assets that we consider PII are name, SSN, phone number, email address, IP address, MAC address, biometric data, face image, etc. The security risk of sharing PII with other parties is the possibility of unauthorized access to data and thus identity theft, fraud, and misconduct.
However, contingencies sometimes mandate that patients receive immediate health service, so access to their EHRs could be a factor in their survival. For example, a person travels abroad on a business trip and needs a procedure or a check-up. Care providers may be unwilling to embark on a procedure before knowing the medical history of the patient, or may instead wait for lab work that reveals whether the patient is allergic or intolerant to specific medications. Our model assumes that data trustees (custodians) are responsible for preserving patients' records according to the relevant legislation and guidelines, e.g., HIPAA in the USA (United States, 1996), GDPR in the European Union (Union, 2016), etc. In the medical sector, a patient's sensitive data, whether in transit or at the recipient side, must not be usable to identify the patient.
However, a custodian cannot share a patient's EHR outright, because this is considered a violation of the patient's privacy. Our proposed model assumes that the sharing process follows an anonymity approach that hinders privacy violations. Also, we suggest the presence of a trusted third party (TTP) that works as a library to facilitate the mapping/linking of patients to their custodians or health-care providers. The difference between the scenario that we propose here and the well-known privacy-preserving record linkage (PPRL; Fellegi & Sunter, 1969; Churches & Christen, 2004) is that the linkage key produced by the participating service providers is not necessarily unified among all participants, and the check for membership is conducted at the TTP.
The contribution of this work is as follows: we propose a framework for the secure sharing of EHRs that facilitates immediate interventions by care providers (CPs) in times of pandemics. As those CPs do not own the patients' EHRs, a collaboration framework with the guardians of the EHRs is essential to save lives. A contract or umbrella under which all participants have to deliver responsibly is mandatory. This work targets the sharing of EHRs without compromising patients' privacy or enabling any kind of record linkability. Also, at the point of cooperation, the framework guarantees that each participating CP has undeniably committed to its part.

Related work
Data hiding and secure communication have recently been the main concerns in the secret sharing of sensitive data (Parah et al., 2017; Sarosh et al., 2021). Next, we cover the Bloom filter, with its variations and applications, as we use it for the membership check.

Matching under uncertainty
Approximate matching to test for similarity between two sets A and B has been presented in several works; for more details, see the surveys by Vatsalan et al. (2013) and Dong et al. (2013), where scalability is an issue.
In this work, we focus our model on approximate matching and test for membership with a level of confidence. In our setting of private set membership (PSM), we are given a group of n sets and a record, i. We may consider it a variation of privacy-preserving record linkage (PPRL) by assuming that the given record is a set of cardinality 1 and then looking for the set with which the intersection exceeds a given confidence value, i.e., a threshold. From a practical standpoint, we adopt 2-gram decomposition, i.e., bigrams. Let $w \in \Sigma^*$ be a string, where $\Sigma$ denotes the alphabet. Then, the 2-gram of $w = \text{hassan}$ is expressed as follows:
$$\text{2-gram}(\text{hassan}) = \{\text{\_h}, \text{ha}, \text{as}, \text{ss}, \text{sa}, \text{an}, \text{n\_}\},$$
where $\text{\_}$ denotes the padding character.
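The bigram decomposition can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `bigrams` and the choice of `_` as the padding character are assumptions made for the example.

```python
from __future__ import annotations


def bigrams(word: str, pad: str = "_") -> set[str]:
    """Return the set of padded 2-grams of `word`.

    One padding character (an assumed convention) is added on each side so
    that the first and last letters each appear in a bigram of their own.
    """
    padded = f"{pad}{word}{pad}"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


if __name__ == "__main__":
    print(sorted(bigrams("hassan")))
```

Because the result is a set, repeated bigrams inside a word are counted once, which matches the set-based similarity measures used in the rest of the section.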

Similarity measure
Approximate matching in the literature has relied mostly on either the Jaccard index or the Dice coefficient, with a preference for the Dice coefficient. They only differ in the branch lengths. However, their dendrogram topologies are the same (Grzebala & Cheatham, 2016). Given two sets A = 2-gram(hassan) and B = 2-gram(hasan), one can compute the Dice coefficient using the following equation:
$$\mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}$$
For example, $A = \{\text{\_h}, \text{ha}, \text{as}, \text{ss}, \text{sa}, \text{an}, \text{n\_}\}$ and $B = \{\text{\_h}, \text{ha}, \text{as}, \text{sa}, \text{an}, \text{n\_}\}$ share six bigrams and have a Dice coefficient computed as follows:
$$\mathrm{Dice}(A, B) = \frac{2 \times 6}{7 + 6} = \frac{12}{13} \approx 0.92$$
Definition 1 Threshold Dice coefficient (TDC). Given a record a, a set $A_i$ and a predetermined match threshold, k, s.t. $0 \le k \le 1$, TDC is denoted ThDice and evaluates the Dice coefficient of a with respect to $A_i$ against the threshold k.
For specific domains, even the size of the item set is considered sensitive information (Ateniese et al., 2011). For example, finding passengers of a flight who belong to the list of people banned from traveling to the USA has to be done with discretion, as the Department of Homeland Security (DHS) cannot give away the size of the Terror Watch List (TWL). Padding is a possible solution that is adapted from the computer networking literature. However, padding sacrifices computational resources and is considered a naive solution.
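For the two bigram sets of hassan and hasan discussed above, the Dice coefficient and a threshold check can be sketched as follows. The helper names `dice` and `is_match` are hypothetical, and treating a score greater than or equal to k as a match is an assumption about how the threshold is applied.

```python
def dice(a: set, b: set) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) of two sets."""
    if not a and not b:
        return 0.0  # convention chosen here: two empty sets give no evidence of a match
    return 2 * len(a & b) / (len(a) + len(b))


def is_match(a: set, b: set, k: float) -> bool:
    """Assumed threshold rule: declare a match when the score reaches k."""
    return dice(a, b) >= k


A = {"_h", "ha", "as", "ss", "sa", "an", "n_"}  # bigrams of "hassan"
B = {"_h", "ha", "as", "sa", "an", "n_"}        # bigrams of "hasan"

score = dice(A, B)  # 2*6 / (7+6) = 12/13, about 0.923

if __name__ == "__main__":
    print(score, is_match(A, B, 0.9))
```

With a threshold of 0.9 the two spellings are treated as the same name, while a stricter threshold such as 0.95 would reject them.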

A membership test
Membership test between an entity and a given party A is shown in Algorithm 1.

Algorithm 1 Threshold-based membership test
Input: $\{x, k, [a_1, a_2, \ldots, a_n]\}$, where $[a_1, a_2, \ldots, a_n]$ is A's database of size n, x is an entity and k is a threshold
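Only the input line of Algorithm 1 is available here, so the following Python sketch is one plausible reading of a threshold-based membership test rather than the authors' procedure. It assumes that records are compared through padded bigram sets with the Dice coefficient and that a record matches when its score is at least k; all function names are hypothetical.

```python
from __future__ import annotations


def bigrams(word: str, pad: str = "_") -> set[str]:
    """Padded 2-grams of `word` (padding character is an assumption)."""
    padded = f"{pad}{word}{pad}"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def dice(a: set, b: set) -> float:
    """Dice coefficient of two sets; 0.0 when both are empty."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def membership_test(x: str, k: float, database: list[str]) -> list[tuple[str, float]]:
    """Return (record, score) pairs whose Dice score with x is >= k, best first."""
    query = bigrams(x)
    scored = [(a, dice(query, bigrams(a))) for a in database]
    hits = [(a, s) for a, s in scored if s >= k]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(membership_test("hasan", 0.9, ["hassan", "miller", "hasan"]))
```

The scan visits every record once, so its cost grows linearly with the database size n.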

Preliminaries
In our pursuit of a space-efficient data structure, we consider Bloom filters (BF), devised by Bloom (Bloom, 1970). An alternative to the BF could be a cuckoo filter (CF), which is also a space-efficient probabilistic data structure. One merit of the cuckoo filter over the BF is that the former supports deleting existing elements while the latter does not. Also, a CF has lower space overhead than a BF at low target false-positive rates.
Several uses and extensions of the BF have been proposed to add features such as scalability (Almeida et al., 2007). Also, the BF has been applied in numerous disciplines, e.g., network applications such as resource routing, collaboration in overlay and peer-to-peer networks, and packet routing, to name a few; see Broder and Mitzenmacher (2004) for more details. In our study of efficient database representation and privacy preservation, we adopt the strategy followed by Schnell et al. (2009).
A BF is an array of m bits with all bits initially set to zero. To map an object to a BF, the object is passed to a group of k independent hash functions, and the bit at each resulting position is set to 1. The work by Bose et al. (Bose et al., 2008) proves that the false-positive rate (FPR) computed in Eq. 3 represents a lower bound of the FPR.
$$\mathrm{FPR} = \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k} \approx \left(1 - e^{-kn/m}\right)^{k} \tag{3}$$
where m denotes the size of the BF, k denotes the number of independent hash functions and n denotes the number of objects mapped to the BF.
From Eq. 3, the value of k that minimizes the FPR is computed as follows:
$$k = \frac{m}{n} \ln 2$$
However, this is still an optimization of a lower bound on the FPR. In Table 1, we set $\mathrm{FPR} = 0.001$, take the number of independent hash functions as $k = -\frac{\ln(\mathrm{FPR})}{\ln 2}$, let n, the number of stored items, take a value in $\{100, 1000, 10000\}$, and compute the Bloom filter length $m = -\frac{n \ln(\mathrm{FPR})}{(\ln 2)^2}$.
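The parameter choice described for Table 1 can be reproduced with a short Python sketch. The function name `bloom_parameters` is hypothetical, and rounding m up and k to the nearest integer are choices made for this example; the source does not state how the values are rounded.

```python
from __future__ import annotations

import math


def bloom_parameters(n: int, fpr: float) -> tuple[int, int]:
    """Return (m, k) for n stored items and a target false-positive rate.

    m = -n ln(FPR) / (ln 2)^2   (filter length in bits, rounded up here)
    k = -ln(FPR) / ln 2         (number of hash functions, rounded here)
    """
    m = math.ceil(-n * math.log(fpr) / (math.log(2) ** 2))
    k = round(-math.log(fpr) / math.log(2))
    return m, k


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        m, k = bloom_parameters(n, 0.001)
        print(f"n={n}: m={m} bits, k={k} hash functions")
```

For a fixed FPR, m grows linearly with n while k stays constant, which is why only the filter length changes across the three settings.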
A possible solution to mitigate the effect of false positives is to let each custodian share non-discriminating (non-PII) but informative data about the patient, e.g., $50 \le \text{age} \le 60$ || telephone area code; the last four digits of the patient's SSN, e.g., ***-**-9783, are not used because they are considered PII. The returned information helps the enquirer to eliminate the majority of the 'false' custodians, if not all of them except the right one. Figure 1 shows approximate matching of two inputs, i.e., A = "HASSAN" and B = "HASAN", by a bigram Bloom filter with two hash functions. Computing the Dice coefficient, $\mathrm{ThDice} = \frac{2 \times 12}{12 + 13} = 0.96$, indicates that there is most likely a match, whereas exact matching returns null.
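The bigram Bloom filter comparison of "HASSAN" and "HASAN" can be sketched as follows. This is an illustration under stated assumptions: the filter length m = 1000, the use of SHA-256 and BLAKE2b, and the double-hashing scheme that derives the two positions per bigram are all choices made for the example, so the resulting score depends on them and will generally differ from the value read off Figure 1.

```python
from __future__ import annotations

import hashlib


def bigrams(word: str, pad: str = "_") -> set[str]:
    """Padded 2-grams of `word` (padding character is an assumption)."""
    padded = f"{pad}{word}{pad}"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_positions(word: str, m: int = 1000, k: int = 2) -> set[int]:
    """Positions of the bits set to 1 when the bigrams of `word` are hashed.

    Position i of a bigram is (h1 + i * h2) mod m, a double-hashing scheme
    assumed for this sketch.
    """
    positions: set[int] = set()
    for gram in bigrams(word.lower()):
        data = gram.encode("utf-8")
        h1 = int.from_bytes(hashlib.sha256(data).digest(), "big")
        h2 = int.from_bytes(hashlib.blake2b(data).digest(), "big")
        for i in range(k):
            positions.add((h1 + i * h2) % m)
    return positions


def dice_bits(a: set[int], b: set[int]) -> float:
    """Dice coefficient over the sets of 1-bit positions of two filters."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


if __name__ == "__main__":
    fa, fb = bloom_positions("HASSAN"), bloom_positions("HASAN")
    print(len(fa), len(fb), dice_bits(fa, fb))
```

Since every bigram of "HASAN" also occurs in "HASSAN", the bits of the second filter are a subset of those of the first, and the score stays high even though the strings are not identical.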
Another problem with the custom Bloom filter is the frequency attack. As we increase the number of hash functions applied to the q-grams, more positions will be shared and set to 1. Some techniques have been proposed to mitigate this, such as the balanced Bloom filter presented by Schnell and Borgs (Schnell & Borgs, 2016), where each Bloom filter of length l is concatenated with its negation and the resulting 2l bits are then permuted. Also, Rivest's chaffing and winnowing (Bleumer, 2011) could be applied for the same purpose. To overcome frequency attacks, the main effort revolves around masking the frequency distribution of the original filter. Figure 2 shows Bloom filter false-positive rates under different parametrizations.
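The balanced Bloom filter construction described above (concatenate the filter with its negation, then permute the 2l bits) can be sketched in Python. The function name is hypothetical, and the seeded `random.Random` shuffle merely stands in for a secret permutation; it is not a cryptographically secure choice.

```python
from __future__ import annotations

import random


def balanced_bloom_filter(bits: list[int], seed: int) -> list[int]:
    """Concatenate `bits` with its negation and permute the 2l positions.

    The seed plays the role of a shared secret that fixes the permutation
    (an assumption of this sketch); the input list is left unchanged.
    """
    negated = [1 - b for b in bits]
    doubled = bits + negated
    rng = random.Random(seed)  # illustration only, not a secure PRNG
    rng.shuffle(doubled)
    return doubled


if __name__ == "__main__":
    bf = [1, 0, 0, 1, 1, 0, 0, 0]
    print(balanced_bloom_filter(bf, seed=42))
```

Every balanced filter has exactly l ones out of 2l bits, so the number of set bits no longer reveals how many q-grams were encoded.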

The proposed framework
Private set membership (PSM) is a tweak to the well-known private set intersection (PSI), a secure multiparty computation cryptographic technique. Figure 3 shows the interactions between an HCP and the TTP server. Initially, the HCP knows only its client, C, and looks for the custodian of C's EHR, while the TTP initially knows the populated sets of the subscribed HCPs, $A = \{A_1, A_2, \ldots, A_n\}$. Eventually, the HCP learns $A_i$ iff $\exists A_i$ s.t. $C \in A_i$, or nil otherwise. Next, the HCP corresponds with $A_i$ to retrieve the sought EHR that belongs to C.
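The outcome of the lookup at the TTP can be sketched as follows. This is a minimal illustration that assumes the TTP holds each subscribed HCP's population as a plain set of pseudonyms and tests exact membership; the function name `find_custodian` and the sample identifiers are hypothetical, and the actual scheme compares Bloom-filter encodings rather than plaintext values.

```python
from __future__ import annotations

from typing import Optional


def find_custodian(client: str, populations: dict[str, set[str]]) -> Optional[str]:
    """Return the HCP whose set contains `client`, or None (nil) otherwise."""
    for hcp, members in populations.items():
        if client in members:
            return hcp
    return None


if __name__ == "__main__":
    populations = {
        "HCP-1": {"pseudonym-17", "pseudonym-42"},
        "HCP-2": {"pseudonym-99"},
    }
    print(find_custodian("pseudonym-99", populations))
    print(find_custodian("pseudonym-00", populations))
```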

Publishing patients pseudonyms at a TTP
Next, we propose using the patient's ID to look for the health-care provider to which the patient's EHR belongs. However, the search for membership has to respect patients' privacy. This process is called private set membership (PSM; Meskanen et al., 2015; Tamrakar et al., 2017).
Let $B = \{b_1, b_2, \ldots, b_n\}$ be the set of subscribed health-care providers (SHCP), s.t. $\forall\, \mathrm{PatientID}_j,\ \exists\, b_i : \mathrm{PatientID}_j \in b_i$, with $i \in [1, n]$ and $j \in [1, m]$, where n and m are the numbers of SHCPs and patients, respectively. Figure 4 shows the framework and the interactions between the three participating parties, i.e., the custodian, A, the requester, B, and the trusted linkage library, C. The first step begins with obtaining A's consent to publish its record keys, which are unidentifiable and cannot disclose its patients' true identities, in the trusted library, C. Then, a patient is obliged to seek a medical service away from their EHR's custodian, at health-care provider B, step 2. So, B, on behalf of the doctor, issues a request to the linkage library asking for the patient's EHR, step 3. In turn, the trusted party, C, upon successfully identifying the custodian of the patient, forwards the enquiry to the corresponding custodian, step 4. Upon receiving the enquiry from the trusted party, custodian A inspects its records and looks for a match, if any, step 5. If there is a match, A sends B the patient's sanitized EHR; otherwise, A refrains or sends a decline to B, step 6.
However, patients' medical records have to go through a sanitization process before being sent (Bier et al., 2009), in which sensitive data, or generally any PII, is removed and replaced with meaningless information. The aim is to prevent the exposure of patients' identities. Figure 5 shows typical patient data categorized into two overlapping sets, i.e., sensitive and medical data. Any item that falls in the sensitive class is wiped out before sharing non-linkable medical data. As we mentioned earlier, medical images could compromise patients' identities. Yet, diagnoses, drug allergies and prescriptions may be considered, from our perspective, to be of low or zero sensitivity and could be life-saving information in contingencies.
The linkage key that we seek to match has to be registered with a trusted third party. It is widely accepted that a patient may be authenticated using one or more of the three authentication factors: 1) a factor that the patient knows, e.g., a password, 2) a factor that the patient has, e.g., a token, and 3) a factor that the patient is, e.g., a biometric.

Medical data exchange
Algorithm 2: Query TTP for patient P's custodian
Input: {The visiting patient's Bloom filter}
Output: A list of potential candidates, L, which might own P's EHR /* $\forall\, \text{custodian}_i \in L$ */
for i = 1 to n do
  Query: query custodian i for P;
  Response: L[i] sends the non-PII, yet limiting, part of P's EHR, e.g., $\text{age} \in [50, 60]$;
  Filter: the inquirer overlooks (due to the probabilistic nature of the BF) the seemingly non-conformant responses;
  Ensure utility: the inquirer may negotiate tighter approaches with L[i] without compromising the patient's privacy, e.g., affirming the telephone area code (k-anonymity).
Figure 6 shows a depiction of the proposed non-repudiation protocol in Alice/Bob notation. The aim is to warrant that no participating party denies receipt of the requested and delivered information. In addition, the protocol assumes that the trusted third party (TTP) is honest and will not collude with other parties. However, even if the TTP colludes, the effect of the collusion is dissipated by masking (anonymizing) patients' sensitive data that might identify them directly or indirectly by linking with other datasets. Table 2 shows the symbols used along with their definitions.
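The filtering step of Algorithm 2 can be sketched in Python. This is an illustration under assumptions: each candidate custodian is modelled as returning an age range and a telephone area code, and the class `Response`, its field names and the function `filter_candidates` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Response:
    """Non-PII attributes returned by one candidate custodian (hypothetical)."""
    custodian: str
    age_range: tuple[int, int]
    area_code: str


def filter_candidates(responses: list[Response], known_age: int,
                      known_area_code: str) -> list[str]:
    """Keep the custodians whose non-PII attributes conform to the patient.

    Responses that do not conform are treated as Bloom filter false
    positives and dropped.
    """
    kept = []
    for r in responses:
        low, high = r.age_range
        if low <= known_age <= high and r.area_code == known_area_code:
            kept.append(r.custodian)
    return kept


if __name__ == "__main__":
    responses = [
        Response("HCP-1", (50, 60), "212"),
        Response("HCP-2", (20, 30), "212"),
        Response("HCP-3", (50, 60), "415"),
    ]
    print(filter_candidates(responses, known_age=55, known_area_code="212"))
```

Each additional non-PII attribute narrows the candidate list further, at the cost of revealing slightly more about the patient to the inquirer.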
The first two messages are between the requestor and the TTP. The aim is to figure out the custodian of that patient's EHR. Once Alice, A, figures out the custodian, A forwards the request to Bob, B.
To ask trustee B for a patient's EHR, party A sends a message that is composed of the participating parties' identities along with $\mathrm{Sign}_{K_A^{-1}}(X \| \mathrm{PatientID}_i)$ to vindicate the origin of the request. However, the signed part is encrypted with a secret key, $K_1$, that is shared between A and C. As the trustee cannot approve the request without the aid of a TTP, C, B, once it has agreed to process the request, signs $\{X \| Y\}$ using its private key and then forwards the message to C.
When C receives the message sent by B, C's role is to check whether the request is valid and to prepare a response to B. Firstly, C authenticates the sender using the sender's public key. Secondly, C decrypts $\{\mathrm{Enc}_{K_C}(K_1)\}$ using its private key. Now, C has access to $K_1$, which enables it to proceed to the next step. Thirdly, once C has verified the request passed on by B and acquired $K_1$, C uses $K_1$ to decrypt $[\mathrm{Enc}_{K_1}(\mathrm{Sign}_{K_A^{-1}}[X \| B \| C \| \mathrm{PatientID}_i])]$. Fourthly, C authenticates the validity of the decrypted message using A's public key, computes $H(\mathrm{PatientID}_i)$ and compares it with the one that was retrieved earlier. If the received hash and the computed one match, C commences preparing a response to B to enable it to reply to A's request for the health record of $\mathrm{PatientID}_i$.
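The hash comparison in the fourth step can be sketched as follows. The source does not name the hash function H, so SHA-256 is an assumption here, and the helper names `h` and `hashes_match` and the sample patient identifiers are hypothetical. The signature verification and decryption steps are outside the scope of this sketch.

```python
import hashlib
import hmac


def h(patient_id: str) -> bytes:
    """Hash of the patient identifier; SHA-256 is assumed for H."""
    return hashlib.sha256(patient_id.encode("utf-8")).digest()


def hashes_match(received_hash: bytes, patient_id: str) -> bool:
    """Compare the received hash with the recomputed one in constant time."""
    return hmac.compare_digest(received_hash, h(patient_id))


if __name__ == "__main__":
    received = h("patient-001")
    print(hashes_match(received, "patient-001"))
    print(hashes_match(received, "patient-002"))
```

Using `hmac.compare_digest` instead of `==` avoids leaking, through timing, how many leading bytes of the two digests agree.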
Next, C has to send two messages: the first message acts as a proof of request origin (PoREQO) and is directed to B, and the second message indicates delivery of the request to the recipient and acts as a proof of request delivery (PoREQD).
Then C computes message 5, which contains the POO, and forwards it to B. In message 5, PoREQO is computed as T. Afterwards, C computes message 6, which comprises the POD, and sends it to A. In message 6, the POD is computed as follows:

Figure 6. Non-repudiation sharing of EHRs with an online TTP.
For reliability, A and B may acknowledge receipt of the messages (PoREQO and PoREQD), and C may retransmit if timers expire before receiving acknowledgements.
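The acknowledge-and-retransmit behaviour can be sketched with a small helper. This is an assumption-laden illustration: the callable `send` is a hypothetical stand-in that transmits a proof message and returns True only if the acknowledgement arrives before the timer expires, and the retry limit is not specified by the source.

```python
from typing import Callable


def send_with_retransmit(send: Callable[[], bool], max_attempts: int = 3) -> bool:
    """Call `send` until it reports an acknowledgement or attempts run out.

    `send` is expected to transmit the message, wait for the timer, and
    return True if the acknowledgement was received in time.
    """
    for _ in range(max_attempts):
        if send():
            return True
    return False


if __name__ == "__main__":
    attempts = []

    def flaky_send() -> bool:
        # Simulated channel: the acknowledgement arrives on the third try.
        attempts.append(1)
        return len(attempts) >= 3

    print(send_with_retransmit(flaky_send, max_attempts=5), len(attempts))
```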
Next, as B approves the request, B starts preparing the response by decrypting $\{\mathrm{Enc}_{K_B}(K_1)\}$ using its private key. Once B acquires the key, $K_1$, it decrypts the part $\mathrm{Enc}_{K_1}(T)$ to obtain T. Then B uses C's and A's public keys successively to obtain $\{X \| \mathrm{PatientID}_i\}$.
At that point, B has access to the patient's ID, which will be used to retrieve and send the patient's anonymized record to the requestor. First, B signs $\{A \| B \| C \| \mathrm{EHR}_i\}$ with its private key and encrypts the result with a key, $K_2$, that the requestor has to obtain by consulting C in order to recover $\mathrm{EHR}_i$.
Upon reception of the response from B, A knows it has to forward a request to C to help it learn $K_2$. When C receives the request from A, C authenticates the origin of the message by using A's public key. Then, C uses its private key to decrypt $\{\mathrm{Enc}_{K_C}(K_2)\}$ and obtain $K_2$. Once $K_2$ is acquired, C uses it to decrypt $[\mathrm{Enc}_{K_2}[\mathrm{Sign}_{K_B^{-1}}(X \| \mathrm{EHR}_i)]]$ and signs the decrypted message using its private key.

The previous step results in Sign
Finally, C has to send two more messages, each containing a proof ticket: the first message carries a proof of response origin and the second message holds a proof of response delivery.

Security analysis
Secure protocols may suffer from one or more of the following attacks: dictionary (guessing) attacks, cryptanalysis attacks, frequency attacks, impersonation attacks, denial of service (DoS) attacks, replay attacks, and other possible attacks, such as collusion attacks if the protocols rely on third parties. Table 3 presents a comparison between the proposed protocol and other related secure protocols. Next, we present a detailed analysis of the accountability and non-repudiation of the proposed protocol.

Accountability and non-repudiation analysis
We utilize the well-known accountability framework introduced by Kailar (Kailar, 1996) to prove the accountability of our protocol. The basic principle is to distinguish between having a proof of a statement's validity and having a mere belief in it; we look for the former.
According to Kailar (Kailar, 1996), principals are denoted by uppercase letters and statements are denoted by lowercase letters. Kailar introduces two kinds of proof: strong proof and weak proof. In a strong proof, a principal Y can prove a statement t. In this sense, Y, without disclosing secrets, follows a procedure and eventually convinces X of t. Our goals are to show that:

Also, the above inference indicates that, certainly, MSP has requested the EHR of that patient.
Message 5: Upon reception of message 5, CSP can use TTP's public key and, based on the association between $K_{TTP}$ and $K_{TTP}^{-1}$, i.e., assumption A1, one can interpret message 5 using the signature postulate as: It is apparent that G1 is a clear proof of origin (POO), i.e., non-repudiation of origin, while G2 is a proof of delivery (POD).

Message 6:
This message is used as a confirmation from TTP to MSP that CSP has received its request. Therefore, this message is a proof of request delivery (PoREQD). This message is interpreted as follows: The above statement indicates that, undeniably, CSP has replied to MSP's request. Next, from 8.2 and following the same construct we used in 8.1, we can reach the following statement: TTP CanProve (CSP Says EHR). Moreover, applying assumption A5, we can infer: TTP CanProve (CSP Sent EHR). Also, the above inference indicates that, certainly, CSP has sent the patient's EHR.
Message 9: Upon reception of message 9, MSP can use TTP's public key and, based on the association between $K_{TTP}$ and $K_{TTP}^{-1}$, i.e., assumption A1, one can interpret message 9 using the signature postulate as: It is apparent that G1 is a clear proof of response origin (PoRESO), i.e., non-repudiation of origin, while G2 is a proof of delivery (POD).
When CSP receives message 10 from TTP, CSP can use it as evidence of delivering the requested medical data to MSP so that MSP can provide the emergency medical service to the patient. Message 10 is considered a proof of response delivery (PoRESD), so it stands as a non-repudiation of delivery.

Performance analysis and limitations
There are occasions in which things do not follow the predefined (legitimate) course of action. This is what we call malicious behavior. Malicious behavior could be, but is not limited to, any of the following: one or more parties are unwilling to participate, manipulate their feedback (inputs), or abruptly abort or cease participating. A protocol design should be flexible enough to handle this. One measure is to set a time frame for each response and to expect loss and/or corruption of messages, along with keeping a log of misbehaving parties in order to exclude them from the multi-party computation model.

Input size:
- The client's set has 1 item only.
- The TTP has a population of size w that belongs to n HCPs.

Computational complexity:
- The TTP runs $O(w)$ comparisons, where w is the population size, of its database against the client's item C. The computation is thus linear w.r.t. the input size at the server.

Conclusion
For a long time, officials have had to trade information utility for privacy and vice versa. In other words, adopting strict privacy guidelines means sacrificing information utility. In this work, we deal with sensitive information whose improper handling could compromise it and hence lead to identity theft. However, due to the implications of pandemics, we have to propose models that permit the sharing of patients' medical records among health-care providers (HCPs) without compromising patients' privacy. The proposed framework is built upon the existence of a trusted party that manages and authenticates all correspondence among participating HCPs. The framework adds two security services, i.e., privacy and non-repudiation. We believe that EHRs contain sensitive information and that sharing the right part in a timely fashion, without compromising a patient's privacy, could save lives. We adopt the Bloom filter (BF) as a secure probabilistic data structure. Custodians may publish their BFs at a trusted third party, allowing other HCPs to seek a visiting patient's medical data. Future work may rely on Blockchain and utilize its infrastructure to build a secure health-care chain. Also, a secure framework that can operate without a trusted third party is a potential area of study.