TY - GEN
T1 - MedLink
T2 - 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023
AU - Wu, Zhenbang
AU - Xiao, Cao
AU - Sun, Jimeng
N1 - Publisher Copyright:
© 2023 ACM.
PY - 2023/8/6
Y1 - 2023/8/6
N2 - A comprehensive patient health history is essential for patient care and healthcare research. However, due to the distributed nature of healthcare services, patient health records are often scattered across multiple systems. Existing record linkage approaches primarily rely on patient identifiers, which have inherent limitations such as privacy invasion and identifier discrepancies. To tackle this problem, we propose linking de-identified patient health records by matching health patterns without strictly relying on sensitive patient identifiers. Our model MedLink solves two challenges faced with the patient linkage task: (1) the challenge of identifying the same patients based on data collected in different timelines as disease progression makes the record matching difficult, and (2) the challenge of identifying distinct health patterns as common medical codes dominate health records and overshadow the more informative low-prevalence codes. To address these challenges, MedLink utilizes bi-directional health prediction to predict future codes forwardly and past codes backwardly, thus accounting for the health progression. MedLink also has a prevalence-aware retrieval design to focus more on the low-prevalence but informative codes during learning. MedLink can be trained end-to-end and is lightweight for efficient inference on large patient databases. We evaluate MedLink against leading baselines on real-world patient datasets, including the critical care dataset MIMIC-III and a large health claims dataset. Results show that MedLink outperforms the best baseline by 4% in top-1 accuracy with only 8% memory cost. Additionally, when combined with existing identifier-based linkage approaches, MedLink can improve their performance by up to 15%.
AB - A comprehensive patient health history is essential for patient care and healthcare research. However, due to the distributed nature of healthcare services, patient health records are often scattered across multiple systems. Existing record linkage approaches primarily rely on patient identifiers, which have inherent limitations such as privacy invasion and identifier discrepancies. To tackle this problem, we propose linking de-identified patient health records by matching health patterns without strictly relying on sensitive patient identifiers. Our model MedLink solves two challenges faced with the patient linkage task: (1) the challenge of identifying the same patients based on data collected in different timelines as disease progression makes the record matching difficult, and (2) the challenge of identifying distinct health patterns as common medical codes dominate health records and overshadow the more informative low-prevalence codes. To address these challenges, MedLink utilizes bi-directional health prediction to predict future codes forwardly and past codes backwardly, thus accounting for the health progression. MedLink also has a prevalence-aware retrieval design to focus more on the low-prevalence but informative codes during learning. MedLink can be trained end-to-end and is lightweight for efficient inference on large patient databases. We evaluate MedLink against leading baselines on real-world patient datasets, including the critical care dataset MIMIC-III and a large health claims dataset. Results show that MedLink outperforms the best baseline by 4% in top-1 accuracy with only 8% memory cost. Additionally, when combined with existing identifier-based linkage approaches, MedLink can improve their performance by up to 15%.
KW - electronic health record
KW - entity resolution
KW - patient deduplication
KW - patient identification
KW - patient linkage
KW - record linkage
UR - http://www.scopus.com/inward/record.url?scp=85171370177&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85171370177&partnerID=8YFLogxK
U2 - 10.1145/3580305.3599427
DO - 10.1145/3580305.3599427
M3 - Conference contribution
AN - SCOPUS:85171370177
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 2672
EP - 2682
BT - KDD 2023 - Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
Y2 - 6 August 2023 through 10 August 2023
ER -