'Caveat emptor': the cautionary tale of endocarditis and the potential pitfalls of clinical coding data-an electronic health records study.
Fawcett N., Young B., Peto L., Quan TP., Gillott R., Wu J., Middlemass C., Weston S., Crook DW., Peto TEA., Muller-Pebody B., Johnson AP., Walker AS., Sandoe JAT.
BACKGROUND: Diagnostic codes from electronic health records are widely used to assess patterns of disease. Infective endocarditis is an uncommon but serious infection, with objective diagnostic criteria. Electronic health records have been used to explore the impact of changing guidance on antibiotic prophylaxis for dental procedures on incidence, but limited data on the accuracy of the diagnostic codes exists. Endocarditis was used as a clinically relevant case study to investigate the relationship between clinical cases and diagnostic codes, to understand discrepancies and to improve design of future studies. METHODS: Electronic health record data from two UK tertiary care centres were linked with data from a prospectively collected clinical endocarditis service database (Leeds Teaching Hospital) or retrospective clinical audit and microbiology laboratory blood culture results (Oxford University Hospitals Trust). The relationship between diagnostic codes for endocarditis and confirmed clinical cases according to the objective Duke criteria was assessed, and impact on estimations of disease incidence and trends. RESULTS: In Leeds 2006-2016, 738/1681(44%) admissions containing any endocarditis code represented a definite/possible case, whilst 263/1001(24%) definite/possible endocarditis cases had no endocarditis code assigned. In Oxford 2010-2016, 307/552(56%) reviewed endocarditis-coded admissions represented a clinical case. Diagnostic codes used by most endocarditis studies had good positive predictive value (PPV) but low sensitivity (e.g. I33-primary 82% and 43% respectively); one (I38-secondary) had PPV under 6%. Estimating endocarditis incidence using raw admission data overestimated incidence trends twofold. Removing records with non-specific codes, very short stays and readmissions improved predictive ability. Estimating incidence of streptococcal endocarditis using secondary codes also overestimated increases in incidence over time. Reasons for discrepancies included changes in coding behaviour over time, and coding guidance allowing assignment of a code mentioning 'endocarditis' where endocarditis was never mentioned in the clinical notes. CONCLUSIONS: Commonly used diagnostic codes in studies of endocarditis had good predictive ability. Other apparently plausible codes were poorly predictive. Use of diagnostic codes without examining sensitivity and predictive ability can give inaccurate estimations of incidence and trends. Similar considerations may apply to other diseases. Health record studies require validation of diagnostic codes and careful data curation to minimise risk of serious errors.