Semi-automatic de-identification of hospital discharge summaries with natural language processing. A case-study of performance and real-world us...
Ioan Calapodescu, Svetlana Artemova, Jean-Luc Bosson
DARLI: 1st International workshop on Data Analytics solutions for Real-LIfe APplications, Exeter, United Kingdom, 23 June 2017
Patient medical records are a rich and important source of information for clinical research. However, these data cannot be used directly for research purposes, because the documents contain highly sensitive personal information protected by law. In this paper, we evaluate the qualitative and quantitative impact of a semi-automated system (combining NLP, machine learning (ML) models and a dedicated user interface) when it is used by human annotators to de-identify French hospital discharge summaries.
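The semi-automatic workflow described above can be illustrated with a small sketch: an automatic step proposes candidate identifier spans, a human annotator accepts or rejects each proposal, and the validated spans are replaced with placeholders. This is not the authors' system. A few hypothetical regular expressions stand in for the NLP processing and ML models, and the names `Span`, `PATTERNS`, `pre_annotate`, `review` and `mask` are invented for illustration. The sketch assumes proposed spans do not overlap, and it does not model the annotator adding identifiers the automatic step missed.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Span:
    start: int
    end: int
    label: str


# Hypothetical, deliberately tiny rule set for French clinical text;
# it stands in for the NLP/ML pre-annotation models of a real system.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "PHONE": re.compile(r"\b0\d(?:[ .]?\d{2}){4}\b"),
    "NAME": re.compile(r"\b(?:M\.|Mme|Dr)\s+[A-Z][\w-]+"),
}


def pre_annotate(text: str) -> List[Span]:
    """Automatic step: propose candidate identifier spans, sorted by position."""
    spans = [
        Span(m.start(), m.end(), label)
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(spans, key=lambda s: s.start)


def review(spans: List[Span], accept: Callable[[Span], bool]) -> List[Span]:
    """Manual step: the annotator keeps or rejects each proposed span."""
    return [s for s in spans if accept(s)]


def mask(text: str, spans: List[Span]) -> str:
    """Replace each validated span with a placeholder such as <DATE>."""
    # Work right to left so the offsets of earlier spans stay valid.
    for s in sorted(spans, key=lambda s: s.start, reverse=True):
        text = text[: s.start] + f"<{s.label}>" + text[s.end :]
    return text


note = "Patient vu par Dr Martin le 12/03/2016. Tel : 04 76 12 34 56."
print(mask(note, review(pre_annotate(note), accept=lambda s: True)))
# prints: Patient vu par <NAME> le <DATE>. Tel : <PHONE>.
```

Keeping the annotator's decision as a separate `accept` callback mirrors the division of labour in a semi-automatic setting: the machine only proposes, and a human decides what is actually removed.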