A question of trust for AI research in medicine

Winning patient trust is essential for advancing machine learning applications in medicine, which rely on access to large and diverse health datasets.
Data sharing challenges: While openly available medical datasets have been highly beneficial for research, in many cases health data cannot be shared due to privacy concerns or participant preferences:
- Federated learning schemes allow models to be trained on local datasets without sharing the data directly (a minimal sketch follows this list), but additional privacy-preserving tools are still needed to prevent data reconstruction from model updates.
- Synthetic data generated by AI algorithms can help protect patient privacy by replacing or augmenting real datasets, but potential bias, overfitting, and generalization issues need to be carefully examined.
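To make the federated setup concrete, the following is a minimal, hypothetical sketch (not any specific system discussed here) in which each site trains on its own records and shares only model weights, which a coordinator averages; the linear model and toy data are purely illustrative:

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    # One site's local training of a simple linear regression model;
    # the raw records (X, y) never leave the site.
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # mean-squared-error gradient
        w -= lr * grad
    return w

def federated_round(global_weights, site_datasets):
    # Each site trains locally; only the resulting weights are shared and averaged.
    local_weights = [local_update(global_weights, X, y) for X, y in site_datasets]
    return np.mean(local_weights, axis=0)

# Toy usage: three hospitals, each holding its own dataset.
rng = np.random.default_rng(0)
sites = [(rng.normal(size=(50, 4)), rng.normal(size=50)) for _ in range(3)]
weights = np.zeros(4)
for _ in range(10):
    weights = federated_round(weights, sites)
```

As the list above notes, the shared weight updates themselves can still leak information, which is why such schemes are typically combined with additional protections such as differential privacy.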
Balancing privacy and accuracy: Empirical evidence from Ziller et al. demonstrates that differential privacy techniques can effectively protect patient data in medical imaging applications while maintaining model prediction accuracy:
- Differential privacy involves adding controlled noise during training so that no single patient's data can substantially influence the model's outputs (see the sketch after this list).
- The study advocates for consistent use of differential privacy, offering a path to standardize and encourage sharing of trained models.
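As a rough illustration of that noise-addition idea, here is a simplified DP-SGD-style sketch, not Ziller et al.'s implementation; the clip_norm and noise_multiplier values are illustrative assumptions:

```python
import numpy as np

def dp_mean_gradient(per_patient_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    # Clip each patient's gradient so no single record dominates the update,
    # then add calibrated Gaussian noise to the sum before averaging.
    rng = rng or np.random.default_rng(0)
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_patient_grads]
    noisy_sum = np.sum(clipped, axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm, size=clipped[0].shape)
    return noisy_sum / len(per_patient_grads)

# Toy usage: per-patient gradients from 100 patients for a 4-parameter model.
grads = np.random.default_rng(1).normal(size=(100, 4))
print(dp_mean_gradient(list(grads)))
```

The clipping bounds each individual's contribution and the noise masks what remains, which is how the privacy guarantee trades off against model accuracy.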
Broader implications: Rigorous real-world testing of privacy-preserving tools in medical machine learning, as shown by Ziller et al., should be strongly encouraged:
- Unambiguous evidence that data sharing methods can withstand privacy attacks and produce secure, effective models will motivate patients to consent to their health data being used in research.
- Maintaining patient willingness to share data is critical for the success of machine learning in medicine, which hinges on the availability of large, diverse, real-world training datasets.