Finding hidden features responsible for machine learning failures
There have been several cases where machine learning classifiers, trained to diagnose a particular disease (for example, lung cancer from chest x-rays), overfit to hidden features in the data. Examples include gridlines, surgical markers, evidence of treatment, or text present in the images (see the references for examples). As a result, the classifier fails on other types of images.
Although these “hidden features” are often visible to a human viewer, their presence is not documented in the dataset. Several projects on this topic are possible:
- How can we detect hidden features with (semi-)automatic methods?
- How does removing images that contain such features (from the training or test data) change the evaluation of existing methods?
- Can we achieve similar results with smaller, but better curated datasets?
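As a starting point for the second question, the sketch below compares a metric on the full test set with the same metric on the subset of images with and without a hidden feature. It is only an illustration under stated assumptions: the function names, the `has_artifact` flags (which would have to come from manual annotation or a detector), and the toy labels and predictions are all hypothetical and not taken from the referenced papers. Sensitivity is used here as one possible metric; any other metric could be substituted.

```python
from typing import Dict, List, Optional


def sensitivity(labels: List[int], preds: List[int]) -> Optional[float]:
    """Fraction of diseased cases (label 1) predicted as diseased.

    Returns None when the subset contains no diseased cases.
    """
    positive_preds = [p for y, p in zip(labels, preds) if y == 1]
    if not positive_preds:
        return None
    return sum(positive_preds) / len(positive_preds)


def sensitivity_by_artifact(
    labels: List[int], preds: List[int], has_artifact: List[bool]
) -> Dict[str, Optional[float]]:
    """Sensitivity on all images and on the subsets with / without the artifact."""
    with_art = [(y, p) for y, p, a in zip(labels, preds, has_artifact) if a]
    without_art = [(y, p) for y, p, a in zip(labels, preds, has_artifact) if not a]
    return {
        "all": sensitivity(labels, preds),
        "with_artifact": sensitivity(
            [y for y, _ in with_art], [p for _, p in with_art]
        ),
        "without_artifact": sensitivity(
            [y for y, _ in without_art], [p for _, p in without_art]
        ),
    }


# Hypothetical toy test set: 1 = disease, 0 = no disease.
labels = [1, 1, 1, 1, 1, 0, 0, 0]
# Hypothetical annotation: does the image contain, e.g., a surgical marker?
has_artifact = [True, True, True, False, False, False, False, True]
# Hypothetical predictions from a classifier that partly relies on the artifact.
preds = [1, 1, 1, 1, 0, 0, 0, 1]

result = sensitivity_by_artifact(labels, preds, has_artifact)
for subset, value in result.items():
    print(subset, value)
```

On this toy data the sensitivity on the full test set looks acceptable, while the subset without the artifact scores much lower. This is the kind of gap that an aggregate evaluation hides when the artifact is not documented.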
Oakden-Rayner, L., Dunnmon, J., Carneiro, G., & Ré, C. (2020). Hidden stratification causes clinically meaningful failures in machine learning for medical imaging. In Proceedings of the ACM Conference on Health, Inference, and Learning (pp. 151-159).
Varoquaux, G., & Cheplygina, V. (2021). How I failed machine learning in medical imaging–shortcomings and recommendations. arXiv preprint arXiv:2103.10292.
Winkler, J. K., Fink, C., Toberer, F., Enk, A., Deinlein, T., Hofmann-Wellenhof, R., … & Haenssle, H. A. (2019). Association between surgical skin markings in dermoscopic images and diagnostic performance of a deep learning convolutional neural network for melanoma recognition. JAMA Dermatology, 155(10), 1135-1141.