Finding hidden features responsible for machine learning failures

Supervisors: Veronika Cheplygina
Semester: Fall 2023
Tags: machine learning, data science, medical imaging

There have been several cases where machine learning classifiers, trained to diagnose a particular disease (for example, lung cancer from chest x-rays), overfit to hidden features within the data. Examples include gridlines, surgical markers, evidence of treatment, or text present in the images (see the references below). As a result, the classifier fails on other types of images that lack these features. Although these “hidden features” are often visible, their presence is not documented in the dataset.

So far, we have run several successful projects on detecting such hidden features in chest x-rays. For the fall 2023 semester, we are looking to extend this set of projects with:

  • Other types of imaging data
  • Unsupervised methods to detect subgroups in the images
  • Explainability techniques to understand the shortcuts better
  • Integrating tabular data (for example, patient demographics)
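
As a rough illustration of the unsupervised direction listed above, the sketch below builds synthetic "images" in which about half carry a bright corner patch (a stand-in for an undocumented surgical marker), and then tries to recover that subgroup without labels. Everything here is an assumption made for illustration: the data is random noise rather than real scans, the names make_images and split_by_first_component are hypothetical, and splitting on the sign of the first principal component is only one simple choice, not the method used in the earlier projects.

```python
import numpy as np


def make_images(n=200, size=8, seed=0):
    """Synthetic noise images; roughly half get a bright 2x2 corner patch."""
    rng = np.random.default_rng(seed)
    images = rng.normal(0.0, 1.0, size=(n, size, size))
    has_marker = rng.random(n) < 0.5  # hidden attribute, not a disease label
    images[has_marker, :2, :2] += 4.0  # the undocumented "marker"
    return images, has_marker


def split_by_first_component(images):
    """Split images into two groups by the sign of their first PCA score."""
    flat = images.reshape(len(images), -1)
    centered = flat - flat.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[0]
    return scores > 0


images, has_marker = make_images()
groups = split_by_first_component(images)

# Group labels are arbitrary, so compare up to a swap of the two groups.
agreement = max(np.mean(groups == has_marker), np.mean(groups != has_marker))
print(f"agreement with hidden marker: {agreement:.2f}")
```

On real images the hidden feature is rarely this dominant, so a project would more likely cluster features from a pretrained network and inspect the resulting groups by eye.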


Jiménez-Sánchez, A., Juodelyte, D., Chamberlain, B., & Cheplygina, V. (2022). Detecting Shortcuts in Medical Images - A Case Study in Chest X-rays. arXiv preprint arXiv:2211.04279.

Varoquaux, G., & Cheplygina, V. (2022). Machine learning for medical imaging: methodological failures and recommendations for the future. npj Digital Medicine, 5(1), 48.

Oakden-Rayner, L., Dunnmon, J., Carneiro, G., & Ré, C. (2020). Hidden stratification causes clinically meaningful failures in machine learning for medical imaging. In Proceedings of the ACM Conference on Health, Inference, and Learning (pp. 151-159).

Winkler, J. K., Fink, C., Toberer, F., Enk, A., Deinlein, T., Hofmann-Wellenhof, R., … & Haenssle, H. A. (2019). Association between surgical skin markings in dermoscopic images and diagnostic performance of a deep learning convolutional neural network for melanoma recognition. JAMA Dermatology, 155(10), 1135-1141.