Finding hidden features responsible for machine learning failures

Supervisors: Veronika Cheplygina
Semester: Spring 2023
Tags: machine learning, data science, medical imaging

There have been several cases where machine learning classifiers trained to diagnose a particular disease (for example, lung cancer from chest x-rays) overfit on hidden features in the data. Examples include gridlines, surgical markers, evidence of treatment, or text present in the images (see references). This causes the classifier to fail on other types of images. Although these “hidden features” are often visible, their presence is not documented in the dataset.

For the coming semester, projects are available on using unsupervised methods to detect clusters of images with related features (for example, surgical markers). More specifically, the steps are likely to involve:

  • Image representation (with pretrained networks, or classical methods)
  • Unsupervised clustering / subgroup detection
  • Evaluation of the results (by labelling, visual inspection, etc.)
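
As a rough illustration of how these three steps fit together, here is a minimal, dependency-free Python sketch. Everything in it is an assumption made for illustration and not part of the project description: the synthetic 16x16 "images", the bright corner patch standing in for a surgical marker, the two hand-crafted features, the small k-means routine, and all function names (make_image, represent, kmeans, purity). In a real project the representation would more likely come from a pretrained network, and the clustering from a library such as scikit-learn.

```python
import random

def make_image(rng, has_marker, size=16):
    """Synthetic grayscale image in [0, 1]; optionally with a bright corner patch."""
    img = [[rng.uniform(0.2, 0.6) for _ in range(size)] for _ in range(size)]
    if has_marker:
        # Hypothetical "hidden feature": a 4x4 saturated patch in the corner.
        for r in range(4):
            for c in range(4):
                img[r][c] = 1.0
    return img

def represent(img):
    """Step 1: classical representation (mean intensity, fraction of very bright pixels)."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    bright = sum(p > 0.9 for p in pixels) / len(pixels)
    return (mean, bright)

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def kmeans(points, k, n_iter=20):
    """Step 2: plain k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(n_iter):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(d) / len(members) for d in zip(*members))
    return assign(points, centroids)

def purity(labels, truth):
    """Step 3: fraction of images that share their cluster's majority annotation."""
    total = 0
    for j in set(labels):
        members = [t for l, t in zip(labels, truth) if l == j]
        total += max(members.count(v) for v in set(members))
    return total / len(labels)

rng = random.Random(0)
# Undocumented attribute: every second image carries the marker.
has_marker = [i % 2 == 0 for i in range(40)]
images = [make_image(rng, m) for m in has_marker]
features = [represent(img) for img in images]
labels = kmeans(features, k=2)
print("cluster purity w.r.t. hidden marker:", purity(labels, has_marker))
```

In this toy setting the marker annotation is known by construction, so purity can be computed directly. With real data, the annotation would come from manually labelling or visually inspecting a sample of images from each cluster.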


Oakden-Rayner, L., Dunnmon, J., Carneiro, G., & Ré, C. (2020). Hidden stratification causes clinically meaningful failures in machine learning for medical imaging. In Proceedings of the ACM Conference on Health, Inference, and Learning (pp. 151-159).

Varoquaux, G., & Cheplygina, V. (2021). How I failed machine learning in medical imaging–shortcomings and recommendations. arXiv preprint arXiv:2103.10292.

Winkler, J. K., Fink, C., Toberer, F., Enk, A., Deinlein, T., Hofmann-Wellenhof, R., … & Haenssle, H. A. (2019). Association between surgical skin markings in dermoscopic images and diagnostic performance of a deep learning convolutional neural network for melanoma recognition. JAMA Dermatology, 155(10), 1135-1141.