OCT5k: A dataset of multi-disease and multi-graded annotations for retinal layers
Version 2
This dataset was designed to support research in AI-based medical image analysis, particularly focusing on retinal and pulmonary conditions. It includes thousands of expertly labeled OCT and Chest X-Ray images sourced from independent patients and categorized into four classes: CNV, DME, DRUSEN, and NORMAL. The dataset mirrors the imaging data described in the publication 'Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning', and is structured to facilitate reproducibility and benchmarking in deep learning workflows.
Creative Commons Attribution No Derivatives 4.0 International
The thickness and appearance of retinal layers are essential markers for diagnosing and studying eye diseases. Despite the increasing availability of imaging devices that can scan and store large amounts of data, analyzing retinal images and generating trial endpoints have remained manual, error-prone, and time-consuming tasks. In particular, the lack of large amounts of high-quality labels for different diseases hinders the development of automated algorithms. Therefore, we have compiled 5016 pixel-wise manual labels for 1672 optical coherence tomography (OCT) scans featuring two different diseases as well as healthy subjects to help democratize the process of developing novel automatic techniques. We also collected 4698 bounding box annotations for a subset of 566 scans across 9 classes of disease biomarkers. Due to variations in retinal morphology and intensity range, as well as changes in contrast and brightness, designing segmentation and detection methods that generalize to different disease types is challenging. While machine learning-based methods can overcome these challenges, high-quality expert annotations are necessary for training them. Publicly available annotated image datasets typically contain few images and/or cover only a single type of disease, and most are annotated by only a single grader. To address this gap, we present a comprehensive multi-grader and multi-disease dataset for training machine learning-based algorithms. The proposed dataset covers three subsets of scans (Age-related Macular Degeneration, Diabetic Macular Edema, and healthy) and annotations for two types of tasks (semantic segmentation and object detection).
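The counts above work out to exactly three pixel-wise labels per scan (5016 / 1672 = 3), which is consistent with, though not explicitly stated as, each scan being segmented by three graders. One common way to use multi-grader labels is to fuse them into a single training target and to measure how much graders agree. The sketch below shows both on a toy example, assuming each grader's annotation is an integer label map with one layer id per pixel; the `LabelMap` type, the label ids, and the helper names `majority_vote` and `dice` are hypothetical and do not describe the dataset's actual file format. Majority voting is a swapped-in illustrative technique, not necessarily how the dataset authors combined their labels.

```python
from collections import Counter
from typing import List

# Hypothetical in-memory format: rows x cols grid of integer layer ids.
LabelMap = List[List[int]]


def majority_vote(masks: List[LabelMap]) -> LabelMap:
    """Fuse several graders' label maps pixel by pixel.

    Ties are broken in favour of the smallest label id (an arbitrary choice).
    """
    rows, cols = len(masks[0]), len(masks[0][0])
    fused: LabelMap = []
    for r in range(rows):
        row = []
        for c in range(cols):
            counts = Counter(m[r][c] for m in masks)
            best = max(counts.values())
            row.append(min(label for label, n in counts.items() if n == best))
        fused.append(row)
    return fused


def dice(a: LabelMap, b: LabelMap, label: int) -> float:
    """Dice overlap 2|A∩B| / (|A| + |B|) for one label; 1.0 if absent in both."""
    in_a = in_b = both = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            in_a += x == label
            in_b += y == label
            both += (x == label) and (y == label)
    return 1.0 if in_a + in_b == 0 else 2 * both / (in_a + in_b)


# Toy 2 x 3 label maps from three hypothetical graders.
g1 = [[0, 1, 1], [2, 2, 0]]
g2 = [[0, 1, 2], [2, 2, 0]]
g3 = [[0, 1, 1], [2, 1, 0]]

print(majority_vote([g1, g2, g3]))   # fused training target
print(dice(g1, g2, 1), dice(g1, g2, 2))  # per-label agreement between two graders
```

Per-label Dice between grader pairs gives a rough inter-grader agreement baseline; a model whose Dice against the fused target approaches that baseline is performing roughly at the level of human variability.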