Labeled Retinal Optical Coherence Tomography Dataset for Classification of Normal, Drusen, and CNV Cases
Version 1
This dataset was designed to support research in AI-based medical image analysis, particularly focusing on retinal and pulmonary conditions. It includes thousands of expertly labeled OCT and Chest X-Ray images sourced from independent patients and categorized into four classes: CNV, DME, DRUSEN, and NORMAL. The dataset mirrors the imaging data described in the publication 'Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning', and is structured to facilitate reproducibility and benchmarking in deep learning workflows.
Creative Commons Attribution 4.0 International
This dataset consists of more than 16,000 retinal OCT B-scans from 441 cases (Normal: 120, Drusen: 160, CNV: 161), acquired at Noor Eye Hospital, Tehran, Iran. All images were labeled by a retinal specialist.
The folder structure is as follows:
- CNV, DRUSEN, NORMAL folders
- Within each class folder, patients are separated into subfolders numbered from 1 upward.
- Within each patient folder, images (B-scans) are named in the <0XX_LABEL> format, where XX is the B-scan number and LABEL is the specialist's selected label for that specific B-scan.
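The folder layout above can be unpacked programmatically. The following is a minimal sketch, not part of the dataset's own code: the function name `parse_bscan_path`, the `BScanInfo` type, and the `.jpg` extension in the usage note are assumptions made for illustration. It assumes paths of the form CLASS/PATIENT/0XX_LABEL with a numeric patient folder.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class BScanInfo(NamedTuple):
    """Hypothetical container for the fields encoded in a B-scan path."""
    volume_class: str   # class folder the patient belongs to (CNV, DRUSEN, NORMAL)
    patient_id: int     # numeric patient folder name
    bscan_number: int   # B-scan number taken from the file name
    label: str          # specialist's label for this specific B-scan


def parse_bscan_path(path: str) -> BScanInfo:
    """Split a CLASS/PATIENT/0XX_LABEL path into its components."""
    parts = PurePosixPath(path).parts
    if len(parts) < 3:
        raise ValueError(f"expected CLASS/PATIENT/FILE, got {path!r}")
    volume_class, patient, filename = parts[-3:]
    # Drop the file extension (if any), then split "0XX_LABEL" at the first underscore.
    stem = PurePosixPath(filename).stem
    number, sep, label = stem.partition("_")
    if not sep or not number.isdigit():
        raise ValueError(f"file name does not match 0XX_LABEL: {filename!r}")
    return BScanInfo(volume_class, int(patient), int(number), label)
```

For example, under these assumptions `parse_bscan_path("CNV/12/034_DRUSEN.jpg")` would describe B-scan 34 of CNV patient 12, which the specialist labeled as drusen.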
The spreadsheet (data_information.csv) lists the "Patient ID", "Class", "Eye", "B-scan", "Label", and "Directory" of every image (16823 rows, 6 columns).
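As a quick sanity check on the spreadsheet, the number of B-scans per label can be tallied with the standard library alone. This is a minimal sketch: the helper name `count_labels` is hypothetical, and the only thing assumed about the file is a header row containing the "Label" column listed above.

```python
import csv
from collections import Counter
from typing import Iterable


def count_labels(csv_lines: Iterable[str]) -> Counter:
    """Count B-scans per value of the "Label" column.

    `csv_lines` can be an open file or any iterable of CSV lines
    whose first line is the header row.
    """
    reader = csv.DictReader(csv_lines)
    return Counter(row["Label"] for row in reader)


# Usage (assuming data_information.csv is in the working directory):
#     with open("data_information.csv", newline="") as f:
#         print(count_labels(f))
```

The function counts whatever values appear in the "Label" column, so it works the same whether labels are stored as class names or as integers.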
The Python script (read_data.py) loads the images and labels as NumPy arrays. Its function returns the images as an array with shape (number_of_images, imageSize, imageSize, 3) and the labels as a list of integers (Normal: 0, Drusen: 1, CNV: 2). There are two options for reading the files:
- Option 1: Reading all images. This would result in 16822 images.
- Option 2: Reading only the worst-case condition images for each volume (i.e., if a patient was diagnosed as a CNV case, only the CNV-appearing B-scans are included, and the normal and drusen B-scans of that patient are excluded from the dataset). This would result in 12649 images.
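The worst-case selection in Option 2 amounts to keeping only the B-scans whose own label matches the class of the patient they belong to. The sketch below illustrates that rule and the integer label encoding; it is not the code in read_data.py. The names `select_worst_case`, `encode_labels`, and `LABEL_TO_INDEX` are hypothetical, and the records are assumed to be dictionaries whose "Class" and "Label" values are class names stored as strings.

```python
from typing import Dict, List

# Integer encoding stated in the dataset description.
LABEL_TO_INDEX = {"NORMAL": 0, "DRUSEN": 1, "CNV": 2}


def select_worst_case(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only B-scans whose label equals the patient's class."""
    return [r for r in records if r["Label"].upper() == r["Class"].upper()]


def encode_labels(records: List[Dict[str, str]]) -> List[int]:
    """Map each record's B-scan label to its integer index."""
    return [LABEL_TO_INDEX[r["Label"].upper()] for r in records]


# Illustrative records only; these are not rows from the real spreadsheet.
example = [
    {"Class": "CNV", "Label": "CNV"},
    {"Class": "CNV", "Label": "NORMAL"},   # dropped: patient is a CNV case
    {"Class": "DRUSEN", "Label": "DRUSEN"},
    {"Class": "NORMAL", "Label": "NORMAL"},
]
kept = select_worst_case(example)
print(len(kept), encode_labels(kept))
```

Applying `encode_labels` to the unfiltered records corresponds to Option 1, where every B-scan keeps its own label.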