Labeled Retinal Optical Coherence Tomography Dataset for Classification of Normal, Drusen, and CNV Cases

Version 1
Description

This dataset was designed to support research in AI-based medical image analysis, particularly focusing on retinal and pulmonary conditions. It includes thousands of expertly labeled OCT and Chest X-Ray images sourced from independent patients and categorized into four classes: CNV, DME, DRUSEN, and NORMAL. The dataset mirrors the imaging data described in the publication 'Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning', and is structured to facilitate reproducibility and benchmarking in deep learning workflows.

Keywords
OCT Imaging, Chest X-Ray, AI in Medical Imaging, Retinal Disease Classification, Pneumonia Detection
Conditions
Choroidal Neovascularization, Diabetic Macular Edema, Drusen, Pneumonia
License

Creative Commons Attribution 4.0 International

This dataset consists of more than 16,000 retinal OCT B-scans from 441 cases (Normal: 120, Drusen: 160, CNV: 161), acquired at Noor Eye Hospital, Tehran, Iran. The images were labeled by a retinal specialist.

The folder structure is as follows:

  • Top-level folders: CNV, DRUSEN, and NORMAL, one per class.
  • Within each class folder, the images are split into patient folders numbered consecutively starting from 1.
  • Within each patient folder, the images (B-scans) are named in the format <0XX_LABEL>, where 0XX is the B-scan number and LABEL is the label the specialist selected for that specific B-scan.
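The naming scheme above can be indexed with a short script. The following is a minimal Python sketch, not part of the dataset: it assumes each file name is `<0XX>_<LABEL>` plus an image extension (for example `012_CNV.jpg`, a hypothetical name) and that the class folders sit directly under one root directory. `parse_bscan_name` and `index_dataset` are illustrative names.

```python
import re
from pathlib import Path

# ASSUMPTION: file stems look like "012_CNV" (zero-padded B-scan number, "_", label);
# the image extension (.jpg, .png, ...) is ignored.
BSCAN_NAME = re.compile(r"^(?P<num>\d+)_(?P<label>[A-Za-z]+)$")


def parse_bscan_name(filename: str) -> tuple[int, str]:
    """Split a B-scan file name into (B-scan number, upper-cased label)."""
    stem = Path(filename).stem  # drop the extension, if any
    match = BSCAN_NAME.match(stem)
    if match is None:
        raise ValueError(f"unexpected B-scan file name: {filename!r}")
    return int(match.group("num")), match.group("label").upper()


def index_dataset(root: str) -> list[dict]:
    """Walk root/<CLASS>/<patient>/<0XX_LABEL.ext> and return one record per B-scan.

    Hypothetical helper: folders are visited in lexicographic name order, and
    files whose names do not match the assumed pattern are skipped.
    """
    records = []
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for patient_dir in sorted(p for p in class_dir.iterdir() if p.is_dir()):
            for image_path in sorted(patient_dir.iterdir()):
                if not image_path.is_file():
                    continue
                try:
                    number, label = parse_bscan_name(image_path.name)
                except ValueError:
                    continue  # e.g. system files or thumbnails
                records.append({
                    "class": class_dir.name,
                    "patient": patient_dir.name,
                    "bscan": number,
                    "label": label,
                    "path": str(image_path),
                })
    return records
```

A call such as `index_dataset("path/to/dataset")` (hypothetical path) would return a flat list in which the folder-level class and the per-B-scan label can differ, which is exactly the distinction Option 2 below relies on.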

The CSV spreadsheet data_information.csv lists the "Patient ID", "Class", "Eye", "B-scan", "Label", and "Directory" of every image (16,823 rows, 6 columns).
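The spreadsheet can be cross-checked against the folder counts with Python's standard `csv` module. This is a sketch under stated assumptions: the first line is taken to be a header holding the six column names above (if so, `csv.DictReader` returns one record per remaining row, presumably 16,822, which would match Option 1 below, though that is an inference), and patient numbers are assumed to restart in each class folder, so patients are counted as (Class, Patient ID) pairs. The function names are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable

# Column names as given in the dataset description.
COLUMNS = ["Patient ID", "Class", "Eye", "B-scan", "Label", "Directory"]


def load_metadata(lines: Iterable[str]) -> list[dict]:
    """Parse data_information.csv (an open file or any iterable of lines) into dicts.

    ASSUMPTION: the first line is a header containing the column names above.
    """
    reader = csv.DictReader(lines)
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def bscans_per_class(rows: Iterable[dict]) -> dict[str, int]:
    """Number of B-scans in each case-level class (CNV, DRUSEN, NORMAL)."""
    return dict(Counter(row["Class"] for row in rows))


def patients_per_class(rows: Iterable[dict]) -> dict[str, int]:
    """Number of distinct patients per class.

    ASSUMPTION: patient numbering restarts in each class folder, so a patient
    is identified by the (Class, Patient ID) pair rather than the ID alone.
    """
    patients = {(row["Class"], row["Patient ID"]) for row in rows}
    return dict(Counter(cls for cls, _ in patients))
```

Typical use would be `with open("data_information.csv", newline="") as f: rows = load_metadata(f)`, followed by `bscans_per_class(rows)`.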

The Python script read_data.py loads the images and labels as NumPy arrays. Its function returns the input data as an array of shape (number_of_images, imageSize, imageSize, 3) and the output data as a list of labels (Normal: 0, Drusen: 1, CNV: 2). The files can be read in two ways:

  • Option 1: Read all images. This yields 16,822 images.
  • Option 2: Read only the worst-case condition images for each volume. For example, if a patient was classified as a CNV case, only the B-scans showing CNV are included for training, and that patient's normal and drusen B-scans are excluded from the dataset. This yields 12,649 images.
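The array layout described for read_data.py can be illustrated with NumPy alone. The sketch below is not the dataset's read_data.py: it takes images already loaded as arrays (for example `np.asarray(Image.open(path).convert("L"))` with Pillow), resizes them with a simple nearest-neighbour scheme as a stand-in for whatever resizing the original script uses, replicates grayscale B-scans into three channels, and encodes labels as Normal: 0, Drusen: 1, CNV: 2. The function names and the default `image_size` of 128 are hypothetical.

```python
import numpy as np

# Label encoding from the dataset description.
LABEL_TO_INT = {"NORMAL": 0, "DRUSEN": 1, "CNV": 2}


def resize_nearest(image: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array to size x size.

    ASSUMPTION: stand-in for the resizing used by read_data.py.
    """
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return image[rows[:, None], cols]


def to_three_channels(image: np.ndarray) -> np.ndarray:
    """Replicate a grayscale (H, W) image into (H, W, 3); pass RGB through."""
    if image.ndim == 2:
        return np.stack([image] * 3, axis=-1)
    if image.ndim == 3 and image.shape[-1] == 3:
        return image
    raise ValueError(f"unsupported image shape {image.shape}")


def build_arrays(samples, image_size: int = 128):
    """Turn (image, label name) pairs into X of shape (n, size, size, 3) and a label list."""
    images, labels = [], []
    for image, label in samples:
        resized = resize_nearest(np.asarray(image), image_size)
        images.append(to_three_channels(resized))
        labels.append(LABEL_TO_INT[label.upper()])
    if not images:
        return np.empty((0, image_size, image_size, 3)), labels
    return np.stack(images), labels
```

Returning the labels as a plain list mirrors the description of read_data.py; converting them with `np.asarray(labels)` is a one-line change if a framework expects an array.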
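Option 2 can be approximated from the spreadsheet rows: keep only the B-scans whose per-scan "Label" matches the case-level "Class", so a CNV case keeps its CNV B-scans and drops its normal and drusen ones. This is a sketch, not the authors' filtering code, and it rests on two assumptions: that a case's "Class" is its worst-case condition, and that both columns spell the three categories the same way (case and surrounding spaces are normalized here). It is not guaranteed to reproduce the stated 12,649 count.

```python
from collections import Counter
from typing import Iterable


def _norm(value: str) -> str:
    """Normalize a class/label string for comparison (case and surrounding spaces)."""
    return value.strip().upper()


def worst_case_only(rows: Iterable[dict]) -> list[dict]:
    """Keep only B-scans whose 'Label' equals the case's 'Class' (Option 2 style).

    ASSUMPTION: 'Class' holds the worst-case condition of the whole volume,
    and both columns use the same category names.
    """
    return [row for row in rows if _norm(row["Label"]) == _norm(row["Class"])]


def dropped_per_class(rows: Iterable[dict]) -> dict[str, int]:
    """How many B-scans per class this Option 2 approximation would exclude."""
    return dict(Counter(
        _norm(row["Class"]) for row in rows
        if _norm(row["Label"]) != _norm(row["Class"])
    ))
```

Comparing `len(worst_case_only(rows))` with the 12,649 figure would be a quick way to test whether these assumptions hold for the released spreadsheet.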