Converting Deep Learning Datasets Easily from One Format to Another: Pascal VOC, YOLO, MS COCO
Converting deep learning datasets from one format to another is easy now. I have been using Voxel51's FiftyOne library to convert object detection datasets between formats: YOLOv3 to Pascal VOC, Pascal VOC to YOLO variants such as YOLOv5, KITTI to YOLO, COCO to YOLO, and vice versa.

We often need to convert datasets to different formats because of the code bases we use. For example, the YOLOv7 code base expects labels in YOLO format, but you might want to train or test it on your own dataset. If your labels are not in YOLO format, finding the right code to convert them can be challenging. You can write the conversion code yourself, but that can take a long time. I recently faced this problem when I needed to convert several formats into YOLO format. After searching the internet and trying various scripts, I found the Voxel51 library to be an excellent solution.
The library provides an up-to-date code base for converting object detection and other kinds of datasets between formats in just a few lines of code. Here is an example of how I used it to convert RTTS (a hazy object detection dataset), which is in Pascal VOC format, into YOLO format.
Installation
pip install fiftyone
Example
import fiftyone as fo
name = "RTTSYolo"
data_path = "/home/mahmood/Downloads/RTTS/JPEGImages"
labels_path = "/home/mahmood/Downloads/RTTS/Annotations"
# Import dataset by explicitly providing paths to the source media and labels
dataset = fo.Dataset.from_dir(
    dataset_type=fo.types.VOCDetectionDataset,
    data_path=data_path,
    labels_path=labels_path,
    name=name,
)
# Export the imported dataset in YOLOv5 format
dataset.export(
    export_dir="/home/mahmood/datasets/RTTSYolo",
    dataset_type=fo.types.YOLOv5Dataset,
)
# Optionally, inspect the imported dataset in the FiftyOne App
session = fo.launch_app(dataset, port=5151)
session.wait()
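To make it clearer what a Pascal VOC to YOLO conversion does to each bounding box, here is a minimal sketch in plain Python. It is not FiftyOne's implementation, and the function names `voc_box_to_yolo` and `yolo_label_line` are hypothetical. It assumes VOC boxes are pixel corner coordinates (xmin, ymin, xmax, ymax) and that a YOLO label line is `class x_center y_center width height` with values normalized by the image size. It ignores details such as the 1-pixel offset that some VOC tools apply.

```python
def voc_box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a Pascal VOC pixel box to a normalized YOLO box (sketch)."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


def yolo_label_line(class_id, box):
    """Format one line of a YOLO .txt label file (hypothetical helper)."""
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in box)


# A 200x100 px box with its top-left corner at (100, 50) in a 400x200 px image
box = voc_box_to_yolo(100, 50, 300, 150, img_w=400, img_h=200)
print(yolo_label_line(0, box))
```

Writing and testing this kind of code for every format pair, plus the XML parsing and class-name mapping around it, is the work the library saves you.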
YOLOV5 Format to MS COCO Detection
import fiftyone as fo

name = "nRainSAVCOCO"
dataset_dir = "/home/mahmood/datasets/nRainSAVYOLO"
# Create the dataset from the "train" split of the YOLOv5 directory
dataset = fo.Dataset.from_dir(
    dataset_dir=dataset_dir,
    dataset_type=fo.types.YOLOv5Dataset,
    name=name,
    split="train",
)
# Export to COCO detection format
dataset.export(
    export_dir="/home/mahmood/datasets/nRainSAVCOCO",
    dataset_type=fo.types.COCODetectionDataset,
)
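For the YOLO to COCO direction, the box arithmetic looks roughly like the sketch below. Again, this is an illustration and not the library's code. The function name `yolo_box_to_coco` is hypothetical, and it assumes YOLO boxes are normalized (x_center, y_center, width, height) while COCO boxes are `[x, y, width, height]` in pixels with (x, y) at the top-left corner.

```python
def yolo_box_to_coco(x_center, y_center, width, height, img_w, img_h):
    """Convert a normalized YOLO box to a COCO pixel box (sketch)."""
    box_w = width * img_w
    box_h = height * img_h
    # COCO stores the top-left corner, so shift back by half the box size
    x_min = x_center * img_w - box_w / 2.0
    y_min = y_center * img_h - box_h / 2.0
    return [x_min, y_min, box_w, box_h]


# A centered box covering a quarter of the width and half of the height
coco_box = yolo_box_to_coco(0.5, 0.5, 0.25, 0.5, img_w=640, img_h=480)
print(coco_box)
```

A full COCO export also needs image sizes, category IDs, and the surrounding JSON structure, which the `dataset.export` call handles for you.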