*Memos:
- My post explains MNIST, EMNIST, QMNIST, ETLCDB, Kuzushiji and Moving MNIST.
- My post explains Fashion-MNIST, Caltech 101, Caltech 256, CelebA, CIFAR-10 and CIFAR-100.
- My post explains Oxford-IIIT Pet, Oxford 102 Flower, Stanford Cars, Places365, Flickr8k and Flickr30k.
- My post explains ImageNet, LSUN and MS COCO.
- My post explains Image Classification(Recognition), Object Localization, Object Detection and Image Segmentation.
- My post explains Keypoint Detection(Landmark Detection), Image Matching, Object Tracking, Stereo Matching, Video Prediction, Optical Flow and Image Captioning.
(1) PASCAL VOC(Pattern Analysis, Statistical Modelling, and Computational Learning Visual Object Classes)(2005):
- has object images and annotations with 4, 10 or 20 classes, and comes in 8 yearly datasets, VOC2005, VOC2006, VOC2007, VOC2008, VOC2009, VOC2010, VOC2011 and VOC2012:
*Memos:
- VOC2005 has 2,232 images and annotations(some for train, some for validation and some for test) with 4 classes.
- VOC2006 has 5,304 images and annotations(1,277 for train, 1,341 for validation and 2,686 for test) with 10 classes.
- VOC2007 has 9,963 images and annotations(2,501 for train, 2,510 for validation and 4,952 for test) with 20 classes.
- VOC2008 has 5,096 images and annotations(2,111 for train, 2,221 for validation and 764 as extra) with 20 classes. *It also contains 4,133 test images, which are not included in the 5,096 above.
- VOC2009 has 7,818 images and annotations(3,473 for train, 3,581 for validation and 764 as extra) with 20 classes.
- VOC2010 has 11,321 images and annotations(4,998 for train, 5,105 for validation and 1,218 as extra) with 20 classes.
- VOC2011 has 14,961 images and annotations(5,717 for train, 5,823 for validation and 3,421 as extra) with 20 classes.
- VOC2012 has 17,125 images and annotations(5,717 for train, 5,823 for validation and 5,585 as extra) with 20 classes.
- is available as VOCSegmentation() and VOCDetection() in PyTorch (torchvision.datasets).
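The split sizes listed above can be checked with a little arithmetic, and the same dataset can be loaded through torchvision. The following is a minimal sketch, not a definitive recipe: the helper names `inconsistent_years` and `load_voc_detection` and the `"data"` root folder are made up for illustration, and torchvision is only imported when the loader is called. As far as I know, `VOCDetection()` accepts the years `"2007"` to `"2012"`, so VOC2005 and VOC2006 would have to be obtained separately.

```python
# Split sizes copied from the list above:
# (train, validation, test or extra, stated total).
VOC_SPLITS = {
    "VOC2006": (1277, 1341, 2686, 5304),
    "VOC2007": (2501, 2510, 4952, 9963),
    "VOC2008": (2111, 2221, 764, 5096),
    "VOC2009": (3473, 3581, 764, 7818),
    "VOC2010": (4998, 5105, 1218, 11321),
    "VOC2011": (5717, 5823, 3421, 14961),
    "VOC2012": (5717, 5823, 5585, 17125),
}


def inconsistent_years(splits):
    """Return the datasets whose three parts do not add up to the stated total."""
    return [
        name
        for name, (train, val, rest, total) in splits.items()
        if train + val + rest != total
    ]


def load_voc_detection(root="data", year="2012", image_set="train", download=False):
    """Build a VOCDetection dataset (hypothetical helper, assumes torchvision is installed)."""
    # Imported here so the arithmetic check above runs without torchvision.
    from torchvision.datasets import VOCDetection

    return VOCDetection(root=root, year=year, image_set=image_set, download=download)


if __name__ == "__main__":
    # An empty list means every stated total matches its parts.
    print(inconsistent_years(VOC_SPLITS))
```

Calling `load_voc_detection(download=True)` would fetch the archive into the assumed `"data"` folder on first use. `VOCSegmentation()` takes the same `root`, `year`, `image_set` and `download` arguments.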
(2) SUN Database(Scene UNderstanding database)(2010):
- has 108,754 scene images with 397 classes.
- is also called SUN397.
- is available as SUN397() in PyTorch (torchvision.datasets).
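As far as I know, `SUN397()` in torchvision takes no split argument, so a train/validation split has to be made by hand. Here is a minimal sketch of one way to do that with `random_split()`. The helper names, the 80/20 ratio, the seed and the `"data"` root folder are all assumptions chosen for illustration.

```python
def split_lengths(total, train_fraction=0.8):
    """Return (train, validation) lengths that add up to `total`."""
    n_train = int(total * train_fraction)
    return n_train, total - n_train


def load_sun397_splits(root="data", train_fraction=0.8, seed=0, download=False):
    """Load SUN397 and split it randomly (hypothetical helper).

    torch and torchvision are imported only when this function is called.
    """
    import torch
    from torch.utils.data import random_split
    from torchvision.datasets import SUN397

    dataset = SUN397(root=root, download=download)
    lengths = list(split_lengths(len(dataset), train_fraction))
    # A fixed seed keeps the split repeatable between runs.
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, lengths, generator=generator)


if __name__ == "__main__":
    # With the 108,754 images mentioned above and an assumed 80/20 ratio.
    print(split_lengths(108754, 0.8))
```

A random split like this ignores class balance. A stratified split would be the safer choice if some of the 397 classes are small.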
(3) Kinetics Dataset(2017):
- has short video clips of human actions and comes in 3 datasets, Kinetics-400, Kinetics-600 and Kinetics-700:
*Memos:
- Each video clip lasts around 10 seconds.
- Kinetics-400(2017) has 306,245 video clips, each labeled with one of 400 categories(classes).
- Kinetics-600(2018) has 495,547 video clips, each labeled with one of 600 categories.
- Kinetics-700(2019) has 545,317 video clips, each labeled with one of 700 categories.
- is used for Video Classification.
- is available as Kinetics() in PyTorch (torchvision.datasets).
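`Kinetics()` cuts each video into fixed-length clips using `frames_per_clip` and `step_between_clips`. The sketch below shows the sliding-window arithmetic behind that idea and a thin loader around the class. It is an approximation under stated assumptions, not torchvision's exact implementation: the helper names are hypothetical, and the 25 fps frame rate used in the example is assumed only to turn "around 10 seconds" into a frame count.

```python
def count_clips(num_frames, frames_per_clip, step_between_clips=1):
    """Number of full sliding windows of `frames_per_clip` frames in one video."""
    if num_frames < frames_per_clip:
        return 0  # too short for even one clip
    return (num_frames - frames_per_clip) // step_between_clips + 1


def load_kinetics(root="data", num_classes="400", split="train",
                  frames_per_clip=16, step_between_clips=16):
    """Build a Kinetics dataset (hypothetical helper, assumes torchvision is installed)."""
    if num_classes not in ("400", "600", "700"):
        raise ValueError("num_classes must be '400', '600' or '700'")
    # Imported here so count_clips() can be used without torchvision.
    from torchvision.datasets import Kinetics

    return Kinetics(
        root=root,
        frames_per_clip=frames_per_clip,
        num_classes=num_classes,
        split=split,
        step_between_clips=step_between_clips,
    )


if __name__ == "__main__":
    # A 10-second video at an assumed 25 fps has 250 frames.
    print(count_clips(250, frames_per_clip=16, step_between_clips=16))
```

Setting `step_between_clips` equal to `frames_per_clip` gives non-overlapping clips. A smaller step gives more, overlapping clips per video.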
(4) Cityscapes(2016):
- has 25,000 annotated urban street scene images for semantic understanding, with 30 classes grouped into 8 categories. *5,000 images are fine-annotated and 20,000 images are coarse-annotated.
- is used for Image Segmentation.
- is available as Cityscapes() in PyTorch (torchvision.datasets). *How to set up the dataset isn't explained in its documentation.
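Since the setup isn't documented, here is a minimal sketch of the folder layout that `Cityscapes()` appears to expect. It reflects my reading of the torchvision source and should be treated as an assumption. `Cityscapes()` has no `download` argument, so the archives have to be obtained separately and placed under the root folder. The helper names and the `"data/cityscapes"` root are made up for illustration.

```python
import os

# Splits that, to my understanding, each annotation mode accepts.
VALID_SPLITS = {
    "fine": ("train", "test", "val"),
    "coarse": ("train", "train_extra", "val"),
}


def expected_dirs(root, split="train", mode="fine"):
    """Return the (images, targets) folders this sketch assumes torchvision reads."""
    if mode not in VALID_SPLITS:
        raise ValueError("mode must be 'fine' or 'coarse'")
    if split not in VALID_SPLITS[mode]:
        raise ValueError(f"split {split!r} is not valid for mode {mode!r}")
    targets_folder = "gtFine" if mode == "fine" else "gtCoarse"
    return (
        os.path.join(root, "leftImg8bit", split),
        os.path.join(root, targets_folder, split),
    )


def load_cityscapes(root="data/cityscapes", split="train", mode="fine",
                    target_type="semantic"):
    """Build a Cityscapes dataset (hypothetical helper, assumes torchvision is installed)."""
    # Imported here so expected_dirs() can be used without torchvision.
    from torchvision.datasets import Cityscapes

    return Cityscapes(root=root, split=split, mode=mode, target_type=target_type)


if __name__ == "__main__":
    for folder in expected_dirs("data/cityscapes", "train", "fine"):
        print(folder, "exists" if os.path.isdir(folder) else "is missing")
```

Running the check before `load_cityscapes()` shows which folder is missing, which is easier to read than the error raised by the dataset class itself.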
Fine-annotated images:
Coarse-annotated images: