*Memos:
- My post explains Fashion-MNIST, Caltech 101, Caltech 256, CelebA, CIFAR-10 and CIFAR-100.
- My post explains Oxford-IIIT Pet, Oxford 102 Flower, Stanford Cars, Places365, Flickr8k and Flickr30k.
- My post explains ImageNet, LSUN and MS COCO.
- My post explains Image Classification(Recognition), Object Localization, Object Detection and Image Segmentation.
- My post explains PASCAL VOC, SUN Database, Kinetics Dataset and Cityscapes.
- My post explains Keypoint Detection(Landmark Detection), Image Matching, Object Tracking, Stereo Matching, Video Prediction, Optical Flow and Image Captioning.
(1) MNIST(Modified National Institute of Standards and Technology)(1998):
- has the 70,000 handwritten digit images[0~9] each connected to the label from 10 classes:
*Memos:
- 60,000 for train and 10,000 for test.
- Each image has 28x28 pixels.
- is used for Image Classification.
- is MNIST() in PyTorch. *My post explains MNIST().
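As a rough illustration of the MNIST numbers above, the sketch below records the split sizes and image shape and wraps the torchvision call in a small helper. The names `MNIST_SPLIT_SIZES`, `pixels_per_image`, `load_mnist` and the `data` folder are chosen here for illustration only; the torchvision import sits inside the helper so the constants can be checked without torchvision installed.

```python
# Split sizes and image shape as listed above.
MNIST_SPLIT_SIZES = {"train": 60_000, "test": 10_000}
MNIST_IMAGE_SHAPE = (28, 28)  # height x width in pixels


def pixels_per_image(shape=MNIST_IMAGE_SHAPE):
    """Number of pixel values in one image (its length when flattened)."""
    height, width = shape
    return height * width


def load_mnist(root="data", train=True):
    """Load one MNIST split with torchvision (downloads on first use).

    Assumption: torchvision is installed and `root` is a writable folder.
    """
    from torchvision import datasets  # imported lazily on purpose

    return datasets.MNIST(root=root, train=train, download=True)


total_images = sum(MNIST_SPLIT_SIZES.values())
print(total_images)        # 70000
print(pixels_per_image())  # 784
```

Calling `load_mnist(train=False)` would return the test split, assuming the download succeeds.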
(2) EMNIST(Extended MNIST)(2017):
- has the handwritten character images(digits[0~9] and alphabet letters[A~Z][a~z]) split into 6 datasets(ByClass, ByMerge, Balanced, Letters, Digits and MNIST):
*Memos:
- Each image has 28x28 pixels.
- ByClass has the 814,255 character images(digits[0~9] and alphabet letters[A~Z][a~z]) each connected to the label from 62 classes. *697,932 for train and 116,323 for test.
- ByMerge has the 814,255 character images(digits[0~9] and alphabet letters[A~Z][a, b, d~h, n, q, r, t]) each connected to the label from 47 classes. *697,932 for train and 116,323 for test.
- Balanced has the 131,600 character images(digits[0~9] and alphabet letters[A~Z][a, b, d~h, n, q, r, t]) each connected to the label from 47 classes. *112,800 for train and 18,800 for test.
- Letters has the 145,600 alphabet letter images[a~z] each connected to the label from 27 classes. *124,800 for train and 20,800 for test.
- Digits has the 280,000 digit images[0~9] each connected to the label from 10 classes. *240,000 for train and 40,000 for test.
- MNIST has the 70,000 digit images[0~9] each connected to the label from 10 classes. *60,000 for train and 10,000 for test.
- is used for Image Classification.
- is EMNIST() in PyTorch. *My post explains EMNIST().
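The train and test counts of the 6 EMNIST datasets can be cross-checked with a small table. This is only a sketch: `EMNIST_SPLITS`, `total_size` and `load_emnist` are illustrative names, and the lowercase keys are assumed to match the `split` argument of torchvision's `EMNIST()`.

```python
# (train, test) sizes per EMNIST dataset, copied from the list above.
# Keys are the lowercase names passed as EMNIST(split=...).
EMNIST_SPLITS = {
    "byclass": (697_932, 116_323),
    "bymerge": (697_932, 116_323),
    "balanced": (112_800, 18_800),
    "letters": (124_800, 20_800),
    "digits": (240_000, 40_000),
    "mnist": (60_000, 10_000),
}


def total_size(split):
    """Return train + test images for one EMNIST dataset."""
    train, test = EMNIST_SPLITS[split]
    return train + test


def load_emnist(split, root="data", train=True):
    """Load one EMNIST dataset with torchvision (downloads on first use).

    Assumption: torchvision is installed and `root` is a writable folder.
    """
    if split not in EMNIST_SPLITS:
        raise ValueError(f"unknown EMNIST split: {split!r}")
    from torchvision import datasets  # imported lazily on purpose

    return datasets.EMNIST(root=root, split=split, train=train, download=True)


for name in EMNIST_SPLITS:
    print(name, total_size(name))
```

Each printed total should match the image count given for that dataset in the list above.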
(3) QMNIST(2019):
- has the 522,953 handwritten digit images[0~9] each connected to the label from 10 classes:
*Memos:
- 60,000 for train, 60,000 for test and 402,953 of all the digits from the NIST Special Database 19.
- Each image has 28x28 pixels.
- is an extended MNIST. *I don't know what the Q in QMNIST stands for.
- is used for Image Classification.
- is QMNIST() in PyTorch. *My post explains QMNIST().
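The QMNIST total is simply the sum of its three parts, which the sketch below checks. The keys are assumed to correspond to the `what` argument of torchvision's `QMNIST()`; `QMNIST_PARTS` and `load_qmnist` are illustrative names, not part of any library.

```python
# Part sizes from the list above, keyed by the `what` value
# assumed for torchvision's QMNIST().
QMNIST_PARTS = {"train": 60_000, "test": 60_000, "nist": 402_953}


def load_qmnist(what="train", root="data"):
    """Load one QMNIST part with torchvision (downloads on first use).

    Assumption: torchvision is installed and `root` is a writable folder.
    """
    if what not in QMNIST_PARTS:
        raise ValueError(f"unknown QMNIST part: {what!r}")
    from torchvision import datasets  # imported lazily on purpose

    return datasets.QMNIST(root=root, what=what, download=True)


qmnist_total = sum(QMNIST_PARTS.values())
print(qmnist_total)  # 522953
```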
(4) ETLCDB(Electrotechnical Laboratory Character Database)(2011):
- has the handwritten or machine-printed character images(digits, symbols, alphabet letters and Japanese characters) split into 9 datasets(ETL1, ETL2, ETL3, ETL4, ETL5, ETL6, ETL7, ETL8 and ETL9):
*Memos:
- ETL1 has the 141,319 character images(digits[0~9], alphabet letters[A~Z], symbols[+-*/=()・,?’] and Katakana[ア~ン]) each connected to the label from 99 classes. *Each image has 64x63 pixels.
- ETL2 has 52,796 character images(digits[0~9], alphabet letters[A~Z], symbols, Katakana letters[ア~ン], Hiragana letters[あ~ん] and Kanji letters) each connected to the label from 2,184 classes. *Each image has 60x60 pixels.
- ETL3 has 9,600 character images(digits[0~9], alphabet letters[A~Z] and symbols[¥+-*/=()・,_▾]) each connected to the label from 48 classes. *Each image has 72x76 pixels.
- ETL4 has 6,120 Hiragana letter images[あ~ん] each connected to the label from 51 classes. *Each image has 72x76 pixels.
- ETL5 has 10,608 Katakana letter images[ア~ン] each connected to the label from 51 classes. *Each image has 72x76 pixels.
- ETL6 has 157,662 character images(digits[0~9], alphabet letters[A~Z][a~z], symbols and Katakana letters[ア~ン]) each connected to the label from 114 classes. *Each image has 64x63 pixels.
- ETL7(ETL7L and ETL7S) has 16,800 character images(Hiragana letters[あ~ん], Dakuten[゛] and Handakuten[゜]) each connected to the label from 48 classes. *Each image has 64x63 pixels.
- ETL8(ETL8G and ETL8B2) has 152,960 character images(Hiragana letters[あ~ん] and Kanji letters) each connected to the label from 956 classes. *Each image has 128x127 pixels.
- ETL9(ETL9G and ETL9B) has 607,200 character images(Hiragana letters[あ~ん] and JIS first level Kanji letters) each connected to the label from 3,036 classes. *Each image has 128x127 pixels.
- is used for Image Classification.
- isn't in PyTorch, so we need to download it from the etlcdb website.
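Because the ETL datasets differ a lot in size, a quick average of images per class can help when choosing one. The sketch below uses only a selection of the datasets and the counts from the list above; `ETL_SUBSETS` and `images_per_class` are illustrative names.

```python
# (images, classes) for a selection of ETL datasets, from the list above.
ETL_SUBSETS = {
    "ETL3": (9_600, 48),
    "ETL4": (6_120, 51),
    "ETL5": (10_608, 51),
    "ETL8": (152_960, 956),
    "ETL9": (607_200, 3_036),
}


def images_per_class(name):
    """Average number of images per class for one ETL dataset."""
    images, classes = ETL_SUBSETS[name]
    return images / classes


for name in ETL_SUBSETS:
    print(name, images_per_class(name))
# ETL3 200.0
# ETL4 120.0
# ETL5 208.0
# ETL8 160.0
# ETL9 200.0
```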
(5) Kuzushiji(2018):
- has the cursive style Japanese character images split into 3 datasets(Kuzushiji-MNIST, Kuzushiji-49 and Kuzushiji-Kanji):
*Memos:
- Kuzushiji-MNIST has the 70,000 Hiragana letter images each connected to the label from 10 classes. *Each image has 28x28 pixels.
- Kuzushiji-49 has the imbalanced 270,912 character images(Hiragana letters and Hiragana iteration marks) each connected to the label from 49 classes. *Each image has 28x28 pixels.
- Kuzushiji-Kanji has the imbalanced 140,424 Kanji letter images each connected to the label from 3,832 classes. *Each image has 64x64 pixels.
- is used for Image Classification.
- is KMNIST() in PyTorch, but it only has Kuzushiji-MNIST, so we need to download Kuzushiji-49 and Kuzushiji-Kanji from GitHub. *My post explains KMNIST().
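Kuzushiji-49 and Kuzushiji-Kanji are described above as imbalanced. One simple way to quantify that, once the labels are loaded, is the ratio between the largest and smallest class. This is a minimal sketch: `imbalance_ratio` is an illustrative helper and `toy_labels` is made-up data, not taken from the real datasets.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of the most frequent class count to the least frequent one.

    1.0 means perfectly balanced; larger values mean more imbalance.
    Raises ValueError if `labels` is empty.
    """
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical labels: class 0 appears 4 times, class 1 twice, class 2 once.
toy_labels = [0, 0, 0, 0, 1, 1, 2]
print(imbalance_ratio(toy_labels))  # 4.0
```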
(6) Moving MNIST(2015):
- has 10,000 videos:
*Memos:
- Each video has 20 frames(images) with 2 moving digits.
- Each frame(image) has 64x64 pixels.
- is used for Video Prediction.
- is MovingMNIST() in PyTorch. *My post explains MovingMNIST().
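For Video Prediction, each 20-frame video is usually cut into input frames and target frames. The sketch below shows that cut on a stand-in list and totals the frames in the dataset. `split_frames` is an illustrative helper; the default of 10 is assumed here to mirror the `split_ratio` parameter of torchvision's `MovingMNIST()`.

```python
# Dataset dimensions as listed above.
NUM_VIDEOS = 10_000
FRAMES_PER_VIDEO = 20
FRAME_SHAPE = (64, 64)  # height x width in pixels


def split_frames(video, split_ratio=10):
    """Split one video (a sequence of frames) into input and target frames."""
    return video[:split_ratio], video[split_ratio:]


# Stand-in for a real video: frame indices instead of 64x64 images.
video = list(range(FRAMES_PER_VIDEO))
past, future = split_frames(video)
print(len(past), len(future))  # 10 10

total_frames = NUM_VIDEOS * FRAMES_PER_VIDEO
print(total_frames)  # 200000
```

A model would then be trained to predict `future` from `past`.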