The goal of this task is to develop an algorithm that learns to classify images of objects from the same category (e.g. birds, dogs) into specific sub-categories, i.e. individual species. The following dataset is used:
- Caltech-UCSD Birds-200-2011: 200 categories of birds http://www.vision.caltech.edu/visipedia/CUB-200-2011.html
Add the following to a notebook cell to import the data info for bounding_boxes, classes, image_class_labels, images, train_test_split, attributes, certainties, class_attribute_labels_continuous, image_attribute_labels, part_click_locs, part_locs and parts:
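    import os

    git_fldr = '/content/MLBirds/'
    if os.path.exists(git_fldr):
        # Repository already cloned: enter it and update
        %cd '/content/MLBirds/'
        !git pull
    else:
        # First run: clone the repository, then enter it
        !git clone https://github.com/TeaWithLucas/MLBirds.git
        %cd '/content/MLBirds/'
    import data_load as data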
For more information about the dataset, visit the project website:
http://www.vision.caltech.edu/visipedia
If you use the dataset in a publication, please cite it in the style described on the dataset website (see URL above).
- images/ The images organized in subdirectories based on species. See IMAGES AND CLASS LABELS section below for more info.
- parts/ 15 part locations per image. See PART LOCATIONS section below for more info.
- attributes/ 312 binary attribute labels from MTurk workers. See ATTRIBUTE LABELS section below for more info.
Images are contained in the directory images/, with 200 subdirectories (one for each bird species).
The list of image file names is contained in the file images.txt, with each line corresponding to one image:
<image_id> <image_name>
The suggested train/test split is contained in the file train_test_split.txt, with each line corresponding to one image:
<image_id> <is_training_image>
where <image_id> corresponds to the ID in images.txt, and a value of 1 or 0 for <is_training_image> denotes that the file is in the training or test set, respectively.
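As a minimal sketch of how the two files above can be combined, the snippet below joins images.txt and train_test_split.txt on the image ID and returns the training and test image paths. The helper names (`parse_id_value_lines`, `split_train_test`) and the sample rows are hypothetical and are not taken from the repository's `data_load` module.

```python
def parse_id_value_lines(lines):
    """Parse '<id> <value>' lines into a dict keyed by integer ID."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        key, value = line.split(maxsplit=1)
        table[int(key)] = value
    return table


def split_train_test(image_lines, split_lines):
    """Return (train, test) lists of image names using the suggested split."""
    images = parse_id_value_lines(image_lines)
    is_train = parse_id_value_lines(split_lines)
    train = [images[i] for i in sorted(images) if is_train[i] == "1"]
    test = [images[i] for i in sorted(images) if is_train[i] == "0"]
    return train, test


# Made-up sample rows in the documented format (not real dataset rows).
sample_images = [
    "1 class_a/img_0001.jpg",
    "2 class_a/img_0002.jpg",
    "3 class_b/img_0003.jpg",
]
sample_split = ["1 1", "2 0", "3 1"]

train_paths, test_paths = split_train_test(sample_images, sample_split)
print(train_paths)
print(test_paths)
```

With the real files, open file objects can be passed directly, since iterating over a text file yields its lines.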
The list of class names (bird species) is contained in the file classes.txt, with each line corresponding to one class:
<class_id> <class_name>
The ground truth class labels (bird species labels) for each image are contained in the file image_class_labels.txt, with each line corresponding to one image:
<image_id> <class_id>
where <image_id> and <class_id> correspond to the IDs in images.txt and classes.txt, respectively.
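The lookup described above can be sketched as follows: classes.txt gives a class ID to class name table, and image_class_labels.txt is resolved against it. The function names and the sample class names are hypothetical placeholders.

```python
def parse_pairs(lines):
    """Parse '<id> <value>' lines into a list of (int id, str value) pairs."""
    pairs = []
    for line in lines:
        fields = line.split(maxsplit=1)
        if len(fields) == 2:
            pairs.append((int(fields[0]), fields[1].strip()))
    return pairs


def image_class_names(class_lines, label_lines):
    """Map each image ID to the name of its ground truth class."""
    class_names = dict(parse_pairs(class_lines))
    return {
        image_id: class_names[int(class_id)]
        for image_id, class_id in parse_pairs(label_lines)
    }


# Made-up sample rows in the documented format.
sample_classes = ["1 species_a", "2 species_b"]
sample_labels = ["1 1", "2 1", "3 2"]

names = image_class_names(sample_classes, sample_labels)
print(names)
```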
Each image contains a single bounding box label. Bounding box labels are contained in the file bounding_boxes.txt, with each line corresponding to one image:
<image_id> <x> <y> <width> <height>
where <image_id> corresponds to the ID in images.txt, and <x>, <y>, <width>, and <height> are all measured in pixels.
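A short sketch of reading the bounding boxes, assuming the four values after the image ID are x, y, width and height in that order (as in the dataset's own README). The `BoundingBox` type and `parse_bounding_boxes` helper are hypothetical names introduced here for illustration, and the sample row is made up.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    x: float
    y: float
    width: float
    height: float

    def corners(self):
        """Return (left, top, right, bottom), the layout used by e.g. PIL's Image.crop."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)


def parse_bounding_boxes(lines):
    """Map image ID -> BoundingBox from '<image_id> <x> <y> <width> <height>' lines."""
    boxes = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        boxes[int(fields[0])] = BoundingBox(*(float(v) for v in fields[1:5]))
    return boxes


# Made-up sample row in the assumed format.
boxes = parse_bounding_boxes(["1 60.0 27.0 325.0 304.0"])
print(boxes[1].corners())
```

Converting to corner coordinates is a convenience for cropping; the file itself stores the top-left corner plus the box size.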
The list of all part names is contained in the file parts/parts.txt, with each line corresponding to one part:
<part_id> <part_name>
The set of all ground truth part locations is contained in the file parts/part_locs.txt, with each line corresponding to the annotation of a particular part in a particular image:
<image_id> <part_id> <x> <y> <visible>
where <image_id> and <part_id> correspond to the IDs in images.txt and parts/parts.txt, respectively. <x> and <y> denote the pixel location of the center of the part. <visible> is 0 if the part is not visible in the image and 1 otherwise.
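The following sketch reads part locations and keeps only the parts marked visible, assuming each line has the form `<image_id> <part_id> <x> <y> <visible>`. The function name and the sample rows are hypothetical.

```python
def parse_part_locs(lines):
    """Map image ID -> {part ID: (x, y)} for parts marked visible."""
    visible_parts = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        image_id, part_id = int(fields[0]), int(fields[1])
        x, y = float(fields[2]), float(fields[3])
        visible = int(fields[4])
        # Every image gets an entry, even if none of its parts are visible.
        visible_parts.setdefault(image_id, {})
        if visible == 1:
            visible_parts[image_id][part_id] = (x, y)
    return visible_parts


# Made-up sample rows in the assumed format.
sample_parts = ["1 1 0.0 0.0 0", "1 2 312.0 182.0 1", "2 1 100.5 50.25 1"]
parts = parse_part_locs(sample_parts)
print(parts)
```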
A set of multiple part locations for each image and part, as perceived by multiple MTurk users, is contained in parts/part_click_locs.txt, with each line corresponding to the annotation of a particular part in a particular image by a different MTurk worker:
<image_id> <part_id> <x> <y> <visible> <time>
where <image_id>, <part_id>, <x>, <y> are in the same format as defined in parts/part_locs.txt, and <time> is the time in seconds spent by the MTurk worker.
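One possible way to summarize the per-worker clicks is to take the median click position for each image and part, using only clicks marked visible. This is an illustrative choice, not necessarily how the ground truth in parts/part_locs.txt was derived. The sketch assumes lines of the form `<image_id> <part_id> <x> <y> <visible> <time>`; the function name and sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import median


def summarize_clicks(lines):
    """Map (image ID, part ID) -> median (x, y) over clicks marked visible."""
    clicks = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        image_id, part_id = int(fields[0]), int(fields[1])
        x, y, visible = float(fields[2]), float(fields[3]), int(fields[4])
        if visible == 1:
            clicks[(image_id, part_id)].append((x, y))
    return {
        key: (median(x for x, _ in pts), median(y for _, y in pts))
        for key, pts in clicks.items()
    }


# Made-up sample rows: three workers clicked part 2, one marked part 1 invisible.
sample_clicks = [
    "1 2 310.0 180.0 1 4.2",
    "1 2 314.0 184.0 1 3.1",
    "1 2 312.0 190.0 1 5.0",
    "1 1 0.0 0.0 0 2.0",
]
summary = summarize_clicks(sample_clicks)
print(summary)
```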
The list of all attribute names is contained in the file attributes/attributes.txt, with each line corresponding to one attribute:
<attribute_id> <attribute_name>
The list of all certainty names (used by workers to specify their certainty of an attribute response) is contained in the file attributes/certainties.txt, with each line corresponding to one certainty:
<certainty_id> <certainty_name>
The set of attribute labels as perceived by MTurkers for each image is contained in the file attributes/image_attribute_labels.txt, with each line corresponding to one image/attribute/worker triplet:
<image_id> <attribute_id> <is_present> <certainty_id> <time>
where <image_id>, <attribute_id>, <certainty_id> correspond to the IDs in images.txt, attributes/attributes.txt, and attributes/certainties.txt, respectively. <is_present> is 0 or 1 (1 denotes that the attribute is present). <time> denotes the time spent by the MTurker in seconds.
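As a rough sketch, the attribute labels can be aggregated into the fraction of responses that marked an attribute as present for each image. It assumes the first three fields of every line are `<image_id> <attribute_id> <is_present>`; the certainty and any trailing fields are ignored here. The function name and sample rows are hypothetical.

```python
from collections import defaultdict


def attribute_presence(lines):
    """Map (image ID, attribute ID) -> fraction of rows with is_present == 1."""
    votes = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        image_id, attribute_id, is_present = (int(f) for f in fields[:3])
        votes[(image_id, attribute_id)].append(is_present)
    return {key: sum(v) / len(v) for key, v in votes.items()}


# Made-up sample rows: attribute 2 of image 1 has two conflicting responses.
sample_labels = ["1 1 0 3 27.7", "1 2 1 4 4.1", "1 2 0 2 6.0"]
presence = attribute_presence(sample_labels)
print(presence)
```

A natural extension would be to filter or weight responses by <certainty_id>, using the names listed in attributes/certainties.txt.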
Attributes on a per-class level, in a similar format to the Animals With Attributes dataset, are contained in attributes/class_attribute_labels_continuous.txt. The file contains 200 lines and 312 space-separated columns. Each line corresponds to one class (in the same order as classes.txt) and each column contains one real-valued number corresponding to one attribute (in the same order as attributes.txt). The number is the percentage of the time (between 0 and 100) that a human thinks that the attribute is present for a given class.
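A minimal sketch of loading the per-class attribute matrix: each line is parsed into a row of floats, the column count is checked, and percentages are rescaled to the range 0 to 1. The 50% threshold in `binarize` is an arbitrary illustrative choice, and both function names are hypothetical.

```python
def parse_class_attributes(lines, num_attributes=312):
    """Return one row per class, with percentages rescaled to [0, 1]."""
    matrix = []
    for line in lines:
        values = [float(v) for v in line.split()]
        if not values:
            continue  # ignore blank lines
        if len(values) != num_attributes:
            raise ValueError(
                f"expected {num_attributes} columns, got {len(values)}"
            )
        matrix.append([v / 100.0 for v in values])
    return matrix


def binarize(matrix, threshold=0.5):
    """Turn rescaled values into 0/1 class-level attribute labels."""
    return [[1 if v >= threshold else 0 for v in row] for row in matrix]


# Made-up sample with 2 classes and 3 attributes (the real file is 200 x 312).
sample_rows = ["0.0 50.0 100.0", "25.0 75.0 10.0"]
probs = parse_class_attributes(sample_rows, num_attributes=3)
print(binarize(probs))
```

Row i of the result corresponds to line i of classes.txt and column j to line j of attributes.txt, following the ordering stated above.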