Project analysis – Rakuten France Multimodal Product Data Classification
Contents
Datasets
Files
- X_train_update.csv: Training input
- Y_train_CVw08PX.csv: Training output
- X_test_update.csv: Test input
Directories
- image_train: Training input images
- image_test: Test input images
WARNING
The training data must be split into training and validation sets, because no labels are provided for the test set. The test set is reserved exclusively for final evaluation.
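A minimal sketch of such a split, using only the standard library. Stratifying by `prdtypecode`, the 20% validation fraction, and the fixed seed are assumptions for illustration and are not prescribed by the project. The function name `stratified_split` is hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=42):
    """Return (train_indices, val_indices) with similar class proportions in both."""
    # Group row indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        if len(indices) > 1:
            # Keep at least one validation row per class.
            n_val = max(1, round(len(indices) * val_fraction))
        else:
            n_val = 0  # a single-row class stays in training
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)
```

If scikit-learn is available, `sklearn.model_selection.train_test_split` with its `stratify` argument achieves the same in one call.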
Dataset analysis
X_train_update.csv
Shape: (84916, 5)
Variable Name | Description | Dtype | Total Values | Missing Values | Unique Values |
---|---|---|---|---|---|
Unnamed: 0 | Consecutive indices | int64 | 84916 | 0 | 84916 |
designation | Short product designation | object | 84916 | 0 | 82265 |
description | Long-form product description with HTML formatting | object | 55116 | 29800 | 47506 |
productid | Unique product ID | int64 | 84916 | 0 | 84916 |
imageid | Unique image ID | int64 | 84916 | 0 | 84916 |
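The per-column figures in these tables (non-missing values, missing values, unique values) could be recomputed with a sketch like the following. It assumes that an empty cell means "missing" and that "Total Values" counts non-missing cells, which matches the `description` row (55116 + 29800 = 84916). The function name `summarize_columns` and the sample rows are made up for illustration.

```python
import csv
import io


def summarize_columns(csv_file):
    """Return {column: (non_missing, missing, unique)} for an open CSV file."""
    reader = csv.DictReader(csv_file)
    names = reader.fieldnames or []
    stats = {name: {"total": 0, "missing": 0, "unique": set()} for name in names}
    for row in reader:
        for name in names:
            value = row.get(name)
            if value is None or value == "":  # empty cell counts as missing
                stats[name]["missing"] += 1
            else:
                stats[name]["total"] += 1
                stats[name]["unique"].add(value)
    return {
        name: (s["total"], s["missing"], len(s["unique"]))
        for name, s in stats.items()
    }


# Tiny in-memory demonstration with invented rows. The index column has an
# empty header here, which pandas would display as "Unnamed: 0".
sample = io.StringIO(
    ",designation,description\n"
    "0,Lamp,<p>Desk lamp</p>\n"
    "1,Lamp,\n"
    "2,Chair,<p>Oak chair</p>\n"
)
summary = summarize_columns(sample)
```

For the real data, pass a file opened with `open("X_train_update.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption.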
X_test_update.csv
Shape: (13812, 5)
Variable Name | Description | Dtype | Total Values | Missing Values | Unique Values |
---|---|---|---|---|---|
Unnamed: 0 | Consecutive indices | int64 | 13812 | 0 | 13812 |
designation | Short product designation | object | 13812 | 0 | 13812 |
description | Long-form product description with HTML formatting | object | 8926 | 4886 | 8346 |
productid | Unique product ID | int64 | 13812 | 0 | 13812 |
imageid | Unique image ID | int64 | 13812 | 0 | 13812 |
Y_train_CVw08PX.csv
Shape: (84916, 2)
Variable Name | Description | Dtype | Total Values | Missing Values | Unique Values |
---|---|---|---|---|---|
Unnamed: 0 | Consecutive indices | int64 | 84916 | 0 | 84916 |
prdtypecode | Product type code | int64 | 84916 | 0 | 27 |
There are 27 product classes (target variable 'prdtypecode').
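How the rows are distributed across the classes is not stated here, yet it matters when choosing a validation split. A small sketch for counting rows per class is shown below. The function name `class_counts` is hypothetical, and the codes in the sample are invented placeholders, not real product type codes.

```python
import csv
import io
from collections import Counter


def class_counts(y_file, label_column="prdtypecode"):
    """Count how many rows of an open label CSV fall into each class."""
    reader = csv.DictReader(y_file)
    return Counter(row[label_column] for row in reader)


# In-memory demonstration with placeholder class codes.
sample = io.StringIO(",prdtypecode\n0,111\n1,222\n2,111\n")
counts = class_counts(sample)
most_common_code, most_common_size = counts.most_common(1)[0]
```

On `Y_train_CVw08PX.csv`, `len(counts)` should equal the 27 classes reported above.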
Images
Naming convention: image_<imageid>_product_<productid>.jpg
Example: image_67284_product_365202.jpg
Size: 500px x 500px
Format: .jpg
image_train contains 84916 files, none missing
image_test contains 13812 files, none missing
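The naming convention makes it possible to derive each image path from a row's `imageid` and `productid`, and to check that no file is missing. The following is a sketch under that assumption. The helper names `image_filename` and `missing_images` are hypothetical.

```python
from pathlib import Path


def image_filename(imageid, productid):
    """Build the file name image_<imageid>_product_<productid>.jpg."""
    return f"image_{imageid}_product_{productid}.jpg"


def missing_images(image_dir, id_pairs):
    """Return the expected file names that do not exist in image_dir.

    id_pairs is an iterable of (imageid, productid) tuples, for example
    taken from the rows of X_train_update.csv.
    """
    image_dir = Path(image_dir)
    expected = (image_filename(imageid, productid) for imageid, productid in id_pairs)
    return [name for name in expected if not (image_dir / name).is_file()]
```

An empty list returned for `image_train` and `image_test` would correspond to the "none missing" result above.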