
Project analysis – Rakuten France Multimodal Product Data Classification



Datasets

Files

  • X_train_update.csv: Training input
  • Y_train_CVw08PX.csv: Training output
  • X_test_update.csv: Test input

Directories

  • image_train: Training input images
  • image_test: Test input images

WARNING

The training data (CSV files and images) must be split into training and validation sets, because no labels are provided for the test sets. The test sets are reserved exclusively for final validation!
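A minimal sketch of such a split, using only the standard library. It keeps the class proportions roughly equal in both parts (a stratified split). The function name, the 20% validation fraction, the seed, and the toy labels are assumptions for illustration, not part of the project.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=42):
    """Return (train_indices, val_indices) with per-class proportions preserved."""
    rng = random.Random(seed)

    # Group row positions by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_val = int(round(len(indices) * val_fraction))
        if len(indices) > 1:
            # Hold out at least one row, but never the whole class.
            n_val = max(1, min(n_val, len(indices) - 1))
        else:
            # A single-row class stays entirely in the training part.
            n_val = 0
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])

    return sorted(train_idx), sorted(val_idx)


# Toy labels standing in for the prdtypecode column (values are arbitrary).
labels = [10] * 10 + [40] * 5 + [50] * 5
train_idx, val_idx = stratified_split(labels, val_fraction=0.2)
print(len(train_idx), len(val_idx))
```

The returned indices can be applied to the rows of X_train_update.csv and Y_train_CVw08PX.csv alike, since both files have the same number of rows. If scikit-learn is available, `train_test_split` with its `stratify` argument is the usual shortcut for the same idea.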

Dataset analysis

X_train_update.csv

Shape: (84916, 5)

| Variable Name | Description | Dtype | Total Values | Missing Values | Unique Values |
|---|---|---|---|---|---|
| Unnamed: 0 | Consecutive indices | int64 | 84916 | 0 | 84916 |
| designation | Short product designation | object | 84916 | 0 | 82265 |
| description | Long-form product description with HTML formatting | object | 55116 | 29800 | 47506 |
| productid | Unique product ID | int64 | 84916 | 0 | 8491 |
| imageid | Unique image ID | int64 | 84916 | 0 | 84916 |
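The per-column counts in a table like this can be reproduced with a short script. The following is a sketch using only the standard library; `summarize_columns` and the three sample rows are hypothetical, and the empty first header name is an assumption about how the index column appears in the raw file.

```python
import csv
import io


def summarize_columns(csv_file):
    """Count non-missing, missing, and unique values per column of a CSV file object."""
    reader = csv.DictReader(csv_file)
    names = reader.fieldnames
    stats = {name: {"total": 0, "missing": 0, "unique": set()} for name in names}

    for row in reader:
        for name in names:
            value = row[name]
            if value is None or value == "":
                # Empty or absent cells count as missing.
                stats[name]["missing"] += 1
            else:
                stats[name]["total"] += 1
                stats[name]["unique"].add(value)

    return {
        name: {"total": s["total"], "missing": s["missing"], "unique": len(s["unique"])}
        for name, s in stats.items()
    }


# Tiny made-up sample with the same column layout as X_train_update.csv.
sample = io.StringIO(
    ",designation,description,productid,imageid\n"
    "0,Book A,<p>Nice</p>,101,201\n"
    "1,Book A,,102,202\n"
    "2,Toy B,<p>Nice</p>,103,203\n"
)
summary = summarize_columns(sample)
for name, counts in summary.items():
    print(repr(name), counts)
```

For the real file, pass `open("X_train_update.csv", newline="", encoding="utf-8")` instead of the sample, so that quoted descriptions containing line breaks are parsed correctly; the encoding is an assumption. With pandas, `df.count()`, `df.isna().sum()` and `df.nunique()` give the same three numbers.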

X_test_update.csv

Shape: (13812, 5)

| Variable Name | Description | Dtype | Total Values | Missing Values | Unique Values |
|---|---|---|---|---|---|
| Unnamed: 0 | Consecutive indices | int64 | 13812 | 0 | 13812 |
| designation | Short product designation | object | 13812 | 0 | 13812 |
| description | Long-form product description with HTML formatting | object | 8926 | 4886 | 8346 |
| productid | Unique product ID | int64 | 13812 | 0 | 13812 |
| imageid | Unique image ID | int64 | 13812 | 0 | 13812 |

Y_train_CVw08PX.csv

Shape: (84916, 2)

| Variable Name | Description | Dtype | Total Values | Missing Values | Unique Values |
|---|---|---|---|---|---|
| Unnamed: 0 | Consecutive indices | int64 | 84916 | 0 | 84916 |
| prdtypecode | Product type code | int64 | 84916 | 0 | 27 |

There are 27 product classes (target variable 'prdtypecode').

Images

Naming convention: image_<imageid>_product_<productid>.jpg

Example: image_67284_product_365202.jpg
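The naming convention can be turned into a pair of small helpers, one to build a file name from the two IDs and one to recover the IDs from a file name. This is an illustrative sketch; the function names are not part of the project.

```python
import re

# Matches file names of the form image_<imageid>_product_<productid>.jpg
IMAGE_NAME_PATTERN = re.compile(r"^image_(\d+)_product_(\d+)\.jpg$")


def image_filename(imageid, productid):
    """Build the image file name for a row of the CSV data."""
    return f"image_{imageid}_product_{productid}.jpg"


def parse_image_filename(filename):
    """Return (imageid, productid) as integers, or raise ValueError."""
    match = IMAGE_NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected image file name: {filename!r}")
    return int(match.group(1)), int(match.group(2))


print(image_filename(67284, 365202))  # image_67284_product_365202.jpg
print(parse_image_filename("image_67284_product_365202.jpg"))  # (67284, 365202)
```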

Size: 500px x 500px

Format: .jpg

image_train contains 84916 files, none missing

image_test contains 13812 files, none missing
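A check like "none missing" can be sketched as follows: build the expected file name for every (imageid, productid) pair and compare it against the directory contents. `find_missing_images` is a hypothetical helper, and the demo runs against a temporary directory rather than the real image folders.

```python
import tempfile
from pathlib import Path


def find_missing_images(image_dir, id_pairs):
    """Return the expected image file names that are absent from image_dir."""
    # Collect the names once so each lookup is a cheap set membership test.
    present = {path.name for path in Path(image_dir).glob("*.jpg")}
    missing = []
    for imageid, productid in id_pairs:
        name = f"image_{imageid}_product_{productid}.jpg"
        if name not in present:
            missing.append(name)
    return missing


# Demo: one image exists, the second pair has no file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "image_67284_product_365202.jpg").touch()
    missing = find_missing_images(tmp, [(67284, 365202), (1, 2)])

print(missing)  # ['image_1_product_2.jpg']
```

For the real data, the pairs would come from the imageid and productid columns of X_train_update.csv or X_test_update.csv, and `image_dir` would be image_train or image_test; an empty result corresponds to "none missing".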