Base Models
Initial modeling is performed using Random Forest and XGBoost classifiers. These models are chosen for their robustness and ease of use. Convolutional Neural Networks (CNNs) will be explored at a later stage.
All evaluations below use the preprocessed combined datasets `X_train_combined.npy` and `X_val_combined.npy`, which contain approximately 3,600 features combining reduced TF-IDF vectors and PCA-reduced image features.
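As a point of reference, a minimal sketch of loading these arrays (assuming the files sit in the working directory):

```python
import numpy as np

# Combined feature matrices: reduced TF-IDF vectors concatenated with PCA image features.
X_train = np.load("X_train_combined.npy")
X_val = np.load("X_val_combined.npy")

# Both splits must share the same feature dimension (roughly 3,600 columns).
assert X_train.shape[1] == X_val.shape[1]
print(X_train.shape, X_val.shape)
```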
Random Forest
Random Forest is used as a strong baseline due to its ability to handle mixed feature types and its resilience to overfitting. The first evaluation uses 100 estimators and class balancing.
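A minimal sketch of this first configuration, assuming scikit-learn's `RandomForestClassifier` and label vectors `y_train` / `y_val` (variable names and the random seed are chosen here for illustration):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# 100 trees with class weighting to counter the skewed prdtypecode distribution.
rf = RandomForestClassifier(
    n_estimators=100,
    class_weight="balanced",
    n_jobs=-1,
    random_state=42,
)
rf.fit(X_train, y_train)

# Per-class precision/recall/F1 and the macro/weighted averages reported below.
y_pred_rf = rf.predict(X_val)
print(classification_report(y_val, y_pred_rf, digits=4))
```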
Classification Report (first try, 100 estimators)
Class | Precision | Recall | F1-score | Support |
---|---|---|---|---|
10 | 0.3832 | 0.5948 | 0.4661 | 612 |
40 | 0.7347 | 0.5317 | 0.6169 | 521 |
50 | 0.7368 | 0.6667 | 0.7000 | 357 |
60 | 0.9214 | 0.8012 | 0.8571 | 161 |
1140 | 0.7219 | 0.6549 | 0.6868 | 539 |
1160 | 0.8600 | 0.8830 | 0.8713 | 786 |
1180 | 0.7683 | 0.4315 | 0.5526 | 146 |
1280 | 0.5662 | 0.5338 | 0.5495 | 961 |
1281 | 0.5973 | 0.3113 | 0.4093 | 424 |
1300 | 0.8432 | 0.8778 | 0.8602 | 974 |
1301 | 0.9485 | 0.7633 | 0.8459 | 169 |
1302 | 0.8815 | 0.6016 | 0.7151 | 507 |
1320 | 0.7294 | 0.5015 | 0.5944 | 672 |
1560 | 0.6618 | 0.8095 | 0.7282 | 1013 |
1920 | 0.9039 | 0.8728 | 0.8881 | 841 |
1940 | 0.6612 | 0.5839 | 0.6202 | 137 |
2060 | 0.7104 | 0.7318 | 0.7209 | 1029 |
2220 | 0.8361 | 0.6000 | 0.6986 | 170 |
2280 | 0.6387 | 0.7771 | 0.7011 | 942 |
2403 | 0.7149 | 0.7120 | 0.7134 | 986 |
2462 | 0.7492 | 0.7222 | 0.7354 | 306 |
2522 | 0.7330 | 0.8254 | 0.7765 | 991 |
2582 | 0.7337 | 0.5368 | 0.6200 | 462 |
2583 | 0.8013 | 0.9555 | 0.8717 | 2047 |
2585 | 0.8451 | 0.4571 | 0.5933 | 525 |
2705 | 0.6337 | 0.7195 | 0.6739 | 517 |
2905 | 0.9894 | 0.9841 | 0.9867 | 189 |
Metric | Value |
---|---|
Accuracy | 0.7273 |
Macro avg Precision | 0.7520 |
Macro avg Recall | 0.6830 |
Macro avg F1-score | 0.7057 |
Weighted avg Precision | 0.7369 |
Weighted avg Recall | 0.7273 |
Weighted avg F1-score | 0.7231 |
Random Oversampling
Random oversampling is applied to balance the classes in the training set. As the table below shows, the improvement over the baseline Random Forest is negligible, so the method is not pursued further:
Metric | Value |
---|---|
Accuracy | 0.7388 |
Macro avg Precision | 0.7498 |
Macro avg Recall | 0.7017 |
Macro avg F1-score | 0.7180 |
Weighted avg Precision | 0.7456 |
Weighted avg Recall | 0.7388 |
Weighted avg F1-score | 0.7369 |
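For completeness, a sketch of the oversampling step, assuming imbalanced-learn's `RandomOverSampler` (the section only states that random oversampling was applied, so the exact library and seed are assumptions):

```python
from imblearn.over_sampling import RandomOverSampler

# Duplicate minority-class rows until every class matches the majority-class count.
ros = RandomOverSampler(random_state=42)
X_train_res, y_train_res = ros.fit_resample(X_train, y_train)

# Only the training split is resampled; the validation split is left untouched,
# and the same 100-tree Random Forest is refit on the balanced data.
rf.fit(X_train_res, y_train_res)
```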
XGBoost
XGBoost is evaluated as a gradient boosting baseline.
Label encoding is applied to the target variable (`prdtypecode`) to ensure compatibility with XGBoost, which requires integer class labels. Encoding is handled with scikit-learn's `LabelEncoder`; after prediction, results are decoded back to the original label space for reporting.
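A sketch of the encoding round-trip with the XGBoost scikit-learn wrapper (hyperparameters are not specified in this section, so defaults are assumed):

```python
from sklearn.metrics import classification_report
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier

# XGBoost expects contiguous integer labels 0..n_classes-1 rather than raw prdtypecode values.
le = LabelEncoder()
y_train_enc = le.fit_transform(y_train)

xgb = XGBClassifier(eval_metric="mlogloss", n_jobs=-1)
xgb.fit(X_train, y_train_enc)

# Decode predictions back to the original prdtypecode labels for reporting.
y_pred_xgb = le.inverse_transform(xgb.predict(X_val))
print(classification_report(y_val, y_pred_xgb, digits=4))
```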
Classification Report
Class | Precision | Recall | F1-score | Support |
---|---|---|---|---|
10 | 0.6605 | 0.6993 | 0.6794 | 612 |
40 | 0.7982 | 0.6910 | 0.7407 | 521 |
50 | 0.7699 | 0.7311 | 0.7500 | 357 |
60 | 0.9530 | 0.8820 | 0.9161 | 161 |
1140 | 0.7293 | 0.7347 | 0.7320 | 539 |
1160 | 0.9031 | 0.9135 | 0.9083 | 786 |
1180 | 0.8095 | 0.4658 | 0.5913 | 146 |
1280 | 0.6589 | 0.7315 | 0.6933 | 961 |
1281 | 0.7516 | 0.5637 | 0.6442 | 424 |
1300 | 0.9362 | 0.9189 | 0.9275 | 974 |
1301 | 0.9521 | 0.8225 | 0.8825 | 169 |
1302 | 0.9045 | 0.7101 | 0.7956 | 507 |
1320 | 0.7020 | 0.6414 | 0.6703 | 672 |
1560 | 0.7785 | 0.8500 | 0.8126 | 1013 |
1920 | 0.9242 | 0.9132 | 0.9187 | 841 |
1940 | 0.8411 | 0.6569 | 0.7377 | 137 |
2060 | 0.7731 | 0.8212 | 0.7964 | 1029 |
2220 | 0.8551 | 0.6941 | 0.7662 | 170 |
2280 | 0.7768 | 0.8163 | 0.7961 | 942 |
2403 | 0.7403 | 0.8327 | 0.7838 | 986 |
2462 | 0.8118 | 0.7614 | 0.7858 | 306 |
2522 | 0.8627 | 0.8940 | 0.8781 | 991 |
2582 | 0.8317 | 0.7273 | 0.7760 | 462 |
2583 | 0.9073 | 0.9663 | 0.9359 | 2047 |
2585 | 0.7724 | 0.6724 | 0.7189 | 525 |
2705 | 0.8464 | 0.9168 | 0.8802 | 517 |
2905 | 0.9947 | 0.9841 | 0.9894 | 189 |
Metric | Value |
---|---|
Accuracy | 0.8159 |
Macro avg Precision | 0.8239 |
Macro avg Recall | 0.7782 |
Macro avg F1-score | 0.7966 |
Weighted avg Precision | 0.8175 |
Weighted avg Recall | 0.8159 |
Weighted avg F1-score | 0.8144 |
Ensemble Voting Classifier Results
A VotingClassifier ensemble was tested using both hard voting (majority vote) and soft voting (probability averaging) to combine Random Forest and XGBoost predictions.
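A sketch of the two variants, reusing the estimators and label encoder defined above (both base models are refit inside the ensemble, here on the encoded labels for consistency):

```python
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import classification_report

estimators = [("rf", rf), ("xgb", xgb)]

# Hard voting: each model casts one vote per sample and the majority class wins.
hard_vote = VotingClassifier(estimators=estimators, voting="hard", n_jobs=-1)
# Soft voting: predicted class probabilities are averaged before taking the argmax.
soft_vote = VotingClassifier(estimators=estimators, voting="soft", n_jobs=-1)

for name, model in [("hard", hard_vote), ("soft", soft_vote)]:
    model.fit(X_train, y_train_enc)
    y_pred = le.inverse_transform(model.predict(X_val))
    print(name, "voting")
    print(classification_report(y_val, y_pred, digits=4))
```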
Hard Voting
Hard voting resulted in lower performance than XGBoost alone, as the weaker Random Forest votes diluted the stronger XGBoost predictions.
Metric | Value |
---|---|
Macro avg Precision | 0.7968 |
Macro avg Recall | 0.7269 |
Macro avg F1-score | 0.7482 |
Weighted avg Precision | 0.7897 |
Weighted avg Recall | 0.7693 |
Weighted avg F1-score | 0.7693 |
Soft Voting
Soft voting produced results nearly identical to XGBoost alone, indicating that XGBoost dominated the ensemble's averaged predictions.
Metric | Value |
---|---|
Macro avg Precision | 0.8240 |
Macro avg Recall | 0.7789 |
Macro avg F1-score | 0.7967 |
Weighted avg Precision | 0.8180 |
Weighted avg Recall | 0.8161 |
Weighted avg F1-score | 0.8143 |
Summary
Random Forest and XGBoost provide strong baselines for the current feature set, with macro F1 scores of 0.7057 and 0.7966 respectively. XGBoost significantly outperforms Random Forest by about 9 percentage points in macro F1-score and nearly 9 percentage points in accuracy (82% vs 73%). This substantial performance gap indicates that gradient boosting approaches are better suited to this classification task with the current feature representation.
Further improvements may be possible with more advanced models, feature engineering, or deep learning approaches. Simple ensemble methods did not improve over XGBoost's strong performance; future work should focus on model tuning or more sophisticated techniques like convolutional neural networks that can better leverage the image features in the dataset.