Base Models

Initial modeling is performed using Random Forest and XGBoost classifiers. These models are chosen for their robustness and ease of use. Convolutional Neural Networks (CNNs) will be explored at a later stage.

All evaluations below use the preprocessed combined datasets X_train_combined.npy and X_val_combined.npy, which hold approximately 3,600 features per sample, combining reduced TF-IDF text vectors with image features after PCA.
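As a rough sketch, the combined matrices can be loaded directly with NumPy. The label file names below are assumptions; the actual names come from the preprocessing step:

```python
import numpy as np

# Combined TF-IDF + PCA image features (~3,600 columns per sample)
X_train = np.load("X_train_combined.npy")
X_val = np.load("X_val_combined.npy")

# Hypothetical label files -- placeholders, not taken from the text
y_train = np.load("y_train.npy", allow_pickle=True)
y_val = np.load("y_val.npy", allow_pickle=True)
```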


Random Forest

Random Forest is used as a strong baseline due to its ability to handle mixed feature types and its resilience to overfitting. The first evaluation uses 100 estimators and class balancing.
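A minimal sketch of this first configuration (variable names and the fixed random seed are assumptions):

```python
from sklearn.ensemble import RandomForestClassifier

# 100 trees; class weights inversely proportional to class frequencies
rf = RandomForestClassifier(
    n_estimators=100,
    class_weight="balanced",
    n_jobs=-1,
    random_state=42,
)
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_val)
```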

Classification Report (first try, 100 estimators)

| Class | Precision | Recall | F1-score | Support |
|------:|----------:|-------:|---------:|--------:|
| 10 | 0.3832 | 0.5948 | 0.4661 | 612 |
| 40 | 0.7347 | 0.5317 | 0.6169 | 521 |
| 50 | 0.7368 | 0.6667 | 0.7000 | 357 |
| 60 | 0.9214 | 0.8012 | 0.8571 | 161 |
| 1140 | 0.7219 | 0.6549 | 0.6868 | 539 |
| 1160 | 0.8600 | 0.8830 | 0.8713 | 786 |
| 1180 | 0.7683 | 0.4315 | 0.5526 | 146 |
| 1280 | 0.5662 | 0.5338 | 0.5495 | 961 |
| 1281 | 0.5973 | 0.3113 | 0.4093 | 424 |
| 1300 | 0.8432 | 0.8778 | 0.8602 | 974 |
| 1301 | 0.9485 | 0.7633 | 0.8459 | 169 |
| 1302 | 0.8815 | 0.6016 | 0.7151 | 507 |
| 1320 | 0.7294 | 0.5015 | 0.5944 | 672 |
| 1560 | 0.6618 | 0.8095 | 0.7282 | 1013 |
| 1920 | 0.9039 | 0.8728 | 0.8881 | 841 |
| 1940 | 0.6612 | 0.5839 | 0.6202 | 137 |
| 2060 | 0.7104 | 0.7318 | 0.7209 | 1029 |
| 2220 | 0.8361 | 0.6000 | 0.6986 | 170 |
| 2280 | 0.6387 | 0.7771 | 0.7011 | 942 |
| 2403 | 0.7149 | 0.7120 | 0.7134 | 986 |
| 2462 | 0.7492 | 0.7222 | 0.7354 | 306 |
| 2522 | 0.7330 | 0.8254 | 0.7765 | 991 |
| 2582 | 0.7337 | 0.5368 | 0.6200 | 462 |
| 2583 | 0.8013 | 0.9555 | 0.8717 | 2047 |
| 2585 | 0.8451 | 0.4571 | 0.5933 | 525 |
| 2705 | 0.6337 | 0.7195 | 0.6739 | 517 |
| 2905 | 0.9894 | 0.9841 | 0.9867 | 189 |
| Metric | Value |
|---|---:|
| Accuracy | 0.7273 |
| Macro avg Precision | 0.7520 |
| Macro avg Recall | 0.6830 |
| Macro avg F1-score | 0.7057 |
| Weighted avg Precision | 0.7369 |
| Weighted avg Recall | 0.7273 |
| Weighted avg F1-score | 0.7231 |

Random Oversampling

Random oversampling is applied to balance the classes in the training set. The improvement is negligible, as the metrics below show, so the method is abandoned:

| Metric | Value |
|---|---:|
| Accuracy | 0.7388 |
| Macro avg Precision | 0.7498 |
| Macro avg Recall | 0.7017 |
| Macro avg F1-score | 0.7180 |
| Weighted avg Precision | 0.7456 |
| Weighted avg Recall | 0.7388 |
| Weighted avg F1-score | 0.7369 |
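For reference, this kind of oversampling is commonly done with imbalanced-learn; the sketch below assumes its RandomOverSampler, since the text does not name the implementation:

```python
from imblearn.over_sampling import RandomOverSampler

# Duplicate minority-class rows until every class matches the majority count
ros = RandomOverSampler(random_state=42)
X_train_bal, y_train_bal = ros.fit_resample(X_train, y_train)

# Retrain the same Random Forest on the rebalanced training set
rf.fit(X_train_bal, y_train_bal)
```

Only the training set is resampled; the validation set is left untouched so the metrics remain comparable.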

XGBoost

XGBoost is evaluated as a gradient boosting baseline.

Label encoding is applied to the target variable (prdtypecode) because XGBoost requires integer class labels; sklearn's LabelEncoder performs the mapping, and predictions are decoded back to the original label space for reporting.
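A sketch of the encode-train-decode round trip (the hyperparameters are assumptions; only the LabelEncoder usage is described above):

```python
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier

# Map the original prdtypecode values to contiguous integers 0..n_classes-1
le = LabelEncoder()
y_train_enc = le.fit_transform(y_train)

xgb = XGBClassifier(objective="multi:softprob", eval_metric="mlogloss")
xgb.fit(X_train, y_train_enc)

# Decode predictions back to the original prdtypecode labels for reporting
y_pred_xgb = le.inverse_transform(xgb.predict(X_val))
```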

Classification Report

| Class | Precision | Recall | F1-score | Support |
|------:|----------:|-------:|---------:|--------:|
| 10 | 0.6605 | 0.6993 | 0.6794 | 612 |
| 40 | 0.7982 | 0.6910 | 0.7407 | 521 |
| 50 | 0.7699 | 0.7311 | 0.7500 | 357 |
| 60 | 0.9530 | 0.8820 | 0.9161 | 161 |
| 1140 | 0.7293 | 0.7347 | 0.7320 | 539 |
| 1160 | 0.9031 | 0.9135 | 0.9083 | 786 |
| 1180 | 0.8095 | 0.4658 | 0.5913 | 146 |
| 1280 | 0.6589 | 0.7315 | 0.6933 | 961 |
| 1281 | 0.7516 | 0.5637 | 0.6442 | 424 |
| 1300 | 0.9362 | 0.9189 | 0.9275 | 974 |
| 1301 | 0.9521 | 0.8225 | 0.8825 | 169 |
| 1302 | 0.9045 | 0.7101 | 0.7956 | 507 |
| 1320 | 0.7020 | 0.6414 | 0.6703 | 672 |
| 1560 | 0.7785 | 0.8500 | 0.8126 | 1013 |
| 1920 | 0.9242 | 0.9132 | 0.9187 | 841 |
| 1940 | 0.8411 | 0.6569 | 0.7377 | 137 |
| 2060 | 0.7731 | 0.8212 | 0.7964 | 1029 |
| 2220 | 0.8551 | 0.6941 | 0.7662 | 170 |
| 2280 | 0.7768 | 0.8163 | 0.7961 | 942 |
| 2403 | 0.7403 | 0.8327 | 0.7838 | 986 |
| 2462 | 0.8118 | 0.7614 | 0.7858 | 306 |
| 2522 | 0.8627 | 0.8940 | 0.8781 | 991 |
| 2582 | 0.8317 | 0.7273 | 0.7760 | 462 |
| 2583 | 0.9073 | 0.9663 | 0.9359 | 2047 |
| 2585 | 0.7724 | 0.6724 | 0.7189 | 525 |
| 2705 | 0.8464 | 0.9168 | 0.8802 | 517 |
| 2905 | 0.9947 | 0.9841 | 0.9894 | 189 |
| Metric | Value |
|---|---:|
| Accuracy | 0.8159 |
| Macro avg Precision | 0.8239 |
| Macro avg Recall | 0.7782 |
| Macro avg F1-score | 0.7966 |
| Weighted avg Precision | 0.8175 |
| Weighted avg Recall | 0.8159 |
| Weighted avg F1-score | 0.8144 |

Ensemble Voting Classifier Results

A VotingClassifier ensemble was tested using both hard voting (majority vote) and soft voting (probability averaging) to combine Random Forest and XGBoost predictions.
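A minimal sketch of both variants, reusing the models defined above (fitting on the encoded labels because XGBoost expects integer classes):

```python
from sklearn.ensemble import VotingClassifier

# voting="hard" takes a majority vote over predicted labels;
# voting="soft" averages predict_proba outputs across the estimators
ensemble = VotingClassifier(
    estimators=[("rf", rf), ("xgb", xgb)],
    voting="soft",  # switch to "hard" for majority voting
)
ensemble.fit(X_train, y_train_enc)
y_pred_ens = le.inverse_transform(ensemble.predict(X_val))
```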

Hard Voting

Hard voting resulted in lower performance than XGBoost alone; the weaker Random Forest model diluted the stronger XGBoost predictions.

| Metric | Value |
|---|---:|
| Macro avg Precision | 0.7968 |
| Macro avg Recall | 0.7269 |
| Macro avg F1-score | 0.7482 |
| Weighted avg Precision | 0.7897 |
| Weighted avg Recall | 0.7693 |
| Weighted avg F1-score | 0.7693 |

Soft Voting

Soft voting produced results nearly identical to XGBoost alone, indicating that XGBoost dominated the ensemble's predictions.

| Metric | Value |
|---|---:|
| Macro avg Precision | 0.8240 |
| Macro avg Recall | 0.7789 |
| Macro avg F1-score | 0.7967 |
| Weighted avg Precision | 0.8180 |
| Weighted avg Recall | 0.8161 |
| Weighted avg F1-score | 0.8143 |

Summary

Random Forest and XGBoost provide strong baselines for the current feature set, with macro F1-scores of 0.7057 and 0.7966 respectively. XGBoost outperforms Random Forest by roughly 9 percentage points in both macro F1-score and accuracy (82% vs. 73%), indicating that gradient boosting is better suited to this classification task with the current feature representation.

Further improvements may be possible with more advanced models, feature engineering, or deep learning approaches. Simple ensemble methods did not improve over XGBoost's strong performance; future work should focus on model tuning or more sophisticated techniques like convolutional neural networks that can better leverage the image features in the dataset.