Base Models
Initial modeling is performed using Random Forest and XGBoost classifiers. These models are chosen for their robustness and ease of use. Convolutional Neural Networks (CNNs) will be explored at a later stage.
All evaluations below use the preprocessed combined datasets `X_train_combined.npy` and `X_val_combined.npy`, which contain approximately 3,600 features combining reduced TF-IDF vectors and PCA-reduced image features.
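As a point of reference, a minimal sketch of loading these arrays (assuming the files sit in the working directory):

```python
import numpy as np

# Combined feature matrices: reduced TF-IDF vectors concatenated with PCA image features.
X_train = np.load("X_train_combined.npy")
X_val = np.load("X_val_combined.npy")

# Both splits must share the same feature dimension (roughly 3,600 columns).
assert X_train.shape[1] == X_val.shape[1]
print(X_train.shape, X_val.shape)
```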
Random Forest
Random Forest is used as a strong baseline due to its ability to handle mixed feature types and its resilience to overfitting. The first evaluation uses 100 estimators and class balancing.
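A minimal sketch of this first configuration, assuming scikit-learn's `RandomForestClassifier` and label vectors `y_train` / `y_val` (variable names and the random seed are chosen here for illustration):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# 100 trees with class weighting to counter the skewed prdtypecode distribution.
rf = RandomForestClassifier(
    n_estimators=100,
    class_weight="balanced",
    n_jobs=-1,
    random_state=42,
)
rf.fit(X_train, y_train)

# Per-class precision/recall/F1 and the macro/weighted averages reported below.
y_pred_rf = rf.predict(X_val)
print(classification_report(y_val, y_pred_rf, digits=4))
```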
Classification Report (first try, 100 estimators)
Class | Precision | Recall | F1-score | Support |
---|---|---|---|---|
10 | 0.3832 | 0.5948 | 0.4661 | 612 |
40 | 0.7347 | 0.5317 | 0.6169 | 521 |
50 | 0.7368 | 0.6667 | 0.7000 | 357 |
60 | 0.9214 | 0.8012 | 0.8571 | 161 |
1140 | 0.7219 | 0.6549 | 0.6868 | 539 |
1160 | 0.8600 | 0.8830 | 0.8713 | 786 |
1180 | 0.7683 | 0.4315 | 0.5526 | 146 |
1280 | 0.5662 | 0.5338 | 0.5495 | 961 |
1281 | 0.5973 | 0.3113 | 0.4093 | 424 |
1300 | 0.8432 | 0.8778 | 0.8602 | 974 |
1301 | 0.9485 | 0.7633 | 0.8459 | 169 |
1302 | 0.8815 | 0.6016 | 0.7151 | 507 |
1320 | 0.7294 | 0.5015 | 0.5944 | 672 |
1560 | 0.6618 | 0.8095 | 0.7282 | 1013 |
1920 | 0.9039 | 0.8728 | 0.8881 | 841 |
1940 | 0.6612 | 0.5839 | 0.6202 | 137 |
2060 | 0.7104 | 0.7318 | 0.7209 | 1029 |
2220 | 0.8361 | 0.6000 | 0.6986 | 170 |
2280 | 0.6387 | 0.7771 | 0.7011 | 942 |
2403 | 0.7149 | 0.7120 | 0.7134 | 986 |
2462 | 0.7492 | 0.7222 | 0.7354 | 306 |
2522 | 0.7330 | 0.8254 | 0.7765 | 991 |
2582 | 0.7337 | 0.5368 | 0.6200 | 462 |
2583 | 0.8013 | 0.9555 | 0.8717 | 2047 |
2585 | 0.8451 | 0.4571 | 0.5933 | 525 |
2705 | 0.6337 | 0.7195 | 0.6739 | 517 |
2905 | 0.9894 | 0.9841 | 0.9867 | 189 |
Metric | Value |
---|---|
Accuracy | 0.7273 |
Macro avg Precision | 0.7520 |
Macro avg Recall | 0.6830 |
Macro avg F1-score | 0.7057 |
Weighted avg Precision | 0.7369 |
Weighted avg Recall | 0.7273 |
Weighted avg F1-score | 0.7231 |
Random Oversampling
Random oversampling is applied to balance the classes in the training set. As the table below shows, the improvement over the baseline Random Forest is negligible, so the method is not pursued further:
Metric | Value |
---|---|
Accuracy | 0.7388 |
Macro avg Precision | 0.7498 |
Macro avg Recall | 0.7017 |
Macro avg F1-score | 0.7180 |
Weighted avg Precision | 0.7456 |
Weighted avg Recall | 0.7388 |
Weighted avg F1-score | 0.7369 |
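For completeness, a sketch of the oversampling step, assuming imbalanced-learn's `RandomOverSampler` (the section only states that random oversampling was applied, so the exact library and seed are assumptions):

```python
from imblearn.over_sampling import RandomOverSampler

# Duplicate minority-class rows until every class matches the majority-class count.
ros = RandomOverSampler(random_state=42)
X_train_res, y_train_res = ros.fit_resample(X_train, y_train)

# Only the training split is resampled; the validation split is left untouched,
# and the same 100-tree Random Forest is refit on the balanced data.
rf.fit(X_train_res, y_train_res)
```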
XGBoost
XGBoost is evaluated as a gradient boosting baseline.
Label encoding is applied to the target variable (`prdtypecode`) to ensure compatibility with XGBoost, which requires integer class labels. Encoding is handled with scikit-learn's `LabelEncoder`; after prediction, results are decoded back to the original label space for reporting.
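A sketch of the encoding round-trip with the XGBoost scikit-learn wrapper (hyperparameters are not specified in this section, so defaults are assumed):

```python
from sklearn.metrics import classification_report
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier

# XGBoost expects contiguous integer labels 0..n_classes-1 rather than raw prdtypecode values.
le = LabelEncoder()
y_train_enc = le.fit_transform(y_train)

xgb = XGBClassifier(eval_metric="mlogloss", n_jobs=-1)
xgb.fit(X_train, y_train_enc)

# Decode predictions back to the original prdtypecode labels for reporting.
y_pred_xgb = le.inverse_transform(xgb.predict(X_val))
print(classification_report(y_val, y_pred_xgb, digits=4))
```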
Classification Report
Class | Precision | Recall | F1-score | Support |
---|---|---|---|---|
10 | 0.6605 | 0.6993 | 0.6794 | 612 |
40 | 0.7982 | 0.6910 | 0.7407 | 521 |
50 | 0.7699 | 0.7311 | 0.7500 | 357 |
60 | 0.9530 | 0.8820 | 0.9161 | 161 |
1140 | 0.7293 | 0.7347 | 0.7320 | 539 |
1160 | 0.9031 | 0.9135 | 0.9083 | 786 |
1180 | 0.8095 | 0.4658 | 0.5913 | 146 |
1280 | 0.6589 | 0.7315 | 0.6933 | 961 |
1281 | 0.7516 | 0.5637 | 0.6442 | 424 |
1300 | 0.9362 | 0.9189 | 0.9275 | 974 |
1301 | 0.9521 | 0.8225 | 0.8825 | 169 |
1302 | 0.9045 | 0.7101 | 0.7956 | 507 |
1320 | 0.7020 | 0.6414 | 0.6703 | 672 |
1560 | 0.7785 | 0.8500 | 0.8126 | 1013 |
1920 | 0.9242 | 0.9132 | 0.9187 | 841 |
1940 | 0.8411 | 0.6569 | 0.7377 | 137 |
2060 | 0.7731 | 0.8212 | 0.7964 | 1029 |
2220 | 0.8551 | 0.6941 | 0.7662 | 170 |
2280 | 0.7768 | 0.8163 | 0.7961 | 942 |
2403 | 0.7403 | 0.8327 | 0.7838 | 986 |
2462 | 0.8118 | 0.7614 | 0.7858 | 306 |
2522 | 0.8627 | 0.8940 | 0.8781 | 991 |
2582 | 0.8317 | 0.7273 | 0.7760 | 462 |
2583 | 0.9073 | 0.9663 | 0.9359 | 2047 |
2585 | 0.7724 | 0.6724 | 0.7189 | 525 |
2705 | 0.8464 | 0.9168 | 0.8802 | 517 |
2905 | 0.9947 | 0.9841 | 0.9894 | 189 |
Metric | Value |
---|---|
Accuracy | 0.8159 |
Macro avg Precision | 0.8239 |
Macro avg Recall | 0.7782 |
Macro avg F1-score | 0.7966 |
Weighted avg Precision | 0.8175 |
Weighted avg Recall | 0.8159 |
Weighted avg F1-score | 0.8144 |
Ensemble Voting Classifier Results
A VotingClassifier ensemble was tested using both hard voting (majority vote) and soft voting (probability averaging) to combine Random Forest and XGBoost predictions.
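A sketch of the two variants, reusing the estimators and label encoder defined above (both base models are refit inside the ensemble, here on the encoded labels for consistency):

```python
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import classification_report

estimators = [("rf", rf), ("xgb", xgb)]

# Hard voting: each model casts one vote per sample and the majority class wins.
hard_vote = VotingClassifier(estimators=estimators, voting="hard", n_jobs=-1)
# Soft voting: predicted class probabilities are averaged before taking the argmax.
soft_vote = VotingClassifier(estimators=estimators, voting="soft", n_jobs=-1)

for name, model in [("hard", hard_vote), ("soft", soft_vote)]:
    model.fit(X_train, y_train_enc)
    y_pred = le.inverse_transform(model.predict(X_val))
    print(name, "voting")
    print(classification_report(y_val, y_pred, digits=4))
```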
Hard Voting
Hard voting resulted in lower performance than XGBoost alone, as the weaker Random Forest votes diluted the stronger XGBoost predictions.
Metric | Value |
---|---|
Macro avg Precision | 0.7968 |
Macro avg Recall | 0.7269 |
Macro avg F1-score | 0.7482 |
Weighted avg Precision | 0.7897 |
Weighted avg Recall | 0.7693 |
Weighted avg F1-score | 0.7693 |
Soft Voting
Soft voting produced results nearly identical to XGBoost alone, indicating that XGBoost dominated the ensemble's averaged predictions.
Metric | Value |
---|---|
Macro avg Precision | 0.8240 |
Macro avg Recall | 0.7789 |
Macro avg F1-score | 0.7967 |
Weighted avg Precision | 0.8180 |
Weighted avg Recall | 0.8161 |
Weighted avg F1-score | 0.8143 |
Summary
Random Forest and XGBoost provide strong baselines for the current feature set, with macro F1 scores of 0.7057 and 0.7966 respectively. XGBoost significantly outperforms Random Forest by about 9 percentage points in macro F1-score and nearly 9 percentage points in accuracy (82% vs 73%). This substantial performance gap indicates that gradient boosting approaches are better suited to this classification task with the current feature representation.
Further improvements may be possible with more advanced models, feature engineering, or deep learning approaches. Simple ensemble methods did not improve over XGBoost's strong performance; future work should focus on model tuning or more sophisticated techniques like convolutional neural networks that can better leverage the image features in the dataset.