Class distribution bar chart

[Figure: bar chart of the prdtypecode class distribution]

Description and analysis

The bar chart shows the distribution of product types (prdtypecode) in the dataset. Each bar represents the relative frequency of one product type. The dataset contains 27 classes, and the distribution across them is markedly imbalanced.

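| Product Type Code | Percentage |
|---|---|
| 2583 | 12.02% |
| 1560 | 5.97% |
| 1300 | 5.94% |
| 2060 | 5.88% |
| 2522 | 5.88% |
| 1280 | 5.74% |
| 2403 | 5.62% |
| 2280 | 5.61% |
| 1920 | 5.07% |
| 1160 | 4.66% |
| 1320 | 3.82% |
| 10 | 3.67% |
| 2705 | 3.25% |
| 1140 | 3.15% |
| 2582 | 3.05% |
| 40 | 2.95% |
| 2585 | 2.94% |
| 1302 | 2.93% |
| 1281 | 2.44% |
| 50 | 1.98% |
| 2462 | 1.67% |
| 2905 | 1.03% |
| 60 | 0.98% |
| 2220 | 0.97% |
| 1301 | 0.95% |
| 1940 | 0.95% |
| 1180 | 0.90% |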
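
Percentages like these can be derived from the raw labels with a plain frequency count. The sketch below is only an illustration, not the code used to produce the table: the function name `class_percentages` and the toy label list are assumptions. For a pandas Series of labels, `Series.value_counts(normalize=True)` yields the same proportions.

```python
from collections import Counter

def class_percentages(labels):
    """Return {class: percentage}, ordered from most to least frequent."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        cls: round(100 * n / total, 2)
        for cls, n in counts.most_common()
    }

# Toy labels for illustration only (not the real prdtypecode column).
toy_labels = [2583] * 6 + [1560] * 3 + [10] * 1
print(class_percentages(toy_labels))
```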

Product type 2583 is the most frequent class at about 12% of the data, roughly twice the share of the next most frequent type (1560, 5.97%).

Several product types, such as 1560, 1300, 2060, and 2522, have similar proportions around 5-6%.

Five product types (60, 2220, 1301, 1940, and 1180) each account for less than 1% of the data, and 2905 is only slightly above that at 1.03%.

Validation

A chi-square goodness-of-fit test against a uniform distribution over the 27 product types shows a statistically significant deviation, which confirms the class imbalance.

Chi-square statistic: 36570.33

p-value: 0.0
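
How the statistic was computed is not stated here, so the following is a minimal sketch of a chi-square goodness-of-fit statistic against a uniform distribution. The function name `chi_square_uniform` and the toy counts are assumptions. In practice, `scipy.stats.chisquare` called with only the observed frequencies tests against equal expected frequencies and also returns a p-value.

```python
def chi_square_uniform(counts):
    """Chi-square goodness-of-fit statistic against a uniform distribution.

    counts: observed frequency per class.
    Returns (statistic, degrees_of_freedom).
    """
    k = len(counts)
    n = sum(counts)
    expected = n / k  # under uniformity every class has the same expected count
    statistic = sum((obs - expected) ** 2 / expected for obs in counts)
    return statistic, k - 1

# Toy counts for illustration only (not the real class frequencies).
stat, dof = chi_square_uniform([50, 30, 20])
print(f"chi-square = {stat:.2f}, degrees of freedom = {dof}")
```

With 27 classes the test has 26 degrees of freedom.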

INFO

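A p-value of 0.0 means the value is too small to be represented as a floating-point number, not that it is exactly zero. Because X_train is very large, the chi-square test flags even modest deviations from uniformity as highly significant, so the size of the statistic and the p-value should not be read as a direct measure of how strong the imbalance is.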
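
One way to separate the strength of the deviation from the sample size is an effect size such as Cohen's w, which equals sqrt(chi-square / N) and can be computed from the class proportions alone. The sketch below applies it to the rounded percentages listed in the table; it is an added illustration, not part of the original analysis, and the function name `cohens_w_uniform` is an assumption.

```python
import math

# Percentages copied from the class distribution table (rounded to two decimals).
percentages = [
    12.02, 5.97, 5.94, 5.88, 5.88, 5.74, 5.62, 5.61, 5.07,
    4.66, 3.82, 3.67, 3.25, 3.15, 3.05, 2.95, 2.94, 2.93,
    2.44, 1.98, 1.67, 1.03, 0.98, 0.97, 0.95, 0.95, 0.90,
]

def cohens_w_uniform(props):
    """Cohen's w for observed proportions versus a uniform distribution."""
    k = len(props)
    p0 = 1 / k  # expected proportion of every class under uniformity
    return math.sqrt(sum((p - p0) ** 2 / p0 for p in props))

total = sum(percentages)
props = [p / total for p in percentages]  # renormalise to absorb rounding error
w = cohens_w_uniform(props)
print(f"Cohen's w = {w:.2f}")
```

For the rounded percentages this evaluates to roughly 0.66. By Cohen's conventional thresholds (0.1 small, 0.3 medium, 0.5 large) that would count as a large effect, which suggests the imbalance is substantial and not only a by-product of sample size, although these thresholds are rules of thumb.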

Business relevance

Product types with higher frequencies may indicate strong market demand or a broader product range offered by the business. This can guide inventory management and marketing strategies.

Less frequent product types might represent niche markets or emerging trends. The business could explore expanding offerings in these areas to capture new market segments.

Class imbalance could pose risks in predictive modeling, potentially leading to inaccurate forecasts and thus errors in sourcing/purchasing.