Extra recipes steps for dealing with unbalanced data
R
themis provides preprocessing steps for the recipes package that handle imbalanced classification data. It implements multiple over-sampling and under-sampling algorithms to balance class distributions before model training.
The package includes sampling techniques such as SMOTE and ADASYN, which generate synthetic minority-class examples, and Tomek links, which removes selected majority-class examples. These methods integrate directly into recipes workflows and support multi-class problems with tunable sampling ratios. themis addresses the common problem where machine learning models perform poorly on minority classes because the training data is unbalanced.
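As a rough illustration of how a sampling step fits into a recipe, the sketch below applies `step_smote()` to a small made-up data set. The data frame `toy` and its columns `x1`, `x2`, and `class` are hypothetical and exist only for this example; `over_ratio = 1` asks for the minority class to be brought up to the size of the majority class.

```r
library(recipes)
library(themis)

# Hypothetical toy data: 90 "no" rows and 10 "yes" rows (not from the package).
set.seed(1)
toy <- data.frame(
  x1 = rnorm(100),
  x2 = rnorm(100),
  class = factor(rep(c("no", "yes"), times = c(90, 10)))
)

# Define the recipe and add a SMOTE over-sampling step on the outcome column.
# SMOTE needs numeric predictors with no missing values.
rec <- recipe(class ~ ., data = toy)
rec <- step_smote(rec, class, over_ratio = 1)

# Estimate the step, then extract the processed training data.
prepped <- prep(rec)
balanced <- bake(prepped, new_data = NULL)

# Class counts after over-sampling.
print(table(balanced$class))
```

Sampling steps in themis default to `skip = TRUE`, so the re-balancing is applied when the recipe is prepared on training data but not when it is baked on new data such as a test set.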