embed

The embed package provides preprocessing steps for the recipes package, most of them supervised, that transform categorical and numeric predictors into numeric embeddings. It exists as a separate package because it depends on heavier libraries such as keras3, rstanarm, and lme4.

The package offers several encoding methods for categorical variables (effect encoding via GLM or Bayesian models, neural network embeddings, weight of evidence, and feature hashing) and transformations for numeric predictors (supervised UMAP and tree-based discretization). It is aimed at high-cardinality categorical variables, where one indicator column per level becomes impractical, and at extracting numeric representations that reflect the relationship between predictors and the outcome. Most of the methods are supervised: they use outcome information to create more predictive features.
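To make the idea of effect encoding concrete, here is a minimal, language-agnostic sketch in Python. It is not the embed API: the function name `effect_encode` and the toy data are hypothetical, and it assumes a single categorical predictor with a binary outcome coded as 0/1. Under those assumptions, the log-odds of the positive class within each level match the coefficients of a no-intercept logistic regression on that predictor, which is the kind of value a GLM-based effect encoding substitutes for each level.

```python
import math
from collections import defaultdict


def effect_encode(levels, outcomes):
    """Map each category level to the log-odds of the positive outcome.

    Hypothetical helper for illustration only. Assumes outcomes are 0 or 1.
    """
    # level -> [count of outcome 0, count of outcome 1]
    counts = defaultdict(lambda: [0, 0])
    for level, y in zip(levels, outcomes):
        counts[level][y] += 1

    encoding = {}
    for level, (neg, pos) in counts.items():
        if neg == 0 or pos == 0:
            # The log-odds are infinite when a level has only one class;
            # this sketch refuses instead of guessing a fallback value.
            raise ValueError(f"log-odds undefined for level {level!r}")
        encoding[level] = math.log(pos / neg)
    return encoding


# Toy data: level "a" is mostly positive, level "b" is mostly negative.
levels = ["a", "a", "a", "a", "b", "b", "b", "b"]
outcomes = [1, 1, 1, 0, 1, 0, 0, 0]

encoding = effect_encode(levels, outcomes)

# Replace each level with its numeric encoding: one column, any cardinality.
encoded = [encoding[level] for level in levels]
```

The predictor becomes a single numeric column however many levels it has, which is why this family of encodings suits high-cardinality variables. A practical implementation also needs a rule for levels that never appear in the training data and for levels with very few observations; the Bayesian and mixed-model variants address the latter by shrinking estimates toward an overall mean.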
