duckplyr

duckplyr is a drop-in replacement for dplyr that uses DuckDB as its execution engine to speed up data manipulation. Existing dplyr code runs unchanged and returns identical results, while automatically benefiting from DuckDB’s performance optimizations.
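As a minimal sketch of what "drop-in" can look like in practice, the pipeline below uses only ordinary dplyr verbs on a DuckDB-backed data frame. It assumes duckplyr 1.0.0 or later (for `duckdb_tibble()`); the `flights` data and its column names are hypothetical, invented for this example. Whether a given verb runs in DuckDB or falls back to dplyr may depend on the installed version.

```r
library(dplyr)
library(duckplyr)

# Hypothetical toy data, invented for this example
flights <- duckdb_tibble(
  carrier = c("AA", "AA", "UA", "UA", "DL"),
  delay   = c(10, 30, 5, 15, 20)
)

# Ordinary dplyr code: no duckplyr-specific verbs are needed
avg_delay <- flights |>
  filter(delay > 5) |>
  summarise(mean_delay = mean(delay), .by = carrier) |>
  arrange(carrier) |>
  collect()  # materialize the result as a regular tibble

avg_delay
```

The same pipeline would also run against a plain tibble with dplyr alone, which is the point of the drop-in design.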

The package handles larger-than-memory datasets by working directly with Parquet, CSV, and JSON files on disk or at remote HTTP URLs, without loading everything into memory. When DuckDB doesn’t support a specific operation, duckplyr automatically falls back to standard dplyr, so the acceleration is transparent and requires no code changes.
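The file-based workflow described above might be sketched as follows. This assumes duckplyr 1.0.0 or later, where `read_csv_duckdb()` is available (with `read_parquet_duckdb()` and `read_json_duckdb()` as its counterparts for the other formats). The sample file and its columns are hypothetical and written to a temporary path only so the example is self-contained. A remote HTTP source is not shown because it needs network access.

```r
library(dplyr)
library(duckplyr)

# Hypothetical sample file, created here only to keep the example self-contained
path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    region = c("north", "south", "north", "east"),
    amount = c(100, 250, 50, 75)
  ),
  path,
  row.names = FALSE
)

# Reference the file without loading it into an R data frame first
sales <- read_csv_duckdb(path)

totals <- sales |>
  summarise(total = sum(amount), .by = region) |>
  arrange(desc(total)) |>
  collect()  # only the small aggregated result is materialized in R

totals
```

With a file that is genuinely larger than memory, the idea is the same: keep filtering and aggregation before `collect()`, so that only the reduced result is brought into R.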

Contributors

Davis Vaughan

Hadley Wickham

Jeroen Janssens

Mine Çetinkaya-Rundel