| Title: | Translating from R to Python's Pandas Package |
|---|---|
| Description: | Provides an R interface to Python's 'pandas' library using non-standard evaluation. Users can write R code (e.g., rp_filter(), rp_select(), rp_mutate()) that is translated into pandas commands and executed via 'reticulate'. Supports chaining, grouping, and 'summarisation', and includes a 'table_name' parameter to generate 'copy-pasteable' Python code. Ideal for leveraging pandas' speed and flexibility within the R ecosystem. |
| Authors: | Akshat Maurya [aut, cre], Rihaan Satia [aut], David Shilane [aut] |
| Maintainer: | Akshat Maurya <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.4 |
| Built: | 2026-05-29 09:06:53 UTC |
| Source: | https://github.com/akshat09867/rpandas |
This package provides a set of wrapper functions that allow R users to interact with Python's pandas library using familiar R syntax.
Maintainer: Akshat Maurya [email protected]
Authors:
Rihaan Satia
David Shilane [email protected]
Provides the opposite of the standard R %in% operator.
x %notin% y
| Argument | Description |
|---|---|
| x | Vector of values to be matched. |
| y | Vector of values to be matched against. |
A logical vector of the same length as x, TRUE where an element of x does not appear in y.
"a" %notin% c("b", "c")
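Because %notin% is documented as the opposite of %in%, it should behave like !(x %in% y). The base-R sketch below defines an operator with that behavior. It is an assumption about the semantics, not necessarily the package's own source.

```r
# A possible definition: negate the result of the base %in% operator.
`%notin%` <- function(x, y) !(x %in% y)

"a" %notin% c("b", "c")      # TRUE
c(1, 2, 3) %notin% c(2, 4)   # TRUE FALSE TRUE
```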
Applies a list of summary functions to a list of columns, after optionally grouping the data.
rp_calculate(.data, ..., the.functions, .by = NULL, table_name = NULL, return.as = "result")
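| Argument | Description |
|---|---|
| .data | An R data.frame. |
| ... | Bare column names to summarize. |
| the.functions | A character vector of names of R functions to apply to each column. |
| .by | A bare column name, or several combined with c(), to group by before summarizing. Defaults to NULL (no grouping). |
| table_name | An optional character string. If provided, the generated Python code uses this string in place of the internal data frame name, so the code can be copied and pasted directly. |
| return.as | What to return: "result", "code", or "all". |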
A data.frame with the summarized and grouped data.
if (reticulate::py_available(initialize = TRUE) && reticulate::py_module_available("pandas")) {
  rp_calculate(ggplot2::diamonds, price, carat, the.functions = c("mean", "sd"), .by = cut)
}
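rp_calculate() applies every function in the.functions to every column passed through .... To cross-check what the grouped result should contain, the base-R sketch below computes the same kind of table with aggregate(). It rests on several assumptions: calculate_sketch() is a hypothetical helper, it uses the built-in mtcars data instead of ggplot2::diamonds so no extra packages are needed, it requires a grouping column, and its "column.function" naming (for example mpg.mean) may differ from the column names the package produces.

```r
# Hypothetical helper: apply each named function to each column within groups.
calculate_sketch <- function(df, cols, funs, by) {
  # For one column subset, return a named vector such as c(mean = ..., sd = ...).
  apply_all <- function(v) sapply(funs, function(fn) match.fun(fn)(v))
  res <- aggregate(df[cols], by = df[by], FUN = apply_all)
  # aggregate() stores multi-value results as matrix columns; flatten them
  # into ordinary columns named like "mpg.mean" and "mpg.sd".
  do.call(data.frame, res)
}

calculate_sketch(mtcars, cols = c("mpg", "hp"), funs = c("mean", "sd"), by = "cyl")
```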
This function checks whether Python and the pandas library are correctly configured on the user's system. If a dependency is missing, the behavior depends on the session: in an interactive session, it stops with a detailed diagnostic report and instructions for fixing the problem; in a non-interactive context (e.g., CRAN checks), it issues a warning and returns FALSE.
rp_check_env()
Invisibly returns TRUE if all checks pass, otherwise FALSE.
This function returns the number of rows in a data frame. When grouping
variables are provided via .by, it returns the row counts for each group.
rp_count(.data, .by = NULL, table_name = NULL, return.as = "result")
| Argument | Description |
|---|---|
| .data | An R data frame (or tibble) to be processed. |
| .by | Optional grouping variables: one or more unquoted column names. |
| table_name | An optional character string. If provided, the generated Python code uses this string in place of the internal data frame name, so the code can be copied and pasted directly. |
| return.as | One of "result", "code", or "all". |
If .by = NULL, a data frame with a single column "n" holding the total row count; otherwise, a data frame with the grouping columns and a column "n" holding the per-group counts.
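Both documented return shapes can be reproduced in base R, which helps when checking rp_count() output. This is an assumption-based illustration rather than the package's code: count_sketch() is a hypothetical helper, it takes grouping columns as a character vector instead of bare names, and it uses the built-in mtcars data.

```r
# Hypothetical helper mirroring the documented output: a data frame with "n".
count_sketch <- function(df, by = NULL) {
  if (is.null(by)) {
    return(data.frame(n = nrow(df)))
  }
  # Sum a column of ones within each group to get per-group row counts.
  ones <- data.frame(n = rep(1L, nrow(df)))
  aggregate(ones, by = df[by], FUN = sum)
}

count_sketch(mtcars)                        # one row: n = 32
count_sketch(mtcars, by = c("cyl", "am"))   # grouping columns plus n
```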
Filters a data frame using an R expression translated to pandas.
rp_filter(.data, filter_expression, table_name = NULL, return.as = "result")
| Argument | Description |
|---|---|
| .data | An R data.frame or tibble. |
| filter_expression | The filtering expression, written in R syntax. |
| table_name | An optional character string. If provided, the generated Python code uses this string in place of the internal data frame name, so the code can be copied and pasted directly. |
| return.as | What to return: "result", "code", or "all". |
A data.frame containing the filtered rows.
if (reticulate::py_available(initialize = TRUE) && reticulate::py_module_available("pandas")) {
  rp_filter(ggplot2::diamonds, carat > 1 & price < 4000)
}
This function returns the first k rows of the data frame. If grouping variables
are provided via .by, it returns the first k rows within each group.
rp_first_k_rows(.data, k, .by = NULL, table_name = NULL, return.as = "result")
| Argument | Description |
|---|---|
| .data | An R data frame (or tibble) to be processed. |
| k | An integer specifying the number of rows to return. |
| .by | Optional grouping variables: one or more unquoted column names. When provided, the operation is performed on each group separately. |
| table_name | An optional character string. If provided, the generated Python code uses this string in place of the internal data frame name, so the code can be copied and pasted directly. |
| return.as | One of "result", "code", or "all". |
Depending on return.as: a data frame, a character string, or a list.
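One way to express "the first k rows within each group" in base R is to number the rows inside each group and keep those numbered k or less. The sketch below keeps the original row order, as pandas' GroupBy.head() does; whether rp_first_k_rows() orders its output the same way is an assumption. first_k_sketch() is a hypothetical helper that takes character column names and is shown on the built-in mtcars data.

```r
# Hypothetical helper: first k rows overall, or within each group.
first_k_sketch <- function(df, k, by = NULL) {
  if (is.null(by)) {
    return(head(df, k))
  }
  # Combine the grouping columns into a single factor.
  grp <- interaction(df[by], drop = TRUE)
  # Position of each row within its own group: 1, 2, 3, ...
  pos <- ave(seq_len(nrow(df)), grp, FUN = seq_along)
  df[pos <= k, , drop = FALSE]
}

first_k_sketch(mtcars, k = 2, by = "cyl")  # 6 rows, in their original order
```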
This function returns the last k rows of the data frame. If grouping variables
are provided via .by, it returns the last k rows within each group.
rp_last_k_rows(.data, k, .by = NULL, table_name = NULL, return.as = "result")
| Argument | Description |
|---|---|
| .data | An R data frame (or tibble) to be processed. |
| k | An integer specifying the number of rows to return. |
| .by | Optional grouping variables: one or more unquoted column names. When provided, the operation is performed on each group separately. |
| table_name | An optional character string. If provided, the generated Python code uses this string in place of the internal data frame name, so the code can be copied and pasted directly. |
| return.as | One of "result", "code", or "all". |
Depending on return.as: a data frame, a character string, or a list.
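The last-k case mirrors the first-k case, except that rows are numbered from the end of each group so that the final row in each group gets position 1. As before, this is a base-R illustration under assumptions: last_k_sketch() is hypothetical, takes character column names, keeps the original row order (the package's ordering is not documented here), and uses mtcars.

```r
# Hypothetical helper: last k rows overall, or within each group.
last_k_sketch <- function(df, k, by = NULL) {
  if (is.null(by)) {
    return(tail(df, k))
  }
  grp <- interaction(df[by], drop = TRUE)
  # Position counted from the end of each group: the last row gets 1.
  pos_from_end <- ave(seq_len(nrow(df)), grp, FUN = function(i) rev(seq_along(i)))
  df[pos_from_end <= k, , drop = FALSE]
}

last_k_sketch(mtcars, k = 2, by = "cyl")
```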
Adds, modifies, or removes columns using pandas.
rp_mutate(.data, to_remove = NULL, ..., table_name = NULL, return.as = "result")
| Argument | Description |
|---|---|
| .data | An R data frame. |
| to_remove | A character vector of column names to remove. |
| ... | Named expressions for new or modified columns. |
| table_name | An optional character string. If provided, the generated Python code uses this string in place of the internal data frame name, so the code can be copied and pasted directly. |
| return.as | One of "result", "code", or "all". |
A data frame or list depending on return.as.
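Because rp_mutate() receives named expressions through ..., it must capture them unevaluated (non-standard evaluation) before translating them. The base-R sketch below shows that capture step with substitute() and then, instead of translating to pandas, evaluates each expression against the data frame. mutate_sketch() is hypothetical, and the order of operations (new columns first, then removal of to_remove) is an assumption.

```r
# Hypothetical helper: capture named expressions, add them, then drop columns.
mutate_sketch <- function(df, to_remove = NULL, ...) {
  # Capture the named expressions without evaluating them.
  exprs <- as.list(substitute(list(...)))[-1]
  for (nm in names(exprs)) {
    # Evaluate with the data frame's columns in scope; later expressions
    # can refer to columns created by earlier ones.
    df[[nm]] <- eval(exprs[[nm]], df, parent.frame())
  }
  df[setdiff(names(df), to_remove)]
}

# Approximate kilometres per litre from miles per gallon, then round it.
res <- mutate_sketch(mtcars, to_remove = c("vs", "am"),
                     kpl = mpg * 0.425, kpl_rounded = round(kpl, 1))
head(res)
```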
Selects specific columns from a data frame. It captures the bare column names and translates the operation into a pandas selection command.
rp_select(.data, ..., table_name = NULL, return.as = "result")
| Argument | Description |
|---|---|
| .data | An R data.frame or tibble. |
| ... | The bare column names to select. |
| table_name | An optional character string. If provided, the generated Python code uses this string in place of the internal data frame name, so the code can be copied and pasted directly. |
| return.as | What to return: "result", "code", or "all". |
A data.frame containing only the selected columns.
if (reticulate::py_available(initialize = TRUE) && reticulate::py_module_available("pandas")) {
  rp_select(ggplot2::diamonds, carat, cut, price)
}
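The capture-and-translate step described above can be sketched in base R: substitute() captures the bare names, deparse() turns them into strings, and the strings are pasted into a pandas-style column selection. select_sketch() is hypothetical, "df" is a placeholder for the internal data frame name (the name table_name would presumably replace), and the exact Python text the package generates may differ.

```r
# Hypothetical helper: bare names -> strings -> pandas selection code.
select_sketch <- function(df, ...) {
  # Turn bare names such as mpg, cyl into character strings.
  cols <- vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
  # Build an illustrative pandas selection of the form df[['a', 'b']].
  code <- sprintf("df[[%s]]", paste0("'", cols, "'", collapse = ", "))
  list(code = code, result = df[cols])
}

out <- select_sketch(mtcars, mpg, cyl)
out$code  # "df[['mpg', 'cyl']]"
```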
Sorts a data frame by one or more columns. It translates the R expressions
into a pandas .sort_values() command and executes it.
rp_sort(.data, ..., table_name = NULL, return.as = "result")
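| Argument | Description |
|---|---|
| .data | An R data.frame or tibble. |
| ... | Bare column names to sort by. Wrap a column name in desc() to sort it in descending order. |
| table_name | An optional character string. If provided, the generated Python code uses this string in place of the internal data frame name, so the code can be copied and pasted directly. |
| return.as | What to return: "result", "code", or "all". |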
A data.frame sorted by the specified columns.
if (reticulate::py_available(initialize = TRUE) && reticulate::py_module_available("pandas")) {
  rp_sort(ggplot2::diamonds, cut, desc(price))  # sort by cut (ascending), then price (descending)
}
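The sort keys end up in a pandas .sort_values() call. A minimal base-R sketch of that translation follows, assuming desc() marks a descending key (as the example comment indicates) and that per-key directions are passed to pandas as an ascending list. sort_code_sketch() and the "df" placeholder are hypothetical, and the package's generated code may be formatted differently.

```r
# Hypothetical translator: bare names and desc(name) -> a sort_values() call.
sort_code_sketch <- function(...) {
  args <- as.list(substitute(list(...)))[-1]
  # desc(col) marks a descending key; a bare name is ascending.
  is_desc <- vapply(args, function(a) {
    is.call(a) && identical(a[[1]], as.name("desc"))
  }, logical(1))
  cols <- vapply(seq_along(args), function(i) {
    if (is_desc[i]) deparse(args[[i]][[2]]) else deparse(args[[i]])
  }, character(1))
  sprintf("df.sort_values(by=[%s], ascending=[%s])",
          paste0("'", cols, "'", collapse = ", "),
          paste(ifelse(is_desc, "False", "True"), collapse = ", "))
}

sort_code_sketch(cut, desc(price))
# "df.sort_values(by=['cut', 'price'], ascending=[True, False])"
```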
Aggregates a data frame by one or more groups, applying summary functions.
It translates R's dplyr::summarise syntax into a pandas .groupby().agg()
command.
rp_summarize(.data, ..., .by = NULL, table_name = NULL, return.as = "result")
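| Argument | Description |
|---|---|
| .data | An R data.frame or tibble. |
| ... | Named summary expressions, such as avg_price = mean(price) or count = n(). |
| .by | A bare column name, or several combined with c(), to group by. Defaults to NULL (no grouping). |
| table_name | An optional character string. If provided, the generated Python code uses this string in place of the internal data frame name, so the code can be copied and pasted directly. |
| return.as | What to return: "result", "code", or "all". |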
A data.frame with the summarized and grouped data.
if (reticulate::py_available(initialize = TRUE) && reticulate::py_module_available("pandas")) {
  print(rp_summarize(ggplot2::diamonds, avg_price = mean(price), .by = cut))  # summarize by one group
  rp_summarize(ggplot2::diamonds, avg_price = mean(price), count = n(),
               .by = c(cut, color))  # summarize by multiple groups with multiple functions
}
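To illustrate how named summaries can map onto .groupby().agg(), the sketch below builds a pandas named-aggregation string of the form name=('column', 'function'). It is not the package's translator: summarize_code_sketch() is hypothetical, it requires .by, and it handles only name = fun(column) expressions for a small assumed table of functions (so n() is not covered). It maps R's sd() to pandas' std, whose default ddof=1 matches R's sample standard deviation.

```r
# Hypothetical translator: named summaries -> pandas named aggregation.
summarize_code_sketch <- function(..., .by) {
  exprs <- as.list(substitute(list(...)))[-1]
  by <- substitute(.by)
  # .by may be a single bare name or c(name1, name2, ...).
  by_cols <- if (is.call(by)) vapply(as.list(by)[-1], deparse, character(1)) else deparse(by)
  # Assumed mapping from R summary functions to pandas aggregation names.
  fun_map <- c(mean = "mean", sd = "std", median = "median",
               sum = "sum", min = "min", max = "max")
  aggs <- vapply(names(exprs), function(nm) {
    e <- exprs[[nm]]  # e.g. mean(price): e[[1]] is the function, e[[2]] the column
    sprintf("%s=('%s', '%s')", nm, deparse(e[[2]]), fun_map[[deparse(e[[1]])]])
  }, character(1))
  sprintf("df.groupby([%s]).agg(%s).reset_index()",
          paste0("'", by_cols, "'", collapse = ", "),
          paste(aggs, collapse = ", "))
}

summarize_code_sketch(avg_price = mean(price), sd_price = sd(price), .by = c(cut, color))
```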
Capture a bare R expression and translate it to a Python-compatible string
suitable for use with pandas.DataFrame.query().
translate_filter(expr)
| Argument | Description |
|---|---|
| expr | A bare R expression. |
A character string of the translated Python query.
translate_filter(carat > 2 & cut == "Ideal")  # -> "(carat > 2) and (cut == 'Ideal')"
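translate_filter() turns an R expression tree into a pandas query string. A simplified recursive sketch of that idea follows. It is hypothetical (to_query() and translate_filter_sketch() are not package functions), supports only comparisons, &, |, !, parentheses, %in%, and %notin%, and quotes strings naively without escaping. For carat > 2 & cut == "Ideal" it returns "(carat > 2) and (cut == 'Ideal')".

```r
# Hypothetical recursive translator from an R call tree to a pandas query.
to_query <- function(e) {
  if (is.call(e)) {
    op <- as.character(e[[1]])
    if (op %in% c("&", "&&")) {
      return(sprintf("(%s) and (%s)", to_query(e[[2]]), to_query(e[[3]])))
    }
    if (op %in% c("|", "||")) {
      return(sprintf("(%s) or (%s)", to_query(e[[2]]), to_query(e[[3]])))
    }
    if (op == "!") return(sprintf("not (%s)", to_query(e[[2]])))
    if (op == "(") return(to_query(e[[2]]))
    if (op %in% c("==", "!=", ">", ">=", "<", "<=")) {
      return(paste(to_query(e[[2]]), op, to_query(e[[3]])))
    }
    if (op == "%in%") return(paste(to_query(e[[2]]), "in", to_query(e[[3]])))
    if (op == "%notin%") return(paste(to_query(e[[2]]), "not in", to_query(e[[3]])))
    if (op == "c") {
      # c("a", "b") becomes a Python list literal ['a', 'b'].
      items <- vapply(as.list(e)[-1], to_query, character(1))
      return(sprintf("[%s]", paste(items, collapse = ", ")))
    }
    stop("unsupported operator in sketch: ", op)
  }
  # Strings become single-quoted Python literals (no escaping in this sketch).
  if (is.character(e)) return(sprintf("'%s'", e))
  # Column names and numbers are deparsed as-is.
  deparse(e)
}

# Capture the bare expression, then translate it.
translate_filter_sketch <- function(expr) to_query(substitute(expr))

translate_filter_sketch(carat > 2 & cut == "Ideal")
translate_filter_sketch(cut %in% c("Fair", "Good") & !(price > 500))
```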