How to prepare excel wide-format data for r's pivot_longer and tidyr
- Step 1Export the wide sheet as .xlsx — Make sure the wide table is on the first sheet of the workbook — the tool reads sheet index 0 only.
.csvalso works if that is your source. - Step 2Decide your tidy column names — Pick the snake_case names you use in R —
names_to = 'year',values_to = 'value'. You will type these into the variable/value name fields so the output matches your downstream code. - Step 3List ID columns and value columns — Type the variables that stay (
country, product) into ID columns (comma-sep), and the wide headers to gather (yr_2021, yr_2022, yr_2023) into Value columns (comma-sep). Match the header text exactly. - Step 4Set the variable and value names — Set Variable col name to
year(holds the gathered header) and Value col name tovalue(holds the cell). These map directly tonames_toandvalues_toinpivot_longer. - Step 5Run and download unpivoted.xlsx — Run the tool; it reshapes in the browser and offers
unpivoted.xlsx(sheetUnpivoted). Verify the row count equals input rows × number of value columns. - Step 6Read it into R —
library(readxl); df <- read_xlsx('unpivoted.xlsx'). The tibble is already long — pipe straight intodf |> group_by(year) |> summarise(total = sum(value))orggplot(df, aes(year, value, colour = country)).
Field-to-pivot_longer mapping
How each Un-Pivot field corresponds to a tidyr::pivot_longer() argument.
| Un-Pivot field | pivot_longer() equivalent | Example |
|---|---|---|
| ID columns (comma-sep) | columns NOT in cols (kept) | country, product |
| Value columns (comma-sep) | cols = c(...) | yr_2021, yr_2022, yr_2023 |
| Variable col name | names_to | year |
| Value col name | values_to | value |
Behaviour vs. tidyr defaults
Where the tool matches pivot_longer and where you should mind the difference.
| Aspect | Un-Pivot tool | tidyr::pivot_longer default |
|---|---|---|
| Empty/NA cells | Emits a row with an empty value | Keeps the row (values_drop_na = FALSE) |
| Unlisted columns | Dropped | Kept (they remain ID columns) |
| Value column name match | Exact header text, trimmed | Tidyselect helpers (starts_with, etc.) |
| Type of gathered values | Formatted text (numbers/dates rendered) | Inferred column type |
| Multiple value groups | Single group only | Supports .value sentinel for multi-measure |
Tier limits (Excel family)
Un-Pivot needs Pro or higher. Plan against the input-row cap before reshaping large longitudinal datasets.
| Tier | Max file size | Input row limit | Files per run |
|---|---|---|---|
| Free | 5 MB | 10,000 | 1 (Un-Pivot is Pro-only) |
| Pro | 50 MB | 100,000 | 5 |
| Pro-media | 200 MB | 500,000 | 20 |
| Developer | 500 MB | Unlimited | Unlimited |
Cookbook
Reshapes that drop straight into a tidyverse pipeline. Each shows the wide input, the field values, and the R you run on the result.
Wide years to a tidy tibble for ggplot2
Three year columns gathered into a year column, then plotted. The output is exactly what pivot_longer(cols = c(yr_2021, yr_2022, yr_2023), names_to = 'year', values_to = 'value') would give.
Input (wide):
country,yr_2021,yr_2022,yr_2023
USA,100,120,140
UK,80,90,95
Fields:
ID columns: country
Value columns: yr_2021, yr_2022, yr_2023
Variable col: year
Value col: value
Then in R:
library(readxl); library(ggplot2)
df <- read_xlsx('unpivoted.xlsx')
ggplot(df, aes(year, value, colour = country, group = country)) +
geom_line()Group-by summarise on the long data
Once long, dplyr aggregation is trivial. The value column is numeric-looking text from the melt; read_xlsx infers numeric, or coerce with as.numeric if a stray format slipped through.
Fields:
ID columns: region, product
Value columns: q1, q2, q3, q4
Variable col: quarter
Value col: units
In R:
df <- read_xlsx('unpivoted.xlsx')
df |>
group_by(quarter) |>
summarise(total_units = sum(units, na.rm = TRUE))Drop NA rows that the melt preserves
Unlike values_drop_na = TRUE in pivot_longer, this tool keeps empty cells as rows. Replicate the drop in R after reading, so your tidy frame has no blank observations.
Input (wide, UK missing yr_2023):
country,yr_2022,yr_2023
USA,120,140
UK,90,
After unpivot you get a row: UK, yr_2023, (empty)
In R, drop it to mimic values_drop_na = TRUE:
df <- read_xlsx('unpivoted.xlsx')
df <- df |> filter(!is.na(value) & value != '')Keep an extra ID variable R needs
Columns you do not list are dropped. If R needs a grouping variable (e.g. cohort), add it to the ID columns so it survives the reshape and is available for facet_wrap.
Input (wide): country,cohort,yr_2022,yr_2023 USA,A,120,140 UK,B,90,95 Fields: ID columns: country, cohort Value columns: yr_2022, yr_2023 Variable col: year Value col: value In R: ggplot(df, aes(year, value)) + geom_col() + facet_wrap(~ cohort)
Why pivot before R, not inside it
For a wide file that is slow to read or that you only want long, reshaping first means read_xlsx loads a smaller-schema, already-tidy file — no pivot_longer call to maintain and no chance of a tidyselect typo silently gathering the wrong columns.
Instead of (inside R, risk of wrong cols):
df <- read_xlsx('wide.xlsx') |>
pivot_longer(cols = starts_with('yr_'),
names_to = 'year', values_to = 'value')
Do (reshape first, then a plain read):
# Unpivot tool with explicit value columns yr_2021..yr_2023
df <- read_xlsx('unpivoted.xlsx') # already tidyEdge cases and what actually happens
Tool keeps NA rows; pivot_longer can drop them
Behaviour differencetidyr::pivot_longer(values_drop_na = TRUE) removes rows where the gathered value is NA. This tool always emits a row per value column, including empty cells. To match the tidy default, filter(!is.na(value) & value != '') after read_xlsx.
Unlisted column dropped, breaking a later join
Dropped by designOnly ID and value columns survive. If R later joins on an identifier you forgot to list (e.g. id, cohort), that column is gone. Add every variable R needs to the ID columns list before reshaping.
Numeric values arrive as character in R
Type coercionValues are read with formatting applied, so the value column may come into R as character if cells carried text formatting. read_xlsx usually infers numeric, but a stray format can force character. Coerce with mutate(value = as.numeric(value)) if sum() complains.
Value-column name does not match a header
Empty value blockMatching is exact (after trimming). A typo like yr2022 vs yr_2022 yields empty values for that gathered name across all rows — an all-NA block in R. Copy header text exactly; tidyselect helpers like starts_with() are not available here.
Multi-measure wide data (revenue + cost per year)
Single group onlypivot_longer's .value sentinel can split rev_2021/cost_2021 into measure + year. This tool melts one group into one (variable, value) pair. Melt once, then split the variable column on the delimiter in R (or with the regex extractor before reading), or run pivot_longer for the multi-measure case.
Data on sheet 2 of the workbook
Not readOnly the first sheet is processed. Put the wide table on the first tab before uploading, or read_xlsx on the result will see nothing useful.
Date headers gather as text
By designIf your wide headers are actual dates, the gathered variable column holds the displayed date text. Parse it in R with lubridate::ymd() (or similar) to recover a Date type for time-series work.
Large longitudinal file over the row cap
RejectedThe input-row cap applies before reshaping (Pro 100,000). A many-year panel can exceed it. Split the panel, upgrade (Developer is unlimited), or pivot_longer inside R for the very largest files.
Free tier blocked
Requires ProUn-Pivot is Pro-gated and throws 'Un-Pivot requires Pro tier' for free accounts. If you are already in R, pivot_longer is free; the browser tool's value is skipping the in-R reshape and the install.
Frequently asked questions
Is this equivalent to tidyr::pivot_longer()?
For a single value group, yes — the output is the same long shape pivot_longer(cols = ..., names_to = ..., values_to = ...) produces. The differences: this tool keeps NA rows (no values_drop_na), drops unlisted columns, and matches value columns by exact name rather than tidyselect helpers.
Why reshape before R instead of using pivot_longer?
If the Excel file is the source, reshaping in the browser means read_xlsx loads an already-tidy, smaller-schema file — no pivot_longer call to maintain and no risk of a starts_with() typo gathering the wrong columns. It is also handy for very large wide files that are slow to read whole.
How do I match my R column names?
Type your snake_case names into the fields: Variable col name -> names_to value (e.g. year), Value col name -> values_to value (e.g. value). The output columns then line up with your R code without a rename().
Does it work for pandas.melt() too?
Yes — the long output is identical to what pandas.melt() produces for the same id/value split. The file serves both R and Python pipelines; only the read function differs (read_xlsx vs pd.read_excel). To reverse the shape, the pivot generator takes long data back to wide.
Why are my values character instead of numeric in R?
The tool reads cells with formatting applied, so a formatted number can melt as text. read_xlsx usually re-infers numeric, but if sum() errors, coerce with mutate(value = as.numeric(value)). Setting source cells to General number before exporting avoids it.
How do I drop the empty rows the melt keeps?
After read_xlsx, run filter(!is.na(value) & value != '') to mimic pivot_longer(values_drop_na = TRUE). The tool itself never skips empty cells — every value column emits a row.
Can it handle multiple measures per period (rev + cost)?
Not in one pass. It melts a single value group into one (variable, value) pair. Melt once, then separate() the variable column on the delimiter in R, or use pivot_longer with the .value sentinel for true multi-measure reshaping.
What if a value column name does not match my header?
It still emits one row per input row for that name, with an empty value — appearing as an all-NA block in R. That block is the signal you mistyped a header. Matching is exact text after trimming, so copy headers precisely.
Which input formats does it accept?
.xlsx, .xls, .ods, and .csv. It reads the first sheet only and always outputs .xlsx (sheet Unpivoted, file unpivoted.xlsx), which readxl::read_xlsx() reads natively.
Is my research data uploaded anywhere?
No. Reshaping runs in your browser via SheetJS. Data with subject identifiers never leaves the tab — only an anonymous processed-file counter is recorded for dashboard stats, which you can opt out of.
How big a file can it reshape?
Bounded by your tier: Pro caps at 50 MB and 100,000 input rows; Developer is unlimited. The cap is on input rows, so a wide panel near 100,000 rows times several years could exceed the output but the input must still fit under the cap.
Can I script this into a reproducible pipeline?
Yes. GET /api/v1/tools/excel-unpivot returns the schema so a script or MCP client can build the payload; execution runs through a paired @jadapps/runner on your machine, so file content never reaches JAD. Pair once, then reshape locally as a pre-read_xlsx step.
Privacy first
Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.