Vendor Invoices

Aggregate monthly vendor invoice PDFs into one master sheet for AP review.

Finance and ops teams drown in PDFs that are really tables in disguise (invoices, bank statements, vendor reports), and retyping or copy-pasting them into a spreadsheet is slow and error-prone. This workflow is a three-node JAD orchestrator chain that automates the round trip: an `http-request` connector GETs a PDF by URL, [pdf-to-text](/pdf-tools/pdf-to-text) extracts the document's text on the runner, and a `google-sheets` connector appends the result to a tab via `appendRows`.

The PDF parsing happens locally on your paired runner; the only thing that leaves the machine is the cleaned output going to Sheets over OAuth. Connectors reference credentials by name (`credentialRef`), and those credentials live on the runner; Google tokens are never stored on the JAD server.

Fork the blueprint to copy the graph into a private draft, point `http-request` at your source URL and `google-sheets` at your `spreadsheetId`, then run manually, trigger by webhook, or attach a cron. Jump to the [Cookbook](#cookbook) for copy-paste configs or [Errors & edge cases](#errors-and-edge-cases) for the failure modes.

How to set up: vendor invoices

Step 1
Pair a runner and add the Google credential — Both `http-request` and `google-sheets` are connectors marked `runnerOnly` — they execute only on a paired `@jadapps/runner`, not in the browser. In the runner, store an OAuth2 credential with the Sheets scope under a name like `google-prod`; the workflow references it by that name via `credentialRef`.
Step 2
Fork the blueprint — From the workflow page, fork the blueprint. The orchestrator copies the three-node chain (`http-request` → `pdf-to-text` → `google-sheets`) into a new private draft owned by you, wires consecutive ports, and snapshots it as version 1. A `forked` audit event records the source blueprint slug.
Step 3
Configure the source URL — Open the `http-request` node and set `url` to the PDF you want (method `GET`, default `timeoutMs` 60000). The response streams to disk on the runner and passes to the next node as a file.
Step 4
Set the Sheets target — Open the `google-sheets` node: action `appendRows`, set `spreadsheetId`, set `range` (e.g. `Sheet1!A:Z`), keep `valueInputOption` `USER_ENTERED` so dates and numbers parse, and set `credentialRef` to your stored credential name.
Step 5
Run and watch the trace — Trigger the run. The orchestrator walks the topologically sorted nodes once, firing a status callback per node. Each node shows pending → running → done/error with its duration and a one-line summary; the per-node trace is persisted on the run record.
Step 6
(Optional) Schedule it — The blueprint ships with `scheduleCron: null` (manual/webhook). To run it on a schedule, set a `schedule_cron` on your forked workflow; the Cloudflare cron tick scans scheduled workflows, decides which are due since `last_fired_at`, and enqueues a run.
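
The node settings from steps 3 and 4 can be written down as plain data. This is a minimal sketch in Python, not the orchestrator's actual schema: the field names `url`, `method`, `timeoutMs`, `action`, `spreadsheetId`, `range`, `valueInputOption`, and `credentialRef` come from the steps above, while the `type`/`config` wrapper, the placeholder URL and spreadsheet ID, and the `missing_fields` helper are assumptions made for illustration.

```python
import json

# Hypothetical representation of the forked three-node chain.
CHAIN = [
    {"type": "http-request", "config": {
        "url": "https://example.com/invoices/2024-05.pdf",  # placeholder URL
        "method": "GET",
        "timeoutMs": 60000,
    }},
    {"type": "pdf-to-text", "config": {}},
    {"type": "google-sheets", "config": {
        "action": "appendRows",
        "spreadsheetId": "YOUR_SPREADSHEET_ID",  # placeholder ID
        "range": "Sheet1!A:Z",
        "valueInputOption": "USER_ENTERED",
        "credentialRef": "google-prod",  # name of the credential on the runner
    }},
]

# Fields this sketch treats as mandatory (an assumption, not a documented rule).
REQUIRED = {
    "http-request": ["url"],
    "pdf-to-text": [],
    "google-sheets": ["spreadsheetId", "range", "credentialRef"],
}


def missing_fields(chain):
    """Return (node type, field) pairs that are absent or empty."""
    return [
        (node["type"], field)
        for node in chain
        for field in REQUIRED[node["type"]]
        if not node["config"].get(field)
    ]


if __name__ == "__main__":
    print(json.dumps(CHAIN, indent=2))
    print("missing:", missing_fields(CHAIN))
```

A check like this is only a convenience before a run; the node editor remains the source of truth for which fields exist.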

Frequently asked questions

What exactly does this workflow chain?

Three nodes in order: `http-request` (GET a PDF by URL), [pdf-to-text](/pdf-tools/pdf-to-text) (extract the text on the runner), then `google-sheets` with `action: appendRows` (append the result to a tab). That is the literal blueprint chain.

Does it ingest a whole folder of PDFs at once?

Not as shipped. The blueprint chain fetches a single PDF per run via `http-request`. To process many PDFs you trigger the workflow per file (e.g. by webhook), or add a for-each loop node around the chain yourself after forking.

Are my PDFs uploaded to JAD's servers?

No. PDF text extraction runs on your paired runner, and the connectors are `runnerOnly`. The only data sent off-machine is the extracted output going to Google Sheets over OAuth.

Where are my Google credentials stored?

On the runner. The `google-sheets` node references a credential by name (`credentialRef`, e.g. `google-prod`); the runner resolves the OAuth2 token at run time. The token is not stored on the JAD/Cloudflare server.

Do I need a paid plan to run this?

Both `http-request` and `google-sheets` are Pro connectors (`isPro: true`) and require a paired runner. Note this is separate from the media tier-precheck: `precheckWorkflowTier` only blocks nodes whose `minTier` exceeds your tier, and these connectors carry no `minTier`, so they do not trip the pro_media gate.
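
The gate described above can be sketched as follows. This is an illustration under stated assumptions, not the real `precheckWorkflowTier`: the tier names and their ordering in `TIER_RANK`, the node dictionary shape, and the `video-transcode` example node are all hypothetical. Only the rule itself (block a node when its `minTier` exceeds the user's tier, and let nodes without a `minTier` through) comes from the answer above.

```python
# Assumed tier ordering, lowest to highest; the real tier names may differ.
TIER_RANK = {"free": 0, "pro": 1, "pro_media": 2}


def precheck_workflow_tier(nodes, user_tier):
    """Return the types of nodes whose minTier outranks the user's tier."""
    user_rank = TIER_RANK[user_tier]
    return [
        node["type"]
        for node in nodes
        # Nodes with no minTier are never blocked by this check.
        if node.get("minTier") is not None
        and TIER_RANK[node["minTier"]] > user_rank
    ]


if __name__ == "__main__":
    chain = [
        {"type": "http-request", "isPro": True},
        {"type": "pdf-to-text"},
        {"type": "google-sheets", "isPro": True},
    ]
    print(precheck_workflow_tier(chain, "pro"))
```

Note that `isPro` is deliberately ignored here: in this sketch, as in the answer above, the Pro connector flag and the tier precheck are two separate checks.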

Is this workflow cron-scheduled out of the box?

No — the blueprint sets `scheduleCron: null`, so a fork runs manually or by webhook. Scheduling is opt-in: set a `schedule_cron` on your forked workflow and the Cloudflare cron tick will enqueue runs when they are due.
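
To make "due since `last_fired_at`" concrete, here is a rough sketch of one way such a check could work. It is not the Cloudflare tick's actual logic, which the page does not describe. The `cron_matches` and `is_due` helpers are hypothetical, and the matcher is deliberately simplified: it accepts only `*` and comma-separated integers in the five standard cron fields (no ranges, steps, or names) and requires day-of-month and day-of-week to both match.

```python
from datetime import datetime, timedelta


def _field_matches(field, value):
    """Match one cron field; supports '*' and comma-separated integers only."""
    if field == "*":
        return True
    return value in {int(part) for part in field.split(",")}


def cron_matches(expr, moment):
    """True if the given minute satisfies the simplified five-field expression."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (moment.weekday() + 1) % 7  # cron counts Sunday as 0
    return (
        _field_matches(minute, moment.minute)
        and _field_matches(hour, moment.hour)
        and _field_matches(dom, moment.day)
        and _field_matches(month, moment.month)
        and _field_matches(dow, cron_dow)
    )


def is_due(expr, last_fired_at, now):
    """True if any whole minute after last_fired_at, up to now, matches expr."""
    tick = last_fired_at.replace(second=0, microsecond=0) + timedelta(minutes=1)
    while tick <= now:
        if cron_matches(expr, tick):
            return True
        tick += timedelta(minutes=1)
    return False


if __name__ == "__main__":
    # "0 6 1 * *" means 06:00 on the first day of each month.
    last = datetime(2024, 5, 31, 12, 0)
    print(is_due("0 6 1 * *", last, datetime(2024, 6, 1, 6, 5)))
```

For a monthly invoice roll-up, an expression such as `0 6 1 * *` would be a natural fit; a production scheduler would normally rely on a full cron parser rather than a hand-rolled matcher like this one.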

How does forking work?

Forking calls the from-blueprint route, which copies the three-node chain into a new private draft owned by you, wires consecutive ports, snapshots it as version 1, and writes a `forked` audit event tagged with the blueprint slug.
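
The fork steps above can be sketched as a small function. Everything about the data shapes here is an assumption made for illustration: the `fork_blueprint` name, the `id`, `ownerId`, `visibility`, `status`, and `edges` fields, and the `vendor-invoices` slug are hypothetical, and the real route may store ports and versions differently.

```python
import copy


def fork_blueprint(blueprint, owner_id):
    """Copy a blueprint chain into a private draft and build an audit event."""
    nodes = copy.deepcopy(blueprint["nodes"])  # the fork must not share state
    # Wire each node to the next one in the chain.
    edges = [(a["id"], b["id"]) for a, b in zip(nodes, nodes[1:])]
    draft = {
        "ownerId": owner_id,
        "visibility": "private",
        "status": "draft",
        "version": 1,
        "nodes": nodes,
        "edges": edges,
    }
    audit_event = {"event": "forked", "blueprintSlug": blueprint["slug"]}
    return draft, audit_event


if __name__ == "__main__":
    blueprint = {
        "slug": "vendor-invoices",  # placeholder slug
        "nodes": [
            {"id": "http-request"},
            {"id": "pdf-to-text"},
            {"id": "google-sheets"},
        ],
    }
    draft, event = fork_blueprint(blueprint, "user-123")
    print(draft["edges"], event)
```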

Can I see a run history / trace?

Yes. Each run records a `WorkflowRunTrace[]` (per-node status, duration, output size, and a one-line summary) plus an `output_summary` with step/success counts and total duration. The run panel renders each node pending → running → done/error as the orchestrator fires its `onStep` callbacks.
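
As a rough picture of how a per-node trace and summary could be assembled, the sketch below walks a topologically sorted graph once and fires a callback per status change. It is not the orchestrator's code: `run_workflow`, the handler callables, the dictionary keys, and the choice to stop at the first error are assumptions, and output size is omitted for brevity. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
import time
from graphlib import TopologicalSorter


def run_workflow(predecessors, handlers, on_step):
    """Run each node once in dependency order; return (trace, output_summary)."""
    order = list(TopologicalSorter(predecessors).static_order())
    for node_id in order:
        on_step(node_id, "pending")

    trace = []
    run_start = time.perf_counter()
    for node_id in order:
        on_step(node_id, "running")
        node_start = time.perf_counter()
        try:
            summary = handlers[node_id]()  # handler returns a one-line summary
            status = "done"
        except Exception as exc:
            summary, status = f"failed: {exc}", "error"
        trace.append({
            "nodeId": node_id,
            "status": status,
            "durationMs": round((time.perf_counter() - node_start) * 1000),
            "summary": summary,
        })
        on_step(node_id, status)
        if status == "error":
            break  # assumption: later nodes do not run after a failure

    output_summary = {
        "steps": len(trace),
        "succeeded": sum(1 for entry in trace if entry["status"] == "done"),
        "totalDurationMs": round((time.perf_counter() - run_start) * 1000),
    }
    return trace, output_summary
```

Here `predecessors` maps each node ID to the set of nodes it depends on, for example `{"http-request": set(), "pdf-to-text": {"http-request"}, "google-sheets": {"pdf-to-text"}}`.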

What happens if a PDF has no tables or no text?

`pdf-to-text` returns an empty or near-empty string (it does not OCR scanned images). The chain still completes; `appendRows` may add a blank row. For scanned PDFs, run an OCR step before this chain.
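
If blank rows are a concern, one option after forking is a guard between extraction and the append. The sketch below is hypothetical: the page does not say how extracted text is mapped to rows, so `rows_from_text` simply assumes one row per non-empty line with columns separated by two or more spaces, and `should_append` is an invented helper name.

```python
import re


def rows_from_text(text):
    """Split extracted text into rows, dropping lines that are empty or blank."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            # Assumed layout: columns are separated by runs of 2+ spaces.
            rows.append(re.split(r"\s{2,}", line))
    return rows


def should_append(text):
    """Skip the Sheets append when extraction produced no usable rows."""
    return len(rows_from_text(text)) > 0


if __name__ == "__main__":
    print(rows_from_text("INV-001  2024-05-01  120.00\n\n"))
    print(should_append("   \n"))
```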

Why did numbers like account IDs change in my sheet?

`valueInputOption` defaults to `USER_ENTERED`, which parses values as if a person had typed them into the cell: leading zeros are dropped and long numbers may be shown in scientific notation. Switch the node to `RAW` to keep cells as literal strings.
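
One way to decide between the two modes before a run is to scan the rows for values that parsing would likely alter. This is a heuristic sketch, not part of the `google-sheets` node: `needs_raw` and `choose_value_input_option` are hypothetical helpers, and the 11-digit length threshold is an illustrative guess rather than a documented Sheets limit.

```python
def needs_raw(cell):
    """True for digit-only strings that USER_ENTERED parsing would likely rewrite."""
    text = str(cell)
    if not text.isdigit():
        return False
    has_leading_zero = len(text) > 1 and text.startswith("0")
    is_long = len(text) > 11  # illustrative threshold, not a documented limit
    return has_leading_zero or is_long


def choose_value_input_option(rows):
    """Pick RAW when any cell looks like an ID that must stay a literal string."""
    risky = any(needs_raw(cell) for row in rows for cell in row)
    return "RAW" if risky else "USER_ENTERED"


if __name__ == "__main__":
    print(choose_value_input_option([["00123", "Acme"]]))
    print(choose_value_input_option([["2024-05-01", "120.00"]]))
```

The trade-off is that `RAW` also stops dates and amounts from being parsed, so a sheet that mixes IDs with numeric columns may need formatting applied on the Sheets side instead.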

What if the source URL is slow or flaky?

Raise `timeoutMs` on the `http-request` node (default 60000, max 600000) and set its error policy to retry. The runner retries a failing step up to 3 times with linear backoff (200ms, 400ms) before surfacing the error.
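
The retry behaviour described above can be sketched as a generic wrapper. This is an illustration, not the runner's implementation: `run_with_retry` and its parameters are hypothetical, and it reads "up to 3 times" as three attempts in total, which matches the two listed delays of 200 ms and 400 ms.

```python
import time


def _sleep_ms(ms):
    time.sleep(ms / 1000)


def run_with_retry(step, max_attempts=3, base_delay_ms=200, sleep_ms=_sleep_ms):
    """Call step() until it succeeds, waiting 200 ms, then 400 ms between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == max_attempts:
                raise  # attempts exhausted: surface the last error
            sleep_ms(base_delay_ms * attempt)  # linear backoff
```

Passing `sleep_ms` as a parameter keeps the wrapper testable without real delays, for example `run_with_retry(fetch, sleep_ms=waits.append)` with `waits` as a list to record the waits instead of sleeping.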

What other workflows are like this one?

See [csv-to-slack-summary](/workflows/csv-to-slack-summary) for a data-clean-then-connector pattern, [rss-to-notion-digest](/workflows/rss-to-notion-digest) for a fetch-then-create-page chain, and [video-transcode-to-r2](/workflows/video-transcode-to-r2) for a local-process-then-upload chain.
