How it works
- Step 1: Pair a runner and add the Google credential. Both http-request and google-sheets are connectors marked runnerOnly; they execute only on a paired @jadapps/runner, not in the browser. In the runner, store an OAuth2 credential with the Sheets scope under a name like google-prod; the workflow references it by that name via credentialRef.
- Step 2: Fork the blueprint. From the workflow page, fork the blueprint. The orchestrator copies the three-node chain (http-request → pdf-to-text → google-sheets) into a new private draft owned by you, wires consecutive ports, and snapshots it as version 1. A forked audit event records the source blueprint slug.
- Step 3: Configure the source URL. Open the http-request node and set url to the PDF you want (method GET, default timeoutMs 60000). The response streams to disk on the runner and passes to the next node as a file.
- Step 4: Set the Sheets target. Open the google-sheets node: action appendRows, set spreadsheetId, set range (e.g. Sheet1!A:Z), keep valueInputOption USER_ENTERED so dates and numbers parse, and set credentialRef to your stored credential name.
- Step 5: Run and watch the trace. Trigger the run. The orchestrator walks the topologically sorted nodes once, firing a status callback per node. Each node shows pending → running → done/error with its duration and a one-line summary; the per-node trace is persisted on the run record.
- Step 6 (optional): Schedule it. The blueprint ships with scheduleCron: null (manual/webhook). To run it on a schedule, set a schedule_cron on your forked workflow; the Cloudflare cron tick scans scheduled workflows, decides which are due since last_fired_at, and enqueues a run.
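The walk described in Step 5 can be sketched as below. This is only an illustration under stated assumptions: the StepEvent and Edge shapes, and the function names topoSort and runOnce, are hypothetical and are not taken from workflow-runner.ts. It sorts the graph with Kahn's algorithm, visits each node once, and fires a callback on every status change.

```typescript
// Hypothetical shapes; the orchestrator's real types may differ.
type StepStatus = "pending" | "running" | "done" | "error";

interface StepEvent {
  nodeId: string;
  status: StepStatus;
}

interface Edge {
  from: string;
  to: string;
}

// Kahn's algorithm: repeatedly emit nodes that have no unprocessed inputs.
function topoSort(nodes: string[], edges: Edge[]): string[] {
  const indegree = new Map<string, number>();
  for (const n of nodes) indegree.set(n, 0);
  for (const e of edges) indegree.set(e.to, (indegree.get(e.to) ?? 0) + 1);

  const ready = nodes.filter((n) => indegree.get(n) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const current = ready.shift() as string;
    order.push(current);
    for (const e of edges) {
      if (e.from !== current) continue;
      const remaining = (indegree.get(e.to) ?? 0) - 1;
      indegree.set(e.to, remaining);
      if (remaining === 0) ready.push(e.to);
    }
  }
  if (order.length !== nodes.length) throw new Error("workflow graph has a cycle");
  return order;
}

// Visit each node once in sorted order; stop at the first failure (abort policy).
function runOnce(
  nodes: string[],
  edges: Edge[],
  execute: (nodeId: string) => void,
  onStep: (event: StepEvent) => void,
): void {
  for (const nodeId of topoSort(nodes, edges)) {
    onStep({ nodeId, status: "running" });
    try {
      execute(nodeId);
      onStep({ nodeId, status: "done" });
    } catch {
      onStep({ nodeId, status: "error" });
      return;
    }
  }
}

// The three-node chain, deliberately listed out of order.
const chain = ["google-sheets", "http-request", "pdf-to-text"];
const wires: Edge[] = [
  { from: "http-request", to: "pdf-to-text" },
  { from: "pdf-to-text", to: "google-sheets" },
];
const events: StepEvent[] = [];
runOnce(chain, wires, () => {}, (e) => events.push(e));
```

With a no-op executor, the callback receives a running and a done event for each of the three nodes, in chain order.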
The real node chain
Exactly what the pdf-batch-extract-to-sheets blueprint runs, in order. Source: lib/orchestrator/seo-workflow-blueprints.ts.
| # | Tool slug | Category | Role in chain | Key config |
|---|---|---|---|---|
| 1 | http-request | connector (runnerOnly) | GET the source PDF by URL | method=GET, url=…/document.pdf, timeoutMs=60000 |
| 2 | pdf-to-text | | Extract plain text from the PDF on the runner | {} (defaults) |
| 3 | google-sheets | connector (runnerOnly) | Append the extracted rows to a tab | action=appendRows, range=Sheet1!A:Z, valueInputOption=USER_ENTERED, credentialRef |
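For readers who prefer code to tables, the chain and its "wire consecutive ports" step might look roughly like the sketch below. The BlueprintNode and Wire shapes, the node ids, and wireConsecutive are hypothetical names for illustration; the config values come from the table above, with url, spreadsheetId, and credentialRef left blank because they are filled in after forking.

```typescript
// Hypothetical shapes; not copied from seo-workflow-blueprints.ts.
interface BlueprintNode {
  id: string;
  tool: string;
  config: Record<string, unknown>;
}

interface Wire {
  from: string;
  to: string;
}

const blueprintNodes: BlueprintNode[] = [
  { id: "n1", tool: "http-request", config: { method: "GET", url: "", timeoutMs: 60000 } },
  { id: "n2", tool: "pdf-to-text", config: {} },
  {
    id: "n3",
    tool: "google-sheets",
    config: {
      action: "appendRows",
      spreadsheetId: "", // filled in after forking
      range: "Sheet1!A:Z",
      valueInputOption: "USER_ENTERED",
      credentialRef: "", // name of a credential stored on the runner
    },
  },
];

// Connect each node's output to the next node's input, in list order.
function wireConsecutive(nodes: BlueprintNode[]): Wire[] {
  const wires: Wire[] = [];
  let previous: BlueprintNode | undefined;
  for (const node of nodes) {
    if (previous !== undefined) wires.push({ from: previous.id, to: node.id });
    previous = node;
  }
  return wires;
}

const blueprintWires = wireConsecutive(blueprintNodes);
```

Three nodes yield two wires: n1 to n2, and n2 to n3.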
Trigger, credential & tier matrix
What each requirement actually maps to in the orchestrator runtime. Verified against tool-registry.ts, tier-precheck.ts, cron/tick, and the from-blueprint route.
| Concern | How it works here | Where it is enforced |
|---|---|---|
| Default trigger | Manual or webhook — blueprint scheduleCron is null | from-blueprint copies schedule_cron=null into the new row |
| Scheduling | Opt-in: set schedule_cron after forking; cron tick enqueues due runs | Cloudflare Cron Trigger → /api/orchestrator/cron/tick |
| Credentials | Referenced by name (credentialRef), resolved on the paired runner; never stored on the JAD server | tool-registry connector config; runner resolves the secret |
| Runner requirement | Both connectors are runnerOnly; they throw 'requires a paired @jadapps/runner' if run in-browser | categoryHandlers.connector in tool-executor.ts |
| Pro gate | http-request and google-sheets are isPro:true connectors | UI/run affordance + runner pairing |
| Tier precheck | precheckWorkflowTier blocks only nodes whose minTier exceeds the user tier; these connectors carry no minTier, so they do NOT trip the pro_media precheck | tier-precheck.ts (tool.minTier ?? 'free') |
| Fork | Copies the graph to a private draft owned by the caller, snapshots v1, writes a 'forked' audit event | POST /api/orchestrator/workflows/from-blueprint |
| Run trace | Per-node status/duration/summary persisted on the run record | WorkflowRunTrace[] in types.ts; onStep callbacks in workflow-runner.ts |
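The tier precheck row is easier to follow as code. The sketch below is an assumption-laden illustration, not the contents of tier-precheck.ts: it assumes a two-step tier ladder (free, pro_media) because those are the only tier names the table mentions, and ToolMeta and blockedByTier are hypothetical names. The one detail taken from the table is the tool.minTier ?? 'free' fallback.

```typescript
// Assumed ladder; the real product may define more tiers.
type Tier = "free" | "pro_media";

const TIER_RANK: Record<Tier, number> = { free: 0, pro_media: 1 };

// Hypothetical registry metadata shape.
interface ToolMeta {
  slug: string;
  isPro?: boolean;
  minTier?: Tier;
}

// Return the slugs whose minTier exceeds the user's tier.
// Tools without a minTier are treated as 'free' and never block.
function blockedByTier(tools: ToolMeta[], userTier: Tier): string[] {
  return tools
    .filter((tool) => TIER_RANK[tool.minTier ?? "free"] > TIER_RANK[userTier])
    .map((tool) => tool.slug);
}

const chainTools: ToolMeta[] = [
  { slug: "http-request", isPro: true },
  { slug: "pdf-to-text" },
  { slug: "google-sheets", isPro: true },
];

const blockedForFreeUser = blockedByTier(chainTools, "free");
```

Because none of the three tools carries a minTier, the list of blocked nodes is empty even for a free-tier user; isPro is handled separately and plays no part in this check.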
google-sheets node — appendRows config
The outbound config fields used by node 3. Source: tool-registry.ts google-sheets entry.
| Field | Value for this workflow | Notes |
|---|---|---|
| action | appendRows | Outbound; the registry default for action is readRange (inbound), so set this explicitly |
| spreadsheetId | your sheet ID (e.g. 1AbCdE…) | Empty in the blueprint — you fill it after forking |
| range | Sheet1!A:Z | A1 range; appendRows appends after the last populated row in this range |
| valueInputOption | USER_ENTERED | Parses dates/numbers/formulas; use RAW to keep literal strings |
| credentialRef | google-prod (your runner credential) | OAuth2 with Sheets scope, resolved on the runner |
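To show how these fields could map onto a Google Sheets API v4 spreadsheets.values.append call, here is a minimal sketch. Whether the runner builds its request this way is an assumption; AppendRowsConfig and buildAppendRequest are hypothetical names, and the OAuth token (resolved from credentialRef on the runner) is deliberately left out.

```typescript
// Hypothetical config shape mirroring the table above.
interface AppendRowsConfig {
  spreadsheetId: string;
  range: string;
  valueInputOption: "USER_ENTERED" | "RAW";
}

// Build the HTTP request for spreadsheets.values.append.
// The Authorization header is omitted; the runner would add it.
function buildAppendRequest(config: AppendRowsConfig, rows: string[][]) {
  if (config.spreadsheetId.trim() === "") {
    throw new Error("spreadsheetId is empty; fill it in after forking");
  }
  const url =
    "https://sheets.googleapis.com/v4/spreadsheets/" +
    encodeURIComponent(config.spreadsheetId) +
    "/values/" +
    encodeURIComponent(config.range) +
    ":append?valueInputOption=" +
    config.valueInputOption;
  return { method: "POST", url, body: JSON.stringify({ values: rows }) };
}

const appendRequest = buildAppendRequest(
  {
    spreadsheetId: "1AbCdEfGhIjKlMnOpQrStUvWxYz",
    range: "Sheet1!A:Z",
    valueInputOption: "USER_ENTERED",
  },
  [["2026-04-30", "Invoice 8842", "129.50"]],
);
```

The range is percent-encoded because A1 ranges contain a colon, and tab names may contain spaces or quotes.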
Cookbook
These are real, copy-paste-ready configurations for the three nodes. Field names match the tool registry exactly, so what you paste into a node's config is what the runner executes. Because the blueprint ships with empty spreadsheetId/url and an inbound readRange default on the Sheets node, the recipes below show the deltas you must set after forking.
Minimal: one PDF URL → append to a tab
Example: The smallest working setup. http-request fetches the PDF, pdf-to-text extracts it, google-sheets appends. Set only what the blueprint leaves blank.

    http-request:
      method: GET
      url: https://files.example.com/2026-04-invoice.pdf
      timeoutMs: 60000
    pdf-to-text: {}  # defaults
    google-sheets:
      action: appendRows
      spreadsheetId: 1AbCdEfGhIjKlMnOpQrStUvWxYz
      range: Sheet1!A:Z
      valueInputOption: USER_ENTERED
      credentialRef: google-prod

Authenticated source PDF (bearer token)
Example: When the PDF sits behind an API, store the token as a runner credential and let http-request inject auth from it. The credential name is resolved on the runner, never sent to the JAD server.

    http-request:
      method: GET
      url: https://api.vendor.com/v1/invoices/8842/pdf
      credentialRef: vendor-api  # bearer/token resolved on the runner
      headers: {"Accept":"application/pdf"}
      timeoutMs: 120000

Write into a specific tab instead of Sheet1
Example: Point the range at a named tab. appendRows appends after the last populated row within the given A1 range. A tab name that contains spaces needs single quotes in A1 notation, so the whole value is wrapped in double quotes here.

    google-sheets:
      action: appendRows
      spreadsheetId: 1AbCdEfGhIjKlMnOpQrStUvWxYz
      range: "'April Invoices'!A:Z"
      valueInputOption: USER_ENTERED
      credentialRef: google-prod

Keep raw strings (no date/number coercion)
Example: Statement reference numbers like 0044012 lose their leading zeros under USER_ENTERED. Switch to RAW so cells stay literal.

    google-sheets:
      action: appendRows
      spreadsheetId: 1AbCdEfGhIjKlMnOpQrStUvWxYz
      range: Statements!A:Z
      valueInputOption: RAW  # literal strings, no coercion
      credentialRef: google-prod

Schedule it after forking (opt-in cron)
Example: The blueprint is manual by default. To run nightly, set schedule_cron on your forked workflow; the Cloudflare cron tick enqueues a run when it is due since last_fired_at.

    # On your forked workflow row:
    schedule_cron: "0 6 * * *"  # every day at 06:00
    # cron/tick then:
    #  - reads workflows where schedule_cron is not null
    #  - skips rows not due since last_fired_at
    #  - inserts a workflow_run_queue row (source: cron)
    #  - stamps last_fired_at and wakes the runner

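The "due since last_fired_at" decision could work along the lines of the sketch below. It is a simplification under stated assumptions, not the cron/tick implementation: it understands only daily expressions of the form "minute hour * * *", evaluates them in UTC, and the names parseDailyCron and isDue are hypothetical.

```typescript
// Parse only the daily "minute hour * * *" form; anything else returns null.
function parseDailyCron(expr: string): { minute: number; hour: number } | null {
  const parts = expr.trim().split(/\s+/);
  if (parts.length !== 5) return null;
  const [minute, hour, dayOfMonth, month, dayOfWeek] = parts;
  if (dayOfMonth !== "*" || month !== "*" || dayOfWeek !== "*") return null;
  const m = Number(minute);
  const h = Number(hour);
  if (!Number.isInteger(m) || !Number.isInteger(h)) return null;
  if (m < 0 || m > 59 || h < 0 || h > 23) return null;
  return { minute: m, hour: h };
}

// A workflow is due when a scheduled instant has passed since it last fired.
function isDue(expr: string, lastFiredAt: Date | null, now: Date): boolean {
  const daily = parseDailyCron(expr);
  if (daily === null) throw new Error("unsupported cron expression in this sketch: " + expr);

  // Most recent scheduled instant at or before `now` (UTC assumed).
  let slot = Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate(),
    daily.hour,
    daily.minute,
  );
  if (slot > now.getTime()) slot -= 24 * 60 * 60 * 1000;

  return lastFiredAt === null || lastFiredAt.getTime() < slot;
}

const dueNow = isDue(
  "0 6 * * *",
  new Date("2026-04-01T06:00:30Z"),
  new Date("2026-04-02T06:05:00Z"),
);
```

In this example the workflow last fired just after 06:00 on 1 April, and the tick runs at 06:05 on 2 April, so the 06:00 slot on 2 April is still unserved and the run is due.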
Edge cases and verbatim errors
No runner paired (run from the browser)
Error: Both connectors are runnerOnly. The browser fallback for the connector category throws "<slug> is a connector and requires a paired @jadapps/runner. Configure one in the orchestrator status pill." Pair a runner before running this workflow.
credentialRef missing or unknown on the runner
Fail: If google-sheets references a credential name the runner doesn't have, the appendRows call has no OAuth token and the node errors. The node turns red in the trace with the failure message; the run aborts under the default abort error policy.
Source URL returns 404 / non-PDF
404: http-request streams whatever the URL returns. A 404 page or HTML body still flows to pdf-to-text, which then fails to parse a PDF. The error surfaces on the pdf-to-text node. Validate the URL, or set the node's error policy to retry for transient 5xx/timeouts.
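One way to validate a download before it reaches pdf-to-text is to check for the PDF header bytes. The sketch below is a suggestion only; the document does not describe any such check in the runner, and looksLikePdf is a hypothetical helper. It relies on the fact that PDF files start with the ASCII marker %PDF-.

```typescript
// Return true when the buffer starts with the ASCII bytes "%PDF-".
function looksLikePdf(bytes: Uint8Array): boolean {
  const magic = [0x25, 0x50, 0x44, 0x46, 0x2d]; // % P D F -
  if (bytes.length < magic.length) return false;
  return magic.every((byte, index) => bytes[index] === byte);
}

// "%PDF-1.7" versus "<html"
const pdfHead = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x37]);
const htmlHead = new Uint8Array([0x3c, 0x68, 0x74, 0x6d, 0x6c]);

const pdfOk = looksLikePdf(pdfHead);
const htmlOk = looksLikePdf(htmlHead);
```

This check is strict about the header sitting at byte 0. Some PDF readers tolerate a few junk bytes before it, so treat a negative result as a hint to inspect the response rather than as proof.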
Empty or image-only PDF (no extractable text)
Warning: pdf-to-text returns an empty (or near-empty) string for scanned/image-only PDFs; it does no OCR. The chain still completes and appendRows may add a blank row. For scanned documents, swap in the OCR-capable PDF path before this chain.
Wrong Sheets action (blueprint default is readRange)
Invalid: The google-sheets registry default for action is readRange (inbound). If you forget to switch it to appendRows, the node reads instead of writes and nothing is appended. Always set action explicitly after forking.
spreadsheetId left blank
Fail: The blueprint ships spreadsheetId: "". Running without filling it in means the Sheets API call has no target spreadsheet and the node errors. Fill spreadsheetId (and confirm the credential has access to it) before the first run.
USER_ENTERED mangles IDs and codes
Warning: USER_ENTERED parses values like a human typing into a cell: 0044 becomes 44, long account numbers go to scientific notation, and date-like strings shift. Switch valueInputOption to RAW when you need exact strings.
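If you want to decide between USER_ENTERED and RAW programmatically, a pre-flight heuristic along these lines may help. It is a rough sketch, not part of the google-sheets connector: the function names are hypothetical, it covers only leading zeros and long digit runs (not date-like strings), and the 12-digit cut-off is an assumed threshold you should tune for your data.

```typescript
// Assumed cut-off for "long" all-digit values; adjust for your data.
const LONG_DIGIT_THRESHOLD = 12;

// True when a value is likely to change if Sheets parses it as a number.
function wouldBeCoerced(value: string): boolean {
  const v = value.trim();
  const leadingZeroNumber = /^0\d+$/.test(v);
  const longDigitRun = /^\d+$/.test(v) && v.length >= LONG_DIGIT_THRESHOLD;
  return leadingZeroNumber || longDigitRun;
}

// Recommend RAW as soon as any cell looks like an ID or code.
function pickValueInputOption(rows: string[][]): "RAW" | "USER_ENTERED" {
  const risky = rows.some((row) => row.some(wouldBeCoerced));
  return risky ? "RAW" : "USER_ENTERED";
}

const statementOption = pickValueInputOption([["0044012", "Acme Ltd"]]);
const invoiceOption = pickValueInputOption([["42", "Acme Ltd"]]);
```

The first call recommends RAW because 0044012 has a leading zero; the second keeps USER_ENTERED.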
http-request timeout on a large PDF
Error: Default timeoutMs is 60000. A slow host or a large PDF can exceed it and the request errors. Raise timeoutMs (max 600000) on the http-request node, and consider setting its error policy to retry; the runner retries up to 3 times with linear backoff.
Manual workflow never fires on its own
OK: scheduleCron is null in the blueprint, so a forked copy will only run when you trigger it (manually or by webhook). This is expected; it is not a misconfiguration. Add a schedule_cron if you want unattended runs.
Run completes; trace shows each node done
OK: On success the trace records each node as done with its durationMs and a summary (e.g. page count for pdf-to-text, append result for google-sheets). The output_summary captures step count, success count, and total duration on the run record.
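The relationship between the per-node trace and the output_summary can be illustrated as follows. Only durationMs and summary are field names taken from the text; the remaining fields, the OutputSummary shape, and summarizeRun are hypothetical, and the durations and summaries are made-up sample values.

```typescript
// Hypothetical trace entry; the real WorkflowRunTrace in types.ts may differ.
interface WorkflowRunTrace {
  nodeId: string;
  status: "done" | "error";
  durationMs: number;
  summary: string;
}

// Hypothetical summary shape: step count, success count, total duration.
interface OutputSummary {
  stepCount: number;
  successCount: number;
  totalDurationMs: number;
}

function summarizeRun(trace: WorkflowRunTrace[]): OutputSummary {
  return {
    stepCount: trace.length,
    successCount: trace.filter((step) => step.status === "done").length,
    totalDurationMs: trace.reduce((total, step) => total + step.durationMs, 0),
  };
}

// Sample values for illustration only.
const exampleTrace: WorkflowRunTrace[] = [
  { nodeId: "http-request", status: "done", durationMs: 840, summary: "fetched PDF" },
  { nodeId: "pdf-to-text", status: "done", durationMs: 310, summary: "4 pages" },
  { nodeId: "google-sheets", status: "done", durationMs: 450, summary: "appended rows" },
];

const exampleSummary = summarizeRun(exampleTrace);
```

For the sample trace this gives three steps, three successes, and a total of 1600 ms.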
Frequently asked questions
What exactly does this workflow chain?
Three nodes in order: http-request (GET a PDF by URL), pdf-to-text (extract the text on the runner), then google-sheets with action: appendRows (append the result to a tab). That is the literal blueprint chain.
Does it ingest a whole folder of PDFs at once?
Not as shipped. The blueprint chain fetches a single PDF per run via http-request. To process many PDFs you trigger the workflow per file (e.g. by webhook), or add a for-each loop node around the chain yourself after forking.
Are my PDFs uploaded to JAD's servers?
No. PDF text extraction runs on your paired runner. The connectors are runnerOnly. The only data sent off-machine is the extracted output going to Google Sheets over OAuth.
Where are my Google credentials stored?
On the runner. The google-sheets node references a credential by name (credentialRef, e.g. google-prod); the runner resolves the OAuth2 token at run time. The token is not stored on the JAD/Cloudflare server.
Do I need a paid plan to run this?
Both http-request and google-sheets are Pro connectors (isPro: true) and require a paired runner. Note this is separate from the media tier-precheck: precheckWorkflowTier only blocks nodes whose minTier exceeds your tier, and these connectors carry no minTier, so they do not trip the pro_media gate.
Is this workflow cron-scheduled out of the box?
No — the blueprint sets scheduleCron: null, so a fork runs manually or by webhook. Scheduling is opt-in: set a schedule_cron on your forked workflow and the Cloudflare cron tick will enqueue runs when they are due.
How does forking work?
Forking calls the from-blueprint route, which copies the three-node chain into a new private draft owned by you, wires consecutive ports, snapshots it as version 1, and writes a forked audit event tagged with the blueprint slug.
Can I see a run history / trace?
Yes. Each run records a WorkflowRunTrace[] — per-node status, duration, output size, and a one-line summary — plus an output_summary with step/success counts and total duration. The run panel renders each node pending → running → done/error as the orchestrator fires its onStep callbacks.
What happens if a PDF has no tables or no text?
pdf-to-text returns an empty or near-empty string (it does not OCR scanned images). The chain still completes; appendRows may add a blank row. For scanned PDFs, run an OCR step before this chain.
Why did numbers like account IDs change in my sheet?
valueInputOption defaults to USER_ENTERED, which parses values like a human typing — leading zeros drop and long numbers go scientific. Switch the node to RAW to keep cells as literal strings.
What if the source URL is slow or flaky?
Raise timeoutMs on the http-request node (default 60000, max 600000) and set its error policy to retry. The runner retries a failing step up to 3 times with linear backoff (200ms, 400ms) before surfacing the error.
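A retry policy with linear backoff can be sketched like this. It follows one reading of the answer above (three attempts in total, with waits of 200 ms and 400 ms between them); the runner's actual implementation is not shown in this document, and linearBackoffDelays and withRetry are hypothetical names.

```typescript
// Delays between attempts: stepMs, 2 * stepMs, ... (one fewer than attempts).
function linearBackoffDelays(maxAttempts: number, stepMs: number): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt < maxAttempts; attempt++) {
    delays.push(attempt * stepMs);
  }
  return delays;
}

// Run the task, waiting between failed attempts; rethrow the last error.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
  stepMs = 200,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  const delays = linearBackoffDelays(maxAttempts, stepMs);
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      const delay = delays[attempt];
      if (delay !== undefined) await sleep(delay); // no wait after the final attempt
    }
  }
  throw lastError;
}

const defaultDelays = linearBackoffDelays(3, 200);
```

The sleep parameter is injectable so the wait can be replaced with a no-op in tests.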
What other workflows are like this one?
See csv-to-slack-summary for a data-clean-then-connector pattern, rss-to-notion-digest for a fetch-then-create-page chain, and video-transcode-to-r2 for a local-process-then-upload chain.
Local-first by design
This workflow executes entirely on your jadapps-runner. API keys, database credentials, and OAuth tokens are stored in an AES-GCM-encrypted vault on your device — they are never uploaded to JAD Apps' servers. The server only stores the workflow graph (the recipe), not the secrets.