How to cover card numbers and long account runs on PDF invoices
- Step 1: Open the canonical PDF PII redactor. This Security entry routes to the real engine at /pdf-tools/pdf-pii-redactor. It is Pro-tier (minTier: pro); Free accounts can't run this redactor.
- Step 2: Upload one invoice or receipt PDF with a text layer. Drop a single file (acceptsMultiple: false). Invoices from accounting software, e-commerce order PDFs, and bank statements are usually born-digital with a real text layer. A photographed receipt is image-only and won't be scanned.
- Step 3: Let the scanner walk every page. pdfjs reads each page's getTextContent() items; pdf-lib loads the same document. For each item the four patterns run in order (email, phone, SSN, card), and the first match flags the item.
- Step 4: Card runs get boxed. When a 13–16 digit run matches, a black rectangle is drawn at the item's x/y, spanning its full width and height (plus 2 pt). Spaced and dashed card formats all match because the pattern allows the separators.
- Step 5: Download the redacted invoice. The result is saved as a new PDF blob. Page count and the rest of the layout are preserved; only black boxes are added over matched runs. Amounts, dates, and line items stay visible.
- Step 6: Flatten or rasterise before archiving. Critical for PCI: open the file, press Ctrl+A, and copy. If the digits still paste out, they're still in the stream. Flatten (/pdf-tools/pdf-flatten) or rasterise (/pdf-tools/pdf-to-image-strip) so the boxes become pixels, then archive or send.
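Steps 3 and 4 can be sketched as a per-item loop. This is a minimal illustration, not the tool's source: the `TextItem` shape, the `scanItems` name, and all four regexes are hypothetical stand-ins for the real PII_PATTERNS. Only the order (email, phone, SSN, card), the first-match rule, and the one-box-per-item behaviour come from the steps above. Applying the 2 pt as padding on every side is also an assumption.

```typescript
// Hypothetical shape: one positioned text run, roughly what a text-layer scan yields.
interface TextItem { str: string; x: number; y: number; width: number; height: number; }
interface Box { x: number; y: number; width: number; height: number; kind: string; }

// Stand-in patterns (assumptions), listed in the order the steps describe.
const PATTERNS: ReadonlyArray<{ kind: string; re: RegExp }> = [
  { kind: "email", re: /[\w.+-]+@[\w-]+\.[\w.-]+/ },
  { kind: "phone", re: /\b\d{3}[ .-]\d{3}[ .-]\d{4}\b/ },
  { kind: "ssn", re: /\b\d{3}-\d{2}-\d{4}\b/ },
  { kind: "card", re: /\b(?:\d[ -]?){12,15}\d\b/ },
];

const PAD = 2; // points; assumption: applied on every side of the item

function scanItems(items: TextItem[]): Box[] {
  const boxes: Box[] = [];
  for (const item of items) {
    // The first pattern that matches flags the item; later patterns are not consulted.
    const hit = PATTERNS.find((p) => p.re.test(item.str));
    if (!hit) continue;
    // One box per flagged item, covering the whole item rather than the matched run.
    boxes.push({
      x: item.x - PAD,
      y: item.y - PAD,
      width: item.width + 2 * PAD,
      height: item.height + 2 * PAD,
      kind: hit.kind,
    });
  }
  return boxes;
}

const demoBoxes = scanItems([
  { str: "Paid with card: 4002 8812 3456 7890", x: 72, y: 700, width: 200, height: 12 },
  { str: "Total: $1,240.00", x: 72, y: 680, width: 90, height: 12 },
]);
console.log(demoBoxes.length); // 1: only the card line is flagged
```

The box geometry is derived from the item, not from the match, which is why surrounding words in the same item end up covered as well.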
The card pattern in detail (and what it over-catches)
The exact card-number rule from PII_PATTERNS: a run of 13–16 digits with optional single spaces or dashes between them. There is no Luhn check and no card-brand logic, so anything in that length range matches.
| Number on the document | Digit count | Boxed? | Why |
|---|---|---|---|
| Visa-style 4002 8812 3456 7890 | 16 | Yes | 16 digits with spaces; matches the card pattern |
| Amex-style 3712 345678 90123 | 15 | Yes | 15 digits, in range; matches |
| Purchase order 4002881234567890 | 16 | Yes (false positive) | No Luhn check, so any 16-digit run is boxed |
| Masked **** **** **** 0042 | 4 (only digits) | No | Only 4 real digits, below the 13-digit floor |
| 19-digit account number | 19 | No | Above the 13–16 range — not matched |
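The rows above can be reproduced with a regex of roughly this shape. The exact expression in PII_PATTERNS is not quoted on this page, so `CARD_RE` below is an assumed equivalent of the stated rule (13–16 digits, each optionally followed by a single space or dash), and `isBoxed` and `countDigits` are hypothetical helper names.

```typescript
// Assumed equivalent of the card rule: 12-15 "digit + optional separator" groups,
// then one final digit, bounded so a longer solid digit run does not match.
const CARD_RE = /\b(?:\d[ -]?){12,15}\d\b/;

function countDigits(s: string): number {
  return s.replace(/\D/g, "").length;
}

function isBoxed(text: string): boolean {
  return CARD_RE.test(text);
}

const rows = [
  "4002 8812 3456 7890",  // 16 digits, spaced
  "3712 345678 90123",    // 15 digits
  "4002881234567890",     // 16-digit purchase order
  "**** **** **** 0042",  // masked stub
  "4002881234567890123",  // 19-digit account number
];

for (const row of rows) {
  console.log(`${row} | ${countDigits(row)} digits | boxed: ${isBoxed(row)}`);
}
```

Under this assumed pattern the first three rows match and the last two do not, which lines up with the table. Nothing here validates the number, so the purchase order is treated exactly like a card.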
Redaction behaviour — what it does vs. does NOT do
"Visual" means a filled rectangle is drawn over the digits; the characters underneath are not removed. This is the table that matters for PCI sign-off.
| Behaviour | Reality in this tool | Why it matters for invoices |
|---|---|---|
| Redaction method | Filled black drawRectangle over the matched item (pdf-lib) | Real ink, visible everywhere — but an overlay, not a deletion |
| Text removal | Not removed — digits stay in the content stream | Ctrl+A → copy recovers the PAN until you flatten — do NOT archive before that |
| Luhn / brand check | None — any 13–16 digit run matches | POs, account IDs, and tracking numbers get boxed too (safe over-redaction) |
| Masked numbers | **** 0042 has only 4 real digits — not matched | Already-masked stubs are left alone, which is fine |
| Granularity | Whole text item, not the exact run | If the card is inside a longer line, the line is boxed |
| Options | None (needsOptions: false) | You can't redact "cards only" or change the colour |
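The Ctrl+A and copy check can be made less error-prone by scanning the pasted text for card-length runs instead of eyeballing it. A minimal sketch, assuming you paste the copied text into a string yourself; `remainingCardRuns` is a hypothetical helper and `CARD_RUN` is an assumed approximation of the tool's card rule, not its actual pattern.

```typescript
// Assumed approximation of the card rule, with the global flag so every run is returned.
const CARD_RUN = /\b(?:\d[ -]?){12,15}\d\b/g;

// Returns every card-length digit run still present in text copied out of a PDF.
function remainingCardRuns(pastedText: string): string[] {
  return pastedText.match(CARD_RUN) || [];
}

// Text copied from a boxed but not yet flattened invoice: the digits are still there.
const beforeFlatten = "Paid with card: 4002 8812 3456 7890 Total: $1,240.00";
console.log(remainingCardRuns(beforeFlatten)); // [ '4002 8812 3456 7890' ]

// Text copied from a rasterised copy: nothing selectable remains.
const afterRasterise = "";
console.log(remainingCardRuns(afterRasterise)); // []
```

An empty result after flattening or rasterising is the signal you want before archiving; any non-empty result means the overlay is still the only thing hiding the number.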
Tier and file limits (PDF family)
Gated at Pro (minTier: pro); runs through the PDF tool family. One file at a time.
| Tier | Max file size | Max pages | Files per run |
|---|---|---|---|
| Free | Tool gated — Pro required | — | — |
| Pro | 50 MB | 500 pages | 5 (this tool: 1 at a time) |
| Pro-media | 500 MB | 2,000 pages | 50 (this tool: 1 at a time) |
| Developer | 2 GB | 10,000 pages | Unlimited (this tool: 1 at a time) |
Cookbook
Before/after snippets from invoice, order, and statement layouts. Card and account numbers are fabricated. "Before" is the page text; "After" shows the boxed result — and what copy/paste still recovers until you flatten.
A spaced card number on an order PDF
Born-digital order confirmation with a full text layer. The 16-digit card with spaces matches the card pattern and is boxed. The masked stub below it is left alone.
Before (page text):
  Paid with card: 4002 8812 3456 7890
  Card on file: **** **** **** 0042
  Total: $1,240.00
After (what the viewer shows):
  Paid with card: █████████████████████
  Card on file: **** **** **** 0042
  Total: $1,240.00
The full PAN was boxed; the masked stub (4 digits) was not. Verify with Ctrl+A -> copy, then flatten.
A purchase-order number gets boxed too
No Luhn check means a 16-digit PO is indistinguishable from a card to the pattern. It's boxed — over-redaction in the safe direction, but glance at the output so you don't hide a reference you needed.
Before:
  PO Number: 4002881234567890
  Card: 5555 4444 3333 1111
  Amount: $980.00
After:
  PO Number: ████████████████
  Card: ███████████████████
  Amount: $980.00
Both 16-digit runs matched the card pattern. The PO is a false positive (no Luhn), but safe.
Card inside a longer line is fully covered
Redaction is per text item. If pdfjs returns the card inside a longer run, the whole run is boxed — including the surrounding words in that item.
Before (single pdfjs item):
  'Charged 4111 1111 1111 1111 on 2026-05-02 by Acme'
After:
  '████████████████████████████████████████████████'
The whole item is boxed because it contained a card match, so the date and merchant in that item got covered too.
A 19-digit account number is missed
The card pattern caps at 16 digits. A 19-digit account or reference number is above the range and isn't matched — it stays in the clear. Split or reformat it, or box it manually.
Before:
  Account: 4002881234567890123   <- 19 digits, NOT boxed
  Card: 4002 8812 3456 7890      <- 16 digits, boxed
After:
  Account: 4002881234567890123   <- still visible
  Card: █████████████████████
Mitigation: box the long account line manually with /security-tools/signature-burner.
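To find the numbers that slip past the 16-digit cap, you can scan the page text for longer runs before deciding what to box by hand. This is a sketch under assumptions: `missedLongRuns` is a hypothetical helper, it looks only at solid (unseparated) runs of 17 or more digits, and it is not part of the redactor.

```typescript
// Solid digit runs of 17 or more: above the card range, so the redactor leaves them visible.
const LONG_RUN = /\b\d{17,}\b/g;

function missedLongRuns(pageText: string): string[] {
  return pageText.match(LONG_RUN) || [];
}

const statementText = "Account: 4002881234567890123 Card: 4002 8812 3456 7890";
console.log(missedLongRuns(statementText)); // [ '4002881234567890123' ]
```

Each run it reports is a candidate for a manual rectangle; the spaced 16-digit card is not reported because the redactor already covers it.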
Photographed receipt with no text layer
A snapped receipt saved as PDF is just an image — no text items. The redactor finds nothing and returns the file unchanged. OCR first, or box the card region by hand.
Input: receipt_photo.pdf (image-only)
Scan result: 0 text items -> 0 matches -> 0 boxes
Output: identical pages, no redactions.
Fix path:
  1. Run OCR via /pdf-tools/pdf-ocr to add a text layer
  2. Re-run this redactor, OR
  3. Burn manual rectangles with /security-tools/signature-burner
Edge cases and what actually happens
Boxed card number is still copy-pasteable
By design (visual only): The tool draws a rectangle over the digits but does not delete them. The code comment notes "the glyphs underneath are still in the file's content stream." Ctrl+A → copy recovers the PAN. For PCI you must flatten (/pdf-tools/pdf-flatten) or rasterise (/pdf-tools/pdf-to-image-strip) so the boxes become pixels, then re-verify with copy-paste before archiving.
Purchase-order / account number flagged as a card
Over-redaction: The card pattern matches any 13–16 digit run and runs no Luhn check, so POs, account IDs, and tracking numbers in that range get boxed. It hides rather than leaks, which makes it a safe false positive, but review the output if you needed those references readable.
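For context on what "no Luhn check" means: the Luhn checksum is the standard mod-10 test that real card numbers satisfy and most arbitrary digit runs do not. The redactor does not run it. The sketch below only illustrates the check, for example to triage which boxed numbers were plausibly cards when reviewing your own source data; `passesLuhn` is a hypothetical helper name.

```typescript
// Luhn mod-10 check: walking from the rightmost digit, double every second digit,
// subtract 9 from any doubled value above 9, and require the total to end in 0.
function passesLuhn(candidate: string): boolean {
  const digits = candidate.replace(/\D/g, ""); // ignore spaces and dashes
  if (digits.length === 0) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = digits.charCodeAt(digits.length - 1 - i) - 48; // '0' is char code 48
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

console.log(passesLuhn("4111 1111 1111 1111")); // true: a well-known test number
console.log(passesLuhn("4002881234567890"));    // false: the fabricated PO fails the checksum
```

A pattern that added this check would leave the PO visible, at the cost of also leaving visible any mistyped or fabricated card number that fails the checksum. The length-only rule trades precision for never under-redacting within its range.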
19-digit (or longer) number is not boxed
Missed match: The card pattern caps at 16 digits. Long account or reference numbers above that range fall through and stay in the clear. Reformat them or box them manually with signature-burner.
Already-masked number (**** 0042) is left alone
Expected: A masked stub like **** **** **** 0042 has only 4 real digits, below the 13-digit floor, so it isn't matched. That's correct, since there's nothing sensitive to hide. No action needed.
Whole line is boxed, not just the digits
Expected: Redaction is one box per matched text item. If the card sits inside a longer run (Charged 4111... on 2026-05-02), the whole run is covered ("one redaction box per item is enough"). Usually fine, but it can hide a date or merchant in that item.
Card split across two text items
Missed match: Regexes run per item. If the PDF engine split the card across two runs (4002 8812 then 3456 7890), each fragment is fewer than 13 digits and neither matches. Spot-check; flatten + re-OCR can re-flow the number into one item.
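The split-item miss is easy to see in isolation. In this sketch `CARD_RE` is an assumed approximation of the card rule and the helper names are hypothetical. Joining adjacent items is shown only as a way to spot-check for split numbers, not as something the redactor does.

```typescript
// Assumed approximation of the card rule (13-16 digits, optional single separators).
const CARD_RE = /\b(?:\d[ -]?){12,15}\d\b/;

// What a per-item scan sees: each fragment is tested on its own.
function matchesPerItem(items: string[]): boolean[] {
  return items.map((s) => CARD_RE.test(s));
}

// Spot-check aid: test the items joined with a space, as a reader would see them.
function matchesWhenJoined(items: string[]): boolean {
  return CARD_RE.test(items.join(" "));
}

const fragments = ["4002 8812", "3456 7890"]; // one card split across two text items
console.log(matchesPerItem(fragments));    // [ false, false ]
console.log(matchesWhenJoined(fragments)); // true
```

When the joined check is true but no single item matches, the number was split by the PDF's text layout and needs a manual box or a re-flowed text layer.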
Scanned / photographed receipt produces no redactions
No matches: Detection reads the text layer via pdfjs. An image-only receipt has zero text items and nothing is boxed. Add a text layer with PDF OCR first, then re-run, or box the card region with signature-burner.
Encrypted / password-protected statement
Loaded with ignoreEncryption: The redactor loads with ignoreEncryption: true, so many lightly-protected statements open and process. Strongly encrypted files that pdfjs/pdf-lib can't parse error out before scanning. Remove the password first with pdf-remove-password, then redact.
Free tier can't run this tool
Pro required: Gated at minTier: pro. On Free the run is blocked before processing. Pro allows up to 50 MB / 500 pages; Developer raises that to 2 GB / 10,000 pages. One file at a time.
Frequently asked questions
Does this make the invoice PCI-safe by itself?
Not on its own. The tool draws a black box over each card run, but the digits stay in the PDF content stream — the code comment says so. Ctrl+A → copy recovers the PAN. To satisfy 'do not store the full number in the clear', flatten (/pdf-tools/pdf-flatten) or rasterise (/pdf-tools/pdf-to-image-strip) the output so the boxes become pixels, then verify with copy-paste.
How does it decide what's a card number?
The card pattern matches any run of 13–16 digits with optional single spaces or dashes between them. There's no Luhn check and no brand logic — so a real Visa, Amex, or Mastercard matches, and so does any 13–16 digit purchase-order or account number. It's a length-based heuristic, not card validation.
Why was my purchase-order number blacked out?
Because it's 13–16 digits and the pattern has no Luhn check, so it can't tell a PO from a card. It gets boxed: a false positive in the safe direction (it hides, doesn't leak). Glance at the result if you needed the PO readable. A box can't be removed from the redacted output once it is drawn, so keep the original file if you need that reference visible.
Will it catch a card with no spaces or dashes?
Yes. The card pattern allows the separators to be absent, so a solid 13–16 digit run (4002881234567890) matches just as well as a spaced one. The constraint is the digit count, not the formatting.
What about a longer account number, like 19 digits?
It's missed. The pattern caps at 16 digits, so a 19-digit account or reference number is above the range and stays in the clear. Reformat it or box it manually with signature-burner.
Does an already-masked number get boxed?
No. A stub like **** **** **** 0042 has only 4 real digits — below the 13-digit floor — so it isn't matched. That's correct; there's nothing sensitive left to hide on a masked number.
Does the invoice get uploaded?
No. pdfjs reads the pages, pdf-lib draws the boxes, and the result is saved locally — entirely in your browser. Financial PDFs and their card data never leave your device.
Can I redact only card numbers and leave emails alone?
No. The tool has no options (needsOptions: false) — all four patterns always run (email, phone, SSN, card). On an invoice that usually helps (it covers a billing contact block too), but you can't restrict it to cards only.
Does it work on a photographed or scanned receipt?
No. Detection reads the PDF text layer via pdfjs; an image-only receipt has zero text items and nothing is boxed. Run PDF OCR to add a text layer first, then re-run, or box the card region with signature-burner.
My statement is password-protected — will it still work?
Often yes. The redactor loads with ignoreEncryption: true, so many lightly-protected statements open and process. Strongly encrypted files that the libraries can't parse will error before scanning — remove the password first with pdf-remove-password, then redact.
What file size and page limits apply?
Gated at Pro. Pro allows up to 50 MB and 500 pages; Pro-media 500 MB / 2,000 pages; Developer 2 GB / 10,000 pages. Free accounts can't run this redactor. One file at a time.
My card data is in a CSV export, not a PDF — what then?
Use csv-json-data-scrambler for structured rows or email-phone-scrubber for pasted text — both genuinely replace values with [REDACTED_*] labels rather than covering them. For a sealed archive of the originals, aes-256-encryptor encrypts the file offline.
Privacy first
Every JAD Security operation runs entirely in your browser. Files, passwords, and PGP private keys never leave your device — verified by zero outbound network requests during processing.