How to verify external URL reachability in Markdown
- Step 1: Open the Link Validator. Open md-link-validator. It has no options; external-URL reachability is part of the single validation pass.
- Step 2: Drop the doc whose external links you want verified. Paste or drop one .md/.mdx/.markdown file. Docs that cite many third-party sources are the highest-value targets for a reachability sweep.
- Step 3: Run and read the External URLs section. The count line shows how many unique external URLs were found; the ## External URLs section lists each with ✓ or ✗ (unreachable or CORS-blocked). Only the first 50 are probed.
- Step 4: Sort ✗ into rot vs. CORS. Open each ✗ in a tab. A DNS error, parked domain, or connection failure is real rot; a page that loads normally was a CORS false positive. Note which is which.
- Step 5: Refresh the genuinely dead URLs. Replace confirmed-dead links with current equivalents or an archived snapshot, and update your source document.
- Step 6: Re-verify periodically. External links die continuously. Re-run the sweep on long-lived docs on a regular cadence so dead vendor links are caught before readers report them.
External-URL probe behaviour
Exactly how each external URL is tested, verified against validateLinks() in markdown-processor.ts.
| Aspect | Behaviour |
|---|---|
| What is extracted | Every bare http(s)://... substring via (?:https?:\/\/)[^\s)>"']+, de-duplicated with a Set |
| How it is probed (browser) | fetch(url, { mode: "no-cors", signal }) — the response is opaque |
| Per-URL timeout | 5 seconds via an AbortController; an abort counts as ✗ |
| How many are probed | The first 50 unique URLs; the rest show ... and N more |
| Pass marker | ✓ <url> — the fetch resolved without throwing |
| Fail marker | ✗ <url> (unreachable or CORS-blocked) — the fetch threw or timed out |
| What it cannot read | HTTP status code, redirect destination, or page content (all opaque under no-cors) |
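The behaviour in the table can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in markdown-processor.ts: the names extractExternalUrls, probeUrl, and FetchLike are hypothetical, and the fetch implementation is passed in as a parameter so the sketch can be exercised without a network. Only the regex, the Set de-duplication, the no-cors mode, and the 5-second abort come from the table.

```typescript
// Regex from the table; the global flag is added so match() returns every hit.
const EXTERNAL_URL = /(?:https?:\/\/)[^\s)>"']+/g;

// Hypothetical helper: unique external URLs in first-seen order.
function extractExternalUrls(markdown: string): string[] {
  const matches = markdown.match(EXTERNAL_URL) ?? [];
  return Array.from(new Set(matches));
}

// Minimal shape of the fetch call used here (an assumption for this sketch).
type FetchLike = (
  url: string,
  init: { mode: "no-cors"; signal: AbortSignal },
) => Promise<unknown>;

// Hypothetical probe: true maps to the ✓ line, false to the ✗ line.
async function probeUrl(
  url: string,
  fetchImpl: FetchLike,
  timeoutMs = 5000,
): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // Under no-cors the response is opaque, so only "did it throw?" is observable.
    await fetchImpl(url, { mode: "no-cors", signal: controller.signal });
    return true;
  } catch {
    // Network error, CORS rejection, or abort after timeoutMs.
    return false;
  } finally {
    clearTimeout(timer);
  }
}
```

In a browser, the global fetch would be passed as fetchImpl. Because the result is a plain boolean, a 404 page and a healthy page are indistinguishable here, which matches the last row of the table.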
Interpreting each external-URL result
A result is a reachability signal, not a status verdict. Confirm the ambiguous ones.
| Result | Means | Caveat |
|---|---|---|
| ✓ | Host reachable; request did not throw | Could still be a 404/410/500 page; status is unreadable |
| ✗ | Request threw or timed out | Could be a live site that blocks CORS; verify in a tab |
| ... and N more | More than 50 unique URLs found | Those beyond 50 were not probed this run |
Cookbook
External-link reachability scenarios with the exact report lines, plus how to separate real rot from CORS noise.
Verifying a list of vendor links
A doc that cites several SaaS vendors. Most resolve; one vendor's old domain is gone.
Input:

    - Stripe docs: https://stripe.com/docs
    - Old vendor: https://acme-saas-2018.test/api

Report:

    # Link Validation Report
    Checked 2 external URLs and 0 relative links.
    > Note: Due to browser CORS restrictions, only DNS/network reachability can be tested, not HTTP status codes.
    ## External URLs
    ✓ https://stripe.com/docs
    ✗ https://acme-saas-2018.test/api (unreachable or CORS-blocked)
A repeated URL is verified once
The same documentation URL is cited five times. The Set collapses it so the count line says 1 and there is a single probe.
Input: the link https://example.com/api appears 5 times.

Report:

    Checked 1 external URLs and 0 relative links.
    ## External URLs
    ✓ https://example.com/api
Telling CORS noise from real rot
Two ✗ entries: one is a dead domain, one is a healthy site that blocks cross-origin requests. Only the dead one needs fixing.
Report:

    ## External URLs
    ✗ https://truly-gone.test/page (unreachable or CORS-blocked)
    ✗ https://healthy-but-strict.com/docs (unreachable or CORS-blocked)

Manual check:

    truly-gone.test -> DNS error in browser tab -> FIX
    healthy-but-strict -> page loads fine in tab -> CORS false positive, keep
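For this kind of triage it can help to pull the ✗ URLs out of the downloaded report so each one can be opened in a tab. A minimal sketch, assuming report lines have exactly the shape shown in the cookbook entries; failedUrls is a hypothetical helper and not part of the tool.

```typescript
// Hypothetical helper: collect URLs from lines shaped like
// "✗ <url> (unreachable or CORS-blocked)".
function failedUrls(report: string): string[] {
  const failed: string[] = [];
  for (const line of report.split(/\r?\n/)) {
    // Capture the first whitespace-delimited token after the ✗ marker.
    const match = /^\s*✗\s+(\S+)/.exec(line);
    if (match) failed.push(match[1]);
  }
  return failed;
}
```

The helper only lists candidates. Deciding which are real rot and which are CORS false positives still requires the manual check described above.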
A 404 page that still passes reachability
The host is up but the path returns 404. Because the status is unreadable, the probe reports ✓ — the validator's known limitation for external verification.
Input:

    Moved page: https://example.com/removed-article

Report:

    ## External URLs
    ✓ https://example.com/removed-article

Reminder: ✓ means the host responded, not that the page exists. Spot-check key URLs in a tab.
Doc with more than 50 external URLs
A heavily-referenced literature review has 64 unique external URLs. Only 50 are verified per run.
Report (tail):

    ✓ https://example.org/49
    ✗ https://example.org/50 (unreachable or CORS-blocked)
    ... and 14 more

To verify all 64, split the doc with md-splitter and run each part.
Edge cases and what actually happens
Live site that blocks CORS shows ✗
False positive: A healthy site that rejects cross-origin browser requests throws on probe and is reported ✗ (unreachable or CORS-blocked). There is no way to distinguish it from a dead host client-side. Open every ✗ in a tab before treating it as rot.
Reachable host returning 404 shows ✓
Blind spot: The probe only learns whether the request threw, not the HTTP status. A live server returning 404 on the path still shows ✓. For status-level external verification, use a server-side crawler; this tool verifies reachability only.
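Status-level checking needs a request made outside the browser's CORS rules. The sketch below shows one way that could look in a server-side runtime with a standard fetch (for example Node 18 or later). It is not part of md-link-validator; checkStatus, StatusFetch, StatusResult, and the "status below 400 counts as alive" rule are all assumptions for illustration.

```typescript
// Minimal response and fetch shapes this sketch relies on (assumptions).
type StatusResponse = { status: number; url: string };
type StatusFetch = (
  url: string,
  init: { method: "HEAD"; redirect: "follow" },
) => Promise<StatusResponse>;

interface StatusResult {
  url: string;
  status: number | null; // null when the request itself failed
  finalUrl: string | null; // where redirects ended up, if known
  alive: boolean;
}

// Hypothetical server-side check: unlike a no-cors probe, the status is readable.
async function checkStatus(url: string, fetchImpl: StatusFetch): Promise<StatusResult> {
  try {
    const res = await fetchImpl(url, { method: "HEAD", redirect: "follow" });
    return { url, status: res.status, finalUrl: res.url, alive: res.status < 400 };
  } catch {
    // DNS failure, refused connection, and similar network errors.
    return { url, status: null, finalUrl: null, alive: false };
  }
}
```

A call such as checkStatus("https://example.com/removed-article", fetch) would surface the 404 that the browser probe cannot see. Some servers reject HEAD requests, so a real crawler would likely fall back to GET for those hosts.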
Probe times out at 5 seconds
Timeout: Each URL is aborted at 5000 ms. A slow or throttled but healthy endpoint can report ✗ on one run. Re-run, and confirm in a tab if an important URL keeps timing out. The timeout is fixed.
Only 50 external URLs probed
Capped: The probe loop slices to the first 50 unique URLs; the rest show ... and N more and are not tested. Split heavily-referenced docs with md-splitter to verify every external link across runs.
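The cap amounts to a slice plus an overflow line. A minimal sketch, assuming the URL list has already been de-duplicated; planProbes and PROBE_LIMIT are hypothetical names, and only the limit of 50 and the wording of the overflow line come from the text.

```typescript
const PROBE_LIMIT = 50; // cap stated in the text

// Hypothetical helper: split the unique URL list into the probed part
// and the "... and N more" line for the remainder.
function planProbes(uniqueUrls: string[]): { toProbe: string[]; overflowLine: string | null } {
  const toProbe = uniqueUrls.slice(0, PROBE_LIMIT);
  const skipped = uniqueUrls.length - toProbe.length;
  return {
    toProbe,
    overflowLine: skipped > 0 ? `... and ${skipped} more` : null,
  };
}
```

With 64 unique URLs this leaves 14 untested, which is why splitting the document is the only way to cover all of them.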
Redirected URL
Reachable: The browser fetch may follow redirects internally, but the opaque response means the validator cannot tell you where it landed or whether the final page is the one you intended. A ✓ on a redirecting URL does not confirm the destination.
Rate-limited or bot-protected host
Unreachable: Sites with aggressive rate limiting or bot protection may reject or stall the probe, producing ✗. This is a probing artefact, not necessarily a dead link. Verify important hosts manually.
Non-http(s) schemes (mailto:, tel:, ftp:)
Not probed: Only http:// and https:// URLs match the external-URL regex. A mailto: or tel: target written as [x](mailto:...) falls into the relative pass and gets a ? format mark; it is never network-tested.
Bare URL with trailing punctuation
Reachable / suspicious: The regex stops at whitespace, ), >, or quotes, but a trailing . or , in prose (e.g. https://example.com.) is captured as part of the URL and may cause a spurious ✗. Wrap URLs in <...> or [text](url) to bound them cleanly.
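The capture behaviour is easy to reproduce with the regex quoted in the probe table. In the sketch below the sample sentence and the trimTrailingPunctuation helper are assumptions; the validator itself does no such trimming, so bounding the URL in the source remains the actual fix.

```typescript
// Regex quoted in the probe table, with the global flag added for match().
const EXTERNAL_URL = /(?:https?:\/\/)[^\s)>"']+/g;

const prose = "See https://example.com. Or use <https://example.com>.";
const captured = prose.match(EXTERNAL_URL) ?? [];
// captured[0] keeps the sentence-ending period;
// captured[1] is cut cleanly at the closing ">".

// Hypothetical clean-up for your own pre-processing: strip trailing . , ; : ! ?
function trimTrailingPunctuation(url: string): string {
  return url.replace(/[.,;:!?]+$/, "");
}
```

Note that the two captures differ by one character, so the Set treats them as two distinct URLs and both are probed.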
Transient connection drop mid-run
Unreachable: If your own connection blips during the sweep, healthy URLs can momentarily show ✗. Re-run on a stable connection before refreshing any links.
Input exceeds the free tier limit
413 rejected: Free runs cap at 1 MB / 500,000 characters per file (the character cap is independent of byte size). Oversized input is rejected. Pro raises this to 10 MB / 5,000,000 characters.
Frequently asked questions
What exactly does 'verify' mean here?
Reachability: a browser fetch to each external URL with a 5-second timeout. ✓ means the request resolved without throwing; ✗ means it threw or timed out. Because of CORS, the HTTP status code, redirect target, and page content are all unreadable.
Will a rate-limited site report as broken?
It can. Aggressive rate limiting or bot protection may reject or stall the probe, producing ✗ (unreachable or CORS-blocked). Cross-check important links by opening them in a browser tab.
Can I exclude certain domains from the check?
No. The validator has no options — there is no domain allowlist or denylist. Every extracted external URL up to the first 50 is probed. Remove or comment out links you do not want tested before running.
Does it check both HTTP and HTTPS?
Yes — the regex matches both http:// and https://. Both are probed the same way. As a cleanup, prefer the https:// variant of any link where one exists.
How many external URLs are verified per run?
Up to 50 unique URLs. Beyond that you get a ... and N more line and those are not probed. Split a heavily-referenced doc with md-splitter to verify all of them across runs.
Does a ✓ mean the page is definitely fine?
No. ✓ only means the host responded without the request throwing. A reachable server returning 404 on that path still shows ✓. Spot-check key URLs in a tab, and use a server-side crawler for status-level verification.
Does it follow redirects?
The browser fetch may follow redirects internally, but the opaque no-cors response means the tool cannot report the final destination or status. A ✓ on a redirecting link does not confirm where it ended up.
Why did a bare URL in prose report ✗?
The regex captures up to whitespace, ), >, or a quote, so trailing punctuation like a sentence-ending . can be pulled into the URL and break the probe. Bound URLs with <...> or [text](url) to avoid this.
Are relative links verified too?
Not by reachability. Relative [text](path) links get a format check only (✓/?) and are never network-tested. For a relative-path-focused sweep, see check-relative-paths-markdown.
Is my document uploaded?
No. The Markdown is parsed in your browser. The external-URL probes are outbound requests from your browser to those URLs (that is how reachability is tested), but the document text itself is never uploaded to a JAD server.
Are duplicate URLs probed repeatedly?
No. The URL list is de-duplicated with a Set, so a link cited many times is verified once and listed once. The count line reflects unique URLs.
What is the output format?
A plain-text # Link Validation Report downloaded as <name>-link-report.txt, with a ## External URLs section (each ✓/✗) and, when present, a ## Relative Links (format check only) section.
Privacy first
All Markdown processing runs locally in your browser using JavaScript. No file is ever uploaded to JAD Apps servers — only metadata counters are saved for signed-in dashboard stats.