How to extract a pdf schedule or timetable table to json
- Step 1Open the tool and drop the schedule PDF — Load the programme or timetable into the PDF Table to JSON tool. It extracts immediately in the browser — no options to set.
- Step 2Confirm the event columns in the preview — Check the first 20 objects: keys should be the schedule's real columns (time, session, room, speaker). If a day banner or title became the keys, see the edge cases.
- Step 3Download the JSON array — Save
<name>.json— a flat array of event objects across every page of the schedule. - Step 4Merge wrapped session rows and drop day banners — A long session title that wrapped becomes an extra row with an empty time cell; merge it onto the row above. Day headers ('Tuesday') come through as near-empty rows — capture them as a
datefield, then filter them out. - Step 5Normalise dates and times to ISO 8601 — Combine the captured date with each event's printed time and convert to ISO 8601 (
2026-06-02T14:30:00). A library like day.js or Luxon makes the timezone and AM/PM handling explicit and testable. - Step 6Build .ics or POST to a calendar API — Calendar apps want iCal (.ics), not raw JSON — generate it from the normalised events (e.g. with ical-generator), or POST the events to a booking/calendar API.
Schedule fields: how each extracts and what to finish
The generic row/column extraction applied to a typical programme, with the schedule-specific cleanup your code does.
| Schedule element | What you get | Your normalise step |
|---|---|---|
| Time / time range | String as printed ("9:00–10:30", "14:30") | Parse to start/end ISO 8601 datetimes |
| Date / day banner | Often a near-empty row ("Tuesday" then blanks) | Capture as the current date, then filter the row |
| Session / title | String; long titles may wrap to a second row | Merge continuation rows (empty time = belongs above) |
| Room / track / speaker | Own keys when columnar | Map to your event fields |
| Grid timetable cell | Flattened by visual row, not one-per-cell | Reshape rows×columns into individual events |
| Reprinted page header | Becomes a data row from page 2 onward | Filter r.Time !== 'Time' |
Tier limits for schedule PDFs
Most programmes and timetables are small; multi-day conference books can run long.
| Tier | Max file size | Max pages |
|---|---|---|
| Free | 2 MB | 50 |
| Pro | 50 MB | 500 |
| Pro + Media | 500 MB | 2,000 |
| Developer | 2 GB | 10,000 |
Cookbook
A real conference programme extraction and the steps that turn printed times into calendar-ready events.
A day's programme with a day banner
Note 'Tuesday 2 June' becomes a near-empty row, and the long keynote title wraps to a second row with an empty time.
PDF:
Time Session Room
Tuesday 2 June
09:00–10:00 Opening keynote: the road
ahead for the platform Main Hall
10:15–11:00 Workshop A Room 2
Downloaded JSON:
[
{ "Time": "Tuesday 2 June", "Session": "", "Room": "" },
{ "Time": "09:00–10:00", "Session": "Opening keynote: the road", "Room": "" },
{ "Time": "", "Session": "ahead for the platform", "Room": "Main Hall" },
{ "Time": "10:15–11:00", "Session": "Workshop A", "Room": "Room 2" }
]Carry the date down and merge wrapped titles
Walk the rows once: remember the current day banner, attach it to following events, and fold continuation rows (empty time) into the event above.
const rows = JSON.parse(json);
const events = [];
let date = null;
for (const r of rows) {
if (/\d/.test(r.Time) && !/^\d{1,2}[:.]/.test(r.Time) === false) {} // (illustrative)
if (/^[A-Za-z]+ \d/.test(r.Time) && !r.Session) { date = r.Time; continue; } // day banner
if (!r.Time && events.length) { // wrapped continuation
events[events.length - 1].Session += " " + r.Session;
if (r.Room) events[events.length - 1].Room = r.Room;
continue;
}
events.push({ date, ...r });
}Normalise a printed time range to ISO 8601
Combine the carried date with the printed time range and produce real start/end datetimes — explicit timezone, no guessing.
import dayjs from "dayjs";
function toISO(date, time) { // date "Tuesday 2 June", time "09:00–10:00"
const [start, end] = time.split(/[–-]/).map(s => s.trim());
const d = dayjs(date.replace(/^[A-Za-z]+ /, "") + " 2026", "D MMMM YYYY");
const mk = t => d.hour(+t.split(":")[0]).minute(+t.split(":")[1]).second(0);
return { start: mk(start).toISOString(), end: mk(end).toISOString() };
}
toISO("Tuesday 2 June", "09:00–10:00");
// { start: "2026-06-02T09:00:00.000Z", end: "2026-06-02T10:00:00.000Z" }Generate an .ics calendar file
Calendars import iCal, not JSON. Build .ics from the normalised events so attendees can subscribe.
import ical from "ical-generator";
const cal = ical({ name: "Conference 2026" });
for (const e of normalisedEvents) {
cal.createEvent({
start: new Date(e.start),
end: new Date(e.end),
summary: e.Session,
location: e.Room,
});
}
fs.writeFileSync("programme.ics", cal.toString());Reshape a grid timetable into per-cell events
A class timetable with days across the top flattens by row. Pivot it into one event per (time-slot, day) cell.
// rows keyed by { Time, Mon, Tue, Wed, Thu, Fri }
const days = ["Mon","Tue","Wed","Thu","Fri"];
const slots = JSON.parse(json).filter(r => r.Time && r.Time !== "Time");
const events = slots.flatMap(r =>
days.filter(d => r[d]).map(d => ({ day: d, time: r.Time, subject: r[d] }))
);Edge cases and what actually happens
Times and dates come out as text, not Date objects
ExpectedEvery value is a string, including "9:00 AM" and "2 June". The tool never parses temporal values. Convert to ISO 8601 in your code with a date library so timezone and AM/PM handling is explicit and testable.
Day banner becomes a near-empty row
Capture then filterA 'Tuesday 2 June' header that spans the row width arrives as a row with mostly empty cells. Capture it as the current date and attach it to following events, then drop the banner row (see the cookbook walk-through).
Session title wrapped onto two lines
Split rowLong titles wrap to a second visual line, which becomes a separate row with an empty time cell. Merge continuation rows into the event above — an empty time column is the tell-tale that a row is a continuation.
Grid timetable flattens by row
Reshape neededA timetable with days across the top and time slots down the side extracts one object per time-slot row (with a key per day), not one event per cell. Pivot it in your code to get individual events (the cookbook shows the flatMap).
Repeated header on a multi-page programme
By designFrom page 2 onward the reprinted column header is emitted as a data row. Filter r.Time !== 'Time' (or your first column) before building events.
Scanned printed programme
Empty arrayA scanned booklet has no text layer, so extraction returns nothing. Run PDF OCR first to add a text layer, then extract — and double-check times, since OCR can confuse 0/O and 1/l.
Time range merges with the session text
MisalignedIf the time and the session sit very close horizontally, position-based grouping can merge them into one cell. Inspect the preview; split a "09:00 Opening keynote" cell with a leading-time regex in post-processing.
Programme exceeds the free page limit
BlockedA thick multi-day conference book can exceed 50 pages. Upgrade to Pro (500 pages) or extract just the schedule pages with PDF Extract Pages before running the tool.
Calendar app rejects the raw JSON
Wrong formatCalendars import iCal (.ics), not JSON. The tool's job is to get the events into structured data; convert that to .ics (e.g. with ical-generator) before importing into Google Calendar, Outlook, or Apple Calendar.
Frequently asked questions
Do dates and times come out as text or as typed values?
As text — every value is a string, including "9:00 AM" and "2 June". The tool never parses temporal values, so the timezone and format decisions stay in your code. Convert to ISO 8601 with a date library (day.js, Luxon) after extracting; the cookbook shows a worked example.
What happens with a schedule that spans multiple pages?
All pages' rows are combined into one flat array, so a multi-page programme comes through as one continuous list of events — convenient. The only thing to handle is the reprinted column header on page 2+, which appears as a data row; filter it out (r.Time !== 'Time').
Can I import the JSON straight into Google Calendar?
Not directly — Google Calendar (and Outlook, Apple Calendar) import iCal (.ics), not JSON. Normalise the events to ISO 8601, then generate .ics with a library like ical-generator. The cookbook includes both steps.
How do I handle the day headers like 'Tuesday'?
They come through as near-empty rows. Walk the array once, remember the most recent day banner, and attach it as a date field to the events that follow, then drop the banner rows. This gives each event a full date to combine with its printed time.
My timetable is a grid (days across the top) — what do I get?
One object per time-slot row, with a key for each day column. To get individual events you pivot that into one event per (slot, day) cell — a short flatMap, shown in the cookbook. The extraction itself can't know a grid is a grid; it reads visual rows.
Why are long session titles split across two rows?
Rows are defined by vertical position, so a title that wraps to a second line becomes its own row with an empty time cell. Merge continuation rows (empty time means it belongs to the event above) when building your events.
Is the schedule uploaded anywhere?
No. Extraction runs entirely in your browser via PDF.js. Unpublished programmes and internal rosters never leave your device; only anonymous usage counters are recorded when you're signed in.
My programme is a scanned booklet — will it work?
Not as-is: a scan has no text layer, so you'll get an empty array. Run PDF OCR first, then extract, and verify the times because OCR can misread digits like 0/O and 1/l.
Can I keep room and speaker columns separate?
Yes, when they're distinct columns in the PDF, each becomes its own key (Room, Speaker). If they're crammed into one cell, you'll split them in post-processing. Check the preview to see how the columns landed before writing your import.
What are the size and page limits?
Free: 2 MB and 50 pages. Pro: 50 MB / 500 pages. Pro + Media: 500 MB / 2,000 pages. Developer: 2 GB / 10,000 pages. For a long conference book, upgrade or extract just the schedule pages first with PDF Extract Pages.
Can I extract a fillable booking form instead of a printed table?
If the data is in AcroForm fields rather than a printed grid, use the PDF Form Field Extractor, which reads field names and values directly. Use this table tool for printed timetables and programmes.
Why does the preview only show 20 events?
The on-page preview caps at the first 20 objects to stay responsive and reports the total count. The downloaded .json contains every event — the cap is display-only.
Privacy first
All PDF processing runs locally in your browser using PDF-lib and pdf.js. No file is ever uploaded — only metadata counters are saved for signed-in dashboard stats.