Break a PDF into overlapping, sentence-aware text chunks with a configurable token target — ready for RAG pipelines, vector databases, and LLM context windows.
Upload the PDF to chunk
Set the target chunk size in tokens (default ~500) and overlap
Download chunks as JSON with page ranges and token estimates
Free is enough for most one-off jobs. Pro raises the file and batch caps; Pro + Media unlocks GB-scale streaming and unlimited duration.
Larger files are supported on Developer (5 GB CSV) and Enterprise (unlimited). All processing happens in your browser; files never reach a server.
0 bytes uploaded. PDF to Text Chunks runs entirely in your browser using pdf-lib and pdfjs-dist, so your file stays on your device at all times and no data is sent to any server.
RAG stands for Retrieval-Augmented Generation, a technique where an LLM retrieves relevant document chunks and uses them as context before generating an answer. The chunks this tool produces are what you'd load into a vector database for that retrieval step.
Chunks target a configurable token count (default ~500) and split on sentence boundaries rather than mid-sentence, with a configurable overlap between consecutive chunks so context isn't lost at the seams. Each chunk records its page range and an estimated token count. Note: this is sentence-aware, not embedding-based topic segmentation.
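The chunking behaviour described above can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not the tool's actual implementation: the ~4 characters per token estimate, the regex sentence splitter, the overlap measured in whole sentences, and the field names `text`, `page_start`, `page_end`, and `token_estimate` are all hypothetical choices made for illustration.

```python
import re


def estimate_tokens(text):
    # Assumption: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)


def make_chunk(items):
    # items is a list of (page_number, sentence) pairs.
    text = " ".join(sentence for _, sentence in items)
    return {
        "text": text,
        "page_start": items[0][0],
        "page_end": items[-1][0],
        "token_estimate": estimate_tokens(text),
    }


def chunk_pages(pages, target_tokens=500, overlap_sentences=1):
    # Tag every sentence with the 1-based page it came from.
    # Naive splitter: break after ., ! or ? followed by whitespace.
    tagged = []
    for page_no, page_text in enumerate(pages, start=1):
        for sentence in re.split(r"(?<=[.!?])\s+", page_text):
            if sentence.strip():
                tagged.append((page_no, sentence.strip()))

    chunks, current, size = [], [], 0
    for page_no, sentence in tagged:
        n = estimate_tokens(sentence)
        # Close the chunk before it would exceed the target,
        # always at a sentence boundary.
        if current and size + n > target_tokens:
            chunks.append(make_chunk(current))
            # Carry the last sentence(s) forward as overlap.
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
            size = sum(estimate_tokens(s) for _, s in current)
        current.append((page_no, sentence))
        size += n
    if current:
        chunks.append(make_chunk(current))
    return chunks
```

Calling `chunk_pages(["First page text.", "Second page text."], target_tokens=500)` returns a list of dictionaries, one per chunk. Because the sketch splits each page separately, a sentence that runs across a page break is treated as two fragments; a fuller version would join page text before splitting.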
The chunking is sentence-aware with overlap: it respects sentence boundaries and keeps a sliding context window, which is what most RAG pipelines actually need. It does not compute embeddings to detect topic shifts, so it's not 'semantic' in the embedding sense. For most retrieval use cases the difference doesn't matter; if you need topic-boundary detection, post-process the chunks with your own embedding model.
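One possible way to do that post-processing is to compare the embeddings of neighbouring chunks and flag the places where similarity drops. The sketch below assumes you have already produced one embedding vector per chunk with a model of your choice; the function names and the 0.5 threshold are hypothetical and would need tuning for real data.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as dissimilar to everything.
    return dot / (norm_a * norm_b)


def topic_boundaries(embeddings, threshold=0.5):
    # Return each index i where similarity between chunk i and chunk i + 1
    # falls below the threshold, suggesting a topic shift between them.
    return [
        i
        for i in range(len(embeddings) - 1)
        if cosine_similarity(embeddings[i], embeddings[i + 1]) < threshold
    ]
```

Since consecutive chunks share overlapping sentences, their embeddings will tend to look more similar than the underlying topics are, so a threshold chosen for non-overlapping text may be too low here.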
Extract text from a PDF and format it as clean Markdown with page headers. Perfect for documentation workflows.
Extract all text content from a PDF file. Clean, page-separated plain text output ready for processing.
Detect and extract tables from PDF documents into structured JSON. First row becomes keys, subsequent rows become objects.