Frequently asked questions
Is my text uploaded anywhere?
No. The text is split into chunks entirely in your browser. Nothing you paste is sent anywhere or stored.
What does chunking do, and why use overlap?
It breaks a long document into smaller pieces for retrieval-augmented generation (RAG), so each piece can be embedded and searched on its own. Overlap repeats the end of the previous chunk at the start of the next, so a sentence that straddles a chunk boundary still appears whole in at least one chunk.
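As a rough illustration of the idea, here is a minimal Python sketch of word-based chunking with overlap. It is not this tool's code: the function name chunk_words and its size and overlap parameters are assumptions made for the example.

```python
def chunk_words(text: str, size: int, overlap: int) -> list[str]:
    """Split text into chunks of up to `size` words.

    Each chunk after the first starts with the last `overlap` words
    of the previous chunk. Illustrative sketch only.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and less than size")

    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once the window has reached the end of the text,
        # so no trailing chunk made only of repeated words is produced.
        if start + size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_words("a b c d e f g h i j", size=4, overlap=1):
        print(chunk)
```

With size 4 and overlap 1, the last word of each chunk is repeated as the first word of the next.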
What can I split by?
Tokens, characters, words or sentences. Tokens are usually best for fitting an embedding model's limit, while sentences keep chunks readable and avoid cutting mid-sentence.
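To show why sentence splitting avoids mid-sentence cuts, here is a small Python sketch that packs whole sentences into chunks under a character budget. The regular expression is a naive splitter that will mishandle abbreviations such as "e.g.", and the names split_sentences, chunk_sentences and max_chars are hypothetical, not taken from this tool.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_sentences(text: str, max_chars: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept whole as its own
    chunk rather than being cut. Illustrative sketch only.
    """
    chunks = []
    current = ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "One two. Three four! Five six? Seven."
    for chunk in chunk_sentences(sample, max_chars=20):
        print(chunk)
```

Every chunk boundary falls between sentences, which is the trade-off against token splitting: chunks stay readable but vary in size.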
How accurate is the token splitting?
It uses the GPT-4o tokenizer. There is no public tokenizer for Claude or Gemini, so for those and most other models the count is a close approximation rather than an exact figure. It is accurate enough for planning chunk sizes.
How do I get the chunks out?
"Copy all as JSON" gives an array of strings ready to drop into code, and "Copy all as text" gives the chunks separated by a divider. Both include every chunk, not just the ones shown.
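For example, the copied JSON could be loaded in Python roughly like this. The sample string, the load_chunks helper and the record layout are assumptions for illustration; the only thing taken from the answer above is that the JSON is an array of strings.

```python
import json


def load_chunks(raw: str) -> list[str]:
    """Parse pasted JSON and check that it is an array of strings."""
    data = json.loads(raw)
    if not isinstance(data, list) or not all(isinstance(c, str) for c in data):
        raise ValueError("expected a JSON array of strings")
    return data


if __name__ == "__main__":
    # Hypothetical clipboard contents, made up for this example.
    pasted = '["First chunk of text.", "Second chunk of text."]'
    chunks = load_chunks(pasted)

    # Pair each chunk with an index, e.g. as input to an embedding step.
    records = [{"id": i, "text": c} for i, c in enumerate(chunks)]
    print(records)
```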