Question 1

Is my text uploaded anywhere?

Accepted Answer

No. The text is split into chunks entirely in your browser. Nothing you paste is sent anywhere or stored.

Question 2

What does chunking do, and why use overlap?

Accepted Answer

It breaks a long document into smaller pieces for retrieval-augmented generation (RAG), so each piece can be embedded and searched on its own. Overlap repeats the last part of the previous chunk at the start of the next, so a sentence split across a boundary is not lost.
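
As an illustration of the overlap idea, here is a minimal sketch of a sliding-window splitter. It is not the tool's own code: the function name `chunk_with_overlap` and its `size` and `overlap` parameters are assumptions made for this example, and the units can be words, characters or token ids.

```python
def chunk_with_overlap(units, size, overlap):
    """Split a list of units into windows of at most `size` units.

    Each window starts `size - overlap` units after the previous one,
    so the last `overlap` units of one chunk open the next chunk.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(units), step):
        chunks.append(units[start:start + size])
        if start + size >= len(units):
            break  # this window already reached the end of the input
    return chunks


words = "the quick brown fox jumps over the lazy dog".split()
for chunk in chunk_with_overlap(words, size=4, overlap=1):
    print(" ".join(chunk))
# the quick brown fox
# fox jumps over the
# the lazy dog
```

With an overlap of one word, "fox" and "the" each appear in two neighbouring chunks, which is the repetition described above.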

Question 3

What can I split by?

Accepted Answer

Tokens, characters, words or sentences. Tokens are usually best for staying within an embedding model's input limit, while sentences keep chunks readable and avoid cutting mid-sentence.
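
To show what sentence-based splitting can look like, here is a rough sketch that packs whole sentences into chunks under a character budget. The helper names `split_sentences` and `pack_sentences`, the `max_chars` parameter and the simple punctuation rule are all assumptions for this example; the tool may detect sentence boundaries differently.

```python
import re


def split_sentences(text):
    # Naive rule: a sentence ends at ".", "!" or "?" followed by whitespace.
    # Abbreviations such as "e.g." will be split too; this is only a sketch.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def pack_sentences(text, max_chars):
    """Group whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept whole rather than cut.
    """
    chunks, current = [], ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)  # adding this sentence would overflow
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


print(pack_sentences("One. Two three! Four five six? Seven.", max_chars=15))
# ['One. Two three!', 'Four five six?', 'Seven.']
```

No chunk ends in the middle of a sentence, at the cost of chunk sizes that vary more than with a fixed token or character count.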

Question 4

How accurate is the token splitting?

Accepted Answer

It uses the GPT-4o tokenizer. Since there is no public tokenizer for Claude or Gemini, counts for those models are approximate, but they are close for most models and accurate enough for planning chunk sizes.

Question 5

How do I get the chunks out?

Accepted Answer

Copy all as JSON gives an array of strings ready to drop into code, and copy all as text gives the chunks separated by a divider. Both include every chunk, not just the ones shown.
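
As a small example of using the JSON output in code, the sketch below parses a pasted array of strings and attaches a positional id to each chunk. The `copied` string is a stand-in for the clipboard contents, and the `id` and `text` field names are assumptions for this example, not something the tool produces.

```python
import json

# Stand-in for the clipboard contents; real chunks come from the tool.
copied = '["First chunk of text.", "Second chunk of text."]'

chunks = json.loads(copied)
if not (isinstance(chunks, list) and all(isinstance(c, str) for c in chunks)):
    raise ValueError("expected a JSON array of strings")

# Attach a positional id so each chunk can be traced back after retrieval.
records = [{"id": i, "text": text} for i, text in enumerate(chunks)]
print(records[0])
# {'id': 0, 'text': 'First chunk of text.'}
```

From here each record's text can be passed to whichever embedding step the pipeline uses.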

RAG text chunker

Frequently asked questions