Question 1

Are my PDFs uploaded anywhere?

Accepted Answer

No. The PDF is read and its text is extracted entirely inside your browser, on your device. The file is never sent to a server.

Question 2

Does it work on scanned PDFs?

Accepted Answer

Yes, with OCR turned on. Normal digital PDFs (made from Word, Google Docs, print to PDF and so on) carry a real text layer, so their text comes out instantly. A scanned PDF is just an image of text with no text layer. For those, turn on "Use OCR for scanned pages" and the text is recognized on your device. The OCR engine is downloaded once (a few MB) and then cached. OCR is slower than text-layer extraction and is optimized for major languages, so accuracy is best on common scripts and clear scans.
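As a rough illustration of the difference between the two kinds of page, the sketch below flags pages whose text layer is effectively empty and would therefore need OCR. The function names and the `minChars` threshold are hypothetical and chosen for this example; this is not necessarily how the tool itself decides.

```typescript
// Hypothetical helper: treat a page as "scanned" when its text layer
// contains almost no visible characters.
function needsOcr(textLayer: string, minChars: number = 10): boolean {
  // Remove whitespace so blank lines or stray spaces do not count as text.
  const visible = textLayer.replace(/\s+/g, "");
  return visible.length < minChars;
}

// Return the 1-based numbers of the pages that would be sent to OCR.
function pagesNeedingOcr(pageTexts: string[], minChars: number = 10): number[] {
  const result: number[] = [];
  pageTexts.forEach((text, index) => {
    if (needsOcr(text, minChars)) {
      result.push(index + 1);
    }
  });
  return result;
}
```

For example, given one digital page followed by two pages with an empty or whitespace-only text layer, `pagesNeedingOcr` would report pages 2 and 3.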

Question 3

Will the layout and formatting be kept?

Accepted Answer

Only partly. The tool pulls out the words and line breaks, but extraction is essentially linear, so complex multi-column layouts, tables and heavy formatting can come out in an awkward reading order. You can clean up the result in the editable box before saving.
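To see why a linear pass can scramble a multi-column page, consider the small sketch below. It assumes each text fragment carries an `x` and `y` position (with `y` measured downward from the top of the page); the `Fragment` type, both functions, and the `columnBoundary` parameter are hypothetical and only illustrate the effect, not the tool's actual algorithm.

```typescript
// A piece of text with its position on the page (y grows downward here).
interface Fragment {
  text: string;
  x: number;
  y: number;
}

// Naive linear order: top to bottom, then left to right within a row.
function linearOrder(fragments: Fragment[]): string[] {
  return [...fragments]
    .sort((a, b) => a.y - b.y || a.x - b.x)
    .map((f) => f.text);
}

// Column-aware order: read everything left of the boundary first,
// then everything to the right of it.
function columnOrder(fragments: Fragment[], columnBoundary: number): string[] {
  const byY = (a: Fragment, b: Fragment) => a.y - b.y;
  const left = fragments.filter((f) => f.x < columnBoundary).sort(byY);
  const right = fragments.filter((f) => f.x >= columnBoundary).sort(byY);
  return [...left, ...right].map((f) => f.text);
}

// Two columns (A and B), two lines each.
const page: Fragment[] = [
  { text: "A1", x: 50, y: 100 },
  { text: "B1", x: 300, y: 100 },
  { text: "A2", x: 50, y: 120 },
  { text: "B2", x: 300, y: 120 },
];

console.log(linearOrder(page));      // ["A1", "B1", "A2", "B2"]
console.log(columnOrder(page, 200)); // ["A1", "A2", "B1", "B2"]
```

The linear order interleaves the two columns line by line, which is the kind of awkward reading order you may need to fix by hand in the editable box.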

Question 4

Which languages are supported?

Accepted Answer

Text-layer extraction works for any language already stored in the PDF. The optional OCR is optimized for major languages and uses English by default, so it is most accurate on common scripts and may struggle with unusual fonts or other writing systems.

Question 5

Can it handle big PDFs?

Accepted Answer

Yes, with the fast text-layer method. OCR is much heavier because every scanned page is analysed on your device, so it can be slow on older phones or with very long scanned documents. Work in smaller batches if needed.
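If you want to plan smaller batches for a long scanned document, the page ranges can be worked out as in the sketch below. `pageBatches` is a hypothetical helper written for this example, and the batch size of 10 is an arbitrary choice rather than a recommendation from the tool.

```typescript
// Split pages 1..totalPages into consecutive [first, last] ranges
// of at most batchSize pages each.
function pageBatches(totalPages: number, batchSize: number): Array<[number, number]> {
  if (!Number.isInteger(totalPages) || totalPages < 0) {
    throw new RangeError("totalPages must be a non-negative integer");
  }
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: Array<[number, number]> = [];
  for (let start = 1; start <= totalPages; start += batchSize) {
    // The last batch may be shorter than batchSize.
    const end = Math.min(start + batchSize - 1, totalPages);
    batches.push([start, end]);
  }
  return batches;
}

console.log(pageBatches(25, 10)); // [[1, 10], [11, 20], [21, 25]]
```

Each range can then be processed as a separate file, so that only a limited number of pages goes through OCR at a time.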

Extract text from a PDF

Frequently asked questions