How text extraction and OCR work

What Stoatify does to every upload so its words become searchable.

3 min read · Updated June 28, 2026


You can search inside a scan because Stoatify extracts the text from every document as soon as it arrives. It picks the right method for each file, so a crisp PDF and a phone photo both end up searchable.

The Processing tab in Stoatify's admin: a Text extraction meter showing documents ready, an Automation meter, and a Failed section reading none
The Processing view: text extraction status across your whole vault.

The right method per file

  • Born-digital PDFs already carry a text layer, so Stoatify reads it directly; no OCR is needed.
  • Scans, photos, and image-only PDFs are run through OCR (optical character recognition) to turn pixels into text.
  • Word, Excel, PowerPoint, and OpenDocument files are rendered to a clean PDF, and their text is read from that.
  • Plain text and CSV are read as they are.

Stoatify also reads barcodes and QR codes it finds on a page, which is how a document can pick up an archive serial number automatically.
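
The per-file choice in the list above can be pictured as a small dispatcher. This is only an illustrative sketch, not Stoatify's actual code: the names choose_method and Method, the extension sets, and the has_text_layer flag are all assumptions made up for the example, and a real system would inspect file contents rather than trust the extension.

```python
from enum import Enum
from pathlib import Path
from typing import Optional


class Method(Enum):
    """Hypothetical labels for the four extraction paths described above."""
    TEXT_LAYER = "read the embedded text layer"
    OCR = "run OCR on the pixels"
    RENDER_TO_PDF = "render to PDF, then read its text"
    READ_AS_IS = "read the file as it is"


# Assumed extension lists, for illustration only.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}
OFFICE_EXTENSIONS = {".docx", ".xlsx", ".pptx", ".odt", ".ods", ".odp"}
PLAIN_EXTENSIONS = {".txt", ".csv"}


def choose_method(filename: str, has_text_layer: bool = False) -> Optional[Method]:
    """Pick an extraction method; return None for formats with no text to find."""
    ext = Path(filename).suffix.lower()
    if ext == ".pdf":
        # Born-digital PDFs carry a text layer; image-only PDFs need OCR.
        return Method.TEXT_LAYER if has_text_layer else Method.OCR
    if ext in IMAGE_EXTENSIONS:
        return Method.OCR
    if ext in OFFICE_EXTENSIONS:
        return Method.RENDER_TO_PDF
    if ext in PLAIN_EXTENSIONS:
        return Method.READ_AS_IS
    return None
```

For example, choose_method("invoice.pdf", has_text_layer=True) selects the text-layer path, while choose_method("receipt.jpg") falls through to OCR.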

The processing states

Each document moves from pending (queued) to ready (text extracted and searchable). A file that genuinely can't be read is marked failed, and a format with no text to find is simply skipped. You can see the live breakdown, and reprocess a failure, under your organization's Admin → Processing.
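
One way to picture these states is as a small state machine. The sketch below is an assumption-laden illustration rather than Stoatify's implementation: the State, ALLOWED, advance, and breakdown names are hypothetical, and the idea that reprocessing a failure puts the document back in pending is a guess about how the reprocess action behaves.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable


class State(Enum):
    PENDING = "pending"   # queued for extraction
    READY = "ready"       # text extracted and searchable
    FAILED = "failed"     # the file could not be read
    SKIPPED = "skipped"   # the format has no text to find


# Assumed transitions: pending resolves to one of three outcomes,
# and reprocessing a failure re-queues it (an assumption).
ALLOWED = {
    State.PENDING: {State.READY, State.FAILED, State.SKIPPED},
    State.FAILED: {State.PENDING},
    State.READY: set(),
    State.SKIPPED: set(),
}


def advance(current: State, target: State) -> State:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


def breakdown(states: Iterable[State]) -> Dict[str, int]:
    """Count documents per state, similar in spirit to the Processing view."""
    return dict(Counter(state.value for state in states))
```

Calling breakdown on a list of document states gives a per-state count, which is roughly the kind of summary the Processing view presents.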

Good to know

OCR runs inside Stoatify: no document bytes are sent to a third-party service to be read. A large scanned PDF may take a few moments to become searchable.

Tip

Reading another language, or an office file? See OCR languages and office formats. Something not turning up? See Why can't I find a document?
