How to make PDFs searchable

16 Apr 2021

 

By Caitlin Burns, DocsCorp Content Manager. 

Many PDFs are created by a process that stores only an image of each page, much like a photograph of the page.


For example, a document that comes from a scanner may be only an image of the page, with no searchable text.


A scanned document like this contains no text information for a user to search, just millions of dots of various colors and shades that together form an image of the page.


There is no immediately obvious way to tell whether a PDF is text-searchable just by looking at it; in a PDF viewer, it usually comes down to trial and error.


If you open a PDF that is not text-searchable, nothing you type into the Find field will be found, and if you try to select text, the entire page is selected instead.


How does OCR software make searchable PDFs?


PDFs that contain only images of pages of text are made searchable by a process called Optical Character Recognition (OCR). OCR software examines the dots on a page and determines which text characters they represent, including the font type, style, and size.
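
The character-recognition step can be illustrated with template matching, one of the simplest classic OCR techniques. Modern OCR engines typically rely on trained models and also estimate font, style, and size, which this sketch does not attempt. Here each character is a tiny grid of dots, and an unknown glyph is labeled with the template it differs from in the fewest dots. The 3x5 bitmaps and the function names are hypothetical.

```python
# Hypothetical 3x5 bitmaps: "1" is an inked dot, "0" is blank paper.
TEMPLATES: dict[str, tuple[str, ...]] = {
    "I": ("111", "010", "010", "010", "111"),
    "L": ("100", "100", "100", "100", "111"),
    "O": ("111", "101", "101", "101", "111"),
    "T": ("111", "010", "010", "010", "010"),
}


def mismatches(a: tuple[str, ...], b: tuple[str, ...]) -> int:
    """Count dots that differ between two equally sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def recognize(glyph: tuple[str, ...]) -> str:
    """Return the template character closest to the scanned glyph."""
    return min(TEMPLATES, key=lambda ch: mismatches(glyph, TEMPLATES[ch]))


# A "T" with one stray dot of scanner noise in its fourth row.
noisy_t = ("111", "010", "010", "110", "010")
print(recognize(noisy_t))  # T
```

The noisy glyph differs from the "T" template in one dot and from "I" in three, so it is read as "T". With poorer scans, more dots are wrong and the nearest template is more often the wrong character, which is one reason image quality affects accuracy.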


The better the image quality, the more accurate the process. Accuracy of 99% is possible for typical scanned typewritten pages. However, handwriting usually cannot be recognized unless it is very clearly written. The OCR process ignores graphics it can’t identify as text.
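
To make the 99% figure concrete, assume accuracy is measured per character (one common convention; the figure above does not say which measure is used): 1 minus the edit distance between the OCR output and the true text, divided by the length of the true text. At that rate, a page of about 2,000 characters would still contain roughly 20 errors. A minimal sketch with hypothetical function names:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Fraction of the reference text that OCR reproduced correctly."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    return max(0.0, 1.0 - levenshtein(reference, ocr_output) / len(reference))


# One misread character ("l" read as "1") in a 23-character line.
acc = character_accuracy("Invoice total: 1,250.00", "Invoice tota1: 1,250.00")
print(f"{acc:.1%}")  # 95.7%
```

The example also shows why short samples can be misleading: a single error in a 23-character line already drops accuracy below 96%.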


OCRing a document does not alter its page images in any way. When you view or print the document afterward, it looks exactly the same: the image retains its graphics, pen marks, signatures, and so on.


Any annotations you added, such as comments and highlighting, also remain on the page as before.


In some cases, the OCR software must approximate the font size, type, and style because it cannot identify the exact font the document was created with. When that happens, the text you select or find may not line up precisely with the image of the text, although the OCR software can usually match it very closely.
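
One way an OCR tool can keep approximated text aligned with the image is to draw each recognized word as invisible text (PDF text render mode 3, set with the `Tr` operator) and stretch it with horizontal scaling (the `Tz` operator, a percentage of normal width) so its width matches the word's box on the page image. This is a common technique, not a description of any particular product. In the sketch below, the substitute font's glyph widths, the `/F1` font resource name, and the function names are hypothetical; widths follow the PDF convention of thousandths of the font size.

```python
# Hypothetical advance widths (1/1000 of the font size) for a substitute font.
SUBSTITUTE_WIDTHS = {"H": 722, "e": 556, "l": 222, "o": 556}
DEFAULT_WIDTH = 500  # fallback for characters missing from the table


def natural_width(text: str, font_size: float) -> float:
    """Width the word would occupy in the substitute font, unscaled."""
    return sum(SUBSTITUTE_WIDTHS.get(ch, DEFAULT_WIDTH) for ch in text) / 1000 * font_size


def horizontal_scale(text: str, font_size: float, box_width: float) -> float:
    """Percentage for the Tz operator that stretches the word to fill box_width."""
    width = natural_width(text, font_size)
    return 100.0 * box_width / width if width else 100.0


def invisible_word(text: str, font_size: float, x: float, y: float, box_width: float) -> str:
    """Content-stream snippet that draws one invisible, width-fitted word."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    scale = horizontal_scale(text, font_size, box_width)
    return f"BT /F1 {font_size} Tf 3 Tr {scale:.1f} Tz {x} {y} Td ({escaped}) Tj ET"


# "Hello" measured 25 units wide on the scanned image.
print(invisible_word("Hello", 10, 72, 720, 25.0))
# BT /F1 10 Tf 3 Tr 109.7 Tz 72 720 Td (Hello) Tj ET
```

Because the text is invisible, the page still looks exactly like the original image, while selections and Find results land on the stretched text. If the substitute font's letter shapes differ from the original, individual characters inside a word can still drift slightly even when the word's overall width matches.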


Automated OCR software creates searchable PDFs using the following process:


1. It analyzes PDFs to determine whether they contain text, or whether the number of text characters found is below a set number of characters per page
2. Using OCR technology, it creates and applies a text layer to non-searchable PDFs
3. It also converts image documents (BMP, JPEG, PNG, and TIFF) to text-searchable PDFs while retaining all their original image content
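
The three steps above can be sketched as a triage function that decides, per file, whether to convert an image, add a text layer, or leave the PDF alone. It assumes per-page character counts have already been extracted from each PDF by some text-extraction tool. The 100-characters-per-page threshold, the action names, and the function itself are hypothetical illustrations, not contentCrawler's actual settings or behavior.

```python
from pathlib import Path
from typing import Sequence

# Image formats listed above as convertible to text-searchable PDFs.
IMAGE_EXTENSIONS = {".bmp", ".jpg", ".jpeg", ".png", ".tif", ".tiff"}

# Hypothetical threshold: PDFs averaging fewer characters per page are
# treated as image-only and sent for OCR.
MIN_CHARS_PER_PAGE = 100


def triage(path: str, chars_per_page: Sequence[int] = ()) -> str:
    """Decide what an automated OCR pipeline should do with one file."""
    suffix = Path(path).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "convert_to_searchable_pdf"    # step 3
    if suffix != ".pdf":
        return "skip"                         # outside this sketch's scope
    # Step 1: compare the average extracted characters per page to the threshold.
    average = sum(chars_per_page) / len(chars_per_page) if chars_per_page else 0
    if average < MIN_CHARS_PER_PAGE:
        return "apply_text_layer"             # step 2
    return "already_searchable"
```

For example, `triage("contract.pdf", [0, 0, 3])` returns `"apply_text_layer"`, while a PDF averaging about 2,000 characters per page is left alone. Using an average rather than requiring every page to pass avoids re-OCRing a searchable document just because it contains a blank page, at the cost of missing a single scanned page inside an otherwise text-based PDF.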


Learn more about contentCrawler for a set-and-forget solution to make searchable PDFs.

 


© Copyright DocsCorp 2021 - All rights reserved.