By Angela O'Donnell, Product Manager
Everyone would like to think they are making informed business decisions based on all the available information – especially if substantial sums have been invested in document management systems to make it possible. However, over many years of looking deep into document management systems all over the world, we've found that this is often not the reality.
In every document management system, up to 30% of files can be non-searchable. Non-searchable data, or dark data, is image-based and lacks the text layer on which search technology relies. Where dark data is present, business decisions may be based on as little as 70% of the available information.
Dark data is a blatant waste of resources – it undervalues the investment made in document management software and costs staff hours spent searching for something that can't be found. Knowing you need to solve your dark data problem is only the first step. Next, you need to ask yourself: which of the millions of documents stored in your document management system are non-searchable?
The quickest, easiest, and cheapest way to find out is to audit your document management system and pinpoint precisely how many image files require conversion to text-searchable PDF files. The audit results tell you how many files have gone dark and provide an estimate of how long it would take to make them searchable.
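To make the idea of an audit concrete, here is a minimal sketch in Python that walks a folder and counts image files against the total. It is not the audit tool described in this article. The function name `count_image_files` and the extension list are assumptions made for illustration, and the check is deliberately simplified: it looks only at file extensions, so it would miss scanned PDFs that contain images but no text layer.

```python
from pathlib import Path

# Assumption: files with these extensions are image-only and carry no text layer.
IMAGE_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def count_image_files(root):
    """Return (image_count, total_count) for every file under root."""
    image_count = 0
    total_count = 0
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue  # skip directories
        total_count += 1
        if path.suffix.lower() in IMAGE_EXTENSIONS:
            image_count += 1
    return image_count, total_count
```

A real audit would also need to open each PDF and check whether it contains extractable text, which this sketch does not attempt.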
Assessing image documents for conversion to searchable PDFs
A dark data audit of your document management system can tell you exactly how many documents require Optical Character Recognition (OCR) processing for conversion to a text-searchable PDF. The audit tool calculates this as a percentage of total documents and can go as far as estimating processing times. For example, processing typically averages 1-2 seconds per page. Compare this to how long it would take staff to run documents page by page through scanning software.
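The arithmetic behind such an estimate is simple enough to sketch. The Python example below assumes the 1-2 seconds per page range quoted above and a single worker processing pages one after another. The function name `estimate_backlog`, the average page count per document, and the sample figures are hypothetical and are not taken from any real audit.

```python
def estimate_backlog(total_documents, dark_documents, pages_per_document,
                     seconds_per_page_range=(1.0, 2.0)):
    """Return (dark_percent, low_hours, high_hours) for an OCR backlog.

    pages_per_document is an assumed average, not something measured here.
    """
    if total_documents:
        dark_percent = 100.0 * dark_documents / total_documents
    else:
        dark_percent = 0.0
    total_pages = dark_documents * pages_per_document
    low, high = seconds_per_page_range
    # Convert total seconds to hours for the fast and slow ends of the range.
    return dark_percent, total_pages * low / 3600, total_pages * high / 3600


# Hypothetical figures: 1,000,000 documents, 300,000 of them dark, 5 pages each.
percent, low_hours, high_hours = estimate_backlog(1_000_000, 300_000, 5)
print(f"{percent:.0f}% dark, roughly {low_hours:.0f} to {high_hours:.0f} hours of processing")
```

With these sample figures the backlog is 1.5 million pages, or roughly 417 to 833 hours of unattended processing. Running several workers in parallel would shorten the elapsed time accordingly.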
Batch conversion of image files to text-searchable PDFs is automated and happens silently in the background. System administrators can set up backlog processing for legacy files already in the document management system, alongside active monitoring that processes new files as they are added.
Users of document management systems that are 100% searchable won't only make better-informed business decisions and have a higher return on their software investment; they will be better able to comply with data return and erasure requirements in legislation like the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA).
For an assessment of non-searchable files in your document management system, fill in the form to arrange your free dark data audit today.