20% of documents in Content Repositories are in a black hole

Published on October 31, 2013 by kerryc
The dictionary defines the term Achilles' heel as “a seemingly small but actually crucial weakness.” Image-based documents are exactly that kind of weakness in Content Repositories. How do you search for something that, by definition, is non-searchable? Businesses have invested heavily in document management systems and in search technology to ensure complete access to all documents at all times. Despite this investment, up to 20% of documents in a Content Repository may be non-searchable and therefore “invisible” to search technology.

The risks are great

Failure to produce documents on demand hurts the bottom line, workplace efficiency, productivity and regulatory compliance. It also exposes a firm to unnecessary risks, including sanctions, dismissal of claims and, ultimately, loss of a case, and it can undermine the firm's reputation.

The sources are many

Image-based files such as faxes, image PDFs and scanned documents often get profiled into Content Repositories through a variety of workflow loopholes: email attachments, legacy documents, mobile technology, documents ingested from acquisitions and imported litigation files. These image-based documents are “invisible” to search because they contain no text to search.

The solution is simple

contentCrawler is an integrated analysis, processing and reporting framework designed to look deep into a document management system (DMS) or MS Windows file system for image-based documents, including those embedded in email attachments. After the analysis, documents that meet the criteria are converted to text-searchable PDFs. contentCrawler will not run OCR (optical character recognition) on documents that already have a text layer.

Converting image-based documents to text-searchable PDFs can be a fully automated end-to-end process or a manual one, with built-in “Hold for Review” stages before the Convert to PDF step, the Save Back into the DMS step, or both. Processing can also run in either or both of two modes: Convert Backlog or Active Monitoring. Convert Backlog converts all legacy documents to text-searchable PDFs, while Active Monitoring converts documents as soon as they are profiled into the DMS.

Once converted, the text-searchable PDFs are automatically saved back into the DMS as New Versions, Attachments or Related Documents. These documents are now text-searchable and ready to be found by your DMS search technology.
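The selection and routing rules described above (two processing modes, skipping files that already have a text layer, optional “Hold for Review” stages, three save-back targets) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not contentCrawler's implementation: the Document record, the mode names "backlog" and "monitor", the step names and the save-target strings are all hypothetical, and detecting whether a file has a text layer is assumed to happen in an earlier analysis step.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional

# Hypothetical save-back targets, mirroring the three options in the text.
SAVE_TARGETS = {"new_version", "attachment", "related_document"}


@dataclass
class Document:
    doc_id: str
    profiled_at: datetime  # when the document was profiled into the DMS
    has_text_layer: bool   # assumed to be set by an earlier analysis step


def select_documents(docs: Iterable[Document], mode: str,
                     since: Optional[datetime] = None) -> List[Document]:
    """Pick the documents one processing run should convert.

    "backlog": every document already in the repository.
    "monitor": only documents profiled after `since`.
    """
    if mode == "backlog":
        candidates = list(docs)
    elif mode == "monitor":
        if since is None:
            raise ValueError("monitor mode needs a 'since' timestamp")
        candidates = [d for d in docs if d.profiled_at > since]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Documents that already have a text layer are never sent to OCR.
    return [d for d in candidates if not d.has_text_layer]


def plan_steps(save_as: str = "new_version",
               review_before_convert: bool = False,
               review_before_save: bool = False) -> List[str]:
    """Return the ordered steps for one image-based document.

    With both review flags off, the pipeline runs fully automated.
    """
    if save_as not in SAVE_TARGETS:
        raise ValueError(f"unknown save target: {save_as!r}")
    steps = []
    if review_before_convert:
        steps.append("hold_for_review")
    steps.append("convert_to_searchable_pdf")
    if review_before_save:
        steps.append("hold_for_review")
    steps.append(f"save_back:{save_as}")
    return steps


if __name__ == "__main__":
    docs = [
        Document("A1", datetime(2013, 1, 5), has_text_layer=False),
        Document("A2", datetime(2013, 6, 1), has_text_layer=True),
        Document("A3", datetime(2013, 10, 30), has_text_layer=False),
    ]
    print([d.doc_id for d in select_documents(docs, "backlog")])  # ['A1', 'A3']
    print([d.doc_id for d in select_documents(
        docs, "monitor", since=datetime(2013, 10, 1))])           # ['A3']
    print(plan_steps(review_before_save=True))
```

In this sketch, running both modes amounts to one backlog pass followed by periodic monitor passes, each with `since` set to the time of the previous run.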

