By Caitlin Burns, DocsCorp Content Manager.
Non-searchable files can end up in your systems in a whole host of ways. It's the signed contracts that were scanned and saved as image files. It's an old archive that was ingested and digitized. And it's any other image file or PDF that doesn't have a text layer. A text layer is what file search technology relies on to find and return the right documents. Without one, unless you remember the file name itself or exactly where you saved the file, you may not be able to locate it easily. For files that do have a text layer, you can search for on-page content, like account names or locations, and find every related document in an instant.
So, how does a business go about pinpointing how many of these non-searchable files exist and converting them? Instead of requiring you to process each file manually with Optical Character Recognition (OCR) technology, contentCrawler can automate the process from beginning to end. It finds, assesses, and converts 100% of non-searchable files, no matter how they ended up in your systems. Keep reading to discover why it's the smart choice for ensuring every one of your files is searchable.
1. Smart monitoring
contentCrawler's framework finds image-based documents, assesses them, and automatically converts them to searchable PDFs, no matter how they entered your systems. It analyzes documents in a variety of systems based on search criteria, as well as text and compression thresholds set up by an Administrator. Documents that meet those criteria are then processed and saved back into the system automatically.
Finding and converting non-searchable files is a 24/7 service that runs unseen by users, entirely in the back end of their systems. Administrators can set and forget it while staff continue to add and profile documents as usual.
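To make the idea of text and compression thresholds concrete, here is a minimal Python sketch of how such a triage step might work. It is not contentCrawler's actual logic: the `DocumentStats` fields, the function names, and the default threshold values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DocumentStats:
    """Hypothetical summary of one document (not a contentCrawler type)."""
    name: str
    page_count: int
    text_chars: int   # characters found in the document's existing text layer
    size_bytes: int   # file size on disk


def needs_ocr(doc: DocumentStats, min_chars_per_page: int = 50) -> bool:
    """Flag a document whose text layer is missing or too sparse.

    The 50-characters-per-page default is an illustrative assumption.
    """
    if doc.page_count == 0:
        return False
    return doc.text_chars / doc.page_count < min_chars_per_page


def needs_compression(doc: DocumentStats, max_bytes_per_page: int = 500_000) -> bool:
    """Flag a document that is unusually large for its page count.

    The 500 KB-per-page default is an illustrative assumption.
    """
    if doc.page_count == 0:
        return False
    return doc.size_bytes / doc.page_count > max_bytes_per_page


def triage(docs: list[DocumentStats]) -> dict[str, dict[str, bool]]:
    """Return, per document name, which processing steps it would need."""
    return {
        d.name: {"ocr": needs_ocr(d), "compress": needs_compression(d)}
        for d in docs
    }
```

In this sketch, a 10-page scanned contract with no text layer and a 20 MB file size would be flagged for both OCR and compression, while a 10-page report with 25,000 characters of text and an 800 KB file size would be left alone.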
3. New and legacy files
Use contentCrawler to process legacy documents that came in through scanning or mergers and acquisitions, as well as any new files created in real time. It can work in both modes simultaneously, prioritizing new files and processing them on a regular basis.
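One way to picture new files taking priority over a legacy backlog is a simple priority queue. The Python sketch below is only an illustration of that idea under assumed names (`ConversionQueue`, `NEW`, `LEGACY`); it does not describe how contentCrawler schedules work internally.

```python
import heapq
from itertools import count

# Lower number = higher priority. These labels are hypothetical.
NEW, LEGACY = 0, 1


class ConversionQueue:
    """Toy queue that always hands out new files before legacy ones."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker keeps first-in, first-out order

    def add(self, path, kind):
        """Queue a file for conversion; kind is NEW or LEGACY."""
        heapq.heappush(self._heap, (kind, next(self._order), path))

    def next_file(self):
        """Return the next file to convert, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

With this design, a newly saved file added after thousands of archive documents is still picked up next, and the backlog resumes in its original order afterwards.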
4. Better search
Better business decisions are made when staff have access to all relevant information. contentCrawler ensures everyone in your organization can find the file they need, every time.
With contentCrawler making 100% of your content searchable, all documents are available on demand, so you can meet full-disclosure obligations in eDiscovery and respond to Data Subject Access Requests under the GDPR.
contentCrawler combines OCR and Compression modules into a single service. The Compression module reduces file size, saving on storage costs without affecting the quality of the document.
7. Foundation for AI
Use contentCrawler's OCR service to build a foundation of searchable data to prepare your business for AI and enterprise search technology.
The centralized Administration Console's dashboard provides up-to-the-minute progress, showing the number and percentage of documents that have been OCR'd and compressed. Email notifications provide periodic processing statistics and error reporting.
Global businesses often have documents written in multiple languages. contentCrawler can recognize more than 180 languages. Administrators can select up to 16 languages for OCR with no effect on processing speed.
10. On-premises or cloud
OCR and image compression can be delivered on-premises or installed on a hosted virtual machine (VM), such as a Microsoft Azure VM.