contentCrawler for Bulk Image Conversion
contentCrawler is an integrated analysis, processing and reporting framework that intelligently assesses documents in a document management system for bulk processing.
Users can bulk process documents in the content repository using the OCR module, the Compression module, or both. For example, the OCR module converts all image-based documents in the DMS to text-searchable PDFs. The Compression module then applies compression and downsampling to all PDFs, reducing their file size.
The automated end-to-end process can run 24/7 without any staff intervention, emailing periodic processing statistics and error reports to the IT Administrator. Staff no longer have to worry about OCR or Compression as a process or workflow.
contentCrawler is available as both an on-premises and a cloud solution.
- Assesses and analyzes documents in a content repository for OCR and/or compression processing
- Processes image-based documents such as TIF, JPG, PNG and image PDFs
- Converts image-based documents to text-searchable PDFs adding a text layer for enhanced searching
- Reduces image-based document file size using a variety of JPEG compression standards
- Processes image-based attachments in emails
- Applies configurable compression and text thresholds to optimize processing, ignoring documents that do not meet the requirements
The contentCrawler OCR module converts image-based documents to text-searchable PDFs, saving them back into the content repository as new or replacement documents, ready to be indexed and found.
The contentCrawler Compression module compresses image and PDF documents. Converting image documents to PDF and applying compression and downsampling to the files will reduce overall file size.
IT Administrators are able to combine the OCR and Compression modules into a single service.
contentCrawler in the cloud
contentCrawler cloud, powered by Microsoft Azure, currently integrates with the cloud-based document management systems iManage Cloud, NetDocuments and SharePoint Online.
contentCrawler integrates with a number of leading document management systems as well as a Windows file system:
- File System
- HP TRIM/Records Manager
- iManage Work
- MS SharePoint
- MS SharePoint Online (O365)
- OpenText Content Server
- OpenText eDOCS DM
- OpenText LiveLink
- Microsoft® Windows Server® 2016, 2012 R2 or 2012*, 2008 R2 SP1* or 2008 SP2*
- MS .NET Framework 4.5/4.5.1
* Not supported on Server Core Role
- 8 GB RAM
- 100 GB free disk space
- 1-2 GB per CPU core over 4*
* Recommended: 4 dedicated CPUs
contentCrawler supports multi-core CPUs with 4, 8, 16 or 32 cores.
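As a rough sizing aid, the memory figures above can be turned into a small calculation. This is a minimal sketch, not a vendor tool: it assumes the "1-2 GB per CPU core over 4" line means 1-2 GB of additional RAM on top of the 8 GB baseline for each core beyond the fourth. The function name recommended_ram_gb is hypothetical.

```python
from typing import Tuple


def recommended_ram_gb(cpu_cores: int) -> Tuple[int, int]:
    """Return an estimated (low, high) RAM range in GB for a core count.

    Assumption: 8 GB baseline, plus 1-2 GB for every core above 4.
    """
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    base_gb = 8                          # baseline from the requirements list
    extra_cores = max(0, cpu_cores - 4)  # only cores beyond the fourth count
    return base_gb + extra_cores * 1, base_gb + extra_cores * 2


if __name__ == "__main__":
    # Print the estimate for each supported core count.
    for cores in (4, 8, 16, 32):
        low, high = recommended_ram_gb(cores)
        print(f"{cores} cores: {low}-{high} GB RAM")
```

Under that reading, a 4-core server needs only the 8 GB baseline, while larger servers need proportionally more. Treat the output as an estimate and confirm actual sizing with the vendor.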
Save up to 240 hours a year per person otherwise lost looking for missing or invisible documents
contentCrawler can run on 4, 8, 16 or 32 CPU cores for faster processing. It OCRs 2 pages per second on an 8-core CPU
contentCrawler finds 30% more documents than your document management search technology
Save up to 120 hours per year per person OCR’ing documents
Can OCR up to 17,000 pages per day
Run fully-automated OCR processing 24/7, with no staff intervention needed
Over three million documents OCR’ed
Josh Schreiner, Workman Nydegger IT Director, explains how contentCrawler automated the process of OCR’ing over three million legacy documents in iManage Work to make them 100% searchable.
In a profession that is overwhelmed by paperwork, it’s not unusual for some of it to end up lost in a document management system. contentCrawler is our data discovery solution that helps users find files that normal search technology cannot. This helps our users comply with requests from clients and tax authorities to hand over specific documents, and avoid the harsh penalties for failing to do so.
Large volumes of documents are created and managed in the Financial Services industry. contentCrawler ensures all documents can be found and produced when regulations require it. This includes financial statements, contracts and agreements, credit profiles, and loan agreements.
Additionally, when users have full access to files, they can analyze all available data to identify customer characteristics and determine the best offers to present to prospects.
Government employees manage applications, licenses, certificates, reports, contracts, tax documents, and more every single day. contentCrawler helps government departments stay compliant with regulations by making all documents discoverable through search.
Governments have to comply with legislative agreements around how information is processed and delivered; minimum response times to information requests; and non-disclosure of private or confidential information.
Law firms require fast and reliable access to documents to be both productive and diligent in the advice they give. Failure to find documents can have serious implications: reputational and financial damage as well as conflicts of interest. contentCrawler ensures legal professionals find the documents they need.
Since much of the work is regulated by government and industry bodies, Life Science companies need to be able to produce documents on demand. Failure to do so can lead to serious fines and penalties. contentCrawler ensures all documents are 100% searchable and retrievable, reducing the risk of non-compliance or lost productivity looking for lost or misfiled documents.
Resources and Energy
Large engineering and resources projects can involve hundreds of thousands of files, including drawings, operations documentation, equipment specifications and user manuals. contentCrawler ensures all these documents are retrievable.
Because many of these documents are image-based, they are “invisible” to search engines. Failure to find critical documents can have serious impacts on these projects.
cleanDocs, compareDocs, contentCrawler, pdfDocs and compareDocs cloud are 100% compatible with Microsoft Office 2016 and Windows 10.
What Our Customers Say
"Every time we ran a search more than 1/3 of the documents would not be returned. This was an issue for us going forward with the scanning project."
IT Director, Hugh James
"There were no new processes or staff training required. Everything just worked in the background. Staff members were completely unaware of any changes other than the fact that more documents started to show up in the search results."
System Engineer, AJ Park
"We can control the OCR (optical character recognition) workflow on documents generated internally, but there was no tool or workflow to automatically capture and convert image-based documents from outside sources and profile them into iManage."
Manager of Application Services, Marshall Dennehey Warner Coleman & Goggin
Local Support. Global Reach.
We have support teams based all over the world to assist you with any questions or difficulties you may be experiencing. Support is available to our users 24 hours a day, 5 days a week to ensure we can get you back up and running as soon as possible.
You can submit a support ticket online through the Resource Portal, contact us directly via email or phone, or chat with us on social media.
DocsCorp is a leading provider of productivity software for document management professionals. Our offices and products span the globe with over 500,000 users in 67 countries. Our clients are well known and respected global brands that rely on our software every day.