Upcoming contentCrawler 2 release to support multi-OCR processing

Published on September 2, 2013 by kerryc
DocsCorp will be launching a new version of its integrated analysis, processing and reporting framework - contentCrawler 2. This release will include a variety of usability and performance enhancements. contentCrawler provides document management professionals with the peace of mind of knowing that their content is 100% searchable. The automated end-to-end process intelligently assesses image-based documents in a content repository for conversion and re-profiling as searchable PDFs.

Faster processing

  • Multi-OCR processing - Users will be able to take advantage of faster processing with support for 4, 8, 16, and 32 CPU core processing. For example, with 4 CPU core processing, contentCrawler will be able to OCR up to 1 page per second, or roughly 85,000 pages per day. This represents a significant improvement over earlier versions of contentCrawler, which took 4 seconds to OCR a page. With 8 CPU core processing, contentCrawler will be able to OCR up to 2 pages per second, or roughly 170,000 pages per day.
  • Apply Advanced Search filters - New Advanced Search filters give users greater control over which document types are processed. To decrease processing time, users will be able to exclude certain document types from the search, including those saved as email message attachments.
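
The daily figures quoted for multi-OCR processing follow from simple arithmetic on the per-second rates. The sketch below is illustrative only and is not part of contentCrawler: the function name is hypothetical, and it assumes the service runs continuously for 24 hours at its peak rate, which is why the results come out slightly above the rounded figures in the announcement.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def pages_per_day(pages_per_second: float) -> int:
    """Theoretical daily ceiling, assuming continuous operation at peak rate."""
    return round(pages_per_second * SECONDS_PER_DAY)


# Peak rates quoted in the announcement: (CPU cores, pages per second).
for cores, rate in [(4, 1.0), (8, 2.0)]:
    print(f"{cores} cores: up to {pages_per_day(rate):,} pages/day")

# Earlier versions took 4 seconds per page, i.e. 0.25 pages per second.
print(f"earlier versions: up to {pages_per_day(1 / 4):,} pages/day")
```

Under these assumptions, 4-core processing at 1 page per second is about four times the throughput of earlier versions.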

Easy administration and reporting

  • Set up Service email notifications - Users can set up email notifications that report on the progress of the crawl, and can request that Service Statistics and Error reports be emailed to them.
  • Monitor progress - Users can instantly see the progress of individual documents being processed at the OCR stage, displayed as a percentage.
  • Document information display - Provides document information such as the total page count and size of documents being processed, including the overall total size of documents requiring OCR.
  • Configurable Multilingual OCR - Users can easily configure multilingual OCR’ing across all services. contentCrawler supports over 180 languages.
  • Export Report - Users can export processing reports as CSV files for analysis and review.
  • Configurable minimum disk space limit - Users can specify a minimum free space threshold for the document cache directory.
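
As a rough illustration of what a minimum free space threshold means in practice, the check below compares the free space on the volume holding a cache directory against a configured limit. This is a generic sketch using Python's standard library, not contentCrawler's implementation or configuration format; the function name and parameters are hypothetical.

```python
import shutil


def has_enough_free_space(cache_dir: str, min_free_bytes: int) -> bool:
    """Return True if the volume containing cache_dir has at least
    min_free_bytes of free space (hypothetical threshold check)."""
    return shutil.disk_usage(cache_dir).free >= min_free_bytes


# Example: require at least 5 GB free before caching more documents.
FIVE_GB = 5 * 1024**3
if not has_enough_free_space(".", FIVE_GB):
    print("Free space is below the configured threshold")
```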

Greater visibility, better search

contentCrawler was developed to address the very real and serious issue of non-searchable content in enterprise content management systems. More than 20% of documents in a content repository are "invisible" to search technology. These documents are profiled without searchable text as a result of the ingestion of legacy or litigation documents, emails saved with attachments, mobile technology, and employee workarounds that bypass the OCR process. Failure to produce documents on demand hurts the bottom line, workplace efficiency, regulatory compliance and productivity, and exposes an organization to unnecessary risk.

contentCrawler integration

contentCrawler integrates with Autonomy iManage, Autonomy TRIM, OpenText eDOCS DM, OpenText Content Server, ProLaw, Worldox and MS SharePoint, as well as MS Windows file systems. Request a contentCrawler trial to determine how much non-searchable content is in your content repositories.