contentCrawler automates OCR workflows to increase organizational productivity and reduce costs

Published on November 18, 2014 by kerryc
Image-based documents pose two productivity problems for businesses: if misfiled, they are time-consuming to find, and processing them manually with OCR software is just as slow. Learn how Brisbane-based accounting firm Bentleys solved both problems with a single solution, contentCrawler from DocsCorp.

Bentleys’ business advisors are part of a national network of professionals providing accounting, audit and assurance, business advisory and corporate recovery services, as well as superannuation advice, trusts and estates planning and taxation consulting. They serve family businesses, small to medium businesses, listed entities, professionals, high-net-worth individuals and government. Nationally, Bentleys employs over 350 professionals in 9 offices around Australia.

The business challenge

Bentleys Brisbane receives documents on a daily basis from clients, the Australian Tax Office (ATO) and other government departments. Many of these documents are image-based, which means they can be opened, viewed and printed, but not searched. Not being able to search the contents of a document in a Windows file system or in a document management system is a serious problem, as documents do get lost or misfiled from time to time.

“Bentleys Support team used to receive frequent requests from staff reporting lost documents,” recalls Paul Barber, Operations Manager at Bentleys. When a document is text-based, such as an MS Word file or a text-based PDF, you can search its content to locate it. This is not possible with an image-based document, as there is no actual text to search. In these cases, Bentleys staff and the Support team would search the client’s folder and related clients’ folders the old-fashioned way, a time-consuming process.

Handling the documents coming into the business every day was also time-consuming. Paper and image-based documents were scanned at the multi-function devices (MFDs), OCR’ed and saved into the firm’s HP WorkSite document management system. However, processing was taking several minutes per document, causing a serious bottleneck.

Bentleys is a long-time DocsCorp client and was familiar with the company’s product offerings. They arranged a meeting to discuss OCR Server as a possible solution for managing image-based documents. At that meeting, however, they learned about a new product, contentCrawler, which seemed to be a better solution.

Our solution

contentCrawler is an integrated analysis, processing and reporting framework. It intelligently assesses image-based documents in a content repository for batch conversion to text-searchable PDF documents, which can be saved back into the content repository as a new version or as a replacement for the original. This ensures every document is 100% searchable, including image-based email attachments.

The next step, according to Paul, “was for us to run the complimentary contentCrawler audit tool on the HP WorkSite database to determine the scale of the problem, which would also provide us with the numbers to build the business case to present to the board.” The contentCrawler audit tool was run on the entire HP WorkSite database, 1.3 million documents in total. The audit report found that 50% of image-based documents in the database were non-searchable. The board approved the purchase of contentCrawler based on the productivity lost looking for documents and on the audit findings.

contentCrawler can process documents in one of two modes, or in both: Backlog mode and Active Monitoring mode. It was deployed on the live system, operating initially in Backlog mode to handle legacy documents in the first stage of the project. Once the legacy documents had been processed and converted, Active Monitoring mode was switched on to process newly profiled documents.

Because the solution was completely automated, it could run 24/7 without staff intervention, and there was no need for any other OCR’ing hardware or software. Because the conversion was performed at the backend, there was no impact on staff workflows or processes. Staff could continue to profile documents into the document management system without worrying about OCR as a process or workflow.

“contentCrawler wasn’t directly compliant with Bentleys’ needs out of the box due to their internal workflows, which presented a challenge to DocsCorp before the software could be purchased,” recalls Paul. “When we send final documents to clients from HP WorkSite, they are flagged as a record and are made read-only. This particular process prevented contentCrawler from processing and OCR’ing these documents. Members of the DocsCorp R&D team were brought on board to customize the system to resolve this issue, which they did.” This solution is now available to all clients.
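
For readers who want a concrete picture of the two processing modes described above, here is a minimal sketch in Python. It is not DocsCorp code and does not use any contentCrawler or HP WorkSite API. Every name in it (`Document`, `run_backlog`, `on_document_profiled`, `fake_ocr`) is hypothetical, and the OCR step is a stand-in stub. It only illustrates the idea of a one-off pass over legacy documents followed by per-document handling of newly profiled ones, with the result saved either as a new version or as a replacement.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Document:
    """Hypothetical repository record; not a real WorkSite object."""
    doc_id: int
    versions: List[bytes]       # version history, newest last
    searchable: bool = False    # True once a text layer exists


def convert(doc: Document, ocr: Callable[[bytes], bytes],
            keep_original: bool = True) -> None:
    """OCR the latest version and save the result back to the document."""
    searchable_pdf = ocr(doc.versions[-1])
    if keep_original:
        doc.versions.append(searchable_pdf)   # save as a new version
    else:
        doc.versions[-1] = searchable_pdf     # replace the original
    doc.searchable = True


def run_backlog(repository: Iterable[Document],
                ocr: Callable[[bytes], bytes],
                keep_original: bool = True) -> int:
    """Backlog-style pass: convert every legacy document that lacks text."""
    converted = 0
    for doc in repository:
        if not doc.searchable:
            convert(doc, ocr, keep_original)
            converted += 1
    return converted


def on_document_profiled(doc: Document, ocr: Callable[[bytes], bytes],
                         keep_original: bool = True) -> bool:
    """Monitoring-style hook: handle one newly profiled document."""
    if doc.searchable:
        return False
    convert(doc, ocr, keep_original)
    return True


def fake_ocr(image: bytes) -> bytes:
    """Stand-in for a real OCR engine (assumption for illustration)."""
    return b"%PDF-searchable:" + image
```

In this sketch, staff never call the OCR step themselves: the backlog pass and the profiling hook decide which documents need conversion, which mirrors the backend-only approach described in the case study.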

Other benefits

“Since implementing contentCrawler almost a year ago, the number of support calls requesting help to find missing documents has dropped drastically. No news on this front was definitely good news,” said Paul. Bentleys was also able to turn off the OCR function on the MFDs, saving both time and software license fees.

In summary

Bentleys, a Brisbane-based accounting practice in Australia, approached its long-term technology partner, DocsCorp, for help with its image-based document and scanning problem, which was affecting workflow and productivity. DocsCorp’s contentCrawler solution was designed specifically to automate the process of converting image-based documents to text-searchable PDFs and profiling them back into a content repository, freeing up staff to focus on more important business.