In 2012, Top 100 law firm Hugh James embarked on an ambitious project to provide matter management to its lawyers, which required every document to be 100% electronic and searchable. Read how the firm used contentCrawler to automatically find and OCR all the image-based documents in its iManage DMS and profile them back as text-searchable PDFs.
About Hugh James
Hugh James is a Top 100 law firm that offers a comprehensive range of legal services across the UK from its offices in Cardiff and London. It was also voted Welsh Law Firm of the Year in 2013. Its team includes specialists in almost every area of the law, as well as experts in many diverse sectors, from banking and finance to energy, from agriculture to sport, from the armed forces to healthcare, from construction to the creative industries and from retail to social housing.
The business challenge
In 2012 Hugh James was researching multifunction devices for an upcoming scanning project that would allow its lawyers to create a “true electronic file of each and every matter.” Every document relating to a matter would have to be searchable, which meant scanning paper documents and making all image-based documents searchable.
Hugh James has been a DocsCorp client for a number of years, using compareDocs and pdfDocs. Jon Howells, IT Director at Hugh James, recalls how the firm became aware of contentCrawler, which looked like it could do most of the heavy lifting of finding image-based documents and converting them to searchable PDFs. Like most firms, Hugh James knew that its iManage DMS held many non-searchable documents; it estimated that about 15 to 20% of the content was non-searchable.
DocsCorp provided the firm with its Audit tool to determine how much non-searchable content it actually had. This would give a more realistic picture of the database, as well as the numbers needed to build a business case for solving the problem. Hugh James ran the Audit tool for two days on a section of its DMS and found that 100% of image documents and 36% of PDF documents were non-searchable.
Jon recalls that the numbers from the audit were much higher than they had thought. “Every time we ran a search more than 1/3 of the documents would not be returned. This was an issue for us going forward with the scanning project. If this was a problem now, imagine what it would be like in 12 months! We would be looking at close to 70%.”
With the audit results in, the firm decided to purchase contentCrawler. Before deploying it, however, the firm crawled the database in a test environment to confirm that contentCrawler was capturing all the documents and OCR’ing and saving them correctly. Satisfied with the process, the firm deployed contentCrawler.
contentCrawler can run as an automated, end-to-end, 24/7 process, or as a manual process with built-in “Hold for Review” stages. Hugh James chose the automated process, relieving staff of OCR’ing duties and freeing them to concentrate on more important tasks. For Jon, “it just did its thing. There was no IT intervention required. It just sat in the corner and did what it was meant to do. It’s simple, and it just works.”
contentCrawler can also run in either or both of two modes: Backlog Monitoring, which finds legacy documents and OCRs them, and Active Monitoring, which detects newly profiled documents and OCRs them so they are available for indexing. Hugh James split the project into two phases. In Phase I, it ran contentCrawler in Backlog Monitoring mode, taking 3 weeks to convert 75,000 documents to text-searchable content. In Phase II, the firm switched on Active Monitoring mode to handle documents in real time, i.e. converting documents to text-searchable content as they were profiled into the iManage DMS. Many firms run contentCrawler in both modes simultaneously.
contentCrawler supports 4, 8, 16, and 32 CPU cores for faster processing. With 4 cores, contentCrawler can OCR 1 page per second, or about 85,000 pages per day; with 8 cores, it can OCR 2 pages per second, or up to 170,000 pages per day. There is no limit on page volume, and processing runs non-stop, 24/7.
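The daily throughput figures follow from simple arithmetic: a day of non-stop processing is 86,400 seconds, so 1 page per second works out to 86,400 pages per day and 2 pages per second to 172,800. The quoted 85,000 and 170,000 are slightly lower, presumably rounded down or allowing for some overhead. The short Python sketch below reproduces the calculation and estimates how long a backlog would take to clear. The function names and the backlog size are hypothetical, and the sketch assumes a constant OCR rate with no downtime.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds of non-stop, 24/7 processing

# OCR rates quoted in the text, in pages per second, keyed by CPU core count.
QUOTED_RATES = {4: 1.0, 8: 2.0}


def pages_per_day(pages_per_second: float) -> int:
    """Theoretical pages OCR'd in one day at a constant rate with no downtime."""
    return int(pages_per_second * SECONDS_PER_DAY)


def days_to_process(total_pages: int, pages_per_second: float) -> int:
    """Whole days needed to OCR a backlog at a constant rate (rounded up)."""
    return math.ceil(total_pages / pages_per_day(pages_per_second))


if __name__ == "__main__":
    for cores, rate in QUOTED_RATES.items():
        print(f"{cores} cores: {rate:g} pages/s -> {pages_per_day(rate):,} pages/day")
    # Hypothetical backlog size, for illustration only.
    print(days_to_process(1_000_000, QUOTED_RATES[4]), "days for 1,000,000 pages on 4 cores")
```

Running it prints 86,400 pages per day for 4 cores and 172,800 for 8 cores. The text gives no per-second rates for 16 or 32 cores, so the sketch does not assume any.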
According to Jon, one of the main benefits of contentCrawler was the time it saved. “Previously, lawyers had to remember where they saved the document. If they couldn’t, they had to spend time looking for it. They don’t have to do this now. When they run a search, all the documents relating to the matter are returned, which is a lot quicker. This is extremely important when it comes to FOIA and eDisclosure, as we are confident we know where all the documents are, minimizing any embarrassment or risk to the firm.”
When configuring contentCrawler, administrators can choose either to replace the original document or to save the OCR’d file as a new version; the available options vary by DMS. At Hugh James, documents were profiled back into the DMS as new versions, so the originals were not changed or modified in any way. This matters for the audit trail, in case the firm ever needs to go back to the original.
The success of contentCrawler and the scanning project has allowed the firm to move forward with some other scanning projects that might not have happened otherwise.
contentCrawler was pivotal to the realization of an important project at Welsh law firm Hugh James, enabling its lawyers to make every matter electronic and searchable. The success of the project enabled the firm to do away with multiple OCR’ing processes, replacing them with a single, centralized approach that increased productivity while minimizing risks to the firm.