How contentCrawler improves the user experience of searching in Microsoft SharePoint

Manawatu District Council uses automated OCR processing in contentCrawler to convert image-based PDFs, TIFF, and MSG files to text-searchable PDFs. Now, staff are finding more content, more easily, within their SharePoint environment.

The business need

  • Enable full-text search across all documents within a SharePoint environment
  • Convert image-based PDFs, TIFF, and MSG files to searchable PDFs
  • Process newly created documents added to SharePoint in real time, as well as a backlog of migrated content
  • Ensure all records are discoverable in order to comply with the Public Records Act and the Archives New Zealand Information Management Standards

 


About Manawatu District Council


Manawatu District Council serves a population of roughly 30,000 people in an area about two hours’ drive from Wellington. Its main town, Feilding, has been awarded New Zealand’s Most Beautiful Town 16 times thanks to its picturesque Victorian and Edwardian-style buildings. The job of the Council is to support local infrastructure, public services, and regulatory management systems.

Migrating non-searchable legacy documents to SharePoint


Manawatu District Council (MDC) had a problem. Its legacy document management system was no longer supported and this was causing major issues. Information Team Leader at MDC, Mel Rush, explained they “had times when it would just crash, and IT would struggle to get it rebooted again. We had one case where we lost a bunch of documents that had been scanned in – and that’s just what we know about.”

So MDC switched to Microsoft SharePoint, and Mel and her team began moving its documents over to that environment. But this brought with it a new set of problems.

“We were bringing nearly 200,000 documents from two older systems into the new SharePoint environment. One of our biggest concerns was people being able to find the content they were looking for,” Mel explained. “Our legacy systems didn’t play ball when we started our migration project, and some of the metadata did not align with the records.” A lot of crucial information that gave the documents meaning was lost. “We had these arbitrary records sitting there, which you had to open in order to know what they were about.”

One of the key selling points of SharePoint is its Google-style search technology, which makes finding content easier with the use of filters. Mel and her team had been selling this feature as a real benefit to staff and knew non-searchable documents would undermine its value.

SharePoint’s search technology relies on metadata – of which MDC’s legacy content had very little or none. “Therefore, we needed our documents fully text-searchable to allow staff to find them,” Mel explained. “We also wanted these files to be converted to PDF to ensure all staff would be able to access them.”

As well as impacting the value of its new SharePoint environment, non-searchable documents were a risk. “In order to be compliant with the Public Records Act and the Archives New Zealand Information Management Standards, we needed our records to be discoverable.”

Mel and her team began looking for a solution to “OCR documents both within our new SharePoint environment and those migrated from legacy systems.” Optical Character Recognition (OCR) technology analyzes image files for the presence of text and converts them to searchable documents.

“We had analyzed our SharePoint environment and knew there were a total of 76,061 files that needed to be processed, including PDF, TIFF, and MSG files,” said Mel. Image-based PDFs, TIFFs, and MSG files do not have the text layer needed to be found by search technology. “The migrated content from legacy systems – nearly 200,000 files – was mostly TIFFs.”
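The kind of analysis Mel describes, counting the files that may need OCR, can be illustrated with a minimal Python sketch. It is not how contentCrawler or MDC performed the audit. The extension list and the function names `is_ocr_candidate` and `count_candidates` are assumptions made for illustration. A file extension only marks a candidate: a real audit would also need to check whether each PDF already has a text layer.

```python
from collections import Counter
from pathlib import PurePath

# Assumption: file types named in the case study as possibly lacking a text layer.
OCR_CANDIDATE_EXTENSIONS = {".pdf", ".tif", ".tiff", ".msg"}


def is_ocr_candidate(filename: str) -> bool:
    """Return True if the extension marks the file as a possible OCR candidate."""
    return PurePath(filename).suffix.lower() in OCR_CANDIDATE_EXTENSIONS


def count_candidates(filenames) -> Counter:
    """Count OCR candidates per lower-cased extension."""
    counts = Counter()
    for name in filenames:
        if is_ocr_candidate(name):
            counts[PurePath(name).suffix.lower()] += 1
    return counts
```

Calling `count_candidates` on a list of file names exported from a document library gives a per-type tally of the files to review.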

Researching text recognition solutions


Mel and her team began to investigate available solutions. “Cost is always really high on the list of what we need to accommodate,” said Mel. “We were wary of buying something that would do what we wanted but also a hundred other things that we didn’t need. We wanted to really focus on what our actual needs were, and so, when we began researching possible solutions through Google, contentCrawler came out on top because it was able to do exactly what we needed.”

MDC trialed contentCrawler before making its final decision. “We were able to set contentCrawler up in our environment and have it process our actual content. By the time we’d deployed it fully, we had a good idea of what we were going to get out of it and how it was going to work.”

How contentCrawler is making files searchable


contentCrawler is integrated with MDC’s SharePoint environment, processing both new documents as they’re added and the backlog of legacy documents. “Every time someone adds something new, contentCrawler will look at it within 24 hours. We also have a backlog running that is just chugging through that older content,” explained Mel. “contentCrawler supports PDF, TIFF, and MSG files and is able to compress our documents, which really suits our way of working.”
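One way to picture new documents and a backlog sharing a single processing pipeline is a priority queue in which newly added files are always taken before backlog files. The Python sketch below is a hypothetical illustration only and does not describe contentCrawler's internal design. The class `OcrQueue` and the constants `NEW` and `BACKLOG` are invented for this example.

```python
import heapq
from itertools import count

# Hypothetical priority levels: lower numbers are processed first.
NEW = 0
BACKLOG = 1


class OcrQueue:
    """Queue that serves newly added documents before backlog documents."""

    def __init__(self):
        self._heap = []
        self._sequence = count()  # tie-breaker that keeps first-in, first-out order

    def add(self, path: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._sequence), path))

    def next_document(self):
        """Return the next path to process, or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

With this arrangement the backlog keeps moving whenever no new documents are waiting, and a newly added file never queues behind thousands of legacy files.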

A lot of building consent information was migrated to SharePoint as part of MDC’s property file digitization project. “contentCrawler is looking directly at that site to OCR those files as quickly as possible.”

For other organizations looking for better search


For other organizations looking for a way to enable effective search, Mel recommends reaching out to technology providers – like she did with DocsCorp – and asking questions.

“We weren’t 100% sure if it was going to work for us, but DocsCorp had a great support system in place right from the start,” said Mel. “DocsCorp not only wanted to help us use contentCrawler properly, they genuinely wanted to hear our feedback and use it to improve the product.”

“Having our documents processed by contentCrawler has made a massive difference to our users’ search experience, allowing them to work more efficiently.”

© Copyright DocsCorp 2021 - All rights reserved.