In July of 2017, we surveyed document management system (DMS) users to see if they were aware of the presence – and danger – of hidden files within their repositories. The results were concerning.
Of respondents in America, Canada, and Australia, 81% said they were not always able to locate the exact file they were searching for. In Europe, the figure was 69%. That is worrying given the approaching 25 May 2018 deadline for the General Data Protection Regulation (GDPR). The new legislation mandates that all information held on an EU citizen be made available upon request. If an organization cannot find a file when responding to such a request, it could be fined up to 20,000,000 euros or 4% of its global annual turnover, whichever is higher.
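The fine ceiling above is simply the larger of two amounts. The minimal sketch below illustrates that calculation and is not legal guidance. The function name and the `annual_turnover_eur` input are our own illustrative choices. It uses integer arithmetic in whole euros and rounds the 4% figure down. Supervisory authorities decide actual fines case by case, and those fines can be far lower than this ceiling.

```python
FIXED_CAP_EUR = 20_000_000   # flat upper-tier ceiling in euros
TURNOVER_SHARE_PERCENT = 4   # share of global annual turnover


def max_gdpr_fine(annual_turnover_eur: int) -> int:
    """Return the upper-tier GDPR fine ceiling: whichever is higher of
    EUR 20,000,000 or 4% of global annual turnover (rounded down to whole euros)."""
    if annual_turnover_eur < 0:
        raise ValueError("turnover cannot be negative")
    turnover_based_cap = annual_turnover_eur * TURNOVER_SHARE_PERCENT // 100
    return max(FIXED_CAP_EUR, turnover_based_cap)


# EUR 100m turnover: 4% is EUR 4m, so the flat EUR 20m ceiling applies.
print(max_gdpr_fine(100_000_000))    # 20000000
# EUR 1bn turnover: 4% is EUR 40m, which exceeds EUR 20m.
print(max_gdpr_fine(1_000_000_000))  # 40000000
```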
Knowing you have hidden files is one thing. Understanding the cause and the fix is the key to minimizing risk.
How files become hidden
If an image-based file, such as a scanned document, doesn’t go through Optical Character Recognition (OCR) processing, it becomes hidden from search technology because it lacks the text layer needed for it to be found.
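Keyword search typically relies on an index that maps words to the documents containing them, and that index can only be built from text. The toy Python sketch below shows why a missing text layer makes a file invisible. The `Document` type and the whitespace tokenization are illustrative assumptions, and no particular DMS indexes content this way. Still, the sketch shows that a scan with an empty text layer never appears in any result.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text_layer: str  # empty for an image-only scan that never went through OCR


def build_index(docs: list[Document]) -> dict[str, set[str]]:
    """Map each lowercase word in a text layer to the IDs of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc in docs:
        # Only the text layer is indexed (naive whitespace split); image pixels add nothing.
        for term in doc.text_layer.lower().split():
            index[term].add(doc.doc_id)
    return dict(index)


def search(index: dict[str, set[str]], term: str) -> set[str]:
    return index.get(term.lower(), set())


docs = [
    Document("contract-ocr.pdf", "Master services agreement with Acme Ltd"),
    Document("contract-scan.pdf", ""),  # the same contract, scanned without OCR
]
index = build_index(docs)
print(search(index, "acme"))  # {'contract-ocr.pdf'}
```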
Our survey results suggest that files are still being missed, even though 81% of respondents have OCR technology to make image-based files text-searchable.
DocsCorp co-founder Dean Sappey explains that this could be happening because “a) the OCR tool they are using is not automated and requires users to OCR files manually before adding them to a DMS – a process that can be easily skipped or forgotten, or b) large volumes of documents have been ingested in the system prior to the company acquiring OCR technology.”
How to find and fix hidden files
The first step in solving a hidden data problem is admitting you have one. The best way to confirm it is to run a simple audit of your DMS using a tool like contentCrawler. Within 48 hours, you will know exactly what percentage of the documents in your DMS are not searchable.
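The minimal Python sketch below illustrates the measurement such an audit reports. It is not how contentCrawler works. It assumes text has already been extracted from each file. The `DmsRecord` type, the `audit` function and the `min_chars` threshold are all hypothetical. The sketch counts a document as non-searchable when its text layer is empty or nearly empty.

```python
from dataclasses import dataclass


@dataclass
class DmsRecord:
    doc_id: str
    extracted_text: str  # text pulled from the file; empty if it has no text layer


def audit(records: list[DmsRecord], min_chars: int = 1) -> tuple[float, list[str]]:
    """Return (percentage of non-searchable records, IDs of those records).

    A record counts as non-searchable when its extracted text, ignoring
    whitespace, has fewer than `min_chars` characters.
    """
    if not records:
        return 0.0, []
    hidden = [
        r.doc_id
        for r in records
        if len("".join(r.extracted_text.split())) < min_chars
    ]
    return 100.0 * len(hidden) / len(records), hidden


records = [
    DmsRecord("memo.docx", "Board meeting notes"),
    DmsRecord("invoice-scan.pdf", ""),
    DmsRecord("letter.pdf", "Dear client, please find attached"),
    DmsRecord("id-card.tif", "   "),
]
percent, hidden_ids = audit(records)
print(f"{percent:.1f}% not searchable: {hidden_ids}")
# 50.0% not searchable: ['invoice-scan.pdf', 'id-card.tif']
```

Raising `min_chars` also flags files whose text layer is nearly empty, at the cost of possibly flagging genuinely short documents.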
The second step is to implement an OCR framework or reassess your existing workflow. To ensure every file passes through OCR and is made searchable, the tool should work in the backend, i.e., after the file has been added to the DMS. This way, files won’t get skipped or forgotten.
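A backend OCR workflow like the one just described can be sketched as a queue that every newly added file joins automatically. OCR then no longer depends on a user remembering to run it. In the minimal Python sketch below, `BackendOcrPipeline`, `StoredFile` and the `fake_ocr` stand-in are hypothetical names, not a DocsCorp or DMS API. A real deployment would call an actual OCR engine and persist the queue. The `queue_backlog` method addresses the second cause Sappey mentions, documents ingested before OCR was in place, by re-queuing the whole repository.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class StoredFile:
    doc_id: str
    text_layer: str = ""  # empty until OCR has run on an image-based file


class BackendOcrPipeline:
    """Hypothetical post-ingestion OCR queue: files are stored first, then OCRed."""

    def __init__(self, ocr_engine: Callable[[StoredFile], str]) -> None:
        self._ocr = ocr_engine  # stand-in for whatever OCR engine is used
        self._pending: deque = deque()
        self.repository: dict[str, StoredFile] = {}

    def add_to_dms(self, f: StoredFile) -> None:
        # Saving never waits on OCR, and no user has to trigger it:
        # every new file is queued automatically.
        self.repository[f.doc_id] = f
        self._pending.append(f.doc_id)

    def queue_backlog(self) -> None:
        # Covers files ingested before OCR was available: re-queue everything.
        self._pending.extend(self.repository)

    def process_pending(self) -> int:
        """OCR every queued file that still lacks a text layer; return how many."""
        processed = 0
        while self._pending:
            f = self.repository[self._pending.popleft()]
            if not f.text_layer.strip():
                f.text_layer = self._ocr(f)
                processed += 1
        return processed


def fake_ocr(f: StoredFile) -> str:
    # Hypothetical stand-in for a real OCR call.
    return f"recognized text for {f.doc_id}"


pipeline = BackendOcrPipeline(fake_ocr)
pipeline.add_to_dms(StoredFile("scan-001.pdf"))
pipeline.add_to_dms(StoredFile("report.docx", "Quarterly report"))
print(pipeline.process_pending())  # 1
```

Files that already have a text layer are skipped, so re-queuing the backlog is safe to repeat.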
Request your own free hidden data audit to see how many files in your DMS are non-searchable. For more information on how hidden data could be putting you at risk, download our free white paper: Invisible Documents: A Serious Risk to ECM Compliance and Productivity.