By Dean Sappey, President and Co-Founder of DocsCorp
In July 2017, DocsCorp surveyed respondents from different regions about searchability in their document management systems (DMS). We expected to see a portion at risk of having hidden files in their repositories, but we did not expect that 81 percent of survey respondents couldn't always find the files they were looking for. And, since 70 percent of respondents performed 11-26 (or more) search queries within a single day, people are clearly spending a lot of time searching for, but not finding, what they need.
Hidden Files and the GDPR
Firms in the United States can and will be impacted by the new General Data Protection Regulation (GDPR), which takes effect in Europe and the United Kingdom in May 2018. The regulation is designed to give citizens greater access to and power over personal data held by an organization. Any U.S. firm in possession of a European or U.K. citizen's personal data will need to comply with GDPR's new, stricter rules around managing that data, irrespective of where in the world the firm is based. Failure to do so could lead to fines of up to 4 percent of global annual revenue or €20 million (roughly $24 million), whichever is larger.
Any hidden files within a firm's document management system could be putting the firm at serious risk of breaching GDPR. If a U.K. or European citizen lodges a Data Subject Access Request (DSAR) and the firm is not able to produce every file containing relevant personal data, it is failing to comply. Additionally, a DMS that is not fully searchable will make responding to any kind of access request painfully difficult or impossible.
Managing Hidden Files in Time for the GDPR Deadline
To bring existing systems up to date and implement any new processes, firms need to begin preparing for GDPR now. Of survey respondents in the U.S., 81 percent said they already had optical character recognition (OCR) processing in place to make all files searchable. So why do so many files stay hidden?
The main culprits behind hidden files lurking in a firm's DMS are image-based documents like JPGs, TIFFs, PNGs and image PDFs. Though some of these documents are processed by OCR technology before being profiled into the system, many are missed. Files that are missed do not get text-indexed since they are image files with no text. Unindexed, they effectively become invisible to search technology.
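As a rough illustration of how such files could be surfaced, the sketch below flags documents that have an image-capable extension but no indexed text. It assumes the DMS can export a list of file names alongside whatever text its index holds for each one; the DmsRecord type, the audit function and the sample file names are hypothetical and do not come from any particular product.

```python
import os
from dataclasses import dataclass

# Formats the article names as image-based. PDFs are included because an
# image PDF looks like any other PDF until its text layer is checked.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".tif", ".tiff", ".png", ".pdf"}


@dataclass
class DmsRecord:
    """Hypothetical export row: a file name and the text the DMS indexed for it."""
    name: str
    indexed_text: str


def is_hidden(record: DmsRecord) -> bool:
    """True when an image-capable file has no indexed text, so search cannot find it."""
    extension = os.path.splitext(record.name)[1].lower()
    return extension in IMAGE_EXTENSIONS and not record.indexed_text.strip()


def audit(records):
    """Return the names of hidden files and their share of all records."""
    hidden = [r.name for r in records if is_hidden(r)]
    share = len(hidden) / len(records) if records else 0.0
    return hidden, share


if __name__ == "__main__":
    sample = [
        DmsRecord("contract.pdf", "This agreement is made between the parties"),
        DmsRecord("scan_001.tif", ""),
        DmsRecord("exhibit.PDF", "   "),
    ]
    hidden_files, hidden_share = audit(sample)
    print(hidden_files, f"{hidden_share:.0%}")
```

A real audit would depend on what the DMS index actually exposes; the point is only that an empty text index on an image-type file is the signature of a hidden file.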
These rogue image files bypass the OCR process in a number of ways: some are paper documents scanned and saved as TIFFs or image PDFs; some are email messages with TIFF or image-based PDF attachments; and others are legacy images, PDFs or emails originating from business acquisitions or litigation file ingestion.
The standard practice is to process files for OCR as soon as the firm receives them. Yet think about how much time staff members spend running OCR on documents at their desks or feeding documents into a scanning device. Or consider how documents skip the OCR process altogether because people simply forget to do it. Gina Johnson, Systems Engineer at Bowman & Brooke, LLP, saw that having staff manually process files for OCR wasn't working at her firm: "[Staff] would either have to open the files and then process them for OCR within the application, or they would be doing them on the scanner." Automated OCR would have removed the inconsistency and hassle for staff.
One solution would be to process documents for OCR only after they have been saved into the DMS. Moving the process to the back end will make the OCR process less time-consuming and leave less room for error while ensuring that 100 percent of files are made searchable once they enter the DMS, regardless of how they got there. Bowman & Brooke is using a back-end approach to stay on top of the constant stream of files entering their DMS. Information Technology Director Ed Jorczyk said, "We'll be able to stay current, and attorneys will always be able to search and find things, no matter how old they are, whether they're new documents or old documents."
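The back-end idea can be sketched as a pass that runs over documents already saved in the DMS and sends only the unindexed ones to OCR. This is a minimal sketch under stated assumptions, not a description of how any firm or product implements it: the index is modeled as a plain dictionary of document ID to indexed text, and run_ocr is a hypothetical stand-in for whatever OCR engine is in use.

```python
from typing import Callable, Dict


def backend_ocr_pass(index: Dict[str, str], run_ocr: Callable[[str], str]) -> int:
    """OCR every document that has no indexed text; return how many were processed.

    index   -- hypothetical mapping of document ID to the text the DMS has indexed
    run_ocr -- hypothetical callable that returns recognized text for a document ID
    """
    processed = 0
    # Iterate over a snapshot so the mapping can be updated safely in the loop.
    for doc_id, text in list(index.items()):
        if text.strip():
            continue  # already searchable, nothing to do
        index[doc_id] = run_ocr(doc_id)
        processed += 1
    return processed
```

Because the pass looks only at what is missing from the index, it does not matter how a file arrived in the DMS, and running it again on a schedule picks up newly saved files without reprocessing the ones that are already searchable.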
Kris Stojanovski, Network Manager at Jaffe, Rait, Heuer & Weiss, also moved his firm's OCR process into the DMS and found that "it gives some value-add to the attorney when they're actually looking for a document. They can actually find PDF documents that weren't searchable before that are now searchable."
Finally, 75 percent of all U.S. respondents who weren't already using OCR to find hidden files said they would consider purchasing it if they knew hidden files existed in their systems. Run an audit of your DMS today and see if there is a lesson to be learned. If you have hidden files, and our recent survey indicates it's likely that you do, consider extending your technology umbrella to include tools that solve the problem.
This article was first published in ILTA's Winter 2017 issue of Peer to Peer titled "Finders Weepers: Uncovering the Hidden Files in Your DMS Before It's Too Late" and is reprinted here with permission.