Samantha Jefferies unpacks how OCR processing was used to meet GDPR compliance requirements.
100% searchability of files is needed for full and complete Data Subject Access Requests (DSARs) under the GDPR. A DSAR response should include information stored in IT systems, email, network documents, and other media, and should provide a copy of or access to personal data being processed. All relevant data needs to be included in a DSAR response, or firms risk a fine of up to 4% of global annual turnover. Yet, not every file in your system will be text-searchable.
Recognizing hidden information
In every system, image-based files like scanned invoices or client IDs represent non-searchable files. These image files can’t be found since they lack the text layer that search technology relies on.
Swedish firm Delphi audited their iManage document management system and discovered that 30% of their files were non-searchable, which undermined their ability to find data for a DSAR response. The firm was storing a lot of personal data that was invisible to iManage search. Many of the files were scanned driver's licenses and passports, kept for client identification, that had never undergone Optical Character Recognition (OCR) processing, the technology that adds a text layer to image files to make them searchable.
London-based Seddons performed a data mapping exercise to identify where their data was located across their systems. It revealed that a significant number of non-searchable files (scanned PDF, TIFF, JPG, and BMP) had built up over time. Like Delphi’s, these files had not undergone OCR processing to make them discoverable by search.
Finding a solution
Delphi had investigated an OCR solution developed by their document management system provider, but it could only process files stored within the DMS. Henrik Järnberg, Head of IT at Delphi, explained that “law firms don’t just store files in the DMS, firms also store documents in their network file shares.”
Seddons learned from their data audit that they would need an OCR tool to continuously monitor files and recognize those that needed OCR’ing to become text-searchable. They also knew these text-searchable files should then be indexed by their Proclaim practice management system, since that was where users performed most searches.
Making files 100% searchable
The solution to making image-based files searchable is OCR technology, but Delphi and Seddons both realized the key to successful OCR’ing was when and where the process happened. They each deployed back-end, automated OCR software that runs silently in the background. The software finds and assesses both legacy documents already in their systems and new files uploaded into iManage and Proclaim.
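To make the idea of crawling for non-searchable files concrete, here is a minimal Python sketch. It is not the software Delphi and Seddons deployed. The function names (find_ocr_candidates, pdf_probably_has_text) are hypothetical, and the PDF check is a deliberately crude assumption: it treats a PDF with no /Font entry in its raw bytes as a likely scan.

```python
import os
from pathlib import Path

# Image formats named in the article; these never carry a text layer.
IMAGE_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg", ".bmp"}


def pdf_probably_has_text(path):
    """Crude heuristic (an assumption, not a real text-layer check).

    A PDF whose raw bytes contain no /Font entry is likely image-only.
    PDFs that keep font dictionaries inside compressed object streams
    will be misreported as scans, so a real tool would use a PDF parser.
    """
    return b"/Font" in Path(path).read_bytes()


def find_ocr_candidates(root):
    """Walk `root` and return the sorted paths of files that likely need OCR."""
    candidates = []
    for folder, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = Path(folder) / name
            ext = path.suffix.lower()
            if ext in IMAGE_EXTENSIONS:
                # Pure image formats always need OCR to become searchable.
                candidates.append(path)
            elif ext == ".pdf" and not pdf_probably_has_text(path):
                # PDFs need OCR only when they appear to lack a text layer.
                candidates.append(path)
    return sorted(candidates)
```

Run on a schedule against a network file share, a scan like this would produce the queue of files to hand to an OCR engine. Monitoring a DMS such as iManage would go through that system's own interfaces, which this sketch does not cover.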
Today, the staff at Delphi and Seddons don’t have to worry about OCR as a workflow since every file requiring a text layer is caught and processed. Now, when the firms search for information relating to a subject access request, they can be confident they are seeing every file.
Discover more about the back-end, automated OCR software both Delphi and Seddons use to crawl for non-searchable files.
About the author
Samantha Jefferies is the Vice President of EMEA and is based in the DocsCorp London office. Samantha has over 20 years' sales and management experience working for leading technology companies such as NEC Display Solutions, Despatchbox, and computer manufacturer Packard Bell.