In an exclusive interaction with Dataquest, Matthew Hardman, Director Technical Experts, Data Intelligence, Hitachi Vantara, talks about the impact of GDPR on AI, the challenges companies face in getting ready for GDPR, and possible solutions. Excerpts:
Q. According to you, what is the impact of GDPR on AI?
Most AI systems combine large amounts of input data with complex algorithms that process that data and produce an insight for a human or another machine to act on. GDPR will impact both areas, and while I could go through it regulation by regulation, there are a few key areas that might impact AI most of all: consent and permission to use data, understanding how the data is being processed, and the right to be forgotten.
First of all, consent and permission to use data are complicated because, under the GDPR, organizations need to obtain permission from the user to capture data for specific use cases only. While this protects individuals from having their data used in ways they don’t know about, it also potentially limits the organization from embarking on the sorts of science projects that had sprung up around AI and the consumption of large amounts of data. If an organization wants to re-use data for a purpose not outlined in the original consent, it may need to go back and ask for an increased level of permission. This process can prove costly and time-consuming, which could impact innovation.
Understanding how data is being processed means that organizations will have to be much more aware of the algorithms and how they operate inside the organization. This is essential because they will be required to explain to the users whose data they have collected what the algorithms are doing, how they work, and how the results will be used. This means that in many cases marketing communications, legal and other departments will become much more involved when new initiatives around data take place.
Finally, considering that AI uses large amounts of data to build its models, if customers exercise their right to be forgotten, this may take chunks of information away from the AI and have an overall impact on the AI insights themselves. Organizations may need to spend more time anonymizing data before use to ensure that Personally Identifiable Information (PII) cannot be found.
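To make the idea of preparing data before it reaches a model more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a method described in the interview: the field names (`name`, `email`, `phone`, `customer_id`) are hypothetical, and the technique shown is pseudonymization with a keyed hash, which is weaker than full anonymization because anyone holding the key can re-link the records.

```python
import hashlib
import hmac

# Hypothetical field names for illustration; a real schema will differ.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymize(record: dict, key: bytes, id_field: str = "customer_id") -> dict:
    """Return a copy of the record without direct identifiers and with
    the ID field replaced by a keyed hash (HMAC-SHA256)."""
    # Drop fields that identify a person directly.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Replace the ID with a stable token so rows for the same person
    # can still be grouped without exposing the original ID.
    token = hmac.new(key, str(record[id_field]).encode("utf-8"), hashlib.sha256)
    cleaned[id_field] = token.hexdigest()
    return cleaned


if __name__ == "__main__":
    secret = b"example-key-kept-outside-the-dataset"  # assumption: stored separately
    row = {"customer_id": 1042, "name": "A. Example",
           "email": "a@example.com", "spend": 310.5}
    print(pseudonymize(row, secret))
```

Because the token is stable for a given key, the training data can still be grouped per customer, and destroying the key is one possible way to break the link back to individuals.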
Q. What challenges are Indian companies facing while implementing GDPR?
The biggest challenge for Indian companies, and pretty much any company in the world dealing with GDPR, is knowing where all the user data is stored. Typically, most people look to the centralized databases and applications hosted in their data centres, but the reality is that data doesn’t exist only in a controlled environment. It resides on the hard drives of people’s laptops, on file shares on servers and in backups on tape. There are many locations where user data can be stored, intentionally or unintentionally.
We recently worked with a company in the financial industry that was looking at its user file services and what had been stored in its shares. It turned out that 50% of all the data (and we are talking about a petabyte of files) was in Excel and comma-separated values (.csv) files. Now if you think about .csv files for a second, they are usually exports of data from other applications. So the data is being taken out of the application, and away from the security that application had implemented, and put into a general file share. The risks are enormous.
Then there are backup tapes: how can we find the user data in those backups, and even if we do, how do we honour an individual’s “right to be forgotten”? Do you destroy the whole tape?
There are many technical challenges in getting ready for GDPR, but you must start with the simple recognition that you may not know where all the data is.
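A first pass at the kind of file share inventory described above can be sketched in a few lines of Python. This is a rough illustration, not the tooling used in the engagement mentioned: the set of "export-like" extensions is an assumption, and `bytes_by_extension` and `export_share` are hypothetical helper names.

```python
import os
from collections import Counter

# Assumption: these extensions are treated as likely application exports.
EXPORT_EXTENSIONS = {".csv", ".xls", ".xlsx"}


def bytes_by_extension(root: str) -> Counter:
    """Walk a directory tree and total the file sizes per extension."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable or vanished file: skip it
            totals[os.path.splitext(name)[1].lower()] += size
    return totals


def export_share(totals: Counter) -> float:
    """Fraction of all bytes that sit in export-like files."""
    total = sum(totals.values())
    if total == 0:
        return 0.0
    exports = sum(size for ext, size in totals.items() if ext in EXPORT_EXTENSIONS)
    return exports / total


if __name__ == "__main__":
    totals = bytes_by_extension(".")  # point this at a share's mount path
    print(f"Export-like files: {export_share(totals):.1%} of bytes")
```

Pointing it at a mounted share gives a quick estimate of how much of the stored volume is spreadsheet and .csv data, which helps decide where to look first.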
Q. What is the solution to these challenges?
To be honest, there isn’t one silver bullet that will solve all of these challenges. What customers need to adopt is a variety of approaches that combine user practices and technology to put them back in control of their data. Here are three practical suggestions from a technology point of view.
- Take control of user file data: Understanding the data that your users are creating, consuming and collaborating on is a critical step towards taking control of the uncontrolled spread of data. Customers can provide their users with a compelling experience that also gives the organization enhanced control and discovery of the data in the hands of enterprise users. At Hitachi, we have, and use extensively, a solution called HCP-Anywhere, which presents all of our users with a modern file sharing experience that runs on all their devices: PCs, iPads, phones, browsers and so on. This experience allows users to place all their data in a single location on their machine. In the background, that data is replicated to centralized object storage, which is able to catalogue the data and perform workflows and analysis on it to see what could be potentially harmful to the organization under GDPR or any other regulation. The result is that the user gets a modern-day file storage and collaboration experience, and IT gets control in identifying the data that users are creating and consuming.
- Step up your use of metadata: Metadata is a key enabler in helping an organization take control of all the various data being produced by applications, data centres, users and so on. Metadata is simply data that describes data. To show how important metadata is, imagine being asked to buy a tin of baked beans, and when you go to the supermarket there are no signs telling you what is in the aisles, no labels on the shelves indicating sections, and no labels on the cans. The task would be incredibly difficult and time-consuming. All of those labels and signs are in essence metadata, and you can do the same thing with the data you capture today. Similar to the user file services example earlier, you can start to search and catalogue all the information in your enterprise and inject metadata into it so that you can find it again. This could be anything from a simple Excel spreadsheet to a recorded call centre call, and all this data can be aggregated into a single storage service and injected with metadata to make it more discoverable. Before you ask, isn’t adding metadata to all that data extremely time-consuming? Yes, if you did it manually. However, we work with a large variety of partners and ISVs that have built systems to automatically extract metadata from the data, which can then be used for discovery.
- Consider going tapeless: Customers have never had a better reason to start implementing a modernized approach to data protection. Going tapeless and backing up to object storage gives you a way to discover the PII data stored within those backups and take the appropriate action if and when the need arises.
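As a rough illustration of the automatic metadata extraction and discovery mentioned in the suggestions above, the following Python sketch builds a small catalogue of files and flags those that may contain personal data. It is not how HCP-Anywhere or any partner product works. The function names are hypothetical, and the e-mail regular expression is only a simple heuristic assumed for the example; real PII detection needs far more than this.

```python
import os
import re
from datetime import datetime, timezone

# Assumption: an e-mail-like string is used as a crude signal of possible PII.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def describe_file(path: str, sample_bytes: int = 65536) -> dict:
    """Extract basic metadata for one file and flag possible PII."""
    stat = os.stat(path)
    # Only inspect the first sample_bytes of the file to keep the scan cheap.
    with open(path, "rb") as handle:
        sample = handle.read(sample_bytes).decode("utf-8", errors="ignore")
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "path": path,
        "extension": os.path.splitext(path)[1].lower(),
        "size_bytes": stat.st_size,
        "modified_utc": modified.isoformat(),
        "possible_pii": bool(EMAIL_PATTERN.search(sample)),
    }


def build_catalogue(root: str) -> list:
    """Describe every readable file under root."""
    catalogue = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                catalogue.append(describe_file(os.path.join(dirpath, name)))
            except OSError:
                continue  # skip files that cannot be read
    return catalogue


if __name__ == "__main__":
    for entry in build_catalogue("."):
        if entry["possible_pii"]:
            print(entry["path"], entry["size_bytes"])
```

The resulting list of dictionaries plays the role of the labels and signs in the supermarket comparison: once each file carries a description, finding the ones that matter for a given request becomes a query rather than a manual hunt.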
GDPR is not a simple regulation, but with a good technology partner, customers can take control of how they manage and apply the GDPR in their organization.