Data: access and ethics

Last week I attended two events back to back discussing all things data, but from different angles. The first, Open Data, hosted by the Economist, looked at how businesses want to use data and the ethical (legal) means by which they can acquire it. The second was a round table discussion of practitioners, hosted by Ammonite Data, which I chaired; there we mainly focussed on the need for compliance and on balancing protection of personal data with the access that our companies need in order to do business effectively.

We’re in a world driven by data. If you don’t have data then you can’t compete. While individuals are getting more protective of their data and understanding its value, businesses are increasingly wanting access to more and more – at what point does legitimate interest or consumer need cross the line?

The premise of the Economist event was simple: historically, companies would send mystery shoppers into their competitors to investigate prices. These days, while the information is available online, businesses are making it harder to access. Is it okay for competitors to get the data they need by scraping websites? Surely there’s a better way?

The panel from the Economist open source data event

Everyone agreed that data collection should be above board and not some sort of cat and mouse game of underhand acquisition. Price checking is important to help the market flow, and people (businesses) are not afraid to pay for data that will give them business value. However, there is a lack of regulation on how much of this data can be accessed and used, and we are yet to have a proper debate to drive this regulation. Some companies are self-regulating with their own ethical framework, but if your competitors are doing something that is unregulated and gaining competitive advantage, it can be hard as a business to stick to this.

There was a timely reminder from Jeni Tennison from the Open Data Institute that while data on the web may be publicly available it is mostly not open licensed – something I find myself saying a lot at conferences – and it would be helpful to have some legislative clarity on when reuse is legitimate.

Start of the panel session at #EconOpenSource

The example of websites increasing prices for hotels and flights based on whether or not you were using an Apple product was brought up to illustrate how consumers are changing their behaviour and how they use tools because of how companies are using their information. This led on to GDPR, which was felt to be a good thing for providing more clarity on what must be done with personal data. Steve King, from Black Swan, said that they retrain every single analytical model when they have a removal request, which was painful to implement but the right thing to do. While this is certainly best practice, I don’t believe this is a requirement of GDPR, and it is one of the grey areas that really needs more clarity.

International differences mean that companies will have to deal with multiple sets of rules and potentially local hardware so that data can be processed in its originating country. The panel felt that this would be an evolution, but I feel it’s unlikely that we would ever get to a universal standard, just because of how differently countries handle their citizens’ data.

Most individuals are happy to exchange data for value – if you can show context and relevance then they may be more accepting (this was something we discussed heavily in the second event, which I’ll get to shortly). But what company will share the details of how they get their predictions? If you understand the rules then it becomes a trivial exercise to game the system to get the best prices or offers. At this point there was a suggestion from an audience member, who was revealed to be a lawyer: “why not remove copyright from everything online?” This was described as an interesting philosophical idea, but the panel agreed it just wouldn’t work.

The panel wrapped up with comments that data drives innovation and it is critical to keep ethics, fairness, trust and security in mind. Education and openness are key. There should always be a mechanism for people to opt out of data collection, and value occurs when we can bring data together in innovative ways. In many cases, innovation is stifled by having so many siloed data sets.

The next day I was straight into a breakfast round table hosted by Ammonite Data. I’ve chaired these before and it’s always a great, engaging discussion. We had two topics to discuss: GDPR and technology stacks. After the previous night discussing access to data from a use point of view, we now looked at how to comply with the regulation that does exist.

Everyone around the table was with a company that was taking GDPR seriously, but we all knew of companies that were skirting the lines because they didn’t see the risk. While the Economist event highlighted the importance of education and openness, the morning’s discussion showed that the cookie consent form was seen by almost everyone as another annoying pop-up. The Daily Mash is one of the sites that has capitalised on this lackadaisical attitude with their “whatever” cookie consent. Education and openness aren’t really working. We make decisions as to whether we really want the information on the site and if we do, we consent, otherwise we back out (regardless of whether this correctly handles our lack of consent or not…). People tend not to click through to see the detail of what their data is used for – I’d love to see some statistics on this as, while it felt right, it was just anecdotal.

The Daily Mash realises that nobody reads these consent banners anyway…

One of the other key points was that it is exceedingly difficult to be GDPR compliant with legacy systems and, if you were determined, you could probably find some sort of accidental breach in most companies, no matter how hard they tried to comply. Risk management is key.

One of the interesting effects is that well-intentioned applications could have unforeseen consequences with our data. One of the attendees mentioned that his phone thought he was repeatedly going to a benign business with an amusing name, which just so happened to be close to where they picked up their children. While this was funny, it’s easy to see how our devices could leak very sensitive information about our movements. By the same token, much of the information collected about us is not used effectively – companies can appear clueless when they advertise items to us that we’ve already purchased. In many ways, this makes people ambivalent about the use of their data and at the same time suspicious of the larger companies.

The UK has many companies who are trying to do the right thing with data, both in terms of acquisition and use. While this should not automatically earn our trust, we need to have the power to make our own choices with our data based on both our individual benefit and understanding that there could be social benefits for pooled data.

Published by

janet

Dr Janet is a Molecular Biochemistry graduate from Oxford University with a doctorate in Computational Neuroscience from Sussex. I’m currently studying for a third degree in Mathematics with Open University. During the day, and sometimes out of hours, I work as a Chief Science Officer. You can read all about that on my LinkedIn page.