It’s not often that I feel the need to write a reactionary post, mainly because the things that tend to inflame me are usually inflammatory by design. However, today I read something on LinkedIn that polarised debate within a group of people who should really appreciate learning from different data: Data Scientists.
What was interesting was how the responses fell neatly into one of two camps: the first praising the poster for speaking out and saying this, supported by nearly an order of magnitude more likes than the total number of comments, and the second disagreeing and pointing out that it can work. What was lost in all this is that “can” is not synonymous with “always” – it really needs a good team and a better explanation than many companies use. What irked me most about the whole thread was the accusation that people doing data science with agile obviously “didn’t understand what science was”. I hate these sweeping generalisations and I really do expect a higher standard of debate from anyone with either “data” or “science” anywhere near their profile. Continue reading Agile Data Science: your data point is probably an outlier
By now, the majority of people who keep up with the news will have heard of Cambridge Analytica, the whistleblower Christopher Wylie, and the news surrounding the harvesting of Facebook data and microtargeting, along with accusations of potentially illegal activity. In amongst all of this news I’ve also seen articles claiming that this is the “awakening” moment for ethics and morals in AI and data science in general: the point where practitioners realise the impact of their work.
“Now I am become Death, the destroyer of worlds”, Oppenheimer
I chaired a breakfast meeting for Women in Data Science recently, and one of the topics for discussion was how to retain talent. While demand is outstripping supply and the market is going crazy, it’s enough of a minefield finding good people in the first place.
Add to this that even after you’ve made an offer to someone, recruiters will be contacting them regularly to try to tempt them away to other roles. It’s impossible to prevent this. I’m a big believer in not playing games with recruitment – I know what I can afford and won’t get into a bidding war. If I’m paying a fair salary and someone would go elsewhere for money, then they are likely to jump when a recruiter calls regardless of how well you incentivise them. This isn’t a big company or small company thing. If you want to keep hold of your team after you’ve done the very hard job of hiring them, then you need to understand what motivates them and either make sure that you continue to meet those needs or plan to be hiring again in the next 12-24 months. Continue reading Incentivising data scientists
I work with many people who are recently out of academia. While they know how to code and are experts in their fields, they are lacking some of the rigour of computer science that experienced developers have. In addition to not understanding the problems of data in the wider world or how to test their solutions properly, they are also unaware of the importance of source code control and deployment. This is another aspect missing from these courses – you cannot exist as a professional developer without it. While there are many source control setups, I’m most familiar with git.
I’ve recently written a how-to guide for my team and was going to make that the focus of this post. However, I’ve seen some very good guides out there that are more generic, so instead I’d like to explain why source code control is important and then give you the tools to learn this yourself. Continue reading Source Code Control for Data Scientists
It’s rare that I am intentionally provocative in my post titles, but I’d really like you to think about this one. I’ve known and worked with a lot of people who work with data over the years, many of whom call themselves data scientists and many who do the role of a data scientist but by another name. One thing that worries me when they talk about their work is an absence of scientific rigour. This is a huge problem, and one I’ve talked about before.
The results that data scientists produce are becoming increasingly important in our lives, from determining what adverts we see to how we are treated by financial institutions or governments. These results can have a direct impact on people’s lives and we have a moral and ethical obligation to ensure that they are correct. Continue reading Why are data scientists so bad at science?
This week I was delighted to be at the Royal Statistical Society as a business representative for the launch of their Data Science Section. At over 160 years old, the RSS is one of the more established professional bodies, and I like that it is asking questions and making a difference as the application of its discipline changes and as it faces a growing problem of abuse of statistical methods. I wish the general public had a greater understanding of statistics so they wouldn’t be so easily swayed by the media with a simple graph “proving” a point. Continue reading Professional body for data science? Yes Please