Source Code Control for Data Scientists

XKCD explains git source code control. 🙂

I work with many people who are recently out of academia. While they know how to code and are experts in their fields, they lack some of the rigour of computer science that experienced developers have. As well as not understanding the problems of data in the wider world or how to test their solutions properly, they are also unaware of the importance of source code control and deployment. This is another aspect missing from these courses – you cannot exist as a professional developer without it. While there are many source control setups, I’m most familiar with git.

I’ve recently written a how-to guide for my team and was going to make that the focus of this post. However, I’ve seen some very good, more generic guides out there, so instead I’d like to explain why source code control is important and then give you the tools to learn it yourself.

Why are data scientists so bad at science?

Do you check your inputs?

It’s rare that I am intentionally provocative in my post titles, but I’d really like you to think about this one. Over the years I’ve known and worked with a lot of people who work with data, many of whom call themselves data scientists and many who do the role of a data scientist under another name. One thing that worries me when they talk about their work is an absence of scientific rigour. This is a huge problem, and one I’ve talked about before.

The results that data scientists produce are becoming increasingly important in our lives, from determining what adverts we see to how we are treated by financial institutions or governments. These results can have a direct impact on people’s lives and we have a moral and ethical obligation to ensure that they are correct.

Professional body for data science? Yes Please

Statistically significant
My new prized badge from the RSS

This week I was delighted to be at the Royal Statistical Society as a business representative for the launch of their Data Science Section. At over 160 years old, the RSS is one of the more established professional bodies, and I like that it is asking questions and making a difference as the way statistics is applied changes and as it faces the increasing challenge of abuse of statistical methods. I wish the general public had a greater understanding of statistics so they wouldn’t be so easily swayed by the media with a simple graph “proving” a point.