Last week I was interviewed by Keith Robinson of Ammonite Data on the topic of managing data science teams remotely and all the challenges this brings. We had a much more wide-ranging conversation, in which I looked at the challenges of communication and even the impact that the current extraordinary events will have on models.
One of the things that I have been complaining about with many of the data science master’s courses is that they are missing a lot of the basic skills that are essential for you to be effective in a business situation. It’s one of the things I was going to talk about at the Women in AI event that was postponed this week, and I’m more than happy to work with universities who want help building a course. That said, some universities are realising this is missing and adding it as optional courses.
My LinkedIn news feed was lit up last week by a Medium post from Dario Radečić, originally posted in December 2019, discussing how much maths is really needed for a job in data science. He starts by berating the answers on Quora from the PhD brainiacs who demand you know everything… While the article is fairly light-hearted and is probably more an encouragement piece for anyone currently studying or trying to get that first job in data science, I felt that, as someone who hires data scientists, I could add some substance from the other side.
Getting any role in IT can be daunting as a first timer, whether it’s your first ever job, you’ve changed career, you’ve had a break and are returning as a junior in a new field, or anything else. Getting one in any part of AI can be even more of an uphill struggle. Job postings and recruitment agencies are asking for PhDs, academic papers and post-doctoral research as well as years of experience in industry. How can you get past that first barrier? I get a lot of people asking me this when I present at Meet-Ups, so I thought I’d collate everything into one post.
I’m going to break down how you can demonstrate the skills that businesses need and how to talk confidently about what you can offer without the fluff.
It’s not often that I feel the need to write a reactionary post, as the things that tend to inflame me are usually that way by design. However, today I read something on LinkedIn that caused a polarisation in debate within a group of people who should really appreciate learning from different data: Data Scientists.
What was interesting was how the responses fell neatly into one of two camps: the first praising the poster for speaking out and saying this, supported by nearly an order of magnitude more likes than the total number of comments, and the second disagreeing and pointing out that it can work. What has been lost in this is that “can” is not synonymous with “always”: it really needs a good team and a better explanation than many companies use. What irked me most about the whole thread was the accusation that people doing data science with agile obviously “didn’t understand what science was”. I hate these sweeping generalisations and I really do expect a higher standard of debate from anyone with either “data” or “science” anywhere near their profile. Continue reading Agile Data Science: your data point is probably an outlier
By now, the majority of people who keep up with the news will have heard of Cambridge Analytica, the whistleblower Christopher Wylie, and the news surrounding the harvesting of Facebook data and micro-targeting, along with accusations of potentially illegal activity. In amongst all of this news I’ve also seen articles claiming that this is the “awakening” moment for ethics and morals in AI and data science in general: the point where practitioners realise the impact of their work.
“Now I am become Death, the destroyer of worlds”, Oppenheimer
I chaired a breakfast meeting for Women in Data Science recently, and one of the topics for discussion was how to retain talent. While demand is outstripping supply and the market is going crazy, it’s enough of a minefield finding good people in the first place.
Add to this that even after you’ve made an offer to someone, recruiters will be contacting them regularly to try to tempt them away to other roles. It’s impossible to prevent this. I’m a big believer in not playing games with recruitment: I know what I can afford and won’t get into a bidding war. If I’m paying a fair salary and they go elsewhere for money, then they are more likely to jump when a recruiter calls regardless of how well you incentivise them. This isn’t a big company or small company thing. If you want to keep hold of your team after you’ve done the very hard job of hiring them, then you need to understand what motivates them and either make sure that you continue to provide for those needs or plan to be hiring again in the next 12-24 months. Continue reading Incentivising data scientists
I work with many people who are recently out of academia. While they know how to code and are experts in their fields, they are lacking some of the rigour of computer science that experienced developers have. As well as struggling to understand the problems of data in the wider world and to test their solutions properly, they are often unaware of the importance of source code control and deployment. This is another missing aspect from these courses: you cannot exist as a professional developer without it. While there are many source control setups, I’m most familiar with git.
I’ve recently written a how-to guide for my team and was going to make that the focus of this post, although I’ve seen some very good guides out there that are more generic, so I’d like to explain why source code control is important and then give you the tools to learn this yourself. Continue reading Source Code Control for Data Scientists
It’s rare that I am intentionally provocative in my post titles, but I’d really like you to think about this one. I’ve known and worked with a lot of people who work with data over the years, many of whom call themselves data scientists and many who do the role of a data scientist but by another name. One thing that worries me when they talk about their work is an absence of scientific rigour. This is a huge problem, and one I’ve talked about before.
The results that data scientists produce are becoming increasingly important in our lives, from determining what adverts we see to how we are treated by financial institutions or governments. These results can have a direct impact on people’s lives and we have a moral and ethical obligation to ensure that they are correct. Continue reading Why are data scientists so bad at science?
This week I was delighted to be at the Royal Statistical Society as a business representative for the launch of their Data Science Section. At over 160 years old, the RSS is one of the more established professional bodies and I like that it is questioning and making a difference as the application of their industry changes and when faced with an increasing challenge of abuse of statistical methods. I wish the general public had a greater understanding of statistics so they wouldn’t be so easily swayed by the media with a simple graph “proving” a point. Continue reading Professional body for data science? Yes Please