Are you a 5/5 Data Scientist?

What does this even mean and why are people putting it on their CVs? 🙂

Towards the end of 2020 I was lucky enough to be hiring for several new positions in my team1. Given the times that we are in, there are many more applicants for roles than there was even a year ago. I’ve spoken before about the skills that you need to get a role as a data scientist and there are specific things I expect to see so I can judge experience and competency when I’m looking at these pieces of paper so I can decide who I want to interview.

Sadly I’m seeing a lot of cringeworthy things on CVs that are the fastest way to put a candidate on the no pile when they reach me. These things might get you past HR and also past some recruitment agents, and I wonder if this is why candidates do them. I try and give as much feedback as I can, although sometimes the sheer volume of CVs and the time taken for constructive feedback would be more than a full time job. By sharing some of these things more publicly I hope to pass this advice on to as many as possible.

Continue reading Are you a 5/5 Data Scientist?

Let’s talk about testing

One of the things that I find I have to teach data scientists and ML researchers almost universally is understanding how to test their own code. Too often it’s all about testing the results and not enough about the code. I’ve been saying for a while that a lack of proper testing can trip you up and recently we saw a paper that rippled through academia about a “bug” in some code that everyone used…

A Code Glitch May Have Caused Errors In More Than 100 Published Studies

https://www.vice.com/en_us/article/zmjwda/a-code-glitch-may-have-caused-errors-in-more-than-100-published-studies

The short version of this is that back in 2014, a python protocol was released for calculating molecule structure through NMR shifts1 and many other labs have been using this script over the past 5 years.

Continue reading Let’s talk about testing

Literate programming – effect on performance

Example from the MNIST data set used in this experiment
Example from the MNIST data set used in this experiment

After my introductory post on Literate Programming, it occurred to me that while the concept of being able to create documentation that includes variables from the code being run is amazing, this will obviously have some impact on performance.  At best, this would be the resource required to compile the \LaTeX document as if it was static, while the “at worst” scenario is conceptually unbounded.  Somewhere along the way, pweave is adding extra code to pass the variables back and forth between the python and the \LaTeX, how and when it does this could have implications that you wouldn’t see in a simple example but could be catastrophic when running the kind of neural nets that my department are putting together. So, being a scientist, I decided to run a few experiments….1 Continue reading Literate programming – effect on performance

Using Literate Programming in Research

Literate Programming by Donald Knuth
Literate Programming by Donald Knuth

Over my career in IT there have been a lot of changes in documentation practises, from the heavy detailed design up front to lean1 and now the adoption of literate programming, particularly in research (and somewhat contained to it because of the reliance on \LaTeX as a markup language2).  While there are plenty of getting started guides out there, this post is primarily about why I’m adopting it for my new Science and Innovations department and the benefits that literate programming can give. Continue reading Using Literate Programming in Research

Python: serious language or just for beginners?

Python logo and code smapleTwo months ago I hadn’t looked at a line of Python code – it was never a requirement when I was a developer and as I moved into management I worked with teams and projects using everything from C and COBOL through LAMP to .Net, while Python sat on the periphery.  I’d always considered it to be a modern BASIC – something you did to learn how to code or for a quick prototype but not something to be taken seriously in a professional environment.

I’ve always believed that really good programmers understand the boundaries and strengths of multiple languages, able to choose the right tool for the job, and finding the correct compromise for consistency and maintainability.  People like this are really hard to find1 although I do tend to veer away from individuals who can only evangalise a single language and say all the others are rubbish2.  Due to the projects I’ve been involved with, Python ability has been irrelevant and never considered part of that toolbox. Continue reading Python: serious language or just for beginners?