python – janjanjan

Are you a 5/5 Data Scientist?

What does this even mean and why are people putting it on their CVs? 🙂

Towards the end of 2020 I was lucky enough to be hiring for several new positions in my team¹. Given the times that we are in, there are many more applicants for roles than there were even a year ago. I’ve spoken before about the skills that you need to get a role as a data scientist, and there are specific things I expect to see on these pieces of paper. They let me judge experience and competency and decide who I want to interview.

Sadly, I’m seeing a lot of cringeworthy things on CVs, and they are the fastest way to put a candidate on the no pile when they reach me. These things might get you past HR and some recruitment agents, and I wonder if this is why candidates do them. I try to give as much feedback as I can, although the sheer volume of CVs, combined with the time constructive feedback takes, would sometimes add up to more than a full-time job. By sharing some of these things more publicly, I hope to pass this advice on to as many people as possible.

Let’s talk about testing

One of the things that I find I have to teach data scientists and ML researchers almost universally is understanding how to test their own code. Too often it’s all about testing the results and not enough about the code. I’ve been saying for a while that a lack of proper testing can trip you up and recently we saw a paper that rippled through academia about a “bug” in some code that everyone used…

A Code Glitch May Have Caused Errors In More Than 100 Published Studies
https://www.vice.com/en_us/article/zmjwda/a-code-glitch-may-have-caused-errors-in-more-than-100-published-studies

The short version of this is that back in 2014, a Python protocol was released for determining molecular structure from NMR chemical shifts¹, and many other labs have been using this script over the past 5 years.
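The Vice article attributes the glitch to the script behaving differently depending on how each operating system sorted the files it read. The sketch below shows what testing the code, rather than the results, might look like for that class of bug. `list_output_files` and the `.out` file names are hypothetical illustrations, not taken from the original protocol. The test builds a scratch directory with files created in a scrambled order and asserts that the helper always returns them in the same, explicitly sorted order.

```python
import os
import tempfile


def list_output_files(directory, suffix=".out"):
    """Hypothetical helper: return matching file names in a fixed order.

    os.listdir() returns names in arbitrary order, which can differ between
    operating systems and file systems, so the result is sorted explicitly.
    """
    return sorted(name for name in os.listdir(directory) if name.endswith(suffix))


def test_list_output_files_is_sorted_and_filtered():
    """Test the code itself: the order must not depend on creation order."""
    with tempfile.TemporaryDirectory() as directory:
        # Create the files in a deliberately scrambled order, plus one
        # file with a different suffix that should be filtered out.
        for name in ["conf3.out", "conf1.out", "notes.txt", "conf2.out"]:
            with open(os.path.join(directory, name), "w") as handle:
                handle.write("placeholder\n")
        assert list_output_files(directory) == ["conf1.out", "conf2.out", "conf3.out"]


if __name__ == "__main__":
    test_list_output_files_is_sorted_and_filtered()
    print("ok")
```

Note that `sorted` orders names lexicographically, so `conf10.out` would come before `conf2.out`. If the numbering matters, that is exactly the kind of assumption worth pinning down in a test of its own.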

Literate programming – effect on performance

Example from the MNIST data set used in this experiment

After my introductory post on Literate Programming, it occurred to me that while being able to create documentation that includes variables from the code being run is amazing, it will obviously have some impact on performance. At best, the cost would be the resources required to compile the $\LaTeX$ document as if it were static, while the worst case is conceptually unbounded. Somewhere along the way, Pweave is adding extra code to pass the variables back and forth between the Python and the $\LaTeX$. How and when it does this could have implications that you wouldn’t see in a simple example but that could be catastrophic when running the kind of neural nets that my department are putting together. So, being a scientist, I decided to run a few experiments.¹
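The post’s actual experiments aren’t reproduced here. As a hedged sketch of how such an overhead measurement could be set up, the block compares a stand-in workload with the same workload plus a step that formats its result as a $\LaTeX$ macro. That extra step loosely mimics values being passed from the Python into the document. `run_model`, `run_model_with_export` and the `\modelresult` macro are all hypothetical. The code does not call Pweave, so it does not measure Pweave’s real mechanism.

```python
import statistics
import time


def run_model(n=200_000):
    """Hypothetical stand-in workload: a simple numeric loop."""
    total = 0.0
    for i in range(n):
        total += (i % 7) * 0.5
    return total


def run_model_with_export(n=200_000):
    """Same workload plus a hypothetical step that formats the result as a
    LaTeX macro, mimicking a value passed from Python into the document."""
    result = run_model(n)
    macro = r"\newcommand{\modelresult}{%.3f}" % result
    return result, macro


def median_runtime(fn, repeats=5):
    """Run fn several times and return the median wall-clock time in seconds."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return statistics.median(times)


baseline = median_runtime(run_model)
with_export = median_runtime(run_model_with_export)
print(f"baseline {baseline:.4f}s, with export {with_export:.4f}s, "
      f"difference {with_export - baseline:+.4f}s")
```

Taking the median of several runs damps noise from other processes on the machine. For a real comparison, you would time the full weave-and-compile pipeline on a representative model rather than a toy loop.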

Using Literate Programming in Research

Over my career in IT there have been a lot of changes in documentation practices, from heavy, detailed design up front to lean¹ and now the adoption of literate programming, particularly in research (and somewhat confined to it because of the reliance on $\LaTeX$ as a markup language²). While there are plenty of getting-started guides out there, this post is primarily about why I’m adopting it for my new Science and Innovations department and the benefits that literate programming can give.

Python: serious language or just for beginners?

Two months ago I hadn’t looked at a line of Python code. It was never a requirement when I was a developer, and as I moved into management I worked with teams and projects using everything from C and COBOL through LAMP to .NET, while Python sat on the periphery. I’d always considered it to be a modern BASIC: something you used to learn how to code or for a quick prototype, but not something to be taken seriously in a professional environment.

I’ve always believed that really good programmers understand the boundaries and strengths of multiple languages, are able to choose the right tool for the job, and find the correct compromise for consistency and maintainability. People like this are really hard to find¹, although I do tend to veer away from individuals who can only evangelise a single language and say all the others are rubbish². Due to the projects I’ve been involved with, Python ability has been irrelevant and never considered part of that toolbox.
