When I attended the ReWork Deep Learning conference in Boston in May 2016, one of the most interesting talks was about the Echo and the Alexa personal assistant from Amazon. As someone whose day job is AI, it seemed only right that I surround myself by as much as possible from other companies. This week, after it being on back order for a while, it finally arrived. At £50, the Echo Dot is a reasonable price, with the only negative I was aware of before ordering being that the sound quality “wasn’t great” from a reviewer. Continue reading Amazon Echo Dot (second generation): Review
As part of a few hours catching up on machine learning conference videos, I found this great talk on what can go wrong with machine recommendations and how testing can be improved. Evan gives some great examples of where the algorithms can give unintended outputs. In some cases this is an emergent property of correlations in the data, and in others, it’s down to missing examples in the test set. While the talk is entertaining and shows some very important examples, it made me realise something that has been missing. The machine learning community, having grown out of academia, does not have the same rigour of developmental processes as standard software development.
Regardless of the type of machine learning employed, testing and quantification of results is all down to the test set that is used. Accuracy against the test set, simpler architectures, fewer training examples are the focal points. If the test set is lacking, this is not uncovered as part of the process. Models that show high precision and recall can often fail when “real” data is passed through them. Sometimes in spectacular ways as outlined in Evan’s talk: adjusting pricing for Mac users, Amazon recommending inappropriate products or Facebook’s misclassification of people. These problems are either solved with manual hacks after the algorithms have run or by adding specific issues to the test set. While there are businesses that take the same approach with their software, they are thankfully few and far between and most companies now use some form of continuous integration, automated testing and then rigorous manual testing.
The only part of this process that will truly solve the problem is the addition of rigorous manual testing by professional testers. Good testers are very hard to find, in some respect harder than it is to find good developers. Testing is often seen as a second class profession to development and I feel this is really unfair. Good testers can understand the application they are testing on multiple levels, create the automated functional tests and make sure that everything you expect to work, works. But they also know how to stress an application – push it well beyond what it was designed to do, just to see whether these cases will be handled. What assumptions were made that can be challenged. A really good tester will see deficiencies in test sets and think “what happens if…”, they’ll sneak the bizarre examples in for the challenge.
One of the most difficult things about being a tester in the machine learning space is that in order to understand all the ways in which things can go wrong, you do need some appreciation of how the underlying system works, rather than a complete black box. Knowing that most vision networks look for edges would prompt a good tester to throw in random patterns, from animal prints to static noise. A good tester would look of examples not covered by the test set and make sure the negatives far outweigh the samples the developers used to create the model.
So where are all the specialist testers for machine learning? I think the industry really needs them before we have (any more) decision engines in our lives that have hidden issues waiting to emerge…
Following my post on AI for understanding ambiguity, I got into a discussion with a friend covering the limitations of AI if we only try to emulate ourselves. His premise was that we know so little about how our brains actually enable us to have our rich independent thoughts that if we constrain AI to what we observe, an ability to converse in our native language and perform tasks that we can do with higher precision, then we will completely limit their potential. I had a similar conversation in the summer of 2015 while at the start-up company I joined1– we spent a whole day2 discussing whether in 100 years’ time the only jobs that humans would do would be to code the robots. While the technological revolution is changing how we live and work, and yes it will remove some jobs and create others just like the industrial revolution did and ongoing machine automation has been doing, there will always be a rich variety of new roles that require our unique skills and imagination, our ability to adapt and look beyond what we know. Continue reading Evolving Machines
I’ve been with my current company for 9 months as Chief Information Officer and had responsibility for everything technical from production systems down to ensuring the phone systems worked and everything in between. The only technical responsibilities not under me was the actual development and QA of our products. CIO is a thankless role – when everything is going right, questions are raised over the size of the team and the need to replace servers and budget for new projects. When something breaks, for whatever reason, you are the focus of the negativity until it is resolved. The past 9 months have been a rollercoaster of business needs, including many sleepless nights. However, I can look back on this knowing that when I do finally get around to writing about my experiences as a woman in IT I will have a lot of fun stories for the CIO chapter1. While I didn’t have the opportunity to finish off as many of the improvement projects as I would have liked, I’ve built up a fantastic team and know that they’ll continue to do a fantastic job going forward.
Last year I wrote a post on whether machines could ever think1. Recently, in addition to all the general chatbot competitions, there has been a new type of test for deeper contextual understanding rather than the dumb and obvious meanings of words. English2 has a rich variety of meanings of words with the primary as the most common and then secondary and tertiary meanings further down in the dictionary. It’s probably been a while since you last sat down and read a dictionary, or even used an online one other than to find a synonym, antonym or check your spelling3 but as humans we rely mostly on our vocabulary and context that we’ve picked up from education and experience.
The day started with a great intro from Jana Eggers with a positive message about nurturing this AI baby that is being created rather than the doomsday scenario that is regularly spouted. We are a collaborative discipline of academia and industry and we can focus on how we use this for good. Continue reading ReWork DL Boston 2016 – Day 1
There’s a lot of money, time and brain power going in to various machine learning techniques to take the aggravation out of manually tagging images so that they appear in searches and can be categorised effectively. However, we are strangely fault-intolerant of machines when they get it wrong – too many “unknowns” and we’re less likely to use the services but a couple of bad predictions and we’re aghast about how bad the solution is.
With a lot of the big players coming out with image categorisers, there is the question as whether it’s really worth anyone building their own when you can pay a nominal fee to use the API of an existing system. The only way to really know is to see how well these systems do “in the wild” – sure they have high precision and recall on the test sets, but when an actual user uploads and image and frowns at the result, something isn’t right. Continue reading AI for image recognition – still a way to go
A few days ago, researchers from Facebook published a paper on a deep learning technique to create “natural images”, with the result being that human subjects were convinced 40% of the time that they were looking at a real image rather than an automatically generated one. When I saw the tweet linking this, one of the comments1 indicated that you’d “need a PhD to understand” the paper, and thus make any use of the code Facebook may release.
I’ve always been a big believer in knowledge being accessible both from being freely available (as their paper is) and also that any individual who wants to understand the concepts presented should be able to, even if they don’t have extensive training in that specialism. So, as someone who does have a PhD and who is in the deep learning space, challenge accepted and here I’ll discuss what Facebook have done in a way that doesn’t require advanced degrees, but rather just a healthy interest in the field2. Continue reading Facebook’s latest deep learning research
At the ReworkDL conference in Boston last month I listened to a fantastic presentation by Ryan Adams of Whetlab on how they’d created a business to add some science to the art of tuning deep learning engines. I signed up to participate to their closed beta and came back to the UK very excited to use their system once I’d got my architecture in place. Yesterday they announced that they had signed a deal with Twitter and the beta would be closed. I was delighted for the team – the business side of me is always happy when a start-up is successful enough to get attention of a big corporate, although I was personally gutted as it means I won’t be able to make use of their software to improve my own project.
This, for me, is a tragedy. Continue reading Whetlab and Twitter