I’m four weeks into my new role and one of the threads of work I have is looking into machine learning and how it has advanced since my own thesis. The current approach to machine intelligence is via learning networks where the data is abstracted: rather than recognising specifics about the problem, the algorithm learns the common elements of the problem and solution so that it can match an input to the expected output without needing an exact match. Our brains are very good at this: from a very early age we can tell familiar faces from unfamiliar ones, and this quickly progresses to identification in bad light, from different angles and when the face is partly obscured. Getting machines to do the same has been notoriously difficult.
Traditionally, the algorithms and parameters would be tuned for the specific data set used to define the problem, in the hope that it covered enough examples of difficult matches between inputs and outputs. The team would add extra checks and adjust the matching threshold until all the known problems had been accounted for. The problem here is that any new data could break the algorithm, introducing false positives or false negatives, and trying to resolve these can cause problems elsewhere. Raising or lowering the matching threshold can have huge repercussions across the whole data set. This was broadly overcome by making sure the data set was as large as possible and contained as many obscure examples as possible, but reworking the algorithm was slow and required large amounts of testing time.
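To make the threshold trade-off concrete, here is a minimal sketch in Python. The scores and labels are invented purely for illustration, and `count_errors` is a hypothetical helper rather than part of any particular matching system.

```python
def count_errors(scores, labels, threshold):
    """Return (false_positives, false_negatives) for a given matching threshold."""
    # A score at or above the threshold is treated as a declared match.
    false_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    false_neg = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    return false_pos, false_neg


# Hypothetical matching scores with their ground truth (True = genuine match).
scores = [0.95, 0.80, 0.72, 0.65, 0.60, 0.40, 0.30]
labels = [True, True, False, True, False, True, False]

for threshold in (0.5, 0.7, 0.9):
    print(threshold, count_errors(scores, labels, threshold))
# 0.5 (2, 1)
# 0.7 (1, 2)
# 0.9 (0, 3)
```

On this toy data, raising the threshold removes false positives only by creating false negatives, which is the kind of knock-on effect described above.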
More recently, there has been a shift to machine learning where the algorithms are set up to work like the neurons in our brains: abstracting the data and adjusting what weight is given to which elements of the data in order to get the desired result. This means that, rather than a human meticulously fine-tuning an algorithm, the human decision is reduced to the number of “neural” nodes and the layers they are arranged in. Getting this architecture correct can take a lot of time and will vary for different problems. Once done, the network can accept the data and will self-learn (self-tune) in order to get the best result for the data given.
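As a rough illustration of what self-tuning means, the sketch below trains a single artificial neuron with the classic perceptron learning rule. This is far simpler than a modern deep network, and the function names, learning rate and epoch count are my own assumptions, but the principle of nudging the weights whenever the output is wrong is the same.

```python
def train_perceptron(samples, epochs=20, learning_rate=0.1):
    """Learn weights and a bias for one neuron with two inputs."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            output = 1 if activation > 0 else 0
            error = target - output  # 0 when correct, +1 or -1 when wrong
            # Shift each weight in the direction that reduces the error.
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum of the inputs exceeds zero."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0


# Logical OR: nobody writes the rule by hand, the neuron learns it from examples.
or_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_perceptron(or_samples)
print([predict(weights, bias, x) for x, _ in or_samples])  # [0, 1, 1, 1]
```

The only human decisions here are the shape of the "network" (one neuron, two inputs) and the training settings; the weights themselves come from the data.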
There is still a lack of plasticity with this approach. Firstly, just as with traditional matching algorithms, unless there is continual retraining (as occurs in the human brain) the network will only be good at matching based on the problem it has been trained to solve – introducing new examples of difficult data will give potentially incorrect results. So, the system must have a continual feedback approach where incorrect results are identified and fed back into the system. Many machine learning projects have co-opted humans into this process: double captchas, “click here if we’ve got it wrong” etc. The latest Flickr system is a great example of this:
The new tools work by using “convolutional neural networks”, or computers that act like human brains. That means that even when it is wrong, it will often be in an understandable way — mistaking a push bike for a motorcycle, for instance — and that when it gets it wrong it will learn from user feedback.
The second problem is far less obvious – even if there is plasticity in weightings and accounting for user feedback, there is still the problem that the network itself cannot adjust its own architecture.
If you set up an artificial network that is complex and give it a problem that is simple, it is possible to end up with a system that gives you far too many errors and cannot improve. Similarly, creating a network that is too simple for the problem will also give you a system that cannot improve. So how do you know where the right level is? Does this architecture remain static even with the user feedback?
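The "too simple" half of this can be shown with a toy case. The sketch below is an assumption-laden illustration rather than a real deep learning setup: it gives a single neuron the XOR problem, which one linear threshold unit cannot represent. However long it trains, it never gets all four cases right, because the limit lies in the architecture and not in the weights. The opposite case of an over-complex network is not covered here.

```python
def train_single_neuron(samples, epochs, learning_rate=0.1):
    """Train one threshold neuron with the perceptron rule (hypothetical helper)."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            error = target - (1 if activation > 0 else 0)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


def count_correct(weights, bias, samples):
    """Count how many samples the trained neuron classifies correctly."""
    correct = 0
    for inputs, target in samples:
        activation = sum(w * x for w, x in zip(weights, inputs)) + bias
        if (1 if activation > 0 else 0) == target:
            correct += 1
    return correct


# XOR is not linearly separable, so a single neuron is too simple for it.
xor_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

for epochs in (10, 100, 1000):
    weights, bias = train_single_neuron(xor_samples, epochs)
    print(epochs, "epochs:", count_correct(weights, bias, xor_samples), "of 4 correct")
```

More training time does not help; only a change of architecture, such as adding a hidden layer, would let the network represent the problem.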
Although the neural connections in the human brain become fewer as we age, the immense number of connections that remain allows us to co-opt extra neurons for any problem that we need to solve – we can learn to apply ourselves to draw the correct result from new and difficult data. For deep learning truly to come of age, the network must be able to flex its own architecture in response to the complexity of the problem, and this is where there are some exciting possibilities.