Demystifying the black box of machine learning - Part 1

by Jonny Jackson

What is machine learning?

Machine learning is a key discipline in the extraction of useful things from annoyingly large amounts of data. It belongs to the family of statistical analysis methods. While other methods in that family rely on rules designed manually by a human, machine learning methods use algorithms that “discover” rules through maths.

In the real world, there are some things that we can measure and some things that we can’t. Things we can measure include: sizes and weights, colours and brightnesses, times, MRI scans of people’s heads, audio recordings of woodland animals, and today’s house prices in Blackpool. Things we can’t measure include: what someone is thinking, which animals are living in woodland, and house prices in Blackpool in the year 2072. The world does its thing, including having physical laws and complicated psychological entities (aka people) that continually change the state and composition of the world.

That all sounds rather complicated though, so scientists make simplified models in an attempt to approximately explain how the things we can measure can predict the things that we can’t. They use their experience (aka empirical evidence) and mathematical wit (aka theoretical evidence) to sculpt their models to the best of their ability, and then hope the models behave, on the whole, like the real world does.

That also sounds rather complicated though, so machine learning scientists take a step outside of their personal experiences and instead let the models build themselves by exposing them to as much experience of the real world as they can in the form of data. Scientists spend most of their time gathering and preparing this data and mathematically formulating what the “right outcome” of a model is: it should obviously behave as much like the real world as possible. They then press run and hope for the best (in a rigorous, scientific manner of course).
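To make that a little more concrete, here is a minimal sketch in Python of the difference between a rule a person writes down and a rule that is "discovered" from data. Everything in it is an assumption made up for illustration: the house sizes and prices are invented (they are not real Blackpool figures), the names `human_designed_model`, `mean_squared_error` and `learn_model` are hypothetical, and a straight line through the origin is about the simplest model there is. Real machine learning models are far more elaborate, but the shape of the process is the same: gather data, write down what the "right outcome" means, and let the maths pick the rule.

```python
# Invented example data: (floor area in square metres, sale price in thousands of pounds).
# These numbers are made up for illustration only.
houses = [(50, 105), (70, 135), (90, 185), (110, 215)]


def human_designed_model(area):
    """A rule a person wrote down from experience: a guessed price per square metre."""
    return 1.5 * area


def mean_squared_error(model, data):
    """The 'right outcome', stated mathematically: predictions should be close to reality."""
    return sum((model(area) - price) ** 2 for area, price in data) / len(data)


def learn_model(data):
    """Discover the rule from the data instead of guessing it.

    For a model of the form price = slope * area, the slope that minimises
    the mean squared error has a simple closed-form solution.
    """
    slope = sum(area * price for area, price in data) / sum(area * area for area, _ in data)
    return lambda area: slope * area


learned_model = learn_model(houses)
human_error = mean_squared_error(human_designed_model, houses)
learned_error = mean_squared_error(learned_model, houses)

print("Human-designed rule, error:", human_error)
print("Learned rule, error:", learned_error)
```

On this toy data the learned rule fits better than the guessed one, but only because it was tuned to these exact four houses. Whether it would do as well on houses it has never seen is a separate question.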

Is machine learning better than other methods?

The machine learning process of “discovering” takes a long time, can be hard (often impossible) to interrogate, and can lead to very different results depending on the example data used. Under the right conditions, however, it can perform significantly better than human-designed models.

How can machine learning do better than human-designed models?

Human-designed models inherently need to be expressible by a person (for example, they might need to be written down as an equation or described in words), since that’s the way we think: symbolically (see Wittgenstein, Chomsky, and others). Machine learning avoids this constraint by minimising the human design involved (*) and hence allowing for much more sophisticated models than could be described by a person.

In the next instalment, we'll dive into how machine learning works and the phases of machine learning.