Chapter 8 Machine Learning – Learn Algorithms From Data

Machine learning is about learning from data with the goal of solving a task.
Solving the task well is more important than the (internal) validity of the model.
Machine learning is an alternative meta mindset to statistical modeling.
Supervised machine learning, unsupervised machine learning, and reinforcement learning are specializations of the machine learning mindset.

A machine learner walks along the beach. He sees a bottle in the sand, opens it, and finds a genie who grants him a wish. “I want to be the best machine learner in the world,” he says. The genie nods. “Your wish is granted.” The machine learner disappears in a puff of smoke. In his place is a statistician.

It’s likely that you’ve used a machine learning product today. Maybe you have asked your smart assistant to read out your schedule for today, used a navigation app to get from A to B, or checked your mostly spam-free e-mails. In all of these applications, machine learning is used to make the product work: speech recognition, traffic jam prediction, and spam classification are just a few examples of what machine learning can do.

8.1 One or Many Mindsets?

Machine learning is the branch of artificial intelligence that deals with learning models directly from data. The computer improves at a given task through “experience” which means learning from data. The machine learning mindset doesn’t tell you how the computer should learn from data. For example, a machine learner may use random variables, but they don’t have to. They can work on a prediction model where it is clearly defined when the model is correct, or they can work on clustering where the model is harder to evaluate. The models can be neural networks, decision trees, density estimators, statistical models and many more.

Given this wide range of tasks, and without strict guidelines on how the computer must learn: Can we really say that machine learning is a distinct mindset? To answer the question, let’s first look at more specific mindsets within machine learning.

Machine learning is usually divided into supervised, unsupervised and reinforcement learning. Each of these subsets also represents a distinct modeling mindset: They involve a particular view of the world and of the relationship between the models and the world. The supervised learning mindset frames everything as a prediction problem and evaluates models by how well they perform using unseen data. In unsupervised learning the goal is to find patterns in the data. The reinforcement learning mindset views the model as an actor in a dynamic environment, guided by rewards. Deep learning is an add-on mindset that enables learning tasks end-to-end with neural networks.

What are the commonalities between all these mindsets? Is there a unified machine learning mindset?

FIGURE 8.1: Machine learning is a subfield of artificial intelligence. Within machine learning, there is supervised, unsupervised and reinforcement learning. Deep learning overlaps with these 3.

The machine learning mindset may not be as unified and principled as statistical modeling. But all machine learning approaches have a few things in common. Let’s take a look at what makes a good machine learning model and how these models relate to the real world.

8.2 Computational, Task-Driven, Externally Motivated

Like all modeling mindsets in this book, machine learning is based on learning models from data. As the name implies, machine learning focuses on the “machine”, meaning the computer. Machine learning is about the computer “learning” to solve tasks such as classification, regression, recommendation, translation, and clustering. How is this different from the work of statisticians, who also rely on computers? The motivation for using a computer differs between an archetypal statistician and an archetypal machine learner. The statistician uses the computer out of convenience and necessity. Modern statistics wouldn’t be possible without the computer. But the computer is not the starting point. The starting point is statistical theory, and the computer is only a tool to apply statistical theory to data.⁸ The machine learner, in contrast, starts with the computer. The machine learner says, “We have this new thing, the computer. How can we get it to do intelligent and useful things?”

Machine learning can be understood as a meta-algorithm: An algorithm that uses data to produce machine learning models that are also algorithms. From a programmer’s point of view, machine learning is a paradigm shift: Machine learning is just a way of “learning” an algorithm from data, rather than programming it directly.⁹
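To make the meta-algorithm idea concrete, here is a minimal sketch in Python. It contrasts a rule that a programmer writes directly with a rule that is produced from data. The spam scenario, the toy data, and the function names (hand_coded_rule, learn_threshold_rule) are made up for illustration; a one-threshold rule is about the simplest learner one could pick and is not meant as a realistic spam filter.

```python
from typing import Callable, Sequence


def hand_coded_rule(num_links: int) -> bool:
    """Programmed directly: a human picked the threshold of 3 links."""
    return num_links > 3


def learn_threshold_rule(
    xs: Sequence[float], ys: Sequence[bool]
) -> Callable[[float], bool]:
    """Meta-algorithm: takes data and returns another algorithm (the model).

    It tries every observed value as a threshold and keeps the one
    with the fewest errors on the training data.
    """
    candidates = sorted(set(xs))
    best_threshold, best_errors = candidates[0], len(xs) + 1
    for t in candidates:
        errors = sum((x > t) != y for x, y in zip(xs, ys))
        if errors < best_errors:
            best_threshold, best_errors = t, errors

    def learned_rule(x: float) -> bool:
        # The learned model is itself a small algorithm.
        return x > best_threshold

    return learned_rule


# Hypothetical training data: links per e-mail and whether it was spam.
links = [0, 1, 1, 2, 5, 6, 8, 9]
is_spam = [False, False, False, False, True, True, True, True]

model = learn_threshold_rule(links, is_spam)
print(model(1), model(7))
```

Both hand_coded_rule and model are ordinary functions that classify an e-mail. The difference lies in where the threshold comes from: the programmer's head in one case, the data in the other.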

In contrast to the more insight-driven statistical modeling, where the model encodes a hypothesis, machine learning is typically used to solve a task. The task may be language translation, image captioning, classification, and so on. The success of the model is measured by how well the task was solved, using some type of metric. In regression and classification tasks, the machine learner measures the generalization error for new data. Specifically, for a classification model, this could be the accuracy with which the model assigns classes correctly in a new data set, or the F1 score. In clustering tasks, success metrics can measure how homogeneous the data points in the clusters are. This external focus is also reflected in the way machine learning research works: Researchers invent a new machine learning algorithm and show that it works by comparing it to other algorithms in some task benchmarks. The reason why the algorithm works well is often discovered in later scientific publications, if at all.
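The two classification metrics mentioned above can be written down in a few lines. The following Python sketch computes accuracy and the F1 score from true and predicted labels. The labels are made up and stand in for a held-out data set that the model has not seen during training.

```python
def accuracy(y_true, y_pred):
    """Share of examples where the predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for new data (1 = positive class, 0 = negative class).
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(accuracy(y_test, y_pred))  # 0.75 (6 of 8 correct)
print(f1_score(y_test, y_pred))  # 0.75 (precision 3/4, recall 3/4)
```

Note that neither function looks inside the model. Only the predictions matter, which is what makes the evaluation external.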

We can distinguish between external and intrinsic modeling motivation. The motivation and evaluation of a machine learning model is external, based on task performance. It’s like judging food based on how it tastes. Statistical modeling is intrinsically motivated. The rationale for constructing the model is important. It’s like judging food not only by how it tastes, but also by the cooking process: Did the chef use the right ingredients? Was the cooking time appropriate? And so on.

8.3 Machine Learning As Statistical Learning

Some people refer to machine learning as “statistical learning”.¹⁰ Statistical learning is a way of understanding machine learning: through the lens of statistics and probability. Many machine learning courses start with conditional probabilities, bias and variance, and probability distributions – the lens and language of statistical learning. Researchers look at machine learning through statistical learning to derive properties of models and algorithms, ultimately improving our understanding of machine learning. So is machine learning just a rebranding of statistical modeling? Again, it helps to think in mindsets: we can distinguish the mindset of statistical modeling and the language of statistics. Statistical learning means applying the language of statistics to machine learning. However, statistical learning doesn’t mean that the modeling mindsets are the same. Consider these two approaches:

A statistician fits a hypothesis-driven logistic regression model and interprets the significance of a coefficient.
A machine learner tunes and benchmarks 10 algorithms, and ends up with a random forest to be used for classification.

One could describe both models as conditional probability models and talk about their statistical properties. Both models give the conditional probability of a class given some other variables. So not only can we describe the models with statistical language, the models even target the same conditional probability distribution. But there is a big difference in the mindset: The statistician starts the analysis with a statistical hypothesis, interprets parameters, and so on. The machine learner evaluates the models differently, has the goal of classification, … Even if the machine learner ended up with exactly the same logistic regression model, the interpretation and use in practice would be different.
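The last point, that the same fitted model can be used in two different ways, can be sketched in Python. The code below fits one logistic regression with a single feature and then reads it twice: once through a Wald test on the slope (the statistician's view) and once through accuracy on held-out data (the machine learner's view). The data, the train/test split, and the helper names are made up for illustration, and the Newton fit is a bare-bones stand-in for what a statistics or machine learning library would do.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, n_iter=25):
    """Fit intercept b0 and slope b1 by Newton's method.

    Returns (b0, b1, se_b1). The standard error of b1 comes from the
    inverse of the information matrix evaluated in the last iteration.
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        ps = [sigmoid(b0 + b1 * x) for x in xs]
        # Gradient of the log-likelihood.
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum(x * (y - p) for x, y, p in zip(xs, ys, ps))
        # Entries of the 2x2 information matrix.
        ws = [p * (1 - p) for p in ps]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xs))
        h11 = sum(w * x * x for w, x in zip(ws, xs))
        det = h00 * h11 - h01 * h01
        # Newton step: parameters += inverse(information) * gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1


# Hypothetical data: one feature x and a binary class y.
train_x = [0, 1, 2, 3, 4, 5, 6, 7]
train_y = [0, 0, 1, 0, 1, 0, 1, 1]
test_x = [0.5, 2.5, 4.5, 6.5]
test_y = [0, 0, 1, 1]

b0, b1, se_b1 = fit_logistic(train_x, train_y)

# Statistician's view: is the slope different from zero? (Wald test)
z = b1 / se_b1
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation
print(f"slope={b1:.3f}, se={se_b1:.3f}, p-value={p_value:.3f}")

# Machine learner's view: how well does the model classify unseen data?
preds = [1 if sigmoid(b0 + b1 * x) >= 0.5 else 0 for x in test_x]
test_accuracy = sum(p == y for p, y in zip(preds, test_y)) / len(test_y)
print(f"held-out accuracy={test_accuracy:.2f}")
```

The fitted parameters are identical in both cases. What differs is which numbers are reported and which question they are meant to answer.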

8.4 Strengths

Task-oriented and therefore pragmatic.
A job in machine learning potentially pays you lots of money.
A computer-oriented mindset in a computer-oriented world.
Machine learning is predestined for automating tasks and building digital products.

8.5 Limitations

Not as principled and hypothesis-driven as statistical modeling.
A confusing number of approaches with different motivations and technical bases.
Models that solve tasks are not necessarily the best for insights. Methods of interpretable machine learning can alleviate this problem.
Often requires a lot of data and is computationally intensive.

⁸ There is a field called computational statistics, which is computer-oriented. But we are talking about archetypes of mindsets here. You can think of computational statistics as a statistical mindset that is slightly infused with the machine learning mindset.
⁹ I find it difficult to say that the machine learns by itself, because machine learning also requires programming. You have to implement the learning part and all the glue code to integrate the final model into the product.
¹⁰ Even my favorite machine learning book is called “The Elements of Statistical Learning”.