How Machines Learn

by Greg Satell

It’s fun to look at old pictures of Bill Gates from back when he was a boy genius.  Unlike Mark Zuckerberg, he arrived on the scene looking very much like the nerd he was: big glasses and a sheepish grin, like he’s just happy, albeit a bit embarrassed, to be invited to the party.

He’s grown up a lot since then.  Years of success and media training have given him a quiet confidence.  He speaks from the heart about issues he is devoted to, like education and eradicating malaria.

And technology has grown up with him.  The bulky green fonts and command lines have been replaced by far more natural interfaces.  Computers are now able to recognize speech, text and even gestures.  As they continue to learn, they will become intelligent enough for most human tasks, changing the nature of work, business and society.

A Petty Academic Squabble

It all began in the early 20th century with one of those obscure academic squabbles that usually don’t amount to much.  A mathematician and theologian named Pavel Nekrasov argued that because independent events follow the law of large numbers, and social phenomena such as crime statistics do as well, individual human acts must be independent of one another, and therefore humans must have free will.

Andrei Markov, one of the great mathematicians of the day, thought Nekrasov’s argument was hogwash.  After all, he argued, just because independent variables follow a certain mathematical law doesn’t mean that conscious activity can’t do so as well.

To prove his point, he did a mathematical analysis of Eugene Onegin, Pushkin’s famous novel in verse, and showed that the combinations of vowels and consonants follow the law of large numbers as well.  A vowel will most likely be followed by a consonant and vice versa, in proportions that become more stable as you analyze more text.
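To make Markov’s counting exercise concrete, here is a minimal sketch in Python. It is not Markov’s actual procedure: it assumes English text and the five ASCII vowels (he worked by hand on Russian), and the names letter_classes and transition_proportions are made up for this illustration.

```python
from collections import Counter

# Assumption: plain English text with the five ASCII vowels.
VOWELS = set("aeiou")


def letter_classes(text):
    """Map each ASCII letter to 'V' (vowel) or 'C' (consonant); skip everything else."""
    return ["V" if ch in VOWELS else "C" for ch in text.lower() if "a" <= ch <= "z"]


def transition_proportions(text):
    """Estimate P(next class | current class) from adjacent letter pairs."""
    classes = letter_classes(text)
    pair_counts = Counter(zip(classes, classes[1:]))
    totals = Counter(classes[:-1])  # how often each class has a successor
    return {
        (cur, nxt): pair_counts[(cur, nxt)] / totals[cur]
        for cur in "VC"
        for nxt in "VC"
        if totals[cur]
    }
```

Feed it a few lines of text and then a whole book, and the four proportions should settle down as the sample grows, which is the stability Markov was pointing to.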

And so, Markov succeeded in showing that dependent variables could yield distinct probabilities.  It was the kind of interesting, but relatively useless insight that academics specialize in and it remained obscure for most of the 20th century.

Recently, however, Markov models have taken center stage in how machines learn.

Learning to Decipher Patterns

Today is a beautiful, sunny day and, chances are, it will be tomorrow as well.

You see, weather, much like text, is a dependent system.  If it’s sunny today, it’s more likely to be sunny tomorrow.  If it rains on Saturday, I shouldn’t bank on a nice time at the pool on Sunday either.

Brian Hayes (to whom I am indebted for the narrative above) gave an excellent example of how we can adapt Markov’s insights to predict the weather with this chart he included in a recent article in American Scientist.

 Weather Markov Model

If you used this scheme to predict the weather it would be reasonably accurate. Professional forecasters often do apply some version of a Markov model in order to get a baseline and then incorporate other factors, such as barometric pressure, to improve accuracy.
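A weather model of this kind boils down to a table of transition probabilities. The sketch below shows the idea in Python. The three states and every number in TRANSITIONS are hypothetical placeholders, not the values from Hayes’s chart, and step and forecast are names invented for the example.

```python
# Hypothetical transition probabilities, NOT the values from Hayes's chart.
# Each row gives P(tomorrow | today) and sums to 1.
TRANSITIONS = {
    "sunny": {"sunny": 0.7, "cloudy": 0.2, "rainy": 0.1},
    "cloudy": {"sunny": 0.3, "cloudy": 0.4, "rainy": 0.3},
    "rainy": {"sunny": 0.2, "cloudy": 0.3, "rainy": 0.5},
}


def step(distribution):
    """Advance a probability distribution over weather states by one day."""
    tomorrow = {state: 0.0 for state in TRANSITIONS}
    for today, p_today in distribution.items():
        for nxt, p_move in TRANSITIONS[today].items():
            tomorrow[nxt] += p_today * p_move
    return tomorrow


def forecast(today, days):
    """Return the probability of each state `days` days after a known `today`."""
    dist = {state: 1.0 if state == today else 0.0 for state in TRANSITIONS}
    for _ in range(days):
        dist = step(dist)
    return dist
```

Calling forecast("sunny", 1) simply returns the sunny row of the table. Asking for more days ahead blends the rows together, so the forecast depends less and less on where you started.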

Once you begin thinking about it, you start seeing Markov models everywhere.  What I’m doing now will affect what I’ll do next.  What I say now will affect what I say next. Human behavior is, when you get down to it, like one big Markov chain.  Both necessity and habit make us highly predictable.

What’s more, just like weather forecasters, we can augment Markov models by adding information as it comes in, adapting our analysis through a technique called Bayesian inference.  Interestingly, this is exactly what we humans learn to do as we mature, to adapt to the habits of others in our lives.
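To see how new information adjusts a baseline, here is a rough sketch of a single Bayesian update in Python. The prior stands in for a Markov baseline and the likelihood for a barometer reading. All four numbers are invented for illustration, and bayes_update is a hypothetical helper, not any forecaster’s actual method.

```python
def bayes_update(prior, likelihood):
    """Return P(state | evidence) given P(state) and P(evidence | state)."""
    unnormalized = {state: prior[state] * likelihood[state] for state in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("evidence is impossible under every state")
    return {state: value / total for state, value in unnormalized.items()}


# Hypothetical baseline for tomorrow, e.g. taken from a Markov model.
prior = {"sunny": 0.7, "rainy": 0.3}

# Hypothetical chance of seeing falling pressure before each kind of day.
likelihood = {"sunny": 0.2, "rainy": 0.8}

posterior = bayes_update(prior, likelihood)
```

With these made-up numbers, the falling barometer flips the forecast: rain goes from the less likely outcome to the more likely one.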

Infinite Monkeys Hard at Work

There is an old literary concept, called the Infinite Monkey Theorem, which states that if you had an infinite number of monkeys pecking away at an infinite number of keyboards, producing masterpieces like Pushkin’s Eugene Onegin would be more a matter of curation than creation.

Today, with data centers running hundreds of thousands of processors that can each perform millions of calculations per second, we are beginning to experience a real-life version of the Infinite Monkey Theorem.  Companies like Narrative Science are able to produce coherent documents from raw data this way and Brian Hayes has built a rudimentary program that does a passable job.
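A toy version of Markov-style text generation fits in a few lines of Python. This is only a sketch of the general idea, a first-order chain over words. It is not Hayes’s program or Narrative Science’s method, and build_chain and generate are names made up here.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Record, for every word, the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain, start, length, rng=None):
    """Walk the chain from `start`, picking each next word at random."""
    rng = rng or random.Random()
    word, output = start, [start]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: this word never had a successor
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


sample = generate(build_chain("the cat sat on the mat"), "the", 8)
```

Because repeated followers appear more than once in each list, common word pairs are picked more often, so the output echoes the rhythm of the source without understanding a word of it.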

And it’s not hard to see how the process can be reversed.  If mindless processors can be made to create patterns, they can learn how to recognize them as well.  More sophisticated forms of Markov models are what drive pattern recognition technologies such as Apple’s Siri and Microsoft’s Kinect.

The Education of an Algorithm

Recognizing patterns is one thing, understanding meaning is another.  A toddler begins to learn language by identifying phonemes (elemental units of language) and eventually is able to form words.  However, it takes years of exposure to language for a child to learn to talk and even more to learn to read.  Humans can spend a lifetime deciphering meaning in a particular field.

Computers, however, have far fewer limitations.  Their capacity, for practical purposes, is almost infinite (although somewhat constrained by budgetary concerns). Consequently, they can learn at superhuman speeds.  IBM’s Watson computer can reportedly analyze hundreds of millions of documents in seconds.

For example, researchers at IBM taught their algorithm to translate between French and English by exposing it to proceedings of the Canadian Parliament, which by law must be produced in both languages.  This allowed them to connect not just words, but entire phrases and even slang.  It would take a year for a human to sit through it all, but a computer can do it without breaking a sweat.

Others, such as Mattersight, a company that uses artificial intelligence to analyze and improve call center operations, use a more human-centered approach.  They have trained analysts check the computer’s work and teach it to improve over time.  Researchers at Cornell have recently developed algorithms that can learn by merely observing human behavior.

Much like a young Bill Gates, our machines are learning to be more human.

The New Learning Organization

For decades, management theorists have been talking about how organizations need to continually learn by eschewing the traditional command and control approach in favor of empowering their employees.

Today, computers are beginning to perform legal discovery, make medical diagnoses and even evaluate creative work such as music and screenplays.  In other words, they are learning many of the same things that people do, except they do not get tired or sick, never ask for a raise and when they get too old to function effectively, their hardware can be replaced.

This, of course, presents a dilemma.  How can organizations empower their people at the same time they are outsourcing their jobs to algorithms and microchips?

The answer is this:  Effective professionals, rather than focusing on building skills to recognize patterns and take action, will need to focus on designing the curricula: deciding which patterns computers should learn and what ends their actions should serve.

Greg Satell is an internationally recognized authority on Digital Strategy and Innovation. He consults and speaks in the areas of digital innovation, innovation management, digital marketing and publishing, as well as offshore web and app development. His blog is Digital Tonto and you can follow him on Twitter.
