Wednesday, July 6, 2016

Computer Hardware for Machine Learning

 A question that comes up from time to time is:

What hardware do I need to practice machine learning?

There was a time, back when I was a student, when I was obsessed with more speed and more cores so that I could run my algorithms faster and for longer. I have since changed my perspective. Big hardware still matters, but only after you have considered a bunch of other factors.


TRS 80!
Photo by blakespot, some rights reserved.

Hardware Lessons

The lesson is, if you are just starting out, your hardware doesn’t matter. Focus on learning with small datasets that fit in memory, such as those from the UCI Machine Learning Repository.
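For example, a small UCI dataset can be loaded straight into memory in a few lines. This is just a minimal sketch using pandas; the URL points at the classic Iris dataset and the column names are labels I have chosen:

```python
# Load a small UCI dataset directly into memory with pandas.
import pandas as pd

# The classic Iris dataset, hosted on the UCI Machine Learning Repository.
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]

data = pd.read_csv(url, header=None, names=names)
print(data.shape)  # (150, 5) -- comfortably fits in memory
print(data.head())
```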

Learn good experimental design and make sure you ask the right questions and challenge your intuitions by testing diverse algorithms and interpreting your results through the lens of statistical hypothesis testing.

Once hardware does start to matter and you really need lots of cores and a whole lot of RAM, rent it just-in-time for your carefully designed project or experiment.

More CPU! More RAM!

I was naive when I first started out in artificial intelligence and machine learning. I would use all the data that was available and run it through my algorithms. I would re-run models with minor tweaks to parameters in an effort to improve the final score. I would run my models for days or weeks on end. I was obsessed.

This mainly stemmed from the fact that competitions got me interested in pushing my machine learning skills. Obsession can be good: you can learn a lot very quickly. But when misapplied, you can waste a lot of time.

I built my own machines in those days. I would upgrade my CPU and RAM often. It was the early 2000s, before multicore was the clear path (to me) and even before GPUs were talked about much for non-graphics use (at least in my circles). I needed bigger and faster CPUs and I needed lots and lots of RAM. I even commandeered the PCs of housemates so that I could do more runs.

A little later, whilst in grad school, I had access to a small cluster in the lab and proceeded to make good use of it. But things began to change, and it started to matter less how much raw compute power I had available.


Getting serious with GPU hardware for machine learning.
Photo by wstryder, some rights reserved.

Results Are Wrong

The first step in my change was the discovery of good (any) experimental design. I discovered the tools of statistical hypothesis testing, which let me judge whether one result really was significantly different from (for example, better than) another result.

Suddenly, the fractional improvements I thought I was achieving were nothing more than statistical blips. This was an important change. I started to spend a lot more time thinking about the experimental design.
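For instance, a paired statistical test over cross-validation scores gives a first-cut answer to whether an apparent improvement is real. Here is a minimal sketch using scipy.stats.ttest_rel; the two score arrays are invented for illustration:

```python
# Compare paired cross-validation scores from two model configurations.
from scipy.stats import ttest_rel

# Invented accuracy scores from 10 matched cross-validation folds.
scores_a = [0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.81, 0.80, 0.79, 0.82]
scores_b = [0.82, 0.78, 0.84, 0.79, 0.83, 0.79, 0.80, 0.82, 0.78, 0.83]

t_stat, p_value = ttest_rel(scores_a, scores_b)
print("t=%.3f, p=%.3f" % (t_stat, p_value))

# At alpha = 0.05, a large p-value means the apparent improvement
# may be nothing more than a statistical blip.
if p_value > 0.05:
    print("No significant difference between the configurations.")
else:
    print("The difference is statistically significant.")
```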

Questions Are Wrong

I shifted my obsession to making sure I was asking good questions.

I now spend a lot of time up front loading in as many questions and variations on the questions as I can think of for a given problem. I want to make sure that when I run long compute jobs, the results I get really matter, that they are going to impact the problem.

You can see this when I strongly advocate spending a lot of time defining your problem.

Intuitions Are Wrong

Good hypothesis testing exposes how little you think you know. Well, it did for me and still does. I “knew” that this configuration of that algorithm was stable, reliable and good. Results interpreted through the lens of statistical tests quickly taught me otherwise.

This shifted my thinking to be less reliant on my old intuitions and to rebuild my intuition through the lens of statistically significant results.

Now, I don’t assume I know which algorithm or even which class of algorithm will do well on a given problem. I spot check a diverse set and let the data guide me.
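Here is a hedged sketch of what that spot checking can look like in scikit-learn; the specific models and the use of the built-in Iris data are illustrative choices, not a recommendation:

```python
# Spot check a diverse set of algorithms under the same cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# One representative from several different classes of algorithm.
models = {
    "logistic": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(),
    "knn": KNeighborsClassifier(),
    "naive_bayes": GaussianNB(),
    "svm": SVC(),
}

cv = KFold(n_splits=10, shuffle=True, random_state=7)
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    print("%-12s mean=%.3f std=%.3f" % (name, scores.mean(), scores.std()))
```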

I also strongly advise careful consideration of test options and the use of tools like the Weka Experimenter that bake hypothesis testing into the interpretation of results.

Best is Not Best

For some problems, the very best results are fragile.

I used to be big into non-linear function optimization (and the associated competitions). You could expend a huge amount of compute time exploring (in retrospect, essentially enumerating!) search spaces and come up with structures or configurations that were marginally better than easily found solutions.

The thing is, the hard-to-find configurations were commonly very strange, or exploited bugs or quirks in the domain or simulator. These solutions were good for competitions or for experiments because the numbers were better, but not necessarily viable for use in the domain or in operations.

I see the same pattern in machine learning competitions. A quick and easily found solution scores lower on a given performance measure, but is robust. Often, once you pour days, weeks, and months into tuning your models, you are building a fragile model of glass that is very much overfit to the training data and/or the leaderboard. That is good for learning and for doing well in competitions, but not necessarily usable in operations (for example, the Netflix prize-winning system was never deployed).


Machine Learning in a Data Center.
Photo by bandarji, some rights reserved.

Machine Learning Hardware

There are big datasets that require big hardware, and learning about machine learning at scale requires big data and big hardware.

On this site, I focus on beginners starting out in machine learning, who are much better off with small data on small hardware. Once you have learned enough machine learning, you can graduate to the bigger problems.

Today, I have an iMac i7 with a bunch of cores and 8 GB of RAM. It’s a run-of-the-mill workstation and does the job. I think that your workstation or laptop is good enough to get started in machine learning.

I do need bigger hardware on occasion, such as for a competition or for my own personal satisfaction. On these occasions I rent cloud infrastructure, spin up some instances, run my models, then download the CSV predictions or whatever. It’s very cheap in both time and dollars.

When it comes time for you to start practicing on big hardware with big data, rent it. Invest a little bit of money in your own education, design some careful experiments and rent a cluster to execute them.
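As a rough sketch of what renting just-in-time can look like, here is the shape of it with boto3 and Amazon EC2; the AMI ID, key pair name, and instance type below are placeholders to substitute with your own:

```python
# Launch one large instance for an experiment, then terminate it
# when the run is finished so the meter stops.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-XXXXXXXX",       # placeholder: your machine image
    InstanceType="c4.8xlarge",    # placeholder: lots of cores and RAM
    KeyName="my-experiment-key",  # placeholder: your SSH key pair
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
print("Launched", instance_id)

# ... run the carefully designed experiment, download the CSV
# predictions, then shut the instance down:
ec2.terminate_instances(InstanceIds=[instance_id])
```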

What hardware do you practice machine learning on? Leave a comment and share your experiences.

