Faces in the crowd can make Big Data more valuable


Your ability to spot a weird face might be worth big money in this, the dawning era of Big Data. A statistical technique called “Chernoff faces” transforms multivariate data (numbers that tend to make people feel numb) into quirky, goofy cartoon faces.

Oddball items in the data stand out. You’re more likely to recognize an unusual face than an unusual number.

Here’s how to turn your data into Chernoff faces. Display each variable as a feature of the cartoon face. If you want to, say, look at the financial performance of 10,000 companies, have the computer display the data as 10,000 faces, rather than 10,000 sets of numbers. You might choose to have the length of the nose show a company’s cash flow, the width of the mouth its retained earnings, the angle of the left eyebrow its inventory turnover, the hair color whether the company is a multinational, and so on.
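To make that recipe concrete, here is a minimal sketch in Python (standard library only) that turns rows of numbers into one SVG cartoon face per row. It is an illustration under stated assumptions, not Chernoff’s own parameterization, which used many more facial features. The function names, the feature ranges, and the choice of a fourth variable (a hypothetical “revenue” figure) to drive face height are all made up for this example; the yes/no hair-color feature is left out to keep it short.

```python
"""Minimal Chernoff-face sketch: one SVG cartoon face per data row (stdlib only)."""

# Hypothetical mapping following the company example: column 0 = cash flow,
# column 1 = retained earnings, column 2 = inventory turnover,
# column 3 = an assumed extra variable (e.g. revenue) that drives face height.
FEATURES = ("nose_length", "mouth_width", "brow_angle", "face_height")


def normalize(column):
    """Rescale a list of numbers to [0, 1]; a constant column maps to 0.5."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.5 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def lerp(t, lo, hi):
    """Map t in [0, 1] linearly onto the range [lo, hi]."""
    return lo + t * (hi - lo)


def face_svg(p, size=100):
    """Draw one face; p maps each name in FEATURES to a value in [0, 1]."""
    c = size / 2
    ry = lerp(p["face_height"], 0.30, 0.48) * size    # taller face = larger value
    nose = lerp(p["nose_length"], 0.05, 0.20) * size  # longer nose = larger value
    mouth = lerp(p["mouth_width"], 0.05, 0.20) * size  # half-width of the mouth
    tilt = lerp(p["brow_angle"], -0.06, 0.06) * size  # slant of the left eyebrow
    brow_y = c - 0.18 * size
    parts = [
        f'<ellipse cx="{c:.1f}" cy="{c:.1f}" rx="{0.35 * size:.1f}" ry="{ry:.1f}" '
        f'fill="none" stroke="black"/>',
        f'<circle cx="{c - 0.12 * size:.1f}" cy="{c - 0.08 * size:.1f}" r="{0.03 * size:.1f}"/>',
        f'<circle cx="{c + 0.12 * size:.1f}" cy="{c - 0.08 * size:.1f}" r="{0.03 * size:.1f}"/>',
        f'<line x1="{c - 0.2 * size:.1f}" y1="{brow_y + tilt:.1f}" '
        f'x2="{c - 0.05 * size:.1f}" y2="{brow_y - tilt:.1f}" stroke="black"/>',
        f'<line x1="{c:.1f}" y1="{c - 0.05 * size:.1f}" '
        f'x2="{c:.1f}" y2="{c - 0.05 * size + nose:.1f}" stroke="black"/>',
        f'<line x1="{c - mouth:.1f}" y1="{c + 0.2 * size:.1f}" '
        f'x2="{c + mouth:.1f}" y2="{c + 0.2 * size:.1f}" stroke="black"/>',
    ]
    return (f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">'
            + "".join(parts) + "</svg>")


def chernoff_faces(rows, size=100):
    """Turn rows of at least len(FEATURES) numbers into one SVG face per row."""
    # Each column is rescaled across all rows, so a feature shows where a row
    # sits relative to the rest of the data set.
    columns = [normalize(list(col)) for col in zip(*rows)]
    faces = []
    for i in range(len(rows)):
        params = {name: columns[j][i] for j, name in enumerate(FEATURES)}
        faces.append(face_svg(params, size))
    return faces


if __name__ == "__main__":
    # Made-up numbers for three fictional companies, purely for illustration.
    companies = [(120.0, 40.0, 6.0, 900.0),
                 (95.0, 42.0, 5.5, 870.0),
                 (-30.0, 5.0, 1.2, 300.0)]
    for k, svg in enumerate(chernoff_faces(companies)):
        with open(f"face_{k}.svg", "w") as fh:
            fh.write(svg)
```

Rescaling each column to [0, 1] is a design choice: it makes every face relative to the others in the same batch, which is what you want when hunting for the odd face in the crowd, but it also means one extreme row compresses the differences among all the others.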

The pictures you see here are a recent example. An Australian biomedical report [“Multivariate Visual Clustering of Single Nucleotide Polymorphisms and Clinical Predictors using Chernoff Faces,” Shalem Lee, Sharon Lee, Gus Decker, and Claire Roberts, Proceedings of the Fifth Annual ASEARC Conference – Looking to the future, 2 – 3 February 2012, University of Wollongong, Australia] analyzes data that describe 100 pregnant patients.

In each patient’s Chernoff face, the face height indicates her age, the mouth width shows her body mass index, the hair style indicates whether she herself was born preterm, and so on.


When you look at a large collection of data displayed as faces, patterns and strangenesses become easier to notice.


Herman Chernoff, a statistics professor at Stanford (and later at MIT and then Harvard), created the technique in the early 1970s. At the time, printing computer-generated cartoon faces was considered expensive: Chernoff estimated that each face cost about 25 cents to produce. He predicted that printing such faces would someday become cheap.

Chernoff introduced his innovation in a 1971 technical report for the Office of Naval Research, called “The Use of Faces to Represent Points in K-Dimensional Space Graphically.” A later version appeared in the Journal of the American Statistical Association [vol. 68, no. 342, 1973, pp. 361–8].

Major math/stats software packages (Mathematica, MATLAB, etc.) now include tools to transform your data into Chernoff faces.

If you have a heap of Big Data (a lovely phrase, which seems to have a Big Variety of definitions), you may be going nuts trying to find the hard-to-find valuable nuggets, and to spot subtle-but-important patterns. Consider using Chernoff faces to help find them.

If you go to the Big Data Bootcamp at the Boston Convention Center next month, ask about Chernoff faces. The Big Data Bootcamp sounds educational. Parts of its official description promise enlightenment: “a fast paced, vendor agnostic, technical overview of the Big Data landscape.” Parts promise, well, Hadoopment: “Attendees will experience real Hadoop clusters and the latest Hadoop distributions.”

You can read more about Chernoff and his faces in a little review we did a few years ago in the Annals of Improbable Research.

NOTE: It is not clear whether, or how well, Chernoff faces work for people who have prosopagnosia.

Marc Abrahams is the editor of the Annals of Improbable Research magazine and organizer of the Ig Nobel Prizes.