District of Corruption

The Bell Curve Doth Not Toll


On Saturday, December 10, 20,000 people gathered for a mass rally in Moscow; they were protesting alleged fraud in the December 4 parliamentary elections. Some of the allegations were mathematical in nature.

The Washington Post reports:

“Obviously, he [Putin] doesn’t agree with Gauss,” one commenter wrote, referring to pioneering mathematician Carl Friedrich Gauss, who lived 200 years ago. Disenchanted Russians argue that United Russia’s reported election results are so improbable as to violate Gauss’s groundbreaking work on statistics.

The article does not say what exactly the problem with the election result is and what work of Gauss is relevant. It only says that he lived 200 years ago. This should be enough to trigger an alert, as science has advanced a bit over the past 200 years . . .

I decided to take a closer look at the allegations. 

In a piece entitled “Mathematics against Election Committee: Gauss against Churov [the head of the committee],” a blogger complains that the distribution of the percentage of the vote for the United Russia Party among election precincts is “non-Gaussian.” This, he writes, is evidence of election fraud because the Gaussian distribution arises “always . . . in every case, when there is not one factor, but many”:

Whatever is measured in large quantities. Make a plot of how many millions of men in the country have the height of 165, 170, 175 centimeters and so on—and you will also get a symmetric bell curve with the top corresponding to the most typical height in the country. 

If you do not know what the Gaussian distribution is, the blogger gave a good example: distribution of people by height. Most men are of average height; the greater the deviation from the average, the smaller the number of men. There are some very tall people, but none of them is twice the average height.

The heights of people are definitely Gaussian-distributed, but what about incomes? They are influenced by many factors and measured in large quantities. However, they are distributed as if most people were 170 centimeters tall, but you would often meet a three-meter guy. Rarely, you would encounter a five-meter man and, more rarely still, a ten-meter one. Sometimes, from a distance, you would see a hundred-meter person. And there would be several hundred-kilometer chaps in the country. This distribution is very far from Gaussian, but for some reason it does not attract the wrath of our mathematicians, or our Berezovskies.
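To put rough numbers on this contrast, here is a minimal sketch that compares the tail of a Gaussian with the tail of a power-law (Pareto) distribution, a common textbook stand-in for incomes. Only the 170-centimeter average comes from the text above. The 7-centimeter standard deviation and the Pareto exponent of 1.5 are my assumptions, chosen for illustration, and the function names are hypothetical.

```python
import math

def normal_tail(x, mean, sd):
    """P(X > x) for a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

def pareto_tail(x, minimum, alpha):
    """P(X > x) for a Pareto (power-law) distribution starting at `minimum`."""
    if x <= minimum:
        return 1.0
    return (minimum / x) ** alpha

MEAN_HEIGHT_CM = 170.0  # average height used in the text
SD_HEIGHT_CM = 7.0      # assumed spread of heights, illustration only
ALPHA = 1.5             # assumed Pareto exponent for incomes, illustration only

# Heights are measured in multiples of the mean, incomes in multiples of the
# Pareto minimum, so this is a qualitative comparison of tails, not a fit.
for multiple in (2, 10, 100):
    p_height = normal_tail(multiple * MEAN_HEIGHT_CM, MEAN_HEIGHT_CM, SD_HEIGHT_CM)
    p_income = pareto_tail(float(multiple), 1.0, ALPHA)
    print(f"{multiple:>3}x: Gaussian tail {p_height:.3g}, power-law tail {p_income:.3g}")
```

Under these assumptions, the Gaussian tail is effectively zero already at twice the mean, while the power-law tail shrinks only polynomially. That is the sense in which a "three-meter guy" is unremarkable for incomes and impossible for heights.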


The banner says “We don’t trust Churov [the head of the Election Committee]! We trust Gauss.”

In a recent article in Significance, I argued that since there are so many distributions in nature and society that are not Gaussian, there is no reason to believe that vote distributions must be. To support this conclusion, I gave a mathematical model, which produces a non-Gaussian distribution of the percent of votes for a party among election precincts. 

A commentator challenged me to show non-Gaussian distributions in U.S. elections.

I took up the challenge.

I decided to look at the 2008 Republican primaries (mostly because this was the last election I voted in). The primaries differ from national elections, as different states hold votes on different dates. Moreover, some candidates drop out during the process. All of this complicates the analysis. But 21 states do hold elections on the same day, “Super Tuesday.” Since almost half of the nation votes on this day, the elections function like a national primary.

The most complete election results database I could find is Dave Leip's “Atlas of U.S. Presidential Elections.” It does not have precinct-level results for the election in question, but its results are listed by county for 19 out of 21 Super Tuesday states (the exceptions being Alaska and North Dakota). I computed the distribution of the percentage of the vote for four major candidates among 1,162 counties.

As you can see in Figure 1, Mike Huckabee's distribution has two equal peaks at 15 and 35 percent. The drop between peaks is half the peaks' height. John McCain's distribution has one peak at 35 percent and another at 80 percent. Between these peaks, the distribution drops almost to zero. Mitt Romney has one peak at 25 percent and another at 90 percent. Ron Paul has an exponential distribution.

Apparently, American elections also “violate Gauss’s groundbreaking work on statistics.” (The least you could say is that these distributions are no more “Gaussian” than the distributions observed in Russian elections; see Figure 2.)

Figure 1. The results of the 2008 Republican presidential primaries in 19 Super Tuesday states. The distribution of the percentage of the votes for four major candidates among 1,162 counties. I used a 5 percent bin. All counties with the vote for the candidate of no more than 5 percent went to the first “5 percent” bin. Those with the vote of more than 5 percent but no more than 10 percent went to the second “10 percent” bin, and so on.
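The binning rule in the caption above can be written down in a few lines. This is a sketch of the rule as stated, not the code used to produce the figure; the function names are hypothetical, and the sample percentages are made up purely to show the mechanics.

```python
import math
from collections import Counter

def bin_label(percent, width=5):
    """Upper edge of the bin a percentage falls into.

    Bins are closed on the right: anything up to and including `width` goes to
    the first bin, anything above `width` and up to 2 * `width` to the second,
    and so on.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return max(width, math.ceil(percent / width) * width)

def histogram(percents, width=5):
    """Number of counties in each bin, keyed by the bin's upper edge."""
    counts = Counter(bin_label(p, width) for p in percents)
    return {edge: counts.get(edge, 0) for edge in range(width, 101, width)}

# Made-up county percentages, for illustration only.
sample = [3, 5, 7, 35.2, 80]
print({edge: n for edge, n in histogram(sample).items() if n})
```

With this convention, a county at exactly 5 percent lands in the “5 percent” bin and one at 5.1 percent lands in the “10 percent” bin, as the caption describes.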

Figure 2. 2011 Russian parliamentary elections. The distribution of the percentage of the votes for parties among election precincts. The x-axis shows the percentage of votes for the party; the y-axis shows the number of precincts. The bin is 0.5 percent. The brown line is for United Russia; red, the Communist Party; green, the Russian United Democratic Party “Yabloko”; black, the Liberal Democratic Party; blue, A Just Russia. This picture traveled across hundreds of blogs during the past couple of weeks.

Another issue brought up by the bloggers is that there are spurious peaks at 50 percent and other multiples of 5 (see Figure 2). But when you examine precinct-level results, you notice that in many precincts, very few people voted, as few as one person in some of them (!). When two, four, six, eight, or 10 people vote, you can easily get a result of 50 percent, and never 49 or 51 percent.
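The arithmetic behind this is easy to check by brute force. The sketch below (function names are hypothetical) lists every exact percentage a party can receive in a precinct with a given number of voters, then counts how many precinct sizes from 1 to 10 can produce each percentage.

```python
from collections import Counter
from fractions import Fraction

def possible_shares(voters):
    """All exact vote percentages achievable when `voters` people vote."""
    return {Fraction(100 * k, voters) for k in range(voters + 1)}

def share_counts(max_voters):
    """For each exact percentage, how many precinct sizes 1..max_voters allow it."""
    counts = Counter()
    for voters in range(1, max_voters + 1):
        counts.update(possible_shares(voters))
    return counts

counts = share_counts(10)
for percent in (49, 50, 51):
    print(f"{percent} percent: reachable for {counts[Fraction(percent)]} precinct sizes")
```

Exactly 50 percent is reachable for the five even sizes (2, 4, 6, 8, 10), while exactly 49 or 51 percent is reachable for none of them: 49/100 cannot be reduced, so it takes a multiple of 100 voters to hit it exactly.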

The database mentioned above has precinct-level statistics for the 2000 U.S. presidential elections in California. In Figure 3, I plotted the distributions of the percentages of the vote among election precincts. You can see obvious peaks at 50 percent in both Al Gore's and George W. Bush's distributions. There are also less pronounced peaks at 20, 25, 60, and 75 percent. However, there are other obvious peaks at 34 percent (1/3) and 67 percent (2/3). These obviously came from the precincts where three (or another small multiple of three) people voted. (I do not see such peaks in Russian election results. This problem requires additional study.)
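Here is a small sketch of how simple fractions map onto the peaks just listed. It assumes that the 1 percent bins of Figure 3 follow the same convention as the 5 percent bins of Figure 1 (a value goes to the bin whose upper edge it does not exceed); that is my assumption for this illustration, and `one_percent_bin` is a hypothetical helper name.

```python
import math
from fractions import Fraction

def one_percent_bin(votes_for, votes_cast):
    """1 percent bin (labeled by its upper edge) for a candidate's exact share.

    Assumes right-closed bins, so a share of 33.33... percent goes to the
    "34 percent" bin. Shares of 1 percent or less go to the first bin.
    """
    share = Fraction(100 * votes_for, votes_cast)  # exact, no rounding error
    return max(1, math.ceil(share))

# Simple fractions that arise when only a handful of people vote.
for votes_for, votes_cast in [(1, 5), (1, 4), (1, 3), (1, 2), (3, 5), (2, 3), (3, 4)]:
    label = one_percent_bin(votes_for, votes_cast)
    print(f"{votes_for}/{votes_cast} of the vote -> {label} percent bin")
```

Under that assumption, 1/3 and 2/3 fall into the 34 and 67 percent bins, and 1/5, 1/4, 1/2, 3/5, and 3/4 fall into the 20, 25, 50, 60, and 75 percent bins.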

Note also that the distributions in Figure 3 are far from Gaussian. If there is something resembling a bell curve in Figure 3, this is a combined curve made up of Gore's distribution below 50 percent and Bush's distribution above 50 percent. If I use the same methods of “proof” used by the bloggers to allege large-scale fraud in the Russian elections, I can “prove” that Gore stole millions of votes from Bush in California! Surely, the Washington Post would want to report on this! Of course, such “proofs” are nonsense, since the distributions should not necessarily be Gaussian in the first place.

It's worth pointing out that my study does not prove that the recent Russian elections were honest. It does, however, prove that in making the case that the Russian elections were fake, the bloggers used fake math.

Figure 3. Results of 2000 presidential elections in California. The distribution of the percentage of the vote for three main candidates among 21,970 precincts. I used a 1 percent bin.