
The 18181 Vote Probability Solved... but Not by me..

This topic is archived.
Home » Discuss » Archives » General Discussion (Through 2005) Donate to DU
 
TruthIsAll Donating Member (1000+ posts) Send PM | Profile | Ignore Mon Sep-01-03 08:04 PM
Original message
The 18181 Vote Probability Solved... but Not by me..
Edited on Mon Sep-01-03 08:28 PM by TruthIsAll
My original analysis was way off. However, in thinking about the problem over the weekend and doing some preliminary number-crunching in Excel, I came to the conclusion that the odds were not as remote as I had computed. I would have pursued the analysis further, but frankly, I was unsure how to proceed and put it aside. I got lazy and decided to google to see if the problem had already been solved, and sure enough, it was - by a fellow named Brian Hayes.

My revised assumptions and methods agreed with Brian's, but I was unsure how to express the calculation elegantly as a formula. I even considered simulation, that is, running a large number of trials, but never got to use that approach. Brian thought of that ALSO. At least I was on the right track. I feel real good about THAT.

The key thing to keep in mind, DUers: Solving a complex problem involves TRIAL and ERROR. Insight after insight. Fine-tuning the analysis until one gets a sufficiently robust solution, which, after all, will always be an approximation, but a good one. What we are NOT looking for is proof; what we ARE looking for is a rational basis for rejecting coincidence.

Basically, here is how the problem was solved by Hayes:

Step 1: get the data for 30 Texas elections in Comal county.

Step 2: determine the range of Republican votes: roughly 3100 possible totals (16504 to 19601).

Step 3: Calculate the probability that at least THREE of the 30 county-wide elections will have the IDENTICAL NUMBER OF REPUBLICAN VOTES WITHIN THE 3100 RANGE, ASSUMING ALL POSSIBLE VOTE TOTALS IN THAT RANGE ARE EQUALLY LIKELY (A UNIFORM DISTRIBUTION). In this case the number was 18,181, but it could just as well be 18,182, etc. Of course, 18,181 reversed is still 18,181 (a palindrome), but that is NOT going to be an issue here.

The Bottom line result: The odds against 3 or more elections resulting in the SAME number of votes are just 2500 to 1 - a long shot, but not THAT long.

Thus, we cannot assume, based on these results alone, that the voting machines were rigged.

On the other hand, if the odds were 1 Million to 1, one would be stretching it to call the occurrence a coincidence.

................................................................
A coincidence problem.
Brian Hayes
A friend who worries about voting fraud sent me a note about a recent election in Texas, where three winning Republicans all turned up with the same number of votes: 18181. "What's the probability of that happening by chance?" he asked.

Always a good question, of course. I asked him for his own estimate of the odds. He proposed that the probability is (1/18181)^3, which puts the event well beyond the one-in-a-trillion threshold. I disagree with this estimate, but before we can zero in on a better one, we need more facts. First of all, the election did happen. When I Googled for the pleasantly palindromic numeral "18181" the other day, the search engine reported 25,200 references on the Web. For example, there was a document titled "Projected Cash Flow for 2000 for Mrs. Nettie Worth, Rodeo Ranch, Wildparty, Kansas," which mentioned 18181 several times in connection with beef calves. (Nettie Worth? Now what's the probability of that?) Poking around a bit more, I learned that 18181 is the German postal code for the resort village of Graal Müritz in western Pomerania, and that the address of the Yorba Linda Library in California is 18181 Imperial Highway.

But among these distractions I also found numerous pointers to news stories about the November 5, 2002, election in Comal County, Texas, which is just northeast of San Antonio (county seat, New Braunfels). Eventually I came to the web site operated by the county itself. The results of the 2002 general election are posted at this URL: Here is a slightly condensed table of the vote totals. I have included only county-wide contests, and I've excluded a constitutional amendment where the votes for and against were not identified by party. The three "suspicious" 18181 totals are highlighted in the table below. The total number of ballots cast was 24362.

Race                            Republican  Democrat  Other  Total
1. U.S. Senator                      18156      5696    350  24202
2. U.S. Rep. District 21             19066      4627    371  24064
3. Governor of Texas                 18558      5047    550  24155
4. Lieutenant Governor               16504      7186    477  24167
5. Attorney General                  17935      5498    576  24009
6. Comptroller                       19601      3962    534  24097
7. Commissioner of Land Office       17328      5129   1144  23601
8. Commissioner of Agriculture       18259      4635    925  23819
9. Railroad Commissioner             17166      5675    784  23625
10. Chief Justice Supreme Court      18051      5011    530  23592
11. Justice Supreme Court 1          17456      5387    653  23496
12. Justice Supreme Court 2          17860      5181    391  23432
13. Justice Supreme Court 3          17894      5392      *  23286
14. Justice Supreme Court 4          17175      6166      *  23341
15. Judge Criminal Appeals 1         17778      4762    821  23361
16. Judge Criminal Appeals 2         18045      5221      *  23266
17. Judge Criminal Appeals 3         18301      4604    416  23321
18. Board of Education Dist. 5       17089      5683    513  23285
19. State Senator District 25        18181      4988    723  23892
20. State Rep. District 73           18181      5303      *  23484
21. Chief Justice 3rd District       19261         *      *  19261
22. Judge 207th District             19342         *      *  19342
23. Judge 274th District                 *         *  19348  19348
24. Criminal District Attorney           *         *  19315  19315
25. County Judge                     18181      5547      *  23728
26. Judge County Court at Law        19345         *      *  19345
27. District Clerk                   19311         *      *  19311
28. County Clerk                     19554         *      *  19554
29. County Treasurer                 19306         *      *  19306
30. County Surveyor                  19229         *      *  19229




Following my friend's line of reasoning, one might well argue that the odds against this particular outcome are even more extreme than he suggested. In principle, the Republican candidates in races 19, 20 and 25 could each have received any number of votes between 0 and 24362. Thus the relevant probability is not (1/18181)^3 but (1/24363)^3, which works out to 6.9 x 10^-14. This is the probability of seeing any specific triplet of vote totals in those three races, on the assumption that the totals are independent random variables distributed uniformly over the entire interval of possible outcomes. That last assumption is rather dubious, and I'll return to it below.
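
As a quick check on the arithmetic in the paragraph above, here is a minimal Python sketch. The function name is made up for illustration; the only inputs are the two figures already in the text (18181 and the 24363 possible totals from 0 to 24362), and the model is the same uniform, independent one the friend assumed.

```python
# Probability that three named races each hit one specific total,
# assuming independent, uniformly distributed totals (the friend's model).
def specific_triplet_probability(n_values):
    """Chance that three independent uniform draws all equal one fixed value."""
    return (1 / n_values) ** 3

friend_estimate = specific_triplet_probability(18181)  # the friend's (1/18181)^3
full_range = specific_triplet_probability(24363)       # totals 0..24362

print(f"friend's estimate:   {friend_estimate:.3g}")
print(f"full-range estimate: {full_range:.3g}")
```

Both numbers land well beyond one in a trillion, which is why the choice of question (a specific triplet versus any triplet) matters far more than the choice between 18181 and 24363.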

More important, however, the probability that three specific candidates receive a specific number of votes is not what we really want to calculate. Would it be any less remarkable if three different winners had all scored 18181? Or, if three candidates all received the same number of votes, but the number was something other than 18181? What we have here is a "birthday problem," analogous to the classic exercise of calculating the probability that some pair of people in a group share the same birthday. The textbook approach to the birthday problem is to work backwards: First compute the probability that all the birthdays are different, then subtract this result from 1 to get the probability of at least one match. This method is easy and lucid.

Unfortunately, it's not immediately clear how to extend it to the case of three birthdays in common. For the Comal County election coincidence, I reluctantly resorted to frontward reasoning. For the moment, let's go along with the fiction that each Republican candidate had an equal chance of receiving any number of votes between 0 and 24362. Then the number of possible election outcomes (considering the Republican votes only) is 24363^30. This is the denominator of the probability. For the numerator, we need to count how many of those cases include at least one trio of identical tallies. We've already seen one way this could happen, namely with the candidates in races 19, 20 and 25 having 18181 votes each. Holding these results fixed, there are no constraints at all on the other 27 races, and so there are 24363^27 ways of achieving this outcome. But in fact we don't insist that the vote in the three coincidence races be 18181; it could be any number in the allowed range, so that we need to multiply the numerator by another factor of 24363. Thus the probability that races 19, 20 and 25 will all have the same total is 24363^28 / 24363^30.

Finally, we note that there's nothing special about the specific races 19, 20 and 25; we want the probability that any three totals are equal. How many ways can we choose three races from among the 30? In 30-choose-3 ways, of course. This number is 4060, and so the probability that at least three Republicans in Comal County would have the same vote totals on election night is:

4060 x 24363^28 / 24363^30 = 6.84 x 10^-6

We are down below the one-in-a-million level. At this point the most doubtful part of the analysis is the assumption that the votes are uniformly distributed across the range from 0 to 24362. The true distribution is unknowable, but surely we can make a better estimate than that. All the actual votes for Republican candidates lie in the interval from 16504 to 19601, a range that encompasses 3098 possibilities. Suppose we round this up to 3100 and assume -- or pretend -- that each of the 3100 totals is equally likely. Then the revised probability estimate becomes:

4060 x 3100^28 / 3100^30 = 4.22 x 10^-4

In other words, the odds against such a three-way coincidence are somewhere near 2500 to 1. Note that included within this estimate are cases with more than just a trio of identical votes, such as four totals that are all the same, or a "full house" result of three-of-a-kind plus a pair. But those further coincidences are unlikely enough that they don't make much difference. The probability of exactly one triplet (and all other vote totals distinct) is 3.67 x 10^-4.
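
The two estimates above share one formula: the number of ways to pick three races, divided by the square of the number of possible totals (the powers of 28 and 30 cancel down to that). A small Python sketch of it follows; the function name is hypothetical, and the uniform-distribution assumption is the same "pretend" one used in the text.

```python
from math import comb

def first_order_triple_estimate(n_races, n_values):
    """C(n_races, 3) / n_values**2: the chance that one chosen trio of races
    matches, multiplied by the number of trios."""
    return comb(n_races, 3) / n_values**2

for n_values in (24363, 3100):
    p = first_order_triple_estimate(30, n_values)
    print(f"{n_values} possible totals: p ~ {p:.3g}, about 1 in {1 / p:,.0f}")
```

Narrowing the range from 24363 to 3100 possible totals is what moves the estimate from the 10^-6 scale to the 10^-4 scale; the count of 4060 trios stays the same in both cases.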

As for my friend who worries about election tampering -- did this line of argument put his mind at ease? He replied by asking if I had properly accounted for the ingenuity of those who fix elections. If they are able to determine the outcome, could they not also arrange to make it look statistically acceptable? The question deserves to be taken seriously. According to my analysis, the narrower the range of vote totals, the less suspicious is the appearance of an identical triplet. So should we rest easy about such coincidences if all the winning totals lie within a range of, say, 100 votes? Suppose you have been appointed Rigger of Elections in Comal County. Because of technical limitations, you cannot specify the exact number of votes that each candidate will receive, but you can set the mean and the variance of the normal distribution from which the vote totals will be selected at random. Your job is to ensure that all of your party's candidates win, without arousing the suspicions of the public. What is the optimal strategy?

I need to end this note with a confession. The analysis given above was not my first attempt to calculate the probability of a three-way coincidence. I had tried several other approaches, and each time got a different answer. So what makes me think the answer given here is the right one? Simple. I kept trying until I got a result in agreement with a computer simulation. I'm not proud of this method of doing mathematics. I would much prefer to be one of those people with an unerring Gaussian instinct for the right way to solve a problem. But it won't do to pretend. So what excuse can I make for myself? Do I have more faith in the computer and its pseudorandom number generator than I have in mathematics? I would prefer to put it this way: I have more faith in the laws of probability than in my own ability to reason accurately with them.
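
The article does not show the simulation itself, so what follows is only a guess at its general shape: a minimal Monte Carlo sketch in Python, with hypothetical function names, that draws uniform vote totals for each race and counts how often some total turns up three or more times.

```python
import random
from collections import Counter

def has_triple(totals):
    """True if any single value appears three or more times in totals."""
    return max(Counter(totals).values()) >= 3

def estimate_triple_probability(n_races, n_values, trials, seed=0):
    """Fraction of simulated elections in which at least three races tie exactly.

    Each race's total is drawn uniformly from n_values possibilities, which is
    the same simplifying assumption used in the article.
    """
    rng = random.Random(seed)  # seeded so a run can be repeated
    hits = 0
    for _ in range(trials):
        totals = [rng.randrange(n_values) for _ in range(n_races)]
        if has_triple(totals):
            hits += 1
    return hits / trials
```

Calling estimate_triple_probability(30, 3100, trials) with a few hundred thousand trials or more would be expected to land in the neighborhood of the 4.2 x 10^-4 figure, give or take sampling noise. Because the event is rare, a small number of trials will often report zero hits.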


Editor's note: The astute reader will notice that Brian added the probabilities for each triplet to get the probability that at least one triple occurs. This is not quite correct since these events are not mutually exclusive. So what he is actually computing is the expected number of triples. In the classical birthday problem, if you ask for the number that will make the expected number of birthday-coincidences greater than 1/2 the answer is 20 which is less than the number 23 required to make the probability greater than 1/2 for 2 people to have the same birthday. See Chance News 6.13 for an occasion when this difference was important. When the number of birthdays is large, these two numbers become very close so we can expect Brian's calculation not to be affected very much by this.
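
The 20-versus-23 remark can be checked directly. This Python sketch (function names are illustrative only) finds the smallest group size at which the expected number of matching pairs passes 1/2, and the smallest size at which the probability of at least one match passes 1/2, for 365 equally likely birthdays.

```python
from math import comb

def prob_shared(n_people, days=365):
    """Probability that at least two of n_people share a birthday."""
    p_all_distinct = 1.0
    for already_taken in range(n_people):
        p_all_distinct *= (days - already_taken) / days
    return 1 - p_all_distinct

def smallest_group_by_expectation(days=365, target=0.5):
    """Smallest n whose expected number of matching pairs, C(n, 2)/days, exceeds target."""
    n = 2
    while comb(n, 2) / days <= target:
        n += 1
    return n

def smallest_group_by_probability(days=365, target=0.5):
    """Smallest n whose probability of at least one match exceeds target."""
    n = 2
    while prob_shared(n, days) <= target:
        n += 1
    return n

print(smallest_group_by_expectation(), smallest_group_by_probability())
```

The gap between the two answers comes from overlapping events being counted more than once in the expectation, which is the same effect the note describes for Brian's sum over triplets.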

The correct calculation for a match of three or more is not, in principle, difficult. As Brian said, the probability of at least one pair with the same birthday is computed by counting the number with no pair and subtracting this from 1. For a match of 3 we have to subtract also the probability that at least one pair have the same birthday but no three people do. This is not so easy and that is probably why Brian had a problem with what he calls the backward method. You can find a nice discussion of the result of this computation at the Mathcad library. If we use the formula given there to compute the probability of a match of 3 or more with 24363 possible birthdays, we get 6.83 x 10^-6 as compared to Brian's 6.84 x 10^-6. So much for our quibble!
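
The formula the note refers to is not reproduced in the thread, so the sketch below is a stand-in rather than that formula: a direct count, in Python with exact integer arithmetic, of the outcomes in which no total appears more than twice. Races are split into some number of matched pairs plus singletons, and every pair or singleton gets its own distinct value. The function name is hypothetical.

```python
from fractions import Fraction
from math import factorial, perm

def prob_triple_or_more(n_races, n_values):
    """Exact P(some total occurs at least 3 times), uniform independent totals."""
    no_triple = 0
    for pairs in range(n_races // 2 + 1):
        singles = n_races - 2 * pairs
        # Ways to split the races into `pairs` unordered pairs plus singletons.
        groupings = factorial(n_races) // (2**pairs * factorial(pairs) * factorial(singles))
        # Ways to give every pair and every singleton a different value
        # (math.perm returns 0 when there are not enough values).
        assignments = perm(n_values, pairs + singles)
        no_triple += groupings * assignments
    return 1 - Fraction(no_triple, n_values**n_races)

for n_values in (24363, 3100):
    print(n_values, float(prob_triple_or_more(30, n_values)))
```

With 24363 possible totals (the range behind Brian's 6.84 x 10^-6) the exact value should come out near 6.83 x 10^-6, and with 3100 possible totals it should come out a little under his 4.22 x 10^-4. Either way the difference from the simple sum is small.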



--------------------------------------------------------------------------------

Bill Montante asked for comments on his definition of the word "chance." You can send them directly to him at william.m.montante@marsh.com but please also send them to us at jlsnell@dartmouth.edu since I guess we should try to decide what it means also.

Defining Chance
Bill Montante
Jolene Donating Member (322 posts) Send PM | Profile | Ignore Mon Sep-01-03 08:30 PM
Response to Original message
1. Good, now what are the chances
of it happening only in TX, and only that time?
 
TruthIsAll Donating Member (1000+ posts) Send PM | Profile | Ignore Mon Sep-01-03 08:41 PM
Response to Reply #1
3. Good point. Texas had just installed touch screen machines, I believe..
Edited on Mon Sep-01-03 08:45 PM by TruthIsAll
But what would cause this result? Is this just a clue? Remember 18181 cropped up twice in other states.

Now, if FIVE elections out of 30 turned up 18181, we might have something.
 
unblock Donating Member (1000+ posts) Send PM | Profile | Ignore Mon Sep-01-03 08:50 PM
Response to Reply #3
5. it was FIVE?!?
wow, i didn't know this.

anyway, one would have to redo the study to include the elections in all 50 states to see the odds of FIVE elections matching exactly.
 
Cheswick2.0 Donating Member (1000+ posts) Send PM | Profile | Ignore Mon Sep-01-03 09:09 PM
Response to Reply #3
8. and did those other 2 states have the same electronic voting?
?
 
unblock Donating Member (1000+ posts) Send PM | Profile | Ignore Mon Sep-01-03 08:49 PM
Response to Reply #1
4. from a statistical perspective,
these facts were already taken into account, he looked at the texas counties only, and looked only at one particular election year.

now, what is significant is this: what are the odds that a 2500 to 1 shot would just happen to come in at a time when election rigging and ballot tampering concerns are the highest in years.

this wasn't just another randomly chosen election cycle.
 
punpirate Donating Member (1000+ posts) Send PM | Profile | Ignore Mon Sep-01-03 08:30 PM
Response to Original message
2. I still think odds of 2500 to 1 are significant....
Edited on Mon Sep-01-03 08:35 PM by punpirate
If it had been one in eighty, one in a hundred-fifty, that wouldn't be pushing beyond reasonable possibility. Three orders of magnitude and change still makes me wonder.

On edit, I should add that I think this way because of electoral uniqueness of place and candidates--after all, when one is thinking about odds, it's about similar instances (after all, using the birthday sample analogy, one doesn't apply those same odds to a group when considering day of the week of birth).

I think of this as a set behaving according to demographics and party distribution for this place. The odds, in that case, are that this would happen in one out of 2500 Comal County elections.... It becomes more than just an oddity, in that case.

Cheers.
 
unblock Donating Member (1000+ posts) Send PM | Profile | Ignore Mon Sep-01-03 08:54 PM
Response to Original message
6. also, was '18181' really random?
one other point to consider is the randomness of the 18181 number.

you tested for 3 elections coming up with the same number, which could just as well have been 18182, or 9405 for that matter.

but what are the odds that these 3 elections would not only match each other, but also match a number that is a very plausible number for a programmer to use in either testing or in rigging.

if the number had been something like 9405, i would be content to look at statistical analysis only. the number being 18181 changes things.
 
gristy Donating Member (1000+ posts) Send PM | Profile | Ignore Mon Sep-01-03 08:56 PM
Response to Original message
7. I think this analysis is much better, too, TIA
but maybe the calculated odds are still a bit too long. I think that you have to consider the possibility of it happening in any state with electronic voting. The more states, the greater the chance. The only reason the odds of it happening in Texas are being studied is that it was observed there. But it might just as well have happened in some other state, and then you'd be off studying the odds of it happening in that state.
 
ferg Donating Member (873 posts) Send PM | Profile | Ignore Mon Sep-01-03 09:32 PM
Response to Reply #7
9. exactly
This is only one county. If there were 100 counties country-wide which people were looking at, the probability of one of them having a triple coincidence would be something like

1 - (2499/2500)^100 = 4% or 1 in 25.

Now, there might well have been fraud, but this coincidence doesn't show it.
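
The multi-county estimate above is a one-line calculation. A minimal Python sketch (hypothetical function name; the 100 counties and the 1-in-2500 figure are simply the numbers used in this post, and the counties are assumed independent and alike):

```python
def prob_at_least_one(p_single, n_counties):
    """Chance that at least one of n_counties independent counties shows
    an event that has probability p_single in any one county."""
    return 1 - (1 - p_single) ** n_counties

p = prob_at_least_one(1 / 2500, 100)
print(f"p = {p:.4f}, about 1 in {1 / p:.0f}")
```

Changing n_counties shows how quickly a long shot in one county turns into something unremarkable once many counties are being watched.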


 
Ivory_Tower Donating Member (1000+ posts) Send PM | Profile | Ignore Mon Sep-01-03 10:39 PM
Response to Reply #7
10. I agree (thanks, TIA)
Thanks, TIA, for the follow-up on this.

I thought the same thing, gristy, but I would think that you'd have to consider the possibility across all elections using electronic voting, not just all states. But either way, the odds aren't astronomical, which goes back to my original question from the initial thread, which was: Were there any similar cases from the last election where multiple districts had identical vote totals? I mean, were there any cases in, say, California where the winners in 3 different districts all got, say, 22,239 votes? It sounds like given enough elections there should be other instances of this.

Of course, I don't exactly have the time to analyze election results from around the country, so I'll have to settle for speculation for the time being. :)
 
punpirate Donating Member (1000+ posts) Send PM | Profile | Ignore Mon Sep-01-03 11:07 PM
Response to Reply #10
11. Yes, with enough counties...
... it becomes more likely. However, consider that every one of those counties is going to have its own distinct voting patterns and demographics.

Additionally, virtually every one is going to have different numbers of voters and races. To compare all counties in the country for one winning number of votes might be interesting, but statistically impossible to correlate because of those factors.

Cheers.
 
TruthIsAll Donating Member (1000+ posts) Send PM | Profile | Ignore Mon Sep-01-03 11:26 PM
Response to Original message
12. WTF!! Check this one out! 2 out of 4 reps, senators got 18181!!!
Edited on Mon Sep-01-03 11:52 PM by TruthIsAll
Eliminate all except state and U.S. reps, senators. These are the only polls in the group!

So what are the chances of 2 out of 4? Compared to 3 out of 30!
1.2487E-06 or 1 out of 800,833.

What are the chances of 3 out of 4 getting 181xx?



Race                            Republican  Democrat  Other  Total
1. U.S. Senator                      18156      5696    350  24202
2. U.S. Rep. District 21             19066      4627    371  24064
19. State Senator District 25        18181      4988    723  23892
20. State Rep. District 73           18181      5303      *  23484

The other one was the county judge
25. County Judge                     18181      5547      *  23728


 
TruthIsAll Donating Member (1000+ posts) Send PM | Profile | Ignore Tue Sep-02-03 08:44 PM
Response to Reply #12
14. Change of assumptions: 2 of 4 legislative races: 1 out of 1,601,666!!
Edited on Tue Sep-02-03 09:05 PM by TruthIsAll
Interested DUers: please comment. Does it make sense to revise the original problem: 3 out of 30 elections have the same result (18181) to the following:

Determine the probability that 2 out of 4 races for Senator or Representative will have the same number of Republican winning votes?

I have calculated this to be: 1 out of 1,601,666!

Step 1: get the data for 4 Texas legislative elections in Comal county.

Step 2: determine the range of Republican votes - 3100.

Step 3: Calculate the probability that at least TWO of the 4 county-wide elections will have the IDENTICAL NUMBER OF REPUBLICAN VOTES WITHIN THE 3100 RANGE, ASSUMING ALL POSSIBLE VOTE TOTALS IN THAT RANGE ARE EQUALLY LIKELY (A UNIFORM DISTRIBUTION). In this case the number was 18,181, but it could just as well be 18,182, etc. Of course, 18,181 reversed is still 18,181 (a palindrome), but that is NOT going to be an issue here.

Eliminate all the races except the legislative ones for U.S. and Texas Reps (2) and Senators (2).

Race                            Republican  Democrat  Other  Total
1. U.S. Senator                      18156      5696    350  24202
2. U.S. Rep. District 21             19066      4627    371  24064
19. State Senator District 25        18181      4988    723  23892
20. State Rep. District 73           18181      5303      *  23484

The other race having 18181 votes was for a non-legislative office
25. County Judge                     18181      5547      *  23728
Disregard this one.

Step 1: Note that there are 6 combination pairs in which 2 out of 4 elections can be identical, as follows:
1,2
1,3
1,4
2,3
2,4
3,4

Step 2: Calculate the probability as before, but with the number of elections now equal to 4. Note that the probability of a single combination of two races having the same vote count over a possible range of 3100 outcomes is
1/(3100*3100) = 1/9,610,000

Step 3: This result can occur in 6 different ways, for each combination, so the probability = 6/(9,610,000)= 6.2435E-07

or 1 out of 1,601,666 !.

Compare this result to the original, where 3 out of 30 elections were the focus. The chances of that occurrence are just 1 out of 2,500. Quite a difference.

Maybe our suspicions were legitimate, after all.


 
TruthIsAll Donating Member (1000+ posts) Send PM | Profile | Ignore Tue Sep-02-03 09:53 PM
Response to Reply #14
17. My bad. Disregard the previous post. My thanks to ferg.
Edited on Tue Sep-02-03 09:56 PM by TruthIsAll
DUer ferg has replied to the other thread I just started. He is 100% right. I was too hasty and made the mistake of not considering that the first result is arbitrary, so we need only ONE match, not TWO:

Here is what ferg wrote, correcting my error:
The probability of a single combination of two races with the same vote count is 1/3100, because there are 3100 possible results for the first race.

Remember, there are 3100 possible values for the first election. So the odds of the second matching the first is 1/3100.

The odds of any two (or more) being the same is:

1 - (3099/3100)*(3098/3100)*(3097/3100)

Which is 1 in 517.
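
ferg's correction is the ordinary birthday calculation with 4 "people" and 3100 "days". A short Python sketch (the function name is made up; the uniform range of 3100 totals is the assumption carried over from the earlier posts):

```python
def prob_any_match(n_races, n_values):
    """Birthday-style chance that at least two of n_races uniform totals coincide."""
    p_all_distinct = 1.0
    for already_taken in range(n_races):
        # Each new race must avoid every total already used.
        p_all_distinct *= (n_values - already_taken) / n_values
    return 1 - p_all_distinct

p = prob_any_match(4, 3100)
print(f"p = {p:.6f}, about 1 in {1 / p:.0f}")
```

The result is close to 6/3100, that is, six possible pairs each matching with probability 1/3100. The 1 in 1,601,666 figure came from dividing by 3100 twice.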


 
ConsAreLiars Donating Member (1000+ posts) Send PM | Profile | Ignore Mon Sep-01-03 11:39 PM
Response to Original message
13. Good work TIA
You have done extraordinary work for DU readers on a number of subjects. I'm always impressed by your talent, and I've challenged you on some of your probability calculations only because it is the one area where your conclusions were at odds with your nick and violated your usual standards.

You may not be aware that Pobeka used simulation to reach the following very similar conclusion:

"That means the odds of getting 3 identical vote totals is about 1430 to 1.

"When you take into account all of the counties across the nation with elections, over all the years elections have been held, this doesn't seem like an anomoly. It just stands out because you have hindsight."
http://www.blackboxvoting.org/htdocs/dcforum/DCForumID11/7.html

There is plenty of evidence to support the view that computerized systems could be, would be, and have been used to give phony totals. If we had access to all the data from all the elections (it's rather disturbing that we do not!) we might discover anomalies, but the only assurance that the machine counts reflect the voters' choices is with a voter verified paper ballot and random auditing of the machine numbers to assure their validity.
 
Tansy_Gold Donating Member (1000+ posts) Send PM | Profile | Ignore Tue Sep-02-03 09:03 PM
Response to Original message
15. I don't remember where I saw this. . . . .
Cannot claim to have come up with this myself.

A simple child's number=letter code makes 18181 read as
ahaha.
 
TruthIsAll Donating Member (1000+ posts) Send PM | Profile | Ignore Tue Sep-02-03 09:07 PM
Response to Reply #15
16. AHAHA = 18181..UNBELIEVABLE!
IS SOMEONE TRYING TO TELL US SOMETHING?
 
Tansy_Gold Donating Member (1000+ posts) Send PM | Profile | Ignore Tue Sep-02-03 10:07 PM
Response to Reply #16
18. Scoop/Sludge Report #154