
Godhumor

(6,437 posts)
Thu Sep 3, 2015, 12:38 AM Sep 2015

Let's Talk Polling: A primer on terminology, analysis and validity of polls

I was inspired to write this based on a reply I made in another thread.

So, quickly, a few words about me. I work in statistical analysis, so, while I am not a pollster myself, the concepts they work with are not only familiar to me but ones that I use daily. I am also a Clinton supporter, which, considering my post history, is probably not a surprise to many here. However, this is a post that is intended to be candidate-neutral and more of an educational guide. I may use the candidates in some fictitious examples, but I am not attempting to build one up in this post or tear others down. This is simply me wanting to share something I happen to know a great deal about.

That said, I am not going to go into super-specifics on any of the following concepts. In fact, I may even tell a few white lies to make the concept more understandable to those who are not in the daily grind of working with samples and populations.

All right, the disclaimers are out of the way. Let's get started.

Population Parameter(s)
All polling starts with establishing what the whole population looks like. A population parameter is an assumption the pollster must make in order to reach conclusions about a population from a sample. For example, let's pretend Iowa is 100% Democratic. A pollster may assume that the population will be 50-70% Moderate, 20-30% Liberal and 10-20% Very Liberal. How do they come to these assumptions? Generally, they use historical data or historical polls to establish what the population probably looks like. This can be an issue right from the start, for obvious reasons. One of the most criticized pollsters so far in the 2016 cycle is Quinnipiac, because it appears their population parameter is based on turnout from the 2014 midterm elections (the Republican tsunami). This is an issue, as it ignores the difference in voter behavior in midterm years versus general election years.
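To make the idea concrete, here is a minimal sketch in Python of what "checking a sample against the assumed parameters" might look like. The parameter ranges mirror the fictitious all-Democratic Iowa example above; the numbers and the `check_sample` helper are purely illustrative, not anyone's real methodology.

```python
# Hypothetical parameter ranges for a fictitious 100%-Democratic Iowa,
# taken from the example above. These are assumptions, not real data.
PARAMETERS = {
    "Moderate":     (0.50, 0.70),
    "Liberal":      (0.20, 0.30),
    "Very Liberal": (0.10, 0.20),
}

def check_sample(shares):
    """Return the subgroups whose sample share falls outside the assumed range."""
    out_of_range = {}
    for group, (lo, hi) in PARAMETERS.items():
        share = shares.get(group, 0.0)
        if not (lo <= share <= hi):
            out_of_range[group] = share
    return out_of_range

# A sample that came back with too many Liberals and too few Very Liberals:
sample = {"Moderate": 0.62, "Liberal": 0.33, "Very Liberal": 0.05}
print(check_sample(sample))  # {'Liberal': 0.33, 'Very Liberal': 0.05}
```

A sample that lands inside every range returns an empty dict; anything flagged is where the adjustments discussed later come into play.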

Samples and Margin of Error (MOE)
So why do pollsters establish population parameters, anyway? Because a randomly selected sample of participants should mirror the parameters of the population as a whole. The MOE establishes how much a sample may move within the population parameters (i.e. if the parameter says Very Liberal should be 10-20% of the population, and the sample shows 15%, then the very liberal portion of the sample could be off by as much as +/- 5%. By the way, this is a little white lie, but it is the easiest way to illustrate the concept.). A pollster will take into account all the subpopulation MOEs and establish an overall poll MOE that is generally reported when the poll is released.

As I am sure everyone knows, sample size is one of the driving forces behind MOE, and it makes sense. The more people you talk to, the more likely you are to get a closer cross section of the population. However, sampling MOE is not linear (i.e. your MOE doesn't reduce by the same amount at 1000 people versus 2000 and 2000 versus 3000), and the law of diminishing returns quickly applies. Generally speaking, statisticians who work with samples (well, this statistician, at least) are happy with an absolute MOE of 4% or less.
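The diminishing returns are easy to see with the textbook MOE formula for a proportion. This sketch uses the standard worst-case calculation (p = 0.5 at 95% confidence); the sample sizes are just examples.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Textbook 95% margin of error for a proportion: z * sqrt(p*(1-p)/n).
    p=0.5 gives the worst case (the largest possible margin)."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (500, 1000, 2000, 3000):
    print(f"n={n}: MOE = {100 * margin_of_error(n):.2f}%")
```

Going from 1,000 to 2,000 respondents shaves off close to a full point, but going from 2,000 to 3,000 gains less than half a point. Doubling your costs again buys very little precision, which is why most published polls stop in the low thousands.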

Sampling done right makes it much cheaper to poll a population than to actually poll the entire population. And with a random sample, a well designed neutral poll and a sample size that leads to a smaller MOE, a pollster should be able to draw some pretty solid conclusions. However, there is one unfortunate part of polling that still needs to be discussed.

Confidence level or Intervals
A confidence level or interval (related terms I'll treat as interchangeable here) tells the public the chance that, through no fault of the pollster, the poll results are invalid. More specifically, it establishes the confidence of the pollster that the random sample used for a poll is actually a valid cross-section of the population as a whole. Most pollsters work at the 95% confidence level, which means there is a 1-in-20 chance that they drew a skunk of a sample. So why don't pollsters throw out polls that look fishy to them and draw a new sample? Because they are following a set methodology. If one were to throw back a sample suspected of being bad, that person has just introduced bias into the equation, and the next sample will not be random. If pollsters are confident in their methods, then, even if the poll looks funny to their eyes, they still release it in order to not indirectly influence poll results. Pollsters have accepted that they are just going to be flat-out wrong 5% of the time, and that is a necessary cost of doing business to keep the integrity of results strong. (Another by the way: this section was a gross oversimplification of confidence levels, but I think it works as an introduction.)
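You can watch the 1-in-20 skunk rate happen in a quick simulation. This sketch assumes a hypothetical "true" support level of 40%, runs many perfectly honest random polls, and counts how often a poll's 95% interval misses the truth anyway. All numbers are made up for illustration.

```python
import math
import random

random.seed(42)           # fixed seed so the sketch is reproducible

TRUE_P = 0.40             # hypothetical true support for a candidate
N = 1000                  # respondents per simulated poll
Z = 1.96                  # 95% confidence multiplier

def poll_misses_truth():
    """Run one honest random poll; return True if its 95% CI misses TRUE_P."""
    hits = sum(random.random() < TRUE_P for _ in range(N))
    p_hat = hits / N
    moe = Z * math.sqrt(p_hat * (1 - p_hat) / N)
    return not (p_hat - moe <= TRUE_P <= p_hat + moe)

TRIALS = 2000
misses = sum(poll_misses_truth() for _ in range(TRIALS))
print(f"{misses / TRIALS:.1%} of honest polls drew a 'skunk' sample")
```

Nothing in the simulation cheats; every poll is done "correctly," and roughly one in twenty still lands outside its own margin of error. That is the 5% the pollster has accepted up front.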

So why can't pollsters just simply notify in the press release that some subpopulations are out of population alignment? A few reasons. First, it opens the entire poll up to questions on its validity. Second, and maybe a little off-putting, the pollster doesn't know they're out of alignment. This can happen either due to incorrect parameters to begin with or, more likely, the parameter itself is unknown. I won't get into it here, but, yes, it is possible to poll without knowing what the population looks like. Third, the pollster has adjusted results to fit into the population parameters.

Poll Adjustments
Pollsters have a number of tools to ensure reported results are aligned with established parameters. The first is mathematical extrapolation of results. For example, let's say only 5% of respondents identify as very liberal. A pollster can assume that the ratio of support at 5% will mimic support at somewhere between 10% and 20%. Hopefully, you can see the issue with this method right up front: with a small subsample, each respondent carries far more weight than they probably should once the results are adjusted to fit the parameters.

A second method to control the sample ratio is to simply only poll a certain number of people from each subpopulation and then no longer accept responses from that group. Again, it should be readily apparent why this method is problematic.
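As a sketch, quota sampling looks something like this: respondents arrive in whatever order they happen to answer the phone, and once a group's quota is filled, later members of that group are simply turned away. The stream and quotas here are invented for illustration.

```python
def quota_sample(stream, quotas):
    """Accept respondents from an incoming stream only until each group's
    quota is filled; everyone from an already-full group is turned away."""
    accepted = []
    counts = {group: 0 for group in quotas}
    for group, answer in stream:
        if counts[group] < quotas[group]:
            counts[group] += 1
            accepted.append((group, answer))
    return accepted

stream = [("Moderate", "A"), ("Moderate", "B"), ("Liberal", "A"),
          ("Moderate", "A"), ("Liberal", "B"), ("Liberal", "A")]
print(quota_sample(stream, {"Moderate": 2, "Liberal": 2}))
```

The problem is visible in the code: whoever happens to call back first fills the quota, so the accepted respondents are no longer a random draw from their group.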

There are many other ways to adjust polls, but these are probably the two most well-known.

Bad Polls
So why do we have so much discussion on which polls are closest to reality? Shouldn't they all have similar results to each other while working within the same population? Well, no. See, the MOE and confidence level are not for the poll's validity versus real life; they are measuring the validity versus the polling parameters set by the pollster. In other words, the poll is being compared to the assumptions the pollster made about, say, Iowa and not the actual current population of Iowa. So bad polls start with bad assumptions. There should be no surprise in that.

However, a pollster can indirectly influence participant answers in many different ways. Here are just a couple:

- The pollster can ask loaded questions instead of impartial ones to generate a response. For example, "Would you have a favorable or unfavorable opinion of Hillary Clinton if she passed confidential governmental secrets through an insecure server?" I would not trust the general favorable/unfavorable percentages coming out of that poll.

- Question order can also influence answers. Seriously. Again, let's pretend this is how a poll starts off: "Before discussing candidate preferences, we wanted to ask your opinion on current events. 1) Do you feel Clinton has handled herself well in the Benghazi case? 2) Does it bother you that a presidential candidate has to answer questions on her email habits before Congress? 3) Which of the following candidates would you support..." In reality, this is done much more subtly by pollsters and can easily sway a certain percentage of respondents.

-------------------
So remember: all polls generally have a 5% chance of being junk, pollsters can influence respondents, assumptions are made before polling even begins, and sample sizes have an upper limit on effectiveness. If you have any questions, feel free to post to the thread or PM me. Hope this helps. Oh yeah, also, if there is some weird grammar, misspellings or random gibberish, then I apologize. I am writing this well after midnight and am approaching an exhausted state.

Let's Talk Polling: A primer on terminology, analysis and validity of polls (Original Post) Godhumor Sep 2015 OP
You know, I am going to bump this once or twice Godhumor Sep 2015 #1
Thank you for all your work... N_E_1 for Tennis Sep 2015 #2
This message was self-deleted by its author DemocratSinceBirth Sep 2015 #3
Last bump before letting this sink n/t Godhumor Sep 2015 #4
Excellent post. This helps a lot in how we think about polls. K&R leftofcool Sep 2015 #5
great post dsc Sep 2015 #6
Mathematically, no there isn't. HerbChestnut Sep 2015 #7
yea there is dsc Sep 2015 #9
My point... HerbChestnut Sep 2015 #10
that is a valid point dsc Sep 2015 #11
Great post. HerbChestnut Sep 2015 #8
Ok, I'll kick this back up once more. kenn3d Sep 2015 #12
Seems like a good time to bump this n/t Godhumor Oct 2015 #13
Good Post Godhumor K&R nt fleabiscuit Dec 2015 #14

Godhumor

(6,437 posts)
1. You know, I am going to bump this once or twice
Thu Sep 3, 2015, 09:22 AM
Sep 2015

Last edited Thu Sep 3, 2015, 10:16 AM - Edit history (1)

Spent a bit of time working on it, so I am not letting it sink until I am sure the interest isn't there.

N_E_1 for Tennis

(9,721 posts)
2. Thank you for all your work...
Thu Sep 3, 2015, 09:34 AM
Sep 2015

Very interesting, especially the order of questions. Guess it could really set the tone

Thanks again.


dsc

(52,160 posts)
6. great post
Fri Sep 4, 2015, 08:58 AM
Sep 2015

I just want to clarify your last paragraph a bit. The 5% chance of being junk is even if the pollster does everything correctly. If they engage in some of the issues you list after that then the chance of being junk goes up from the 5%.

dsc

(52,160 posts)
9. yea there is
Fri Sep 4, 2015, 09:16 AM
Sep 2015

The 5% refers to the error rate of polls when the sample is random, etc. Some random samples end up being just plain bad. The other stuff he listed covers errors not accounted for by that 5%, such as question order.

 

HerbChestnut

(3,649 posts)
10. My point...
Fri Sep 4, 2015, 01:05 PM
Sep 2015

was that you can't account for that. Sure, in reality you're probably right, but there's no way to calculate it.

kenn3d

(486 posts)
12. Ok, I'll kick this back up once more.
Tue Sep 22, 2015, 02:34 PM
Sep 2015

Your efforts are very much appreciated Godhumor.

There's a ton of total foolishness in the poll-related threads here on DU GDP, and many posts repeatedly express a near-complete misunderstanding of what polls can tell us. Many members are so invested in their candidate that the only poll they'll accept is one that appears to say they're winning (even when it doesn't say that at all). The pollsters and their motives are maligned, the results are misinterpreted and twisted (often beyond recognition), and the partisans revel in taunts and told-you-sos ad nauseam.

Actually learning how polls are conducted, the strengths and weaknesses of various polling methods, and the confidence and fallibility they necessarily bring with their reported results, is good for us. It fosters better understanding, and reduces unnecessary frustration.


Thanks for adding a bit of sanity to the topic. I hope a few more folks will read your OP this time through.
