Welcome to DU! The truly grassroots left-of-center political community where regular people, not algorithms, drive the discussions and set the standards.

Godhumor

(6,437 posts)
Thu Sep 3, 2015, 11:24 PM

Today in polling, the RCP mega-sample

So, as most people know, there are two primary poll aggregation websites out there: Pollster and Real Clear Politics. Each approaches aggregating polls in a different way. Today I will talk about RCP; if the interest is there, I will address Pollster at another time.

So first, the amazing thing about aggregation services like RCP is that their results have no margin of error at all. There are many reasons for this, but my favorite is that an aggregate combines information collected by different pollsters from different sources, so there is no single, unified population that the combined sample can be said to represent, and a margin of error is only meaningful relative to such a population. Keep that in mind when looking at aggregation sites.

Anyway, RCP creates its current polling results through a technique that goes by many names, but that I like to call mega-sampling. Mega-sampling means taking the samples and results of individual polls and combining them into one new, much larger sample. This gives more weight to polls with larger samples without discarding the results of smaller polls. A mega-sample is usually built over a rolling time period (all polls in the last month) or a rolling number of polls (the last five polls, regardless of when they were conducted). As new polls are released, they are added to the mega-sample and, depending on the type of mega-sample, the oldest poll is removed.
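The rolling-window bookkeeping can be sketched in Python. This is a minimal illustration, not RCP's actual code: the `Poll` and `RollingMegaSample` names are hypothetical, the sketch implements only the "rolling number of polls" variant, and it reuses the three hypothetical polls (A, B and C) from the example that follows.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Poll:
    """Hypothetical record for one poll: raw respondent counts per candidate."""
    name: str
    clinton: int
    sanders: int

    @property
    def size(self) -> int:
        return self.clinton + self.sanders


class RollingMegaSample:
    """Pools the most recent `max_polls` polls into one big sample.

    Hypothetical helper for illustration; a time-window variant would
    instead drop polls older than a cutoff date.
    """

    def __init__(self, max_polls: int) -> None:
        # A deque with maxlen discards its oldest entry once it is full.
        self.polls = deque(maxlen=max_polls)

    def add(self, poll: Poll) -> None:
        self.polls.append(poll)

    def shares(self) -> dict:
        # Pool raw counts across every poll still in the window.
        total = sum(p.size for p in self.polls)
        return {
            "Clinton": sum(p.clinton for p in self.polls) / total,
            "Sanders": sum(p.sanders for p in self.polls) / total,
        }


# With a window of two polls, adding Poll C pushes Poll A out.
window = RollingMegaSample(max_polls=2)
for poll in (Poll("A", 400, 100), Poll("B", 700, 300), Poll("C", 750, 750)):
    window.add(poll)

print([p.name for p in window.polls])  # → ['B', 'C']
print({k: round(v, 3) for k, v in window.shares().items()})
# → {'Clinton': 0.58, 'Sanders': 0.42}
```

Setting `max_polls=3` instead keeps all three polls in the window and reproduces the 3,000-person mega-sample worked out in the example.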

So, practically what does this look like? Here is an example:

Poll A has a sample of 500 people. 80% like Clinton, 20% like Sanders (400 to 100)

Poll B has a sample of 1000 people. 70% like Clinton, 30% like Sanders (700 to 300)

Poll C has a sample of 1500 people. Candidates are split at 50% each (750 to 750)

The mega-sample is 3,000 people, with Clinton at 1,850 and Sanders at 1,150. The aggregate percentages are therefore about 62% to 38% (61.7% to 38.3% before rounding).
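The same arithmetic in a few lines of Python, using the hypothetical polls A, B and C above. For comparison it also computes a plain unweighted average of the three toplines, where every poll counts equally regardless of size; the function names are illustrative only.

```python
# Hypothetical polls from the example: (Clinton respondents, Sanders respondents)
polls = {"A": (400, 100), "B": (700, 300), "C": (750, 750)}


def pooled_share(polls):
    """Mega-sample: add up raw counts, then divide by the combined sample size."""
    clinton = sum(c for c, _ in polls.values())
    total = sum(c + s for c, s in polls.values())
    return clinton / total


def unweighted_share(polls):
    """Plain average of each poll's percentage; every poll counts equally."""
    shares = [c / (c + s) for c, s in polls.values()]
    return sum(shares) / len(shares)


print(f"Mega-sample: Clinton {pooled_share(polls):.1%}")      # → Mega-sample: Clinton 61.7%
print(f"Unweighted:  Clinton {unweighted_share(polls):.1%}")  # → Unweighted:  Clinton 66.7%
```

Pooling raw counts is the same as averaging the poll percentages weighted by sample size, which is why the large, evenly split Poll C pulls the mega-sample result down to about 62%, while the unweighted average lands near 67%.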


Pretty simple, really. In my opinion, RCP has two main strengths over Pollster. First, it doesn't include internet-only polls, which are notoriously plagued by sampling bias. Second, unlike Pollster, it doesn't try to smooth its trend lines. Smoothing makes a trend line lag behind real directional shifts, and it is extremely annoying from a statistical point of view.
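A toy illustration of that lag. Pollster's actual smoothing model is not shown here: simple exponential smoothing stands in as a generic smoother, and the poll series is made up, with support jumping from 50% to 60% at the fourth poll and staying there.

```python
def exponential_smoothing(values, alpha):
    """Blend each new value with the previous smoothed value.

    Smaller alpha gives a smoother line but reacts more slowly to real shifts.
    """
    smoothed = [values[0]]
    for v in values[1:]:
        smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed


# Made-up toplines: a genuine shift from 50% to 60% at the fourth poll.
raw = [50, 50, 50, 60, 60, 60, 60, 60]
smooth = exponential_smoothing(raw, alpha=0.3)

for r, s in zip(raw, smooth):
    print(f"raw {r:>2}%   smoothed {s:4.1f}%")
```

With `alpha=0.3`, the smoothed line only reaches 53% on the first poll after the jump and is still a little over 58% four polls later, even though the raw numbers moved the full ten points at once.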

Let me know if you have any questions. Depending on the response, I will try to tackle the Pollster methodology next week.

Today in polling, the RCP mega-sample (Original Post) Godhumor Sep 2015 OP
Bumpity bump bump n/t Godhumor Sep 2015 #1
It's all based on The Law Of Large Numbers DemocratSinceBirth Sep 2015 #2
Many thanks Godhumor kenn3d Sep 2015 #3
Good analysis mythology Sep 2015 #4
Wouldn't it have been simpler to say: "they average the last four polls"? brooklynite Sep 2015 #5
Easier to show how a weighted average works than to assume everyone knows what that means Godhumor Sep 2015 #6

DemocratSinceBirth

(99,710 posts)
2. It's all based on The Law Of Large Numbers
Fri Sep 4, 2015, 09:51 AM

As you combine samples, the size of your sample increases and your margin of error decreases.
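To put numbers on this, here is the textbook 95% margin of error for a proportion, z * sqrt(p * (1 - p) / n), at the sample sizes from the original example. It assumes each figure is a single simple random sample; as the original post notes, a pooled aggregate isn't really one sample, so the 3,000-person line is only a nominal figure.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion from one simple random sample of size n.

    p=0.5 is the worst case (widest margin); z=1.96 is the 95% normal critical value.
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (500, 1000, 1500, 3000):
    print(f"n={n:>4}: +/- {margin_of_error(n):.1%}")
```

This prints roughly 4.4%, 3.1%, 2.5% and 1.8%. The margin shrinks with the square root of the sample size, so quadrupling the sample only halves it.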

kenn3d

(486 posts)
3. Many thanks Godhumor
Tue Sep 22, 2015, 10:54 AM

I see this post got very little notice, and I'm sorry I missed it earlier. I really appreciate your insights. Excellent, easy to understand explanation of a poorly understood subject.

I tend to agree with RCP on the potential bias in internet polling and think that some culling of the Pollster dataset tends to improve its accuracy. I hope other DUers will show some interest in the subject of aggregation of polls, and I'd be keen to read your post on the HuffPolster methodology to learn how it differs from RCP.

Thanks again

Godhumor

(6,437 posts)
6. Easier to show how a weighted average works than to assume everyone knows what that means
Tue Sep 22, 2015, 01:42 PM

I think I kept the explanation pretty short to show how all polls are used to generate one big poll, honestly.
