This post was prompted by a short piece in the Guardian that got me thinking about opinion polls. The original is here in case you’re interested.

Numbers from opinion polls are often quoted, but we don’t normally give much thought to what they can tell us or how accurate they are. Sampling the shifting opinions of a large population is just the sort of thing that gets statisticians excited, and I thought I’d look at some of the numbers behind polls.

Most opinion polls are pretty simple. They ask a fairly standard question, usually “if the election were held tomorrow, who would you vote for?”. In the simplest case they then work out how many people to ask by assuming the population is large and the sample is small and random, then use the formula for standard error:

Standard error = SQRT(p(1 - p)/n), where p is the proportion giving a particular answer and n is the sample size.

From this I can see that, assuming 50% of my population supports Labour and I ask 1,000 people whether they do, the error is:

Standard error = SQRT(0.5(1 - 0.5)/1000) = 0.016, or about 1.6%
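
If you want to play with the numbers yourself, here’s a minimal Python sketch of that calculation (the 1.96 is the usual 95% confidence multiplier, which is where the often-quoted “about 3%” margin of error comes from):

import math

# Standard error for a proportion p with a sample of n people.
def standard_error(p, n):
    return math.sqrt(p * (1 - p) / n)

se = standard_error(0.5, 1000)
print(round(se, 4))         # 0.0158, i.e. about 1.6%
print(round(1.96 * se, 4))  # 0.031, the familiar "about 3%" margin of error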

Actually, most polls are considerably more complicated than this: they add their own home brew of random and quota sampling, with some stats used to choose a sample size that keeps the error down to about 3%. Quota sampling, since you ask, is where you try to match your sample to the general population, so if 30% of the population are over 50 then you try to include 30% over-50s in your sample.
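
Turning the same formula around gives the sample size needed for a given error. A rough sketch, assuming the 3% refers to a 95% margin of error on a 50/50 question (the worst case):

import math

# Sample size needed so that 1.96 standard errors stay under a target margin.
def sample_size_needed(margin, p=0.5, z=1.96):
    return math.ceil(p * (1 - p) * (z / margin) ** 2)

print(sample_size_needed(0.03))  # 1068, i.e. a sample of about a thousand people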

It should quickly become fairly clear that you can’t actually take a sample that exactly matches the whole population in every respect. People who are in their 20s could also be civil servants and also be women. Therefore you need some mathematical wizardry afterwards to make sure all those women you interviewed to get the numbers of people in their 20s up don’t screw up the gender balance. In practice this means that different groups are identified and weighted more or less heavily to match the general population.
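
In code terms the weighting step looks something like this. A toy sketch with made-up groups and figures, where each respondent counts for more or less depending on how their group compares with the population:

# Toy example: groups and figures are made up.
population_share = {"under_50": 0.70, "over_50": 0.30}
sample_share     = {"under_50": 0.55, "over_50": 0.45}
support          = {"under_50": 0.40, "over_50": 0.30}  # share backing a party

# Each group is weighted by how under- or over-represented it is in the sample.
weights = {g: population_share[g] / sample_share[g] for g in population_share}

# The headline figure is then the weighted average of the answers.
weighted_support = sum(support[g] * sample_share[g] * weights[g] for g in support)
print(weights)                     # over-sampled over-50s get a weight below 1
print(round(weighted_support, 3))  # 0.37, the population-weighted figure

Real pollsters weight on several dimensions at once (age, gender, region and so on), which is where the wizardry comes in, but the idea is the same.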

To try to get a result that could actually tell us what’ll happen, we’d then need to understand where these people are, and that is tricky. In an election it is no good having 90% of the vote in some areas; that doesn’t increase the number of seats. We could do some research to find out how 35-year-old council workers’ opinions vary geographically, but it would probably just be out of date and wrong.

To get around the problem that our sample doesn’t actually tell us who will win which seats, we approach things another way. We do the same flawed test regularly and look at the changes, not the absolute numbers. The numbers may not be real, but the changes are real shifts in national public opinion. In short, opinion polls give a good picture of a national response to politics but are a bad way to find out which seats will be won and lost.
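
As a footnote, the same standard error formula puts an error bar on a change between two polls. A quick sketch with made-up numbers; the shift has to be bigger than this before you can be confident it’s a real movement rather than noise:

import math

def standard_error(p, n):
    return math.sqrt(p * (1 - p) / n)

# Made-up numbers: last month's and this month's share for one party.
last_month, this_month, n = 0.32, 0.38, 1000
change = this_month - last_month
# Both polls contribute noise, so the errors add in quadrature.
se_change = math.sqrt(standard_error(last_month, n) ** 2 +
                      standard_error(this_month, n) ** 2)
print(round(change, 3), "+/-", round(1.96 * se_change, 3))  # 0.06 +/- 0.042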

Interesting, and good point.

One of the ways they try to tell you something more about the actual result is by polling marginal seats more accurately, e.g. http://www.telegraph.co.uk/news/election-2010/7402102/Labour-and-Conservatives-level-in-marginal-seats.html

Also, I haven’t read it very closely, but I know this blog http://www2.politicalbetting.com/ has a pretty sophisticated understanding of the way polling works, which it brings to bear in posts every now and then.

Cracking housing.

I have nothing to add about the mechanics, but Andrew Marr’s excellent book on British journalism makes the point that continual polling, which obviously revs up a lot more during election season, serves a very important role – creating news. With journalism ever more office-bound and quality local journalism long since departed, polling and surveys are always used to satisfy the 24-hour news cycle.

Polling is an interesting one though, because it is a more rigorous and systematic way of judging the political climate, but talking to individuals about their opinions seems to make for a slightly deeper analysis.