Over the last 30 years or so, probably longer, the size and distribution of the informal vote that we see at every election can be largely explained by a handful of variables – with the election on Saturday being no exception.

The first of these variables is ballot length – the number of candidates standing in each electorate. The mechanics of this one are fairly obvious: a given voter is more likely to make a mistake filling out the ballot when there are, say, 13 candidates on the ballot paper than when there are only, say, 3. The more candidates there are on the ballot, the more numbers a voter has to fill in, so the number of mistakes increases. At Saturday’s election, the number of candidates on ballots ranged from a lowly 3 through to a high of 11.

The next variable is a language and communications one – the proportion of each electorate that speaks English poorly or not at all. The mechanics of this one are fairly obvious as well: folks who struggle to understand the instructions for filling out the ballot because of language barriers generally tend to make more mistakes, rendering the ballot paper informal. The proportion of the electorate that speaks English poorly or not at all ranges from a low of 0.1% in the seat of Bendigo in Victoria through to a high of 15.9% in the seat of Fowler in NSW.

The third variable is a political systems variable – Optional Preferential Voting operating for state government elections. What we see happen here is that voters in states like NSW and Qld, which have optional preferential voting at the state level, (too) regularly mark their ballots in the Federal election as if optional preferential voting also applied there. So we see relatively large numbers of ballot papers in NSW and Qld with just a “1” placed next to a candidate and no further preferences marked, rendering the ballot informal.

If we use the AEC data at the seat level, we can get candidate numbers and the size of the informal vote for each electorate, while census data can tell us the proportion of each electorate that speaks English poorly or not at all. On the optional preferential voting variable, we can use a dummy variable to denote electorates in states that have OPV operating at the state level (NSW and Qld). With this data, we can do a bit of regression work to explore it.
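As a rough sketch of how the OPV dummy variable could be coded, the snippet below marks an electorate with 1 if its state is NSW or Qld and 0 otherwise. The function name `opv_dummy` and the two example rows are assumptions made for illustration, not part of the original analysis.

```python
# States with optional preferential voting at state elections, as named in the post.
OPV_STATES = {"NSW", "QLD"}


def opv_dummy(state: str) -> int:
    """Return 1 if the state uses OPV at state level, else 0 (hypothetical helper)."""
    return 1 if state.strip().upper() in OPV_STATES else 0


# Illustrative rows only: (electorate, state).
electorates = [("Fowler", "NSW"), ("Bendigo", "Vic")]
dummies = {name: opv_dummy(state) for name, state in electorates}
print(dummies)  # {'Fowler': 1, 'Bendigo': 0}
```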

What we do is regress the informal vote against the other three variables to see how much each variable affects the size of the informal vote. For the stats types, the stats output comes in like this:

[Figure: regression output]

For the non-stats types, the stuff above looks much more complicated than it actually is. This is what the results actually mean:

[Figure: the regression results in plain form]

NES (non-English speaking background) is the census estimate of the proportion of the electorate that speaks English poorly or not at all. The regression results tell us that for every 1% increase in the proportion of people in a given electorate who speak English poorly or not at all, the average increase in the size of the informal vote in that electorate is 0.43%. So a 10% increase in the proportion speaking English poorly or not at all would lead, on average, to an increase in the informal vote of 4.3%. It’s a fairly powerful relationship, which we can see by running a simple scatter plot for all 150 electorates:

[Figure: scatter plot of the informal vote against NES for all 150 electorates]

The Candidate Number results tell us that for every additional candidate on the ballot paper, the size of the informal vote rises by a little over one tenth of a percent (0.13%) – so having 5 more candidates would increase, on average, the size of the informal vote by 0.65%, or well over half a percent.

Finally, the OPV coefficient is 1.65 – meaning that, holding the other variables constant, electorates in states with optional preferential voting at the state government level have an informal vote 1.65% higher on average than electorates in states without it.
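The arithmetic in the last few paragraphs can be checked with a few lines of Python. This is only a sketch using the rounded coefficients quoted in the post; the function name `informal_change` is made up for illustration.

```python
# Rounded 2010 coefficients as quoted in the post (in percent of the vote).
COEF_NES = 0.43         # per 1% of the electorate speaking English poorly or not at all
COEF_CANDIDATES = 0.13  # per additional candidate on the ballot paper
COEF_OPV = 1.65         # electorate sits in a state with OPV at the state level


def informal_change(d_nes=0.0, d_candidates=0, opv=0):
    """Estimated change in the informal vote, holding the other variables constant."""
    return COEF_NES * d_nes + COEF_CANDIDATES * d_candidates + COEF_OPV * opv


print(round(informal_change(d_nes=10), 2))        # 4.3
print(round(informal_change(d_candidates=5), 2))  # 0.65
print(round(informal_change(opv=1), 2))           # 1.65
```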

These variables are all statistically significant and together explain around 54% of the variation we saw in the informal vote on Saturday.

Finally, the “C” in the stats output is the Constant – telling us what the average level of informal vote would be were these other 3 variables all theoretically zero, in that OPV didn’t operate, everyone spoke English well and there were no candidates on the ballot. Some of that might sound a bit silly, but the Constant gives us an idea of the generic level of informal vote that cannot be explained by the three variables.
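For anyone wanting to run this kind of regression themselves, here is a minimal sketch using NumPy's least-squares solver. It is not the software used for the output above. The function name `fit_ols` is an assumption, and the demo rows are synthetic, noise-free numbers built from the post's rounded coefficients purely to show that the fit recovers them. They are not AEC or census figures.

```python
import numpy as np


def fit_ols(nes, candidates, opv, informal):
    """Least-squares fit of informal = C + b1*NES + b2*Candidates + b3*OPV.

    Returns (coefficients [C, b1, b2, b3], r_squared).
    """
    y = np.asarray(informal, dtype=float)
    X = np.column_stack([
        np.ones_like(y),                      # column of ones gives the Constant
        np.asarray(nes, dtype=float),
        np.asarray(candidates, dtype=float),
        np.asarray(opv, dtype=float),
    ])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coefs
    ss_res = float(residuals @ residuals)            # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())      # total variation
    return coefs, 1.0 - ss_res / ss_tot


# Synthetic rows for illustration only (NOT real electorate data).
nes = [0.5, 12.0, 5.0, 2.0, 8.0, 1.0]
cands = [3, 11, 6, 4, 8, 5]
opv = [0, 1, 1, 0, 1, 0]
informal = [3.08 + 0.43 * n + 0.13 * c + 1.65 * o
            for n, c, o in zip(nes, cands, opv)]

coefs, r2 = fit_ols(nes, cands, opv, informal)
print(np.round(coefs, 2), round(r2, 3))
```

With real seat-level data the fit would not be exact, and the second return value would be the share of variation explained (around 0.54 for this election, per the results above).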

It’s also worth looking at how the informal vote has changed since the last election. First up, the broad changes in state and national averages:

[Figure: change in the informal vote by state and nationally]

The informal vote went up everywhere, with the ACT leading the way with a 2.4% increase while WA showed the smallest increase of exactly 1%. More interesting, though, is comparing the regression results above with the same regression undertaken on the 2007 results. For the non-stats types, the numbers above (0.43 for NES, 0.13 for Candidates and 1.65 for OPV) are called regression coefficients. We can compare how they’ve changed from the last election:

[Figure: regression coefficients, 2007 versus 2010]

The NES coefficient increased from 0.308 in 2007 to 0.433 in 2010, meaning that speaking English poorly or not at all had a slightly larger effect this election than in 2007 – which is interesting to think about. It wasn’t a large increase, so it’s probably nothing to lose too much sleep over, but it’s still interesting nonetheless and probably large enough to warrant some further analysis.

On ballot length this election, the size of the impact that candidate numbers had on the size of the informal vote decreased significantly. There are two good explanations for this. Firstly, the total number of candidates standing in this election was much lower than in 2007. Comparing the two we get:

[Figure: candidate numbers, 2007 versus 2010]

The other factor is a bit more complicated, in that the effect of ballot length on the informal vote is only approximately linear. It is actually more accurate to specify it in the regression equation as candidate number squared. This is because the marginal complexity of each additional candidate being added to the ballot paper is not constant. The informal vote increases more when the candidate number jumps from, say, 8 to 9 than it does when it increases from, say, 3 to 4.
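To see why the squared specification behaves this way, here is a small sketch. Under a candidates-squared term, the effect of one more candidate is proportional to the change in n squared. The coefficient on that term is not reported in this post, so the snippet only compares the increments themselves. The function name is made up for illustration.

```python
def squared_term_increment(n_from: int, n_to: int) -> int:
    """Change in the candidates-squared regressor when the ballot grows."""
    return n_to ** 2 - n_from ** 2


low = squared_term_increment(3, 4)   # 16 - 9
high = squared_term_increment(8, 9)  # 81 - 64
print(low, high, round(high / low, 2))  # 7 17 2.43
```

Whatever the coefficient is, going from 8 to 9 candidates moves the squared term about 2.4 times as far as going from 3 to 4, which is the non-constant marginal effect described above.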

So having a smaller number of candidates this election easily explains the significant drop in the power of the effect of ballot length on the size of the informal vote.

Some people disagree with this non-linear specification of ballot length – but this is about the fifth election in a row where the non-linear specification has generally produced not only more robust results, but results with higher explanatory power as well. We’ll do a Nerdy Sunday post on this when the election results are finalised to get deeper into it, as well as how the NES variable actually behaves in the same way – which has interesting things to say about community dynamics.

On the OPV variable, this election saw a sharp increase in the average size of the informal vote in electorates that have OPV at the state level – up from an average increase of 0.94% per electorate in 2007 to a large 1.65% this election.

That is a large jump.
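To put the two coefficient changes quoted so far side by side, here is a quick sketch of the relative change in each. The figures are the ones given in the post. The helper `pct_change` is just an illustrative name.

```python
# (2007, 2010) coefficients as quoted in the post.
coefficients = {
    "NES": (0.308, 0.433),
    "OPV": (0.94, 1.65),
}


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


for name, (c2007, c2010) in coefficients.items():
    print(f"{name}: {c2007} -> {c2010} ({pct_change(c2007, c2010):+.1f}%)")
# NES: 0.308 -> 0.433 (+40.6%)
# OPV: 0.94 -> 1.65 (+75.5%)
```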

We really need to think about the effect that OPV at the state level has on the size of the informal vote at our federal elections. 1.65% is enough to change the results in a dozen seats, and it’s been clear for a while that having different regimes at different levels increases the size of the informal vote to the point where it probably makes a material difference to federal election results – in an election like the one we had on Saturday, it is probably large enough that it may well have delivered a different government.

Finally, it’s worth noting the changes in the Constant and the R-squared. The R-Squared tells us how much variation in the dependent variable (the size of the informal vote) can be explained by the variation in the independent variables (NES, Ballot Length and OPV). On the R-squared, the three variables explained approximately the same amount of variation in the size of the informal vote as last time.

The Constant, however, more than doubled – up from 1.26 in 2007 to 3.08 on Saturday.

This suggests that things other than the variables we analysed here increased the size of the generic informal vote – more than doubling it. Mark Latham might yet have something to answer for.