Big Data and elections

Peter Loewen is an associate professor of political science at the University of Toronto. He also directs the Local Parliament Project, which surveys Canadians about whom they intend to vote for in the 2015 federal election and which issues matter to them. Just in time for the October 19th vote, Loewen offers Research Matters insight into Big Data and elections.

Political scientists and political parties are equally interested in the potential of Big Data, though their focus often differs. To be sure, both groups want to understand the behaviour of voters. Generally speaking, political scientists care more about the deep, underlying causes of voter behaviour and less about how the outcome of an election can be changed.

Some researchers are turning to Big Data to better understand the electorate. (Dennis S. Hurd, flickr.com)

Changing outcomes is the principal interest of political parties. While parties are certainly interested in how broad coalitions are made and maintained, they often focus on more marginal questions: which issue can move voters' opinions by two or three points? Which voter is one percentage point more likely to turn out than another?

If the interests of political scientists and political strategists do not always overlap, their methods overlap more substantially. Traditionally, both have relied on two types of data: polling data and geographic data. Polling data might take the form of large-scale election studies, in which 600 or 700 people per week are sampled over the course of a campaign, allowing for a rich retrospective look at an election. Or it might involve daily samples conducted by political parties to detect trends and changes in real time.
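
To make the rolling design concrete, here is a minimal Python sketch of weekly samples tracking a drifting level of party support. The sample size, campaign length, and underlying trend are all invented for illustration.

```python
import random

# A minimal sketch of a rolling cross-section: each week a fresh sample
# of roughly 650 respondents is drawn, and weekly vote-intention
# estimates are tracked over the campaign. The "true" support trend
# below is invented purely for illustration.
random.seed(42)

WEEKS = 11                 # hypothetical campaign length
SAMPLE_PER_WEEK = 650      # 600-700 respondents per week

def true_support(week):
    """Hypothetical underlying support, drifting up a point per week."""
    return 0.32 + 0.01 * week

for week in range(WEEKS):
    p = true_support(week)
    # Each respondent independently reports an intention to vote for
    # the party: a Bernoulli draw at the true support level.
    sample = [random.random() < p for _ in range(SAMPLE_PER_WEEK)]
    estimate = sum(sample) / SAMPLE_PER_WEEK
    print(f"week {week + 1:2d}: estimated support {estimate:.1%}")
```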

Traditional data’s diminishing returns

There is an obvious challenge with this kind of polling data: it is increasingly difficult to draw proper random samples of the population by telephone. Landline coverage is no longer universal, and response rates are typically very low. Together, these problems produce unrepresentative data. To be sure, large online panels of respondents, combined with advanced statistical techniques, allow many of these problems to be overcome (a research interest of mine). But, on balance, polling data is worse now than it was in the past.
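
One family of techniques alluded to above is post-stratification weighting: reweighting an unrepresentative panel so that its demographic mix matches known population targets. A minimal sketch follows; the age groups, census shares, and panel counts are invented for illustration, not drawn from any real panel.

```python
# Post-stratification sketch: weight panel respondents so the age mix
# matches census targets. All numbers are invented for illustration.

# Share of the population in each age group (e.g., from the census).
population_share = {"18-34": 0.28, "35-54": 0.35, "55+": 0.37}

# Online panels typically over-represent some groups; this one skews old.
panel_counts = {"18-34": 150, "35-54": 300, "55+": 550}
panel_total = sum(panel_counts.values())

# Weight per cell = population share / panel share, so the weighted
# panel composition matches the population composition.
weights = {
    group: population_share[group] / (panel_counts[group] / panel_total)
    for group in panel_counts
}

for group, w in weights.items():
    print(f"{group}: weight {w:.2f}")
# Under-represented young respondents get weights above 1;
# over-represented older respondents get weights below 1.
```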

Geography-based data typically involves collecting a relevant outcome, such as vote shares, at some level of geography, such as a riding. This is then married to other available data on that geographic unit, such as average income and other economic and social indicators. Researchers and political strategists can then use statistical analysis to identify relevant trends and correlations (provided they are mindful of ecological fallacies). The challenge with this approach is equally daunting: the demographics of Canada, especially in urban centres, are changing much more quickly than a census conducted once every five years can capture.
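
A minimal sketch of that workflow, with invented ridings and numbers: marry riding-level vote shares to a census covariate, fit a simple regression, and keep the ecological fallacy in mind when reading the result.

```python
# Invented riding-level data: (riding, incumbent vote share,
# median household income in $000s).
ridings = [
    ("Riding A", 0.44, 62),
    ("Riding B", 0.39, 55),
    ("Riding C", 0.51, 74),
    ("Riding D", 0.35, 48),
    ("Riding E", 0.47, 68),
]

shares = [r[1] for r in ridings]
incomes = [r[2] for r in ridings]

# Ordinary least squares by hand: slope = cov(x, y) / var(x).
n = len(ridings)
mean_x = sum(incomes) / n
mean_y = sum(shares) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(incomes, shares))
         / sum((x - mean_x) ** 2 for x in incomes))
intercept = mean_y - slope * mean_x

print(f"vote share = {intercept:.3f} + {slope:.4f} * income")
# Caution: this is an aggregate (riding-level) relationship. Concluding
# that richer *individuals* vote this way is the ecological fallacy.
```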

In response to these obstacles, some researchers have been turning to Big Data. While its definition is not settled, I like to think of it as having at least some of the following characteristics:

  • It is often continuous, flowing in on a real-time basis.
  • Its volume is so great that traditional approaches to data handling and analysis are inadequate.
  • It often contains substantially more information than is needed to make an inference about a population.

Big Data has its own issues

There are several sources of Big Data in elections, some explicitly political and others not obviously so. Twitter and Facebook provide a constant stream of individuals' experiences of a political campaign, as well as their thoughts, observations, and opinions. Voting advice applications (like Vote Compass, where I was director of analytics from 2010 to 2013) provide a constant stream of data on issue preferences and importance. But other information is relevant to voting behaviour, too. Consumer and credit data can provide individual-specific data on a voter's habits and financial standing; so can real estate and banking data. This information can be used to profile and then target voters as individuals, making parties more efficient than ever at crafting and narrow-casting their messages. Ultimately, this offers as much good for democracy as it does bad.
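
To illustrate what individual-level targeting can look like, here is a minimal sketch of a turnout-propensity score of the kind a party might build from such data. The features, coefficients, and voter records are all invented; this is not any actual campaign's model.

```python
import math

# Hypothetical logistic turnout model. In practice, coefficients would
# be estimated from voter files, consumer data, and contact histories.
COEFS = {"intercept": -1.0, "age": 0.03,
         "voted_last_time": 1.5, "homeowner": 0.4}

def turnout_propensity(voter):
    """Logistic turnout score: 1 / (1 + exp(-x'b))."""
    x = (COEFS["intercept"]
         + COEFS["age"] * voter["age"]
         + COEFS["voted_last_time"] * voter["voted_last_time"]
         + COEFS["homeowner"] * voter["homeowner"])
    return 1 / (1 + math.exp(-x))

voters = [
    {"id": 1, "age": 24, "voted_last_time": 0, "homeowner": 0},
    {"id": 2, "age": 58, "voted_last_time": 1, "homeowner": 1},
]

# A campaign might narrow-cast messages only to voters near the margin,
# say those scoring between 40% and 60%.
for v in voters:
    p = turnout_propensity(v)
    print(f"voter {v['id']}: turnout propensity {p:.0%}")
```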

Big Data has shortcomings of its own, however. With traditional data sources, there was often not enough data to adjudicate between competing theories of voter behaviour. With Big Data, I suspect, the problem is the opposite: too much data and not enough theory. Absent careful thinking and testing about what actually moves voters, there is the potential to find meaningless correlations that explain voter behaviour in one instance but in no others.
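
A small simulation makes the danger concrete: screen enough meaningless variables against an outcome and some will correlate by chance. The numbers below are invented; only the pattern matters.

```python
import random

# With 500 variables of pure noise and 100 coin-flip "votes", the best
# noise variable will look like a real predictor, yet it would explain
# nothing in fresh data.
random.seed(1)
N, N_VARS = 100, 500

votes = [1.0 if random.random() < 0.5 else 0.0 for _ in range(N)]
noise_vars = [[random.gauss(0, 1) for _ in range(N)] for _ in range(N_VARS)]

def correlation(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

best = max(range(N_VARS), key=lambda j: abs(correlation(noise_vars[j], votes)))
print(f"best of {N_VARS} noise variables: "
      f"r = {correlation(noise_vars[best], votes):.2f}")
```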

For a demonstration of this, look no further than Google Flu Trends, one of the most promising early applications of Big Data. By locating where individuals were searching for information on the flu, researchers could predict outbreaks of influenza. Highly accurate models were fit to the data. However, they worked for only a short period before failing almost completely. The problem was that the researchers lacked relevant epidemiological knowledge about how influenza actually spreads. Absent a strong theoretical basis, they had only correlations with a limited shelf life.

Like the flu, elections will always be with us. Hopefully, so will enough students of politics that we can add some theory to the study of political behaviour. That will become more, not less, important as data get bigger.
