A new report from two fellows at the Berkman Center for Internet and Society is out today: “Measuring Internet Activity: A (Selective) Review of Methods and Metrics” breaks down attempts at measuring the impact of the Internet into three categories — infrastructure and access, control, and online content and communities. The main takeaway: Our methods for measuring online activity are fragmented and usually offer only an incomplete, if tantalizing, picture of how people learn and communicate online.
Still, there are some intriguing arguments and examples in the report worth mentioning, especially regarding the content and communities layer of the web. Write authors Rebekah Heacock and Rob Faris:
Ultimately, the best understanding of the efficacy of networked technologies to affect social change and promote human progress — and the costs of restrictions — comes through the observation and measure of what people do and accomplish with digital tools.
The method of observation with which we’re most familiar is the survey, which is inherently problematic — expensive, often biased, non-representative, subjective, and inconsistent, according to the report. But surveys are also a frequently conducted and readily accessible technique. Some examples cited in the report include the work of Pew Research Center’s Internet and American Life Project, Eurostat, Web Index, and the World Economic Forum.
But beyond surveys, Faris and Heacock break down the measurement of online behavior into more quantifiable and more granular categories, such as search data, analytics, and social networking data (“e.g., hashtags, followers, or likes”).
On the potential of search:
Search data is a powerful gauge of the interests and preferences of Internet users and offers an unparalleled view into the questions on people’s minds, at least for those online and the queries that they seek to answer on the Internet. Other potentially powerful applications include gaining new insights into drug interactions or predicting voter turnout based on regional differences in search activity. It is unclear how far we might go in mining this type of data to better understand global Internet activity; it is possible that we’ve barely scratched the surface of this unique view into the collective digital mind.
On the limitations of third-party analytics:
Several web analytics companies gather data by recruiting Internet users to install software on their computers that tracks their browsing behavior. This data is then packaged and sold to companies that want to understand online behavior and browsing habits. Alexa and ComScore have made a business of this, while also offering free public access to portions of their data. This data covers a potentially wide set of Internet users and might be a closer approximation to representative surveys, but it is still subject to the same limitation: we don’t know how the sample of people willing to install software that will monitor their behavior differs from the population of Internet users. Do they represent the gullible, naïve, or adventuresome? Does this alter their online behavior? On the plus side, these companies do track actual behavior, a marked improvement over respondent-reported behavior. While they do offer some public access to their data, access to the most detailed data is reserved for their paying clientele.
On the potential of data from social platforms:
For example, Devin Gaffney explores word frequency, account creation, and retweeting during and after the 2009 Iranian presidential election. A 2012 paper by Panos Panagiotopoulos, Alinaghi Ziaee Bigdeli, and Steven Sams focused on the use of Twitter by local government authorities in London during the summer 2011 riots.
And on their imperfections:
Twitter is also being used as a means to estimate the attention of Internet users to different news stories and memes online, similar in scope and scale to the use of search term data mentioned earlier. Twitter data also provides interesting crowdsourced monitoring of global events. For example, social media marketing company SocialFlow has posted analyses of prominent hashtags used to share information during Hurricane Sandy and of hashtag use by different groups of users during the October 3, 2012 presidential debate. Tufekci points out several problems with studies that are structured around certain hashtags. Hashtags are not consistently good proxies for capturing an online debate, hashtags may mean different things in different contexts, and debates may persist despite users dropping the use of related hashtags, among several other factors that confound reliable interpretation. Facebook straddles the public and private realms. This complicates efforts to study Facebook, as it is more difficult to acquire data, and researchers must grapple with privacy concerns.
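To make the hashtag-as-proxy approach concrete, here is a minimal, illustrative sketch of the kind of frequency counting that underlies such studies. The sample tweets and the simplified hashtag pattern are invented for the example; real studies work with much larger corpora pulled from Twitter’s APIs, and, as Tufekci’s caveats above suggest, a count like this captures only explicitly tagged messages and misses the rest of the conversation.

```python
import re
from collections import Counter

def hashtag_counts(tweets):
    """Count hashtag frequency across a collection of tweet texts.

    A naive attention proxy: how often each hashtag appears.
    Case is folded so '#Sandy' and '#sandy' count as one tag.
    """
    tags = []
    for text in tweets:
        # Simplified pattern: '#' followed by word characters.
        tags.extend(t.lower() for t in re.findall(r"#\w+", text))
    return Counter(tags)

# Hypothetical sample data for illustration only.
sample = [
    "Power outages across the city #sandy #nyc",
    "Stay safe everyone #Sandy",
    "Watching the debate tonight",  # on-topic but untagged: not counted
]

print(hashtag_counts(sample).most_common())
```

Note the third tweet: it is plainly part of an online conversation but carries no hashtag, so a hashtag-based study never sees it, which is exactly the kind of interpretive gap the report flags.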
Ultimately, the authors find they are more interested in measuring the impact of networks than the behaviors of individuals:
Measures of individual activity online are a strong indication of the reach and influence of digital technologies: number of people online, online media consumption, contributions to user-generated content, hours spent on social networking sites, and so on. These metrics do not capture, however, the prevalence and impact of collective action online… Collaborative and interactive online activity is often discussed in the language of the networked public sphere, which offers a useful conceptual foundation for studying online activity. The networked public sphere represents not only the sum of the various digital media outlets resting on different platforms but also the interactions between these actors and sites that give rise to digital networks. Benkler describes the networked public sphere as an alternative arena for public discourse and political debate, an arena that is less dominated by large media entities, less subject to government control, and more open to wider participation. The networked public sphere provides an alternative structure for highlighting stories and sources based on relevance and credibility.
The report goes on to cite gaps in knowledge and funding and suggest arenas for further study, concluding that while there is a plethora of information available, “the power of big data and web analytics is yet to catch up with the data needs of policy making.”
Image by William Warby used under a Creative Commons license.