It’s less about what we’re doing on Facebook, and more about what’s being done to us.
Jonathan Albright, the research director at the Tow Center for Digital Journalism and a faculty associate at Harvard’s Berkman Klein Center for Internet and Society, isn’t big on studies that try to track how much fake news people have clicked on, or how many outright hoaxes they recall seeing in their feeds. Instead, his research into activity on the biggest platforms on the Internet — Facebook, YouTube, Instagram, and to a lesser extent, Twitter — situates everyday Internet users inside a kind of trap, one they can’t get out of without a great deal of help from those same platforms, which thus far haven’t been eager to tackle the problem.
It’s shadowy, scary, and difficult to pinpoint. I talked to Albright this week about the work he’s doing, which has come to center on pulling whatever data can be pulled from those platforms (almost always without the participation of those companies, and in the case of Facebook usually only through loopholes), analyzing it, releasing the data publicly, and helping journalists make sense of what it means — and then repeating the cycle.
“It’s getting worse. Since the 2016 election, I’ve come to the realization — horribly, and it’s very depressing — that nothing has gotten better, despite all the rhetoric, all of the money, all of the PR, all of the research. Nothing has really changed with the platforms,” Albright told me. “We’re basically yelling about Russia right now when our technological and communication infrastructure — the ways that we experience reality, the ways we get news — are literally disintegrating around us.”
It’s all horrible and depressing, but it was still fun to talk with Albright, who speaks energetically and urgently and somehow manages not to be a total downer. Or maybe I was just scrambling to find something positive to pick up on. There are glints of light here, but a lot of them come down to the platforms accepting that they’re media companies and hiring people into totally new roles (Albright’s idea: “platform editor”). We’ll see. Our conversation, lightly condensed and edited for clarity, is below.
But I got into election-related things when I started looking at this more from a manipulation perspective. I did my Ph.D. on news hashtags. I was really interested in social change, things like the Arab Spring, and response to crisis events, like the Japanese earthquake. I was looking at this in the context of responsibility and using social media as a force for change and good.
But I was noticing through the 2016 [U.S. presidential] election — and this also occurred a little bit in the 2014 midterms — the patterns of how candidates were using social media, especially Trump. Things stood out. Twitter would run surveys during the presidential debates, and the Trump train would go crazy and dominate the other candidates. I saw people acting out in ways that weren’t what I expected. I saw polarizing callouts in News Feed. I saw tweets that got out of control.
I started collecting data in 2016, and once the election result kind of sunk in, I took a day or two and then started to dig into it. I focused on misinformation sites that continually spread unreliable, poorly sourced information or hyperpartisan news. I scraped all of those sites and spent hours and hours putting all the URLs together, pulling a network out of it, and really looking at this from a network perspective. I wanted to get a sense of the scale of this and see how these resources were being connected from a linking perspective. I’d learned that you can’t just look at one platform or one type of communication channel — they’re all linked together.
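[As a rough illustration of the kind of link mapping Albright describes, here is a minimal sketch in Python, assuming a hypothetical list of seed sites and using the requests, BeautifulSoup, and networkx libraries; it illustrates the general approach, not his actual pipeline or data.]

```python
# Minimal sketch of a domain-level link network built from a set of seed sites.
# SEED_SITES is a hypothetical placeholder, not Albright's dataset.
from urllib.parse import urljoin, urlparse

import networkx as nx
import requests
from bs4 import BeautifulSoup

SEED_SITES = ["https://example-hyperpartisan-site.test"]

graph = nx.DiGraph()

for seed in SEED_SITES:
    try:
        html = requests.get(seed, timeout=10).text
    except requests.RequestException:
        continue
    source = urlparse(seed).netloc
    for anchor in BeautifulSoup(html, "html.parser").find_all("a", href=True):
        target = urlparse(urljoin(seed, anchor["href"])).netloc
        if target and target != source:
            # Each cross-domain hyperlink becomes (or reinforces) a directed edge.
            weight = graph.get_edge_data(source, target, {}).get("weight", 0)
            graph.add_edge(source, target, weight=weight + 1)

# Domains that many different sites link into (YouTube, in Albright's map)
# show up as the highest-weighted in-degree nodes.
print(sorted(graph.in_degree(weight="weight"), key=lambda x: -x[1])[:10])
```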
What I saw in that first study was that YouTube was basically the center of that universe, especially from a resource perspective. So many sites, domains, tweets, and Facebook pages were linking into YouTube — not just YouTube channels or single videos, but previews in tweets or Facebook pages. It was shocking, and it showed something that hadn’t been brought to the conversation from a data perspective. People like Zeynep Tufekci have been on this for a long time, pushing that there are other platforms [to consider] besides Facebook and Twitter, and so in a sense I’m validating this with…not big data, very focused “medium” data.
Then I went back through that same network and scraped all the ad tech off of it — any script that was loading, any tracker, Facebook scripts, cookies, anything going on in the background between the browsers and the cache.
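[A hedged sketch of what “scraping the ad tech off” a page can look like in practice: it simply lists the third-party script, image, and iframe domains a page loads statically. The URL is a placeholder, and trackers injected dynamically by JavaScript would require a headless browser to catch.]

```python
# Sketch: list third-party resource domains (scripts, pixels, iframes) that a
# page pulls in. The page URL is hypothetical; dynamically injected trackers
# won't appear in a static fetch like this.
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

def third_party_resources(page_url):
    page_domain = urlparse(page_url).netloc
    soup = BeautifulSoup(requests.get(page_url, timeout=10).text, "html.parser")
    found = set()
    for tag in ("script", "img", "iframe"):
        for node in soup.find_all(tag, src=True):
            domain = urlparse(urljoin(page_url, node["src"])).netloc
            if domain and domain != page_domain:
                found.add(domain)  # e.g. ad exchanges, analytics, social pixels
    return sorted(found)

print(third_party_resources("https://example-partisan-news.test"))
```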
It showed that it’s not just about content. It’s really important to understand this “fake news” and information problem from a tracking perspective, because the tracking is actually how people are getting the content. The number of people who navigate from a browser to a fake news story is just shockingly low. The vast majority of traffic on Facebook comes directly from the mobile app, and even among people who also use the desktop version of Facebook, mobile dominates. So studies that look at this as people navigating to a URL to read a news story of their own volition are completely missing the profiling and microtargeting that are happening.
Microtargeting is not just a buzzword; it exists, and in many ways people are being provoked to act out and share and keep spreading this misinformation and disinformation, as kind of a campaign. It’s part of the trolling. It’s targeting specific segments of audiences, hammering them with certain types of news at different times of day, and giving them a reason to act out.
There’s a whole other side of this with monetization as well, and I think other people are doing good work on that, like Craig Silverman. People are contributing [research] in all sorts of different ways, and hopefully it’s going to lead somewhere and none of it will be for naught.
From those initial maps of the ad tech and hyperlink ecosystem of the election-related partisan news realm, I dove into every platform. For example, I did a huge study on YouTube last year. It led me to almost 80,000 fake videos that were being auto-scripted and batch-uploaded to YouTube. They were all keyword-stuffed. Very few of them had even a small number of views, so what these were really about was impact — they were gaming the system. My guess is that they were meant to skew autocomplete or search suggestions in YouTube. It couldn’t have been about monetization, because the videos had so few views that the sheer volume wouldn’t have made sense with YouTube’s business model.
Someone had set up a script that detected social signals off of Twitter. It would go out and scrape related news articles, pull the text back in, and read it out in a computer voice, a Siri-type voice. It would pull images from Google Images, create a slideshow, package that up and wrap it, upload it to YouTube, hashtag it, and load it with keywords. There were so many of these, and they were going up so fast, that as I was pulling data from the YouTube API dozens more would go up.
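[For context, the public YouTube Data API he mentions is queryable by anyone with an API key. A minimal sketch of polling it for the newest uploads matching a keyword (the key and query are placeholders) shows roughly how batches of keyword-stuffed videos can be watched arriving in near-real time.]

```python
# Sketch: fetch the newest YouTube uploads matching a keyword via the
# Data API v3 search endpoint. API_KEY and QUERY are placeholders.
import requests

API_KEY = "YOUR_API_KEY"
QUERY = "some trending keyword"

params = {
    "part": "snippet",
    "q": QUERY,
    "type": "video",
    "order": "date",      # newest first, so batch uploads surface immediately
    "maxResults": 50,
    "key": API_KEY,
}
resp = requests.get("https://www.googleapis.com/youtube/v3/search", params=params)
resp.raise_for_status()

for item in resp.json().get("items", []):
    snippet = item["snippet"]
    # Auto-generated, keyword-stuffed titles tend to be long and repetitive.
    print(snippet["publishedAt"], item["id"]["videoId"], snippet["title"][:80])
```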
These things are just…not really explainable with regular logic.
I worked with The Washington Post on a project where I dug into Twitter and got, for the last week leading up to the election, a more or less complete set of Twitter data for a group of hashtags. I identified what were arguably the five most influential bots through that last week, and we found that the top one was not a completely automated account; it was a person.
The Washington Post’s [Craig Timberg] looked around and actually found this person and contacted him and he agreed to an interview at his house. It was just unbelievable. It turns out that this guy was almost 70, almost blind.
[From Timberg’s piece: “Sobieski’s two accounts…tweet more than 1,000 times a day using ‘schedulers’ that work through stacks of his own pre-written posts in repetitive loops. With retweets and other forms of sharing, these posts reach the feeds of millions of other accounts, including those of such conservative luminaries as Fox News’s Sean Hannity, GOP strategist Karl Rove and Sen. Ted Cruz (R-Tex.), according to researcher Jonathan Albright…’Life isn’t fair,’ Sobieski said with a smile. ‘Twitter in a way is like a meritocracy. You rise to the level of your ability….People who succeed are just the people who work hard.'”]
The most dangerous accounts, the most influential accounts, are often accounts that are supplemented with human input, and also a human identity that’s very strong and possibly already established before the elections come in.
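[A rough sketch of the kind of volume analysis that makes accounts like that one stand out, assuming a previously collected hashtag dataset saved as JSON lines with hypothetical "user" and "created_at" fields; the field names and file are illustrative, not the dataset from the Post project.]

```python
# Sketch: rank accounts in a hashtag collection by posting volume.
# "hashtag_collection.jsonl", "user", and "created_at" are hypothetical.
import json
from collections import Counter
from datetime import datetime, timedelta

with open("hashtag_collection.jsonl") as f:
    tweets = [json.loads(line) for line in f if line.strip()]

per_account = Counter(t["user"] for t in tweets)

if tweets:
    stamps = [datetime.fromisoformat(t["created_at"]) for t in tweets]
    days = max((max(stamps) - min(stamps)) / timedelta(days=1), 1.0)
    # Scheduler-driven accounts posting on the order of 1,000 times a day
    # sit far above anything organic.
    for user, count in per_account.most_common(5):
        print(f"{user}: {count} tweets, ~{count / days:.0f} per day")
```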
I’ve looked at Twitter, YouTube — I’ve obviously looked at Facebook. I approach this from a different angle than other scholars do; I focus on accountability data. My purpose is to get data that otherwise wouldn’t be available, and to try and repackage it and share it so journalists can use it. These are things that are becoming more difficult to understand and write about quickly, and it’s becoming more difficult for journalists to get the kind of data they need. I’m not talking about the number of likes and retweets. I’m talking about how many people a post reaches.
As everything moves onto platforms and into closed walled gardens and apps, it’s becoming more and more difficult to get any type of data to hold accountable the institutions that are media companies — and platforms are media companies, there’s no question. We are losing access to the data that would let us understand what is going on and report it to the public.
One of the good things about Twitter, though, and there are a lot of good things about them, is that they have always been very open. Yes, they define the metrics, and yes, people are angry about bot signals and want more data on automated accounts. But Twitter has been very good about providing some kind of accountability.
Facebook basically went the opposite direction. Facebook has never really been open, but you used to be able to capture something like public hashtag data — you could pull Facebook data from people who’d left their privacy open or public and had posted with a hashtag. Facebook closed that and now only shares it with select marketing partners like Samsung. They also shut off access to the social graph. Three years ago, in my classes, I used to have my students pull their own social network graph. That’s no longer possible.
Instagram did the exact same thing. You used to be able to pull hashtags and things like networks from Instagram, but now the only easy data to pull from Instagram is GPS, so if an Instagram post is geotagged, you can pull a post in a perimeter. You can kind of pull hashtags. Other than that, you’d have to build a system from the ground up, and even then, every step you take, you could be violating their Terms of Service and get kicked off.
Twitter and YouTube are some of the only platforms that let you get large amounts of data. YouTube is actually fairly open for now. Every time a study like mine comes out, though, they’re going to consider closing that loophole.
As the access closes, the ability for us to study these platforms — their effects on society, their impact on elections — becomes smaller and smaller.
It’s really concerning. This is the zeitgeist of where politics, news, digital technologies, and algorithms all converge. It’s a huge problem when we can’t even go back and reconstruct what happened. We don’t have the data to make sense of what’s happening and what’s taken place so we can prepare or organize or come up with realistic solutions.
There are clearly amazing, very concerned people working at Facebook. A lot of people work at Facebook specifically for that reason — they think they can affect the world in positive ways and build new tools to enrich people’s lives. But the problem is that with the size and scale and sheer dominance of Facebook as a for-profit corporation, it’s getting to the point where it’s becoming impossible to affect it. [These platforms] are no longer startups that can shift direction.
Often, these companies are open to research partnerships and things, but it’s always on their terms. If you do research with them, you’re dealing with IP issues, you’re signing over the rights to the research. It has to be reviewed completely and vetted by their legal process. They often handpick researchers that help them and help their purpose and help their cause — they maybe throw in some sprinkles of criticism. I understand why they would be hesitant to want to work with people like me.
I mean, I do hold that it’s not okay to come in and try to influence someone’s election; when I look at these YouTube videos, I think: Someone has to be funding this. In the case of the YouTube research, though, I looked at this more from a systems/politics perspective.
We have a problem that’s greater than the one-off abuse of technologies to manipulate elections. This thing is parasitic. It’s growing in size. The last week and a half has been some of the worst I’ve ever seen, just in terms of what’s trending. YouTube is having to manually go in and take these videos out. YouTube’s search suggestions, especially in the context of fact-checking, are completely counterproductive. I think Russia is a side effect of our larger problems.
We’re being held in the dark data-wise, but equally problematic is that we’re not able to understand how things are being promoted and how they’re reaching people because of algorithms. Everything is an algorithm on top of an algorithm. The search function that I used to pull the videos is an algorithm, and you have a little bit of profiling involved in that. The recommendations are an algorithm, so everything is proprietary and highly secret, because if someone ever found the exact formulas they were using, they could instantly game it. If opaque algorithms continue to exist as a business model, we’re always gonna be chasing effects.
Maybe there needs to be a job called, like, Platform Editor, where someone works to not only stop manipulation but also works across the security team and the content team and in between the different business verticals to ensure the quality and integrity of the platform. That’s a lot of responsibility, but the kinds of things that I often see could literally be stopped by one person. I mean: 4chan trending on Google during the Las Vegas shooting? How that even happened, I have no idea, but I do know that one person could have stopped that. And I do know that a group of people working together — even if it involves deliberation, even if they don’t agree on one specific thing — can often solve problems that appear or are starting to surface because of automation. And I don’t mean, like, contract moderators from India — I mean high-level people. The companies need to invest in human capital as well as technological capital, but that doesn’t align with their business model. The rhetoric exists in their public statements, but we can clearly see that how it’s being implemented isn’t working.
It’s getting worse. Since the 2016 election, I’ve come to the realization — horribly, and it’s very depressing — that nothing has gotten better, despite all the rhetoric, all of the money, all of the PR, all of the research. Since nothing has really changed with the platforms, we can scream about Russia as the structure of our information decays around us. Our technological and communication infrastructure, the ways that we experience reality, the ways we get news, are literally disintegrating.
I think one of the biggest things that’s missing from political science research is that it usually doesn’t consider the amount of time that people spend online. Between the 2012 election and the 2016 election, smartphone use went up by more than 25 percent. Many people spend all of their waking time somehow connected.
This is where psychology really needs to come in. There’s been very little psychology work done looking at this from an engagement perspective, looking at the effect of seeing things in the News Feed but not clicking out. Very few people actually click out of Facebook. We really need social psychology, we really need humanities work to come in and pick up the really important pieces. What are the effects of someone seeing vile or conspiracy news headlines in their News Feed from their friends all day?