Google News at 10: How the Algorithm Won Over the News Industry

In April of 2010, Eric Schmidt delivered the keynote address at the conference of the American Society of News Editors in Washington, D.C. During the talk, the then-CEO of Google went out of his way to articulate — and then reiterate — his conviction that “the survival of high-quality journalism” was “essential to the functioning of modern democracy.”

This was a strange thing. This was the leader of the most powerful company in the world, informing a roomful of professionals how earnestly he would prefer that their profession not die. And yet the speech itself — I attended it — felt oddly appropriate in its strangeness. Particularly in light of surrounding events, which would find Bob Woodward accusing Google of killing newspapers. And Les Hinton, then the publisher of the Wall Street Journal, referring to Google’s news aggregation service as a “digital vampire.” Which would mesh well, of course, with the similarly vampiric accusations that would come from Hinton’s boss, Rupert Murdoch — accusations addressed not just toward Google News, but toward Google as a media platform. A platform that was, Murdoch declared in January 2012, the “piracy leader.”

What a difference nine months make. Earlier this week, Murdoch’s 20th Century Fox got into business, officially, with Captain Google, cutting a deal to sell and rent the studio’s movies and TV shows through YouTube and Google Play. It’s hard not to see Murdoch’s grudging acceptance of Google as symbolic of a broader transition: producers’ own grudging acceptance of a media environment in which they are no longer the primary distributors of their own work. This week’s Pax Murdochiana suggests an ecosystem that will find producers and amplifiers working collaboratively, rather than competitively. And working, intentionally or not, toward the earnest end that Schmidt expressed two years ago: “the survival of high-quality journalism.”

“100,000 Business Opportunities” 

There is, on the one hand, an incredibly simple explanation for the shift in news organizations’ attitude toward Google: clicks. Google News was founded 10 years ago — September 22, 2002 — and has since functioned not merely as an aggregator of news, but also as a source of traffic to news sites. Google News, its executives tell me, now “algorithmically harvests” articles from more than 50,000 news sources across 72 editions and 30 languages. And Google News-powered results, Google says, are viewed by about 1 billion unique users a week. (Yep, that’s billion with a b.) Which translates, for news outlets overall, to more than 4 billion clicks each month: 1 billion from Google News itself and an additional 3 billion from web search.

As a Google representative put it, “That’s about 100,000 business opportunities we provide publishers every minute.”
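
As a rough check on that figure, the arithmetic can be sketched in a few lines. The 4 billion monthly clicks come from the numbers cited above; the 30-day month is an assumption made here for simplicity, not something Google specified.

```python
# Rough check of the "100,000 business opportunities every minute" figure.
# CLICKS_PER_MONTH is the figure cited in the article; DAYS_PER_MONTH is an
# assumption made for this sketch.
CLICKS_PER_MONTH = 4_000_000_000
DAYS_PER_MONTH = 30  # assumed average month length


def clicks_per_minute(clicks_per_month: int, days_per_month: int = DAYS_PER_MONTH) -> float:
    """Convert a monthly click total into an average per-minute rate."""
    minutes_per_month = days_per_month * 24 * 60
    return clicks_per_month / minutes_per_month


if __name__ == "__main__":
    print(round(clicks_per_minute(CLICKS_PER_MONTH)))
```

Under that assumption the rate works out to a little under 93,000 clicks a minute, which squares with the representative's "about 100,000."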

Google emphasizes numbers like these not just because they are fairly staggering in the context of a numbers-challenged news industry, but also because they help the company to make its case to that industry. (For more on this, see James Fallows’s masterful piece from the June 2010 issue of The Atlantic.) Talking to Google News executives and team members myself in 2010 — the height of the industry’s aggregatory backlash — I often got a sense of veiled frustration. And of just a bit of bafflement. When you believe that you’re working to amplify the impact of good journalism, it can be strange to find yourself publicly resented by journalists. It can be even stranger to find yourself referred to as a vampire. Or a pirate. Or whatever.

And that was particularly true given that, as an argument to news publishers, Google News’s claim for itself can be distilled to this: We bring you traffic. Which brings you money. Which is hard to argue with. As an addendum to this line of logic, Google staffers will often mention the fact that participation in Google News is voluntary; publishers who don’t want their content crawled by Google’s bot can simply append a short line of code to make themselves invisible. Staffers will mention, as well, the fact that Google News has been and remains headline-focused — meaning that its design itself encourages users to follow its links to news publishers’ sites. This is not aggregation proving its worth in an attention economy, those staffers suggest. It is aggregation proving its worth in a market economy. Google News, founder Krishna Bharat told me, is fundamentally “a gateway — a pathway — to information elsewhere.”
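
The staffers do not name the "short line of code" here. One standard mechanism for this kind of opt-out is a robots.txt rule, and the sketch below shows how such a rule might behave, using Python's built-in robots.txt parser. The `Googlebot-News` crawler token, the `example.com` address, and the `is_allowed` helper are assumptions chosen for illustration, not details drawn from the reporting.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a publisher that opts out of news crawling
# while staying open to every other crawler. The "Googlebot-News" token is
# an assumption made for this sketch.
ROBOTS_TXT = """\
User-agent: Googlebot-News
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given crawler may fetch the URL under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://example.com/news/story.html"
    print(is_allowed("Googlebot-News", story))  # the news crawler is blocked
    print(is_allowed("SomeOtherBot", story))    # other crawlers are allowed
```

The point of the design is that the opt-out sits with the publisher: two lines in a file on the publisher's own server, rather than a request that has to be filed with Google.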

Publishers, as familiar with their referral numbers as Google is, are coming around to that view. In fact, Murdoch’s transition suggests, they have pretty much finished the coming around. In the broad sense of the long game, Google News is very much a product of its parent company: The service saw where things were going. It built tools that reflected that direction. And then it waited, patiently, for everyone else to catch up.

Concession Stands 

As far as the Google/news relationship goes, though, numbers are only half the story. Google has reiterated its stats — did we mention billions, with a b? — to, yes, pretty much anyone who will listen. But it has also tackled its industry publicity problem more strategically, in a way that even more explicitly emphasizes the “Google” component of “Google News”: It has ingratiated itself to the news industry iteratively, experimentally, and incrementally.

Google added to its team of engineers staff members with backgrounds in journalism, people whose jobs were to interact (or, in Google-ese, to “interface”) with news producers. It experimented with new ways of processing and presenting journalism, among them Fast Flip and Living Stories, and framed them as tools that could help journalists to better do their jobs. It introduced sitemaps meant to give publishers greater control over how their articles get included on the Google News homepage. Responding to outlets’ frustrations that their original work was getting lost among the work of aggregators, Google created a new tag that publishers could use to flag standout stories for Google News’s crawlers. Responding to a new cultural emphasis on the role of individual writers, Google integrated authors’ social profiles into their displayed bylines. And, nodding to a news industry that values curation, it implemented Editors’ Picks, which allows news organizations themselves, independently of the Google News algorithm, to curate content to be displayed on the Google News homepage. (The Atlantic is included in the Editors’ Picks feature.)

All those developments, on some level, have been concessions to an indignant industry. Which is also to say, they have been concessions to an industry that is not populated by engineers. When Google News launched in 2002, it’s worth remembering, it did so with the following, delightfully Google-y declaration: “This page was generated entirely by computer algorithms without human editors. No humans were harmed or even used in the creation of this page.” Since then, as news publishers have emphasized to Google how human a process news production actually is, the company’s news platform has — carefully, incrementally, strategically — found ways to balance its core algorithmic approach with more human concerns.

There have been the product-level innovations. There have been the public declarations. (Schmidt, in addition to his pro-journalism speeches, wrote op-eds re-professing his love of the news. Bharat spent a year as a professional-in-residence at Columbia’s Graduate School of Journalism.) But, less obviously and less visibly, there has also been the infrastructural effort Google has put into making the news industry a colleague rather than a competitor. “There’s a reporting aspect to it,” says David Smydra, Google News’s manager of strategic partnerships and himself a former reporter. Google tries to figure out what would help news producers to produce better content, he told me, and responds accordingly. With that in mind, Google News staffers have made themselves a friendly and patient and constant presence at journalism conferences and industry events. They have offered tutorials on making use of Google News and other Google tools. They have written explainers on becoming a Google News source in the first place. They have visited individual newsrooms to meet with publishers and other news producers, listening to their concerns and imagining innovations that might prove useful to outlets as well as users. They have reiterated, in ways both subtle and explicit, their good intentions. If Google News is a vampire, it is an incredibly perky one.

Harvesting the News

Part of Google’s pitch to news organizations, though, has also been a pitch to users. And it has made its case to both groups at the same time, through the same vehicle. When it came to the Google News homepage — its primary user interface — Google iterated. It updated. It redesigned. It introduced real-time coverage of breaking news events. It introduced geo-targeted local news. It introduced customization. It integrated a social layer. It introduced video results. It introduced expandable stories. It emphasized contextual news consumption, presenting Wikipedia links along with the top news stories it displayed for its users.

And it has been, all along, tweaking — and tweaking, and tweaking — its algorithm. While Google News is notoriously reticent about the particular elements included in its algorithm, some of its general signals, engineers have said, include: the commonality of a particular story arc; the site an individual story appears on; and a story’s freshness, location, and relevance. The algorithm’s main point, a representative told me, is to sort stories “without regard to political viewpoint or ideology” — and to allow users to choose among “a wide variety of perspectives on any given story.”
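
To make those general signals concrete, here is a toy scoring sketch. It is emphatically not Google's algorithm, which the company does not disclose: the field names, the weights, and the freshness decay are all hypothetical, chosen only to show how signals like commonality, source, freshness, location, and relevance could be folded into a single ranking.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    cluster_size: int    # outlets covering the same story (commonality)
    source_score: float  # 0..1, standing of the publishing site
    age_hours: float     # time since publication (freshness)
    is_local: bool       # matches the reader's location
    relevance: float     # 0..1, match to the reader's interests


# Hypothetical weights; nothing here reflects a real ranking formula.
WEIGHTS = {
    "commonality": 0.25,
    "source": 0.25,
    "freshness": 0.25,
    "location": 0.10,
    "relevance": 0.15,
}


def score(article: Article, weights: dict = WEIGHTS) -> float:
    """Combine the signals into one number; higher means ranked earlier."""
    commonality = min(article.cluster_size / 100, 1.0)
    freshness = 1.0 / (1.0 + article.age_hours / 24)  # decays as a story ages
    location = 1.0 if article.is_local else 0.0
    return (
        weights["commonality"] * commonality
        + weights["source"] * article.source_score
        + weights["freshness"] * freshness
        + weights["location"] * location
        + weights["relevance"] * article.relevance
    )


def rank(articles: list) -> list:
    """Return the articles sorted from highest score to lowest."""
    return sorted(articles, key=score, reverse=True)
```

Even in a toy like this, every weight is an editorial judgment in disguise: raise the freshness weight and the authoritative three-day-old analysis sinks.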

Achieving all this through an algorithm is, of course, approximately one thousand percent more complicated than it sounds. For one thing, there’s the tricky balance of temporality and authority. How do you deal, for example, with a piece of news analysis that is incredibly authoritative about a particular story without being, in the algorithmic sense, “fresh”? How do you balance personal relevance with universal? How do you determine what counts as a “news site” in the first place? How do you account for irony and cheekiness in a headline? How do you accommodate news coverage’s increasing emphasis on the update as its own form of news narrative? Andre Rohe, Google News’s head of engineering, summed up the challenge: “How do I take a story that has 20,000 articles, potentially, and showcase all of its variety and breadth to the user?”

And then there’s the Tiger Woods problem. During Woods’s cheating scandal, Rohe points out, many, many people were following the story of the golfer’s affairs and their aftermath. “They weren’t necessarily following that story because they were particularly interested in Tiger Woods,” Rohe notes; “they weren’t necessarily following that story because they were particularly interested in golf.” They were following the story, he says, because of its place within a different kind of news category: a “fall from grace.”

But, then: How do you quantify that category? How do you work one of the oldest narratives there is — the plummet, the pathos — into an algorithm? And how do you translate all that into the user experience — the content placed on a page?

“One has to be, actually, rather subtle when doing these things,” Rohe says.

And those subtleties, he and his colleagues say, are the things that will continue to challenge Google’s journalism arm as it moves into its teenage years. Google News, says Richard Gingras, its head of product, is the result of “continued evolution” — not just in terms of design improvement, but also in terms of the news system that underscores it. As journalism changes, Google News will change with it — strategically, yes, but also inevitably. And it will do so because of the thing that has been at the heart of Google’s journalism pitch from the beginning: We’re in this together. For Google News’s next phase, Gingras says, we can expect to see the “continued evolution of the algorithmic approach to address the changing ecosystem itself — in some ways subtle, and in other ways, going forward, likely more profound.”

WE STILL DON’T KNOW HOW GOOGLE NEWS WORKS

Alphabet executive chairman Eric Schmidt said during an interview this week that Google is trying to “engineer the systems” so that Russian state-owned media outlets RT and Sputnik stop dominating Google News results and making money via Google’s ad network Adsense. “We don’t want to ban the sites,” Schmidt said. “That’s not how we operate.”

How does Google operate when it comes to news sources? It’s true that Google rarely bans sites outright from its search engine at Google.com, which crawls the entire open web — sites only get kicked out if they are illegal or attempting to game the algorithm. But Google has always maintained more discretion over Google News, which is restricted to sites that “primarily offer timely reporting or analysis of recent events,” according to the company. As of this writing, Google was still surfacing Sputnik and RT in Google News.

Unlike Google’s main search engine, which will pick up any site on the web, sources have to be approved before they are included in Google News. The criteria for inclusion are broad and vague. “In general, you should write original content that’s clear and free of grammatical errors,” Google says in its guidelines for publishers. Other factors include original reporting, clear attribution with bylines and datelines, transparent author bios, honesty — “Sites included in Google News must not misrepresent, misstate, or conceal information about their owner or primary purpose” — and having an amount of content that exceeds the amount of advertising.

It’s not clear how stringently those guidelines are enforced. AllBusiness, which shows up in Google News results, doesn’t appear to have datelines on any of its articles. The Economist, which famously eschews bylines, is also included. In general, Google seems reluctant to remove any publisher that was once approved — which could be why Schmidt seemed more willing this week to adjust the platform’s entire algorithm than to kick two outlets out.

Google News launched in 2002. Creator Krishna Bharat, a longtime Google research scientist who headed the company’s news development for years before he left the search giant in 2015, found himself bouncing around the web after Sept. 11, getting coverage from different media sources. “It seemed fundamentally inefficient. That’s not the way the web was supposed to work,” he said at the time. “The web was supposed to have a link structure that helped you find content.” He conceived of a news aggregator that categorized related stories into “clusters” of coverage, going back 30 days, from thousands of approved sources. The first Internet Archive snapshot of Google News shows eclectic headlines from sources ranging from MTV to Singapore’s Straits Times. In 2003, Google put the number of news sources at around 4,500. In 2011, the company released a cache of news stories about Osama bin Laden, which researchers found to contain articles sourced from some 4,500 separate publishers ranging from USA Today to small local papers like the Bennington Banner. A Google spokesperson declined to provide an up-to-date list of Google News sources for this story.

That lack of transparency can create the type of credibility problems that Google now finds itself pushing back against. Google News still indexes the U.K. tabloid The Daily Mirror, which is notorious for glaring errors like depicting a traditional Russian pancake festival as a training camp for violent soccer hooligans, and bottom-tier content mills like Business2Community, which runs endless listicles about thinly sourced business topics, or Elite Daily, a lifestyle blog that’s been accused of copyright infringement, letting authors post pseudonymously with photos of models as profile pictures, and even posting under the name of a Gawker writer.

Google News’ prominence, though, is undeniable. In 2012, Google started saying that there were 50,000 sources in Google News. The Guardian noted in 2013 that while Google’s “crippled communication machine” had struggled to justify Google News’ benefits to the news media, its 72 editions in 30 languages were drawing six billion visitors per month in an era when The New York Times was attracting just 40 million visitors monthly.

That popularity is probably due to the site’s significant technical achievements, patent filings for which describe how the site evaluates the newsworthiness and originality of each source in deciding its rank — a news engine that applies Google’s aptitude for grading web content to the entirety of web publishing.

Google is now increasingly grappling with criticism around which sources pop up in its more curated products like Google News and the answer boxes that appear at the top of search results, which are called featured snippets. In another prominent example, Google’s “Top Stories” section, which serves a purpose much like Google News, showed conspiracy theories sourced to 4chan after the October mass shooting on the Las Vegas strip.

Kevin Carty, a researcher at the Open Markets Institute, said that Google’s enormous stature gives it a special responsibility to offer some form of transparency about how its algorithms work — especially since Google News depends on news outlets in order to exist as a useful tool.

“Google News and Google Search are interesting because they’re only possible and profitable because these other services and publications are providing things of such great value,” he said. “Google News would be nothing without CNN and the Washington Post and NPR.”

Carty, who favors the solution of regulating Google’s services like a public utility, worries that leaning on the search giant to stamp out misinformation on its platform will foster a narrative in which a corporate Big Brother can make closed-door algorithmic decisions that affect users around the world with no public oversight.

“This Google News thing illustrates this problem where a company like Google or FB has enough power to control a whole sector of trade,” he said. “If you have a problem, like election interference or fake news, these companies are being asked to behave as government.”

Schmidt seems to believe that Google must rework the algorithm so that RT and Sputnik are sort of naturally de-ranked rather than intervene manually. That is consistent with Google’s thinking; from the beginning, Google News was touted as being run entirely by “computer algorithm.” But this position is increasingly unconvincing as Google boots channels off YouTube and demonetizes publishers en masse. Google let RT and Sputnik into Google News. Why pretend it can’t kick them out?