In the past year, Twitter and other social media channels have come under fire for the personalized experiences they provide to users. Social media feeds currently employ algorithms to surface content that users may want to see, inherently creating self-perpetuating echo chambers through filter bubbles.
According to Internet Live Stats, on Twitter alone 6,000 posts are published every second on average. This translates to roughly 500 million tweets per day, a number that’s projected to grow by 30% year over year.
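The daily figure follows directly from the per-second rate. As a quick check, the arithmetic can be sketched as below; the variable names are illustrative, and the one-year projection simply applies the quoted 30% growth rate once, which is an assumption about how that projection would be computed.

```python
# Back-of-the-envelope check of the tweet volume quoted above.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

tweets_per_second = 6_000  # average rate cited from Internet Live Stats
tweets_per_day = tweets_per_second * SECONDS_PER_DAY

# Apply the quoted 30% year-over-year growth once (integer math avoids
# floating-point rounding noise).
projected_next_year = tweets_per_day * 130 // 100

print(f"{tweets_per_day:,} tweets per day")
print(f"{projected_next_year:,} tweets per day after one year of 30% growth")
```

The exact product is a little over 518 million, which the article rounds to roughly 500 million.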
With such a massive influx of live opinions, images, videos, facts and fiction, how can reporters and avid consumers cut through the chatter to find the small percentage of tweets that are actually breaking news? The answer is machine learning.
How can machine learning verify news?
Today, Reuters journalists use Reuters News Tracer, our proprietary algorithm that employs over 700 signals to determine whether trending topics are newsworthy and truthful. The social media listening tool was taught by our journalists to ask key questions, consult historical data, and weigh relevance just like a human would – but within 40 milliseconds.
Who posted this tweet? Is the account verified? Does the tweet contain links and images? What is the tone of the tweet – is it more opinion- or fact-based? Where was this poster located? Is it an ad? What common topic threads connect this tweet to others? Are others confirming this information?
Tracer then takes the tweets themselves and, through natural language processing, generates a short summary for the event cluster alongside other helpful indicators.
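One way to picture how questions like these become machine-readable signals is a simple weighted score. The sketch below is purely illustrative and is not Tracer's actual method: the `TweetSignals` fields, the weights, and the `credibility_score` function are all hypothetical, and a real system with over 700 signals would learn its weights from labelled data rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass
class TweetSignals:
    """A handful of hypothetical signals, one per question in the text."""
    account_verified: bool
    has_links: bool
    has_images: bool
    is_ad: bool
    opinion_ratio: float    # 0.0 = purely factual tone, 1.0 = purely opinion
    confirming_tweets: int  # other tweets reporting the same information


def credibility_score(s: TweetSignals) -> float:
    """Combine the signals into a score in [0, 1]. Weights are made up."""
    if s.is_ad:
        return 0.0  # ads are treated as non-news in this toy model

    opinion = min(max(s.opinion_ratio, 0.0), 1.0)       # clamp to [0, 1]
    confirmation = min(max(s.confirming_tweets, 0), 5) / 5  # saturate at 5

    score = 0.0
    score += 0.25 if s.account_verified else 0.0
    score += 0.10 if s.has_links else 0.0
    score += 0.10 if s.has_images else 0.0
    score += 0.25 * (1.0 - opinion)
    score += 0.30 * confirmation
    return round(score, 3)


# Example: an unverified account posting a factual-sounding tweet with a
# photo, confirmed by three other tweets.
example = TweetSignals(
    account_verified=False,
    has_links=False,
    has_images=True,
    is_ad=False,
    opinion_ratio=0.2,
    confirming_tweets=3,
)
print(credibility_score(example))
```

Saturating the confirmation count at five mirrors the idea that a small number of independent confirmations already carries most of the information; the cutoff itself is an arbitrary choice for this sketch.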
Reuters News Tracer live demo
In the video below, Dr. Khalid Al-Kofahi, our VP of R&D here at Thomson Reuters, and Tom Reilly, CEO of Cloudera, walk through a live demo of Reuters News Tracer at the Strata + Hadoop World conference.
Tom Reilly: To prove that this application is not fake news here on the stage, we’re going to do a live demo so you can see it. I think it’s a good time for us to go see the app.
Khalid Al-Kofahi: All right. So this is the news, and forgive me if sometimes some pictures, especially around crises and casualties, may show up. So if it shows up, forgive us, guys. We don’t control the news.
Tom Reilly: This is real.
Khalid Al-Kofahi: So what you see here are four channels. A user can define their own channels. So here I’ve predefined four channels. The first one, we call it the crises channel. There are no key terms; it’s just based on the newest, in the past 24 hours. I set the top to be loose on the facts, so Tracer will show me things that look like rumors, things that are not necessarily accurate, etc.
Underneath it, you see some of the events that we are capturing.
The next one is trending, and you see the live updates count appear because I am hovering here on top of this cluster. The updates in the channel have paused so that I can see the cluster I’m selecting. This cluster contains 7,968 tweets. If you click on it, it’s about the shooting that happened in a French school.
Tom Reilly: That just happened this morning.
Khalid Al-Kofahi: Yes, that just happened this morning, so this is live. You can see the concepts and the topics that we algorithmically extracted from the tweet cluster above, related concepts, etc. The star icon at the upper right of the tweet means that it’s newsworthy. The five green circles at lower left mean that we believe the story is true.
We defined a couple more channels: crises, bomb blasts, etc. We defined a channel for Trump taxes. So if you click on the actual cluster, you see –
Tom Reilly: I actually built this channel this morning with Khalid.
Khalid Al-Kofahi: Yes, you did. So this is running live. So to answer the original question: “How accurate is Reuters News Tracer in distinguishing between fake news, rumors, and actual news?” Well, for news stories we are about 84% accurate. On tweets, even on events that have at least 5 tweets – not 500, just 5 – we can distinguish between rumors, assertions and facts with 78% accuracy.
Tom Reilly: Khalid, who are the users of this? Who sits in front of this screen?
Khalid Al-Kofahi: So this is meant – initially was meant to scale our journalists. They can’t monitor news live from the field. They can’t set up keywords; it’s just too noisy. So we have journalists who set up these channels, and they monitor them for the newsworthy –
Tom Reilly: So a journalist in Lebanon who has interest about local things can say, “These are the topics and these are the information I trust,” and build their own channel.
Khalid Al-Kofahi: Absolutely, and then monitor it live. And everything here is indexed and searchable and all kinds of analytics on top of it that you can imagine.
In addition to journalists, this is running live with our financial customers. We define channels that have the potential to move the market or are relevant to our financial customers, and they consume it live without any human intervention, with a veracity score to tell them maybe it’s not completely true, maybe it’s true and so forth.
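The channels described in the demo combine optional key terms, a recency window, and a tolerance for unverified stories. A minimal sketch of that idea follows. It assumes a hypothetical data model: the `Cluster` and `Channel` classes, the 0-to-1 `veracity` scale, and the sample clusters are all invented for illustration and do not reflect Tracer's actual interface or data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Cluster:
    """A group of tweets about one event (hypothetical structure)."""
    summary: str
    tweet_count: int
    veracity: float        # assumed scale: 0.0 (likely false) to 1.0 (likely true)
    last_update: datetime


@dataclass
class Channel:
    """A user-defined filter over event clusters (hypothetical structure)."""
    name: str
    key_terms: tuple = ()                      # empty means "match everything"
    max_age: timedelta = timedelta(hours=24)   # recency window
    min_veracity: float = 0.0                  # 0.0 = "loose on the facts"

    def accepts(self, cluster: Cluster, now: datetime) -> bool:
        if now - cluster.last_update > self.max_age:
            return False
        if cluster.veracity < self.min_veracity:
            return False
        if not self.key_terms:
            return True
        text = cluster.summary.lower()
        return any(term.lower() in text for term in self.key_terms)


def select(channel: Channel, clusters: list, now: datetime) -> list:
    """Return the clusters a channel would display."""
    return [c for c in clusters if channel.accepts(c, now)]


# Made-up sample data, for illustration only.
now = datetime(2017, 1, 1, 12, 0)
clusters = [
    Cluster("Example event A confirmed by officials", 800, 0.9, now - timedelta(hours=1)),
    Cluster("Unconfirmed rumor about a company merger", 12, 0.3, now - timedelta(hours=2)),
    Cluster("Older story about example event B", 400, 0.95, now - timedelta(days=3)),
]

loose = Channel("crises")                          # no key terms, last 24 hours
strict = Channel("verified", min_veracity=0.8)     # only high-veracity clusters
mergers = Channel("mergers", key_terms=("merger",))

for channel in (loose, strict, mergers):
    print(channel.name, [c.summary for c in select(channel, clusters, now)])
```

A consumer that acts without human review, like the financial use case mentioned above, would plausibly set `min_veracity` high, while a journalist hunting for early leads might leave it at zero and judge the rumors personally.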
Human intelligence paired with machine learning
Reuters News Tracer takes the wealth of citizen journalism that Twitter provides and gives it structure, giving journalists a head start in the actions that make for real news: interviewing key witnesses, asking novel questions, and connecting the dots between disparate platforms and sources.
Visit Innovation @ ThomsonReuters.com for more on how we bring together smart data and human expertise to find trusted answers.