Belgium’s top news supplier Belga News Agency launched its digital news platform Belgapress, giving clients personalised, real-time access to the latest news.
Goal of the project
Belga wanted to automatically tag articles with hashtags such as #brexit to enable journalists and communication professionals to find all articles related to that topic.
- Every day up to 100k new articles are processed by our AI and receive hashtags fully automatically.
- 95% of the end-users are happy with the generated hashtags.
Before this project, journalists manually added Belga hashtags to each Belga-produced article (e.g., #brexit) so that Belga stories around the same topic are linked. We used these expert-annotated articles to train a Machine Learning model that can automatically suggest hashtags to apply to new articles.
As soon as a new hashtag is created, our model can propose it for new articles from other outlets that concern the same story. The AI is integrated in Belga’s app Belgapress and reviews on average 40k articles every day to automatically add the hashtags that are relevant for those stories.
The challenge
In 2018, Tom Wuytack, CIO at Belga, approached Radix. Before the launch of Belgapress, Belga used an app called Gopress, a media monitoring platform that journalists were using to find news about a certain topic. To monitor and develop a story, a journalist needs a complete overview of what’s already in the news. The first problem with the existing Gopress was that the app wasn’t optimised for real-time news gathering. Secondly, previous tagging solutions failed and a lot of effort went into manually tagging all stories.
The briefing
Tom was looking for a solution that would solve these issues. His vision was a real-time app, a single content collection where communication professionals could find all relevant news, personalised to their preferences.
Belga journalists can tag the articles they produce, but they are also interested in seeing articles from other sources, and in other languages too. With up to 100k pieces of news a day, this would be an incredible manual effort for the Belga team.
Tom asked us to help with the automation of the news tagging to achieve the goal of a real-time app with no manual effort required from the Belga team. Find out below how Radix approached the problem.
The work
Since the beginning of the project, journalists on the editorial floor manually added Belga hashtags to each Belga-produced article, for example, #brexit. They did this consistently for every story and used the same hashtags so that Belga stories around the same topic were linked. The collection of all hashtags and articles can be defined as a “training set” for machine learning algorithms.
From the moment a hashtag is created in Belga’s editorial system, new incoming articles from Belga and any other news outlet should be given the same hashtag if they concern the same story.
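As a rough illustration of this rule, the sketch below builds a word profile per hashtag from journalist-tagged articles and suggests a hashtag for a new article when its wording is similar enough. It is not Belga's or Radix's implementation: the `HashtagSuggester` class, the bag-of-words comparison, and the `threshold` value are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    """Count lowercase, whitespace-separated tokens (a deliberately crude tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count mappings."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class HashtagSuggester:
    """Hypothetical helper: one aggregated word profile per hashtag."""

    def __init__(self, threshold=0.3):
        # Assumed cut-off; a real system would tune this on held-out data.
        self.threshold = threshold
        self.profiles = defaultdict(Counter)

    def add_example(self, hashtag, text):
        """Register a journalist-tagged article; a new hashtag needs no retraining."""
        self.profiles[hashtag].update(bag_of_words(text))

    def suggest(self, text):
        """Return every hashtag whose profile is similar enough to the article."""
        words = bag_of_words(text)
        scored = ((tag, cosine(words, profile)) for tag, profile in self.profiles.items())
        return sorted(tag for tag, score in scored if score >= self.threshold)


suggester = HashtagSuggester()
suggester.add_example("#brexit", "uk parliament votes on brexit deal with eu")
suggester.add_example("#worldcup", "belgium wins world cup football match")
print(suggester.suggest("eu leaders discuss brexit deal"))
```

In this design a hashtag becomes available for suggestion as soon as its first tagged article is registered, which mirrors the behaviour described above. Word counts only match identical words in one language, though, so this sketch would not work across languages.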
The combination of the Belga journalists tagging their articles and our machine learning algorithm powers the Belgapress app experience.
The technical details
How can a machine know which articles are about the same or a similar topic, and tag them accordingly? For this project, we used Natural Language Processing (NLP), a branch of machine learning that enables machines to read and interpret human language.
To make a machine understand the words in each article, we transform words into “numbers”. We call this a “numerical representation” of the article. Each “number” is a coordinate in what is called a “vector space”, where words that are related to each other lie close together. If two words are close in the vector space, we know that they relate to the same or a similar topic.
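To make the idea of closeness concrete, here is a minimal sketch that measures how close two words are using cosine similarity, a common choice for comparing word vectors. The three-dimensional vectors are invented for illustration; real word vectors are learned from large amounts of text and have far more dimensions, and the source does not say which similarity measure the project used.

```python
import math

# Toy word vectors, made up for this example only.
WORD_VECTORS = {
    "brexit": [0.9, 0.8, 0.1],
    "referendum": [0.8, 0.9, 0.2],
    "football": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_words(term, vectors=WORD_VECTORS):
    """Rank all other words by similarity to `term`, most similar first."""
    target = vectors[term]
    scored = [
        (word, cosine_similarity(target, vec))
        for word, vec in vectors.items()
        if word != term
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for word, score in nearest_words("brexit"):
    print(f"{word}: {score:.2f}")
```

With these toy values, “referendum” ranks well above “football” as a neighbour of “brexit”, which is the behaviour the vector space is meant to capture.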
The GIF below shows the vector space for Brexit. The closer a word is to the main term, the more closely it relates to the topic. In that way, the machine knows that articles containing those words are related to the main Belga hashtag #brexit and can be tagged accordingly.
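One simple way to go from word closeness to an article-level decision is to average the vectors of the words in an article and compare the result with the vector of the hashtag's main term. The sketch below assumes exactly that; the two-dimensional `VECTORS`, the `is_related` helper, and the `threshold` of 0.8 are hypothetical and not taken from the Belgapress system.

```python
import math

# Hypothetical 2-dimensional word vectors, invented for this example.
VECTORS = {
    "brexit": [1.0, 0.1],
    "referendum": [0.9, 0.2],
    "customs": [0.8, 0.3],
    "goal": [0.1, 1.0],
    "striker": [0.2, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def article_vector(text):
    """Average the vectors of all known words; None if no word is known."""
    known = [VECTORS[w] for w in text.lower().split() if w in VECTORS]
    if not known:
        return None
    dims = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dims)]


def is_related(text, hashtag_term, threshold=0.8):
    """Decide whether an article is close enough to the hashtag's main term."""
    vec = article_vector(text)
    if vec is None:
        return False
    return cosine(vec, VECTORS[hashtag_term]) >= threshold


print(is_related("referendum customs talks", "brexit"))
print(is_related("striker scores goal", "brexit"))
```

Averaging discards word order and treats every known word equally, so it is only a starting point; it does show how a single similarity threshold can turn the vector space into a tagging decision.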
95% of app users are happy with the results. The biggest impact is that the solution is fully automated, so no human annotation is needed. On a typical day, about 10% of the roughly 40k articles sent to the app receive a hashtag.
The future
The collaboration of Belga and Radix created a real impact on the journalists’ experience in Belgapress. This was made possible by finding the right AI techniques and leveraging them to solve the real-time and personalisation challenges. The model keeps getting better by learning from new data and is integrated in Belgapress. The positive contribution of AI and machine learning to Belga’s workflow has also opened the door for additional innovations and use cases from Radix at the news agency.
Before engaging with Radix, we tried to implement the solution with named entities, but we realised it was too rigid and manual, and not cross-lingual. What I really enjoy is to see the machine learning model work, giving us less maintenance.