There are many myths and conjectures surrounding the Facebook News Feed algorithm. Until recently, few people knew how it actually works, even though the company had already explained how content personalization is done. Read on for the details.
Not only “likes” and “shares” are taken into account
The Facebook algorithm is an extremely complex, multi-branch ranking system based on machine learning (ML). The system has to show relevant and useful content every time a user visits the Facebook website or app, which means analyzing a huge amount of content. More than 2 billion people use Facebook, and for each of them the system considers thousands of posts that could potentially appear in the feed.
That adds up to trillions of posts and thousands of ranking signals that have to be weighed to understand what exactly a given user would like to see. When someone logs into Facebook, this whole process happens in the background, and the news feed loads in a matter of seconds.
Beyond “likes”, shares, saves, and other interactions, more and more factors have to be taken into account, such as clickbait and fake news, for which Facebook has to find separate solutions.
The news feed is not a single algorithm but a multi-level system. It relies on several machine learning models to determine the most relevant content. By estimating what is most likely to interest the user, the system filters out thousands of posts and narrows the pool of candidates to a few hundred. Those are the posts that appear in the news feed.
What is Juan interested in?
To understand how this all happens in practice, consider a specific example.
Let’s say a user, call him Juan, last logged into Facebook a day ago. Since then:
- his friend Wei posted a photo of his Cocker Spaniel;
- his friend Saanvi posted a video taken during her morning run;
- one of the pages Juan follows posted an article on how best to view the Milky Way at night;
- and in the group dedicated to cooking, four recipes for yeast dough appeared.
All of this content is likely to interest Juan because he follows the relevant pages and users.
To determine which content should rank higher in Juan’s news feed, the system needs to figure out what matters most to him. In mathematical terms, it has to define an objective function for Juan and perform a single-objective optimization.
To understand whether Juan will like a particular post, the system analyzes the post’s data: the date, the users tagged in a photo, the number of “likes”, and so on. For example, if Juan frequently comments on or shares Saanvi’s posts, and Saanvi recently posted a video of her run, there is a high chance that Juan will like her new post. If Juan has interacted more with video content than with photos in the past, he is less likely to like Wei’s photo of a Cocker Spaniel. In this case, the ranking algorithm will place the video of the run higher than the photo of the dog.
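As a rough illustration of such a prediction, here is a minimal sketch that turns a few signals into a like probability with a logistic function. The feature names, weights, and bias are hypothetical and chosen by hand; the source does not describe the actual model, and a real system would learn these values from data.

```python
import math

# Hypothetical feature weights; a real system would learn them from data.
WEIGHTS = {
    "author_affinity": 2.0,  # how often the viewer engages with this author (0..1)
    "format_affinity": 1.5,  # how often the viewer engages with this content type (0..1)
    "recency": 0.5,          # 1.0 for a brand-new post, decaying toward 0
}
BIAS = -2.0


def predict_like(features: dict) -> float:
    """Return an estimated probability (0..1) that the viewer likes the post."""
    z = BIAS + sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative inputs: Juan engages a lot with Saanvi and with video in general.
saanvi_video = {"author_affinity": 0.9, "format_affinity": 0.8, "recency": 0.7}
wei_photo = {"author_affinity": 0.4, "format_affinity": 0.2, "recency": 0.7}

p_video = predict_like(saanvi_video)
p_photo = predict_like(wei_photo)
print(f"video: {p_video:.2f}, photo: {p_photo:.2f}")
```

With these made-up numbers the video gets the higher like probability, so it would be ranked above the photo.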
But “likes” are not the only way people express their preferences. Every day, people share articles, watch videos on celebrity pages, or leave comments on friends’ posts. From a mathematical point of view, this complicates the task: the optimization now has to balance several objectives, each of which helps shape the list of relevant content for the feed.
Multiple ML models produce a large number of predictions for Juan: the likelihood that he will interact with Wei’s photo, Saanvi’s video, the article about the Milky Way, or the dough recipes. Each model produces its own ordering of the content for the user, and sometimes these orderings disagree.
For example, Juan might be more likely to like Saanvi’s running video than the article about the Milky Way, yet more likely to comment on the article than on the video. The individual predictions therefore have to be combined into one overall ranking optimized for the ultimate goal: showing the user meaningful and relevant content.
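One common way to merge several per-action predictions is a weighted sum. The sketch below assumes that approach; the probabilities and the action weights are invented for illustration and are not values published by Facebook.

```python
# Hypothetical per-action probabilities for Juan, one value per separate model.
predictions = {
    "saanvi_video": {"like": 0.60, "comment": 0.05, "share": 0.02},
    "milky_way_article": {"like": 0.30, "comment": 0.20, "share": 0.10},
}

# Hypothetical weights: how much each action counts toward a "meaningful" interaction.
ACTION_WEIGHTS = {"like": 1.0, "comment": 4.0, "share": 6.0}


def combined_score(action_probs: dict) -> float:
    """Collapse several action probabilities into one ranking score."""
    return sum(ACTION_WEIGHTS[action] * p for action, p in action_probs.items())


# Sort post ids by combined score, highest first.
ranking = sorted(
    predictions,
    key=lambda post_id: combined_score(predictions[post_id]),
    reverse=True,
)
print(ranking)
```

Here the video wins on likes alone, but the article comes out first overall because comments and shares are weighted more heavily. Changing the weights changes the outcome, which is why the choice of weights matters as much as the predictions themselves.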
Audience feedback is also taken into account when building the feed: Facebook regularly runs surveys. Users are asked how meaningful they find interactions with friends’ content and whether a given post was worth their time.
An efficient mechanism is needed to rank more than a thousand posts per user per day, in real time, for each of more than 2 billion users. The task is carried out in several stages, designed to keep it fast and to limit the computational resources required.
First, the system collects all candidate posts for Juan’s feed: the photo of the Cocker Spaniel, the video of the run, and so on. The list of candidates includes any posts that friends, groups, and pages have shared with Juan since he last opened the Facebook app or website.
But what about posts that were published before his previous visit and that Juan never saw? Such posts may still appear in the current feed if they are relevant to his interests. The feed logic also takes friends’ activity into account: posts that Juan has already seen, but that have since sparked an active discussion, may end up in the feed again.
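The collection rules above can be sketched as a simple filter. This is a minimal illustration, not Facebook's implementation: the `Post` fields and the integer timestamps are hypothetical, and the relevance check for older unseen posts is left to the later scoring stages.

```python
from dataclasses import dataclass


@dataclass
class Post:
    post_id: str
    created_at: int        # publication time, arbitrary integer units
    last_activity_at: int  # time of the latest comment or reaction
    seen: bool             # whether the viewer has already seen the post


def collect_candidates(posts, last_visit):
    """Keep new posts, older unseen posts, and seen posts with fresh activity."""
    candidates = []
    for post in posts:
        is_new = post.created_at > last_visit
        unseen_older = not post.seen and post.created_at <= last_visit
        revived = post.seen and post.last_activity_at > last_visit
        if is_new or unseen_older or revived:
            candidates.append(post)
    return candidates


posts = [
    Post("new_photo", created_at=110, last_activity_at=110, seen=False),
    Post("missed_article", created_at=80, last_activity_at=85, seen=False),
    Post("hot_discussion", created_at=70, last_activity_at=120, seen=True),
    Post("old_and_quiet", created_at=60, last_activity_at=65, seen=True),
]
candidate_ids = [p.post_id for p in collect_candidates(posts, last_visit=100)]
print(candidate_ids)
```

Only the post that Juan has already seen and that attracted no new activity is left out of the candidate list.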
The system then evaluates each post according to a number of criteria:
- content type;
- similarity to other posts;
- how closely it matches what Juan usually interacts with.
In order to calculate all this for two billion people in real time, ML models are run in parallel on several predictor machines.
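The idea of running several models side by side can be sketched with a thread pool. In this toy version the threads stand in for separate predictor machines, and the two model functions are hypothetical placeholders that return fixed probabilities.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for separate ML models; each returns a probability.
def like_model(post):
    return 0.6 if post["type"] == "video" else 0.3


def comment_model(post):
    return 0.2 if post["type"] == "article" else 0.05


MODELS = {"like": like_model, "comment": comment_model}


def predict_all(posts):
    """Run every model on every post concurrently and group results by post."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {
            (post["id"], name): pool.submit(model, post)
            for post in posts
            for name, model in MODELS.items()
        }
        results = {post["id"]: {} for post in posts}
        for (post_id, name), future in futures.items():
            results[post_id][name] = future.result()
    return results


posts = [
    {"id": "saanvi_video", "type": "video"},
    {"id": "milky_way_article", "type": "article"},
]
predictions = predict_all(posts)
print(predictions)
```

Each (post, model) pair is an independent task, so the work can be spread over as many workers as are available and the results gathered afterwards.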
But before all predictions are combined into a single score, additional rules apply. The system waits for the first predictions and then narrows down the list of candidate posts. This is done in several passes to save computing resources.
- First, the social network applies certain integrity processes to each post to determine which integrity detection measures, if any, need to be applied to it.
- In the next pass, a lightweight model narrows the list down to around 500 of the most relevant posts for Juan. Ranking fewer posts makes it possible to use more powerful neural network models in the later passes.
- Then comes the main scoring pass, where most of the personalization takes place. An individual score is calculated for each post, and each of the 500 posts gets its place in the list. For some users, “likes” may carry more weight than comments, because that is how they prefer to express appreciation. Actions a user rarely performs (for example, a user who almost never leaves “likes”) play a minimal role in that user’s ranking.
- Rounding out the calculations is the contextual pass, where the system takes into account characteristics such as the variety of content types. This is why videos in Juan’s feed do not appear one after another.
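The passes above can be sketched as a small funnel. Everything here is an assumption made for illustration: the preliminary per-post check is reduced to a hypothetical `flagged` field that simply excludes a post, the two scoring functions are hand-written placeholders rather than trained models, and the diversity step is a simple greedy reorder.

```python
def cheap_score(post):
    """Lightweight model stand-in used to cut the pool down quickly."""
    return post["affinity"]


def full_score(post):
    """Heavier model stand-in, applied only to the shortlist."""
    return 0.7 * post["affinity"] + 0.3 * post["predicted_engagement"]


def diversify(ranked):
    """Greedy reorder: avoid two consecutive posts of the same type if possible."""
    remaining = list(ranked)
    result = []
    while remaining:
        pick = next(
            (p for p in remaining if not result or p["type"] != result[-1]["type"]),
            remaining[0],  # fall back to the best remaining post
        )
        remaining.remove(pick)
        result.append(pick)
    return result


def build_feed(candidates, shortlist_size=500):
    eligible = [p for p in candidates if not p.get("flagged", False)]           # pass 1
    shortlist = sorted(eligible, key=cheap_score, reverse=True)[:shortlist_size]  # pass 2
    ranked = sorted(shortlist, key=full_score, reverse=True)                    # pass 3
    return diversify(ranked)                                                    # pass 4


candidates = [
    {"id": "v1", "type": "video", "affinity": 0.9, "predicted_engagement": 0.8},
    {"id": "v2", "type": "video", "affinity": 0.8, "predicted_engagement": 0.7},
    {"id": "p1", "type": "photo", "affinity": 0.5, "predicted_engagement": 0.6},
    {"id": "a1", "type": "article", "affinity": 0.4, "predicted_engagement": 0.9},
    {"id": "x1", "type": "video", "affinity": 0.95, "predicted_engagement": 0.9, "flagged": True},
]
feed_ids = [p["id"] for p in build_feed(candidates, shortlist_size=4)]
print(feed_ids)
```

On this toy input the main scoring pass would put the two videos first, and the final pass moves the article between them. The expensive scoring function only ever sees the shortlist, which is the point of doing the work in passes.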
All of these complex calculations happen while the Facebook app is opening. Within a few seconds, people get a ready-made feed that is interesting to scroll through.
The Facebook News Feed algorithm is a multi-level, multi-branch ranking system based on machine learning.
The system works in several stages:
- Collects all candidate posts for the user’s feed (taking into account the activity of their friends and the pages and groups they follow).
- Narrows the list down to around 500 of the most relevant posts based on its own predictions.
- Then personalizes this list as much as possible. That is, it orders the 500 posts by score, assigning “interest points” to each post based on the user’s past behavior (what they “liked”, which posts they shared, and so on).
- Adds an element of diversity so that posts of the same type do not appear one after another.
All this happens in a matter of seconds while the Facebook news feed loads. The social network is used by 2 billion people around the world, that is, we are talking about ranking trillions of posts every day.