People are more likely to click on the top search and recommendation spots simply because they’re at the top, not because of their relevance. If you order your search results with an ML model, they may eventually degrade in quality because of this self-reinforcing feedback loop. How can this problem be solved?
Whenever you present a list of things to a person, such as search results or recommendations, it’s rare that every item on the list gets a fair evaluation.
Ranked lists of items are all around us.
The cascade click model assumes that people scan the items in a list sequentially, from top to bottom, until they find a suitable one. That means items at the bottom have a lower chance of being examined at all, so they organically receive fewer clicks:
The higher in the list, the more clicks.
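The effect is easy to reproduce in a simulation (a minimal sketch; the relevance values and session count are made up): even when every item is equally relevant, the top positions collect the most clicks under the cascade model.

```python
import random

def cascade_clicks(relevance, n_sessions=10_000, seed=42):
    """Simulate the cascade click model: each user scans the list from the
    top and stops at the first item they find relevant enough to click."""
    rng = random.Random(seed)
    clicks = [0] * len(relevance)
    for _ in range(n_sessions):
        for pos, rel in enumerate(relevance):
            if rng.random() < rel:   # the item examined at `pos` attracts a click
                clicks[pos] += 1
                break                # cascade: the scan ends after one click
    return clicks

# Five items with *identical* relevance: the top positions still win.
clicks = cascade_clicks([0.3, 0.3, 0.3, 0.3, 0.3])
```

Each position only gets examined if all items above it were skipped, so the click counts decay geometrically down the list.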
Top items get more clicks just because of their position: this behavior is called position bias. However, position bias is not the only bias in item listings; there are other dangerous things to watch out for:
- Presentation bias: for example, due to a 3×3 grid layout, an item in position #4 (right under #1) may receive more clicks than item #3 in the corner.
- Model bias: when you train an ML model on historical data generated by the same model.
In practice, position bias is the strongest of these, and removing it during training can improve the reliability of your model.
We conducted a small crowd-sourced study of position bias. Taking movies from the RankLens dataset, we used the Google Keyword Planner tool to build a set of real queries people use to find each particular movie.
Using Google Keyword Planner to get real queries that people use to find movies.
With a range of movies and relevant real queries, we have a perfect dataset for search evaluation: all the items are known to a wide audience, and we know the correct labels in advance.
All major crowd-sourcing platforms, such as Amazon Mechanical Turk, Scale.com, and Toloka.ai, offer common templates for search relevance evaluation:
A typical search relevance evaluation template.
But there’s a neat trick in such templates that prevents you from shooting yourself in the foot with position bias: each item must be evaluated independently. Even if multiple items are present on the screen, their order is randomized. But does randomizing the order prevent people from clicking on the first results?
The raw data for the experiment is available at github.com/metarank/msrd, but the main observation is: people still click more on the first position, even for randomly ordered items!
More clicks on the first items, even with random ranking.
But how can you offset the effect that position has on the clicks an item receives? Whenever you measure the likelihood of an item being clicked, you’re looking at a combination of two independent variables:
- Position bias: the probability of clicking on a specific position in the list.
- Relevance: the importance of the item in the current context (e.g., a BM25 score obtained from Elasticsearch, or cosine similarity in recommendations).
In the MSRD dataset mentioned in the previous paragraph, it is difficult to distinguish the effect of position from the impact of BM25 relevance, because you only observe them combined:
When sorting by BM25, people prefer matching items.
For example, 18% of clicks happen on position #1. Is that just because the most relevant item is presented there? Would the same item get the same number of clicks in position #20?
The inverse propensity weighting approach assumes that the observed click probability on a position is simply a combination of two independent variables:
Is true relevance independent of position?
Then, if you estimate the probability (propensity) of a click on each position, you can weight all your relevance labels with it and recover true, unbiased relevance.
Weighting by propensity
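As a minimal illustration (the function and the propensity values are hypothetical), inverse propensity weighting divides each observed click label by the examination probability of the position where it occurred:

```python
def ipw_labels(clicks, positions, propensity):
    """Inverse propensity weighting: divide each observed click label by the
    probability that its position was examined at all. A click earned at a
    rarely-seen position is stronger evidence of relevance."""
    return [c / propensity[p] for c, p in zip(clicks, positions)]

# Hypothetical examination propensities (position 0 is always seen).
propensity = [1.0, 0.6, 0.4, 0.3]

# Two clicks: one at the top of the list, one at the very bottom.
weights = ipw_labels(clicks=[1, 1], positions=[0, 3], propensity=propensity)
```

The click at position #3 ends up with a weight of 1/0.3 ≈ 3.3, while the top-position click keeps weight 1.0: the model is told that the deep click was earned against the odds.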
But how can propensity be estimated in practice? The most common method is to introduce a slight shuffling of rankings, so that the same items in the same context (such as a search query) appear at different positions.
Estimating propensity with result shuffling.
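A minimal sketch of this estimation, assuming you have logged (position, clicked) pairs collected while the order was randomized: per-position CTR, normalized by the top position, gives the propensity curve.

```python
from collections import defaultdict

def estimate_propensity(impressions):
    """Estimate per-position examination propensity from a shuffled log of
    (position, clicked) pairs. Because the order was randomized, relevance
    is independent of position, so per-position CTR reflects pure position bias."""
    shown = defaultdict(int)
    clicks = defaultdict(int)
    for pos, clicked in impressions:
        shown[pos] += 1
        clicks[pos] += int(clicked)
    ctr = {pos: clicks[pos] / shown[pos] for pos in shown}
    return {pos: c / ctr[0] for pos, c in ctr.items()}  # normalize to position 0

# Toy shuffled log: position 0 has CTR 0.5, position 1 has CTR 0.25.
log = [(0, True), (0, False), (1, True), (1, False), (1, False), (1, False)]
propensity = estimate_propensity(log)
```

With real traffic you would aggregate millions of impressions per position; the toy log above only shows the mechanics.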
But adding extra shuffling will definitely lower your business metrics like CTR and conversion rate. Are there any less invasive alternatives that don’t involve shuffling?
Slide from the MICES’19 talk “Personalize search results in real time”: a 2.8% conversion drop when shuffling search results.
The position-aware learning (PAL) approach suggests asking your ML model to optimize for both ranking relevance and position impact at the same time:
- During training, you use the item's position as an input feature.
- During prediction, you replace it with a constant value.
Replacing bias factors with constants during inference
In other words, you let your ranking ML model learn how position affects relevance during training, but neutralize this effect during prediction: all items are presented at the same position at once.
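The two steps can be sketched end to end (a toy example with synthetic clicks and a tiny hand-rolled logistic regression, not Metarank's actual implementation): position is an ordinary feature during training and is frozen to a constant when scoring.

```python
import math
import random

def train_logreg(X, y, lr=0.1, epochs=30):
    """Tiny logistic regression trained with SGD (no external dependencies)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi
            for j in range(len(w)):
                w[j] -= lr * g * xi[j]
            b -= lr * g
    return w, b

def predict(w, b, x):
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic click log: P(click) = P(examined | position) * P(clicked | relevance)
rng = random.Random(0)
X, y = [], []
for _ in range(4000):
    rel = rng.random()                         # hidden true relevance in [0, 1]
    pos = rng.randrange(5)                     # position the item was shown at
    examined = rng.random() < 1.0 / (1 + pos)  # position bias: deeper = less seen
    clicked = examined and (rng.random() < rel)
    X.append([rel, pos])                       # PAL: position is a training feature
    y.append(1.0 if clicked else 0.0)

w, b = train_logreg(X, y)
# Inference: freeze the position feature at a constant, so every item
# is scored as if it were shown in the same slot.
scores = [predict(w, b, [rel, 0]) for rel in (0.1, 0.5, 0.9)]
```

The model absorbs the position effect into the position weight during training; fixing the position at inference time leaves only the relevance part of the score.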
But which constant value should you choose? The authors of the PAL paper performed several numerical experiments to select the optimal one; the rule of thumb is not to choose positions that are too high, as there is too much noise.
The authors of PAL experimented with different position constant values
The PAL approach is already part of several open-source search and recommendation tools:
- ToRecSys applies PAL as a bias elimination approach to train recommender systems on biased data.
- Metarank can use the PAL-based feature to train an unbiased LambdaMART Learn-to-Rank model.
Since the PAL approach is just a trick around feature engineering, using it is just a matter of adding another feature definition to Metarank:
Adding position as a ranking feature to the LambdaMART model.
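The feature definition could look roughly like the sketch below; the exact field names are an assumption on my part, so check the Metarank documentation for the current schema.

```yaml
# Hypothetical sketch of a Metarank feature definition: during training the
# extractor emits the item's real position; during inference it is replaced
# with the constant configured below.
- name: position
  type: position
  position: 5   # constant used at prediction time
```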
In the MSRD dataset mentioned above, such a PAL-inspired position feature has a fairly high SHAP importance compared to other ranking features:
The importance of position when training the LambdaMART model
The position-aware learning approach is not limited to pure ranking tasks and position debiasing; you can use the same trick to overcome any other type of bias:
- For presentation bias due to a grid layout, you can introduce a pair of features for the row and column position of the item in the grid, and switch them to constants during prediction.
- For model bias, when more frequently shown items get more clicks, you can introduce a "click count" training feature and replace it with a constant value at prediction time.
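In both cases the inference-time recipe is the same as for position; a minimal sketch (the feature names and constant values are made up):

```python
def freeze_bias_features(features, constants):
    """Replace bias-carrying features (grid row/column, click count, ...)
    with fixed constants before scoring, so every item is evaluated as if
    shown in the same slot with the same exposure history."""
    patched = dict(features)   # keep the original feature dict intact
    patched.update(constants)
    return patched

item = {"bm25": 12.3, "grid_row": 2, "grid_col": 1, "click_count": 431}
scored = freeze_bias_features(item, {"grid_row": 0, "grid_col": 0, "click_count": 100})
```

Only the bias features are overwritten; genuine relevance signals like the BM25 score pass through untouched.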
A ranking model built with the PAL approach should produce unbiased predictions.
Roman Grebennikov is a lead engineer at Delivery Hero SE, working on search personalization and recommendations; a pragmatic fan of functional programming, learn-to-rank models, and performance engineering.