Recommender Systems

used in LINE Timeline

two systems

collaborative filtering

a model

requirements

ratings are considered explicit feedback

user likes and clicks are considered implicit feedback

from the user view history log, check whether the user clicked the post after viewing it. if the user clicked, it is positive feedback. if the user viewed but didn't click, it is negative feedback.
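a minimal sketch of the labeling rule above. the `ViewEvent` record and its field names are hypothetical; the real log schema is not described in the talk. it also assumes that one click on any view of a post is enough to make the (user, post) pair positive.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class ViewEvent(NamedTuple):
    """One row of a (hypothetical) view history log."""
    user_id: str
    post_id: str
    clicked: bool  # True if the user clicked the post after viewing it


def label_implicit_feedback(events: Iterable[ViewEvent]) -> Dict[Tuple[str, str], int]:
    """Map each (user, post) pair to 1 (positive) or 0 (negative)."""
    labels: Dict[Tuple[str, str], int] = {}
    for event in events:
        key = (event.user_id, event.post_id)
        # viewed and clicked -> 1, viewed without a click -> 0;
        # assumption: a single click among repeated views wins
        labels[key] = max(labels.get(key, 0), int(event.clicked))
    return labels
```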

content filtering

...

embeddings

also called feature matrices (columns and rows).

you can theoretically take feature matrices to train the model, but too many features make it really time consuming to train.

therefore you make embeddings. embeddings group things semantically so that they improve training performance.

overly sparse features give nearly useless embeddings, because there is too little information to capture (only 0.0001% of all data has information)

user embeddings vs post embeddings (if you do both separately, they are too sparse and therefore do not capture enough meaning in the embedding)

the solution used was to embed users as an embedding of user post interactions (user = function of that user's post interactions)

this embedding model worked much better
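a small sketch of "user = function of that user's post interactions". the notes do not say which function was used; a plain mean over post embeddings is an assumption here, and `post_embeddings` is a hypothetical array with one row per post.

```python
import numpy as np


def user_embedding(post_embeddings: np.ndarray, interacted_post_ids) -> np.ndarray:
    """Build a user vector from the posts the user interacted with.

    Assumption: the "function" is an unweighted mean of post embeddings.
    """
    ids = list(interacted_post_ids)
    if not ids:
        # no interactions yet: fall back to a zero vector
        return np.zeros(post_embeddings.shape[1])
    return post_embeddings[ids].mean(axis=0)
```

the point of this design is that only post embeddings need to be learned, so a user with few interactions still lands in the same space as the posts.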

feedback loop problem

  1. train model from raw user interaction
  2. give recommendations to user
  3. collect new user interaction
  4. user interactions that are caused by model recommendations will bias the next recommendation toward the ones the previous model has given

previous model bias

when a model is trained and used for user recommendations

solution

therefore you only train the new model on user raw data that was not expected by the model
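one possible reading of this filter, as a sketch. it assumes "not expected by the model" means the (user, post) pair was not served by the previous model's recommendations; the talk may have used a different criterion. the function name and the tuple layout are hypothetical.

```python
from typing import List, Set, Tuple

Interaction = Tuple[str, str, int]  # (user_id, post_id, label)


def drop_model_driven_interactions(
    interactions: List[Interaction],
    recommended: Set[Tuple[str, str]],
) -> List[Interaction]:
    """Keep only interactions the previous model did not recommend.

    `recommended` holds the (user_id, post_id) pairs the previous model served.
    """
    return [
        (user_id, post_id, label)
        for (user_id, post_id, label) in interactions
        if (user_id, post_id) not in recommended
    ]
```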

problems with a sole ranking model

recommendations are too generalized (everybody gets the same recommendation)

solution

  1. candidate generation model
    • two phases
    • phase 1 - co-occurrence matrix to get post embeddings
    • phase 2 - candidate generation training pages
    • interaction history -> linear combination of history -> user vector -> nearest neighbor search -> candidates
  2. ranking model
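the candidate generation pipeline above can be sketched as follows. several parts are assumptions: the notes do not say how post embeddings are derived from the co-occurrence matrix (a truncated SVD is used here), the linear combination uses uniform weights by default, and a brute-force cosine search stands in for whatever nearest neighbor index was actually used. all function names are hypothetical.

```python
import numpy as np


def cooccurrence_matrix(histories, n_posts):
    """Count how often two posts appear in the same interaction history.

    The diagonal holds how many histories contain each post.
    """
    X = np.zeros((len(histories), n_posts))
    for row, history in enumerate(histories):
        X[row, list(set(history))] = 1.0
    return X.T @ X


def post_embeddings_from_cooccurrence(C, dim):
    """Phase 1 (assumed): factorize C so that E @ E.T approximates C."""
    U, S, _ = np.linalg.svd(C)
    return U[:, :dim] * np.sqrt(S[:dim])


def user_vector(E, history, weights=None):
    """Linear combination of the embeddings of the posts in the history."""
    history = list(history)
    if weights is None:
        weights = np.full(len(history), 1.0 / len(history))
    return np.asarray(weights) @ E[history]


def nearest_posts(E, u, k, exclude=()):
    """Brute-force cosine nearest neighbor search over all posts."""
    norms = np.linalg.norm(E, axis=1) * np.linalg.norm(u)
    scores = (E @ u) / np.where(norms == 0, 1.0, norms)
    if exclude:
        scores[list(exclude)] = -np.inf  # do not recommend already-seen posts
    return [int(i) for i in np.argsort(-scores)[:k]]


histories = [[0, 1], [0, 1], [0, 1, 2], [3, 4], [3, 4]]
C = cooccurrence_matrix(histories, n_posts=5)
E = post_embeddings_from_cooccurrence(C, dim=3)
u = user_vector(E, history=[0])
candidates = nearest_posts(E, u, k=2, exclude=[0])
```

the candidates would then be passed to the ranking model, which only has to score a short, personalized list instead of the whole post pool.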

problems with the increasing size of the post pool

more and more posts over time (problems with every social network)

aligning embeddings trained in batches

it would be nice to have a post pool of days 1-9

they want to align the multiple embedding models

this is called an "Orthogonal Procrustes Problem"

  1. get a pair of corresponding points
  2. map matrix a to matrix b
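a sketch of the two steps above using the standard SVD solution of the orthogonal Procrustes problem. `A` and `B` are assumed to hold the embeddings of the same posts (the corresponding points) in two separately trained batches, one post per row. the function name is hypothetical.

```python
import numpy as np


def procrustes_rotation(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Return the orthogonal matrix R that minimizes ||A @ R - B||_F.

    With the SVD A.T @ B = U S Vt, the minimizer is R = U @ Vt.
    """
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt


# toy check: B is A rotated by a known orthogonal matrix Q
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 4))
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
B = A @ Q
R = procrustes_rotation(A, B)
```

once `R` is fitted on the overlapping posts, multiplying every embedding of batch a by `R` moves the whole batch into batch b's space, so posts from different days can share one pool.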

posts have an embedding over time that overlap: