Overcoming data sparsity
industrial collaborators: Unilever
academic collaborators: ESGI64
initiated: 2008/06/05
last updated: 2010/05/25

Unilever is currently designing and testing recommendation algorithms that suggest products to online customers, given the customer ID and the current contents of their basket. Unilever has collected a large amount of purchasing data, which shows that most items (around 80%) are purchased infrequently and account for only 20% of the data, while the remaining frequently purchased items account for 80% of the data. The data is therefore sparse and skewed, with a long tail. Attempts to incorporate the long-tail data have so far proved difficult, and Unilever's current recommendation systems do not use information about infrequently purchased items. Yet these items are more indicative of customers' preferences, and Unilever would like to make recommendations from and about them, i.e. to give a rank ordering of the available products in real time.

The Study Group suggested using bipartite networks to construct a similarity matrix from which recommendation scores for different products can be computed. Given a current basket and a customer ID, this approach assigns a recommendation score to each available item and recommends the highest-scoring item that is not already in the basket. The similarity matrix can be computed offline, while the recommendation scores can be calculated live.
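The offline/live split described above can be illustrated with a minimal Python sketch. It is not the Study Group's actual method, whose details are given in the report. It assumes that the bipartite network links customers to the items they bought, that similarity between two items is the cosine-normalised count of customers who bought both, and that an item's recommendation score is the sum of its similarities to the items in the basket (plus, optionally, the customer's past purchases). The function names, the `history` parameter and the toy purchase data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def build_similarity(purchases):
    """Offline step: project the customer-item bipartite graph onto items.

    `purchases` is an iterable of (customer_id, item) edges. The cosine
    normalisation used here is an assumption, not taken from the report.
    """
    items_by_customer = defaultdict(set)
    for customer, item in purchases:
        items_by_customer[customer].add(item)

    degree = defaultdict(int)  # number of distinct customers per item
    co = defaultdict(int)      # number of customers who bought both items
    for items in items_by_customer.values():
        for item in items:
            degree[item] += 1
        for a, b in combinations(sorted(items), 2):
            co[(a, b)] += 1

    sim = defaultdict(dict)    # sparse symmetric similarity matrix
    for (a, b), n in co.items():
        weight = n / sqrt(degree[a] * degree[b])
        sim[a][b] = weight
        sim[b][a] = weight
    return sim, items_by_customer


def recommend(sim, basket, history=()):
    """Live step: score every candidate item and return the best one.

    Evidence is the basket plus the customer's past purchases (`history`).
    Items already in the basket are never recommended.
    """
    basket = set(basket)
    evidence = basket | set(history)
    scores = defaultdict(float)
    for item in evidence:
        for other, weight in sim.get(item, {}).items():
            if other not in basket:
                scores[other] += weight
    if not scores:
        return None, {}
    # Sort first so that ties are broken alphabetically and reproducibly.
    best = max(sorted(scores), key=scores.get)
    return best, dict(scores)


# Toy data (hypothetical): "rare_spice" and "chutney" are long-tail items.
purchases = [
    ("c1", "tea"), ("c1", "milk"), ("c1", "rare_spice"),
    ("c2", "tea"), ("c2", "milk"),
    ("c3", "tea"), ("c3", "soap"),
    ("c4", "rare_spice"), ("c4", "chutney"),
]
sim, items_by_customer = build_similarity(purchases)
best, scores = recommend(sim, ["rare_spice"], history=items_by_customer.get("c2", ()))
print(best, scores)
```

Because the normalisation divides by item popularity, a co-purchase involving a rarely bought item carries more weight than one involving a staple, which is one simple way to let long-tail items influence the ranking. The customer ID enters only through the optional `history` argument, which is a design choice made for this sketch.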

Problem presented by
Benjamin Dias, Unilever UK

Study Group contributors
Rosemary Apple (University of Edinburgh)
Chris Cawthorn (University of Cambridge)
Kwan Yee Chan (University of Manchester)
Oded Lachish (University of Warwick)
Achim Nonnenmacker (University of Edinburgh)
Mason A. Porter (University of Oxford)
Sylvain Reboux (University of Nottingham)


related resources:
» Overcoming data sparsity
  Study group report 2008: overcoming data sparsity (Unilever)