Friday, April 30, 2010

Facebook's moves and personalized advertising

It has been widely reported that Facebook has launched Open Graph and Instant Personalization, which, among other things, give Facebook information about people's movements and what they like on the web. The service was launched as opt-out and, if you do want to opt out, requires diving into confusing privacy settings to do so.

The prolific discussion of this elsewhere has thoroughly exhausted most of what there is to say, but I wanted to emphasize two things about this launch.

First, the fact that Facebook is so aggressively seeking this "treasure trove" of browsing behavior data may signal a major shift in its revenue model. Prior claims aside, the company now may be realizing that it is hard to target advertising to profile information and status updates because there is no commercial intent. This new source of data -- the websites people are visiting and what they like -- contains the purchase intent that Facebook so desperately needs.

Second, as Steve Lohr at the NYT reported today, other companies considering heavy use of personalized advertising have been waiting for someone else to take the first step and bear the brunt of any privacy-related backlash. It will be interesting to see if Facebook's latest move -- which probably is aggressive enough to count as the first step everyone was waiting for -- will result in a backlash or will open the floodgates.

Wednesday, April 28, 2010

Google launches web search similarities

In another aggressive move toward personalized search, Google has added a list of similar sites to the bottom of the page for some web searches, as Mike Melanson at ReadWriteWeb reports.

For example, if I search for [engadget], at the bottom of the page, I see:
Pages similar to: www.engadget.com
Gizmodo ... gizmodo.com
Ubergizmo ... ubergizmo.com
Wired ... wired.com
Lifehacker ... lifehacker.com
The similarity algorithm also appears to have changed, with noticeably better quality in the spot checks I did.

This is a fairly big deal. Similarities based on aggregate behavior and targeted to the immediate context are a big step toward personalization. The next step is to tie the data to individual history and target similarities and recommendations both to the context and to each person's past history.
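Google has not said how it computes these similarities. As a minimal sketch, assuming similarity comes from sites that the same people visit in the same browsing session (one common way to turn aggregate behavior into "pages similar to" lists), it could look like this. The session data and the function names `covisitation_counts` and `similar_sites` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def covisitation_counts(sessions):
    """Count how often each pair of sites appears in the same session."""
    counts = defaultdict(Counter)
    for session in sessions:
        # Deduplicate so repeat visits within one session count only once.
        for a, b in combinations(sorted(set(session)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def similar_sites(counts, site, k=3):
    """Return up to k sites most often co-visited with `site` (the current context)."""
    return [other for other, _ in counts[site].most_common(k)]

# Hypothetical sessions, for illustration only.
sessions = [
    ["engadget.com", "gizmodo.com", "wired.com"],
    ["engadget.com", "gizmodo.com"],
    ["engadget.com", "lifehacker.com", "gizmodo.com"],
    ["wired.com", "nytimes.com"],
]
counts = covisitation_counts(sessions)
print(similar_sites(counts, "engadget.com"))  # ['gizmodo.com', 'wired.com', 'lifehacker.com']
```

This uses only aggregate behavior and the current site. Personalizing it would mean also weighting candidates by each person's own history.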

Google has offered a version of personalized search for some time, but the technique was mostly based on biasing search results toward people's long-term interests. More recently, they also started boosting previously clicked search results to help support re-finding.
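The two signals described above can be illustrated as a re-ranking step. This is a minimal sketch, not Google's method: it assumes a base relevance score per result, a long-term topic-interest profile, and a set of URLs the user clicked before. The weights `interest_weight` and `reclick_boost` and the `Result` type are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Result:
    url: str
    base_score: float   # relevance from the non-personalized ranker (assumed)
    topics: frozenset   # topics assigned to the page (assumed)

def personalize(results, interests, clicked_urls, interest_weight=0.3, reclick_boost=0.5):
    """Re-rank results using a long-term interest profile and past clicks.

    interests: topic -> weight in [0, 1] describing long-term interests
    clicked_urls: URLs the user clicked in earlier searches
    """
    def score(r):
        # Bias toward long-term interests: use the strongest matching topic.
        interest = max((interests.get(t, 0.0) for t in r.topics), default=0.0)
        # Boost previously clicked results to help with re-finding.
        refind = reclick_boost if r.url in clicked_urls else 0.0
        return r.base_score + interest_weight * interest + refind
    return sorted(results, key=score, reverse=True)

results = [
    Result("a.com", 1.0, frozenset({"sports"})),
    Result("b.com", 0.9, frozenset({"gadgets"})),
    Result("c.com", 0.8, frozenset()),
]
ranked = personalize(results, interests={"gadgets": 1.0}, clicked_urls={"c.com"})
print([r.url for r in ranked])  # ['c.com', 'b.com', 'a.com']
```

An additive score keeps the base ranking in charge unless the personal signals are strong; short-term, session-level signals would be a third term of the same shape.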

This latest move should let Google add fine-grained personalization based on current missions and short-term trends, which, in combination with their current search personalization, is likely to improve Google's ability to help people find what they need.

Update: Here is the announcement of the new feature on the Official Google Blog.

Sunday, April 25, 2010

Google News hybrid recommendations

Three Googlers published a paper, "Personalized News Recommendation Based on Click Behavior" (ACM), at the recent IUI 2010 conference that describes a hybrid recommender system combining user-based and content-based recommendations. This new hybrid recommender now appears to be deployed on Google News.

Some excerpts from the paper:
[The] previous Google News recommendation system was developed using a collaborative filtering method. It recommends news stories that were read by users with similar click history. This method has two major drawbacks ... First, the system cannot recommend stories that have not yet been read by other users ... Second ... news stories [that] are generally very popular ... are constantly recommended to most of the users, even for those users who never [are interested because] ... there are always enough clicks ... to make the recommendation.

A solution to these two problems would be to build profiles of user's genuine interests and use them to make news recommendations. The profiles ... filter out the stories that are not of interest ... [and recommend stories] even if [they have] not been clicked on by other users ... Based on a user's news reading history, the recommender predicts the topic categories of interest ... News articles in those categories are ranked higher in the candidate list.

On average, the hybrid method ... improves the CTR [of] the existing collaborative method by 30.9% ... [and increased] the frequency of website visits in the test group [by] 14.1%.
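The excerpts describe a profile of topic categories predicted from click history, which is then used to rank candidates. Here is a minimal sketch of that idea. It is not the paper's exact model: the smoothed click-share ratio, the multiplicative combination with the collaborative score, and the `cf_prior` term that lets not-yet-clicked stories surface are all assumptions made for illustration.

```python
def category_interest(user_clicks, global_clicks, smoothing=1.0):
    """Interest per category: the user's click share relative to everyone's.

    A value above 1 means the user clicks that category more than average.
    """
    categories = set(global_clicks) | set(user_clicks)
    user_total = sum(user_clicks.values())
    global_total = sum(global_clicks.values())
    interest = {}
    for c in categories:
        # Additive smoothing keeps unseen categories from dropping to zero.
        user_share = (user_clicks.get(c, 0) + smoothing) / (user_total + smoothing * len(categories))
        global_share = (global_clicks.get(c, 0) + smoothing) / (global_total + smoothing * len(categories))
        interest[c] = user_share / global_share
    return interest

def hybrid_rank(candidates, cf_scores, interest, cf_prior=0.3):
    """Rank (story, category) pairs by topic interest times collaborative score.

    cf_prior (hypothetical) gives stories nobody has clicked yet a nonzero
    score, so the profile alone can still recommend them (cold start).
    """
    def score(item):
        story, category = item
        return interest.get(category, 1.0) * (cf_scores.get(story, 0.0) + cf_prior)
    return [story for story, _ in sorted(candidates, key=score, reverse=True)]

interest = category_interest(
    user_clicks={"tech": 8, "sports": 2},
    global_clicks={"tech": 300, "sports": 500, "entertainment": 200},
)
candidates = [("celebrity-gossip", "entertainment"), ("new-phone", "tech"), ("big-game", "sports")]
cf_scores = {"celebrity-gossip": 0.9, "big-game": 0.5}  # "new-phone" has no clicks yet
print(hybrid_rank(candidates, cf_scores, interest))  # ['new-phone', 'celebrity-gossip', 'big-game']
```

The new story with no clicks ranks first because it matches the user's strongest category, addressing the first drawback in the excerpt.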
Hybrid recommenders are not that new. In the past, as in this paper, they usually were motivated by trying to deal with the sparsity and cold start problems that challenge collaborative filtering recommenders. Hybrid systems also have been used to deal with the so-called Harry Potter problem -- recommendations that focus too much on popular items -- by constraining the collaborative recommendations to the interests expressed in the profile, though that often can be better dealt with by tuning a collaborative recommender to discourage correlations between unpopular and popular items.
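The tuning alternative mentioned above can be sketched with item-to-item similarity. This is a minimal illustration, assuming co-occurrence counts between items and a hypothetical exponent `alpha` that discounts popular targets. With `alpha=0` the score is the raw rate P(target | source), which lets a hugely popular item win; raising `alpha` discourages those correlations.

```python
def item_similarity(co_counts, item_counts, source, target, alpha=0.5):
    """Similarity of `target` to `source`, discounting popular targets.

    co_counts[(a, b)]: number of users who interacted with both a and b
    item_counts[x]: number of users who interacted with x
    alpha: popularity penalty (hypothetical knob); 0 means no penalty
    """
    both = co_counts.get((source, target), co_counts.get((target, source), 0))
    if both == 0:
        return 0.0
    return both / (item_counts[source] * item_counts[target] ** alpha)

def most_similar(co_counts, item_counts, source, alpha=0.5):
    """Pick the candidate item with the highest penalized similarity."""
    candidates = [x for x in item_counts if x != source]
    return max(candidates, key=lambda t: item_similarity(co_counts, item_counts, source, t, alpha))

# Hypothetical counts: almost everyone has read the popular book.
item_counts = {"obscure-novel": 10, "harry-potter": 10000, "similar-novel": 20}
co_counts = {("obscure-novel", "harry-potter"): 6, ("obscure-novel", "similar-novel"): 4}
print(most_similar(co_counts, item_counts, "obscure-novel", alpha=0.0))  # harry-potter
print(most_similar(co_counts, item_counts, "obscure-novel", alpha=0.5))  # similar-novel
```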

One thing that is surprising in this paper is the use of high-level topics rather than fine-grained topics. I would think that you would be better off getting as specific as possible on the profile, then branching out to related topics. The paper briefly addresses this, arguing that "specializing the user profile may limit the recommendations to news that the user already knew", but that seems like it would happen only if you, rather foolishly, used only the topics the user had read instead of also including topics related to them.
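The alternative suggested above, a specific profile that then branches out, can be sketched in a few lines. This assumes a hypothetical map of related topics and a `spread` factor for how much weight each read topic passes to its neighbors; neither comes from the paper.

```python
def expand_profile(read_counts, related, spread=0.5):
    """Build a fine-grained topic profile, then branch out to related topics.

    read_counts: fine-grained topic -> number of stories read
    related: topic -> list of related topics (hypothetical topic graph)
    spread: fraction of a topic's weight given to each related topic
    """
    total = sum(read_counts.values())
    profile = {t: n / total for t, n in read_counts.items()}
    expanded = dict(profile)
    for topic, weight in profile.items():
        for neighbor in related.get(topic, []):
            # Related topics get a discounted share, so recommendations can
            # reach beyond the topics the user has already read.
            expanded[neighbor] = expanded.get(neighbor, 0.0) + spread * weight
    return expanded

read_counts = {"mars-rovers": 3, "exoplanets": 1}
related = {"mars-rovers": ["mars-sample-return"], "exoplanets": ["space-telescopes"]}
expanded = expand_profile(read_counts, related)
print(expanded)
```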

By the way, when you have as much data as Google should have, it is not at all clear you want to fall back on a content approach like the one in this paper. Yehuda Koren, for example, has convincingly argued that, when you have big data, latent factor models extract these content-based relationships automatically in much more detail and much more accurately than you could hope to do with a manually constructed model.
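To make "latent factor model" concrete, here is a minimal matrix factorization trained by stochastic gradient descent, the basic form of the models Koren discusses. It is a toy sketch under simplifying assumptions: no bias terms, a tiny dense matrix, and hyperparameters (`k`, `lr`, `reg`, `epochs`) picked for illustration. Nobody labels the two groups of items here; the factors pick up that structure from behavior alone.

```python
import random

def predict(p, q, u, i):
    """Predicted affinity of user u for item i: dot product of their factors."""
    return sum(a * b for a, b in zip(p[u], q[i]))

def train_mf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """Learn k-dimensional user and item factors so that p[u] . q[i] approximates r_ui.

    ratings: iterable of (user_id, item_id, value) with integer ids.
    """
    rng = random.Random(seed)
    p = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)  # copy so shuffling does not touch the caller's list
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(p, q, u, i)
            for f in range(k):
                pu, qi = p[u][f], q[i][f]
                # Gradient step on squared error with L2 regularization.
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return p, q

# Toy data: users 0-1 like items 0-1, users 2-3 like items 2-3.
ratings = [(u, i, 1.0 if (u < 2) == (i < 2) else 0.0) for u in range(4) for i in range(4)]
p, q = train_mf(ratings, n_users=4, n_items=4)
print(round(predict(p, q, 0, 1), 2), round(predict(p, q, 0, 3), 2))
```

With real click data you would train on many more users and items and evaluate on held-out clicks rather than the training set.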

Finally, I cannot quite let this one go by without mentioning that Findory was a hybrid news recommender, launched in January 2004, that dealt with the cold start and sparsity problems of a collaborative recommender, the same problems the Google News team apparently is still struggling with six years later. Findory is not mentioned in this paper in the related work, but I know the Google team is quite aware of Findory.