Tuesday, December 27, 2011

Quick links

Some of what has caught my attention recently:
  • Security guru Bruce Schneier predicts "smart phones are going to become the primary platform of attack for cybercriminals" soon ([1])

  • If Amazon does a smartphone next, I hope it is WiFi-based, like Steve Jobs originally wanted to do with the iPhone ([1] [2] [3])

  • iPhone owners love Siri despite its flaws ([1])

  • Valve, makers of Steam, talk about their pricing experiments: "Without making announcements, we varied the price ... pricing was perfectly elastic ... Then we did this different experiment where we did a sale ... a highly promoted event ... a 75 percent price reduction ... gross revenue [should] remain constant. Instead what we saw was our gross revenue increased by a factor of 40. Not 40 percent, but a factor of 40 ... completely not predicted by our previous experience with silent price variation." [[1]]

  • An idea whose time has come: profiling code based not on the execution time required but on the power consumed ([1])

  • Grumpy about work and dreaming about doing a startup? Some food for thought for those romanticizing startup life. ([1] [2])

  • Yahoo discovers toolbar data (the urls people click on and browse to) helps a lot for web crawling ([1])

  • Google Personalized Search adds explanations. Explanations not only add credibility to recommendations, but also make people more accepting of recommendations they don't like. ([1])

  • "Until now, many education studies have been based on populations of a few dozen students. Online technology can capture every click: what students watched more than once, where they paused, what mistakes they made ... [massive] data ... for understanding the learning process and figuring out which strategies really serve students best." ([1])

  • Andrew Ng's machine learning class at Stanford was excellent; I highly recommend it. If you missed it the first time, it is being offered again (for free again) next quarter. ([1])

  • Microsoft giving up on its version of Hadoop? Surprising. ([1])

  • The NYT did a fun experiment crowdsourcing predictions. The results are worth a look. ([1] [2])

  • Web browsers (Firefox and Chrome) will be a gaming platform soon ([1] [2])

Wednesday, November 30, 2011

Browsing behavior for web crawling

A recent paper out of Yahoo, "Discovering URLs through User Feedback" (ACM), describes the value of using the pages people browse to and click on (data that is in Yahoo's toolbar logs) to inform their web crawler about new pages to crawl and index.

From the paper:
Major commercial search engines provide a toolbar software that can be deployed on users' Web browsers. These toolbars provide additional functionality to users, such as quick search option, shortcuts to popular sites, and malware detection. However, from the perspective of the search engine companies, their main use is on branding and collecting marketing statistics. A typical toolbar tracks some of the actions that the user performs on the browser (e.g., typing a URL, clicking on a link) and reports these actions to the search engine, where they are stored in a log file.

A Web crawler continuously discovers new URLs and fetches their content ... to build an inverted index to serve [search] queries. Even though the basic mechanism of a crawler is simple, crawling efficiently and effectively is a difficult problem ... The crawler not only has to continuously enlarge its repository by expanding its frontier, but also needs to refresh previously fetched pages to incorporate in its index the changes on those pages. In practice, crawlers prioritize the pages to be fetched, taking into account various constraints: available network bandwidth, peak processing capacity of the backend system, and politeness constraints of Web servers ... The delay to discover a Web page can be quite long after its creation and some Web sites may be only partially crawled. Another important challenge is the discovery of hidden Web content ... often ... backed by a database.

Our work is the first to evaluate the benefits of using the URLs collected from a Web browser toolbar as a form of user feedback to the crawling process .... On average, URLs accessed by the users are more important than those found ... [by] the crawler ... The crawler has a significant delay in discovering URLs that are first accessed by the users ... Finally, we [show] that URL discovery via toolbar [has a] positive impact on search result quality, especially for queries seeking recently created content and tail content.
The paper goes on to quantify the surprisingly large number of URLs found by the toolbar that are useful, not private, and not excluded by robots.txt. Importantly, a lot of these are deep web pages, only visible by doing a query against a database, and hard to ferret out of that database any other way than by looking at the pages people actually visit.

Also interesting are the metrics on pages the toolbar data finds first. People often send links to new web pages by e-mail or text message. Eventually, those links might appear on the web, but eventually can be a long time, and many of the urls found first in the toolbar data ("more than 60%") are found way before the crawler manages to discover them ("at least 90 days earlier than the crawler").
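
To make the mechanics concrete, here is a minimal sketch of how toolbar-discovered URLs might be folded into a crawl frontier, with the filtering steps the paper describes (dropping private-looking URLs, respecting robots.txt, skipping URLs the crawler already knows about). The function names, priority scheme, and robots.txt handling are my own illustration in Python, not Yahoo's implementation.

    import heapq
    import urllib.robotparser
    from urllib.parse import urlparse

    _ROBOTS_CACHE = {}

    def allowed_by_robots(url, agent="toycrawler"):
        # Fetch and cache robots.txt per host; assume allowed if the fetch fails.
        host = "{0.scheme}://{0.netloc}".format(urlparse(url))
        if host not in _ROBOTS_CACHE:
            rp = urllib.robotparser.RobotFileParser(host + "/robots.txt")
            try:
                rp.read()
            except OSError:
                rp = None
            _ROBOTS_CACHE[host] = rp
        rp = _ROBOTS_CACHE[host]
        return True if rp is None else rp.can_fetch(agent, url)

    def enqueue_toolbar_urls(toolbar_urls, known_urls, looks_private, frontier):
        # toolbar_urls:  iterable of (url, view_count) pairs from browsing logs
        # known_urls:    set of URLs the crawler has already discovered
        # looks_private: predicate flagging session- or user-specific URLs to drop
        for url, views in toolbar_urls:
            if url in known_urls or looks_private(url):
                continue
            if not allowed_by_robots(url):
                continue
            # More heavily viewed pages get crawled first (heapq is a min-heap).
            heapq.heappush(frontier, (-views, url))
        return frontier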

Great paper out of Yahoo Research and a great example of how useful behavior data can be. It is using big data to help people help others find what they found.

Monday, November 28, 2011

What mobile location data looks like to Google

A recent paper out of Google, "Extracting Patterns From Location History" (PDF), is interesting not only for confirming that Google is studying using location data from mobile devices for a variety of purposes, but also for the description of the data they can get.

From the paper:
Google Latitude periodically sends his location to a server which shares it with his registered friends.

A user's location history can be used to provide several useful services. We can cluster the points to determine where he frequents and how much time he spends at each place. We can determine the common routes the user drives on, for instance, his daily commute to work. This analysis can be used to provide useful services to the user. For instance, one can use real-time traffic services to alert the user when there is traffic on the route he is expected to take and suggest an alternate route.

Much previous work assumes clean location data sampled at very high frequency ... [such as] one GPS reading per second. This is impractical with today's mobile devices due to battery usage ... [Inferring] locations by listening to RF-emissions from known wi-fi access points ... requires less power than GPS ... Real-world data ... [also] often has missing and noisy data.

17% of our data points are from GPS and these have an accuracy in the 10 meter range. Points derived from wifi signatures have an accuracy in the 100 meter range and represent 57% of our data. The remaining 26% of our points are derived from cell tower triangulation and these have an accuracy in the 1000 meter range.
The paper goes on to describe how they clean the data and pin noisy location trails to roads. But the most interesting tidbit for me was how few of their data points come from GPS and how much they have to rely on less accurate cell tower and WiFi hotspot triangulation.
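
As a toy illustration of why that accuracy mix matters, here is a sketch that weights each reading by the inverse square of its reported error radius, so a 10 meter GPS fix counts far more than a 1000 meter cell-tower fix when estimating where someone actually was. The weighting scheme is a generic choice of mine, not the method from the Google paper.

    def weighted_position(readings):
        """Estimate a single position from noisy location readings.

        readings: list of (lat, lon, accuracy_m) tuples, where accuracy_m is the
        reported error radius (roughly 10 for GPS, 100 for WiFi, 1000 for cell).
        """
        total = sum(1.0 / (acc * acc) for _, _, acc in readings)
        lat = sum(la / (acc * acc) for la, _, acc in readings) / total
        lon = sum(lo / (acc * acc) for _, lo, acc in readings) / total
        return lat, lon

    # One GPS fix easily outweighs a couple of coarse cell-tower fixes.
    readings = [(47.6205, -122.3493, 10),     # GPS
                (47.6150, -122.3400, 1000),   # cell tower
                (47.6300, -122.3600, 1000)]   # cell tower
    print(weighted_position(readings))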

A lot of people have assumed mobile devices would provide nice trails of accurate and frequently sampled locations. But, if the Googlers' data is typical, it sounds like location data from mobile devices is going to be very noisy and very sparse for a long time.

Tuesday, November 15, 2011

Even more quick links

Even more of what has caught my attention recently:
  • Spooky but cool research: "Electrical pulses to the brain and muscles ... activate and deactivate the insect's flying mechanism, causing it to take off and land ... Stimulating certain muscles behind the wings ... cause the beetle to turn left or right on command." ([1])

  • Good rant: "Our hands feel things, and our hands manipulate things. Why aim for anything less than a dynamic medium that we can see, feel, and manipulate? ... Pictures Under Glass is old news ... Do you seriously think the Future Of Interaction should be a single finger?" ([1])

  • Googler absolutely shreds traditional QA and argues that the important thing is getting a good product, not implementing a bad product correctly to spec. It's a long talk; if you're short on time, the talk starts at 6:00, the meat starts at 13:00, and the don't-miss parts are at 17:00 and 21:00. ([1])

  • "There has been very little demand for Chromebooks since Acer and Samsung launched their versions back in June. The former company reportedly only sold 5,000 units by the end of July, and the latter Samsung was said to have sold even less than that in the same timeframe." ([1])

  • With the price change to offer Kindles at $79, Amazon is now selling them below cost ([1])

  • Personalization applied to education, using the "combined data power of millions of students to provide uniquely personalized learning to each." ([1] [2] [3] [4] [5] [6])

  • It is common to use human intuition to choose algorithms and tune parameters on algorithms, but this is the first I've ever heard of using games to crowdsource algorithm design and tuning ([1])

  • Great slides from a Recsys tutorial by Daniel Tunkelang, really captures the importance of UX and HCIR in building recommendation and personalization features ([1])

  • Bing finally figured out that when judges disagree with clicks, clicks are probably right ([1])

  • Easy to forget, but the vast majority of US mobile devices still are dumbphones ([1])

  • Finally, finally, Microsoft produces a decent mobile phone ([1])

  • Who needs a touch screen when any surface can be a touch interface? ([1])

  • Impressive augmented reality research demo using Microsoft Kinect technology ([1])

  • Very impressive new technique for adding objects to photographs, reproducing lighting, shadows, and reflections, and requiring just a few corrections and hints from a human about the geometry of the room. About as magical as the new technology for reversing camera shake to restore out-of-focus pictures to focus. ([1] [2])

  • Isolation isn't complete in the cloud -- your neighbors can hurt you by hammering the disk or network -- and some startups have decided to go old school back to owning the hardware ([1] [2])

  • "The one thing that Siri cannot do, apparently, is converse with Scottish people." ([1])

  • Amazon grew from under 25,000 employees to over 50,000 in two years ([1])

  • Google Chrome is pushing Mozilla into bed with Microsoft? Really? ([1])

  • Is advice Steve Jobs gave to Larry Page the reason Google is killing so many products lately? ([1])

  • Why does almost everyone use the default software settings? Research says it appears to be a combination of minimizing effort, an assumption of implied endorsement, and (bizarrely) loss aversion. ([1])

Friday, October 14, 2011

More quick links

More of what has caught my attention recently:
  • The first Kindle was so ugly because Jeff Bezos so loved his BlackBerry ([1])

  • "Sometimes it takes Bad Steve to bring products to market. Real artists ship." ([1])

  • "The Mac sleep indicator is timed to glow at the average breathing rate of an adult: 12 breaths per minute." Beautiful example of attention to design. ([1])

  • "A one-star increase in Yelp rating leads to a 5-9 percent increase in revenue" ([1])

  • Facebook games, rather than try to be fun, try to be addictive. They feed on the compulsive until they give up their cash. The most addicted spend $10k in one game in less than a year. ([1])

  • "The Like and Recommend buttons Facebook provides to other Web sites send information about your visit back to Facebook, even if you don't click on them ... Facebook can find out an awful lot about what you do online." ([1])

  • A new automated attack on CAPTCHAs that can break them in an average of three tries. Even so, paying people to break CAPTCHAs is so cheap that that is probably what the bad guys will continue to do. ([1] [2])

  • Online backup and storage is now basically free. I expect this to be part of the operating systems soon (nearly is in Windows and Ubuntu) and all profits in online backup to drop to zero. ([1])

  • Prices for Netflix to acquire their streaming content appear to be going way up. Netflix just paid $1B over eight years for some CW network shows, and Starz rejected $300M/year -- a 10x increase -- for their movies. ([1] [2])

  • Someone spun up a truly massive cluster on Amazon EC2, "30,472 cores, 26.7TB of RAM and 2PB (petabytes) of disk space." ([1])

  • "Google's brain [is] like a baby's, an omnivorous sponge that [is] always getting smarter from the information it [soaks] up." ([1])

Monday, September 19, 2011

Quick links

Some of what has caught my attention recently:
  • "60 percent of Netflix views are a result of Netflix's personalized recommendations" and "35 percent of [Amazon] product sales result from recommendations" ([1] [2])

  • When doing personalization and recommendations, implicit ratings (like clicks or purchases) are much less work and turn out to be highly correlated to what people would say their preferences are if you did ask ([1])

  • Good defaults are important. 95% won't change the default configuration even in cases where they clearly should. ([1])

  • MSR says 68% of mobile local searches occur while people are actually in motion, usually in a car or bus. Most are looking for the place they want to go, usually a restaurant. ([1])

  • Google paper on Tenzing, a SQL layer on top of MapReduce that appears similar in functionality to Microsoft's Scope or Michael Stonebraker's Vertica. Most interesting part is the performance optimizations. ([1])

  • Googler Luiz Barroso talks data centers, including giving no love to using flash storage and talking about upcoming networking tech that might change the game. ([1] [2])

  • High quality workers on MTurk are much cheaper than they should be ([1])

  • Most newspapers should focus on being the definitive source for local news and the primary channel to get to small local advertisers ([1] [2])

  • Text messaging charges are unsustainable. Only question is when and how they break. ([1])

  • "If you want to create an educational game focus on building a great game in the first place and then add your educational content to it. If the game does not make me want to come back and play another round to beat my high-score or crack the riddle, your educational content can be as brilliant as it can be. No one will care." ([1])

  • A few claims that it is not competitors' failures, but Apple's skillful dominance of supply chains, that prevents Apple's competitors from successfully copying Apple products. I'm not convinced, but worth reading nonetheless. ([1] [2] [3])

  • Surprising amount of detail about the current state of Amazon's supply chain in some theses out of MIT. Long reads, but good reads. ([1])

  • If you want to do e-commerce in a place like India, you have to build out your own delivery service. ([1])

  • Like desktop search in 2005, Dropbox and other cloud storage products exist because Microsoft's product is broken. Microsoft made desktop search go away in 2006 by launching desktop search that works, and it will make the cloud storage opportunity go away by launching a cloud drive that works. ([1] [2] [3])

  • Just like in 2005, merging two failing businesses doesn't make a working business. Getting AOL all over you isn't going to fix you, Yahoo. ([1] [2])

  • Good rant on how noreply@ e-mail addresses are bad customer service. And then the opposite point of view from Google's Sergey Brin. ([1] [2])

  • Google founder Sergey Brin proposed taking Google's entire marketing budget and allocating it "to inoculate Chechen refugees against cholera" ([1])

  • Brilliant XKCD comic on passwords and how websites should ask people to pick passwords ([1])

Wednesday, September 07, 2011

Blending machines and humans to get very high accuracy

A paper by six Googlers from the recent KDD 2011 conference, "Detecting Adversarial Advertisements in the Wild" (PDF), is a broadly useful example of how to succeed at tasks requiring very high accuracy using a combination of many different machine learning algorithms, high quality human experts, and lower quality human judges.

Let's start with an excerpt from the paper:
A small number of adversarial advertisers may seek to profit by attempting to promote low quality or untrustworthy content via online advertising systems .... [For example, some] attempt to sell counterfeit or otherwise fraudulent goods ... [or] direct users to landing pages where they might unwittingly download malware.

Unlike many data-mining tasks in which the cost of false positives (FP's) and false negatives (FN's) may be traded off, in this setting both false positives and false negatives carry extremely high misclassification cost ... [and] must be driven to zero, even for difficult edge cases.

[We present a] system currently deployed at Google for detecting and blocking adversarial advertisements .... At a high level, our system may be viewed as an ensemble composed of many large-scale component models .... Our automated ... methods include a variety of ... classifiers ... [including] a single, coarse model ... [to] filter out ... the vast majority of easy, good ads ... [and] a set of finely-grained models [trained] to detect each of [the] more difficult classes.

Human experts ... help detect evolving adversarial advertisements ... [through] margin-based uncertainty sampling ... [often] requiring only a few dozen hand-labeled examples ... for rapid development of new models .... Expert users [also] search for positive examples guided by their intuition ... [using a custom] tool ... [and they have] surprised us ... [by] developing hand-crafted, rule-based models with extremely high precision.

Because [many] models do not adapt over time, we have developed automated monitoring of the effectiveness of each ... model; models that cease to be effective are removed .... We regularly evaluate the [quality] of our [human experts] ... both to assess the performance of ... raters and measure our confidence in these assessments ... [We also use] an approach similar to crowd-sourcing ... [to] calibrate our understanding of real user perception and ensure that our system continues to protect the interest of actual users.
I love this approach, blending human experts and their intuition with algorithms running over big data, using the people to guide, assist, and correct the machines. These Googlers used an ensemble of classifiers, trained by experts who focused on labeling the edge cases, and ran them over features extracted from a massive data set of advertisements. They then built custom tools to make it easy for experts to search over the ads, follow their intuition, dig in deep, and fix the hardest cases the classifiers missed. Because the bad guys never quit, the Googlers not only constantly add new models and rules, but also constantly evaluate existing rules, models, and the human experts to make sure they are still useful. Excellent.
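
The margin-based uncertainty sampling mentioned above is a standard active learning trick: route the ads the current model is least sure about to the human experts, so a few dozen labels go a long way. Here is a rough sketch of the idea in Python with scikit-learn; the model choice, toy features, and batch size are stand-ins of mine, not details from the paper.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def pick_for_expert_review(model, X_unlabeled, batch_size=50):
        # Margin = gap between the top two predicted class probabilities.
        # A small margin means the model is unsure, so that ad is worth
        # routing to a human expert for a hand label.
        probs = model.predict_proba(X_unlabeled)
        top_two = np.sort(probs, axis=1)[:, -2:]
        margins = top_two[:, 1] - top_two[:, 0]
        return np.argsort(margins)[:batch_size]

    # Toy data standing in for ad features and expert labels.
    rng = np.random.default_rng(0)
    X_labeled = rng.normal(size=(200, 20))
    y_labeled = (X_labeled[:, 0] > 0).astype(int)
    X_unlabeled = rng.normal(size=(10000, 20))

    model = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)
    to_review = pick_for_expert_review(model, X_unlabeled)  # indices to send to experts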

I think the techniques described here are applicable well beyond detecting naughty advertisers. For example, I suspect a similar technique could be applied to mobile advertising, a hard problem where limited screen space and attention make relevance critical, but we usually have very little data on each user's interests, each user's intent, and each advertiser. Combining human experts with machines, as these Googlers have done, could be particularly useful in bootstrapping and overcoming sparse and noisy data, two problems that make it so difficult for startups to succeed on problems like mobile advertising.

Tuesday, July 19, 2011

Quick links

Some of what has caught my attention recently:
  • Netflix may have been forced to change its pricing by the movie studios. It appears the studios may have made streaming more expensive for Netflix and, in particular, too costly to keep giving free access to DVD subscribers who rarely stream. ([1] [2] [3])

  • Really fun idea for communication between devices in the same room, without using radio waves, by using imperceptible fluctuations in the ambient lighting. ([1])

  • Games are big on mobile devices ([1] [2])

  • "Customers have a bad a taste in their mouths when it comes to Microsoft's mobile products, and few are willing to give them a try again." Ouch, that's going to be expensive to fix. ([1])

  • Microsoft's traditional strategy of owning the software on most PC-like devices may not be doing well in mobile, but they're stomping in consoles ([1]). On a related note, Microsoft now claims their effort on search is less about advertising revenue and more about improving interfaces on PC-like devices. ([2])

  • Many people have vulnerable computers and passwords. Why aren't more of them hacked? Maybe it just isn't worth it to hackers, just too hard to make money given the effort required. ([1])

  • In 2010, badges in Google Reader were an April Fools' joke. In 2011, badges in Google News are an exciting new feature. ([1])

  • Good (and free) book chapter by a couple Googlers summarizing the technology behind indexing the Web ([1])

  • Most people dread when their companies ask them every year to set performance goals because it is impossible to do well and can impact raises the next year. Google's solution? Don't do that. Instead, set lightweight goals more frequently and expect people to not make some of their goals. ([1] [2])

  • 60% of business PCs are still running WinXP. Maybe this says that businesses are so fearful of changing anything that upstarts like Google are going to have an uphill battle getting people to switch to ChromeOS. Or maybe this says businesses consider it so painful to upgrade Microsoft software and train their people on all the changes that, when they do bite the bullet and upgrade, they might as well switch to something different like ChromeOS. ([1])

  • Fun interview with Amazon's first employee, Shel Kaphan ([1])

  • Thought-provoking speculation on the future of health care. Could be summarized as using big data, remote monitoring, and AI to do a lot of the work. ([1])

  • Unusually detailed slides on Twitter's architecture. Really surprising that they just use MySQL in a very simple way and didn't even partition at first. ([1])

  • Impressive demos; I didn't know these were possible so easily and fluidly using just SVG and Javascript ([1] [2])

Monday, July 11, 2011

Google and suggesting friends

A timely paper out of Google at the recent ICML 2011 conference, "Suggesting (More) Friends Using the Implicit Social Graph" (PDF), not only describes the technology behind GMail's fun "Don't forget Bob!" and "Got the wrong Bob?" features, but also may be part of the friend suggestions in Google+ Circles.

An excerpt from the paper:
We use the implicit social graph to identify clusters of contacts who form groups that are meaningful and useful to each user.

The Google Mail implicit social graph is composed of billions of distinct nodes, where each node is an email address. Edges are formed by the sending and receiving of email messages ... A message sent from a user to a group of several contacts ... [is] a single edge ... [of] a directed hypergraph. We call the hypergraph composed of all the edges leading into or out of a single user node that user's egocentric network.

The weight of an edge is determined by the recency and frequency of email interactions .... Interactions that the user initiates are [considered] more significant .... We are actively working on incorporating other signals of importance, such as the percentage of emails from a contact that the user chooses to read.

"Don't forget Bob" ... [suggests] recipients that the user may wish to add to the email .... The results ... are very good - the ratio between the number of accepted suggestions and the number of times a suggestion was shown is above 0.8. Moreover, this precision comes at a good coverage ... more than half of email messages.

"Got the wrong Bob" ... [detects] inclusion of contacts in a message who are unlikely to be related to the other recipients .... Almost 70% of the time [it is shown] ... users accept both suggestions, deleting the wrong Bob and adding the correct one.
I like the idea of using e-mail, mobile, and messaging contacts as an implicit social network. One problem has always been that the implicit social network can be noisy in embarrassing ways. As this paper discusses, using it only for suggesting friends is forgiving and low-risk while still being quite helpful. Another possible application might be to make it easier to share content with people who might be interested.
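
For a sense of how lightweight this can be, here is a sketch of the kind of scoring the paper describes: look at past messages that share recipients with the email being composed, weight them by recency and by whether the user initiated them, and suggest the contacts who co-occur most strongly. The half-life, the outgoing-mail boost, and the data layout are illustrative assumptions of mine, not GMail's.

    import time
    from collections import defaultdict

    HALF_LIFE_DAYS = 30.0  # assumed recency decay, not a value from the paper

    def dont_forget_bob(messages, current_recipients, k=3, now=None):
        """Suggest contacts to add to an email being composed.

        messages: list of (timestamp, initiated_by_user, recipients) tuples
        drawn from the user's egocentric network (mail sent and received).
        """
        now = now or time.time()
        current = set(current_recipients)
        scores = defaultdict(float)
        for ts, initiated, recipients in messages:
            group = set(recipients)
            overlap = len(group & current)
            if overlap == 0:
                continue  # this past message shares no one with the draft
            recency = 0.5 ** (((now - ts) / 86400.0) / HALF_LIFE_DAYS)
            direction = 2.0 if initiated else 1.0  # outgoing mail counts more
            for contact in group - current:
                scores[contact] += overlap * recency * direction
        return sorted(scores, key=scores.get, reverse=True)[:k]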

For more on what Google does with how you use e-mail to make useful features, you might also be interested in another Google paper, "The Learning Behind Gmail Priority Inbox" (PDF).

For more on implicit social networks using e-mail contacts, please see my 2008 post, "E-mail as the social network".

Thursday, June 09, 2011

Quick links

Some of what has caught my attention recently:
  • Oldest example I could find of the "PC is dead" claim in the press, a New York Times article from 1992. If people keep making this prediction for a few more decades, eventually it might be right. ([1])

  • Amazon CEO Jeff Bezos says to innovate, you have to try many things, fail but keep trying, and be "willing to be misunderstood for long periods of time". ([1])

  • Median tenure at Amazon and Facebook is a year or less (in part due to their massive recent hiring). Also, most people at Facebook have never worked anywhere other than Facebook. ([1])

  • Spooky research out of UW CS and Google that crowdsources surveillance, finding all the Flickr photos from a big event like a concert that happen to include a specific person (no matter at what angle or from what location the crowd of people at the event took the pictures). ([1])

  • You can scan someone's fingerprints from 6 feet away and copy their keys from 200 feet away. ([1] [2])

  • Pretty impressive valuations incubator Y Combinator is getting on its startups: "The combined value of the top 21 companies is $4.7 billion." ([1])

  • But even for some of the more attractive small startups to acquire, those out of Y Combinator, odds of acquisition still are only about 8%, and most of those will be relatively low-valuation talent acquisitions. Sometimes it can seem like everyone is getting bought, but it is only a fortunate few who have the right combination of product, team, timing, luck, and network. ([1])

  • Someone going solidly for the dumbphone market, which is by far the biggest market still, with a snazzy but simple and mostly dumb phone. That's smart. ([1] [2])

  • Google Scribe makes suggestions for what you are going to type next when you are writing documents. Try starting with "All work and" ([1]).

  • When I started my blog and called it "Geeking with Greg", the word "geek" still had pretty negative connotations, especially in the mainstream. A decade later, things have changed. ([1])

  • Not surprising people don't use privacy tools since the payoff is abstract and the tools require work for the average user to understand and use. What surprises me more is that more people don't use advertising blocking tools like AdBlock. ([1])

  • The sad story of why Google never launched GDrive. ([1])

  • Carriers are going to be upset about Apple's plans to disrupt text messaging. Those overpriced plans are a big business for carriers. ([1])

  • It would be great if the Skype acquisition were part of a plan to disrupt the mobile industry by launching a mobile phone that always picks the lowest cost data network (including free WiFi networks) available. Consumers would love that; it could lower their monthly bills by an order of magnitude. ([1] [2])

  • Social data is of limited use in web search because there isn't much data from your friends. Moreover, the best information about what is a good website for you almost certainly comes from people like you who you might not even know, not from the divergent tastes of your small group of friends. As Chris Anderson (author of The Long Tail) said, "No matter who you are, someone you don't know has found the coolest stuff." ([1] [2])

  • Customization (aka active personalization) is too much work. Most people won't do it. If you optimize for the early adopter tinkerer geeks who love twiddling knobs, you're designing a product that the mainstream will never use. ([1])

  • If you launch a feature that just makes your product more complicated and confusing to most customers, you would have been better off doing nothing at all. Success is not launching things, but launching things that help customers. ([1])

  • Google News shifts away from clustering and toward personalization. ([1] [2])

  • Crowdsourcing often works better when unpaid ([1])

  • Eli Pariser is still wrong. ([1])

Monday, June 06, 2011

Continuous profiling at Google

"Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers" (PDF) has some fascinating details on how Google does profiling and looks for performance problems.

From the paper:
GWP collects daily profiles from several thousand applications running on thousands of servers .... At any moment, profiling occurs only on a small subset of all machines in the fleet, and event-based sampling is used at the machine level .... The system has been actively profiling nearly all machines at Google for several years.

Application owners won't tolerate latency degradations of more than a few percent .... We measure the event-based profiling overhead ... to ensure the overhead is always less than a few percent. The aggregated profiling overhead is negligible -- less than 0.01 percent.

GWP profiles revealed that the zlib library accounted for nearly 5 percent of all CPU cycles consumed ... [which] motivated an effort to ... evaluate compression alternatives ... Given the Google fleet's scale, a single percent improvement on a core routine could potentially save significant money per year. Unsurprisingly, the new informal metric, "dollar amount per performance change," has become popular among Google engineers.

GWP profiles provide performance insights for cloud applications. Users can see how cloud applications are actually consuming machine resources and how the picture evolves over time ... Infrastructure teams can see the big picture of how their software stacks are being used ... Always-on profiling ... collects a representative sample of ... [performance] over time. Application developers often are surprised ... when browsing GWP results ... [and find problems] they couldn't have easily located without the aggregated GWP results.

Although application developers already mapped major applications to their best [hardware] through manual assignment, we've measured 10 to 15 percent potential improvements in most cases. Similarly ... GWP data ... [can] identify how to colocate multiple applications on a single machine [optimally].
One thing I love about this work is how measurement provided visibility and motivated people. Just by making it easy for everyone to see how much money could be saved by making code changes, engineers started aggressively going after high value optimizations and measuring themselves on "dollar amount per performance change".
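
The "dollar amount per performance change" metric is just arithmetic over the aggregated profile, but it is worth seeing how small that arithmetic is. A back-of-the-envelope sketch; the fleet cost figure and the profile numbers other than the zlib share are made-up placeholders, not data from the paper.

    # Hypothetical annual fleet CPU spend, purely for illustration.
    FLEET_CPU_COST_PER_YEAR = 100_000_000.0  # dollars

    # Aggregated GWP-style profile: fraction of fleet-wide CPU cycles per routine.
    profile = {
        "zlib": 0.05,            # the paper's "nearly 5 percent" example
        "protobuf_parse": 0.03,  # made up
        "hash_lookup": 0.01,     # made up
    }

    def annual_value_of_speedup(routine, relative_speedup):
        """Estimated yearly savings from making `routine` relative_speedup faster."""
        return FLEET_CPU_COST_PER_YEAR * profile[routine] * relative_speedup

    # Shaving 20% off zlib's cost is worth about $1M/year under these assumptions.
    print(annual_value_of_speedup("zlib", 0.20))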

For more color on some of the impressive performance work done at Google, please see my earlier post, "Jeff Dean keynote at WSDM 2009".

Wednesday, May 18, 2011

Eli Pariser is wrong

In recent interviews and in his new book, "The Filter Bubble", Eli Pariser claims that personalization limits serendipity and discovery.

For example, in one interview, Eli says, "Basically, instead of doing what great media does, which is push us out of our comfort zone at times and show us things that we wouldn't expect to like, wouldn't expect to want to see, [personalization is] showing us sort of this very narrowly constructed zone of what is most relevant to you." In another, he claims personalization creates a "distorted view of the world. Hearing your own views and ideas reflected back is comfortable, but it can lead to really bad decisions--you need to see the whole picture to make good decisions."

Eli has a fundamental misunderstanding of what personalization is, leading him to the wrong conclusion. The goal of personalization and recommendations is discovery. Recommendations help people find things they would have difficulty finding on their own.

If you know about something already, you use search to find it. If you don't know something exists, you can't search for it. And that is where recommendations and personalization come in. Recommendations and personalization enhance serendipity by surfacing useful things you might not know about.

That is the goal of Amazon's product recommendations, to help you discover things you did not know about in Amazon's store. It is like a knowledgeable clerk who walks you through the store, highlighting things you didn't know about, helping you find new things you might enjoy. Recommendations enhance discovery and provide serendipity.

It was also the goal of Findory's news recommendations. Findory explicitly sought out news you would not know about, news from a variety of viewpoints. In fact, one of the most common customer service complaints at Findory was that there was too much diversity of views, that people wanted to eliminate viewpoints that they disagreed with, viewpoints that pushed them out of their comfort zone.

Eli's confusion about personalization comes from a misunderstanding of its purpose. He talks about personalization as narrowing and filtering. But that is not what personalization does. Personalization seeks to enhance discovery, to help you find novel and interesting things. It does not seek to just show you the same things you could have found on your own.

Eli's proposed solution is more control. But, as Eli himself says, control is part of the problem: "People have always sought [out] news that fits their own views." Personalization and recommendations work to expand this bubble that people try to put themselves in, to help them see news they would not look at on their own.

Recommendations and personalization exist to enhance discovery. They improve serendipity. If you just want people to find things they already know about, use search or let them filter things themselves. If you want people to discover new things, use recommendations and personalization.

Update: Eli Pariser says he will respond to my critique. I will link to it when he does.

Friday, May 13, 2011

Taking small steps toward personalized search

Some very useful lessons in this work from the recent WSDM 2011 conference, "Personalizing Web Search using Long Term Browsing History" (PDF).

First, they focused on a simple and low risk approach to personalization, reordering results below the first few. There are a lot of what are essentially ties in the ranking of results after the first 1-2 results; the ranker cannot tell the difference between the results and is ordering them arbitrarily. Targeting the results the ranker cannot differentiate is not only low risk, but more likely to yield easy improvements.

Second, they did a large scale online evaluation of their personalization approach using click data as the judgment of quality. That's pretty rare but important, especially for personalized search where some random offline human judge is unlikely to know the original searcher's intent.

Third, their goal was not to be perfect, but just help more often than hurt. And, in fact, that is what they did, with the best performing algorithm "improving 2.7 times more queries than it harms".

I think those are good lessons for others working on personalized search or even personalization in general. You can take baby steps toward personalization. You can start with minor reordering of pages. You can make low risk changes lower down to the page or only when the results are otherwise tied for quality. As you get more aggressive, with each step, you can verify that each step does more good than harm.
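
Here is a sketch of that low-risk flavor of reranking: leave the top results alone and only reorder runs of results whose ranker scores are effectively tied, breaking the ties with a score from the searcher's long-term history. The tie threshold, the number of protected top results, and the history scoring are illustrative assumptions, not the method from the paper.

    def personalize_ties(results, history_score, keep_top=2, tie_eps=0.01):
        """Reorder only where the ranker cannot tell results apart.

        results: list of (url, ranker_score), sorted by ranker_score descending.
        history_score: function url -> float from the user's long-term history.
        """
        reranked = list(results[:keep_top])  # never touch the top results
        rest = list(results[keep_top:])
        i = 0
        while i < len(rest):
            # Gather a run of results scored within tie_eps of each other.
            j = i + 1
            while j < len(rest) and rest[i][1] - rest[j][1] <= tie_eps:
                j += 1
            tied = rest[i:j]
            # Within the tied run, prefer pages this user has engaged with before.
            tied.sort(key=lambda r: history_score(r[0]), reverse=True)
            reranked.extend(tied)
            i = j
        return reranked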

One thing I don't like about the paper is that they only investigated using long-term history. There is a lot of evidence (e.g. [1] [2]) that very recent history, your last couple searches and clicks, can be important, since they may show frustration in an attempt to satisfy some task. But otherwise great lessons in this work out of Microsoft Research.

Monday, May 09, 2011

Quick links

Some of what has caught my attention recently:
  • Apple captured "a remarkable 50% value share of estimated Q1/11 handset industry operating profits among the top 8 OEMs with only 4.9% global handset unit market share." ([1]). The iPhone generates 50% of Apple's revenue and even more of their profits. To a large extent, the company is the iPhone company. ([2] [3]) But, Gartner predicts iPhone market share will peak in 2011. ([4])

  • Researchers find bugs in payment systems, order free stuff from Buy.com and JR.com. Disturbing that, when they contacted Buy.com to report the problem, Buy.com's accounting systems had the invoice as fully paid even though they never received the cash. ([1])

  • Eric Schmidt says, "The story of innovation has not changed. It has always been a small team of people who have a new idea, typically not understood by people around them and their executives." ([1])

  • Netflix randomly kills machines in its cluster all the time, just to make sure Netflix won't go down when something real kills their machines. Best part, they call this "The Chaos Monkey". ([1] [2])

  • Hello, Amazon, could I borrow 1,250 of your computers for 8 hours? ([1])

  • Felix Salmon says, "Eventually ... ad-serving algorithms will stop being dumb things based on keyword searches, and will start being able to construct a much more well-rounded idea of who we are and what kind of advertising we're likely to be interested in. At that point ... they probably won't feel nearly as creepy or intrusive as they do now. But for the time being, a lot of people are going to continue to get freaked out by these ads, and are going to think that the answer is greater 'online privacy'. When I'm not really convinced that's the problem at all." ([1])

  • Not sure which part of this story I'm more amazed by, that Google offered $10B for Twitter or that Twitter rejected $10B as not enough. ([1])

  • Apple may be crowdsourcing maps using GPS trail data. GPS trails can also be used for local recommendations, route planning, personalized recommendations, and highly targeted deals, coupons, and ads. ([1] [2] [3])

  • Management reorg at Google. Looks like it knocks back the influence of the PMs to me, but your interpretation may differ. ([1] [2])

  • If you use Google Chrome and go to google.com, you're using SPDY to talk to Google's web servers, not HTTP. Aggressive of Google and very cool. ([1] [2])

  • Shopping search engines (like product and travel) should look for good deals in their databases and then help people find good deals ([1])

  • When Apple's MobileMe execs started talking about what the poorly reviewed MobileMe was really supposed to do, Steve Jobs demanded, "So why the f*** doesn't it do that?", then dismissed the executives in charge and appointed new MobileMe leaders. ([1] [2])

Friday, May 06, 2011

The value of Google Maps directions logs

Ooo, this one is important. A clever and very fun paper, "Hyper-Local, Direction-Based Ranking of Places" (PDF), will be presented at VLDB 2011 later this year by a few Googlers.

The core idea is that, when people ask for directions from A to B, it shows that people are interested in B, especially if they happen to be at or near A.

Now, certain very large search engines have massive logs of people asking for directions from A to B, hundreds of millions of people and billions of A to B queries. And, it appears this data may be as useful as or more useful than user reviews of businesses, and maybe GPS trails, for local search ranking, recommending nearby places, and perhaps local and personalized deals and advertising.

From the paper:
A query that asks for directions from a location A to location B is taken to suggest that a user is interested in traveling to B and thus is a vote that location B is interesting. Such user-generated direction queries are particularly interesting because they are numerous and contain precise locations.

Direction queries [can] be exploited for ranking of places ... At least 20% of web queries have local intent ... [and mobile] may be twice as high.

[Our] study shows that driving direction logs can serve as a strong signal, on par with reviews, for place ranking ... These findings are important because driving direction logs are orders of magnitude more frequent than user reviews, which are expensive to obtain. Further, the logs provide near real-time evidence of changing sentiment ... and are available for broader types of locations.
What is really cool is that, not only is this data easier and cheaper to obtain than customer reviews, but also there is so much more of it that the ranking is more timely (if, for example, ownership changes or a place closes) and coverage is much more complete.
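
A toy version of the core signal looks something like the following: every directions query from A to B is a vote for B, with queries whose origin is already near the destination counted a bit more as evidence of local interest. The distance weighting and radius are my own simplification, not the ranking function from the paper.

    import math
    from collections import defaultdict

    def haversine_km(a, b):
        # Great-circle distance between two (lat, lon) points, in kilometers.
        lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
        h = (math.sin((lat2 - lat1) / 2) ** 2
             + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
        return 2 * 6371.0 * math.asin(math.sqrt(h))

    def rank_places(direction_queries, local_radius_km=25.0):
        """direction_queries: list of (origin_latlon, place_id, destination_latlon)."""
        votes = defaultdict(float)
        for origin, place_id, destination in direction_queries:
            # Every query is a vote; nearby origins count extra as local interest.
            weight = 2.0 if haversine_km(origin, destination) <= local_radius_km else 1.0
            votes[place_id] += weight
        return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)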

I find it a little surprising that Google hasn't already been using this data heavily. In fact, the paper suggests that Google is only beginning to start using it. At the end of the paper, the authors write that they hope to investigate what types of queries benefit the most from this data and then look at personalizing the ranking based on each person's specific search and location history.

Monday, April 25, 2011

Resurgence of interest in personalized information

There has been a lot of news about personalization and recommendations of information in the last week.

Google News launched additional implicit personalization of news based on your clickstream, bringing it yet another step closer to the Findory of seven years ago.

Yahoo reversed an earlier policy to keep data only for 90 days, now upping it to three years, because tossing data away was hurting their efforts to improve relevance and personalize their search, news, and advertising.

Hank Nothhaft writes at Mashable that the Facebook News Feed really needs to be personalized implicitly based on what you consume and should surface news from across the web. He says it should not only deliver "the best and most relevant news and blog posts on my favorite topics and interests, but it also recommends deals and product information, things to do and even media like videos and podcasts, all picked just for me" (which, if implemented as described, might also make Facebook News Feed similar to Findory).

Finally, Mark Milian at CNN points to all the efforts at newspapers and startups on personalized news. What strikes me about these is how few focus on advertising, which is the core business of newspapers, and improving the usefulness and revenue of ads on news sites. Former Google CEO Eric Schmidt had some useful thoughts on that some time ago that are still worth reading for those working on personalizing information today.

Wednesday, March 30, 2011

Latest reading

Here are a few of the articles that have caught my attention recently:
  • Google tries again in social, this time focusing on the common trend of having sharing buttons distributed across the Web. ([1])

  • Good thoughts on building robust websites. I particularly like the assumption that off and slow are perfectly valid states in underlying services and should be expected by all other services. ([1])

  • A Yahoo data center based on chicken coop designs (along with relaxing assumptions about maximum tolerable peak temperature) yields big energy savings. ([1])

  • Cool paper from Googlers coming out later this year at VLDB that talks about how useful direction searches (like on Google Maps) are for finding places of interest and places people might want to go. Similar to the GPS trail data work (using data from GPS-enabled cell phones), but using search logs instead. ([1])

  • I really like this vision for what Google should do for social, that they should go after personal search. This is closer to the idea of an external memory, Memex, and the promise of the languishing Google Desktop than what Facebook is today, but it is a problem many people have, want solved, and one that Google is better able to solve than anyone else (except maybe Microsoft). ([1])

  • Surprising results in this paper out of Yahoo Research showing queries that tend to be unique to particular demographic groups (like men/women, racial groups, age groups, etc.). Jump right to Table 3 (about halfway through the paper) to see it. ([1])

  • Over at blog@CACM, I outrageously claim that both netbooks and tablets are doomed. ([1] [2])

  • Randall Munroe (author of xkcd) is brilliant. Again. ([1] [2])

  • Sounds like YouTube wants to build millions of TV channels designed to match any mood, interest, or person. ([1])

Wednesday, March 09, 2011

Personal navigation and re-finding

Jaime Teevan, Dan Liebling, and Gayathri Geetha from Microsoft Research had a fun paper at WSDM 2011, "Understanding and Predicting Personal Navigation", that focuses on a simple, highly accurate, easy, and low risk approach to personalization, increasing the rank of a result that a person keeps clicking on.

The basic idea is noticing that people tend to use search engines instead of bookmarks, just searching again to re-find what they found in the past. But -- and this is the key insight -- not everyone uses the same query to bookmark the same page, so, for example, one person might use [lottery] to get to the Michigan lottery, another to get to the Illinois lottery, and only a minority use it to get to the top ranked result, lottery.com.

So, keeping track of what individual searchers want when they repeat queries, then giving each searcher back what they want is an easy form of personalization that can actually make a significant difference. Moreover, supporting this kind of re-finding is a baby step toward fully personalized search results (and requires the same first steps to build the underlying infrastructure to support it).

Some excerpts from the paper:
This paper presents an algorithm that predicts with very high accuracy which Web search result a user will click for one sixth of all Web queries. Prediction is done via a straightforward form of personalization that takes advantage of the fact that people often use search engines to re-find previously viewed resources.

Different people often use the same queries to navigate to different resources. This is true even for queries comprised of unambiguous company names or URLs and typically thought of as navigational.

For example, the reader of this paper may use a search engine to navigate to the WSDM 2011 homepage via the query [wsdm], while a person interested in country music in the Midwest may use the same query to navigate to the WSDM-FM radio station homepage. Others may ... issue it with an informational intent to learn more about Web Services Distributed Management .... [Likewise], on the surface it appears obvious that the query [real estate.com] is intended to navigate to the site http://www.realestate.com. However, for only five of the 23 times that query is used for personal navigation does the query lead to a click on the obvious target. Instead, it is much more likely to be used to navigate to http://realestate.msn.com or http://www.realtor.com.

Personal navigation presents a real opportunity for search engines to take a first step into safe, low-risk Web search personalization ... Here we look at how to capture the low-hanging fruit of personalizing results for repeat queries ... There is the potential to significantly benefit users with the identification of these queries, as the identified targets are more likely to be ranked low in the result list than typical clicked search results.
Table 4 in the paper definitely is worth a look. Note that, using a month of data, nearly 10% of queries are personal navigation queries that can be personalized with high accuracy. In addition, on another 5% of queries "when the general navigation queries trigger as personal navigation", "the prediction is over 20% more accurate than when predicted based on aggregate behavior alone." That's a big impact for such a simple step toward personalization, low-hanging fruit indeed.
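
The mechanism itself is small enough to sketch: keep a per-user, per-query history of clicked results and, when the same user repeats the same query after having gone to one URL nearly every time before, promote that URL. The repeat and consistency thresholds below are illustrative assumptions, not the exact values from the paper.

    from collections import Counter, defaultdict

    class PersonalNavigation:
        def __init__(self, min_repeats=2, min_share=0.95):
            self.clicks = defaultdict(Counter)  # (user, query) -> Counter of urls
            self.min_repeats = min_repeats      # assumed thresholds, not the paper's
            self.min_share = min_share

        def record_click(self, user, query, url):
            self.clicks[(user, query)][url] += 1

        def predicted_target(self, user, query):
            history = self.clicks[(user, query)]
            total = sum(history.values())
            if total < self.min_repeats:
                return None
            url, count = history.most_common(1)[0]
            # Only predict when this user's clicks for this query are consistent.
            return url if count / total >= self.min_share else None

        def rerank(self, user, query, results):
            # Move the predicted personal target to the top; otherwise do nothing.
            target = self.predicted_target(user, query)
            if target in results:
                results = [target] + [r for r in results if r != target]
            return results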

Please see also my older posts, "Designing search for re-finding", "To personalize or not to personalize", and "People often repeat web searches", about papers by some of the same authors.

Thursday, February 17, 2011

What I have been reading lately

Here are a few of the articles that have caught my attention recently:
  • Googler and AI guru Peter Norvig on what artificial intelligence has been able to do and what was a surprise ([1])

  • Watson's performance in Jeopardy was impressive, but even Google and Bing get the right answers to Jeopardy questions more often than the average human ([1])

  • The Google Translate app is downright amazing. Almost a babel fish in my ear, just remarkable. ([1])

  • Kicking off a project to rewrite code is a really bad idea, but people want to do it all the time ([1] [2])

  • Excellent answer on why Dropbox succeeded ([1])

  • Dilbert on the morality of using web click data. Ouch! ([1])

  • Old but good Slate article on why we don't have ubiquitous free Wifi. Makes me sad. ([1])

  • Enjoyable rant against stealth mode startups ([1])

  • Internet investment funds and unprofitable companies going public? The dot-com bubble is back, baby! ([1] [2])
More in my shared items microblogging stream or, better yet, if you use Google Reader, search for me and follow my shared items there.

Wednesday, February 16, 2011

Comparing Google Megastore

A small pile of Googlers recently presented a paper, "Megastore: Providing Scalable, Highly Available Storage for Interactive Services" (PDF) that details the architecture of a major distributed database used at Google.

Megastore "has been widely deployed within Google for several years ... handles more than three billion write and 20 billion read transitions daily ... [and] stores nearly a petabyte ... across many datacenters."

Others have already summarized the paper ([1] [2]), so I'll focus on my reaction to it. What I found surprising about Megastore, especially when comparing to other large scale databases, is that it favors consistency over performance.

For consistency, Megastore provides "full ACID semantics within partitions", "supports two-phase commit across entity groups", guarantees that reads always see the last write, uses Paxos for confirming consensus among replicas, and uses distributed locks between "coordinator processes" as part of detecting failures. This is all unusually strong compared to the more casual eventual consistency offered by databases like Amazon's Dynamo, Yahoo's HBase, and Facebook's Cassandra.

The problem with providing Megastore's level of consistency is performance. The paper mostly describes Megastore's performance in sunny terms, but, when you look at the details, it does not compare favorably with other databases. Megastore has "average read latencies of tens of milliseconds" and "average write latencies of 100-400 milliseconds". In addition, Megastore has a limit of "a few writes per second per entity group" because higher write rates will cause conflicts, retries, and even worse performance.

By comparison, Facebook expects 4ms reads and 5ms writes on their database, so Megastore is an order of magnitude or two slower than what Facebook developers appear to be willing to tolerate.

Google application developers seem to find the latency to be a hassle as well. The paper says that Googlers find the latency "tolerable" but often have to "hide write latency from users" and "choose entity groups carefully".
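
As a small illustration of what "hide write latency from users" can look like in application code, here is a generic write-behind pattern: update a local optimistic view immediately and let the slow, strongly consistent write complete in the background. This is a common idiom sketched under my own assumptions, not Megastore's API or Google's actual approach.

    from concurrent.futures import ThreadPoolExecutor

    class WriteBehindStore:
        """Hide a slow (100-400 ms) durable write behind an optimistic local view."""

        def __init__(self, durable_put, workers=4):
            self.durable_put = durable_put  # slow, strongly consistent write call
            self.local = {}                 # what the user sees immediately
            self.pool = ThreadPoolExecutor(max_workers=workers)

        def put(self, key, value):
            self.local[key] = value         # caller sees the write instantly
            # The durable write proceeds in the background; the returned future
            # can be checked later to surface failures to the user.
            return self.pool.submit(self.durable_put, key, value)

        def get(self, key, durable_get=None):
            if key in self.local:
                return self.local[key]
            return durable_get(key) if durable_get else None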

This makes me wonder if Google has made the right tradeoff here. Is it really easier for application developers to deal with high latency all of the time than to almost always have low latency but have to worry more about consistency issues? Most large scale databases have made the choice the other way. Quite surprising.

Update: Googler DeWitt Clinton writes with a good point: "We build on top of Megastore when we require some of those characteristics (availability, consistency), and to Bigtable directly when we require low latency and high throughput instead. So it's up to the application to decide what tradeoffs to make, definitely not one-size-fits-all."

Thursday, February 03, 2011

Google, Bing, and web browsing data

I suppose I should comment, as everyone else on the planet has, on Google's claim that Bing is copying their results.

My reaction is mostly one of surprise. I am surprised that Google wants this issue discussed in the press. I am surprised that Google wants this aired in the court of public opinion.

Google is trying to draw a line on what use of behavior data is acceptable. Google clearly thinks they are on the right side of that line, and I do too, but I'm not sure the average searcher would agree. And that is why Google is playing a dangerous game here, one that could backfire on them badly.

Let's take a look at what Google Fellow Amit Singhal said:
This experiment confirms our suspicion that Bing is using some combination of:
  • Internet Explorer 8, which can send data to Microsoft via its Suggested Sites feature
  • the Bing Toolbar, which can send data via Microsoft’s Customer Experience Improvement Program
or possibly some other means to send data to Bing on what people search for on Google and the Google search results they click.
Of course, what Amit does not mention here is that the widely installed Google Toolbar and the fairly popular Google Chrome web browser send very similar data back to Google, data about every page someone visits and every click they make. Moreover, Google tracks almost every web search and every click after a web search made by web users around the world, since almost every web search is done on Google.

By raising this issue, Google very publicly is trying to draw a particular line on how toolbar and web browsing data should be used, and that may be a dangerous thing for Google to do. The average searcher, for example, may want that line drawn somewhere other than where Google might expect it to be drawn -- they may want it drawn at not using any toolbar/Chrome data for any purposes, or even not using any kind of behavior data at all -- and, if that line is drawn somewhere other than where Google wants it, Google could be hurt badly. That is why I am surprised that Google is coming out so strong here.

As for the particular issue of whether this is copying or not, I don't have much to say on that, but I think the most thought-provoking piece I have seen related to that question is John Langford's post, "User preferences for search engines". John argues that searchers own their browsing behavior and can reveal what they do across the web to whoever they want to. Whether you agree or not with that, it is worth reading John's thoughts on it and considering what you think might be the alternative.

Update: In the comments, Peter Kasting, a Google engineer working on Chrome, denies that Chrome sends clickstream data (the URLs you visit) back to Google. A check of the Google Chrome privacy policy ([1] [2]) appears to confirm that. I apologize for getting that wrong and have corrected the post above.

Update: A few days later, search industry watcher Danny Sullivan writes, "In short, Google doesn’t occupy any higher ground than Microsoft, from what I can see, when it comes to using data gathered from browser add-ons to improve its own services, including its search engine." Whether you think Danny is right or not, his article demonstrates that Google was wrong in thinking they would easily win if they took this fight to the press.

Tuesday, February 01, 2011

YouTube uses Amazon's recommendation algorithm

In a paper at the recent RecSys 2010 conference, "The YouTube Video Recommendation System" (ACM), eleven Googlers describe the system behind YouTube's recommendations and personalization in detail.

The most interesting disclosure in the paper is that YouTube has switched from their old recommendation algorithm based on random walks to a new one based on item-to-item collaborative filtering. Item-to-item collaborative filtering is the algorithm Amazon developed back in 1998. Over a decade later, it appears YouTube found a variation of Amazon's algorithm to be the best for their video recommendations.

Other notable tidbits in the paper are what the Googlers have to do to deal with noisy information (noisy video metadata and user preference data), the importance of freshness for videos (much like news), that they primarily used online measures of user satisfaction (like CTR and session length) when competing different recommendation algorithms against each other and tuning each algorithm, and the overall improvement (about 3x) that recommendations achieved over simply showing popular content.

Some excerpts from the paper:
Recommending interesting and personally relevant videos to [YouTube] users [is] a unique challenge: Videos as they are uploaded by users often have no or very poor metadata. The video corpus size is roughly on the same order of magnitude as the number of active users. Furthermore, videos on YouTube are mostly short form (under 10 minutes in length). User interactions are thus relatively short and noisy ... [unlike] Netflix or Amazon where renting a movie or purchasing an item are very clear declarations of intent. In addition, many of the interesting videos on YouTube have a short life cycle going from upload to viral in the order of days requiring constant freshness of recommendation.

To compute personalized recommendations we combine the related videos association rules with a user's personal activity on the site: This can include both videos that were watched (potentially beyond a certain threshold), as well as videos that were explicitly favorited, “liked”, rated, or added to playlists ... Recommendations ... [are the] related videos ... for each video ... [the user has watched or liked after they are] ranked by ... video quality ... user's unique taste and preferences ... [and filtered] to further increase diversity.

To evaluate recommendation quality we use a combination of different metrics. The primary metrics we consider include click through rate (CTR), long CTR (only counting clicks that led to watches of a substantial fraction of the video), session length, time until first long watch, and recommendation coverage (the fraction of logged in users with recommendations). We use these metrics to both track performance of the system at an ongoing basis as well as for evaluating system changes on live traffic.

Recommendations account for about 60% of all video clicks from the home page ... Co-visitation based recommendation performs at 207% of the baseline Most Viewed page ... [and more than 207% better than] Top Favorited and Top Rated [videos].
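
To make the evaluation metrics quoted above concrete, here is a small, hypothetical sketch of how CTR, long CTR, and recommendation coverage might be computed from impression logs. The field names, the log format, and the threshold for a "substantial fraction" of a video are my assumptions, not details from the paper.

    # Hypothetical impression log; field names and format are illustrative only.
    # watched_fraction is the fraction of the video watched after a click (None if no click).
    impressions = [
        {"user": "u1", "clicked": True,  "watched_fraction": 0.8},
        {"user": "u1", "clicked": False, "watched_fraction": None},
        {"user": "u2", "clicked": True,  "watched_fraction": 0.1},
        {"user": "u3", "clicked": False, "watched_fraction": None},
    ]
    logged_in_users = {"u1", "u2", "u3", "u4"}

    LONG_WATCH = 0.5  # assumed threshold for a "substantial fraction" of the video

    clicks = sum(1 for i in impressions if i["clicked"])
    long_clicks = sum(1 for i in impressions
                      if i["clicked"] and i["watched_fraction"] >= LONG_WATCH)

    ctr = clicks / len(impressions)
    long_ctr = long_clicks / len(impressions)

    # Coverage: fraction of logged-in users who were shown any recommendation.
    users_with_recs = {i["user"] for i in impressions}
    coverage = len(users_with_recs & logged_in_users) / len(logged_in_users)

    print(f"CTR={ctr:.2f}  long CTR={long_ctr:.2f}  coverage={coverage:.2f}")
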
For more on the general topic of recommendations and personalization on YouTube, please see my 2009 post, "YouTube needs to entertain".

By the way, it would have been nice if the Googlers had cited the Amazon paper on item-to-item collaborative filtering. Seems like a rather silly mistake in an otherwise excellent paper.

Update: To be clear, this was not intended as an attack on Google in any way. Googlers built on previous work, as they should. What is notable here is that, despite another decade of research on recommender systems, despite all the work in the Netflix Prize, YouTube found that a variant of the old item-to-item collaborative filtering algorithm beat out all others for recommending YouTube videos. That is a very interesting result and one that validates the strengths of that old algorithm.

Wednesday, January 26, 2011

Latest reading

A few things that have caught my attention lately:
  • Google generates $24 per user, Yahoo $8, and Facebook $4. ([1])

  • Paul Graham's most important traits in founders. I'd even say they are in priority order. ([1])

  • Groupon competitors suddenly are everywhere. Not surprising: there's no technology barrier here, and it is easy to attract people if you offer 50% off at a popular store. But I wonder if this booming fad is sustainable. Particularly concerning is that only 22% of people who use a Groupon offer return to the merchant, which, combined with the steep discounts, makes this look expensive. ([1] [2] [3] [4])

  • Google is accused of copying a bunch of someone else's code. And this isn't the first time. Back in 2005, Google was accused of copying much of the code for Orkut. ([1] [2]).

  • Great, updated, and free book chapters on how people spam search engines. I particularly like the new discussion of spamming click data and social sites. ([1])

  • A very dirty little secret: most AOL subscribers don't realize they don't need to be paying, and the revenue from hoodwinking those people is a big part of AOL's profits. ([1])

  • Best article I've seen on why Google CEO Eric Schmidt is stepping down ([1])

  • With this latest move, Amazon closes the gap between their low level cloud services (microleasing of servers and storage) and the higher level offerings of Google App Engine and Microsoft Azure. ([1])

  • The trends of abandoning TV and embracing smartphones have been vastly overstated. Most Americans watch more TV than ever and use dumbphones. ([1] [2]).

  • A claim that Facebook has resorted to scammy malware-type advertisements in an effort to drive revenue. ([1]).

Thursday, January 06, 2011

The experiment infrastructure at Google

Papers that reveal details of Google's internal systems are always fun. At KDD 2010 a few months ago, four Googlers presented "Overlapping Experiment Infrastructure: More, Better, Faster Experimentation" (PDF).

The paper describes Google's tools for handling the challenging task of running many experiments simultaneously and includes tidbits on how they launch new features. Some excerpts:
We want to be able to experiment with as many ideas as possible .... It should be easy and quick to set up an experiment ... Metrics should be available quickly so that experiments can be evaluated quickly. Simple iterations should be quick ... The system should ... support ... gradually ramping up a change to all traffic in a controlled way.

[Our] solution is a multi-factorial system where ... a request would be in N simultaneous experiments ... [and] each experiment would modify a different parameter. Our main idea is to partition parameters into N subsets. Each subset is associated with a layer of experiments. Each request would be in at most N experiments simultaneously (one experiment per layer). Each experiment can only modify parameters associated with its layer (i.e., in that subset), and the same parameter cannot be associated with multiple layers ... [We] partition the parameters ... [by] different binaries ... [and] within a binary either by examination (i.e., understanding which parameters cannot be varied independently of one another) or by examining past experiments (i.e., empirically seeing which parameters were modified together in previous experiments).

Given this infrastructure, the process of evaluating and launching a typical feature might be something like: Implement the new feature in the appropriate binary (including code review, binary push, setting the default values, etc) ... Create a canary experiment (pushed via a data push) to ensure that the feature is working properly ... Create an experiment or set of experiments (pushed via a data push) to evaluate the feature ... Evaluate the metrics from the experiment. Depending on the results, additional iteration may be required, either by modifying or creating new experiments, or even potentially by adding new code to change the feature more fundamentally ... If the feature is deemed launchable, go through the launch process: create a new launch layer and launch layer experiment, gradually ramp up the launch layer experiment, and then finally delete the launch layer and change the default values of the relevant parameters to the values set in the launch layer experiment.

We use real-time monitoring to capture basic metrics (e.g., CTR) as quickly as possible in order to determine if there is something unexpected happening. Experimenters can set the expected range of values for the monitored metrics (there are default ranges as well), and if the metrics are outside the expected range, then an automated alert is fired. Experimenters can then adjust the expected ranges, turn off their experiment, or adjust the parameter values for their experiment. While real-time monitoring does not replace careful testing and reviewing, it does allow experimenters to be aggressive about testing potential changes, since mistakes and unexpected impacts are caught quickly.
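
As a rough illustration of the layered design described above, here is a small sketch of how a request might be diverted into at most one experiment per layer. The hashing scheme, layer names, experiments, and parameters are my own simplification, not Google's actual implementation.

    import hashlib

    # Each layer owns a disjoint subset of parameters; an experiment in a layer may only
    # override parameters belonging to that layer. All names and values here are made up.
    LAYERS = {
        "ui": [
            {"name": "blue_links", "traffic": 0.10, "params": {"link_color": "#2200CC"}},
            {"name": "bold_titles", "traffic": 0.10, "params": {"title_weight": "bold"}},
        ],
        "ranking": [
            {"name": "freshness_boost", "traffic": 0.05, "params": {"freshness_weight": 1.5}},
        ],
    }

    def bucket(request_id, layer):
        # Hash the request id together with the layer name into [0, 1),
        # so the layers divert traffic independently of each other.
        digest = hashlib.md5(f"{layer}:{request_id}".encode()).hexdigest()
        return int(digest, 16) / 16**32

    def assign(request_id):
        # Return the parameter overrides for this request: at most one experiment per layer.
        overrides = {}
        for layer, experiments in LAYERS.items():
            point, start = bucket(request_id, layer), 0.0
            for exp in experiments:
                if start <= point < start + exp["traffic"]:
                    overrides.update(exp["params"])
                    break
                start += exp["traffic"]
        return overrides

    print(assign("request-12345"))  # may be {} if the request falls in no experiment

The key property is that layers divert traffic independently: a request can be in a UI experiment and a ranking experiment at the same time, but never in two experiments that touch the same parameters.
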
One thing I like about the system they describe is that the process of launching is the same as the process for experimentation. That's a great way to set things up, treating everything to be launched as an experiment. It creates a culture where every change needs to be tested online before launch, and where experiments are treated not so much as tests to be taken down when done, but as candidates to be sent out live as soon as they prove themselves.

Another thing I like is the real-time availability of metrics and the ability to very quickly change experiment configurations. Not only does that allow experiments to be shut down quickly if they are having a surprisingly bad impact, which lowers the cost of errors, but it also speeds the ability to learn from the data and iterate on the experiment.
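
The real-time monitoring described in the paper is, at its core, a range check over streaming metrics. A minimal sketch of that idea, with made-up metric names, ranges, and values, might look like this:

    # Expected ranges per metric; experimenters can override the defaults.
    # Metric names, ranges, and values below are made up for illustration.
    expected_ranges = {
        "ctr": (0.02, 0.08),
        "latency_ms": (50, 400),
    }

    def check_metrics(experiment, metrics):
        # Return an alert message for each metric outside its expected range.
        alerts = []
        for name, value in metrics.items():
            low, high = expected_ranges.get(name, (float("-inf"), float("inf")))
            if not (low <= value <= high):
                alerts.append(f"{experiment}: {name}={value} outside [{low}, {high}]")
        return alerts

    print(check_metrics("freshness_boost", {"ctr": 0.01, "latency_ms": 120}))
    # prints: ['freshness_boost: ctr=0.01 outside [0.02, 0.08]']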

Finally, the use of standardized metrics across experiments and an "experiment council" of experts who can be consulted to help interpret the results is insightful. The results of experiments are often open to interpretation, unfortunately enough so that overly eager people at a company can torture the data until it says what they want (even when they are trying to be honest), so an effort to help people keep decisions objective is a good idea.

One minor but surprising tidbit in the paper is that binary pushes are infrequent ("weekly"); only configuration files can be pushed frequently. I would have thought they would push binaries daily. Reading between the lines a bit, it sounds like developers may have to do extra work to cope with infrequent binary pushes, anticipating what they will want during the experiments and writing extra code that can be enabled or disabled later by configuration file, which could interfere with their ability to rapidly learn and iterate on experimental results. It also may cause the configuration files to become very complex and bug-prone, which is alluded to in the section of the paper on the need for data file checks. In general, very frequent pushes are desirable, even for binaries.