Friday, April 24, 2009

Related content as you type

Over the years, one idea that recurs is to intelligently show the user information related to what he's typing. The thinking goes like this: "Since related-links on web pages work well, and related-videos on YouTube are awesome, it will be great to target information on what the user is typing! Especially since the user is indicating proactive interest rather than passively web-surfing!"

I've experimented several times with this, and most results were actually NOT great. I've given some thought to this, and would be interested to hear other people's insights.

The components I perceive are:


  1. Deciding what counts as the user's "current context". If they're typing a long email or document, do you use the last sentence? The last paragraph? Or time-based (the last 20 seconds of typing)?

  2. Extracting the most interesting parts of the user context.

  3. Matching information to the extracted parts of user context. If I'm writing to Neha about hiking in SF, the matched information might be links (the top Google result is the useful bahiker.com), my own documents (previous communications with Neha about hiking), products (hiking shoes), or ads (such as Gmail Ads, which I worked on).

  4. Ranking the information results, based on their quality and correlation to the extracted user context.

  5. Deciding when the user is in the right mindset to be presented with information. If they are focused on finishing their writing, they may want to minimize interruptions. My instinct is to use typing speed (faster typing = less interest in interruptions).

  6. Displaying the information in a non-obtrusive way. If the information changes every 30 seconds with the user's typing, it can get annoying to have an area of the screen updating so frequently.
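
To make component 1 concrete, here is a minimal sketch of two of the context choices mentioned above: a time-based window and the last sentence. The `Keystroke` record, the function names, and the 20-second default are assumptions made for illustration, not something taken from my experiments.

```python
import re
from dataclasses import dataclass


@dataclass
class Keystroke:
    """Hypothetical record of one typed character."""
    char: str
    timestamp: float  # seconds since some fixed start


def time_window_context(keystrokes, now, window_seconds=20.0):
    """Return the text typed within the last `window_seconds`."""
    return "".join(
        k.char for k in keystrokes if 0 <= now - k.timestamp <= window_seconds
    )


def last_sentence_context(text):
    """Return the last (possibly unfinished) sentence of `text`."""
    # Split on whitespace that follows sentence-ending punctuation.
    parts = [p.strip() for p in re.split(r"(?<=[.!?])\s+", text.strip())]
    parts = [p for p in parts if p]
    return parts[-1] if parts else ""
```

Either function gives the later steps a short piece of text to work on; which one produces better matches is exactly the open question in component 1.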


#3 (matching) and #4 (ranking) are very similar to web search, so the existing solutions are mature. #1 (determining user context) can be reached with some tweaking. The parts I found most challenging were #2 (extracting the key parts of the context) and #5 (determining when to show results).

I think #6 (displaying) depends on #5 (user mindset). Solving #5 would make #6 tractable.
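
As a rough illustration of the typing-speed instinct from #5, the sketch below only allows results to be shown while the user is typing slowly or has paused. The function names, the 10-second window, and the 2 characters-per-second threshold are all assumptions chosen for the example; I have not tuned any of them.

```python
def typing_speed(timestamps, now, window_seconds=10.0):
    """Average characters per second over the trailing window.

    `timestamps` holds the time (in seconds) of each keystroke.
    """
    recent = [t for t in timestamps if 0 <= now - t <= window_seconds]
    return len(recent) / window_seconds


def should_show_results(timestamps, now,
                        max_chars_per_second=2.0, window_seconds=10.0):
    """Faster typing = less interest in interruptions, so stay quiet."""
    return typing_speed(timestamps, now, window_seconds) < max_chars_per_second
```

A real version would probably need more signals than speed alone (for example, a long pause could mean the user is thinking hard rather than looking for a distraction).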

I did an experiment a couple years ago to automatically match images as users typed in an IM conversation. I wanted to see if IM conversations could be summarized via images. Then if you had a lobby of real-time group chats, you could look at the image streams to decide which of the chats was most interesting to you.

I preprocessed a few million online images to match each one with semantic word-clusters. During an IM conversation, I took lines of typing and converted them to the same clusters, and then correlated that with my image corpus.
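
The general shape of that matching can be sketched as below. This is not the code from my experiment: the tiny `WORD_TO_CLUSTER` table, the cluster names, and the use of cosine similarity as the correlation measure are all assumptions made to keep the example small and runnable.

```python
import math
import re
from collections import Counter

# Hypothetical word-to-cluster table; a real one would be learned from data.
WORD_TO_CLUSTER = {
    "hike": "outdoors", "hiking": "outdoors", "trail": "outdoors",
    "pixel": "imaging", "image": "imaging", "filter": "imaging",
    "pizza": "food", "dinner": "food",
}


def to_clusters(text):
    """Convert text into counts of the semantic clusters it mentions."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(WORD_TO_CLUSTER[w] for w in words if w in WORD_TO_CLUSTER)


def cosine(a, b):
    """Cosine similarity between two cluster-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def best_image(line, image_clusters):
    """Return the id of the best-matching image, or None if nothing matches.

    `image_clusters` maps an image id to its precomputed cluster counts.
    """
    query = to_clusters(line)
    scored = [(cosine(query, c), image_id)
              for image_id, c in image_clusters.items()]
    score, image_id = max(scored, default=(0.0, None))
    return image_id if score > 0 else None
```

For example, with `images = {"baboon.png": to_clusters("image filter pixel"), "trail.jpg": to_clusters("hiking trail")}`, a chat line about running a filter over pixels would select `"baboon.png"`. Note that a line mentioning a trail only in passing would still pull in `"trail.jpg"`, which is the kind of tangential match that causes noise.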

The results were mixed. Sometimes the image matching worked serendipitously well. Once I was writing about my experiment, and the algorithm posted one of the standard image-processing photos (I think it was the baboon). The issue was the amount of noise, caused by images being related to a tangential part of the conversation rather than the main gist (the extraction problem discussed in point 2).

I've also done this experiment using emails instead of IM, and another time even hooking into the user's keystrokes so that I captured whatever they were typing at the moment. Increasing the amount of input context didn't help much. At any moment, the user is still only interested in a small part of what he's typing. It is hard to capture the gist from that short piece of text.

The academic papers in this area (at least the ones I read a few years ago) didn't seem to have good solutions to this problem.

I think this is an interesting area. Seeing advancements in this space would be pretty cool.

7 comments:

Jacek Fedoryński said...

There was once a project done by the Gnome guys that was supposed to provide something similar, but it's been dead for several years now:

http://nat.org/dashboard/

old codger said...

I might be part of a vocal minority here, but I think this is a solution looking for a problem. When I'm typing, or watching television for that matter, I do not want to be distracted from the task at hand. Not even a bit. Ever. The Firefox Smart Bar is an insufferable distraction, throwing up a menu of dancing, ever-changing options at the precise moment when I'm trying to quietly sift through my memories to recall somewhere I've been. I hate it, and disable it shortly after install since it's enabled by default, because someone obviously thought it would be a great idea. Grumble. And those absurd, animated advertisements that go running across the bottom of the TV screen, purposefully trying to draw your attention away from what you're currently watching (since marketers have figured out that people with TiVos intentionally fast forward through regular ads). They do not care if it destroys your enjoyment of the regular programming, they will make you watch ads, by golly. Gah! It's enough to make me use the TV for DVDs only, and start dusting off my gopher client. Sorry for the rant.

jack dahlgren said...

I just don't see it working out well in the end.

Perhaps I'm writing a new paragraph introducing a new idea. There may be no previous context which is relevant.

Or if I get caught up writing an analogy, do I get suggestions about THAT too?

The spelling and grammar checkers are interruption enough. Writing well requires paying attention. As Old codger states, "I do not want to be distracted from the task at hand." I used to write my papers late at night. Nothing to break concentration.

Yishan said...

This sounds like when you eavesdrop on a conversation and hear a part of it, think you know what's going on, and obnoxiously jump in and volunteer some observation of your own, and then both of the people look at you weirdly and you slink off awkwardly, later to realize that they were talking about something completely different.

i.e. something a nerd would do.

Unknown said...

I thought about this years ago with Georges -- it was a little whimsical. An easier example is getting relevant images (from recent news?) on the side while you're chatting about a celebrity or a movie. Then you just have to look for some key words to be mentioned.

old codger said...

On further reflection, I think this might be more of a control issue. When I force myself to rationally consider the idea of getting 'related content as I type', I think I would not mind having the capability of doing so, as long as it was off by default and could be easily toggled on or off through the application interface (a contextual search button, showing the current state [on/off] that resets to the default (set through a checkbox in Prefs) at the end of the browsing session). I like the illusion of being in control of my computing experience, and when the OS or applications attempt to do things automatically in a misguided effort to help me, as the poster above mentioned--it ends up being the wrong kind of help, and it quickly exhausts my patience. I hate nothing worse than a fresh install of XP, with all sorts of informational balloons and various bits of crap popping up left and right (without any obvious way of making it stop, until you gain enough expertise to start spelunking through the bowels of the system). I guess that's why I tend to use some flavor of BSD or Linux--in return for a fairly large investment of time and energy, they place YOU squarely in control of your hardware and the OS. Just like the old days, when DOS or CP/M would sit for an eternity, patiently awaiting your input and not doing one damn thing until you told it to--I liked that. I still like it, and maybe it's just a generational difference? To feel comfortable using a computer, I need that feeling of being (somewhat) in control of the experience. Advanced capabilities are nice, but stow them just out of sight and let me ask for them when -I- decide that I want or need them. Otherwise, leave me the heck alone, lol. Anyway, that's my 2^32 cents. :)

Anonymous said...

Maybe you can install some hook procedures that monitor window messages. Then submit queries to search engines predicting what the user wants. The query results should be simple to read and should introduce user perceivable entropy.

Good luck~