Research

Apr 2009

What are people talking about

One of my favorite topics of investigation, along with folks at IBM, has been the identification and disambiguation of named entities that are also common words in the language. Others at Microsoft Live Labs have a much cuter name for it: cultural entities! Our approach demonstrates the use of contextual information from a taxonomy to scope the disambiguation, complementing traditional statistical and NLP techniques.

Identifying named entities allows applications to track, trend, and analyze their usage. Combined with situational context, such as spatial and temporal information, and social context, such as the network in which the data was generated, this information enables a new breed of socially aware applications. I am currently working on such an application with some of my colleagues, analyzing Twitter data using domain knowledge from DBpedia.

How people write

Another theme I am exploring is how people write, focusing on the words we use in certain restricted domains. I had the most wonderful time working with Marti Hearst at UC Berkeley, examining language use in online dating profiles. Using the Linguistic Inquiry and Word Count (LIWC) program, we looked at gender differences in how people represent themselves and describe what they are looking for in a potential mate. We found that self-expression in online dating profiles may tend toward an attempt at homophily. A poster for the work can be found here, and the paper is here.

I am currently looking at how word usage, potentially across user groups, can help in entity disambiguation. Check out a tag cloud of the top 100 words that teen and tween MySpace members use when talking about the female pop artist Lily Allen vs. the male rock band Coldplay!
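To make the tag-cloud idea concrete, here is a minimal Python sketch of how one might find the most frequent words in a set of comments about an artist. It assumes the comments are plain strings. The function name `top_words`, the tiny stopword list, and the simple regex tokenizer are illustrative assumptions, not the pipeline used for the MySpace data.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption); a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "is", "i", "to", "of", "it",
             "her", "his", "she", "he", "so", "my"}

def top_words(comments, n=100, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword tokens across a list of comments."""
    counts = Counter()
    for comment in comments:
        # Lowercase, then treat runs of letters/apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", comment.lower())
        counts.update(t for t in tokens if t not in stopwords)
    # Pairs of (word, count), most frequent first; ties keep first-seen order.
    return counts.most_common(n)

# Hypothetical comments, for illustration only.
lily_comments = ["her song is so cute", "love her cute voice", "lily is so cute"]
print(top_words(lily_comments, n=2))
```

Running the same function separately over comments about each artist gives two word lists whose differences can be compared side by side, which is the kind of contrast a pair of tag clouds shows.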

Why people write

My interest in why people write started with my stint at IBM Almaden, where we analyzed the sentiments people expressed on MySpace toward music artists. Who knew "the song is dope" meant that the song was awesome! The larger goal was to generate lists of popular artists based on listener preferences by mining comments from MySpace music forums. Details of the system can be found here; a shorter poster is available here. More recently, we used voting schemes over rankings from different data sources to generate the final popular artist lists. More details here.
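As an illustration of what "voting schemes over rankings" can mean, here is a minimal Python sketch of Borda count, one classic rank-aggregation scheme. The text does not say which schemes we used, so treat Borda count, the function name `borda_aggregate`, and the alphabetical tie-breaking as assumptions chosen for the example.

```python
from collections import defaultdict

def borda_aggregate(rankings):
    """Combine several ranked lists (best first) into one list via Borda count.

    An item at position i in a list of length n earns n - i points.
    Items that a source does not rank earn nothing from that source.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - position
    # Highest total score first; ties broken alphabetically so output is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))

# Hypothetical rankings of three artists from three different data sources.
forum_comments = ["A", "B", "C"]
sales_charts = ["B", "A", "C"]
radio_plays = ["B", "C", "A"]
print(borda_aggregate([forum_comments, sales_charts, radio_plays]))  # ['B', 'A', 'C']
```

Here B scores 2 + 3 + 3 = 8, A scores 3 + 2 + 1 = 6, and C scores 1 + 1 + 2 = 4. Borda count rewards artists that place consistently well across sources rather than those that top a single list.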

My interest has since evolved toward identifying intent in social media content. Broadly, the intentions behind user activity or posts are information seeking, information sharing, or transactional in nature. I am currently working on using the functional or communicative properties of words to bootstrap the process of automatically identifying intents. Some preliminary work can be found here.

Professional Activities

Feb 2009

IBM UIMA Innovation Award 2007: Primary contributor (proposal based on my work) to the winning proposal "UIMA-based Infrastructure for Summarizing Casual, Unstructured Text"

Microsoft's Beyond Search - Semantic Computing and Internet Economics Award 2008: Primary contributor (proposal based on my work) to the winning proposal "Chatter, Intent and Good Karma for Targeted Advertising in Social Networks"