Posted on November 7, 2009


I just attended CIKM in Hong Kong. The conference was very successful: there were many interesting papers, and the organizers did a good job. (I also liked the food here much better than at SIGIR in Boston 😉)

Let me post about a couple of interesting trends and papers. I would summarize the big picture as ‘a strenuous effort toward convergence’. Although the CIKM conference itself aims to provide a venue for collaboration among IR, DB, and KM researchers, I observed many efforts toward a more unified approach.

The keynote on the first day was about a DBMS architecture in which DB and IR components are tightly integrated. The speaker, Kyu-Young Whang, argued that major IR concepts such as the inverted index and relevance-matching operators should be embedded in the DBMS as core components.

Given that the full-text search capability of current RDBMSs is implemented as an extension layer on top of traditional DBMS components (e.g., the B-tree index and the query optimizer), tighter integration would allow queries that combine traditional RDBMS data types (e.g., strings, numbers, and times) with full-text data to be executed more efficiently.

He demonstrated this vision with a large-scale web search engine that can efficiently handle filtering operations such as site-specific search. Although the evaluation was somewhat questionable, I agree with his main point.
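To make the keynote's point a little more concrete, here is a minimal Python sketch (my own illustration, not the speaker's actual system) of treating a structured predicate, the site a page belongs to, as just another sorted posting list. The site filter is then evaluated inside the same merge-intersection as the keywords, instead of running a full-text search first and filtering its results afterwards. The document tuples, site names, and the functions `build_indexes`, `intersect`, and `search` are all hypothetical.

```python
from collections import defaultdict


def build_indexes(docs):
    """Build a term -> doc-id inverted index and a site -> doc-id attribute index.

    docs: list of (doc_id, site, text) tuples (hypothetical schema).
    Documents are processed in doc-id order, so every posting list is sorted.
    """
    inverted = defaultdict(list)
    by_site = defaultdict(list)
    for doc_id, site, text in sorted(docs):
        # set() ensures each doc id appears at most once per term
        for term in sorted(set(text.lower().split())):
            inverted[term].append(doc_id)
        by_site[site].append(doc_id)
    return inverted, by_site


def intersect(a, b):
    """Merge-intersect two sorted posting lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search(inverted, by_site, terms, site=None):
    """Conjunctive keyword search with an optional site filter.

    The site predicate is just one more posting list in the intersection.
    """
    lists = [inverted.get(t.lower(), []) for t in terms]
    if site is not None:
        lists.append(by_site.get(site, []))
    if not lists:
        return []
    lists.sort(key=len)  # start from the most selective list
    result = lists[0]
    for postings in lists[1:]:
        result = intersect(result, postings)
        if not result:
            break  # nothing can match any more
    return result


if __name__ == "__main__":
    docs = [
        (1, "site-a.example", "CIKM conference in Hong Kong"),
        (2, "site-b.example", "Hong Kong travel guide"),
        (3, "site-a.example", "CIKM keynote on DB and IR integration"),
    ]
    inverted, by_site = build_indexes(docs)
    print(search(inverted, by_site, ["hong", "kong"]))  # [1, 2]
    print(search(inverted, by_site, ["hong", "kong"], site="site-a.example"))  # [1]
```

Starting from the shortest list means that a highly selective site filter prunes the candidates early, which is roughly the kind of efficiency gain that tighter DB/IR integration is supposed to make possible.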

There was also a panel discussion titled ‘Information Extraction Meets Relational Databases’, which opened with the thought-provoking question ‘Where would you spend your next million dollars to solve this challenge?’ Each panelist represented his or her main field of interest: Andrei Broder (Yahoo!) for Web search, Edward Chung (Google China) for data mining, and so on. During the discussion, Andrei used the illustrative example of answering queries like ‘Brad Farve’ (a football player), arguing that presenting automatically extracted information can help users fulfill their information needs more effectively.

While the overall conclusion of the discussion was nothing new (the fields should converge for the benefit of users), it was interesting to see where each subfield falls short and how findings in a related field can address exactly that gap. For instance, since information extraction works reasonably well only in limited domains, it was suggested that aggregated query logs could provide useful clues about which object types or properties to target for extraction. Going back to the example query ‘Brad Farve’, such extracted information can improve search relevance for queries about a specific object type.
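One simple way to read the query-log suggestion is as an aggregation problem: count how often users ask about each object type, and which words they add after an entity name, then extract those types and properties first. The sketch below is only an illustration under strong assumptions: it relies on a hypothetical hand-made `entity_types` dictionary and a toy query log of (query, frequency) pairs, and the function `rank_extraction_targets` is my own invention, not a method presented at the panel.

```python
from collections import Counter


def rank_extraction_targets(query_log, entity_types):
    """Aggregate a query log into candidate extraction targets.

    query_log: iterable of (query_string, frequency) pairs (hypothetical format).
    entity_types: dict mapping a known, lower-cased entity name to its object type.
    Returns (type_counts, property_counts): total query frequency per object type,
    and per (object type, trailing words) pair, e.g. ("actor", "movies").
    """
    type_counts = Counter()
    property_counts = Counter()
    for query, freq in query_log:
        q = " ".join(query.lower().split())  # normalize case and whitespace
        for name, obj_type in entity_types.items():
            if q == name or q.startswith(name + " "):
                type_counts[obj_type] += freq
                rest = q[len(name):].strip()
                if rest:
                    # words after the entity name hint at a wanted property
                    property_counts[(obj_type, rest)] += freq
                break  # attribute each query to at most one entity
    return type_counts, property_counts


if __name__ == "__main__":
    log = [("Tom Hanks movies", 120), ("tom hanks", 80),
           ("Hong Kong weather", 50), ("tom hanks age", 30), ("cikm 2009", 10)]
    types = {"tom hanks": "actor", "hong kong": "city"}
    type_counts, property_counts = rank_extraction_targets(log, types)
    print(type_counts.most_common())      # [('actor', 230), ('city', 50)]
    print(property_counts.most_common(1))  # [(('actor', 'movies'), 120)]
```

A real system would of course need entity recognition rather than an exact-prefix dictionary lookup, but even this crude count shows how query volume could decide which extraction domains are worth the effort.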

Despite these efforts toward convergence, I also got the impression that researchers are not quite ready for this trend. For one thing, I was surprised that none of the XML keyword retrieval papers from the database community cited my recent work, even though they used the same collection (IMDB)! Anyway, I believe we are heading in the right direction, and conferences like CIKM can play a crucial role along the way.

I plan to write follow-up posts on the papers that particularly drew my attention.

