LiFiDeA

Jinyoung Kim on Information Retrieval and Personal Information Management

Retrieval Experiments in Pseudo-desktop Collections

My paper ‘Retrieval Experiments in Pseudo-desktop Collections’ (co-authored with my advisor Bruce Croft) will be presented at CIKM 2009. It proposes a new model for desktop search research, in which we introduce the ‘pseudo-desktop’: a simulated desktop collection composed of automatically gathered documents and generated queries. We also suggest a method for validating the generated collection.

From the retrieval model perspective, we treat desktop search as a known-item search task over a semistructured document collection, since people usually look for items they already know about and each document on a desktop has metadata. For instance, an e-mail has sender and receiver fields in addition to the usual title and content fields. We present an improved retrieval model based on PRM-S, which was introduced in my previous work.

I believe that the significance of this work is threefold. First, it is an attempt to bring more scientific rigor to desktop search (or searching personal information in general). People have studied desktop search for a long time, yet they mostly built their own systems and reported the results of user studies, which lack reproducibility. A pseudo-desktop can serve as a sharable data collection that addresses this problem: new researchers can test their algorithms against state-of-the-art baselines without building yet another desktop search engine.

Second, the experimental results show the value of simulation as a method in IR research. Simulated queries are not only free to obtain but also give total control over the parameters. In our experiments, using algorithmically generated queries with many different characteristics, we gained insight into the performance of the tested retrieval methods. Another paper at SIGIR 2009 also demonstrated this value of simulated queries.
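
To make the idea of a simulated known-item query concrete, here is a rough sketch. It is not the exact generation procedure from the paper: the function name `generate_known_item_query`, the toy e-mail documents, and the uniform sampling of fields and terms are all assumptions made for illustration. It only shows the general pattern of picking a target document and drawing query terms from its fields, with the query length exposed as a controllable parameter.

```python
import random

def generate_known_item_query(docs, rng, length=3):
    """Pick a target document, then sample query terms from its fields.

    Hypothetical helper: uniform sampling is an assumption, not the
    distribution used in the paper.
    """
    target_id = rng.choice(sorted(docs))
    fields = docs[target_id]
    terms = []
    for _ in range(length):
        field = rng.choice(sorted(fields))          # choose a field
        terms.append(rng.choice(fields[field].split()))  # choose a term in it
    return target_id, terms

# Toy collection (made up for this example).
docs = {
    "mail1": {"sender": "alice", "title": "meeting notes",
              "content": "notes from the weekly meeting"},
    "mail2": {"sender": "bob", "title": "lunch with alice",
              "content": "alice wants lunch on friday"},
}

rng = random.Random(0)  # fixed seed so the simulation is repeatable
target_id, query = generate_known_item_query(docs, rng, length=3)
print(target_id, query)
```

Because the target document is known at generation time, the relevance judgment comes for free, and parameters such as `length` can be varied to study how retrieval methods respond to different kinds of queries.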

Lastly, PRM-S, a novel retrieval model based on the mapping between query words and document structure, was found to be useful in noisy settings like e-mail (e.g. heavy word overlap between document fields) as well as in cleanly structured collections like a movie database.
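
The mapping between query words and document fields can be sketched roughly as follows. This is a simplified illustration and not the implementation used in the paper: the function names, the toy e-mails, the uniform field prior, and the Jelinek-Mercer smoothing with `lam=0.1` are all assumptions. The idea shown is that each query word is softly assigned to fields according to collection-level statistics, and those weights are used to combine per-field language model scores.

```python
import math
from collections import Counter

def field_counts(docs):
    """Collection-level term counts for each field."""
    counts = {}
    for fields in docs.values():
        for name, text in fields.items():
            counts.setdefault(name, Counter()).update(text.split())
    return counts

def mapping_prob(word, coll):
    """Estimate P(field | word), assuming a uniform prior over fields."""
    like = {f: c[word] / sum(c.values()) for f, c in coll.items()}
    norm = sum(like.values())
    if norm == 0:  # unseen word: fall back to a uniform mapping
        return {f: 1.0 / len(coll) for f in coll}
    return {f: v / norm for f, v in like.items()}

def score(query, doc, coll, lam=0.1):
    """Log-score: per word, a mapping-weighted mix of field language models."""
    total = 0.0
    for word in query:
        mapping = mapping_prob(word, coll)
        p = 0.0
        for f, c in coll.items():
            tokens = doc.get(f, "").split()
            p_doc = tokens.count(word) / len(tokens) if tokens else 0.0
            p_coll = c[word] / sum(c.values())
            p += mapping[f] * ((1 - lam) * p_doc + lam * p_coll)
        total += math.log(max(p, 1e-12))  # guard against log(0)
    return total

# Toy e-mails (made up for this example).
docs = {
    "mail1": {"sender": "alice", "title": "meeting notes",
              "content": "notes from the weekly meeting"},
    "mail2": {"sender": "bob", "title": "lunch with alice",
              "content": "alice wants lunch on friday"},
    "mail3": {"sender": "carol", "title": "budget report",
              "content": "the report is attached"},
}
coll = field_counts(docs)
ranking = sorted(docs, key=lambda d: score(["alice"], docs[d], coll), reverse=True)
print(mapping_prob("alice", coll))
print(ranking)
```

In this toy collection the word "alice" makes up a larger share of the sender field than of the title or content fields, so it maps most strongly to sender, and the e-mail actually sent by alice is ranked above the one that merely mentions her.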

Recently, I’ve been working on LiFiDeA, a prototype PIM system with which I plan to compare experimental results on pseudo-desktop and real desktop collections. I believe this will provide the ultimate validation for our approach.

Filed under: Information Retrieval, Personal Information Management
