Three Methods for Personal Information Access

Posted on December 9, 2010


Although much of the previous literature has focused on a single method for accessing personal information, the choice of access method can differ depending on the user’s knowledge, preferences, and many other factors. In this post, I intend to compare three access methods and think about how they can be combined in a single retrieval task. The three methods are term-based search (e.g., desktop search), faceted search (filtering by metadata), and associative browsing (finding what’s related to the current item). The table below provides an overview of the three access methods for personal search.

| Access Method | User’s Knowledge | Support Mechanism | Steps of Interaction |
| --- | --- | --- | --- |
| Term-based Search | Search terms | Ranking based on search terms | Initial query, then reformulation if necessary |
| Faceted Search | Metadata | Metadata fields | Boolean search (filter) by metadata fields until the result is sufficiently specific, discarding conditions as necessary |
| Associative Browsing | Related item(s) | Ranking based on the similarity to the currently viewed item | Click for browsing, getting back to the original page as necessary |

When the user remembers a good search keyword, a typical full-text search model can be useful. We introduced a retrieval model for desktop search in SIGIR’10 [1], where the results from each document type are combined into the final result using a type score. This approach allows the optimization of type-specific ranked lists (e.g., using thread features for email ranking) and the incorporation of knowledge from type prediction (e.g., the user might be looking for a PDF for this particular query).
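
To make the idea of combining type-specific results with a type score more concrete, here is a minimal sketch. It is not the model from [1]: the function name `merge_by_type`, the example scores, and the choice to multiply the type score by the within-type document score are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def merge_by_type(
    per_type_results: Dict[str, List[Tuple[str, float]]],
    type_scores: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Merge type-specific ranked lists into a single ranked list.

    Assumption (not taken from [1]): the final score of a document is
    its within-type score multiplied by the score of its document type.
    """
    merged = []
    for doc_type, results in per_type_results.items():
        # Types without a predicted score contribute nothing.
        weight = type_scores.get(doc_type, 0.0)
        for doc_id, score in results:
            merged.append((doc_id, weight * score))
    # Highest combined score first.
    merged.sort(key=lambda pair: pair[1], reverse=True)
    return merged


# Hypothetical per-type rankings and type prediction for one query.
per_type_results = {
    "email": [("email-17", 0.9), ("email-03", 0.4)],
    "pdf": [("paper-02", 0.8)],
}
type_scores = {"email": 0.25, "pdf": 0.75}  # "the user probably wants a PDF"

final_ranking = merge_by_type(per_type_results, type_scores)
print([doc_id for doc_id, _ in final_ranking])
```

In this toy example the PDF is ranked first even though the top email has a higher within-type score, because the type prediction favors PDFs for the query.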

Associative Browsing
Another access method, which we introduced in CIKM’10 [2], is associative browsing between documents and concepts, where concepts are the person names, events, and terms of interest to the user. Here, the system suggests items (documents or concepts) related to what the user is currently looking at. For instance, given a search result, she can click on it to browse into related items. Our work focused on improving the quality of browsing suggestions by using the user’s click feedback.
The figure taken from [2] shows a retrieval scenario in which the user first finds a concept (a person), then browses into a related email, and finally reaches the target webpage. As this scenario shows, associative browsing can help the user reach the target item even if the initial search is not successful, supporting an orienteering strategy for personal information access.
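
The core of associative browsing, ranking other items by their similarity to the currently viewed item, can be sketched as follows. This is only an illustration: cosine similarity over raw term counts is my assumption here, the item identifiers and texts are made up, and the click-feedback learning described in [2] is not shown.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_related(
    current_id: str, items: Dict[str, str], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k items most similar to the currently viewed item."""
    vectors = {i: Counter(text.lower().split()) for i, text in items.items()}
    current = vectors[current_id]
    scored = [
        (other_id, cosine(current, vec))
        for other_id, vec in vectors.items()
        if other_id != current_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical items: one concept (a person) and three documents.
items = {
    "concept:alice": "alice smith project meeting",
    "email:42": "alice sent the project report link",
    "web:report": "project report link archive",
    "calendar:7": "dentist appointment",
}

# Person -> related email -> target webpage, one click at a time.
suggestions = suggest_related("concept:alice", items, k=2)
next_hop = suggest_related("email:42", items, k=1)
print(suggestions[0][0], "->", next_hop[0][0])
```

Starting from the person concept, the top suggestion is the email, and from the email the top suggestion is the webpage, which mirrors the person, email, webpage path in the scenario above.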
Faceted Search
In previous work [1] [2], we studied the term-based search model and the associative browsing model. However, faceted search has not been explored as an access method, although it is an obvious option in an environment with rich metadata, such as personal information archives.
Faceted search in personal search can work just as it does in the many other applications where it is used. As a starting point, we can consider the following facets, grouped into universal (shared across document types) and type-specific facets. The facets mentioned below can be easily extracted from the document metadata.

| Universal Facets | Type-specific Facets |
| --- | --- |
| Document Type | Sender / Receiver (email) |
| Source of Collection (e.g., desktop, blog, …) | Place / Participant (calendar) |
| Date of Collection | Author / PubYear / Venue (paper) |
| | Host / Tag (webpage) |

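
As a rough sketch of how filtering on the facets listed above could work, the following applies facet conditions as a Boolean AND and then discards one condition to widen the result. The field names (`type`, `source`, `sender`, `venue`) and the sample documents are hypothetical.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


def filter_by_facets(
    docs: List[Document], conditions: Dict[str, Any]
) -> List[Document]:
    """Keep documents whose metadata matches every condition (Boolean AND).

    A document that lacks a facet never matches a condition on that facet.
    """
    return [
        doc
        for doc in docs
        if all(doc.get(facet) == value for facet, value in conditions.items())
    ]


# Hypothetical collection with universal and type-specific facets.
docs = [
    {"id": "e1", "type": "email", "source": "desktop", "sender": "alice"},
    {"id": "e2", "type": "email", "source": "desktop", "sender": "bob"},
    {"id": "p1", "type": "paper", "source": "desktop", "venue": "SIGIR"},
    {"id": "n1", "type": "note", "source": "desktop"},
]

# Narrow down with a universal facet and a type-specific facet.
conditions = {"type": "email", "sender": "alice"}
narrowed = filter_by_facets(docs, conditions)

# Discard the sender condition if the result turned out too specific.
relaxed_conditions = {f: v for f, v in conditions.items() if f != "sender"}
relaxed = filter_by_facets(docs, relaxed_conditions)

print([d["id"] for d in narrowed], [d["id"] for d in relaxed])
```

Note that the `note` document carries only universal facets, so no type-specific condition can ever single it out.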
In CIKM’10, there was a paper [4] on the cost-based evaluation of faceted search, which introduced a selection algorithm for facet conditions based on a cost model. If the number of facets and facet conditions exceeds what can be displayed on a single page, such selection methods can reduce the user’s burden. The authors evaluated their model on a movie database and a used car database, showing that it reduces the user’s browsing effort.
In the personal search scenario, one difference is that there are many document types, each with different facets. Another is that faceted search cannot cover all documents: some documents have no metadata that can identify them uniquely, and there is no way to find such documents with faceted search. I believe that the technique suggested in [4] can be applied to those facets in a personal collection that have a large number of conditions (e.g., email senders and receivers, tags), but it seems applicable only in such cases.
Overall, faceted search is a meaningful method of personal information access, especially for document types with rich metadata. Yet it may not be applicable to all document types, since some document types lack metadata. I’m not sure whether there is any interesting research question on faceted search in personal information retrieval, but I think it needs to be taken into account when building a user model or designing a user study, to see how it interacts with other access methods.

Combination of Three Access Methods
These three methods can be combined in several ways. Imagine the following scenario for known-item search. If the user can recall what he thinks is an effective search keyword or filtering condition for the item, he can start the retrieval process by searching and filtering. As he continues, if he runs into something that looks similar to the target item, he can click that item to start associative browsing. Or he can come up with a new search keyword or filtering condition and refine his results further.

As the scenario above shows, the three methods can be combined dynamically in a single retrieval session. The details of the combination can differ depending on the user’s initial knowledge about the target item and on what he learns during the search process. Yet the system’s ability to mix and match different methods within a single search task seems clearly beneficial to the user, since each user has a different state of knowledge and different preferences for access methods.
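
A toy walk-through of such a session might look like the following, where each step takes the output of the previous one. Everything here is a simplifying assumption for illustration: the three-item collection, the exact-token keyword match, and the use of shared-term counts as the similarity for browsing.

```python
from typing import Dict, List, Set

# Hypothetical personal collection: metadata plus a short text per item.
collection: Dict[str, Dict[str, str]] = {
    "email:1": {"type": "email", "sender": "alice", "text": "budget draft attached"},
    "email:2": {"type": "email", "sender": "bob", "text": "lunch on friday"},
    "web:1": {"type": "webpage", "host": "example.org",
              "text": "budget draft final spreadsheet"},
}


def tokens(doc_id: str) -> Set[str]:
    return set(collection[doc_id]["text"].split())


def search(keyword: str, candidates: List[str]) -> List[str]:
    """Term-based search: keep candidates whose text contains the keyword."""
    return [d for d in candidates if keyword in tokens(d)]


def filter_facet(field: str, value: str, candidates: List[str]) -> List[str]:
    """Faceted search: keep candidates whose metadata field equals value."""
    return [d for d in candidates if collection[d].get(field) == value]


def browse(current: str, k: int = 1) -> List[str]:
    """Associative browsing: items sharing the most terms with the current one."""
    others = [d for d in collection if d != current]
    others.sort(key=lambda d: len(tokens(current) & tokens(d)), reverse=True)
    return others[:k]


# One session: search by keyword, filter by a facet, then browse from the hit.
step1 = search("budget", list(collection))
step2 = filter_facet("type", "email", step1)
step3 = browse(step2[0])
print(step1, step2, step3)
```

Here the user remembers a keyword and that an email was involved, lands on the email, and reaches the target webpage by browsing from it; a different user could reach the same item by reordering or skipping steps.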
In this post, I described three methods for personal information access, focusing on how they can be combined dynamically. In the following post, I plan to deal with issues regarding the evaluation of this combination approach.


[1] Ranking using Multiple Document Types in Desktop Search
[2] Building a Semantic Representation for Personal Information
[3] Retrieval Experiments using Pseudo-Desktop Collections
[4] FACeTOR: Cost-Driven Exploration of Faceted Query Results