IS2140 Information Retrieval: Notes Week 7 and NO MUDDIEST POINT FOR THIS WEEK

Week 7
NO MUDDIEST POINT FOR WEEK 6

IIR

CH9 Relevance feedback and query expansion

This chapter discusses ways in which a system can help with query refinement, either fully automatically or with the user in the loop.

v Relevance feedback and pseudo relevance feedback

Ø The idea of relevance feedback (RF) is to involve the user in the retrieval process so as to improve the final result set. In particular, the user gives feedback on the relevance of documents in an initial set of results. The basic procedure is:

§ The user issues a (short, simple) query.

§ The system returns an initial set of retrieval results.

§ The user marks some returned documents as relevant or nonrelevant.

§ The system computes a better representation of the information need based on the user feedback.

§ The system displays a revised set of retrieval results.

Ø Image search provides a good example of relevance feedback. Not only is it easy to see the results at work, but this is also a domain where users can struggle to put what they want into words, yet can easily indicate which images are relevant or nonrelevant.

Ø The Rocchio algorithm

§ The Rocchio algorithm is the classic algorithm for implementing relevance feedback. It models a way of incorporating relevance feedback information into the vector space model.

§ Relevance feedback can improve both recall and precision. But, in practice, it has been shown to be most useful for increasing recall in situations where recall is important. This is partly because the technique expands the query.
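
A minimal sketch of the Rocchio update may help here. It computes q_m = alpha * q_0 + beta * (centroid of relevant docs) - gamma * (centroid of nonrelevant docs). The weights alpha = 1.0, beta = 0.75, gamma = 0.15 are values IIR suggests as reasonable; the function names, the three-term vocabulary, and the document vectors are assumptions made up for illustration.

```python
def centroid(vectors, dim):
    """Return the mean of a list of equal-length vectors (zeros if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio(q0, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward the relevant centroid and away from the nonrelevant one."""
    dim = len(q0)
    c_rel = centroid(relevant, dim)
    c_non = centroid(nonrelevant, dim)
    q_m = [alpha * q0[i] + beta * c_rel[i] - gamma * c_non[i] for i in range(dim)]
    # Negative term weights are clipped to zero.
    return [max(0.0, w) for w in q_m]


# Hypothetical 3-term vocabulary: ["jaguar", "car", "cat"]
q0 = [1.0, 0.0, 0.0]
relevant = [[1.0, 1.0, 0.0], [1.0, 3.0, 0.0]]   # documents the user marked relevant
nonrelevant = [[1.0, 0.0, 2.0]]                 # documents the user marked nonrelevant
q_m = rocchio(q0, relevant, nonrelevant)
```

In this toy case the modified query gains weight on "car", a term the user never typed, which is the query-expansion effect that tends to raise recall.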

Ø Probabilistic relevance feedback

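§ If a user has told us some relevant and nonrelevant documents, then we can proceed to build a classifier. One way of doing this is with a Naive Bayes probabilistic model. If R is a Boolean indicator variable expressing the relevance of a document, then we can estimate P(x_t = 1 | R), the probability of a term t appearing in a document, depending on whether it is relevant or not:

$$\hat{P}(x_t = 1 \mid R = 1) = \frac{|VR_t|}{|VR|}, \qquad \hat{P}(x_t = 1 \mid R = 0) = \frac{df_t - |VR_t|}{N - |VR|}$$

Here N is the total number of documents, df_t is the number of documents that contain t, VR is the set of known relevant documents, and VR_t is the subset of VR containing t.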
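
The two estimates can be computed directly from counts. The sketch below assumes documents are represented as sets of terms and that every document not marked relevant counts toward the R = 0 estimate, as the formula's denominator N - |VR| implies. The function name and the toy collection are hypothetical, and no smoothing is applied.

```python
def estimate_term_probs(term, docs, relevant_ids):
    """Estimate P(x_t=1 | R=1) and P(x_t=1 | R=0) for one term.

    docs: dict mapping doc id -> set of terms in that document.
    relevant_ids: ids the user marked relevant (the set VR).
    Assumes 0 < |VR| < N so that neither denominator is zero.
    """
    n_docs = len(docs)                                           # N
    df_t = sum(1 for terms in docs.values() if term in terms)    # df_t
    vr = len(relevant_ids)                                       # |VR|
    vr_t = sum(1 for d in relevant_ids if term in docs[d])       # |VR_t|
    p_rel = vr_t / vr
    p_nonrel = (df_t - vr_t) / (n_docs - vr)
    return p_rel, p_nonrel


# Hypothetical toy collection
docs = {
    1: {"jaguar", "car"},
    2: {"jaguar", "cat"},
    3: {"car", "engine"},
    4: {"cat", "zoo"},
    5: {"car", "jaguar"},
}
p_rel, p_nonrel = estimate_term_probs("car", docs, relevant_ids={1, 5})
```

Here "car" appears in both relevant documents and in one of the three remaining documents, so the term is a strong indicator of relevance in this toy collection.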

Ø When does relevance feedback work?

§ Firstly, the user has to have sufficient knowledge to be able to make an initial query which is at least somewhere close to the documents they desire.

§ Secondly, the relevance feedback approach requires relevant documents to be similar to each other.

Ø RF is rarely used in web search.

Ø Evaluation in RF strategies

§ Interactive relevance feedback can give very substantial gains in retrieval performance. Empirically, one round of relevance feedback is often very useful. Two rounds is sometimes marginally more useful.

§ There is some subtlety to evaluating the effectiveness of relevance feedback in a sound and enlightening way. The obvious first strategy is to start with an initial query q0, compute a precision-recall graph, and then compute it again for the modified query after one round of feedback. This overstates the gain, because the documents the user already judged relevant are simply ranked higher. A second idea is to use documents in the residual collection (the set of documents minus those assessed relevant) for the second round of evaluation.

Ø Pseudo RF

Pseudo RF provides a method for automatic local analysis. It automates the manual part of relevance feedback, so that the user gets improved retrieval performance without an extended interaction. The method is to do normal retrieval to find an initial set of most relevant documents, then to assume that the top k ranked documents are relevant, and finally to do relevance feedback as before under this assumption.
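
The three steps can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: ranking uses cosine similarity over raw term counts, the feedback step is a Rocchio-style update with no nonrelevant term (there are no negative judgments to use), and k = 2, the weights, the vocabulary, and the documents are all made up.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query, docs):
    """Return document ids sorted by decreasing cosine similarity to the query."""
    return sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)


def pseudo_relevance_feedback(query, docs, k=2, alpha=1.0, beta=0.75):
    """One round of pseudo RF: assume the top-k results are relevant, then re-rank."""
    top_k = rank(query, docs)[:k]
    dim = len(query)
    centroid = [sum(docs[d][i] for d in top_k) / len(top_k) for i in range(dim)]
    new_query = [alpha * query[i] + beta * centroid[i] for i in range(dim)]
    return new_query, rank(new_query, docs)


# Hypothetical vocabulary: ["jaguar", "car", "engine", "cat"]
docs = {
    "d1": [2, 1, 0, 0],
    "d2": [1, 2, 1, 0],
    "d3": [0, 1, 2, 0],
    "d4": [0, 0, 0, 3],
}
query = [1, 0, 0, 0]
new_query, new_ranking = pseudo_relevance_feedback(query, docs)
```

In this example d3 never mentions "jaguar" and scores zero against the original query, but it gets a positive score against the expanded query because the top-ranked documents contributed "car" and "engine". The same mechanism causes query drift when the top k documents are not actually relevant.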

v Relevance feedback has been shown to be very effective at improving relevance of results. Its successful use requires queries for which the set of relevant documents is medium to large. Full relevance feedback is often onerous for the user, and its implementation is not very efficient in most IR systems. In many cases, other types of interactive retrieval may improve relevance by about as much with less work.

Beyond the core ad hoc retrieval scenario, other uses of relevance feedback include:

Ø Following a changing information need (e.g., names of car models of interest change over time)

Ø Maintaining an information filter (e.g., for a news feed). Such filters are discussed further in IIR Chapter 13.

Ø Active learning (deciding which examples it is most useful to know the class of to reduce annotation costs).

v Global methods for query reformulation

Ø This section briefly discusses three global methods for expanding a query: simply aiding the user in doing so, using a manual thesaurus, and building a thesaurus automatically.

§ Vocabulary tools

Various user supports in the search process can help the user see how their searches are or are not working. This includes information about words that were omitted from the query because they were on stop lists, what words were stemmed to, the number of hits on each term or phrase, and whether words were dynamically turned into phrases.

§ Query expansion

In relevance feedback, users give additional input on documents (by marking documents in the results set as relevant or not), and this input is used to reweight the terms in the query for documents. In query expansion, on the other hand, users give additional input on query words or phrases, possibly suggesting additional query terms.
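
As a small illustration of thesaurus-based query expansion, the sketch below adds related terms from a hand-built thesaurus and weights them lower than the terms the user typed. The thesaurus entries, the function name, and the weight of 0.5 for added terms are all assumptions chosen for the example.

```python
# Hypothetical hand-built thesaurus: each term maps to a list of related terms.
THESAURUS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive"],
}


def expand_query(terms, thesaurus, added_weight=0.5):
    """Return {term: weight}: original terms get 1.0, thesaurus terms get added_weight."""
    weights = {t: 1.0 for t in terms}
    for t in terms:
        for related in thesaurus.get(t, []):
            # setdefault keeps weight 1.0 for any term the user actually typed.
            weights.setdefault(related, added_weight)
    return weights


expanded = expand_query(["cheap", "car", "rental"], THESAURUS)
```

In an interactive setting the related terms would be shown to the user as suggestions instead of being added automatically.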

§ Automatic thesaurus generation

As an alternative to the cost of a manual thesaurus, we could attempt to generate a thesaurus automatically by analyzing a collection of documents. There are two main approaches. One is simply to exploit word co-occurrence. The other approach is to use a shallow grammatical analysis of the text and to exploit grammatical relations or grammatical dependencies.
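
The co-occurrence approach can be sketched with a term-document matrix A whose rows are length-normalized, so that the entries of C = A A^T are term-term similarity scores. The version below is a toy illustration: it uses raw term counts as weights, and the function name, the `top_n` parameter, and the five tiny documents are made up.

```python
import math


def cooccurrence_thesaurus(docs, top_n=2):
    """Map each term to its top_n most similar terms by document co-occurrence.

    docs: list of token lists. Similarity is the dot product of
    length-normalized term rows, i.e. one entry of C = A A^T.
    """
    vocab = sorted({t for d in docs for t in d})
    # Row of A for each term: its count in every document.
    rows = {t: [d.count(t) for d in docs] for t in vocab}
    for t, row in rows.items():
        norm = math.sqrt(sum(x * x for x in row))  # never zero: t occurs somewhere
        rows[t] = [x / norm for x in row]
    thesaurus = {}
    for t in vocab:
        sims = [(sum(a * b for a, b in zip(rows[t], rows[u])), u)
                for u in vocab if u != t]
        # Highest similarity first; ties broken alphabetically.
        sims.sort(key=lambda s: (-s[0], s[1]))
        thesaurus[t] = [u for score, u in sims[:top_n] if score > 0]
    return thesaurus


# Hypothetical toy collection
docs = [
    ["car", "engine", "wheel"],
    ["car", "engine"],
    ["cat", "zoo"],
    ["cat", "zoo", "jaguar"],
    ["car", "jaguar"],
]
thesaurus = cooccurrence_thesaurus(docs)
```

Note that "jaguar" ends up related to both "cat" and "car" here, which shows how co-occurrence picks up topical association without distinguishing word senses.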

Friday, February 21, 2014
