Week 7
NO MUDDIEST POINT FOR WEEK 6
IIR
CH9 Relevance feedback and query expansion
This chapter discusses ways in which a
system can help with query refinement, either fully automatically or with the
user in the loop.
v Relevance feedback and
pseudo relevance feedback
Ø The idea of relevance
feedback (RF) is to involve the user in the retrieval process so as to improve
the final result set. In particular, the user gives feedback on the relevance
of documents in an initial set of results. The basic procedure is:
§ The user issues a (short,
simple) query.
§ The system returns an
initial set of retrieval results.
§ The user marks some
returned documents as relevant or nonrelevant.
§ The system computes a
better representation of the information need based on the user feedback.
§ The system displays a
revised set of retrieval results.
Ø Image search provides a good example of relevance feedback. Not only is
it easy to see the results at work, but this is also a domain where users often
struggle to put what they want into words yet can easily mark images as
relevant or nonrelevant.
Ø The Rocchio algorithm
§ The Rocchio algorithm is
the classic algorithm for implementing relevance feedback. It models a way of
incorporating relevance feedback information into the vector space model.
§ Relevance feedback can
improve both recall and precision. But, in practice, it has been shown to be
most useful for increasing recall in situations where recall is important. This
is partly because the technique expands the query.
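The Rocchio update can be sketched in a few lines. This is a minimal illustration, not a production implementation: it assumes the textbook form q_m = alpha * q_0 + beta * centroid(relevant) - gamma * centroid(nonrelevant), dense term-weight vectors as plain Python lists, and the commonly quoted weights alpha = 1, beta = 0.75, gamma = 0.15. The toy vocabulary and vectors are made up for the example.

```python
def centroid(vectors, dim):
    """Mean of equal-length vectors; the zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward the relevant centroid and away from the nonrelevant one.

    Negative term weights are clipped to 0 (a common convention, assumed here).
    """
    dim = len(query)
    pos = centroid(relevant, dim)
    neg = centroid(nonrelevant, dim)
    return [
        max(0.0, alpha * q + beta * p - gamma * n)
        for q, p, n in zip(query, pos, neg)
    ]


# Hypothetical 3-term vocabulary: ["jaguar", "car", "cat"]
q0 = [1.0, 0.0, 0.0]
relevant = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]]   # documents the user marked relevant
nonrelevant = [[1.0, 0.0, 1.0]]                 # document the user marked nonrelevant
qm = rocchio(q0, relevant, nonrelevant)
print(qm)
```

In this toy run the modified query gains weight on "car", a term the original query did not contain, which is why the technique tends to help recall. The weight for "cat" would be negative and is clipped to zero.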
Ø Probabilistic relevance feedback
§ If a user has told us some
relevant and nonrelevant documents, then we can proceed to build a classifier.
One way of doing this is with a Naive Bayes probabilistic model. If R is a
Boolean indicator variable expressing the relevance of a document, then we can
estimate P(x_t = 1 | R), the probability of a term t appearing in a document,
depending on whether the document is relevant or not:
$\hat{P}(x_t = 1 \mid R = 1) = |VR_t| / |VR|$
$\hat{P}(x_t = 1 \mid R = 0) = (df_t - |VR_t|) / (N - |VR|)$
Here N is the total number of documents, df_t is the number of documents that
contain t, VR is the set of documents known to be relevant, and VR_t is the
subset of VR that contains t.
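As a rough sketch of these two estimates, the function below counts term occurrences directly. It assumes each document is represented as a set of terms and that the judged-relevant set VR is given as a set of document ids. The function name, the toy collection, and the error handling are illustrative choices. No smoothing is applied, although practical systems usually add some.

```python
def estimate_term_probs(term, docs, relevant_ids):
    """Return (P(x_t=1 | R=1), P(x_t=1 | R=0)) from raw counts.

    docs: dict mapping doc id -> set of terms (the whole collection).
    relevant_ids: set of doc ids the user judged relevant (VR).
    """
    n_docs = len(docs)                                           # N
    vr = len(relevant_ids)                                       # |VR|
    if vr == 0 or vr == n_docs:
        raise ValueError("need at least one relevant and one other document")
    df_t = sum(1 for terms in docs.values() if term in terms)    # df_t
    vr_t = sum(1 for d in relevant_ids if term in docs[d])       # |VR_t|
    p_rel = vr_t / vr
    p_nonrel = (df_t - vr_t) / (n_docs - vr)
    return p_rel, p_nonrel


# Hypothetical five-document collection
docs = {
    1: {"jaguar", "car"},
    2: {"jaguar", "cat"},
    3: {"car", "engine"},
    4: {"cat", "zoo"},
    5: {"car", "dealer"},
}
p_rel, p_nonrel = estimate_term_probs("car", docs, relevant_ids={1, 3})
print(p_rel, p_nonrel)
```

Here "car" appears in both judged-relevant documents and in one of the three remaining documents, so the first estimate is 1 and the second is 1/3. All documents outside VR are treated as nonrelevant, exactly as the second formula does.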
Ø When does relevance
feedback work?
§ Firstly, the user has to
have sufficient knowledge to be able to make an initial query which is at least
somewhere close to the documents they desire.
§ Secondly, the relevance
feedback approach requires relevant documents to be similar to each other.
Ø RF is rarely used in
web search.
Ø Evaluation of RF
strategies
§ Interactive relevance
feedback can give very substantial gains in retrieval performance. Empirically,
one round of relevance feedback is often very useful. Two rounds is sometimes
marginally more useful.
§ There is some subtlety to
evaluating the effectiveness of relevance feedback in a sound and
enlightening way. The obvious first strategy is to start with an initial query
q0 and compute a precision-recall graph, then compute the graph again for the
modified query after one round of feedback. This overstates the gain, because
documents the user already judged relevant are simply ranked higher. A second
idea is to use documents in the residual collection (the set of documents minus
those assessed relevant) for the second round of evaluation.
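To make the residual-collection idea concrete, here is a small sketch that compares precision at k on the full post-feedback ranking with precision at k after the judged-relevant documents are removed. The document ids, the ground-truth relevant set, and the helper names are hypothetical.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    top = ranking[:k]
    return sum(1 for d in top if d in relevant) / len(top) if top else 0.0


def residual_ranking(ranking, judged_relevant):
    """Drop documents the user already assessed as relevant."""
    return [d for d in ranking if d not in judged_relevant]


# Ranking produced by the modified query (toy data)
ranking_after_feedback = ["d1", "d2", "d5", "d3", "d7"]
relevant = {"d1", "d2", "d3", "d9"}      # ground-truth relevant documents
judged_relevant = {"d1", "d2"}           # the ones the user marked in round one

naive = precision_at_k(ranking_after_feedback, relevant, 2)
residual = precision_at_k(
    residual_ranking(ranking_after_feedback, judged_relevant), relevant, 2
)
print(naive, residual)
```

On this toy data the naive score is 1.0 while the residual score is 0.5. The naive number is high only because the two documents the user already marked were pushed to the top.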
Ø Pseudo RF
Pseudo RF provides a
method for automatic local analysis. It automates the manual part of relevance feedback,
so that the user gets improved retrieval performance without an extended
interaction. The method is to do normal retrieval to find an initial set of
most relevant documents, to then assume that the top k ranked documents are
relevant, and finally to do relevance feedback as before under this assumption.
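The three steps just described (retrieve, assume the top k are relevant, apply feedback) can be sketched as follows. This is a toy illustration under several assumptions: raw term-frequency vectors, whitespace tokenization, cosine ranking, and a Rocchio-style update with only the positive term. The function names and the four documents are made up.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query_vec, doc_vecs):
    """Doc ids sorted by decreasing cosine similarity to the query."""
    return sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)


def pseudo_relevance_feedback(query, docs, k=2, alpha=1.0, beta=0.75):
    """Return (re-ranked doc ids, expanded query weights)."""
    doc_vecs = {d: tf_vector(t) for d, t in docs.items()}
    q0 = tf_vector(query)
    top_k = rank(q0, doc_vecs)[:k]          # assumed relevant, no user judgment
    expanded = {t: alpha * w for t, w in q0.items()}
    for d in top_k:                         # add beta * centroid of the top-k docs
        for t, w in doc_vecs[d].items():
            expanded[t] = expanded.get(t, 0.0) + beta * w / len(top_k)
    return rank(expanded, doc_vecs), expanded


docs = {
    "d1": "jaguar car speed",
    "d2": "jaguar car engine",
    "d3": "jaguar cat jungle",
    "d4": "car engine repair",
}
reranked, expanded = pseudo_relevance_feedback("jaguar car", docs, k=2)
print(reranked)
```

In this example the expanded query picks up "speed" and "engine" from the two top-ranked documents, so d4 moves ahead of d3, with which it was tied under the original query. The same mechanism causes query drift when the top k documents are not actually relevant.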
v Relevance feedback has
been shown to be very effective at improving relevance of results. Its
successful use requires queries for which the set of relevant documents is
medium to large. Full relevance feedback is often onerous for the user, and its
implementation is not very efficient in most IR systems. In many cases, other
types of interactive retrieval may improve relevance by about as much with less
work.
Beyond the core ad hoc retrieval scenario, other
uses of relevance feedback include:
Ø Following a changing
information need (e.g., names of car models of interest change over time)
Ø Maintaining an information
filter (e.g., for a news feed). Such filters are discussed further.
Ø Active learning (deciding
which examples it is most useful to know the class of to reduce annotation
costs).
v Global methods for query
reformulation
Ø This section briefly
covers three global methods for expanding a query: simply aiding the
user in doing so, using a manual thesaurus, and building a thesaurus
automatically.
§ Vocabulary tools
Various user supports in the search process can
help the user see how their searches are or are not working. This includes
information about words that were omitted from the query because they were on
stop lists, what words were stemmed to, the number of hits on each term or
phrase, and whether words were dynamically turned into phrases.
§ Query expansion
In relevance feedback, users give additional input
on documents (by marking documents in the result set as relevant or not), and
this input is used to reweight the terms in the query. In query
expansion, on the other hand, users give additional input on query words or
phrases, possibly suggesting additional query terms.
§ Automatic thesaurus generation
As an alternative to the cost of a manual
thesaurus, we could attempt to generate a thesaurus automatically by analyzing
a collection of documents. There are two main approaches. One is simply to
exploit word co-occurrence. The other approach is to use a shallow grammatical
analysis of the text and to exploit grammatical relations or grammatical
dependencies.
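The co-occurrence approach can be illustrated with a very small sketch. It assumes that two terms are related if they often appear in the same document, counts raw document-level co-occurrences, and breaks ties alphabetically. The function name and the sample documents are hypothetical. Raw counts favor frequent terms, so a real system would normally normalize the weights.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_thesaurus(docs, top_n=2):
    """Map each term to the top_n terms it most often shares a document with."""
    counts = defaultdict(lambda: defaultdict(int))
    for text in docs:
        terms = sorted(set(text.lower().split()))   # each term counted once per doc
        for a, b in combinations(terms, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return {
        term: [
            t for t, _ in sorted(neigh.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
        ]
        for term, neigh in counts.items()
    }


docs = [
    "car engine repair",
    "car engine oil",
    "car dealer",
    "cat jungle",
]
thesaurus = cooccurrence_thesaurus(docs)
print(thesaurus["car"])
```

A query containing "car" could then be expanded with its nearest neighbors from this table, here "engine" first, since the two terms share two documents.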