Week 4
IIR sections 1.3 and 1.4, chapter 6
CH1
1.3 Processing Boolean queries
· The five steps when processing a simple conjunctive query (term1 AND term2):
1. Locate the first term in the Dictionary
2. Retrieve its postings
3. Locate the second term in the Dictionary
4. Retrieve its postings
5. Intersect the two postings lists
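The five steps can be sketched in Python as follows. This is a minimal illustration, not the book's exact pseudocode: it assumes the index is a plain dict mapping each term to a list of docIDs sorted in increasing order, and the names `index`, `intersect`, and `and_query` are hypothetical.

```python
def intersect(p1, p2):
    """Merge-intersect two postings lists sorted by increasing docID."""
    answer = []
    i, j = 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # docID appears in both lists, so it satisfies the AND
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance the pointer that sits on the smaller docID
        else:
            j += 1
    return answer


# Hypothetical toy index: term -> sorted list of docIDs.
index = {
    "brutus": [1, 2, 4, 11, 31, 45, 173, 174],
    "calpurnia": [2, 31, 54, 101],
}


def and_query(index, term1, term2):
    postings1 = index.get(term1, [])  # steps 1-2: locate term, retrieve postings
    postings2 = index.get(term2, [])  # steps 3-4: same for the second term
    return intersect(postings1, postings2)  # step 5: intersect


print(and_query(index, "brutus", "calpurnia"))  # prints [2, 31]
```

Because both lists are sorted, the merge touches each posting at most once, so the cost is linear in the combined length of the two lists.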
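· For more complicated queries, we can use query optimization: choosing the order in which the postings lists are accessed so that the total amount of work is minimized.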
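One common heuristic for a conjunction of several terms is to process the terms in order of increasing document frequency, so that intermediate results stay as small as possible. The sketch below assumes the same kind of toy index as before (a dict from term to sorted docID list); `and_query_optimized` is a hypothetical name, and the length of a postings list stands in for the term's document frequency.

```python
def intersect(p1, p2):
    """Merge-intersect two postings lists sorted by increasing docID."""
    answer, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer


def and_query_optimized(index, terms):
    """AND together any number of terms, shortest postings list first."""
    # Sort by postings length, used here as the document frequency.
    postings = sorted((index.get(t, []) for t in terms), key=len)
    if not postings:
        return []
    result = postings[0]
    for p in postings[1:]:
        if not result:
            break  # an empty intermediate result can never grow again
        result = intersect(result, p)
    return result


# Hypothetical toy index.
index = {
    "brutus": [1, 2, 4, 11, 31, 45, 173, 174],
    "caesar": [1, 2, 4, 5, 6, 16, 57, 132],
    "calpurnia": [2, 31, 54, 101],
}

print(and_query_optimized(index, ["brutus", "caesar", "calpurnia"]))  # prints [2]
```

Starting with the rarest term means every intermediate result is no longer than the shortest postings list involved.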
· For arbitrary Boolean queries, we have to evaluate and temporarily store the answers for intermediate expressions in a complex expression.
1.4 The extended Boolean model versus ranked retrieval
· The Boolean retrieval model contrasts with ranked retrieval models such as the vector space model. In ranked retrieval, users largely use free text queries, that is, they just type one or more words rather than using a precise language with operators for building up query expressions, and the system decides which documents best satisfy the query.
CH6 Scoring, term weighting and the vector space model
· Parametric and zone indexes serve two purposes. First, they allow us to index and retrieve documents by metadata such as the language in which a document is written. Second, they give us a simple means for scoring (and thereby ranking) documents in response to a query.
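One simple way to turn zones into a score is weighted zone scoring: each zone gets a weight, the weights sum to 1, and a document's score is the sum of the weights of the zones that match the query. The sketch below is only an illustration under stated assumptions: the zone names, the weights, the whitespace tokenization, and the function name `weighted_zone_score` are all hypothetical, and a zone "matches" when it contains every query term.

```python
def weighted_zone_score(doc_zones, query_terms, weights):
    """Sum the weights of the zones that contain all query terms."""
    score = 0.0
    for zone, weight in weights.items():
        # Naive tokenization: lowercase and split on whitespace (an assumption).
        zone_terms = set(doc_zones.get(zone, "").lower().split())
        if all(term in zone_terms for term in query_terms):
            score += weight
    return score


# Hypothetical zone weights; they sum to 1.
weights = {"author": 0.2, "title": 0.3, "body": 0.5}

# Hypothetical document with three zones.
doc = {
    "author": "william shakespeare",
    "title": "the tragedy of julius caesar",
    "body": "brutus kills caesar in the senate",
}

# "caesar" matches the title and body zones but not the author zone.
print(round(weighted_zone_score(doc, ["caesar"], weights), 2))
```

Documents can then be ranked by this score, which always lies between 0 and 1 when the weights sum to 1.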
· Tf-idf(t, d) assigns to term t a weight in document d that is highest when t occurs many times within a small number of documents (thus lending high discriminating power to those documents); lower when the term occurs fewer times in a document, or occurs in many documents (thus offering a less pronounced relevance signal); lowest when the term occurs in virtually all documents.
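The behaviour described above follows from the usual definition tf-idf(t, d) = tf(t, d) × log(N / df(t)), where tf(t, d) is the number of occurrences of t in d, df(t) is the number of documents containing t, and N is the collection size. The sketch below is a minimal illustration, assuming raw term counts, a base-10 logarithm, and whitespace tokenization; the function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: tf-idf weight} dict per document."""
    n_docs = len(docs)
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    tokenized = [doc.lower().split() for doc in docs]

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term frequency within this document
        weights.append({t: tf[t] * math.log10(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical three-document collection.
docs = ["the car car car", "the insurance", "the auto"]
weights = tf_idf(docs)

print(weights[0]["the"])  # prints 0.0, since "the" occurs in every document
print(weights[0]["car"] > weights[1]["insurance"])  # prints True
```

Here "car" gets the highest weight because it is frequent in one document and absent from the others, while "the" gets weight 0 because it occurs in all documents.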