Thursday, January 30, 2014

Notes Week 4


Week 4
IIR sections 1.3 and 1.4, chapter 6

CH1
1.3 Processing Boolean queries
·      The five steps when processing a simple conjunctive query (two terms joined by AND):
1. Locate the first term in the dictionary
2. Retrieve its postings
3. Locate the second term in the dictionary
4. Retrieve its postings
5. Intersect the two postings lists
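·      For more complicated queries we can use query optimization: choosing the order in which postings lists are processed so that the total amount of work is minimized. A common heuristic is to process terms in order of increasing document frequency.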
·      For arbitrary Boolean queries, we have to evaluate and temporarily store the answers for intermediate expressions in a complex expression.
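
The five steps above, together with the ordering heuristic used in query optimization, can be sketched in Python. This is only a minimal illustration under stated assumptions, not code from IIR: the toy `index`, its terms and document IDs, and the function names `intersect` and `intersect_many` are all made up for the example, and postings lists are assumed to be lists of document IDs sorted in increasing order.

```python
def intersect(p1, p2):
    """Merge-intersect two postings lists sorted by document ID."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # Document contains both terms: keep it and advance both lists.
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer


def intersect_many(terms, index):
    """AND several terms together, shortest postings lists first."""
    # A term missing from the index is treated as an empty postings list.
    postings = sorted((index.get(t, []) for t in terms), key=len)
    if not postings:
        return []
    result = postings[0]
    for p in postings[1:]:
        if not result:
            break  # the intersection is already empty
        result = intersect(result, p)
    return result


# Hypothetical toy index: term -> sorted list of document IDs.
index = {
    "brutus": [1, 2, 4, 11, 31, 45, 173, 174],
    "caesar": [1, 2, 4, 5, 6, 16, 57, 132],
    "calpurnia": [2, 31, 54, 101],
}

print(intersect(index["brutus"], index["calpurnia"]))
print(intersect_many(["brutus", "caesar", "calpurnia"], index))
```

Starting with the shortest list keeps every intermediate result no longer than that list, which is the reason for ordering terms by increasing document frequency.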
1.4 The extended Boolean model versus ranked retrieval
·      The Boolean retrieval model contrasts with ranked retrieval models such as the vector space model. In ranked retrieval, users largely use free text queries, that is, they just type one or more words rather than using a precise language with operators for building up query expressions, and the system decides which documents best satisfy the query.

CH6 Scoring, term weighting and the vector space model
·      Parametric and zone indexes serve two purposes. First, they allow us to index and retrieve documents by metadata such as the language in which a document is written. Second, they give us a simple means for scoring (and thereby ranking) documents in response to a query.
·      The tf-idf weight tf-idf_{t,d} assigns to term t a weight in document d that is highest when t occurs many times within a small number of documents (thus lending high discriminating power to those documents); lower when the term occurs fewer times in a document, or occurs in many documents (thus offering a less pronounced relevance signal); lowest when the term occurs in virtually all documents.
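
The behaviour described above follows from the usual definition tf-idf_{t,d} = tf_{t,d} × idf_t, with idf_t = log(N / df_t), where N is the number of documents in the collection and df_t is the number of documents that contain t. The sketch below computes these weights for a tiny collection. The documents, the helper name `tf_idf`, and the choice of raw term counts with a base-10 logarithm are assumptions made for illustration; other tf and idf variants exist.

```python
import math
from collections import Counter

# Hypothetical toy collection: document ID -> list of tokens.
docs = {
    1: ["the", "car", "car", "car", "insurance"],
    2: ["the", "auto", "insurance"],
    3: ["the", "best", "auto"],
}

N = len(docs)
# Document frequency: number of documents that contain each term.
df = Counter(term for tokens in docs.values() for term in set(tokens))


def tf_idf(term, doc_id):
    """Raw term frequency times log10(N / df); 0.0 for an unseen term."""
    if df[term] == 0:
        return 0.0
    tf = docs[doc_id].count(term)
    return tf * math.log10(N / df[term])


for term in ["car", "insurance", "the"]:
    print(term, round(tf_idf(term, 1), 3))
```

In document 1, "car" (frequent in the document, rare in the collection) gets the largest weight, "insurance" a smaller one, and "the" (present in every document) gets a weight of zero.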


