Wednesday, March 26, 2014

Notes Week 10 and NO MUDDIEST POINT FOR THIS WEEK

NO MUDDIEST POINT FOR the class on 03/24

Week 11
IES CH14 Parallel IR
·      Index partitioning and replication are two popular approaches to improve the efficiency of information retrieval.
·      Intra-query parallelism means dividing the index into independent parts so that each node is responsible for only a small piece of the overall index. Several nodes can then work on the same query at once, which reduces the time needed to answer each query.
·      The two predominant index partitioning schemes are document partitioning and term partitioning.
·      In a document-partitioned search engine, each of the n nodes is involved in processing all queries received by the engine. In a term-partitioned configuration, a query is seen by a given node only if the node’s index contains at least one of the query terms.
·      The main advantage of the document-partitioned approach is its simplicity. Because all index servers operate independently of each other, no additional complexity needs to be introduced into the low-level query processing routines.
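·      Term partitioning addresses the disk seek problem by splitting the collection into sets of terms instead of sets of documents, so that the full postings list for a term is stored on a single node rather than spread across all n nodes.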

·      Despite its potential performance advantage over the document-partitioned approach, at least for on-disk indices, term partitioning has several shortcomings that make it difficult to use in practice: limited scalability, load imbalance across nodes, and a reliance on term-at-a-time query processing.
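
To make the difference between the two schemes concrete, here is a minimal Python sketch. It is not from the book: the toy collection DOCS, the function names, the round-robin assignment of documents, and the byte-sum assignment of terms to nodes are all assumptions made for illustration. It only shows which nodes see a query under each scheme, using a simple OR match with no ranking.

```python
from collections import defaultdict

# Toy collection (hypothetical data, only for illustration).
DOCS = {
    0: "parallel information retrieval",
    1: "index partitioning and replication",
    2: "term partitioning splits the index by term",
    3: "document partitioning splits the index by document",
}


def build_index(docs):
    """Build a toy inverted index: term -> sorted list of doc ids."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in set(docs[doc_id].split()):
            index[term].append(doc_id)
    return dict(index)


def document_partition(docs, n):
    """Each node indexes its own subset of documents (here: doc_id mod n)."""
    shards = [{} for _ in range(n)]
    for doc_id, text in docs.items():
        shards[doc_id % n][doc_id] = text
    return [build_index(shard) for shard in shards]


def term_partition(docs, n):
    """Each node holds the complete postings lists for a subset of terms."""
    nodes = [{} for _ in range(n)]
    for term, postings in build_index(docs).items():
        # Assumed assignment rule: a deterministic byte sum modulo n.
        nodes[sum(term.encode()) % n][term] = postings
    return nodes


def search_document_partitioned(nodes, query_terms):
    """Every node processes every query; the results are merged."""
    contacted = list(range(len(nodes)))
    hits = set()
    for i in contacted:
        for term in query_terms:
            hits.update(nodes[i].get(term, []))
    return sorted(hits), contacted


def search_term_partitioned(nodes, query_terms):
    """A node sees the query only if it holds at least one query term."""
    # A real frontend would use a term-to-node lookup table instead of
    # scanning all nodes; the scan keeps this sketch short.
    contacted = [i for i, index in enumerate(nodes)
                 if any(term in index for term in query_terms)]
    hits = set()
    for i in contacted:
        for term in query_terms:
            hits.update(nodes[i].get(term, []))
    return sorted(hits), contacted


if __name__ == "__main__":
    doc_nodes = document_partition(DOCS, 3)
    term_nodes = term_partition(DOCS, 3)
    for query in (["partitioning"], ["term", "document"]):
        print(query, search_document_partitioned(doc_nodes, query))
        print(query, search_term_partitioned(term_nodes, query))
```

Both functions return the same matching documents, but the document-partitioned search always contacts all n nodes, while the term-partitioned search contacts at most one node per query term. The same property hints at the load imbalance problem: a node that happens to hold a very frequent term receives a disproportionate share of the queries.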
