No muddiest point for the class on 03/24
Week 11
IES CH14 Parallel IR
·
Index partitioning and replication are two
popular approaches to improve the efficiency of information retrieval.
·
Intra-query parallelism means dividing the
index into independent parts so that each node is responsible for a small piece
of the overall index. The nodes can then work on the same query in parallel,
which speeds up query processing.
·
The two predominant index partitioning schemes
are document partitioning and term partitioning.
·
In a document-partitioned search engine, each of
the n nodes is involved in processing all queries received by the engine. In a
term-partitioned configuration, a query is seen by a given node only if the
node’s index contains at least one of the query terms.
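To make the routing difference concrete, here is a minimal Python sketch. The toy collection, the round-robin document assignment, and the character-sum term hash are all assumptions made for illustration; the chapter does not prescribe any of them.

```python
# Toy collection; the texts are made up for illustration.
DOCS = {
    0: "parallel information retrieval",
    1: "index partitioning and replication",
    2: "term partitioning reduces disk seeks",
    3: "document partitioning is simple",
}
N_NODES = 2


def build_document_partitioned(docs, n):
    """Each node holds a complete inverted index over its own documents."""
    nodes = [{} for _ in range(n)]
    for doc_id, text in sorted(docs.items()):
        index = nodes[doc_id % n]  # assumed assignment: round-robin by doc id
        for term in set(text.split()):
            index.setdefault(term, []).append(doc_id)
    return nodes


def term_node(term, n):
    """Assumed term-to-node mapping: a stable hash over the characters."""
    return sum(ord(c) for c in term) % n


def build_term_partitioned(docs, n):
    """Each node holds the full posting lists for its own subset of terms."""
    nodes = [{} for _ in range(n)]
    for doc_id, text in sorted(docs.items()):
        for term in set(text.split()):
            nodes[term_node(term, n)].setdefault(term, []).append(doc_id)
    return nodes


def nodes_seeing_query_document_partitioned(nodes, query):
    # Every node is involved in every query.
    return list(range(len(nodes)))


def nodes_seeing_query_term_partitioned(nodes, query):
    # A node sees the query only if its index holds at least one query term.
    terms = query.split()
    return [i for i, index in enumerate(nodes) if any(t in index for t in terms)]
```

For a one-term query such as "partitioning", the document-partitioned engine contacts both nodes, while the term-partitioned engine contacts only the node that owns that term.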
·
The main advantage of the document-partitioned
approach is its simplicity. Because all index servers operate independently of
each other, no additional complexity needs to be introduced into the low-level
query processing routines.
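The independence of the index servers can be sketched as follows: each node ranks only its own documents, and a front-end merges the per-node top-k lists. The function names and the raw term-count score are assumptions chosen to keep the example short; a real engine would use a proper ranking function, and one that relies on collection-wide statistics would need those statistics shared across nodes.

```python
import heapq


def local_top_k(node_docs, query, k):
    """Rank one node's documents without consulting any other node."""
    terms = query.split()
    scored = []
    for doc_id, text in node_docs.items():
        tokens = text.split()
        # Assumed score: number of query-term occurrences in the document.
        score = sum(tokens.count(t) for t in terms)
        if score > 0:
            scored.append((score, doc_id))
    return heapq.nlargest(k, scored)


def search(nodes, query, k):
    """Front-end: send the query to every node, then merge the partial results."""
    partial = [local_top_k(node_docs, query, k) for node_docs in nodes]
    return heapq.nlargest(k, (hit for hits in partial for hit in hits))
```

Because every document lives on exactly one node, merging the local top-k lists yields the same top k as ranking the whole collection on a single machine, as long as scores are comparable across nodes.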
·
Term partitioning addresses the disk seek
problem of document partitioning, where every node has to perform a disk seek
for each query term, by splitting the collection into sets of terms instead of
sets of documents.
·
Despite its potential performance advantage over
the document-partitioned approach, at least for on-disk indices, term
partitioning has several shortcomings that make it difficult to use in
practice: limited scalability, load imbalance, and a restriction to
term-at-a-time query processing.
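The term-at-a-time restriction can be illustrated with a small sketch in which a set of score accumulators is handed from node to node, each node adding the contributions of the query terms it owns. The posting format of (doc_id, term_frequency) pairs and the additive score are assumptions for illustration, not the chapter's exact method.

```python
def term_at_a_time(nodes, query):
    """Process a query over term-partitioned nodes, one posting list at a time."""
    terms = query.split()
    accumulators = {}  # doc_id -> partial score, passed from node to node
    for index in nodes:
        for term in terms:
            # A node contributes only for the terms stored in its own index.
            for doc_id, tf in index.get(term, []):
                accumulators[doc_id] = accumulators.get(doc_id, 0) + tf
    # Highest score first; ties broken by doc id.
    return sorted(accumulators.items(), key=lambda kv: (-kv[1], kv[0]))
```

No single node holds all the postings of a document, so a document's final score is known only after the last node has processed its terms.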