Friday, January 17, 2014

Notes Week 1

Week 1

FOA
·      Linguistic expressions must be meaningful (appreciate the word game) so that the search engine can figure out what is being talked about.
·      New types of electronic artifacts bring an empirical grounding for new theories of language that may well be revolutionary.
·      FOA conversation loop:
1.     Asking a question – query (maybe imprecise or vague)
2.     Constructing an answer – retrieval by search engine
3.     Assessing the answer – relevance feedback (filter the answers)
·      IR – information retrieval
It has existed since 1987 and has borrowed heavily from computational linguistics.

·      The fundamental operation of a search engine is matching: comparing the descriptive features supplied by users against the documents in the database. How? By keywords.
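
To make the matching idea concrete for myself, here is a minimal sketch of keyword matching with an inverted index. It is only an illustration under my own assumptions: the toy documents and the names `DOCS`, `build_index`, and `match` are made up and do not come from FOA, and real engines do far more (stemming, weighting, ranking).

```python
from collections import defaultdict

# Toy document collection, invented for illustration only.
DOCS = {
    1: "search engines match queries to documents",
    2: "computational linguistics studies language",
    3: "keywords describe documents in a database",
}

def build_index(docs):
    """Map each keyword to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def match(index, query):
    """Return the ids of documents that contain every keyword in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(DOCS)
print(sorted(match(INDEX, "documents database")))  # [3]
```

Requiring every keyword to be present is just one possible choice; matching on any keyword and then filtering through relevance feedback would fit the conversation loop equally well.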


IES
1.1.1 Web search engines usually incorporate layers of caching and replication, taking advantage of commonly occurring queries and exploiting parallel processing.
1.1.2 Search application systems must interface directly with the file system layer of the operating system and must be engineered to handle a heavy update load.
1.1.3 Techniques for categorization and filtering have the widest applicability, and we provide an introduction to these areas.
1.2.1 To perform relevance ranking, the search engine computes a score, sometimes called a retrieval status value, for each document.
1.2.2 A particular document might be an e-mail message, a Web page, a news article, or even a video.
1.2.3 Response time, the delay a user experiences between issuing a query and receiving the results, is the most visible aspect of efficiency.
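
As a rough sketch of points 1.2.1 and 1.2.3, the snippet below computes a score for each document, ranks by it, and times the query. The scoring function (a plain count of query-term occurrences) is my own simplification, not the method from IES, and `DOCS`, `score`, and `rank` are hypothetical names.

```python
import time
from collections import Counter

# Toy collection; documents can be e-mails, Web pages, news articles, etc.
DOCS = {
    "email": "meeting notes about the search engine project",
    "page": "a web page about search and search engines",
    "news": "a news article about the weather",
}

def score(query_terms, text):
    """Toy retrieval status value: total occurrences of the query terms."""
    counts = Counter(text.lower().split())
    return sum(counts[term] for term in query_terms)

def rank(query, docs):
    """Return ids of documents with a nonzero score, best score first."""
    terms = query.lower().split()
    scored = [(score(terms, text), doc_id) for doc_id, text in docs.items()]
    scored = [(s, d) for s, d in scored if s > 0]
    # Sort by descending score, breaking ties by document id.
    return [d for s, d in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]

start = time.perf_counter()
results = rank("search engines", DOCS)
elapsed = time.perf_counter() - start

print(results)  # ['page', 'email']
print(f"response time: {elapsed * 1000:.3f} ms")
```

The measured time here covers only the ranking call on three in-memory strings, so it says nothing about a real system; it just shows where the clock starts and stops.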

MIR
·      People have long organized information for later retrieval and searching by compiling, storing, and indexing papyrus, hieroglyphics, and books.
·      IR has gained a place with other technologies at the center of the stage.
·      How the Web Changed Search
1.     The characteristics of the document collection itself
1) Composed of pages distributed over millions of sites and connected through hyperlinks.
2) Crawling: new phase introduced by the Web.
2.     The size of the collection
3.     In a very large collection, predicting relevance is much harder than before.
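
To picture the crawling phase, here is a small sketch that follows hyperlinks breadth-first over an in-memory link graph. Everything in it is assumed for illustration: the `LINKS` graph and its page names are invented, `crawl` is a hypothetical helper, and no real network requests are made.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/"],
    "c.example/": [],
}

def crawl(seed, links, limit=100):
    """Visit pages breadth-first from a seed, each page at most once."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue and len(order) < limit:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order

print(crawl("a.example/", LINKS))
# ['a.example/', 'a.example/about', 'b.example/', 'c.example/']
```

The `seen` set is what keeps the crawler from looping forever on pages that link back to each other, and `limit` is a crude stand-in for the fact that the collection is too large to fetch in full.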

