Posts

Showing posts from January 2019

Week #2 Muddiest Point

If we don't know the distribution of the key-value pairs in a new collection, how do we run MapReduce? I'm also confused by the punctuation marks "/" and "-".

Week #2, Unit 2: Document and Query Processing

·        IIR section 1.2, chapters 2 and 3
In section 1.2, "A first take at building an inverted index", I learned the major steps in building the index:
1) Collect the documents to be indexed.
2) Tokenize the text, turning each document into a list of tokens.
3) Do linguistic preprocessing, producing a list of normalized tokens, which are the indexing terms.
4) Index the documents that each term occurs in by creating an inverted index, consisting of a dictionary and postings.
In chapter 2, "The term vocabulary and postings lists", I learned that the first step of processing is to convert the document's byte sequence into a linear sequence of characters. The next phase is to determine what the document unit for indexing is. Then, given a character sequence and a defined document unit, tokenization is the task of chopping it up into pieces, called tokens, perhaps at the same time throwing away certain characters, such as punctuation. Th...
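The four indexing steps above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the book's implementation: the function names `tokenize`, `normalize` and `build_inverted_index` are hypothetical, the tokenizer simply keeps runs of letters and digits and drops all punctuation, and "linguistic preprocessing" is reduced to lowercasing.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Step 2 (simplified assumption): keep runs of letters/digits and
    # throw away every other character, including punctuation.
    return re.findall(r"[A-Za-z0-9]+", text)


def normalize(token):
    # Step 3 (simplified assumption): case-folding only; no stemming,
    # no stop-word removal.
    return token.lower()


def build_inverted_index(docs):
    # Step 1: `docs` is the collected set of documents, docID -> raw text.
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            postings[normalize(token)].add(doc_id)
    # Step 4: dictionary of terms (in sorted order) -> sorted postings list.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


# Hypothetical sample collection with two short documents.
docs = {
    1: "I did enact Julius Caesar: I was killed i' the Capitol; Brutus killed me.",
    2: "So let it be with Caesar. The noble Brutus hath told you Caesar was ambitious:",
}
index = build_inverted_index(docs)
print(index["caesar"])   # [1, 2]
print(index["capitol"])  # [1]
```

Because this regex splits on every non-alphanumeric character, strings joined by "-" or "/" (for example "state-of-the-art") become several separate tokens; IIR chapter 2 discusses why real tokenizers often handle hyphens and similar cases more carefully.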

Week #1, Unit 1: Introduction and Course Overview

·        FOA section 1.1
By reading FOA, I learned that the book takes a closer look at the process of "finding out about" (FOA): research activities that allow a decision-maker to draw on others' knowledge. The FOA process of browsing readers can be imagined to involve three phases: 1) asking a question, 2) constructing an answer, and 3) assessing the answer. The section also gives a schematic of a search engine.
·        IES sections 1.1 and 1.2
I learned that information retrieval is concerned with representing, searching, and manipulating large collections of electronic text and other human-language data. In detail, I learned about web search, desktop and file system search, and how other IR applications deal with the storage, manipulation, and retrieval of human-language data. Then I learned the basic IR system architecture and the components of an IR system, as well as how documents are updated and modified. And ...

Week #1 Muddiest Points

1. How does a website score different webpages to evaluate how well the provided links match what the user wants?
2. In a library information retrieval system, will reviews or adoption of a specific document affect the future ranking of recommendations for related pieces of information?