· IIR section 1.2, chapters 2 and 3

In section 1.2, "A first take at building an inverted index", I learned the major steps in building the index:
1) Collect the documents to be indexed.
2) Tokenize the text, turning each document into a list of tokens.
3) Do linguistic preprocessing, producing a list of normalized tokens, which are the indexing terms.
4) Index the documents that each term occurs in by creating an inverted index, consisting of a dictionary and postings.

In chapter 2, "The term vocabulary and postings lists", I learned that the first step of processing is to convert the byte sequence of a document into a linear sequence of characters. The next phase is to determine what the document unit for indexing is. Then, given a character sequence and a defined document unit, tokenization is the task of chopping it up into pieces, called tokens, perhaps at the same time throwing away certain characters, such as punctuation. Th...
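
The four indexing steps from section 1.2 can be sketched in a few lines of Python. This is only a minimal illustration under several assumptions: the function names (`tokenize`, `normalize`, `build_inverted_index`) are hypothetical, the tokenizer simply keeps runs of ASCII letters and digits, linguistic preprocessing is reduced to lowercasing, and the three documents are toy data made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Step 2: chop a character sequence into tokens, dropping punctuation.

    Assumption: a token is any run of ASCII letters or digits.
    """
    return re.findall(r"[A-Za-z0-9]+", text)


def normalize(token):
    """Step 3: stand-in for linguistic preprocessing (case folding only)."""
    return token.lower()


def build_inverted_index(docs):
    """Step 4: map each term to the sorted list of docIDs it occurs in.

    The dict keys play the role of the dictionary; the lists are the postings.
    """
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            postings[normalize(token)].add(doc_id)
    # Sort the terms and each postings list so the output is deterministic.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


# Step 1: collect the documents to be indexed (toy data, keyed by docID).
docs = {
    1: "Friends, Romans, countrymen.",
    2: "So let it be with Caesar.",
    3: "Caesar was ambitious; Romans agreed.",
}

index = build_inverted_index(docs)
print(index["romans"])  # [1, 3]
print(index["caesar"])  # [2, 3]
```

Storing each postings list in sorted docID order is a deliberate choice here, since sorted lists can be intersected with a simple linear merge when answering queries that combine several terms.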