Class CIndex

java.lang.Object
   |
   +----CIndex

public class CIndex
extends Object
Central index of the search engine. Creating the search engine's index is a two-step process:
  1. all tables are set up as either vectors or nested arrays to make extensions fast and easy.
  2. all vectors and nested arrays are replaced by "normal" arrays to make retrieval fast and easy.
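
The two-step process above can be illustrated with a minimal sketch. This is not the CIndex source: the class TwoPhaseIndexSketch and its methods add() and postings() are hypothetical names, and ArrayList stands in for the vectors mentioned above. It only shows the general pattern of building with growable lists and then freezing them into plain arrays.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the build-then-freeze pattern; not the CIndex source.
class TwoPhaseIndexSketch {
    // Step 1: growable posting lists, cheap to extend while pages are added.
    private Map<String, List<Integer>> growable = new HashMap<>();
    // Step 2: fixed int[] posting lists, compact and fast to read.
    private Map<String, int[]> frozen = null;

    void add(String word, int docId) {
        if (growable == null) {
            throw new IllegalStateException("index already optimized");
        }
        growable.computeIfAbsent(word, k -> new ArrayList<>()).add(docId);
    }

    // Copies every growable list into an int[] and drops the growable tables.
    void optimize() {
        if (growable == null) {
            return; // already optimized
        }
        Map<String, int[]> result = new HashMap<>();
        for (Map.Entry<String, List<Integer>> e : growable.entrySet()) {
            List<Integer> ids = e.getValue();
            int[] arr = new int[ids.size()];
            for (int i = 0; i < arr.length; i++) {
                arr[i] = ids.get(i);
            }
            result.put(e.getKey(), arr);
        }
        frozen = result;
        growable = null;
    }

    // Returns the frozen posting list, or null for an unknown word.
    int[] postings(String word) {
        if (frozen == null) {
            throw new IllegalStateException("call optimize() first");
        }
        return frozen.get(word);
    }
}
```

In this sketch, retrieval is only allowed after optimize() has run; whether CIndex enforces the same restriction is not stated here.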


Constructor Index

 o CIndex()

Method Index

 o addPage(PPage)
adds a document to the index.
 o dump()
 o getMatchingWordIDs(String)
returns the IDs of all words within the index that match a given wildcard expression (string*)
 o getPage(int)
returns a document from a given document ID.
 o getPage(String)
returns a document from a given URL.
 o getPostingList(int)
returns a given word's posting list
 o getPostingList(String)
returns a given word's posting list
 o getWordLink(String)
returns a given word's ID as used in docReps.
 o optimize()
optimizes the index by replacing all vectors and nested arrays with "real" arrays.
 o statistics(PrintStream)

Constructors

 o CIndex
 public CIndex()

Methods

 o addPage
 public boolean addPage(PPage page)
adds a document to the index. The steps performed are:
  1. add the document to the document table
  2. add the document's URL to the URL table
  3. create a local index for this document
  4. rank all words based on the local index
  5. create the document's docRep from the local index
  6. merge the local index with the global index. Add all new words to the global word list.

Parameters:
page - page to be added to the index
Returns:
true, if the page was added, otherwise false.
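
The six steps listed for addPage can be sketched as follows. Everything in this block is an assumption made for illustration: the class AddPageSketch, its field names, the use of a plain URL and text instead of a PPage, ranking by word frequency, and rejecting a duplicate URL as the reason for returning false. The actual CIndex implementation may differ in all of these.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical walk-through of the addPage steps; not the CIndex source.
class AddPageSketch {
    final List<String> documents = new ArrayList<>();             // document table: docId -> text
    final Map<String, Integer> urlTable = new HashMap<>();        // URL table: URL -> docId
    final Map<String, Integer> wordIds = new HashMap<>();         // global word list: word -> ID (IDs start at 1)
    final Map<Integer, List<Integer>> postings = new HashMap<>(); // global index: word ID -> docIds
    final List<int[]> docReps = new ArrayList<>();                // docId -> word IDs in rank order

    boolean addPage(String url, String text) {
        // Assumption: a page is rejected if it is incomplete or its URL is already indexed.
        if (url == null || text == null || urlTable.containsKey(url)) {
            return false;
        }
        // Steps 1 and 2: register the document and its URL.
        int docId = documents.size();
        documents.add(text);
        urlTable.put(url, docId);

        // Step 3: local index for this document (word -> occurrence count).
        Map<String, Integer> local = new HashMap<>();
        for (String w : text.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) {
                local.merge(w, 1, Integer::sum);
            }
        }

        // Step 4: rank words, here by descending count, ties broken alphabetically.
        List<String> ranked = new ArrayList<>(local.keySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(local.get(b), local.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });

        // Steps 5 and 6: build the docRep and merge into the global index,
        // assigning a fresh ID to every word not seen before.
        int[] rep = new int[ranked.size()];
        for (int i = 0; i < rep.length; i++) {
            String w = ranked.get(i);
            Integer id = wordIds.get(w);
            if (id == null) {
                id = wordIds.size() + 1;
                wordIds.put(w, id);
            }
            postings.computeIfAbsent(id, k -> new ArrayList<>()).add(docId);
            rep[i] = id;
        }
        docReps.add(rep);
        return true;
    }
}
```

Word IDs start at 1 in this sketch so that 0 remains free to signal an unknown word, which matches the return value documented for getWordLink.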
 o optimize
 public void optimize()
optimizes the index by replacing all vectors and nested arrays with "real" arrays.

 o getWordLink
 public int getWordLink(String strWord)
returns a given word's ID as used in docReps.

Returns:
an integer > 0 if the word was found, otherwise 0.
 o getPostingList
 public int[] getPostingList(String strWord)
returns a given word's posting list

Parameters:
strWord - word to be looked up
Returns:
posting list as an array of document IDs. If strWord is unknown, null will be returned.
 o getPostingList
 public int[] getPostingList(int idWord)
returns a given word's posting list

Parameters:
idWord - word ID to be looked up
Returns:
posting list as an array of document IDs
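
The relationship between getWordLink and the two getPostingList variants can be sketched as below. The class PostingLookupSketch and its constructor are hypothetical. Returning null for an out-of-range word ID is an assumption, since only the String variant documents its behavior for unknown words.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup sketch; not the CIndex source.
class PostingLookupSketch {
    private final Map<String, Integer> wordIds = new HashMap<>();
    private final int[][] postings; // indexed by word ID; slot 0 is unused

    PostingLookupSketch(String[] words, int[][] lists) {
        postings = new int[words.length + 1][];
        for (int i = 0; i < words.length; i++) {
            wordIds.put(words[i], i + 1); // IDs start at 1, 0 means "not found"
            postings[i + 1] = lists[i];
        }
    }

    // Returns an integer > 0 if the word is known, otherwise 0.
    int getWordLink(String strWord) {
        Integer id = wordIds.get(strWord);
        return id == null ? 0 : id;
    }

    // Assumption: an invalid word ID yields null.
    int[] getPostingList(int idWord) {
        if (idWord <= 0 || idWord >= postings.length) {
            return null;
        }
        return postings[idWord];
    }

    // Resolves the word to its ID first; an unknown word yields null.
    int[] getPostingList(String strWord) {
        return getPostingList(getWordLink(strWord));
    }
}
```

A caller would check the result for null before iterating over the document IDs.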
 o getMatchingWordIDs
 public Vector getMatchingWordIDs(String mask)
returns the IDs of all words within the index that match a given wildcard expression (string*)

Parameters:
mask - wildcard expression
Returns:
Vector of the IDs of all words that match the expression
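
One way to resolve a trailing-wildcard expression such as string* is a prefix scan over a sorted word table. The sketch below uses java.util.TreeMap for this. The class WildcardSketch, the addWord() helper and the treatment of a mask without a trailing * as an exact match are assumptions for illustration; how CIndex stores and scans its word list is not documented here.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.Vector;

// Hypothetical prefix-matching sketch; not the CIndex source.
class WildcardSketch {
    // Sorted word table: word -> word ID.
    private final TreeMap<String, Integer> wordIds = new TreeMap<>();

    void addWord(String word, int id) {
        wordIds.put(word, id);
    }

    Vector<Integer> getMatchingWordIDs(String mask) {
        Vector<Integer> ids = new Vector<>();
        if (mask == null) {
            return ids;
        }
        // Assumption: without a trailing '*', the mask is an exact word.
        if (!mask.endsWith("*")) {
            Integer id = wordIds.get(mask);
            if (id != null) {
                ids.add(id);
            }
            return ids;
        }
        String prefix = mask.substring(0, mask.length() - 1);
        // Words sharing a prefix are contiguous in sorted order, so start at
        // the first key >= prefix and stop at the first key without it.
        for (Map.Entry<String, Integer> e : wordIds.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            ids.add(e.getValue());
        }
        return ids;
    }
}
```

The IDs come back in alphabetical order of the matching words, which is a property of this sketch only.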
 o getPage
 public CIndexedPage getPage(int idPage)
returns a document from a given document ID.

Parameters:
idPage - document ID
Returns:
CIndexedPage object if the document was found, otherwise null.
 o getPage
 public CIndexedPage getPage(String strURL)
returns a document from a given URL.

Parameters:
strURL - URL
Returns:
CIndexedPage object if the document was found, otherwise null.
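
The two getPage variants suggest a document table addressed by ID plus a URL table that maps to the same IDs. A minimal sketch of that arrangement follows. The class PageLookupSketch, the nested Page type (standing in for CIndexedPage) and the add() helper are hypothetical, and document IDs starting at 0 is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical page lookup sketch; not the CIndex source.
class PageLookupSketch {
    // Stand-in for CIndexedPage, holding only the URL.
    static class Page {
        final String url;
        Page(String url) {
            this.url = url;
        }
    }

    private final List<Page> pages = new ArrayList<>();           // document table
    private final Map<String, Integer> urlToId = new HashMap<>(); // URL table

    // Registers a page and returns its document ID.
    int add(String url) {
        int id = pages.size();
        pages.add(new Page(url));
        urlToId.put(url, id);
        return id;
    }

    // Returns the page for a document ID, or null if the ID is out of range.
    Page getPage(int idPage) {
        if (idPage < 0 || idPage >= pages.size()) {
            return null;
        }
        return pages.get(idPage);
    }

    // Resolves the URL to a document ID first; an unknown URL yields null.
    Page getPage(String strURL) {
        Integer id = urlToId.get(strURL);
        return id == null ? null : getPage(id.intValue());
    }
}
```

Both lookups return the same object for the same document, so callers can mix them freely.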
 o dump
 public void dump()
 o statistics
 public void statistics(PrintStream pout)