Class CIndex

java.lang.Object
   |
   +----CIndex

Central index of the search engine consisting of:

An index map containing the stems of all indexed words. Each stem is linked to its posting list via a unique ID.
An indexed array of posting lists.
A hashed URL list. Each URL is linked to the appropriate document via a unique ID.
An indexed array of documents.
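The relationship between these four tables can be sketched as follows. This is a minimal illustration only, not the actual CIndex internals: the class name IndexLayoutSketch, its field and method names, and the use of a plain URL string in place of a document object are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the four tables; not the real CIndex fields.
class IndexLayoutSketch {
    // stem -> word ID (the word ID is the position in postingLists)
    final Map<String, Integer> wordIds = new HashMap<>();
    // word ID -> IDs of the documents that contain the word
    final List<List<Integer>> postingLists = new ArrayList<>();
    // URL -> document ID (the document ID is the position in documents)
    final Map<String, Integer> urlIds = new HashMap<>();
    // document ID -> document (represented here by its URL only)
    final List<String> documents = new ArrayList<>();

    // Registers a URL and returns its document ID; a known URL keeps its ID.
    int addDocument(String url) {
        Integer known = urlIds.get(url);
        if (known != null) {
            return known;
        }
        int id = documents.size();
        documents.add(url);
        urlIds.put(url, id);
        return id;
    }

    // Records that the given stem occurs in the given document.
    void addOccurrence(String stem, int docId) {
        Integer wordId = wordIds.get(stem);
        if (wordId == null) {
            wordId = postingLists.size();
            wordIds.put(stem, wordId);
            postingLists.add(new ArrayList<>());
        }
        List<Integer> postings = postingLists.get(wordId);
        if (!postings.contains(docId)) {
            postings.add(docId);
        }
    }

    // Returns the posting list for a stem, or null if the stem is unknown.
    List<Integer> postingsFor(String stem) {
        Integer wordId = wordIds.get(stem);
        return wordId == null ? null : postingLists.get(wordId);
    }
}
```

Both lookups follow the same pattern: a hashed key (stem or URL) yields an integer ID, and the ID is then used as a direct position in the matching array.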

Creating the search engine's index is a two-step process:

First, all tables are set up as either vectors or nested arrays, so that the index is fast and easy to extend.
Then, all vectors and nested arrays are replaced by "normal" arrays, so that retrieval is fast and easy.
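The two phases might look like the sketch below, which grows posting lists in resizable lists and then freezes them into plain int arrays. The class TwoPhasePostings and its methods are hypothetical. Rejecting additions after optimization, and refusing lookups before it, are assumptions made for the sketch; the source does not say how CIndex behaves in those cases.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-phase container: growable while building, fixed afterwards.
class TwoPhasePostings {
    private List<List<Integer>> growable = new ArrayList<>();
    private int[][] frozen; // stays null until optimize() has run

    // Build phase: append a document ID to the posting list of a word.
    void add(int wordId, int docId) {
        if (frozen != null) {
            throw new IllegalStateException("already optimized"); // assumption
        }
        while (growable.size() <= wordId) {
            growable.add(new ArrayList<>());
        }
        growable.get(wordId).add(docId);
    }

    // Switch phase: copy every growable list into a fixed-size int array.
    void optimize() {
        if (frozen != null) {
            return;
        }
        frozen = new int[growable.size()][];
        for (int w = 0; w < frozen.length; w++) {
            List<Integer> src = growable.get(w);
            frozen[w] = new int[src.size()];
            for (int i = 0; i < src.size(); i++) {
                frozen[w][i] = src.get(i);
            }
        }
        growable = null; // the build-time structures are no longer needed
    }

    // Retrieval phase: direct array access; null for an unknown word ID.
    int[] get(int wordId) {
        if (frozen == null) {
            throw new IllegalStateException("call optimize() first"); // assumption
        }
        return (wordId >= 0 && wordId < frozen.length) ? frozen[wordId] : null;
    }
}
```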

addPage(PPage): adds a document to the index.
dump()
getMatchingWordIDs(String): returns all words within the index that match a given wildcard expression (string*).
getPage(int): returns a document from a given document ID.
getPage(String): returns a document from a given URL.
getPostingList(int): returns the posting list of the word with the given ID.
getPostingList(String): returns a given word's posting list.
getWordLink(String): returns a given word's ID as used in docReps.
optimize(): optimizes the index by replacing all vectors and nested arrays with "real" arrays.
statistics(PrintStream)

CIndex

 public CIndex()

 public boolean addPage(PPage page)

adds a document to the index. The steps performed are:

add the document to the document table
add the document's URL to the URL table
create a local index for this document
rank all words based on the local index
create the document's docRep from the local index
merge the local index with the global index. Add all new words to the global word list.
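The six steps above can be sketched as follows. This is an illustration under several assumptions, none of which come from the source: words are ranked by their frequency within the page, a docRep is an array of word IDs in rank order, lowercasing stands in for stemming, and a false return value means the URL was already indexed. The class AddPageSketch and all of its members are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical walk-through of the addPage steps; not the real CIndex code.
class AddPageSketch {
    final List<String> documents = new ArrayList<>();        // doc ID -> text
    final Map<String, Integer> urlIds = new HashMap<>();     // URL -> doc ID
    final Map<String, Integer> wordIds = new HashMap<>();    // word -> word ID
    final List<List<Integer>> postings = new ArrayList<>();  // word ID -> doc IDs
    final List<int[]> docReps = new ArrayList<>();           // doc ID -> ranked word IDs

    boolean addPage(String url, String text) {
        if (urlIds.containsKey(url)) {
            return false; // assumption: a URL is indexed only once
        }
        // Steps 1 and 2: document table and URL table.
        int docId = documents.size();
        documents.add(text);
        urlIds.put(url, docId);

        // Step 3: local index, here a word -> occurrence count map.
        Map<String, Integer> local = new TreeMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                local.merge(token, 1, Integer::sum);
            }
        }

        // Step 4: rank by descending count; ties stay in alphabetical order
        // because the TreeMap key order is kept by the stable sort.
        List<String> ranked = new ArrayList<>(local.keySet());
        ranked.sort((x, y) -> local.get(y) - local.get(x));

        // Steps 5 and 6: build the docRep and merge into the global index.
        int[] docRep = new int[ranked.size()];
        for (int i = 0; i < ranked.size(); i++) {
            String word = ranked.get(i);
            Integer wordId = wordIds.get(word);
            if (wordId == null) { // new word: extend the global word list
                wordId = postings.size();
                wordIds.put(word, wordId);
                postings.add(new ArrayList<>());
            }
            postings.get(wordId).add(docId);
            docRep[i] = wordId;
        }
        docReps.add(docRep);
        return true;
    }
}
```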

 public void optimize()

optimizes the index by replacing all vectors and nested arrays with "real" arrays. The steps performed are:

 public int getWordLink(String strWord)

returns a given word's ID as used in docReps.

 public int[] getPostingList(String strWord)

returns a given word's posting list

Parameters: strWord - word to be looked up
Returns: posting list as an array of document IDs. If strWord is unknown, null will be returned.
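Because an unknown word yields null rather than an empty array, callers need to check the result before using it. The helper below is a hypothetical example of such a caller: it combines two posting lists into the documents that contain both words and treats null as "no documents". PostingListUse and intersect are illustrative names and are not part of CIndex.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical caller-side helper; not part of the CIndex API.
class PostingListUse {
    // Returns the document IDs present in both lists, in the order of b.
    // A null list (unknown word) is treated as empty.
    static int[] intersect(int[] a, int[] b) {
        if (a == null || b == null) {
            return new int[0];
        }
        Set<Integer> inA = new HashSet<>();
        for (int id : a) {
            inA.add(id);
        }
        int[] out = new int[Math.min(a.length, b.length)];
        int n = 0;
        for (int id : b) {
            // remove() reports whether the ID was in a, and prevents duplicates
            if (inA.remove(id)) {
                out[n++] = id;
            }
        }
        return Arrays.copyOf(out, n);
    }
}
```

A call might look like `PostingListUse.intersect(index.getPostingList("search"), index.getPostingList("engine"))`, where index is a CIndex instance.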

 public int[] getPostingList(int idWord)

returns the posting list of the word with the given ID.

 public Vector getMatchingWordIDs(String mask)

returns all words within the index that match a given wildcard expression (string*).
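A trailing-wildcard match of this kind can be sketched as a prefix scan over the word table. The sketch assumes that only a single trailing * is supported, that a mask without * is an exact lookup, and that the result holds word IDs in ascending order. It returns a List instead of the Vector used by CIndex, and the name WildcardSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical prefix matcher; the real getMatchingWordIDs may work differently.
class WildcardSketch {
    static List<Integer> matchingWordIds(Map<String, Integer> wordIds, String mask) {
        List<Integer> ids = new ArrayList<>();
        if (mask.endsWith("*")) {
            // "ind*" matches every word that starts with "ind"
            String prefix = mask.substring(0, mask.length() - 1);
            for (Map.Entry<String, Integer> e : wordIds.entrySet()) {
                if (e.getKey().startsWith(prefix)) {
                    ids.add(e.getValue());
                }
            }
        } else {
            // no wildcard: plain lookup of the word itself
            Integer id = wordIds.get(mask);
            if (id != null) {
                ids.add(id);
            }
        }
        Collections.sort(ids); // deterministic order regardless of map type
        return ids;
    }
}
```

A linear scan is the simplest choice here; a sorted word table would allow the matching range to be found by binary search instead.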

 public CIndexedPage getPage(int idPage)

returns a document from a given document ID.

 public CIndexedPage getPage(String strURL)

returns a document from a given URL.

dump

 public void dump()

 public void statistics(PrintStream pout)