HtmlSearch Documentation: Index panel
This screen is only available in the Pro version.
Depending on your particular configuration, this screen may not be available.
The Index panel allows you to save the results of your search as an index to the pages found.
The index is built from the list of pages displayed in the
Result panel. If the option
'Found previous search' is selected on the Options panel,
the index is built using the search string and pages found during the
previous search; otherwise, the results of the current search are used.
You can generate the index as a 'simple' text list, as a CSV (Comma Separated Values) file to be used in
a spreadsheet or database, or as an HTML file that you can use directly in your browser.
A CSV index is akin to a spreadsheet with 3 columns: the word(s) found, the page URL, and the page title.
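As an illustration of using a CSV index outside HtmlSearch, the following Python sketch reads the three columns into records. It assumes that the file has no header row and uses standard comma-separated quoting; neither point is stated above. The names IndexEntry and read_csv_index, and the sample rows, are hypothetical and are not actual HtmlSearch output.

```python
import csv
import io
from typing import NamedTuple


class IndexEntry(NamedTuple):
    words: str   # the word(s) found
    url: str     # the page URL
    title: str   # the page title


def read_csv_index(stream):
    """Read a 3-column CSV index (words, URL, title) into a list of entries.

    Assumption: there is no header row. Rows that do not have exactly
    three fields are skipped.
    """
    entries = []
    for row in csv.reader(stream):
        if len(row) != 3:
            continue
        entries.append(IndexEntry(*(field.strip() for field in row)))
    return entries


# Hypothetical sample data standing in for a saved '.csv' index file.
sample = io.StringIO(
    'applet,http://example.com/a.html,Page A\n'
    '"search, index",http://example.com/b.html,Page B\n'
)
entries = read_csv_index(sample)
```

To read a saved index instead of the sample, pass an open file, e.g. open("index.csv", newline=""), where the file name is whatever you chose in the Save dialog.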
When an HTML index is generated, a link to the page found is included in the file.
If the option "Display page in its container if possible" was checked in the
Advanced panel,
the link will point to the page's container if appropriate. For example, if a word is found in
a frame page, the link will point to the frameset. This substitution applies only to HTML indices;
it is not done for a text or CSV index.
The generated index is displayed in the text area. You can edit it before saving it if you wish.
The panel elements are, from top to bottom:
The buttons:
- Make index: click this button to generate the index.
Generating the index can be a lengthy operation if there is a large number of pages to index,
or if you request to index all the words (see below). It may also slow down the search
if you generate the index while a search is in progress.
While the index is being built, the button changes to Stop index
in case you want to interrupt that operation.
If the search is in progress or if you interrupt the index generation,
a partial index reflecting the current state of the search is built.
- Save index: click this button to save the generated index.
A dialog allowing you to select the destination file will appear.
If you generated an HTML index, you should use the '.html' extension for your file; use '.csv'
for a CSV file.
Because of applet security restrictions, you may not be able to save the file
using the 'Save' button; in this case, you can use Cut and Paste (e.g. with the Windows Clipboard)
to save the index to a file. You may also be able to overcome this restriction by using
the methods described in the Troubleshooting guide.
- Exclude words from index: click this button to select an
Index Exclusion Dictionary that
specifies the list of words to be excluded from the generated index.
This button is active only when the 'Index all words'
option has been selected (see below).
The options:
- Generate as text, HTML, or CSV: select the appropriate option to generate the index as
a text file, an HTML file that you can use within a browser, or a CSV file that you can open in a spreadsheet.
- Index for all lookFor words / each lookFor word / all words: this allows you to
specify which words are indexed:
- In the first case ('all lookFor words'), the list of pages found
in the search is generated, with no distinctions as to which page contains which word.
- In the second case ('each lookFor word separately'), a list of pages is generated for
each word in the lookFor string that produced the search result: this allows you to see in which
page each of the words appears. If the search string has
only one word (i.e. no OR or AND clause), this is equivalent to the previous selection.
The "Match Case" and "Word Only" settings from the Options panel are taken into account,
i.e. if you did a case sensitive search, the index will be case sensitive.
- In the last case ('all words'), a list of pages is generated for all the words
in the found pages, not only those searched (except for those words listed in the
Index Exclusion Dictionary, numbers and words starting with a number
or any special character, i.e. non-letter, and any 1-character word).
The index generated is an alphabetical index
of all the words in the pages, and for each word, the pages where it appears.
This option can be very time- and resource-consuming, and may make your PC almost
unusable for anything else while the index is being built (from minutes to hours, depending
on the number of pages to index).
- A status line indicates the current state of the index or the save.
- The generated index is shown in the text area.
If the index is very large, it cannot be displayed all at once; the 'Next' and 'Previous'
buttons allow you to page through it.
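The exclusion rules listed for the 'all words' option can be sketched in Python as follows. This is only an illustration of the rules as described, not the HtmlSearch implementation. The function names, the whitespace-based tokenization, the stripping of trailing punctuation, and the sample pages are all assumptions of this sketch. Words are lowercased, in line with the tip that the index is case insensitive.

```python
from collections import defaultdict


def is_indexable(word, excluded):
    """Apply the 'all words' exclusion rules to a single word."""
    if len(word) <= 1:              # 1-character (or empty) words are skipped
        return False
    if not word[0].isalpha():       # numbers, or words starting with a non-letter
        return False
    return word.lower() not in excluded


def build_index(pages, excluded=frozenset()):
    """Return an alphabetical index: word -> sorted list of pages containing it.

    `pages` maps a page URL to its plain text (an assumption of this sketch).
    """
    excluded = {w.lower() for w in excluded}
    found = defaultdict(set)
    for url, text in pages.items():
        for token in text.split():
            # Assumption: trailing punctuation is not part of the word.
            word = token.rstrip(".,;:!?")
            if is_indexable(word, excluded):
                found[word.lower()].add(url)
    return {word: sorted(found[word]) for word in sorted(found)}


# Hypothetical pages and exclusion list.
pages = {
    "a.html": "The table is 3 m wide.",
    "b.html": "Tables and a TABLE, 2nd edition",
}
index = build_index(pages, excluded={"the", "is", "and"})
```

In this sketch 'table' and 'TABLE' end up in the same entry, while 'tables' gets an entry of its own, since no plural or conjugated forms are merged.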
The Index Exclusion Dictionary is a list of words that will not be included in the index when indexing
all the words (e.g. you may not want to index 'the', 'is', etc.).
You can use either the default Index Exclusion Dictionary that comes with HtmlSearch, or any file you wish.
You can also modify any of the dictionary files you load, so you can add, modify or delete entries.
An Index Exclusion Dictionary file is simply a text file that contains a list of words. Spaces, tabs and
carriage returns are ignored when the dictionary is processed, so feel free to format your file as you wish.
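Because whitespace is ignored, reading such a file amounts to splitting its contents on any run of spaces, tabs, or line breaks. A minimal Python sketch, assuming the file contains nothing but words (the function name and the sample text are hypothetical):

```python
def parse_exclusion_dictionary(text):
    """Return the set of words contained in a dictionary file's text.

    str.split() with no argument splits on any run of whitespace
    (spaces, tabs, carriage returns, newlines), so the file's layout
    does not matter.
    """
    return set(text.split())


# Hypothetical file contents with deliberately irregular formatting.
sample = "the  is\ta\r\nan\n\n  of "
words = parse_exclusion_dictionary(sample)
```

One word per line, several words per line, or a mix of both all produce the same set.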
When you press the 'Exclude...' button,
a pop-up window is displayed. This window allows you to select a new Index Exclusion Dictionary file,
load it, modify it, or set the list of words as the dictionary to be used. You can also reload the default
dictionary file that is included with HtmlSearch.
Tips:
- to index all the words in a site, set the 'Look For' string to nothing (' ')
- the indexing does not treat a word and its plural or conjugated forms as the same word,
e.g. 'table' is considered to be a different word than 'tables' or 'tabled'.
- the index is case insensitive, i.e. 'Table', 'TABLE' and 'table' are considered to be the same word.
- the default Index Exclusion Dictionary is pretty minimal; feel free to augment it.