HtmlSearch Documentation: Overview
Why use HtmlSearch
HtmlSearch allows you to search through a web site that is either local
(e.g. on your hard disk, a CD-ROM or a LAN) or remote (e.g. on the World Wide Web).
Many web sites have a search facility, but it generally involves executing a
CGI script located on the site's server.
If a site is located on a server that does
not support CGI scripts, or on local media, you cannot search it.
HtmlSearch allows you to search without the assistance of CGI scripts. It is therefore useful for:
Main features
New in Release 2.1
Usage
There are 2 versions of HtmlSearch:
- HtmlSearch Lite, a simple search engine with basic capabilities.
So why bother with it, you ask? It's free! You can download it and pass it along. It's a good idea to
register your copy, because we can then notify you of bug fixes. It also lets us know how and where
HtmlSearch is used, so we can improve it.
- HtmlSearch Pro, which gives you all the features described above.
It is shareware, so please register your copy
and you'll sleep better knowing that you've done the right thing.
HtmlSearch can be used in one of 3 modes:
- As part of a page in a web site: the site's designer has configured HtmlSearch to work on this web site.
Depending on the choices s/he made when incorporating HtmlSearch, you may or may not be able to search outside
that site, or have access to all the panels described in this help.
- As part of a web page not included in a web site: this would be the case, for example, if you
downloaded the HtmlSearch program so you can use it on any web site or files.
- As an application executed outside of a web page. This offers the advantage of unrestricted access
to local or remote files and pages, but since it is not executed within a browser, you will not
be able to view the pages that match your search.
There is a how-to file that explains how to set up HtmlSearch.
How it works
HtmlSearch works very much like your browser does when you click on a link, except that it
does so automatically. In other
words, it reads a page, looks at the links it contains, follows those links, and so on.
As such, it has the same access capabilities and limitations as your browser,
in addition to restrictions due to the particular nature of Java applets.
It also keeps the pages it examines in its own cache, so that from one search
to the next it doesn't re-read pages it has already read. (This applies only as long
as you do not exit the current search session.)
The links followed are those found in hyperlinks (e.g. "A HREF" tags, image maps, etc.).
HtmlSearch does not follow links to, or generated by, CGI scripts, Java, or JavaScript calls.
If a link points to a directory or to an unreachable URL, HtmlSearch will try to access the files 'index.html' and 'index.htm'
in that directory.
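The link rules above can be illustrated with a small sketch. This is not HtmlSearch's source code: it is a minimal example under stated assumptions. The class and method names (`LinkRules`, `followableLinks`, `indexCandidates`) are hypothetical, the `href` regular expression is a rough stand-in for a real HTML parser, and the test for "CGI-looking" links (a `/cgi-bin/` path, a `.cgi` suffix, or a `?` query string) is an assumed heuristic, not the rule HtmlSearch actually applies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of which links a crawler like HtmlSearch might follow.
class LinkRules {
    // Matches href="..." or href='...' (assumption: a regex instead of a real HTML parser)
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns the href targets in the page that pass the follow rules, in page order. */
    static List<String> followableLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String link = m.group(1).trim();
            if (isFollowable(link)) {
                links.add(link);
            }
        }
        return links;
    }

    /** Skips JavaScript calls and links that look like CGI scripts (assumed heuristic). */
    static boolean isFollowable(String link) {
        String lower = link.toLowerCase(Locale.ROOT);
        if (lower.startsWith("javascript:")) {
            return false;
        }
        return !(lower.contains("/cgi-bin/") || lower.endsWith(".cgi") || lower.contains("?"));
    }

    /** For a directory URL, the index files to try in order; a trailing '/' is added if missing. */
    static List<String> indexCandidates(String dirUrl) {
        String dir = dirUrl.endsWith("/") ? dirUrl : dirUrl + "/";
        List<String> candidates = new ArrayList<>();
        candidates.add(dir + "index.html");
        candidates.add(dir + "index.htm");
        return candidates;
    }
}
```

For example, a page containing links to `a.html`, `javascript:go()`, `/cgi-bin/q?x=1` and `docs/` would yield only `a.html` and `docs/`. The sketch covers only the directory case, because the text does not say how the directory of an unreachable URL is determined.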
Searches can be slow if the pages searched are located on slow or busy servers,
or if your modem is slow. Search speed also depends on your PC and on the settings in the
Advanced panel.
Starting and Stopping the Search: as stated above, HtmlSearch keeps all the visited pages
in its cache. This means that if you stop a search then restart it, even from a different
URL or searching for a different string, HtmlSearch will first use its cache, before going
back to the network to read the page. This minimizes search time and network load, and makes
searches very fast the "second time", e.g. when searching for different strings on the same set
of pages. The cache is window-specific, i.e. there is one cache per window in which HtmlSearch
is loaded; even if several HtmlSearch windows are open, they do not share their caches.
All searches also use the browser's own caching mechanism, which may further reduce access time and network load.
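The crawl-and-cache behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not HtmlSearch's implementation: `SearchSession` stands for one window's search session, `fetcher` is a hypothetical function standing in for network access (returning `null` for an unreachable page), links are assumed to be already absolute, and link filtering is omitted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical per-window session: its page cache is kept across searches and never shared. */
class SearchSession {
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    private final Map<String, String> cache = new HashMap<>(); // URL -> page text (null = unreadable)
    private final Function<String, String> fetcher;             // assumed stand-in for network reads
    private int networkReads = 0;

    SearchSession(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    /** Returns a page from the session cache, going to the network only on the first request. */
    private String page(String url) {
        if (!cache.containsKey(url)) {
            networkReads++;
            cache.put(url, fetcher.apply(url));
        }
        return cache.get(url);
    }

    /** Follows links breadth-first from startUrl; returns the URLs whose text contains needle. */
    List<String> search(String startUrl, String needle) {
        List<String> matches = new ArrayList<>();
        Set<String> visited = new HashSet<>(); // per search; the cache outlives it
        Deque<String> toVisit = new ArrayDeque<>();
        toVisit.add(startUrl);
        while (!toVisit.isEmpty()) {
            String url = toVisit.poll();
            if (!visited.add(url)) {
                continue; // already examined during this search
            }
            String text = page(url);
            if (text == null) {
                continue; // unreachable page
            }
            if (text.contains(needle)) {
                matches.add(url);
            }
            Matcher m = HREF.matcher(text);
            while (m.find()) {
                toVisit.add(m.group(1));
            }
        }
        return matches;
    }

    /** Number of pages actually read through the fetcher so far. */
    int networkReads() {
        return networkReads;
    }
}
```

Running a second search for a different string from the same starting page walks the same pages without increasing `networkReads()`, which mirrors the "second time" speed-up described above. Because the cache map only grows, the sketch also shows why memory use climbs over many searches in one window.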
The caching mechanism may have a negative impact if you do many searches in the same window,
since all the pages visited are kept in the cache: if you search through thousands of pages, the
memory requirements may exceed the browser's capacity (this depends on the platform, the browser and
the browser's settings). It may therefore be wise to 'kill' the search every now and then and restart it in a new window.
Requirements
HtmlSearch is a Java program, and you need a Java-enabled browser
(e.g. Microsoft Internet Explorer version 3.0 and above,
Netscape version 3.0 and above). It is built on JDK 1.0 and is therefore compatible
with browsers using JDK 1.0 and 1.1.
Depending on your system configuration, where you got HtmlSearch from,
and your browser security settings,
you may be restricted as to which sites on the WWW you can search with HtmlSearch.
Registration
Shareware is a program like any other program you can buy in a store, except that it is distributed
on the honor system, i.e. you can try it, and if you like it, you buy it.
Besides letting you try things out before signing the big check,
it also reduces your cost thanks to very low distribution overhead.
Even if you are using the Lite version, it is a good idea to register so we can inform you
of bug fixes.
To register, please go to the MandoSoft site.
Other Search Methods
HtmlSearch needs to read each page to search it, which is very inefficient compared
to CGI-based searches, not to mention the associated network load.
As a result, if the site you are looking at provides a search function, you may be
better off using it than HtmlSearch, especially since HtmlSearch may not be able to access
all the pages of the site. On the other hand, HtmlSearch provides indexing while the
site's CGI-based search may not; the site's search may also restrict searches
in ways that do not suit you (e.g. only some of the pages are covered), or not give you the flexibility that HtmlSearch offers.
For local searches (hard disk, CD-ROM), HtmlSearch is probably slower than operating-system
tools (grep on Unix, Tools->Search on Windows), but these tools do not follow links: they operate
on files only.
There are also some other Java or JavaScript based search engines available from other sources,
but obviously HtmlSearch is better :-).