| scanDirectoryList |
Starts scanning (crawling and indexing) the files under the specified
directories and continues until it covers all subdirectories underneath.
If you don't specify anything in scanDirectoryList, scanXmlList, or scanUrlList,
it scans the files under the current web application by default.
Note that if you enter anything in scanDirectoryList, you also need to set
mapPathList so that each directory can be mapped to its virtual path and crawled properly. |
| scanXmlList |
Parses the local XML file specified by "filePath" to extract the urls
from the elements or attributes specified by "urlXPath".
You can list one or more website navigation files, such as UltimateMenu, UltimatePanel, and
UltimateSitemap source XML files, each specified in a separate "scanXml" element.
Note that "urlXPath" is case-sensitive. Also note that "filePath" can be set
in three different forms. |
| scanUrlList |
Starts scanning (crawling and indexing) with the specified urls and continues
with the urls inside each page until it covers all urls within each domain.
You can list multiple domains, home pages, sitemap pages, or any other url.
Note that each scanUrl can be set to any url that opens as a page
in your browser window. If you set it to a directory such as WebApplication2,
you should enable default documents on the Documents tab of the IIS settings. |
| excludePathList |
Urls starting with the specified prefixes will be discarded.
Note that you can also use the robots.txt file to disallow paths, or
robots meta tags to set noindex and nofollow flags in each page
(see useRobotsFile and useRobotsMeta, which are disabled by default).
You may visit http://www.robotstxt.org/wc/exclusion-admin.html
to become more familiar with the robots.txt file and meta tags.
If you don't specify anything, the UltimateSearchInclude directory
under the current web application is excluded by default. |
| Ignore Tags |
You can exclude portions of your pages from indexing in three ways:
1. Use the UltimateSearch_IgnoreBegin and UltimateSearch_IgnoreEnd tags
to exclude everything between these tags from indexing.
2. Use the UltimateSearch_IgnoreTextBegin and UltimateSearch_IgnoreTextEnd tags
to exclude only the text between these tags from indexing, while still following the links.
3. Use the UltimateSearch_IgnoreLinksBegin and UltimateSearch_IgnoreLinksEnd tags
to exclude only the links between these tags, while still indexing the text.
The ignore tags are written as HTML comments, as follows:
<!-- UltimateSearch_IgnoreBegin -->
Everything here will be ignored
<!-- UltimateSearch_IgnoreEnd -->
<!-- UltimateSearch_IgnoreTextBegin -->
Text here will be ignored, but links will be followed
<!-- UltimateSearch_IgnoreTextEnd -->
<!-- UltimateSearch_IgnoreLinksBegin -->
Links here will be ignored, but text will be indexed
<!-- UltimateSearch_IgnoreLinksEnd -->
|
| includeFileTypeList |
Only files with the specified extensions will be scanned.
Note that these files must be of the text/html type so that they can be crawled
properly. For other file types, you will need to use IFilters,
as explained under ifilterList and mapPathList. |
| ifilterList |
IFilters are used to open and parse non-text/html file types, such as
pdf, doc, xls, and ppt, that are not in the default includeFileTypeList above.
You need to install the specific IFilter for each file type. You may visit
http://www.ifilter.org to download the necessary IFilters for free.
Note that you don't need to install an IFilter for doc, xls, or ppt, since
those IFilters already exist on Windows Server. You only need to add the file extensions
here, each in a separate ifilter element.
Also note that you need to set mapPathList, because IFilters require
a physical or UNC path in order to open these files.
They can't be crawled as text/html files, so they have to reside on your local network. |
| mapPathList |
Virtual to physical path mappings must be provided if you use scanDirectoryList or ifilterList. |
| devProdMapPathList |
When you deploy your web application to a production/hosting environment, you may not be able to crawl/index your website, or you may not have the necessary permissions to save your index files. In that case, you can build your index files on your development/publishing machine and then copy the Index directory onto your production machine.
On your development/publishing machine, you have to provide "devProdMapPathList" so that the urls in the generated
index files point to the production machine instead of
your development machine. After copying the Index directory onto the production machine,
you will also need to update the config file on that machine to set
"saveIndex", "saveEventLog", and "saveSearchLog" to "false", since you're not
allowed to write to that machine.
On your production/hosting machine, open the UltimateSearch.admin.aspx page in Internet Explorer and click "Load Copied Index" to load the copied index. |
| defaultDocumentList |
Default documents under each directory. When you specify this list,
a directory url and its default document url are not both indexed, so the same page is not indexed twice. |
| stopWordList |
These words will not be indexed.
Note that you don't need to list words that are shorter than the "minWordLength" setting, since those are never indexed. |
| ignoreAllNumericWords |
Ignore words that contain only numeric characters such as 1234. |
true |
| ignoreMixedNumericWords |
Ignore words that contain both numeric and alphabetic characters such as ABC123. |
true |
| indexDirectory |
Directory that contains the index files. Give the ASP.NET user (NETWORK SERVICE on Windows Server 2003) full permission on the Index directory so that the index files can be saved properly. |
~/UltimateSearchInclude/Index |
| logDirectory |
Directory that contains the event and search logs. Give the ASP.NET user (NETWORK SERVICE on Windows Server 2003) full permission on the Log directory so that the log files can be saved properly. |
~/UltimateSearchInclude/Log |
| saveEventLog |
Whether to log the history of indexing and configuration changes. |
true |
| saveSearchLog |
Whether to log the history of search operations. |
true |
| useRobotsFile |
Whether to use the robots.txt file to disallow paths from crawling. |
false |
| useRobotsMeta |
Whether to use robots meta tags to set noindex and nofollow flags in each page. |
false |
| removeQueryString |
Whether to remove the query string from urls while crawling. If you want to keep the querystrings as part of the indexed urls, you should set this flag to false. |
true |
| urlCaseSensitive |
Whether to treat urls as case-sensitive. If you set this flag to true, indexed urls will be case-sensitive; i.e., search results may show both http://www.mydomain.com and
http://www.MyDomain.com if both links exist on your pages.
This is especially useful if the values in querystrings
need to be case-sensitive. |
false |
| frequencyInDaysForReindexFull |
Reindexes everything from scratch after the specified number of days.
For example, you can set the value to 7 to reindex everything once a week. |
7 |
| frequencyInDaysForReindexIncremental |
Reindexes only the updated documents after the specified number of days.
For example, you can set the value to 7 to reindex changes only once a week. |
7 |
| dependencyFileForReindexFull |
Reindexes everything from scratch whenever this file gets updated.
For example, you can set the value to ~/Samples/Cookbook/Index.htm. |
None |
| dependencyFileForReindexIncremental |
Reindexes only the updated documents whenever this file gets updated.
For example, you can set the value to ~/Samples/Cookbook/Dessert/PoppySeedBundtCake.htm. |
None |
| maxPageCount |
Maximum number of pages to be crawled and indexed. There is no upper limit on this setting;
you can set it to a larger number if you have enough memory and disk space to support it. |
1000000 |
| maxPageLength |
Maximum number of characters to be parsed and indexed in every page. There is no upper limit on this setting; you can set it to a larger number if your pages are very large and you want to index all of their content.
Note that this value needs to be greater than the number of characters displayed
on a page, because the source of the page also contains HTML tags and hidden text.
In other words, you should base this value on the actual number of characters
that you see when you view the source of a page. |
1000000 |
| minWordLength |
Minimum number of characters allowed in a word to be indexed. Words with fewer characters won't be indexed. |
3 |
| maxWordLength |
Maximum number of characters allowed in a word to be indexed. Words with more characters won't be indexed. |
30 |
| scoreUrl |
Score assigned to a word found inside the page url. Set it to 0 (zero) if you don't want the words in the page url to be indexed. |
16 |
| scoreTitle |
Score assigned to a word found inside the page title. Set it to 0 (zero) if you don't want the words in the page title to be indexed. |
8 |
| scoreKeywords |
Score assigned to a word found inside the page keywords. Set it to 0 (zero) if you don't want the words in the page keywords to be indexed. |
4 |
| scoreDescription |
Score assigned to a word found inside the page description. Set it to 0 (zero) if you don't want the words in the page description to be indexed. |
2 |
| scoreText |
Score assigned to a word found inside the page text. Set it to 0 (zero) if you don't want the words in the page text to be indexed. |
1 |
| useDefaultProxy |
Whether to use the default proxy. |
true |
| proxyAddress |
Proxy address to use when the website is behind a proxy server. |
None |
| proxyUsername |
Proxy username to use when the website is behind a proxy server. |
None |
| proxyPassword |
Proxy password to use when the website is behind a proxy server. |
None |
| proxyDomain |
Proxy domain to use when the website is behind a proxy server. |
None |
| useDefaultCredentials |
Whether to use the default network credentials. |
true |
| networkUsername |
Network username to use when the website uses Windows authentication. |
None |
| networkPassword |
Network password to use when the website uses Windows authentication. |
None |
| networkDomain |
Network domain to use when the website uses Windows authentication. |
None |
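The three Ignore Tags modes can be modeled with a small sketch. This is a hypothetical Python illustration, not UltimateSearch's own (.NET) implementation. The function name `apply_ignore_tags` is made up, and regular expressions are used for brevity where a real crawler would use an HTML parser. For one page, it returns the text that would be indexed and the links that would be followed.

```python
import re
from typing import List, Tuple

def _region(name: str) -> "re.Pattern[str]":
    """Match an UltimateSearch_<name>Begin ... UltimateSearch_<name>End comment pair."""
    return re.compile(
        rf"<!--\s*UltimateSearch_{name}Begin\s*-->.*?<!--\s*UltimateSearch_{name}End\s*-->",
        re.DOTALL,
    )

_IGNORE_ALL = _region("Ignore")
_IGNORE_TEXT = _region("IgnoreText")
_IGNORE_LINKS = _region("IgnoreLinks")
# Simplistic href extractor (assumption: links are <a href="..."> tags).
_HREF = re.compile(r"""<a\s[^>]*?href\s*=\s*["']([^"']*)["']""", re.IGNORECASE)
_TAG = re.compile(r"<[^>]+>")

def apply_ignore_tags(html: str) -> Tuple[str, List[str]]:
    """Return (indexable text, followable links) for one page (hypothetical model)."""
    # IgnoreBegin/IgnoreEnd: drop the region for both text and links.
    html = _IGNORE_ALL.sub(" ", html)
    # IgnoreTextBegin/IgnoreTextEnd: drop the region from the text only.
    text_html = _IGNORE_TEXT.sub(" ", html)
    # IgnoreLinksBegin/IgnoreLinksEnd: drop the region from the links only.
    link_html = _IGNORE_LINKS.sub(" ", html)
    # Strip remaining tags (including leftover comments) and collapse whitespace.
    text = " ".join(_TAG.sub(" ", text_html).split())
    links = _HREF.findall(link_html)
    return text, links
```

Consider a page that wraps its menu in IgnoreBegin/IgnoreEnd, a "Next page" link in IgnoreTextBegin/IgnoreTextEnd, and a sponsor link in IgnoreLinksBegin/IgnoreLinksEnd. In this model, the page yields the text "Intro Sponsor" and only the link "/next.htm".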
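Both mapPathList and devProdMapPathList amount to rewriting a url prefix. The sketch below is hypothetical. It assumes that each mapping pairs a url prefix with either a physical folder (mapPathList) or a production url prefix (devProdMapPathList). It also assumes that prefixes are compared case-insensitively and that the longest matching prefix wins. The function names and example paths are illustrative only, and the actual config syntax for these lists is not shown here.

```python
from pathlib import PureWindowsPath
from typing import Dict, Optional
from urllib.parse import unquote

def _match_prefix(url: str, mapping: Dict[str, str]) -> Optional[str]:
    """Return the longest key of mapping that prefixes url (case-insensitive), or None."""
    lowered = url.lower()
    best: Optional[str] = None
    for prefix in mapping:
        if lowered.startswith(prefix.lower()) and (best is None or len(prefix) > len(best)):
            best = prefix
    return best

def map_to_physical(url: str, map_path: Dict[str, str]) -> Optional[str]:
    """mapPathList-style lookup: virtual url -> physical (or UNC) path, or None if unmapped."""
    prefix = _match_prefix(url, map_path)
    if prefix is None:
        return None
    # Decode %20 etc. and drop a leading slash so the join stays under the mapped folder.
    remainder = unquote(url[len(prefix):]).lstrip("/")
    return str(PureWindowsPath(map_path[prefix]) / remainder)

def map_dev_to_prod(url: str, dev_prod_map: Dict[str, str]) -> str:
    """devProdMapPathList-style rewrite: development url -> production url."""
    prefix = _match_prefix(url, dev_prod_map)
    if prefix is None:
        return url  # urls outside every mapping are stored unchanged
    return dev_prod_map[prefix] + url[len(prefix):]
```

`PureWindowsPath` is used so that the Windows-style result is the same on any operating system. Keys should end with "/" so that, for example, WebApplication2 does not also match WebApplication22.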
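Together, removeQueryString, urlCaseSensitive, and excludePathList determine which url a page is indexed under and whether it is crawled at all. Here is a minimal Python sketch with two assumptions: exclude prefixes are full urls, and case folding applies to both the url and the prefixes when urlCaseSensitive is false. `normalize_url` and `is_excluded` are hypothetical names, and the defaults mirror the table (removeQueryString true, urlCaseSensitive false).

```python
from typing import Iterable
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url: str, remove_query_string: bool = True,
                  url_case_sensitive: bool = False) -> str:
    """Apply the removeQueryString and urlCaseSensitive settings to one url."""
    parts = urlsplit(url)
    if remove_query_string:
        parts = parts._replace(query="")
    normalized = urlunsplit(parts)
    # When urls are not case-sensitive, fold case so /Page.aspx and /page.aspx collapse.
    return normalized if url_case_sensitive else normalized.lower()

def is_excluded(url: str, exclude_prefixes: Iterable[str],
                url_case_sensitive: bool = False) -> bool:
    """True if url starts with any excludePathList prefix (assumed to be full urls)."""
    candidate = url if url_case_sensitive else url.lower()
    for prefix in exclude_prefixes:
        if candidate.startswith(prefix if url_case_sensitive else prefix.lower()):
            return True
    return False
```

With the defaults, http://www.MyDomain.com/Page.aspx?id=5 is stored as http://www.mydomain.com/page.aspx. Setting both flags the other way keeps the url exactly as written.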
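The word-level settings (stopWordList, ignoreAllNumericWords, ignoreMixedNumericWords, minWordLength, and maxWordLength) act as a chain of filters on each word. The Python sketch below uses the table's defaults. Its tokenizer, which splits on runs of ASCII letters and digits, and its case-insensitive stop-word comparison are assumptions rather than documented behavior. `should_index` and `indexable_words` are hypothetical names.

```python
import re
from typing import Iterable, List, Set

MIN_WORD_LENGTH = 3   # minWordLength default
MAX_WORD_LENGTH = 30  # maxWordLength default

def should_index(word: str, stop_words: Set[str],
                 min_len: int = MIN_WORD_LENGTH, max_len: int = MAX_WORD_LENGTH,
                 ignore_all_numeric: bool = True, ignore_mixed_numeric: bool = True) -> bool:
    """Apply the word filters to a single word (stop_words must be lowercase)."""
    if len(word) < min_len or len(word) > max_len:
        return False
    if word.lower() in stop_words:
        return False
    has_digit = any(c.isdigit() for c in word)
    has_alpha = any(c.isalpha() for c in word)
    if ignore_all_numeric and has_digit and not has_alpha:  # e.g. 1234
        return False
    if ignore_mixed_numeric and has_digit and has_alpha:    # e.g. ABC123
        return False
    return True

def indexable_words(text: str, stop_words: Iterable[str]) -> List[str]:
    """Tokenize text (assumed tokenizer) and keep only the words that pass the filters."""
    stops = {w.lower() for w in stop_words}
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return [t for t in tokens if should_index(t, stops)]
```

For example, with "the" and "and" as stop words, "The ABC123 recipe uses 1234 eggs and an oven" keeps only recipe, uses, eggs, and oven. "an" falls below minWordLength, which is why short words never need to appear in stopWordList.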
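The score settings weight a word by where it appears on the page. This section does not say exactly how UltimateSearch combines these weights, so the sketch below simply assumes that each occurrence adds the score of the field it appears in, and that a score of 0 skips the field entirely. `word_scores` and the field names are hypothetical. Word filtering (minWordLength and related settings) is omitted for brevity.

```python
import re
from collections import Counter
from typing import Dict, Mapping

# Defaults from the table: scoreUrl, scoreTitle, scoreKeywords, scoreDescription, scoreText.
DEFAULT_SCORES: Mapping[str, int] = {
    "url": 16, "title": 8, "keywords": 4, "description": 2, "text": 1,
}

def word_scores(fields: Mapping[str, str],
                scores: Mapping[str, int] = DEFAULT_SCORES) -> Dict[str, int]:
    """Sum field weights over every word occurrence (assumed aggregation rule)."""
    totals = Counter()
    for field, content in fields.items():
        weight = scores.get(field, 0)
        if weight == 0:
            continue  # a score of 0 means words in this field are not indexed
        for token in re.findall(r"[a-z0-9]+", content.lower()):
            totals[token] += weight
    return dict(totals)
```

Take a page at http://www.mydomain.com/dessert/cake.htm with the title "Poppy Seed Bundt Cake" and the text "A moist cake with poppy seeds." Under this assumed rule, "cake" scores 16 + 8 + 1 = 25 and "poppy" scores 8 + 1 = 9.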