Configuration Elements
Element Description
scanDirectoryList Starts scanning (crawling and indexing) the files under the specified directories and continues until it covers all subdirectories underneath. If you don't specify anything in scanDirectoryList, scanXmlList or scanUrlList, it scans the files under the current web application by default. Note that if you enter anything in scanDirectoryList you also need to set mapPathList so that each physical path can be mapped to its virtual path and crawled properly.
scanXmlList Parses the local XML file specified by "filePath" to extract the urls from the elements or attributes specified by "urlXPath". You can list one or more website navigation files such as UltimateMenu, UltimatePanel and UltimateSitemap source XML files, each one specified in a separate "scanXml" element. Note that "urlXPath" is case-sensitive. Also note that you can set "filePath" in three different forms.
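To illustrate what extracting urls by path from a navigation XML file looks like, here is a minimal Python sketch. It is not the product's implementation: the `Menu`/`Item`/`URL` names in the sample are hypothetical rather than the actual UltimateMenu schema, and the path is split into an element path plus an attribute name only because Python's standard library XPath subset cannot select attributes directly. It does show why a case-sensitive path matters: a path with the wrong case matches nothing.

```python
import xml.etree.ElementTree as ET

def extract_urls(xml_text, element_path, attribute=None):
    """Collect urls from the attribute (or the text) of every matching element."""
    root = ET.fromstring(xml_text)
    urls = []
    for node in root.findall(element_path):
        value = node.get(attribute) if attribute else node.text
        if value and value.strip():
            urls.append(value.strip())
    return urls

# Hypothetical navigation file content, for illustration only.
SAMPLE = """<Menu>
  <Item Text="Home" URL="Default.aspx" />
  <Item Text="Products" URL="Products.aspx">
    <Item Text="Search" URL="Search.aspx" />
  </Item>
</Menu>"""

urls = extract_urls(SAMPLE, ".//Item", attribute="URL")
# Element names are case-sensitive, so ".//item" finds no elements.
wrong_case = extract_urls(SAMPLE, ".//item", attribute="URL")
```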
scanUrlList Starts scanning (crawling and indexing) with the specified urls and continues with the urls inside each page until it covers all urls within each domain. You can list multiple domains, home pages, sitemap pages, or any other url. Note that scanUrl can be set to any URL that opens as a page in your browser window. If you set it to a directory like WebApplication2 you should enable default documents on the Documents tab of the IIS settings.
excludePathList Urls starting with the specified prefixes will be discarded. Note that you can also use the robots.txt file to disallow paths, or robots meta tags to set noindex and nofollow flags in each page. You may visit http://www.robotstxt.org/wc/exclusion-admin.html to get more familiar with the robots.txt file and meta tags. If you don't specify anything it will exclude the UltimateSearchInclude directory under the current web application by default.
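The prefix rule described for excludePathList can be sketched in a few lines of Python. This is only an illustration of "discard urls starting with a prefix"; the function name, the sample prefixes, and the case-sensitive comparison are assumptions, not the product's code.

```python
def is_excluded(url, exclude_prefixes):
    """Return True when the url starts with any of the excluded prefixes."""
    return any(url.startswith(prefix) for prefix in exclude_prefixes)

# Hypothetical prefixes and urls, for illustration only.
exclude_prefixes = ["http://localhost/WebApplication2/Admin/"]
candidates = [
    "http://localhost/WebApplication2/Admin/Login.aspx",
    "http://localhost/WebApplication2/Default.aspx",
]

# Only urls that match no excluded prefix remain candidates for crawling.
kept = [u for u in candidates if not is_excluded(u, exclude_prefixes)]
```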
Ignore Tags You can exclude a portion of your pages in three different ways:
1. Use UltimateSearch_IgnoreBegin and UltimateSearch_IgnoreEnd tags to exclude everything between these tags from indexing.
2. Use UltimateSearch_IgnoreTextBegin and UltimateSearch_IgnoreTextEnd tags to exclude only the text between these tags from indexing, while following the links.
3. Use UltimateSearch_IgnoreLinksBegin and UltimateSearch_IgnoreLinksEnd tags to exclude only the links between these tags from indexing, while indexing the text.

See how you can define these ignore tags below:

<!-- UltimateSearch_IgnoreBegin -->
  Everything here will be ignored
<!-- UltimateSearch_IgnoreEnd -->

<!-- UltimateSearch_IgnoreTextBegin -->
  Text here will be ignored, but links will be followed
<!-- UltimateSearch_IgnoreTextEnd -->

<!-- UltimateSearch_IgnoreLinksBegin -->
  Links here will be ignored, but text will be indexed
<!-- UltimateSearch_IgnoreLinksEnd -->
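The effect of the three tag pairs can be sketched in Python as two views of the same page: one used for indexing text and one used for following links. This is a minimal sketch under the assumption that the regions are simply cut out of the HTML before processing; the function names are hypothetical and the real crawler may work differently.

```python
import re

def _block(name):
    # Matches <!-- UltimateSearch_<name>Begin --> ... <!-- UltimateSearch_<name>End -->
    return re.compile(
        r"<!--\s*UltimateSearch_%sBegin\s*-->.*?<!--\s*UltimateSearch_%sEnd\s*-->"
        % (name, name),
        re.DOTALL,
    )

IGNORE_ALL = _block("Ignore")
IGNORE_TEXT = _block("IgnoreText")
IGNORE_LINKS = _block("IgnoreLinks")

def html_for_text_indexing(html):
    """Drop the regions whose text must not be indexed."""
    html = IGNORE_ALL.sub("", html)
    return IGNORE_TEXT.sub("", html)

def html_for_link_following(html):
    """Drop the regions whose links must not be followed."""
    html = IGNORE_ALL.sub("", html)
    return IGNORE_LINKS.sub("", html)
```

With this split, text inside an IgnoreText region disappears from the indexing view while its links survive in the link view, and the reverse holds for an IgnoreLinks region.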
includeFileTypeList Only the files with the specified extensions will be scanned. Note that these files must be of text/html type so that they can be crawled properly. For non text/html file types you will need to use IFilters as explained in the ifilterList and mapPathList elements.
ifilterList IFilters are used to open and parse the non text/html file types such as pdf, doc, xls, ppt, etc. that are not in the default includeFileTypeList above. You need to install the specific IFilter for each file type. You may visit http://www.ifilter.org to download the necessary IFilters for free. Note that you don't need to install an IFilter for doc, xls, ppt since they already exist on Windows server. You only need to add the file extensions here in separate ifilter elements. Also note that you need to set mapPathList, since a physical path or UNC is required in order to open these file types. These files can't be crawled as text/html, so they have to reside on your local network.
mapPathList Virtual to physical path mappings must be provided if you use scanDirectoryList or ifilterList.
devProdMapPathList When you deploy your web application to a production/hosting environment, you may not be able to crawl/index your website, or you may not have the necessary permissions to save your index files. In that case, you may build your index files on your development/publishing machine, and then copy the Index directory onto your production machine. On your development/publishing machine, you have to provide "devProdMapPathList" so that the urls in the generated index files point to the production machine instead of your development machine. After copying the Index directory onto the production machine, you will also need to update the config file on that machine to set "saveIndex", "saveEventLog", and "saveSearchLog" to "false", since you're not allowed to write onto that machine. On your production/hosting machine, open the UltimateSearch.admin.aspx page in IE, and click "Load Copied Index" in order to load the copied index.
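The idea behind a development-to-production mapping can be sketched as a prefix rewrite applied to each indexed url. The Python below is a hedged illustration only: the function name and the sample url pair are hypothetical, and how the product stores or applies its mappings is not specified here.

```python
def map_dev_to_prod(url, mappings):
    """Rewrite the first matching development prefix to its production prefix."""
    for dev_prefix, prod_prefix in mappings:
        if url.startswith(dev_prefix):
            return prod_prefix + url[len(dev_prefix):]
    return url  # urls outside every mapping are left unchanged

# Hypothetical mapping, for illustration only.
mappings = [("http://localhost/WebApplication2", "http://www.mydomain.com")]

prod_url = map_dev_to_prod("http://localhost/WebApplication2/Products.aspx", mappings)
```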
defaultDocumentList Default documents under each directory. When you specify this list, the directory url and its default document url are treated as the same page, so only one of them is indexed.
stopWordList These words will not be indexed. Note that you don't need to list words that are shorter than the "minWordLength" attribute setting.
Configuration Attributes
Attribute Description Default Value
ignoreAllNumericWords Ignore words that contain only numeric characters such as 1234. true
ignoreMixedNumericWords Ignore words that contain both numeric and alphabetic characters such as ABC123. true
indexDirectory Directory that contains the index files. Give full permission to the ASP.NET user (NETWORK SERVICE in Windows 2003) on the Index directory in order to save the index files properly. ~/UltimateSearchInclude/Index
logDirectory Directory that contains the event and search logs. Give full permission to the ASP.NET user (NETWORK SERVICE in Windows 2003) on the Log directory in order to save the log files properly. ~/UltimateSearchInclude/Log
saveEventLog Whether to log the history of index and configuration changes. true
saveSearchLog Whether to log history of search operations. true
useRobotsFile Whether to use the robots.txt file to disallow paths from crawling. false
useRobotsMeta Whether to use robots meta tags to set noindex and nofollow flags in each page. false
removeQueryString Whether to remove the query string from urls while crawling. If you want to keep the querystrings as part of the indexed urls you should set this flag to false. false
urlCaseSensitive Whether to treat urls as case-sensitive. If you set this flag to true, indexed urls will be case-sensitive, i.e. search results may show both http://www.mydomain.com and http://www.MyDomain.com if both links exist on your pages. This feature is especially useful if the values in querystrings need to be case-sensitive. false
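The combined effect of removeQueryString and urlCaseSensitive on an indexed url can be sketched as a normalization step. This Python sketch is an assumption about the behavior described above, not the product's code; in particular, lowercasing the whole url when case-insensitive is one plausible reading, and the parameter names simply mirror the attribute names.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url, remove_query_string=False, url_case_sensitive=False):
    """Return the form of the url that would be used as the index key."""
    parts = urlsplit(url)
    if remove_query_string:
        parts = parts._replace(query="")
    normalized = urlunsplit(parts)
    # Assumption: case-insensitive mode folds the entire url to lowercase.
    return normalized if url_case_sensitive else normalized.lower()
```

Under this reading, two links that differ only in case collapse to one entry with the default settings, while a case-sensitive setup keeps querystring values such as `id=AbC` intact.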
frequencyInDaysForReindexFull Reindexes everything from scratch after the specified number of days. For example, you can set the value to 7 to reindex everything once a week. 7
frequencyInDaysForReindexIncremental Reindexes only the updated documents after the specified number of days. For example, you can set the value to 7 to reindex changes only once a week. 7
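The frequency settings amount to an elapsed-time check against the last indexing run. A minimal Python sketch, assuming the comparison is made on whole elapsed days since the last run (the function name and the idea of a stored "last indexed" timestamp are illustrative assumptions):

```python
from datetime import datetime, timedelta

def reindex_due(last_indexed, frequency_in_days, now):
    """Return True once at least frequency_in_days have passed since last_indexed."""
    return now - last_indexed >= timedelta(days=frequency_in_days)

# With a value of 7, a run on January 1 becomes due again on January 8.
last_run = datetime(2024, 1, 1)
due_after_six_days = reindex_due(last_run, 7, datetime(2024, 1, 7))
due_after_seven_days = reindex_due(last_run, 7, datetime(2024, 1, 8))
```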
dependencyFileForReindexFull Reindexes everything from scratch whenever this file gets updated. For example, you can set the value to ~/Samples/Cookbook/Index.htm. 7
dependencyFileForReindexIncremental Reindexes only the updated documents whenever this file gets updated. For example, you can set the value to ~/Samples/Cookbook/Dessert/PoppySeedBundtCake.htm. 7
maxPageCount Maximum number of pages to be crawled and indexed. There is no upper limit on this setting. You can set it to a larger number if you have enough memory and disk space to support it. 1000000
maxPageLength Maximum number of characters to be parsed and indexed in every page. There is no upper limit on this setting. You can set it to a greater number if your pages are very large and you want to index all page content. Note that this value needs to be greater than the number of characters displayed on a page because of the HTML tags and hidden text in the source of the page. In other words, you should take into account the number of characters that you see when you view the source of a page. 1000000
minWordLength Minimum number of characters allowed in a word to be indexed. Words with fewer characters won't be indexed. 3
maxWordLength Maximum number of characters allowed in a word to be indexed. Words with more characters won't be indexed. 30
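The word-level settings (stopWordList, minWordLength, maxWordLength, ignoreAllNumericWords and ignoreMixedNumericWords) can be read together as a single filter that decides whether a word is indexed. The Python below is a minimal sketch of that reading using the documented defaults; the function name and the case-insensitive stop word comparison are assumptions.

```python
def should_index(word, stop_words, min_len=3, max_len=30,
                 ignore_all_numeric=True, ignore_mixed_numeric=True):
    """Apply length, stop word and numeric rules to a single word."""
    if not (min_len <= len(word) <= max_len):
        return False
    if word.lower() in stop_words:  # assumption: stop words compare case-insensitively
        return False
    has_digit = any(c.isdigit() for c in word)
    has_alpha = any(c.isalpha() for c in word)
    if ignore_all_numeric and word.isdigit():
        return False
    if ignore_mixed_numeric and has_digit and has_alpha:
        return False
    return True

# "of" is never listed as a stop word here: it is already shorter than min_len.
stop_words = {"the"}
words = ["the", "of", "search", "1234", "ABC123", "indexing"]
kept = [w for w in words if should_index(w, stop_words)]
```

This also shows why short words need no stop word entry: the length check rejects them first.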
scoreUrl Score assigned to a word found inside the page url. Set it to 0 (zero) if you don't want the words in page url to be indexed. 16
scoreTitle Score assigned to a word found inside the page title. Set it to 0 (zero) if you don't want the words in page title to be indexed. 8
scoreKeywords Score assigned to a word found inside the page keywords. Set it to 0 (zero) if you don't want the words in page keywords to be indexed. 4
scoreDescription Score assigned to a word found inside the page description. Set it to 0 (zero) if you don't want the words in page description to be indexed. 2
scoreText Score assigned to a word found inside the page text. Set it to 0 (zero) if you don't want the words in page text to be indexed. 1
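One way to picture the five score attributes is as per-field weights applied to each occurrence of a word. The Python sketch below assumes the weights are simply summed over occurrences, which the text above does not state; the function name, the tokenization, and the sample page are likewise illustrative assumptions. The default weights are the documented ones.

```python
import re

DEFAULT_SCORES = {"url": 16, "title": 8, "keywords": 4, "description": 2, "text": 1}

def word_score(word, page_fields, scores=DEFAULT_SCORES):
    """Sum weight * occurrences of the word over each page field."""
    word = word.lower()
    total = 0
    for field, weight in scores.items():
        tokens = re.findall(r"[a-z0-9]+", page_fields.get(field, "").lower())
        total += weight * tokens.count(word)
    return total

# Hypothetical page, for illustration only.
page = {
    "url": "http://www.mydomain.com/cake.htm",
    "title": "Poppy Seed Bundt Cake",
    "text": "This cake is a cake for dessert",
}

score = word_score("cake", page)  # 16*1 (url) + 8*1 (title) + 1*2 (text)
no_url_score = word_score("cake", page, dict(DEFAULT_SCORES, url=0))
```

Setting a weight to 0 removes that field's contribution entirely, which matches the documented way to keep words in a field from being indexed.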
userAgent User-Agent to identify the originator of the HTTP request sent to the web server during crawling. For example, you can set it to "BlackBerry8100/4.2.0 Profile/MIDP-2.0 Configuration/CLDC-1.1" if you want to index your website for people connecting from a mobile device like BlackBerry Pearl. Karamasoft UltimateSearch Crawler
useDefaultProxy Whether to use the default proxy. true
proxyAddress Proxy address to use when the website is behind a proxy server. None
proxyUsername Proxy username to use when the website is behind a proxy server. None
proxyPassword Proxy password to use when the website is behind a proxy server. None
proxyDomain Proxy domain to use when the website is behind a proxy server. None
useDefaultCredentials Whether to use the default network credentials. true
networkUsername Network username to use when the website uses Windows authentication. None
networkPassword Network password to use when the website uses Windows authentication. None
networkDomain Network domain to use when the website uses Windows authentication. None