Tuning the stopword list
How can my stopword list be optimized to improve search effectiveness?


Stopwords are words that are ignored during both indexing and searching. Adding a word to the stopword list for answers or incidents (exclude_answers.txt or exclude_incidents.txt) means that the presence of that word in a search query has no effect on the result list. Naturally, the stopword list should not include any word that might be useful for finding information. However, if a word carries no information value, as with "the" or other very common words, designating it as a stopword has two benefits. First, it reduces the size of the index (the phrases or ans_phrases table), which makes all searches faster. Second, and more importantly, search result lists become more relevant because they are no longer contaminated by non-meaningful matches to that word.
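
The two benefits can be illustrated with a toy inverted index. This is a minimal sketch only, not how Oracle B2C Service implements its index; the function names, the tokenizer, and the sample answers are all hypothetical.

```python
import re

def tokenize(text, stopwords):
    """Lowercase the text, split it into words, and drop any stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in stopwords]

def build_index(answers, stopwords):
    """Map each non-stopword to the set of answer IDs that contain it."""
    index = {}
    for answer_id, text in answers.items():
        for word in tokenize(text, stopwords):
            index.setdefault(word, set()).add(answer_id)
    return index

def search(query, index, stopwords):
    """Return IDs of answers that match at least one non-stopword query term."""
    matches = set()
    for word in tokenize(query, stopwords):
        matches |= index.get(word, set())
    return matches

# Hypothetical sample answers, for illustration only.
answers = {
    1: "How to reset the phone",
    2: "The warranty covers the battery",
    3: "Reset the voicemail password",
}

no_stop = build_index(answers, set())
with_stop = build_index(answers, {"the"})

# Without stopwords, "the" drags in the unrelated warranty answer.
unfiltered = search("reset the phone", no_stop, set())
# With "the" as a stopword, only answers about resetting or phones match.
filtered = search("reset the phone", with_stop, {"the"})
```

In this sketch the filtered index has one fewer entry, and the filtered search no longer returns the answer that matched only on "the". A query consisting solely of a stopword returns nothing.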

Because every organization has its own terminology and uses words in a unique way, the standard stopword lists in the Oracle B2C Service application may not contain all words that are appropriate. For example, if a company makes phones and almost all answers contain the word "phone," then all those answers are at least partial matches to any search query that includes "phone." Adding "phone" to the stopword list may therefore improve the relevance of search results for many queries.
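
One rough way to spot words like "phone" is to measure what fraction of answers contain each word. The sketch below assumes the answer text has been exported to a list of strings; the function names and the 0.9 threshold are illustrative assumptions, not product settings.

```python
import re

def document_frequency(answers):
    """Return {word: fraction of answers that contain the word at least once}."""
    counts = {}
    for text in answers:
        # Use a set so a word is counted once per answer, not once per occurrence.
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            counts[word] = counts.get(word, 0) + 1
    total = len(answers)
    return {word: n / total for word, n in counts.items()}

def stopword_candidates(answers, threshold=0.9):
    """List words that appear in at least `threshold` of all answers."""
    frequencies = document_frequency(answers)
    return sorted(w for w, f in frequencies.items() if f >= threshold)

# Hypothetical answer titles for a phone manufacturer.
sample_answers = [
    "Charging your phone",
    "Phone screen repair options",
    "Updating phone software",
    "Phone warranty terms",
]

candidates = stopword_candidates(sample_answers)
```

Here "phone" appears in every sample answer and is the only candidate. A high frequency only suggests that a word is worth reviewing; whether it is actually useless for search is still a judgment call.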

To gauge the likely impact of a change to the stopword list, consult the Keyword Searches report to see what people are actually entering as search queries. If a given word is commonly a component of searches that return large numbers of results, then it is a candidate for the stopword list. However, be cautious about adding words that searchers use in single-word queries, as these searches will then return zero results, which could be surprising or frustrating. In that case, whether or not the word is added to the stopword list, it may be helpful to create one or more Search Priority Words that have that word as a keyword. This ensures that preselected results are always prominent. A good example is a company name, which might appear in many answers even though only one or two answers are most relevant for information about the company itself.
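
The review described above could be sketched as follows, assuming the report data has been exported as rows of (query, times searched, results returned). That row layout, the function name, and the 100-result cutoff are assumptions made for illustration; the actual report columns may differ.

```python
from collections import defaultdict

def review_keywords(rows, min_results=100):
    """Summarize words that occur in queries returning many results.

    rows: iterable of (query, times_searched, results_returned).
    Returns (word, searches_in_large_result_queries, single_word_searches)
    tuples, most frequent first.
    """
    in_large_result_queries = defaultdict(int)
    single_word_searches = defaultdict(int)
    for query, times_searched, results_returned in rows:
        words = query.lower().split()
        if len(words) == 1:
            # Single-word queries would return nothing if the word became a stopword.
            single_word_searches[words[0]] += times_searched
        if results_returned >= min_results:
            for word in set(words):
                in_large_result_queries[word] += times_searched
    ranked = sorted(in_large_result_queries.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(word, n, single_word_searches.get(word, 0)) for word, n in ranked]

# Hypothetical report rows: (query, times searched, results returned).
rows = [
    ("phone battery", 40, 250),
    ("phone screen cracked", 25, 310),
    ("acme", 30, 400),
    ("acme phone warranty", 10, 180),
    ("voicemail password", 15, 12),
]

summary = review_keywords(rows)
```

In this made-up data, "phone" tops the list and is never searched on its own, so it is a plausible stopword candidate. "acme" also ranks highly but is searched alone 30 times, which makes it a better fit for a Search Priority Word. Words such as "battery" appear only because they co-occur with "phone", so the output is a starting point for review rather than a list to apply directly.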

The contents of the stopword list can be managed most easily through the Answer Stopwords or Incident Stopwords manager in Configuration > Service > Knowledge Base. This provides a simple interface for adding or removing words, and may also supply suggestions for additions or removals. These suggestions are determined automatically based on word frequencies in answers and incidents, and may or may not be helpful for a particular site; use your best judgment, with reference to the Keyword Searches report. The stopword lists can also be directly edited with the File Manager.

For additional information, refer to the 'Add and remove answer stopwords' section in the online documentation for the version your site is currently running. To access Oracle B2C Service manuals and documentation online, refer to the Documentation for Oracle B2C Service Products.

