Using special characters in a search in Oracle B2C Service
Answer ID 541   |   Last Review Date 10/20/2023

How does Oracle B2C Service index words that contain underscores or hyphens?

Environment:

Answer Phrase Searching

Resolution:

Determining How Oracle B2C Service Indexes Strings with Special Characters:

Oracle B2C Service indexes complex strings differently depending on the nature of the string itself.  With such a wide variety of combinations of alphabetic, numeric, and special characters in a string, it is best to test how the application indexes your specific string so that you can determine how your site visitors can search on the string of interest, such as a part number, business code, or document number.

Note: If you are working with an answer that is not yet published, you can search your answers from the administrative side using the same search options that are allowed on your end-user pages. Set up a view or report that includes the same search options as your end-user pages. This allows you to replicate the search behavior that your end-users experience when you publish the answer.


To determine how the string is indexed: When a complex string is indexed, Oracle B2C Service either indexes the entire string or indexes each component of the string separately.

For example, abc_def is indexed as a whole, but abc(def) is indexed as "abc" and "def".

Therefore, to determine how the string gets indexed, use the following approach:

  • Step 1: Search for the answer using a component of the overall string, that is, the content before or after the special character. In the case of 123.abc, search on either "123" or "abc".

    If the answer is returned, then the components are indexed separately.  End-users will be able to search on "123", "abc", or "123.abc" and get the answer in the search results.

    Note: If each component is indexed separately, end-users will typically get more answers than expected, especially if your site uses OR searching. Searching on "abc" returns the answer of interest but also returns other answers that contain "abc", "abc.456", or "abc.789".
     
     
  • Step 2: If the answer is not returned from the search done in step 1, the entire string is likely indexed. 
     
    Search on the entire string and your answer should be returned. Consider adding parts of the overall string as keywords or aliases if you feel your end-users will search on the substrings.
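
The two-step check above can be written as a small decision routine. This is only an illustrative sketch: `search` is a hypothetical callable standing in for whatever search you run (an administrative report or your end-user page), assumed to return the set of answer IDs matched by a query. It is not an Oracle B2C Service API.

```python
import re
from typing import Callable, Set


def classify_indexing(term: str, answer_id: int,
                      search: Callable[[str], Set[int]]) -> str:
    """Apply the two-step test to one string.

    `search` is a hypothetical stand-in for a manual or scripted search;
    it must return the IDs of the answers matched by the query.
    """
    # Step 1: search on each component around the special characters.
    components = [c for c in re.split(r"[^A-Za-z0-9]+", term) if c]
    if len(components) > 1 and any(answer_id in search(c) for c in components):
        return "components indexed separately"
    # Step 2: fall back to searching on the entire string.
    if answer_id in search(term):
        return "entire string indexed"
    return "not found"
```

If the routine reports "entire string indexed", that is the case where adding the components as keywords or aliases is worth considering.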


Indexing parts of the overall string: You can add each individual component to the Keywords field for the answer. So, in cases where searching on "abc" should return answers that contain "abc_def", you can include "abc" in the Keywords field of the answer.

You can use the aliases.txt file to define aliases for components that you search on. For example, you could add the following line to the aliases.txt file to allow the user to search on "abc" to return answers with "abc_def" and "abc123":

ABC,ABC_DEF,ABC123


With this line added to the aliases file, when a user searches on "abc", answers that have "abc_def" or "abc123" are returned in the results. For more information on keywords and aliases, refer to Answer ID 1660: Adding Synonyms for Searching.
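
As a rough model of what the alias line does at search time, the sketch below expands a search term into the term plus its aliases. It assumes that the first entry on a line is the term users search on, that the remaining entries are its aliases, and that matching ignores case; the function names are hypothetical and the real aliases.txt processing may differ.

```python
from typing import Dict, List


def parse_alias_line(line: str) -> Dict[str, List[str]]:
    """Map the first entry of an alias line to the remaining entries."""
    # Assumption: entries are comma-separated and compared case-insensitively.
    entries = [e.strip().lower() for e in line.split(",") if e.strip()]
    return {entries[0]: entries[1:]} if entries else {}


def expand_query(term: str, aliases: Dict[str, List[str]]) -> List[str]:
    """Return the search term followed by any aliases defined for it."""
    key = term.lower()
    return [key] + aliases.get(key, [])


aliases = parse_alias_line("ABC,ABC_DEF,ABC123")
print(expand_query("abc", aliases))  # ['abc', 'abc_def', 'abc123']
```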


Words that begin or end with a special character

When a word that begins or ends with a special character is indexed into the phrases table of the database (from which all searches find keywords), all punctuation characters are removed from the word. As a result, when "~tabletop" or "tabletop~" is indexed, it is entered as "tabletop".

When a phrase (or similar phrase) search is performed on "~tabletop~", the search ignores the leading and trailing characters and searches on "tabletop", which finds all answers with "tabletop" in them. Because the leading and trailing special characters are removed, this search returns more answers than only those that contain the special characters.
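
The effect on leading and trailing characters can be approximated in a few lines. The sketch below assumes "punctuation" means the ASCII punctuation set in Python's `string.punctuation`, and it only trims the edges of the word; the exact character set the application uses is not stated here, and `normalize_edges` is a hypothetical name.

```python
import string


def normalize_edges(word: str) -> str:
    """Drop leading and trailing punctuation, as a rough model of indexing."""
    # Assumption: ASCII punctuation only; characters inside the word are kept.
    return word.strip(string.punctuation)


for raw in ("~tabletop", "tabletop~", "~tabletop~"):
    print(raw, "->", normalize_edges(raw))  # each prints "... -> tabletop"
```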


Words with Special Characters in the Middle of the Word

With the exception of periods, when a special character (such as a dash or underscore) is within an alphanumeric string, the entire string, including the special character, is typically indexed and word stemming is applied to the end of the word. For strings that include parentheses and brackets, in general, each component is indexed and the entire string is not indexed. In other words, parentheses and brackets act as separators, much as a space would.

Important! The search functionality on your site may work differently due to the addition of keywords to individual answers or due to aliasing that may be configured for your specific application. In addition, specific combinations and special characters may behave differently. Be sure to use the approach above to determine how your specific string is indexed.

The most common characters are provided below with specific descriptions:

Alphanumeric: (abc123, for example). Single letters, such as A or b, are not indexed and are not searchable. For alphanumeric combinations, the whole sequence is indexed. If the sequence ends in a letter, the sequence is subject to word stemming. Numbers within the sequence are not indexed separately. Searching on "abc" or "123" does not return the answer. Searching on "abc123" does return the answer.  
 
 
Numeric: (7 or 123, for example). All numbers except zero (0) are indexed. Searching on "7" or "123" returns all answers with that numerical value. Searching on "0" is considered a null search, so all answers are returned.

In cases of measurements, such as " (inches) or ' (feet), the double or single quotes are not indexed, so 14" is indexed as 14 and 32' is indexed as 32. 
 
 
Periods within words: How the string and its components are indexed depends on how the string is entered.

For example: rightnow.com. If the letters are all lowercase, then both parts of the word are indexed separately. So, in this case, searching on "rightnow" or on "com" would return the answer. Searching on "rightnow.com" also returns the answer.

If any letter is capitalized, then ONLY the entire string is indexed and the search is case-sensitive. This allows you to enter specific file names in all uppercase letters so that the entire file name is indexed. As a result, "RIGHTNOW.COM" and "RightNow.com" are indexed differently from one another. Searching on "rightnow" will not return answers that contain RIGHTNOW.COM or RightNow.com, and searching on "RIGHTNOW.COM" does not return an answer that contains RightNow.com.
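
The period rule can be summarized in a short sketch. It is a simplified model under stated assumptions, not the application's indexer: `index_dotted` is a hypothetical name, stemming is ignored, and the rule that single letters are not indexed is left out.

```python
from typing import List


def index_dotted(term: str) -> List[str]:
    """Return the index entries a dotted string would produce in this model."""
    # Any capital letter: only the whole string is indexed, case preserved.
    if term != term.lower():
        return [term]
    # All lowercase: each component is indexed, and the whole string matches too.
    parts = [p for p in term.split(".") if p]
    return parts + [term]


print(index_dotted("rightnow.com"))  # ['rightnow', 'com', 'rightnow.com']
print(index_dotted("RightNow.com"))  # ['RightNow.com']
```

Because the capitalized forms are stored as-is, "RIGHTNOW.COM" and "RightNow.com" produce different entries in this model, which mirrors the case-sensitive behavior described above.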
 
 

Dash or hyphen: Example: abc-123. The components before and after the hyphen are indexed separately because the dash is treated as a space.  This applies only to alphanumeric phrases.  For all-alphabetic phrases (for example, meta-answers), the whole string is indexed and word stemming is applied.  However, when there is a space before a dash, the dash acts as an exclude operator. For example, searching on "meta -answers" returns only answers that contain "meta" and do not contain "answers".

In February 2010 and later releases, both the first word and the trailing word must start with a letter for the trailing word to be kept with the previous word.
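
The hyphen behavior has two sides: how a hyphenated string is indexed, and how a dash preceded by a space is read in a query. The sketch below models both under assumptions. It uses the "both words start with a letter" rule to decide whether the string stays whole, so mixed cases such as abc1-def may not match your site, and both function names are hypothetical.

```python
from typing import List, Tuple


def index_hyphenated(term: str) -> List[str]:
    """Keep the string whole only if every part starts with a letter."""
    parts = term.split("-")
    if all(p[:1].isalpha() for p in parts):
        return [term]
    # Otherwise the dash behaves like a space and the parts stand alone.
    return [p for p in parts if p]


def parse_query(query: str) -> Tuple[List[str], List[str]]:
    """Split a query into included terms and terms excluded with ' -term'."""
    include, exclude = [], []
    for token in query.split():
        if token.startswith("-") and len(token) > 1:
            exclude.append(token[1:])
        else:
            include.append(token)
    return include, exclude


print(index_hyphenated("abc-123"))       # ['abc', '123']
print(index_hyphenated("meta-answers"))  # ['meta-answers']
print(parse_query("meta -answers"))      # (['meta'], ['answers'])
```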
  

Underscore: (abc_def, for example) The whole word is indexed and word stemming is applied. The components before and after the underscore are not indexed separately. Searching on "abc" or "def" will not return the answer. Searching on "abc_def" will return the answer. 
 
 
Parentheses and Brackets: Including ( ), [ ], and { }. Example: abc(def) or ab[c]de. The parentheses or brackets act like a space and break the string into separate components that are indexed separately.

In the case of abc(def) or abc[def], searching on either "abc" or "def" will return the answer. Searching on "abc(def)" performs the same search as "abc def". If OR searching is enabled for the end-user search, then answers with "abc" or "def" are returned.

Single letters are not indexed so if one component is a single letter, that component is not searchable. In the case of ab{c}de, searching on "ab" or "de" returns the answer, but searching on the single letter "c" does not return the answer. 
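
A minimal sketch of the bracket rule follows: the string is split wherever a parenthesis or bracket appears, and single-letter components are dropped. `index_bracketed` is a hypothetical name, and stemming and aliases are not modeled.

```python
import re
from typing import List


def index_bracketed(term: str) -> List[str]:
    """Split on ( ) [ ] { } and drop components that are a single letter."""
    parts = re.split(r"[()\[\]{}]+", term)
    # Single letters are not indexed; single digits are kept.
    return [p for p in parts if p and not (len(p) == 1 and p.isalpha())]


print(index_bracketed("abc(def)"))  # ['abc', 'def']
print(index_bracketed("ab{c}de"))   # ['ab', 'de']
```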
 
 

HTML Brackets < and >: Words and components within HTML brackets < and > are not indexed. So in the case of abc<def>, the "abc" component is indexed, but the "def" component is not.
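
To preview what remains searchable in a string that contains HTML brackets, you could strip the bracketed parts before testing. The sketch below assumes well-formed, non-nested < > pairs; `strip_angle_bracketed` is a hypothetical helper, not part of the product.

```python
import re


def strip_angle_bracketed(text: str) -> str:
    """Remove anything enclosed in < and >, keeping the surrounding text."""
    # Assumption: brackets are paired and not nested.
    return re.sub(r"<[^<>]*>", "", text)


print(strip_angle_bracketed("abc<def>"))  # abc
```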

 

Slash: The search separates words at a slash (/) in the same way that it separates them at a period, as described under "Periods within words" above. However, the slash only separates the words; it does not follow any of the special handling that applies to periods when a string contains capital letters.