Keyword searching doesn't work for Japanese characters
Answer ID 9637   |   Last Review Date 01/14/2019

Why is keyword searching not working for Japanese characters?

Environment:

Oracle B2C Service, Product listing
Knowledge Advanced

Resolution:

If the question is entered in Japanese characters on an en_US-configured interface, the question is treated as en_US (English/United States) content. In that case only very limited Japanese keyword searching is supported, because the question is processed as English. Japanese is a highly contextual language, and a match may only occur on a single Japanese character, and only if that character is not on a skip list.

To support Japanese-language search, the Japanese language must be selected as part of the search request. Apart from a potential customized solution, this is accomplished by default through the interface locale. Only one language can be mapped to an Oracle B2C Service interface, and similarly only one Knowledge Advanced locale can be mapped to an Oracle B2C Service interface. This means that if Japanese characters are to be accepted in search, a Japanese language pack should be applied to the Oracle B2C Service interface, and a locale mapped to the Japanese language should be associated with that interface within Knowledge Advanced Authoring.
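The mapping described above can be pictured as two one-to-one lookups keyed by the interface. The following is a conceptual sketch only, with made-up interface names and dictionaries standing in for the actual B2C Service and Knowledge Advanced configuration; it is not a real API.

```python
# Conceptual sketch (hypothetical names, not B2C Service/Knowledge Advanced APIs):
# each interface carries exactly one language pack, and exactly one
# Knowledge Advanced locale is mapped to that interface in Authoring.
INTERFACE_LANGUAGE_PACK = {
    "support_jp": "ja_JP",   # Japanese language pack applied to this interface
    "support_us": "en_US",
}

KA_LOCALE_FOR_INTERFACE = {
    "support_jp": "ja_JP",   # KA locale associated with the interface
    "support_us": "en_US",
}

def search_locale(interface):
    # The locale sent with the search request is derived from the interface,
    # so Japanese search processing only happens on a Japanese-configured interface.
    return KA_LOCALE_FOR_INTERFACE[interface]

print(search_locale("support_jp"))  # -> ja_JP: Japanese tokenization applies
print(search_locale("support_us"))  # -> en_US: Japanese text is treated as English
```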

This is not to say that every search using Japanese characters on an en_US-configured interface will fail to return results, but the chief problem is tokenization. The English tokenizer treats any sequence of Japanese characters as a single token, while the OLT Japanese tokenizer might segment it into several. As a result, the tokens and stems the English parser looks for may not appear in the indexed content at all, even though the raw individual characters do. If a document is auto-recognized as Japanese, it is indexed with the Japanese tokenizer rather than the English one, so there is a mismatch. Even if the document is regarded as English, any phrase or sentence written in Japanese will likely be indexed as a single large token. Further, the tokenizer does nothing with Japanese grammar, which relies more on morphology (word endings change) than on syntax (word order).
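The mismatch can be illustrated with a small sketch. This is not the OLT implementation; the whitespace-based split stands in for English-style tokenization, and the Japanese segmentation is hand-written for illustration.

```python
import re

def english_style_tokens(text):
    # English-style tokenization splits on whitespace and punctuation, so a
    # run of Japanese characters survives as one large token.
    return [t for t in re.split(r"[\s\.,!?]+", text) if t]

query = "パスワードをリセットする"   # "reset the password"

print(english_style_tokens(query))
# -> ['パスワードをリセットする']  (the whole phrase is one token)

# A Japanese morphological tokenizer would instead segment the phrase into
# word-level units, roughly like this (hand-segmented for illustration):
japanese_style_tokens = ["パスワード", "を", "リセット", "する"]

# An index built from one tokenizer's output does not contain the tokens the
# other produces, so the lookup finds nothing even though the raw characters match.
indexed = set(japanese_style_tokens)
print([t for t in english_style_tokens(query) if t in indexed])
# -> []  (no overlap, hence no hits)
```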