Understanding full vs incremental content processing

Answer ID 9945 | Last Review Date 03/15/2021

What is the difference between full and incremental content processing, or why do we need full content processing?

Environment:

Oracle B2C Service Knowledge Advanced
Authoring
Search
Customer Portal

Issue:

Search results are not being updated to reflect changes to existing replacement tokens, product or category hierarchy, or an external page linked via a URL answer.

During a cutover to a new release or patch update, an incremental crawl is sometimes missed. If this happens, some updates to the content might also be missed by the incremental crawl.

Resolution:

When an article is published, the content processing utilities must then add it to the search index in order for the article to appear in search results. First, the okcs-im_content_update utility will parse the document into the format used by the indexer. Next, the okcs-content_indexing utility will complete the process of indexing that document as well as other new and updated documents, and send the necessary information to the search runtimes.

These utilities and their respective jobs each have a "full" mode and an "incremental" mode. The incremental jobs consider only articles modified since their last run. The full modes run only when manually queued from Collection Setup. In English, the buttons are labeled as follows:

For the crawler, "KB Full content update".
For the indexer, "Index Processing and Maintenance".
External collections are re-crawled nightly for changes reflected in a sitemap and weekly for all changes. The corresponding buttons are also available in Collection Setup if changes to external content need to be picked up.
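
The difference between the two modes can be sketched as follows. This is a minimal illustration in Python, not the actual logic of the okcs utilities; the Article class, the select_for_crawl function, and the dates are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Article:
    doc_id: str
    modified: datetime  # last modification time of the article itself


def select_for_crawl(articles, last_run, full=False):
    """Return the IDs a crawl would process (hypothetical model)."""
    if full:
        # Full mode: every published article is processed again.
        return [a.doc_id for a in articles]
    # Incremental mode: only articles modified since the last run.
    return [a.doc_id for a in articles if a.modified > last_run]


articles = [
    Article("A1", datetime(2021, 3, 1)),
    Article("A2", datetime(2021, 3, 14)),
]
last_run = datetime(2021, 3, 10)

incremental_ids = select_for_crawl(articles, last_run)
full_ids = select_for_crawl(articles, last_run, full=True)
print(incremental_ids)  # ['A2']
print(full_ids)         # ['A1', 'A2']
```

In this model, article A1 is skipped by the incremental run because its own modification time predates the last run, regardless of what else has changed around it.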

There are some cases when a full crawl of the knowledge base is needed:

Changes to replacement token text
Product/category hierarchy changes not done through a document
External document changes linked from a URL answer
Changes to Search Configuration settings such as industry dictionary

The actions listed above are not typical daily functions. You can run the full crawl by queuing it through Collection Setup.

A full crawl takes longer than an incremental crawl because it recrawls and then reindexes every published article. Alternatively, you can use Find to modify only the affected documents so that the incremental crawl picks them up again, for example by adding or removing a dummy view on those articles. This avoids a full run when the changes do not affect all documents.
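
The reasoning behind this workaround can be sketched as below. This is a simplified model under stated assumptions, not the product's implementation: the article dictionary, the SUPPORT_PHONE token name, and the incremental_candidates function are all hypothetical.

```python
from datetime import datetime

# Hypothetical data: each article records its own modification time
# and the replacement tokens it references.
articles = {
    "A1": {"modified": datetime(2021, 3, 1), "tokens": {"SUPPORT_PHONE"}},
    "A2": {"modified": datetime(2021, 3, 2), "tokens": set()},
}
last_run = datetime(2021, 3, 10)


def incremental_candidates(articles, last_run):
    """IDs of articles modified since the last incremental run."""
    return sorted(
        doc_id for doc_id, a in articles.items() if a["modified"] > last_run
    )


# The token text changes after the last run, but no article is edited,
# so the incremental crawl sees nothing to process.
before_touch = incremental_candidates(articles, last_run)

# Workaround: modify only the articles that use the changed token.
now = datetime(2021, 3, 12)
for a in articles.values():
    if "SUPPORT_PHONE" in a["tokens"]:
        a["modified"] = now

after_touch = incremental_candidates(articles, last_run)
print(before_touch)  # []
print(after_touch)   # ['A1']
```

Only A1 is reprocessed, which is the point of the targeted modification: the unaffected article A2 is left alone.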

Notes:

The full indexer job, "Index Processing and Maintenance", removes deleted, unpublished, and prior-version articles from the search index entirely. The incremental job only flags them as removed so that they are not returned in search results. For this reason, running "Index Processing and Maintenance" periodically, or after large changes to the knowledge base, may yield a small performance gain.
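
The distinction between flagging and removing can be illustrated with a small sketch. The index structure and function names here are assumptions made for the example and do not reflect how the search index is actually stored.

```python
# Hypothetical index: doc_id -> entry with a "removed" flag.
index = {
    "A1": {"removed": False},
    "A2": {"removed": False},
    "A3": {"removed": False},
}


def incremental_remove(index, doc_id):
    """Incremental behavior in this model: flag the entry, keep it stored."""
    index[doc_id]["removed"] = True


def full_maintenance(index):
    """Full behavior in this model: drop flagged entries entirely."""
    return {k: v for k, v in index.items() if not v["removed"]}


def search_results(index):
    """Flagged entries are never returned, whether or not they are purged."""
    return sorted(k for k, v in index.items() if not v["removed"])


incremental_remove(index, "A2")
visible = search_results(index)
size_before = len(index)

index = full_maintenance(index)
size_after = len(index)

print(visible)                    # ['A1', 'A3']
print(size_before, size_after)    # 3 2
```

Search results are the same before and after the full maintenance pass; only the amount of data the index has to hold and skip over changes.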

See also Answer ID 10101: Understanding Knowledge Advanced content processing for general information on content processing.
