Knowledge Advanced URL Content Type
Answer ID 12484   |   Last Review Date 11/29/2022

Why would we use a URL Content Type?

Environment:

  • Oracle B2C Service
  • Sites with Knowledge Advanced only, all versions
  • Authoring, Content Creation for URL Answers

Issue:

A specific content type can be added for URLs that you would like to be crawled and to appear as articles. This content type must have the reference key URL_ANSWER and must contain one field with the reference key URL. It is also recommended that a field with the reference key SUMMARY be used as the URL title; if no SUMMARY field is added, the title is pulled from the page based on the document crawl rules.

The field labels can be adjusted to meet customer needs, and other fields can be added for documentation, but those fields are not crawled: only the SUMMARY and URL fields are crawled and indexed. If the SUMMARY field is a master identifier field, it shows up as the title wherever IM master identifiers are used, such as browse, search, find, and IM reports. This is the recommended configuration.
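The rules above can be summarized as a small validation check. This is a minimal sketch only: the `Field` and `ContentType` classes and the `check_url_content_type`, `resolve_title`, and `indexed_values` functions are hypothetical illustrations, not part of any Oracle B2C Service or Knowledge Advanced API. Only the reference keys URL_ANSWER, URL, and SUMMARY come from this answer, and the check assumes "one field with reference key URL" means at least one such field must exist.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical model of a content type definition (not an Oracle API).
@dataclass
class Field:
    reference_key: str
    label: str                      # labels can be changed to suit the customer
    is_master_identifier: bool = False

@dataclass
class ContentType:
    reference_key: str
    fields: List[Field] = field(default_factory=list)

# Per this answer, only these two fields are crawled and indexed.
INDEXED_KEYS = {"URL", "SUMMARY"}

def check_url_content_type(ct: ContentType) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings): errors break the rules, warnings miss a recommendation."""
    errors: List[str] = []
    warnings: List[str] = []
    if ct.reference_key != "URL_ANSWER":
        errors.append("content type reference key must be URL_ANSWER")
    if not any(f.reference_key == "URL" for f in ct.fields):
        errors.append("a field with reference key URL is required")
    summary = next((f for f in ct.fields if f.reference_key == "SUMMARY"), None)
    if summary is None:
        warnings.append("no SUMMARY field: title will come from the crawled page")
    elif not summary.is_master_identifier:
        warnings.append("SUMMARY is not the master identifier")
    return errors, warnings

def resolve_title(summary: Optional[str], crawled_page_title: str) -> str:
    """Prefer the SUMMARY value; otherwise fall back to the title from crawl rules."""
    return summary if summary else crawled_page_title

def indexed_values(values: Dict[str, str]) -> Dict[str, str]:
    """Keep only the field values that are crawled and indexed (URL and SUMMARY)."""
    return {k: v for k, v in values.items() if k in INDEXED_KEYS}
```

For example, a content type with reference key URL_ANSWER, a URL field, and a SUMMARY field marked as the master identifier produces no errors and no warnings, while any additional documentation fields are simply dropped by `indexed_values`.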
 
Resolution:
 
This content type behaves like any other content type, with two exceptions:
1. The URL field is crawled to a depth of 1, provided the URL is complete and can be crawled. The URL can point to any document type supported by Knowledge Advanced (KA), such as HTML (for example, https://docs.python.org/3/library/decimal.html), PDF, MS-WORD, MS-EXCEL, MS-POWERPOINT, TEXT, XML, IQXML, RTF, OPEN-OFFICE-DOCUMENT, OPEN-OFFICE-SPREADSHEET, and OPEN-OFFICE-PRESENTATION.
2. The content type article itself is not shown to users except in the preview pane; when the article is opened from search or browse, only the URL is opened. Any extra fields are for informational purposes in authoring only.
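As a rough illustration of the first exception, the sketch below checks whether a URL is complete and maps its file extension to one of the document types listed above. It is an assumption-laden sketch, not how KA actually detects document types: the `SUPPORTED_EXTENSIONS` mapping and the `is_complete_url` and `document_type_for` functions are hypothetical, IQXML is omitted because its file extension is not stated here, and a path with no extension is assumed to be an HTML page. Because a URL answer is crawled as a single page, the sketch never follows links.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Hypothetical extension-to-type mapping; the real crawler may detect types differently.
SUPPORTED_EXTENSIONS = {
    ".html": "HTML", ".htm": "HTML",
    ".pdf": "PDF",
    ".doc": "MS-WORD", ".docx": "MS-WORD",
    ".xls": "MS-EXCEL", ".xlsx": "MS-EXCEL",
    ".ppt": "MS-POWERPOINT", ".pptx": "MS-POWERPOINT",
    ".txt": "TEXT",
    ".xml": "XML",
    ".rtf": "RTF",
    ".odt": "OPEN-OFFICE-DOCUMENT",
    ".ods": "OPEN-OFFICE-SPREADSHEET",
    ".odp": "OPEN-OFFICE-PRESENTATION",
}

def is_complete_url(url: str) -> bool:
    """A 'complete' URL here means an http(s) scheme plus a host name."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def document_type_for(url: str) -> Optional[str]:
    """Return the assumed document type for a single-page crawl, or None if unsupported."""
    if not is_complete_url(url):
        return None
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    if suffix == "":
        return "HTML"  # assumption: extensionless paths are web pages
    return SUPPORTED_EXTENSIONS.get(suffix)
```

With this sketch, the HTML example URL above maps to "HTML", while an incomplete URL such as `example.com/guide.pdf` (no scheme) returns `None`.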
 
The differences between a URL answer and an external web collection created in the collection setup interface are:
1. External web collections are crawled incrementally every 15 minutes to pick up changes based on timestamps, and are crawled in FULL mode (complete re-crawl and re-index) weekly to pick up all changes. A URL answer is crawled only when its content article is updated. Because opening the article opens the URL itself, users always see the up-to-date content, but a phrase or excerpt returned in search results could be out of date.
2. An external web collection is a spider crawl: there is no depth limit, but there is a limit on the collection size. A URL answer is crawled only as a single URL page.
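The refresh behavior in the first difference can be expressed as a few date comparisons. This is a minimal sketch under stated assumptions: the 15-minute and weekly intervals come from this answer, but the function names `web_collection_next_crawls`, `url_answer_is_recrawled`, and `excerpt_may_be_stale` are hypothetical and do not correspond to any Oracle scheduler or API.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Intervals stated in this answer for external web collections.
INCREMENTAL_INTERVAL = timedelta(minutes=15)  # timestamp-based incremental crawl
FULL_INTERVAL = timedelta(weeks=1)            # FULL mode: complete re-crawl and re-index

def web_collection_next_crawls(last_incremental: datetime, last_full: datetime) -> Tuple[datetime, datetime]:
    """Return (next incremental crawl, next full crawl) for an external web collection."""
    return last_incremental + INCREMENTAL_INTERVAL, last_full + FULL_INTERVAL

def url_answer_is_recrawled(article_updated: datetime, last_crawled: datetime) -> bool:
    """A URL answer is re-crawled only after its content article is updated."""
    return article_updated > last_crawled

def excerpt_may_be_stale(page_modified: datetime, last_crawled: datetime) -> bool:
    """The live page is always opened, but search excerpts reflect the last crawl."""
    return page_modified > last_crawled
```

For example, a PDF edited on the server after its URL answer was last saved keeps returning excerpts from the older copy until an author updates the article, whereas a web collection would pick up the timestamp change on its next incremental crawl.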
 
Conclusion:

A URL answer might be best suited for documents that you want to host on your own server. HTML pages themselves might be better placed in a web collection, unless they are very static and you really want only one page crawled.