How can I modify the robots.txt file to allow search engines to index our site?
Spiders, Robots, Search Engine Site Indexing
For Oracle B2C Service sites, a robots.txt file is installed on each interface. The robots.txt file is intended to prevent arbitrary spider crawls of an Oracle B2C Service site. This file can be viewed at:
The default disallow file used on sites contains the following:
The default allow file contains the following:
User-agent: Googlebot # Google # ADDED BY HMS
Disallow: # ADDED BY HMS
User-agent: MSNBot # MSN # ADDED BY HMS
Disallow: # ADDED BY HMS
Crawl-delay: 0.2 # ADDED BY HMS
User-agent: Slurp # Yahoo! # ADDED BY HMS
Disallow: # ADDED BY HMS
Crawl-delay: 0.2 # ADDED BY HMS
User-agent: TEOMA # Ask.com # ADDED BY HMS
Disallow: # ADDED BY HMS
User-agent: bingbot # Bing # ADDED BY HMS
Disallow: # ADDED BY HMS
User-agent: * # ADDED BY HMS
Disallow: / # ADDED BY HMS
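To see how a standards-following crawler might read the default allow file, the sketch below feeds it to Python's standard-library urllib.robotparser. This is only an illustration, not part of Oracle B2C Service: the helper name allowed_agents, the sample path /app/home, and the agent name ExampleBot are assumptions made for the example, and real search engines may interpret the file differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Default allow file as quoted in this answer, one directive per line.
DEFAULT_ALLOW = """\
User-agent: Googlebot # Google # ADDED BY HMS
Disallow: # ADDED BY HMS
User-agent: MSNBot # MSN # ADDED BY HMS
Disallow: # ADDED BY HMS
Crawl-delay: 0.2 # ADDED BY HMS
User-agent: Slurp # Yahoo! # ADDED BY HMS
Disallow: # ADDED BY HMS
Crawl-delay: 0.2 # ADDED BY HMS
User-agent: TEOMA # Ask.com # ADDED BY HMS
Disallow: # ADDED BY HMS
User-agent: bingbot # Bing # ADDED BY HMS
Disallow: # ADDED BY HMS
User-agent: * # ADDED BY HMS
Disallow: / # ADDED BY HMS
"""


def allowed_agents(robots_text, agents, url="/app/home"):
    """Map each agent name to whether this parser would let it fetch url.

    The default path "/app/home" is a hypothetical page, used only here.
    """
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # "ExampleBot" is a made-up crawler that has no entry of its own.
    agents = ["Googlebot", "bingbot", "ExampleBot"]
    for agent, ok in allowed_agents(DEFAULT_ALLOW, agents).items():
        print(agent, "allowed" if ok else "blocked")
```

Under this parser, the named search engines match their own entries (an empty Disallow permits everything), while any other agent falls through to the final catch-all entry, which disallows the whole site.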
The instructions to update the robots.txt file can be found in Answer ID 12254: Sitemap and robots.txt.
Note: When configuring your changes please follow these formatting requirements.
- There are default allow and default block robots files. You can only ADD entries.
- Every entry you add must have the flag #CUSTOM at the end of the line.
- Since the default uses disallow, it is easier to add disallow entries to get the desired end result.
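Before submitting changes, it may help to check that every added line carries the flag. The following is a minimal sketch, assuming the flag is written exactly as #CUSTOM at the very end of each line; the function name lines_missing_custom_flag and the sample entries are hypothetical and are not taken from Oracle documentation.

```python
def lines_missing_custom_flag(added_lines, flag="#CUSTOM"):
    """Return the non-blank added lines that do not end with the flag."""
    missing = []
    for line in added_lines:
        stripped = line.strip()
        # Blank lines are skipped; every other line must end with the flag.
        if stripped and not stripped.endswith(flag):
            missing.append(line)
    return missing


if __name__ == "__main__":
    # Hypothetical custom entries; the last one lacks the flag.
    proposed = [
        "User-agent: ExampleBot #CUSTOM",
        "Disallow: /private/ #CUSTOM",
        "Disallow: /tmp/",
    ]
    for line in lines_missing_custom_flag(proposed):
        print("Missing flag:", line)
```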
There are some standard rules for configuring robots.txt files. For information about building, modifying, and maintaining robots.txt files, refer to http://www.robotstxt.org. NOTE: Robots.txt files cannot be altered for community sites.
Most search engines code their spiders to look for these files and obey them. However, not everyone does -- especially email harvesters -- thus, the existence of the robots.txt file does not always mean that a spider cannot search your site.
Allowing your Site to be Indexed:
Customers can modify the robots.txt file to allow site indexing. This would allow their site to be indexed by search engines, including Google, Yahoo, Bing, and so on.
Many of these search engines have various robot agents that index sites. Some are standard (free), while others require content submission and/or are paid services.
Each of the major search engines has a section on its website devoted to configuration parameters for its robots. There are also exclusion parameters that allow some parts of a site to be indexed while others are excluded.
Every legitimate robot should have online content devoted to its proper configuration so that you can determine how your robots.txt file should be updated.
Prior to making changes:
It is important that you review the appropriate information regarding robot configuration for that engine. Then, determine the content for your robots.txt file in accordance with what you want indexed. With this feature enabled, common search engines' web bots are added to the robots.txt file.
It is your responsibility as a customer to define the content of the robots.txt file if you wish to change it. In addition, if you have multiple interfaces, be sure to select the appropriate interfaces whose robots.txt files are to be modified, since each interface has its own robots.txt file.
- Note that search engines read the entries only until they find one that applies to them. If the first entry in your robots.txt file is the default, every search engine is allowed to index your site and the entries that follow are ignored, whatever they contain.
- Please also note that our system will sometimes add entries below what you have specified in your robots.txt file, but as long as the first entry is not changed, this will not affect indexing.
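Because the notes above say that the order of entries matters, a quick look at which User-agent entry comes first can be useful when reviewing a file. The sketch below is one possible check, assuming that "the default" refers to the catch-all User-agent: * entry; the function names user_agent_order and catch_all_is_first and the sample file are hypothetical.

```python
def user_agent_order(robots_text):
    """Return the User-agent values in the order they appear in the file."""
    agents = []
    for raw in robots_text.splitlines():
        # Drop trailing comments such as "# ADDED BY HMS" or "#CUSTOM".
        line = raw.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            agents.append(value.strip())
    return agents


def catch_all_is_first(robots_text):
    """True when the first User-agent entry in the file is the catch-all '*'."""
    agents = user_agent_order(robots_text)
    return bool(agents) and agents[0] == "*"


if __name__ == "__main__":
    # Hypothetical file in which the catch-all entry precedes a specific one.
    sample = (
        "User-agent: * #CUSTOM\n"
        "Disallow: #CUSTOM\n"
        "User-agent: Googlebot # Google # ADDED BY HMS\n"
        "Disallow: # ADDED BY HMS\n"
    )
    print(user_agent_order(sample))
    print(catch_all_is_first(sample))
```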
The sitemap feature enables search engines to easily catalog your site. For more information regarding the sitemap, refer to Answer ID 2553: Using a sitemap with our Oracle B2C Service application.
Additional information is available in online documentation for the version your site is currently running. To access Oracle B2C Service manuals and documentation online, refer to the Documentation for Oracle B2C Service Products.
If you have questions about what generates a session and how you can prevent inaccurate session billing on your site, please review Demystifying Session Usage (PDF). Some simple missteps in customization and configuration can increase billable sessions. For more information, see Session usage information.