Internal USGS Access Only

How Search Engines Work

A must read!

A Web Master's Guide To Search Engines, provided by Search Engine Watch, is a comprehensive source of information that includes, among others, the following links:

Search engines provide their service with a handicap. As WebCrawler founder Brian Pinkerton puts it, "Imagine walking up to a librarian and saying, 'travel.' They're going to look at you with a blank face." On top of that, a search engine can't ask you any refining questions to help narrow the search!

Search Engine Features Chart. A GREAT RESOURCE -- designed to help Web Masters get their pages found.

Key Points to Being Found

The location and frequency of keywords are important indexing factors (except for Google), therefore:

Off The Page Factors

Because Web authors have become so sophisticated in their methods of striving for higher rankings, all major search engines now use rating factors from external sources.

How well linked you are
By analyzing how pages link to each other, a search engine can both determine what a page is about and whether that page is deemed to be "important" and thus deserving of a ranking boost.
Click-through measurement
In short, this means that a search engine may watch what results someone selects for a particular search, then eventually drop high-ranking pages that aren't attracting clicks, while promoting lower-ranking pages that do pull in visitors.
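
The click-through idea can be sketched in a few lines. This is only an illustration, not any search engine's actual method: the function name, the page names, and the (url, impressions, clicks) record layout are all assumptions made for the example.

```python
def rerank_by_clicks(results):
    """Reorder results by click-through rate, highest first.

    results is a list of (url, impressions, clicks) tuples in their
    current rank order. This layout is hypothetical, chosen for the sketch.
    """
    def click_rate(item):
        _url, impressions, clicks = item
        # A result that was never shown has no measurable click rate.
        return clicks / impressions if impressions else 0.0

    # sorted() is stable, so results with equal rates keep their old order.
    ranked = sorted(results, key=click_rate, reverse=True)
    return [url for url, _impressions, _clicks in ranked]


# Hypothetical log: c.html ranks last but attracts the most clicks.
current = [("a.html", 1000, 10), ("b.html", 1000, 5), ("c.html", 1000, 200)]
new_order = rerank_by_clicks(current)
```

Here the lower-ranking page that pulls in visitors moves to the top, and the high-ranking page that attracts few clicks drops.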

All major search engines will rank your pages higher if many other pages link to them: the more popular you appear to be, the more highly you are ranked. Google uses this idea as a central theme for ranking.
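
As a rough illustration of link popularity, the sketch below simply counts how many distinct pages link to each page and sorts by that count. Real engines use far more elaborate link analysis; the function names and the tiny link graph here are assumptions made up for the example.

```python
from collections import Counter


def inbound_link_counts(links):
    """Count distinct pages linking to each page.

    links maps a page to the list of pages it links to
    (a hypothetical structure chosen for this sketch).
    """
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):      # count each linking page once
            if target != source:         # ignore links from a page to itself
                counts[target] += 1
    return counts


def rank_by_popularity(links):
    """Pages ordered by inbound link count, ties broken alphabetically."""
    counts = inbound_link_counts(links)
    return sorted(counts, key=lambda page: (-counts[page], page))


# Hypothetical four-page site.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c", "c", "d"]}
popularity = inbound_link_counts(graph)
order = rank_by_popularity(graph)
```

Page "c" comes first because three other pages link to it, which is the sense in which appearing popular raises your ranking.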

Because incoming links are so important to your search rating, you need to be proactive in developing relationships with other Web Masters: Develop Link Reciprocity! Encourage other sites to link to your site by offering to link to theirs.

You can use AltaVista to find existing links to your site. Search example:   link:water.usgs.gov

Meta Tags

Most search engines (though not Google) use the description meta tag as their site description:
<META name="description" content="Home page for water resources information from the U.S. Geological Survey">
AltaVista Example

The Meta KEYWORDS tag is used by very few search engines, and allows you to specify words that may not exist in the document.

<META name="keywords" content="earthscience, hydrology, hydrologic, data, water, stream flow, stream, river, lake, flood, drought, quality, basin, watershed, environment">

Don't repeat keywords; "keyword spamming" is penalized. An example of what not to do can be seen in the HTML source of http://www.fws.gov/r5cbfo/
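
To check your own pages, a short script can read the description and keywords meta tags and flag repeated keywords. The sketch below uses Python's standard html.parser module; the class and function names are made up for the example, and treating any exact repeat as "spamming" is a simplifying assumption, not a rule published by any search engine.

```python
from collections import Counter
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect name/content pairs from META tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def read_meta(html):
    """Return a dict of meta tag names to their content strings."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta


def repeated_keywords(keywords_content):
    """Return keywords that appear more than once, ignoring case."""
    words = [w.strip().lower() for w in keywords_content.split(",") if w.strip()]
    counts = Counter(words)
    return sorted(word for word, n in counts.items() if n > 1)


PAGE = """<html><head>
<META name="description" content="Home page for water resources information from the U.S. Geological Survey">
<META name="keywords" content="earthscience, hydrology, hydrologic, data, water, stream flow, stream, river, lake, flood, drought, quality, basin, watershed, environment">
</head><body></body></html>"""

meta = read_meta(PAGE)
duplicates = repeated_keywords(meta["keywords"])
```

For the keywords tag shown above, no keyword is repeated, so the list of duplicates is empty.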

Meta tags will be most useful when a standard is defined for them within USGS and then used by the USGS search engine.

To Keep Pages From Being Indexed

While in most cases you want content to be found, there are some instances in which you may not want pages indexed.

The Robots Exclusion Protocol (robots.txt)
Search engines look for a file named robots.txt in your DocumentRoot directory and consult it to see which directories they are allowed to traverse. In the example below, the last record directs search engines not to index the "private" directory, while the first record applies only to the search.usgs.gov crawler.
User-agent: search.usgs.gov
Disallow: /icons/
Disallow: /imagemaps/
Disallow: /images/

User-agent: *
Disallow: /private/

This protocol is a gentleman's agreement, and may not be honored by unscrupulous or badly written crawlers.
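
One way to see how a well-behaved crawler reads these rules is to feed them to Python's standard urllib.robotparser module. This is a sketch of one parser's interpretation, not a statement of how every crawler behaves; the crawler name "ExampleBot" and the file paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt example from above, as a string.
ROBOTS_TXT = """\
User-agent: search.usgs.gov
Disallow: /icons/
Disallow: /imagemaps/
Disallow: /images/

User-agent: *
Disallow: /private/
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = build_parser(ROBOTS_TXT)

# "ExampleBot" is a hypothetical crawler that falls under the "*" record.
private_ok = rules.can_fetch("ExampleBot", "http://water.usgs.gov/private/notes.html")
images_ok = rules.can_fetch("ExampleBot", "http://water.usgs.gov/images/logo.gif")

# The search.usgs.gov crawler matches its own record instead.
usgs_images_ok = rules.can_fetch("search.usgs.gov", "http://water.usgs.gov/images/logo.gif")
usgs_private_ok = rules.can_fetch("search.usgs.gov", "http://water.usgs.gov/private/notes.html")
```

A generic crawler is kept out of /private/ but may fetch /images/. A crawler that matches the search.usgs.gov record follows only that record, so as this parser reads the file it is kept out of /images/ but is not bound by the /private/ rule. If it should also skip /private/, that line would need to be repeated in its record.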

The Robots META tag
This tag can be placed on any page you don't want indexed. A simple example is:
<META name="robots" content="noindex, nofollow">
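
A crawler that honors this tag has to read it from the page before deciding what to do. The sketch below shows one way that check might look using Python's standard html.parser module; the class and function names are assumptions made for the example, and it handles only the noindex and nofollow values shown above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <META name="robots"> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Directives are comma separated; compare them case-insensitively.
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of robots directives found in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def may_index(html):
    """True unless the page asks not to be indexed."""
    return "noindex" not in robots_directives(html)


def may_follow(html):
    """True unless the page asks that its links not be followed."""
    return "nofollow" not in robots_directives(html)


PAGE = '<html><head><META name="robots" content="noindex, nofollow"></head><body></body></html>'
index_allowed = may_index(PAGE)
follow_allowed = may_follow(PAGE)
```

A page with no robots meta tag yields an empty set of directives, so both checks default to allowing the crawler.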

A good source of additional information is The Web Robots Pages, where you can read about the Standard for Robot Exclusion.

Favorites Icon

IE 5 introduced the practice of looking in the home directory of a Web site for a small icon whenever one of the site's pages is bookmarked, and using that icon in the Favorites and Address Bar displays.

The icon is now used by IE, Netscape 6-7, and others. The file must be named favicon.ico, measure 16x16 pixels, and be in the MS ico format.
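
If you want to confirm that a file meets these requirements, the ico header can be inspected directly. The sketch below reads the icon directory at the start of the file (a 6-byte header followed by one 16-byte entry per image) with Python's struct module. The function names are made up for the example, and it checks only the directory, not the image data itself.

```python
import struct


def icon_sizes(data):
    """Return (width, height) for every image listed in an ICO directory."""
    # Header: reserved (must be 0), type (1 = icon), number of images.
    reserved, image_type, count = struct.unpack_from("<HHH", data, 0)
    if reserved != 0 or image_type != 1:
        raise ValueError("not an ICO file")
    sizes = []
    for i in range(count):
        # Each 16-byte directory entry starts with width and height bytes.
        width, height = struct.unpack_from("<BB", data, 6 + 16 * i)
        # A stored value of 0 means 256 pixels.
        sizes.append((width or 256, height or 256))
    return sizes


def has_16x16_icon(data):
    """True if the data looks like an ICO file containing a 16x16 image."""
    try:
        return (16, 16) in icon_sizes(data)
    except (ValueError, struct.error):
        return False


# A synthetic directory for the example: one 16x16 entry, no image data.
SAMPLE = struct.pack("<HHH", 0, 1, 1) + struct.pack("<BBBBHHII", 16, 16, 0, 0, 1, 32, 0, 22)
sample_ok = has_16x16_icon(SAMPLE)
```

To check a real file, read it in binary mode, for example with open("favicon.ico", "rb"), and pass the bytes to has_16x16_icon.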



"Mastering a Web Site" online course
Created and maintained by Lorna Schmid and David Boldt.
http://water.usgs.gov/usgs/training/webmaster/search.html    
Last modified: Tue Sep 20 12:19:57 EDT 2005