What Is a Search Engine?
Internet search engines (e.g. Google, AltaVista) help users find web pages on a given subject. The search engines maintain databases of web sites and use programs (often referred to as “spiders” or “robots”) to collect information, which is then indexed by the search engine. Similar services are provided by “directories”, such as Yahoo!, which maintain ordered lists of websites.
How Internet Search Engines Work
The good news about the Internet and its most visible component, the World Wide Web, is that there are hundreds of millions of pages available, waiting to present information on an amazing variety of topics.
When you need to know about a particular subject, how do you know which pages to read? If you're like most people, you visit an Internet search engine.
Internet search engines are special sites on the Web that are designed to help people find information stored on other sites. There are differences in the ways various search engines work, but they all perform three basic tasks:
- They search the Internet -- or select pieces of the Internet -- based on important words.
- They keep an index of the words they find, and where they find them.
- They allow users to look for words or combinations of words found in that index.
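The second and third tasks can be illustrated with a small inverted index. This is only a minimal sketch: the function names `build_index` and `search`, the sample pages, and the example.com URLs are assumptions made for illustration, and real search engines use far more elaborate data structures and ranking.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page URLs where it was found."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the sorted URLs of pages that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    # Start with the pages for the first word, then keep only pages
    # that also contain each remaining word.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)

# Hypothetical pages standing in for content a spider has collected.
pages = {
    "http://example.com/a": "Spiders crawl the Web",
    "http://example.com/b": "Search engines index the Web",
}
index = build_index(pages)
```

With these sample pages, `search(index, "spiders web")` returns only the first URL, while `search(index, "the web")` returns both, because a page must contain every query word to match.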
Early search engines held an index of a few hundred thousand pages and documents, and received maybe one or two thousand inquiries each day. Today, a top search engine will index hundreds of millions of pages, and respond to tens of millions of queries per day.
Before a search engine can tell you where a file or document is, it must first find that file or document. To find information on the hundreds of millions of Web pages that exist, a search engine employs special software robots, called spiders, to build lists of the words found on Web sites. When a spider is building its lists, the process is called Web crawling. (There are some disadvantages to calling part of the Internet the World Wide Web -- a large set of arachnid-centric names for tools is one of them.) In order to build and maintain a useful list of words, a search engine's spiders have to look at a lot of pages.
How does any spider start its travels over the Web? The usual starting points are lists of heavily used servers and very popular pages. The spider will begin with a popular site, indexing the words on its pages and following every link found within the site. In this way, the spidering system quickly begins to travel, spreading out across the most widely used portions of the Web.
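The way a spider spreads out from a popular starting point can be sketched as a breadth-first traversal of links. The sketch below is an assumption-laden illustration, not how any particular search engine works: the `crawl` function, the `max_pages` limit, and the in-memory `links` table (used here instead of real network requests) are all hypothetical.

```python
from collections import deque

def crawl(seed_urls, links, max_pages=100):
    """Visit pages breadth-first from the seeds, following each link once.

    `links` maps a page to the pages it links to; it stands in for
    fetching the page and extracting its links.
    """
    seeds = list(dict.fromkeys(seed_urls))  # drop duplicate seeds, keep order
    seen = set(seeds)
    queue = deque(seeds)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)  # a real spider would index the page's words here
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited

# Hypothetical link structure starting from one popular site.
links = {
    "popular.example": ["a.example", "b.example"],
    "a.example": ["b.example", "c.example"],
    "b.example": ["popular.example"],
    "c.example": [],
}
```

Calling `crawl(["popular.example"], links)` visits the popular site first, then the pages it links to, then the pages those link to. The `seen` set keeps the spider from revisiting a page when links loop back.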
Types of Search Engines
The term “search engine” is often used generically to describe both crawler-based search engines and human-powered directories. These two types of search engines gather their listings in radically different ways.
Crawler-Based Search Engines
Crawler-based search engines, such as Google, create their listings automatically. They “crawl” or “spider” the web, then people search through what they have found.
If you change your web pages, crawler-based search engines eventually find these changes, and that can affect how you are listed. Page titles, body copy and other elements all play a role.
Human-Powered Directories
A human-powered directory, such as the Open Directory, depends on humans for its listings. You submit a short description of your entire site to the directory, or editors write one for sites they review. A search looks for matches only in the descriptions submitted.
Changing your web pages has no effect on your listing. Things that are useful for improving a listing with a search engine have nothing to do with improving a listing in a directory. The only exception is that a good site, with good content, might be more likely to get reviewed for free than a poor site.
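To make the contrast with crawler-based engines concrete, here is a minimal sketch of a directory search that looks only at submitted descriptions. The names `search_directory`, `listings`, and `page_text`, along with the sample sites, are assumptions for illustration and do not describe any real directory's software.

```python
import re

def tokenize(text):
    """Split text into a set of lowercase words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def search_directory(listings, query):
    """Return sites whose submitted description contains every query word."""
    wanted = tokenize(query)
    if not wanted:
        return []
    return sorted(
        site
        for site, description in listings.items()
        if wanted <= tokenize(description)
    )

# Hypothetical directory: one short description per site.
listings = {
    "gardens.example": "Guides to growing vegetables and herbs at home.",
    "stars.example": "Amateur astronomy news and telescope reviews.",
}

# Hypothetical current page content. The directory search never reads it,
# so editing a page does not change the listing.
page_text = {"gardens.example": "Our new telescope reviews section is live!"}
```

Here `search_directory(listings, "telescope reviews")` returns only `stars.example`, even though the gardening site's own pages now mention telescopes, because only the descriptions are searched.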
“Hybrid Search Engines” Or Mixed Results
In the web's early days, a search engine presented either crawler-based results or human-powered listings. Today, it is extremely common for both types of results to be presented. Usually, a hybrid search engine will favor one type of listing over the other. For example, MSN Search is more likely to present human-powered listings from LookSmart. However, it does also present crawler-based results (as provided by Inktomi), especially for more obscure queries.
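One simple way to favor one type of listing over the other is to show human-powered results first and fill the remaining slots with crawler-based results. The sketch below assumes that policy purely for illustration; `hybrid_results`, its `limit` parameter, and the sample URLs are hypothetical, and the actual blending rules used by MSN Search or any other engine are not described in this article.

```python
def hybrid_results(directory_hits, crawler_hits, limit=10):
    """Put directory listings first, then add crawler results not already shown."""
    merged = list(directory_hits)
    for url in crawler_hits:
        if url not in merged:
            merged.append(url)
    return merged[:limit]

# Hypothetical result lists for a common query and an obscure one.
common = hybrid_results(["a.example", "b.example"], ["b.example", "c.example"])
obscure = hybrid_results([], ["rare.example"])
```

For the common query the directory listings lead and the crawler adds one extra page. For the obscure query the directory has nothing, so the results come entirely from the crawler.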