Introduction to Web Searching

Alton Patrick
The Shodor Education Foundation
June 1998

This introduction is written at a very elementary level and assumes no experience with search engines. If you already feel comfortable with the syntax of searching, you might find a more in-depth treatment, like AltaVista's Help page, more useful.

Introduction

A search engine is a program that keeps records of millions of web pages. There are lots of search engines on the Internet -- AltaVista, Infoseek, Lycos, Excite, Magellan, Webcrawler, and Yahoo are some of the more popular ones. Search engines use programs called "robots" which spend their time reading web pages and noticing which words are on which pages. In addition, search engines let people who write web pages submit a description of their page to be added to the list of pages the search engine knows about.
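
To make the idea of "noticing which words are on which pages" concrete, here is a minimal sketch in Python. It is not how any particular search engine is built; the page names and text are made up for illustration, and the function name build_index is my own.

```python
from collections import defaultdict

PUNCTUATION = '.,!?;:"()'

def build_index(pages):
    """Map each word to the set of page names whose text contains it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for raw in text.lower().split():
            word = raw.strip(PUNCTUATION)
            if word:
                index[word].add(name)
    return index

# Hypothetical pages standing in for what a robot has read.
pages = {
    "cave.html": "Bats are animals that sleep in caves.",
    "sports.html": "Baseball bats are made of wood or aluminum.",
}
index = build_index(pages)
print(sorted(index["bats"]))     # both pages use the word
print(sorted(index["animals"]))  # only cave.html does
```

Looking a word up in a table like this is much faster than rereading every page, which is presumably why a search engine can answer a query quickly.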

You'll probably never need to know about robots. You will need to know how to get search engines to tell you which pages might interest you. There are a couple of ways to do this. The first is called a category search and is useful if you want information on a general subject. Yahoo (http://www.yahoo.com) is a good place to do this kind of search.

Category Searching with Yahoo

On Yahoo's main page is a list of categories that Yahoo has information about. Each of these links takes you to a page with more related categories. The broadest topics are first, and they get narrower as you go deeper. So, to find pictures taken by the Hubble Space Telescope, you might follow these links:
  • Science
  • Astronomy
  • Astrophotography
  • Space and Astronomy Pictures
  • Hubble Space Telescope
Eventually, you should find a page with a list of links related to whatever you're searching for.
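
If it helps to picture the hierarchy, a category index can be thought of as nested boxes. The sketch below models the path above as nested Python dictionaries; the structure and the placeholder page names at the bottom are assumptions for illustration, not Yahoo's actual data.

```python
# A tiny, hypothetical slice of a category index.
directory = {
    "Science": {
        "Astronomy": {
            "Astrophotography": {
                "Space and Astronomy Pictures": {
                    # Placeholder page names, not real links.
                    "Hubble Space Telescope": ["hubble-page-1", "hubble-page-2"],
                },
            },
        },
    },
}

def follow(tree, path):
    """Walk down the tree one category at a time, like clicking links."""
    node = tree
    for category in path:
        node = node[category]
    return node

path = [
    "Science",
    "Astronomy",
    "Astrophotography",
    "Space and Astronomy Pictures",
    "Hubble Space Telescope",
]
print(follow(directory, path))
```

Each step narrows the choices, which is why a category search works well when you know the general subject but not the exact words to search for.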

Searching by Keyword

A more common type of search is a keyword search. Keyword searches are faster than hunting through a hierarchical index like Yahoo's, but they often turn up a lot of pages that don't really relate to the subject you were searching for. For example, instead of going through the list of subjects above to find Hubble Space Telescope pictures, you could use this query string in a keyword search:
    +"Hubble Space Telescope" +pictures
(I'll explain the addition signs and quotation marks later.)

The words you type are referred to as a "search string" or "query string." Each page a search engine returns in response to your query is a "hit." Sometimes a search can return a lot of hits.

Another popular search engine is AltaVista (http://www.altavista.digital.com). Go there now. In the middle of the page is a text box. This is where you type your search string. Type in the following:

    bats
and press the "search" button.

AltaVista will return a page with a list of links to pages that contain the word "bats." A line near the top of this page tells you how many hits the search generated: "About 52250 documents match your query." Fifty thousand web pages is a lot to look through.

There are several ways to narrow down the number of hits you get, depending on what type of information you are looking for. Try typing in this search string:

    +bats +animals
You should get about 5,000 hits this time. That is still a lot, but you've cut the number of hits to about a tenth of what it was before. Each addition sign in the string above tells the search engine to return only pages containing the word that follows it. The addition sign in front of the first word is important: without it, the search engine will require only the second word to appear on any page it returns, and the first word may or may not appear. This usually means you'll get more hits (try it).
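
Here is a small Python sketch of the rule just described. It only models the behavior as this tutorial explains it (words with an addition sign are required, other words are optional); the page names, the page text, and the function names are hypothetical, and AltaVista's real matching is certainly more involved.

```python
def matches_required(page_text, query):
    """True if the page satisfies the query under the simplified rule."""
    words = set(page_text.lower().split())
    terms = query.lower().split()
    required = [t[1:] for t in terms if t.startswith("+")]
    optional = [t for t in terms if not t.startswith("+")]
    if required:
        # Every word marked with "+" must be present; optional words
        # may or may not appear.
        return all(t in words for t in required)
    # With no required words, any one of the words is enough.
    return any(t in words for t in optional)

# Hypothetical pages, reduced to a few words each.
pages = {
    "cave.html": "bats are flying animals",
    "sports.html": "wooden baseball bats",
    "farm.html": "farm animals for sale",
}

def hits(query):
    return sorted(name for name, text in pages.items()
                  if matches_required(text, query))

print(hits("+bats +animals"))  # only the page with both words
print(hits("bats +animals"))   # farm.html now matches too
```

Dropping the first addition sign lets in a page that never mentions bats at all, which is the "more hits" effect described above.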

Now try this search string:

    +bats -baseball
The minus sign, predictably, tells the search engine to exclude any pages that contain the word appearing after the minus sign. There are nearly 37,000 hits for this query. Clearly, searching for "bats which are animals" is not the same as searching for "bats which aren't related to baseball" (which makes sense when you consider that there are other kinds of bats -- cricket, for instance). The point is that using the addition and minus signs requires some forethought to figure out which is most likely to give you the results you want.

Look at the list of links that your search for +bats -baseball generated. Near the top there should be one for "Re: Approved bats for USSSA etc." -- a baseball-related topic. Why? You told the search engine to exclude pages related to baseball, right? Not exactly. You told it to exclude pages that use the word "baseball," which is slightly different. While it's usually safe to assume that pages about a subject will all use certain words related to that subject, that's not always the case. In this instance, the word "baseball" does not appear anywhere in the text of the page about USSSA bats. Search engines aren't perfect. They can reduce the amount of work you have to put into finding something, but you still have to sift carefully through the information they return.
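
The sketch below shows why that page slips through. It is a simplified Python model of the addition and minus signs, not AltaVista's actual code, and the page text is invented (only the USSSA title comes from the search above). The filter checks for the literal word "baseball" and nothing else.

```python
def passes_filters(page_text, query):
    """Apply '+word' (must appear) and '-word' (must not appear)."""
    words = set(page_text.lower().split())
    for term in query.lower().split():
        if term.startswith("+") and term[1:] not in words:
            return False
        if term.startswith("-") and term[1:] in words:
            return False
    return True

# Hypothetical page text; the engine sees words, not topics.
pages = {
    "usssa.html": "Re: Approved bats for USSSA etc.",
    "store.html": "Baseball bats on sale",
    "cricket.html": "Cricket bats and pads",
}

def hits(query):
    return sorted(name for name, text in pages.items()
                  if passes_filters(text, query))

print(hits("+bats -baseball"))
```

The store page is excluded because it uses the word, but the USSSA page stays in the results because the word never appears in its text.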

Here's one more way to narrow a search. Try this search string:

    medical school
Type it exactly as shown, with no addition signs or quotation marks. You'll get something like 6.35 million hits. You might think that typing in that search string, or possibly +medical +school, would give you information on medical schools. It would, but it would also give you a lot of pages you don't care about. The search engine looks for the words in your search string anywhere in a page -- they don't have to be together. So a page that has text like this:
    In other medical news, blah blah blah yakety-smakety school blah irrelevant blah...
would fit your search string just like a page that has this text:
    At Acme Medical School, you will learn how to...
To get the search engine to find only pages that have two or more words near each other, put the words in quotation marks:
    "medical school"
This should give you about 65,000 hits. Note that the quotation marks require that the words appear near each other, not necessarily right beside each other. So the search engine could still return a page that uses the phrase "medical and dental school," but not pages where "medical" and "school" are widely separated.
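
As a rough Python illustration of "near each other," the sketch below slides a small window over the words of a page and checks whether every quoted word falls inside it. The window size (the max_gap parameter) is my own assumption; this tutorial doesn't say how close AltaVista requires the words to be.

```python
PUNCTUATION = '.,!?;:"()'

def appears_near(page_text, phrase, max_gap=2):
    """True if all words of the phrase fit inside one small window."""
    words = [w.strip(PUNCTUATION) for w in page_text.lower().split()]
    terms = phrase.lower().split()
    window_size = len(terms) + max_gap  # assumed tolerance, not AltaVista's
    for start in range(len(words)):
        window = words[start:start + window_size]
        if all(term in window for term in terms):
            return True
    return False

far = "In other medical news, blah blah blah yakety-smakety school blah"
close = "At Acme Medical School, you will learn how to..."
between = "a medical and dental school"

print(appears_near(far, "medical school"))      # False
print(appears_near(close, "medical school"))    # True
print(appears_near(between, "medical school"))  # True
```

Note that this sketch ignores word order, so it would also accept "school medical"; a real engine may be stricter.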

Summary

To review, here are three simple ways to narrow a keyword search:
    + Any word with an addition sign in front of it must appear on every page returned.
    - Any word with a minus sign in front of it must not appear on any page returned.
    "" Words with quotation marks around them must all appear within a few words of each other on a page.

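To see how the three operators fit together in one search string, here is a short Python sketch that splits a query into its parts. It leans on the standard shlex module to keep quoted words together; the function name parse_query and the three labels are my own, and this is only one plausible way a program might read such a string.

```python
import shlex

def parse_query(query):
    """Sort the terms of a search string into three groups."""
    parsed = {"required": [], "excluded": [], "optional": []}
    # shlex.split keeps the words inside quotation marks together
    # and removes the quotation marks themselves.
    for term in shlex.split(query):
        if term.startswith("+"):
            parsed["required"].append(term[1:])
        elif term.startswith("-"):
            parsed["excluded"].append(term[1:])
        else:
            parsed["optional"].append(term)
    return parsed

print(parse_query('+"Hubble Space Telescope" +pictures'))
print(parse_query('"medical school" -dental'))
```

A term that still contains spaces after parsing (like Hubble Space Telescope) came from inside quotation marks, so it should be matched as a group of nearby words rather than as separate words.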

When all else fails...

When trying to come up with a search string, think about what words you would use in writing about the subject you're researching. Especially think of words and phrases that are specific to that subject and make it stand out from others. Unfortunately, you will often get thousands of hits, no matter how much thought you put into your search string. Remember that search engines rank their hits, so the first few are usually the most relevant (they use the specified keywords most often, etc.). Start with those pages. Don't ignore the links on pages -- even if the page you're looking at isn't useful itself, it might have a link to just the information you need. Still, you might have to click through quite a few pages before you find one that is really helpful. Sometimes there is no substitute for effort.
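
As a simple picture of what "use the specified keywords most often" could mean, the Python sketch below counts keyword occurrences and sorts pages by that count. Real search engines weigh other factors too (that's the "etc." above), and the page names and text here are made up.

```python
def rank(pages, keywords):
    """Order page names by how often the keywords occur, highest first."""
    def score(text):
        words = text.lower().split()
        return sum(words.count(k.lower()) for k in keywords)
    return sorted(pages, key=lambda name: score(pages[name]), reverse=True)

# Hypothetical pages, reduced to a few words each.
pages = {
    "a": "bats bats bats animals",
    "b": "bats and baseball",
    "c": "animals and bats live with other animals",
}
print(rank(pages, ["bats", "animals"]))  # ['a', 'c', 'b']
```

A page can score high just by repeating a word, which is one reason the top hit isn't always the most useful one.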

Also, don't limit yourself to just one search engine. No one engine even comes close to indexing every page on the World Wide Web. If you don't have any luck with the first search engine you try, try another one. Be sure to remember, however, that the syntax for searching presented in this tutorial is for AltaVista; other search engines may have their own operators for building search strings.


Last Update: June 6, 1998
Please direct questions and comments about this page to [email protected]
© Copyright 1998 The Shodor Education Foundation, Inc.