How to make sure your site gets properly crawled and indexed by robots
by Andrei Smith

Search engines have robots that come to your site and grab everything there is to grab. But because competition is so fierce, there is no way to get into the search engines unless you pay for ads or hire an SEO (Search Engine Optimization) consultant, right? Wrong!

Even if you pay big money, if your site is not properly seen by the robots used by search engines for indexing, chances are many of your pages will never make it.

In this article I will discuss the importance of structuring your website properly and of using old-fashioned hyperlinks rather than modern Flash menus, scripts and extensions. I will also point you to a very simple and free tool that lets you see your site much the way most indexing robots do. But first, let's define some of the concepts.

What is a www robot?

A robot is a computer program that automatically reads web pages and goes through every link that it finds.

The first robot was developed at MIT and launched in 1993. It was named the World Wide Web Wanderer, and its initial purpose was purely scientific: to measure the growth of the web. The index generated from the experiment's results proved to be an awesome tool and effectively became the first search engine. Most of the online stuff we can't live without today was born as a side effect of some scientific experiment.

What is a search engine?

Generically, a search engine is a program that searches through a database. In the popular sense, as applied to the web, a search engine is a system with a user search form that searches through a repository of web pages gathered by a robot.

What is a bot? What is a spider? What is a crawler?

Bot is just a shorter, cooler (for some) version of the word robot. Spiders and crawlers are robots, only the names sound more interesting in the press and within metro-geek circles. For reasons of consistency, I will use the term robot throughout this article, when referring to spiders, crawlers and bots.

Are there other... things that crawl out there?

Oh yeah, but these things are way beyond the scope of this article. Well, for the conspiracy theory aficionados, let's see... we have worms - self-replicating programs, webants (or ants) - distributed cooperating robots, autonomous agents, intelligent agents and many other bots and beasties.

How do robots work?

As with all other things technical, I believe that the only way you will utilize a technology to its full potential and to your best advantage is if and when you understand how that technology works. When I say how it works, I don't mean intricate technical details, but fundamental processes, big picture stuff.

Generally, robots are nothing but stripped-down versions of web browsers, programmed to automatically browse and record information about web pages. There are some very specialized robots out there: some look only for blogs, some index nothing but images. Many of them see a page much the way one of the first popular browsers, called Lynx, sees it. Lynx is a pure text browser, which makes it extremely robust and fast on today's internet. Basically, if you can program, you can take a text browser like Lynx, modify it and make a robot.

So how do these things actually work? They get a list of websites, and literally start "browsing" them. They come to your site and then start reading the pages and following every link, while storing different information, such as page titles, the actual text of the page, etc.
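The process just described can be sketched in a few lines of Python. This is only an illustration, not how any particular search engine does it: the names PageReader, crawl and SITE are made up for this example, and the three pages at example.test live in memory instead of on a real web server.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageReader(HTMLParser):
    """Collects the title, the visible text and the plain links of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text = []
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()
        elif data.strip():
            self.text.append(data.strip())


def crawl(start_url, fetch):
    """Read start_url, then every page reachable from it through plain links."""
    index = {}
    seen = set()
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)          # fetch returns None for a missing page
        if html is None:
            continue
        reader = PageReader()
        reader.feed(html)
        index[url] = {"title": reader.title, "text": " ".join(reader.text)}
        for href in reader.links:
            queue.append(urljoin(url, href))
    return index


# Hypothetical three-page site, kept in a dictionary for the demonstration.
SITE = {
    "http://example.test/": (
        "<html><head><title>Home</title></head><body>Welcome. "
        '<a href="products.html">Products</a></body></html>'
    ),
    "http://example.test/products.html": (
        "<html><head><title>Products</title></head><body>"
        '<a href="contact.html">Contact</a></body></html>'
    ),
    "http://example.test/contact.html": (
        "<html><head><title>Contact</title></head><body>Call us.</body></html>"
    ),
}

index = crawl("http://example.test/", SITE.get)
for url, record in index.items():
    print(url, "-", record["title"])
```

The robot only ever learns about products.html and contact.html because a plain link pointed to them. Take the link away and those pages simply never enter the index.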

Based on the above, what would happen if instead of your beloved Internet Explorer, Firefox, Opera or whatever browser you are attached to, you go dig on the internet and download a version of the venerable Lynx browser?

I'll tell you what would happen, and some will probably accuse me of giving away one of the secrets the SEO corporate community does not want you to know:

You will be able to see your site very close to the way a robot sees it. You will be able to look for errors in your pages and track down navigation errors that might block a robot from seeing portions of your site.

In plain English, let's say you built a great looking site. There is an index page, the first page one sees when entering your site. On that page you have the most incredible Flash navigation system, with a huge button pointing to your products and services and the rest of the site. If Lynx goes to your index page and does not find a standard link, it will not be able to see the rest of your site. Chances are extremely high that a lot of indexing robots will not see it either.
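To make this concrete, here is a small Python sketch that looks for standard links the way a simple text-only robot might. The two pages are invented for the example (the file names menu.swf, products.html and services.html are placeholders), and the helper plain_links is hypothetical, not part of any real robot.

```python
from html.parser import HTMLParser


class LinkFinder(HTMLParser):
    """Remembers the href of every standard <a> link on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def plain_links(html):
    """Return the standard links a text-only visitor could follow."""
    finder = LinkFinder()
    finder.feed(html)
    return finder.links


# Index page whose only navigation is an embedded Flash movie.
FLASH_ONLY = (
    "<html><body>"
    '<object data="menu.swf" type="application/x-shockwave-flash"></object>'
    "</body></html>"
)

# Same page with two ordinary hyperlinks added under the Flash menu.
WITH_FALLBACK = FLASH_ONLY.replace(
    "</body>",
    '<a href="products.html">Products</a> '
    '<a href="services.html">Services</a></body>',
)

print(plain_links(FLASH_ONLY))     # []
print(plain_links(WITH_FALLBACK))  # ['products.html', 'services.html']
```

The first page gives the robot nowhere to go. The second keeps the Flash menu for human visitors and still hands the robot two links to follow.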

You will then understand why your very large site, which has one of the most intricate and functional Flash-based navigation systems on the planet, never makes it high into the search engines, even after all your efforts of manually submitting it everywhere. It's simply because you forgot to add basic hyperlinks. When you submit a site, even manually, all that really happens is that you tell the search engine "hey, Mr. Search Engine, whenever you think you can find some time, please send your trusty robot to my site".

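Folks, robots usually can't use a navigation menu built in Flash or JavaScript, or anything else that only works once a script or plug-in runs in the browser. If that menu is the only way in, they will not be able to get to your pages. It's as simple as that.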

How do I get Lynx?

Lynx first started life as a UNIX application, written at the University of Kansas as part of their campus-wide information system. It then became a Gopher client (Gopher was a pre-web system for browsing documents), then a web browser. The official page for Lynx is http://lynx.isc.org, however, if you are not a Linux geek, used to playing with binary distribution files and compiling your own apps (don't worry about what I just said), you might want to find a version that someone else already made usable for your computer. For example, if you are a PC user running Windows, you might want to check links to "Win32 compiled versions". At the time of writing, one such site is http://csant.info/lynx.htm (called a distribution site), where you can download a version that will install onto Windows machines in a fashion that will be familiar to non-geeks.

After you install the browser, you might want to read the documentation. To get you going and to alleviate your beginner frustrations, I'll tell you that you must press the G key (as in "go"), then type the complete URL of the site you want to browse (starting with "http://"), then hit Enter. Use the arrow keys to navigate.
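If you would rather not browse by hand, Lynx can also print the links it finds on a page and exit, using its -dump and -listonly options. The Python sketch below wraps that call. It assumes the lynx program is installed and on your PATH; the function names lynx_command and list_links are made up for this example.

```python
import shutil
import subprocess


def lynx_command(url):
    """Build the command line that asks Lynx to print only a page's links."""
    return ["lynx", "-dump", "-listonly", url]


def list_links(url):
    """Run Lynx on url and return its link listing as text."""
    if shutil.which("lynx") is None:
        raise RuntimeError("lynx was not found on the PATH")
    result = subprocess.run(
        lynx_command(url), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling print(list_links("http://www.example.com/")) with your own address in place of the placeholder should show the links Lynx can see on that page. If the list comes back empty or much shorter than your menu, that page is a dead end for text-only visitors.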

Bottom line, use Lynx to verify that every page of your site is accessible and let the robots do all the work for you. You'll save yourself a lot of aggravation and maybe some money that you would waste on advertising your otherwise non-indexable site.
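Once you have noted which plain links each page shows in Lynx, checking for stranded pages is a simple bookkeeping exercise. Here is a rough Python sketch of it. The site map below is invented for the example, and unreachable_pages is a hypothetical helper, not a feature of Lynx.

```python
def unreachable_pages(start, links_by_page):
    """List the pages that cannot be reached from start by plain links."""
    seen = set()
    stack = [start]
    while stack:
        page = stack.pop()
        if page in seen:
            continue
        seen.add(page)
        stack.extend(links_by_page.get(page, []))
    return sorted(set(links_by_page) - seen)


# Hypothetical site: each page mapped to the plain links found on it.
LINKS = {
    "index.html": ["products.html", "contact.html"],
    "products.html": ["index.html"],
    "contact.html": [],
    "specials.html": ["index.html"],  # only linked from the Flash menu
}

print(unreachable_pages("index.html", LINKS))  # ['specials.html']
```

Any page the check reports needs at least one ordinary hyperlink pointing to it from a page the robots can already reach.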

About the Author
Andrei co-owns bsleek ( http://www.bsleek.com ) - a site that specializes in web hosting, design, promotional items, printing, CD presentations and more. Andrei is on the Board of Consultants for Daterade.com and has amassed extensive technical knowledge and experience through his career as the CIO of a major travel management company and through his past careers in military research, data acquisition and aerospace engineering.


Copyright ci-Interactive. All Rights Reserved
Visit us at www.cyberisle.com