Thursday, March 23, 2006
How Web Sites Are Indexed
So you’ve just launched the new Web site that your team worked on for months. You do a quick Google search for a few of your keywords and you’re nowhere to be seen. How can this be? You did everything right. You researched SEO, carefully selected keywords and optimized your copy. Your pages validate. You even have a blog as part of your site. What went wrong? The answer is that you’ve done nothing wrong, other than fall for a common misconception: that search engines magically know everything about a site the moment it goes live. It’s an understandable misconception. After all, most search engines aren’t exactly open about how their magic formulas work.
So, using Google as an example, I’m going to try to explain how Web pages get indexed.
Google needs to know your site exists
Before it can index your site, Google, like other search engines, needs to know that your site exists. You can submit your site using the form on Google for adding URLs to be indexed. If other pages link to your site, Google should also find it on its own the next time it indexes those pages. Either way, your page is added to the queue and will be indexed as soon as possible.
The Googlebot visits your page
Once your site gets to the top of the queue, the Googlebot will grab all the text from your page, along with any links. It then adds the links to the queue and will come back to index those pages as well. As long as all your pages are linked, you only need to submit your base URL to Google for indexing. It will find the rest of the pages on its own.
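To make the queue idea concrete, here is a minimal sketch of that crawl loop. It is not how the Googlebot is actually written, which nobody outside Google knows. The TOY_WEB dictionary, its URLs and the crawl function are all made up for illustration, and no real network requests are made.

```python
from collections import deque

# Hypothetical miniature "web": each URL maps to the links found on that page.
# These URLs and the link layout are invented for this example.
TOY_WEB = {
    "http://example.com/": ["http://example.com/about", "http://example.com/blog"],
    "http://example.com/about": ["http://example.com/"],
    "http://example.com/blog": ["http://example.com/blog/post-1", "http://example.com/about"],
    "http://example.com/blog/post-1": [],
    "http://example.com/orphan": [],  # nothing links here, so it is never found
}


def crawl(start_url, web):
    """Return the pages visited, in order, starting from one submitted URL."""
    queue = deque([start_url])  # pages waiting to be visited
    seen = {start_url}          # pages already queued, to avoid repeats
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        # Add every link on this page that has not been queued before.
        for link in web.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


pages = crawl("http://example.com/", TOY_WEB)
```

Submitting only the base URL is enough to reach every linked page in this toy site. The page that nothing links to is never visited, which is why unlinked pages need to be submitted separately.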
Your page gets indexed
The Googlebot hands the text it grabbed from your site to the indexer, which adds it to the index database. The indexer makes note of the words in your text and remembers where in the document they appear. This information is used later when people enter keywords to search for.
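An index that remembers each word and where it appears is often called an inverted index. The sketch below is my own simplified illustration, not Google's indexer. The build_index function, the sample URLs and the sample text are all made up, and splitting text into lowercase words with a regular expression is an assumption.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each word to {url: [positions]} for a dict of url -> page text."""
    index = defaultdict(dict)
    for url, text in pages.items():
        # Assumed tokenizer: lowercase runs of letters, digits and apostrophes.
        words = re.findall(r"[a-z0-9']+", text.lower())
        for position, word in enumerate(words):
            index[word].setdefault(url, []).append(position)
    return index


# Hypothetical page text, invented for this example.
pages = {
    "http://example.com/": "Fresh coffee beans roasted daily",
    "http://example.com/blog": "Why fresh beans make better coffee",
}

index = build_index(pages)
coffee_hits = index["coffee"]
```

Looking up a keyword now returns every page that contains it, along with the word positions on each page, without rereading the pages themselves.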
Somebody submits a keyword query to Google
Every page that Google indexes is given a PageRank. PageRank is a number between 0 and 10 that denotes how important a page is. A page with a high PageRank is more likely to show up in search results than a page with a low PageRank.
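Google's real formula is not public, but the basic idea behind PageRank was described in the research paper its founders published: a page is important when important pages link to it. The sketch below is a simplified version of that published idea only. The pagerank function, the sample link graph and the way pages without links are handled are my assumptions, the damping value of 0.85 is the one suggested in that paper, and the result is a raw score, not the 0 to 10 number.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate link-based importance for a dict of page -> list of linked pages."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page keeps a small base score regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three pages link to "home", which links to "blog".
links = {
    "home": ["blog"],
    "blog": ["home"],
    "about": ["home"],
    "contact": ["home"],
}
scores = pagerank(links)
best = max(scores, key=scores.get)
```

In this toy graph the page with the most incoming links ends up with the highest score, and the one page it links to comes in second.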
Over 100 factors are considered when Google ranks a page, PageRank among them, and Google keeps its formula closely guarded. Popularity of the page, age of the URL, where the keywords are located on the page and the number of pages that link to it are some of the factors that come into play.
Not all factors carry equal weight. Some, like meta keywords, have been abused by spammers, so Google places very little importance on them. Where the keywords appear and how close they are to each other is a very important factor. Two of the bigger factors (or at least people think they are, as nobody knows for sure) are the number of pages that link to you and the title tag of the page.
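Keyword proximity is easy to picture with a small example. This is only a sketch of one way to measure it, counting the words between two keywords. The min_distance function and the sample copy are made up, and how Google actually scores proximity is not known.

```python
import re


def min_distance(text, first, second):
    """Return the smallest gap, in words, between two keywords, or None."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    if not first_positions or not second_positions:
        return None  # at least one keyword is missing from the text
    return min(abs(a - b) for a in first_positions for b in second_positions)


# Hypothetical copy, invented for this example.
close = "We sell vintage guitars and amps"
far = "Vintage posters line the shop where we also repair guitars"

close_gap = min_distance(close, "vintage", "guitars")
far_gap = min_distance(far, "vintage", "guitars")
```

Both sentences contain both keywords, but the first keeps them side by side while the second separates them by most of the sentence. A search for "vintage guitars" would plausibly favor the first.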
When will Google index my page again?
Google constantly updates its index in order to keep results fresh and to weed out outdated or removed pages. Pages that are updated more often are also re-indexed more often. This is why blogs do so well in search engine results: many blogs are updated numerous times a day. An active blog could be indexed every day, while a site that rarely updates its content might go weeks between visits.
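One way a crawler could arrive at that behavior is to shorten the wait after finding a changed page and lengthen it after finding an unchanged one. The sketch below is purely hypothetical. The next_interval and simulate functions, the halving and doubling rule, and the limits of 1 and 30 days are all assumptions of mine, not anything Google has published.

```python
def next_interval(current_days, changed, shortest=1.0, longest=30.0):
    """Halve the wait if the page changed since the last visit, else double it."""
    if changed:
        proposed = current_days / 2
    else:
        proposed = current_days * 2
    # Keep the interval within the assumed limits.
    return max(shortest, min(longest, proposed))


def simulate(history, start_days=7.0):
    """Return the interval after each visit for a list of changed/unchanged flags."""
    interval = start_days
    intervals = []
    for changed in history:
        interval = next_interval(interval, changed)
        intervals.append(interval)
    return intervals


# A blog that changes before every visit versus a site that never changes.
busy_blog = simulate([True, True, True, True])
static_site = simulate([False, False, False, False])
```

Starting from a weekly visit, the busy blog settles at a daily visit after a few rounds, while the static site drifts out to the longest interval allowed.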
Why doesn’t my page show up on the first page of search results?
The pages Google decides to place at the top of search results are those pages it feels are the most relevant to the keywords searched for. The PageRank for a page plays a large part in deciding if and where it shows up in the rankings. In order to take the place of a page that scores higher than your page, you need to show Google that your page is more relevant than the other pages.
A good way to increase your relevancy is to offer unique content not found on pages that rank better than yours. Updating frequently with unique content is often the best way to increase the relevancy of your Web pages. Focusing each page on one or two keyword combinations is also a good strategy. Don’t try to throw everything but the kitchen sink onto your page. To get to the top, your page has to be good enough to replace those that are already there. Time is also a factor. Google is rumored to ‘sandbox’ new pages, sometimes for up to six months, before letting them rise in the rankings.
Be relevant. Be unique. Most of all, be patient.