As search engines index the web’s link structure and page contents, they find two distinct kinds of information about a given site or page: attributes of the page/site itself, and descriptions of that site/page from other pages. Since the web is such a commercial place, with so many parties interested in ranking well for particular searches, the engines have learned that they cannot always rely on websites to be honest about their own importance. Thus, the days when artificially stuffed meta tags and keyword-rich pages dominated search results (pre-1998) have vanished, giving way to search engines that measure trust via links and content.
The theory goes that if hundreds or thousands of other websites link to you, your site must be popular and thus have value. If those links come from very popular and important (and thus trustworthy) websites, their power is multiplied further. Links from sites like NYTimes.com, Yale.edu, Whitehouse.gov, Indianrail.gov.in and others carry with them inherent trust that search engines then use to boost your ranking position. If, on the other hand, the links that point to you come from low-quality, interlinked sites or automated garbage domains (aka link farms), search engines have systems in place to discount the value of those links.
The best-known system for ranking sites based on link data is the simplistic formula developed by Google’s founders – PageRank. PageRank, which relies on log-based calculations, is described by Google in their technology section:
PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page’s value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves “important” weigh more heavily and help to make other pages “important.”
PageRank is derived (roughly speaking) by gathering all the links that point to a particular page, adding up the value of the PageRank that each one passes (based on the linking page’s own PageRank), and applying the calculations in the formula.
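The rough description above can be sketched in a few lines of Python. This is a minimal illustration of the simplified, publicly described form of the calculation, not Google’s production system: the function name `pagerank`, the toy graph, the iteration count and the damping factor of 0.85 (a commonly cited value) are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of scores.

    `links` maps each page to the list of pages it links to.
    The damping factor and iteration count are illustrative assumptions.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with the score spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: C receives links from A, B and D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph, page C receives three of the five links and ends up with the highest score, while D, which nothing links to, ends up with the lowest. Note also that A scores well despite having only one inbound link, because that link comes from the highly ranked C: the “vote” of an important page counts for more.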
PageRank, in essence, measures the brute link force of a site based on every other link that points to it without significant regard for quality, relevance or trust. Hence, in the modern era of SEO, the PageRank measurement in Google’s toolbar, directory or through sites that query the service is of limited value. Pages with PR8 can be found ranked 20-30 positions below pages with a PR3 or PR4. In addition, the toolbar numbers are updated only every 3-6 months by Google, making the values even less useful. Rather than focusing on PageRank, it’s important to think holistically about a link’s worth.
Here’s a small list of the most important factors search engines look at when attempting to value a link:
- The Anchor Text of the Link – Anchor text describes the visible characters and words that hyperlink to another document or location on the web. For example, in the phrase “CNN is a good source of news, but I actually prefer the BBC’s take on events,” two unique pieces of anchor text exist – “CNN” is the anchor text pointing to http://www.cnn.com, while “the BBC’s take on events” points to http://news.bbc.co.uk. Search engines use this text to help them determine the subject matter of the linked-to document. In the example above, the links would tell the search engine that SEOmoz.org considers http://www.cnn.com relevant to the term “CNN” and http://news.bbc.co.uk relevant to “the BBC’s take on events”. If hundreds or thousands of sites think that a particular page is relevant for a given set of terms, that page can manage to rank well even if the terms NEVER appear in the text itself (for example, see the BBC’s explanation of why Google ranks certain pages for the term “Miserable Failure”).
- Global Popularity of the Site – More popular sites, as denoted by the number and power of the links pointing to them, provide more powerful links. Thus, while a link from SEOmoz may be a valuable vote for a site, a link from bbc.co.uk or cnn.com carries far more weight. This is one area where PageRank (assuming it were accurate) could be a good measure, as it’s designed to calculate global popularity.
- Popularity of Site in Relevant Communities – In the factor above, the weight or power of a site’s vote is based on its raw popularity across the web. As search engines became more sophisticated and granular in their approach to link data, they acknowledged the existence of “topical communities”: sites on the same subject that often interlink with one another, referencing documents and providing unique data on a particular topic. Sites in these communities provide more value when they link to a site/page on a relevant subject than when they link to one that is largely irrelevant to their topic.
- Text Directly Surrounding the Link – Search engines have been noted to give the text directly surrounding a link greater importance and relevance than the other text on the page. Thus, a link from inside an on-topic paragraph may carry greater weight than a link in the sidebar or footer.
- Subject Matter of the Linking Page – The topical relationship between the subject of a given page and the sites/pages linked to on it may also factor into the value a search engine assigns to that link. Thus, it will be more valuable to have links from pages that are related to the site’s or page’s subject matter than from those that have little to do with the topic.
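To make the anchor text factor from the list above concrete, here is a small sketch of how a crawler might pull (URL, anchor text) pairs out of a page. It uses Python’s standard `html.parser` module; the class name `AnchorTextParser`, the helper `extract_anchor_text` and the HTML snippet are hypothetical, and how a real search engine extracts and weights this text is not public.

```python
from html.parser import HTMLParser


class AnchorTextParser(HTMLParser):
    """Collects (href, anchor text) pairs from <a> tags (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the link currently being read, if any
        self._text = []      # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_anchor_text(html):
    """Return the list of (href, anchor text) pairs found in `html`."""
    parser = AnchorTextParser()
    parser.feed(html)
    parser.close()
    return parser.links


# The example sentence from the anchor text item, marked up as HTML.
sample = (
    '<p><a href="http://www.cnn.com">CNN</a> is a good source of news, '
    'but I actually prefer <a href="http://news.bbc.co.uk">the BBC\'s take '
    "on events</a>.</p>"
)
for href, text in extract_anchor_text(sample):
    print(href, "->", text)
```

Each pair tells the engine which words one page used to describe another. The surrounding paragraph text, which the sketch discards, is the raw material for the “text directly surrounding the link” factor.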
Link metrics are in place so that search engines can find information to trust. In the academic world, greater citation meant greater importance, but in a commercial environment, manipulation and conflicting interests interfere with the purity of citation-based measurements. Thus, on the modern WWW, the source, style and context of those citations are vital to ensuring high-quality results.