As you travel the vast world of
search engine results among Google, Yahoo, and MSN, you are likely to
run into junk pages at some point in your journey. Although the search
engines are working daily to improve search engine results, the search
engine spammers are working just as hard to slip through the cracks.
Google's algorithm is widely regarded as well ahead of those of Yahoo and MSN.
However, Yahoo has been implementing a number of changes to keep up.
One of these is the recent filing of a patent application called "Link-based spam detection." This patent application details Yahoo's ideas on how to reduce the massive amount of web spam that litters the search engines.
The search engines are very well aware that there are spammers who
would like nothing more than to trick the search engines in any way
possible. This is shown within their patent, which states:
"Since top positions (high ranking) in a query result list may confer
business advantages, authors of certain Web pages attempt to
maliciously boost the ranking of their pages. Such pages with
artificially boosted ranking are called 'web spam' pages and are
collectively known as 'web spam.'"
In fact, the Yahoo patent application even describes many of the spam techniques in use today.
Little has been said about the release of this new patent application.
I am sure that if Google had filed a new patent application, there would
have been massive coverage of the topic. However, as webmasters, we
should not ignore the other search engines, even if they are smaller players.
This new patent reveals important trends that should not be overlooked.
Before I begin, keep in mind that Yahoo does not necessarily use
these techniques. They have simply filed a patent application, which
gives us some good indications of what they have planned for the future.
Within this document, Yahoo has outlined a system to cut down on web
spam. The authors propose a technique to semi-automatically separate
good, quality sites from spam sites. This is achieved through an
algorithm that detects spam farms with the help of PageRank and
TrustRank.
Interestingly enough, both of these terms are trademarks of
Google. Although the same terms are used, the way Yahoo applies these
algorithms is probably somewhat different. Here is how Yahoo's patent
application defines each term:
"PageRank is a family of well known algorithms for assigning numerical
weights to hyperlinked documents (or web pages or web sites) indexed by
a search engine. PageRank uses link information to assign global
importance scores to documents on the web. [...] The PageRank of a
document is a measure of the link-based popularity of a document on the
Web.
TrustRank is a link analysis technique related to PageRank. TrustRank
is a method for separating reputable, good pages on the Web from web
spam. TrustRank is based on the presumption that good documents on the
Web seldom link to spam. TrustRank involves two steps, one of seed
selection and another of score propagation. The TrustRank of a document
is a measure of the likelihood that the document is a reputable (i.e.,
a nonspam) document."
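To make the PageRank definition above a little more concrete, here is a minimal Python sketch of the general idea: a page's score is built up from the scores of the pages that link to it. This is a toy illustration only, not Yahoo's or Google's implementation. The function name, the damping value of 0.85, the iteration count, and the four-page example web are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank by repeated score passing.

    links maps each page to the list of pages it links to.
    All names and parameter values here are illustrative assumptions.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outbound links spreads its score over everyone.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical four-page web: three pages link to "hub", nothing links to "loner".
toy_web = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "loner": ["hub"],
}
scores = pagerank(toy_web)
```

In this toy web, "hub" ends up with the highest score because three pages link to it, while "loner" ends up with the lowest because nothing links to it.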
This is not the first time that Yahoo has thought about TrustRank. In 2004, Yahoo co-authored a research paper with Stanford University entitled "Combating Web Spam with TrustRank."
This paper has many of the same theories as the new Yahoo patent
application. Both use a semi-automated system for determining whether a
page is reputable or spam. Some human intervention is required in order
to pick out a set of reputable seed pages. The algorithm then uses this
set of seed pages and rates other pages based on their interlinking
pattern with the trusted seed pages.
However, that earlier paper did not give details on
how this would take place. With the release of Yahoo's new patent application, we
are given a glimpse at one possible approach. Unfortunately, the
explanation is way beyond my technical and mathematical abilities.
The basics, on the other hand, are pretty easy to understand. For
example, let's say that a particular web site has been determined to be
a reputable web site. If you acquire a link from this site, your web
site would then be given a higher TrustRank because you are closely
associated with the reputable site.
The farther a web site is from the reputable sites within the linking
structure, the lower the TrustRank it would receive. Basically, according to Yahoo's
proposed mechanism, the link structure of reputable web sites can be
used to discover other pages that are likely to be reputable sites.
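The basic idea described above, trust starting at hand-picked seed sites and fading with each link you move away from them, can be sketched in a few lines of Python. This is a rough illustration under my own assumptions, not the method in Yahoo's patent application. It treats trust propagation as a PageRank-style calculation in which the baseline score goes only to the seed pages. The function name, the damping value, and the page names are hypothetical.

```python
def trustrank(links, seeds, damping=0.85, iterations=50):
    """Toy trust propagation from a hand-picked set of seed pages.

    links maps each page to the list of pages it links to.
    seeds is the set of pages a human reviewer has marked as reputable.
    All names and parameter values here are illustrative assumptions.
    """
    pages = set(links) | {t for targets in links.values() for t in targets} | set(seeds)
    # Only seed pages receive baseline trust; every other page starts at zero.
    seed_share = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_share)

    for _ in range(iterations):
        new_trust = {p: (1.0 - damping) * seed_share[p] for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if not targets:
                continue  # a page with no outbound links passes nothing on
            # Trust is split among outbound links and weakened at every hop.
            share = damping * trust[page] / len(targets)
            for target in targets:
                new_trust[target] += share
        trust = new_trust
    return trust


# Hypothetical web: a chain leading away from one trusted seed,
# plus two spam pages that only link to each other.
toy_web = {
    "trusted_seed": ["one_hop"],
    "one_hop": ["two_hops"],
    "two_hops": [],
    "spam_a": ["spam_b"],
    "spam_b": ["spam_a"],
}
trust = trustrank(toy_web, seeds={"trusted_seed"})
```

In this example the seed keeps the most trust, the page one link away gets a little less, and the page two links away gets less again. The two spam pages receive no trust at all, because no trusted page links to them, no matter how heavily they link to each other.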
What does this mean for your web site?
This is just one more attempt to improve the relevancy of search
results. This time the idea is centered around detecting links from
link farms and other shady resources. The value of staying in the
search engines' good books is becoming increasingly important.
It is crucial that you obtain inbound links from quality, authority
sites and avoid disreputable junk sites at all costs. Focus on organic
link building and link to high-quality sites that are beneficial to you
and your web site visitors. Services that offer instant link exchanges
may look good on the surface, but they could very well cause damage in
the long run.
The search engines are getting smarter every day. Fortunately, we don't
have to. The search engines have always been looking for the same
thing: good quality content. As long as you fill your site with good
content and follow some basic search engine optimization principles,
you should do well.
This article may be freely distributed without modification, provided
that the copyright notice and author information remain intact.