Ever since there have been search engines, unscrupulous webmasters and shady search engine optimization firms have used techniques to artificially boost rankings. As search engines caught on to these techniques, they devised ways to detect them automatically, without having someone manually review each site (a practical impossibility, considering that several individual engines now index well over a billion pages). While most engines are becoming more adept at detecting "spam" pages and penalizing or removing them, this efficiency has an unfortunate side effect: some companies that are innocent of intentional wrongdoing have sites that unknowingly fall into the "spam" category. What follows is a list of some of the issues that can hurt such sites, along with suggestions for how to prevent penalization or removal.
Issue #1: Bad Links.
Much of the internet is founded on sites linking to one another (a search engine itself is really just a very large collection of links). However, with the relatively recent emphasis on a site's links as part of the ranking formula (commonly called "link popularity"), it has become crucial to carefully select and closely monitor the sites with which you exchange links. Google, the pioneer of this ranking methodology, often penalizes sites that link to what it calls "bad neighborhoods": sites that Google determines serve no purpose other than artificially boosting link popularity. Note that a site is penalized only when it actively links to another site, not when another site links to it (which is only fair, as webmasters have no real control over which sites choose to link to theirs). If any page of your site links to outside sites, make certain that those sites are not being penalized. The easiest way to check this on Google is to download the Google toolbar (available at http://toolbar.google.com/). Most pages on the internet have been assigned a "PageRank", which the toolbar displays as a sliding green scale (visit the link to see an example). To be safe, avoid linking to any site that shows no green on this scale, especially when the scale is grayed out. Such sites may be penalized, and linking to them may get your site penalized in turn. Do not, however, refuse to exchange links with a site simply because it shows only a sliver of green: such sites are not being penalized, and links from them may become more valuable over time. Finally, periodically recheck the sites you link to, to make certain they have not been penalized since you first added their links to your site.
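The PageRank check itself is a manual step in the toolbar, but gathering the list of outside sites to review can be automated. The following is a minimal Python sketch, assuming you already have a page's HTML as a string; the `LinkCollector` class and `outbound_hosts` function are hypothetical names, and only the standard library is used. It collects the host name of every outbound `http`/`https` link so you have a checklist for your periodic review.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: records the href of every <a> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def outbound_hosts(html: str, page_url: str) -> list[str]:
    """Return the sorted, de-duplicated hosts of links that leave page_url's site."""
    parser = LinkCollector()
    parser.feed(html)
    own_host = urlparse(page_url).netloc.lower()
    hosts = set()
    for href in parser.hrefs:
        # Resolve relative links ("/about") against the page's own URL.
        parsed = urlparse(urljoin(page_url, href))
        # Skip mailto:, javascript:, etc., and links back to your own site.
        if parsed.scheme in ("http", "https") and parsed.netloc.lower() != own_host:
            hosts.add(parsed.netloc.lower())
    return sorted(hosts)
```

Calling `outbound_hosts(html, "http://www.yoursite.example/")` on each of your pages gives the list of outside hosts to look up; the judgment about whether each one is penalized still has to be made by hand.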
Issue #2: Hidden Text.
Almost all search engines use the words on a site's pages as one factor in their ranking equations. This means that if the text on your pages includes your keyphrases, you have a better chance of ranking highly for those phrases than a competing page that does not include them. Some webmasters, aware of this but not wanting their visitors to actually see the text (usually for "aesthetic" reasons), began making keyphrase-rich text the same color as the page background. For example, if a page had a white background, they would add text loaded with keyphrases in the same shade of white. A human visitor could not see the text, but the search engine "spider" (the program a search engine uses to crawl and index web pages) could, and the page would get a ranking boost accordingly. Engines soon caught on, however, and began penalizing pages that used this tactic. Unfortunately, some innocent sites are still penalized for it even though the text on their pages is visible. Say, for example, that a page has a white background, and on that background sits a large blue box containing white text. Even though the text is clearly visible to visitors, the search engine is not smart enough to realize that the white text appears inside a blue box; it simply assumes that white text has been placed on a white background. To avoid potential problems, let your webmaster know that the text on your pages should never be the same color as the assigned background color.
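To make the blue-box problem concrete, here is a toy Python model of the kind of crude color comparison described above. It is an illustration under stated assumptions, not how any real engine works: the `TextBlock` structure and both check functions are hypothetical, and colors are assumed to be plain hex strings. The naive check compares the text color only to the page background, so it flags the visible white-on-blue text; a check that looks at the enclosing box does not.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextBlock:
    """Hypothetical, simplified view of one run of text on a page."""
    text: str
    color: str                            # text color, e.g. "#ffffff"
    box_background: Optional[str] = None  # background of an enclosing box, if any


def _norm(color: str) -> str:
    # Treat "#FFFFFF" and "#ffffff" as the same color.
    return color.strip().lower()


def naive_hidden(block: TextBlock, page_background: str) -> bool:
    # Crude check: compares the text color to the page background only.
    return _norm(block.color) == _norm(page_background)


def context_aware_hidden(block: TextBlock, page_background: str) -> bool:
    # Compares against whatever is actually behind the text.
    behind = block.box_background or page_background
    return _norm(block.color) == _norm(behind)


page_bg = "#FFFFFF"
blocks = [
    TextBlock("Call us today", color="#ffffff", box_background="#0000ff"),  # white on blue
    TextBlock("atlanta plumbing atlanta plumbing", color="#ffffff"),        # white on white
]
for b in blocks:
    print(f"{b.text!r}: naive={naive_hidden(b, page_bg)}, "
          f"context_aware={context_aware_hidden(b, page_bg)}")
```

The naive check flags both blocks, which is exactly the false positive the paragraph warns about. It also works as a quick self-audit: any block that would trip `naive_hidden` is a color choice worth changing.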
Issue #3: Keyword Stuffing.
As mentioned above, the words on your pages can be an important factor in how those pages rank. However, it is entirely possible to have too much of a good thing. "Keyphrase density", as it is commonly called, is the ratio of keyphrase occurrences on a page to the total number of words on that page. While different engines prefer different keyphrase densities, almost all have an upper limit beyond which pages can be penalized. In most cases, this threshold would be hard to exceed without the text sounding inane. However, particularly when a keyphrase is part of a company name, density can accidentally become unnaturally high. For example, if your company name were "Atlanta Plumbing Pros" and you used the full name in almost every sentence, you would have a dangerously high density for the phrase "Atlanta Plumbing" and would be at risk of penalization. To correct any potential problems, review the text on each of your pages and make certain that it reads naturally and that no phrase is repeated too frequently (for example, in more than half of the sentences).
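Both measures in the paragraph above are easy to compute. The Python sketch below assumes density means phrase occurrences divided by total words (some tools instead count every word of the phrase, which gives a higher figure), and it uses the paragraph's own "more than half of the sentences" rule of thumb as a default threshold; that threshold is the article's example, not any engine's published limit. All function names are hypothetical.

```python
import re


def _words(text: str) -> list[str]:
    # Lowercase word tokens; punctuation is ignored.
    return re.findall(r"[a-z0-9']+", text.lower())


def _count_phrase(tokens: list[str], phrase: list[str]) -> int:
    # Number of positions where the phrase's words appear consecutively.
    n = len(phrase)
    if n == 0:
        return 0
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase)


def keyphrase_density(text: str, phrase: str) -> float:
    # Phrase occurrences divided by total words (one common definition).
    tokens = _words(text)
    return _count_phrase(tokens, _words(phrase)) / len(tokens) if tokens else 0.0


def sentence_share(text: str, phrase: str) -> float:
    # Fraction of sentences that contain the phrase at least once.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    target = _words(phrase)
    hits = sum(1 for s in sentences if _count_phrase(_words(s), target) > 0)
    return hits / len(sentences)


def too_repetitive(text: str, phrase: str, max_share: float = 0.5) -> bool:
    # Rule of thumb from the text: flag phrases used in over half the sentences.
    return sentence_share(text, phrase) > max_share


sample = ("Atlanta Plumbing Pros fixes leaks. Call Atlanta Plumbing Pros today. "
          "We serve the metro area.")
print(keyphrase_density(sample, "Atlanta Plumbing"))
print(too_repetitive(sample, "Atlanta Plumbing"))
```

In the three-sentence sample, "Atlanta Plumbing" appears twice in 15 words (a density of about 0.13) and in two of the three sentences, so `too_repetitive` returns `True`: the sample reads like the company-name problem described above.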
Issue #4: Cloaking.
Cloaking, loosely defined, is the practice of showing a search engine spider a different page than the one an actual human visitor sees. The server of a cloaked page checks the address (typically the IP address) of each visitor, and when that visitor is a known spider, it serves specialized content designed to rank highly for certain search terms. Virtually every major engine now imposes harsh penalties on sites that use cloaking (although a few of them will let you pay for the privilege, but that's a topic for a future article). Unfortunately, the intent of cloaking isn't always to trick search engines. Some high-ranking pages are cloaked simply to prevent others from stealing the underlying code (such theft is commonly called "pagejacking"). This concern, however, is somewhat unfounded today. With the increased emphasis on "off the page" elements such as link popularity, an unscrupulous webmaster who stole the code from a high-ranking page and replicated it exactly would still not achieve the same high rankings. In any case, cloaking, for whatever reason, puts your site at risk of being penalized or removed from major engines, so make sure that your webmaster does not use the technique.
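One rough way to spot-check your own site is to request the same page twice, once identifying as an ordinary browser and once as a spider, and compare the results. The Python sketch below assumes this approach and has clear limits: it only catches cloaking keyed on the `User-Agent` header, not the IP-address-based cloaking described above, and the 0.9 similarity threshold is an arbitrary assumption (pages with rotating ads or timestamps will never match exactly). The helper names and the example User-Agent strings are illustrative; `urllib.request` and `difflib` are from the standard library.

```python
import difflib
import urllib.request

# Example User-Agent strings (assumptions; real browsers and spiders vary).
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
SPIDER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    # Request that identifies itself with the given User-Agent header.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, user_agent: str, timeout: float = 10.0) -> str:
    # Download the page body as text (requires network access).
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def looks_cloaked(browser_html: str, spider_html: str, threshold: float = 0.9) -> bool:
    # Flag the page if the two versions are less similar than `threshold`
    # (SequenceMatcher.ratio() is 1.0 for identical strings).
    ratio = difflib.SequenceMatcher(None, browser_html, spider_html).ratio()
    return ratio < threshold
```

To check one of your pages, run `looks_cloaked(fetch(url, BROWSER_UA), fetch(url, SPIDER_UA))`. A `True` result is only a prompt to ask your webmaster what the server is doing, not proof of cloaking.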
Conclusion:
Search engines are becoming increasingly cognizant of the techniques used to try to fool them, and they are also becoming better at detecting and removing pages that violate their terms of service. It's important to remember that search engines make decisions on how to rank pages based upon extensive studies of their users and their preferences, and any webmaster or optimization firm that claims to know better (and subsequently uses underhanded techniques) is doing a disservice to their client. Unfortunately, however, sometimes the spam detection methods that the engines use target good sites that inadvertently meet the criteria for removal or penalization. By paying attention to the four issues above, you can help ensure that your site isn't one of them.