Today is a good day to code

No Follow and Comment Spam



A large problem looms on the horizon in the wake of so many blogs being deployed right now. How does a search engine determine which comments on a page link to a site of interest to a reader versus a site full of spam and pop-ups? Google and Yahoo both feel that using the attribute rel="nofollow" will help alleviate this issue by removing the benefit of comment spam.
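In markup, the whole mechanism is a single attribute on the anchor tag. A comment link like the following (the URL is just an illustration) would be discounted by engines that honor the convention:

```html
<!-- A comment link carrying the nofollow hint; engines that honor it
     will not count this link toward the target page's ranking. -->
<a href="http://example.com/" rel="nofollow">visit my site</a>
```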

There are a few issues with using the rel attribute this way. First, the rel attribute was never meant for this purpose, and if you are already using it to express relationships within your site, doing both at once will sometimes be difficult. That situation will be rare, but it will still be a situation. Another issue is that it does nothing to stop the poor blogger from having a site with ugly comment spam all over it; readers will still click on a link thinking it is a helpful one, only to find that they now have 1000 pop-up sex spam windows. The final issue is that if a webmaster or blogger decides via scripting that all comment links get the attribute, then useful links to other people's blogs that should be spidered will not be followed by the search engines, negatively affecting the search engine's ranking system and preventing good pages from being found.

I think I have a solution. My current stopgap is to disallow links in comments until I can figure out a better way. Since most search engines already have some internal process for determining the quality of pages, whether via links or manually, it makes sense for them to expose a developer API: a webmaster could use some XML to indicate that the following, or the encapsulated, content is comments, and then bounce the links coming out of that section off the search engine's quality system. If a link is of poor quality, or the page is just a spam page unrelated to the content, the search engine could show results that are probably more in line with what the reader wanted to see. If the link is of good quality, the API would go ahead and let the user follow it.
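No such API existed, so the following is purely a hypothetical sketch of the idea: extract the links from a marked-off comment block and run each one through a stand-in for the engine's quality system. The function names, the regex-based link extraction, and the toy quality rule are all my own inventions for illustration, not anyone's real interface.

```python
import re

def classify_comment_links(comment_html, quality_lookup):
    """Return (url, verdict) pairs for every link in a comment block.

    quality_lookup stands in for the search engine's quality system:
    any callable mapping a URL to "good" or "spam". In the proposal,
    this check would happen on the engine's side via the API.
    """
    urls = re.findall(r'href="([^"]+)"', comment_html)
    return [(url, quality_lookup(url)) for url in urls]

# Usage with a toy quality rule (purely illustrative):
toy_quality = lambda url: "spam" if "casino" in url else "good"
comment = ('<p>Nice post! <a href="http://example-casino.biz">win big</a> '
           'and see <a href="http://alice.example/blog">my blog</a></p>')
for url, verdict in classify_comment_links(comment, toy_quality):
    print(url, verdict)
```

The blogger keeps full control of the page; only the verdicts come back from the quality system, and the site can decide what to do with a "spam" result.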

The spider could be configured to hit the link: if the request came back as results from a search engine, the spider would know the link was bad and wouldn't try to follow it again; if the link went through, it would know the link was reliable, and the linked page would get credit for a backlink. With the breakthroughs in contextual search on the horizon, a la Y!Q, I think Yahoo! is closer to being able to implement this than Google. If I had access to their backends and their spiders, I could probably do it myself, but alas, I don't.
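The spider-side decision above can be sketched in a few lines. Everything here is an assumption for illustration: the function name, the idea of comparing the final host against a list of search-engine hosts, and the sample hostnames are mine, not any crawler's actual logic.

```python
def credit_backlink(fetch_status, final_url, search_engine_hosts):
    """Decide whether a comment link should earn backlink credit.

    A link that fails to load, or that resolves back to a search
    engine's own results page, is treated as bad and skipped on
    future crawls; anything else is treated as reliable and the
    linked page gets credit for a backlink.
    """
    if fetch_status != 200:
        return False
    # Crude host extraction for the sketch; a real crawler would
    # use a proper URL parser.
    host = final_url.split("/")[2] if "://" in final_url else final_url
    return host not in search_engine_hosts

engines = {"search.yahoo.com", "www.google.com"}
print(credit_backlink(200, "http://alice.example/blog", engines))           # True
print(credit_backlink(200, "http://search.yahoo.com/search?p=spam", engines))  # False
```

The appeal of this over a blanket nofollow is that good links still pass credit; only links the quality system has already rejected are dropped.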

The final step in the chain would be for the developer or webmaster in charge of the site to remove the bogus link spam from the page, but at least it wouldn't mess with the readers, or the search spiders.

Yahoo! Search Blog – A Defense Against Comment Spam