The Web as Google's Data Source

One of the things that Google is doing to the web, I believe by design, is deterring people from making the presentation layer richer. The method by which they crawl and index pages leaves out Flash and AJAX content entirely. When a company that bases its revenue on pageviews and/or visits is designing a web property, an all-AJAX / Flash / Silverlight / etc. implementation of the interface is often swept off the table because Google and other search engines can't crawl it. It makes more sense for these companies, financially, to build a tidy-looking, crawlable interface.
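
To make the crawler's view concrete, here is a minimal sketch in Java (using the java.net.http client from recent JDKs; the URL is only a placeholder) of what a simple, non-JavaScript-executing crawler actually retrieves. Anything the page would later inject via AJAX never shows up in this response body, which is why purely script-driven interfaces are effectively invisible to this kind of indexing.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// A bare-bones "crawler's eye view": fetch the raw HTML of a page.
// Content that the page would normally pull in with JavaScript/AJAX
// after load is simply not present in what gets printed here.
public class CrawlerView {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://example.com/some-article")) // placeholder URL
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        // This is all a non-JavaScript-executing crawler has to index.
        System.out.println(response.body());
    }
}
```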

The interesting thing about building an “SEO friendly” interface is that it often isn't the best choice from an architectural perspective, since your presentation and business logic end up tied together in your controller. Practically speaking, you can't change one without changing the other on the application server. I talked about this a while ago in an article discussing the MVC pattern and application servers. What you get instead, minus any JavaScript and CSS, is a nicely formatted, structured document, ready for ingestion into a massive database, i.e. Google's, and presented through a much nicer UI, i.e. Google's.
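
As a rough illustration of that coupling (a sketch only; the class and data-access names are made up for the example), a crawlable, server-rendered page often ends up as a controller that both fetches the data and emits the markup, so neither can change without redeploying the other:

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// A typical "SEO friendly" controller: business logic (loading articles)
// and presentation (writing HTML) live in the same class, so changing
// the markup means changing and redeploying server-side code.
public class ArticleListServlet extends HttpServlet {

    // Hypothetical stand-in for whatever data-access layer the app uses.
    static class ArticleDao {
        List<String> recentTitles() {
            return List.of("First post", "Second post");
        }
    }

    private final ArticleDao articleDao = new ArticleDao();

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        List<String> titles = articleDao.recentTitles();   // business logic
        resp.setContentType("text/html");
        PrintWriter out = resp.getWriter();                 // presentation
        out.println("<html><body><ul>");
        for (String title : titles) {
            out.println("<li>" + title + "</li>");
        }
        out.println("</ul></body></html>");
    }
}
```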

When looked at this way, it becomes apparent what the future of the web looks like. At first I was against things like App Engine and building SEO-based pages, until I realized that much of what is happening around us, RSS readers and the like, is really a set of interfaces that aggregate web content and tailor it to the way I want to see it, often in a native interface. When you imagine the web as a massive data source, you begin to understand the need to impose formatting rules on the interfaces. Crawling AJAX sites would be a daunting task, and with Google's technology they could probably accomplish it, but I don't think it would be in their best interest.

If Google can get everyone to put their content into App Engine, it saves them a step, because your content lands directly in their database with no need to crawl. Google could then create optimized mashups of your content with other content to build superior applications. Anyone using App Engine, or anything of the sort, has to be aware that this is a possibility. If they want to control how their content gets mashed up, they should not use it.

My recommendation for building a future-proof site would be to build a clean, simple version for the web, per Google's webmaster guidelines, and then build desktop and mobile versions that can digest your content and present it in a more flexible way. This may change as in-browser scripting becomes more powerful, but in some ways I think we would be worse off without Google's webmaster guidelines. The web as a data source is a powerful concept, one that I'm sure is not lost on Google or Vint Cerf. I just don't know how well that concept jibes with the way people currently think of the web as a platform.
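
As a rough sketch of that split (the names and formats are my own illustration, not a prescribed design), the idea is to keep the content itself as plain data and render it twice: once as simple, crawlable HTML for the web, and once as a structured feed that a desktop or mobile client can digest and present however it likes.

```java
import java.util.List;

// One content model, two renderings: crawlable HTML for search engines,
// and a structured feed for richer desktop/mobile clients to consume.
public class ContentRenderer {

    // Minimal stand-in for a piece of site content.
    record Article(String title, String body) {}

    // Plain, crawlable HTML: what the search engine indexes.
    static String toHtml(List<Article> articles) {
        StringBuilder html = new StringBuilder("<html><body>");
        for (Article a : articles) {
            html.append("<h2>").append(a.title()).append("</h2>")
                .append("<p>").append(a.body()).append("</p>");
        }
        return html.append("</body></html>").toString();
    }

    // Simple RSS-style feed: what a native desktop or mobile app digests
    // and presents with whatever interface it wants.
    static String toFeed(List<Article> articles) {
        StringBuilder xml = new StringBuilder("<rss><channel>");
        for (Article a : articles) {
            xml.append("<item><title>").append(a.title())
               .append("</title><description>").append(a.body())
               .append("</description></item>");
        }
        return xml.append("</channel></rss>").toString();
    }

    public static void main(String[] args) {
        List<Article> articles = List.of(
                new Article("The Web as Google's Data Source",
                            "The web is becoming one big database..."));
        System.out.println(toHtml(articles));
        System.out.println(toFeed(articles));
    }
}
```

The same data backs both surfaces; only the presentation changes, which is the crux of treating the web as a data source rather than a pile of pages.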