SEM Wired

Archive for the ‘SEO’ Category

Site architecture is quite an important factor when it comes to code/architecture optimization. This video from SEOmoz goes into depth with a few examples, so that you can get a real grip on page structure and architecture.

No matter how up to date your sitemap is, a correct site architecture is the first step towards proper site optimization. The video also mentions the potential effects of this procedure on usability, always keeping content at a one-click distance from every page.

Content structure is a pretty invisible task in any SEO project, but it is absolutely fundamental. Either way, I have always seen positive results from it.

A flat site architecture does not mean at all that you have to get rid of directories in URLs, and these are my two cents on those articles. On the contrary, it is something to add to the click distance/relevancy distribution equation.
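To make the click-distance idea concrete, here is a minimal Python sketch that treats a site as a link graph and uses a breadth-first search to count how many clicks each page is from the home page. The site map and its URLs are hypothetical, and "click distance" is assumed here to mean the shortest click path from the home page. The example keeps directories in every URL while still holding every page within two clicks.

```python
from collections import deque

def click_distance(links, home):
    """Return the minimum number of clicks needed to reach each page from `home`.

    `links` maps each page URL to the list of URLs it links to.
    Pages that cannot be reached from `home` are left out of the result.
    """
    distance = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:  # first visit in BFS = shortest click path
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance

# Hypothetical site: directories are kept in the URLs, yet every
# category is linked from the home page, so products sit two clicks deep.
site = {
    "/": ["/shoes/", "/bags/"],
    "/shoes/": ["/shoes/running/", "/shoes/boots/"],
    "/bags/": ["/bags/leather/"],
}

print(click_distance(site, "/"))
# {'/': 0, '/shoes/': 1, '/bags/': 1, '/shoes/running/': 2, '/shoes/boots/': 2, '/bags/leather/': 2}
```

URL depth (number of directories) and click depth are independent: a page can live three folders down and still be one click from the home page if it is linked from there.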

Robots.txt has its own system of encoding for content, which does not allow paths to be written in any encoding other than US-ASCII.

According to the URI specification, only the US-ASCII character set may be used to define URLs. This very point can create quite a lot of trouble for webmasters trying to set up a robots.txt for a site that uses a different character set.

ASCII’s 128 characters only cover the English alphabet, numbers, and punctuation marks, which makes it impossible to control search engine behaviour directly when “weird” characters are used in folder names, such as ñ in Spanish and ç in French, which are left out of ASCII.

Most characters in non-Latin-based alphabets, such as pi (π) in Greek, ya (я) in Cyrillic, and entire alphabets from many other world languages, can’t be accurately written in the limited, English-oriented ASCII.

The robots.txt file itself can be saved in one of the following encodings:

  • ANSI (Windows-1252)
  • Unicode
  • UTF-8

The file content, however, supports the following character encodings:

  • ANSI (Windows-1252): 8 bit
  • ASCII: 7 bit
  • ISO-8859-1: 8 bit
  • UTF-8: 8 bit
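The difference between these encodings is easiest to see at the byte level. The following Python sketch (assuming Python 3.8+ for `bytes.hex` with a separator) encodes the characters mentioned earlier with each codec, mapping ANSI to Python's `cp1252` codec. A `None` entry means the character does not exist in that encoding; the function name `byte_table` is just for illustration.

```python
# Sample characters mentioned above: two Latin ones, one Greek, one Cyrillic.
samples = ["ñ", "ç", "π", "я"]
# Python codec names for the encodings listed above (ANSI = Windows-1252).
codecs_to_try = ["ascii", "cp1252", "iso-8859-1", "utf-8"]

def byte_table(chars, codecs):
    """Map each character to its hex byte representation per codec (None if unsupported)."""
    table = {}
    for ch in chars:
        row = {}
        for codec in codecs:
            try:
                row[codec] = ch.encode(codec).hex(" ").upper()
            except UnicodeEncodeError:
                row[codec] = None  # the character does not exist in this encoding
        table[ch] = row
    return table

for ch, row in byte_table(samples, codecs_to_try).items():
    print(ch, row)
# ñ {'ascii': None, 'cp1252': 'F1', 'iso-8859-1': 'F1', 'utf-8': 'C3 B1'}
# ç {'ascii': None, 'cp1252': 'E7', 'iso-8859-1': 'E7', 'utf-8': 'C3 A7'}
# π {'ascii': None, 'cp1252': None, 'iso-8859-1': None, 'utf-8': 'CF 80'}
# я {'ascii': None, 'cp1252': None, 'iso-8859-1': None, 'utf-8': 'D1 8F'}
```

Only UTF-8 can represent all four characters, and even then each one takes two bytes outside the 7-bit ASCII range, which is why paths written with them still need to be converted before they can appear in a robots.txt rule.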

Let’s take the case of a Russian website that uses Cyrillic names for its folders and directories. In this case, characters like я must be correctly encoded into US-ASCII.

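This is where percent-encoding comes into play: each byte of the string’s UTF-8 representation is written as a percent sign followed by two hexadecimal digits, turning a non-ASCII string into a set of characters that search engines can read perfectly.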

Let’s consider a Russian website with an admin folder we do not want search engines to crawl:

http://www.domain.com/папка/

To keep search engines from crawling the admin folder, the folder’s name should be encoded as follows:

Disallow: /%D0%BF%D0%B0%D0%BF%D0%BA%D0%B0/

…while the following line won’t work, since paths in robots.txt must always be encoded in US-ASCII:

Disallow: /папка/
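The conversion can be reproduced with Python’s standard `urllib.parse.quote`, which by default encodes the string as UTF-8, writes every byte outside the unreserved set as `%XX` with uppercase hex digits, and leaves `/` untouched. The helper name `robots_path` is hypothetical, used only for illustration.

```python
from urllib.parse import quote, unquote

def robots_path(path):
    """Percent-encode a path for a robots.txt rule: non-ASCII characters are
    converted to UTF-8 bytes and each byte is written as %XX; '/' is kept."""
    return quote(path, safe="/")

rule = "Disallow: " + robots_path("/папка/")
print(rule)  # Disallow: /%D0%BF%D0%B0%D0%BF%D0%BA%D0%B0/

# Decoding the encoded path gives back the original Cyrillic folder name.
print(unquote(robots_path("/папка/")))  # /папка/
```

Plain ASCII paths such as `/admin/` come out unchanged. If a rule uses the `*` or `$` wildcards that some engines support, those characters would also have to be listed in `safe`, otherwise `quote` encodes them too.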

You might also want to read this article from the Bing Community, which explains the issue.

The truth about SEO

October 15, 2009 | Ramblings, SEO, internet

Like everyone else in this industry, I read loads of stuff every day, blog posts and articles, about how good SEO is, how important it is to rank for the right keywords, and how successful a product can be if it is commercially developed on the internet through SEO and online marketing services.

I read loads of this stuff and I’m honestly starting to get sick of it. Years ago things were simpler: marketing departments had no idea what SEO was or how crucial the internet would become within a few years.

Now everybody jumps on the bandwagon… just take a look at your list of Twitter followers: how many people or companies can you find that supposedly know all the secrets of SEO and internet marketing? Well, pick a few of them and read their blogs and articles. As Derek Powazek points out in this article:

The problem with SEO is that the good advice is obvious, the rest doesn’t work, and it’s poisoning the web.

He then goes deeper, analysing his own perspective on the industry, and even if he is a bit too harsh at times, I can only agree with him. The fact is that all the hype about SEO is artificially created by the market. Good web design and content shouldn’t need any “SEO expert”.

However, the need for search engine optimization professionals tells us that the quality of web development these days (and the quality of the content) is getting worse and worse, requiring someone to artificially (and magically) make sites appear on search engines simply by correcting all the stupid mistakes developers make. It’s a pretty simplified explanation, but I guess it delivers the big picture of what the industry has become nowadays.

Take a look at the reactions to the article as well: from the offended “SEO professional” to the old-school coder, to whom “magic SEO practice” sounds obvious. It’s quite interesting and also gives an idea of the different characters the industry is made of…

Here’s the full article. And consider adding Derek’s blog to your feeds; it’s worth it.

What is the Canonical Link Tag?

A few months ago Google introduced the Canonical Link Tag.

This tag is meant to solve the issue of the same content appearing under different URLs, which can negatively affect those pages’ rankings. It simply tells search engines which version of the content is preferred, so that this is the version the engines rank.

The Canonical Link Tag should reside in the HEAD section of the page, and the href attribute should point to the URL of the chosen page. That should be enough.

<link rel="canonical" href="http://www.yourURL.com/">
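As a rough illustration of what a crawler has to do with this tag, the following Python sketch uses the standard `html.parser` module to pick up the `href` of a `rel="canonical"` link, honouring the rule that the tag must sit inside the HEAD section. It is a simplified assumption of the process, not how any search engine actually parses pages, and the names `CanonicalFinder` and `find_canonical` are hypothetical.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head and self.canonical is None:
            attributes = dict(attrs)  # attribute names arrive lowercased
            rels = (attributes.get("rel") or "").lower().split()
            if "canonical" in rels:
                self.canonical = attributes.get("href")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False

def find_canonical(html):
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical

page = """<html><head>
<title>Red shoes</title>
<link rel="canonical" href="http://www.yourURL.com/">
</head><body>...</body></html>"""

print(find_canonical(page))  # http://www.yourURL.com/
```

A canonical link placed in the body is ignored by this sketch, mirroring the requirement that the tag live in the HEAD.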

Canonical Link Tag vs. Redirect

Upsides
It is way better than a 301 redirect: it is the search engine that gets redirected, not the users, so the user experience is not affected while rankings are still preserved.

A 301 redirect, on the other hand, also affects search engines, which have to update their rankings according to the quality of the content of those pages.

Downsides
However, while a 301 redirects visitors and search engines across different domains, the Canonical Link Tag can only be used within a single domain, its folders and subdomains. That’s the only downside. Still, it’s a pretty handy tool.
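The practical difference between the two approaches shows up in the HTTP response. The framework-agnostic Python sketch below returns the status line, headers and body a server might send for a duplicate URL under each approach; the `respond` function, its `strategy` argument and the duplicate paths are all hypothetical, and the preferred URL reuses the placeholder from the tag example.

```python
PREFERRED_URL = "http://www.yourURL.com/"  # placeholder preferred URL
DUPLICATES = {"/index.php", "/home"}      # hypothetical duplicate paths

def respond(path, strategy):
    """Sketch of two ways to handle a duplicate URL.

    strategy="redirect":  301, browser and crawler both move to the preferred URL.
    strategy="canonical": 200, the visitor stays on the page; only the <link>
                          tag tells search engines which URL should be ranked.
    """
    if path in DUPLICATES and strategy == "redirect":
        return "301 Moved Permanently", {"Location": PREFERRED_URL}, ""
    head = f'<link rel="canonical" href="{PREFERRED_URL}">' if path in DUPLICATES else ""
    body = f"<html><head>{head}</head><body>Same content</body></html>"
    return "200 OK", {"Content-Type": "text/html; charset=utf-8"}, body

for strategy in ("redirect", "canonical"):
    print(strategy, respond("/index.php", strategy)[0])
# redirect 301 Moved Permanently
# canonical 200 OK
```

With the redirect, the visitor’s browser also lands on the preferred URL; with the tag, the visitor keeps the page they asked for and only the markup changes.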

Possible applications of the Canonical Link Tag

PHP pages often generate dynamic content under arbitrary URLs that carry no information related to the content itself. These URLs often contain the visitor’s session ID and merge content from different sources. The Canonical Link Tag can preserve the rankings of the original page that featured the content.
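One way to apply this to session-ID URLs is to compute the canonical URL by stripping the session parameters, then print it in the HEAD. The Python sketch below assumes the session ID travels in a query parameter; `PHPSESSID` is PHP’s default session name, while `sid` and `sessionid` are hypothetical extras, and the function names are illustrative only.

```python
import html
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Query parameters assumed to carry a session ID (lowercased for comparison).
SESSION_PARAMS = {"phpsessid", "sid", "sessionid"}

def canonical_url(url):
    """Drop session-ID parameters (and the fragment) so that every
    session's copy of a page maps to one single URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in SESSION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def canonical_tag(url):
    # Escape the URL so that '&' between parameters is valid inside the attribute.
    return f'<link rel="canonical" href="{html.escape(canonical_url(url), quote=True)}">'

print(canonical_tag("http://www.yourURL.com/product.php?id=42&PHPSESSID=a1b2c3"))
# <link rel="canonical" href="http://www.yourURL.com/product.php?id=42">
```

Every visitor then receives a page whose tag points to the same session-free URL, so the rankings collect on that one address.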

To learn more about the Canonical Link Tag, check out this post from Matt Cutts or take a look at the Google Webmaster Center.

Search the original Google algorithm and show side-by-side comparisons to Caffeine.

Last week Google announced a next-generation architecture for Google’s web search, called Caffeine.

They also kindly asked users to help benchmark the new technology by providing feedback on their searches through this URL:

http://www2.sandbox.google.com/

Anyway, a few days ago a new service was released that compares results from Google and Google Caffeine in real time: GoogleCompare.

I’m not going to explain much more about the service since it’s pretty straightforward. The whole thing has a funny retro style, a bit like coffee adverts from the ’50s. You just type your keyword, click on “brew” and get a quick comparison showing how the SERPs may have changed due to the new algorithm.

So how did your rankings change due to the update? I have not experienced any major change in my rankings. However, search speed has increased a lot… I guess that was the main goal behind the development of Caffeine.

I’ll keep an eye on Caffeine over the next few days in case anything changes…