[squeak-dev] SqueakSource indexability (aka should we just ask crawlers to desist?)

Wed Apr 28 20:08:42 UTC 2010

> -------- Original Message --------
> Subject: Re: [squeak-dev] SqueakSource indexability (aka should we just
> ask crawlers to desist?)
> From: Bert Freudenberg <bert at freudenbergs.de>
> Date: Wed, April 28, 2010 2:59 pm
> To: The general-purpose Squeak developers list
> <squeak-dev at lists.squeakfoundation.org>
> 
> 
> On 28.04.2010, at 21:07, Ken Causey wrote:
> > 
> > At times access to source.squeak.org becomes slower, as has been the
> > case today.  I can see in the logs that various web-crawlers are the
> > likely culprit.  Having the information there accessible via search
> > engines is a wonderful thing but I have to suspect that the Seaside
> > session IDs eliminate this option.  (Of course when URLs like
> > http://source.squeak.org/trunk.html are found on other sites they then
> > become indexed.)
> 
> Which URLs are the bots accessing?

Well, without detailed analysis it seems to be everything.  Feel free to
look at ~squeaksource/apachelogs/.
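
For anyone who wants a quick first pass over those logs, here is a minimal
sketch in Python. It assumes the files are in Apache's "combined" log format
(user agent as the last quoted field), which has not been confirmed for this
server; the names LINE_RE and tally_requests are made up for illustration.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format, e.g.
# host - - [time] "GET /path HTTP/1.1" 200 1234 "referer" "user agent"
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"\s*$'
)


def tally_requests(lines):
    """Count requests per user agent and per path; skip unparseable lines."""
    agents = Counter()
    paths = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not in the assumed format
        agents[match.group("agent")] += 1
        paths[match.group("path")] += 1
    return agents, paths
```

Feeding it the lines of one log file and calling agents.most_common(10)
should show whether a handful of crawlers account for most of the traffic,
and paths.most_common(10) would help answer which URLs they are hitting.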

> 
> > Unless I'm mistaken about this, and I would appreciate any guidance, it
> > seems like we need to add a robots.txt to the site which guides or
> > simply asks crawlers to stay away.  Thoughts?  I'm no SEO expert.
> 
> We do have a robots.txt:
> http://source.squeak.org/robots.txt

Aha.  Well, I know little about this subject.  But if this means what I
think it means, it seems that the crawlers are ignoring it.
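
One way to check that reading is to ask Python's standard
urllib.robotparser how a well-behaved crawler would interpret a given
robots.txt. This is only a sketch: the helper name is_allowed is made up,
and the robots.txt text has to be supplied by the caller, since the actual
contents of the file on source.squeak.org are not reproduced here.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

If this returns False for a URL that a crawler keeps requesting in the logs,
that crawler is not honoring the file. If it returns True, the rules
themselves may be more permissive than intended.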

> 
> - Bert -