[squeak-dev] SqueakSource indexability (aka should we just ask crawlers to desist?)
ken at kencausey.com
Wed Apr 28 20:08:42 UTC 2010
> -------- Original Message --------
> Subject: Re: [squeak-dev] SqueakSource indexability (aka should we just
> ask crawlers to desist?)
> From: Bert Freudenberg <bert at freudenbergs.de>
> Date: Wed, April 28, 2010 2:59 pm
> To: The general-purpose Squeak developers list
> <squeak-dev at lists.squeakfoundation.org>
> On 28.04.2010, at 21:07, Ken Causey wrote:
> > At times access to source.squeak.org becomes slower, as has been the
> > case today. I can see in the logs that various web-crawlers are the
> > likely culprit. Having the information there accessible via search
> > engines is a wonderful thing but I have to suspect that the Seaside
> > session IDs eliminate this option. (Of course when URLs like
> > http://source.squeak.org/trunk.html are found on other sites they then
> > become indexed.)
> Which URLs are the bots accessing?
Well, without detailed analysis it seems to be everything. Feel free to
look at ~squeaksource/apachelogs/.
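
For anyone who does want to look, a rough per-user-agent tally is one way to see which crawlers dominate the traffic. The following is only a minimal sketch: it assumes the logs use Apache's combined log format (not confirmed in this thread), and the sample lines and the "ExampleBot" and "ExampleBrowser" names are hypothetical, not taken from the real logs.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format, where the last two
# double-quoted fields on each line are the referer and the user agent.
UA_PATTERN = re.compile(r'"[^"]*" "([^"]*)"\s*$')


def count_user_agents(lines):
    """Return a Counter mapping each user-agent string to its request count."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample data for illustration only.
SAMPLE_LINES = [
    '192.0.2.1 - - [28/Apr/2010:20:00:01 +0000] "GET /trunk.html HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
    '192.0.2.1 - - [28/Apr/2010:20:00:02 +0000] "GET /trunk HTTP/1.1" 200 731 "-" "ExampleBot/1.0"',
    '192.0.2.7 - - [28/Apr/2010:20:00:03 +0000] "GET / HTTP/1.1" 200 2048 "-" "ExampleBrowser/3.6"',
]

if __name__ == "__main__":
    # Print user agents from most to least frequent.
    for agent, hits in count_user_agents(SAMPLE_LINES).most_common():
        print(hits, agent)
```

In practice the lines would be read from the log files themselves rather than from a hard-coded list.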
> > Unless I'm mistaken about this, and I would appreciate any guidance, it
> > seems like we need to add a robots.txt to the site which guides or
> > simply asks crawlers to stay away. Thoughts? I'm no SEO expert.
> We do have a robots.txt:
Aha. Well, I know little about this subject. But if this means what I
think it means, it seems that the crawlers are ignoring it.
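
One way to test that suspicion is to check individual logged requests against the robots.txt rules. robots.txt is purely advisory, so a crawler is free to ignore it. The sketch below uses Python's standard urllib.robotparser. The rules shown are hypothetical stand-ins, not the actual contents of the site's robots.txt, and "ExampleBot" is a made-up user agent.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: ask every crawler to stay away from the whole site.
BLOCK_ALL = ["User-agent: *", "Disallow: /"]

# Hypothetical rules: block only one path prefix.
BLOCK_PRIVATE = ["User-agent: *", "Disallow: /private/"]

if __name__ == "__main__":
    url = "http://source.squeak.org/trunk.html"
    print(is_allowed(BLOCK_ALL, "ExampleBot", url))
    print(is_allowed(BLOCK_PRIVATE, "ExampleBot", url))
```

A logged request for which such a check returns False would indicate a crawler that is not honoring the file.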
> - Bert -