[squeak-dev] SqueakSource indexability (aka should we just ask crawlers to desist?)

Levente Uzonyi leves at elte.hu
Wed Apr 28 20:18:04 UTC 2010

On Wed, 28 Apr 2010, Ken Causey wrote:

> At times access to source.squeak.org becomes slower, as has been the
> case today.  I can see in the logs that various web-crawlers are the
> likely culprit.  Having the information there accessible via search
> engines is a wonderful thing but I have to suspect that the Seaside
> session IDs eliminate this option.  (Of course when URLs like
> http://source.squeak.org/trunk.html are found on other sites they then
> become indexed.)

See http://code.google.com/p/seaside/issues/detail?id=262 . I had two 
solutions for the problem in Seaside 2.8. One used a linked hashtable 
to manage the sessions, giving O(1) session creation/access time, but 
it broke the rarely used feature that every session can have a distinct 
timeout value.
To fix that, I replaced the linked hashtable with a heap, which gave 
O(log(n)) creation/access time, but by then I was told to implement it 
in Seaside 2.9 using the new plugin system. Neither solution can be 
implemented as a plugin, so we got nowhere.
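The linked-hashtable idea above can be sketched in Python with an 
insertion-ordered dict: because every session shares one timeout, the 
entry touched longest ago is always the first to expire, so creation, 
access, and expiry are all O(1). All names here are illustrative, not 
Seaside's actual API.

```python
import time
from collections import OrderedDict

class SessionTable:
    """O(1) session registry with a single shared timeout.

    Mirrors the linked-hashtable trade-off described above: fast, but
    every session must share the same timeout value.
    """

    def __init__(self, timeout):
        self.timeout = timeout          # one timeout for ALL sessions
        self.sessions = OrderedDict()   # access order == expiry order

    def touch(self, key, value=None, now=None):
        """Create or refresh a session; returns its value."""
        now = time.time() if now is None else now
        self._expire(now)
        if key in self.sessions:
            # move-to-end keeps the dict ordered by last access: O(1)
            self.sessions.move_to_end(key)
            _, stored = self.sessions[key]
            value = stored if value is None else value
        self.sessions[key] = (now, value)
        return value

    def _expire(self, now):
        # Oldest entries sit at the front, so expiry pops from the left
        # and stops at the first still-live session: amortized O(1).
        while self.sessions:
            key, (stamp, _) = next(iter(self.sessions.items()))
            if now - stamp <= self.timeout:
                break
            del self.sessions[key]
```

Per-session timeouts break this scheme because the front entry is no 
longer guaranteed to expire first; a heap keyed on expiry time restores 
that guarantee at O(log(n)) per operation, which is the second solution 
mentioned above.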

> Unless I'm mistaken about this, and I would appreciate any guidance, it
> seems like we need to add a robots.txt to the site which guides or
> simply asks crawlers to stay away.  Thoughts?  I'm no SEO expert.

This should do it:
User-agent: *
Disallow: /
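
One way to sanity-check those two lines is Python's standard-library 
robots.txt parser; this is just a local check of the rules, no network 
access involved:

```python
from urllib.robotparser import RobotFileParser

# Feed the two-line robots.txt above directly to the parser.
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /"])

# Every path should now be off-limits to every crawler.
allowed = rp.can_fetch("*", "http://source.squeak.org/trunk.html")
print(allowed)  # False
```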


> Ken
