[tahoe-dev] Modifying the robots.txt file on allmydata.org

Brian Warner warner at lothar.com
Wed Feb 24 04:01:02 UTC 2010


Kevin Reid wrote:
> On Feb 23, 2010, at 21:52, Peter Secor wrote:
> 
>>   There is currently a robots.txt[1] file which blocks crawlers from a
>> few of the projects on the site, specifically everything under /trac.
> 
> I agree that the Trac content should be indexable.


Incidentally, I originally put that robots.txt in place to avoid the
load and general confusion of having search engines index every source
file of every revision back to the beginning of the project. I've seen
that happen on other projects, and it isn't pretty. You search for some
phrase that you think should be in the source file, and you find 5000
hits, all from the same site (and then can't find any of the latest
versions, or tickets that use the same string, etc.).

I suggest looking carefully at the top-level URL space and allowing
crawling of only the URLs that don't provide a lot of history. Tickets
good, current-version-of-source good, old-versions-of-source bad,
timeline bad, wiki good, roadmap sort of useless.
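
As a rough sketch of what that policy might look like, here is a candidate
robots.txt checked with Python's standard urllib.robotparser. The
/trac/project prefix is a made-up placeholder, and the path names
(ticket, wiki, browser, changeset, log, timeline) assume a stock Trac URL
layout; the real site may differ. The roadmap is left crawlable here
only because it seems harmless, not because it is useful.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: "/trac/project" is a placeholder prefix, and the
# path names assume a stock Trac URL layout. Anything not listed
# (ticket, wiki, browser, roadmap) stays crawlable by default.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trac/project/timeline
Disallow: /trac/project/changeset
Disallow: /trac/project/log
"""


def build_parser(text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def crawlable(parser, path):
    """Return True if a generic crawler may fetch the given path."""
    return parser.can_fetch("*", path)


parser = build_parser(ROBOTS_TXT)

if __name__ == "__main__":
    for path in (
        "/trac/project/ticket/1",
        "/trac/project/wiki/SomePage",
        "/trac/project/browser/README",
        "/trac/project/timeline",
        "/trac/project/changeset/1234",
    ):
        print(path, "allowed" if crawlable(parser, path) else "blocked")
```

One caveat: these rules are plain path prefixes, so they cannot separate
the current version of a source file from an old one when the revision is
selected by a query string on the same path. Handling that case would
need wildcard rules, which not every crawler honors, or a change on the
server side.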

cheers,
 -Brian


