We currently seem to have 3-4 separate web spiders aggressively crawling every version of every file in Diffusion. Since the content is loaded via Ajax in most cases, they can't even index anything meaningful. These pages require `git` operations and are relatively expensive for us to generate.
Diviner looks like it's also a bit of a spider trap, although not as bad: there's real content there, indexing it could be useful, and it's not as costly for us to generate.
I think T3923 is the most general solution here (force individual clients to back off), but serving a `robots.txt` could be helpful too. In particular, I suspect //no// installs are //ever// interested in spiders generating an index of Diffusion. Can you guys think of any reason to let spiders into Diffusion?
Can anyone come up with reasonable use cases for giving administrators more control here? Given that access policies already exist, my thinking is that we should just block `/diffusion/` unconditionally in the served `robots.txt` and leave it at that for now.
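
For reference, a minimal sketch of what that served file could look like (the exact rule set is an assumption, not a final proposal):

```
# Hypothetical robots.txt served at the install root: block all
# crawlers from Diffusion, leave the rest of the install crawlable.
User-agent: *
Disallow: /diffusion/
```

Since nothing under `/diffusion/` is meaningfully indexable anyway, a blanket disallow shouldn't cost installs any search visibility.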