Fairly often, users copy/paste blocks of log output, header files, or similar text without explicitly escaping them. The result is hard to read. It's particularly problematic when some lines pick up code block style or quote style, creating a mishmash. Here's one example:
http://secure.phabricator.com/M56#8
There's an example with an HTML document in T3127.
It would be nice to automatically guess that something is a block of log output and format it specially.
Signals I can think of offhand:
- Many contiguous lines with no paragraph breaks. This is probably the strongest signal by far, and is mostly the thing we care about.
- Lots of symbols in the text?
- All lines about the same length (logfiles)?
- Some very long lines (other types of headers)?
- Match common patterns? This might be good for logfiles (match date stamps?) and HTML (match tags?).
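
To make the signals above concrete, here is a rough sketch of how they might be combined. It is not an implementation proposal for Remarkup itself: the function name `guess_raw_block`, every threshold, the two regular expressions, and the "contiguous run plus at least one other signal" rule are all assumptions made up for illustration, and none of them have been tuned against real comments.

```python
import re
import statistics

# All thresholds below are illustrative guesses, not tuned values.
MIN_CONTIGUOUS_LINES = 8   # "many contiguous lines with no paragraph breaks"
SYMBOL_RATIO = 0.15        # "lots of symbols in the text"
LENGTH_SPREAD = 0.2        # "all lines about the same length"
LONG_LINE = 200            # "some very long lines"

# Hypothetical patterns for log date stamps and HTML tags.
DATE_STAMP = re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")
HTML_TAG = re.compile(r"</?[A-Za-z][A-Za-z0-9]*(\s[^<>]*)?/?>")


def longest_run(lines):
    """Return the longest run of consecutive non-blank lines."""
    best, current = [], []
    for line in lines + [""]:  # trailing blank flushes the final run
        if line.strip():
            current.append(line)
        else:
            if len(current) > len(best):
                best = current
            current = []
    return best


def guess_raw_block(text):
    """Guess whether text contains an unescaped block of log/markup output."""
    run = longest_run(text.splitlines())
    # Strongest signal acts as a gate: without it, never match.
    if len(run) < MIN_CONTIGUOUS_LINES:
        return False

    extra = 0
    chars = "".join(run)
    symbols = sum(1 for c in chars if not c.isalnum() and not c.isspace())
    if symbols / len(chars) > SYMBOL_RATIO:
        extra += 1

    lengths = [len(line) for line in run]
    if statistics.pstdev(lengths) < LENGTH_SPREAD * statistics.mean(lengths):
        extra += 1

    if max(lengths) > LONG_LINE:
        extra += 1

    matched = sum(1 for l in run if DATE_STAMP.search(l) or HTML_TAG.search(l))
    if matched >= len(run) / 2:
        extra += 1

    # Assumption: require one supporting signal to limit false positives.
    return extra >= 1
```

With these made-up thresholds, ten lines of timestamped log output or an eight-line HTML document pasted without blank lines would be flagged, while a comment made of short paragraphs separated by blank lines would not.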
The big cost here is false positives, but as long as the detected text isn't formatted too differently from normal text, I think it should be fine to occasionally pick up a false positive.