
Try to guess when a user copy/pastes a giant block of output from something?
Closed, Wontfix, Public

Description

With reasonable frequency, users will copy/paste blocks of log output or header files or whatever else without escaping them explicitly. This is hard to read. It's particularly problematic when some lines pick up code block style or quote style, creating a mishmash. Here's one example:

http://secure.phabricator.com/M56#8

There's an example with an HTML document in T3127.

It would be nice to automatically guess that something is a block of log output and format it specially.

Signals I can think of offhand:

  • Many contiguous lines with no paragraph breaks. This is probably the strongest signal by far, and is mostly the thing we care about.
  • Lots of symbols in the text?
  • All lines about the same length (logfiles)?
  • Some very long lines (other types of headers)?
  • Match common patterns? This might be good for logfiles (match date stamps?) and HTML (match tags?).

The big cost here is false positives, but as long as the detected text isn't formatted too differently, I think occasionally picking up a false positive should be fine.
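
As a rough illustration, the signals listed above could be combined into a simple score. This is a minimal sketch in Python, not Remarkup code: the function names (`longest_run`, `looks_like_pasted_output`), the regular expressions, and every threshold and weight are assumptions made up for the example and would need tuning against real comments.

```python
import re
import statistics

# All thresholds below are illustrative assumptions, not tuned values.
MIN_RUN = 8          # contiguous non-blank lines before a block looks suspicious
LONG_LINE = 120      # characters; "very long line" cutoff
SYMBOL_RATIO = 0.15  # fraction of non-alphanumeric, non-space characters
TIMESTAMP = re.compile(r"\d{1,2}:\d{2}:\d{2}|\d{4}-\d{2}-\d{2}")
HTML_TAG = re.compile(r"</?[A-Za-z][^>]*>")


def longest_run(lines):
    """Return the longest run of contiguous non-blank lines."""
    best, current = [], []
    for line in lines:
        if line.strip():
            current.append(line)
        else:
            if len(current) > len(best):
                best = current
            current = []
    return current if len(current) > len(best) else best


def looks_like_pasted_output(text):
    """Guess whether text contains a block of pasted log/header/HTML output."""
    run = longest_run(text.splitlines())
    # Strongest signal: many contiguous lines with no paragraph breaks.
    if len(run) < MIN_RUN:
        return False
    score = 2

    # Lots of symbols in the text.
    joined = "".join(run)
    symbols = sum(1 for ch in joined if not ch.isalnum() and not ch.isspace())
    if symbols / len(joined) > SYMBOL_RATIO:
        score += 1

    # All lines about the same length (logfiles).
    lengths = [len(line) for line in run]
    if statistics.pstdev(lengths) < 0.1 * statistics.mean(lengths):
        score += 1

    # Some very long lines (headers and similar).
    if max(lengths) > LONG_LINE:
        score += 1

    # Common patterns: date/time stamps or HTML tags on at least half the lines.
    hits = sum(1 for line in run if TIMESTAMP.search(line) or HTML_TAG.search(line))
    if hits >= len(run) / 2:
        score += 1

    # The long run alone is not enough; require one supporting signal.
    return score >= 3
```

Requiring at least one supporting signal on top of the long run is one way to keep false positives down: a hard-wrapped prose paragraph with uneven line lengths and few symbols is left alone, while uniform timestamped lines are flagged.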

Event Timeline

epriestley raised the priority of this task from to Wishlist.
epriestley updated the task description.
epriestley added a project: Remarkup.
epriestley added subscribers: epriestley, chad.
Delivered-To: chad@chadsdomain.com
Received: by 10.229.46.73 with SMTP id i9csp106608qcf;
        Sat, 1 Mar 2014 06:49:29 -0800 (PST)
X-Received: by 10.224.161.140 with SMTP id r12mr4939868qax.24.1393685369202;
        Sat, 01 Mar 2014 06:49:29 -0800 (PST)
Return-Path: <000001447e1f0eac-ef397a5b-11c3-4101-afa4-3c51e35fe958-000000@amazonses.com>
Received: from a8-86.smtp-out.amazonses.com (a8-86.smtp-out.amazonses.com. [54.240.8.86])
        by mx.google.com with ESMTP id r61si3024461qga.189.2014.03.01.06.49.29
        for <chad@chadsdomain.com>;
        Sat, 01 Mar 2014 06:49:29 -0800 (PST)
Received-SPF: pass (google.com: domain of 000001447e1f0eac-ef397a5b-11c3-4101-afa4-3c51e35fe958-000000@amazonses.com designates 54.240.8.86 as permitted sender) client-ip=54.240.8.86;
Authentication-Results: mx.google.com;
       spf=pass (google.com: domain of 000001447e1f0eac-ef397a5b-11c3-4101-afa4-3c51e35fe958-000000@amazonses.com designates 54.240.8.86 as permitted sender) smtp.mail=000001447e1f0eac-ef397a5b-11c3-4101-afa4-3c51e35fe958-000000@amazonses.com
Date: Sat, 1 Mar 2014 14:49:28 +0000
Return-Path: 000001447e1f0eac-ef397a5b-11c3-4101-afa4-3c51e35fe958-000000@amazonses.com
To: chad@chadsdomain.com
From: "epriestley (Evan Priestley)" <noreply@phabricator.com>
Subject: [Pholio] [Commented On] M56: Compress homepage sidenav
Message-ID: <000001447e1f0eac-ef397a5b-11c3-4101-afa4-3c51e35fe958-000000@email.amazonses.com>
X-Priority: 3
Thread-Topic: M56: Compress homepage sidenav
X-Phabricator-To: <PHID-USER-nbueerxdfl6csylnv6oe>
X-Phabricator-To: <PHID-USER-ba8aeea1b3fe2853d6bb>
X-Phabricator-Cc: <PHID-USER-f1ca174bb6af9391a5f7>
X-Phabricator-Cc: <PHID-USER-ba8aeea1b3fe2853d6bb>
In-Reply-To: <PHID-MOCK-tslbrngjkgrivwxc2rto@phabricator.com>
References: <PHID-MOCK-tslbrngjkgrivwxc2rto@phabricator.com>
Thread-Index: OWNmZDc4NmRjYWJkYmZlMzUyOWI2ODI5ZDAxIFMR83c=
X-Phabricator-Sent-This-Message: Yes
X-Mail-Transport-Agent: MetaMTA
X-Auto-Response-Suppress: All
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="utf-8"
X-SES-Outgoing: 2014.03.01-54.240.8.86

I guess I could have done that better.

Oh, nothing against you! This happens all the time and I think there's a reasonable chance we can just do the right thing automatically -- I've been thinking about this for a while, and that just reminded me.

I sort of wonder if "code block" should be a dialog with a text area. Do you format your code blocks by hand, with two spaces on each line? For the example above, I pasted in my logs, then copied them again and clicked 'code block'.

I often format stuff in TextMate and indent with Command-] (I'm mostly copying code out of it anyway). I use the triple-backtick form in other cases. The button should probably use the triple-backtick form, not the two-space-indent form.

(I'm onboard with making it a dialog too.)

I honestly didn't know you could triple backtick. Seems that would be better than the spaces; at least, that's what the example in the Remarkup bar should do.

Incidentally, the Remarkup button does do triple backticks now.

Could we achieve this purely in the UI? Like, bind to the paste event (not sure if there is such a thing... My JavaScript knowledge isn't great) and just ask "Is this a code block?". Although that could be a little annoying, so maybe just auto-convert it to a code block and provide the ability to undo.

FWIW, I am not a fan of the indented code-block rule... I much prefer triple backticks. Dropping support for indented code blocks would slightly help with copy-pasted code blocks.

epriestley claimed this task.

I haven't come up with any kind of rule for this in the last couple years which feels sufficiently un-magical, even in theory. It hasn't happened all that often recently, either, at least that I can recall.

As noted above, we now hint with triple backtick, so that's a slight improvement.