Page MenuHomePhabricator

Improve the heuristic which guesses which function/method/class line best provides context for diff snippets
Open, WishlistPublic

Description

Even with diffs that have full-file context available, the diff is shown without a line showing which function each change is in. Showing the function as "git diff" or "diff --show-c-function" do would help a lot to put the change in context and aid reviewing.

Seen with C++ code in e.g. https://phabricator.kde.org/D2934 [1] but also in https://secure.phabricator.com/D16677 : src/applications/conpherence/application/PhabricatorConpherenceApplication.php

[1] "git log -p" for this change shows:

diff --git a/daemon/powerdevilcore.cpp b/daemon/powerdevilcore.cpp
index 30adb64..6c264ac 100644
--- a/daemon/powerdevilcore.cpp
+++ b/daemon/powerdevilcore.cpp
@@ -388,6 +388,8 @@ void Core::loadProfile(bool force)
     }
 
     // If the lid is closed, retrigger the lid close signal
+    // so that "switching profile then closing the lid" has the same result as
+    // "closing lid then switching profile".
     if (m_backend->isLidClosed()) {
         Q_EMIT m_backend->buttonPressed(PowerDevil::BackendInterface::LidClose);
     }

i.e. it includes the enclosing "void Core::loadProfile(bool force)" function signature in the context.

(I notice your syntax highlighting agrees that c-function contexts are useful! )

Event Timeline

Are you referring to the function in the grey area above the diff?

pasted_file (234×1 px, 79 KB)

I'm referring to that area in general, but I'm talking about e.g. the src/applications/conpherence/application/PhabricatorConpherenceApplication.php section. The grey area above the diff shows

'/conpherence/' => array(

when it should say something like

public function getRoutes() {
epriestley triaged this task as Wishlist priority.Oct 6 2016, 5:42 PM

We use a heuristic to guess which function, class, etc., the context is in. Git uses some similar heuristic.

This heuristic we use is crude, and primarily based on indentation levels. In some cases, it produces especially bad results -- for example, if code is written in this style:

if (condition)
{
  ...
}

..I believe we select { as the context for the ... content, which is useless.

There are some ways we can improve the heuristic in general (for example, ignore line containing a single opening { as candidates for context inclusion).

However, at some point I expect that we'll be unable to continue meaningfully improving the heuristic in general, and need to start adding rules based on the file language (for example, look for function signatures in C, keywords like "function" and "class" in PHP, patterns like ".prototype" in Javascript, lots of opening braces in Lisp, etc).

Getting the best possible result (e.g., the line a human would select) in all cases requires us to have substantial knowledge of how to parse every programming language. We can modularize this and chip away at it, but it's a bottomless amount of work.

epriestley renamed this task from add function to diff context to Improve the heuristic which guesses which function/method/class line best provides context for diff snippets.Oct 6 2016, 5:43 PM

However, at some point I expect that we'll be unable to continue meaningfully improving the heuristic in general, and need to start adding rules based on the file language (for example, look for function signatures in C, keywords like "function" and "class" in PHP, patterns like ".prototype" in Javascript, lots of opening braces in Lisp, etc).

This is, of course, what git itself does, in userdiff.c:

userdiff.c
...

PATTERNS("php",
	 "^[\t ]*(((public|protected|private|static)[\t ]+)*function.*)$\n"
	 "^[\t ]*((((final|abstract)[\t ]+)?class|interface|trait).*)$",
	 /* -- */
	 "[a-zA-Z_][a-zA-Z0-9_]*"
	 "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+"
	 "|[-+*/<>%&^|=!.]=|--|\\+\\+|<<=?|>>=?|===|&&|\\|\\||::|->"),

...