Page MenuHomePhabricator

During arc land harbormaster lint/unit, it is hard to remember the failure information from previous arc diff
Open, WishlistPublic

Description

During arc diff, you must make an excuse for lint error. During arc land, you must confirm you want to land despite error, but you are not surfaced the errors or the former excuses.

T8856 suggests that we surface the unit errors during arc diff/arc unit. I am suggesting surfacing the errors during arc land.

I also would like to suggest surfacing the excuse that we made earlier.

Example of current behavior

Harbormaster failed to build the active diff for this revision. Build failures:

     FAILED  Build 52326: arc lint + arc unit

You can review build details here:

    Harbormaster URI: https://tails.corp.dropbox.com/B52279

    Land revision anyway, despite build failures? [y/N] y

One potential idea

Harbormaster failed to build the active diff for this revision. Build failures:

     FAILED  Build 52326: arc lint + arc unit
         - Lint excuse: "I am taking care of this lint error separately"
         - Unit excuse: "I didn't run unit tests because XYZ"

You can review build details here:

    Harbormaster URI: https://tails.corp.dropbox.com/B52279

    Land revision anyway, despite build failures? [y/N] y

Event Timeline

nipunn raised the priority of this task from to Needs Triage.
nipunn updated the task description. (Show Details)
nipunn added projects: Restricted Project, Bug Report.
nipunn changed the edit policy from "All Users" to "Custom Policy".
nipunn added subscribers: angie, jhurwitz, nipunn.

I don't understand the difference between this and T9936. Aren't they aiming to solve the same problem (difficulty remembering what failed during arc diff when running arc land)?

(Also, I think neither this nor T9936 describe bugs in Phabricator. See Contributing Feature Requests and particularly Describing Root Problems.)

Oh -- are you creating these from the workboard? Never mind, in that case.

I saw it as two subproblems

  1. Difficulty remembering whether lint or unit failed
  2. Difficulty remembering excuse for lint failures in particular.

Stepping back, they can both be combined into that one larger issue (can't remember what happened at arc diff when typing arc land). This typically happens for me when several days pass between "arc diff" and "arc land".

Here is a scenario that has bit me in particular when working on a diff that touched many files.

  1. arc diff. Make legitimate lint excuse. Make a bogus unit excuse like "These unit tests take a while. I'll run it on the final revision"
  2. Put up for code review. Several days pass. It gets accepted
  3. arc land. Skip through the warning because I know there was a legitimate lint error and misremembered my running of unit.

In this case for me, the reasons for bypassing lint/unit failures are at the front of the mind during "arc diff", but in distant memory by "arc land".

Our install does have a separate issue which exacerbates this issue that our lint warns too often, but we're working through that separately. Regardless, I do think some tooling assistance is helpful.

It seems to me that the root purpose of the "Land revision anyway" warning is to remind you of what your earlier failures, so making that reminder more explicit seems handy.

I'm going to go ahead and merge in T9936

nipunn renamed this task from During arc land, surface lint/unit excuses from previous arc diff to During arc land harbormaster lint/unit, it is hard to remember the failure information from previous arc diff.Dec 9 2015, 12:56 AM

These unit tests take a while. I'll run it on the final revision

  • Specifically, you're passing --nounit?
  • Don't you have builds of revisions hooked up so it's sort of moot if you run tests or not? Maybe not for this project? Or is that not working well?
  • Can you make the tests run fast instead of slow?
  • Can you delete every linter which emits false positives right now? I strongly recommend this and I think I suggested it in at least once or twice in March. If they're still flaky 9 months later, maybe it's more politically reasonable to delete them all now?
  1. In this case I was yes.
  2. We have post-push builds hooked up, but not pre-push for this project in particular. (rely on arc unit)
  3. We have tests only run on affected files, but we have certain files that can trigger a lot of tests.
  4. I'm all for this (reducing some big categories from warning to advice by default). Going to try to make it happen. Would be cool to have a list of "top errors in this project" somewhere to make this process easier.

4 is a bit tricky because we use regex matching on pep8 warnings, but we also have multiple big old nasty codebase which wasn't pep8 from the start, but I'm going to try to pull the regex down to advice and pull a few select ones up (which we've sufficiently fixed to warrant) to warning (in the particular project I'm working on).

I still think that a reminder would be handy since my memory is bad, but I can report back after dropping the hammer on these warnings.

I was able to get some metrics for how many revisions have lint errors in this particular project (repo callsign 'C') with this query:

/* Lint status for all latest diffs */
SELECT
  lintStatus,
  COUNT(*)
FROM (
  SELECT
     phabricator_differential.differential_diff.revisionID,
     phabricator_differential.differential_diff.lintStatus,
     MAX(phabricator_differential.differential_diff.dateModified)
   FROM
     phabricator_differential.differential_diff
     JOIN phabricator_repository.repository
     ON phabricator_repository.repository.phid = phabricator_differential.differential_diff.repositoryPHID
   WHERE
     phabricator_repository.repository.callsign = 'C' AND
     phabricator_differential.differential_diff.dateCreated > (1449701043 - (3600 * 24 * 7)) AND
     phabricator_differential.differential_diff.creationMethod = 'arc'
   GROUP BY
     phabricator_differential.differential_diff.revisionID,
     phabricator_differential.differential_diff.lintStatus
) one_per_revision
GROUP BY lintStatus
+------------+----------+
| lintStatus | COUNT(*) |
+------------+----------+
|          0 |        2 |
|          1 |       65 |
|          2 |       19 |
|          3 |       13 |
|          4 |        2 |
+------------+----------+

I've made some big changes to our lint rules to hopefully largely reduce the lint false positives on this particular project and I'll be following along with that metric and hopefully see a lot more "1"s.

I do think it is several clicks from the "are you sure you want to land despite lint/unit failures" to determine what you clicking through.
It roughly sounds like we don't want to optimize the "failed lint" workflow to be easier to navigate because it encourages bad linting setups. In that case, we should close this one out as WONTFIX.

Thanks for the timely responses!

epriestley triaged this task as Wishlist priority.Dec 11 2015, 11:53 PM

I think this is worth improving. However, I think it is not a very important feature in situations where things are set up reasonably well (not too many lint false positives, fairly fast tests or server-side CI), and that fixing these problems will have a lot of other benefits in cases where they aren't working well. That is, this should be more like a "nice-to-have" that saves you a few seconds once a week, not something you're hitting on >50% of revisions.

If you're hitting it all the time, I'd rather find things we could do to help with that root problem. Some examples might be:

  • Better tools for understanding lint across the project (like your query above)?
  • Server-side tools for disabling lint messages (or recategorizing them into warnings/advice) for specific projects? This feels a little flimsy and depends on linters emitting good codes in the first place, but maybe it's not wholly unreasonable.
  • Maybe tools like --draft for letting you say "not ready to land" more formally?
  • Adding an explicit prompt for "--nolint" and "--nounit", separate from excuse/build data.

On this, particularly, it's mostly just blocked on improving API access to the data we'd need:

  • I don't want to do it in a way that's narrowly tailored to only the arc messages, since having reminder text for other Harbormaster failures would also be nice.
  • We don't currently expose API methods for reading detailed Harbormaster data, and the internal handling probably needs to get a little more solid first.
  • We don't currently expose API methods for reading the "excuse" data, and the way it's currently stored is weird/iffy.

A broad API pathway forward is taking shape through T9964 + T7715, although I don't plan to bring these modern endpoints to Differential until they stabilize in Maniphest, and Harbormaster probably needs more specialized approaches.

That prioritization makes sense to me. It doesn't seem worth putting a lot of effort into optimizing the workflow for a bad configuration.

Makes more sense to make it easy to convert bad configurations into good ones.

I do in particular think that tools for understanding effects of lint across a project would be really enabling. Having worked on several projects over here, over-aggressive lint configurations appear a chronic problem. I believe the failure mode arises with:

  1. Have an old codebase
  2. Add a large number of lint rules (pylint for example) and use default severities
  3. Many people ignore severities due to existing code errors / false positives.
  4. Very little insight into # of false positives.

For now, I'm going to use this query to try to improve the situation on this one project by sorting priorities by hand on the lint messages we hit most frequently. It will also help us prioritize which sorts of warnings we should go back and one-time fix in our codebase. Thanks!

eadler added a project: Restricted Project.Aug 5 2016, 4:52 PM