Page MenuHomePhabricator

Improve error-handling behavior of Diffusion repository landing page on non-clustered installations
Closed, ResolvedPublic

Assigned To
Authored By
cspeckmim
Sep 3 2021, 10:23 PM
Referenced Files
F9661171: Screen Shot 2021-09-04 at 5.28.41 PM.png
Sep 5 2021, 12:30 AM
F9661036: Screen Shot 2021-09-04 at 6.34.08 PM.png
Sep 4 2021, 10:50 PM
F9661034: Screen Shot 2021-09-04 at 6.33.54 PM.png
Sep 4 2021, 10:50 PM
F9660791: Screen Shot 2021-09-04 at 12.45.04 PM.png
Sep 4 2021, 7:46 PM
F9660766: Screen Shot 2021-09-04 at 3.16.41 PM.png
Sep 4 2021, 7:20 PM
F9660740: Screen Shot 2021-09-04 at 12.03.36 PM.png
Sep 4 2021, 7:06 PM
F9660744: Screen Shot 2021-09-04 at 12.03.41 PM.png
Sep 4 2021, 7:06 PM

Description

On non-clustered installations the semantics of error handling Futures is different than for clustered installations. The result for non-clustered installations is that when an error happens querying history (or querying file paths for browse, branches, etc.) an exception is thrown during the construction of the query future rather than the resolution of the future. The construction of the futures is not guarded by try/catch and the end result to the user is that no content appears. The intent of this page is to still render parts of the page which succeed in having their content loaded, particularly the ability the access the Actions > Manage button.

To reproduce:

  1. Configure a non-clustered environment with a repository
  2. Remove a recent changeset from the repository's on-disk state
  3. Navigate to the repository's main page

The result is a page with the error but no other controls:

Screen Shot 2021-09-04 at 3.16.41 PM.png (998×3 px, 250 KB)

Additionally, this specific error occurs when trying to query recent history of the repository, of which the history is no longer used on this page. This query can be outright removed, reducing the chance of an error happening and possibly improving the loading of the page.

Event Timeline

cspeckmim created this task.

I'll have some more detail in D21717 in a minute, but I can't immediately reproduce this, I think? If I intentionally break diffusion.historyquery like this:

diff --git a/src/applications/diffusion/conduit/DiffusionHistoryQueryConduitAPIMethod.php b/src/applications/diffusion/conduit/DiffusionHistoryQueryConduitAPIMethod.php
index a7e9494a80..fc2e4c05d3 100644
--- a/src/applications/diffusion/conduit/DiffusionHistoryQueryConduitAPIMethod.php
+++ b/src/applications/diffusion/conduit/DiffusionHistoryQueryConduitAPIMethod.php
@@ -46,6 +46,8 @@ final class DiffusionHistoryQueryConduitAPIMethod
     $commit_hash = $request->getValue('commit');
     $against_hash = $request->getValue('against');
 
+    $commit_hash = 'does-not-exist';
+
     $path = $request->getValue('path');
     if (!strlen($path)) {
       $path = null;

...the History tab, specifically, fatals:

Screen Shot 2021-09-04 at 12.03.36 PM.png (1×2 px, 379 KB)

...but the landing page is fine:

Screen Shot 2021-09-04 at 12.03.41 PM.png (1×2 px, 290 KB)

...so there's no issue with accessing "Manage": the only page with "Manage" on it renders properly. Am I missing something, or is this description conflating some issues, or something else?

Most of this description still applies reasonably to fixing the "History" tab, but it looks like that issue is at least a slightly narrower one, maybe?

I think I did conflate this partially - namely that this isn't directly caused by the issue discussed in T13365. Where this issue came up for me is while testing out D21715, I removed the head commit from the on-disk repo which caused the landing page to not render properly:

Screen Shot 2021-09-04 at 3.16.41 PM.png (998×3 px, 250 KB)

I'm able to reproduce this but I'll investigate further

Ah, that's helpful. I found a repro, it's specific to the behavior of the ImmediateFuture pathway for non-clustered resolution of Conduit "calls". This should force it from any configuration/state, I think:

diff --git a/src/applications/diffusion/conduit/DiffusionHistoryQueryConduitAPIMethod.php b/src/applications/diffusion/conduit/DiffusionHistoryQueryConduitAPIMethod.php
index a7e9494a80..fc2e4c05d3 100644
--- a/src/applications/diffusion/conduit/DiffusionHistoryQueryConduitAPIMethod.php
+++ b/src/applications/diffusion/conduit/DiffusionHistoryQueryConduitAPIMethod.php
@@ -46,6 +46,8 @@ final class DiffusionHistoryQueryConduitAPIMethod
     $commit_hash = $request->getValue('commit');
     $against_hash = $request->getValue('against');
 
+    $commit_hash = 'does-not-exist';
+
     $path = $request->getValue('path');
     if (!strlen($path)) {
       $path = null;
diff --git a/src/applications/repository/storage/PhabricatorRepository.php b/src/applications/repository/storage/PhabricatorRepository.php
index 9cbc6db76d..6e75252238 100644
--- a/src/applications/repository/storage/PhabricatorRepository.php
+++ b/src/applications/repository/storage/PhabricatorRepository.php
@@ -2268,7 +2268,7 @@ final class PhabricatorRepository extends PhabricatorRepositoryDAO
       $viewer,
       $never_proxy);
 
-    if (!$client) {
+    if (!$client || true) {
       $result = id(new ConduitCall($method, $params))
         ->setUser($viewer)
         ->execute();

This produces:

Screen Shot 2021-09-04 at 12.45.04 PM.png (1×2 px, 442 KB)

The implementation of PhabricatorRepository->newConduitFuture() is incorrect in the non-clustered case, because it has incorrect exception semantics. I'll see if I can figure out the best way to fix this and send you a patch.

cspeckmim renamed this task from Improve error-handling behavior of Diffusion repository landing page to Improve error-handling behavior of Diffusion repository landing page on non-clustered installations.Sep 4 2021, 9:29 PM
cspeckmim updated the task description. (Show Details)

With just the updates to the future handling this is now the landing page I see with missing commits:

Screen Shot 2021-09-04 at 6.33.54 PM.png (948×3 px, 180 KB)

And the history tab shows

Screen Shot 2021-09-04 at 6.34.08 PM.png (866×3 px, 191 KB)

Thanks for this!

One tangential thing I noticed from the error messages in these cases, in DiffusionHistoryQueryConduitAPIMethod the getMercurialResult() function executes this hg command

$path_args = array();
if (strlen($path)) {
  $path_args[] = $path;
  $revset_arg = hgsprintf(
    'reverse(ancestors(%s))',
    $commit_hash);
} else {
  $revset_arg = hgsprintf(
    'reverse(ancestors(%s)) and branch(%s)',
    $drequest->getBranch(),
    $commit_hash);
}

It looks like the order of arguments in the else block are reversed, resulting in a revset of reverse(ancestors($branch)) and branch($commit_hash). Reading it that way seems like an issue but this looks intentional from D18817. Just making note of it.

Good catch!

I think it's wrong, but moot often enough to have escaped notice until now.

I think the goal of D18817 was to put reverse(ancestors(%s)) first, since the order of results in Mercurial for A and B depends on the order of A. That is, compare:

$ hg log --template '{rev}\n' --rev '((2 or 3) and (2 or 3))'
2
3
$ hg log --template '{rev}\n' --rev '((3 or 2) and (2 or 3))'
3
2

So that part is correct, but swapping the commit and branch was incidental, and I overlooked it because we usually get the same result: in common cases, this is moot because ancestors(<branch name>) is equivalent to ancestors(<heads of that branch>), and branch(<commit hash>) is equivalent to branch(<branch of that commit>). The latter is always equivalent; the former is equivalent when commit is the single head of branch, which it almost always is in regular use.

But this behavior (on this install) is unambiguously wrong:

Screen Shot 2021-09-04 at 5.28.41 PM.png (1×2 px, 407 KB)

...since b7be5c961297 (the parameter) is an ancestor of 162a9c1e8e44 (the first commit in the result set). I'll send you a diff.

So that part is correct, but swapping the commit and branch was incidental, and I overlooked it because we usually get the same result: in common cases, this is moot because ancestors(<branch name>) is equivalent to ancestors(<heads of that branch>), and branch(<commit hash>) is equivalent to branch(<branch of that commit>). The latter is always equivalent; the former is equivalent when commit is the single head of branch, which it almost always is in regular use.

But this behavior (on this install) is unambiguously wrong:

Screen Shot 2021-09-04 at 5.28.41 PM.png (1×2 px, 407 KB)

...since b7be5c961297 (the parameter) is an ancestor of 162a9c1e8e44 (the first commit in the result set). I'll send you a diff.

Ah I was thinking it resolved to the same thing since we've never noticed an issue but I didn't consider this difference