
Archived repositories aren't imported
Closed, Resolved (Public)

Description

When the repository daemons perform the initial repository clones, archived repositories are skipped. I'm not sure if this is the case for hosted repositories, but it is for imported repositories. As a result, it is not possible to view the commit history or browse the repository in Diffusion. Attempting to do so gives the following error:

Unable to Retrieve Paths
ERR-CONDUIT-CORE: File system entity '/mnt/repositories/XYZ/' does not exist.

A workaround is to unarchive the repository, wait for it to import and then mark it as archived again.
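Before applying the workaround, it can help to know which repositories are actually missing on disk. The following is a minimal sketch, not Phabricator code: the `repos` mapping of callsign to local path is a hypothetical inventory you would assemble yourself, and the helper only checks whether each path is an existing directory.

```python
from pathlib import Path


def missing_working_copies(repos):
    """Return the callsigns whose local path is not an existing directory.

    `repos` is a hypothetical inventory mapping a repository callsign to the
    path where its working copy is expected to live.
    """
    return sorted(
        callsign for callsign, path in repos.items() if not Path(path).is_dir()
    )
```

Any callsign this returns for an archived repository is a candidate for the unarchive, import, re-archive workaround described above.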

Event Timeline

I think I'm missing something; this sounds like expected behavior with a reasonable workaround. What's the bug?

I disagree that this is the expected behavior, especially if this occurs on hosted repositories (which I can't confirm because I don't have any). If this occurs with hosted repositories, then archived repositories will be forever lost when replacing the repository hosting tier.

My expectation is "archive" means "mark as deleted" in Phabricator.

That's what I mean by "I must be missing something". What data is lost if it's hosted elsewhere?

The problem here is that things will mysteriously stop working. Let's say that I refer to rXabcdef in a comment and then later mark rX as archived and replace my repository tier. Anyone who later clicks on the rXabcdef reference will hit an error.

If the repository is hosted elsewhere then Phabricator isn't the canonical source, and the other system would be responsible for data anyway.

Ok, if you feel this is a bug report, can you list the steps to reproduce the core issue in the description?

Well, specifically, what you were doing and what issues it caused afterward, like moving tiers.

The steps to reproduce are quite simple:

  1. Import a repository into Diffusion.
  2. Reference a commit from the repository somewhere, for example rREPO018f7a225b78d7dcdd29a26b5179e46f728c07de.
  3. Mark the repository as archived.
  4. Replace the host that is serving VCS traffic (or, alternatively, just delete the /var/repo/REPO/ directory).
  5. Clicking on the previously-created commit reference throws a ConduitClientException.
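The steps above can be modeled with a small toy simulation. This is only a sketch under stated assumptions: `Repo`, `pull_daemon_pass`, `browse`, and `MissingWorkingCopyError` are hypothetical stand-ins, not Phabricator's daemons or API, and the "clone" is just a directory creation. It illustrates why the reference breaks: once the repository is archived, the update pass never recreates the working copy.

```python
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path


class MissingWorkingCopyError(Exception):
    """Hypothetical stand-in for the ConduitClientException in step 5."""


@dataclass
class Repo:
    callsign: str
    local_path: Path
    archived: bool = False


def pull_daemon_pass(repos):
    """Toy update pass: create missing working copies, skipping archived repos."""
    cloned = []
    for repo in repos:
        if repo.archived:
            continue  # the behavior under discussion: archived repos are skipped
        if not repo.local_path.exists():
            repo.local_path.mkdir(parents=True)  # stands in for the real clone
            cloned.append(repo.callsign)
    return cloned


def browse(repo):
    """Toy commit lookup: fails if the working copy is gone."""
    if not repo.local_path.is_dir():
        raise MissingWorkingCopyError(
            f"File system entity '{repo.local_path}' does not exist."
        )
    return "ok"


def reproduce(root):
    """Walk through steps 1-5 and return the error message, if any."""
    repo = Repo("REPO", Path(root) / "REPO")
    pull_daemon_pass([repo])        # step 1: import the repository
    assert browse(repo) == "ok"     # step 2: the commit reference resolves
    repo.archived = True            # step 3: mark as archived
    shutil.rmtree(repo.local_path)  # step 4: replace the host / delete the directory
    pull_daemon_pass([repo])        # daemons run again, but skip the archived repo
    try:
        browse(repo)                # step 5: follow the old reference
    except MissingWorkingCopyError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(reproduce(root))
```

In this model, setting `archived` back to `False` and running `pull_daemon_pass` again restores the directory, which mirrors the workaround from the description.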

Thanks! I think the moving of the host was the main point that clarified it for me.

Suppose you have a hosted, active repository and do this:

  1. Replace the host that is serving VCS traffic (or, alternatively, just delete the /var/repo/REPO/ directory).

What do you expect to happen?

It sounds like you don't expect data loss, since the idea that data loss could occur seems surprising?

"If this occurs with hosted repositories, then archived repositories will be forever lost when replacing the repository hosting tier."

When we migrate our repository hosting to Phabricator, I plan to set up multiple repository hosts for sufficient redundancy. In this case, I would expect the new hosts to clone the repositories from a read-only slave.

None of the repository replication code is even written, though. What behavior do you expect today?

This just sounds a lot like "I ran rm -rf /var/repo/X/ and all the data for rX vanished" to me, which is completely expected and should be predictable.

Although we can technically restore the data for deactivated repositories in most cases, you'd still lose data permanently for an imported repository if the source had changed or no longer worked.

Perhaps just including this consequence ("Updates to this repository will no longer be fetched.") in the "Deactivate Repository" dialog would be sufficient? Would the behavior align with expectations if you'd acknowledged a dialog saying that fetching would stop?

(Maybe "Disable" instead of "Deactivate" would be more clear? I think this would also be more consistent with other applications.)

D15873 makes this explicit in the dialog.