
Daemons may not have permission to remove files from local disk storage engine
Open, Normal, Public

Assigned To
None
Authored By
epriestley
Apr 7 2014, 3:15 PM

Description

If the local disk storage engine is configured, the webserver may try to read/write from it, and the daemons may also try to read/write from it (primarily to perform garbage collection). If the two users don't both have full access to the files, one of these operations may fail.

Conceptually, the most consistent approach might be to write some bin/local-files-storage-engine read/write <some path>, and have the webserver sudo to that. However, that's messy and heavy and gross.

The daemons could also try to make some sort of call over Conduit, conceivably. This is probably OK from a scalability point of view because the local disk engine is inherently only appropriate at small scale. However, this is blocked by other Conduit work.

Another approach might be to adjust the umask when we write stuff to the local disk engine.

All of these solutions seem fairly bad.

Event Timeline

epriestley raised the priority of this task to Normal.
epriestley updated the task description.
epriestley added a project: Files.
epriestley added subscribers: epriestley, bluehawk.

Another approach might be to adjust the umask when we write stuff to the local disk engine.

Why is that bad? What about an optional configuration option where a umask can be defined if required?
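For illustration, a minimal sketch of what that buys (paths are made up): with a umask of 0002 in effect, files and directories the webserver creates come out group-writable, so a daemon in the same group could later garbage-collect them.

umask 0002                     # clear only the "other" write bit
mkdir /var/phab/files/ab       # created as 0777 & ~0002 = 0775 (drwxrwxr-x)
touch /var/phab/files/ab/blob  # created as 0666 & ~0002 = 0664 (-rw-rw-r--)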

Per IRC, there's a more general problem here which we usually dodge mostly by luck: the webserver also needs read permission on the repositories right now, but shouldn't really. Generally, it would be better if all resources had a single owner and could be isolated completely from one another, with strong partitioning between them.

One approach is to write a bin/run-server-command script, and then have the webserver run, e.g.:

sudo -u daemons bin/run-server-command git ...
sudo -u daemons bin/run-server-command svn ...
sudo -u daemons bin/run-server-command read-file-storage ...
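For reference, the one-time setup mentioned in the cons below would be a sudoers rule along these lines (usernames and paths are illustrative, and bin/run-server-command is the hypothetical script above):

# /etc/sudoers.d/phabricator -- the webserver user may run only this one script
# (a bare command path in sudoers allows it to be run with any arguments)
www-data ALL=(phab-daemons) NOPASSWD: /srv/phabricator/bin/run-server-command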

Pros:

  • This is a reasonably general solution to isolating all resources into single-owner containers.

Cons:

  • You still have to do one-time setup to enable sudo.
  • Relatively high overhead (~20-30ms) to invoke sudo + php compared to direct git or filesystem operations.

An alternative which would not require sudo setup or incur the overhead is to have one of the daemons act as a command server and listen on a domain (or normal?) socket. The webserver would connect to it as a client. However, this is very complex and I'm not sure if we could easily restrict connections to just the webserver user.

Another possibility -- I'm not saying this is a good idea -- is to use unix groups. That is, have users set up a new "phabricator" unix group, and add the webserver and daemon users to that group. Then use umask to make sure that everything the webserver and the daemons create is group-writable, and have them chgrp() to the phabricator group when starting up.

This still has a one-time setup -- and a more complicated one than sudo -- but at least doesn't have a performance hit.
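Roughly, that setup would be (group and usernames are illustrative):

groupadd phabricator                  # shared group for webserver and daemons
usermod -aG phabricator www-data      # the webserver user
usermod -aG phabricator phab-daemons  # the daemon user
# both processes then set umask 0002 and chgrp() to "phabricator" on startup,
# so everything they create is group-writable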

Support Impact: This is fairly broken, although there isn't a clear fix for it.

Daemons now do both reads and writes, and may need to do reads and writes which are nonlocal in cluster configurations. So I think the pathway forward here must be to expose local disk storage over HTTP with S3-like raw APIs (read, write, delete).
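A rough sketch of what that surface might look like (the host and URI layout are hypothetical):

# S3-like raw operations against a storage host, keyed by storage handle
curl -X GET "https://storage-host/file/raw/$HANDLE"                      # read
curl -X PUT --data-binary @blob "https://storage-host/file/raw/$HANDLE"  # write
curl -X DELETE "https://storage-host/file/raw/$HANDLE"                   # delete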

rubenriverae renamed this task from Daemons may not have permission to remove files from local disk storage engine to Daemons may have permission to remove files from local disk storage engine. May 7 2016, 1:08 PM
rubenriverae updated the task description.
epriestley renamed this task from Daemons may have permission to remove files from local disk storage engine to Daemons may not have permission to remove files from local disk storage engine. May 7 2016, 1:17 PM
epriestley updated the task description.

In the meantime, we could maybe do the following:

change <whatever>, when creating a new directory in phabricator-files, to create it group-writable, and owned by group phabricator-files (maybe this is a config setting).

Tell people, as part of the setup, to create a group called phabricator-files that includes the user that runs the webserver (e.g. www-data) and the user that runs the daemons (e.g. $USER when you are logged in).
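A sketch of the directory half of that, assuming the phabricator-files group already exists: the setgid bit makes new entries inherit the directory's group, so the scheme keeps working no matter which user writes first.

# the 2 in 2770 is the setgid bit: new files inherit group "phabricator-files"
install -d -m 2770 -g phabricator-files /var/local/phabricator/filestore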

This is the solution I used for this problem, and it seems to be working *knock on wood*.

Oh, I see I already made that suggestion! I am nothing if not consistent.

We managed to solve this on our server setup using Linux ACLs (I had help from someone in IT). Here are roughly the steps we took:

  1. Turn on ACL for the mount point under which the filestore exists
    1. Modify /etc/fstab so the filestore location is auto-mounted with ACL (see the example fstab line after this list)
    2. Re-mount the filesystem so the server doesn't need to be rebooted
      • mount -o remount,acl / (assumes / is the mount point being updated)
  2. Modify the ACL for the filestore so that both the daemon and web accounts can access it
    1. Modify the filestore directory so files created in the future get the appropriate permissions by default (-d sets the default ACL, -R is recursive, -m means modify)
      • setfacl -Rdm u:phab-phd:rw /var/local/phabricator/filestore
      • Do for both service accounts (ex: phab-phd and nginx)
    2. Modify the filestore directory so existing files are updated with the appropriate permissions
      • setfacl -Rm u:phab-phd:rw /var/local/phabricator/filestore
      • Do for both service accounts
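For step 1.1, the change is just adding acl to the mount options; an illustrative fstab line (device, filesystem type, and mount point will differ):

# /etc/fstab
/dev/sda1  /  ext4  defaults,acl  0  1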

We did this about a week ago and no longer see exceptions in the daemon logs. I haven't lost my mustard truck either (T10907). Also, Phabricator still works~

I've implemented the sudo approach at some point - P2016 - and I think it worked alright.

I'm wondering if, for clusters, we can just say "Don't use local file storage, it will not work", and maybe suggest/support a locally-hosted S3 clone instead of implementing one.

setfacl -Rdm u:phab-phd:rw /var/local/phabricator/filestore
setfacl -Rm u:phab-phd:rw /var/local/phabricator/filestore

Unfortunately, in our case rw was not enough for phd to be able to delete the files (it did not have execute rights on newly created directories).
There were indeed no more errors in phd.log, but the files remained.
Setting the rights to rwx did the trick.
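So, for anyone copying the commands above, the variants that worked for us were:

setfacl -Rdm u:phab-phd:rwx /var/local/phabricator/filestore  # default ACL for new entries
setfacl -Rm u:phab-phd:rwx /var/local/phabricator/filestore   # existing entries
# run both again for the webserver account (e.g. u:nginx:rwx)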

Old orphaned files can be found with the following shell script:

# for each file on disk, count rows in the `file` table that reference it;
# print the file's handle if nothing does
while IFS= read -r file; do
  refs=$(echo "SELECT COUNT(*) FROM \`file\` WHERE \`storageHandle\` = '$file';" | mysql phabricator_file | sed -n 2p)
  if [ "$refs" -eq 0 ]; then
    echo "$file"
  fi
done < <(find /opt/phabricator/files -type f -printf "%P\n")
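An untested variant (assumes GNU find/coreutils, bash, and that storageHandle values are paths relative to the storage root, as above) that issues a single query instead of one per file:

# print handles that exist on disk but are not referenced in the `file` table
comm -23 \
  <(find /opt/phabricator/files -type f -printf "%P\n" | sort) \
  <(mysql -N -B phabricator_file -e 'SELECT storageHandle FROM file' | sort)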