A related issue here is exemplified in https://discourse.phabricator-community.org/t/importing-libphutil-repository-on-fresh-phabricator-triggers-an-error/2391/, which basically amounts to:
Feb 15 2019
Both tables have this key:
I believe we haven't seen more of this in two years, and "make the worker always exit in less than 2 hours" is a more-or-less reasonable remedy. Getting one extra email every two hours also isn't a huge problem even if we do get this wrong.
As of D19748, I'm not aware of any change of size X that requires more than 8X bytes of memory to parse. This isn't ideal, but it's a fair bit better than the 32X in the original report.
Presumably resolved elsewhere by D19503.
No clue how to reproduce this and we haven't seen anything similar since.
Jan 14 2019
Jan 5 2019
Jan 4 2019
Jan 2 2019
Jul 19 2018
EC2 volume ddata005.phacility.net filled up, causing problems for instances hosted on db005, leading to PHI771. I'll dig back into the CloudWatch monitoring stuff I set up a few months ago and make the db hosts report storage metrics the same way the repo hosts already do.
Apr 20 2018
Apr 12 2018
Apr 9 2018
Mar 6 2018
This log is now available at HEAD of master.
Mar 5 2018
Mar 1 2018
Feb 14 2018
We no longer offer free instances so I don't currently plan to pursue this.
Jan 29 2018
D18962 uses this to implement "Export Data" for large result sets.
Jan 27 2018
Jan 24 2018
Jan 4 2018
Oct 12 2017
Jul 27 2017
Agreed. I haven't experienced the problem since I upgraded, so I think it was resolved by an earlier fix, even if it wasn't the identified fix (which should have already been in my install when I did have the problems). There's nothing that needs to be addressed here.
We aren't going to implement a bin/phd start-missing-daemon command.
Jul 9 2017
Jun 23 2017
I think a minimal reproduction case which is typical of this example is:
Looks to be just the presence of the "?" in the text
"XXX://123456 XXX XXX XXX://123456 XXX XXX"
You and me both. I am super confused.
Well, the stack trace says PhabricatorYoutubeRemarkupRule, so I'm confused about what the issue is.
LOL @chad, I literally reproduced this issue here by trying to paste the above line without the backticks. It refuses to let me comment.
I'm not sure how, without giving you the entire commit message, which I can't do. I think the key would be having a commit with a line (probably the first one) that looks like this: "XXX://123456 XXX XXX XXX? XXX://123456 XXX XXX XXX"
How can we reproduce this issue locally?
It also seems like there may be an issue with the current parsing logic, since the "detected URI" contains spaces, which I don't think are valid in a URI; it should have been detected as two separate URIs with some text in the middle.
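As a rough illustration only (this is a generic whitespace-bounded matcher, not the actual Remarkup rule), a pattern that stops at whitespace does find two separate URIs in the example line, which is the behavior described above:

```
# Sketch only: a generic whitespace-bounded URI pattern, not Remarkup's parser.
# It finds two separate scheme://... tokens in the example text.
echo 'XXX://123456 XXX XXX XXX? XXX://123456 XXX XXX XXX' \
  | grep -oE '[A-Za-z]+://[^[:space:]]+'
# Expected output:
#   XXX://123456
#   XXX://123456
```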
The ambiguous URI is a URI to a company app on macOS. The only part of the URI that matters is the xxx://123456. The rest is just the title of the item referenced by the URI, and this title contains a "?", which, mixed with T12526, may be causing this issue. There may also be a place where you now need to catch exceptions thanks to the URI parsing logic changes. Just guessing here from what I can tell in the code.
Jun 21 2017
I think we should have our crontabs in version control regardless of whether or not we add tmpreaper to them, so I'll make a task for that.
If you want to move forward with that:
This should do the trick. It runs off atime by default. We could just set the time period to several days if we wanted to. Alternatively, if the filenames for extremely long-running jobs are predictable, there's a --protect '<shell_pattern>' argument we could use to avoid cleaning up those files.
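For example (a sketch only; the protect pattern, the three-day period, and the directory are illustrative, not a decided configuration), a daily cron job could run something like:

```
# Sketch only: reap files under /tmp not accessed (atime) in 3+ days, while
# protecting a hypothetical predictable long-running-job filename pattern.
# This would be run from cron, e.g. once a day.
tmpreaper --protect '/tmp/long-job-*' 3d /tmp
```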
Every repo host is equally affected, so I'd like to deploy crontabs as part of the regular deployment process if we use them as part of the approach here. That would require first codifying a handful of custom crontabs, including one on secure which regenerates documentation daily on only one host. This codification should happen anyway eventually, but it's a little bit of work, and wasted effort if we're switching to Chef/Salt/Ansible/etc soon anyway.
Should we just add a crontab entry to clean /tmp to paper this over until we get it fixed for real?
Jun 20 2017
Here's another clue, from the relevant host's error log:
Jun 19 2017
Jun 14 2017
Oh, this doesn't isolate things because they're on different databases, and thus we establish different connections. The daemon insert does not happen inside a transaction.
Jun 12 2017
See also T4124 for another Solaris issue.
Jun 8 2017
Existing sources of permanent failure are worth at least a cursory review before we ship this since they're pretty easy to grep for, but I don't anticipate any issues.
Jun 7 2017
May 26 2017
May 23 2017
Ah, this probably explains what I've observed on our installation too.
May 18 2017
Yeah, some workarounds are:
I believe we see the same issue in our environment, but I didn't think much of it or rule out actual problems with our setup, and just restarted the daemons the first few times it happened.
May 17 2017
I papered over this in the short term by restarting daemons for all instances;
Apr 24 2017
One thing I noticed: all three daemons (Taskmaster, Trigger, PullLocal) are currently listed as "Waiting" on my install, and they also show up in the output of phd status. When this problem occurred, I didn't look at the Daemons app in the web UI, but I did notice that Taskmaster was not listed in the phd status output. I'm guessing that behaviour is not normal, and it perhaps provides a little insight into what's going on here.
I've adjusted my monitoring to just alert me instead of restarting the daemons when there's an issue, so if/when this happens again I can investigate more fully and provide more information. The code from D17397 had definitely landed when I experienced this, as I saw it in the source code when I investigated. I've upgraded to current stable now.
Apr 23 2017
It is intentional that daemons shut down when they aren't doing anything. See T12298. They will be restarted automatically when work becomes ready.
I made a diff (D17780) that adds bin/phd check, which runs the setup check that the web UI runs, writes the result to the console, and exits with an indicative status. This at least allows the circumstance to be detected, and I can fix up the problem with bin/phd restart. This might be good enough: even though using bin/phd start or having Phabricator self-repair through the Overseer would be better, the problem is likely too rare to warrant work on more complex options.
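For example, a monitoring script could key off the exit status (a sketch only; the path and the alert/restart choice are placeholders, not a prescribed setup):

```
# Sketch only: use the exit status of `bin/phd check` (from D17780) to detect
# the stuck state, then alert and/or restart. Paths are placeholders.
if ! /path/to/phabricator/bin/phd check; then
  echo "phd check reported a problem with the daemons" >&2
  /path/to/phabricator/bin/phd restart
fi
```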
Apr 18 2017
Apr 17 2017
Apr 12 2017
Running multiple different versions of Phabricator on a single host is not currently supported. We should probably handle this situation better than we do, and there's no technical reason we can't support it, but the use case is very rare.
Apr 10 2017
I still had this problem, in a fresh install. Had to run
Apr 9 2017
- When you click "Delete File", we currently delete the file in the web process. Since we've supported enormous files and pluggable storage backends for a while, this could take an arbitrarily long amount of time to complete.
- Instead, we want to flag the file as "deleted", hide it in the web UI, and queue up a task in the daemons to actually get rid of the data.