Clusters · Infrastructure
Active · Public

Details

Description

Running Phabricator services on multiple hosts.

Recent Activity

Sep 3 2019

epriestley added a comment to T13286: When nodes in a cluster repository fail, reads are still routed with the same weight and failed reads do not recover.

Also remaining is to extend this behavior to the HTTP pathway (and to Mercurial/SVN, eventually).

Sep 3 2019, 7:35 PM · Clusters, Diffusion
epriestley added a revision to T13286: When nodes in a cluster repository fail, reads are still routed with the same weight and failed reads do not recover: D20778: Generalize repository proxy retry logic to writes.
Sep 3 2019, 6:37 PM · Clusters, Diffusion
epriestley added a comment to T13286: When nodes in a cluster repository fail, reads are still routed with the same weight and failed reads do not recover.
  • if we have already retried 3 times, do not retry;
Sep 3 2019, 6:07 PM · Clusters, Diffusion
epriestley added a revision to T13286: When nodes in a cluster repository fail, reads are still routed with the same weight and failed reads do not recover: D20777: Instead of retrying safe reads 3 times, retry each eligible service once.
Sep 3 2019, 5:41 PM · Clusters, Diffusion
epriestley added a revision to T13286: When nodes in a cluster repository fail, reads are still routed with the same weight and failed reads do not recover: D20776: On Git cluster read failure, retry safe requests.
Sep 3 2019, 4:50 PM · Clusters, Diffusion
epriestley added a comment to T13286: When nodes in a cluster repository fail, reads are still routed with the same weight and failed reads do not recover.

we'll reduce silly client-visible behavior where you request /tourtle.git instead of /turtle.git and the server seems confused...

Sep 3 2019, 4:32 PM · Clusters, Diffusion
epriestley added a revision to T13286: When nodes in a cluster repository fail, reads are still routed with the same weight and failed reads do not recover: D20775: Allow repository service lookups to return an ordered list of service refs.
Sep 3 2019, 3:58 PM · Clusters, Diffusion
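
The retry behavior named in D20775 through D20777 above (an ordered list of service refs, with each eligible service tried once for safe requests) can be sketched roughly as follows. This is a minimal illustration in Python, not Phabricator's actual PHP implementation: `read_with_failover`, `attempt_read`, `ReadFailed`, and the `is_safe_request` flag are hypothetical names chosen for the example.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class ReadFailed(Exception):
    """Hypothetical error raised when a read against one node fails."""


def read_with_failover(
    service_refs: Sequence[str],
    attempt_read: Callable[[str], T],
    is_safe_request: bool,
) -> T:
    """Try each service ref once, in the order given.

    Assumption for this sketch: a "safe" request is one that can be
    repeated against another node without side effects. Unsafe
    requests are attempted on the first node only.
    """
    last_error: Optional[ReadFailed] = None
    for ref in service_refs:
        try:
            return attempt_read(ref)
        except ReadFailed as error:
            last_error = error
            if not is_safe_request:
                # Do not retry requests that may not be repeatable.
                break
    if last_error is None:
        raise ReadFailed("no eligible services")
    raise last_error
```

Trying each eligible service once, rather than retrying a fixed number of times, bounds the work by the size of the list and avoids sending the same request to a node that has already failed.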

Aug 29 2019

epriestley added a comment to T10127: Migrating repository between storage hosts in a cluster.

Not necessarily applicable in the general case, but see also T13393.

Aug 29 2019, 3:09 PM · Clusters, Feature Request

Jul 12 2019

epriestley closed T10127: Migrating repository between storage hosts in a cluster as Resolved.

I assume this is being done already in the Phacility cluster on some level when repositories get really large, but I'm not particularly sure how to perform this migration.

Jul 12 2019, 5:11 PM · Clusters, Feature Request

May 10 2019

epriestley triaged T13287: Build general healthcheck infrastructure for monitoring services as Low priority.
May 10 2019, 5:45 PM · Clusters, Almanac
epriestley added a subtask for T13286: When nodes in a cluster repository fail, reads are still routed with the same weight and failed reads do not recover: T13287: Build general healthcheck infrastructure for monitoring services.
May 10 2019, 5:45 PM · Clusters, Diffusion
epriestley added parent tasks for T13287: Build general healthcheck infrastructure for monitoring services: T13286: When nodes in a cluster repository fail, reads are still routed with the same weight and failed reads do not recover, T13285: Service failures in JIRA can cascade into service failures in Phabricator.
May 10 2019, 5:45 PM · Clusters, Almanac
epriestley created T13287: Build general healthcheck infrastructure for monitoring services.
May 10 2019, 5:45 PM · Clusters, Almanac
epriestley triaged T13286: When nodes in a cluster repository fail, reads are still routed with the same weight and failed reads do not recover as Normal priority.
May 10 2019, 5:24 PM · Clusters, Diffusion

Apr 15 2019

epriestley moved T13211: Improve intracluster synchronization routing from Backlog to Clusters on the Diffusion board.
Apr 15 2019, 2:45 PM · Clusters, Diffusion

Feb 1 2019

epriestley closed T13192: Inactive repositories can cause "Repository Servers" to always report "Partial Sync?" as Resolved by committing rPf3e154eb02c7: Allow "inactive" repositories to be read over SSH for cluster sync.
Feb 1 2019, 6:12 AM · Clusters, Diffusion

Jan 31 2019

epriestley added a comment to T13192: Inactive repositories can cause "Repository Servers" to always report "Partial Sync?".

See PHI1015 for a slightly meatier explanation of this issue.

Jan 31 2019, 7:47 PM · Clusters, Diffusion
epriestley added a revision to T13192: Inactive repositories can cause "Repository Servers" to always report "Partial Sync?": D20077: Allow "inactive" repositories to be read over SSH for cluster sync.
Jan 31 2019, 7:47 PM · Clusters, Diffusion

Dec 13 2018

epriestley added a comment to T13219: When returning a writable connection as a "r" connection, label it so it can be reused as a "w" connection.

Yeah, that's T10769.

Dec 13 2018, 11:41 PM · Clusters, Infrastructure
joshuaspence added a comment to T13219: When returning a writable connection as a "r" connection, label it so it can be reused as a "w" connection.

I am seeing a similar issue on our install:

[2018-12-13 12:20:12] EXCEPTION: (PhabricatorClusterImproperWriteException) Unable to establish a write-mode connection (to application database "phabricator_repository") because Phabricator is in read-only mode. Whatever you are trying to do does not function correctly in read-only mode. at [<phabricator>/src/infrastructure/storage/lisk/PhabricatorLiskDAO.php:119] arcanist(head=stable, ref.master=d9a4293ae734, ref.stable=45a8d22c74a6), phabricator(head=stable, ref.master=2951694c2737, ref.stable=237a2a190984), phlab(head=master, ref.master=564c60d09ff4), phutil(head=stable, ref.master=dd136d1c3712, ref.stable=414a4c6abb1b)
  #0 PhabricatorLiskDAO::raiseImproperWrite(string) called at [<phabricator>/src/infrastructure/storage/lisk/PhabricatorLiskDAO.php:60]
  #1 PhabricatorLiskDAO::establishLiveConnection(string) called at [<phabricator>/src/infrastructure/storage/lisk/LiskDAO.php:1011]
  #2 LiskDAO::establishConnection(string) called at [<phabricator>/src/applications/repository/stora... (619 more bytes) ... at [<phutil>/src/future/exec/ExecFuture.php:380]
[13-Dec-2018 12:20:12 Etc/UTC] arcanist(head=stable, ref.master=d9a4293ae734, ref.stable=45a8d22c74a6), phabricator(head=stable, ref.master=2951694c2737, ref.stable=237a2a190984), phlab(head=master, ref.master=564c60d09ff4), phutil(head=stable, ref.master=dd136d1c3712, ref.stable=414a4c6abb1b)
[13-Dec-2018 12:20:12 Etc/UTC]   #0 <#3> ExecFuture::resolvex() called at [<phabricator>/src/applications/repository/daemon/PhabricatorRepositoryPullLocalDaemon.php:446]
[13-Dec-2018 12:20:12 Etc/UTC]   #1 phlog(PhutilProxyException) called at [<phabricator>/src/applications/repository/daemon/PhabricatorRepositoryPullLocalDaemon.php:453]
[13-Dec-2018 12:20:12 Etc/UTC]   #2 PhabricatorRepositoryPullLocalDaemon::resolveUpdateFuture(PhabricatorRepository, ExecFuture, integer) called at [<phabricator>/src/applications/repository/daemon/PhabricatorRepositoryPullLocalDaemon.php:222]
[13-Dec-2018 12:20:12 Etc/UTC]   #3 PhabricatorRepositoryPullLocalDaemon::run() called at [<phutil>/src/daemon/PhutilDaemon.php:219]
[13-Dec-2018 12:20:12 Etc/UTC]   #4 PhutilDaemon::execute() called at [<phutil>/scripts/daemon/exec/exec_daemon.php:131]
Dec 13 2018, 11:40 PM · Clusters, Infrastructure
joshuaspence added a comment to T13219: When returning a writable connection as a "r" connection, label it so it can be reused as a "w" connection.

I am seeing a similar issue on our install:

Dec 13 2018, 11:38 PM · Clusters, Infrastructure

Nov 21 2018

epriestley updated the task description for T13219: When returning a writable connection as a "r" connection, label it so it can be reused as a "w" connection.
Nov 21 2018, 4:52 PM · Clusters, Infrastructure
epriestley added a parent task for T13219: When returning a writable connection as a "r" connection, label it so it can be reused as a "w" connection: T11908: Support an "overlay" database connection mode where multiple applications share a single connection.
Nov 21 2018, 4:18 PM · Clusters, Infrastructure
epriestley triaged T13219: When returning a writable connection as a "r" connection, label it so it can be reused as a "w" connection as Low priority.
Nov 21 2018, 4:18 PM · Clusters, Infrastructure

Oct 8 2018

epriestley triaged T13211: Improve intracluster synchronization routing as Normal priority.
Oct 8 2018, 7:29 PM · Clusters, Diffusion

Oct 5 2018

epriestley added a revision to T10884: Sort repository, database and notification services better (by network distance): D19735: Explicitly shuffle nodes before selecting one for cluster sync.
Oct 5 2018, 9:01 PM · Clusters

Sep 6 2018

epriestley added a comment to T10884: Sort repository, database and notification services better (by network distance).

See PHI860 and T13111. In the future, repository nodes may automatically gc/prune/repack. If they do, it may make sense to sort them to the bottom of the list so traffic is sent to them only if no other nodes are available, in order to minimize the impact that gc/prune/repack have on other activity.

Sep 6 2018, 6:43 PM · Clusters
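
The ordering idea in the comment above, combined with the explicit shuffle named in D19735, might look something like the sketch below. It is illustrative only and written in Python rather than Phabricator's PHP: the `Node` type, its `in_maintenance` flag, and `order_nodes` are assumptions made for the example, not existing Phabricator names.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Node:
    name: str
    # Hypothetical flag: True while the node is running gc/prune/repack.
    in_maintenance: bool = False


def order_nodes(
    nodes: Sequence[Node],
    rng: Optional[random.Random] = None,
) -> List[Node]:
    """Shuffle nodes, then move nodes under maintenance to the end."""
    if rng is None:
        rng = random.Random()
    shuffled = list(nodes)
    # Shuffle first so load spreads evenly across healthy nodes.
    rng.shuffle(shuffled)
    # sorted() is stable, so the shuffled order is preserved within
    # each group; False (not in maintenance) sorts before True.
    return sorted(shuffled, key=lambda node: node.in_maintenance)
```

With this ordering, a node that is busy with maintenance still receives traffic, but only when every other node ahead of it in the list has been tried.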

Aug 27 2018

epriestley triaged T13192: Inactive repositories can cause "Repository Servers" to always report "Partial Sync?" as Low priority.
Aug 27 2018, 5:28 PM · Clusters, Diffusion

Jun 5 2018

joshuaspence added a member for Clusters: joshuaspence.
Jun 5 2018, 10:45 PM

Apr 12 2018

epriestley closed T10883: Allow repository cluster nodes to be read-only as Resolved by committing rP6556536d0615: Allow repository cluster bindings to be marked as not "writable", making them….
Apr 12 2018, 11:10 PM · Restricted Project, Diffusion, Clusters
epriestley added a revision to T10883: Allow repository cluster nodes to be read-only: D19357: Allow repository cluster bindings to be marked as not "writable", making them read-only.
Apr 12 2018, 9:09 PM · Restricted Project, Diffusion, Clusters
epriestley added a revision to T10883: Allow repository cluster nodes to be read-only: D19356: Give getAlmanacServiceURI() an "options" parameter to prepare for read-only devices.
Apr 12 2018, 9:00 PM · Restricted Project, Diffusion, Clusters
epriestley added a revision to T10883: Allow repository cluster nodes to be read-only: D19355: Turn the "closed" property on cluster repositories into a nice boolean.
Apr 12 2018, 8:38 PM · Restricted Project, Diffusion, Clusters

Feb 22 2018

epriestley renamed T13089: A full disk on a read replica database host can cause far-reaching request slowness? from A full disk on a read replica can cause far-reaching request slowness? to A full disk on a read replica database host can cause far-reaching request slowness?.
Feb 22 2018, 5:48 PM · Clusters, Infrastructure
epriestley renamed T13089: A full disk on a read replica database host can cause far-reaching request slowness? from A full disk on a read replica can kill everything to A full disk on a read replica can cause far-reaching request slowness?.
Feb 22 2018, 5:48 PM · Clusters, Infrastructure
epriestley triaged T13089: A full disk on a read replica database host can cause far-reaching request slowness? as Low priority.
Feb 22 2018, 4:30 PM · Clusters, Infrastructure

Aug 30 2017

20after4 added a comment to T12965: When no "master" database is configured, the ElasticSearch setup check can fatal.

Whoops this is probably my bad :-/

Aug 30 2017, 1:07 PM · Database, Clusters, Search

Aug 17 2017

epriestley added a revision to T12966: Phabricator should survive a restart and setup checks with an unreachable master: D18442: Fix a possible database ref fatal during MySQL setup checks if a host is unreachable.
Aug 17 2017, 6:35 PM · Clusters
epriestley updated the task description for T12966: Phabricator should survive a restart and setup checks with an unreachable master.
Aug 17 2017, 6:17 PM · Clusters
epriestley added a parent task for T12966: Phabricator should survive a restart and setup checks with an unreachable master: T10769: Read-Only Mode Errata.
Aug 17 2017, 6:15 PM · Clusters
epriestley added a subtask for T10769: Read-Only Mode Errata: T12966: Phabricator should survive a restart and setup checks with an unreachable master.
Aug 17 2017, 6:15 PM · Clusters
epriestley created T12966: Phabricator should survive a restart and setup checks with an unreachable master.
Aug 17 2017, 6:15 PM · Clusters
epriestley added a revision to T12965: When no "master" database is configured, the ElasticSearch setup check can fatal: D18440: Don't fatal in ElasticSearch setup check if no "master" database is configured.
Aug 17 2017, 4:53 PM · Database, Clusters, Search
epriestley created T12965: When no "master" database is configured, the ElasticSearch setup check can fatal.
Aug 17 2017, 4:41 PM · Database, Clusters, Search

Jul 25 2017

epriestley closed T12893: Instance reports consistent bad repository device versions for clustered, observed Git repositories as Resolved by committing rP8034b9d819eb: Don't require a device be registered in Almanac to do cluster init/resync steps.
Jul 25 2017, 12:12 PM · Restricted Project, Clusters, Diffusion, Customer Impact

Jul 24 2017

epriestley added a revision to T12893: Instance reports consistent bad repository device versions for clustered, observed Git repositories: D18273: Don't require a device be registered in Almanac to do cluster init/resync steps.
Jul 24 2017, 6:43 PM · Restricted Project, Clusters, Diffusion, Customer Impact
epriestley added a comment to T12893: Instance reports consistent bad repository device versions for clustered, observed Git repositories.

See PHI14 for priority.

Jul 24 2017, 4:26 PM · Restricted Project, Clusters, Diffusion, Customer Impact

Jul 18 2017

epriestley closed T12927: Private Clusters: VPN Notes as Resolved.

Cool, that all sounds like it's roughly what I expected. Thanks!

Jul 18 2017, 5:37 PM · Clusters, Ops
amckinley added a comment to T12927: Private Clusters: VPN Notes.

Skipping questions where you have the right answer:

Jul 18 2017, 5:09 PM · Clusters, Ops
epriestley added a revision to T10769: Read-Only Mode Errata: D18233: Skip Conduit call log writes in read-only mode, allowing "conduit.ping" to run.
Jul 18 2017, 3:51 PM · Clusters