Get rid of static.
Mar 22 2017
Address review feedback that I hadn't gotten to yet.
Note: I'm not sure why Harbormaster is failing.
- Cleaned up the Elasticsearch query and added comments describing the purpose of each clause
- Fixed a couple of bugs found in further testing
Ok I've reworked this quite a bit and I may have messed up somewhere in the process.
Mar 21 2017
In D17384#209987, @epriestley wrote:
Just to make sure I haven't missed anything:
- We currently write health checks but never read them, right? So there's no effect (other than the UI "Status" changing) when a service fails health checks? That seems fine for now, I just want to make sure I didn't miss a health check read somewhere.
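For illustration only, a health-check read would mean consulting that stored status when picking a host, along the lines of this hypothetical sketch (isHealthy() is an assumed method; nothing like this runs today):

  // Hypothetical sketch, not current behavior: consult health state
  // when selecting a host, instead of only showing it in the UI.
  function selectHealthyHost(array $hosts) {
    foreach ($hosts as $host) {
      if ($host->isHealthy()) { // isHealthy() is assumed, not a real API
        return $host;
      }
    }
    // No host passed its checks; fall back to the first one.
    return head($hosts);
  }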
Mar 20 2017
Just to make sure I haven't missed anything:
Mar 17 2017
I'm going to put D17497 + D17498 into this release even though they don't directly tackle the issue here and I think they're slightly risky changes (mostly because git ls-remote may have odd behaviors in some cases, and we don't currently use it in other workflows). But they may help with T12296 and general cluster load issues, and the followup changes will generally be more complicated to reason about (more locking/concurrency stuff), so I think getting these in earlier spreads risk out somewhat even though they're something to watch out for in this release.
Mar 16 2017
- Move the stats definitions into the engine so the status UI remains engine agnostic.
- Fix a bug where role => false was being treated like role => true in the UI
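To illustrate that bug class with a hedged sketch (the actual UI code may differ), checking only that the role key exists treats an explicit false like true:

  // Hedged sketch of the pitfall: presence of the key vs. its value.
  $roles = array('index' => false);

  if (isset($roles['index'])) {
    // Wrong: this branch runs even though the role is explicitly false.
  }

  if (!empty($roles['index'])) {
    // Right: an explicit role => false is treated as disabled.
  }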
I'm pleased to report that this has been live on Wikimedia's Phabricator for about a week without any incidents whatsoever. Additionally, we are in the process of migrating from Elasticsearch 2.x to 5.x, and the ability to write to multiple clusters is working out nicely for the transition.
Mar 15 2017
Probably bump the version unconditionally now
Mar 14 2017
(For example, if the goal was to very aggressively optimize for minimizing network traffic, we could read the entire repository history to figure out which refs were ancestors of other refs first? Then we could drop those and only ask for descendant refs. But this seems crazy, since it's saying that ~20 bytes of network traffic is more costly than like one hundred million disk I/O operations?)
It has to compare "What do I have" vs. "What do you have".
Do git ls-remote (which, curiously, seems to be significantly faster than git fetch even when git fetch is a no-op).
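As a hedged sketch (assuming libphutil's execx() and phutil_split_lines(); the real daemon code surely differs), comparing refs before deciding to fetch might look like:

  // Hedged sketch: list remote refs cheaply with `git ls-remote`,
  // then skip the fetch when nothing changed. $remote_uri is assumed.
  list($stdout) = execx('git ls-remote %s', $remote_uri);

  $remote_refs = array();
  foreach (phutil_split_lines($stdout, false) as $line) {
    list($hash, $ref) = preg_split('/\s+/', $line, 2);
    $remote_refs[$ref] = $hash;
  }
  // If $remote_refs matches the refs we already have locally, the
  // fetch is a no-op and we can skip it entirely.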
I think T12393 and this probably have a similar set of root causes.
Mar 13 2017
Mar 10 2017
- Added index stats to the status UI
- Separated MySQL status from Elasticsearch status, showing a different set of columns appropriate to each cluster type.
Mar 9 2017
- Removed unused health-record code from the PhabricatorSearchCluster class
- Added getDisplayName() back to PhabricatorSearchCluster because it's still needed.
Addressed latest round of feedback.
A few more minor things.
Mar 8 2017
// Keep offset + limit within Elasticsearch's 10,000-result window.
$limit = 10000 - $offset;
- Updated PhabricatorExtraConfigSetupCheck
- Capped result windows at 10,000 results
- Removed unused methods
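For context, Elasticsearch rejects queries where from + size exceeds index.max_result_window (10,000 by default), which is presumably what the clamp above guards against; a hedged sketch, with $requested as a hypothetical input:

  // Hedged sketch: keep from + size inside the default 10,000-result
  // window; $requested and $offset are hypothetical inputs here.
  $limit = max(0, min($requested, 10000 - $offset));
  if (!$limit) {
    // Past the window: return nothing rather than let the cluster
    // reject the query outright.
    return array();
  }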
This is now live on https://phab-01.wmflabs.org for testing. Everything seems to be working well, including the health monitoring.
In D17384#209367, @epriestley wrote:
This is shaping up nicely; a couple of other minor inlines.
This is shaping up nicely; a couple of other minor inlines.
Mar 7 2017
Fix unit test case.
Getting closer...
Mar 4 2017
In D17384#208882, @epriestley wrote:
- Using the same objects as both Host and Service feels confusing to me. I think this would probably be clearer as separate Service and Host classes? Like PhabricatorMySQLSearchClusterService extends PhabricatorSearchClusterService and PhabricatorMySQLSearchClusterHost extends PhabricatorSearchClusterHost or similar. Particularly because setHostRefs() seems like it's getting called with a raw dictionary in one case and a list of objects in another? And then there's weird magic around getHostRefs() for the MySQL case?
I'll split out the changes to the engine if I can figure out how to do that... Update coming soon.
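A minimal sketch of the split suggested above, using the class names from the quote (the bodies are assumed, not the real implementation):

  // Hedged sketch: separate Service and Host hierarchies so each role
  // has its own type, and setHostRefs() always takes typed objects.
  abstract class PhabricatorSearchClusterService {
    private $hosts = array();

    public function setHostRefs(array $hosts) {
      assert_instances_of($hosts, 'PhabricatorSearchClusterHost');
      $this->hosts = $hosts;
      return $this;
    }

    public function getHostRefs() {
      return $this->hosts;
    }
  }

  abstract class PhabricatorSearchClusterHost {}

  final class PhabricatorMySQLSearchClusterService
    extends PhabricatorSearchClusterService {}

  final class PhabricatorMySQLSearchClusterHost
    extends PhabricatorSearchClusterHost {}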
Mar 3 2017
General stuff:
@epriestley: OK, I believe this addresses all of your feedback; other than documentation, it should be very close to finished.
Address @epriestley's feedback about the tooltip and string concatenation
- Fixed up the CLI workflows for search init and search index
- Misc other cleanup
Fix Elasticsearch setup checks
Feb 20 2017
Dec 13 2016
Dec 5 2016
This has been running cleanly in production for roughly two weeks, and appears stable. secure001 stopped writing Files data at F1943597 and we're now at F2078289 on secure003. We saw a couple of minor setup issues (mostly: exception messages not being tailored enough) but no fundamental issues.
Nov 23 2016
Nov 22 2016
This seems to be working now. I'm going to let it sit in production for a while and see if any issues crop up before considering it resolved, but it seems like everything is working smoothly.
I configured 003 to replicate to 004:
Okay, we're headed back into readonly mode shortly to set up replication. I'm going to verify D16916 along the way so there may be some "partitions disagree about life" errors.
Here's what I've done so far:
I'm partitioning secure.phabricator.com now. Things will drop into read-only mode for a bit.
- T11908 is a followup for executing queries for multiple applications on a single connection. I believe the pathway for that is straightforward and fairly short, but that no install would really reap substantial benefits from it today, so I don't expect to pursue it for some time.
- I believe everything else is now complete, so I'll put this in production as soon as everything here lands and we can see what catches on fire.
Nov 21 2016
We have at least one tricky issue remaining: when applying storage upgrades, we currently apply them like this:
Nov 19 2016
General state of the world here:
Nov 16 2016
Nov 13 2016
Nov 12 2016
Nov 8 2016
It's just very important to thoroughly explore the applications.
Sep 29 2016
Sep 27 2016
This went out a while ago and I confirmed the fix in production.
Sep 23 2016
Sep 21 2016
I've banged on this a reasonable amount locally without issues, and the originating instance reports that these patches seem to have calmed things down in production, so it seems like this pretty much just worked.
Sep 20 2016
Sep 8 2016
From reviewing the code, it appears that this should be handled correctly already.
Sep 5 2016
Sep 2 2016
I believe this should be fixed in HEAD of master. It should promote to stable in about 24 hours.