Page MenuHomePhabricator

When no "master" database is configured, the ElasticSearch setup check can fatal
Open, NormalPublic

Description

See PHI36. To reproduce:

  • Configure a master + replica database cluster.
  • Configure MySQL indexing (the default config works).
  • "disable" the master database.
  • Restart the webserver.

Expect:

  • "No master configured" error.

Actual:

  • Fatal:
>>> UNRECOVERABLE FATAL ERROR <<<

Call to a member function getHealthRecord() on a non-object

.../phabricator/src/infrastructure/cluster/search/PhabricatorMySQLSearchHost.php:31


┻━┻ ︵ ¯\_(ツ)_/¯ ︵ ┻━┻

This error arises out of PhabricatorElasticsearchSetupCheck, which calls getAnyHostForRole('read') on each search service. This eventually ends up here, in PhabricatorMySQLSearchHost:

public function getHealthRecord() {
  if (!$this->healthRecord) {
    $ref = PhabricatorDatabaseRef::getMasterDatabaseRefForApplication(
      'search');
    $this->healthRecord = $ref->getHealthRecord();
  }
  return $this->healthRecord;
}

This method is proxying the health check for the underlying database master, but probably should not be. There is no guarantee such a database is configured. The parent class implements a default behavior of giving search services their own health checks, which seems correct. These services should likely have their own health checks.

This is also probably wrong:

public function getConnectionStatus() {
  PhabricatorDatabaseRef::queryAll();
  $ref = PhabricatorDatabaseRef::getMasterDatabaseRefForApplication('search');
  $status = $ref->getConnectionStatus();
  return $status;
}

Initially, it's unclear why we're calling queryAll() without effect (perhaps it throws some exception we care about, but it would be good to get a note here to explain what's going on since it's totally unobvious to me). It's also possible that $ref returns as null. If so, we should probably either return false or raise a tailored exception like PhabricatorClusterNoHostForRoleException.

See T12450.


Since a customer is impacted I'm going to patch this in a very hacky way by simply skipping this check in readonly mode, and hope to revisit it in connection with T12819.