Changeset View
Standalone View
src/infrastructure/cluster/PhabricatorClusterSearchRef.php
- This file was added.
<?php | |||||
abstract class PhabricatorClusterSearchRef | |||||
extends Phobject { | |||||
const KEY_REFS = 'cluster.search.refs'; | |||||
const KEY_HEALTH = 'cluster.search.health'; | |||||
protected $healthRecord; | |||||
protected $roles = array(); | |||||
protected $disabled; | |||||
protected $host; | |||||
protected $port; | |||||
protected $hostRefs; | |||||
const STATUS_OKAY = 'okay'; | |||||
const STATUS_FAIL = 'fail'; | |||||
public function setDisabled($disabled) { | |||||
$this->disabled = $disabled; | |||||
return $this; | |||||
} | |||||
epriestley: Slightly simpler as:
```
$this->roles = array_fuse($roles);
```
...which is just shorthand… | |||||
Done Inline ActionsAh yes, I forgot about that one. libphutil is full of handy helper functions :) 20after4: Ah yes, I forgot about that one. libphutil is full of handy helper functions :) | |||||
public function getDisabled() { | |||||
return $this->disabled; | |||||
} | |||||
public static function getConnectionStatusMap() { | |||||
return array( | |||||
self::STATUS_OKAY => array( | |||||
'icon' => 'fa-exchange', | |||||
'color' => 'green', | |||||
'label' => pht('Okay'), | |||||
), | |||||
self::STATUS_FAIL => array( | |||||
'icon' => 'fa-times', | |||||
'color' => 'red', | |||||
'label' => pht('Failed'), | |||||
), | |||||
); | |||||
} | |||||
public function isWritable() { | |||||
return $this->hasRole('write'); | |||||
} | |||||
public function isReadable() { | |||||
return $this->hasRole('read'); | |||||
} | |||||
public function hasRole($role) { | |||||
return isset($this->roles[$role]) && $this->roles[$role] === true; | |||||
} | |||||
public function setRoles(array $roles) { | |||||
foreach ($roles as $role => $val) { | |||||
$this->roles[$role] = $val; | |||||
} | |||||
return $this; | |||||
} | |||||
public function getRoles() { | |||||
return $this->roles; | |||||
} | |||||
public function setPort($value) { | |||||
$this->port = $value; | |||||
return $this; | |||||
} | |||||
public function getPort() { | |||||
return $this->port; | |||||
Done Inline ActionsMaybe offer a default config, instead of a default $refs? Most installs probably won't configure this, which means that we won't run most of the code in this method most of the time. If the default is a config instead of an actual object result, we'll run the code more often and there's a better chance we can catch errors with it. It would also be harder to make a change later in this method, forget to change the default case properly, and end up with a bug. epriestley: Maybe offer a default config, instead of a default `$refs`? Most installs probably won't… | |||||
Done Inline ActionsYeah I agree. This was kind of a hacky solution that I planned to revisit. 20after4: Yeah I agree. This was kind of a hacky solution that I planned to revisit. | |||||
} | |||||
public function setHost($value) { | |||||
$this->host = $value; | |||||
return $this; | |||||
} | |||||
public function getHost() { | |||||
return $this->host; | |||||
} | |||||
public function setHostRefs(array $refs) { | |||||
$this->hostRefs = $refs; | |||||
return $this; | |||||
} | |||||
/** @return PhabricatorClusterSearchRef */ | |||||
public function getHostForRole($role) { | |||||
$hosts = $this->getAllHostsForRole($role); | |||||
$random = array_rand($hosts); | |||||
return $hosts[$random]; | |||||
} | |||||
public function getAllHostsForRole($role) { | |||||
$hosts = array(); | |||||
foreach ($this->hostRefs as $host) { | |||||
if ($host->hasRole($role)) { | |||||
$hosts[] = $host; | |||||
} | |||||
} | |||||
return $hosts; | |||||
} | |||||
public function getHealthRecordCacheKey() { | |||||
$host = $this->getHost(); | |||||
$port = $this->getPort(); | |||||
$key = self::KEY_HEALTH; | |||||
return "{$key}({$host}, {$port})"; | |||||
} | |||||
/** | |||||
* @return PhabricatorClusterServiceHealthRecord | |||||
*/ | |||||
public function getHealthRecord() { | |||||
if (!$this->healthRecord) { | |||||
$this->healthRecord = new PhabricatorClusterServiceHealthRecord( | |||||
$this->getHealthRecordCacheKey()); | |||||
} | |||||
return $this->healthRecord; | |||||
} | |||||
public function didHealthCheck($reachable) { | |||||
$record = $this->getHealthRecord(); | |||||
$should_check = $record->getShouldCheck(); | |||||
if ($should_check) { | |||||
$record->didHealthCheck($reachable); | |||||
} | |||||
} | |||||
Done Inline ActionsJust write all of these out, this is impossible to grep for and there are only like 5 of them. This theoretically exposes us to weird bugs where you somehow get a bad config to call some setImportantSecurityFlag(false) or something. There's like zero risk here, but if we adopted this pattern in general I think it would bite us eventually. epriestley: Just write all of these out, this is impossible to grep for and there are only like 5 of them. | |||||
Done Inline ActionsAgreed.. This also performs badly (call_user_func can't be optimized very well by the runtime...) 20after4: Agreed.. This also performs badly (call_user_func can't be optimized very well by the runtime... | |||||
Not Done Inline ActionsSo I moved this to setConfig() in the service instance, and I list each property explicitly instead of fancy dynamic shenanigans. 20after4: So I moved this to setConfig() in the service instance, and I list each property explicitly… | |||||
/** @return PhabricatorClusterSearchRef[] */ | |||||
public static function getLiveServers() { | |||||
$cache = PhabricatorCaches::getRequestCache(); | |||||
Done Inline ActionsI think it's probably clearer not to try to index these servers by role earlier -- just loop over them here and pick out the ones you want. The cost recompute the list is tiny and doing the setup stuff makes dealing with the list a bit more involved without much of a benefit, I think. (If there was a benefit to caching this list because identifying servers in a particular role was expensive, it would probably be better to build that cache here, so we only have to build it if we actually need it. Most pages don't need to identify writable servers, for instance.) epriestley: I think it's probably clearer not to try to index these servers by role earlier -- just loop… | |||||
Done Inline ActionsYeah, I still fall victim to premature optimization occasionally. 20after4: Yeah, I still fall victim to premature optimization occasionally. | |||||
$refs = $cache->getKey(self::KEY_REFS); | |||||
if (!$refs) { | |||||
$refs = self::newRefs(); | |||||
$cache->setKey(self::KEY_REFS, $refs); | |||||
} | |||||
return $refs; | |||||
} | |||||
/** | |||||
* @return PhabricatorClusterSearchRef[] | |||||
*/ | |||||
public static function newRefs() { | |||||
$services = PhabricatorEnv::getEnvConfig('cluster.search'); | |||||
$types = id(new PhutilClassMapQuery()) | |||||
->setAncestorClass(__CLASS__) | |||||
Not Done Inline ActionsYou might just not have gotten this far yet, but I think Service vs Host need to stay separate all the way through, so this code looks more like: $services = self::getServicesForRole('write'); foreach ($services as $service) { $hosts = $service->getHosts(); foreach ($hosts as $host) { // ... } } (Or the host stuff goes in the $service or something like that.) But if you don't do this, it will be hard to implement this rule, I think:
As written, I think this loop will triple-write to ElasticSearch if there are three writable hosts. epriestley: You might just not have gotten this far yet, but I think `Service` vs `Host` need to stay… | |||||
Not Done Inline ActionsYes, I just hadn't gotten there yet. I should have mentioned it in my planned changes. 20after4: Yes, I just hadn't gotten there yet. I should have mentioned it in my planned changes. | |||||
->setUniqueMethod('getEngineIdentifier') | |||||
->execute(); | |||||
$refs = array(); | |||||
foreach ($services as $config) { | |||||
if (!isset($types[$config['type']])) { | |||||
throw new Exception(pht('Configured search server type is invalid: %s', | |||||
$config['type'])); | |||||
} | |||||
$type = $types[$config['type']]; | |||||
$service = clone($type); | |||||
$service->setConfig($config); | |||||
if (is_array($config['hosts']) && count($config['hosts'])) { | |||||
$hosts = array(); | |||||
foreach ($config['hosts'] as $host) { | |||||
$hostref = clone($service); | |||||
$hostref->setPort(idx($host, 'port')) | |||||
->setHost(idx($host, 'host')) | |||||
->setRoles(idx($host, 'roles', array())); | |||||
$hosts[] = $hostref; | |||||
} | |||||
$service->setHostRefs($hosts); | |||||
} | |||||
$refs[] = $service; | |||||
} | |||||
return $refs; | |||||
} | |||||
/** | |||||
* @return PhabricatorFulltextStorageEngine | |||||
*/ | |||||
abstract public function getEngine(); | |||||
abstract public function loadServerStatus(); | |||||
public static function reindexAbstractDocument( | |||||
PhabricatorSearchAbstractDocument $doc) { | |||||
$services = self::getLiveServers(); | |||||
$indexed = 0; | |||||
foreach ($services as $service) { | |||||
20after4AuthorUnsubmitted Not Done Inline ActionsWith the correct config, this should make it possible to do what @epriestley outlined above:
20after4: With the correct config, this should make it possible to do what @epriestley outlined above:
>… | |||||
$host = $service->getHostForRole('write'); | |||||
if ($host) { | |||||
$host->getEngine()->reindexAbstractDocument($doc); | |||||
$indexed++; | |||||
} | |||||
} | |||||
if ($indexed == 0) { | |||||
throw new Exception( | |||||
pht('No hosts were available to index fulltext document: %s', | |||||
$doc->getDocumentTitle())); | |||||
} | |||||
} | |||||
public static function executeSearch(PhabricatorSavedQuery $query) { | |||||
$services = self::getLiveServers(); | |||||
foreach ($services as $service) { | |||||
$hosts = $service->getAllHostsForRole('read'); | |||||
// try all hosts until one succeeds | |||||
foreach ($hosts as $host) { | |||||
$last_exception = null; | |||||
try { | |||||
$res = $host->getEngine()->executeSearch($query); | |||||
20after4AuthorUnsubmitted Not Done Inline ActionsI need to make the timeout configurable so that we can set the elasticsearch timeout to be shorter than the front-end web request. Then it'll be possible to get results even if one host is overloaded and taking forever to respond. 20after4: I need to make the timeout configurable so that we can set the elasticsearch timeout to be… | |||||
// return immediately if we get results without an exception | |||||
return $res; | |||||
} catch (Exception $ex) { | |||||
// try each server in turn, only throw if none succeed | |||||
$last_exception = $ex; | |||||
} | |||||
} | |||||
} | |||||
if ($last_exception) { | |||||
throw $last_exception; | |||||
} | |||||
return $res; | |||||
} | |||||
} |
Slightly simpler as:
...which is just shorthand for this loop.