diff --git a/src/docs/contributor/database.diviner b/src/docs/contributor/database.diviner
--- a/src/docs/contributor/database.diviner
+++ b/src/docs/contributor/database.diviner
@@ -28,11 +28,10 @@
 =========
 
 Each Phabricator application has its own database. The names are prefixed by
-`phabricator_` (this is configurable). This design has two advantages:
+`phabricator_` (this is configurable).
 
-  - Each database is easier to comprehend and to maintain.
-  - We don't do cross-database joins so each database can live on its own
-    machine. This gives us flexibility in sharding data later.
+Phabricator uses a separate database for each application. To understand why,
+see @{article:Why does Phabricator need so many databases?}.
 
 Connections
 ===========
diff --git a/src/docs/flavor/so_many_databases.diviner b/src/docs/flavor/so_many_databases.diviner
new file mode 100644
--- /dev/null
+++ b/src/docs/flavor/so_many_databases.diviner
@@ -0,0 +1,131 @@
+@title Why does Phabricator need so many databases?
+@group lore
+
+Phabricator uses about 60 databases (and we may have added more by the time you
+read this document). This sometimes comes as a surprise, since you might assume
+it would only use one database.
+
+The approach we use is designed to work at scale for huge installs with many
+thousands of users. We care a lot about working well for large installs, and
+about scaling up gracefully to meet the needs of growing organizations. We want
+small startups to be able to install Phabricator and have it grow with them as
+they expand to many thousands of employees.
+
+A cost of this approach is that it makes Phabricator more difficult to install
+on shared hosts which require a lot of work to create or authorize access to
+each database. However, Phabricator does a lot of advanced or complex things
+which are difficult to configure or manage on shared hosts, and we don't
+recommend installing it on a shared host. The install documentation explicitly
+discourages installing on shared hosts.
+
+Broadly, in cases where we must choose between operating well at scale for
+growing organizations and installing easily on shared hosts, we prioritize
+operating at scale.
+
+
+Listing Databases
+=================
+
+You can get a full list of the databases Phabricator needs with `bin/storage
+databases`. It will look something like this:
+
+```
+$ /core/lib/phabricator/bin/storage databases
+secure_audit
+secure_calendar
+secure_chatlog
+secure_conduit
+secure_countdown
+secure_daemon
+secure_differential
+secure_draft
+secure_drydock
+secure_feed
+......
+```
+
+Roughly, each application has its own database, and then there are some
+databases which support internal systems or shared infrastructure.
+
+
+Operating at Scale
+==================
+
+This storage design is aimed at large installs that may need more than one
+physical database server to handle the load the install generates.
+
+The primary reason we use a database per application is to allow large installs
+to scale up by spreading database load across more hardware. A large
+organization with many thousands of active users may find themselves limited by
+the capacity of a single database backend.
+
+If so, they can launch a second backend, move some applications over to it, and
+continue piling on more users.
+
+This can't continue forever, but provides a substantial amount of headroom for
+large installs to spread the workload across more hardware and continue scaling
+up.
+
+To make this possible, we put each application in its own database and use
+database boundaries to enforce the logical constraints that the application
+must have in order for this to work. For example, we can not perform joins
+between separable tables, because they may not be on the same hardware.
+
+Establishing boundaries with application databases is a simple, straightforward
+way to partition storage and make administrative operations like spreading load
+realistic.
+
+
+Ease of Development
+===================
+
+This design is also easier for us to work with, and easier for users who
+want to work with the raw database data to understand and interact with.
+
+We have a large number of tables (more than 400) and we can not reasonably
+reduce the number of tables very much (each table generally represents some
+meaningful type of object in some application). It's easier to develop with
+tables which are organized into separate application databases, just like it's
+easier to work with a large project if you organize source files into
+directories.
+
+If you aren't developing Phabricator and never look at the data in the
+database, you probably don't benefit from this organization. However, if you
+are a developer or want to extend Phabricator or look under the hood, it's
+easier to find what you're looking for and work with the tables and data when
+they're organized by application.
+
+
+Databases Have No Cost
+======================
+
+In almost all cases, creating databases has zero cost, just like organizing
+source code into directories has zero cost.
+
+Even if we didn't derive enormous benefits from this approach at scale, there
+is little reason //not// to organize storage like this.
+
+There are a handful of administrative tasks which are very slightly more
+complex to perform on multiple databases, but these are all either automated
+with `bin/storage` or easy to build on top of the list of databases emitted by
+`bin/storage databases`.
+
+For example, you can dump all the databases with `bin/storage dump`, and you
+can destroy all the databases with `bin/storage destroy`.
+
+As mentioned above, an exception to this is that if you're installing on a
+shared host and need to jump through hoops to individually authorize access to
+each database, databases do cost something.
+
+However, this cost is an artificial cost imposed by the selected environment,
+and this is only the first of many issues you'll run into trying to install and
+run Phabricator on a shared host. These issues are why we strongly discourage
+using shared hosts, and recommend against them in the install guide.
+
+
+Next Steps
+==========
+
+Continue by:
+
+  - learning more about databases in @{article:Database Schema}.
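
The new article says that per-database administrative tasks are easy to build on top of the list emitted by `bin/storage databases`. As a rough illustration only, here is a minimal Python sketch that turns a listing in the format shown in the patch into a mapping from application name to database name. The helper name `parse_database_list` and the default `secure_` prefix are assumptions made for this example (the prefix is configurable, and the sample output in the patch happens to use `secure_`); none of this is part of Phabricator.

```python
def parse_database_list(listing, prefix="secure_"):
    """Map application name -> database name.

    `listing` is text in the shape printed by `bin/storage databases`
    in the article: one database name per line. This helper and the
    default prefix are hypothetical, chosen only for illustration.
    """
    databases = {}
    for line in listing.splitlines():
        name = line.strip()
        # Skip blank lines, an echoed shell prompt, and a "......" elision.
        if not name or name.startswith("$") or set(name) == {"."}:
            continue
        # Drop the configurable prefix to recover the application name.
        app = name[len(prefix):] if name.startswith(prefix) else name
        databases[app] = name
    return databases


# A few names copied from the sample output in the article.
sample = """\
secure_audit
secure_calendar
secure_chatlog
"""

databases = parse_database_list(sample)
for app, name in databases.items():
    print(f"{app}: {name}")
```

A script like this could feed each database name to whatever per-database tool an install uses. For whole-install operations the article already points at `bin/storage dump` and `bin/storage destroy`, so a wrapper is only worth writing for tasks those commands do not cover.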