F14351882
D15249.diff
diff --git a/src/docs/contributor/database.diviner b/src/docs/contributor/database.diviner
--- a/src/docs/contributor/database.diviner
+++ b/src/docs/contributor/database.diviner
@@ -28,11 +28,10 @@
=========
Each Phabricator application has its own database. The names are prefixed by
-`phabricator_` (this is configurable). This design has two advantages:
+`phabricator_` (this is configurable).
- - Each database is easier to comprehend and to maintain.
- - We don't do cross-database joins so each database can live on its own
- machine. This gives us flexibility in sharding data later.
+Phabricator uses a separate database for each application. To understand why,
+see @{article:Why does Phabricator need so many databases?}.
Connections
===========
diff --git a/src/docs/flavor/so_many_databases.diviner b/src/docs/flavor/so_many_databases.diviner
new file mode 100644
--- /dev/null
+++ b/src/docs/flavor/so_many_databases.diviner
@@ -0,0 +1,131 @@
+@title Why does Phabricator need so many databases?
+@group lore
+
+Phabricator uses about 60 databases (and we may have added more by the time you
+read this document). This sometimes comes as a surprise, since you might assume
+it would only use one database.
+
+The approach we use is designed to work at scale for huge installs with many
+thousands of users. We care a lot about working well for large installs, and
+about scaling up gracefully to meet the needs of growing organizations. We want
+small startups to be able to install Phabricator and have it grow with them as
+they expand to many thousands of employees.
+
+A cost of this approach is that it makes Phabricator more difficult to install
+on shared hosts which require a lot of work to create or authorize access to
+each database. However, Phabricator does a lot of advanced or complex things
+which are difficult to configure or manage on shared hosts, and we don't
+recommend installing it on a shared host. The install documentation explicitly
+discourages installing on shared hosts.
+
+Broadly, in cases where we must choose between operating well at scale for
+growing organizations and installing easily on shared hosts, we prioritize
+operating at scale.
+
+
+Listing Databases
+=================
+
+You can get a full list of the databases Phabricator needs with `bin/storage
+databases`. It will look something like this:
+
+```
+$ /core/lib/phabricator/bin/storage databases
+secure_audit
+secure_calendar
+secure_chatlog
+secure_conduit
+secure_countdown
+secure_daemon
+secure_differential
+secure_draft
+secure_drydock
+secure_feed
+...<dozens more databases>...
+```
+
+Roughly, each application has its own database, and then there are some
+databases which support internal systems or shared infrastructure.
+
+
+Operating at Scale
+==================
+
+This storage design is aimed at large installs that may need more than one
+physical database server to handle the load the install generates.
+
+The primary reason we use a database per application is to allow large installs
+to scale up by spreading database load across more hardware. A large
+organization with many thousands of active users may find themselves limited by
+the capacity of a single database backend.
+
+If so, they can launch a second backend, move some applications over to it, and
+continue piling on more users.
+
+This can't continue forever, but provides a substantial amount of headroom for
+large installs to spread the workload across more hardware and continue scaling
+up.
+
+To make this possible, we put each application in its own database and use
+database boundaries to enforce the logical constraints that the application
+must have in order for this to work. For example, we cannot perform joins
+between separable tables, because they may not be on the same hardware.
+
+Establishing boundaries with application databases is a simple, straightforward
+way to partition storage and make administrative operations like spreading load
+realistic.
+
+
+Ease of Development
+===================
+
+This design is also easier for us to work with, and easier for users who
+want to work with the raw database data to understand and interact with.
+
+We have a large number of tables (more than 400) and we cannot reasonably
+reduce the number of tables very much (each table generally represents some
+meaningful type of object in some application). It's easier to develop with
+tables which are organized into separate application databases, just like it's
+easier to work with a large project if you organize source files into
+directories.
+
+If you aren't developing Phabricator and never look at the data in the
+database, you probably don't benefit from this organization. However, if you
+are a developer or want to extend Phabricator or look under the hood, it's
+easier to find what you're looking for and work with the tables and data when
+they're organized by application.
+
+
+Databases Have No Cost
+======================
+
+In almost all cases, creating databases has zero cost, just like organizing
+source code into directories has zero cost.
+
+Even if we didn't derive enormous benefits from this approach at scale, there
+is little reason //not// to organize storage like this.
+
+There are a handful of administrative tasks which are very slightly more
+complex to perform on multiple databases, but these are all either automated
+with `bin/storage` or easy to build on top of the list of databases emitted by
+`bin/storage databases`.
+
+For example, you can dump all the databases with `bin/storage dump`, and you
+can destroy all the databases with `bin/storage destroy`.
+
+As mentioned above, an exception to this is that if you're installing on a
+shared host and need to jump through hoops to individually authorize access to
+each database, databases do cost something.
+
+However, this cost is an artificial cost imposed by the selected environment,
+and this is only the first of many issues you'll run into trying to install and
+run Phabricator on a shared host. These issues are why we strongly discourage
+using shared hosts, and recommend against them in the install guide.
+
+
+Next Steps
+==========
+
+Continue by:
+
+  - learning more about databases in @{article:Database Schema}.
D15249: Write "Why does Phabricator need so many databases?"