Changeset View
Standalone View
src/docs/contributor/database.diviner
@title Database Schema | @title Database Schema | ||||
@group developer | @group developer | ||||
This document describes key components of the database schema and should answer | This document describes key components of the database schema and should answer | ||||
questions like how to store new types of data. | questions like how to store new types of data. | ||||
= Database System = | Database System | ||||
=============== | |||||
Phabricator uses MySQL with InnoDB engine. The only exception is the | Phabricator uses MySQL or another MySQL-compatible database (like MariaDB | ||||
or Amazon RDS). | |||||
Phabricator the InnoDB table engine. The only exception is the | |||||
polybuildr: Small typo? "Phabricator uses the"? | |||||
`search_documentfield` table which uses MyISAM because MySQL doesn't support | `search_documentfield` table which uses MyISAM because MySQL doesn't support | ||||
fulltext search in InnoDB. | fulltext search in InnoDB (recent versions do, but we haven't added support | ||||
yet). | |||||
Let us know if you need to use other database system: @{article:Give Feedback! | We are unlikely to ever support other incompatible databases like PostgreSQL or | ||||
Get Support!}. | SQLite. | ||||
Not Done Inline ActionsDoes this need to be caveated at all? (e.g. we *do* support MariaDB and Amazon RDS then?) btrahan: Does this need to be caveated at all? (e.g. we *do* support MariaDB and Amazon RDS then?) | |||||
Not Done Inline ActionsI'll make this more clear, I was mostly missing a term for "we don't support officially it but we kind-of-sort-of support it", like "tacit support" or "grey support". epriestley: I'll make this more clear, I was mostly missing a term for "we don't //support// officially it… | |||||
Not Done Inline Actions"Users have reported that Phabricator works with MariaDB and Amazon RDS." Basically states our relatively-weak technological mastery of these things and has a nice flavor to it...? btrahan: "Users have reported that Phabricator works with MariaDB and Amazon RDS." Basically states our… | |||||
= PHP Drivers = | PHP Drivers | ||||
=========== | |||||
Phabricator supports [[ http://www.php.net/book.mysql | MySQL ]] and | Phabricator supports [[ http://www.php.net/book.mysql | MySQL ]] and | ||||
[[ http://www.php.net/book.mysqli | MySQLi ]] PHP extensions. Most installations | [[ http://www.php.net/book.mysqli | MySQLi ]] PHP extensions. | ||||
use MySQL but MySQLi should work equally well. | |||||
= Databases = | Databases | ||||
========= | |||||
Each Phabricator application has its own database. The names are prefixed by | Each Phabricator application has its own database. The names are prefixed by | ||||
`phabricator_`. This design has two advantages: | `phabricator_` (this is configurable). This design has two advantages: | ||||
* Each database is easier to comprehend and to maintain. | - Each database is easier to comprehend and to maintain. | ||||
* We don't do cross-database joins so each database can live on its own machine | - We don't do cross-database joins so each database can live on its own | ||||
which is useful for load-balancing. | machine. This gives us flexibility in sharding data later. | ||||
= Connections = | Connections | ||||
=========== | |||||
Phabricator specifies if it will use any opened connection just for reading or | Phabricator specifies if it will use any opened connection just for reading or | ||||
also for writing. This allows opening write connections to master and read | also for writing. This allows opening write connections to a primary and read | ||||
connections to slave in master/slave replication. It is useful for | connections to a replica in primary/replica setups (which are not actually | ||||
load-balancing. | supported yet). | ||||
= Tables = | Tables | ||||
====== | |||||
Each table name is prefixed by its application. For example, Differential | Most table names are prefixed by their application names. For example, | ||||
revisions are stored in database `phabricator_differential` and table | Differential revisions are stored in database `phabricator_differential` and | ||||
`differential_revision`. This duplicity allows easy recognition of the table in | table `differential_revision`. This generally makes queries easier to recognize | ||||
DarkConsole (see @{article:Using DarkConsole}) and other places. | and understand. | ||||
The exception is tables which share the same schema over different databases | The exception is a few tables which share the same schema over different | ||||
such as `edge`. | databases such as `edge`. | ||||
We use lower-case table names with words separated by underscores. The reason is | We use lower-case table names with words separated by underscores. | ||||
that MySQL can be configured (with `lower_case_table_names`) to lower-case the | |||||
table names anyway. | |||||
= Column Names = | Column Names | ||||
============ | |||||
Phabricator uses camelCase names for columns. The main advantage is that they | Phabricator uses `camelCase` names for columns. The main advantage is that they | ||||
directly map to properties in PHP classes. | directly map to properties in PHP classes. | ||||
Don't use MySQL reserved words (such as `order`) for column names. | Don't use MySQL reserved words (such as `order`) for column names. | ||||
= Data Types = | Data Types | ||||
========== | |||||
Phabricator uses `int unsigned` columns for storing dates instead of `date` or | |||||
`datetime`. We don't need to care about time-zones in both MySQL and PHP because | |||||
of it. The other reason is that PHP internally uses numbers for storing dates. | |||||
Phabricator uses UTF-8 encoding for storing all text data. We use | Phabricator defines a set of abstract data types (like `uint32`, `epoch`, and | ||||
`utf8_general_ci` collation for free-text and `utf8_bin` for identifiers. | `phid`) which map to MySQL column types. The mapping depends on the MySQL | ||||
version. | |||||
Phabricator uses `utf8mb4` character sets where available (MySQL 5.5 or newer), | |||||
and `binary` character sets in most other cases. The primary motivation is to | |||||
allow 4-byte unicode characters to be stored (the `utf8` character set, which | |||||
is more widely available, does not support them). On newer MySQL, we use | |||||
`utf8mb4` to take advantage of improved collation rules. | |||||
Phabricator stores dates with an `epoch` abstract data type, which maps to | |||||
`int unsigned`. Although this makes dates less readable when browsing the | |||||
database, it makes date and time manipulation more consistent and | |||||
straightforward in the application. | |||||
We don't use the `enum` data type because each change to the list of possible | We don't use the `enum` data type because each change to the list of possible | ||||
values requires altering the table (which is slow with big tables). We use | values requires altering the table (which is slow with big tables). We use | ||||
numbers (or short strings in some cases) mapped to PHP constants instead. | numbers (or short strings in some cases) mapped to PHP constants instead. | ||||
= JSON = | JSON and Other Serialized Data | ||||
============================== | |||||
Some data don't require structured access - you don't need to filter or order by | Some data don't require structured access -- we don't need to filter or order by | ||||
them. We store these data as text fields in JSON format. This approach has | them. We store these data as text fields in JSON format. This approach has | ||||
several advantages: | several advantages: | ||||
* If we decide to add another unstructured field then we don't need to alter the | - If we decide to add another unstructured field then we don't need to alter | ||||
table (which is slow for big tables in MySQL). | the table (which is slow for big tables in MySQL). | ||||
* Table structure is not cluttered by fields which could be unused most of the | - Table structure is not cluttered by fields which could be unused most of the | ||||
time. | time. | ||||
An example of such usage can be found in column | An example of such usage can be found in column | ||||
`differential_diffproperty.data`. | `differential_diffproperty.data`. | ||||
= Primary Keys = | Primary Keys | ||||
============ | |||||
Most tables have auto-increment column named `id`. However creating such column | Most tables have auto-increment column named `id`. Adding an ID column is | ||||
is not required for tables which are not usually directly referenced (such as | appropriate for most tables (even tables that have another natural unique key), | ||||
tables expressing M:N relations). Example of such table is | as it improves consistency and makes it easier to perform generic operations | ||||
`differential_relationship`. | on objects. | ||||
For example, @{class:LiskMigrationIterator} allows you to very easily apply a | |||||
migration to a table using a constant amount of memory provided the table has | |||||
an `id` column. | |||||
= Indexes = | Indexes | ||||
====== | |||||
Create all indexes necessary for fast query execution in most cases. Don't | Create all indexes necessary for fast query execution in most cases. Don't | ||||
create indexes which are not used. You can analyze queries @{article:Using | create indexes which are not used. You can analyze queries @{article:Using | ||||
DarkConsole}. | DarkConsole}. | ||||
Older MySQL versions are not able to use indexes for tuple search: | Older MySQL versions are not able to use indexes for tuple search: | ||||
`(a, b) IN ((%s, %d), (%s, %d))`. Use `AND` and `OR` instead: | `(a, b) IN ((%s, %d), (%s, %d))`. Use `AND` and `OR` instead: | ||||
`((a = %s AND b = %d) OR (a = %s AND b = %d))`. | `((a = %s AND b = %d) OR (a = %s AND b = %d))`. | ||||
= Foreign Keys = | Foreign Keys | ||||
============ | |||||
We don't use InnoDB's foreign keys because our application is so great that | We don't use foreign keys because they're complicated and we haven't experienced | ||||
no inconsistencies can arise. It will just slow us down. | significant issues with data inconsistency that foreign keys could help prevent. | ||||
Empirically, we have witnessed first hand as `ON DELETE CASCADE` relationships | |||||
accidentally destroy huge amounts of data. We may pursue foreign keys | |||||
eventually, but there isn't a strong case for them at the present time. | |||||
= PHIDs = | PHIDs | ||||
===== | |||||
Each globally referencable object in Phabricator has its associated PHID | Each globally referencable object in Phabricator has its associated PHID | ||||
(Phabricator ID) which serves as a global identifier. We use PHIDs for | ("Phabricator ID") which serves as a global identifier, similar to a GUID. | ||||
referencing data in different databases. | We use PHIDs for referencing data in different databases. | ||||
We use both autoincrementing IDs and global PHIDs because each is useful in | We use both autoincrementing IDs and global PHIDs because each is useful in | ||||
different contexts. Autoincrementing IDs are chronologically ordered and allow | different contexts. Autoincrementing IDs are meaningfully ordered and allow | ||||
us to construct short, human-readable object names (like D2258) and URIs. Global | us to construct short, human-readable object names (like `D2258`) and URIs. | ||||
PHIDs allow us to represent relationships between different types of objects in | Global PHIDs allow us to represent relationships between different types of | ||||
a homogeneous way. | objects in a homogeneous way. | ||||
For example, the concept of "subscribers" is more powerfully done with PHIDs | For example, infrastructure like "subscribers" can be implemented easily with | ||||
because we could theoretically have users, projects, teams, and more all as | PHID relationships: different types of objects (users, projects, mailing lists) | ||||
"subscribers" of other objects. Using an ID column we would need to add a | are permitted to subscribe to different types of objects (revisions, tasks, | ||||
"type" column to avoid ID collision; using PHIDs does not require this | etc). Without PHIDs, we would need to add a "type" column to avoid ID collision; | ||||
additional column. | using PHIDs makes implementing features like this simpler. | ||||
= Transactions = | Transactions | ||||
============ | |||||
Transactional code should be written using transactions. Example of such code is | Transactional code should be written using transactions. Example of such code is | ||||
inserting multiple records where one doesn't make sense without the other or | inserting multiple records where one doesn't make sense without the other or | ||||
selecting data later used for update. See chapter in @{class:LiskDAO}. | selecting data later used for update. See chapter in @{class:LiskDAO}. | ||||
= Advanced Features = | Advanced Features | ||||
================= | |||||
We don't use MySQL advanced features such as triggers, stored procedures or | We don't use MySQL advanced features such as triggers, stored procedures or | ||||
events because we like expressing the application logic in PHP more than in SQL. | events because we like expressing the application logic in PHP more than in SQL. | ||||
Some of these features (especially triggers) can also cause big confusion. | Some of these features (especially triggers) can also cause a great deal of | ||||
confusion, and are generally more difficult to debug, profile, version control, | |||||
update, and understand than application code. | |||||
Schema Denormalization | |||||
====================== | |||||
Phabricator uses schema denormalization sparingly. Avoid denormalization unless | |||||
there is a compelling reason (usually, performance) to denormalize. | |||||
Schema Changes and Migrations | |||||
============================= | |||||
To create a new schema change or migration: | |||||
**Create a database patch**. Database patches go in | |||||
`resources/sql/autopatches/`. To change a schema, use a `.sql` file and write | |||||
in SQL. To perform a migration, use a `.php` file and write in PHP. Name your | |||||
file `YYYYMMDD.patchname.ext`. For example, `20141225.christmas.sql`. | |||||
**Keep patches small**. Most schema change statements are not transactional. If | |||||
a patch contains several SQL statements and fails partway through, it normally | |||||
can not be rolled back. When a user tries to apply the patch again later, the | |||||
first statement (which, for example, adds a column) may fail (because the column | |||||
already exists). This can be avoided by keeping patches small (generally, one | |||||
statement per patch). | |||||
**Use namespace and character set variables**. When defining a `.sql` patch, | |||||
you should use these variables instead of hard-coding namespaces or character | |||||
set names: | |||||
Avoiding these advanced features is also good for supporting other database | | Variable | Meaning | Notes | | ||||
systems (which we don't support anyway). | |---|---|---| | ||||
| {$NAMESPACE} | Storage Namespace | Defaults to `phabricator` | | |||||
| {$CHARSET} | Default Charset | Mostly used to specify table charset | | |||||
| {$COLLATE_TEXT} | Text Collation | For most text (case-sensitive) | | |||||
| {$COLLATE_SORT} | Sort Collation | For sortable text (case-insensitive) | | |||||
| {$CHARSET_FULLTEXT} | Fulltext Charset | Specify explicitly for fulltext | | |||||
| {$COLLATE_FULLTEXT} | Fulltext Collate | Specify explicitly for fulltext | | |||||
= Schema Denormalization = | |||||
Phabricator uses schema denormalization for performance reasons sparingly. Try | **Test your patch**. Run `bin/storage upgrade` to test your patch. | ||||
Not Done Inline Actionsany sort of step a la rP915d9a52f085e0ecb3c00208f1a5809761b76fac you think or is that overkill? btrahan: any sort of step a la rP915d9a52f085e0ecb3c00208f1a5809761b76fac you think or is that overkill? | |||||
Not Done Inline ActionsI'm going to make bin/storage adjust warn (but not fail) on that, and then make bin/storage upgrade apply bin/storage adjust after upgrading, so the net effect will be that the issue becomes a lot harder to get wrong and a lot less serious someone does. epriestley: I'm going to make `bin/storage adjust` warn (but not fail) on that, and then make `bin/storage… | |||||
to avoid it if possible. | |||||
= Changing the Schema = | |||||
There are three simple steps to update the schema: | |||||
# Create a `.sql` file in `resources/sql/patches/`. This file should: | |||||
- Contain the appropriate MySQL commands to update the schema. | |||||
- Be named as `YYYYMMDD.patchname.ext`. For example, `20130217.example.sql`. | |||||
- Use `${NAMESPACE}` rather than `phabricator` for database names. | |||||
- Use `COLLATE utf8_bin` for any columns that are to be used as identifiers, | |||||
such as PHID columns. Otherwise, use `COLLATE utf8_general_ci`. | |||||
- Name all indexes so it is possible to delete them later. | |||||
# Edit `src/infrastructure/storage/patch/PhabricatorBuiltinPatchList.php` and | |||||
add your patch to | |||||
@{method@phabricator:PhabricatorBuiltinPatchList::getPatches}. | |||||
# Run `bin/storage upgrade`. | |||||
It is also possible to create more complex patches in PHP for data migration | |||||
(due to schema changes or otherwise.) However, the schema changes themselves | |||||
should be done in separate `.sql` files. Order can be guaranteed by editing | |||||
`src/infrastructure/storage/patch/PhabricatorBuiltinPatchList.php` | |||||
appropriately. | |||||
See the | |||||
[[https://secure.phabricator.com/rPb39175342dc5bee0c2246b05fa277e76a7e96ed3 | |||||
| commit adding policy storage for Paste ]] for a reasonable example of the code | |||||
changes. | |||||
= See Also = | See Also | ||||
======== | |||||
* @{class:LiskDAO} | - @{class:LiskDAO} | ||||
* @{class:PhabricatorPHID} |
Small typo? "Phabricator uses the"?