User:ASarabadani (WMF)/Database for developers toolkit/Concepts/Glossary

Wikimedia-specific

 * Cluster: A group of database hosts in which one primary replicates data to the rest. There are several types of clusters, including "core sections" (s1, s2, ...), "parsercache" (pc1, pc2, ...), "external storage" (es1, es2, ...), "misc" (m1, m2, ...) and "extension" (x1, x2).
 * Section: A type of cluster, sometimes incorrectly called a "shard", that contains the main MediaWiki databases. "s1" has English Wikipedia, "s2" has several large wikis, "s3" has most of the small wikis, "s4" has Wikimedia Commons, "s5" has German Wikipedia along with several small wikis, "s6" has French, Russian and Japanese Wikipedia, "s7" has another set of large wikis (plus the centralauth database) and "s8" has Wikidata.
 * Abstract schema: RDBMS-agnostic database schemas in MediaWiki. See the related RFC and the help page.
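
The section assignments listed above can be pictured as a simple lookup from wiki to section. The following is a minimal sketch for illustration only: it covers just the wikis named in the Section entry, the database names (such as `enwiki`) are assumed to follow the usual Wikimedia naming, and `section_for` is a hypothetical helper, not part of MediaWiki.

```python
from typing import Optional

# Only the wikis explicitly named in the glossary entry are listed here.
# The database names are an assumption based on common Wikimedia naming.
SECTION_OF_WIKI = {
    "enwiki": "s1",        # English Wikipedia
    "commonswiki": "s4",   # Wikimedia Commons
    "dewiki": "s5",        # German Wikipedia
    "frwiki": "s6",        # French Wikipedia
    "ruwiki": "s6",        # Russian Wikipedia
    "jawiki": "s6",        # Japanese Wikipedia
    "wikidatawiki": "s8",  # Wikidata
}


def section_for(wiki: str) -> Optional[str]:
    """Return the section for a wiki, or None if this sketch does not know it."""
    return SECTION_OF_WIKI.get(wiki)


if __name__ == "__main__":
    for name in ("enwiki", "jawiki", "somewiki"):
        print(name, "->", section_for(name))
```

Returning `None` for unknown wikis is a deliberate choice in this sketch: the glossary does not list which wikis live on s2, s3 or s7, so the lookup does not guess.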

Database knowledge

 * Primary database or "master": The source of truth. It is the database that should mostly serve writes, not reads. It replicates its data to the replicas.
 * Replica or "slave": A database that receives its data from the primary through replication and is used for reading.
 * Replication lag: The delay between a change being written on the primary and that change becoming available on a replica. Usually it should be below 1 second.
 * Normalization: Mostly means avoiding repeated strings in storage. For example, revision data stores the user ID instead of repeating the username in every row.
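
The normalization example above can be sketched with an in-memory SQLite database. This is an illustration under stated assumptions: the `users` and `revisions` tables, their columns, and the helper functions are hypothetical and greatly simplified, and they are not the actual MediaWiki schema.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny, hypothetical schema where revisions refer to users by ID."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (
            user_id   INTEGER PRIMARY KEY,
            user_name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE revisions (
            rev_id      INTEGER PRIMARY KEY,
            rev_user    INTEGER NOT NULL REFERENCES users(user_id),
            rev_comment TEXT
        );
        """
    )
    # The username string is stored exactly once.
    conn.execute("INSERT INTO users (user_id, user_name) VALUES (1, 'Alice')")
    # Each revision stores only the numeric user ID.
    conn.executemany(
        "INSERT INTO revisions (rev_id, rev_user, rev_comment) VALUES (?, ?, ?)",
        [(10, 1, "first edit"), (11, 1, "second edit")],
    )
    return conn


def revisions_with_names(conn: sqlite3.Connection) -> list:
    """Join revisions to users to recover the username for each revision."""
    return conn.execute(
        """
        SELECT r.rev_id, u.user_name
        FROM revisions AS r
        JOIN users AS u ON u.user_id = r.rev_user
        ORDER BY r.rev_id
        """
    ).fetchall()


def rename_user(conn: sqlite3.Connection, user_id: int, new_name: str) -> None:
    """Renaming touches one row in users, no matter how many revisions exist."""
    conn.execute(
        "UPDATE users SET user_name = ? WHERE user_id = ?", (new_name, user_id)
    )


if __name__ == "__main__":
    db = build_demo_db()
    print(revisions_with_names(db))
    rename_user(db, 1, "Alicia")
    print(revisions_with_names(db))
```

The trade-off in this sketch is that reading a username requires a join, while storage stays compact and a rename only updates a single row.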