System Design Interview: MySQL

https://www.quora.com/In-scaling-Memcache-at-Facebook-4-1-Regional-Invalidations-is-there-an-example-of-how-McSqueal-actually-derives-the-keys-to-be-deleted-based-on-an-SQL-statement
Facebook has a technology called wormhole that captures row changes from the MySQL binary logs and publishes them on a message bus. These messages can be used for invalidation.
https://code.fb.com/core-data/wormhole-pub-sub-system-moving-data-through-space-and-time/

At a high level, Wormhole propagates changes issued in one system to all systems that need to reflect those changes – within and across data centers.

One application of Wormhole is to connect our user databases (UDBs) with other services so they can operate based on the most current data. Here are three examples of systems that receive updates via Wormhole:

Caches – Refills and invalidation messages need to be sent to each cache so they stay in sync with their local database and consistent with each other.
Services – Services such as Graph Search that build specialized social indexes need to remain current and receive updates to the underlying data.
Data warehouse – The Data warehouses and bulk processing systems (Hadoop, Hbase, Hive, Oracle) receive streaming, real-time updates instead of relying on periodic database dumps.

The Wormhole system has three primary components:

Producer – Services and web tier use the Wormhole producer library to embed messages that get written to the binary log of the UDBs.
Publisher – The Wormhole publisher tails the binary log, extracts the embedded messages, and makes them available as real-time streams that can be subscribed to.
Consumer – Each interested consumer subscribes to relevant updates.

In order to satisfy a wide variety of use-cases, Wormhole has the following properties:

Data partitioning: On the databases, the user data is partitioned, or sharded, across a large number of machines. Updates are ordered within a shard but not necessarily across shards. This partitioning also isolates failures, allowing the rest of the system to keep working even when one or more storage machines have failed. Wormhole maintains a separate publish-subscribe stream per shard–think parallel wormholes in space.
Rewind in time: To deal with different failures (network, software, and hardware), services need to be able to go back to an earlier data checkpoint and start applying updates from that point onward. Wormhole supports check-pointing, state management, and a rewind feature.

Thanks to all of these properties, we are able to run Wormhole at a huge scale. For example, on the UDB deployment alone, Wormhole processes over 1 trillion messages every day (significantly more than 10 million messages every second). Like any system at Facebook’s scale, Wormhole is engineered to deal with failure of individual components, integrate with monitoring systems, perform automatic remediation, enable capacity planning, automate provisioning and adapt to sudden changes in usage pattern

System Design Interview

Sunday, March 24, 2019

MySQL - System Design

No comments:

Post a Comment