This article mainly helps to deepen the understanding of MySQL and optimize database performance in large data scenarios based on the foundation of learning MySQL basics.
MySQL Architecture

The design of MySQL reflects the idea of layered architecture, and can be divided from top to bottom as follows:
- Connection Layer: Includes various connectors and MySQL shell.
- Service Layer: SQL interface, parser, optimizer, cache, etc.
- Engine Layer: Includes storage engines such as InnoDB and MyISAM.
- Storage Layer: Handles interactions with disks and log files.
Engine Layer
Differences between InnoDB and MyISAM:
- InnoDB is designed for handling large datasets and high concurrency, while MyISAM focuses on performance.
- InnoDB supports row-level locking, transactions, and foreign keys, whereas MyISAM only supports table-level locking.
- InnoDB uses clustered indexes, while MyISAM uses non-clustered indexes as the following content shows.
- The MyISAM storage engine supports spatial data indexes (R-Tree), which can be used for geographic data storage.
MySQL Cache
The smallest unit of interaction between MySQL and disk is a page (default 1 page = 16KB).
The default size of the memory pool (Buffer Pool) is 128MB.
Thus, the default number of pages in the memory pool can be calculated as 128M / 16KB = 8172.
redo.log 和 bin.log
The redo log is the log at the InnoDB storage engine layer, while the binlog is the log recorded at the MySQL server layer. These two logs have some overlapping content (different formats). They can be used to recover data after a MySQL crash and restart.
Execution Process of an Update Statement
- Modify buffer (generate dirty pages).
- Generate corresponding redo.log and bin.log (log-ahead principle).
- Persist redo.log and bin.log.
- Return modification success.
## Transactions
Transactions refer to a set of operations that satisfy the ACID properties (Atomicity, Consistency, Isolation, Durability). A transaction can be committed using COMMIT, or rolled back using ROLLBACK.
MySQL Transaction Execution Process
- Start transaction.
- Apply for lock resources, place exclusive lock on row with id=2.
- Read the data pages to be modified into innodb_buffer_cache.
- Record the data with id=2 into the undo log.
- Record the modified data with id=2 into the redo log buffer.
- Change the name of id=2 in the buffer cache to “test”.
- Commit, trigger two-phase commit (2PC).
- End transaction.
Two-Phase Commit of Redo Logs
The writing of redo logs is split into two steps: prepare and commit. This is called “two-phase commit.”
It mainly ensures that the order of redo log transactions matches the order of binlog transactions (ensured by transaction IDs).
- Prepare phase: Persist redo to disk (redo group commit), and set the rollback segment to prepared status. Binlog does not perform any operations during this phase.
- Commit phase: InnoDB releases locks, releases rollback segments, sets the redo log commit status, persists binlog to disk, and then commits at the storage engine layer.
In simple terms, after executing an update statement, first set the redo log status to prepared, and then set it to committed after the update content is written to binlog.
Redo log persistence strategy configuration:
=0: Do not persist immediately upon transaction commit; instead, let background processes handle it.
=1: …Persist immediately.
=2: …Immediately hand over to operating system cache.
Concurrent Consistency Issues:
- Lost updates.
- Dirty reads.
- Non-repeatable reads.
- Phantom reads.
The following table shows the concurrent consistency issues that difference isolation levels can solve:
| Lost updates | Dirty reads | Non-repeatable reads | Phantom reads | |
|---|---|---|---|---|
| Read Uncommited | ❌ | ❌ | ❌ | ❌ |
| Read Commited | ❌ | ✅ | ❌ | ❌ |
| Repeatable Read | ❌ | ✅ | ✅ | ❌ |
| Serialization | ✅ | ✅ | ✅ | ✅ |
MVCC (Multi-Version Concurrency Control)
Under the Read Uncommitted mode: The transaction connection list can change.
Under the Repeatable Read mode: The pointer to the transaction connection list remains fixed.
Indexes
Explanation: A sorted data structure for fast lookup.
There are two ways to add indexes (index) to a data table:
CREATE INDEXALTER TABLE
Indexes also exist on disk and need to be loaded into memory for retrieval when required.
Advantages and Disadvantages of Indexes
Advantages: Speed up query and sorting.
Disadvantages: Creating an index duplicates pointers to table data, which slows down insertion and update speeds.
Reasons Why Indexes Are Slow
- Poorly written query statements.
- Indexes become ineffective.
- Too many join queries.
Clustered Indexes and Non-Clustered Indexes

Clustered indexes have data nodes as their leaf nodes, while non-clustered indexes still have index nodes, but they point to corresponding data blocks.
Generally, the primary key will automatically create a clustered index, and a table allows only one clustered index.
Index Matching
Best Left Prefix Principle
# Large Table Optimization
Database Logical Design Three Normal Forms:
1NF: Requires attributes to be atomic.
2NF: Requires records to have unique identifiers, i.e., no partial dependencies.
In 2NF, non-primary attributes must fully depend on primary attributes.
3NF: No field can be derived from other fields, no field redundancy, i.e., no transitive dependencies.
Each normal form necessarily satisfies the previous normal form. There are also BCNF and 4NF.
If there are cases where key fields determine key fields, it does not meet BCNF.
### Optimization Object: Data tables
Based on business types, data is generally divided into three categories:
- Flow-type data.
- Stateful data: Multiple business data depend on stateful data, and the correctness of this data must be ensured. For example, when making a payment or recharge, the original balance must be obtained to ensure successful payment.
- Configuration-type data.
Depending on the type of data, the optimization way also varies. Here is a summary:
| Data Growth Situation | Table Type | Business Characteristics | Optimization Core Idea |
|---|---|---|---|
| Millions of records relatively stable data volume | Status table | OLTP business direction | Don’t split if possible; horizontally scale read demand |
| Millions of records may reach billions or even higher | Flow table | OLTP business historical data | Business splitting, distributed storage design |
| Millions of records may reach billions or even higher | Flow table | OLTP business statistical data source | Design distributed expansion for statistical data needs |
| Millions of records shouldn’t have so much data | Configuration table | General business | Small and concise, avoid monolithic |
Vertical and Horizontal Partitioning of Databases and Tables
Vertical partitioning splits a table with many columns into multiple tables.
General partitioning principles:
- Place less commonly used fields in a separate table.
- Split out text, blob, and other large fields into sub-tables.
- Place frequently combined query columns in the same table.
Horizontal partitioning refers to the splitting of rows in a data table. When the number of rows exceeds 2 million, performance decreases. At this point, the data from one table can be split into multiple tables.
Some partitioning principles:
- Modulo partitioning: For example, a user table with 4 million records can be split into four tables to improve query efficiency.
- Part of the business logic can also be split by region, year, and other fields.
Problems Brought by Sharding Databases and Tables
- Transaction consistency issues, leading to distributed transactions.
- Cross-node join queries.
- Cross-node pagination, sorting functions.
- Primary key duplication: Consider using global primary keys.
- Common tables (parameter tables, data dictionary tables) need to be saved in each database, and all updates to common tables need to be executed multiple times.
References:
1、MySQL官网结构图: https://dev.mysql.com/doc/refman/8.0/en/pluggable-storage-overview.html
2、MySQL千万级大表优化,看这一篇就忘不掉了!: https://database.51cto.com/art/202002/610289.htm
3、表的垂直拆分和水平拆分:https://www.kancloud.cn/thinkphp/mysql-design-optimalize/39326
4、MYSQL索引:对聚簇索引和非聚簇索引的认识:https://blog.csdn.net/alexdamiao/article/details/51934917#:~:text=%E8%81%9A%E7%B0%87%E7%B4%A2%E5%BC%95%E6%98%AF%E5%AF%B9,%E5%AD%98%E5%9C%A8%E4%B8%80%E4%B8%AA%E8%81%9A%E7%B0%87%E7%B4%A2%E5%BC%95%E3%80%82
Comments
shortnamefor Disqus. Please set it in_config.yml.