A system must have a column in each table to store a unique primary key ID, and if the system is distributed with multiple distributed databases, it is necessary to ensure that the IDs in each database are not duplicated. This requires the following characteristics for the unique ID:
- The ID must be unique across the entire system.
- The ID must be of numeric type and trend incrementally.
- The ID should be short for fast querying.
There are multiple ways to generate IDs, and big companies definitely use more complex methods. However, for small systems, the following methods are more than sufficient. Let's introduce them one by one.
UUID#
This is the most common solution, generating a UUID using a utility method.
Advantages:
- Simple implementation.
- Generated locally, no performance issues.
- Easy data migration due to globally unique IDs.
Disadvantages:
- The generated IDs are unordered and cannot guarantee trend increment.
- UUID strings have slow query efficiency.
- Large storage space.
- IDs themselves have no business meaning and are not readable.
Use cases:
- Generating tokens for token scenarios.
- Not suitable for scenarios that require trend-incrementing IDs.
MySQL Auto-Increment Primary Key#
This method is also commonly used, with simple configuration. It utilizes MySQL's auto-increment feature, which increments the ID by 1 by default.
Advantages:
- Numeric IDs that increment.
- High query efficiency.
- Some business readability.
Disadvantages:
- Single point of failure - if MySQL goes down, IDs cannot be generated.
- High database load, unable to handle high concurrency.
MySQL Multiple Instance Auto-Increment Primary Key#
Each instance has an initial value of 1, 2, 3...N, with a step size of N (in this case, the step size is 4).
Advantages: Solves the single point of failure issue.
Disadvantages: Once the step size is set, it cannot be expanded; and individual databases have high load, unable to meet high concurrency with their own performance.
Use cases: Scenarios where data does not need to be expanded.
Snowflake Algorithm#
The Snowflake algorithm generates a 64-bit binary positive integer, which is then converted to a decimal number. The 64-bit binary number consists of the following parts:
- 1-bit identifier: always 0.
- 41-bit timestamp: the timestamp difference (current timestamp - start timestamp), where the start timestamp is generally specified by the ID generator in our program.
- 10-bit machine identifier: can be deployed on 1024 nodes. If machines are deployed in different data centers (IDCs), these 10 bits can be composed of 5 bits for the data center ID and 5 bits for the machine ID.
- 12-bit sequence: the count within a millisecond, supporting 4096 ID numbers per millisecond for each node.
Java implementation:
// Code omitted for brevity
Advantages:
- This solution can generate around 4.096 million IDs per second, with fast performance.
- The timestamp is in the high bits and the incrementing sequence is in the low bits, resulting in IDs that are trend-incrementing and ordered by time.
- High flexibility, allowing for bit allocation adjustments based on business requirements.
Disadvantages:
- Relies on the machine's clock. If the server's clock is rolled back, it may result in duplicate IDs.
In distributed scenarios, server clock rollback is often encountered, usually within a range of 10 milliseconds. Some may think that 10 milliseconds is short and can be ignored. However, this algorithm is based on millisecond-level generation, so once a rollback occurs, duplicate IDs are very likely to be generated.
Redis Generation Solution#
Using the atomic operation "incr" in Redis to increment the ID, the algorithm is generally: year + number of days from the start of the year to the current day + day + hour + Redis increment.
Advantages: Incrementing in an ordered manner, with strong readability.
Disadvantages: Occupies bandwidth, requires making requests to Redis each time.
References: