to navigate

to select

to close

On this page

Data Types and Schema Design

Schema Design Principles

Good schema design balances normalization (reduce duplication), query performance (strategic denormalization), and operational simplicity (migrations, backups). Start normalized; denormalize only when profiling proves a bottleneck.

Numeric Types

  CREATE TABLE metrics (
    id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    product_id INT UNSIGNED NOT NULL,
    view_count INT UNSIGNED NOT NULL DEFAULT 0,
    price DECIMAL(10, 2) NOT NULL,
    tax_rate DECIMAL(5, 4) NOT NULL DEFAULT 0.0825,
    weight_kg DOUBLE,
    is_featured TINYINT(1) NOT NULL DEFAULT 0
);

Type	Storage	Use Case
TINYINT	1 byte	Flags, small enums (0–255)
INT / BIGINT	4 / 8 bytes	IDs, counts — use UNSIGNED for non-negative
DECIMAL(p,s)	Variable	Money, exact fractions — never FLOAT for currency
FLOAT / DOUBLE	4 / 8 bytes	Scientific approximations only

  -- Wrong: floating point money
price FLOAT   -- 0.1 + 0.2 != 0.3

-- Right: fixed precision
price DECIMAL(10, 2)

Strings and Text

  CREATE TABLE articles (
    id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    slug VARCHAR(64) NOT NULL,
    title VARCHAR(255) NOT NULL,
    summary VARCHAR(500),
    body MEDIUMTEXT,
    metadata JSON,
    UNIQUE KEY uk_slug (slug),
    FULLTEXT KEY ft_title_body (title, body)
);

Type	Max Length	Use Case
CHAR(n)	Fixed n	Country codes (`US`), fixed tokens
VARCHAR(n)	Up to 65,535 bytes	Names, emails, slugs
TEXT / MEDIUMTEXT / LONGTEXT	64KB / 16MB / 4GB	Long content
JSON	Document storage	Flexible attributes with indexing

Index only the prefix of long VARCHAR columns when necessary:

  CREATE INDEX idx_body_prefix ON articles (body(100));

Date and Time

  CREATE TABLE events (
    id INT UNSIGNED PRIMARY KEY,
    starts_at DATETIME(3) NOT NULL,
    ends_at DATETIME(3) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    timezone VARCHAR(50) DEFAULT 'UTC'
);

Type	Range	Notes
DATE	1000-01-01 to 9999-12-31	Date only
TIME	Time of day	Duration or clock time
DATETIME	1000 to 9999	No timezone conversion — store UTC
TIMESTAMP	1970–2038 (32-bit)	Converts to session timezone

Best practice: store UTC in DATETIME; convert to local timezone in application layer.

ENUM and SET

  CREATE TABLE orders (
    id INT UNSIGNED PRIMARY KEY,
    status ENUM('pending', 'paid', 'shipped', 'delivered', 'cancelled') DEFAULT 'pending',
    flags SET('gift', 'fragile', 'express') DEFAULT ''
);

ENUM is compact but altering values requires DDL. For frequently changing options, use a lookup table:

  CREATE TABLE order_statuses (
    id TINYINT UNSIGNED PRIMARY KEY,
    code VARCHAR(20) NOT NULL UNIQUE,
    label VARCHAR(50) NOT NULL
);

JSON Columns (MySQL 8.0+)

  CREATE TABLE products (
    id INT PRIMARY KEY,
    attributes JSON,
    CHECK (JSON_VALID(attributes))
);

INSERT INTO products VALUES (1, '{"color": "red", "size": "L", "tags": ["sale"]}');

SELECT id FROM products
WHERE attributes->>'$.color' = 'red';

CREATE INDEX idx_color ON products ((CAST(attributes->>'$.color' AS CHAR(20))));

Use JSON for semi-structured data that does not warrant a separate table — not as a replacement for proper relational design.

Constraints

  CREATE TABLE order_items (
    id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    order_id INT UNSIGNED NOT NULL,
    product_id INT UNSIGNED NOT NULL,
    quantity INT UNSIGNED NOT NULL,
    unit_price DECIMAL(10, 2) NOT NULL,
    CONSTRAINT chk_quantity CHECK (quantity > 0 AND quantity <= 999),
    CONSTRAINT fk_order FOREIGN KEY (order_id) REFERENCES orders(id) ON DELETE CASCADE,
    UNIQUE KEY uk_order_product (order_id, product_id)
);

Constraint	Purpose
PRIMARY KEY	Unique row identifier
FOREIGN KEY	Referential integrity
UNIQUE	Alternate key (email, slug)
NOT NULL	Required field
CHECK	Domain validation (8.0.16+)
DEFAULT	Fallback value

Normalization vs Denormalization

Third Normal Form (3NF): every non-key column depends on the key, the whole key, and nothing but the key.

  -- Normalized: no redundant user name on every order
CREATE TABLE orders (
    id INT PRIMARY KEY,
    user_id INT NOT NULL,
    total DECIMAL(10,2),
    FOREIGN KEY (user_id) REFERENCES users(id)
);

Denormalize when reads dominate and joins are expensive:

  -- Denormalized: cache user name on order for reporting speed
ALTER TABLE orders ADD COLUMN user_name VARCHAR(100);
-- Maintain via trigger or application on user update

Index-Friendly Schema Choices

Use smallest integer type for IDs (INT UNSIGNED until you exceed 4 billion rows)
Put high-cardinality columns first in composite indexes
Avoid storing computed values unless they are expensive — use generated columns:

  ALTER TABLE order_items
ADD COLUMN line_total DECIMAL(10,2) AS (quantity * unit_price) STORED;

Best Practices

Use utf8mb4 for all text columns
Add created_at / updated_at to every table
Use soft deletes (deleted_at) when audit history matters
Name tables and columns consistently (snake_case, singular or plural — pick one)
Version-control every schema change

Common Mistakes

Mistake	Impact
VARCHAR(255) everywhere	Wastes index space — size to actual need
No foreign keys “for performance”	Orphan rows, application-enforced integrity fails
Storing dates as strings	Breaks sorting and range queries
Over-normalizing	12-table join for a simple dashboard

Troubleshooting

“Row size too large” — MySQL row limit is ~65KB. Split wide tables or use TEXT/JSON off-page storage.

“Data too long for column” — check actual byte length with multibyte UTF-8:

  SELECT LENGTH('emoji😀'), CHAR_LENGTH('emoji😀');
-- LENGTH = 9 bytes, CHAR_LENGTH = 6 characters

Production Scenario

A multi-tenant SaaS uses tenant_id on every table with composite indexes leading with tenant_id. Row-level security is enforced in application queries (WHERE tenant_id = ?), and foreign keys within tenant scope prevent cross-tenant references. Schema migrations run through Flyway in CI with rollback scripts tested on staging clones.

Thoughtful schema design prevents costly migrations and performance problems as data grows from thousands to billions of rows.

SQL Basics

CRUD Operations

Data Types and Schema Design

Schema Design Principles link

Numeric Types link

Strings and Text link

Date and Time link

ENUM and SET link

JSON Columns (MySQL 8.0+) link

Constraints link

Normalization vs Denormalization link

Index-Friendly Schema Choices link

Best Practices link

Common Mistakes link

Troubleshooting link

Production Scenario link