Data validation library for Python enforcing type safety and constraints at system boundaries.
| Question | Answer |
|---|---|
| When to use | Validating external data at system boundaries (APIs, configs, external services) |
| When to skip | Internal data already validated, performance-critical paths |
| Default choice | Pydantic v2 BaseModel for DTOs, BaseSettings for config |
| Key pattern | Validate once at boundary → use plain types (dataclass) internally |
| Boundary | API endpoint, config loading, external service response, file ingestion |
| Feature | Pydantic v1 | Pydantic v2 |
|---|---|---|
| Validators | @validator |
@field_validator |
| Model validators | @root_validator |
@model_validator |
| Settings | from pydantic import BaseSettings |
from pydantic_settings import BaseSettings |
| Serialization | .dict(), .json() |
.model_dump(), .model_dump_json() |
| ORM mode | Config.orm_mode = True |
model_config = ConfigDict(from_attributes=True) |
| Config class | class Config: |
model_config = ConfigDict(...) |
| Regex | Field(regex=...) |
Field(pattern=...) |
# v2 (recommended)
pip install pydantic pydantic-settings
# v1 (legacy)
pip install "pydantic<2"Owner: python-quality-assistant
Applies when: a Python service class or internal domain model inherits from pydantic.BaseModel instead of using dataclass / plain types, in code reached only from already-validated callers (not at an API boundary, queue consumer, file parser, or other untrusted-input ingestion point).
Enforcement: rules/python/boundary-validation-only.yml flags all class X(BaseModel): ... declarations as a first-pass filter. The agent rules out the "actually at a boundary" case by inspecting the file path (e.g. pkg/api/, pkg/handler/, pkg/ingestion/).
Why: Pydantic validation runs on every instantiation. At the system boundary, that cost is justified — incoming JSON could be malformed. Inside the service, the data has already been validated at ingestion; re-validating wastes CPU on every method call and obscures the trust boundary (a reader can't tell which User instances are "raw input" vs "already trusted"). Boundary-only use keeps the validation cost where the trust boundary actually is and makes internal code free of **user.dict() re-validation rituals.
class UserService:
def get_user(self, user_id: int) -> User: # User is a Pydantic BaseModel
user = self._repo.find_by_id(user_id) # validates again on every read
return user# Pydantic at the boundary only
@app.post("/users")
def create_user(request: CreateUserRequest): # validates incoming JSON
user_service.create(request.to_entity()) # internal uses plain User type
# Internal types use dataclass — no validation overhead on trusted data
from dataclasses import dataclass
@dataclass
class User:
id: int
name: str
email: strConstraint: Data MUST be validated exactly once at ingestion.
Rationale: Re-validation wastes CPU cycles and obscures the trust boundary.
Examples:
# [GOOD] - Validate once at ingestion
class EventIngestion:
def ingest(self, raw_events: list[dict]) -> list[Event]:
return [Event(**e) for e in raw_events] # Validate here
class EventProcessor:
def process_events(self, events: list[Event]) -> None:
for event in events:
self._handle(event) # Already validated
# [BAD] - Re-validate already validated data
class UserService:
def create_user(self, user: User) -> None:
validated_user = User(**user.dict()) # Redundant validation
self._repo.save(validated_user)Constraint: Internal domain models MUST use dataclass or plain types, not BaseModel.
Rationale: Avoids validation overhead on trusted data; separates concerns between DTOs and domain entities.
Examples:
# [GOOD] - dataclass for internal domain model
from dataclasses import dataclass
@dataclass
class UserEntity:
id: int
name: str
email: str
class UserDTO(BaseModel): # Pydantic at boundary only
id: int
name: str
# [BAD] - Pydantic for internal domain model
class UserEntity(BaseModel): # Unnecessary validation overhead
id: int
name: strOwner: python-quality-assistant
Applies when: a Pydantic BaseModel field is typed Optional[T] / T | None (intended to be omittable) but has no default value assigned.
Enforcement: rules/python/optional-needs-default.yml flags name: Optional[T] and name: T | None annotations inside BaseModel subclasses that lack a = None or = Field(...) default. The agent confirms intent when the absence is intentional.
Why: Optional[T] is a type-system declaration that values can be None — it does NOT make the field omittable. Pydantic still requires the field at instantiation; callers must explicitly pass name=None. The "field omitted ⇒ default applied" semantic only kicks in when a default is declared. The bug is silent at definition time and explodes as a ValidationError: field required at instantiation — usually in a code path the author thought was optional. Always pair Optional[T] with = None (or = Field(default=...)) when the intent is "may be omitted."
class User(BaseModel):
name: Optional[str] # type says "T or None"; field is still REQUIRED
User() # ValidationError: name field requiredclass User(BaseModel):
name: Optional[str] = None # truly omittable: default kicks in when omitted
User() # OK — name = NoneConstraint: Numeric and string constraints MUST use Field() parameters, not custom validators.
Rationale: Built-in constraints are optimized and generate accurate JSON Schema.
Examples:
# [GOOD] - Use Field() constraints
class Product(BaseModel):
name: str = Field(min_length=1, max_length=100)
price: float = Field(gt=0, le=10000)
sku: str = Field(pattern=r"^[A-Z]{3}-\d{4}$") # v2: pattern, v1: regex
# [BAD] - Custom validator for simple constraints
class Product(BaseModel):
price: float
@field_validator("price")
@classmethod
def validate_price(cls, v):
if v <= 0 or v > 10000:
raise ValueError("...")
return vConstraint: Mutable defaults MUST use Field(default_factory=...).
Rationale: Explicit factory prevents accidental shared state; clarity over implicit safety.
Examples:
# [GOOD] - Explicit default factory
class User(BaseModel):
tags: list[str] = Field(default_factory=list)
# [BAD] - Implicit mutable default (works but unclear)
class User(BaseModel):
tags: list[str] = []Constraint: In Pydantic v2, @field_validator MUST be decorated with @classmethod and include type hints.
Rationale: v2 requires explicit classmethod decorator; type hints enable proper IDE support.
Examples:
# [GOOD] - Pydantic v2 validator
@field_validator("name")
@classmethod
def validate_name(cls, v: str) -> str:
if not v.strip():
raise ValueError("Name cannot be blank")
return v.strip()
# [BAD] - Missing classmethod (v2)
@field_validator("name")
def validate_name(cls, v): # Will fail in v2
return v.strip()Constraint: Data transformation before validation MUST use mode="before" (v2) or pre=True (v1).
Rationale: Ensures transformation happens before type coercion and validation.
Examples:
# [GOOD] - v2 pre-validation
@field_validator("raw_value", mode="before")
@classmethod
def clean_value(cls, v: str) -> str:
return v.strip().lower()
# [GOOD] - v1 pre-validation
@validator("raw_value", pre=True)
def clean_value(cls, v):
return v.strip().lower()Constraint: Validation involving multiple fields MUST use @model_validator (v2) or @root_validator (v1).
Rationale: Field validators only see one field; model validators access all fields.
Examples:
# [GOOD] - v2 model validator
@model_validator(mode="after")
def check_dates(self) -> "DateRange":
if self.start_date > self.end_date:
raise ValueError("start_date must be before end_date")
return self
# [GOOD] - v1 root validator
@root_validator
def check_dates(cls, values):
if values.get("start_date") > values.get("end_date"):
raise ValueError("start_date must be before end_date")
return valuesConstraint: Business rules MUST NOT be implemented in Pydantic validators.
Rationale: Validators are for data format; business rules belong in service layer for testability and reuse.
Examples:
# [GOOD] - Data validation only, business logic in service
class UserInput(BaseModel):
age: int = Field(ge=0, le=150) # Data constraint
email: EmailStr # Format validation
class UserService:
def create_user(self, input: UserInput) -> None:
if input.age < 18: # Business rule
raise BusinessRuleError("User must be 18+")
if not input.email.endswith("@company.com"): # Business rule
raise BusinessRuleError("Must use company email")
# [BAD] - Business logic in validator
class User(BaseModel):
age: int = Field(ge=18) # Business rule masquerading as data validation
@field_validator("email")
@classmethod
def email_must_be_company_domain(cls, v):
if not v.endswith("@company.com"): # Business logic
raise ValueError("Must use company email")
return vConstraint: Read-only DTOs MUST use frozen=True configuration.
Rationale: Prevents accidental mutation after validation; ensures data integrity.
Examples:
# [GOOD] - v2 frozen model
class ReadOnlyUser(BaseModel):
model_config = ConfigDict(frozen=True)
id: int
name: str
# [GOOD] - v1 frozen model
class ReadOnlyUser(BaseModel):
class Config:
frozen = True
# [BAD] - Mutable model allows bypassing validation
user = User(id=1, name="Alice")
user.age = -5 # Invalid value - no error raised!Constraint: validate_assignment=True MUST NOT be used in performance-critical code.
Rationale: Validates on every attribute assignment, causing O(n) validation for n assignments.
Examples:
# [BAD] - Performance issue with validate_assignment
class User(BaseModel):
model_config = ConfigDict(validate_assignment=True)
name: str
for i in range(10000):
user.name = f"User {i}" # Validates 10,000 times!Constraint: v2 code MUST use .model_dump() and .model_dump_json(); v1 code MUST use .dict() and .json().
Rationale: API changed between versions; using wrong methods causes AttributeError.
Examples:
# [GOOD] - v2 serialization
user.model_dump()
user.model_dump_json()
user.model_dump(exclude_unset=True)
# [GOOD] - v1 serialization
user.dict()
user.json()
user.dict(exclude_unset=True)Constraint: API models receiving external input MUST use extra="forbid" to reject unknown fields.
Rationale: Prevents silent acceptance of typos or malicious extra fields.
Examples:
# [GOOD] - Reject unknown fields
class CreateUserRequest(BaseModel):
model_config = ConfigDict(extra="forbid")
name: str
email: str
# Request with typo "emial" will fail instead of being silently ignored
# [BAD] - Allow unknown fields (default)
class CreateUserRequest(BaseModel):
name: str
email: str
# {"name": "Alice", "emial": "typo@example.com"} silently ignores typoConstraint: ValidationError MUST be caught and converted to structured API responses; silent failures are forbidden.
Rationale: Silent failures hide bugs; structured errors enable client-side handling.
Examples:
# [GOOD] - Explicit error handling
try:
user = User(**data)
except ValidationError as e:
logger.error(f"Validation failed: {e.json()}")
raise HTTPException(status_code=400, detail=e.errors())
# [BAD] - Silent failure
try:
user = User(**data)
except ValidationError:
user = None # Bug hidden, None propagatesConstraint: Fields where coercion could cause bugs MUST use Strict* types.
Rationale: Default coercion can produce unexpected results (e.g., "1" → True for bool).
Examples:
# [GOOD] - Strict types prevent coercion surprises
from pydantic import StrictBool, StrictInt
class Config(BaseModel):
enabled: StrictBool # Only accepts True/False
count: StrictInt # Only accepts int, not "123"
# [BAD] - Unexpected coercion
class Config(BaseModel):
enabled: bool
Config(enabled="yes") # Becomes True
Config(enabled="1") # Becomes True
Config(enabled=1) # Becomes TrueConstraint: Datetime fields requiring timezone awareness MUST validate tzinfo is not None.
Rationale: Naive datetimes cause subtle bugs in distributed systems.
Examples:
# [GOOD] - Enforce timezone awareness
class Event(BaseModel):
timestamp: datetime
@field_validator("timestamp")
@classmethod
def ensure_timezone(cls, v: datetime) -> datetime:
if v.tzinfo is None:
raise ValueError("Datetime must be timezone-aware")
return v
# [BAD] - Accept naive datetime
class Event(BaseModel):
timestamp: datetime # No validation, accepts naive datetimeConstraint: In Pydantic v2, BaseSettings MUST be imported from pydantic_settings, not pydantic.
Rationale: Settings functionality was moved to separate package in v2.
Examples:
# [GOOD] - v2 settings
from pydantic_settings import BaseSettings, SettingsConfigDict
class AppConfig(BaseSettings):
model_config = SettingsConfigDict(env_file=".env")
database_url: str
# [GOOD] - v1 settings
from pydantic import BaseSettings
class AppConfig(BaseSettings):
class Config:
env_file = ".env"Constraint: Pydantic validation MUST NOT occur inside tight loops.
Rationale: Validation overhead multiplied by iteration count causes significant latency.
Examples:
# [GOOD] - Validate once, iterate validated data
validated_items = [Item(**item) for item in large_list] # Validate all
for item in validated_items:
process(item)
# [BAD] - Validate inside loop
for item in large_list:
validated = Item(**item) # 10,000 validations!
process(validated)Examples:
from fastapi import FastAPI
from pydantic import BaseModel, Field, field_validator
app = FastAPI()
class CreateUserRequest(BaseModel):
name: str = Field(min_length=1, max_length=100)
email: EmailStr
age: int = Field(ge=18, le=150)
@field_validator("name")
@classmethod
def name_must_not_be_blank(cls, v: str) -> str:
if not v.strip():
raise ValueError("Name cannot be blank")
return v.strip()
@app.post("/users")
def create_user(request: CreateUserRequest):
# FastAPI automatically validates request body
return {"id": 1, "name": request.name}Examples:
# [GOOD] - v2 ORM conversion
class OrderResponse(BaseModel):
model_config = ConfigDict(from_attributes=True)
id: int
status: str
# [GOOD] - v1 ORM conversion
class OrderResponse(BaseModel):
class Config:
orm_mode = TrueExamples:
from enum import Enum
from pydantic import BaseModel, ConfigDict
class Status(str, Enum):
PENDING = "pending"
ACTIVE = "active"
class Order(BaseModel):
model_config = ConfigDict(use_enum_values=True)
status: Status
order.model_dump() # {'status': 'active'} - string, not enum- python-ioc-guide.md - Dependency injection patterns
- python-cli-arguments-guide.md - Configuration management with
BaseSettings - python-logging-guide.md - Logging validation errors