Database Utilities

The winidjango.src.db package provides high-performance database utilities for Django applications, including bulk operations, model utilities, field introspection, and raw SQL execution.

Bulk Operations

Module: winidjango.src.db.bulk

High-performance bulk operations with automatic chunking, multithreading, and transaction management.

Core Functions

`bulk_create_in_steps()`

Create thousands of model instances efficiently in configurable batches.

Signature:

def bulk_create_in_steps[TModel: Model](
    model: type[TModel],
    bulk: Iterable[TModel],
    step: int = 1000,
) -> list[TModel]:
    ...

Parameters:

model: Django model class to create instances of
bulk: Iterable of model instances to create
step: Batch size (default: 1000)

Returns:

List of created model instances with populated primary keys

Features:

Multithreaded processing of chunks
Atomic transaction wrapping
Memory-efficient chunking
Type-safe with generics

Example:

from winidjango.src.db.bulk import bulk_create_in_steps

# Create 10,000 authors in batches of 1,000
authors = [Author(name=f"Author {i}") for i in range(10000)]
created = bulk_create_in_steps(Author, authors, step=1000)

print(f"Created {len(created)} authors")
print(f"First author PK: {created[0].pk}")

Performance:

~5-10x faster than individual save() calls
Processes batches in parallel threads
Minimal memory footprint

`bulk_update_in_steps()`

Update large datasets efficiently in batches.

Signature:

def bulk_update_in_steps[TModel: Model](
    model: type[TModel],
    bulk: Iterable[TModel],
    update_fields: list[str],
    step: int = 1000,
) -> int:
    ...

Parameters:

model: Django model class
bulk: Iterable of model instances to update
update_fields: List of field names to update (required for safety)
step: Batch size (default: 1000)

Returns:

Total count of updated objects

Example:

from winidjango.src.db.bulk import bulk_update_in_steps

# Update 10,000 authors
authors = Author.objects.all()
for author in authors:
    author.name = author.name.upper()

updated_count = bulk_update_in_steps(
    Author, 
    authors, 
    update_fields=['name'],
    step=1000
)

print(f"Updated {updated_count} authors")

Important:

Must specify update_fields explicitly for safety
Only specified fields are updated in the database
Objects must have primary keys (must be saved first)

`bulk_delete_in_steps()`

Delete model instances in batches with cascade tracking.

Signature:

def bulk_delete_in_steps[TModel: Model](
    model: type[TModel],
    bulk: Iterable[TModel],
    step: int = 1000,
) -> tuple[int, dict[str, int]]:
    ...

Parameters:

model: Django model class
bulk: Iterable of model instances to delete
step: Batch size (default: 1000)

Returns:

Tuple of (total_count, count_by_model_dict)
total_count: Total number of objects deleted
count_by_model_dict: Dictionary mapping model names to deletion counts

Example:

from winidjango.src.db.bulk import bulk_delete_in_steps

authors = Author.objects.filter(name__startswith="Test")
total, by_model = bulk_delete_in_steps(Author, list(authors), step=500)

print(f"Deleted {total} objects total")
for model_name, count in by_model.items():
    print(f"  {model_name}: {count}")

Features:

Tracks cascade deletions across related models
Safe handling of foreign key constraints
Respects Django's on_delete behavior

`bulk_create_bulks_in_steps()`

Create multiple model types in correct dependency order automatically.

Signature:

def bulk_create_bulks_in_steps[TModel: Model](
    bulk_by_class: dict[type[TModel], Iterable[TModel]],
    step: int = 1000,
) -> dict[type[TModel], list[TModel]]:
    ...

Parameters:

bulk_by_class: Dictionary mapping model classes to iterables of instances
step: Batch size (default: 1000)

Returns:

Dictionary mapping model classes to lists of created instances

Features:

Automatic topological sorting based on foreign key dependencies
Creates models in correct order to avoid integrity errors
Handles complex dependency graphs
Returns all created instances with populated PKs

Example:

from winidjango.src.db.bulk import bulk_create_bulks_in_steps

# Create related models - order doesn't matter!
authors = [Author(name=f"Author {i}") for i in range(100)]
books = [Book(title=f"Book {i}", author=authors[
    i % len(authors)]) for i in range(500)
]
reviews = [Review(book=books[i % len(books)], rating=5) for i in range(1000)]

# Provide in any order - library sorts by dependencies
results = bulk_create_bulks_in_steps({
    Review: reviews,      # Depends on Book
    Book: books,          # Depends on Author
    Author: authors,      # No dependencies
})

# Created in order: Author → Book → Review
print(f"Created {len(results[Author])} authors")
print(f"Created {len(results[Book])} books")
print(f"Created {len(results[Review])} reviews")

How it works:

Analyzes foreign key relationships between provided models
Uses graphlib.TopologicalSorter to determine creation order
Creates each model type in dependency order
Ignores self-referential relationships
Raises CycleError if circular dependencies detected

Advanced Functions

`get_differences_between_bulks()`

Compare two bulks of model instances and return their differences.

Signature:

def get_differences_between_bulks(
    bulk1: list[Model],
    bulk2: list[Model],
    fields: list[Field | ForeignObjectRel | GenericForeignKey],
) -> tuple[list[Model], list[Model], list[Model], list[Model]]:
    ...

Parameters:

bulk1: First list of model instances
bulk2: Second list of model instances
fields: List of fields to compare (get with get_fields())

Returns:

4-tuple of:
Objects in bulk1 but not in bulk2
Objects in bulk2 but not in bulk1
Objects in both (from bulk1)
Objects in both (from bulk2)

Example:

from winidjango.src.db.bulk import get_differences_between_bulks
from winidjango.src.db.fields import get_fields

# Compare current database state with new data
current_users = list(User.objects.all())
new_users = [
    User(username=f"user_{i}", email=f"user_{i}@example.com") 
    for i in range(100)
    ]

fields = get_fields(User) to_delete, to_create, unchanged_old, unchanged_new =
get_differences_between_bulks(
    current_users, new_users, fields
)

print(f"To delete: {len(to_delete)}")
print(f"To create: {len(to_create)}")
print(f"Unchanged: {len(unchanged_old)}")

# Perform sync
if to_delete:
    bulk_delete_in_steps(User, to_delete)
if to_create:
    bulk_create_in_steps(User, to_create)

Use Cases:

Data synchronization
Change detection
Incremental updates
Audit trails

How it works:

Hashes instances based on field values
For saved instances: uses primary key
For unsaved instances: uses field values
Preserves original object references

`simulate_bulk_deletion()`

Preview what objects would be deleted without actually deleting them.

Signature:

def simulate_bulk_deletion(
    model_class: type[Model],
    entries: list[Model],
) -> dict[type[Model], set[Model]]:
    ...

Parameters:

model_class: Django model class of entries to delete
entries: List of model instances to simulate deletion for

Returns:

Dictionary mapping model classes to sets of objects that would be deleted

Example:

from winidjango.src.db.bulk import simulate_bulk_deletion, bulk_delete_in_steps

# Get authors to potentially delete
authors = Author.objects.filter(name__startswith="Test")

# Preview deletion impact
deletion_preview = simulate_bulk_deletion(Author, list(authors))

# Show what would be deleted
print("Deletion preview:")
for model, objects in deletion_preview.items():
    print(f"  {model.__name__}: {len(objects)} objects")

# Example output:
#   Author: 10 objects
#   Book: 45 objects (cascade)
#   Review: 230 objects (cascade)

# Confirm with user
if input("Proceed? (y/n): ").lower() == 'y':
    total, by_model = bulk_delete_in_steps(Author, list(authors))
    print(f"Deleted {total} objects")

Features:

Uses Django's Collector for accurate cascade simulation
No database modifications
Shows all cascade effects
Respects on_delete settings

Best Practice: Always simulate deletions before executing them, especially for models with many relationships.

`multi_simulate_bulk_deletion()`

Simulate deletion for multiple model types and aggregate results.

Signature:

def multi_simulate_bulk_deletion(
    entries: dict[type[Model], list[Model]],
) -> dict[type[Model], set[Model]]:
    ...

Parameters:

entries: Dictionary mapping model classes to lists of instances

Returns:

Dictionary mapping model classes to sets of all objects that would be deleted

Example:

from winidjango.src.db.bulk import multi_simulate_bulk_deletion

# Simulate deletion of multiple model types
deletion_preview = multi_simulate_bulk_deletion({
    Author: list(Author.objects.filter(name__startswith="Test")),
    Publisher: list(Publisher.objects.filter(active=False)),
})

# Aggregated results across all deletions
for model, objects in deletion_preview.items():
    print(f"{model.__name__}: {len(objects)} objects")

Configuration

Batch Size

The default batch size is 1000, defined in STANDARD_BULK_SIZE. Adjust based on:

Small batch sizes (100-500):

Complex models with many fields
Models with heavy validation logic
Limited memory environments
High database load

Large batch sizes (2000-5000):

Simple models with few fields
No validation overhead
Ample memory available
Low database load

Example:

# Small objects, large batches
bulk_create_in_steps(SimpleModel, objects, step=5000)

# Complex objects, small batches
bulk_create_in_steps(ComplexModel, objects, step=100)

Performance Characteristics

Multithreading

All bulk operations use multithreading by default:

# 10,000 objects, batch size 1,000 = 10 chunks
# Each chunk processed in parallel thread
bulk_create_in_steps(Model, objects, step=1000)

Performance gain: ~5-10x faster than sequential processing

Memory Usage

Memory usage is proportional to batch size, not total dataset size:

# Only 1,000 objects in memory at a time
# Even when processing millions
bulk_create_in_steps(Model, huge_iterator, step=1000)

Transaction Overhead

Each bulk operation is wrapped in a transaction:

@transaction.atomic  # DON'T add this
def my_function():
    bulk_create_in_steps(...)  # Already atomic

Warning: Nested transactions can cause issues. Remove outer @transaction.atomic decorators.

Model Utilities

Module: winidjango.src.db.models

Utilities for working with Django models, including dependency sorting and hashing.

Functions

`topological_sort_models()`

Sort Django models by foreign key dependencies.

Signature:

def topological_sort_models[TModel: Model](
    models: list[type[TModel]],
) -> list[type[TModel]]:
    ...

Parameters:

models: List of Django model classes to sort

Returns:

List of models sorted in dependency order (dependencies first)

Example:

from winidjango.src.db.models import topological_sort_models

# Unsorted models
models = [Review, Book, Author, Publisher]

# Sort by dependencies
sorted_models = topological_sort_models(models)
# Result: [Author, Publisher, Book, Review]

# Use for creation order
for model in sorted_models:
    # Create instances of model
    pass

Features:

Uses Python's graphlib.TopologicalSorter
Handles complex dependency graphs
Ignores self-referential relationships
Raises CycleError for circular dependencies

Use Cases:

Determining creation order for related models
Planning deletion order (reverse the list)
Database migration ordering
Fixture loading

`hash_model_instance()`

Generate a hash for a model instance based on its field values.

Signature:

def hash_model_instance(
    instance: Model,
    fields: list[Field | ForeignObjectRel | GenericForeignKey],
) -> int:
    ...

Parameters:

instance: Django model instance to hash
fields: List of fields to include in hash

Returns:

Integer hash value

Example:

from winidjango.src.db.models import hash_model_instance
from winidjango.src.db.fields import get_fields

user = User(username="john", email="john@example.com")
fields = get_fields(User)
hash_value = hash_model_instance(user, fields)

# Same field values = same hash
user2 = User(username="john", email="john@example.com")
hash_value2 = hash_model_instance(user2, fields)
assert hash_value == hash_value2

Behavior:

Saved instances (with PK): Hash based on primary key only
Unsaved instances (no PK): Hash based on field values

Use Cases:

Comparing model instances
Detecting duplicates
Change detection
Used internally by get_differences_between_bulks()

Warning: Not cryptographically secure. Use only for comparison purposes.

Classes

`BaseModel`

Abstract base model with common fields and utilities.

Fields:

created_at: DateTimeField, auto-populated on creation
updated_at: DateTimeField, auto-updated on save

Properties:

meta: Type-safe access to model's _meta attribute

Methods:

__str__(): Returns "ModelName(pk)"
__repr__(): Same as __str__()

Example:

from winidjango.src.db.models import BaseModel
from django.db import models

class Article(BaseModel):
    title = models.CharField(max_length=200)
    content = models.TextField()

    class Meta(BaseModel.Meta):
        db_table = "articles"

# Automatically includes created_at and updated_at
article = Article.objects.create(title="Hello", content="World")
print(article.created_at)  # 2024-01-01 12:00:00
print(article.updated_at)  # 2024-01-01 12:00:00
print(article)  # "Article(1)"

# Update automatically updates updated_at
article.title = "Updated"
article.save()
print(article.updated_at)  # 2024-01-01 12:05:00

Benefits:

Consistent timestamp tracking across all models
Type-safe meta access
Standardized string representation
Reduces boilerplate code

Note: Must inherit BaseModel.Meta in your model's Meta class to maintain abstract status.

Field Utilities

Module: winidjango.src.db.fields

Utilities for introspecting and working with Django model fields.

Functions

`get_fields()`

Get all fields from a Django model including relationships.

Signature:

def get_fields[TModel: Model](
    model: type[TModel],
) -> list[Field | ForeignObjectRel | GenericForeignKey]:
    ...

Parameters:

model: Django model class

Returns:

List of all field objects including:
Regular fields (CharField, IntegerField, etc.)
Foreign keys (ForeignKey, OneToOneField)
Reverse relationships (ForeignObjectRel)
Generic foreign keys (GenericForeignKey)

Example:

from winidjango.src.db.fields import get_fields

fields = get_fields(User)
for field in fields:
    if hasattr(field, 'name'):
        print(f"{field.name}: {type(field).__name__}")

# Output:
# id: AutoField
# username: CharField
# email: EmailField
# groups: ManyToManyField
# ...

`get_field_names()`

Extract field names from a list of field objects.

Signature:

def get_field_names(
    fields: list[Field | ForeignObjectRel | GenericForeignKey],
) -> list[str]:
    ...

Parameters:

fields: List of field objects (from get_fields())

Returns:

List of field names as strings

Example:

from winidjango.src.db.fields import get_fields, get_field_names

fields = get_fields(User)
field_names = get_field_names(fields)
print(field_names)
# ['id', 'username', 'email', 'first_name', 'last_name', ...]

`get_model_meta()`

Get the Django model metadata options object.

Signature:

def get_model_meta(model: type[Model]) -> Options[Model]:
    ...

Parameters:

model: Django model class

Returns:

Model's _meta attribute (Options object)

Example:

from winidjango.src.db.fields import get_model_meta

meta = get_model_meta(User)
print(meta.db_table)  # 'auth_user'
print(meta.app_label)  # 'auth'
print(meta.verbose_name)  # 'user'

Use Cases:

Accessing model metadata
Getting table name
Inspecting model configuration
Building dynamic queries

SQL Utilities

Module: winidjango.src.db.sql

Utilities for executing raw SQL queries safely.

Functions

`execute_sql()`

Execute raw SQL query with safe parameter binding.

Signature:

def execute_sql(
    sql: str,
    params: dict[str, Any] | None = None,
) -> tuple[list[str], list[Any]]:
    ...

Parameters:

sql: SQL query string (can contain parameter placeholders)
params: Dictionary of parameters for safe binding (optional)

Returns:

Tuple of (column_names, rows)
column_names: List of column names
rows: List of result rows (each row is a tuple)

Example:

from winidjango.src.db.sql import execute_sql

# Simple query
columns, rows = execute_sql("SELECT id, username FROM auth_user LIMIT 10")
print(columns)  # ['id', 'username']
for row in rows:
    print(f"ID: {row[0]}, Username: {row[1]}")

# Query with parameters (safe from SQL injection)
columns, rows = execute_sql(
    "SELECT id, username FROM auth_user WHERE is_active = %(active)s",
    params={"active": True}
)

# Complex query
sql = """
    SELECT
        u.id,
        u.username,
        COUNT(a.id) as article_count
    FROM auth_user u
    LEFT JOIN articles a ON a.author_id = u.id
    WHERE u.date_joined >= %(since)s
    GROUP BY u.id, u.username
    ORDER BY article_count DESC
"""
columns, rows = execute_sql(sql, params={"since": "2024-01-01"})

Features:

Automatic cursor management
Safe parameter binding (prevents SQL injection)
Returns column names for easy result processing
Works with Django's default database connection

Best Practices:

Always use parameter binding for user input
Use Django ORM when possible
Reserve for complex queries not expressible in ORM
Consider using raw() for model-based queries

Security:

# DON'T do this (SQL injection risk)
username = request.GET.get('username')
execute_sql(f"SELECT * FROM users WHERE username = '{username}'")

# DO this (safe parameter binding)
username = request.GET.get('username')
execute_sql(
    "SELECT * FROM users WHERE username = %(username)s",
    params={"username": username}
)

Complete Example

Here's a complete example combining multiple utilities:

from winidjango.src.db.bulk import (
    bulk_create_bulks_in_steps,
    simulate_bulk_deletion,
    bulk_delete_in_steps,
    get_differences_between_bulks,
)
from winidjango.src.db.fields import get_fields
from winidjango.src.db.models import BaseModel, topological_sort_models
from django.db import models

# Define models using BaseModel
class Author(BaseModel):
    name = models.CharField(max_length=100)

    class Meta(BaseModel.Meta):
        db_table = "authors"

class Book(BaseModel):
    title = models.CharField(max_length=200)
    author = models.ForeignKey(Author, on_delete=models.CASCADE)

    class Meta(BaseModel.Meta):
        db_table = "books"

# Create data with automatic dependency resolution
authors = [Author(name=f"Author {i}") for i in range(100)]
books = [
    Book(title=f"Book {i}", author=authors[i % len(authors)]) 
    for i in range(500)
]

results = bulk_create_bulks_in_steps({
    Book: books,      # Provided in any order
    Author: authors,  # Library sorts by dependencies
})

print(f"Created {len(results[Author])} authors")
print(f"Created {len(results[Book])} books")

# Compare with new data
new_authors = [Author(name=f"Updated Author {i}") for i in range(50)]
fields = get_fields(Author)

to_delete, to_create, unchanged_old, unchanged_new =
get_differences_between_bulks(
    list(Author.objects.all()), new_authors, fields
)

# Preview deletion impact
if to_delete:
    deletion_preview = simulate_bulk_deletion(Author, to_delete)
    print("\nDeletion preview:")
    for model, objects in deletion_preview.items():
        print(f"  {model.__name__}: {len(objects)} objects")

    # Confirm and delete
    if input("Proceed? (y/n): ").lower() == 'y':
        total, by_model = bulk_delete_in_steps(Author, to_delete)
        print(f"\nDeleted {total} objects")

# Create new authors
if to_create:
    created = bulk_create_in_steps(Author, to_create)
    print(f"Created {len(created)} new authors")

Database Utilities

Table of Contents

Bulk Operations

Core Functions

`bulk_create_in_steps()`

`bulk_update_in_steps()`

`bulk_delete_in_steps()`

`bulk_create_bulks_in_steps()`

Advanced Functions

`get_differences_between_bulks()`

`simulate_bulk_deletion()`

`multi_simulate_bulk_deletion()`

Configuration

Batch Size

Performance Characteristics

Multithreading

Memory Usage

Transaction Overhead

Model Utilities

Functions

`topological_sort_models()`

`hash_model_instance()`

Classes

`BaseModel`

Field Utilities

Functions

`get_fields()`

`get_field_names()`

`get_model_meta()`

SQL Utilities

Functions

`execute_sql()`

Complete Example

See Also