Microsoft Open-Sources 'pg_durable': Bringing Fault-Tolerant, In-Database Workflows Directly to PostgreSQL
Microsoft has officially introduced pg_durable, a groundbreaking open-source extension for PostgreSQL designed to orchestrate and execute long-running, multi-step SQL workflows entirely inside the database engine. The extension eliminates the historical infrastructure bottleneck where developers had to rely on external codebases, cron jobs, separate worker pools, or message queues to manage complex background data processes.
At the core of pg_durable is its built-in durable checkpointing capability. The extension continuously persists the exact execution state and progress of sequential operations directly to the database disk. If the database crashes, restarts, or encounters a failover midway through a highly complex operation, the runtime automatically recovers and resumes execution precisely from its last recorded checkpoint, rather than forcing the entire pipeline to restart or corrupting active transaction states.
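The checkpoint-and-resume idea can be illustrated with a minimal Python sketch. This is not pg_durable's implementation or API: the `checkpoints` table, the `run_workflow` function, and the `fail_at` parameter are hypothetical names chosen for illustration, and SQLite (from the Python standard library) stands in for PostgreSQL. Each step's work and its checkpoint are committed in the same transaction, so a restart picks up at the first step that never committed.

```python
import sqlite3


def run_workflow(conn, workflow_id, steps, fail_at=None):
    """Run steps in order, committing a checkpoint together with each step.

    `fail_at` simulates a crash just before the step with that index runs.
    Returns the names of the steps that actually executed in this call.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "workflow_id TEXT PRIMARY KEY, next_step INTEGER NOT NULL)"
    )
    row = conn.execute(
        "SELECT next_step FROM checkpoints WHERE workflow_id = ?", (workflow_id,)
    ).fetchone()
    start = row[0] if row else 0  # resume point; 0 for a workflow never seen before

    executed = []
    for index in range(start, len(steps)):
        if fail_at == index:
            raise RuntimeError(f"simulated crash before step {index}")
        name, action = steps[index]
        try:
            action(conn)
            # Record progress in the same transaction as the step's own writes.
            conn.execute(
                "INSERT OR REPLACE INTO checkpoints (workflow_id, next_step) "
                "VALUES (?, ?)",
                (workflow_id, index + 1),
            )
            conn.commit()
        except Exception:
            conn.rollback()  # a failed step leaves no partial work behind
            raise
        executed.append(name)
    return executed


# Hypothetical three-step pipeline; an in-memory database keeps the demo self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (entry TEXT)")
steps = [
    ("chunk", lambda c: c.execute("INSERT INTO log VALUES ('chunk')")),
    ("embed", lambda c: c.execute("INSERT INTO log VALUES ('embed')")),
    ("index", lambda c: c.execute("INSERT INTO log VALUES ('index')")),
]
try:
    run_workflow(conn, "wf-1", steps, fail_at=2)  # "crashes" after two steps
except RuntimeError:
    pass
resumed = run_workflow(conn, "wf-1", steps)  # only the remaining step runs
```

In this sketch the second call skips the two steps that already committed. A real deployment would need storage that survives a process restart, which an in-memory database does not provide.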
The extension targets resource-intensive, asynchronous backend workloads. Common use cases include:
AI & Vector Pipelines: Running text chunking and embedding generation in sequence right after new rows are inserted.
Heavy Data Transformation: Managing mass ETL (Extract, Transform, Load) pipelines, automated data deduplication, or batch-purging records.
System Integrations: Interacting with external web services or firing outbound webhooks to signal data modifications without blocking the transaction that made the change.
While pg_durable serves as the foundational execution layer powering the automated AI pipelines in Microsoft's newly announced Azure HorizonDB managed service, Microsoft has released the extension under the standard permissive PostgreSQL License. This open-source distribution allows database administrators and backend engineers to deploy the extension natively across any self-hosted or cloud-based vanilla PostgreSQL cluster.
According to Microsoft's source code on GitHub, the pg_durable core is written in Rust and runs on Duroxide, Microsoft's own durable task engine (inspired by architectures such as Temporal and the Durable Task Framework). The system uses deterministic replay to rebuild a workflow's state from its recorded history after an interruption. Workflows are composed in a graph DSL built from custom SQL operators, such as ~> for chaining tasks in sequence and & for running tasks in parallel, all within a single SQL statement.
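Deterministic replay, as used by durable task engines in general, can be sketched in a few lines of Python. The `ReplayContext` class and `embed_workflow` function below are hypothetical and are not taken from Duroxide or pg_durable; they only illustrate the general technique. The workflow code is re-run from the top after a restart, but any activity whose result is already in the recorded history returns that result instead of executing again.

```python
class ReplayContext:
    """Returns recorded results during replay; runs and records new activities."""

    def __init__(self, history):
        self.history = history  # recorded (name, result) pairs, oldest first
        self.position = 0       # how far this run has advanced through the history
        self.executed = []      # activities that really ran during this run

    def call(self, name, func, *args):
        if self.position < len(self.history):
            recorded_name, result = self.history[self.position]
            # The workflow must request the same activities in the same order.
            if recorded_name != name:
                raise RuntimeError(
                    f"non-deterministic workflow: expected {recorded_name}, got {name}"
                )
        else:
            result = func(*args)
            self.history.append((name, result))
            self.executed.append(name)
        self.position += 1
        return result


def embed_workflow(ctx, text):
    """Toy pipeline: split text, 'embed' each chunk as its length, store the total."""
    chunks = ctx.call("chunk", lambda t: t.split(), text)
    vectors = ctx.call("embed", lambda cs: [len(c) for c in cs], chunks)
    return ctx.call("store", lambda vs: sum(vs), vectors)


history = []
first = embed_workflow(ReplayContext(history), "durable workflows in postgres")

# Simulate a crash after the first activity by keeping only that part of the history.
partial = history[:1]
ctx = ReplayContext(partial)
second = embed_workflow(ctx, "durable workflows in postgres")
# ctx.executed now lists only the activities that had to run again.
```

The sketch depends on the workflow function being deterministic: given the same history it must issue the same calls in the same order, which is why the context raises an error on a mismatch.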
From a database administrator's perspective, long-running transactions are a serious problem: they hold locks for their full duration and prevent VACUUM from removing dead row versions, which leads to table bloat and slower queries. pg_durable addresses this by breaking a task into smaller steps that background workers execute on demand, with checkpoints recording the state between steps. Progress is also observable: the status of running workflows can be checked with SELECT * FROM df.status().
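The benefit of splitting one long transaction into many short ones can be shown with a small Python sketch of a batch purge. It is a generic illustration, not pg_durable code: the `events` table, its columns, and the `purge_in_batches` function are assumptions made for the example, and SQLite stands in for PostgreSQL. Each batch commits on its own, so no single transaction stays open for the whole job.

```python
import sqlite3


def purge_in_batches(conn, cutoff, batch_size):
    """Delete rows older than `cutoff`, one short transaction per batch.

    Returns the number of non-empty batches that were committed.
    """
    batches = 0
    while True:
        cur = conn.execute(
            "DELETE FROM events WHERE id IN ("
            "SELECT id FROM events WHERE created_at < ? ORDER BY id LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # end this transaction before starting the next batch
        if cur.rowcount == 0:
            return batches  # nothing left to delete
        batches += 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at INTEGER)")
conn.executemany(
    "INSERT INTO events (created_at) VALUES (?)", [(t,) for t in range(1, 11)]
)
conn.commit()

committed = purge_in_batches(conn, cutoff=8, batch_size=3)
remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Because deleted rows are gone once a batch commits, an interrupted purge can simply be started again and will continue with whatever rows are left.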
Source: GitHub