DAG-Based Patch Format Specification

Overview

This patch format is designed for machine consumption and LLM processing. It represents code changes as a directed acyclic graph (DAG) of transform operations rather than as line-based diffs.

Core Architecture

Base Representation

Delta Encoding

Transform Operations

Primitive Operations

Operation Properties

Each operation node contains:

DAG Structure

Dependency Model

Benefits

Auto-Regression Algorithm

Factorization Process

Given a large commit (before/after state):

  1. Whitespace Isolation - Extract all formatting changes first
  2. Pattern Detection - Identify repeated transformations (renames, refactors)
  3. Operation Discovery - Find minimal set of transforms
  4. Dependency Analysis - Build DAG from causal relationships
  5. Optimization - Minimize total description length
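
Step 1 can be illustrated with a minimal sketch. It assumes that a change counts as formatting-only when the two versions yield the same sequence of whitespace-separated tokens. That definition, and the names `is_whitespace_only` and `split_formatting_changes`, are illustrative assumptions and not part of this specification.

```python
from typing import Dict, List, Tuple


def is_whitespace_only(before: str, after: str) -> bool:
    """Return True if the texts differ only in whitespace between tokens."""
    return before != after and before.split() == after.split()


def split_formatting_changes(
    before: Dict[str, str], after: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Partition changed paths into (formatting-only, other) lists."""
    formatting: List[str] = []
    other: List[str] = []
    # Only files present on both sides are compared in this sketch.
    for path in sorted(before.keys() & after.keys()):
        if before[path] == after[path]:
            continue  # unchanged file, nothing to factor out
        if is_whitespace_only(before[path], after[path]):
            formatting.append(path)
        else:
            other.append(path)
    return formatting, other


# Hypothetical before/after snapshots of a two-file commit.
before = {"a.py": "def f():\n  return 1\n", "b.py": "x = 1\n"}
after = {"a.py": "def f():\n    return 1\n", "b.py": "x = 2\n"}
formatting_paths, other_paths = split_formatting_changes(before, after)
```

A token-level comparison like this is deliberately crude: it would misclassify whitespace edits inside string literals, and in Python an indentation change can alter behavior, so a real tool would likely need a language-aware check.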

Optimization Metrics

Use Cases

LLM Integration

Tooling Applications

Version Control

Implementation Considerations

Format Properties

Tool Responsibilities

Format is substrate only. Intelligence lives in tools:

No formal grammar required. Operations are data, not a language to parse.
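
Because operations are plain data, an operation can be built, serialized, and identified with ordinary data-structure tools. The sketch below assumes that an operation's `id` is a SHA-256 digest of its other fields serialized as canonical JSON. The specification does not name a hash function or a serialization, so both choices, and the helper name `operation_id`, are hypothetical.

```python
import hashlib
import json


def operation_id(operation: dict) -> str:
    """Hash an operation's fields (excluding any existing id) into a hex digest."""
    payload = {key: value for key, value in operation.items() if key != "id"}
    # Sorted keys and fixed separators make the digest independent of field order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# An operation is just a dictionary; no parser is involved.
move_op = {
    "type": "move",
    "source": "utils.py:Foo",
    "target": "helpers.py:Foo",
    "depends_on": [],
}
move_op["id"] = operation_id(move_op)
```

Excluding the `id` field from the hashed payload keeps the identifier stable when it is recomputed on an operation that already carries one.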

Example Structure

Patch {
  base_tar: <content-hash>
  operations: [
    {
      id: <hash-1>
      type: "whitespace"
      intent: "normalize-indentation"
      semantic_tags: ["formatting", "non-functional"]
      description: "Normalize Python indentation to 4 spaces"
      author: "system"
      timestamp: "2024-01-01T00:00:00Z"
      confidence: 1.0
      reversible: true
      breaking_change: false
      affected_domains: ["style"]
      scope: ["**/*.py"]
      transform: <normalize-indentation>
      depends_on: []
    },
    {
      id: <hash-2>
      type: "move"
      source: "utils.py:Foo"
      intent: "refactor-module-organization"
      semantic_tags: ["refactoring", "structural"]
      description: "Move Foo class to helpers module for better organization"
      author: "developer"
      timestamp: "2024-01-01T00:00:00Z"
      confidence: 0.95
      reversible: true
      breaking_change: false
      affected_domains: ["architecture", "imports"]
      target: "helpers.py:Foo"
      depends_on: []
    },
    {
      id: <hash-3>
      type: "string_replace"
      pattern: "from utils import Foo"
      intent: "update-imports-after-move"
      semantic_tags: ["refactoring", "import-update"]
      description: "Update import statements to reflect Foo class relocation"
      author: "system"
      timestamp: "2024-01-01T00:00:00Z"
      confidence: 0.98
      reversible: true
      breaking_change: false
      affected_domains: ["imports"]
      replacement: "from helpers import Foo"
      scope: ["src/**/*.py"]
      depends_on: [<hash-2>]
    },
    {
      id: <hash-4>
      type: "binary_delta"
      target: "main.py"
      intent: "logic-update"
      semantic_tags: ["feature", "logic-change"]
      description: "Update main.py logic to use refactored Foo class"
      author: "developer"
      timestamp: "2024-01-01T00:00:00Z"
      confidence: 0.85
      reversible: true
      breaking_change: false
      affected_domains: ["logic", "behavior"]
      delta: <compressed-diff>
      depends_on: [<hash-3>]
    }
  ]
}
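
The `depends_on` fields in the example define the order in which a tool may apply the operations. As a minimal sketch, assuming operations are loaded as dictionaries with `id` and `depends_on` keys, the standard-library `graphlib` module (Python 3.9+) can compute one valid order and reject cyclic input. The function name `application_order` and the literal ids standing in for the hash placeholders are illustrative.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List


def application_order(operations: List[dict]) -> List[str]:
    """Return operation ids so that each id follows everything it depends on."""
    # TopologicalSorter expects a mapping of node -> predecessors.
    graph: Dict[str, List[str]] = {
        op["id"]: list(op["depends_on"]) for op in operations
    }
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # The second element of args holds the detected cycle.
        raise ValueError(f"operations do not form a DAG: {exc.args[1]}") from exc


# Same dependency shape as the example patch, listed out of order on purpose.
operations = [
    {"id": "hash-4", "type": "binary_delta", "depends_on": ["hash-3"]},
    {"id": "hash-3", "type": "string_replace", "depends_on": ["hash-2"]},
    {"id": "hash-2", "type": "move", "depends_on": []},
    {"id": "hash-1", "type": "whitespace", "depends_on": []},
]
order = application_order(operations)
```

In the resulting order the move precedes the import rewrite, which precedes the binary delta. The whitespace operation has no dependencies and nothing depends on it, so its position relative to the others is unconstrained.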

Future Extensions