JSON++: The Grammar-Extensible Data Substrate

Introduction: The Case for a Living Grammar

JSON has conquered the world by being simple, readable, and ubiquitous. Yet, as a “frozen format,” it struggles with the complexities of modern systems. We force binary data into base64 strings, repeat the same structural keys across millions of log lines, and rely on external logic to interpret opaque fields. The industry’s response has been a bifurcation: humans use YAML or TOML for config, while machines use Protobuf or Parquet for efficiency. We have lost the middle ground—a format that is both human-navigable and machine-intelligent.

This essay proposes JSON++, not as a new language, but as a grammar-extensible substrate. Rather than a static container, JSON++ acts as a programmable foundation. By treating JSON as a strict subgrammar of JavaScript, we unlock symbolic power, structural compression, and declarative processing without inventing new syntax. We move from passive data storage to an active, programmable substrate—a medium that can adapt to the complexity of the data it carries while maintaining the safety of a side-effect-free environment.

The core constraint is strategic: The JS-Eval Constraint. By ensuring the format remains a valid subset of JavaScript, we leverage the massive global infrastructure of optimized parsers and runtimes. This isn’t just about compatibility; it’s about using the world’s most ubiquitous execution environment as the host for our data. It ensures that both humans and LLMs, already fluent in JavaScript, can interact with the data natively, turning the format into a living bridge between code and configuration.
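
To make the constraint concrete, the sketch below shows one way a host might evaluate a document with the JavaScript engine it already ships with, exposing nothing but a whitelist of pure transforms. The evaluateDocument helper and the upper transform are illustrative inventions, not part of any specification, and the Function constructor alone is not a real sandbox; a production host would add genuine isolation.

type Transform = (...args: unknown[]) => unknown;

// Hypothetical host-side helper: wrap the document in parentheses so the object
// literal is an expression, and bind only the whitelisted transform names.
function evaluateDocument(source: string, transforms: Record<string, Transform>): unknown {
  const names = Object.keys(transforms);
  const impls = names.map((n) => transforms[n]);
  const fn = new Function(...names, `"use strict"; return (${source});`);
  return fn(...impls);
}

// Usage: the document can only call what the host chooses to pass in.
const doc = evaluateDocument(`{ greeting: upper("hello") }`, {
  upper: (s) => String(s).toUpperCase(),
});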

I. The Symbolic Layer: Structural Compression

Standard JSON is verbose. In telemetry, logging, and complex configuration, keys like "user.settings.theme" are repeated thousands of times. Binary formats solve this with field tables, but they sacrifice readability.

JSON++ introduces Symbolic Bindings by leveraging JavaScript’s native assignment and Computed Keys syntax. This creates a duality between a machine-executable format and a human-optimized DSL. Instead of inventing a new @prefix operator, we simply use variables.

The Mechanism

In the Verbose Mode (strictly valid, JS-eval-able code), we use Symbolic Bindings (via let) and explicit Computed Keys (bracket notation). This ensures the data can be parsed by any standard JavaScript engine without modification:

{
  let u = "user.";
  let s = "settings.";

  [u + "id"]: 42,
  [u + "name"]: "Andrew",
  [u + s + "theme"]: "dark"
}

In the Pretty Mode (a human-centric DSL), we elide the let keywords and the brackets for Computed Keys. The parser infers the intent, maintaining the symbolic logic while removing syntactic noise:

{
  u = "user.",
  u + "id": 42,
  u + "name": "Andrew"
}

This approach provides structural compression—deduplicating prefixes and schema patterns—while remaining semantically explicit. It transforms the document from a flat list of values into a graph of relationships, where symbols represent shared context.
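
Either way, the document denotes ordinary JSON. Materializing the verbose example above simply applies the concatenations and yields the flat object:

{
  "user.id": 42,
  "user.name": "Andrew",
  "user.settings.theme": "dark"
}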

II. The Active Layer: Declarative Processing

Data is rarely static; it requires reconstruction. An image is a compressed blob that needs decoding; a secure field is a ciphertext that needs decryption. Standard JSON treats these as dumb strings, pushing the logic into the application code. This creates a “semantic gap” where the data on disk is disconnected from the data in memory.

JSON++ embeds this logic directly into the substrate using Pure Functions. By allowing function calls within the object literal, the data describes its own reconstruction pipeline. This transforms the document from a static tree into an Active Processing Graph. The document doesn’t just store values; it stores the intent of how those values should be materialized.

Compositional Pipelines

The power of JSON++ lies in the nesting of these transforms. Because each function is pure, they can be composed into sophisticated pipelines that handle everything from security to signal processing.

Consider a telemetry packet containing frequency-domain data and a domain-specific log format:

{
  // Signal processing: From raw time-series to frequency spectrum
  spectrum: fft(
    normalize(
      raw_samples("base64_blob...")
    )
  ),

  // Domain-specific decoding: Protobuf-to-JSON within the substrate
  event_log: decode_proto(
    "schema_v1",
    decompress("zstd", "compressed_proto_data...")
  ),

  // Security: Selective decryption of sensitive fields
  pii: decrypt("kms-key-id", "encrypted_blob...")
}

This approach allows the data to carry its own “hydration” logic. A consumer doesn’t need to know how to process the spectrum field; they simply evaluate the graph, and the substrate provides the final, usable representation.

Safety and Determinism

Crucially, this environment is side-effect free. The functions (decrypt, decompress, fft) are pure: they depend only on their inputs and produce deterministic outputs. This safety is fundamental to Declarative Processing; there is no network access, no disk I/O, and no arbitrary code execution.

This turns the file into a Declarative Processing Graph. Because the graph is acyclic and the functions are pure, the evaluation order is deterministic. The substrate can optimize this execution—parallelizing independent branches of the graph or caching the results of expensive transforms like fft. This bridges the gap between storage and memory representation, ensuring that the data is “ready to use” the moment it is parsed.
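
As a sketch of what such an evaluator might look like (the Node shape and helper names below are illustrative, not a specified API), the host resolves the graph bottom-up; because every transform is pure, results can be memoized by function name and arguments:

type Node =
  | { kind: "value"; value: unknown }
  | { kind: "call"; fn: string; args: Node[] }
  | { kind: "object"; entries: [string, Node][] };

function evaluate(
  node: Node,
  transforms: Record<string, (...args: unknown[]) => unknown>,
  cache = new Map<string, unknown>()
): unknown {
  switch (node.kind) {
    case "value":
      return node.value;
    case "object":
      // Evaluate each field independently; branches share no state.
      return Object.fromEntries(
        node.entries.map(([k, v]) => [k, evaluate(v, transforms, cache)])
      );
    case "call": {
      const impl = transforms[node.fn];
      if (!impl) throw new Error(`unknown transform: ${node.fn}`);
      const args = node.args.map((a) => evaluate(a, transforms, cache));
      // Purity makes memoization safe: same inputs always mean the same output.
      const key = node.fn + JSON.stringify(args);
      if (!cache.has(key)) cache.set(key, impl(...args));
      return cache.get(key);
    }
  }
}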

III. Specialized Extensions via Composition

Rather than polluting the grammar with new syntax for every use case (e.g., @array, @encrypted), JSON++ handles these requirements through the unified mechanism of function calls and standard JS features.

1. Dense Numerical Arrays

High-performance computing (HPC) and machine learning (ML) workloads often involve massive vectors and tensors. Representing these as standard JSON arrays (e.g., [0.12, 0.45, ...]) is catastrophically inefficient, incurring massive overhead in both string parsing and memory allocation.

JSON++ addresses this through Zero-Copy Decoding. By using a tensor or buffer function, the substrate can map a binary payload (often base64-encoded or referenced via a URI) directly into a typed memory buffer (like Float32Array in JS). This bypasses the intermediate step of creating thousands of individual JavaScript number objects, allowing the data to be passed directly to GPU kernels or SIMD-optimized routines.

{
  // Zero-copy decoding into a Float32Array for ML inference
  embedding: tensor("float32", [768], base64("..."))
}

This approach ensures that JSON++ remains a viable format for data-intensive applications where performance is non-negotiable, providing a bridge between high-level metadata and low-level binary efficiency.
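
A minimal sketch of the host side of this transform is shown below; the tensor name and argument order mirror the example above, but the implementation details (base64 decoding via atob, little-endian float32 payloads) are assumptions rather than a defined ABI.

// Hypothetical host implementation: decode the payload once, then view it in
// place as a typed array instead of parsing individual numeric literals.
function tensor(dtype: "float32", shape: number[], base64Payload: string): Float32Array {
  if (dtype !== "float32") throw new Error("only float32 is shown in this sketch");
  const bytes = Uint8Array.from(atob(base64Payload), (c) => c.charCodeAt(0));
  const elements = shape.reduce((a, b) => a * b, 1);
  if (bytes.byteLength !== elements * 4) {
    throw new Error("payload length does not match declared shape");
  }
  return new Float32Array(bytes.buffer, 0, elements);
}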

2. Encrypted Fields and Selective Confidentiality

In traditional systems, encryption is often an “all-or-nothing” proposition—either the entire file is encrypted (making it opaque to infrastructure) or it is stored in the clear. JSON++ enables Selective Confidentiality, where sensitive fields are encrypted while the surrounding metadata remains visible for routing, indexing, or auditing.

Within the substrate, encryption is treated as just another transform. There is no need for complex wrapper formats like JWE (JSON Web Encryption). Instead, the decrypt function acts as a declarative instruction. This function accepts metadata—such as key identifiers, algorithm specifications, or nonces—as standard arguments, ensuring that the decryption context is bundled with the ciphertext.

{
  // Public metadata for routing
  region: "us-east-1",
  
  // Selective confidentiality via functional transform
  // The substrate handles key resolution via the provided ID
  pii: decrypt({
    alg: "aes-256-gcm",
    kid: "arn:aws:kms:...",
    iv: "base64_nonce..."
  }, "ciphertext_blob...")
}

This approach turns security into a first-class citizen of the data format. Because the transform is explicit, the substrate can verify that the necessary keys are available before evaluation, or even delegate decryption to a secure enclave (TEE) without the application ever seeing the raw keys.
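
A sketch of how a host might implement this transform is shown below. The keyring lookup, argument shapes, and use of WebCrypto are assumptions made for illustration; they are not a defined JSON++ API.

interface DecryptMeta { alg: "aes-256-gcm"; kid: string; iv: string }

// Hypothetical host-side decrypt: the keyring is injected by the runtime, so
// the document carries only identifiers and ciphertext, never key material.
async function decrypt(
  meta: DecryptMeta,
  ciphertextB64: string,
  resolveKey: (kid: string) => Promise<CryptoKey>
): Promise<string> {
  const key = await resolveKey(meta.kid);
  const iv = Uint8Array.from(atob(meta.iv), (c) => c.charCodeAt(0));
  const ciphertext = Uint8Array.from(atob(ciphertextB64), (c) => c.charCodeAt(0));
  const plaintext = await crypto.subtle.decrypt({ name: "AES-GCM", iv }, key, ciphertext);
  return new TextDecoder().decode(plaintext);
}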

3. Literate Data: Comments as Structured Metadata

Standard JSON’s lack of comments is one of its most cited frustrations, forcing developers to use “dummy keys” like "_comment": "..." which bloat the data and confuse parsers. Because JSON++ is a JavaScript subgrammar, it inherits C-style comments (// and /* */) for free. However, JSON++ treats these not merely as ignored text, but as Literate Data.

This approach allows for self-documenting configurations where the “why” is bundled with the “what.” Comments can serve as a side-channel for metadata—such as schema versions, deprecation warnings, or unit definitions—that provides context to humans and LLMs without altering the machine-executable semantics of the object.

{
  /* 
   * @schema: telemetry-v2
   * @description: High-frequency sensor data for the propulsion module.
   */
  
  // Sampling rate in Hz
  rate: 1000,

  // Symbolic binding for structural compression
  let p = "propulsion.sensor.";

  [p + "temp"]: 45.2, // Celsius
  [p + "pressure"]: 101.3 /* kPa */
}

By formalizing comments as a first-class citizen of the substrate, we enable a “literate” approach to data. Tools can extract these comments to generate documentation or provide IDE tooltips, while the core runtime remains focused on the pure evaluation of the data graph. This ensures that the data remains human-navigable and machine-intelligent, fulfilling the promise of a format that serves both creators and consumers.
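
As one illustration of that tooling (the @-tag convention and the extractAnnotations helper are hypothetical, not part of the grammar), a documentation generator could lift annotations out of block comments without ever evaluating the document:

// Hypothetical tool: collect "@key: value" annotations from /* ... */ comments.
function extractAnnotations(source: string): Record<string, string> {
  const annotations: Record<string, string> = {};
  for (const comment of source.match(/\/\*[\s\S]*?\*\//g) ?? []) {
    for (const line of comment.split("\n")) {
      const m = line.match(/@(\w+):\s*(.+?)\s*(\*\/)?$/);
      if (m) annotations[m[1]] = m[2];
    }
  }
  return annotations;
}

// For the telemetry example above, this yields something like:
// { schema: "telemetry-v2", description: "High-frequency sensor data ..." }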

IV. The Ecosystem: TypeScript as the Spec

The most powerful aspect of this design is that we do not need to write a complex, multi-hundred-page specification. TypeScript is the specification. By defining JSON++ as a subset of JavaScript object literals, we inherit a formal grammar that is already documented, battle-tested, and globally understood.

Formal Grammar and Type Safety

In traditional data formats, the “schema” (like JSON Schema) is a separate, often clunky layer that sits on top of the data. In JSON++, the schema and the specification are unified through TypeScript’s type system. We can define the valid bounds of a JSON++ document using standard type definitions. If a file validates against the ExtendedJSON type, it is a valid document, formalizing the Declarative Processing model within the type system.

type JSONValue = string | number | boolean | null | JSONObject | JSONArray;
type JSONObject = { [key: string]: JSONValue };
type JSONArray = JSONValue[];
type PureFunction = (...args: JSONValue[]) => JSONValue;

// The spec is enforced by the type system
interface ExtendedJSON {
  [key: string]: JSONValue | PureFunction | ExtendedJSON;
}

This provides compile-time safety for data. Errors in structural composition, invalid function calls, or type mismatches are caught before the data is ever evaluated. The specification isn’t a passive document; it’s an active validator.
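
For instance, a team could narrow the generic type to its own documents and let the compiler do the validation; the TelemetryDoc shape below is invented for illustration:

// Hypothetical domain-specific narrowing of the generic substrate type.
interface TelemetryDoc extends ExtendedJSON {
  rate: number;        // sampling rate in Hz
  spectrum: JSONArray; // materialized frequency bins
}

// Omitting rate, or assigning it a string, is rejected at compile time.
const packet: TelemetryDoc = {
  rate: 1000,
  spectrum: [0.1, 0.4, 0.2],
};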

Zero-Cost Tooling: The LSP Advantage

One of the biggest hurdles for any new data format is tooling: syntax highlighting, auto-completion, and linting. By piggybacking on TypeScript, JSON++ gains world-class editor support for free. Because JSON++ files are valid .ts or .js files, every modern editor (VS Code, IntelliJ, Vim via LSP) already knows how to handle them. Developers get syntax highlighting, auto-completion, and linting out of the box, with no bespoke tooling to build or maintain.

The LLM-Native Advantage

Large Language Models are trained on the “logic of the web”—billions of lines of JavaScript. They understand variable scoping, string concatenation, and function composition not as abstract rules, but as fundamental patterns of thought. By aligning JSON++ with JS syntax, we create a format that is LLM-Native.

Unlike static JSON, which forces models into a “data entry” mode prone to repetition errors, JSON++ allows models to use their “reasoning” mode. Symbolic bindings (let variables) act as cognitive anchors, allowing the model to define a concept once and reference it reliably throughout the document. This mirrors the way LLMs perform chain-of-thought reasoning: by breaking down complex structures into named, manageable symbols. The result is a dramatic increase in generation reliability and a significant reduction in token consumption, as the model can “compress” repetitive structures into the same symbolic logic it uses to write code.

Furthermore, because the format is valid JS, LLMs can leverage their existing knowledge of algorithmic patterns to perform complex data transformations within the substrate itself. They don’t just output data; they output the logic of the data, making JSON++ the ideal medium for agentic workflows where models must synthesize, transform, and pass structured information between heterogeneous systems.

Conclusion: From Frozen Formats to Extensible Substrates

The era of the “frozen format” is reaching its limit. As our systems grow more complex and our data more dense, the rigid boundaries of traditional JSON force us into a false choice between human readability and machine efficiency. JSON++ breaks this dichotomy by reimagining data not as a static container, but as a grammar-extensible substrate.

By leveraging the ubiquitous syntax of JavaScript, JSON++ provides a medium that is both a high-level, symbolic DSL for humans and LLMs, and a high-performance, declarative processing graph for machines. It turns the act of data storage into an act of composition—where symbolic bindings eliminate redundancy, pure functions handle complex reconstructions, and the TypeScript ecosystem provides a ready-made specification.

JSON++ doesn’t just store values; it encodes intent. It bridges the gap between the code that processes data and the data itself, creating a living bridge that is as expressive as a programming language yet as safe as a configuration file. By enforcing a side-effect-free environment, it ensures that Declarative Processing remains secure and predictable. In doing so, it provides the universal substrate necessary for the next generation of distributed, intelligent, and truly interoperable systems.