Hashing & Merkle Trees

This page specifies, byte for byte, what CORE-M commits to the blockchain. Three independent implementations — the telemetry service, the verification service, and any third-party verifier — must compute the identical hash from the same inputs, so the algorithm is fully deterministic and described here exactly. If your implementation diverges by even one byte, verification will fail by design.

The canonical data hash

Every telemetry point is reduced to a single 32-byte hash using double SHA-256 — SHA-256 applied twice. This is the same construction used at every internal node of the Merkle tree (see below), so the whole tree is uniform:

leaf_hash = SHA256( SHA256( preimage ) )

The preimage is the byte concatenation, in this exact order:

#	Component	Encoding	Details
1	`version`	1 byte	Constant `0x01` — the hash-scheme version.
2	`tenant_id_len`	2 bytes, uint16 big-endian	Byte-length of the UTF-8 `tenant_id`.
3	`tenant_id`	UTF-8 bytes	The `tenant_id` string, no null terminator.
4	`device_id_len`	2 bytes, uint16 big-endian	Byte-length of the UTF-8 `device_id`.
5	`device_id`	UTF-8 bytes	The `device_id` string, no null terminator.
6	`timestamp`	8 bytes, uint64 big-endian	Unix epoch time in whole seconds. Any sub-second part (for example nanoseconds) is truncated, not rounded.
7	`payload`	UTF-8 bytes	The values object serialized with RFC 8785 JCS (below). No length prefix.

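tenant_id is part of the hash: a verifier that omits it computes a different hash, which is why the verify/raw API requires tenant_id. The two uint16 length prefixes make the boundary between tenant_id and device_id unambiguous, so two different tenant/device pairs can never produce the same preimage. The prefixes also cap each identifier at 65,535 UTF-8 bytes. The payload needs no length prefix because it is the final component: everything before it is either fixed-width or length-prefixed, so the payload is simply the remaining bytes.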
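The component table above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: the function names `build_preimage` and `leaf_hash` and the constant `HASH_SCHEME_VERSION` are hypothetical, and the payload is assumed to be already canonicalized to JCS bytes by the caller.

```python
import hashlib
import struct

HASH_SCHEME_VERSION = 0x01  # hypothetical constant name for the version byte


def build_preimage(tenant_id: str, device_id: str, timestamp: int, payload: bytes) -> bytes:
    """Concatenate the seven preimage components in table order."""
    tenant = tenant_id.encode("utf-8")
    device = device_id.encode("utf-8")
    return b"".join([
        struct.pack(">B", HASH_SCHEME_VERSION),  # 1: version, 1 byte
        struct.pack(">H", len(tenant)),          # 2: tenant_id_len, uint16 big-endian
        tenant,                                  # 3: tenant_id, UTF-8
        struct.pack(">H", len(device)),          # 4: device_id_len, uint16 big-endian
        device,                                  # 5: device_id, UTF-8
        struct.pack(">Q", timestamp),            # 6: timestamp, uint64 big-endian
        payload,                                 # 7: JCS payload bytes, no length prefix
    ])


def leaf_hash(preimage: bytes) -> bytes:
    """Double SHA-256: SHA256(SHA256(preimage))."""
    return hashlib.sha256(hashlib.sha256(preimage).digest()).digest()
```

In this sketch `struct.pack(">H", ...)` raises `struct.error` for an identifier longer than 65,535 bytes, which is one way to surface an out-of-range length early. The length prefixes are also what keep `("ab", "c")` and `("a", "bc")` from colliding: the two pairs produce different preimages even though their concatenated identifier bytes are the same.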

RFC 8785 JCS rules

The payload is canonicalized with the RFC 8785 JSON Canonicalization Scheme before it is hashed, so that logically identical payloads always serialize to the same bytes:

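Object keys are sorted by their UTF-16 code units, as RFC 8785 specifies. For keys made only of Basic Multilingual Plane characters (which includes all ASCII keys) this is the same as sorting by Unicode code point.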
No whitespace between any tokens.
Numbers are serialized per JCS rules, which follow ECMAScript number-to-string conversion: no trailing zeros, no superfluous decimal point, shortest round-tripping form (so 22.50 becomes 22.5 and 65.0 becomes 65).
The resulting JSON string is encoded as UTF-8.

Because keys are sorted deterministically, the order in which a device emits its fields does not matter: {"temperature":22.5,"humidity":65} and {"humidity":65,"temperature":22.5} both canonicalize to exactly the same bytes and therefore hash to the same value.
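For simple payloads the effect of these rules can be approximated with Python's standard `json` module. The sketch below is an assumption-laden shortcut, not a conforming RFC 8785 implementation: `canonicalize_simple` is a hypothetical name, and the output only matches JCS when keys are ASCII and numbers are integers or floats whose Python and ECMAScript string forms agree. Python prints `1.0` where JCS requires `1`, for example, so a production verifier should use a real JCS library.

```python
import json


def canonicalize_simple(values: dict) -> bytes:
    """Approximate JCS for ASCII keys and 'simple' numbers only.

    sort_keys orders keys, separators removes all whitespace,
    ensure_ascii=False keeps non-ASCII characters as raw UTF-8,
    and allow_nan=False rejects NaN/Infinity, which JSON cannot represent.
    """
    text = json.dumps(
        values,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
        allow_nan=False,
    )
    return text.encode("utf-8")


# Field order on the device does not affect the canonical bytes.
a = canonicalize_simple({"temperature": 22.5, "humidity": 65})
b = canonicalize_simple({"humidity": 65, "temperature": 22.5})
```

Both `a` and `b` are the bytes of `{"humidity":65,"temperature":22.5}`.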

Worked hash example

Take a concrete point:

Field	Value
`tenant_id`	`T1`
`device_id`	`D1`
`timestamp`	`1711000000` (Unix seconds)
values	`{"temperature":22.5,"humidity":65}`
→ JCS payload	`{"humidity":65,"temperature":22.5}`
→ preimage (hex)	`0100025431000244310000000065fbc9c07b2268756d6964697479223a36352c2274656d7065726174757265223a32322e357d`
→ leaf_hash (hex)	`9774d8af1f5e5a0457275ec82b482e0e1b46f6fc421720a279d0e661f99755d2`

Reading the preimage left to right: 01 (version) · 0002 (tenant_id length) · 5431 ("T1") · 0002 (device_id length) · 4431 ("D1") · 0000000065fbc9c0 (timestamp) · then the UTF-8 bytes of the JCS payload. The 32-byte leaf_hash is SHA256(SHA256(preimage)) — the value that feeds the Merkle tree and, ultimately, the on-chain commitment.
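The same left-to-right reading can be done mechanically, which is a handy way to debug a mismatching hash. The following is a small sketch; `parse_preimage` is a hypothetical helper, not part of any CORE-M API, and it assumes a well-formed version-1 preimage.

```python
import struct


def parse_preimage(preimage: bytes) -> dict:
    """Split a version-1 preimage back into its components."""
    version = preimage[0]
    offset = 1

    (tenant_len,) = struct.unpack_from(">H", preimage, offset)
    offset += 2
    tenant_id = preimage[offset:offset + tenant_len].decode("utf-8")
    offset += tenant_len

    (device_len,) = struct.unpack_from(">H", preimage, offset)
    offset += 2
    device_id = preimage[offset:offset + device_len].decode("utf-8")
    offset += device_len

    (timestamp,) = struct.unpack_from(">Q", preimage, offset)
    offset += 8

    # The payload has no length prefix: it is everything that remains.
    payload = preimage[offset:].decode("utf-8")

    return {
        "version": version,
        "tenant_id": tenant_id,
        "device_id": device_id,
        "timestamp": timestamp,
        "payload": payload,
    }


example = bytes.fromhex(
    "0100025431000244310000000065fbc9c0"
    "7b2268756d6964697479223a36352c2274656d7065726174757265223a32322e357d"
)
fields = parse_preimage(example)
```

Here `fields` holds version 1, tenant `T1`, device `D1`, timestamp 1711000000 and the JCS payload string from the table above.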

Merkle tree construction

Anchoring one transaction per point would be slow and costly, so a batch of point hashes is committed together through a binary double-SHA-256 Merkle tree. Only the 32-byte root goes on-chain; each point keeps a short Merkle path that proves its membership.

Construction rules:

The batch’s leaf_hash values are the leaves.
Adjacent nodes are paired and double-hashed: parent = SHA256(SHA256(left || right)), where left and right are the 32-byte child hashes concatenated in tree order (internal byte order — no byte-reversal).
If a level has an odd number of nodes, the last node is duplicated (paired with itself) to make the count even, then pairing continues.
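A batch may not contain two identical leaf hashes: duplicate leaves are rejected. This guards against the CVE-2012-2459 ambiguity, where the odd-node duplication rule would otherwise let two different leaf lists (for example [A, B, C] and [A, B, C, C]) produce the same root.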
This repeats level by level until a single root remains.

The whole tree, leaves included, is uniform double SHA-256 — the same construction as the leaf hash above.

flowchart TB
  R["Root = SHA256(SHA256(H01 || H23))"]
  H01["H01 = SHA256(SHA256(H0 || H1))"]
  H23["H23 = SHA256(SHA256(H2 || H3))"]
  H0["H0 (leaf)"]
  H1["H1 (leaf)"]
  H2["H2 (leaf)"]
  H3["H3 (leaf)"]
  R --> H01
  R --> H23
  H01 --> H0
  H01 --> H1
  H23 --> H2
  H23 --> H3
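The construction rules can be sketched as a short function. This is an illustration under stated assumptions: the names `dsha256` and `merkle_root` are hypothetical, and two behaviours the rules above do not spell out are chosen here for the sake of a runnable example (an empty batch raises an error, and a single-leaf batch uses that leaf as the root).

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Double SHA-256, as used for leaves and internal nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute the root over 32-byte leaf hashes, in batch order."""
    if not leaves:
        raise ValueError("empty batch")  # assumption: not specified above
    if len(set(leaves)) != len(leaves):
        raise ValueError("duplicate leaf hash in batch")  # CVE-2012-2459 guard

    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd level: pair the last node with itself
        # Concatenate children in tree order, with no byte reversal.
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

For the four-leaf tree in the diagram, `merkle_root([H0, H1, H2, H3])` equals `dsha256(dsha256(H0 + H1) + dsha256(H2 + H3))`.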

Per-point Merkle path

Each point’s Merkle path is the ordered list of sibling hashes needed to climb from that leaf to the root, each tagged with a direction (whether the sibling sits to the left or right). The leaf is the point’s leaf_hash — the same 32-byte value the verification API and proof store call data_hash. A verifier replays the path like this:

current = leaf_hash
for each step in merkle_path:
    if step.is_right:        # sibling is on the right
        current = SHA256(SHA256(current || step.hash))
    else:                    # sibling is on the left
        current = SHA256(SHA256(step.hash || current))
# current must now equal the Merkle root

For leaf H0 in the tree above, the path is [ {hash: H1, right}, {hash: H23, right} ]: combine with H1 on the right to get H01, then with H23 on the right to get the root. Recomputing the root from a single point — without any of the other points in the batch — is exactly what makes the proof portable.
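A runnable version of the replay loop, together with one possible way to derive a path at batch time, might look like the sketch below. The names `build_path` and `verify_path` and the `(sibling_hash, sibling_is_right)` tuple shape are assumptions made for illustration; the stored path format used by the services is not defined in this section.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def build_path(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Return [(sibling_hash, sibling_is_right), ...] from leaf to root."""
    path = []
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd level: duplicate the last node
        sibling = index ^ 1          # the other member of this node's pair
        path.append((level[sibling], sibling > index))
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2                  # position of the parent in the next level
    return path


def verify_path(leaf_hash: bytes, path: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Replay the path and compare the result with the expected root."""
    current = leaf_hash
    for sibling, sibling_is_right in path:
        if sibling_is_right:
            current = dsha256(current + sibling)
        else:
            current = dsha256(sibling + current)
    return current == root
```

With four leaves, `build_path(leaves, 0)` yields H1 and then H23, both tagged as right-hand siblings, which matches the path given for H0 above.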

Paths are computed at batch time and stored alongside each hash (and later in the PostgreSQL anchor_proofs table as JSONB). The full walk is shown end-to-end in Verification.

OP_RETURN payload byte layout

The anchoring transaction has one output: an OP_FALSE OP_RETURN carrying a fixed-layout payload. After the OP_FALSE OP_RETURN opcodes, the data fields appear in this exact order:

Offset	Field	Size	Type / encoding
0	Protocol prefix	6 bytes	ASCII `"CORE-M"`
6	`merkle_root`	32 bytes	Double-SHA-256 Merkle root of the batch
38	`batch_id`	16 bytes	UUID, raw binary
54	`timestamp`	8 bytes	uint64 big-endian, Unix seconds
62	`data_point_count`	4 bytes	uint32 big-endian

Total payload after the opcodes is 66 bytes.
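The table above maps directly onto a fixed-width `struct` format. The sketch below is illustrative only: `encode_payload` and `decode_payload` are hypothetical names, it covers just the 66 data bytes (not the opcodes or any push-data framing), and it assumes "UUID, raw binary" means the 16 bytes in RFC 4122 network order, which is what Python's `uuid.UUID.bytes` returns.

```python
import struct
import uuid

PREFIX = b"CORE-M"
# ">" = big-endian, no padding: 6s + 32s + 16s + Q (uint64) + I (uint32) = 66 bytes
LAYOUT = struct.Struct(">6s32s16sQI")


def encode_payload(merkle_root: bytes, batch_id: uuid.UUID, timestamp: int, count: int) -> bytes:
    """Pack the five fields in table order."""
    if len(merkle_root) != 32:
        raise ValueError("merkle_root must be exactly 32 bytes")
    return LAYOUT.pack(PREFIX, merkle_root, batch_id.bytes, timestamp, count)


def decode_payload(data: bytes) -> dict:
    """Unpack a 66-byte payload and check the protocol prefix."""
    if len(data) != LAYOUT.size:
        raise ValueError("payload must be exactly 66 bytes")
    prefix, root, batch, timestamp, count = LAYOUT.unpack(data)
    if prefix != PREFIX:
        raise ValueError("missing CORE-M protocol prefix")
    return {
        "merkle_root": root,
        "batch_id": uuid.UUID(bytes=batch),
        "timestamp": timestamp,
        "data_point_count": count,
    }
```

The explicit length check on `merkle_root` matters because `struct` silently pads a short `32s` value with zero bytes rather than raising an error.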

OP_FALSE OP_RETURN
  "CORE-M"            6 bytes,  ASCII protocol prefix
  <merkle_root>       32 bytes, double-SHA-256 Merkle root
  <batch_id>          16 bytes, UUID binary
  <timestamp>         8 bytes,  uint64 big-endian, Unix seconds
  <data_point_count>  4 bytes,  uint32 big-endian

Where this is used

The identical hashing algorithm is shared across three boundaries — telemetry (computing hashes for anchoring), verification (recomputing from raw data), and any external verifier. That shared determinism is the whole basis of the guarantee.

Continue to Verification to walk a complete proof from raw data to on-chain commitment, or to Modes, SLA & Finality for how batches are scheduled and made final.