
Hashing & Merkle Trees

This page specifies, byte for byte, what CORE-M commits to the blockchain. Three independent implementations — the telemetry service, the verification service, and any third-party verifier — must compute the identical hash from the same inputs, so the algorithm is fully deterministic and described here exactly. If your implementation diverges by even one byte, verification will fail by design.

Every telemetry point is reduced to a single 32-byte SHA-256 hash:

data_hash = SHA256( device_id_utf8 || timestamp_uint64_be || jcs_payload_utf8 )

The three parts are concatenated in this exact order with no separators and no length prefixes:

| # | Component | Encoding | Details |
|---|---|---|---|
| 1 | device_id_utf8 | UTF-8 bytes | The device_id string, no null terminator. |
| 2 | timestamp_uint64_be | 8 bytes, uint64 big-endian | Unix epoch in seconds. Nanoseconds are truncated. |
| 3 | jcs_payload_utf8 | UTF-8 bytes | The payload serialized with RFC 8785 JCS (below). |
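The following is a minimal Python sketch of the preimage and hash, using only the standard library. It assumes the caller passes a payload that an RFC 8785 implementation has already canonicalized. The function names `build_preimage` and `compute_data_hash` are illustrative, not part of a published CORE-M API.

```python
import hashlib
import struct


def build_preimage(device_id: str, timestamp: int, jcs_payload: str) -> bytes:
    """Concatenate the three components in order, with no separators or length prefixes."""
    return (
        device_id.encode("utf-8")          # 1: raw UTF-8, no null terminator
        + struct.pack(">Q", timestamp)     # 2: uint64 big-endian, whole seconds
        + jcs_payload.encode("utf-8")      # 3: RFC 8785 canonical JSON as UTF-8
    )


def compute_data_hash(device_id: str, timestamp: int, jcs_payload: str) -> bytes:
    """Return the 32-byte SHA-256 data_hash for one telemetry point."""
    return hashlib.sha256(build_preimage(device_id, timestamp, jcs_payload)).digest()


# Hypothetical nanosecond source clock: drop the sub-second part (truncate, never round).
ts_ns = 1_711_000_000_999_999_999
timestamp = ts_ns // 1_000_000_000  # 1711000000
```

`struct.pack(">Q", ...)` raises `struct.error` in three cases: the timestamp is negative, it does not fit in 64 bits, or it is not an integer (for example, a float). The type check catches a common mistake.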

The payload is canonicalized with the RFC 8785 JSON Canonicalization Scheme before it is hashed, so that logically identical payloads always serialize to the same bytes:

  • Object keys are sorted by their UTF-16 code units, compared as unsigned integers (RFC 8785 §3.2.3). For ASCII keys this is ordinary lexicographic order.
  • No whitespace between any tokens.
  • Numbers are serialized per JCS rules, which adopt ECMAScript's number-to-string conversion: shortest round-tripping form, no trailing zeros, no superfluous decimal point (so 65.0 serializes as 65).
  • The resulting JSON string is encoded as UTF-8.

Because keys are sorted deterministically, the order in which a device emits its fields does not matter: {"temperature":22.5,"humidity":65} and {"humidity":65,"temperature":22.5} both canonicalize to exactly the same bytes and therefore hash to the same value.
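Production code should use a vetted RFC 8785 library. The standard-library Python sketch below only makes the rules concrete, under these assumptions:

- The payload has already been parsed into dicts, lists, strings, numbers, booleans and `None`.
- CPython's `repr()` yields the shortest round-tripping digits for a float.
- `json.dumps` string escaping matches JCS for well-formed strings.

The names `jcs` and `_es_number` are hypothetical.

```python
import json
import math
from decimal import Decimal


def _es_number(x: float) -> str:
    """Format a finite double like ECMAScript Number.prototype.toString (JCS number rule)."""
    if not math.isfinite(x):
        raise ValueError("JCS forbids NaN and Infinity")
    if x == 0:
        return "0"  # also covers -0.0
    sign = "-" if x < 0 else ""
    # repr() gives the shortest digit string that round-trips to the same double.
    _, digits, exp = Decimal(repr(abs(x))).as_tuple()
    digits = list(digits)
    while len(digits) > 1 and digits[-1] == 0:  # drop trailing zeros, e.g. "65.0"
        digits.pop()
        exp += 1
    s = "".join(map(str, digits))
    k = len(s)
    n = exp + k  # value == 0.<s> * 10**n
    if k <= n <= 21:
        out = s + "0" * (n - k)               # integer form, e.g. 65, 1e20 spelled out
    elif 0 < n <= 21:
        out = s[:n] + "." + s[n:]             # e.g. 22.5
    elif -6 < n <= 0:
        out = "0." + "0" * (-n) + s           # e.g. 0.002
    else:
        mantissa = s if k == 1 else s[0] + "." + s[1:]
        out = mantissa + "e" + ("+" if n - 1 >= 0 else "-") + str(abs(n - 1))
    return sign + out


def jcs(value) -> str:
    """Canonicalize a parsed JSON value per RFC 8785 (sketch)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must precede the number check: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return _es_number(float(value))  # JCS treats every number as an IEEE 754 double
    if isinstance(value, str):
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, list):
        return "[" + ",".join(jcs(v) for v in value) + "]"
    if isinstance(value, dict):
        # Sort by UTF-16 code units; big-endian UTF-16 bytes compare in the same order.
        keys = sorted(value, key=lambda k: k.encode("utf-16-be"))
        return "{" + ",".join(json.dumps(k, ensure_ascii=False) + ":" + jcs(value[k]) for k in keys) + "}"
    raise TypeError(f"not a JSON value: {type(value).__name__}")
```

Both field orders from the example above yield `{"humidity":65,"temperature":22.5}`. The explicit UTF-16 sort key matters only for keys outside the Basic Multilingual Plane. Python's default string sort compares code points and would order those keys differently.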

Take a concrete point:

  • device_id = "D1"
  • timestamp = 1711000000 (Unix seconds)
  • payload = {"temperature": 22.5, "humidity": 65}

Step 1 — canonicalize the payload (JCS). Keys are sorted, whitespace removed:

{"humidity":65,"temperature":22.5}

Step 2 — encode each component.

device_id_utf8 "D1" -> 0x4431
timestamp_uint64_be 1711000000 -> 0x0000000065FBC9C0
jcs_payload_utf8 {"humidity":65,"temperature":22.5} -> UTF-8 bytes of that string

Step 3 — concatenate in order and SHA-256.

0x4431
|| 0x0000000065FBC9C0
|| <UTF-8 of {"humidity":65,"temperature":22.5}>

The SHA-256 of that concatenation is the 32-byte data_hash. This is the value that feeds the Merkle tree and, ultimately, the on-chain commitment.
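The worked example can be reproduced in a few lines of Python. This sketch takes the canonical string verbatim from Step 1 rather than recomputing it. The digest is printed rather than quoted, since this page does not list it. The component bytes can be checked against Step 2.

```python
import hashlib
import struct

device_id = "D1"
timestamp = 1711000000
jcs_payload = '{"humidity":65,"temperature":22.5}'  # output of Step 1

part1 = device_id.encode("utf-8")      # 0x4431
part2 = struct.pack(">Q", timestamp)   # 0x0000000065FBC9C0
part3 = jcs_payload.encode("utf-8")    # 34 bytes of canonical JSON
preimage = part1 + part2 + part3       # 2 + 8 + 34 = 44 bytes

data_hash = hashlib.sha256(preimage).digest()
print(preimage.hex())
print(data_hash.hex())  # the 32-byte leaf that enters the Merkle tree
```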

Anchoring each point in its own transaction would be slow and costly, so a batch of point hashes is committed together through a binary SHA-256 Merkle tree. Only the 32-byte root goes on-chain; each point keeps a short Merkle path that proves its membership.

Construction rules:

  • The batch’s data_hash values are the leaves.
  • Adjacent nodes are paired and hashed: parent = SHA256(left || right).
  • If a level has an odd number of nodes, the last node is duplicated to make the count even, then pairing continues.
  • This repeats level by level until a single root remains.
flowchart TB
  R["Root = SHA256(H01 || H23)"]
  H01["H01 = SHA256(H0 || H1)"]
  H23["H23 = SHA256(H2 || H3)"]
  H0["H0 (leaf)"]
  H1["H1 (leaf)"]
  H2["H2 (leaf)"]
  H3["H3 (leaf)"]
  R --> H01
  R --> H23
  H01 --> H0
  H01 --> H1
  H23 --> H2
  H23 --> H3
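The construction rules can be sketched in Python as follows. The rules above do not spell out two edge cases, so this sketch assumes:

- An empty batch is rejected.
- A single-leaf batch has that leaf as its root. The loop never runs, which is the literal reading of "until a single root remains".

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Binary SHA-256 Merkle root over data_hash leaves; an odd last node is paired with itself."""
    if not leaves:
        raise ValueError("a batch needs at least one data_hash")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node to make the count even
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Under the duplication rule, the leaf lists [A, B, C] and [A, B, C, C] produce the same root.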

Each point’s Merkle path is the ordered list of sibling hashes needed to climb from that leaf to the root, each tagged with a direction (whether the sibling sits to the left or right). A verifier replays it like this:

import hashlib

def replay_path(data_hash, merkle_path):
    current = data_hash
    for step in merkle_path:
        if step.is_right:  # sibling is on the right
            current = hashlib.sha256(current + step.hash).digest()
        else:  # sibling is on the left
            current = hashlib.sha256(step.hash + current).digest()
    return current  # must equal the Merkle root

For leaf H0 in the tree above, the path is [ {hash: H1, right}, {hash: H23, right} ]: combine with H1 on the right to get H01, then with H23 on the right to get the root. Recomputing the root from a single point — without any of the other points in the batch — is exactly what makes the proof portable.

Paths are computed at batch time and stored alongside each hash (and later in the PostgreSQL anchor_proofs table as JSONB). The full walk is shown end-to-end in Verification.
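Paths can be generated in the same pass that builds the tree. Several names in this sketch are hypothetical:

- The `PathStep` class and its `hash`/`is_right` fields, which mirror the replay loop above.
- The hex-string JSON shape produced by `path_to_json`.

This page does not specify the actual anchor_proofs JSONB schema.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PathStep:
    hash: bytes     # sibling hash at this level
    is_right: bool  # True if the sibling sits to the right of the running hash


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_paths(leaves: list[bytes]) -> tuple[bytes, list[list[PathStep]]]:
    """Build the tree once and return (root, paths), where paths[i] belongs to leaves[i]."""
    if not leaves:
        raise ValueError("a batch needs at least one data_hash")
    paths: list[list[PathStep]] = [[] for _ in leaves]
    positions = list(range(len(leaves)))  # each leaf's ancestor index in the current level
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # same odd-node duplication as the root computation
        for leaf, pos in enumerate(positions):
            sibling = pos ^ 1  # pair partner: 0<->1, 2<->3, ...
            paths[leaf].append(PathStep(level[sibling], is_right=sibling > pos))
            positions[leaf] = pos // 2
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0], paths


def replay(data_hash: bytes, path: list[PathStep]) -> bytes:
    """Climb from a leaf to the root using only the stored path."""
    current = data_hash
    for step in path:
        current = sha256(current + step.hash) if step.is_right else sha256(step.hash + current)
    return current


def path_to_json(path: list[PathStep]) -> str:
    # Hypothetical storage shape: hex-encoded sibling hashes plus the direction flag.
    return json.dumps([{"hash": s.hash.hex(), "is_right": s.is_right} for s in path])
```

Each leaf's ancestor index is halved at every level, so a path has exactly one step per tree level. For a batch of n points that is ceil(log2(n)) steps.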

The anchoring transaction has one output: an OP_FALSE OP_RETURN carrying a fixed-layout payload. After the OP_FALSE OP_RETURN opcodes, the data fields appear in this exact order:

| Offset (bytes) | Field | Size | Type / encoding |
|---|---|---|---|
| 0 | Protocol prefix | 6 bytes | ASCII "CORE-M" |
| 6 | merkle_root | 32 bytes | SHA-256 Merkle root of the batch |
| 38 | batch_id | 16 bytes | UUID, raw binary |
| 54 | timestamp | 8 bytes | uint64 big-endian, Unix seconds |
| 62 | data_point_count | 4 bytes | uint32 big-endian |

Total payload after the opcodes is 66 bytes (6 + 32 + 16 + 8 + 4).

OP_FALSE OP_RETURN
"CORE-M" 6 bytes, ASCII protocol prefix
<merkle_root> 32 bytes, SHA-256
<batch_id> 16 bytes, UUID binary
<timestamp> 8 bytes, uint64 big-endian, Unix seconds
<data_point_count> 4 bytes, uint32 big-endian
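The fixed layout maps directly onto a Python `struct` format string. This sketch builds and parses only the 66 data bytes. It does not encode the script itself; how the fields are split into push operations is left out. It also assumes that "UUID, raw binary" means the RFC 4122 big-endian byte order returned by `uuid.UUID.bytes`. The helper names are hypothetical.

```python
import struct
import uuid

PREFIX = b"CORE-M"
# ">" = big-endian with no padding: 6 + 32 + 16 + 8 + 4 = 66 bytes
LAYOUT = struct.Struct(">6s32s16sQI")


def pack_anchor_payload(merkle_root: bytes, batch_id: uuid.UUID, timestamp: int, count: int) -> bytes:
    if len(merkle_root) != 32:
        raise ValueError("merkle_root must be exactly 32 bytes")  # "32s" would silently pad
    return LAYOUT.pack(PREFIX, merkle_root, batch_id.bytes, timestamp, count)


def unpack_anchor_payload(data: bytes) -> tuple[bytes, uuid.UUID, int, int]:
    if len(data) != LAYOUT.size:
        raise ValueError(f"expected {LAYOUT.size} bytes, got {len(data)}")
    prefix, root, raw_id, timestamp, count = LAYOUT.unpack(data)
    if prefix != PREFIX:
        raise ValueError("not a CORE-M payload")
    return root, uuid.UUID(bytes=raw_id), timestamp, count
```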

The identical hashing algorithm is shared across three boundaries — telemetry (computing hashes for anchoring), verification (recomputing from raw data), and any external verifier. That shared determinism is the whole basis of the guarantee.

Next

Continue to Verification to walk a complete proof from raw data to on-chain commitment, or to Modes, SLA & Finality for how batches are scheduled and made final.