
Polymorphic Sensor Identity Resolution in Multi-Protocol Environments
A deterministic, versioned, and architecture-ready model for eliminating duplicate sensor records in distributed
data systems (PSIM)
“Cross-protocol, cross-vendor sensor aggregation platforms
where data collection is the core business, not the hardware.”
“How do you create a consistent, deterministic identity system for millions of physical sensors that
come from incompatible ecosystems, are inconsistently labeled, and are reused or reprogrammed
unpredictably, all without breaking billing, auditability, or data integrity?”
In today’s hybrid environments that blend legacy and modern sensor technologies, unique sensor identification
becomes increasingly complex due to protocol variability, component reuse, and inconsistent manufacturer
standards. This paper proposes a robust, protocol-agnostic, and polymorphic sensor identification system that
normalizes multiple identifier sources into a unified structure, preventing database-level duplication and
improving operational traceability.
1. Introduction
The integration of modern sensor networks (e.g., LoRaWAN, NB-IoT, MIOTY) with legacy systems (e.g.,
MBUS, WM-Bus) introduces fundamental challenges in managing unique sensor identities. Sensor components,
namely radio transceivers and physical sensor bodies, are frequently reused, swapped, or migrated across
locations. Traditional systems were not designed to track these operations digitally, so they end up with
duplicate records. Such practices were acceptable when every change involved a physical visit to the sensor,
but in remotely managed, digital deployments they severely complicate correct identification.
This paper addresses:
- The limitations of identifier design in heterogeneous environments.
- The impact of component-level reuse and human error.
- The trade-off between normalization and denormalization in schema design.
- A proposed polymorphic identification model using canonical identifiers.
2. The Problem Space
2.1 Legacy and Hybrid Protocols
Legacy protocols such as MBUS may only transmit a single identifier (e.g., sensor serial), whereas newer
protocols often transmit both radio and body identifiers. This inconsistency leads to unreliable assumptions
during sensor matching.
2.2 Human and Operational Behavior
- Component Swapping: Radio boards are removed and placed onto new physical bodies.
- Location Migration: Sensors are reassigned across locations, changing context for billing or regulation.
- Reusing Hardware: Serial collisions occur when identifiers are not globally unique across manufacturers.
2.3 Duplicates in the Database
Most systems treat a single identifier (e.g., radio_serial) as the sensor’s primary key. This fails when:
- The same radio is reused across different sensor bodies.
- The same sensor body is reused with a new radio.
- Multiple sensors share identifiers from different manufacturers.
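The first failure case can be reproduced in a few lines. This is a minimal sketch, assuming an in-memory Map stands in for a table keyed on radio_serial; registerSensor, body_serial, and the serial values are illustrative names, not part of any real system.

```javascript
// Illustrative store that uses radio_serial as the primary key.
const sensorsByRadioSerial = new Map();

// Hypothetical helper: stores a record and returns whatever it displaced.
function registerSensor(record) {
  const displaced = sensorsByRadioSerial.get(record.radio_serial) ?? null;
  sensorsByRadioSerial.set(record.radio_serial, record);
  return displaced;
}

// The same radio board is moved from one sensor body to another.
registerSensor({ radio_serial: '12345678', body_serial: '98765432' });
const displaced = registerSensor({ radio_serial: '12345678', body_serial: '55555555' });

// Only one record survives; the first body's record has been overwritten.
console.log(sensorsByRadioSerial.size, displaced.body_serial); // prints: 1 98765432
```

A store with a uniqueness constraint would reject the second insert instead of overwriting, but either way one physical sensor body is left without a record of its own.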
3. Design Goals
The proposed system must:
- Prevent duplicate sensors, even in edge cases.
- Support multiple communication protocols and component permutations.
- Identify a sensor from either the radio, the body, or both.
- Maintain historical traceability (e.g., versioning, active periods).
- Resolve ambiguity in conflicting identifier situations.
4. The Polymorphic Identifier Strategy
4.1 Components
Each sensor is described by:
- Radio Module (if present)
  - Serial, Manufacturer, Version, Type
- Sensor Body
  - Serial, Manufacturer, Version, Type
- deviceIdentifier
  - Version (to track reassignment)
4.2 Smart deviceIdentifier
A normalized identifier is generated using the following strategy:
    radio: {
        identifier: '',
        manufacturer: '',
        version: '',
        type: ''
    },
    sensor_body: {
        identifier: '',
        manufacturer: '',
        version: '',
        type: ''
    },
The component fields and the active version are joined with colons in this order (one string, written here
across three lines):
    <radio_serial>:<radio_manufacturer>:<radio_version>:<radio_type>:
    <sensor_serial>:<sensor_manufacturer>:<sensor_version>:<sensor_type>:
    <active_version>
Missing components are replaced with the placeholder XXX, which yields full and partial forms such as:
    XXX:XXX:XXX:XXX:XXX:XXX:XXX:XXX:XXX
    12345678:EFE:00:07:98765432:EFE:00:07:0
    12345678:EFE:00:07:XXX:XXX:XXX:XXX:0
    XXX:XXX:XXX:XXX:98765432:EFE:00:07:0
    2345678:XXX:XXX:XXX:XXX:XXX:XXX:XXX:XXX
    XXX:XXX:XXX:XXX:98765432:XXX:XXX:XXX:XXX
This format provides:
- Full reproducibility from telegram parsing
- Collision resistance
- Backward compatibility via partial fallbacks
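The construction of the identifier can be sketched as a small pure function. This is a minimal sketch, assuming the parsed telegram arrives as an object shaped like the radio / sensor_body structure above; buildDeviceIdentifier and componentSegments are hypothetical helper names, and treating empty strings as missing is an assumption.

```javascript
// Placeholder used for any field the telegram did not provide.
const PLACEHOLDER = 'XXX';
// Field order within each component, as in the format above.
const COMPONENT_FIELDS = ['identifier', 'manufacturer', 'version', 'type'];

// Hypothetical helper: turns one component into its four segments.
function componentSegments(component) {
  return COMPONENT_FIELDS.map((field) => {
    const value = component ? component[field] : undefined;
    // Assumption: undefined, null, and '' all count as "missing".
    return value === undefined || value === null || value === '' ? PLACEHOLDER : String(value);
  });
}

// Hypothetical helper: joins radio, sensor body, and active version with colons.
function buildDeviceIdentifier({ radio, sensor_body, version } = {}) {
  const activeVersion = version === undefined || version === null ? PLACEHOLDER : String(version);
  return [...componentSegments(radio), ...componentSegments(sensor_body), activeVersion].join(':');
}

// A telegram that only carries radio information:
console.log(buildDeviceIdentifier({
  radio: { identifier: '12345678', manufacturer: 'EFE', version: '00', type: '07' },
  version: 0,
})); // prints: 12345678:EFE:00:07:XXX:XXX:XXX:XXX:0
```

Because the function depends only on its input, the same telegram always yields the same identifier, which is what makes the format reproducible from parsing alone.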
4.3 identity_key as Query Path Resolver
While deviceIdentifier is the canonical identity for sensors, incoming telegrams typically provide only a partial
identifier, often just a serial number, and do not always include information about whether it belongs to the radio
module or the sensor body.
Using a query like:
    Sensor.find({
        $or: [
            { "radio.identifier": identifier },
            { "sensor_body.identifier": identifier }
        ]
    })
…can return ambiguous results or duplicate sensors, especially in cases of reused hardware.
To resolve this, the system introduces an identity_key field. It is not a unique identifier but a query path
selector that indicates which internal property (radio.identifier or sensor_body.identifier) should be used to
identify the sensor.
Example:
- WM-Bus / LoRa / NB-IoT telegrams → Serial comes from the radio → identity_key = 'radio.identifier'
- MBUS telegrams → Serial comes from the sensor body → identity_key = 'sensor_body.identifier'
This allows querying:
    Sensor.findOne({ [sensor.identity_key]: identifier })
Benefits:
- Protocol-aware and deterministic querying
- Avoids ambiguous or duplicate query results
- Optimized index usage
- Supports protocol inference during parsing
Note: The identity_key is not used for enforcing uniqueness. It is strictly an operational field to help the system
decide where to look when attempting to match an incoming serial number to a sensor.
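The protocol-to-path selection can be sketched as a lookup table. This is a minimal sketch, assuming the parser labels each telegram with a protocol name; the mapping mirrors the two examples above, while identityKeyForProtocol, buildLookupFilter, and the normalized protocol spellings are hypothetical.

```javascript
// Mapping taken from the examples above; the key spellings are assumptions.
const IDENTITY_KEY_BY_PROTOCOL = new Map([
  ['WMBUS', 'radio.identifier'],
  ['LORA', 'radio.identifier'],
  ['NBIOT', 'radio.identifier'],
  ['MBUS', 'sensor_body.identifier'],
]);

// Hypothetical helper: picks the query path for a protocol label.
function identityKeyForProtocol(protocol) {
  // Normalize labels such as 'NB-IoT' or 'WM-Bus' to 'NBIOT' / 'WMBUS'.
  const normalized = String(protocol).toUpperCase().replace(/[^A-Z0-9]/g, '');
  const key = IDENTITY_KEY_BY_PROTOCOL.get(normalized);
  if (key === undefined) {
    throw new Error(`No identity_key rule for protocol: ${protocol}`);
  }
  return key;
}

// Hypothetical helper: builds a single-path filter instead of an $or query.
function buildLookupFilter(protocol, identifier) {
  return { [identityKeyForProtocol(protocol)]: identifier };
}

console.log(buildLookupFilter('MBUS', '98765432'));
```

The resulting object has exactly one dotted path, so it can be passed to a findOne call like the one shown above. Throwing on an unknown protocol is a design choice of this sketch; it avoids silently falling back to the ambiguous two-path query.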
4.4 Sensor Schema
    Sensor: {
        sensor_id: 'eey3k18m1tfb7f7',
        deviceIdentifier: 'XXX:XXX:XXX:XXX:XXX:XXX:XXX:XXX:XXX',
        identity_key: 'radio.identifier',
        version: 0,
        radio: {
            identifier: 'XXX',
            manufacturer: 'XXX',
            version: 'XXX',
            type: 'XXX'
        },
        sensor_body: {
            identifier: 'XXX',
            manufacturer: 'XXX',
            version: 'XXX',
            type: 'XXX'
        }
    }
5. Handling Duplicates and Versioning
5.1 Sensor Versioning
Each sensor has a version field that increments when:
- It is physically moved
- Its logical state changes
This preserves data consistency for time-series measurements. In such cases, the sensor is cloned, assigned a
new sensor_id, and the version is incremented.
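The clone-and-increment step can be sketched as follows. This is a minimal sketch, assuming records shaped like the Sensor schema in 4.4 and assuming the last segment of deviceIdentifier is the active version, as in the format from 4.2; cloneAsNewVersion and the injected generateId function are hypothetical.

```javascript
// Hypothetical helper: returns a new record for the next version of a sensor.
// The original record is left untouched so its measurements keep their context.
function cloneAsNewVersion(sensor, generateId) {
  const version = sensor.version + 1;

  // Assumption: the final colon-separated segment is <active_version>.
  const segments = sensor.deviceIdentifier.split(':');
  segments[segments.length - 1] = String(version);

  return {
    ...sensor,
    // Copy nested components so later edits do not leak into the old record.
    radio: { ...sensor.radio },
    sensor_body: { ...sensor.sensor_body },
    sensor_id: generateId(),
    version,
    deviceIdentifier: segments.join(':'),
  };
}

const current = {
  sensor_id: 'eey3k18m1tfb7f7',
  deviceIdentifier: '12345678:EFE:00:07:98765432:EFE:00:07:0',
  identity_key: 'radio.identifier',
  version: 0,
  radio: { identifier: '12345678', manufacturer: 'EFE', version: '00', type: '07' },
  sensor_body: { identifier: '98765432', manufacturer: 'EFE', version: '00', type: '07' },
};
const next = cloneAsNewVersion(current, () => 'new-sensor-id');
console.log(next.version, next.deviceIdentifier);
```

Keeping the old record immutable means time-series rows that reference the previous sensor_id still describe the sensor as it was when they were measured.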
5.2 Conflict Resolution
When identifiers conflict, the system resolves the match by:
- Generating all possible permutations
- Matching based on strongest identifier
- Preferring exact matches over partials
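One way to read these three steps is as an ordered candidate list. This is a minimal sketch, assuming the candidate forms are the full and partial identifiers from 4.2 and that "strongest" means the candidate with the most populated fields; that ordering, and the names candidateIdentifiers and resolveIdentifier, are assumptions of this sketch.

```javascript
const P = 'XXX'; // placeholder segment
const FIELDS = ['identifier', 'manufacturer', 'version', 'type'];

const segment = (value) => (value === undefined || value === null || value === '' ? P : String(value));
const full = (c) => FIELDS.map((f) => segment(c ? c[f] : undefined));
const serialOnly = (c) => [segment(c ? c.identifier : undefined), P, P, P];
const blank = () => [P, P, P, P];

// Hypothetical helper: lists candidate identifiers, strongest first.
function candidateIdentifiers({ radio, sensor_body, version } = {}) {
  const v = segment(version);
  const candidates = [
    [...full(radio), ...full(sensor_body), v],     // exact: both components
    [...full(radio), ...blank(), v],               // partial: radio only
    [...blank(), ...full(sensor_body), v],         // partial: body only
    [...serialOnly(radio), ...blank(), P],         // weakest: radio serial only
    [...blank(), ...serialOnly(sensor_body), P],   // weakest: body serial only
  ];
  return [...new Set(
    candidates
      // Drop candidates that carry no serial number at all.
      .filter((s) => s[0] !== P || s[4] !== P)
      .map((s) => s.join(':'))
  )];
}

// Hypothetical helper: returns the strongest candidate present in a set of
// known deviceIdentifier strings, or null when nothing matches.
function resolveIdentifier(parsed, knownIdentifiers) {
  return candidateIdentifiers(parsed).find((c) => knownIdentifiers.has(c)) ?? null;
}
```

Because the list is ordered, an exact match always wins over a partial one, and the Set removes duplicate candidates that arise when one component is missing.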
6. Sensor Identity Lifecycle and Human Error Tolerance
6.1 Initial Ambiguity and Human Input Errors
- Field Confusion
- Missing Attributes
- Unreliable Initial Values
6.2 Resolution via First Telegram
The first telegram is treated as the source of truth. The system validates, resolves, and corrects the sensor
record or creates a new one if required.
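The find, correct, or create flow might look like the sketch below. This is a minimal sketch, assuming an in-memory array stands in for the sensor collection and that the parsed telegram already carries its identity_key; applyFirstTelegram, readPath, the telegram shape, and the 'updated' / 'created' labels are all hypothetical. It does not attempt to repair a serial that was entered under the wrong component.

```javascript
// Hypothetical helper: reads 'radio.identifier' or 'sensor_body.identifier'.
function readPath(record, path) {
  const [component, field] = path.split('.');
  return record[component] ? record[component][field] : undefined;
}

// Hypothetical helper: treats the first telegram as the source of truth.
function applyFirstTelegram(sensors, telegram, generateId) {
  const { identity_key, identifier } = telegram;
  let sensor = sensors.find((s) => readPath(s, identity_key) === identifier);
  let action = 'updated';

  if (!sensor) {
    // No record matches on the protocol's path: create a new one.
    sensor = { sensor_id: generateId(), version: 0, radio: {}, sensor_body: {} };
    sensors.push(sensor);
    action = 'created';
  }

  // Telegram values overwrite manually entered ones; fields the telegram
  // does not carry are kept as they were.
  sensor.identity_key = identity_key;
  sensor.radio = { ...sensor.radio, ...telegram.radio };
  sensor.sensor_body = { ...sensor.sensor_body, ...telegram.sensor_body };
  return { action, sensor };
}
```

Merging rather than replacing keeps attributes that only a human could have supplied, while anything the device reports about itself takes precedence.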
7. Industry Shortcomings and Ideal Manufacturer Practices
7.1 The Manufacturer Identifier Problem
- Incremental serial numbers
- No differentiation between components
- No global uniqueness guarantees
- Hardware reuse
7.2 Solving the Problem: Proper Identifier Generation
Recommended approaches:
- crypto.randomUUID()
- High-entropy time-based identifiers
- Manufacturer-prefixed identifiers
Example:
EFE-1e3a9f5b2c-1704260803
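An identifier of this shape could be produced as sketched below. This is a minimal sketch, assuming the example reads as manufacturer prefix, ten random hexadecimal characters, and a Unix timestamp in seconds; that reading and the name generateComponentId are assumptions. The crypto functions used are from the Node.js standard library.

```javascript
const { randomBytes, randomUUID } = require('crypto');

// Hypothetical helper: <manufacturer>-<10 hex chars>-<unix seconds>.
function generateComponentId(manufacturerCode, nowMs = Date.now()) {
  // 5 random bytes encode to 10 hexadecimal characters (40 bits of entropy).
  const random = randomBytes(5).toString('hex');
  const seconds = Math.floor(nowMs / 1000);
  return `${manufacturerCode}-${random}-${seconds}`;
}

// Prefixed, time-stamped identifier in the style of the example above.
console.log(generateComponentId('EFE'));

// Alternative from the list above: a random (version 4) UUID.
console.log(randomUUID());
```

Either form removes the dependence on incremental serial numbers; the prefixed variant additionally keeps the manufacturer readable in the identifier itself.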
7.3 Impact on System Complexity
With properly generated identifiers, the receiving system benefits in three ways. Proper generation:
- Eliminates fuzzy matching
- Reduces onboarding errors
- Simplifies versioning
7.4 On Payload Size and Vendor Objections
Identifier size is not the problem; the obsolete telegram structure is.
8. Conclusion
The Polymorphic Sensor Identity Model (PSIM) offers a deterministic, versioned, and operationally proven
solution for large-scale sensor ecosystems.
Raise your standards. Fix your identity model. Or drown in your own digital violation.






