Skip to content

Thingsboard


name: Tin Can Thingsboard data description: Thingsboard device telemetry history data is stored in the RAW_DB.THINGSBOARD.THINGBSBOARD_REPORTS table.
This table contains 1 row per device (name) per historical import snapshot. For current state reporting, only the most recent historical import snapshot should be used, to avoid overstating device data. The full historical import snapshot can be used to trend data over time, but the default should always be to use the most recent snapshot data.


What this table is

A historical record of device state pulled from ThingsBoard. Every hour, an export of all devices is written to S3 (one CSV per snapshot) and loaded by Airbyte into this table. Each row captures the full state of one device at the moment of that hourly export.

This means:

  • The same device (ID) appears in roughly 24 rows per day — one per hourly snapshot.
  • "Current state" of a device = the most recent row for that ID.
  • "History" of a device = all rows for that ID, ordered by EXTRACTED_AT.
  • The table is append-only; old snapshots are never updated or deleted by the pipeline.

Two timestamps, two different meanings

This is the single most important thing to understand before querying.

EXTRACTED_AT is when we pulled the snapshot from ThingsBoard. It's a proper TIMESTAMP and is the same for every device in a given hourly batch. Use this when you want to know "what did the fleet look like at time X."

<column>_LAST_ACTIVITY_TIMESTAMP is when ThingsBoard last received that specific telemetry value from the device. Stored as Unix epoch milliseconds (e.g., 1761706509464), not a timestamp type — you'll need to convert. Use these when you want to know "when did this device last report its WiFi RSSI / firmware version / etc."

A device that's offline will keep appearing in new hourly snapshots with stale *_LAST_ACTIVITY_TIMESTAMP values. The snapshot time keeps advancing; the activity timestamps don't. The gap between them is how you measure staleness per-metric.

Converting epoch milliseconds in Snowflake:

TO_TIMESTAMP_NTZ(WIFI_RSSI_LAST_ACTIVITY_TIMESTAMP / 1000)

Columns by category

Identity

Column Notes
ID ThingsBoard device UUID. Stable across snapshots — this is the join key for tracking a single device over time.
NAME Device name as set in ThingsBoard. Often a MAC address. Can change if renamed in ThingsBoard, so prefer ID for joins.
DEVICE_PROFILE The ThingsBoard device profile assigned to the device (e.g., Tin Can Devices (Certificate), Tin Can Devices (Customer Activated)). Useful for segmenting fleet by provisioning state.
BETA_COHORT Cohort flag for beta program participation. Mostly 0.

Firmware

Column Notes
FIRMWARE_VERSION Version string the device reports. Empty / 0 if never reported.
FW_STATE Firmware update state. Mixed values: 0 (idle/unknown), DOWNLOADING, and other states the device emits during OTA. Treat as a string.
FIRMWARE_VERSION_LAST_UPDATED_TIME Epoch ms — last time the device reported its firmware version.
FW_STATE_LAST_ACTIVITY_TIMESTAMP Epoch ms — last time FW_STATE was updated.

Network & connectivity

Column Notes
ONLINE_STATUS Boolean (true / false) as ThingsBoard saw it at snapshot time. Devices can flap; use LASTACTIVITYTIME for a more reliable "is this device alive" check.
WIFI_RSSI WiFi signal strength (dBm, negative number — closer to 0 is stronger). 0 typically means never reported.
LASTACTIVITYTIME Epoch ms — last time ThingsBoard received any message from the device. The most useful "is this device alive" field.
ONLINE_STATUS_LAST_ACTIVITY_TIMESTAMP Epoch ms — last time online status flipped.
WIFI_RSSI_LAST_ACTIVITY_TIMESTAMP Epoch ms — last RSSI report.
LASTACTIVITYTIME_LAST_ACTIVITY_TIMESTAMP Epoch ms — meta-timestamp on when LASTACTIVITYTIME itself was updated. Rarely needed; LASTACTIVITYTIME is usually what you want.

SIP (call functionality)

Column Notes
SIP_STATUS SIP registration status.
SIP_RETRY_COUNT Number of SIP registration retries. High values suggest a device struggling to register.
SIPSERVERCERT SIP server certificate state/value.
SIP_STATUS_LAST_ACTIVITY_TIMESTAMP Epoch ms.
SIP_RETRY_COUNT_LAST_ACTIVITY_TIMESTAMP Epoch ms.
SIPSERVERCERT_LAST_ACTIVITY_TIMESTAMP Epoch ms.

Device health (heap & uptime)

These come from the device's runtime. All numeric; 0 typically means never reported.

Column Notes
UPTIME_S Seconds since device boot. Resets to a low number after a reboot — useful for detecting reboots in history.
HEAP_FREE Bytes of free heap memory.
HEAP_MIN Minimum free heap observed (low-water mark). Lower = closer to OOM.
HEAP_LARGEST_BLOCK Largest contiguous free heap block. Low values relative to HEAP_FREE indicate fragmentation.
WORKER_WDT_REBOOT_COUNT Count of watchdog-triggered reboots. Non-zero values are worth investigating.
UPTIME_S_LAST_ACTIVITY_TIMESTAMP Epoch ms.
HEAP_FREE_LAST_ACTIVITY_TIMESTAMP Epoch ms.
HEAP_MIN_LAST_ACTIVITY_TIMESTAMP Epoch ms.
HEAP_LARGEST_BLOCK_LAST_ACTIVITY_TIMESTAMP Epoch ms.
WORKER_WDT_REBOOT_COUNT_LAST_ACTIVITY_TIMESTAMP Epoch ms.
BETA_COHORT_LAST_ACTIVITY_TIMESTAMP Epoch ms.

Ingestion metadata (generally ignore)

These are added by Airbyte and the source-file convention. Useful for debugging the pipeline, not for analysis.

Column Notes
EXTRACTED_AT Exception to "ignore" rule — this is the snapshot timestamp and the most useful time field in the table. Same value for every device in a given hourly batch.
_AIRBYTE_RAW_ID Airbyte's per-row unique ID.
_AIRBYTE_EXTRACTED_AT When Airbyte loaded the row into Snowflake. Slightly later than EXTRACTED_AT. Use EXTRACTED_AT for analysis.
_AIRBYTE_META JSON of Airbyte sync metadata.
_AIRBYTE_GENERATION_ID Airbyte generation counter.
_AB_SOURCE_FILE_URL S3 path of the source CSV for this row. Useful when reconciling against the raw export.
_AB_SOURCE_FILE_LAST_MODIFIED When the source CSV was last modified in S3.

Gotchas to keep in mind

Watch for duplicate-row inflation. Any aggregate (COUNT, SUM, AVG) across the whole table multiplies by the number of snapshots. For "how many devices do we have," you almost always want COUNT(DISTINCT ID) or to first reduce to one row per device via QUALIFY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY EXTRACTED_AT DESC) = 1.

Empty strings are not NULLs. The CSV ingestion produces empty strings ('') for missing values in some columns rather than proper NULLs. WHERE col IS NOT NULL won't filter these out — use NULLIF(col, '') or WHERE col <> ''.

Numeric-looking columns are sometimes strings. FW_STATE is the obvious one (mixes 0 with strings like DOWNLOADING), but inspect column types in Snowflake before assuming you can do math. Cast explicitly with TRY_CAST(col AS NUMBER) to fail safely.

0 often means "never reported," not literal zero. Especially for HEAP_*, UPTIME_S, WIFI_RSSI, and WORKER_WDT_REBOOT_COUNT. When analyzing these, consider filtering to rows where the corresponding *_LAST_ACTIVITY_TIMESTAMP is non-zero / non-empty.

ONLINE_STATUS is point-in-time and can flap. For reliable liveness, compute it from LASTACTIVITYTIME against a threshold (e.g., "active in the last 15 minutes") rather than trusting the boolean directly.

NAME is not a stable identifier. It can be edited in ThingsBoard. Use ID for joining across snapshots or to other tables.

Snapshot cadence isn't always exactly hourly. Pipeline delays, retries, and ThingsBoard-side hiccups can produce gaps or doubled snapshots. Don't assume COUNT(*) / COUNT(DISTINCT ID) = 24 per day exactly.