Thingsboard
name: Tin Can Thingsboard data
description: Thingsboard device telemetry history data is stored in the RAW_DB.THINGSBOARD.THINGBSBOARD_REPORTS table.
This table contains 1 row per device (name) per historical import snapshot. For current state reporting, only the
most recent historical import snapshot should be used, to avoid overstating device data. The full historical import
snapshot can be used to trend data over time, but the default should always be to use the most recent snapshot data.
What this table is
A historical record of device state pulled from ThingsBoard. Every hour, an export of all devices is written to S3 (one CSV per snapshot) and loaded by Airbyte into this table. Each row captures the full state of one device at the moment of that hourly export.
This means:
- The same device (
ID) appears in roughly 24 rows per day — one per hourly snapshot. - "Current state" of a device = the most recent row for that
ID. - "History" of a device = all rows for that
ID, ordered byEXTRACTED_AT. - The table is append-only; old snapshots are never updated or deleted by the pipeline.
Two timestamps, two different meanings
This is the single most important thing to understand before querying.
EXTRACTED_AT is when we pulled the snapshot from ThingsBoard. It's a proper TIMESTAMP and is the same for every device in a given hourly batch. Use this when you want to know "what did the fleet look like at time X."
<column>_LAST_ACTIVITY_TIMESTAMP is when ThingsBoard last received that specific telemetry value from the device. Stored as Unix epoch milliseconds (e.g., 1761706509464), not a timestamp type — you'll need to convert. Use these when you want to know "when did this device last report its WiFi RSSI / firmware version / etc."
A device that's offline will keep appearing in new hourly snapshots with stale *_LAST_ACTIVITY_TIMESTAMP values. The snapshot time keeps advancing; the activity timestamps don't. The gap between them is how you measure staleness per-metric.
Converting epoch milliseconds in Snowflake:
Columns by category
Identity
| Column | Notes |
|---|---|
ID |
ThingsBoard device UUID. Stable across snapshots — this is the join key for tracking a single device over time. |
NAME |
Device name as set in ThingsBoard. Often a MAC address. Can change if renamed in ThingsBoard, so prefer ID for joins. |
DEVICE_PROFILE |
The ThingsBoard device profile assigned to the device (e.g., Tin Can Devices (Certificate), Tin Can Devices (Customer Activated)). Useful for segmenting fleet by provisioning state. |
BETA_COHORT |
Cohort flag for beta program participation. Mostly 0. |
Firmware
| Column | Notes |
|---|---|
FIRMWARE_VERSION |
Version string the device reports. Empty / 0 if never reported. |
FW_STATE |
Firmware update state. Mixed values: 0 (idle/unknown), DOWNLOADING, and other states the device emits during OTA. Treat as a string. |
FIRMWARE_VERSION_LAST_UPDATED_TIME |
Epoch ms — last time the device reported its firmware version. |
FW_STATE_LAST_ACTIVITY_TIMESTAMP |
Epoch ms — last time FW_STATE was updated. |
Network & connectivity
| Column | Notes |
|---|---|
ONLINE_STATUS |
Boolean (true / false) as ThingsBoard saw it at snapshot time. Devices can flap; use LASTACTIVITYTIME for a more reliable "is this device alive" check. |
WIFI_RSSI |
WiFi signal strength (dBm, negative number — closer to 0 is stronger). 0 typically means never reported. |
LASTACTIVITYTIME |
Epoch ms — last time ThingsBoard received any message from the device. The most useful "is this device alive" field. |
ONLINE_STATUS_LAST_ACTIVITY_TIMESTAMP |
Epoch ms — last time online status flipped. |
WIFI_RSSI_LAST_ACTIVITY_TIMESTAMP |
Epoch ms — last RSSI report. |
LASTACTIVITYTIME_LAST_ACTIVITY_TIMESTAMP |
Epoch ms — meta-timestamp on when LASTACTIVITYTIME itself was updated. Rarely needed; LASTACTIVITYTIME is usually what you want. |
SIP (call functionality)
| Column | Notes |
|---|---|
SIP_STATUS |
SIP registration status. |
SIP_RETRY_COUNT |
Number of SIP registration retries. High values suggest a device struggling to register. |
SIPSERVERCERT |
SIP server certificate state/value. |
SIP_STATUS_LAST_ACTIVITY_TIMESTAMP |
Epoch ms. |
SIP_RETRY_COUNT_LAST_ACTIVITY_TIMESTAMP |
Epoch ms. |
SIPSERVERCERT_LAST_ACTIVITY_TIMESTAMP |
Epoch ms. |
Device health (heap & uptime)
These come from the device's runtime. All numeric; 0 typically means never reported.
| Column | Notes |
|---|---|
UPTIME_S |
Seconds since device boot. Resets to a low number after a reboot — useful for detecting reboots in history. |
HEAP_FREE |
Bytes of free heap memory. |
HEAP_MIN |
Minimum free heap observed (low-water mark). Lower = closer to OOM. |
HEAP_LARGEST_BLOCK |
Largest contiguous free heap block. Low values relative to HEAP_FREE indicate fragmentation. |
WORKER_WDT_REBOOT_COUNT |
Count of watchdog-triggered reboots. Non-zero values are worth investigating. |
UPTIME_S_LAST_ACTIVITY_TIMESTAMP |
Epoch ms. |
HEAP_FREE_LAST_ACTIVITY_TIMESTAMP |
Epoch ms. |
HEAP_MIN_LAST_ACTIVITY_TIMESTAMP |
Epoch ms. |
HEAP_LARGEST_BLOCK_LAST_ACTIVITY_TIMESTAMP |
Epoch ms. |
WORKER_WDT_REBOOT_COUNT_LAST_ACTIVITY_TIMESTAMP |
Epoch ms. |
BETA_COHORT_LAST_ACTIVITY_TIMESTAMP |
Epoch ms. |
Ingestion metadata (generally ignore)
These are added by Airbyte and the source-file convention. Useful for debugging the pipeline, not for analysis.
| Column | Notes |
|---|---|
EXTRACTED_AT |
Exception to "ignore" rule — this is the snapshot timestamp and the most useful time field in the table. Same value for every device in a given hourly batch. |
_AIRBYTE_RAW_ID |
Airbyte's per-row unique ID. |
_AIRBYTE_EXTRACTED_AT |
When Airbyte loaded the row into Snowflake. Slightly later than EXTRACTED_AT. Use EXTRACTED_AT for analysis. |
_AIRBYTE_META |
JSON of Airbyte sync metadata. |
_AIRBYTE_GENERATION_ID |
Airbyte generation counter. |
_AB_SOURCE_FILE_URL |
S3 path of the source CSV for this row. Useful when reconciling against the raw export. |
_AB_SOURCE_FILE_LAST_MODIFIED |
When the source CSV was last modified in S3. |
Gotchas to keep in mind
Watch for duplicate-row inflation. Any aggregate (COUNT, SUM, AVG) across the whole table multiplies by the number of snapshots. For "how many devices do we have," you almost always want COUNT(DISTINCT ID) or to first reduce to one row per device via QUALIFY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY EXTRACTED_AT DESC) = 1.
Empty strings are not NULLs. The CSV ingestion produces empty strings ('') for missing values in some columns rather than proper NULLs. WHERE col IS NOT NULL won't filter these out — use NULLIF(col, '') or WHERE col <> ''.
Numeric-looking columns are sometimes strings. FW_STATE is the obvious one (mixes 0 with strings like DOWNLOADING), but inspect column types in Snowflake before assuming you can do math. Cast explicitly with TRY_CAST(col AS NUMBER) to fail safely.
0 often means "never reported," not literal zero. Especially for HEAP_*, UPTIME_S, WIFI_RSSI, and WORKER_WDT_REBOOT_COUNT. When analyzing these, consider filtering to rows where the corresponding *_LAST_ACTIVITY_TIMESTAMP is non-zero / non-empty.
ONLINE_STATUS is point-in-time and can flap. For reliable liveness, compute it from LASTACTIVITYTIME against a threshold (e.g., "active in the last 15 minutes") rather than trusting the boolean directly.
NAME is not a stable identifier. It can be edited in ThingsBoard. Use ID for joining across snapshots or to other tables.
Snapshot cadence isn't always exactly hourly. Pipeline delays, retries, and ThingsBoard-side hiccups can produce gaps or doubled snapshots. Don't assume COUNT(*) / COUNT(DISTINCT ID) = 24 per day exactly.