The BMS Went Silent
The BMS went silent.
CAN bus was alive. The ESP32 was transmitting. The dashboard was refreshing on schedule. And yet the battery had vanished — not physically, just from the system’s understanding of reality.
That was the night I learned something my degree never quite taught me: a connected device and an observable device are not the same thing.
For most of my degree, electrical and computer engineering felt fragmented. Electronics in one box, digital systems in another, embedded systems somewhere in between. I could pass exams, but I rarely felt like I was building anything connected. That changed when I started working on a battery charging station built around BMS telemetry, CAN communication, embedded control, and cloud monitoring. It forced everything I had studied into a single interacting system, and moments like that silent BMS were where the subjects finally clicked into each other.
From Battery to Cloud
The architecture is a layered pipeline:
JK-BMS > SN65HVD230 > ESP32 (CAN gateway) > Cloud > Dashboard
The ESP32 acts as a CAN interface node, extracting real-time telemetry from the battery management system: total pack voltage, individual cell voltages, charge and discharge current, MOSFET and battery temperatures, cycle count, capacity tracking, and cell imbalance indicators. What initially looked like simple telemetry quickly became something more interesting: a live system describing its own internal state in real time.
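To make the telemetry fields listed above concrete, here is a minimal sketch of one snapshot as a single record. It is written in Python for readability and is not the gateway firmware. The class name, field names, units, and the sign convention for current are all assumptions made for illustration, and the sample values are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BmsTelemetry:
    """One snapshot of pack state (hypothetical field names and units)."""
    pack_voltage_v: float
    cell_voltages_v: List[float]
    current_a: float              # assumed convention: positive = charging
    mosfet_temp_c: float
    battery_temp_c: float
    cycle_count: int
    remaining_capacity_ah: float

    @property
    def cell_imbalance_v(self) -> float:
        """Spread between the highest and lowest cell, in volts."""
        return max(self.cell_voltages_v) - min(self.cell_voltages_v)


# Invented sample values for a 4-cell pack, for illustration only.
snapshot = BmsTelemetry(
    pack_voltage_v=13.2,
    cell_voltages_v=[3.30, 3.31, 3.29, 3.30],
    current_a=5.0,
    mosfet_temp_c=31.0,
    battery_temp_c=27.5,
    cycle_count=42,
    remaining_capacity_ah=88.0,
)
print(f"imbalance: {snapshot.cell_imbalance_v * 1000:.0f} mV")
```

Keeping the snapshot in one typed record means every later layer reads the same names and units instead of re-parsing raw frames.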
The First Break: Sleep Behaviour
Back to that silent BMS.
The earliest system-level issue didn’t appear in code or in communication logic. It appeared in device state. The BMS entered a deep sleep when disconnected from charging or load, and once asleep, it stopped responding over CAN entirely. From the ESP32’s perspective the network still existed, but the device generating truth was offline.
This broke a core assumption: continuous observability.
The fix wasn’t a software patch. It was a reframing. Wake-up had to be treated as a system design problem, not a communication problem. The ESP32 had to emulate a valid external condition to pull the BMS back into an active state. That single bug exposed the gap between protocol connectivity and actual device state — a distinction I’d never had to make in any coursework.
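The gap between "the bus is alive" and "the BMS is observable" can be sketched as a freshness check on frames that actually come from the BMS. This is a minimal illustration in Python, not the station's firmware. The 2-second timeout, the class and function names, and the `request_wake` callback are hypothetical, and the callback stands in for whatever external condition the gateway emulates to wake the pack.

```python
from typing import Callable, Optional


class BmsWatchdog:
    """Tracks whether the BMS is observable, not merely whether the bus is up."""

    def __init__(self, timeout_s: float = 2.0):
        self.timeout_s = timeout_s                  # assumed staleness threshold
        self.last_frame_s: Optional[float] = None   # no BMS frame seen yet

    def on_bms_frame(self, now_s: float) -> None:
        """Call only when a frame from the BMS itself arrives."""
        self.last_frame_s = now_s

    def is_observable(self, now_s: float) -> bool:
        """True if the most recent BMS frame is newer than the timeout."""
        if self.last_frame_s is None:
            return False
        return (now_s - self.last_frame_s) <= self.timeout_s


def poll(watchdog: BmsWatchdog, now_s: float,
         request_wake: Callable[[], None]) -> bool:
    """Return True if the BMS is observable; otherwise ask for a wake-up."""
    if not watchdog.is_observable(now_s):
        request_wake()   # hypothetical hook for the wake-up mechanism
        return False
    return True
```

The point of the sketch is that observability is judged per device, from the age of its own data, rather than inferred from traffic on the bus.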
From Data to Control Logic
Once acquisition stabilised, a different kind of problem surfaced.
The system had information, but no structure for behaviour. Charging decisions were scattered across the codebase, fault handling was inconsistent, and state transitions were implicit — buried inside conditionals that worked until they didn’t.
It needed a formal structure.
Finite State Machines: Order to Behaviour
I redesigned the system around explicit states: Idle, Battery Validation, Charging, Monitoring, Fault Handling, and Completion. Each transition had defined conditions; each state had defined responsibilities.
This shifted the system from reactive logic to deterministic behaviour. Instead of asking “what should happen now?”, the system started operating on “what state am I in, and what transitions are valid?” Ambiguity dropped. System-level behaviour stabilised.
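The "what state am I in, and what transitions are valid?" question maps naturally onto a transition table. The sketch below uses the six states named above, but the edges between them are assumptions: the real conditions and allowed transitions may differ. It is written in Python for clarity, not as the embedded implementation.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    BATTERY_VALIDATION = auto()
    CHARGING = auto()
    MONITORING = auto()
    FAULT_HANDLING = auto()
    COMPLETION = auto()


# Assumed transition table: the states come from the text, the edges do not.
TRANSITIONS = {
    State.IDLE: {State.BATTERY_VALIDATION},
    State.BATTERY_VALIDATION: {State.CHARGING, State.FAULT_HANDLING, State.IDLE},
    State.CHARGING: {State.MONITORING, State.FAULT_HANDLING},
    State.MONITORING: {State.CHARGING, State.COMPLETION, State.FAULT_HANDLING},
    State.FAULT_HANDLING: {State.IDLE},
    State.COMPLETION: {State.IDLE},
}


class ChargerFsm:
    """Holds the current state and rejects transitions not in the table."""

    def __init__(self) -> None:
        self.state = State.IDLE

    def transition(self, target: State) -> None:
        """Move to target, or raise ValueError if the edge is not allowed."""
        if target not in TRANSITIONS[self.state]:
            raise ValueError(
                f"invalid transition {self.state.name} -> {target.name}"
            )
        self.state = target
```

With the table in one place, an illegal jump such as Charging straight to Idle fails loudly instead of hiding inside a conditional.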
An AI Layer for Battery Health
Once telemetry was stable, the next limitation became clear: the system could observe battery behaviour, but it couldn’t fully understand degradation. Voltage was visible. Current was visible. Temperature was visible. Health — the thing that actually mattered — wasn’t.
To close that gap, I introduced a Random Forest model for State of Health (SOH) estimation. The choice wasn’t arbitrary. Lithium Iron Phosphate degradation is stubbornly non-linear: voltage decay doesn’t follow a clean curve, thermal behaviour shifts with context, and cycle wear accumulates irregularly depending on how the pack has been used. Linear regression assumes the world behaves more politely than LFP chemistry actually does. Tree-based models, by contrast, handle messy interactions between features without forcing them into a shape they don’t fit, which made Random Forest a practical match for the kind of data the system was producing.
The features came directly from CAN telemetry: cell voltage variance as a key imbalance indicator, cycle count history, temperature gradients across the MOSFET and battery sensors, and charge/discharge current profiles. None of these measure degradation directly. They’re symptoms — the observable traces that internal chemistry leaves behind on the parts of the system you can actually see. That distinction is where the engineering sits. The model isn’t reading the battery’s health; it’s inferring it from the shadows the battery casts on the telemetry.
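As a rough illustration of how those symptoms could become model inputs, the sketch below turns one window of telemetry into a feature dictionary using only the Python standard library. The function name, feature names, and the specific summaries (population variance, a simple MOSFET-minus-battery temperature difference, mean and peak current) are assumptions, not the project's actual feature set.

```python
from statistics import mean, pvariance
from typing import Dict, Sequence


def soh_features(
    cell_voltages_v: Sequence[float],
    mosfet_temp_c: float,
    battery_temp_c: float,
    cycle_count: int,
    currents_a: Sequence[float],
) -> Dict[str, float]:
    """Summarise one telemetry window into hypothetical SOH model inputs."""
    return {
        # Spread across cells: a proxy for imbalance.
        "cell_voltage_variance": pvariance(cell_voltages_v),
        # Difference between the two temperature sensors.
        "temp_gradient_c": mosfet_temp_c - battery_temp_c,
        "cycle_count": float(cycle_count),
        # Current profile reduced to two simple summaries.
        "mean_current_a": mean(currents_a),
        "peak_current_a": max(abs(i) for i in currents_a),
    }
```

A row like this, paired with a measured or reference SOH label, is the kind of input a tree-based regressor such as scikit-learn's RandomForestRegressor expects. Where those labels come from is outside what this sketch assumes.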
Architecture Separation
The intelligence layer doesn’t live on the ESP32. It lives on a local PC.
The ESP32 handles CAN acquisition and transmission at the edge. The PC handles model training and inference. This separation exists because of computational constraints, but it also pays off architecturally: the embedded layer stays deterministic, and the intelligence layer stays flexible.
The Real Engineering Problem: State Consistency
Even after the FSM and AI integration, the hardest problem remained.
The system now had three representations of reality:
- Physical state — CAN telemetry
- Behavioural state — FSM logic
- Estimated state — AI SOH prediction
The real engineering challenge wasn’t building any one of them. It was keeping all three consistent and meaningful under live operating conditions, when the battery was actually charging, sleeping, drifting, or aging.
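One way to picture "keeping all three consistent" is a cross-check that compares the physical, behavioural, and estimated views against each other. The sketch below is a hypothetical Python illustration. The field names, thresholds, and the specific rules are assumptions chosen to show the idea, not the checks the station actually runs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SystemView:
    telemetry_age_s: float   # physical: seconds since the last BMS frame
    current_a: float         # physical: pack current, positive = charging (assumed)
    fsm_state: str           # behavioural: current FSM state name
    soh_estimate: float      # estimated: SOH as a fraction, 0.0 to 1.0
    soh_input_age_s: float   # age of the telemetry the estimate was based on


def consistency_issues(
    view: SystemView,
    max_telemetry_age_s: float = 2.0,      # assumed threshold
    max_soh_input_age_s: float = 60.0,     # assumed threshold
    min_charge_current_a: float = 0.1,     # assumed threshold
) -> List[str]:
    """Return human-readable disagreements between the three views."""
    issues = []
    if view.telemetry_age_s > max_telemetry_age_s:
        issues.append("physical state is stale")
    if view.fsm_state == "CHARGING" and view.current_a < min_charge_current_a:
        issues.append("FSM says CHARGING but no charge current is measured")
    if not 0.0 <= view.soh_estimate <= 1.0:
        issues.append("SOH estimate is out of range")
    if view.soh_input_age_s > max_soh_input_age_s:
        issues.append("SOH estimate was computed from stale telemetry")
    return issues
```

An empty list means the three views agree under these rules. A sleeping BMS would typically show up as stale telemetry combined with an FSM state that no longer matches the measured current.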
Key Insight
This project was not about components. It was about alignment.
That sentence is the closest thing to an answer I have for what my degree was actually teaching me. I just couldn’t see it until something I built forced the question.
Final Thought
This started as a charging station. It became an exercise in what happens when multiple engineering domains intersect inside a single live system, and what it takes to keep that system honest with itself.
Performance evaluation and field testing aren’t finalised yet. Those results — measured outcomes and system visuals — will land in a follow-up breakdown focused on implementation, benchmarks, and what the SOH model actually predicts in the wild.
The charging station works. The harder question is how well it understands itself, and that is the one I’m still answering.