Building an LLM-Powered SIEM for Your Homelab

Most small teams running Kubernetes don't have security monitoring on their logs. And the reason is straightforward: traditional SIEMs like Splunk or Elastic SIEM are built for organizations that can afford six-figure contracts and staff dedicated to writing detection rules. If you're a startup with your first production cluster, a platform team running dev environments, or a lean ops group that can't justify that spend, your logs are probably flowing into Loki and sitting there unanalyzed.

What if a language model could do the analysis for you?

Sentinel is a ~550-line Python application that classifies Kubernetes cluster logs using an LLM in real time. You point it at a Loki instance and an Ollama-compatible endpoint. That's it — no SIEM subscriptions, no per-GB ingestion fees, no rule libraries to maintain. It works anywhere you have those two things running.

How It Works

The architecture is intentionally minimal. Every 60 seconds, Sentinel runs through a four-step cycle:

Queries Loki for log entries across configurable cluster sources (ArgoCD, kube-system, cert-manager, Cloudflare tunnels, application namespaces)
Sends log batches to Ollama with source-specific context about what each component does and what threats look like
Classifies each batch as SAFE, OPERATIONAL, SUSPICIOUS, or CRITICAL
Fires alerts through Alertmanager with cooldown logic to prevent alert fatigue

Why does this matter? An LLM can distinguish between a misconfiguration and an attack in ways that regex-based rules fundamentally cannot. Consider a 403 response. A traditional SIEM either alerts on every 403 (noisy) or ignores them all (blind). Sentinel evaluates the same 403 in context — a web crawler getting blocked by a news site is routine, but an unknown IP hitting 403s against your ArgoCD dashboard is a completely different signal.

The Prompt Is the Rule Engine

Sentinel replaces Sigma rules and Splunk queries with a structured system prompt that defines the classification taxonomy:

system prompt
VERDICTS:
- SAFE: Normal operational noise, routine errors, expected behavior
- OPERATIONAL: System outages, internal errors -- not attacks, but degraded
- SUSPICIOUS: Potential security threats, auth failures from unusual sources
- CRITICAL: Definite security breach, unauthorized access, active attack

Every log source carries its own context block — a short description of what the component does and what to watch for. For example, Sentinel knows that ArgoCD deploying workloads is expected behavior. However, an unknown user creating an ArgoCD application is a red flag worth escalating.

sentinel.py
{
    "name": "argo-security",
    "query": '{namespace="argo-system"} |~ "(?i)(denied|unauthorized|forbidden|...)"',
    "context": "ArgoCD GitOps controller. Can deploy arbitrary workloads to the cluster. "
               "Watch for: unauthorized access, unexpected app creation/deletion, "
               "sync failures from unknown sources, RBAC violations.",
    "priority": "critical",
}

There's also a "known noise" section in the prompt — essentially a tuning whitelist for the LLM. CSI driver reconciliation errors, Bitwarden rate limits, transient DNS timeouts: all explicitly marked as SAFE so the model doesn't fire alerts on normal infrastructure chatter.

Persistence-Based Alerting

Classifying a single cycle of logs isn't enough to act on. A rolling deployment that throws errors for 30 seconds shouldn't page you at 3am, and Sentinel's persistence model ensures it won't. Verdict history is tracked in a sliding window, and alerts only fire when patterns hold across multiple cycles:

CRITICAL alerts fire immediately (but still respect cooldown)
SUSPICIOUS verdicts require consecutive confirmations before alerting — a single anomalous cycle gets logged and watched, not alerted
OPERATIONAL issues need sustained failures across multiple cycles before they're worth waking up for

In practice, this eliminated the most common false positive scenario during early testing: transient errors from pod restarts and deployments that would resolve on their own within a cycle or two.

sentinel.py
def is_persistent_suspicious(self, source: str, threshold: int) -> bool:
    """Check if we have consecutive SUSPICIOUS verdicts
    (avoid single-cycle false positives)."""
    recent = self.get_recent(source, threshold)
    if len(recent) < threshold:
        return False
    return all(d["verdict"] in ("SUSPICIOUS", "CRITICAL") for d in recent)

DNS Security Monitoring

Sentinel also monitors NextDNS analytics for two categories of DNS-layer threats, going beyond what cluster logs alone can tell you.

DNS bypass detection watches for abnormally high block counts in short windows. A sudden spike in blocked queries might indicate a compromised device trying to reach C2 infrastructure, or an endpoint brute-forcing its way past DNS-layer controls. Either way, you want to know about it quickly.

Content policy monitoring checks for blocked queries matching configurable categories. When a device on your network hits a blocked domain, Sentinel fires a CRITICAL alert with the device name, IP, and domain. It's smart enough to skip cluster IPs (pods and services making DNS queries as part of normal operation), so you don't get flooded with false positives from your own workloads.

sentinel.py
# Skip cluster IPs (pods, nodes, services making DNS queries)
if any(device_ip.startswith(prefix) for prefix in PARENTAL_IGNORE_IPS):
    log.debug("[parental] Skipping cluster IP %s (%s) for %s",
              device_ip, device_name, domain)
    continue

Running It

Sentinel ships as a single Docker container. It needs three dependencies, and if you're already running a Kubernetes observability stack, you likely have most of them:

Loki for log aggregation (you're probably already running this)
Ollama with a capable model (qwen3 family works well — the 32B quantized model balances accuracy with inference speed)
Alertmanager for alert routing (ships with kube-prometheus-stack)

The resource footprint is trivial: 64Mi memory requested, 128Mi limit. All the expensive LLM inference happens on your Ollama instance, so Sentinel itself is just an orchestrator making HTTP calls on a timer.

A Helm chart handles the Kubernetes deployment and wires up environment variables for service URLs, polling intervals, cooldown periods, and sensitivity thresholds. Secrets (like the NextDNS API key) are injected via Bitwarden's CSI driver — the same pattern I wrote about in the LiteLLM post.

Observability

An LLM-based system needs its own monitoring — you can't just trust that inference is working correctly and move on. Sentinel exposes Prometheus metrics for poll cycle duration, inference latency, verdict distribution, alert counts (sent vs. suppressed), Loki errors, and DNS block rates. A bundled Grafana dashboard puts the security posture of your cluster on a single screen.

Why does this matter in practice? If inference latency spikes from 5 seconds to 60 seconds, you have a detection gap. If one source keeps returning OPERATIONAL verdicts for hours, that's a reliability problem a human should look at. The metrics make both of those situations visible before they become blind spots.

Tradeoffs and Limitations

There are real tradeoffs here, and I think it's important to be direct about them:

Inference latency means Sentinel isn't real-time. With a 60-second poll interval plus inference time, you're looking at 1–2 minute detection windows. Adequate for most environments; not sufficient for a high-throughput SOC.
LLM consistency can vary. Temperature is set to 0.1 for determinism, but the same log batch can occasionally produce different verdicts across runs. The persistence model mitigates this.
No correlation across sources. Each log source is classified independently. Sentinel won't connect a cert-manager error with a simultaneous ArgoCD deployment the way a human analyst would.
Model quality matters. Smaller models produce more false positives. The jump from 7B to 32B parameters noticeably improved classification accuracy in testing.

Why This Matters

The economics of security tooling are shifting. The traditional SIEM model — pay per GB ingested, hire analysts to write rules, maintain detection logic as your environment evolves — doesn't work for small teams. In fact, it barely works for large ones.

An LLM-based approach inverts the cost structure entirely. Instead of encoding every possible threat pattern as a rule, you describe what your systems do and what "suspicious" means in your specific context. The model handles the pattern matching from there. When you add a new service, you add a paragraph of context to a config — not a library of detection rules that someone has to maintain indefinitely.

We're seeing this same pattern across infrastructure tooling: rigid, rule-based systems giving way to context-aware ones that adapt without manual intervention. Security monitoring is a particularly good fit for this shift because the signal-to-noise problem is so severe. Most log lines are noise. The hard part has always been figuring out which ones aren't.

If you're running Kubernetes with Loki and don't have dedicated security monitoring, give Sentinel a look. It's open source, it deploys in minutes with Helm, and it will catch patterns your static Grafana alerts were never designed to find.

Kevin Crawley

Infrastructure engineer, crawley.systems. 20+ years building and operating production systems.

Building an LLM-Powered SIEM That Fits in a Single File