Sensor Monitoring System

A comprehensive monitoring and alerting solution for containerized sensor applications with automated fault detection and response capabilities.

Overview

This system monitors a sensor binary that outputs three values per second:

  • Sine-like value: Continuous floating-point signal
  • Random integer: Random numerical data
  • Counter: Incrementing sequence number

The system detects fault states (output: 0 0 0) and responds automatically: it logs each fault and restarts the sensor or exporter container when one goes offline or stalls.
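
As a rough illustration of the output format described above, the sketch below parses one line of sensor output and flags the fault state. It assumes the three values are whitespace-separated in the order listed, and that a healthy reading is never all zeros. The names Reading and parse_line are hypothetical and are not taken from the project's exporter.

```python
from typing import NamedTuple, Optional


class Reading(NamedTuple):
    sine: float
    random_value: int
    counter: int

    @property
    def is_fault(self) -> bool:
        # The fault state is the literal output "0 0 0".
        return self.sine == 0 and self.random_value == 0 and self.counter == 0


def parse_line(line: str) -> Optional[Reading]:
    """Parse 'sine random counter'; return None for malformed lines."""
    parts = line.split()
    if len(parts) != 3:
        return None
    try:
        return Reading(float(parts[0]), int(parts[1]), int(parts[2]))
    except ValueError:
        return None
```

Returning None for malformed lines keeps stray log noise from being counted as either a reading or a fault.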

Architecture

Core Components

  • Sensor Application: Containerized binary producing telemetry data
  • Data Collection: Python exporter parsing sensor output into Prometheus metrics
  • Time-Series Storage: Prometheus for metrics collection and querying
  • Visualization: Grafana dashboards for real-time monitoring
  • Alerting: AlertManager with custom alert rules for fault detection
  • Automation: Alert handler service for automated incident response
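
To make the Data Collection step more concrete, here is a minimal sketch of rendering parsed values in the Prometheus text exposition format using only the standard library. Only sensor_sine_value appears elsewhere in this README. The other metric names, the help strings, and the render_metrics function are assumptions for illustration, and the project's exporter may well use a client library instead.

```python
from typing import Dict, Tuple

# Metric name -> (type, help text). Only sensor_sine_value is named in this
# README; the remaining names are hypothetical.
METRICS: Dict[str, Tuple[str, str]] = {
    "sensor_sine_value": ("gauge", "Latest sine-like value"),
    "sensor_random_value": ("gauge", "Latest random integer"),
    "sensor_counter_value": ("gauge", "Latest counter value"),
    "sensor_faults_total": ("counter", "Number of 0 0 0 readings seen"),
    "sensor_readings_total": ("counter", "Number of readings parsed"),
}


def render_metrics(values: Dict[str, float]) -> str:
    """Render the known metrics present in `values` as exposition text."""
    lines = []
    for name, (metric_type, help_text) in METRICS.items():
        if name not in values:
            continue
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {values[name]}")
    return "\n".join(lines) + "\n"
```

Serving this string at /metrics with content type text/plain is enough for Prometheus to scrape it.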

Key Features

  • Fault Detection: Real-time identification of 0 0 0 fault patterns
  • Automated Recovery: Container restart and service healing capabilities
  • Comprehensive Alerting: 4 alert rules covering realistic failure scenarios
  • Dashboard Visualization: 9 panels showing metrics, trends, and system health
  • Incident Logging: Automated logging of faults and recovery actions

Data Flow

  1. sensor-app container runs the sensor binary and writes to its own stdout
  2. Docker captures sensor-app's stdout and stores it as container logs
  3. sensor-exporter runs docker logs --follow sensor-app as a subprocess
  4. sensor-exporter reads the subprocess's stdout (which carries sensor-app's log lines) and parses each line into Prometheus metrics

Visual Flow

┌─────────────┐    stdout    ┌──────────────┐    container   ┌─────────────┐
│ sensor      │─────────────▶│ sensor-app   │────logs───────▶│ Docker      │
│ binary      │              │ container    │                │ daemon      │
└─────────────┘              └──────────────┘                └─────────────┘
┌─────────────┐    stdout    ┌──────────────┐    subprocess  ┌──────▼──────┐
│ sensor-     │◀─────────────│ docker logs  │◀───────────────│ docker      │
│ exporter    │              │ --follow     │                │ logs API    │
│ (parser)    │              │ sensor-app   │                │             │
└─────────────┘              └──────────────┘                └─────────────┘
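
The flow above could be sketched in Python roughly as follows. This is an assumption about how the exporter might follow the logs, not a copy of sensor_exporter.py. The follow_lines helper is hypothetical, and it takes the command as a parameter so it can be exercised without Docker.

```python
import subprocess
from typing import Iterator, Sequence

# The command named in step 3 of the data flow.
DOCKER_LOGS_CMD = ("docker", "logs", "--follow", "sensor-app")


def follow_lines(cmd: Sequence[str] = DOCKER_LOGS_CMD) -> Iterator[str]:
    """Yield stripped, non-empty stdout lines from `cmd` as they arrive."""
    with subprocess.Popen(
        list(cmd),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
        bufsize=1,  # line-buffered reads on our side of the pipe
    ) as proc:
        try:
            assert proc.stdout is not None
            for line in proc.stdout:
                line = line.strip()
                if line:
                    yield line
        finally:
            # Stop the follower if the consumer stops iterating early.
            proc.terminate()
```

The loop ends when the subprocess exits, for example when the sensor-app container is removed, so a long-running exporter would typically wrap this call in a retry loop.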

Quick Start

Prerequisites

  • Docker and Docker Compose installed
  • Ports 3000, 5001, 8080, 9090, 9093 available

Deployment

# Clone or extract the project
cd ea-test/

# Start the complete monitoring stack
docker compose up -d --build

# Verify all services are running
docker compose ps

Access Points

  • Grafana Dashboard: http://localhost:3000 (admin/admin)
  • Prometheus: http://localhost:9090
  • AlertManager: http://localhost:9093
  • Sensor Metrics: http://localhost:8080/metrics
  • Alert Handler: http://localhost:5001

Monitoring Features

Dashboard Panels

  1. Sensor Sine Value: Time-series plot of sine-wave signal
  2. Sensor Random Value: Random integer visualization
  3. Sensor Counter Value: Incremental counter tracking
  4. Fault Count: Total number of detected faults
  5. Activity Rates: Fault and reading rates per minute
  6. Sensor Status: Online/offline indicator
  7. Last Reading Time: Time since last data point
  8. Fault Percentage: Percentage of readings that are faults
  9. Total Readings: Cumulative reading count

Alert Rules

  1. SensorFaultDetected: Immediate alert on any 0 0 0 output
  2. SensorOffline: Alert when no data received for 30+ seconds
  3. SensorCounterStalled: Alert when counter stops incrementing
  4. SensorReadingStalled: Critical alert when no new readings arrive (exporter down)
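
The actual rules live in config/alert_rules.yml as Prometheus expressions. Purely to illustrate the conditions described above, the sketch below mirrors two of them in Python. The function names are hypothetical, and the 30-second threshold is taken from the SensorOffline description.

```python
import time
from typing import Optional

OFFLINE_AFTER_SECONDS = 30  # from the SensorOffline rule description


def sensor_offline(last_reading_ts: float, now: Optional[float] = None) -> bool:
    """True when no reading has arrived for at least OFFLINE_AFTER_SECONDS."""
    current = time.time() if now is None else now
    return current - last_reading_ts >= OFFLINE_AFTER_SECONDS


def counter_stalled(previous: int, current: int) -> bool:
    """True when the counter did not change between two consecutive samples."""
    return current == previous
```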

Automated Responses

  • Sensor Offline (sensor-app down): Automatically restart the sensor-app container
  • Reading Stalled (exporter down): Automatically restart the sensor-exporter service
  • Fault Detection (0 0 0 readings): Log individual faults to /tmp/fault_incidents.log
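
One way the mapping from alerts to responses could look is sketched below. It assumes the alert handler receives AlertManager's standard webhook payload (a JSON object with an "alerts" list whose entries carry "status" and "labels"). The plan_actions function and the docker restart command lists are illustrative assumptions, not the contents of alert_handler.py.

```python
from typing import Any, Dict, List

# Alert name -> command, following the responses listed above. How the
# restarts are actually issued is an assumption.
RESTART_COMMANDS: Dict[str, List[str]] = {
    "SensorOffline": ["docker", "restart", "sensor-app"],
    "SensorReadingStalled": ["docker", "restart", "sensor-exporter"],
}
FAULT_LOG = "/tmp/fault_incidents.log"


def plan_actions(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Decide what to do for each firing alert, without executing anything."""
    planned: List[Dict[str, Any]] = []
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue  # ignore "resolved" notifications
        name = alert.get("labels", {}).get("alertname", "")
        if name in RESTART_COMMANDS:
            planned.append({"alert": name, "run": RESTART_COMMANDS[name]})
        elif name == "SensorFaultDetected":
            planned.append({"alert": name, "log_to": FAULT_LOG})
    return planned
```

Separating the decision from its execution makes the mapping easy to test. A handler would then pass each "run" entry to subprocess.run and append a line to the log file for each "log_to" entry.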

Management Commands

Service Management

# View service status
docker compose ps

# View logs
docker compose logs sensor-exporter
docker compose logs alert-handler

# Restart specific service
docker compose restart sensor-app

# Stop all services
docker compose down

# Update and restart
docker compose down && docker compose up -d --build

Monitoring Commands

# Check sensor metrics exposed by sensor-exporter
curl http://localhost:8080/metrics | grep sensor_

# Test alert handler
curl http://localhost:5001/

# Query Prometheus
curl "http://localhost:9090/api/v1/query?query=sensor_sine_value"

# Check alert rules
curl "http://localhost:9090/api/v1/rules"

Debugging

# Sample raw sensor output for 5 seconds (starts a second instance of the binary inside the container)
docker exec sensor-app timeout 5 /app/sensor

# View alert logs
docker exec alert-handler tail -f /tmp/alert_handler.log

# Check fault incidents
docker exec alert-handler cat /tmp/fault_incidents.log

Configuration

File Structure

├── bin/sensor                      # Sensor binary
├── docker-compose.yml              # Service orchestration
├── Dockerfile                      # Sensor container
├── Dockerfile.exporter             # Metrics exporter
├── Dockerfile.alert-handler        # Alert automation
├── sensor_exporter.py              # Metrics collection service
├── alert_handler.py                # Automated response system
├── config/
│   ├── prometheus.yml              # Metrics collection config
│   ├── alert_rules.yml             # Alerting rules
│   ├── alertmanager.yml            # Alert routing config
│   └── grafana/
│       ├── provisioning/           # Auto-provisioning
│       └── dashboards/             # Dashboard definitions
└── README.md                       # This documentation

Customization

  • Alert Thresholds: Edit config/alert_rules.yml
  • Dashboard Layout: Modify config/grafana/dashboards/sensor-dashboard.json
  • Retention Period: Update Prometheus retention in docker-compose.yml
  • Notification Channels: Configure AlertManager in config/alertmanager.yml

Troubleshooting

Common Issues

  1. Port Conflicts: Ensure ports 3000, 5001, 8080, 9090, 9093 are free
  2. Container Startup: Check docker compose logs <service> for errors
  3. Metrics Not Appearing: Verify sensor-exporter is reading data correctly
  4. Alerts Not Firing: Check Prometheus rules at http://localhost:9090/rules

Health Checks

# Verify all endpoints
curl http://localhost:8080/health    # Sensor exporter
curl http://localhost:3000/api/health # Grafana
curl http://localhost:5001/          # Alert handler
curl http://localhost:9090/-/healthy  # Prometheus
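
The same checks can be run from Python, for instance in a smoke test after deployment. This is a minimal sketch using the endpoints listed above. The http_status and check_all helpers are hypothetical, and treating only HTTP 200 as healthy is an assumption.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

ENDPOINTS: Dict[str, str] = {
    "sensor-exporter": "http://localhost:8080/health",
    "grafana": "http://localhost:3000/api/health",
    "alert-handler": "http://localhost:5001/",
    "prometheus": "http://localhost:9090/-/healthy",
}


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code, or 0 if the endpoint is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, DNS failure, timeout, ...


def check_all(fetch: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Map each service name to True when its endpoint answers with HTTP 200."""
    return {name: fetch(url) == 200 for name, url in ENDPOINTS.items()}
```

Calling check_all() with no arguments probes the live stack. Passing a custom fetch function allows the logic to be tested offline.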

Recovery Procedures

  • Complete System Reset: docker compose down -v && docker compose up -d --build
  • Data Reset: Remove volumes to clear all stored data
  • Service Recovery: Restarting an individual service leaves the other components running

Technology Choices

  • Prometheus: Industry standard for time-series metrics
  • Grafana: Rich visualization and dashboarding capabilities
  • Python: Robust ecosystem for data processing and automation
  • Docker: Consistent deployment across environments
  • AlertManager: Flexible routing and notification management

Created: 25 April 2026
Last updated: 25 April 2026