This project builds an end-to-end observability pipeline that captures, streams, transforms, and visualizes live network traffic, serving as a monitoring and anomaly-detection system that gives operators visibility into network behavior.
Pipeline Architecture
The pipeline is composed of four decoupled stages, each handling a distinct concern in the data flow:
Capture
A Python agent using the Scapy library performs low-level packet capture directly off the network interface, producing a continuous stream of raw traffic events.
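A minimal sketch of such an agent. The interface name, field set, and JSON-to-stdout output are illustrative assumptions; the normalization helper is kept pure so it can run without Scapy or capture privileges:

```python
import json
import time

def packet_to_event(src, dst, proto, size):
    """Normalize raw packet fields into a flat traffic-event record."""
    return {"ts": time.time(), "src": src, "dst": dst,
            "proto": proto, "bytes": size}

def start_capture(iface="eth0"):
    # Deferred import: sniffing requires Scapy and usually root privileges.
    from scapy.all import sniff, IP

    def handle(pkt):
        if IP in pkt:
            event = packet_to_event(pkt[IP].src, pkt[IP].dst,
                                    pkt[IP].proto, len(pkt))
            print(json.dumps(event))

    # store=False keeps memory flat during a long-running capture.
    sniff(iface=iface, prn=handle, store=False)
```

Calling `start_capture()` emits one JSON event per IP packet; in the real pipeline those events feed the Kafka producer instead of stdout.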
Stream
Captured packets are published to a Kafka topic via a producer, decoupling ingestion from processing and providing a durable, replayable buffer. Kafka absorbs bursts when consumers fall behind and lets multiple downstream consumers read the same stream independently.
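A producer along these lines could publish the capture stream, sketched here with the kafka-python client; the broker address, topic name, and tuning values are assumptions:

```python
import json

def serialize_event(event: dict) -> bytes:
    """JSON-encode a traffic event for publication to Kafka."""
    return json.dumps(event, sort_keys=True).encode("utf-8")

def make_producer(servers="localhost:9092"):
    # Deferred import: requires the kafka-python package.
    from kafka import KafkaProducer
    return KafkaProducer(
        bootstrap_servers=servers,
        value_serializer=serialize_event,
        acks="all",    # wait for full replication: durability over latency
        linger_ms=50,  # small batching window to raise throughput
    )

# Usage (topic name is an assumption):
# producer = make_producer()
# producer.send("network-traffic",
#               {"src": "10.0.0.1", "dst": "10.0.0.2", "proto": 6, "bytes": 60})
```

`acks="all"` trades a little latency for the durability the replayable-buffer role depends on.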
Transform & Store
A Kafka consumer reads from the topic, extracts structured fields — source/destination hosts, protocols, GeoIP metadata, packet sizes — and writes the normalized records into PostgreSQL. The schema is optimized for time-series aggregation queries used by the reporting layer.
Visualize
Power BI connects directly to PostgreSQL to surface a live operational dashboard tracking key traffic metrics: packet rates, unique hosts, protocol distribution, transferred data volume, and geographic origin of traffic.
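The kind of rollup such a dashboard might run against the store, sketched as a PostgreSQL query (table and column names are assumptions carried over from the transform stage):

```sql
-- Protocol distribution over the last 15 minutes (illustrative schema).
SELECT proto,
       COUNT(*)   AS packets,
       SUM(bytes) AS total_bytes
FROM traffic_events
WHERE ts >= NOW() - INTERVAL '15 minutes'
GROUP BY proto
ORDER BY packets DESC;
```

Exposing rollups like this as views keeps the dashboard queries simple and lets the database do the time-series aggregation.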
The Pipeline
This architecture reflects a pattern common to production observability stacks: a lightweight collector, a durable message bus, a transformation layer, a queryable store, and a visualization layer — each independently scalable and replaceable.
Dashboard
The Power BI report provides operators with real-time visibility into:
- Total packets captured and throughput trends
- Unique hosts communicating on the network
- GeoIP mapping of traffic origins
- Protocol distribution breakdown
- Aggregate data transfer volumes

Conclusion
This project shows how to build a robust, scalable observability pipeline for network traffic analysis using open-source tools.
By decoupling each stage, the architecture allows for flexibility in scaling and evolving components independently — a critical design principle for production monitoring systems.
The resulting dashboard provides actionable insights into network behavior, enabling proactive detection of anomalies and informed operational decisions.
Have a nice day ^_^
