Network Traffic Visualization Data Pipeline


This project builds an end-to-end observability pipeline that captures, streams, transforms, and visualizes live network traffic. It is designed as a monitoring and anomaly-detection system that gives operators visibility into network behavior.

Pipeline Architecture

The pipeline is composed of four decoupled stages, each handling a distinct concern in the data flow:

Capture

A Python agent using the Scapy library performs low-level packet capture directly off the network interface, producing a continuous stream of raw traffic events.
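As a rough illustration of the capture stage, the sketch below turns each sniffed IP packet into a flat event dictionary. The field names (`ts`, `src`, `dst`, `protocol`, `length`) and the helper names `build_event` and `run_capture` are assumptions made for this example, not the project's actual code. Sniffing normally requires elevated privileges.

```python
import time

# IP protocol numbers mapped to readable names (IANA assigned numbers).
PROTO_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}


def build_event(src, dst, proto_num, length, ts=None):
    """Build a flat, JSON-serializable traffic event (hypothetical schema)."""
    return {
        "ts": time.time() if ts is None else ts,
        "src": src,
        "dst": dst,
        # Unknown protocol numbers are kept as their numeric string.
        "protocol": PROTO_NAMES.get(proto_num, str(proto_num)),
        "length": length,
    }


def run_capture(handle_event, iface=None, count=0):
    """Sniff packets and pass one event per IP packet to handle_event."""
    # Imported here so the module can be loaded without Scapy installed.
    from scapy.all import IP, sniff

    def on_packet(pkt):
        if IP in pkt:  # skip non-IP frames such as ARP
            ip = pkt[IP]
            handle_event(
                build_event(ip.src, ip.dst, ip.proto, len(pkt), float(pkt.time))
            )

    # store=False avoids accumulating packets in memory; count=0 means no limit.
    sniff(iface=iface, prn=on_packet, store=False, count=count)
```

Calling `run_capture(print, count=10)` would print ten events from the default interface. In the pipeline, `handle_event` would be the function that publishes to Kafka.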

Stream

Captured packets are published to a Kafka topic via a producer, decoupling ingestion from processing and providing a durable, replayable buffer. Because Kafka persists events on disk, a slow or restarting consumer does not stall capture, and multiple downstream consumers can read the same topic independently.
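A minimal sketch of the publishing side, assuming the kafka-python client and JSON-encoded events, might look like this. The topic name `network-packets`, the broker address, and the choice to key messages by source host are illustrative assumptions, not details from the project.

```python
import json

TOPIC = "network-packets"  # hypothetical topic name


def serialize_event(event):
    """Encode an event dict as compact UTF-8 JSON bytes."""
    return json.dumps(event, separators=(",", ":")).encode("utf-8")


def partition_key(event):
    """Key by source host so one host's events stay ordered in a partition."""
    return event["src"].encode("utf-8")


def make_producer(bootstrap_servers="localhost:9092"):
    """Create a producer; the broker address is an assumption."""
    # Imported here so the helpers above work without kafka-python installed.
    from kafka import KafkaProducer

    return KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        value_serializer=serialize_event,
        acks="all",    # wait for all in-sync replicas before acknowledging
        linger_ms=50,  # allow small batches to form under load
    )


def publish(producer, event, topic=TOPIC):
    """Send one event; the producer's value_serializer encodes it."""
    producer.send(topic, key=partition_key(event), value=event)
```

Call `producer.flush()` before shutting down, since `send` is asynchronous and buffered messages would otherwise be lost.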

Transform & Store

A Kafka consumer reads from the topic, extracts structured fields (source and destination hosts, protocols, packet sizes), adds GeoIP metadata, and writes the normalized records into PostgreSQL. The schema is optimized for time-series aggregation queries used by the reporting layer.
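The transform-and-store step could be sketched as follows, assuming kafka-python and psycopg2. The table name `packets`, its columns, the consumer group `pg-writer`, and the `geo_lookup` callable are all hypothetical; the project's real schema and GeoIP source are not shown here.

```python
import json
from datetime import datetime, timezone

# Hypothetical target table and columns.
INSERT_SQL = """
INSERT INTO packets (ts, src, dst, protocol, length, src_country)
VALUES (%s, %s, %s, %s, %s, %s)
"""


def to_row(raw, geo_lookup=None):
    """Turn a raw JSON message into a tuple matching INSERT_SQL."""
    event = json.loads(raw)
    # geo_lookup is any callable mapping an IP string to a country code.
    country = geo_lookup(event["src"]) if geo_lookup else None
    return (
        datetime.fromtimestamp(event["ts"], tz=timezone.utc),
        event["src"],
        event["dst"],
        event["protocol"],
        int(event["length"]),
        country,
    )


def run_consumer(dsn, topic="network-packets",
                 bootstrap_servers="localhost:9092", batch_size=500):
    """Read events from Kafka and insert them into PostgreSQL in batches."""
    # Imported here so to_row can be used without these packages installed.
    import psycopg2
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=bootstrap_servers,
        group_id="pg-writer",
        enable_auto_commit=False,  # commit offsets only after the DB write
        auto_offset_reset="earliest",
    )
    conn = psycopg2.connect(dsn)
    batch = []
    for msg in consumer:
        batch.append(to_row(msg.value))
        # Simplification: rows are flushed only when the batch fills up.
        if len(batch) >= batch_size:
            with conn, conn.cursor() as cur:  # commits the transaction on exit
                cur.executemany(INSERT_SQL, batch)
            consumer.commit()
            batch.clear()
```

Committing Kafka offsets only after the database transaction succeeds gives at-least-once delivery: after a crash some rows may be inserted twice, but none are skipped.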

Visualize

Power BI connects directly to PostgreSQL to surface a live operational dashboard tracking key traffic metrics: packet rates, unique hosts, protocol distribution, transferred data volume, and geographic origin of traffic.

The Pipeline

[Figure: The pipeline]

This architecture reflects a pattern common to production observability stacks: a lightweight collector, a durable message bus, a transformation layer, a queryable store, and a visualization layer — each independently scalable and replaceable.

Dashboard

The Power BI report provides operators with real-time visibility into:

  • Total packets captured and throughput trends
  • Unique hosts communicating on the network
  • GeoIP mapping of traffic origins
  • Protocol distribution breakdown
  • Aggregate data transfer volumes

[Figure: Power BI Report]
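
To make the listed metrics concrete, the sketch below computes most of them in plain Python over a list of event records. This is only an illustration of the aggregations: in the pipeline itself they would come from PostgreSQL queries or Power BI measures. The record fields and the `summarize` helper are assumptions, and the GeoIP mapping is omitted.

```python
from collections import Counter


def summarize(records):
    """Aggregate dashboard-style metrics from an iterable of event dicts."""
    records = list(records)
    # A host counts once whether it appears as a source or a destination.
    hosts = {r["src"] for r in records} | {r["dst"] for r in records}
    # Bucket packets by the start of their minute (epoch seconds).
    per_minute = Counter(int(r["ts"] // 60) * 60 for r in records)
    return {
        "total_packets": len(records),
        "unique_hosts": len(hosts),
        "protocol_counts": dict(Counter(r["protocol"] for r in records)),
        "total_bytes": sum(r["length"] for r in records),
        "packets_per_minute": dict(per_minute),
    }
```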

Conclusion

This project shows how to build a decoupled observability pipeline for network traffic analysis using open-source tools (Scapy, Kafka, and PostgreSQL) for capture, streaming, and storage, with Power BI as the reporting layer.

By decoupling each stage, the architecture allows for flexibility in scaling and evolving components independently — a critical design principle for production monitoring systems.

The resulting dashboard provides actionable insights into network behavior, enabling proactive detection of anomalies and informed operational decisions.

Have a nice day ^_^