System Design - Distributed Log Ingestion
Posted: | Categories: tech | Tags: hld, system-design
🧠 System Design Interview Summary: Log Ingestion & Query System

Interviewer: Vega
Topic: Design a system for log ingestion, storage, and querying across multi-tenant agents

✅ High-Level Architecture
Agents send logs (JSON) via HTTP to a rate-limited ingress service
Ingress service writes logs to Kafka (HA cluster)
Two main Kafka consumers:
  Object Store Consumer: Stores raw logs in GCS/S3
  Indexing Consumer: Pushes structured logs to Elasticsearch
Elasticsearch Cluster (with snapshots) holds searchable logs
Query Layer exposes APIs (or Kibana) to end users
Metadata DB stores user info, tenant configs, RBAC rules
Telemetry pipeline for usage and system health insights

💡 Key Design Decisions

🔹 Data Format
JSON for ingestion (readable, schema-tolerant)
Protobuf or compressed archives for long-term storage

🔹 Schema Evolution
Agent schemas versioned per tenant
Schema registry to ensure backward compatibility
Only expected fields are accepted/processed

🔹 Indexing & Querying
Indexed fields include timestamp, log level, service name, etc.
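
The Schema Evolution points above (per-tenant versioned schemas, a registry that guards backward compatibility, and dropping unexpected fields) could be sketched roughly as follows. This is a minimal in-memory illustration, not the design discussed in the interview: `SchemaRegistry`, `register`, `allowed_fields`, and `filter_log` are hypothetical names, and "backward compatible" is assumed here to mean that a new version may only add fields, never remove them.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaRegistry:
    """Hypothetical in-memory stand-in for a real schema registry service."""

    # Maps (tenant_id, schema_version) -> frozenset of allowed field names.
    _schemas: dict = field(default_factory=dict)

    def register(self, tenant_id: str, version: int, fields: set) -> None:
        """Store a schema version for a tenant.

        Assumed compatibility rule: a version must keep every field of the
        immediately preceding version (it may only add new ones).
        """
        previous = self._schemas.get((tenant_id, version - 1))
        if previous is not None and not previous <= set(fields):
            removed = sorted(previous - set(fields))
            raise ValueError(f"version {version} removes fields: {removed}")
        self._schemas[(tenant_id, version)] = frozenset(fields)

    def allowed_fields(self, tenant_id: str, version: int) -> frozenset:
        """Return the allowed field names; raises KeyError if unregistered."""
        return self._schemas[(tenant_id, version)]


def filter_log(registry: SchemaRegistry, tenant_id: str, version: int, raw: dict) -> dict:
    """Keep only the fields the tenant's schema version expects."""
    allowed = registry.allowed_fields(tenant_id, version)
    return {key: value for key, value in raw.items() if key in allowed}


if __name__ == "__main__":
    registry = SchemaRegistry()
    registry.register("tenant-a", 1, {"timestamp", "level", "service", "message"})
    registry.register("tenant-a", 2, {"timestamp", "level", "service", "message", "trace_id"})

    event = {
        "timestamp": "2024-01-01T00:00:00Z",
        "level": "ERROR",
        "service": "checkout",
        "message": "payment failed",
        "debug_blob": "not part of the schema",
    }
    # "debug_blob" is dropped because version 1 does not list it.
    print(filter_log(registry, "tenant-a", 1, event))
```

In a setup like this, the ingress service would apply the filter before writing to Kafka, so both the object-store and indexing consumers see only expected fields. Whether to drop unknown fields silently or reject the whole event is a separate design choice that the summary does not specify.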