Everyone knows Grafana for dashboards. But Grafana Labs now ships an entire observability platform — metrics, logs, traces, and profiling — all under one roof. The problem? There are so many components that it’s hard to know which ones you actually need.
We deploy Grafana stacks for customers regularly. Here’s the map of what each component does, which ones matter, and how they fit together.
The Grafana Ecosystem at a Glance
| Component | What It Does | Replaces | Min RAM |
|---|---|---|---|
| Grafana | Dashboards and visualization | Kibana, Chronograf | 512 MB |
| Loki | Log aggregation | Elasticsearch (for logs) | 1 GB |
| Mimir | Long-term metrics storage | Thanos, Cortex | 2 GB |
| Tempo | Distributed tracing | Jaeger, Zipkin | 1 GB |
| Pyroscope | Continuous profiling | pprof, async-profiler | 1 GB |
| Alloy | Unified telemetry collector | Grafana Agent, Promtail | 128 MB |
| OnCall | Incident management | PagerDuty, Opsgenie | 512 MB |
Which Components Do You Actually Need?
Not all of these are required. Here’s what we recommend based on team size and complexity:
| Tier | Components | Use Case |
|---|---|---|
| Essential | Grafana + Prometheus + Loki | Dashboards, metrics, and logs — covers 90% of needs |
| Growth | + Alloy + Mimir | Multi-server collection, long-term metric retention |
| Advanced | + Tempo + Pyroscope | Distributed tracing and profiling for microservices |
| Enterprise | + OnCall + Alerting | Full incident management pipeline |
Most teams should start with the Essential tier. Add components only when you hit a specific pain point — don’t deploy Tempo if you don’t have distributed services to trace.
Grafana: The Dashboard Layer
Grafana is the visualization engine that ties everything together. It queries Prometheus for metrics, Loki for logs, Tempo for traces, and displays them in unified dashboards. Key capabilities:
- Multi-source queries — correlate Prometheus metrics with Loki logs in a single panel
- Alerting — built-in alert rules that fire to Slack, email, PagerDuty, or webhooks
- Dashboard-as-code — export/import dashboards as JSON for version control
- Community dashboards — thousands of pre-built dashboards on grafana.com
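Dashboard-as-code extends to data sources: they can be provisioned from files instead of clicked together in the UI. A minimal sketch using Grafana's provisioning format (the `prometheus` and `loki` hostnames are assumptions for a shared Docker network):

```yaml
# /etc/grafana/provisioning/datasources/stack.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    isDefault: true
  - name: Loki
    type: loki
    url: http://loki:3100
```

Grafana loads these files at startup, so a fresh container comes up with its data sources already wired.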
Alloy: The Unified Collector
Alloy replaces both Grafana Agent and Promtail. It’s a single binary that collects metrics (Prometheus scrape), logs (file tailing), and traces (OpenTelemetry) and ships them to your backends. Deploy one agent instead of three.
```alloy
// alloy config example: scrape node_exporter metrics and tail syslog,
// forwarding to Mimir and Loki respectively
prometheus.scrape "default" {
  targets    = [{"__address__" = "localhost:9100"}]
  forward_to = [prometheus.remote_write.mimir.receiver]
}

loki.source.file "syslog" {
  targets    = [{"__path__" = "/var/log/syslog"}]
  forward_to = [loki.write.default.receiver]
}

prometheus.remote_write "mimir" {
  endpoint {
    url = "http://mimir:9009/api/v1/push"
  }
}

loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}
```
Mimir: Scalable Metrics Backend
Prometheus stores metrics locally and has finite retention. Mimir is Grafana’s horizontally scalable metrics backend — it accepts Prometheus remote-write and stores metrics for months or years with better compression.
When you need Mimir:
- Multiple Prometheus instances that need a unified query layer
- Metrics retention beyond 90 days
- High-availability — Mimir replicates data across nodes
Alternative: VictoriaMetrics does similar things with less operational overhead for single-node deployments.
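If you keep vanilla Prometheus as the scraper, pointing it at Mimir is a single config block. A sketch for `prometheus.yml` (the endpoint matches the Alloy example above; adjust the URL and add tenant headers if your Mimir deployment requires them):

```yaml
# prometheus.yml — forward samples to Mimir while keeping local storage
remote_write:
  - url: http://mimir:9009/api/v1/push
```

Prometheus keeps serving recent queries from local storage; Mimir becomes the long-term, unified query layer.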
Tempo: Distributed Tracing
Tempo stores and queries distributed traces — the path a request takes across microservices. It accepts OpenTelemetry, Jaeger, and Zipkin formats. In Grafana, you can jump from a log line to the exact trace that produced it.
You need this if: you run microservices and need to understand why a request took 3 seconds when it should take 200ms. You don’t need this if: you run a monolithic app or a small number of services.
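Getting traces into Tempo typically goes through Alloy's `otelcol.*` components. A hedged sketch of the receiver/exporter pair (the `tempo:4317` endpoint and plaintext TLS setting are assumptions for a local Docker network):

```alloy
// Receive OTLP from instrumented apps and forward traces to Tempo (sketch)
otelcol.receiver.otlp "default" {
  grpc {}
  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}

otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo:4317"
    tls {
      insecure = true
    }
  }
}
```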
Pyroscope: Continuous Profiling
Pyroscope captures CPU and memory profiles continuously, so when a performance issue happens, you already have the data. No need to reproduce the problem and attach a profiler — the flame graph is already recorded.
Best for: teams debugging memory leaks, CPU hotspots, or garbage collection pauses in Java/Go/Python/Node.js applications.
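Alloy can collect profiles too, via its `pyroscope.*` components. A sketch (assumptions: a Go service named `myapp` exposing `net/http/pprof` on port 6060, and Pyroscope on its default port 4040):

```alloy
// Pull pprof profiles from an app and push them to Pyroscope (sketch)
pyroscope.scrape "app" {
  targets    = [{"__address__" = "myapp:6060", "service_name" = "myapp"}]
  forward_to = [pyroscope.write.default.receiver]
}

pyroscope.write "default" {
  endpoint {
    url = "http://pyroscope:4040"
  }
}
```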
OpenTelemetry: The Glue
OpenTelemetry (OTel) is the vendor-neutral standard for telemetry. Instead of using Grafana-specific SDKs, instrument your app with OTel and send data to any backend. Alloy speaks OTel natively, so you can switch backends without re-instrumenting your code.
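Because OTel SDKs read a standard set of environment variables, pointing an instrumented app at Alloy usually requires no code changes. A sketch of a Compose service fragment (the variable names come from the OTel specification; the `alloy:4317` endpoint and `checkout-service` name are assumptions):

```yaml
# docker-compose service fragment: point an OTel-instrumented app at Alloy
environment:
  OTEL_SERVICE_NAME: checkout-service
  OTEL_EXPORTER_OTLP_ENDPOINT: http://alloy:4317
  OTEL_EXPORTER_OTLP_PROTOCOL: grpc
```

Switching backends later means changing the endpoint variable, not the application code.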
Production Architecture
Here’s a typical production setup using Docker Compose:
```yaml
version: "3.8"
services:
  grafana:
    image: grafana/grafana:latest
    ports: ["3000:3000"]
    volumes: ["grafana_data:/var/lib/grafana"]
  prometheus:
    image: prom/prometheus:latest
    ports: ["9090:9090"]
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
  loki:
    image: grafana/loki:latest
    ports: ["3100:3100"]
    volumes: ["loki_data:/loki"]
  alloy:
    image: grafana/alloy:latest
    volumes:
      - ./alloy-config.alloy:/etc/alloy/config.alloy
      - /var/log:/var/log:ro
    command: run /etc/alloy/config.alloy
volumes:
  grafana_data:
  prometheus_data:
  loki_data:
```
Hosting Requirements
| Tier | CPU | RAM | Storage |
|---|---|---|---|
| Essential (Grafana + Prometheus + Loki) | 2 cores | 4 GB | 50 GB SSD |
| Growth (+ Alloy + Mimir) | 4 cores | 8 GB | 200 GB SSD |
| Advanced (+ Tempo + Pyroscope) | 4 cores | 16 GB | 500 GB SSD |
The Essential tier fits on a Cloud VPS with 4 GB RAM. Growth and Advanced tiers benefit from a dedicated server for the I/O and memory headroom.
Want us to handle the deployment and maintenance? Our Managed Support team sets up Grafana stacks, configures alerting, and keeps everything updated so you can focus on building your product.
Next Steps
Start with the Essential tier. Once you have dashboards and logs working, you’ll know exactly which gaps to fill. Check our monitoring comparison for the full Prometheus setup guide and our logging comparison for Loki configuration details. For security hardening of your observability stack, follow our VPS hardening guide.