Everyone knows Grafana for dashboards. But Grafana Labs now ships an entire observability platform — metrics, logs, traces, and profiling — all under one roof. The problem? There are so many components that it’s hard to know which ones you actually need.

We deploy Grafana stacks for customers regularly. Here’s the map of what each component does, which ones matter, and how they fit together.

The Grafana Ecosystem at a Glance

| Component | What It Does | Replaces | Min RAM |
|-----------|--------------|----------|---------|
| Grafana | Dashboards and visualization | Kibana, Chronograf | 512 MB |
| Loki | Log aggregation | Elasticsearch (for logs) | 1 GB |
| Mimir | Long-term metrics storage | Thanos, Cortex | 2 GB |
| Tempo | Distributed tracing | Jaeger, Zipkin | 1 GB |
| Pyroscope | Continuous profiling | pprof, async-profiler | 1 GB |
| Alloy | Unified telemetry collector | Grafana Agent, Promtail | 128 MB |
| OnCall | Incident management | PagerDuty, Opsgenie | 512 MB |

Which Components Do You Actually Need?

Not all of these are required. Here’s what we recommend based on team size and complexity:

| Tier | Components | Use Case |
|------|------------|----------|
| Essential | Grafana + Prometheus + Loki | Dashboards, metrics, and logs — covers 90% of needs |
| Growth | + Alloy + Mimir | Multi-server collection, long-term metric retention |
| Advanced | + Tempo + Pyroscope | Distributed tracing and profiling for microservices |
| Enterprise | + OnCall + Alerting | Full incident management pipeline |

Most teams should start with the Essential tier. Add components only when you hit a specific pain point — don’t deploy Tempo if you don’t have distributed services to trace.

Grafana: The Dashboard Layer

Grafana is the visualization engine that ties everything together. It queries Prometheus for metrics, Loki for logs, Tempo for traces, and displays them in unified dashboards. Key capabilities:

  • Multi-source queries — correlate Prometheus metrics with Loki logs in a single panel
  • Alerting — built-in alert rules that notify Slack, email, PagerDuty, or webhooks
  • Dashboard-as-code — export/import dashboards as JSON for version control
  • Community dashboards — thousands of pre-built dashboards on grafana.com
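Dashboard-as-code pairs well with data-source provisioning: Grafana reads YAML from /etc/grafana/provisioning/ at startup, so data sources can live in version control alongside dashboards. A minimal sketch — the prometheus and loki hostnames are assumptions matching a typical Compose deployment:

```yaml
# /etc/grafana/provisioning/datasources/datasources.yaml
# Hostnames are assumptions; point them at your actual services.
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
```

With this in place, a fresh Grafana container comes up with both data sources already wired, no clicking through the UI.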

Alloy: The Unified Collector

Alloy replaces both Grafana Agent and Promtail. It’s a single binary that collects metrics (Prometheus scrape), logs (file tailing), and traces (OpenTelemetry) and ships them to your backends. Deploy one agent instead of three.

// Alloy configuration: collect metrics and logs, ship them to Mimir and Loki.

// Scrape node_exporter on localhost and hand samples to the remote-write
// component defined below.
prometheus.scrape "default" {
  targets    = [{"__address__" = "localhost:9100"}]
  forward_to = [prometheus.remote_write.mimir.receiver]
}

// Tail /var/log/syslog and hand log lines to the Loki writer below.
loki.source.file "syslog" {
  targets    = [{"__path__" = "/var/log/syslog"}]
  forward_to = [loki.write.default.receiver]
}

// Push scraped metrics to Mimir over the Prometheus remote-write API.
prometheus.remote_write "mimir" {
  endpoint {
    url = "http://mimir:9009/api/v1/push"
  }
}

// Push collected logs to Loki.
loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}

Mimir: Scalable Metrics Backend

Prometheus stores metrics on local disk with limited retention (15 days by default). Mimir is Grafana’s horizontally scalable metrics backend: it accepts Prometheus remote-write and stores metrics for months or years with better compression.

When you need Mimir:

  • Multiple Prometheus instances that need a unified query layer
  • Metrics retention beyond 90 days
  • High-availability — Mimir replicates data across nodes

Alternative: VictoriaMetrics does similar things with less operational overhead for single-node deployments.
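Pointing an existing Prometheus at Mimir is a one-stanza change. A minimal sketch, assuming Mimir is reachable at the hostname mimir on port 9009 (matching the Alloy example above):

```yaml
# prometheus.yml -- forward all samples to Mimir via remote write.
# Hostname and port are assumptions; match them to your deployment.
remote_write:
  - url: http://mimir:9009/api/v1/push
```

Prometheus keeps serving recent queries from local storage while Mimir accumulates the long-term copy; Grafana can then query Mimir for history beyond the local retention window.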

Tempo: Distributed Tracing

Tempo stores and queries distributed traces — the path a request takes across microservices. It accepts OpenTelemetry, Jaeger, and Zipkin formats. In Grafana, you can jump from a log line to the exact trace that produced it.

You need this if: you run microservices and need to understand why a request took 3 seconds when it should take 200ms. You don’t need this if: you run a monolithic app or a small number of services.
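With Alloy in front, applications send OTLP and Alloy forwards the traces on. A minimal sketch, assuming Tempo listens for OTLP over gRPC at tempo:4317 without TLS — adjust the endpoints to your environment:

```alloy
// Accept OTLP traces from instrumented apps on the standard gRPC port.
otelcol.receiver.otlp "default" {
  grpc {
    endpoint = "0.0.0.0:4317"
  }
  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}

// Forward traces to Tempo. The address and insecure TLS are assumptions
// for a private network; enable TLS for anything exposed.
otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo:4317"
    tls {
      insecure = true
    }
  }
}
```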

Pyroscope: Continuous Profiling

Pyroscope captures CPU and memory profiles continuously, so when a performance issue happens, you already have the data. No need to reproduce the problem and attach a profiler — the flame graph is already recorded.

Best for: teams debugging memory leaks, CPU hotspots, or garbage collection pauses in Java/Go/Python/Node.js applications.
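Alloy can collect profiles too, which keeps Pyroscope ingestion in the same config file as your metrics and logs. A sketch using pull-mode scraping of an app that exposes Go's net/http/pprof endpoints — the target address and service_name label are assumptions:

```alloy
// Pull pprof profiles from an app's /debug/pprof endpoints.
pyroscope.scrape "app" {
  targets = [{
    "__address__"  = "localhost:8080",
    "service_name" = "myapp",
  }]
  forward_to = [pyroscope.write.default.receiver]
}

// Push collected profiles to the Pyroscope server (hostname assumed).
pyroscope.write "default" {
  endpoint {
    url = "http://pyroscope:4040"
  }
}
```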

OpenTelemetry: The Glue

OpenTelemetry (OTel) is the vendor-neutral standard for telemetry. Instead of using Grafana-specific SDKs, instrument your app with OTel and send data to any backend. Alloy speaks OTel natively, so you can switch backends without re-instrumenting your code.

Production Architecture

Here’s a typical production setup using Docker Compose:

services:
  grafana:
    image: grafana/grafana:latest
    ports: ["3000:3000"]
    volumes: [grafana_data:/var/lib/grafana]

  prometheus:
    image: prom/prometheus:latest
    ports: ["9090:9090"]
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus

  loki:
    image: grafana/loki:latest
    ports: ["3100:3100"]
    volumes: [loki_data:/loki]

  alloy:
    image: grafana/alloy:latest
    volumes:
      - ./alloy-config.alloy:/etc/alloy/config.alloy
      - /var/log:/var/log:ro
    command: run /etc/alloy/config.alloy

volumes:
  grafana_data:
  prometheus_data:
  loki_data:

Hosting Requirements

| Tier | CPU | RAM | Storage |
|------|-----|-----|---------|
| Essential (Grafana + Prometheus + Loki) | 2 cores | 4 GB | 50 GB SSD |
| Growth (+ Alloy + Mimir) | 4 cores | 8 GB | 200 GB SSD |
| Advanced (+ Tempo + Pyroscope) | 4 cores | 16 GB | 500 GB SSD |

The Essential tier fits on a Cloud VPS with 4 GB RAM. Growth and Advanced tiers benefit from a dedicated server for the I/O and memory headroom.

Want us to handle the deployment and maintenance? Our Managed Support team sets up Grafana stacks, configures alerting, and keeps everything updated so you can focus on building your product.

Next Steps

Start with the Essential tier. Once you have dashboards and logs working, you’ll know exactly which gaps to fill. Check our monitoring comparison for the full Prometheus setup guide and our logging comparison for Loki configuration details. For security hardening of your observability stack, follow our VPS hardening guide.