
Grafana Loki Deployment

I was running ELK on a 4GB VPS. Elasticsearch alone wanted 2GB of heap. The JVM would regularly spike to 3GB+, leaving almost nothing for the OS or anything else on the box. I switched to Loki in late 2024, and it replaced the entire ELK stack while running in under 512MB. This is how I did the migration.

Why Loki Over ELK?

The short version: ELK indexes everything, Loki indexes almost nothing. Loki only stores labels (like job name, hostname, environment) as indexed fields. The actual log lines sit compressed on disk and get scanned at query time. That approach uses a fraction of the memory because there's no inverted index sitting in RAM.
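
To make that concrete, here's roughly the shape of what Promtail sends to Loki's push API (a simplified sketch; the label values and log line are made up, and the timestamp is nanoseconds since the epoch). Only the key/value pairs under "stream" get indexed; the line itself is stored as an opaque, compressed blob:

{
  "streams": [
    {
      "stream": { "job": "nginx", "host": "web-1", "environment": "prod" },
      "values": [
        ["1730000000000000000", "GET /healthz 200 2ms"]
      ]
    }
  ]
}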

It plugs straight into Grafana, so if you already run Grafana for Prometheus dashboards, you don't need a separate UI like Kibana. One less thing to update, one less thing to break.

The tradeoff is real though — searches across unindexed content are slower, and if you need full-text search across millions of log lines, Loki will feel sluggish compared to Elasticsearch. For filtering by service, time range, and grepping for error patterns, it's more than enough.

Architecture

  • Loki - The log storage and query engine
  • Promtail - Agent that ships logs to Loki
  • Grafana - Where you view and query logs

Docker Compose Setup

version: "3"

services:
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/local-config.yaml
      - loki-data:/loki
    command: -config.file=/etc/loki/local-config.yaml

  promtail:
    image: grafana/promtail:latest
    volumes:
      - ./promtail-config.yaml:/etc/promtail/config.yml
      - /var/log:/var/log:ro
      # needed so the "containers" job in promtail-config.yaml can read Docker's log files
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
    command: -config.file=/etc/promtail/config.yml

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana

volumes:
  loki-data:
  grafana-data:

Loki Configuration

Create loki-config.yaml:

auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/boltdb-shipper-active
    cache_location: /loki/boltdb-shipper-cache

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
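
Optionally, sanity-check the file before starting anything. Recent Loki releases accept a -verify-config flag that parses the config and exits (run the image with -help if your version doesn't list it):

docker run --rm \
  -v "$(pwd)/loki-config.yaml:/etc/loki/local-config.yaml" \
  grafana/loki:latest -config.file=/etc/loki/local-config.yaml -verify-config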

Promtail Configuration

Promtail is honestly annoying to configure compared to Filebeat. Filebeat's YAML is straightforward: you point it at files and it ships them. Promtail's config has this label-centric model where __path__ is technically a label, not a path directive, and the glob behavior doesn't always match what you'd expect. I spent an hour wondering why no logs appeared before tracking it down to a __path__ glob that didn't match the nginx files actually on disk. Filebeat would have just worked.

Create promtail-config.yaml:

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/*.log

  - job_name: containers
    static_configs:
      - targets:
          - localhost
        labels:
          job: docker
          __path__: /var/lib/docker/containers/*/*log
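
If you hit the same nginx confusion I did, one way around it (a sketch, not part of the config above; log_type is just a label name I'm choosing) is to give each file its own target and an explicit label, so access and error logs become separate, filterable streams:

  - job_name: nginx
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          log_type: access
          __path__: /var/log/nginx/access.log
      - targets:
          - localhost
        labels:
          job: nginx
          log_type: error
          __path__: /var/log/nginx/error.log

In Grafana that lets you query {job="nginx", log_type="error"} directly instead of grepping one combined stream. The /var/log mount in the Compose file already covers these paths.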

Start the Stack

docker compose up -d
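
Before moving on to Grafana, it's worth checking that Loki actually came up. It exposes a /ready endpoint and a labels API, both reachable from the host since port 3100 is published:

docker compose ps
# returns "ready" once Loki has finished starting (can take ~15 seconds)
curl http://localhost:3100/ready
# should include "job" once Promtail has pushed something
curl http://localhost:3100/loki/api/v1/labels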

Configure Grafana

  1. Open Grafana at http://localhost:3000
  2. Login (admin/admin by default)
  3. Go to Configuration → Data Sources → Add data source
  4. Select Loki
  5. URL: http://loki:3100
  6. Save & test
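
Clicking through the UI works, but Grafana can also provision the data source from a file at startup, which survives container rebuilds. A minimal sketch; mount it into the grafana container at /etc/grafana/provisioning/datasources/loki.yaml:

apiVersion: 1

datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100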

Querying Logs (LogQL)

Loki uses LogQL, similar to PromQL:

# All logs from a job
{job="varlogs"}

# Filter by content
{job="varlogs"} |= "error"

# Exclude patterns
{job="varlogs"} != "debug"

# Regex matching
{job="varlogs"} |~ "fail(ed|ure)"

# Parse and filter
{job="nginx"} | json | status >= 400
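
LogQL also supports PromQL-style metric queries over log streams, which is what the alerting rules further down are built on. Two examples, reusing the job labels from above (the status field assumes JSON-formatted nginx logs):

# error lines per second across everything with job="varlogs"
sum(rate({job="varlogs"} |= "error" [5m]))

# 5xx responses per minute, parsed out of JSON nginx logs
sum(count_over_time({job="nginx"} | json | status >= 500 [1m]))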

Labels Are Key

Loki only indexes labels, so the labels you pick determine how fast your queries run:

  • job - What service/application
  • host - Which server
  • environment - prod/staging/dev

Don't use high-cardinality labels (like user IDs or request IDs). Every unique label combination becomes its own stream, which bloats the index and leaves you with thousands of tiny, poorly compressed chunks. If you need to slice by those values, do it at query time instead, as in the sketch below.
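
A sketch, assuming JSON- or logfmt-formatted application logs that carry a user_id field in the line itself:

# user_id is parsed from the log line at query time, never stored as an indexed label
{job="myapp"} | json | user_id="12345"

# same idea for logfmt-style lines
{job="myapp"} | logfmt | user_id="12345"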

Retention

Add these to loki-config.yaml (merge retention_period into the existing limits_config block rather than adding a second one):

compactor:
  working_directory: /loki/compactor
  shared_store: filesystem
  retention_enabled: true
  retention_delete_delay: 2h

limits_config:
  retention_period: 744h # 31 days
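
Retention can also vary per stream. A sketch extending the limits_config above, assuming you want the docker job from the Promtail config kept for only a week (retention_stream takes a LogQL selector, a priority, and a period):

limits_config:
  retention_period: 744h # default for everything else
  retention_stream:
    - selector: '{job="docker"}'
      priority: 1
      period: 168h # container logs: 7 days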

Alerting

Loki can send alerts based on log patterns:

groups:
  - name: error-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({job="myapp"} |= "error" [5m])) > 10
        for: 5m
        labels:
          severity: critical
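
For Loki to evaluate this rule at all, the ruler has to be configured in loki-config.yaml and the rules file placed where it looks. With auth_enabled: false, Loki runs as the single tenant "fake", so the file goes under /loki/rules/fake/ inside the container. A sketch, assuming an Alertmanager reachable at alertmanager:9093:

ruler:
  storage:
    type: local
    local:
      directory: /loki/rules
  rule_path: /loki/rules-temp
  alertmanager_url: http://alertmanager:9093
  enable_api: true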

Production Tips

  • I set retention to 30 days — anything longer and my 256GB disk filled up within a month
  • If you're storing more than 5GB/day, move chunk storage to S3 or MinIO. Local filesystem won't keep up
  • Point Prometheus at Loki's /metrics endpoint (scrape config sketch after this list). I missed a disk-full event because I wasn't monitoring Loki itself
  • Read/write path separation only matters past ~50GB/day. Below that, single-binary is fine
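
For that monitoring point: if Prometheus sits on the same Docker network as the Compose stack, the scrape job is tiny. A sketch; both Loki and Promtail serve Prometheus metrics on their HTTP ports:

scrape_configs:
  - job_name: loki
    static_configs:
      - targets: ["loki:3100"]
  - job_name: promtail
    static_configs:
      - targets: ["promtail:9080"]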

ELK vs Loki:

RAM: ELK needed 4GB minimum on my VPS (Elasticsearch 2GB heap + Kibana + Logstash). Loki + Grafana + Promtail together use around 530MB.

Query speed: Elasticsearch is faster for full-text search across large datasets. Loki is fast for label-filtered queries but noticeably slower when scanning unindexed content across wide time ranges.

Setup time: ELK took me most of a weekend to get right, between Elasticsearch tuning, Logstash pipelines, and Kibana dashboards. Loki was running in about 45 minutes with Docker Compose.

Cost: ELK forced me onto a bigger VPS ($24/month). Loki fits on a $6/month VPS alongside other services.

Where Loki falls short:

Full-text search is genuinely worse. If you need to search for arbitrary substrings across all your logs without knowing which service produced them, Elasticsearch will return results in seconds where Loki might take 30+ seconds or time out entirely. Loki assumes you know roughly where to look (which job, which time range) before you query.

Complex aggregations are limited. ELK lets you do things like "show me the top 10 IP addresses by request count, broken down by hour, filtered by status code" in a single Kibana visualization. LogQL can handle basic rates and counts, but multi-dimensional aggregations either require awkward workarounds or just aren't possible. If your use case is log analytics rather than log monitoring, ELK is still the better tool.

Right now Loki is using 380MB on the same VPS that couldn't run Elasticsearch. Grafana adds another 150MB. Total: 530MB for logs + dashboards. ELK wanted 4GB minimum. Eight services push about 2GB of logs per day into it and the VPS still has headroom for other things. That was never true with Elasticsearch.
