Prometheus & Grafana

In production we use a few tools for monitoring and alerting:

  1. Sentry: for crash monitoring on the Python and Django side

  2. Crashlytics: for crash monitoring on the mobile side

Historically we’ve used Nagios but we now prefer Prometheus and Grafana.

Prometheus is an open-source project written in Go. It records real-time metrics in a time series database, collects them over an HTTP pull model, and supports flexible queries and real-time alerting.

Key concepts to understand are:

  1. Prometheus at its core is a time series database that stores metrics.

  2. Node Exporter collects metrics from a node and exposes them over HTTP. Prometheus pulls (scrapes) them from there; Node Exporter does not push anything to Prometheus.
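
The pull model can be illustrated with a minimal sketch that uses only the Python standard library. This is not how Node Exporter is implemented; the metric name demo_requests_total and the names MetricsHandler, start_exporter and scrape are hypothetical. The "exporter" only exposes its current values as text over HTTP, and the "scraper" (the role Prometheus plays) fetches them whenever it chooses.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory metric; a real exporter would read values from the OS.
METRICS = {"demo_requests_total": 3}


class MetricsHandler(BaseHTTPRequestHandler):
    """Exposes the current metric values as plain text, one 'name value' per line."""

    def do_GET(self):
        body = "".join(f"{name} {value}\n" for name, value in METRICS.items())
        payload = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_exporter(port=0):
    """Starts the exporter on a background thread; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def scrape(port):
    """Plays the role of Prometheus: pulls the metrics with an HTTP GET."""
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/metrics", timeout=5) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    server = start_exporter()
    print(scrape(server.server_address[1]))
    server.shutdown()
    server.server_close()
```

The exporter never needs to know where Prometheus runs; only the scraper needs the exporter's address, which is why targets are listed in the Prometheus config.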

Installing Prometheus

  1. Create users for prometheus and node_exporter

sudo useradd --no-create-home --shell /bin/false prometheus
sudo useradd --no-create-home --shell /bin/false node_exporter
  1. Create prometheus directories

sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
  1. Change owner of directories created

sudo chown prometheus:prometheus /etc/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
  1. Download the Prometheus release for your operating system (v2.9.2 for linux-amd64 is used here).

wget https://github.com/prometheus/prometheus/releases/download/v2.9.2/prometheus-2.9.2.linux-amd64.tar.gz
  1. Extract compressed file

tar -zxvf prometheus-2.9.2.linux-amd64.tar.gz
  1. Copy prometheus binary to /usr/local/bin

sudo cp prometheus-2.9.2.linux-amd64/prometheus /usr/local/bin/
sudo cp prometheus-2.9.2.linux-amd64/promtool /usr/local/bin/
  1. Change binary owner to prometheus

sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool
  1. Copy the console templates and console libraries to /etc/prometheus

sudo cp -r prometheus-2.9.2.linux-amd64/consoles /etc/prometheus
sudo cp -r prometheus-2.9.2.linux-amd64/console_libraries /etc/prometheus
  1. Change owner of the copied console files to prometheus

sudo chown -R prometheus:prometheus /etc/prometheus/consoles
sudo chown -R prometheus:prometheus /etc/prometheus/console_libraries
  1. Create prometheus config file and add following config

sudo nano /etc/prometheus/prometheus.yml

Update the file to contain

global:
  scrape_interval: 15s

scrape_configs:
- job_name: 'prometheus'
  scrape_interval: 5s
  static_configs:
  - targets: ['localhost:9090']
  1. Change owner of config file

sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml
  1. Add Prometheus as a systemd service. Create the service file and add the following lines

sudo nano /etc/systemd/system/prometheus.service

Update the file to contain

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=multi-user.target
  1. Reload daemon and start prometheus service

sudo systemctl daemon-reload
sudo systemctl start prometheus
sudo systemctl status prometheus
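
Besides systemctl status, the running server can be checked over HTTP. The following is a minimal sketch, assuming Prometheus listens on localhost:9090 as configured above and using its /-/healthy management endpoint. The function name is_healthy and its opener parameter are hypothetical; the opener is only there so the check can be exercised without a live server.

```python
import urllib.request


def is_healthy(base_url="http://localhost:9090", opener=urllib.request.urlopen):
    """Returns True if GET <base_url>/-/healthy answers with HTTP 200."""
    try:
        with opener(f"{base_url}/-/healthy", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts and urllib's URLError/HTTPError.
        return False


if __name__ == "__main__":
    print("healthy" if is_healthy() else "not reachable")
```

Only the status code is inspected, since the response text may differ between Prometheus versions.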

Setting up Node Exporter

To recap, Node Exporter is a Prometheus exporter for hardware and OS metrics with pluggable metric collectors. It lets you measure machine resources such as memory, disk and CPU utilization.

  1. Download prometheus node exporter binary

wget https://github.com/prometheus/node_exporter/releases/download/v0.17.0/node_exporter-0.17.0.linux-amd64.tar.gz
  1. Extract compressed file

tar -zxvf node_exporter-0.17.0.linux-amd64.tar.gz
  1. Copy binary to /usr/local/bin and change owner

sudo cp node_exporter-0.17.0.linux-amd64/node_exporter /usr/local/bin
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
  1. Add Node Exporter as a systemd service. Create the service file and add the following lines

sudo nano /etc/systemd/system/node_exporter.service

Update the file to contain

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
  1. Reload daemon and start node_exporter service

sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl status node_exporter
  1. Add node exporter config to prometheus

sudo nano /etc/prometheus/prometheus.yml

Append the following job under scrape_configs in prometheus.yml

- job_name: 'node_exporter'
  scrape_interval: 5s
  static_configs:
  - targets: ['localhost:9100']
  1. Restart prometheus service

sudo systemctl restart prometheus
sudo systemctl status prometheus
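
To confirm that both jobs are being scraped, the built-in up metric can be queried through the Prometheus HTTP API. The sketch below parses a response of the shape returned by GET http://localhost:9090/api/v1/query?query=up. The sample payload is illustrative (the timestamps and values are made up) and the function name jobs_up is hypothetical.

```python
import json

# Illustrative response body, shaped like the JSON returned by
# GET http://localhost:9090/api/v1/query?query=up (values are made up).
SAMPLE_RESPONSE = """
{"status": "success",
 "data": {"resultType": "vector", "result": [
   {"metric": {"__name__": "up", "instance": "localhost:9090", "job": "prometheus"},
    "value": [1557000000.0, "1"]},
   {"metric": {"__name__": "up", "instance": "localhost:9100", "job": "node_exporter"},
    "value": [1557000000.0, "0"]}
 ]}}
"""


def jobs_up(payload):
    """Maps each (job, instance) pair in an `up` query response to True (up) or False (down)."""
    doc = json.loads(payload)
    if doc.get("status") != "success":
        raise ValueError("query did not succeed")
    status = {}
    for sample in doc["data"]["result"]:
        labels = sample["metric"]
        _timestamp, value = sample["value"]  # sample values arrive as strings
        status[(labels["job"], labels["instance"])] = value == "1"
    return status


if __name__ == "__main__":
    for (job, instance), is_up in jobs_up(SAMPLE_RESPONSE).items():
        print(f"{job} @ {instance}: {'up' if is_up else 'DOWN'}")
```

A value of "0" for the node_exporter target would usually mean the service is not running or port 9100 is not reachable from Prometheus.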

Python Integration

Integrating Prometheus with a Python app is straightforward.

First, install the Prometheus Python client library:

pip install prometheus_client

Then create a file named awesome_python_app.py with the following content:

from prometheus_client import start_http_server, Counter, Gauge
import random
import time

# Initialize required metrics
MY_COUNTER = Counter("total_number_of_requests", "Total number of requests processed")
MY_GAUGE = Gauge("request_processing_seconds", 'Time spent processing request')

# Function that processes a request
def process_request(t):
    """A dummy function that takes some time."""
    # increment counter each time function gets called
    MY_COUNTER.inc()

    # set time taken to process the request
    MY_GAUGE.set(t)

    # sleep for some time
    time.sleep(t)

if __name__ == '__main__':
    # start_http_server serves the metrics from a background thread,
    # so the request loop below keeps running.
    start_http_server(8000)

    # Generate some requests.
    while True:
        process_request(random.random())

Run the file with python awesome_python_app.py, then open http://localhost:8000 in a browser. You will see the Prometheus metrics, which update as requests are processed.
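
The page served on port 8000 is plain text in the Prometheus exposition format. As a rough sketch of how to read it programmatically, the helper below parses un-labelled samples into a dict. It assumes lines have no labels and no timestamps; the function name parse_metrics is hypothetical and the sample values are made up. The _total suffix on the counter name is what recent versions of the client library are expected to append, so treat the exact names as an assumption and check your own output.

```python
# Illustrative scrape output for the app above (values are made up).
SAMPLE_OUTPUT = """\
# HELP total_number_of_requests_total Total number of requests processed
# TYPE total_number_of_requests_total counter
total_number_of_requests_total 42.0
# HELP request_processing_seconds Time spent processing request
# TYPE request_processing_seconds gauge
request_processing_seconds 0.37
"""


def parse_metrics(text):
    """Parses un-labelled 'name value' samples, skipping comments and labelled series."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE comments and blank lines carry no sample
        name, _, value = line.rpartition(" ")
        if "{" in name:
            continue  # labelled series are out of scope for this sketch
        samples[name] = float(value)
    return samples


if __name__ == "__main__":
    print(parse_metrics(SAMPLE_OUTPUT))
```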

Golang Integration

gRPC for Go supports interceptors, i.e. middleware that is executed before the request is passed on to the user’s application logic. This is where Prometheus monitoring can be hooked in.
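
The interceptor idea itself is language-neutral. The following Python sketch shows only the concept, not the gRPC API: a wrapper runs before the handler and records the call. The names CALLS, counting_interceptor and say_hello are hypothetical, and collections.Counter stands in for a Prometheus counter.

```python
import functools
from collections import Counter

# Stand-in for a Prometheus counter, keyed by handler name.
CALLS = Counter()


def counting_interceptor(handler):
    """Wraps a handler so every call is counted before the handler runs."""

    @functools.wraps(handler)
    def wrapper(request):
        CALLS[handler.__name__] += 1
        return handler(request)

    return wrapper


@counting_interceptor
def say_hello(name):
    """Application logic; it knows nothing about metrics."""
    return f"Hello {name}"


if __name__ == "__main__":
    say_hello("Test")
    print(dict(CALLS))
```

The handler stays free of monitoring code, which is the same separation the gRPC interceptors provide in the Go walkthrough.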

The example below walks through the integration step by step. The basic directory structure of the gRPC app is:

client/
    client.go
protobuf/
    service.pb.go
    service.proto
server/
    server.go

The initial content of server.go is:

package main

import (
    "context"
    "fmt"
    "log"
    "net"

    "google.golang.org/grpc"

    pb "github.com/grpc-ecosystem/go-grpc-prometheus/examples/grpc-server-with-prometheus/protobuf"
)

// DemoServiceServer defines a Server.
type DemoServiceServer struct{}

func newDemoServer() *DemoServiceServer {
    return &DemoServiceServer{}
}

// SayHello implements an interface defined by protobuf.
func (s *DemoServiceServer) SayHello(ctx context.Context, request *pb.HelloRequest) (*pb.HelloResponse, error) {
    return &pb.HelloResponse{Message: fmt.Sprintf("Hello %s", request.Name)}, nil
}

// NOTE: Graceful shutdown is missing. Don't use this demo in your production setup.
func main() {
    // Listen an actual port.
    lis, err := net.Listen("tcp", fmt.Sprintf(":%d", 9093))
    if err != nil {
        log.Fatalf("failed to listen: %v", err)
    }
    defer lis.Close()

    // Create a gRPC server.
    grpcServer := grpc.NewServer()

    // Create a new api server.
    demoServer := newDemoServer()

    // Register your service.
    pb.RegisterDemoServiceServer(grpcServer, demoServer)

    // Start your gRPC server.
    log.Fatal(grpcServer.Serve(lis))
}

The protobuf file service.proto contains:

syntax="proto3";

package proto;

service DemoService {
    rpc SayHello(HelloRequest) returns (HelloResponse) {}
}

message HelloRequest {
    string name = 1;
}

message HelloResponse {
    string message = 1;
}

service.pb.go is the Go file generated from service.proto. The initial content of client.go is:

package main

import (
    "bufio"
    "context"
    "fmt"
    "log"
    "os"
    "strings"
    "time"

    "google.golang.org/grpc"

    pb "github.com/grpc-ecosystem/go-grpc-prometheus/examples/grpc-server-with-prometheus/protobuf"
)

func main() {
    // Create a insecure gRPC channel to communicate with the server.
    conn, err := grpc.Dial(
        fmt.Sprintf("localhost:%v", 9093), grpc.WithInsecure(),
    )
    if err != nil {
        log.Fatal(err)
    }

    defer conn.Close()

    // Create a gRPC server client.
    client := pb.NewDemoServiceClient(conn)
    fmt.Println("Start to call the method called SayHello every 3 seconds")
    go func() {
        for {
            // Call “SayHello” method and wait for response from gRPC Server.
            _, err := client.SayHello(context.Background(), &pb.HelloRequest{Name: "Test"})
            if err != nil {
                log.Printf("Calling the SayHello method failed. ErrorInfo: %+v", err)
                log.Printf("You should stop the process")
                return
            }
            time.Sleep(3 * time.Second)
        }
    }()
    scanner := bufio.NewScanner(os.Stdin)
    fmt.Println("You can press n or N to stop the process of client")
    for scanner.Scan() {
        if strings.ToLower(scanner.Text()) == "n" {
            os.Exit(0)
        }
    }
}

As shown in the files above, we have a basic gRPC app with a service called DemoService that exposes a SayHello method. Let’s integrate Prometheus on both the server side and the client side. First, add the Prometheus dependency.

Run the following command in your terminal to install the go-grpc-prometheus package:

go get "github.com/grpc-ecosystem/go-grpc-prometheus"

On the server side, import the Prometheus packages and create the gRPC server with the Prometheus interceptors. Create a custom counter metric, register it, and increment it on every SayHello call. A separate HTTP server on port 9092 exposes the metrics. After these changes server.go looks like this:

package main

import (
    "context"
    "fmt"
    "log"
    "net"
    "net/http"

    "google.golang.org/grpc"

    "github.com/grpc-ecosystem/go-grpc-prometheus"
    pb "github.com/grpc-ecosystem/go-grpc-prometheus/examples/grpc-server-with-prometheus/protobuf"
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// DemoServiceServer defines a Server.
type DemoServiceServer struct{}

func newDemoServer() *DemoServiceServer {
    return &DemoServiceServer{}
}

// SayHello implements an interface defined by protobuf.
func (s *DemoServiceServer) SayHello(ctx context.Context, request *pb.HelloRequest) (*pb.HelloResponse, error) {
    customizedCounterMetric.WithLabelValues(request.Name).Inc()
    return &pb.HelloResponse{Message: fmt.Sprintf("Hello %s", request.Name)}, nil
}

var (
    // Create a metrics registry.
    reg = prometheus.NewRegistry()

    // Create some standard server metrics.
    grpcMetrics = grpc_prometheus.NewServerMetrics()

    // Create a customized counter metric.
    customizedCounterMetric = prometheus.NewCounterVec(prometheus.CounterOpts{
        Name: "demo_server_say_hello_method_handle_count",
        Help: "Total number of RPCs handled on the server.",
    }, []string{"name"})
)

func init() {
    // Register standard server metrics and customized metrics to registry.
    reg.MustRegister(grpcMetrics, customizedCounterMetric)
    customizedCounterMetric.WithLabelValues("Test")
}

// NOTE: Graceful shutdown is missing. Don't use this demo in your production setup.
func main() {
    // Listen an actual port.
    lis, err := net.Listen("tcp", fmt.Sprintf(":%d", 9093))
    if err != nil {
        log.Fatalf("failed to listen: %v", err)
    }
    defer lis.Close()

    // Create a HTTP server for prometheus.
    httpServer := &http.Server{Handler: promhttp.HandlerFor(reg, promhttp.HandlerOpts{}), Addr: fmt.Sprintf("0.0.0.0:%d", 9092)}

    // Create a gRPC Server with gRPC interceptor.
    grpcServer := grpc.NewServer(
        grpc.StreamInterceptor(grpcMetrics.StreamServerInterceptor()),
        grpc.UnaryInterceptor(grpcMetrics.UnaryServerInterceptor()),
    )

    // Create a new api server.
    demoServer := newDemoServer()

    // Register your service.
    pb.RegisterDemoServiceServer(grpcServer, demoServer)

    // Initialize all metrics.
    grpcMetrics.InitializeMetrics(grpcServer)

    // Start your http server for prometheus.
    go func() {
        if err := httpServer.ListenAndServe(); err != nil {
            log.Fatal("Unable to start a http server.")
        }
    }()

    // Start your gRPC server.
    log.Fatal(grpcServer.Serve(lis))
}

Let’s integrate Prometheus on the client side as well. Here we create the standard gRPC client metrics, register them, and attach the client interceptors to the connection so that every call to the server is counted. A separate HTTP server on port 9094 exposes the metrics.

After these changes client.go looks like this:

package main

import (
    "bufio"
    "context"
    "fmt"
    "log"
    "net/http"
    "os"
    "strings"
    "time"

    "google.golang.org/grpc"

    "github.com/grpc-ecosystem/go-grpc-prometheus"
    pb "github.com/grpc-ecosystem/go-grpc-prometheus/examples/grpc-server-with-prometheus/protobuf"
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    // Create a metrics registry.
    reg := prometheus.NewRegistry()
    // Create some standard client metrics.
    grpcMetrics := grpc_prometheus.NewClientMetrics()
    // Register client metrics to registry.
    reg.MustRegister(grpcMetrics)
    // Create a insecure gRPC channel to communicate with the server.
    conn, err := grpc.Dial(
        fmt.Sprintf("localhost:%v", 9093),
        grpc.WithUnaryInterceptor(grpcMetrics.UnaryClientInterceptor()),
        grpc.WithStreamInterceptor(grpcMetrics.StreamClientInterceptor()),
        grpc.WithInsecure(),
    )
    if err != nil {
        log.Fatal(err)
    }

    defer conn.Close()

    // Create a HTTP server for prometheus.
    httpServer := &http.Server{Handler: promhttp.HandlerFor(reg, promhttp.HandlerOpts{}), Addr: fmt.Sprintf("0.0.0.0:%d", 9094)}

    // Start your http server for prometheus.
    go func() {
        if err := httpServer.ListenAndServe(); err != nil {
            log.Fatal("Unable to start a http server.")
        }
    }()

    // Create a gRPC server client.
    client := pb.NewDemoServiceClient(conn)
    fmt.Println("Start to call the method called SayHello every 3 seconds")
    go func() {
        for {
            // Call “SayHello” method and wait for response from gRPC Server.
            _, err := client.SayHello(context.Background(), &pb.HelloRequest{Name: "Test"})
            if err != nil {
                log.Printf("Calling the SayHello method failed. ErrorInfo: %+v", err)
                log.Printf("You should stop the process")
                return
            }
            time.Sleep(3 * time.Second)
        }
    }()
    scanner := bufio.NewScanner(os.Stdin)
    fmt.Println("You can press n or N to stop the process of client")
    for scanner.Scan() {
        if strings.ToLower(scanner.Text()) == "n" {
            os.Exit(0)
        }
    }
}

Run both server.go and client.go, then open the following URLs in a browser to see the Prometheus metrics for the server and the client. Note that the server exposes its metrics on port 9092; port 9093 is the gRPC port.

  1. Server Metrics URL : http://localhost:9092/metrics

  2. Client Metrics URL : http://localhost:9094/metrics