Telegraf + InfluxDB + Grafana

Problem

My server kept shutting down about once a week, and for two months I had no idea why. It turned out to be overheating, so moving it to a better-ventilated spot solved the issue.

But I wanted to be alerted to this problem before it happens again.

Solution

Set up a Telegraf + InfluxDB + Grafana stack:

  • InfluxDB - a time-series database that stores the metrics
  • Telegraf - an agent running on the server/computer being monitored; it collects temperature (and other) metrics and sends them to InfluxDB
  • Grafana - reads the data from InfluxDB, displays it in dashboards, and alerts me when the temperature gets too high
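
To make the data flow a bit more concrete: Telegraf writes each reading to InfluxDB as a line of InfluxDB line protocol (measurement, tags, fields, timestamp). The sensors input uses a measurement called sensors with chip and feature tags and fields such as temp_input, and Telegraf adds a host tag by default. The sketch below builds one such line by hand. lp_sensor_line and lp_escape_tag are hypothetical helper names and the example values are made up, so treat it as an illustration of the format, not something Telegraf needs.

```bash
#!/usr/bin/env bash
# Sketch: build one InfluxDB line-protocol record shaped like what Telegraf's
# sensors input writes (measurement "sensors", tags chip/feature/host,
# field temp_input). These helpers are hypothetical, not part of Telegraf.

# Escape spaces, commas and equals signs in a tag value with a backslash,
# as line protocol requires.
lp_escape_tag() {
  printf '%s' "$1" | sed 's/[ ,=]/\\&/g'
}

# Usage: lp_sensor_line CHIP FEATURE HOST TEMP [TIMESTAMP_NS]
lp_sensor_line() {
  local chip feature host temp ts
  chip=$(lp_escape_tag "$1")
  feature=$(lp_escape_tag "$2")
  host=$(lp_escape_tag "$3")
  temp=$4
  ts=${5:-$(date +%s%N)}   # GNU date: current time in nanoseconds
  # Format: measurement,tag=value,... field=value timestamp
  printf 'sensors,chip=%s,feature=%s,host=%s temp_input=%s %s\n' \
    "$chip" "$feature" "$host" "$temp" "$ts"
}
```

For example, lp_sensor_line coretemp-isa-0000 package_id_0 pve1 45 prints a line in the same shape Telegraf sends for one temperature reading.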

InfluxDB Setup

Since I have a Proxmox cluster, I will be using the Proxmox Helper Scripts to set up an InfluxDB server.

InfluxDB can be set up by running the following command on the Proxmox host:

bash -c "$(curl -fsSL https://raw.githubusercontent.com/community-scripts/ProxmoxVE/main/ct/influxdb.sh)"

Once installed, open the InfluxDB web UI at the URL shown in the command output.
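
Before opening the UI, you can check that the server is actually up. This is a small sketch assuming InfluxDB 2.x, which serves a /health endpoint on port 8086 by default; influxdb.lan:8086 is a placeholder for whatever URL the script printed, and influx_healthy is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: check that an InfluxDB 2.x server answers on its /health endpoint.
# Assumes the default port 8086; influxdb.lan is a placeholder host name.
influx_healthy() {
  local base_url=${1:-http://influxdb.lan:8086}
  local body
  # -f: fail on HTTP errors, -sS: quiet but still show errors, -m 5: 5 s timeout
  body=$(curl -fsS -m 5 "${base_url%/}/health") || return 1
  # A healthy server reports "status": "pass" in its JSON body.
  grep -Eq '"status" *: *"pass"' <<<"$body"
}
```

Usage: influx_healthy http://influxdb.lan:8086 && echo "InfluxDB is up".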

Next, create an organization, a bucket, and an API token.

Create an organization & bucket:

Create an API token. Once it is created, copy the token value; you will need it for Telegraf (and for the Grafana data source later).
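
The bucket and token can also be created with the influx CLI (v2), which is handy if you ever rebuild the container. This is a sketch that assumes the CLI is installed and already configured for this server with an all-access token (for example via influx config create), and that the organization already exists (InfluxDB's first-run onboarding creates one; otherwise influx org create --name marcus-company does). create_influx_resources is a hypothetical wrapper, and the token it creates can read and write every bucket in the organization, which is broader than Telegraf strictly needs.

```bash
#!/usr/bin/env bash
# Sketch: create the bucket and API token with the influx CLI (InfluxDB 2.x)
# instead of the web UI. Assumes the CLI is configured for this server with an
# all-access token and that the organization already exists.
create_influx_resources() {
  local org=${1:-marcus-company} bucket=${2:-telegraf-pve1}
  # Bucket that Telegraf will write into.
  influx bucket create --name "$bucket" --org "$org" || return 1
  # Token that can read and write the org's buckets (Telegraf writes,
  # Grafana reads). The printed token value goes into telegraf.conf.
  influx auth create --org "$org" --read-buckets --write-buckets \
    --description "telegraf and grafana"
}
```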

Telegraf Setup

On the server/computer you want to monitor:

Install lsb-release

sudo apt-get update
sudo apt-get install lsb-release

Install Telegraf

curl -fsSL https://repos.influxdata.com/influxdata-archive_compat.key | sudo gpg --dearmor -o /etc/apt/trusted.gpg.d/influxdata.gpg
echo "deb https://repos.influxdata.com/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/influxdata.list
sudo apt update
sudo apt install telegraf

Modify the telegraf.conf file:

sudo vim /etc/telegraf/telegraf.conf

Find and uncomment the [[inputs.sensors]] section for temperature monitoring:

# Monitor sensors, requires lm-sensors package
[[inputs.sensors]]
## Remove numbers from field names.
## If true, a field name like 'temp1_input' will be changed to 'temp_input'.
# remove_numbers = true

## Timeout is the maximum amount of time that the sensors command can run.
# timeout = "5s"

We will also uncomment the [[inputs.net]] section for network monitoring.
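
If you would rather not hunt for these sections in vim, a sed one-liner can uncomment the plugin headers. This sketch assumes the stock telegraf.conf layout, where disabled plugins start with "# [[inputs.NAME]]"; only the header line changes, so the plugin runs with its default options. enable_telegraf_plugin is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: uncomment a disabled plugin header such as "# [[inputs.sensors]]"
# in telegraf.conf. Assumes the stock layout of commented-out plugins.
# Only the header changes, so the plugin keeps its default options.
enable_telegraf_plugin() {
  local conf=$1 plugin=$2   # e.g. /etc/telegraf/telegraf.conf inputs.sensors
  # "[[" must be escaped in the sed pattern, "]]" need not be. The dot in the
  # plugin name acts as a regex wildcard, which is harmless here.
  # -i.bak edits in place and keeps a backup copy of the original file.
  sed -i.bak "s/^#[[:space:]]*\[\[${plugin}]]/[[${plugin}]]/" "$conf"
}
```

Run it as root (the config file belongs to root), once per plugin: enable_telegraf_plugin /etc/telegraf/telegraf.conf inputs.sensors, then the same with inputs.net. Because the pattern requires "]]" right after the name, inputs.net does not touch inputs.netstat.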

Next, find the [[outputs.influxdb_v2]] section and modify it like below:

[[outputs.influxdb_v2]]
  ## The URLs of the InfluxDB cluster nodes.
  ##
  ## Multiple URLs can be specified for a single cluster, only ONE of the
  ## urls will be written to each interval.
  ##   ex: urls = ["https://us-west-2-1.aws.cloud2.influxdata.com"]
  urls = ["http://influxdb.lan"]

  ## Token for authentication.
  token = "INFLUX_TOKEN_VALUE_HERE"

  ## Organization is the name of the organization you wish to write to; must exist.
  organization = "marcus-company"

  ## Destination bucket to write into.
  bucket = "telegraf-pve1"

Save and exit.
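
Before starting Telegraf, you can confirm that the URL, token, organization, and bucket from this section accept writes by sending one point through InfluxDB's v2 write API (POST /api/v2/write). This is a sketch: influx_write_check and the setup_check measurement are hypothetical names, the org and bucket names go into the query string unencoded (fine for names like marcus-company), and the test point stays in the bucket afterwards.

```bash
#!/usr/bin/env bash
# Sketch: write one throwaway point with the InfluxDB v2 HTTP write API to
# check the URL/token/org/bucket used in [[outputs.influxdb_v2]].
# curl -f makes HTTP errors such as 401 (bad token) or 404 (unknown bucket)
# return a non-zero exit code.
influx_write_check() {
  local url=$1 token=$2 org=$3 bucket=$4
  curl -fsS -m 10 -X POST \
    "${url%/}/api/v2/write?org=${org}&bucket=${bucket}&precision=s" \
    -H "Authorization: Token ${token}" \
    -H "Content-Type: text/plain; charset=utf-8" \
    --data-binary "setup_check,host=$(hostname) value=1 $(date +%s)"
}
```

Usage: influx_write_check http://influxdb.lan:8086 "$INFLUX_TOKEN" marcus-company telegraf-pve1 && echo "write ok".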

For [[inputs.sensors]] to work, we also need to install lm-sensors:

sudo apt install lm-sensors
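
Before relying on Telegraf, it is worth checking what lm-sensors actually sees (on some machines you may first need to run sudo sensors-detect so the right kernel modules get loaded). This sketch reads the raw sensors -u output, where each value sits on its own line such as temp1_input: 45.000, and prints the hottest temp*_input reading. max_temp is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: print the highest temperature in "sensors -u" output (read on stdin).
# Matches lines like "  temp1_input: 45.000" on any chip; exits 1 if none found.
max_temp() {
  awk '$1 ~ /^temp[0-9]+_input:$/ {
         v = $2 + 0
         if (!seen || v > max) { max = v; seen = 1 }
       }
       END { if (seen) { printf "%.1f\n", max } else { exit 1 } }'
}
```

Usage: sensors -u | max_temp. To see exactly what Telegraf's sensors input collects, without writing anything to InfluxDB, telegraf --config /etc/telegraf/telegraf.conf --input-filter sensors --test prints the gathered metrics to stdout.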

Restart Telegraf so it picks up the new configuration, enable it at boot, and verify that it is running:

sudo systemctl restart telegraf
sudo systemctl enable telegraf
sudo systemctl status telegraf
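
If the status looks fine but no data shows up, the journal usually says why. Telegraf prefixes its log messages with I!, W!, and E! (info, warning, error); this sketch filters journal output down to warnings and errors. telegraf_errors is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: keep only Telegraf warning ("W!") and error ("E!") lines from log
# text on stdin. grep exits 1 when nothing matches, i.e. no warnings/errors.
telegraf_errors() {
  grep -E '(^| )[EW]! '
}
```

Usage: journalctl -u telegraf --since "10 min ago" --no-pager | telegraf_errors. A 401 from the influxdb_v2 output, for instance, points at the token.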

Go back to the InfluxDB web UI and verify that the bucket is now receiving data (for example in the Data Explorer).
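
The same check works from the command line through InfluxDB's /api/v2/query endpoint with a Flux query. This sketch asks for the last temp_input value of each sensor series from the past five minutes and prints the result as annotated CSV; latest_temps is a hypothetical helper, and its default org and bucket are the names used in this post.

```bash
#!/usr/bin/env bash
# Sketch: query the latest sensor temperatures from InfluxDB 2.x with Flux
# over the HTTP API. Output is InfluxDB's annotated CSV.
latest_temps() {
  local url=$1 token=$2 org=${3:-marcus-company} bucket=${4:-telegraf-pve1}
  local flux
  # last() runs per series, i.e. once per chip/feature/host combination.
  flux="from(bucket: \"${bucket}\")
  |> range(start: -5m)
  |> filter(fn: (r) => r._measurement == \"sensors\" and r._field == \"temp_input\")
  |> last()"
  curl -fsS -m 10 -X POST "${url%/}/api/v2/query?org=${org}" \
    -H "Authorization: Token ${token}" \
    -H "Accept: application/csv" \
    -H "Content-Type: application/vnd.flux" \
    --data-binary "$flux"
}
```

Usage: latest_temps http://influxdb.lan:8086 "$INFLUX_TOKEN". An empty result means nothing has arrived in the last five minutes.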

Grafana Setup

Since I have a Proxmox cluster, I will be using the Proxmox Helper Scripts to set up a Grafana server.

Grafana can be set up by running the following command on the Proxmox host:

bash -c "$(curl -fsSL https://raw.githubusercontent.com/community-scripts/ProxmoxVE/main/ct/grafana.sh)"

Once installed, open the Grafana web UI at the URL shown in the command output.
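
Grafana has its own health endpoint, /api/health, which reports whether Grafana can reach its internal database. A sketch assuming Grafana's default port 3000; grafana.lan is a placeholder host and grafana_healthy a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: check Grafana's /api/health endpoint (default port 3000 assumed).
grafana_healthy() {
  local base_url=${1:-http://grafana.lan:3000}
  local body
  body=$(curl -fsS -m 5 "${base_url%/}/api/health") || return 1
  # Grafana reports "database": "ok" when its own database is reachable.
  grep -Eq '"database" *: *"ok"' <<<"$body"
}
```

Usage: grafana_healthy http://grafana.lan:3000 && echo "Grafana is up".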

Grafana Setup InfluxDB DataSource

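Create a connection to InfluxDB under Connections > Data sources > Add data source > InfluxDB. Set the query language to Flux and the URL to your InfluxDB server, then fill in the organization, the API token, and the default bucket.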

Click Save & test.
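
If you prefer scripting this step (or want to recreate it later), Grafana's HTTP API can create the same data source with POST /api/datasources. This is a sketch assuming a Grafana service account token with admin permissions in GRAFANA_TOKEN and the Flux query language; grafana_add_influx_ds and the data source name InfluxDB are hypothetical choices, and the JSON is built with printf, so none of the values may contain double quotes.

```bash
#!/usr/bin/env bash
# Sketch: create an InfluxDB (Flux) data source via Grafana's HTTP API.
# Assumes GRAFANA_TOKEN holds an admin service-account token.
grafana_add_influx_ds() {
  local grafana_url=$1 influx_url=$2 influx_token=$3
  local org=${4:-marcus-company} bucket=${5:-telegraf-pve1}
  local body
  # jsonData holds the Flux settings; the InfluxDB token goes in
  # secureJsonData so Grafana stores it encrypted.
  body=$(printf '{"name":"InfluxDB","type":"influxdb","access":"proxy","url":"%s",
"jsonData":{"version":"Flux","organization":"%s","defaultBucket":"%s"},
"secureJsonData":{"token":"%s"}}' "$influx_url" "$org" "$bucket" "$influx_token")
  curl -fsS -m 10 -X POST "${grafana_url%/}/api/datasources" \
    -H "Authorization: Bearer ${GRAFANA_TOKEN:?set GRAFANA_TOKEN}" \
    -H "Content-Type: application/json" \
    --data-binary "$body"
}
```

Usage: grafana_add_influx_ds http://grafana.lan:3000 http://influxdb.lan:8086 "$INFLUX_TOKEN".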

Grafana Setup Dashboard (Optional)

Go to Dashboards > New > Import, paste the contents of grafana-dashboard.json into the JSON field, then click Load.

Modify the datasource and bucket variables.

Grafana Setup Alerts

Let’s create a Discord contact point.

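Choose Discord as the integration and paste in your Discord webhook URL. To create one in Discord, open Server Settings > Integrations > Webhooks, click New Webhook, and copy its URL.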

Click Test and verify that a message appears in your Discord channel.

Then click save.
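
If Grafana's Test button reports an error, it helps to rule out the webhook itself. This sketch posts a message straight to the webhook with curl; Discord webhooks accept a JSON body with a content field. discord_test is a hypothetical helper, and the message must not contain double quotes because the JSON is built by hand.

```bash
#!/usr/bin/env bash
# Sketch: send a test message directly to a Discord webhook URL.
discord_test() {
  local webhook=${1:?usage: discord_test WEBHOOK_URL [MESSAGE]}
  local msg=${2:-"Test from $(hostname)"}
  # Discord webhooks take a JSON body; "content" is the message text.
  curl -fsS -m 10 -H "Content-Type: application/json" \
    -d "{\"content\": \"${msg}\"}" "$webhook"
}
```

Usage: discord_test "$DISCORD_WEBHOOK_URL" "Grafana test". If this message arrives but Grafana's test does not, the problem is on the Grafana side.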

Next, let’s create an alert rule (Alerting > Alert rules > New alert rule).

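Fill in the rule's query and threshold condition (the temperature you consider too high), its evaluation settings, and select the Discord contact point for notifications.
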
You can click Preview alert rule condition.

Then click save.
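
For reference, a Flux query along these lines can feed an overheating alert: it keeps the sensors temp_input readings from a look-back window and returns the highest value per host, which Grafana's threshold expression then compares against your limit. This is a sketch; alert_flux is a hypothetical helper, and its default bucket and 5-minute window are assumptions to adjust to your setup.

```bash
#!/usr/bin/env bash
# Sketch: print a Flux query for an overheating alert rule.
# Result: one row per host holding the max temp_input in the window.
alert_flux() {
  local bucket=${1:-telegraf-pve1} window=${2:-5m}
  cat <<EOF
from(bucket: "${bucket}")
  |> range(start: -${window})
  |> filter(fn: (r) => r._measurement == "sensors" and r._field == "temp_input")
  |> group(columns: ["host"])
  |> max()
EOF
}
```

Usage: paste the output of alert_flux telegraf-pve1 5m into the rule's query editor with the InfluxDB data source selected, then set the threshold on that value.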

This is probably overkill just to get notified about overheating, but I'll likely use InfluxDB and Grafana for more in the future.