Compare commits

55 Commits

Author SHA1 Message Date
vfebvre 43e34f8de8 Modify Alertmanager configuration 2020-12-17 10:21:49 +01:00
vincent 16d7bfa7f2 Add CPU temperature alert 2019-06-07 15:22:30 +02:00
wpetit d9b253b63d Use proxy to scrape Prometheus datasource 2019-04-15 14:18:40 +02:00
wpetit a5fb3de2c0 Fix Grafana install instructions 2019-04-15 14:17:42 +02:00
wpetit 7023209d9b Deploy a default templated dashboard 2019-03-08 16:38:22 +01:00
wpetit 31bf813036 Deploy a default Grafana dashboard for the local machine 2019-03-08 16:00:59 +01:00
wpetit fc2cfd9102 Fix provisioning directory path 2019-03-08 15:40:09 +01:00
wpetit 1a2f7172d9 Add a default datasources provisioning file 2019-03-08 15:20:56 +01:00
wpetit 628fdafdea Add a dashboard definitions directory 2019-03-08 15:02:16 +01:00
wpetit b4fde311a5 Update installation instructions 2019-03-08 13:38:10 +01:00
wpetit e431a4207e Update EOLE version 2019-03-08 13:37:44 +01:00
wpetit 71787a9cf6 Rename alertmanager to prometheus-alertmanager 2019-03-08 13:35:20 +01:00
vincent a681d455fe Allow adding clients by domain name 2018-07-10 09:59:29 +02:00
vincent 0a3658fa57 Add custom path for jobs 2018-06-26 15:05:24 +02:00
vincent c25b004d3b Adjust filesystem alert level 2018-06-26 14:06:37 +02:00
Philippe Caseiro f9cb3d35b1 Adding support for mail sending from smtp gateway 2018-06-26 13:20:08 +02:00
vincent b4ff9a35fd Change smtpTLS handling and fix invalid path 2018-06-26 12:06:05 +02:00
wpetit dc43a0f26c Add alerting prediction rules 2018-06-25 11:07:56 +02:00
wpetit a306f5ce19 Use system_mail_from when using SMTP system configuration 2018-06-25 11:02:58 +02:00
Philippe Caseiro a420d20d39 Removing problematic option 2018-06-19 17:41:21 +02:00
Philippe Caseiro ad5f6fbc75 Fix typo 2018-06-19 17:34:42 +02:00
Philippe Caseiro bb59a5238c Fix typo 2018-06-13 10:37:14 +02:00
Philippe Caseiro 61155f10f4 Add the ability to configure data retention 2018-06-13 10:33:25 +02:00
Philippe Caseiro a5b6333599 Fixing multiple jobs generation 2018-06-11 15:42:36 +02:00
Philippe Caseiro 37342e8700 Fix diagnose 2018-06-11 09:35:28 +02:00
Philippe Caseiro 33643232d4 Supporting global smtp gateway usage 2018-06-11 09:24:40 +02:00
Philippe Caseiro 1013775b1a Adding alert rules file template 2018-06-11 09:06:53 +02:00
Philippe Caseiro e95d6f9e1d Improving dico to be more Prometheus-like (we need to be closer to the Prometheus way of doing things) 2018-06-06 16:46:02 +02:00
Philippe Caseiro ab479fd33a Improve configuration flexibility to match the Prometheus way of doing things 2018-06-06 16:00:36 +02:00
Philippe Caseiro e3a6295709 Using 127.0.0.1 ... 2018-06-06 15:14:07 +02:00
Philippe Caseiro ad56be504b Fix typo 2018-06-06 14:39:33 +02:00
Philippe Caseiro 0fbffcc6d2 Adding some values for match sources 2018-06-06 09:26:21 +02:00
Philippe Caseiro 4195adfa6e Improving alerting configuration 2018-06-06 09:05:55 +02:00
Philippe Caseiro 5f263995d0 Fixing Variable order 2018-06-05 17:17:11 +02:00
Philippe Caseiro 73689be06c Fix bad variable name 2018-06-05 17:12:17 +02:00
Philippe Caseiro 9faff7988a Fix template 2018-06-05 17:07:34 +02:00
Philippe Caseiro 5ab3f20789 Improving alert support 2018-06-05 17:05:51 +02:00
Philippe Caseiro b95d0894d9 This has to be a multi too 2018-06-05 16:56:48 +02:00
Philippe Caseiro 44e3a5c0f7 Group master must be a multi 2018-06-05 16:55:45 +02:00
Philippe Caseiro 0d87cec74a true is not True 2018-06-05 16:55:03 +02:00
Philippe Caseiro 5c16310e5d Adding disable support for alert service 2018-06-05 16:53:11 +02:00
Philippe Caseiro c10edef336 Adding alertmanager support 2018-06-05 16:46:23 +02:00
Philippe Caseiro 598c1d1807 Fix template 2018-06-05 14:27:32 +02:00
Philippe Caseiro 2de724da57 Moving to bash 2 2018-06-05 11:45:05 +02:00
Philippe Caseiro a3def897fa Moving to bash 2018-06-04 17:01:42 +02:00
Philippe Caseiro 3df296b366 Adding datasource 2018-06-04 13:59:45 +02:00
Philippe Caseiro 4a24601bea Try 008 2018-06-04 11:16:18 +02:00
Philippe Caseiro f61a15d609 Try 007 2018-06-04 10:51:53 +02:00
Philippe Caseiro 4bd29eaa04 Try 006bis 2018-06-04 10:48:47 +02:00
Philippe Caseiro 3eaa7feee1 Try 006 2018-06-04 10:47:01 +02:00
Philippe Caseiro d815f156e3 Try 005 2018-06-04 10:42:19 +02:00
Philippe Caseiro 38883c4fca Try 004 2018-06-04 10:36:34 +02:00
Philippe Caseiro 325acbf537 Try 003 2018-06-04 10:33:36 +02:00
Philippe Caseiro f9be660423 Try 002 2018-06-04 10:30:13 +02:00
Philippe Caseiro 638295626b Try 001 2018-06-04 10:28:55 +02:00
15 changed files with 20050 additions and 99 deletions

@@ -4,8 +4,8 @@
SOURCE=eole-prometheus
VERSION=0.1
EOLE_VERSION=2.6
EOLE_RELEASE=2.6.2
EOLE_VERSION=2.7
EOLE_RELEASE=2.7.0
PKGAPPS=non
################################

@@ -1,41 +1,33 @@
## eole-prometheus
# eole-prometheus
EOLE packaging ("eolisation") of the Prometheus monitoring solution.
Grafana is supported by the packaging and can optionally be enabled.
The system exporter (node-exporter) is part of the default configuration (Prométheus monitors itself).
### eole-prometheus :
The system exporter (node-exporter) is part of the default configuration (Prometheus monitors itself).
### Installation
1. gen_config
#### Installing `eole-prometheus`
1. Add the official [Grafana](http://docs.grafana.org/installation/debian/#apt-repository) repository. In the `GenConfig` interface:
```
Mode expert > Dépot tiers > Ajouter un dépot
Dépôt officiel Grafana
Libellé du dépot = Dépôt officiel Grafana
Déclaration du dépôt = deb https://packages.grafana.com/oss/deb stable main
Méthode de récupération de la clé = URL de la clé
URL de la clé = https://packages.grafana.com/gpg.key
```
2. Add the Cadoles repository. In the `GenConfig` interface:
```
Mode expert > Dépot tiers > Ajouter un dépot
Cadoles pour environnement de Dev
Libellé du dépot = Cadoles
Déclaration du dépôt = deb https://vulcain.cadoles.com 2.7.0-dev main
Méthode de récupération de la clé = URL de la clé
URL de la clé = https://vulcain.cadoles.com/cadoles.gpg
```
Mode expert > Dépot tiers > Ajouter un dépot
Cadoles pour environnement de Qualification
Libellé du dépot = Cadoles
Déclaration du dépôt = deb https://vulcain.cadoles.com 2.6.2-staging main
Méthode de récupération de la clé = URL de la clé
URL de la clé = https://vulcain.cadoles.com/cadoles.gpg
```
* To add a probe on EOLE:
1. gen_config
```
Mode expert > Dépot tiers > Ajouter un dépot
Cadoles pour environnement de Qualification
Libellé du dépot = Cadoles
Déclaration du dépôt = deb https://vulcain.cadoles.com xenial-staging main
Méthode de récupération de la clé = URL de la clé
URL de la clé = https://vulcain.cadoles.com/cadoles.gpg
```
* To add a probe on Ubuntu Xenial:
```
echo "deb https://vulcain.cadoles.com xenial-staging main" > /etc/apt/sources.list.d/cadoles.list
wget -O - https://vulcain.cadoles.com/cadoles.gpg|apt-key add -
```
The ports to open depend on the exporters in use: not all exporters listen on the same port.
@@ -44,4 +36,3 @@ The eole-prometheus package opens the ports on the server where Prometheus will be ins
* 9090 for the Prometheus server
* 9100 for the node-exporter probe
* 3000 for the Grafana server
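You can verify the port list above from a client machine once the firewall rules are in place. This minimal Python sketch only tests TCP connectivity on each port. It does not check that the expected service is the one answering. The function names are illustrative.

```python
import socket

# Ports the README asks to open, with the service expected on each.
EXPECTED_PORTS = {9090: "prometheus", 9100: "node-exporter", 3000: "grafana"}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, host unreachable, DNS failure or timeout.
        return False


def check_ports(host: str, ports=EXPECTED_PORTS, timeout: float = 2.0) -> dict:
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_open(host, port, timeout) for port in ports}
```

Call `check_ports("<server address>")` from the machine that will scrape or browse. A `False` result only means nothing answered on that TCP port. The cause could be a closed port, a firewall, or a stopped service.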

@@ -1,6 +1,7 @@
#!/bin/bash
if [ $(CreoleGet activer_grafana) = "oui" ];then
if [[ $(CreoleGet activer_grafana) == "oui" ]]
then
. /usr/lib/eole/diagnose.sh
EchoGras "*** Accès au serveur grafana"

@@ -1,6 +1,7 @@
#!/bin/bash
if [ $(CreoleGet activer_prometheus) = "oui" ];then
if [[ $(CreoleGet activer_prometheus) == "oui" ]]
then
. /usr/lib/eole/diagnose.sh
EchoGras "*** Accès au serveur Prometheus"
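The two diagnose scripts only run their checks when the matching service is enabled (`activer_grafana`, `activer_prometheus`). A comparable reachability check could be written in Python by calling each service's HTTP health endpoint. This sketch assumes Prometheus's `/-/healthy` and Grafana's `/api/health` endpoints from the upstream documentation. It is an illustration, not what `diagnose.sh` does internally.

```python
import urllib.request


def health_urls(host: str, prometheus: bool = True, grafana: bool = True) -> dict:
    """Health-check URLs for the enabled services.

    The flags play the role of activer_prometheus / activer_grafana; the
    endpoint paths are assumptions taken from upstream documentation.
    """
    urls = {}
    if prometheus:
        urls["prometheus"] = f"http://{host}:9090/-/healthy"
    if grafana:
        urls["grafana"] = f"http://{host}:3000/api/health"
    return urls


def http_healthy(url: str, timeout: float = 3.0) -> bool:
    """True if `url` answers with a 2xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError and HTTPError (4xx/5xx) are both OSError subclasses.
        return False
```

Typical use: `{name: http_healthy(url) for name, url in health_urls("localhost").items()}`.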

@@ -1,14 +1,22 @@
<?xml version="1.0" encoding="utf-8"?>
<creole>
<files>
<file filelist='prometheus' name='/etc/prometheus/prometheus.yml' source='prometheus.yml' mkdir='True' rm='True'/>
<file filelist='grafana' name='/etc/grafana/grafana.ini' source='grafana.ini' mkdir='True' rm='True'/>
<file filelist='prometheus' name='/etc/default/prometheus' source='prometheus.defaults' mkdir='True' rm='True'/>
<file filelist='prometheus' name='/etc/prometheus/prometheus.yml' mkdir='True' rm='True'/>
<file filelist='prometheus-alertmanager' name='/etc/prometheus/alertmanager.yml' mkdir='True' rm='True'/>
<file filelist='prometheus-alertmanager' name='/etc/prometheus/rules.d/alert-rules.yml' mkdir='True' rm='True'/>
<file filelist='prometheus-alertmanager' name='/etc/prometheus/rules.d/predict-rules.yml' mkdir='True' rm='True'/>
<file filelist='grafana' name='/etc/grafana/grafana.ini' mkdir='True' rm='True'/>
<file filelist='grafana' name='/etc/grafana/provisioning/dashboards/eole.yml' source='grafana-dashboards.yml' mkdir='True' rm='True'/>
<file filelist='grafana' name='/etc/grafana/provisioning/datasources/eole.yml' source='grafana-datasources.yml' mkdir='True' rm='True'/>
<file filelist='grafana' name='/var/lib/grafana/dashboards/eole.json' source='grafana-node-dashboard.json' mkdir='True' rm='True'/>
<service>prometheus</service>
<service>prometheus-alertmanager</service>
<service>grafana-server</service>
<service_access service='prometheus'>
<port service_accesslist="saLemon">80</port>
<port service_accesslist="saLemon">443</port>
<port service_accesslist="prometheus">9090</port>
<port service_accesslist="prometheus-alertmanager">9093</port>
</service_access>
<service_access service='grafana-server'>
<port service_accesslist="grafana">3000</port>
@@ -16,17 +24,23 @@
</files>
<variables>
<family name='services'>
<variable name='activer_prometheus' type='oui/non' description="Activer le service prometheus sur le serveur">
<value>oui</value>
<variable name='activer_prometheus' type='oui/non' description="Activer le service prometheus sur le serveur">
<value>oui</value>
</variable>
</family>
</family>
<family name='prometheus'>
<variable name='activer_grafana' type='oui/non' description="Activer le service grafana sur le serveur">
<value>oui</value>
</variable>
<variable name='activer_grafana' type='oui/non' description="Activer le service grafana sur le serveur">
<value>oui</value>
</variable>
<variable name='activerAlertmanager' type='oui/non' description="Activer le service d'alertes">
<value>oui</value>
</variable>
<variable name='promStorageRetention' type='number' description='Durée de rétention des métriques (en heures)'>
<value>24</value>
</variable>
<variable name='prometheusJobName' type='string' description="Nom du job ajouté au label">
<value>prometheus</value>
</variable>
</variable>
<variable name='prometheusScrapeInterval' type='string' description="Intervalle de récupération des données sur les différents noeuds">
<value>15s</value>
</variable>
@@ -36,21 +50,55 @@
<variable name='prometheusScrapeTimeout' type='string' description="Temps d'attente avant que la récupération de données échoue">
<value>10s</value>
</variable>
<variable name='ajout_client_prometheus' type='oui/non' description="Ajouter un nouveau client à Prometheus">
<value>non</value>
<variable name='job_name_node' type='string' description="Nom du job pour les noeuds" mode='expert'>
<value>node</value>
</variable>
<variable name='nouveau_node_exporter' type='string' description="url/IP du nouveau node exporter" multi="True" mandatory='True'/>
<variable name='job_name_node' type='string' description="Nom du job pour les noeuds" mode='expert'>
<value>node</value>
</variable>
<variable name='job_file_config' type='string' description="Emplacement des fichiers de configuration des noeuds" auto_freeze='True' mode='expert' mandatory='True'>
<variable name='job_file_config' type='string' description="Emplacement des fichiers de configuration des noeuds" auto_freeze='True' mode='expert' mandatory='True'>
<value>/etc/prometheus/nodes</value>
</variable>
<variable name='addTargetPrometheus' type='oui/non' description="Ajouter des cibles statiques à Prometheus">
<value>non</value>
</variable>
<!-- Job standard -->
<variable name='prTarg' type='string' description='Nom de la cible prometheus' multi='True'/>
<variable name='prTargIP' type='string' description="Adresse IP ou nom de domaine de la cible prometheus"/>
<variable name='prTargSonde' type='string' description="Sonde à utiliser pour ce client">
<value>Node Exporter</value>
</variable>
</family>
<family name="grafana">
<variable name='grafana_domain' type='string' description="Nom de Domaine ou IP pour accéder à l'interface Grafana" mandatory='True'>
<value>localhost</value>
</variable>
<family name='Jobs prometheus'>
<variable name='promJobs' type='string' description="Nom du job prometheus" multi='True'/>
<variable name='honorLabels' type='oui/non' description='Garder les labels en cas de conflit' mode='expert'>
<value>oui</value>
</variable>
<variable name='scrpInterval' type='number' description="Intervalle d'interrogation de la sonde (en secondes)">
<value>15</value>
</variable>
<variable name='scrpTimeout' type='number' description="Délai d'attente maximal lors de l'interrogation d'une sonde (en secondes)">
<value>10</value>
</variable>
<variable name='scrpScheme' type='string' description="Protocole à utiliser pour l'interrogation de la sonde">
<value>http</value>
</variable>
<variable name='scrpMetricPath' type='string' description="Chemin d'accès de la ressource">
<value>/metrics</value>
</variable>
<variable name='addPrOpenTarg' type='oui/non' description="Ajouter des cibles statiques pour les jobs personnalisés">
<value>non</value>
</variable>
<!-- Job libre -->
<variable name='prOpenTarg' type='string' description='Nom de la cible prometheus personnalisée' multi='True'/>
<variable name='prOpenTargJob' type='string' description='Nom du job de rattachement de la cible'/>
<variable name='prOpenTargIP' type='string' description="Adresse IP ou nom de domaine de la cible"/>
<variable name='prOpenTargPort' type='number' description="Port d'écoute de la sonde"/>
</family>
<family name="grafana">
<variable name='grafana_domain' type='string' description="Nom de Domaine ou IP pour accéder à l'interface Grafana" mandatory='True'>
<value>localhost</value>
</variable>
<variable name='grafana_session_max_lifetime' type='string' description="Durée avant déconnexion de l'interface Grafana (en secondes)">
<value>86400</value>
</variable>
@@ -63,23 +111,170 @@
<variable name='grafana_auth_anonymous' type='string' description="Activer l'accès aux utilisateurs non enregistrés">
<value>false</value>
</variable>
<variable name='grafanaRootURL' type='string' description='Url publique de grafana (avec http:// ou https://)' mode='expert'/>
</family>
<family name="alertes prometheus">
<variable name='alSMTPUseSys' type='oui/non' description="Utiliser la passerelle SMTP du système ?">
<value>non</value>
</variable>
<variable name='alSMTPHost' type='string' description="Adresse du serveur SMTP pour l'envoi des alertes"/>
<variable name='alSMTPPort' type='string' description="Port d'écoute du serveur SMTP pour l'envoi des alertes"/>
<variable name='alFrom' type='string' description="Adresse d'origine des emails d'alerte"/>
<variable name='alSMTPTLS' type='oui/non' description="Utiliser STARTTLS">
<value>non</value>
</variable>
<variable name='alSMTPAuth' type='oui/non' description="Authentification requise sur le serveur SMTP ?">
<value>non</value>
</variable>
<variable name='alSMTPUser' type='string' description="Utilisateur SMTP"/>
<variable name='alSMTPPass' type='string' description="Mot de passe"/>
<variable name='alReceiver' type='string' description="Nom du destinataire" multi='True'/>
<variable name='alReceiverEmail' type='string' description="Adresse email du destinataire"/>
<variable name='alDefaultReceiver' type='string' description='Nom du destinataire par défaut'/>
<variable name='alRoute' type='string' description="Nom de la règle de distribution des alertes" multi="True"/>
<variable name='alRouteMatchSource' type='string' description='Source de correspondance'/>
<variable name='alRouteMatchValue' type='string' description='Valeur attendue'/>
<variable name='alRouteMatchReceiver' type='string' description="Nom du destinataire de l'alerte"/>
<variable name='alRouteRegxp' type='string' description="Règle de distribution des alertes" multi="True"/>
<variable name='alRouteMatchRegExpSource' type='string' description='Source de correspondance'/>
<variable name='alRouteMatchRegExp' type='string' description='Expression régulière'/>
<variable name='alRouteMatchRegxpRecv' type='string' description="Nom du destinataire de l'alerte (regxp)"/>
<variable name='alSubRoute' type='string' description="Nom de la règle maîtresse" multi='True'/>
<variable name='alSubRouteMatchSource' type='string' description='Source de correspondance'/>
<variable name='alSubRouteMatchValue' type='string' description='Valeur attendue'/>
<variable name='alSubRouteMatchReceiver' type='string' description="Nom du destinataire de l'alerte"/>
</family>
<separators>
<separator name='activer_grafana'>Services complémentaires</separator>
<separator name='prometheusJobName'>Configuration du serveur Prometheus</separator>
<separator name='job_name_node'>Configuration des jobs standards</separator>
<separator name='alSMTPHost'>Configuration SMTP pour l'envoi des alertes</separator>
<separator name='alReceiver'>Destinataires</separator>
<separator name='alRoute'>Règles de distribution simples</separator>
<separator name='alRouteRegxp'>Règles de distribution regexp</separator>
<separator name='alSubRoute'>Sous-règles de distribution</separator>
</separators>
</variables>
<constraints>
<group master='alReceiver'>
<slave>alReceiverEmail</slave>
</group>
<group master='promJobs'>
<slave>scrpInterval</slave>
<slave>scrpTimeout</slave>
<slave>honorLabels</slave>
<slave>scrpScheme</slave>
<slave>scrpMetricPath</slave>
</group>
<group master='alRoute'>
<slave>alRouteMatchSource</slave>
<slave>alRouteMatchValue</slave>
<slave>alRouteMatchReceiver</slave>
</group>
<group master='alRouteRegxp'>
<slave>alRouteMatchRegExpSource</slave>
<slave>alRouteMatchRegExp</slave>
<slave>alRouteMatchRegxpRecv</slave>
</group>
<group master='alSubRoute'>
<slave>alSubRouteMatchSource</slave>
<slave>alSubRouteMatchValue</slave>
<slave>alSubRouteMatchReceiver</slave>
</group>
<group master='prTarg'>
<slave>prTargIP</slave>
<slave>prTargSonde</slave>
</group>
<group master='prOpenTarg'>
<slave>prOpenTargIP</slave>
<slave>prOpenTargPort</slave>
<slave>prOpenTargJob</slave>
</group>
<check name='valid_enum' target='prTargSonde'>
<param>['Node Exporter']</param>
</check>
<check name='valid_enum' target='scrpScheme'>
<param>['http','https']</param>
</check>
<check name='valid_enum' target='alRouteMatchSource'>
<param>['','service','severity']</param>
</check>
<check name='valid_enum' target='alRouteMatchRegExpSource'>
<param>['','service','severity']</param>
</check>
<check name='valid_enum' target='alSubRouteMatchSource'>
<param>['','service','severity']</param>
</check>
<condition name='disabled_if_in' source='alSMTPUseSys'>
<param>oui</param>
<target type='variable'>alSMTPUser</target>
<target type='variable'>alSMTPPass</target>
<target type='variable'>alSMTPPort</target>
<target type='variable'>alSMTPTLS</target>
<target type='variable'>alSMTPHost</target>
<target type='variable'>alSMTPAuth</target>
</condition>
<condition name='disabled_if_in' source='alSMTPAuth'>
<param>non</param>
<target type='variable'>alSMTPUser</target>
<target type='variable'>alSMTPPass</target>
</condition>
<condition name='disabled_if_in' source='activer_prometheus'>
<param>non</param>
<target type='family'>prometheus</target>
<target type='family'>alertes prometheus</target>
<target type='filelist'>prometheus</target>
<target type='variable'>activer_grafana</target>
</condition>
<condition name='disabled_if_in' source='activer_grafana'>
<param>non</param>
<target type='family'>grafana</target>
<target type='filelist'>grafana</target>
</condition>
<condition name='disabled_if_in' source='ajout_client_prometheus'>
<condition name='disabled_if_in' source='activerAlertmanager'>
<param>non</param>
<target type='variable'>nouveau_node_exporter</target>
<target type='family'>alertes prometheus</target>
<target type='filelist'>prometheus-alertmanager</target>
<target type='service_accesslist'>prometheus-alertmanager</target>
</condition>
<condition name='disabled_if_in' source='addTargetPrometheus'>
<param>non</param>
<target type='variable'>prTarg</target>
<target type='variable'>prTargIP</target>
<target type='variable'>prTargSonde</target>
</condition>
<condition name='disabled_if_in' source='addPrOpenTarg'>
<param>non</param>
<target type='variable'>prOpenTarg</target>
<target type='variable'>prOpenTargIP</target>
<target type='variable'>prOpenTargPort</target>
</condition>
</constraints>
<help>

@@ -1,12 +0,0 @@
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network-online.target
[Service]
User=root
Restart=on-failure
ExecStart=/usr/bin/prometheus --config.file=/etc/prometheus/prometheus.yml
[Install]
WantedBy=multi-user.target
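The Creole dictionary earlier in this diff uses `<group master=…>` constraints. Each one makes a set of slave variables per-entry companions of a multi-valued master. That is why the templates can write `%%job.scrpInterval` inside `%for %%job in %%getVar('promJobs', [])`. Below is a small Python model of that pairing. It assumes the slaves behave like parallel lists of the same length as the master. `MasterValue` and `group` are hypothetical names, not Creole's API.

```python
class MasterValue(str):
    """A master value that also exposes its slave values as attributes.

    Hypothetical model of the `%%job.scrpInterval` access pattern; the real
    Creole objects are not part of this diff.
    """

    def __new__(cls, value: str, slaves: dict):
        obj = super().__new__(cls, value)
        obj._slaves = slaves
        return obj

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for slave names.
        try:
            return self._slaves[name]
        except KeyError:
            raise AttributeError(name) from None


def group(master: list, **slaves: list) -> list:
    """Pair each master entry with the slave values at the same index."""
    for name, values in slaves.items():
        if len(values) != len(master):
            raise ValueError(f"slave {name!r} has {len(values)} values, master has {len(master)}")
    return [
        MasterValue(value, {name: values[i] for name, values in slaves.items()})
        for i, value in enumerate(master)
    ]
```

Because `MasterValue` is still a string, comparisons like `%%target.prOpenTargJob == %%job` keep working on the plain name.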

tmpl/alert-rules.yml

@@ -0,0 +1,50 @@
#
# Alert Rules
#
groups:
- name: EoleRules
  rules:
  # Instance is Down
  - alert: JobInstanceDown
    expr: up == 0
    for: 1m
    annotations:
      DESCRIPTION: Job {{ $labels.job }} instance {{ $labels.instance }} is down.
      SUMMARY: Job instance is down
  # Heavy CPU usage
  - alert: cpu_threshold_exceeded
    expr: (100 * (1 - avg by(instance) (irate(node_cpu{job="%%{job_name_node}",mode="idle"}[5m]))))
      > 80
    annotations:
      description: This device's cpu usage has exceeded the threshold with a value
        of {{ $value }}.
      summary: Instance {{ $labels.instance }} CPU usage is dangerously high
  # Heavy Memory usage
  - alert: mem_threshold_exceeded
    expr: (node_memory_MemFree{job="%%{job_name_node}"} + node_memory_Cached{job="%%{job_name_node}"} + node_memory_Buffers{job="%%{job_name_node}"})
      / 1e+06 < 80
    annotations:
      description: This device's memory usage has exceeded the threshold with a value
        of {{ $value }}.
      summary: Instance {{ $labels.instance }} memory usage is dangerously high
  # Heavy "/" use
  - alert: filesystem_threshold_exceeded
    expr: node_filesystem_avail{job="%%{job_name_node}",mountpoint="/"} / node_filesystem_size{job="%%{job_name_node}"}
      * 100 < 20
    annotations:
      description: This device's filesystem usage has exceeded the threshold with
        a value of {{ $value }}.
      summary: Instance {{ $labels.instance }} filesystem usage is dangerously high
  # Heavy CPU temperature
  - alert: cpu_temp_threshold_exceeded
    expr: avg(node_hwmon_temp_celsius{job="%%{job_name_node}"}) BY (instance)
      > 50
    annotations:
      description: This device's cpu temperature has exceeded the threshold with a value
        of {{ $value }}.
      summary: Instance {{ $labels.instance }} CPU temperature is dangerously high
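The rules above compare derived quantities, not raw metrics. A minimal Python re-computation from raw samples makes the units explicit:

- CPU usage in percent, from per-CPU idle counters, with `irate` reduced to the last two samples.
- Free, cached and buffered memory in megabytes.
- Available space on `/` in percent.

The function names are illustrative.

```python
def irate(t1: float, v1: float, t2: float, v2: float) -> float:
    """Per-second rate between the last two samples of a counter (no reset handling)."""
    return (v2 - v1) / (t2 - t1)


def cpu_usage_percent(idle_samples: list) -> float:
    """100 * (1 - mean idle rate over CPUs), as in cpu_threshold_exceeded.

    idle_samples: one (t1, v1, t2, v2) tuple per CPU for mode="idle"; v is the
    cumulative idle time in seconds, so each rate is an idle fraction.
    """
    rates = [irate(*s) for s in idle_samples]
    return 100 * (1 - sum(rates) / len(rates))


def mem_available_mb(free: float, cached: float, buffers: float) -> float:
    """(MemFree + Cached + Buffers) / 1e6: byte counters converted to megabytes."""
    return (free + cached + buffers) / 1e6


def fs_available_percent(avail: float, size: float) -> float:
    """node_filesystem_avail / node_filesystem_size * 100 for mountpoint="/"."""
    return avail / size * 100
```

For example, take two CPUs idle for 1 s and 3 s over a 10 s window. Their average idle fraction is 0.2, i.e. 80 % usage, which does not yet satisfy the strict `> 80` condition. Note also that `mem_threshold_exceeded` is really a low-memory check: it fires below roughly 80 MB free, cached or buffered, even though its annotation talks about usage.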

tmpl/alertmanager.yml

@@ -0,0 +1,142 @@
global:
  # The smarthost and SMTP sender used for mail notifications.
%if %%alSMTPUseSys == 'oui'
%if %%tls_smtp == "non"
  smtp_smarthost: '%%exim_relay_smtp:25'
%elif %%tls_smtp == "port 25"
  smtp_smarthost: '%%exim_relay_smtp:25'
  smtp_require_tls: true
%else
  smtp_smarthost: '%%exim_relay_smtp:465'
  smtp_require_tls: true
%end if
  smtp_from: '%%system_mail_from'
%else
  smtp_smarthost: '%%alSMTPHost:%%alSMTPPort'
  smtp_from: '%%alFrom'
%if %%getVar('alSMTPAuth','non') == 'oui'
  smtp_auth_username: '%%alSMTPUser'
  smtp_auth_password: '%%alSMTPPass'
%end if
%if %%getVar('alSMTPTLS','non') == 'oui'
  smtp_require_tls: true
%else
  smtp_require_tls: false
%end if
%end if
  # The auth token for Hipchat.
  #hipchat_auth_token: '1234556789'
  # Alternative host for Hipchat.
  #hipchat_api_url: 'https://hipchat.foobar.org/'
# The directory from which notification templates are read.
templates:
- '/etc/alertmanager/template/*.tmpl'
# The root route on which each incoming alert enters.
route:
  # The labels by which incoming alerts are grouped together. For example,
  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
  # be batched into a single group.
  group_by: ['alertname', 'cluster', 'service']
  # When a new group of alerts is created by an incoming alert, wait at
  # least 'group_wait' to send the initial notification.
  # This way ensures that you get multiple alerts for the same group that start
  # firing shortly after another are batched together on the first
  # notification.
  group_wait: 30s
  # When the first notification was sent, wait 'group_interval' to send a batch
  # of new alerts that started firing for that group.
  group_interval: 5m
  # If an alert has successfully been sent, wait 'repeat_interval' to
  # resend them.
  repeat_interval: 3h
  # A default receiver
  receiver: %%alDefaultReceiver
  # All the above attributes are inherited by all child routes and can
  # be overwritten on each.
  # The child route trees.
  routes:
  # This route performs a regular expression match on alert labels to
  # catch alerts that are related to a list of services.
%for route in %%getVar('alRouteRegxp',[])
  - match_re:
      %%{route.alRouteMatchRegExpSource}: %%{route.alRouteMatchRegExp}
    receiver: %%route.alRouteMatchRegxpRecv
%if not %%is_empty('alSubRoute')
    routes:
%for sroute in %%getVar('alSubRoute',[])
    # The service has a sub-route for critical alerts, any alerts
    # that do not match, i.e. severity != critical, fall-back to the
    # parent node and are sent to 'team-X-mails'
%if %%sroute == %%route
    - match:
        %%{sroute.alSubRouteMatchSource}: %%{sroute.alSubRouteMatchValue}
      receiver: %%sroute.alSubRouteMatchReceiver
      continue: true
%end if
%end for
%end if
%end for
%for rt in %%getVar('alRoute',[])
  - match:
      %%{rt.alRouteMatchSource}: %%{rt.alRouteMatchValue}
    receiver: %%rt.alRouteMatchReceiver
    continue: true
%if not %%is_empty('alSubRoute')
    routes:
%for sroute in %%getVar('alSubRoute',[])
%if %%sroute == %%rt
    - match:
        %%{sroute.alSubRouteMatchSource}: %%{sroute.alSubRouteMatchValue}
      receiver: %%sroute.alSubRouteMatchReceiver
      continue: true
%end if
%end for
%end if
%end for
  # # This route handles all alerts coming from a database service. If there's
  # # no team to handle it, it defaults to the DB team.
  # - match:
  #     service: database
  #   receiver: team-DB-pager
  #   # Also group alerts by affected database.
  #   group_by: [alertname, cluster, database]
  #   routes:
  #   - match:
  #       owner: team-X
  #     receiver: team-X-pager
  #   - match:
  #       owner: team-Y
  #     receiver: team-Y-pager
# Inhibition rules allow to mute a set of alerts given that another alert is
# firing.
# We use this to mute any warning-level notifications if the same alert is
# already critical.
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  # Apply inhibition if the alertname is the same.
  equal: ['alertname', 'cluster', 'service']
receivers:
%for rcv in %%getVar('alReceiver',[])
- name: '%%rcv'
  email_configs:
  - to: '%%rcv.alReceiverEmail'
%end for
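The two Cheetah-driven parts of this template can be modelled in plain Python to see what they produce.

- `smtp_settings` mirrors the branches of the `global` section. Its parameter names follow the template variables.
- `Route` and `receivers_for` give a simplified model of how Alertmanager walks the generated `routes` tree:
  - child routes are tried in order;
  - a matching child is descended into;
  - `continue: true` lets evaluation go on to the next sibling;
  - the parent's receiver is used when no child matches.

Only `match` and `match_re` are modelled, with no grouping or timers. `match_re` patterns are anchored, as Alertmanager anchors them. This is an illustration, not Alertmanager's implementation.

```python
import re
from dataclasses import dataclass, field


def smtp_settings(use_sys: str, tls_smtp: str = "non", relay: str = "",
                  host: str = "", port: str = "", al_smtp_tls: str = "non") -> dict:
    """Mirror the template's `global` SMTP branches.

    None means the template leaves `smtp_require_tls` out entirely.
    """
    if use_sys == "oui":
        # System relay (exim_relay_smtp); port and TLS depend on tls_smtp.
        if tls_smtp == "non":
            return {"smtp_smarthost": f"{relay}:25", "smtp_require_tls": None}
        if tls_smtp == "port 25":
            return {"smtp_smarthost": f"{relay}:25", "smtp_require_tls": True}
        # Any other tls_smtp value: port 465.
        return {"smtp_smarthost": f"{relay}:465", "smtp_require_tls": True}
    # Dedicated server from the "alertes prometheus" family.
    return {"smtp_smarthost": f"{host}:{port}", "smtp_require_tls": al_smtp_tls == "oui"}


@dataclass
class Route:
    receiver: str
    match: dict = field(default_factory=dict)     # label -> exact value
    match_re: dict = field(default_factory=dict)  # label -> regex, anchored
    continue_: bool = False                       # YAML `continue` (Python keyword)
    routes: list = field(default_factory=list)

    def matches(self, labels: dict) -> bool:
        # A missing label is treated as the empty string.
        if any(labels.get(k, "") != v for k, v in self.match.items()):
            return False
        return all(re.fullmatch(rx, labels.get(k, "")) for k, rx in self.match_re.items())


def receivers_for(route: Route, labels: dict) -> list:
    """Receivers an alert reaches, starting from a route it already matches."""
    result = []
    for child in route.routes:
        if child.matches(labels):
            result.extend(receivers_for(child, labels))
            if not child.continue_:
                break
    return result or [route.receiver]
```

Two points follow from this model.

- The generated file emits the `alRouteRegxp` entries first, and they carry no `continue`. An alert they catch therefore never reaches the `alRoute` entries that follow.
- In the `tls_smtp == "non"` branch, `smtp_require_tls` is omitted. Alertmanager then applies its own default, which its documentation gives as `true`.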

@@ -0,0 +1,10 @@
# # config file version
apiVersion: 1
providers:
- name: 'eole'
  orgId: 1
  folder: ''
  type: file
  options:
    path: /var/lib/grafana/dashboards

@@ -0,0 +1,11 @@
apiVersion: 1
datasources:
- name: Prometheus
  type: prometheus
  access: proxy
  orgId: 1
  url: http://%%adresse_ip_eth0:9090
  isDefault: true
  version: 1
  editable: false
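With `access: proxy`, the Grafana server itself sends queries to the datasource URL. That URL must therefore be reachable from the Grafana host, not from the browser. One way to check it by hand is an instant query for `up` against the Prometheus HTTP API (`GET /api/v1/query`). The sketch below only builds the request URL and interprets the JSON answer. The helper names are illustrative.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def down_instances(body: str) -> list:
    """Instances reporting up == 0 in the JSON answer to the query `up`."""
    data = json.loads(body)
    if data.get("status") != "success":
        raise RuntimeError(data.get("error", "query failed"))
    # Each vector sample is {"metric": {...labels}, "value": [timestamp, "value"]}.
    return [
        r["metric"].get("instance", "")
        for r in data["data"]["result"]
        if float(r["value"][1]) == 0
    ]
```

For example, call `instant_query_url("http://192.0.2.10:9090", "up")`, replacing the documentation address `192.0.2.10` with the server's `adresse_ip_eth0`.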

File diff suppressed because it is too large

@@ -21,7 +21,7 @@
;plugins = /var/lib/grafana/plugins
# folder that contains provisioning config files that grafana will apply on startup and while running.
; provisioning = conf/provisioning
provisioning = /etc/grafana/provisioning
#################################### Server ####################################
[server]
@@ -40,11 +40,13 @@ domain = %%grafana_domain
# Redirect to correct domain if host header does not match domain
# Prevents DNS rebinding attacks
;enforce_domain = false
enforce_domain = true
# The full public facing url you use in browser, used for redirects and emails
# If you use reverse proxy and sub path specify full url (with sub path)
;root_url = http://localhost:3000
%if not %%is_empty('grafanaRootURL')
root_url = %%grafanaRootURL
%end if
# Log web requests
;router_logging = false
@@ -299,18 +301,20 @@ enabled = %%grafana_auth_anonymous
#################################### SMTP / Emailing ##########################
[smtp]
;enabled = false
;host = localhost:25
%if %%getVar('activer_exim_relay_smtp','non') == 'oui'
enabled = true
host = %%exim_relay_smtp:25
;user =
# If the password contains # or ; you have to wrap it with trippel quotes. Ex """#password;"""
;password =
;cert_file =
;key_file =
;skip_verify = false
;from_address = admin@grafana.localhost
;from_name = Grafana
skip_verify = true
from_address = %%system_mail_from
from_name = Grafana
# EHLO identity in SMTP dialog (defaults to instance_name)
;ehlo_identity = dashboard.example.com
%end if
[emails]
;welcome_email_on_sign_up = false
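To see which `[smtp]` keys the rendered `grafana.ini` really sets, parse it with Python's standard `configparser`. Lines starting with `;` or `#` are treated as comments, so the commented upstream defaults drop out. This is a verification sketch, not part of the package. The `__top__` section name is a hypothetical header added so that keys placed before the first section still parse.

```python
import configparser


def effective_section(ini_text: str, section: str = "smtp") -> dict:
    """Return the uncommented key/value pairs of one section of an ini file."""
    parser = configparser.ConfigParser(
        interpolation=None,  # Grafana values may contain '%' signs
        strict=False,        # tolerate repeated keys instead of raising
    )
    # Hypothetical header so that keys before the first [section] still parse.
    parser.read_string("[__top__]\n" + ini_text)
    if not parser.has_section(section):
        return {}
    return dict(parser.items(section))
```

With the template above, the `[smtp]` section has no effective keys unless `activer_exim_relay_smtp` is `oui`. Otherwise Grafana keeps its built-in default, with mail disabled.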

tmpl/predict-rules.yml

@@ -0,0 +1,6 @@
groups:
- name: PredictRules
  rules:
  - alert: disk_full_within_6_hours
    expr: predict_linear(node_filesystem_free{job="%%{job_name_node}",mountpoint="/"}[1h], 6 * 3600) < 0
    for: 5m
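`predict_linear(v[1h], 6 * 3600)` fits a least-squares line to the last hour of samples. It then extrapolates that line 21,600 s (6 hours) past the evaluation time. The alert fires when the extrapolated free space is negative, i.e. when the disk is on track to fill within 6 hours, and the condition has held for 5 minutes. The simplified Python version below reproduces that computation. The function name and sample values are illustrative.

```python
def predict_linear(samples: list, eval_time: float, horizon_s: float) -> float:
    """Least-squares fit of (t, v) samples, extrapolated to eval_time + horizon_s.

    samples: (timestamp_seconds, value) pairs taken from the range vector.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    # Centre timestamps on eval_time so the intercept is the fitted value "now".
    xs = [t - eval_time for t, _ in samples]
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept + slope * horizon_s
```

For instance, 6,400 bytes are free now and usage grows by 1 byte per second. The prediction 6 hours ahead is then 6400 - 21600 = -15200, so the rule would fire.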

tmpl/prometheus.defaults

@@ -0,0 +1 @@
PROMETHEUS_OPTS='--config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus/data'

@@ -5,35 +5,67 @@ global:
scrape_timeout: %%prometheusScrapeTimeout
rule_files:
- "/etc/prometheus/rules.d/*.yml"
- "/etc/prometheus/rules.d/*.yml"
scrape_configs:
- job_name: %%prometheusJobName
- job_name: %%prometheusJobName
honor_labels: true
static_configs:
- targets: ['%%nom_domaine_machine:9090']
- targets: ['%%nom_domaine_machine:9090'%slurp
%if %%getVar('activerSndNodeExpoter','non') == 'oui'
- targets: ['%%nom_domaine_machine:9100']
, '%%nom_domaine_machine:9100'%slurp
%end if
]
- job_name: '%%job_name_node'
file_sd_configs:
- files: [ "%%job_file_config/*.yml" ]
# - files: [ "%%job_file_config/*.yml" ]
%if %%getVar('addTargetPrometheus','non') == 'oui'
static_configs:
%if %%getVar('ajout_client_prometheus','non') == 'oui'
%if not %%is_empty(%%nouveau_node_exporter)
%for %%client_prometheus in %%nouveau_node_exporter
- targets: ['%%client_prometheus:9100']
%end for
%end if
- targets: [ "%%adresse_ip_eth0:9100"%slurp
%for %%cliPr in %%getVar('prTarg',[])
%if %%cliPr.prTargSonde == 'Node Exporter'
, '%%cliPr.prTargIP:9100'%slurp
%end if
%end for
]
%end if
#alerting:
# alertmanagers:
# - scheme: https
# static_configs:
# - targets:
# - "1.2.3.4:9093"
# - "1.2.3.5:9093"
# - "1.2.3.6:9093"
%for %%job in %%getVar('promJobs', [])
- job_name: '%%job'
%if %%job.honorLabels == 'oui'
honor_labels: true
%else
honor_labels: false
%end if
scrape_interval: %%{job.scrpInterval}s
scrape_timeout: %%{job.scrpTimeout}s
scheme: %%job.scrpScheme
metrics_path: %%job.scrpMetricPath
%set first = True
static_configs:
- targets: [ %slurp
%for %%target in %%getVar('prOpenTarg',[])
%if %%target.prOpenTargJob == %%job
%if %%first
"%%target.prOpenTargIP:%%target.prOpenTargPort"%slurp
%set first = False
%else
, "%%target.prOpenTargIP:%%target.prOpenTargPort"%slurp
%end if
%end if
%end for
]
%end for
%if %%getVar('activerAlertmanager','non') == 'oui'
alerting:
alertmanagers:
- scheme: http
static_configs:
- targets:
- "%%nom_domaine_machine:9093"
%end if
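In the custom-job loop, the `%set first` flag and the `%slurp` directives only serve to emit a comma-separated list of `"ip:port"` strings for each job in `promJobs`. Those strings come from the `prOpenTarg` entries attached to that job. The same logic is shown here in Python. Plain dicts stand in for the master/slave variables, an assumption made for illustration.

```python
import json


def targets_by_job(jobs: list, targets: list) -> dict:
    """Group custom targets under their job, as the nested %for loops do.

    targets: dicts with keys job, ip, port (stand-ins for prOpenTargJob,
    prOpenTargIP and prOpenTargPort). The template never emits targets whose
    job is not listed in `jobs` (promJobs), so they are dropped here too.
    """
    grouped = {job: [] for job in jobs}
    for t in targets:
        if t["job"] in grouped:
            grouped[t["job"]].append(f'{t["ip"]}:{t["port"]}')
    return grouped


def targets_line(addresses: list) -> str:
    """Render one `- targets: [...]` flow list; JSON strings are valid YAML."""
    return "- targets: [" + ", ".join(json.dumps(a) for a in addresses) + "]"
```

A job without any attached target yields an empty list. Whitespace may differ from the template's output, but the YAML value is the same.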