Compare commits

56 Commits

Author SHA1 Message Date
Laurent Gourvenec 45c97c765b Raise the CPU temperature alert threshold 2024-05-30 11:44:13 +02:00
vfebvre 43e34f8de8 Update the alertmanager configuration 2020-12-17 10:21:49 +01:00
vincent 16d7bfa7f2 Add CPU temperature alert 2019-06-07 15:22:30 +02:00
wpetit d9b253b63d Use proxy to scrape Prometheus datasource 2019-04-15 14:18:40 +02:00
wpetit a5fb3de2c0 Fix Grafana install instructions 2019-04-15 14:17:42 +02:00
wpetit 7023209d9b Deploy a templated dashboard by default 2019-03-08 16:38:22 +01:00
wpetit 31bf813036 Deploy a default Grafana dashboard for the local machine 2019-03-08 16:00:59 +01:00
wpetit fc2cfd9102 Fix the provisioning directory path 2019-03-08 15:40:09 +01:00
wpetit 1a2f7172d9 Add a default datasource provisioning file 2019-03-08 15:20:56 +01:00
wpetit 628fdafdea Add a dashboard definitions directory 2019-03-08 15:02:16 +01:00
wpetit b4fde311a5 Update installation instructions 2019-03-08 13:38:10 +01:00
wpetit e431a4207e Update EOLE version 2019-03-08 13:37:44 +01:00
wpetit 71787a9cf6 Rename alertmanager to prometheus-alertmanager 2019-03-08 13:35:20 +01:00
vincent a681d455fe Allow adding clients by domain name 2018-07-10 09:59:29 +02:00
vincent 0a3658fa57 Add custom path for jobs 2018-06-26 15:05:24 +02:00
vincent c25b004d3b Adjust the filesystem alert level 2018-06-26 14:06:37 +02:00
Philippe Caseiro f9cb3d35b1 Adding support for mail sending from smtp gateway 2018-06-26 13:20:08 +02:00
vincent b4ff9a35fd Rework smtpTLS handling and fix invalid path 2018-06-26 12:06:05 +02:00
wpetit dc43a0f26c Add alerting prediction rules 2018-06-25 11:07:56 +02:00
wpetit a306f5ce19 Use system_mail_from when using SMTP system configuration 2018-06-25 11:02:58 +02:00
Philippe Caseiro a420d20d39 Removing problematic option 2018-06-19 17:41:21 +02:00
Philippe Caseiro ad5f6fbc75 Fix typo 2018-06-19 17:34:42 +02:00
Philippe Caseiro bb59a5238c Fix typo 2018-06-13 10:37:14 +02:00
Philippe Caseiro 61155f10f4 Add the ability to configure data retention 2018-06-13 10:33:25 +02:00
Philippe Caseiro a5b6333599 Fixing multiple jobs generation 2018-06-11 15:42:36 +02:00
Philippe Caseiro 37342e8700 Fix diagnose 2018-06-11 09:35:28 +02:00
Philippe Caseiro 33643232d4 Supporting global smtp gateway usage 2018-06-11 09:24:40 +02:00
Philippe Caseiro 1013775b1a Adding alert rules file template 2018-06-11 09:06:53 +02:00
Philippe Caseiro e95d6f9e1d Improving dico to be more Prometheus like: we need to be closer to the prometheus way to do things 2018-06-06 16:46:02 +02:00
Philippe Caseiro ab479fd33a Improve configuration flexibility to match prometheus way of doing things 2018-06-06 16:00:36 +02:00
Philippe Caseiro e3a6295709 Using 127.0.0.1 ... 2018-06-06 15:14:07 +02:00
Philippe Caseiro ad56be504b Fix typo 2018-06-06 14:39:33 +02:00
Philippe Caseiro 0fbffcc6d2 Adding some values for match sources 2018-06-06 09:26:21 +02:00
Philippe Caseiro 4195adfa6e Improving alerting configuration 2018-06-06 09:05:55 +02:00
Philippe Caseiro 5f263995d0 Fixing Variable order 2018-06-05 17:17:11 +02:00
Philippe Caseiro 73689be06c Fix bad variable name 2018-06-05 17:12:17 +02:00
Philippe Caseiro 9faff7988a Fix template 2018-06-05 17:07:34 +02:00
Philippe Caseiro 5ab3f20789 Improving alert support 2018-06-05 17:05:51 +02:00
Philippe Caseiro b95d0894d9 This has to be a multi too 2018-06-05 16:56:48 +02:00
Philippe Caseiro 44e3a5c0f7 Group master must be a multi 2018-06-05 16:55:45 +02:00
Philippe Caseiro 0d87cec74a true is not True 2018-06-05 16:55:03 +02:00
Philippe Caseiro 5c16310e5d Adding disable support for alert service 2018-06-05 16:53:11 +02:00
Philippe Caseiro c10edef336 Adding alertmanager support 2018-06-05 16:46:23 +02:00
Philippe Caseiro 598c1d1807 Fix template 2018-06-05 14:27:32 +02:00
Philippe Caseiro 2de724da57 Moving to bash 2 2018-06-05 11:45:05 +02:00
Philippe Caseiro a3def897fa Moving to bash 2018-06-04 17:01:42 +02:00
Philippe Caseiro 3df296b366 Adding datasource 2018-06-04 13:59:45 +02:00
Philippe Caseiro 4a24601bea Try 008 2018-06-04 11:16:18 +02:00
Philippe Caseiro f61a15d609 Try 007 2018-06-04 10:51:53 +02:00
Philippe Caseiro 4bd29eaa04 Try 006bis 2018-06-04 10:48:47 +02:00
Philippe Caseiro 3eaa7feee1 Try 006 2018-06-04 10:47:01 +02:00
Philippe Caseiro d815f156e3 Try 005 2018-06-04 10:42:19 +02:00
Philippe Caseiro 38883c4fca Try 004 2018-06-04 10:36:34 +02:00
Philippe Caseiro 325acbf537 Try 003 2018-06-04 10:33:36 +02:00
Philippe Caseiro f9be660423 Try 002 2018-06-04 10:30:13 +02:00
Philippe Caseiro 638295626b Try 001 2018-06-04 10:28:55 +02:00
15 changed files with 20050 additions and 99 deletions

@ -4,8 +4,8 @@
SOURCE=eole-prometheus SOURCE=eole-prometheus
VERSION=0.1 VERSION=0.1
EOLE_VERSION=2.6 EOLE_VERSION=2.7
EOLE_RELEASE=2.6.2 EOLE_RELEASE=2.7.0
PKGAPPS=non PKGAPPS=non
################################ ################################

@ -1,41 +1,33 @@
## eole-prometheus # eole-prometheus
EOLE integration of the Prometheus monitoring solution. EOLE integration of the Prometheus monitoring solution.
Grafana is supported by this integration and can optionally be enabled. Grafana is supported by this integration and can optionally be enabled.
The system exporter (node-exporter) is part of the default configuration (Prométheus monitors itself).
### eole-prometheus : The system exporter (node-exporter) is part of the default configuration (Prometheus monitors itself).
### Installation
1. gen_config #### Install `eole-prometheus`
1. Add the official [Grafana](http://docs.grafana.org/installation/debian/#apt-repository) repository. In the `GenConfig` interface:
```
Mode expert > Dépot tiers > Ajouter un dépot
Dépôt officiel Grafana
Libellé du dépot = Dépôt officiel Grafana
Déclaration du dépôt = deb https://packages.grafana.com/oss/deb stable main
Méthode de récupération de la clé = URL de la clé
URL de la clé = https://packages.grafana.com/gpg.key
```
2. Add the Cadoles repository. In the `GenConfig` interface:
```
Mode expert > Dépot tiers > Ajouter un dépot
Cadoles pour environnement de Dev
Libellé du dépot = Cadoles
Déclaration du dépôt = deb https://vulcain.cadoles.com 2.7.0-dev main
Méthode de récupération de la clé = URL de la clé
URL de la clé = https://vulcain.cadoles.com/cadoles.gpg
``` ```
Mode expert > Dépot tiers > Ajouter un dépot
Cadoles pour environnement de Qualification
Libellé du dépot = Cadoles
Déclaration du dépôt = deb https://vulcain.cadoles.com 2.6.2-staging main
Méthode de récupération de la clé = URL de la clé
URL de la clé = https://vulcain.cadoles.com/cadoles.gpg
```
* To add a probe on EOLE:
1. gen_config
```
Mode expert > Dépot tiers > Ajouter un dépot
Cadoles pour environnement de Qualification
Libellé du dépot = Cadoles
Déclaration du dépôt = deb https://vulcain.cadoles.com xenial-staging main
Méthode de récupération de la clé = URL de la clé
URL de la clé = https://vulcain.cadoles.com/cadoles.gpg
```
* To add a probe on Ubuntu Xenial:
```
echo "deb https://vulcain.cadoles.com xenial-staging main" > /etc/apt/sources.list.d/cadoles.list
wget -O - https://vulcain.cadoles.com/cadoles.gpg|apt-key add -
```
Ports must be opened according to the exporters in use. Not all exporters use the same port. Ports must be opened according to the exporters in use. Not all exporters use the same port.
@ -44,4 +36,3 @@ Le paquet eole-prometheus ouvre les ports sur le serveur où Prometheus sera ins
* 9090 for the Prometheus server * 9090 for the Prometheus server
* 9100 for the node-exporter probe * 9100 for the node-exporter probe
* 3000 for the Grafana server * 3000 for the Grafana server
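
A quick way to confirm that the ports listed in the README are reachable from another machine is a plain TCP connection test. The sketch below is illustrative only: the `DEFAULT_PORTS` mapping and the `is_port_open` helper are hypothetical names and are not part of the eole-prometheus package.

```python
import socket

# Ports listed in the README; the keys are labels chosen for this sketch only.
DEFAULT_PORTS = {"prometheus": 9090, "node-exporter": 9100, "grafana": 3000}


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable or timed out: treat all of these as "closed".
        return False
```

For instance, `is_port_open("192.0.2.10", DEFAULT_PORTS["grafana"])` would report whether something answers on the Grafana port of a host at that placeholder address.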

@ -1,6 +1,7 @@
#!/bin/bash #!/bin/bash
if [ $(CreoleGet activer_grafana) = "oui" ];then if [[ $(CreoleGet activer_grafana) == "oui" ]]
then
. /usr/lib/eole/diagnose.sh . /usr/lib/eole/diagnose.sh
EchoGras "*** Accès au serveur grafana" EchoGras "*** Accès au serveur grafana"

@ -1,6 +1,7 @@
#!/bin/bash #!/bin/bash
if [ $(CreoleGet activer_prometheus) = "oui" ];then if [[ $(CreoleGet activer_prometheus) == "oui" ]]
then
. /usr/lib/eole/diagnose.sh . /usr/lib/eole/diagnose.sh
EchoGras "*** Accès au serveur Prometheus" EchoGras "*** Accès au serveur Prometheus"

@ -1,14 +1,22 @@
<?xml version="1.0" encoding="utf-8"?> <?xml version="1.0" encoding="utf-8"?>
<creole> <creole>
<files> <files>
<file filelist='prometheus' name='/etc/prometheus/prometheus.yml' source='prometheus.yml' mkdir='True' rm='True'/> <file filelist='prometheus' name='/etc/default/prometheus' source='prometheus.defaults' mkdir='True' rm='True'/>
<file filelist='grafana' name='/etc/grafana/grafana.ini' source='grafana.ini' mkdir='True' rm='True'/> <file filelist='prometheus' name='/etc/prometheus/prometheus.yml' mkdir='True' rm='True'/>
<file filelist='prometheus-alertmanager' name='/etc/prometheus/alertmanager.yml' mkdir='True' rm='True'/>
<file filelist='prometheus-alertmanager' name='/etc/prometheus/rules.d/alert-rules.yml' mkdir='True' rm='True'/>
<file filelist='prometheus-alertmanager' name='/etc/prometheus/rules.d/predict-rules.yml' mkdir='True' rm='True'/>
<file filelist='grafana' name='/etc/grafana/grafana.ini' mkdir='True' rm='True'/>
<file filelist='grafana' name='/etc/grafana/provisioning/dashboards/eole.yml' source='grafana-dashboards.yml' mkdir='True' rm='True'/>
<file filelist='grafana' name='/etc/grafana/provisioning/datasources/eole.yml' source='grafana-datasources.yml' mkdir='True' rm='True'/>
<file filelist='grafana' name='/var/lib/grafana/dashboards/eole.json' source='grafana-node-dashboard.json' mkdir='True' rm='True'/>
<service>prometheus</service> <service>prometheus</service>
<service>prometheus-alertmanager</service>
<service>grafana-server</service> <service>grafana-server</service>
<service_access service='prometheus'> <service_access service='prometheus'>
<port service_accesslist="saLemon">80</port>
<port service_accesslist="saLemon">443</port>
<port service_accesslist="prometheus">9090</port> <port service_accesslist="prometheus">9090</port>
<port service_accesslist="prometheus-alertmanager">9093</port>
</service_access> </service_access>
<service_access service='grafana-server'> <service_access service='grafana-server'>
<port service_accesslist="grafana">3000</port> <port service_accesslist="grafana">3000</port>
@ -16,17 +24,23 @@
</files> </files>
<variables> <variables>
<family name='services'> <family name='services'>
<variable name='activer_prometheus' type='oui/non' description="Activer le service prometheus sur le serveur"> <variable name='activer_prometheus' type='oui/non' description="Activer le service prometheus sur le serveur">
<value>oui</value> <value>oui</value>
</variable> </variable>
</family> </family>
<family name='prometheus'> <family name='prometheus'>
<variable name='activer_grafana' type='oui/non' description="Activer le service grafana sur le serveur"> <variable name='activer_grafana' type='oui/non' description="Activer le service grafana sur le serveur">
<value>oui</value> <value>oui</value>
</variable> </variable>
<variable name='activerAlertmanager' type='oui/non' description="Activer le service d'alertes">
<value>oui</value>
</variable>
<variable name='promStorageRetention' type='number' description='Durée de rétention des métriques (en heures)'>
<value>24</value>
</variable>
<variable name='prometheusJobName' type='string' description="Nom du job ajouté au label"> <variable name='prometheusJobName' type='string' description="Nom du job ajouté au label">
<value>prometheus</value> <value>prometheus</value>
</variable> </variable>
<variable name='prometheusScrapeInterval' type='string' description="Intervalle de récupération des données sur les différents noeuds"> <variable name='prometheusScrapeInterval' type='string' description="Intervalle de récupération des données sur les différents noeuds">
<value>15s</value> <value>15s</value>
</variable> </variable>
@ -36,21 +50,55 @@
<variable name='prometheusScrapeTimeout' type='string' description="Temps d'attente avant que la récupération de données échoue"> <variable name='prometheusScrapeTimeout' type='string' description="Temps d'attente avant que la récupération de données échoue">
<value>10s</value> <value>10s</value>
</variable> </variable>
<variable name='ajout_client_prometheus' type='oui/non' description="Ajouter un nouveau client à Prometheus"> <variable name='job_name_node' type='string' description="Nom du job pour les noeuds" mode='expert'>
<value>non</value> <value>node</value>
</variable> </variable>
<variable name='nouveau_node_exporter' type='string' description="url/IP du nouveau node exporter" multi="True" mandatory='True'/> <variable name='job_file_config' type='string' description="Emplacement des fichiers de configuration des noeuds" auto_freeze='True' mode='expert' mandatory='True'>
<variable name='job_name_node' type='string' description="Nom du job pour les noeuds" mode='expert'>
<value>node</value>
</variable>
<variable name='job_file_config' type='string' description="Emplacement des fichiers de configuration des noeuds" auto_freeze='True' mode='expert' mandatory='True'>
<value>/etc/prometheus/nodes</value> <value>/etc/prometheus/nodes</value>
</variable> </variable>
<variable name='addTargetPrometheus' type='oui/non' description="Ajouter des cibles statiques à Prometheus">
<value>non</value>
</variable>
<!-- Job standard -->
<variable name='prTarg' type='string' description='Nom de la cible prometheus' multi='True'/>
<variable name='prTargIP' type='string' description="Adresse IP ou nom de domaine de la cible prometheus"/>
<variable name='prTargSonde' type='string' description="Sonde à utiliser pour ce client">
<value>Node Exporter</value>
</variable>
</family> </family>
<family name="grafana"> <family name='Jobs prometheus'>
<variable name='grafana_domain' type='string' description="Nom de Domaine ou IP pour accèder à l'interface Grafana" mandatory='True'> <variable name='promJobs' type='string' description="Nom du job prometheus" multi='True'/>
<value>localhost</value> <variable name='honorLabels' type='oui/non' description='Garder les labels en cas de conflit' mode='expert'>
</variable> <value>oui</value>
</variable>
<variable name='scrpInterval' type='number' description="Intervalle d'interrogation de la sonde (en secondes)">
<value>15</value>
</variable>
<variable name='scrpTimeout' type='number' description="Délai d'attente maximum lors de l'interrogation d'une sonde">
<value>10</value>
</variable>
<variable name='scrpScheme' type='string' description="Protocole à utiliser pour l'interrogation de la sonde">
<value>http</value>
</variable>
<variable name='scrpMetricPath' type='string' description="Chemin d'accès de la ressource">
<value>/metrics</value>
</variable>
<variable name='addPrOpenTarg' type='oui/non' description="Ajouter des cibles statiques pour les jobs personnalisés">
<value>non</value>
</variable>
<!-- Job libre -->
<variable name='prOpenTarg' type='string' description='Nom de la cible personnalisée prometheus' multi='True'/>
<variable name='prOpenTargJob' type='string' description='Nom du job de rattachement de la cible'/>
<variable name='prOpenTargIP' type='string' description="Adresse IP ou nom de domaine de la cible"/>
<variable name='prOpenTargPort' type='number' description="Port d'écoute de la sonde"/>
</family>
<family name="grafana">
<variable name='grafana_domain' type='string' description="Nom de Domaine ou IP pour accèder à l'interface Grafana" mandatory='True'>
<value>localhost</value>
</variable>
<variable name='grafana_session_max_lifetime' type='string' description="Durée avant déconnexion de l'interface Grafana (en seconde)"> <variable name='grafana_session_max_lifetime' type='string' description="Durée avant déconnexion de l'interface Grafana (en seconde)">
<value>86400</value> <value>86400</value>
</variable> </variable>
@ -63,23 +111,170 @@
<variable name='grafana_auth_anonymous' type='string' description="Activer l'accès aux utilisateurs non enregistrés"> <variable name='grafana_auth_anonymous' type='string' description="Activer l'accès aux utilisateurs non enregistrés">
<value>false</value> <value>false</value>
</variable> </variable>
<variable name='grafanaRootURL' type='string' description='Url publique de grafana (avec http:// ou https://)' mode='expert'/>
</family>
<family name="alertes prometheus">
<variable name='alSMTPUseSys' type='oui/non' description="Utiliser la passerelle SMTP du système ?">
<value>non</value>
</variable>
<variable name='alSMTPHost' type='string' description="Adresse du serveur SMTP pour l'envoi des alertes"/>
<variable name='alSMTPPort' type='string' description="Port d'écoute du serveur SMTP pour l'envoi des alertes"/>
<variable name='alFrom' type='string' description="Adresse d'origine des emails d'alerte"/>
<variable name='alSMTPTLS' type='oui/non' description="Utiliser STARTTLS">
<value>non</value>
</variable>
<variable name='alSMTPAuth' type='oui/non' description="Authentification requise sur le serveur SMTP ?">
<value>non</value>
</variable>
<variable name='alSMTPUser' type='string' description="Utilisateur SMTP"/>
<variable name='alSMTPPass' type='string' description="Mot de passe"/>
<variable name='alReceiver' type='string' description="Nom du destinataire" multi='True'/>
<variable name='alReceiverEmail' type='string' description="Adresse email du destinataire"/>
<variable name='alDefaultReceiver' type='string' description='Nom du destinataire par défaut'/>
<variable name='alRoute' type='string' description="Nom de la règle de distribution des alertes" multi="True"/>
<variable name='alRouteMatchSource' type='string' description='Source de correspondance'/>
<variable name='alRouteMatchValue' type='string' description='Valeur attendue'/>
<variable name='alRouteMatchReceiver' type='string' description="Nom du destinataire de l'alerte"/>
<variable name='alRouteRegxp' type='string' description="Règle de distribution des alertes" multi="True"/>
<variable name='alRouteMatchRegExpSource' type='string' description='Source de correspondance'/>
<variable name='alRouteMatchRegExp' type='string' description='Expression régulière'/>
<variable name='alRouteMatchRegxpRecv' type='string' description="Nom du destinataire de l'alerte (regxp)"/>
<variable name='alSubRoute' type='string' description="Nom de la règle maîtresse" multi='True'/>
<variable name='alSubRouteMatchSource' type='string' description='Source de correspondance'/>
<variable name='alSubRouteMatchValue' type='string' description='Valeur attendue'/>
<variable name='alSubRouteMatchReceiver' type='string' description="Nom du destinataire de l'alerte"/>
</family> </family>
<separators>
<separator name='activer_grafana'>Services complémentaires</separator>
<separator name='prometheusJobName'>Configuration du serveur Prometheus</separator>
<separator name='job_name_node'>Configuration des jobs standards</separator>
<separator name='alSMTPHost'>Configuration SMTP pour l'envoi des alertes</separator>
<separator name='alReceiver'>Destinataires</separator>
<separator name='alRoute'>Règles de distribution simples</separator>
<separator name='alRouteRegxp'>Règles de distribution regexp</separator>
<separator name='alSubRoute'>Sous-règles de distribution</separator>
</separators>
</variables> </variables>
<constraints> <constraints>
<group master='alReceiver'>
<slave>alReceiverEmail</slave>
</group>
<group master='promJobs'>
<slave>scrpInterval</slave>
<slave>scrpTimeout</slave>
<slave>honorLabels</slave>
<slave>scrpScheme</slave>
<slave>scrpMetricPath</slave>
</group>
<group master='alRoute'>
<slave>alRouteMatchSource</slave>
<slave>alRouteMatchValue</slave>
<slave>alRouteMatchReceiver</slave>
</group>
<group master='alRouteRegxp'>
<slave>alRouteMatchRegExpSource</slave>
<slave>alRouteMatchRegExp</slave>
<slave>alRouteMatchRegxpRecv</slave>
</group>
<group master='alSubRoute'>
<slave>alSubRouteMatchSource</slave>
<slave>alSubRouteMatchValue</slave>
<slave>alSubRouteMatchReceiver</slave>
</group>
<group master='prTarg'>
<slave>prTargIP</slave>
<slave>prTargSonde</slave>
</group>
<group master='prOpenTarg'>
<slave>prOpenTargIP</slave>
<slave>prOpenTargPort</slave>
<slave>prOpenTargJob</slave>
</group>
<check name='valid_enum' target='prTargSonde'>
<param>['Node Exporter']</param>
</check>
<check name='valid_enum' target='scrpScheme'>
<param>['http','https']</param>
</check>
<check name='valid_enum' target='alRouteMatchSource'>
<param>['','service','severity']</param>
</check>
<check name='valid_enum' target='alRouteMatchRegExpSource'>
<param>['','service','severity']</param>
</check>
<check name='valid_enum' target='alSubRouteMatchSource'>
<param>['','service','severity']</param>
</check>
<condition name='disabled_if_in' source='alSMTPUseSys'>
<param>oui</param>
<target type='variable'>alSMTPUser</target>
<target type='variable'>alSMTPPass</target>
<target type='variable'>alSMTPPort</target>
<target type='variable'>alSMTPTLS</target>
<target type='variable'>alSMTPHost</target>
<target type='variable'>alSMTPAuth</target>
</condition>
<condition name='disabled_if_in' source='alSMTPAuth'>
<param>non</param>
<target type='variable'>alSMTPUser</target>
<target type='variable'>alSMTPPass</target>
</condition>
<condition name='disabled_if_in' source='activer_prometheus'> <condition name='disabled_if_in' source='activer_prometheus'>
<param>non</param> <param>non</param>
<target type='family'>prometheus</target> <target type='family'>prometheus</target>
<target type='family'>alertes prometheus</target>
<target type='filelist'>prometheus</target> <target type='filelist'>prometheus</target>
<target type='variable'>activer_grafana</target> <target type='variable'>activer_grafana</target>
</condition> </condition>
<condition name='disabled_if_in' source='activer_grafana'> <condition name='disabled_if_in' source='activer_grafana'>
<param>non</param> <param>non</param>
<target type='family'>grafana</target> <target type='family'>grafana</target>
<target type='filelist'>grafana</target> <target type='filelist'>grafana</target>
</condition> </condition>
<condition name='disabled_if_in' source='ajout_client_prometheus'>
<condition name='disabled_if_in' source='activerAlertmanager'>
<param>non</param> <param>non</param>
<target type='variable'>nouveau_node_exporter</target> <target type='family'>alertes prometheus</target>
<target type='filelist'>prometheus-alertmanager</target>
<target type='service_accesslist'>prometheus-alertmanager</target>
</condition>
<condition name='disabled_if_in' source='addTargetPrometheus'>
<param>non</param>
<target type='variable'>prTarg</target>
<target type='variable'>prTargIP</target>
<target type='variable'>prTargSonde</target>
</condition>
<condition name='disabled_if_in' source='addPrOpenTarg'>
<param>non</param>
<target type='variable'>prOpenTarg</target>
<target type='variable'>prOpenTargIP</target>
<target type='variable'>prOpenTargPort</target>
</condition> </condition>
</constraints> </constraints>
<help> <help>

@ -1,12 +0,0 @@
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network-online.target
[Service]
User=root
Restart=on-failure
ExecStart=/usr/bin/prometheus --config.file=/etc/prometheus/prometheus.yml
[Install]
WantedBy=multi-user.target

tmpl/alert-rules.yml (new file, 50 lines added)

@ -0,0 +1,50 @@
#
# Alert Rules
#
groups:
- name: EoleRules
rules:
# Instance is Down
- alert: JobInstanceDown
expr: up == 0
for: 1m
annotations:
DESCRIPTION: Job {{ $labels.job }} instance {{ $labels.instance }} is down.
SUMMARY: Job instance is down
# Heavy CPU usage
- alert: cpu_threshold_exceeded
expr: (100 * (1 - avg by(instance) (irate(node_cpu{job="%%{job_name_node}",mode="idle"}[5m]))))
> 80
annotations:
description: This device's cpu usage has exceeded the threshold with a value
of {{ $value }}.
summary: Instance {{ $labels.instance }} CPU usage is dangerously high
# Heavy Memory usage
- alert: mem_threshold_exceeded
expr: (node_memory_MemFree{job="%%{job_name_node}"} + node_memory_Cached{job="%%{job_name_node}"} + node_memory_Buffers{job="%%{job_name_node}"})
/ 1e+06 < 80
annotations:
description: This device's memory usage has exceeded the threshold with a value
of {{ $value }}.
summary: Instance {{ $labels.instance }} memory usage is dangerously high
# Heavy "/" use
- alert: filesystem_threshold_exceeded
expr: node_filesystem_avail{job="%%{job_name_node}",mountpoint="/"} / node_filesystem_size{job="%%{job_name_node}"}
* 100 < 20
annotations:
description: This device's filesystem usage has exceeded the threshold with
a value of {{ $value }}.
summary: Instance {{ $labels.instance }} filesystem usage is dangerously high
# Heavy CPU temperature
- alert: cpu_temp_threshold_exceeded
expr: avg(node_hwmon_temp_celsius{job="%%{job_name_node}"}) BY (instance)
> 70
annotations:
description: This device's cpu temperature has exceeded the threshold with a value
of {{ $value }}.
summary: Instance {{ $labels.instance }} CPU temperature is dangerously high
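
To make the arithmetic behind `cpu_threshold_exceeded` concrete, the following sketch reproduces `100 * (1 - avg(idle rate))` in plain Python. It is an illustration under stated assumptions, not how Prometheus evaluates the rule: the function names and the sample idle rates are hypothetical, and each idle rate is assumed to be a fraction between 0 and 1.

```python
from statistics import mean


def cpu_usage_percent(idle_rates):
    """CPU usage in percent, given per-core idle rates as fractions in 0..1."""
    return 100 * (1 - mean(idle_rates))


def exceeds_threshold(idle_rates, threshold=80):
    """Mirror the rule's comparison: fire when usage is strictly above the threshold."""
    return cpu_usage_percent(idle_rates) > threshold


# Hypothetical instance with four cores idle 10%, 20%, 15% and 15% of the time:
# the average idle rate is 0.15, so usage is roughly 85% and the alert would fire.
busy_instance = [0.10, 0.20, 0.15, 0.15]
```

An instance whose cores are idle half of the time (`[0.5, 0.5]`) sits at about 50% usage and stays below the 80% threshold used in the rule.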

tmpl/alertmanager.yml (new file, 142 lines added)

@ -0,0 +1,142 @@
global:
# The smarthost and SMTP sender used for mail notifications.
%if %%alSMTPUseSys == 'oui'
%if %%tls_smtp == "non"
smtp_smarthost: '%%exim_relay_smtp:25'
%elif %%tls_smtp == "port 25"
smtp_smarthost: '%%exim_relay_smtp:25'
smtp_require_tls: true
%else
smtp_smarthost: '%%exim_relay_smtp:465'
smtp_require_tls: true
%end if
smtp_from: '%%system_mail_from'
%else
smtp_smarthost: '%%alSMTPHost:%%alSMTPPort'
smtp_from: '%%alFrom'
%if %%getVar('alSMTPAuth','non') == 'oui'
smtp_auth_username: '%%alSMTPUser'
smtp_auth_password: '%%alSMTPPass'
%end if
%if %%getVar('alSMTPTLS','non') == 'oui'
smtp_require_tls: true
%else
smtp_require_tls: false
%end if
%end if
# The auth token for Hipchat.
#hipchat_auth_token: '1234556789'
# Alternative host for Hipchat.
#hipchat_api_url: 'https://hipchat.foobar.org/'
# The directory from which notification templates are read.
templates:
- '/etc/alertmanager/template/*.tmpl'
# The root route on which each incoming alert enters.
route:
# The labels by which incoming alerts are grouped together. For example,
# multiple alerts coming in for cluster=A and alertname=LatencyHigh would
# be batched into a single group.
group_by: ['alertname', 'cluster', 'service']
# When a new group of alerts is created by an incoming alert, wait at
# least 'group_wait' to send the initial notification.
# This way ensures that you get multiple alerts for the same group that start
# firing shortly after another are batched together on the first
# notification.
group_wait: 30s
# When the first notification was sent, wait 'group_interval' to send a batch
# of new alerts that started firing for that group.
group_interval: 5m
# If an alert has successfully been sent, wait 'repeat_interval' to
# resend them.
repeat_interval: 3h
# A default receiver
receiver: %%alDefaultReceiver
# All the above attributes are inherited by all child routes and can
# overwritten on each.
# The child route trees.
routes:
# This routes performs a regular expression match on alert labels to
# catch alerts that are related to a list of services.
%for route in %%getVar('alRouteRegxp',[])
- match_re:
%%{route.alRouteMatchRegExpSource}: %%{route.alRouteMatchRegExp}
receiver: %%route.alRouteMatchRegxpRecv
%if not %%is_empty('alSubRoute')
routes:
%for sroute in %%getVar('alSubRoute',[])
# The service has a sub-route for critical alerts, any alerts
# that do not match, i.e. severity != critical, fall-back to the
# parent node and are sent to 'team-X-mails'
%if %%sroute == %%route
- match:
%%{sroute.alSubRouteMatchSource}: %%{sroute.alSubRouteMatchValue}
receiver: %%sroute.alSubRouteMatchReceiver
continue: true
%end if
%end for
%end if
%end for
%for rt in %%getVar('alRoute',[])
- match:
%%{rt.alRouteMatchSource}: %%{rt.alRouteMatchValue}
receiver: %%rt.alRouteMatchReceiver
continue: true
%if not %%is_empty('alSubRoute')
routes:
%for sroute in %%getVar('alSubRoute',[])
%if %%sroute == %%rt
- match:
%%{sroute.alSubRouteMatchSource}: %%{sroute.alSubRouteMatchValue}
receiver: %%sroute.alSubRouteMatchReceiver
continue: true
%end if
%end for
%end if
%end for
# # This route handles all alerts coming from a database service. If there's
# # no team to handle it, it defaults to the DB team.
# - match:
# service: database
# receiver: team-DB-pager
# # Also group alerts by affected database.
# group_by: [alertname, cluster, database]
# routes:
# - match:
# owner: team-X
# receiver: team-X-pager
# - match:
# owner: team-Y
# receiver: team-Y-pager
# Inhibition rules allow to mute a set of alerts given that another alert is
# firing.
# We use this to mute any warning-level notifications if the same alert is
# already critical.
inhibit_rules:
- source_match:
severity: 'critical'
target_match:
severity: 'warning'
# Apply inhibition if the alertname is the same.
equal: ['alertname', 'cluster', 'service']
receivers:
%for rcv in %%getVar('alReceiver',[])
- name: '%%rcv'
email_configs:
- to: '%%rcv.alReceiverEmail'
%end for
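
The order in which this template emits routes matters, because Alertmanager checks sibling routes in order and stops at the first match unless that route sets `continue: true`. The sketch below models that behaviour for a flat list of routes. It is a simplification under stated assumptions: nested sub-routes, grouping and timing are ignored, and the route and receiver names are hypothetical.

```python
import re


def route_matches(route, labels):
    """True when every `match` and `match_re` condition of the route holds."""
    for name, expected in route.get("match", {}).items():
        if labels.get(name, "") != expected:
            return False
    for name, pattern in route.get("match_re", {}).items():
        # Alertmanager anchors its regular expressions, hence fullmatch.
        if re.fullmatch(pattern, labels.get(name, "")) is None:
            return False
    return True


def receivers_for(labels, routes, default_receiver):
    """Walk sibling routes in order; stop at the first match unless it continues."""
    matched = []
    for route in routes:
        if route_matches(route, labels):
            matched.append(route["receiver"])
            if not route.get("continue", False):
                break
    # An alert that matches no child route falls back to the root receiver.
    return matched or [default_receiver]


# Hypothetical routes in the order the template writes them:
# regex routes first (no continue), then simple routes (continue: true).
routes = [
    {"match_re": {"service": "mysql|postgres"}, "receiver": "dba"},
    {"match": {"severity": "critical"}, "receiver": "oncall", "continue": True},
    {"match": {"service": "web"}, "receiver": "webteam", "continue": True},
]
```

With these routes, a critical `mysql` alert reaches only `dba`, because the regex route comes first and does not continue, while a critical `web` alert reaches both `oncall` and `webteam`.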

@ -0,0 +1,10 @@
# # config file version
apiVersion: 1
providers:
- name: 'eole'
orgId: 1
folder: ''
type: file
options:
path: /var/lib/grafana/dashboards

@ -0,0 +1,11 @@
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
access: proxy
orgId: 1
url: http://%%adresse_ip_eth0:9090
isDefault: true
version: 1
editable: false

File diff suppressed because it is too large

@ -21,7 +21,7 @@
;plugins = /var/lib/grafana/plugins ;plugins = /var/lib/grafana/plugins
# folder that contains provisioning config files that grafana will apply on startup and while running. # folder that contains provisioning config files that grafana will apply on startup and while running.
; provisioning = conf/provisioning provisioning = /etc/grafana/provisioning
#################################### Server #################################### #################################### Server ####################################
[server] [server]
@ -40,11 +40,13 @@ domain = %%grafana_domain
# Redirect to correct domain if host header does not match domain # Redirect to correct domain if host header does not match domain
# Prevents DNS rebinding attacks # Prevents DNS rebinding attacks
;enforce_domain = false enforce_domain = true
# The full public facing url you use in browser, used for redirects and emails # The full public facing url you use in browser, used for redirects and emails
# If you use reverse proxy and sub path specify full url (with sub path) # If you use reverse proxy and sub path specify full url (with sub path)
;root_url = http://localhost:3000 %if not %%is_empty('grafanaRootURL')
root_url = %%grafanaRootURL
%end if
# Log web requests # Log web requests
;router_logging = false ;router_logging = false
@ -299,18 +301,20 @@ enabled = %%grafana_auth_anonymous
#################################### SMTP / Emailing ########################## #################################### SMTP / Emailing ##########################
[smtp] [smtp]
;enabled = false %if %%getVar('activer_exim_relay_smtp','non') == 'oui'
;host = localhost:25 enabled = true
host = %%exim_relay_smtp:25
;user = ;user =
# If the password contains # or ; you have to wrap it with trippel quotes. Ex """#password;""" # If the password contains # or ; you have to wrap it with trippel quotes. Ex """#password;"""
;password = ;password =
;cert_file = ;cert_file =
;key_file = ;key_file =
;skip_verify = false skip_verify = true
;from_address = admin@grafana.localhost from_address = %%system_mail_from
;from_name = Grafana from_name = Grafana
# EHLO identity in SMTP dialog (defaults to instance_name) # EHLO identity in SMTP dialog (defaults to instance_name)
;ehlo_identity = dashboard.example.com ;ehlo_identity = dashboard.example.com
%end if
[emails] [emails]
;welcome_email_on_sign_up = false ;welcome_email_on_sign_up = false

tmpl/predict-rules.yml (new file, 6 lines added)

@ -0,0 +1,6 @@
groups:
- name: PredictRules
rules:
- alert: disk_full_within_6_hours
expr: predict_linear(node_filesystem_free{job="%%{job_name_node}",mountpoint="/"}[1h], 6 * 3600) < 0
for: 5m
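
The `disk_full_within_6_hours` rule relies on `predict_linear`, which fits a straight line through the samples of the last hour by simple linear regression and extrapolates it 6 hours ahead. The sketch below shows that idea in plain Python. It is an approximation for illustration only: it treats the timestamp of the last sample as "now", and the function name and sample values are hypothetical.

```python
def predict_linear(samples, seconds_ahead):
    """Least-squares line through (timestamp, value) samples, extrapolated
    `seconds_ahead` seconds past the last timestamp."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("need at least two samples with distinct timestamps")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t
    intercept = mean_v - slope * mean_t
    return intercept + slope * (samples[-1][0] + seconds_ahead)


# Hypothetical filesystem losing 1 GB of free space every 10 minutes,
# sampled for one hour: 10 GB free at t=0, 4 GB free at t=3600.
shrinking = [(t, 10e9 - t / 600 * 1e9) for t in range(0, 3601, 600)]

# Hypothetical filesystem whose free space stays at 4 GB.
stable = [(t, 4e9) for t in range(0, 3601, 600)]
```

For the shrinking series the extrapolated value 6 hours ahead is negative, which is the condition the rule tests with `< 0`; for the stable series the prediction stays at 4 GB and no alert would fire.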

tmpl/prometheus.defaults (new file, 1 line added)

@ -0,0 +1 @@
PROMETHEUS_OPTS='--config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus/data'

@ -5,35 +5,67 @@ global:
scrape_timeout: %%prometheusScrapeTimeout scrape_timeout: %%prometheusScrapeTimeout
rule_files: rule_files:
- "/etc/prometheus/rules.d/*.yml" - "/etc/prometheus/rules.d/*.yml"
scrape_configs: scrape_configs:
- job_name: %%prometheusJobName - job_name: %%prometheusJobName
honor_labels: true honor_labels: true
static_configs: static_configs:
- targets: ['%%nom_domaine_machine:9090'] - targets: ['%%nom_domaine_machine:9090'%slurp
%if %%getVar('activerSndNodeExpoter','non') == 'oui' %if %%getVar('activerSndNodeExpoter','non') == 'oui'
- targets: ['%%nom_domaine_machine:9100'] , '%%nom_domaine_machine:9100'%slurp
%end if %end if
]
- job_name: '%%job_name_node' - job_name: '%%job_name_node'
file_sd_configs: file_sd_configs:
- files: [ "%%job_file_config/*.yml" ] # - files: [ "%%job_file_config/*.yml" ]
%if %%getVar('addTargetPrometheus','non') == 'oui'
static_configs: static_configs:
%if %%getVar('ajout_client_prometheus','non') == 'oui' - targets: [ "%%adresse_ip_eth0:9100"%slurp
%if not %%is_empty(%%nouveau_node_exporter) %for %%cliPr in %%getVar('prTarg',[])
%for %%client_prometheus in %%nouveau_node_exporter %if %%cliPr.prTargSonde == 'Node Exporter'
- targets: ['%%client_prometheus:9100'] , '%%cliPr.prTargIP:9100'%slurp
%end for %end if
%end if %end for
]
%end if %end if
#alerting: %for %%job in %%getVar('promJobs', [])
# alertmanagers: - job_name: '%%job'
# - scheme: https %if %%job.honorLabels == 'oui'
# static_configs: honor_labels: true
# - targets: %else
# - "1.2.3.4:9093" honor_labels: false
# - "1.2.3.5:9093" %end if
# - "1.2.3.6:9093" scrape_interval: %%{job.scrpInterval}s
scrape_timeout: %%{job.scrpTimeout}s
scheme: %%job.scrpScheme
metrics_path: %%job.scrpMetricPath
%set first = True
static_configs:
- targets: [ %slurp
%for %%target in %%getVar('prOpenTarg',[])
%if %%target.prOpenTargJob == %%job
%if %%first
"%%target.prOpenTargIP:%%target.prOpenTargPort"%slurp
%set first = False
%else
, "%%target.prOpenTargIP:%%target.prOpenTargPort"%slurp
%end if
%end if
%end for
]
%end for
%if %%getVar('activerAlertmanager','non') == 'oui'
alerting:
alertmanagers:
- scheme: http
static_configs:
- targets:
- "%%nom_domaine_machine:9093"
%end if
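
In the custom-job loop of this template, the `%set first` flag exists only to place commas between the targets attached to a given job. The same filter-then-join logic can be sketched in Python as follows. The dictionary keys (`job`, `host`, `port`) and the sample targets are hypothetical stand-ins for `prOpenTargJob`, `prOpenTargIP` and `prOpenTargPort`, and the exact whitespace of the rendered template may differ.

```python
def render_targets(job, targets):
    """Return a YAML flow list of "host:port" strings for the targets of `job`."""
    entries = [
        f'"{t["host"]}:{t["port"]}"'
        for t in targets
        if t["job"] == job
    ]
    # join() puts the separator only between entries, which is what the
    # `first` flag achieves by hand in the template.
    return "[ " + ", ".join(entries) + " ]"


# Hypothetical custom targets spread over two jobs.
targets = [
    {"job": "mysql", "host": "10.0.0.5", "port": 9104},
    {"job": "web", "host": "web.example.org", "port": 9117},
    {"job": "mysql", "host": "db.example.org", "port": 9104},
]
```

A job with no attached target yields an empty list, which is worth keeping in mind because the template then emits a `static_configs` entry with no targets.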