Prometheus + Node_exporter + Grafana + Alertmanager Monitoring Deployment (Part 1)

admin 2025-04-28

I. Installing and Configuring Prometheus

1. Download and extract the package

cd /usr/local/src/
export VER="2.13.1"
wget [...]${VER}/prometheus-${VER}.[...]
[...] /data0/prometheus
groupadd prometheus
useradd -g prometheus prometheus -d /data0/prometheus
tar -xvf prometheus-${VER}.[...]
[...] /usr/local/src/
mv prometheus-${VER}.linux-amd64 /data0/prometheus/prometheus_server
cd /data0/prometheus/prometheus_server/
mkdir -p {data,config,logs,bin}
mv prometheus promtool bin/
[...]/
[...] /data0/prometheus
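
The directory layout created in this step can be sketched as a small function. This is a minimal sketch, assuming a hypothetical `PROM_HOME` variable that defaults to a scratch directory so it can be tried without touching `/data0`; the four subdirectory names are the ones used above.

```shell
#!/usr/bin/env bash
# Sketch: create the prometheus_server directory layout under a chosen root.
# PROM_HOME is a hypothetical variable; it defaults to a scratch directory.
PROM_HOME="${PROM_HOME:-$(mktemp -d)/prometheus_server}"

make_layout() {
    # $1: root of the prometheus_server tree
    local root="$1" d
    for d in data config logs bin; do
        mkdir -p "$root/$d"
    done
}

make_layout "$PROM_HOME"
ls "$PROM_HOME"
```

For the real install, `PROM_HOME=/data0/prometheus/prometheus_server` would be exported before running it.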

2. Set the environment variable

vim /etc/profile
PATH=/data0/prometheus/prometheus_server/bin:$PATH:$HOME/bin
source /etc/profile
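
Sourcing `/etc/profile` repeatedly adds the same directory to `PATH` each time. A guard like the following avoids that; it is only a sketch, and the helper name `path_prepend` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: prepend a directory to PATH only if it is not already listed.
# path_prepend is a hypothetical helper, not part of any tool.
path_prepend() {
    # $1: directory to add
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

path_prepend /data0/prometheus/prometheus_server/bin
path_prepend /data0/prometheus/prometheus_server/bin   # second call is a no-op
export PATH
echo "$PATH"
```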

3. Check the configuration file

promtool check config /data0/prometheus/prometheus_server/config/[...]
[...] /data0/prometheus/prometheus_server/config/[...]: 0 rule files found


4. Create the systemd unit file

4.1 Regular service

sudo tee /etc/systemd/system/[...] <<'EOF'
[Unit]
Description=Prometheus
Documentation=[...]
[Service]
Type=simple
User=prometheus
ExecStart=/data0/prometheus/prometheus_server/bin/[...]=/data0/prometheus/prometheus_server/config/[...]=/data0/prometheus/prometheus_server/[...]=60d
Restart=on-failure
[Install]
WantedBy=[...]
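
Since parts of the unit file above are missing, here is one way a complete unit could look. It is a sketch under stated assumptions, not the author's file: the unit name `prometheus.service`, the config file name `prometheus.yml`, the `multi-user.target` install target, and the choice of the three Prometheus flags (`--config.file`, `--storage.tsdb.path`, `--storage.tsdb.retention.time`) are all assumed. Only the paths and the 60d retention come from the text. The file is written to a scratch directory (hypothetical `UNIT_DIR`) so nothing under `/etc` is touched.

```shell
#!/usr/bin/env bash
# Sketch: write a Prometheus unit file to a scratch directory for review.
# UNIT_DIR, the unit name, prometheus.yml and the three flags are assumptions.
UNIT_DIR="${UNIT_DIR:-$(mktemp -d)}"
PROM_HOME=/data0/prometheus/prometheus_server

cat > "$UNIT_DIR/prometheus.service" <<EOF
[Unit]
Description=Prometheus

[Service]
Type=simple
User=prometheus
ExecStart=$PROM_HOME/bin/prometheus \\
  --config.file=$PROM_HOME/config/prometheus.yml \\
  --storage.tsdb.path=$PROM_HOME/data \\
  --storage.tsdb.retention.time=60d
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

cat "$UNIT_DIR/prometheus.service"
```

After review, the file would be copied to `/etc/systemd/system/`, followed by `systemctl daemon-reload` and `systemctl enable --now prometheus`.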

4.2 Managing prometheus_server with supervisor

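yum install -y epel-release supervisor
sudo tee /etc/[...]/[...] <<"EOF"
[program:prometheus]
[...]
; start automatically when supervisord starts
autostart=true
; treat the program as started if it has not exited abnormally within 5 seconds of launch
startsecs=5
; user that runs the program
user=prometheus
; standard output log
stdout_logfile=/data0/prometheus/prometheus_server/logs/[...]
; size limit of the standard output log file, default 50MB
stdout_logfile_maxbytes=20MB
[...]

Create the Alertmanager alert rule files

mkdir -p /data0/prometheus/prometheus_server/rules/
touch /data0/prometheus/prometheus_server/rules/node_[...]
[...] /data0/prometheus/prometheus_server/rules/memory_[...]
[...] /data0/prometheus/prometheus_server/rules/disk_[...]
[...] /data0/prometheus/prometheus_server/rules/cpu_[...]

[...]:
  scrape_interval: 15s
  # rule evaluation interval, default 1m
# Alertmanager configuration (default settings)
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 192.168.56.11:9093
# load the rules and evaluate them periodically at the configured interval
rule_files:
  - "second_[...]"
  - "/data0/prometheus/prometheus_server/rules/node_[...]"
  # memory alert rule file
  [...]
  - "/data0/prometheus/prometheus_server/rules/disk_[...]"
  # CPU alert rule file
  [...]
# by default only the host itself is monitored
scrape_configs:
  [...]
    # metrics_path defaults to '/metrics'
    # overrides the global scrape interval, here from 15 seconds to 10 seconds
    scrape_interval: 10s
    static_configs:
    - targets: ['localhost:9090','localhost:9100']
  - job_name: 'DMC_HOST'
    file_sd_configs:
    - files: ['./[...]']
      # The monitored hosts can be written in JSON or YAML; JSON is used here.
      # targets holds the IPs of the monitored machines; labels is optional and up to you.
EOF

Host liveness alert

cat > /data0/prometheus/prometheus_server/rules/node_[...] <<\EOF
groups:
- name: 实例存活告警规则        # instance liveness alert rules
  rules:
  - alert: 实例存活告警        # instance liveness alert
    expr: up == 0
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "{{ $[...] }} of job {{ $[...] }} has been down for more than 1 minute."
EOF

Disk alert

cat > /data0/prometheus/prometheus_server/rules/disk_[...] <<\EOF
groups:
- name: 磁盘报警规则          # disk alert rules
  rules:
  - alert: 磁盘使用率告警      # disk usage alert
    expr: (node_filesystem_size_bytes - node_filesystem_avail_bytes) / node_filesystem_size_bytes * 100 > 80
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "Server: disk device: usage above 80%! (mount point: {{ $[...] }}, current value: {{ $value }}%)"
EOF

# global settings
global:
  resolve_timeout: 5m
# routing tree
route:
  group_by: [alertname]
  # default receiver
  [...]
  group_wait: 30s
  # wait time before sending a new alert
  [...]
  repeat_interval: 1h
  # basic alert notifications
  [...]
    group_wait: 10s
    match_re:
      alertname: 实例存活告警|磁盘使用率告警
  # message alert notifications
  [...]
    group_wait: 10s
    match_re:
      alertname: 内存使用率告警|CPU使用率告警
# whether to send a notification after an alert is resolved
[...]
# An inhibition rule mutes alerts that match one set of matchers while an alert
# that matches another set of matchers is firing. Both alerts must carry the
# same values for the labels listed under equal.
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname','dev','instance']
EOF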
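
The `DMC_HOST` job in the Prometheus configuration reads its targets from a file, and the text says the list is written in JSON with `targets` required and `labels` optional. A minimal sketch of such a file follows. The file name `hosts.json`, the scratch directory `SD_DIR`, the port 9100 on the target, and the label values are all hypothetical; the IP address is the one that appears in the configuration above.

```shell
#!/usr/bin/env bash
# Sketch: write a file_sd target list in JSON.
# hosts.json, SD_DIR, the port and the label values are hypothetical.
SD_DIR="${SD_DIR:-$(mktemp -d)}"

cat > "$SD_DIR/hosts.json" <<'EOF'
[
  {
    "targets": ["192.168.56.11:9100"],
    "labels": {"env": "test", "role": "node"}
  }
]
EOF

# Count the target groups with a simple pattern match (no JSON parser needed).
grep -c '"targets"' "$SD_DIR/hosts.json"
```

Prometheus re-reads files listed under `file_sd_configs` when they change, so hosts can be added to this file without restarting the server.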

3. Start Alertmanager

cat > /lib/systemd/system/[...] <<\EOF
[Unit]
Description=Prometheus: the alerting system
Documentation=[...]
[Service]
ExecStart=/data0/prometheus/alertmanager/[...]=/data0/prometheus/alertmanager/[...]
[...]=always
StartLimitInterval=0
RestartSec=10
[Install]
WantedBy=[...]

Test from the command line that the bot can send a message. prometheus-webhook-dingtalk sometimes returns a 422 error; this is caused by DingTalk's security restrictions (the security policy used here requires a message to contain the keyword prometheus before it can be sent).

curl -H "Content-Type: application/json" -d '{"msgtype": "text", "text": {"content": "prometheus alert test"}}' [...]
[...] "Content-Type: application/json" -d '{"msgtype": "text", "text": {"content": "prometheus alert test"}}'
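
The keyword restriction described above can be checked locally before anything is sent. The sketch below builds the same text payload and refuses to send a message that lacks the keyword. `WEBHOOK_URL` is a hypothetical variable standing in for the DingTalk webhook address, which is not shown in the text; nothing is sent unless it is set. The helper names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: build a DingTalk text payload and check the keyword before sending.
# WEBHOOK_URL is a hypothetical variable; nothing is sent unless it is set.
KEYWORD="prometheus"

build_payload() {
    # $1: message text (assumed to contain no double quotes or backslashes)
    printf '{"msgtype": "text", "text": {"content": "%s"}}' "$1"
}

has_keyword() {
    # succeeds when the message contains the required keyword
    case "$1" in
        *"$KEYWORD"*) return 0 ;;
        *) return 1 ;;
    esac
}

msg="prometheus alert test"
if has_keyword "$msg"; then
    payload=$(build_payload "$msg")
    echo "$payload"
    if [ -n "${WEBHOOK_URL:-}" ]; then
        curl -s -H "Content-Type: application/json" -d "$payload" "$WEBHOOK_URL"
    fi
else
    echo "message lacks keyword '$KEYWORD'; DingTalk would likely reject it" >&2
fi
```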

4.1 Deploying the plugin from the binary package

cd /usr/local/src/
export VER="0.3.0"
wget [...]${VER}/prometheus-webhook-dingtalk-${VER}.[...]
[...]${VER}.[...]
[...]${VER}.linux-amd64 /data0/prometheus/alertmanager/prometheus-webhook-dingtalk

docker run -d --restart always -p 8060:8060 timonwong/prometheus-webhook-dingtalk:[...]="web-hook-name=dingtalk-webhook"
docker run -d --restart always -p 8060:8060 timonwong/prometheus-webhook-dingtalk:[...]="ops_dingding=[...]" --[...]="info_dingding=[...]"

A note on the two variables:

web-hook-name: prometheus-webhook-dingtalk supports multiple DingTalk webhooks, and each webhook is mapped to its URL by name. To use several DingTalk webhooks, pass the --[...] parameter more than once, for example:
sudo docker run -d --restart always -p 8060:8060 timonwong/prometheus-webhook-dingtalk:[...]="webhook1=[...]" --[...]="webhook2=[...]"
Names map to URLs as follows: [...]="webhook1=[...]" corresponds to the API URL http://localhost:8060/dingtalk/webhook1/s[...]

dingtalk-webhook: the DingTalk webhook obtained earlier.
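
The name-to-URL mapping described above can be sketched as a one-line helper. Note the assumption: the URL in the text is cut off after `/webhook1/s`, and the sketch completes it as `/send`, which should be verified against the plugin's own documentation. `profile_url` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: derive the local API URL for a named DingTalk webhook profile.
# The trailing /send segment is an assumption; the source truncates the URL.
profile_url() {
    # $1: profile name, $2: host:port of prometheus-webhook-dingtalk (optional)
    printf 'http://%s/dingtalk/%s/send\n' "${2:-localhost:8060}" "$1"
}

for name in ops_dingding info_dingding; do
    profile_url "$name"
done
```

Each printed URL is what an Alertmanager webhook receiver would point at for that profile.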

4.3 Deploying the plugin from source

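Add the GOPATH environment variable
mkdir -p /opt/path
export GOPATH=/opt/path

Download the plugin
cd /usr/local/src/
git clone [...]
(after make succeeds, a prometheus-webhook-dingtalk binary is produced)

Start the service
nohup /data0/prometheus/alertmanager/prometheus-webhook-dingtalk/[...]="ops_dingding=[...]" --[...]="info_dingding=[...]" 2>&1 1>/tmp/[...] &

# check the port
netstat -anpt | grep 8060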
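
A plain `grep 8060` also matches established connections and other ports that merely contain those digits. A stricter check on the netstat output might look like this sketch; `is_listening` is a hypothetical helper and the sample lines are made-up data in the usual `netstat -ant` column layout.

```shell
#!/usr/bin/env bash
# Sketch: decide from netstat-style output whether a port is in LISTEN state.
# is_listening is a hypothetical helper; the sample lines are made-up data.
is_listening() {
    # $1: port number; netstat -ant style lines are read from stdin
    awk -v port="$1" '
        $6 == "LISTEN" {
            n = split($4, a, ":")      # local address, e.g. 0.0.0.0:8060 or :::8060
            if (a[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }'
}

sample='tcp        0      0 0.0.0.0:8060            0.0.0.0:*               LISTEN      1234/prometheus-web
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      987/master'

printf '%s\n' "$sample" | is_listening 8060 && echo "8060 is listening"
```

On a live host the same function would be fed real output: `netstat -anpt | is_listening 8060`.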

IV. Installing and Configuring Grafana

1. Download and install

cd /usr/local/src/
export VER="6.4.3"
wget [...]${VER}-1.x86_64.rpm
yum localinstall -y grafana-${VER}-1.x86_64.rpm


2. Start the service


3. Access the web UI

Default username/password: admin/admin

4. Add data in Grafana
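
One way to register Prometheus with Grafana without clicking through the UI is a data source provisioning file. The sketch below is an assumption about how that could be done here, not the procedure the article used: the file name, the scratch directory `PROV_DIR`, and the URL `http://localhost:9090` (Prometheus on the same host, as in the scrape targets earlier) are all assumed.

```shell
#!/usr/bin/env bash
# Sketch: write a Grafana data source provisioning file for Prometheus.
# PROV_DIR and the file name are hypothetical; a packaged install reads
# /etc/grafana/provisioning/datasources/ instead of the scratch directory.
PROV_DIR="${PROV_DIR:-$(mktemp -d)}"

cat > "$PROV_DIR/prometheus.yaml" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF

cat "$PROV_DIR/prometheus.yaml"
```

Grafana loads provisioning files at startup, so the service needs a restart after the file is placed in its provisioning directory.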

V. Replacing Grafana's dashboards

