K3S/addons/prometheus-stack/README.md
Sergey Antropoff 38aaadbfb1 docs: sync addon docs with explicit external/internal service modes
Updated the documentation for the new addons (gitlab, redis, mongodb, kafka, kafka-ui, rabbitmq) and the new model of explicit dependency selection. Added and unified the descriptions of the *_database_mode and *_redis_mode switches, and updated the addon dependency table, configuration examples, and list of vault secrets.
2026-04-29 23:21:04 +03:00


# kube-prometheus-stack
Full monitoring stack: Prometheus, Grafana, Alertmanager, node-exporter, kube-state-metrics. When prometheus-stack is enabled, every addon in the project automatically creates a ServiceMonitor.
## Quick start
```yaml
# group_vars/all/addons.yml
addon_prometheus_stack: true
```
Secrets in `vault.yml`:
```yaml
vault_grafana_user: "admin"
vault_grafana_password: "secure-password"
```
```bash
make addon-prometheus-stack
```
## Parameters
| Variable | Default | Description |
|---|---|---|
| `prometheus_retention_days` | `7` | Metrics retention period, in days |
| `prometheus_storage_size` | `10Gi` | Prometheus PVC size |
| `grafana_storage_size` | `5Gi` | Grafana PVC size |
| `prometheus_alertmanager_enabled` | `true` | Enable Alertmanager |
| `prometheus_grafana_ingress_enabled` | `false` | Expose Grafana through an Ingress |
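
As an illustration, the defaults above could be overridden as follows. This is only a sketch: it assumes these variables live in the same `group_vars/all/addons.yml` file as `addon_prometheus_stack`, and the values are arbitrary examples.

```yaml
# group_vars/all/addons.yml (assumed location; adjust to your inventory layout)
addon_prometheus_stack: true
prometheus_retention_days: 14        # keep metrics for two weeks instead of 7 days
prometheus_storage_size: "50Gi"      # larger PVC to match the longer retention
grafana_storage_size: "5Gi"          # unchanged default, shown for completeness
prometheus_alertmanager_enabled: true
prometheus_grafana_ingress_enabled: false
```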
## Accessing Grafana
By default, Grafana is available on NodePort 32000:
```
http://192.168.1.10:32000
```
Through an Ingress:
```yaml
prometheus_grafana_ingress_enabled: true
prometheus_grafana_ingress_host: "grafana.example.com"
```
## Preinstalled dashboards
- **Kubernetes / Cluster Overview**: cluster resources
- **Node Exporter Full**: node metrics
- **Pod Monitoring**: pod metrics
- **Ingress Nginx**: HTTP metrics (when `addon_ingress_nginx: true`)
## Add a Grafana dashboard via ConfigMap
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  my-dashboard.json: |
    {
      "title": "My App",
      "panels": [...]
    }
```
## ServiceMonitor: add your own application
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: my-app
  labels:
    release: prom  # must match prometheus_stack_release_name
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
```
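
The `selector` and `port` fields above refer to a Kubernetes Service, not to pods directly: the ServiceMonitor picks up Services carrying the `app: my-app` label and scrapes the Service port named `metrics`. A minimal Service that would match might look like this; the port number 8080 and the pod selector are assumptions for illustration.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-app
  labels:
    app: my-app          # matched by spec.selector.matchLabels in the ServiceMonitor
spec:
  selector:
    app: my-app          # hypothetical pod label
  ports:
    - name: metrics      # must equal endpoints[].port in the ServiceMonitor
      port: 8080         # hypothetical port where the app serves /metrics
      targetPort: 8080
```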
## PrometheusRule: create an alert
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
  namespace: my-app
  labels:
    release: prom
spec:
  groups:
    - name: my-app
      rules:
        - alert: MyAppDown
          expr: up{job="my-app"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "My App is down"
            description: "{{ $labels.instance }} has been down for more than 5 minutes."
```
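
One way to check the rule before deploying it is a `promtool test rules` unit test. The sketch below assumes the contents of `spec.groups` have been copied into a plain rule file named `rules.yml` (a hypothetical file name; promtool reads Prometheus rule files, not the PrometheusRule resource itself). The instance value `my-app:8080` is made up.

```yaml
# my-app-alerts_test.yml (hypothetical file name)
rule_files:
  - rules.yml                      # plain rule file starting at "groups:"
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # The target reports up == 0 for eleven consecutive samples (0m..10m).
      - series: 'up{job="my-app", instance="my-app:8080"}'
        values: '0x10'
    alert_rule_test:
      - eval_time: 6m              # past the 5m "for" window, so the alert should be firing
        alertname: MyAppDown
        exp_alerts:
          - exp_labels:
              severity: critical
              job: my-app
              instance: my-app:8080
            exp_annotations:
              summary: "My App is down"
              description: "my-app:8080 has been down for more than 5 minutes."
```

The test can then be run with `promtool test rules my-app-alerts_test.yml`.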
## Alertmanager: configure notifications
```yaml
# In the kube-prometheus-stack Helm values:
alertmanager:
  config:
    route:
      receiver: slack
    receivers:
      - name: slack
        slack_configs:
          - api_url: "https://hooks.slack.com/services/..."
            channel: "#alerts"
            text: "{{ range .Alerts }}{{ .Annotations.description }}{{ end }}"
```
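
Building on the snippet above, alerts can also be routed by label. The following sketch sends alerts labelled `severity: critical` (as in the `MyAppDown` example) to a separate channel. The receiver names and the `#alerts-critical` channel are hypothetical, the webhook URL is left elided as above, and the `matchers` syntax assumes Alertmanager 0.22 or later.

```yaml
alertmanager:
  config:
    route:
      receiver: slack                      # default receiver for everything else
      group_by: ["alertname", "namespace"]
      routes:
        - receiver: slack-critical         # hypothetical receiver for critical alerts
          matchers:
            - severity = "critical"
    receivers:
      - name: slack
        slack_configs:
          - api_url: "https://hooks.slack.com/services/..."
            channel: "#alerts"
      - name: slack-critical
        slack_configs:
          - api_url: "https://hooks.slack.com/services/..."
            channel: "#alerts-critical"
```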
## PromQL examples
```promql
# CPU usage per node
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Memory usage
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
# HTTP error rate ingress
sum(rate(nginx_ingress_controller_requests{status=~"5.."}[5m])) by (ingress)
# Pod restarts
increase(kube_pod_container_status_restarts_total[1h]) > 0
```
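
Queries that are used often, for example in dashboards, can be precomputed with a recording rule. Below is a minimal sketch using the memory query above. The resource name and the recorded series name `instance:node_memory_usage:percent` are hypothetical, and the `release: prom` label is assumed to match `prometheus_stack_release_name`, as in the PrometheusRule example.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-recording-rules        # hypothetical name
  namespace: monitoring
  labels:
    release: prom                   # assumed to match prometheus_stack_release_name
spec:
  groups:
    - name: node.recording
      rules:
        # Stores the result of the expression as a new time series.
        - record: instance:node_memory_usage:percent
          expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
```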
## Official resources
- Official website: [https://prometheus.io/](https://prometheus.io/)
- Official documentation: [https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack)
- Helm chart / software versions: [https://artifacthub.io/packages/helm/prometheus-community/kube-prometheus-stack](https://artifacthub.io/packages/helm/prometheus-community/kube-prometheus-stack)