# kube-prometheus-stack

A complete monitoring stack: Prometheus, Grafana, Alertmanager, node-exporter, and kube-state-metrics. When prometheus-stack is enabled, every addon in the project automatically creates a ServiceMonitor.

## Quick start

```yaml
# group_vars/all/addons.yml
addon_prometheus_stack: true
```

Secrets in `vault.yml`:

```yaml
vault_grafana_user: "admin"
vault_grafana_password: "secure-password"
```

```bash
make addon-prometheus-stack
```

## Parameters

| Variable | Default | Description |
|---|---|---|
| `prometheus_retention_days` | `7` | Metric retention period, in days |
| `prometheus_storage_size` | `10Gi` | Size of the Prometheus PVC |
| `grafana_storage_size` | `5Gi` | Size of the Grafana PVC |
| `prometheus_alertmanager_enabled` | `true` | Enable Alertmanager |
| `prometheus_grafana_ingress_enabled` | `false` | Expose Grafana through an Ingress |

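As an illustration, the defaults in the table can be overridden next to the addon switch. The sketch below assumes these variables are read from `group_vars/all/addons.yml`, the file used in the quick start; the values are arbitrary examples, not recommendations.

```yaml
# group_vars/all/addons.yml (assumed location for overrides)
addon_prometheus_stack: true

prometheus_retention_days: 14          # example value: keep metrics for two weeks
prometheus_storage_size: "20Gi"        # example value: larger Prometheus PVC
grafana_storage_size: "5Gi"            # unchanged default, shown for completeness
prometheus_alertmanager_enabled: true  # unchanged default
prometheus_grafana_ingress_enabled: false
```

Rerunning `make addon-prometheus-stack` should then apply the new values, assuming the target can be run repeatedly.
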
## Accessing Grafana

By default, Grafana is exposed on NodePort 32000:

```
http://192.168.1.10:32000
```

Through an Ingress:

```yaml
prometheus_grafana_ingress_enabled: true
prometheus_grafana_ingress_host: "grafana.example.com"
```

## Preinstalled dashboards

- **Kubernetes / Cluster Overview**: cluster resources
- **Node Exporter Full**: node metrics
- **Pod Monitoring**: pod metrics
- **Ingress Nginx**: HTTP metrics (when `addon_ingress_nginx: true`)

## Adding a Grafana dashboard via ConfigMap

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  my-dashboard.json: |
    {
      "title": "My App",
      "panels": [...]
    }
```

## ServiceMonitor: adding your own application

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: my-app
  labels:
    release: prom  # must match prometheus_stack_release_name
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
```

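A ServiceMonitor selects Services, not pods: `selector.matchLabels` is matched against Service labels, and `port: metrics` refers to a named Service port. A minimal sketch of a Service that the manifest above would pick up is shown here; the port number `8080` and the pod label `app: my-app` are assumptions made for illustration.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-app        # same namespace as the ServiceMonitor
  labels:
    app: my-app            # matched by spec.selector.matchLabels
spec:
  selector:
    app: my-app            # assumed pod label
  ports:
    - name: metrics        # referenced by endpoints[].port
      port: 8080           # hypothetical metrics port
      targetPort: 8080
```

If the port is unnamed or the Service carries a different label, the target will most likely not appear in Prometheus.
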
## PrometheusRule: creating an alert

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
  namespace: my-app
  labels:
    release: prom
spec:
  groups:
    - name: my-app
      rules:
        - alert: MyAppDown
          expr: up{job="my-app"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "My App is down"
            description: "{{ $labels.instance }} has been down for more than 5 minutes."
```

## Alertmanager: configuring notifications

```yaml
# In the kube-prometheus-stack Helm values:
alertmanager:
  config:
    route:
      receiver: slack
    receivers:
      - name: slack
        slack_configs:
          - api_url: "https://hooks.slack.com/services/..."
            channel: "#alerts"
            text: "{{ range .Alerts }}{{ .Annotations.description }}{{ end }}"
```

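Routing can also split alerts by label. The sketch below sends alerts labeled `severity: critical` to a separate receiver and leaves everything else on the default one. The receiver name `slack-critical`, the channel `#alerts-critical`, and the `group_by` labels are hypothetical, and the `matchers` syntax assumes a reasonably recent Alertmanager release.

```yaml
alertmanager:
  config:
    route:
      receiver: slack                      # default receiver
      group_by: ["alertname", "namespace"] # assumed grouping labels
      routes:
        - matchers:
            - severity = "critical"
          receiver: slack-critical         # hypothetical receiver name
    receivers:
      - name: slack
        slack_configs:
          - api_url: "https://hooks.slack.com/services/..."
            channel: "#alerts"
      - name: slack-critical
        slack_configs:
          - api_url: "https://hooks.slack.com/services/..."
            channel: "#alerts-critical"    # hypothetical channel
```
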
## PromQL examples

```promql
# CPU usage per node, in percent (grouping by `node` assumes that label exists on node-exporter series)
100 - (avg by(node) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage, in percent
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100

# Rate of 5xx responses per Ingress
sum(rate(nginx_ingress_controller_requests{status=~"5.."}[5m])) by (ingress)

# Pod restarts over the last hour
increase(kube_pod_container_status_restarts_total[1h]) > 0
```

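Any of these queries can serve as an alert expression. As a sketch, the pod restart query is wrapped in a PrometheusRule below, following the same shape as the `MyAppDown` example. The rule name, alert name, `for` duration, and severity are assumptions chosen for illustration.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pod-restart-alerts       # hypothetical name
  namespace: my-app
  labels:
    release: prom                # same release label as the other examples
spec:
  groups:
    - name: pod-restarts
      rules:
        - alert: PodRestarting   # hypothetical alert name
          expr: increase(kube_pod_container_status_restarts_total[1h]) > 0
          for: 10m               # assumed hold time before firing
          labels:
            severity: warning
          annotations:
            summary: "Container restarted in the last hour"
            description: "{{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) restarted."
```
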
## Official resources

- Official website: [https://prometheus.io/](https://prometheus.io/)
- Official documentation: [https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack)
- Helm chart and software versions: [https://artifacthub.io/packages/helm/prometheus-community/kube-prometheus-stack](https://artifacthub.io/packages/helm/prometheus-community/kube-prometheus-stack)