# kube-prometheus-stack

A complete monitoring stack: Prometheus, Grafana, Alertmanager, node-exporter, and kube-state-metrics. When prometheus-stack is enabled, every addon in this project automatically creates its own ServiceMonitor.

## Quick start

```yaml
# group_vars/all/addons.yml
addon_prometheus_stack: true
```

Secrets go in `vault.yml`:

```yaml
vault_grafana_user: "admin"
vault_grafana_password: "secure-password"
```

```bash
make addon-prometheus-stack
```

## Parameters

| Variable | Default | Description |
|---|---|---|
| `prometheus_retention_days` | `7` | Metrics retention period, in days |
| `prometheus_storage_size` | `10Gi` | Prometheus PVC size |
| `grafana_storage_size` | `5Gi` | Grafana PVC size |
| `prometheus_alertmanager_enabled` | `true` | Enable Alertmanager |
| `prometheus_grafana_ingress_enabled` | `false` | Expose Grafana through an Ingress |

## Accessing Grafana

By default, Grafana is exposed on NodePort 32000:

```
http://192.168.1.10:32000
```

To expose it through an Ingress instead:

```yaml
prometheus_grafana_ingress_enabled: true
prometheus_grafana_ingress_host: "grafana.example.com"
```

## Preinstalled dashboards

- **Kubernetes / Cluster Overview**: cluster resources
- **Node Exporter Full**: node metrics
- **Pod Monitoring**: pod metrics
- **Ingress Nginx**: HTTP metrics (when `addon_ingress_nginx: true`)

## Adding a Grafana dashboard via ConfigMap

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  my-dashboard.json: |
    {
      "title": "My App",
      "panels": [...]
    }
```

## ServiceMonitor: adding your own application

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: my-app
  labels:
    release: prom  # must match prometheus_stack_release_name
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
```

## PrometheusRule: creating an alert

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
  namespace: my-app
  labels:
    release: prom
spec:
  groups:
    - name: my-app
      rules:
        - alert: MyAppDown
          expr: up{job="my-app"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "My App is down"
            description: "{{ $labels.instance }} has been down for more than 5 minutes."
```

## Alertmanager: configuring notifications

```yaml
# In the kube-prometheus-stack Helm values:
alertmanager:
  config:
    route:
      receiver: slack
    receivers:
      - name: slack
        slack_configs:
          - api_url: "https://hooks.slack.com/services/..."
            channel: "#alerts"
            text: "{{ range .Alerts }}{{ .Annotations.description }}{{ end }}"
```

## PromQL examples

```promql
# CPU usage per node (node-exporter identifies each node by its `instance` label)
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage, in percent
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100

# HTTP 5xx error rate per Ingress
sum(rate(nginx_ingress_controller_requests{status=~"5.."}[5m])) by (ingress)

# Pod restarts over the last hour
increase(kube_pod_container_status_restarts_total[1h]) > 0
```

## Official resources

- Official site: [https://prometheus.io/](https://prometheus.io/)
- Official documentation: [https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack)
- Helm chart and software versions: [https://artifacthub.io/packages/helm/prometheus-community/kube-prometheus-stack](https://artifacthub.io/packages/helm/prometheus-community/kube-prometheus-stack)
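
## PrometheusRule: alerting on pod restarts (sketch)

A minimal sketch that combines the PrometheusRule format with the pod-restart PromQL query from the examples in this document. The rule name, alert name, namespace, severity, and `for` duration are illustrative assumptions, not project defaults. The `release: prom` label is assumed to match `prometheus_stack_release_name`, as in the ServiceMonitor example.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pod-restart-alerts        # hypothetical name
  namespace: monitoring           # assumption: same namespace as the dashboard ConfigMap example
  labels:
    release: prom                 # assumed to match prometheus_stack_release_name
spec:
  groups:
    - name: pod-restarts          # hypothetical group name
      rules:
        - alert: PodRestarting    # hypothetical alert name
          # Same expression as the "pod restarts" PromQL example
          expr: increase(kube_pod_container_status_restarts_total[1h]) > 0
          for: 10m                # assumption: tune to how noisy restarts are in your cluster
          labels:
            severity: warning
          annotations:
            summary: "Container restarted within the last hour"
            description: "{{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) restarted within the last hour."
```

Alerts from a rule like this would be routed by whatever receiver is configured in the Alertmanager section, so no extra notification setup should be needed.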