docs: full project documentation: docs/ plus a README.md for every addon

- README.md: rewritten as a compact overview (98 lines) with navigation to docs/
- docs/: 13 files: getting-started, architecture, configuration, addons, storage, security, cicd, observability, networking, operations, make-reference, molecule-testing, troubleshooting
- addons/*/README.md: 31 new files with a description, parameters and code examples for each of the 34 addons (vault and external-secrets already existed)
2026-04-26 00:22:06 +03:00
parent 1080e6a792
commit eccc1c2a01
45 changed files with 5838 additions and 1670 deletions
--- a/addons/prometheus-stack/README.md
+++ b/addons/prometheus-stack/README.md
@@ -0,0 +1,144 @@
+# kube-prometheus-stack
+
+Full monitoring stack: Prometheus, Grafana, Alertmanager, node-exporter and kube-state-metrics. When prometheus-stack is enabled, all project addons automatically create a ServiceMonitor.
+
+## Quick start
+
+```yaml
+# group_vars/all/addons.yml
+addon_prometheus_stack: true
+```
+
+Secrets go in `vault.yml`:
+```yaml
+vault_grafana_user: "admin"
+vault_grafana_password: "secure-password"
+```
+
+```bash
+make addon-prometheus-stack
+```
+
+## Parameters
+
+| Variable | Default | Description |
+|---|---|---|
+| `prometheus_retention_days` | `7` | Metric retention period, in days |
+| `prometheus_storage_size` | `10Gi` | Prometheus PVC size |
+| `grafana_storage_size` | `5Gi` | Grafana PVC size |
+| `prometheus_alertmanager_enabled` | `true` | Deploy Alertmanager |
+| `prometheus_grafana_ingress_enabled` | `false` | Expose Grafana via Ingress |
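+
+A sketch of overriding these defaults in the same `group_vars/all/addons.yml` used in the quick start. The values are illustrative, not recommendations; size `prometheus_storage_size` with the retention period in mind, since a longer retention keeps more data on the PVC.
+
+```yaml
+# group_vars/all/addons.yml
+addon_prometheus_stack: true
+prometheus_retention_days: 15              # illustrative: keep two weeks of metrics
+prometheus_storage_size: 30Gi              # illustrative: larger PVC for the longer retention
+grafana_storage_size: 5Gi
+prometheus_alertmanager_enabled: true
+prometheus_grafana_ingress_enabled: false
+```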
+
+## Grafana access
+
+By default, Grafana is exposed as a NodePort service on port 32000:
+```
+http://192.168.1.10:32000
+```
+
+Via Ingress:
+```yaml
+prometheus_grafana_ingress_enabled: true
+prometheus_grafana_ingress_host: "grafana.example.com"
+```
+
+## Preinstalled dashboards
+
+- **Kubernetes / Cluster Overview**: cluster resources
+- **Node Exporter Full**: node metrics
+- **Pod Monitoring**: pod metrics
+- **Ingress Nginx**: HTTP metrics (when `addon_ingress_nginx: true`)
+
+## Adding a Grafana dashboard via ConfigMap
+
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: my-dashboard
+  namespace: monitoring
+  labels:
+    grafana_dashboard: "1"    # label watched by the Grafana dashboard sidecar
+data:
+  my-dashboard.json: |
+    {
+      "title": "My App",
+      "panels": []
+    }
+```
+
+## ServiceMonitor: adding your own application
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  name: my-app
+  namespace: my-app
+  labels:
+    release: prom    # must match prometheus_stack_release_name
+spec:
+  selector:
+    matchLabels:
+      app: my-app
+  endpoints:
+    - port: metrics
+      interval: 30s
+      path: /metrics
+```
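+
+The ServiceMonitor selects Services, not Pods: it needs a Service labelled `app: my-app` that exposes a port named `metrics`. A minimal sketch, assuming the application's Pods carry the label `app: my-app` and serve `/metrics` on port 8080 (a placeholder):
+
+```yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: my-app
+  namespace: my-app          # same namespace as the ServiceMonitor
+  labels:
+    app: my-app              # matched by spec.selector.matchLabels of the ServiceMonitor
+spec:
+  selector:
+    app: my-app              # assumption: the application's Pods carry this label
+  ports:
+    - name: metrics          # referenced by endpoints[].port in the ServiceMonitor
+      port: 8080             # placeholder port
+      targetPort: 8080
+```
+
+Unless the ServiceMonitor sets `jobLabel`, the `job` label of the scraped series defaults to the Service name, here `my-app`.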
+
+## PrometheusRule: creating an alert
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  name: my-app-alerts
+  namespace: my-app
+  labels:
+    release: prom
+spec:
+  groups:
+    - name: my-app
+      rules:
+        - alert: MyAppDown
+          expr: up{job="my-app"} == 0
+          for: 5m
+          labels:
+            severity: critical
+          annotations:
+            summary: "My App is down"
+            description: "{{ $labels.instance }} has been down for more than 5 minutes."
+```
+
+## Alertmanager: configuring notifications
+
+```yaml
+# In the kube-prometheus-stack Helm values:
+alertmanager:
+  config:
+    route:
+      receiver: slack
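+    receivers:
+      - name: "null"   # keep the chart's default receiver: the default Watchdog route still refers to it
+      - name: slack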
+        slack_configs:
+          - api_url: "https://hooks.slack.com/services/..."
+            channel: "#alerts"
+            text: "{{ range .Alerts }}{{ .Annotations.description }}{{ end }}"
+```
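+
+A variant that sends only alerts labelled `severity: critical` (such as `MyAppDown` above) to Slack and routes everything else to the `null` receiver, which has no integrations and therefore sends nothing. This is a sketch: the `group_by` labels and the routing policy are assumptions to adapt.
+
+```yaml
+alertmanager:
+  config:
+    route:
+      receiver: "null"                     # fallback: alerts matching no child route are not delivered
+      group_by: ["alertname", "namespace"]  # assumption: adjust grouping to your needs
+      routes:
+        - receiver: slack
+          matchers:
+            - severity = "critical"        # label set in the PrometheusRule
+    receivers:
+      - name: "null"                       # receiver without integrations
+      - name: slack
+        slack_configs:
+          - api_url: "https://hooks.slack.com/services/..."
+            channel: "#alerts"
+            send_resolved: true            # also notify when the alert resolves
+```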
+
+## PromQL examples
+
+```promql
+# CPU usage per node, % (node-exporter series identify the node by the instance label)
+100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
+
+# Memory usage
+(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
+
+# 5xx responses per second, per Ingress (ingress-nginx)
+sum(rate(nginx_ingress_controller_requests{status=~"5.."}[5m])) by (ingress)
+
+# Containers restarted during the last hour
+increase(kube_pod_container_status_restarts_total[1h]) > 0
+```
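+
+The 5xx query above is an absolute rate. To alert on the share of failed requests instead, divide it by the total request rate. A sketch of a PrometheusRule, assuming ingress-nginx metrics are being scraped; the rule name and the 5% threshold are arbitrary placeholders:
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  name: ingress-alerts         # hypothetical name
+  namespace: monitoring
+  labels:
+    release: prom              # must match prometheus_stack_release_name
+spec:
+  groups:
+    - name: ingress
+      rules:
+        - alert: IngressHigh5xxRatio
+          # share of 5xx responses per Ingress over the last 5 minutes
+          expr: |
+            sum(rate(nginx_ingress_controller_requests{status=~"5.."}[5m])) by (ingress)
+              /
+            sum(rate(nginx_ingress_controller_requests[5m])) by (ingress)
+              > 0.05
+          for: 10m
+          labels:
+            severity: warning
+          annotations:
+            summary: "More than 5% of requests to {{ $labels.ingress }} return 5xx"
+```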