Operations / SRE Engineer

What you do in ITIP
Core workflows
Using frameworks
1. ISO 25010 — Operational quality characteristics
2. ISO 25012 — Data quality in production
Example: Detect and remediate compliance drift
Example: Set up observability-driven governance for a new service

What you do in ITIP

You monitor compliance drift against governed quality constraints. You inspect near real-time architecture state sourced from observability platforms. You trigger remediation when Norm violations are detected.

ITIP connects your observability stack to the governance plane. You act on governed constraints, not ad-hoc alerts.

Core workflows

Monitor compliance drift

Observability platforms (Datadog, Grafana, Prometheus) collect SLI/SLO metrics. ITIP ingests this telemetry and continuously evaluates Norm assertions against live data.

When a Norm evaluation fails, ITIP flags a compliance drift:

VIOLATION: payment-gateway
  Norm: availability-30d
  Assertion: ServiceTelemetry.availability30d >= 0.999
  Current value: 0.9973
  Tolerance: SUSTAINED (30d window)
  Governing Directive: "MUST ENSURE availability ON payment-gateway"
  Regulatory preset: NIS2

The violation includes full governance context — which Directive it traces to, which regulatory preset it belongs to, and what the tolerance mode is.

Inspect near real-time architecture state

Navigate to Operations > Architecture State. ITIP shows the current state of all governed Structures:

Column	What it shows
Structure	The governed component
Compliance status	Green / Amber / Red based on Norm evaluation
Drift indicators	Code-sourced definitions vs. authored definitions
Active violations	Norms currently in violation with duration
Last telemetry	Timestamp of most recent observability data

This is the live architecture state — not a diagram drawn months ago. It reflects what is actually running.

Trigger remediation workflows

When a compliance drift is detected:

ServiceNow incident — ITIP creates an incident linked to the violated Norm, including governance context and current metric values.
Jira remediation task — created in the owning team’s backlog with the violated Norm as acceptance criteria.
Governance loop tracking — ITIP tracks resolution: Norm re-evaluation → compliance restored → incident closed.

The remediation workflow is governance-driven. The Norm defines what must be fixed. The tolerance mode defines how quickly a violation must be resolved.

Tolerance modes in practice

The three tolerance modes determine how ITIP evaluates compliance:

Mode	Ops meaning	Example
INSTANTANEOUS	Every single measurement must pass	`ResponseTime.p99Ms <= 500` — any single violation is a breach
AGGREGATED	Statistical aggregate must pass	`ErrorRate.percent <= 0.1` over 1h window — individual spikes allowed
SUSTAINED	Continuous compliance over a time window	`Availability.percent >= 99.9` over 30d — sustained performance required

AGGREGATED and SUSTAINED modes are where SRE practices (error budgets, burn rates) map directly into ITIP Norms.

Using frameworks

ISO 25010 — Operational quality characteristics

ISO 25010 quality characteristics map to your monitoring targets:

Characteristic	Ops concern	Typical metrics
ProductReliability	Availability, fault tolerance, recoverability	Uptime %, MTTR, error rate
PerformanceEfficiency	Latency, throughput, resource utilization	p50/p95/p99 latency, RPS, CPU/memory
ProductSecurity	Runtime security posture	TLS compliance, auth failure rate, vulnerability scan

Norms using these characteristics are your governance-driven SLOs.

ISO 25012 — Data quality in production

For data pipelines and data-processing services:

Characteristic	Ops concern
DataCurrentness	Is data fresh enough? Staleness detection, refresh lag
DataAvailability	Is the data accessible when needed? Storage uptime, query success rate
DataRecoverability	Can data be restored after failure? Backup validation, RPO/RTO compliance

Example: Detect and remediate compliance drift

Context: The Payment Gateway is in production. The SRE team monitors via Datadog. ITIP has ingested SLI metrics.

Drift detected — ITIP flags: availability-30d Norm in violation. Current value 99.73%, threshold 99.9%.
Inspect context — the Norm traces to a NIS2 regulatory preset and the architecture board’s availability Directive.
Trigger remediation — ServiceNow incident created. Jira task added to the payments team’s sprint.
Investigate — the Reconciliation view shows a recent deployment changed a timeout configuration. Code-sourced definitions diverged from governed values.
Resolution — team fixes the configuration. ITIP re-evaluates: compliance restored. Incident auto-closed.

Result: Compliance monitoring is continuous and automated. The governance context tells you why the metric matters, not just that a number is wrong.

Example: Set up observability-driven governance for a new service

Context: A new service is being deployed. You want observability-driven Norm evaluation from day one.

Check governed Norms — open the Structure’s Norms tab. See the quality Norms defined by the architect.
Map to metrics — each Norm assertion references a quality dimension. Map those to your telemetry sources (Datadog monitors, Grafana dashboards, Prometheus rules).
Configure telemetry ingestion — ITIP’s observability integration ingests metrics and evaluates Norms continuously.
Verify initial state — the Operations dashboard shows all Norms green (or flags gaps if telemetry is missing).

Result: Governance-driven monitoring from the first deployment. SLOs are defined by architecture Norms, not invented ad-hoc by the ops team.