Core Reliability Metrics

MTTR — Mean Time To Repair

Measure and reduce average repair times to improve maintenance efficiency.

  1. Identify all failure events and record the exact start and end time of each repair
  2. Sum all repair durations and divide by the number of repairs
  3. Benchmark against industry figures (under 2 h is often cited as world-class for most equipment)
  4. Target high-frequency failures first for the largest MTTR reduction
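
A minimal Python sketch of steps 1 and 2, assuming each repair is logged as a (start, end) datetime pair. The function name `mttr_hours` and the sample log are hypothetical.

```python
from datetime import datetime, timedelta


def mttr_hours(repairs: list[tuple[datetime, datetime]]) -> float:
    """Mean time to repair in hours: total repair time / number of repairs."""
    if not repairs:
        raise ValueError("need at least one repair event")
    # Sum the individual repair durations, starting from a zero timedelta
    total = sum((end - start for start, end in repairs), timedelta())
    return total.total_seconds() / 3600 / len(repairs)


# Hypothetical repair log: repairs of 1.5 h, 2.0 h and 1.0 h
log = [
    (datetime(2024, 1, 5, 8, 0), datetime(2024, 1, 5, 9, 30)),
    (datetime(2024, 2, 1, 14, 0), datetime(2024, 2, 1, 16, 0)),
    (datetime(2024, 3, 9, 22, 0), datetime(2024, 3, 9, 23, 0)),
]
print(mttr_hours(log))  # 1.5
```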

MTBF — Mean Time Between Failures

Quantify equipment reliability and set preventive maintenance intervals.

  1. Log all operating time and each failure event precisely
  2. Calculate MTBF = Total Operating Time ÷ Number of Failures
  3. Use MTBF to schedule preventive maintenance before the expected failure
  4. Track trends: a declining MTBF signals degradation
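
The calculation and the trend check can be sketched in Python as follows. The `mtbf` helper and the yearly history are hypothetical; a real system would pull operating hours and failure counts from a CMMS or historian.

```python
def mtbf(total_operating_hours: float, failures: int) -> float:
    """MTBF = total operating time / number of failures."""
    if failures <= 0:
        raise ValueError("MTBF is undefined with zero failures")
    return total_operating_hours / failures


# Hypothetical yearly records for one pump: (operating hours, failures)
history = {2021: (8000, 4), 2022: (8000, 5), 2023: (8000, 8)}
trend = {year: mtbf(hours, n) for year, (hours, n) in history.items()}

# Flag degradation when MTBF falls every period
values = list(trend.values())
declining = all(a > b for a, b in zip(values, values[1:]))
print(trend, "declining" if declining else "not consistently declining")
```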

MTTF — Mean Time To Failure

Evaluate the expected lifespan of non-repairable components and parts.

  1. Apply to non-repairable parts such as bearings, fuses, and seals
  2. Calculate from time-to-failure data on a batch of identical items
  3. Use to set replacement schedules before MTTF is reached
  4. Consider Weibull analysis for greater accuracy with small samples
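
A minimal point estimate of MTTF, assuming every unit in the batch was run to failure (no units removed early, i.e. no censored data). The function name and test data are hypothetical.

```python
from statistics import mean


def mttf(times_to_failure: list[float]) -> float:
    """Point estimate of MTTF: the average time to failure of a tested batch.

    Assumes complete data; censored (still-running) units need other methods.
    """
    if not times_to_failure:
        raise ValueError("need at least one failure time")
    return mean(times_to_failure)


# Hypothetical bench test of 6 identical seals (hours to failure)
seal_hours = [1200, 1500, 900, 1800, 1100, 1500]
print(round(mttf(seal_hours), 1))
```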

Availability Analysis

Calculate and optimise system uptime for critical operations.

  1. Choose a method: uptime ÷ total time, or MTBF ÷ (MTBF + MTTR)
  2. Separate planned from unplanned downtime for clearer insights
  3. Target 99.9%+ for critical assets (≤ 8.77 hours of downtime per year)
  4. Improve availability by increasing MTBF or decreasing MTTR
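
Both methods from step 1, plus the conversion behind the 8.77 h figure in step 3, sketched in Python. The helper names are hypothetical; the year length of 8,766 h (365.25 days) is the assumption that yields 8.77 h for 99.9%.

```python
HOURS_PER_YEAR = 8766  # 365.25 days x 24 h


def availability_from_uptime(uptime_h: float, total_h: float) -> float:
    """Observed availability: uptime / total time."""
    return uptime_h / total_h


def inherent_availability(mtbf_h: float, mttr_h: float) -> float:
    """A = MTBF / (MTBF + MTTR)."""
    return mtbf_h / (mtbf_h + mttr_h)


def annual_downtime_hours(availability: float) -> float:
    """Downtime per year implied by an availability fraction."""
    return (1 - availability) * HOURS_PER_YEAR


print(f"{inherent_availability(mtbf_h=1600, mttr_h=1.5):.5f}")  # 0.99906
print(round(annual_downtime_hours(0.999), 2))                    # 8.77
```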

OEE — Overall Equipment Effectiveness

Measure manufacturing efficiency across availability, performance, and quality.

  1. Collect planned production time, actual run time, ideal and actual cycle times, and good and total units
  2. OEE = Availability × Performance × Quality (all as decimals)
  3. World-class OEE is ≥ 85%; most manufacturers start at 40–60%
  4. Focus on the weakest of the three components first
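
A sketch of the OEE calculation with hypothetical shift data. Performance is computed here as ideal cycle time × total count ÷ run time, which equals ideal ÷ actual cycle time when actual cycle time is run time ÷ total count.

```python
def oee(planned_time, run_time, ideal_cycle_time, total_count, good_count):
    """Return (availability, performance, quality, OEE) as decimals.

    All times must use the same unit (e.g. minutes).
    """
    availability = run_time / planned_time
    performance = ideal_cycle_time * total_count / run_time
    quality = good_count / total_count
    return availability, performance, quality, availability * performance * quality


# Hypothetical shift: 480 min planned, 420 min running,
# ideal cycle time 1.0 min/unit, 380 units produced, 361 good
a, p, q, total = oee(480, 420, 1.0, 380, 361)
weakest = min([("Availability", a), ("Performance", p), ("Quality", q)], key=lambda kv: kv[1])
print(f"A={a:.3f} P={p:.3f} Q={q:.3f} OEE={total:.3f}; weakest: {weakest[0]}")
# A=0.875 P=0.905 Q=0.950 OEE=0.752; weakest: Availability
```

A useful cross-check: OEE also equals good units × ideal cycle time ÷ planned time (here 361 ÷ 480).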

Downtime Cost Calculator

Quantify the true financial impact of unplanned outages.

  1. Include direct costs: lost revenue, labour, material waste
  2. Include indirect costs: customer penalties, rush orders, reputational damage
  3. Use the result to justify reliability improvement investments
  4. Compare costs across assets to prioritise maintenance spend
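
One possible cost model, sketched in Python. The `Outage` fields are hypothetical; indirect costs such as reputation are hard to measure, so they are entered here as a single estimated figure.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    hours: float
    revenue_per_hour: float      # lost revenue (or margin) per hour of downtime
    crew_size: int
    labour_rate: float           # cost per technician-hour
    scrap_cost: float            # material wasted
    indirect_cost: float = 0.0   # estimated penalties, rush orders, etc.

    def direct_cost(self) -> float:
        return (self.hours * self.revenue_per_hour
                + self.hours * self.crew_size * self.labour_rate
                + self.scrap_cost)

    def total_cost(self) -> float:
        return self.direct_cost() + self.indirect_cost


# Hypothetical outage records for two assets
outages = {
    "Press 1": [Outage(3, 2000, 2, 60, 500, 1500), Outage(1, 2000, 1, 60, 0)],
    "Oven 4": [Outage(6, 800, 1, 60, 1200)],
}
cost_by_asset = {asset: sum(o.total_cost() for o in events) for asset, events in outages.items()}

# Rank assets by total downtime cost to prioritise maintenance spend
for asset, cost in sorted(cost_by_asset.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{asset}: {cost:,.0f}")
```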

Spare Parts Optimisation

Determine optimal stock levels to balance cost against service level.

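  1. Estimate the mean demand (λ) over the replenishment lead time from historical consumption (annual demand × lead time in years)
  2. Choose a target service level (e.g. 95% for standard parts, 99% for critical parts)
  3. Use the Poisson CDF to find the smallest stock quantity whose cumulative probability meets the service level
  4. Review periodically as equipment ages and demand changes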
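
Step 3 can be sketched with only the standard library. This assumes demand over the replenishment period is Poisson with mean `lam`; the function names are hypothetical.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for a Poisson random variable with mean lam."""
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i    # P(X = i) derived from P(X = i - 1)
        total += term
    return total


def min_stock(lam: float, service_level: float) -> int:
    """Smallest stock level s with P(demand <= s) >= service_level."""
    s = 0
    while poisson_cdf(s, lam) < service_level:
        s += 1
    return s


print(min_stock(2.0, 0.95))  # 5
print(min_stock(2.0, 0.99))  # 6
```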

Advanced Reliability Analysis

Reliability R(t) Calculator

Calculate the probability of surviving a mission time without failure.

  1. Input MTBF or failure rate λ, and the mission duration t
  2. R(t) = e^(−λt); this assumes a constant (exponential) failure rate
  3. Use to set reliability targets for new designs
  4. For degradation or wear-out, switch to Weibull analysis
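
A minimal sketch of R(t) under the constant-failure-rate assumption, with λ = 1/MTBF. The MTBF value is hypothetical.

```python
import math


def reliability(t: float, failure_rate: float) -> float:
    """R(t) = exp(-lambda * t); valid only for a constant failure rate."""
    return math.exp(-failure_rate * t)


mtbf_h = 1000.0          # hypothetical MTBF
lam = 1 / mtbf_h
print(round(reliability(100, lam), 4))     # 0.9048
print(round(reliability(mtbf_h, lam), 4))  # 0.3679
```

The second line is a useful sanity check: with a constant failure rate, only about 36.8% of units survive to t = MTBF.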

System Reliability

Model complex systems with series, parallel, and k-of-n configurations.

  1. Draw a reliability block diagram of your system
  2. Series: Rs = R₁ × R₂ × … (all must work)
  3. Parallel: Rs = 1 − (1 − R₁)(1 − R₂)… (redundancy improves Rs)
  4. Use k-of-n for voting systems (e.g. at least 2 of 3 engines must work)
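
The three configurations sketched in Python, assuming independent blocks (and identical blocks for k-of-n). The function names and the example diagram are hypothetical.

```python
from math import comb, prod


def series(rs: list[float]) -> float:
    """All blocks must work: Rs = R1 x R2 x ..."""
    return prod(rs)


def parallel(rs: list[float]) -> float:
    """At least one block must work: Rs = 1 - (1 - R1)(1 - R2)..."""
    return 1 - prod(1 - r for r in rs)


def k_of_n(k: int, n: int, r: float) -> float:
    """At least k of n identical, independent blocks (each with reliability r) must work."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


# Hypothetical RBD: a pump (0.95) in series with two redundant valves (0.90 each)
print(round(series([0.95, parallel([0.90, 0.90])]), 4))  # 0.9405
print(round(k_of_n(2, 3, 0.90), 4))                       # 0.972
```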

Failure Rate Analysis

Determine equipment failure rate and its relationship to reliability.

  1. λ = Failures ÷ Total Operating Hours
  2. Convert to FIT (failures per 10⁹ device-hours: FIT = λ × 10⁹, with λ in failures per hour) for electronics
  3. λ = 1 / MTBF when the failure rate is constant; either value can be used in the calculators
  4. Plot failure rate over time to identify the bathtub-curve phase
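
Steps 1 to 3 as a short Python sketch. The fleet figures are hypothetical, and the MTBF conversion assumes a constant failure rate.

```python
def failure_rate(failures: int, operating_hours: float) -> float:
    """lambda in failures per hour."""
    return failures / operating_hours


def to_fit(lam_per_hour: float) -> float:
    """FIT = failures per 1e9 device-hours."""
    return lam_per_hour * 1e9


# Hypothetical fleet: 3 failures across 1,000 boards running 5,000 h each
lam = failure_rate(3, 1000 * 5000)
print(f"lambda = {lam:.1e} /h, FIT = {to_fit(lam):.0f}, MTBF = {1 / lam:,.0f} h")
```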

Weibull Analysis

Fit failure data to the Weibull distribution for life prediction.

  1. Collect time-to-failure data (at least 10–20 data points)
  2. Estimate β (shape) and η (scale) by fitting the distribution to the failure data
  3. β < 1 = infant mortality, β = 1 = random, β > 1 = wear-out
  4. Use η and β to predict warranty returns, spare-part needs, and replacement schedules
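
Step 2 can be done in several ways; the sketch below uses median-rank regression (Bernard's approximation, regressing on Y), one common choice. Maximum-likelihood fitting is a frequently preferred alternative. It assumes complete data with no suspended units, and the function name and bearing data are hypothetical.

```python
import math


def weibull_mrr(times: list[float]) -> tuple[float, float]:
    """Estimate Weibull (beta, eta) by median-rank regression. Complete data only."""
    t = sorted(times)
    n = len(t)
    if n < 2:
        raise ValueError("need at least two failure times")
    xs, ys = [], []
    for i, ti in enumerate(t, start=1):
        f = (i - 0.3) / (n + 0.4)              # Bernard's median-rank approximation
        xs.append(math.log(ti))
        ys.append(math.log(-math.log(1 - f)))  # ln(-ln(1-F)) = beta*ln(t) - beta*ln(eta)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx                           # slope of the line = shape
    eta = math.exp(x_bar - y_bar / beta)       # from intercept = -beta * ln(eta)
    return beta, eta


# Hypothetical bearing failures (hours), all units run to failure
beta, eta = weibull_mrr([410, 620, 780, 900, 1050, 1180, 1320, 1500, 1710, 2050])
frac_by_1000h = 1 - math.exp(-((1000 / eta) ** beta))  # Weibull CDF F(t)
print(f"beta={beta:.2f}, eta={eta:.0f} h, F(1000 h)={frac_by_1000h:.1%}")
```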

FMEA — Failure Mode & Effects Analysis

Proactively identify and prioritise failure risks before they occur.

  1. List all failure modes for each component or function
  2. Score Severity (1–10), Occurrence (1–10), and Detection (1–10)
  3. RPN = S × O × D; prioritise items with RPN > 100
  4. Develop corrective actions targeting the highest-scoring factor
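
A small Python sketch of steps 2 to 4: compute RPN, keep items above the threshold of 100, and name the highest-scoring factor for each. The failure modes and scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1-10
    occurrence: int  # 1-10
    detection: int   # 1-10 (10 = hardest to detect)

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection


modes = [
    FailureMode("Seal leak", 7, 5, 4),
    FailureMode("Motor overheating", 8, 3, 3),
    FailureMode("Coupling misalignment", 5, 6, 5),
]

priorities = [(m.name, m.rpn) for m in sorted(modes, key=lambda m: m.rpn, reverse=True) if m.rpn > 100]
for m in modes:
    if m.rpn > 100:
        # The factor with the highest score is the first target for corrective action
        factor = max(("Severity", m.severity), ("Occurrence", m.occurrence),
                     ("Detection", m.detection), key=lambda kv: kv[1])
        print(f"{m.name}: RPN={m.rpn}, target {factor[0]}")
```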

Gage R&R

Validate that your measurement system is reliable and repeatable.

  1. Select 2–3 operators and 10 parts; each operator measures each part twice
  2. Analyse repeatability (within-operator variation) and reproducibility (between-operator variation)
  3. %GRR < 10% = acceptable, 10–30% = marginal, > 30% = unacceptable
  4. Address training, fixtures, or tool calibration depending on the dominant source of error
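
A sketch of the average-and-range method for %GRR (relative to total variation), using the AIAG MSA constants for 2 or 3 trials, 2 or 3 operators and 10 parts. The ANOVA method is a common alternative. The function name and study data are hypothetical, and the K3 table would need extending for other part counts.

```python
import math
from statistics import mean

# Average-and-range constants (AIAG MSA), keyed by count
K1 = {2: 0.8862, 3: 0.5908}   # trials per part
K2 = {2: 0.7071, 3: 0.5231}   # operators
K3 = {10: 0.3146}             # parts


def grr_percent(data: list[list[list[float]]]) -> float:
    """%GRR; data[o][p] holds operator o's repeated readings of part p."""
    n_ops, n_parts, n_trials = len(data), len(data[0]), len(data[0][0])
    # Repeatability (EV): average within-cell range
    ev = mean(max(cell) - min(cell) for op in data for cell in op) * K1[n_trials]
    # Reproducibility (AV): spread of operator averages, corrected for EV
    op_means = [mean(x for cell in op for x in cell) for op in data]
    av_sq = ((max(op_means) - min(op_means)) * K2[n_ops]) ** 2 - ev**2 / (n_parts * n_trials)
    av = math.sqrt(av_sq) if av_sq > 0 else 0.0  # AV is taken as 0 if negative
    grr = math.hypot(ev, av)
    # Part variation (PV): range of part averages
    part_means = [mean(x for op in data for x in op[p]) for p in range(n_parts)]
    pv = (max(part_means) - min(part_means)) * K3[n_parts]
    return 100 * grr / math.hypot(grr, pv)


# Hypothetical study: 2 operators x 10 parts x 2 trials; operator B reads 0.05 high
true_sizes = [10.0 + 0.1 * p for p in range(10)]
study = [
    [[s, s] for s in true_sizes],                # operator A
    [[s + 0.05, s + 0.05] for s in true_sizes],  # operator B
]
print(f"%GRR = {grr_percent(study):.1f}%")  # roughly 12%: marginal
```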

Quality & Statistical Tools

Process Capability (Cp/Cpk)

Assess whether your process meets customer specification limits.

  1. Collect ≥ 30 data points from a stable, in-control process
  2. Cp = (USL − LSL) / 6σ: potential capability
  3. Cpk = min[(USL − μ) / 3σ, (μ − LSL) / 3σ]: actual capability, which accounts for how well the process is centred
  4. Target Cpk ≥ 1.33 for standard dimensions and ≥ 1.67 for critical dimensions
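
A minimal sketch of both indices. For simplicity it uses the overall sample standard deviation as σ; strictly, that yields Pp/Ppk, and Cp/Cpk should use a within-subgroup estimate such as R̄/d₂. The function name and data are hypothetical.

```python
from statistics import mean, stdev


def cp_cpk(data: list[float], lsl: float, usl: float) -> tuple[float, float]:
    """Cp and Cpk, using the sample standard deviation as the sigma estimate."""
    mu, sigma = mean(data), stdev(data)
    cp = (usl - lsl) / (6 * sigma)
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)  # distance to the nearer limit
    return cp, cpk


# Hypothetical shaft diameters (mm) against a 9.5-10.5 mm specification
diameters = [9.9, 10.0, 10.1] * 10
cp, cpk = cp_cpk(diameters, lsl=9.5, usl=10.5)
print(f"Cp={cp:.2f}, Cpk={cpk:.2f}")
```

Shifting the same data off-centre leaves Cp unchanged but lowers Cpk, which is exactly the distinction between potential and actual capability.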

DPMO & Sigma Level

Express process quality as a sigma level for Six Sigma benchmarking.

  1. Define what counts as a defect and what counts as an opportunity
  2. DPMO = (Defects ÷ (Units × Opportunities)) × 1,000,000
  3. Look up the DPMO on a sigma table to get your sigma level
  4. 6σ = 3.4 DPMO (using the conventional 1.5σ shift). Most processes start at 3–4σ (66,807–6,210 DPMO)
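
Instead of a printed sigma table, the lookup can be sketched with the inverse normal CDF from Python's `statistics.NormalDist`, adding the conventional 1.5σ shift. This reproduces the table values quoted above. Function names and the sample counts are hypothetical; DPMO must be strictly between 0 and 1,000,000.

```python
from statistics import NormalDist


def dpmo(defects: int, units: int, opportunities: int) -> float:
    return defects / (units * opportunities) * 1_000_000


def sigma_level(dpmo_value: float, shift: float = 1.5) -> float:
    """Short-term sigma level: z for the long-term yield plus the 1.5-sigma shift."""
    return NormalDist().inv_cdf(1 - dpmo_value / 1_000_000) + shift


# Hypothetical audit: 27 defects in 500 units with 8 opportunities each
d = dpmo(defects=27, units=500, opportunities=8)
print(round(d), round(sigma_level(d), 2))
```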

Sample Size Calculator

Determine statistically valid sample sizes for studies and audits.

  1. Define the confidence level (95% or 99%) and the acceptable margin of error
  2. Estimate the population standard deviation, if known
  3. For proportions, use p = 0.5 if unknown; this gives the largest (most conservative) n
  4. Increase n if the process is critical or errors would be costly
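
The standard formulas, n = (zσ/E)² for a mean and n = z²p(1 − p)/E² for a proportion, sketched in Python. This assumes a large population (no finite-population correction); the function names are hypothetical.

```python
import math
from statistics import NormalDist


def z_for(confidence: float) -> float:
    """Two-sided critical value, e.g. about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def n_for_mean(sigma: float, margin: float, confidence: float = 0.95) -> int:
    return math.ceil((z_for(confidence) * sigma / margin) ** 2)


def n_for_proportion(margin: float, confidence: float = 0.95, p: float = 0.5) -> int:
    return math.ceil(z_for(confidence) ** 2 * p * (1 - p) / margin**2)


print(n_for_proportion(0.05))        # 385
print(n_for_proportion(0.05, 0.99))  # 664
print(n_for_mean(sigma=2.0, margin=0.5))  # 62
```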

Control Chart (SPC)

Distinguish common cause from special cause variation in real time.

  1. Collect 20–25 subgroups (n = 4 or 5 per subgroup) from a stable process
  2. Calculate X̄̄, R̄, and the control limits using standard constants
  3. Plot ongoing data and investigate any point outside the control limits
  4. Do NOT adjust the process in response to common-cause variation alone
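
An X̄-R chart sketch using the standard A2, D3 and D4 constants for subgroup sizes 4 and 5. It checks only the "point outside the limits" rule; other run rules (e.g. Western Electric) are not implemented. Names and data are hypothetical.

```python
from statistics import mean

# Standard X-bar/R constants: subgroup size -> (A2, D3, D4)
CONSTANTS = {4: (0.729, 0.0, 2.282), 5: (0.577, 0.0, 2.114)}


def xbar_r_limits(subgroups: list[list[float]]) -> dict[str, float]:
    a2, d3, d4 = CONSTANTS[len(subgroups[0])]
    xbarbar = mean(mean(s) for s in subgroups)        # grand mean
    rbar = mean(max(s) - min(s) for s in subgroups)   # average range
    return {
        "xbar_lcl": xbarbar - a2 * rbar, "xbar_ucl": xbarbar + a2 * rbar,
        "r_lcl": d3 * rbar, "r_ucl": d4 * rbar,
    }


def out_of_control(subgroup: list[float], limits: dict[str, float]) -> bool:
    """True if the subgroup mean or range falls outside its control limits."""
    xbar, r = mean(subgroup), max(subgroup) - min(subgroup)
    return not (limits["xbar_lcl"] <= xbar <= limits["xbar_ucl"]
                and limits["r_lcl"] <= r <= limits["r_ucl"])


# Hypothetical baseline of 20 subgroups of 5 readings
baseline = [[9.8, 10.1, 10.0, 10.2, 9.9]] * 20
limits = xbar_r_limits(baseline)
print(out_of_control([10.6, 10.8, 10.7, 10.9, 10.5], limits))  # True
```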

Pareto Chart

Apply the 80/20 rule to prioritise quality improvement efforts.

  1. List all problem categories and count their frequency or impact
  2. Sort in descending order and calculate the cumulative percentage
  3. Draw a bar chart with a cumulative line and identify the "vital few"
  4. Focus improvement actions on the vital few: the leftmost categories that together account for roughly 80% of the total
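
Steps 2 and 3 as a Python sketch. Here the "vital few" are taken as the smallest set of leading categories whose cumulative share reaches the cutoff, which is one common reading of the 80/20 rule. The defect counts are hypothetical.

```python
def pareto(counts: dict[str, int], cutoff: float = 80.0):
    """Return (category, count, cumulative %) rows and the 'vital few' categories."""
    total = sum(counts.values())
    rows, running = [], 0
    for cat, c in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += c
        rows.append((cat, c, 100 * running / total))
    vital = []
    for cat, _, cum in rows:
        vital.append(cat)
        if cum >= cutoff:
            break
    return rows, vital


# Hypothetical defect counts from one month of inspection
rows, vital = pareto({"Scratches": 42, "Dents": 25, "Misalignment": 18, "Wrong label": 9, "Other": 6})
for cat, c, cum in rows:
    print(f"{cat:<13}{c:>4}{cum:>7.1f}%")
print("Vital few:", vital)
```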

Fishbone Diagram

Systematically identify root causes using the 6M framework.

  1. Define the problem statement (the "head" of the fish)
  2. Brainstorm causes in 6 categories: Man, Machine, Method, Material, Measurement, Environment
  3. Use the "5 Whys" to drill deeper into each branch
  4. Vote on the most likely root causes and validate them with data

Histogram

Visualise data distribution to detect patterns, skew, and outliers.

  1. Collect continuous measurement data (≥ 50 readings preferred)
  2. Choose the number of bins, e.g. with Sturges' rule: k ≈ 1 + 3.3 × log₁₀(n)
  3. Look for a normal bell curve; skew or a bimodal shape signals issues
  4. Overlay specification limits to assess capability visually
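
A minimal sketch of Sturges binning. It uses the equivalent form k = 1 + log₂(n), rounded up, with equal-width bins spanning the data range. The function name is hypothetical.

```python
import math


def sturges_histogram(data: list[float]) -> list[tuple[float, float, int]]:
    """Return (bin_start, bin_end, count) for equal-width Sturges bins."""
    n = len(data)
    k = math.ceil(1 + math.log2(n))   # 1 + log2(n) ~ 1 + 3.3 * log10(n)
    lo, hi = min(data), max(data)
    width = (hi - lo) / k or 1.0      # guard against all-equal data
    counts = [0] * k
    for x in data:
        i = min(int((x - lo) / width), k - 1)  # the maximum goes in the last bin
        counts[i] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i]) for i in range(k)]


for start, end, count in sturges_histogram([float(i % 17) for i in range(60)]):
    print(f"{start:6.2f}-{end:6.2f} {'#' * count}")
```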

Scatter Diagram

Test and visualise correlation between two process variables.

  1. Collect paired (x, y) data across the full operating range
  2. Plot x against y; the visual pattern reveals the type of correlation
  3. Calculate Pearson's r: |r| > 0.7 = strong, |r| < 0.3 = weak
  4. Correlation ≠ causation; confirm with a designed experiment (DOE)
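
Pearson's r computed from its definition, with the thresholds from step 3. The "moderate" label for the band between 0.3 and 0.7 is an assumption added for completeness; the data are hypothetical.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Sample Pearson correlation coefficient."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def strength(r: float) -> str:
    a = abs(r)
    return "strong" if a > 0.7 else "weak" if a < 0.3 else "moderate"


# Hypothetical paired data: oven temperature (C) vs. coating hardness
temp = [180, 185, 190, 195, 200, 205]
hardness = [52, 55, 54, 58, 60, 61]
r = pearson_r(temp, hardness)
print(f"r = {r:.2f} ({strength(r)})")
```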

Check Sheet

Collect structured defect or event frequency data efficiently.

  1. Define the categories to track before data collection starts
  2. Record tally marks in real time during production or inspection
  3. Calculate the frequency and relative frequency of each category
  4. Feed the results into a Pareto chart for prioritisation
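
A digital check sheet can be sketched with `collections.Counter`. The category list is hypothetical; it is fixed before collection, and any unexpected entry is rejected so tallies stay comparable.

```python
from collections import Counter

CATEGORIES = ("Scratch", "Dent", "Misalignment", "Wrong label")  # defined up front


def tally(observations: list[str]) -> dict[str, tuple[int, float]]:
    """Count each predefined category and its relative frequency."""
    unknown = set(observations) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unexpected categories: {sorted(unknown)}")
    counts = Counter(observations)
    total = len(observations)
    return {c: (counts[c], counts[c] / total) for c in CATEGORIES}


# Hypothetical entries recorded during one inspection shift
shift_log = ["Dent", "Scratch", "Dent", "Wrong label", "Scratch", "Dent"]
for category, (count, share) in tally(shift_log).items():
    print(f"{category:<13}{'|' * count:<8}{count:>3}{share:>7.0%}")
```

The resulting counts are already in the category-to-count form a Pareto chart needs.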
