The state of a service may go up and down over time.

For each observation, uptime is the instant at which the service came up and downtime is the instant at which it next went down, so downtime is later (i.e. greater) than uptime. The difference (downtime minus uptime) is the length of time the service was operating between these two events.
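
As a minimal sketch of this subtraction, the snippet below computes the operating time for each observation. The timestamps and the names `observations` and `operating_time` are hypothetical, chosen only for illustration.

```python
from datetime import datetime

# Hypothetical observations for one service: (uptime, downtime) pairs.
observations = [
    (datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 10, 0)),
    (datetime(2024, 1, 1, 11, 0), datetime(2024, 1, 2, 7, 0)),
]


def operating_time(uptime, downtime):
    """Return how long the service ran between coming up and going down."""
    if downtime <= uptime:
        # By definition, downtime must be later than uptime.
        raise ValueError("downtime must be after uptime")
    return downtime - uptime


# One timedelta per observation.
durations = [operating_time(up, down) for up, down in observations]
for d in durations:
    print(d.total_seconds() / 3600, "hours of operation")
```

The guard reflects the definition above: an observation whose downtime is not later than its uptime is treated as invalid rather than yielding a zero or negative duration.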

To quantify dependability, we need to know the distribution (e.g. the mean) of the time to failure and of the time to restore: MTTF and MTTR.

  • Mean Time To Failure (MTTF) is a basic measure of reliability for non-repairable systems. It is the mean time expected until the first failure of a piece of equipment. MTTF is a statistical value and is meant to be the mean over a long period of time and a large number of units.
  • Mean Time To Restore (MTTR) is a basic measure of the maintainability of repairable items. It represents the average time required to repair a failed component or device. Expressed mathematically, it is the total corrective maintenance time divided by the total number of corrective maintenance actions during a given period of time. However, we are usually more interested in service restoration than in component repair (see “Responding to faults”).
  • Mean Time Between Failures (MTBF) is a reliability measure for repairable systems: the mean time from one failure to the next. (The related failure rate is often quoted in failures per million hours.) The MTBF is typically part of a model that assumes the failed system is immediately repaired, as part of a renewal process. This is in contrast to the mean time to failure, which measures the average time to failure under the modeling assumption that the failed system is not repaired (infinite repair time). When repair time is taken into account, MTBF = MTTF + MTTR.
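
The three definitions above can be illustrated with a short calculation. This is only a sketch under stated assumptions: the event log `cycles` is invented data, times are in hours, and each cycle is assumed to consist of one period of operation followed by one restoration.

```python
from statistics import mean

# Hypothetical event log for one repairable service, in hours since start.
# Each tuple is (went_up, went_down, restored).
cycles = [
    (0.0, 100.0, 102.0),
    (102.0, 250.0, 255.0),
    (255.0, 400.0, 402.0),
]

# Operating time before each failure (downtime minus uptime).
time_to_failure = [down - up for up, down, _ in cycles]

# Time taken to restore service after each failure.
time_to_restore = [restored - down for _, down, restored in cycles]

mttf = mean(time_to_failure)  # mean of 100, 148, 145
mttr = mean(time_to_restore)  # mean of 2, 5, 2
mtbf = mttf + mttr            # MTBF = MTTF + MTTR

print(f"MTTF = {mttf} h, MTTR = {mttr} h, MTBF = {mtbf} h")
```

With this data the sketch gives MTTF = 131.0 h, MTTR = 3.0 h and MTBF = 134.0 h. Three cycles are far too few for a meaningful estimate; as noted for MTTF, these are statistical values meant to be taken over a long period and a large number of units.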

