What is Priority Inversion? Why You Care and What to Do About It

By far the most pervasive mechanism to control the sequencing of tasks in time-constrained systems is priority scheduling. Priority scheduling means that whenever a task or thread is ready to execute, its priority relative to other threads determines the actual order of execution. Ideally, priorities in a time-constrained system should always reflect the time constraints for the tasks. A useful priority assignment heuristic is to assign higher priorities to tasks that are part of tighter time constraints. Thus, a task that needs to respond in a shorter time will get higher priority than a task that needs to respond in a longer time. This intuitive heuristic, known as monotonic scheduling, works well when the system is not overloaded. The more commonly known rate-monotonic scheduling is a special case of this heuristic for periodic tasks. (For detailed explanations of these scheduling mechanisms, see the excellent textbook Real-Time Systems by Jane W. S. Liu (Prentice Hall; ISBN 0130996513; 1st edition).

What Is Priority Inversion?

The most desirable state of a priority scheduler is that the system should always be executing the task with the highest priority. However, it is impossible for an operating system to invariably achieve this state. When circumstances within the system force a higher-priority task to wait for a lower-priority task, priority inversion occurs. This can lead to unbounded priority inversion (discussed in the next section), where a high-priority thread misses its deadline.

Priority inversion is sometimes necessary, and is often outside the control of the operating system developer. At other times, priority inversion might be a deliberate consequence of design decisions made by system developers, as an attempt to balance priority enforcement with other desirable characteristics, such as maintaining integrity.

An example of priority inversion that is typically outside of an operating system developer’s control is the result of the interrupt architecture in most modern machines. When an interrupt occurs, most operating systems will handle the interrupt immediately, regardless of the time constraint of the tasks depending on it. At least some of the processing is done at high priority, to find out which task is waiting for the interrupt.

Another example is when a low-priority task has locked a resource (such as a device or data structure), and a higher-priority task attempts to lock it, the higher-priority task will remain blocked until the low-priority task releases the lock.

Because every operating system experiences incidents of priority inversion, the issue is not the presence of priority inversion, but rather the duration of each cause of priority inversion. The duration of priority inversion is very important in assessing whether an operating system will be suitable for a particular application or system.

What Is Unbounded Priority Inversion?

In most operating systems, an application might actually suffer unbounded priority inversion. An operating system that has large unbounded priority inversions makes a real-time application highly susceptible to timing failures.

Unbounded priority inversion is defined as any situation in which the duration of priority inversion is dependent not only on the time required to handle a shared resource, but on the (unpredictable) actions of other unrelated tasks as well. For example, in 1997, NASA landed a rover robot (Pathfinder) on Mars. The landing was very successful, using a highly innovative “bouncing ball” structure that opened up after landing to release the robot. A few days later, however, the lander software determined erroneously that its internal integrity had been lost and initiated a system reset sequence, aborting that day’s operations. The system reset repeated the next day. After a heroic effort by the Jet Propulsion Laboratory (JPL) team that built the Pathfinder, the problem was traced to priority inversion. Detailed examination of the priority inversion on MARS Pathfinder and the resolution using priority inheritance can be found at http://research.microsoft.com/~mbj/Mars_Pathfinder/Mars_Pathfinder.html.

The Pathfinder lander software contained a number of tasks. This document considers three of those tasks:

periodically checks the health of the spacecraft systems and software.
processes image data.
performs an occasional test on equipment status.

In each period, resets a “Go/Nogo” hardware timer. It is assumed that the integrity of the lander software has been compromised if this timer ever times out. The processor is halted, all devices are reset, the software is completely reloaded, the spacecraft systems are tested, and the system starts over. This process does not complete until the next day.

Attempts to Lock

–Normal Execution	– Execution in Critical Section	– Unbounded Priority Inversion

The problem is that and share a common data structure, locking a mutex to control access to it. Thus, if locks the mutex, and tries to lock it, must wait. Every now and then, when has the mutex locked, and is waiting for it, runs because it has a higher priority than . Thus, must wait for both and to complete and fails to reset the timer before it expires.

Even though nothing was actually wrong, the JPL team (which built the Pathfinder) traced the system failure to an unbounded priority inversion after extensive analysis. The team solved the problem by enabling priority inheritance for the mutex.

Unbounded priority inversion happens in many circumstances. It can happen in a preemptible operating system kernel unless it is explicitly avoided (most operating systems are not protected). Device drivers frequently lock critical structures or disable interrupts for long time periods, resulting in unbounded priority inversion. IP packets are placed in FIFO (first in, first out) queues without regard to priority, resulting in unbounded priority inversion. Applications lock data structures using semaphores or mutexes, resulting in unbounded priority inversion.

Avoiding Unbounded Priority Inversion

While there are many known academic solutions to the problem of avoiding unbounded priority inversion, in practice two techniques are commonly used: priority ceiling emulation and disabling interrupts.

The priority ceiling emulation technique raises the priority of any locking thread to its priority ceiling, defined as the highest priority of any thread that ever uses that lock. This requires the developer to supply a priority ceiling for each lock. In contrast, priority inheritance raises the priority of a thread only when holding a lock causes it to block a higher-priority thread. When a higher-priority thread is blocked, the low-priority thread inherits the priority of the higher-priority thread it is blocking.

Kernel threads can also disable interrupts or preemption. Disabling interrupts or preemption is an effective locking technique that avoids unbounded priority inversion, but is of limited general applicability since it stops all other processing. It’s like turning all traffic lights red in a city whenever any car wishes to cross any intersection. However, because of its low overhead, it can work well for locks that are never held for more than a few instructions.

Priority Inheritance vs. Priority Ceiling Emulation

Both the priority inheritance protocol and the priority ceiling emulation protocol^* have strengths and weaknesses. When used correctly, each are useful components of a real-time designer’s toolbox.

Priority inheritance can be difficult in a complex environment; implementers not familiar with its nuances can easily make mistakes. However, there is ample evidence that it can also be used to great advantage. For example, in one jitter test (where jitter refers to the maximum deviation in the measured period of a periodic task), using an RTOS running on a heavily loaded 1.4GHz Pentium processor, the worst-case jitter dropped from 76,492 microseconds to 52 microseconds simply by enabling priority inheritance in the kernel. 1.4GHz is somewhat faster than is usually available in embedded systems, but even at this speed the worst-case jitter is still just over 70 milliseconds. Priority inheritance illustrates that faster hardware doesn’t make an application behave more predictably, but avoiding priority inversion does.

Priority ceiling emulation has certain desirable properties — it has good worst-case priority inversion control, it handles nested locks well, and can avoid deadlock in some cases. Priority inheritance can occasionally lead to poorer worst-case behavior when there are nested locks, and does not avoid deadlocks. However, most performance-critical applications and real-time operating systems minimize their use of nested locks, and there are other mechanisms to avoid deadlock when nesting cannot be avoided.

On the other hand, priority inheritance can be implemented so that there is no penalty when the locks are not contended, which covers the vast majority of time-constrained systems. This, in addition to the fact that many extra context switches are avoided, and medium-priority tasks are not preempted unnecessarily, leads to excellent average performance. In practical systems, including both hard and soft real-time systems, average performance is as important as worst-case response. In contrast, priority ceiling emulation will pay the cost of changing a thread’s priority twice regardless of whether there is contention for the lock or not, resulting in higher overhead and many unnecessary context switches and blocking in unrelated tasks.

Priority ceiling emulation is an attractive choice when the set of threads contending for the lock is known, when there may be nested locks, and when worst-case behavior is the only concern. On the other hand, priority inheritance is very effective when a lock is seldom part of a nested set, and when average performance is relevant in addition to worst-case performance.

Another important aspect to understand is that optimal use of priority ceiling emulation requires static analysis of the system to find the priority ceiling of each lock. While static analysis is highly desirable (and even necessary) for many applications with critical response requirements, it may be neither desirable nor cost-effective in many other applications in which only a few parts of the system are critical. Also, formal real-time analysis is primarily applicable to systems that are constructed according to a set of precise rules with limited dynamic operations. For systems that cannot be readily analyzed, priority inheritance is likely to be a more effective mechanism since it does not require static analysis.

Design and Analysis of Real-Time Systems

Priority inheritance and priority ceiling emulation are both effective and powerful techniques to prevent uncontrolled priority inversion when locks are used to protect critical sections in a real-time system. Real-time software designers must make intelligent decisions to use the appropriate technique, depending on their system requirements.

The solution to priority inversion problems starts with a sound architecture and design that must consider the decomposition of the system into tasks and shared resources, and how they impact the system’s ability to meet its timing constraints. Many performance problems can be solved by developing an architecture that avoids unnecessary coupling between tasks through shared resources.

Operating systems for performance-critical applications should provide a complete set of tools to manage priority inversion.

Protecting Resources

The foregoing discussion has concentrated on synchronization without creating unbounded priority inversion; it started with the supposition that locking is required to protect resources. Actually, it is important to remember that in a time-constrained application the best locking is no locking at all. There are four simple rules that can be applied for protecting resources:

Use non-locking atomic operations (such as a flip-buffer) to avoid locking. This approach is clearly optimal with respect to priority inversion.
Hold a lock (when it must be acquired) for as short a time as possible.
Ensure that you understand the limitations of each of the various methods for managing priority inversion when locks are being used.
Ensure that an operating system provides all the tools for managing priority inversion so you have the flexibility to meet a wide range of performance requirements.

Within the operating system, there are several things that can be done to handle priority inversion:

Make the OS kernel fully preemptible. When it needs to lock resources, the kernel’s locking should use mutexes that use priority inheritance to bound blocking within the kernel.
Prevent device drivers from disabling interrupts.
Run interrupt handlers at appropriate priorities, instead of arbitrarily high priorities (as most operating systems do).
Prioritize network packet processing according to the socket owner’s priority.

These features minimize or entirely prevent priority inversion, removing a major source of development delays for time-constrained software.

*Often incorrectly referred to as priority ceiling protocol. The original priority ceiling protocol, described in the paper “Priority Inheritance protocols: An approach to real-time synchronization,” by Sha, Lohoczky, and Rajkumar, is expensive to implement and not available in any commercial operating system. PCP emulation (also known as the highest locker protocol) is a simplified version of the original PCP protocol and is part of various standards such as POSIX (POSIX_PRIO_PROTECT), the Real-Time Specification for Java (RTSJ), OSEK, and the Ada 95 real-time specification. Priority inheritance is also part of standards such as POSIX (POSIX_PRIO_INHERIT) and the RTSJ.