τ₀
While separation kernel systems are typically configured statically, there are use cases which require runtime reconfiguration of the system. For Muen, this shall be made possible by the subject Tau0 (τ₀). In contrast to the kernel, Tau0 can configure the system in a non-realtime and non-parallel manner. This reduces complexity both in the kernel and the system configuration logic.
Among others, Tau0 allows to
-
add and remove subjects,
-
modify the page table structures of subjects, kernel instances and VT-d device domains,
-
assign device resources to a subject (interrupts, IO ports, device memory).
For this purpose, Tau0 accepts a command stream, i.e. a stream of instructions that describe how to modify the system’s state. Each command is checked for validity and against the system invariants. It is executed only if the checks are successful, so that each performed command leads again to a sound system state. The following listing shows an example command stream in XML format.
<tau0>
<commands>
<addProcessor id="0" apicId="42"/>
<addIoapic sid="16#f0f8#"/>
<addMemoryBlock address="16#0000#" size="262144"/>
<!-- Set up device.
We can create PCI devices (with B/D/F) and legacy devices. -->
<createPCIDevice device="0" bus="16#0#" dev="16#1#" func="1" usesMSI="false"/>
<activateDevice device="0"/>
<createPCIDevice device="1" bus="16#0#" dev="16#0#" func="0" usesMSI="false"/>
<!-- Add device IRQs, IO ports, memory. Device memory is marked
as reserved. -->
<addIRQDevice device="1" irq="33"/>
<addIOPortRangeDevice device="1" from="16#0000#" to="16#000a#"/>
<addMemoryDevice device="1" address="16#000a_0000#" size="32" caching="WC"/>
<activateDevice device="1"/>
<clearPage page="16#2300_0000#"/>
<clearPage page="16#2300_1000#"/>
<clearPage page="16#2300_2000#"/>
<!-- Allocate the VT-d table structures. -->
<createVTdRootTable page="16#2300_0000#"/>
<createVTdContextTable page="16#2300_1000#" bus="16#0#"/>
Tau0 has been formally proven to be free of run-time errors; moreover there is work in progress to prove that Tau0 does not violate system invariants (see below).
Modes
Tau0 can be executed in two basic modes:
-
In static mode, Tau0 runs at system integration time (i.e. most likely as a Linux process). In this mode, Tau0 generates a system image from the command stream, which is written to disk and can be booted later on.
-
In dynamic mode, Tau0 runs as a special Muen subject, and may, based on the command stream, modify the runtime state of the system.
This makes the following three scenarios possible:
-
Tau0 integrates the system in static mode, and the system cannot be modified at runtime.
-
Tau0 integrates a minimal Muen system in static mode. At startup, this system comprises only the Muen kernel and a dynamic instance of Tau0. The latter may start other subjects at runtime as described by an included static command stream.
-
As above, but Tau0 is allowed to start and stop other subjects at runtime. In this case, the command stream may originate from some controller subject.
Currently, only static mode is supported. |
Command Stream Phases
The command stream processing consists of two distinct phases: setup phase and running phase.
In the setup phase, the basic system configuration is communicated to Tau0. This comprises, i.a., the system hardware configuration. The data supplied in the setup phase is taken to be axiomatically true. This phase is exclusively executed in static mode.
In the running phase, the basic configuration received in the setup phase may no longer be modified. Commands from the command stream are processed in order to modify the state of the system. This phase is part of the dynamic, as well as the static mode.
It is worth reiterating, that in Tau0 static mode, the system state is fixed at integration time and cannot be changed later on. |
Data
Tau0 comprises the following runtime data.
Typed Memory
System Memory View
For the purpose of managing the system state, Tau0 has a view on the system memory. The latter is linearly mapped into Tau0’s virtual address space at some offset. In order to prevent and/or detect interference of Tau0 with active parts of the system, subsets of the system memory view can be unmapped from and remapped to Tau0’s virtual address space. We make sure that active pages are in general not accessible (i.e. mapped) by Tau0.
Memory Typization
For each 4KB page of the system memory, Tau0 stores its (basic) typization; i.e. the information whether this page is
-
Undefined — it is free to use and has no defined contents,
-
Zeroed — it is free to use and contains only zero bits,
-
Reserved — it may not be used, read, or written, or
-
Used — it is currently in use, e.g. as a page table, part of a memory region, or otherwise.
The basic typization of a page may be refined to a full typization — for example, the full typization of some page might reveal that the page in question is not just "Used", but a level 2 EPT page table of some subject. Full typizations are not directly computed at runtime (they are SPARK Ghost values), but they figure extensively in contracts and proofs.
Private Pages
There is also the concept of Tau0 private pages: these are pages mapped at an offset that is greater than the maximal physical address supported by the x86 CPU. They are used by static Tau0 to store internal data structures which are not written to the system image. See below for more details.
Devices
Tau0 stores information on the built-in devices of the system. This information comprises the PCI bus/device/function number of the device, and the device-owned interrupts, IO ports, and memory.
The device information is specified in the command stream’s setup phase.
Before the setup phase of Tau0’s command stream can be completed, all created devices must have been activated.
Root Objects
Tau0 also keeps a table of root objects. A root object (or briefly root) is an entity in the system that possesses an own virtual address space. Currently, this comprises
-
Instances of the Muen kernel (one for each CPU)
-
Subjects
-
Memory regions, and
-
VT-d device domains.
Every root object may be active or inactive. An active root object is currently running in the system and may not be modified by Tau0. However, Tau0 can modify the properties, page tables, etc., of inactive root objects.
Just as a root object may be active or inactive, the same holds for the pages and page tables associated to the root object. A page (table) may only be modified by Tau0 if it is inactive. Moreover, a page table may only be active if all pages or page tables it points to are also active.
The fact whether a page (table) P is active is stored in an unused bit of the
PTE that points to P.[1] If P is a top-level page
table, then the information is instead stored within the corresponding root
object (flag PTP_Active
).
The lifecycle of a root object is described in the following figure.
unused<--------
| ^ |
Create (properties) | | Destroy |
v | |
setup teardown (modify root and its page tables)
| ^
Lock | | Unlock
v |
locked ((de)activate page tables)
| ^
Activate | | Deactivate
v |
active (may not modify root)
Root Object States
Creation
A root object is created and initially configured with a command of the form
Create_*
. Depending on the type of the root object, the command expects
different parameters.
Destruction
A root object whose virtual address space is empty (i.e. its page table pointer is not present) and which is assigned no system resources (e.g. hardware interrupts etc.), can be destroyed.
Page Table Modification
In states "Setup" and "Teardown", Tau0 may modify
-
the root object’s page tables, and
-
the resources (hardware interrupts etc.) assigned to the root object.
Page tables are created in a top-down manner: first, the top-level page table for the root object is created (its address is stored in the root’s page table pointer), then the page tables pointed to by the root’s top-level page table, and so on.
On the other hand, page tables are destroyed in a bottom-up manner: a page table may only be destroyed if none of its entries are present.
Activation/Deactivation
In state "Locked", the page tables of the root object may be activated or deactivated. If a root’s whole accessible address space has been activated, the root may be activated itself. If the root’s address space is empty, the root can be unlocked.
Activation and deactivation of a root object may involve particular operations depending on the kind of the root. For example, when a device domain is (de)activated, Tau0 modifies the system VT-d tables accordingly.
In dynamic Tau0 mode, when a page is activated, it is unmapped from Tau0’s virtual address space. Thus, we ensure that active pages cannot be modified by Tau0. Moreover, we avoid inconsistencies regarding the caching mode (PAT) of the pages. The page is mapped back to Tau0’s virtual address space as soon as it is deactivated.
As an exception to the above, page tables of memory regions (which conceptually belong to Tau0 and are not accessible by other subjects) are never unmapped from Tau0’s virtual address space. This allows us to perform page translation on a memory region even if it is active.[2] |
Root Object Kinds
We describe the different kinds of Tau0 root objects.
Memory Regions
The purpose of a memory region is to collect an arbitrary number of page
frames. The virtual address space of a memory region is always of the form {0,
…, n * PAGE_SIZE
- 1} for some non-negative integer n. There are
commands to append an unused page to a memory region, and to remove (truncate)
the uppermost page of a memory region. Note that although the virtual address
space of a memory region is a contiguous interval, the mapped physical pages
need not be.[3]
Memory regions can be attached to other root objects. When a memory region has been attached to a root R, its page frames may be mapped into the address space of R. It is in this way that we may load programs into memory, create shared memory channels between subjects, and so on. Pages of a memory region may only be mapped if the memory region is active, and the memory region must stay active as long as some of its pages are mapped.
If Tau0 acts merely as a system integrator, the page tables of memory regions are stored in Tau0 private memory, as they will not be required at system runtime.
Subjects
As described above, we may map pages from a memory region to a subject. Moreover, we can assign device resources to a subject (if allowed by policy). Furthermore specific access rights (e.g. access to model-specific-registers) may be granted to subjects by adapting the respective VM management structures accordingly.
Finally execution of subjects may be controlled by adapting one or more scheduling plans.
Kernels
Instances of the Muen kernel (one for each system CPU) are set up similarly to subjects during system integration time. In contrast to subjects, kernels cannot be modified at runtime (as they cannot be deactivated).
VT-d Device Domains
The Intel VT-d system allows to present devices with a virtual address space. For this, a device is assigned to a device domain by setting an entry in a VM table called VT-d context table. This entry contains a page table pointer, as well as an ID for the device domain the device is associated to. Each entry that is associated to the same VT-d device domain shall have the same page table pointer.
Tau0 abstracts from this by viewing the VT-d device domains as root objects. Devices can be added to / removed from a device domain root object in state Setup or Teardown. On activation of a device domain root object, the corresponding entries in the VT-d context tables are written.
Full Typization
As described above, the full typization of a page is a refinement to its basic typization. Currently, only the basic typization "Used" has non-trivial refinements. These are the following. For each full typization, we give a sufficient condition for when it holds.
-
EPT4
,EPT3
,EPT2
,EPT1
,where
EPTi
means that the page is an EPT page table of level i (with indices decreasing towards the leaves of the tree).Condition: There is a subject with EPT paging such that the page is reachable from the subject in (5 - i) steps.
-
IA32e_PT4
,IA32e_PT3
,IA32e_PT2
,IA32e_PT1
,as above, but for IA32e page tables.
Condition: There is a subject with IA32e paging such that the page is reachable from the subject in (5 - i) steps.
-
MPT4
,MPT3
,MPT2
,MPT1
,as above, but for the page tables of memory regions.
Condition: There is a memory region with root level k ∈ {1, …, 5} such that the page is reachable from the memory region in (k - i) steps.
-
VTd_PT4
,VTd_PT3
,VTd_PT2
,VTd_PT1
,as above, but for the page tables of VT-d device domains.
Condition: There is a device domain with root level k ∈ {4,5} such that the page is reachable from the domain in (k - i) steps.
-
Tau0_PT4
,Tau0_PT3
,Tau0_PT2
,Tau0_PT1
,the page tables of Tau0 itself (note that Tau0 is able to access them in dynamic mode only).
Condition: The page is reachable from Tau0’s PTP in (5 - i) steps.
-
MR_Page
,the (level-0) page frames of a memory region. Note that these may also be reachable from other root objects if they have been mapped to them.
Condition: There is a memory region with root level k ∈ {1, …, 5} from which the page is reachable in k steps.
-
Device_Page
,a page which is owned by some device.
Condition: There is a device which owns a memory interval that contains the page.
-
IO_Bitmap_Low
,IO_Bitmap_High
,i.e. the page is one of the two IO bitmaps associated to some subject.
Condition: There is a subject whose
IO_Bitmap_Low
orIO_Bitmap_High
address is equal to the page. -
MSR_Bitmap
,i.e. the page is the MSR bitmap associated to some subject.
Condition: There is a subject whose
MSR_Bitmap
address is equal to the page. -
VTd_Root_Table
,the page is the VT-d root table (see the section on device domains).
Condition: The VT-d root table is present and its address is equal to the page.
-
VTd_Context_Table
,the page is a VT-d context table — i.e. it is referenced in a present entry of the VT-d root table.
Condition: The VT-d root table is present and it has a present entry that refers to the page.
-
VTd_IR_Table
,the page is the VT-d interrupt remapping table.
Condition: The VT-d IR table is present and its address is equal to the page.
Lemmas are provided (currently in Data.Memory.Full_Typization
) which can be
used to prove that a page is of a certain full typization. Thus, the
precondition of a procedure which operates, e.g., only on page tables, can be
shown.
System Properties
Invariants
We list the following invariants. Some are rather technical, while others are important for Tau0’s security assurances.
-
Whenever a root object M has been assigned to another root object R, M must be of kind memory region and active.
-
No two distinct kernels may be assigned to the same CPU
-
Each device assigned to a subject is active
-
Each device added to a device domain is active, a PCI device, and its VT-d context table has been created.
-
If a page P has been mapped into the virtual address space of a root R, then
-
either P is a memory region page (full typization
MR_Page
) and R is not a memory region, and the memory region of P has been assigned to R, or -
P is a device page (full typization
Device_Page
) and R is a subject (only they may refer device pages), and the device that owns P has been assigned to R.
-
-
The full typization of every 4K page must be unique. That is, at most one of the conditions listed above is satisfied, and at most by one assignment (i.e. no two distinct subjects may own the same page table etc.)
Note that this condition precludes "sharing" of page tables and pages between root objects, except for mapped memory regions and device memory.
-
Descendants of active roots/page tables are active.
-
Dynamic mode: Let P be a page with full typization
EPTi
,PTi
,VTd_PTi
,MR_Page
, orDevice_Page
. Then P is accessible iff P is not active.Since a subject can only be activated if all its pages are active, this ensures that Tau0 cannot modify the memory of running subjects.
Verification
Isabelle Model
Up to now, verification of Tau0 is entirely in SPARK. Properties of page translation, accessibility, and full typization are axiomatized by Ghost procedures.
Later, these axioms will be proven with an Isabelle model of Tau0.
Invariant Preservation
Currently, there remain several procedures whose invariants cannot be
proven. This is due to the fact that the invariants depend on abstract state
Data.Memory.Memory_State
. So whenever there is a write to memory (regardless
whether the memory location is relevant to the invariant), the invariant breaks.
This will be addressed in an upcoming release, by providing lemma procedures which specify the memory locations the respective invariants depend on. For example, one of these lemmas might specify that the invariant for VT-d device domains is preserved whenever a memory write does not affect the VT-d root and context tables.
Compilation
Execute make
in the working directory of Tau0 to compile the binary and
perform proofs.
Using a current Pro version of GNATProve (20.0w, 20190216), 99% of all generated verification conditions are discharged. Results may vary depending on the toolchain. In particular, the version of GNATProve bundled with GNAT Community 2019 proves only 97% of them.
Source Code Structure
-
Tau0
-
Data
-
Roots
Represent root objects (subjects, device domains, kernel instances, memory regions). Package contains accessor functions.
-
Writers
This package contains operations which modify root objects. We use a child package to avoid circular package dependencies.
-
-
Memory
Represent system memory and its typization
-
VTd_IOMMU
Represent VTd IOMMU tables
-
Writers
Operations on VTd IOMMU tables
-
-
Interrupt_Remapping
Represent VTd interrupt remapping tables
-
-
Devices
Represent devices present in the system
-
Writers
Operations to add devices, as well as device memory, interrupts, IO ports. The operations may only be called in the command stream’s setup phase.
-
-
-
Commands
Tau0 commands — generally call procedures in package Data. Here, we check the preconditions of the called procedures.
-
Command_Stream
Provide a stream of commands. In static mode, the commands are extracted from an XML file. In dynamic mode, they will be read from a channel.
-
Command_Processor
Read from
Command_Stream
and dispatch toCommands
.
-