Changes

3,588 bytes added, 20:54, 12 August 2018
no edit summary
Line 1: Line 1:  
== Summary ==
 
== Summary ==
   −
XSAVE stands for similar x86 instruction [https://hjlebbink.github.io/x86doc/html/XSAVE.html xsave] which places extended processor state into the memory area. The saving can be initiated by any userspace application, where the size of the frame being written to memory upon the instruction depends on processor capabilities and may vary between different models. This aspect may cause a problem in a case of images migration.
+
XSAVE refers to the x86 instruction of the same name, [https://hjlebbink.github.io/x86doc/html/XSAVE.html <code>xsave</code>], which stores extended processor state into a memory area. Any userspace application can execute it at any moment, and the size of the memory frame it writes depends on processor features and may vary between models. Thus, if checkpoint and restore are done on different processors, the next call to <code>xsave</code> may corrupt memory if the frame sizes do not match.
 +
 
 +
=== Helpers ===
 +
 
 +
This page refers to the following helpers:
 +
 
 +
static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
 +
unsigned int *ecx, unsigned int *edx)
 +
{
 +
/* ecx is often an input as well as an output. */
 +
asm volatile("cpuid"
 +
    : "=a" (*eax),
 +
      "=b" (*ebx),
 +
      "=c" (*ecx),
 +
      "=d" (*edx)
 +
    : "0" (*eax), "2" (*ecx)
 +
    : "memory");
 +
}
 +
 
 +
static inline void cpuid(unsigned int op,
 +
unsigned int *eax, unsigned int *ebx,
 +
unsigned int *ecx, unsigned int *edx)
 +
{
 +
*eax = op;
 +
*ecx = 0;
 +
native_cpuid(eax, ebx, ecx, edx);
 +
}
 +
 
 +
static inline void cpuid_count(unsigned int op, int count,
 +
      unsigned int *eax, unsigned int *ebx,
 +
      unsigned int *ecx, unsigned int *edx)
 +
{
 +
*eax = op;
 +
*ecx = count;
 +
native_cpuid(eax, ebx, ecx, edx);
 +
}
 +
 
 +
=== Frame size ===
 +
 
 +
Run <code>cpuid(0x1, &eax, &ebx, &ecx, &edx)</code>. If <code>xsave</code> is supported, bits 26 and 27 are both set in <code>ecx</code> (strictly speaking, bit 27 is reserved for the operating system, which can clear it to indicate that the instruction is disabled).
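
The check above can be sketched as follows. This is a minimal illustration assuming an x86 GCC or Clang toolchain. The names <code>xsave_usable</code> and <code>cpu_has_xsave</code> are hypothetical and not taken from CRIU. The bit test is kept apart from the <code>cpuid</code> call so that it can be exercised with arbitrary register values.

```c
#include <stdint.h>

#define CPUID_1_ECX_XSAVE   (1u << 26) /* processor implements xsave */
#define CPUID_1_ECX_OSXSAVE (1u << 27) /* operating system has enabled xsave */

/* Returns 1 only when both bits are set in the ecx value of cpuid leaf 0x1. */
static inline int xsave_usable(uint32_t ecx)
{
	const uint32_t need = CPUID_1_ECX_XSAVE | CPUID_1_ECX_OSXSAVE;

	return (ecx & need) == need;
}

#if defined(__x86_64__) || defined(__i386__)
/* Hypothetical wrapper: queries the running processor (x86 only). */
static inline int cpu_has_xsave(void)
{
	uint32_t eax = 0x1, ebx, ecx = 0, edx;

	__asm__ volatile("cpuid"
			 : "+a" (eax), "=b" (ebx), "+c" (ecx), "=d" (edx));
	return xsave_usable(ecx);
}
#endif
```

A caller would typically test <code>cpu_has_xsave()</code> once before querying leaf <code>0xd</code>.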
 +
 
 +
After that, the maximal frame size that applications may use can be fetched via <code>cpuid_count(0xd, 0, &eax, &ebx, &ecx, &edx)</code>. On return, <code>ebx</code> contains the size needed for the currently enabled components of the frame and <code>ecx</code> contains the maximal frame size. Maximal here means the size needed when all components are enabled (the OS may disable some components).
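
A minimal sketch of reading both sizes, again assuming an x86 GCC or Clang toolchain. The structure <code>xsave_sizes</code> and the functions <code>xsave_sizes_decode</code> and <code>xsave_sizes_read</code> are illustrative names only and do not come from CRIU.

```c
#include <stdint.h>

struct xsave_sizes {
	uint32_t enabled; /* ebx: bytes needed for currently enabled components */
	uint32_t max;     /* ecx: bytes needed when all components are enabled */
};

/* Interprets the ebx/ecx values returned by cpuid leaf 0xd, sub-leaf 0. */
static inline struct xsave_sizes xsave_sizes_decode(uint32_t ebx, uint32_t ecx)
{
	struct xsave_sizes s;

	s.enabled = ebx;
	s.max = ecx;
	return s;
}

#if defined(__x86_64__) || defined(__i386__)
/* Hypothetical wrapper: reads the sizes from the running processor. */
static inline struct xsave_sizes xsave_sizes_read(void)
{
	uint32_t eax = 0xd, ebx, ecx = 0, edx;

	__asm__ volatile("cpuid"
			 : "+a" (eax), "=b" (ebx), "+c" (ecx), "=d" (edx));
	return xsave_sizes_decode(ebx, ecx);
}
#endif
```

Sizing a buffer from <code>max</code> rather than <code>enabled</code> is the conservative choice, since it stays large enough even if more components get enabled later.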
 +
 
 +
=== Enumerating frame components ===
 +
 
 +
To enumerate which components of the frame are available, execute <code>cpuid_count(0xd, 0, &eax, &ebx, &ecx, &edx)</code>. Each component the processor supports has its bit set to 1 in the 64-bit mask <code>eax + ((uint64_t)edx << 32)</code>.
 +
 
 +
The current list of known components is as follows (numbers are bit positions):
 +
 
 +
* <code>0</code>: x87 floating point registers
 +
* <code>1</code>: SSE registers
 +
* <code>2</code>: AVX registers
 +
* <code>3</code>: MPX bounds registers
 +
* <code>4</code>: MPX CSR
 +
* <code>5</code>: AVX-512 opmask
 +
* <code>6</code>: AVX-512 ZMM_Hi256
 +
* <code>7</code>: AVX-512 Hi16_ZMM
 +
* <code>8</code>: Processor Trace
 +
* <code>9</code>: Protection Keys User registers
 +
* <code>10</code>: Hardware Duty Cycling
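
Decoding the mask can be sketched as below. The helper names are hypothetical, and the name table is an assumption of this sketch: it covers bit positions 0 to 9 only and uses the Intel SDM names for bits 6 and 7. The functions operate on plain register values, so they do not depend on the <code>cpuid</code> helpers.

```c
#include <stdint.h>
#include <stdio.h>

/* Index is the bit position in the mask; bits 0..9 only in this sketch. */
static const char *const xfeature_names[] = {
	"x87 floating point registers",
	"SSE registers",
	"AVX registers",
	"MPX bounds registers",
	"MPX CSR",
	"AVX-512 opmask",
	"AVX-512 ZMM_Hi256",
	"AVX-512 Hi16_ZMM",
	"Processor Trace",
	"Protection Keys User registers",
};

#define XFEATURE_NAMED (sizeof(xfeature_names) / sizeof(xfeature_names[0]))

/* Builds the 64-bit mask from the eax/edx values of cpuid(0xd, 0). */
static inline uint64_t xfeature_mask(uint32_t eax, uint32_t edx)
{
	return (uint64_t)eax | ((uint64_t)edx << 32);
}

/* Returns 1 if the given bit position is set in the mask. */
static inline int xfeature_present(uint64_t mask, unsigned int bit)
{
	return bit < 64 && ((mask >> bit) & 1);
}

/* Prints every named component found in the mask, returns how many. */
static inline unsigned int xfeature_print(uint64_t mask)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < XFEATURE_NAMED; i++) {
		if (!xfeature_present(mask, i))
			continue;
		printf("%2u: %s\n", i, xfeature_names[i]);
		n++;
	}
	return n;
}
```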
 +
 
 +
Once the bit mask is obtained, walk over each set bit and call <code>cpuid_count(0xd, component, &eax, &ebx, &ecx, &edx)</code>, where <code>component</code> is the bit position of interest. Sub-leaves 0 and 1 do not describe individual components (x87 and SSE state live in the fixed 512-byte legacy region at the start of the frame), so <code>component</code> ranges from 2 to 10. On return, <code>ebx</code> holds the offset of the component from the frame base address and <code>eax</code> holds the component size. Note that some components are supervisor components: if <code>(ecx & 1) == 1</code> after the <code>cpuid_count</code> call above, the component is a supervisor one and its offset should be ignored, while its size is still valid.
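
The walk over components can be sketched as follows. All names here are hypothetical. The sketch relies on three details from the Intel SDM rather than from this page: <code>ecx</code> bit 0 is 1 for a supervisor component, per-component sub-leaves start at 2, and the frame begins with a 512-byte legacy region followed by a 64-byte header.

```c
#include <stdint.h>

#define XSAVE_LEGACY_SIZE 512 /* x87 + SSE region at the frame start */
#define XSAVE_HDR_SIZE    64  /* header following the legacy region */

struct xcomponent {
	uint32_t size;   /* eax of cpuid(0xd, n) */
	uint32_t offset; /* ebx of cpuid(0xd, n), user components only */
	int supervisor;  /* ecx bit 0 of cpuid(0xd, n) */
};

/* Interprets the registers returned by cpuid(0xd, n) for n >= 2. */
static inline struct xcomponent xcomponent_decode(uint32_t eax, uint32_t ebx,
						  uint32_t ecx)
{
	struct xcomponent c;

	c.size = eax;
	c.supervisor = (int)(ecx & 1);
	c.offset = c.supervisor ? 0 : ebx; /* offset is not meaningful otherwise */
	return c;
}

/* Returns the end of the furthest user component, or the end of the header. */
static inline uint32_t xframe_end(const struct xcomponent *comps, unsigned int n)
{
	uint32_t end = XSAVE_LEGACY_SIZE + XSAVE_HDR_SIZE;

	for (unsigned int i = 0; i < n; i++) {
		if (comps[i].supervisor)
			continue;
		if (comps[i].offset + comps[i].size > end)
			end = comps[i].offset + comps[i].size;
	}
	return end;
}

#if defined(__x86_64__) || defined(__i386__)
/* Fills comps[] for every bit >= 2 set in mask, returns the entry count. */
static inline unsigned int xcomponent_read_all(uint64_t mask,
					       struct xcomponent *comps,
					       unsigned int max)
{
	unsigned int n = 0;

	for (unsigned int bit = 2; bit < 64 && n < max; bit++) {
		uint32_t eax = 0xd, ebx, ecx = bit, edx;

		if (!((mask >> bit) & 1))
			continue;
		__asm__ volatile("cpuid"
				 : "+a" (eax), "=b" (ebx), "+c" (ecx), "=d" (edx));
		comps[n++] = xcomponent_decode(eax, ebx, ecx);
	}
	return n;
}
#endif
```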
 +
 
 +
=== Potential memory corruption ===
 +
 
 +
When processes are dumped and restored on different CPUs, the application may have remembered the frame size somewhere inside its own code. In the worst scenario it allocates a buffer smaller than the frame needed on the new CPU, so the next call to <code>xsave</code> silently overwrites memory, leading to SIGSEGV in the best case.
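
The failure mode comes down to one comparison, sketched below. <code>xsave_buffer_fits</code> is a hypothetical helper, and the sizes in the usage note are made-up example values, not measurements.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 1 if a buffer allocated from a remembered frame size can still
 * hold the frame that the current processor writes.
 */
static inline int xsave_buffer_fits(size_t remembered_size, uint32_t current_size)
{
	return remembered_size >= current_size;
}
```

For instance, a size of 832 bytes remembered before the dump does not fit a processor that reports 2696 bytes after restore, and an <code>xsave</code> into that buffer would write past its end.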
 +
 
 +
The current CRIU implementation checks that the <code>cpuinfo</code> images are compatible: the frame size and the features are required to match. In turn, some OSes may mask some of the features with the cpuid faulting engine, but all CPUs in the pool should still report the same maximal frame size.
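
The strict match described above can be sketched as follows. The structure <code>xsave_info</code> and the function <code>xsave_info_compatible</code> are hypothetical illustrations. They do not reflect the actual layout of the <code>cpuinfo</code> image or CRIU's real checking code.

```c
#include <stdint.h>

/* Hypothetical record; not the actual cpuinfo image layout. */
struct xsave_info {
	uint64_t features; /* component bit mask */
	uint32_t max_size; /* maximal frame size */
};

/* Returns 1 only if both the feature mask and the maximal size match. */
static inline int xsave_info_compatible(const struct xsave_info *dumped,
					const struct xsave_info *host)
{
	return dumped->features == host->features &&
	       dumped->max_size == host->max_size;
}
```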
