Changes

2 bytes added ,  22:10, 13 February 2017
slight reformatting
Line 1: Line 1: −
[[Category: Development]]
+
== Compatible applications ==
[[Category: Under the hood]]
  −
[https://github.com/xemul/criu/issues/43 Github issue]
  −
 
  −
=== Compatible applications ===
      
On x86_64 there are two types of compatible applications:
 
Line 16: Line 12:  
In the following text, ''compatible'' and ''32-bit'' refer to ia32 applications unless otherwise specified.
   −
=== Difference between native and compat applications ===
+
== Difference between native and compat applications ==
    
From the CPU's point of view, 32-bit compatibility-mode applications differ from 64-bit applications by the current CS (code segment selector): if the L bit in the flags of the corresponding descriptor table entry is set, the CPU is in 64-bit mode while that segment descriptor is in use. There are some other differences between 32-bit and 64-bit selectors; one can read about them [https://www.malwaretech.com/2014/02/the-0x33-segment-selector-heavens-gate.html in the article "The 0x33 Segment Selector (Heavens Gate)"]. Code selectors for both modes are defined in kernel headers as <code>__USER32_CS</code> and <code>__USER_CS</code> and correspond to descriptors in the GDT (Global Descriptor Table). One can switch from 64-bit mode to compatibility mode by changing the CS value (e.g., with a far jump).
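The selector and L-bit relationship described above can be sketched in a few lines of Python. This is only an illustration: the selector values <code>0x23</code>/<code>0x33</code> and the raw descriptor values are assumptions based on the usual x86_64 Linux GDT layout, and the helper names are hypothetical.

```python
# Illustrative decoding of x86 segment selectors and GDT descriptors.
# All constants are assumptions about a typical x86_64 Linux kernel.
USER32_CS = 0x23  # assumed value of __USER32_CS
USER_CS = 0x33    # assumed value of __USER_CS

# Assumed raw 8-byte GDT entries for the two user code segments.
USER32_CS_DESC = 0x00CFFB000000FFFF
USER_CS_DESC = 0x00AFFB000000FFFF


def decode_selector(sel):
    """Split a selector into (GDT/LDT index, table indicator, RPL)."""
    return sel >> 3, (sel >> 2) & 1, sel & 3


def l_bit(desc):
    """L flag of a code segment descriptor (bit 53)."""
    return (desc >> 53) & 1


def db_bit(desc):
    """D/B flag of a segment descriptor (bit 54)."""
    return (desc >> 54) & 1


def cpu_mode(desc):
    """Mode the CPU runs in while this code descriptor is loaded in CS."""
    if l_bit(desc):
        return "64-bit"
    return "32-bit compat" if db_bit(desc) else "16-bit compat"


if __name__ == "__main__":
    for name, sel, desc in (("__USER32_CS", USER32_CS, USER32_CS_DESC),
                            ("__USER_CS", USER_CS, USER_CS_DESC)):
        print(name, decode_selector(sel), cpu_mode(desc))
```

The two descriptors differ only in the L and D/B flags, which is why changing CS is enough to switch modes.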
Line 23: Line 19:  
Both native and compat applications can make either 32-bit or 64-bit syscalls.
   −
=== Approaches to C/R compatible applications ===
+
== Approaches to C/R compatible applications ==
    
C/R of compatible applications can be done in several ways. This section describes the pros and cons of each, to explain why C/R of 32-bit tasks is done ''that'' way and not some other.
   −
==== Restore with exec() of 32-bit dummy binary vs from 64-bit CRIU ====
+
=== Restore with exec() of 32-bit dummy binary vs from 64-bit CRIU ===
    
Restore of a 32-bit application can be done with a daemon that runs in 32-bit mode and communicates with the CRIU binary (or with a 32-bit CRIU subprocess).
Line 42: Line 38:  
* will also need another daemon for x32
   −
==== Restore with a flag to sigreturn() or arch_prctl() ====
+
=== Restore with a flag to sigreturn() or arch_prctl() ===
    
The initial attempt at 32-bit C/R was rejected by the LKML community for a number of reasons. It would have swapped thread info flags (such as <code>TIF_ADDR32</code>/<code>TIF_IA32</code>/<code>TIF_X32</code>), unmapped the native 64-bit vDSO blob from the process's address space and mapped the compatible 32-bit vDSO, all according to some bit in the sigframe passed to <code>rt_sigreturn()</code> or to a dedicated <code>arch_prctl()</code> call.
Line 56: Line 52:  
After the discussion on LKML, the conclusion was to separate changing the personality (such as thread info flags) from the API for mapping vDSO blobs, to remove the <code>TIF_IA32</code> flag that distinguishes 32-bit from 64-bit tasks, and to look at the nature of the syscall instead: compat syscall, x32 syscall or native syscall.
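Deciding by the nature of the syscall rather than by a per-task flag can be modelled roughly as follows. This is a hypothetical sketch, not kernel code: the function name and the <code>entered_via_compat_path</code> argument are invented for illustration, and the value of <code>X32_SYSCALL_BIT</code> is assumed to match <code>__X32_SYSCALL_BIT</code> from the kernel's uapi headers.

```python
# Hypothetical model: classify one syscall instead of classifying the task.
X32_SYSCALL_BIT = 0x40000000  # assumed to match __X32_SYSCALL_BIT


def syscall_nature(entered_via_compat_path, nr):
    """Return "compat", "x32" or "native" for a single syscall.

    entered_via_compat_path: True if the syscall came in through a 32-bit
    entry point (an assumption standing in for the kernel's own check).
    nr: raw syscall number as passed by userspace.
    """
    if entered_via_compat_path:
        return "compat"
    if nr & X32_SYSCALL_BIT:
        return "x32"
    return "native"


if __name__ == "__main__":
    # The same task may issue all three kinds of syscalls.
    print(syscall_nature(True, 192))                 # 32-bit entry
    print(syscall_nature(False, X32_SYSCALL_BIT | 9))  # x32 numbering
    print(syscall_nature(False, 9))                  # plain 64-bit
```

With this kind of check the answer can change from one syscall to the next within the same task, which a single task-wide flag cannot express.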
   −
==== Seizing with two 32-bit and 64-bit parasites ====
+
=== Seizing with two 32-bit and 64-bit parasites ===
    
'''Pros''':
 
Line 67: Line 63:  
* serialization of parasite's answers: arguments to the parasite differ in size and have to be serialized, which added unattractive and less readable C macros
   −
==== Current approach ====
+
=== Current approach ===
    +
FIXME
   −
=== Needs to be done (TODO) ===
+
== Needs to be done (TODO) ==
   −
==== Error dump on x32-bit app dumping ====
+
=== Error dump on x32-bit app dumping ===
    
At the moment only compat ia32 applications are supported; an attempt to dump an x32 compat binary should result in an error.
   −
==== Bug with mmaping over 4Gb ====
+
=== Bug with mmaping over 4Gb ===
    
As a 32-bit application is restored from 64-bit CRIU, some task properties that were filled in on <code>exec()</code> are left over, which is quite unusual for a 32-bit task. One of the things left over from the 64-bit binary is the precalculated <code>mmap_base</code>, which is used to find the task's top/bottom address limit during the <code>mmap()</code> syscall. That means that compat <code>sys_mmap()</code> may map a page above the 4 GB address and return a 4-byte pointer holding only the low bytes of the address. It looks like no one had used compat mmap in a 64-bit binary before. The result is a broken mmap in the restored 32-bit application, which can map a VMA above 4 GB.
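The pointer truncation described above can be shown with plain arithmetic. This is a minimal sketch: the helper names and the example addresses are hypothetical and only stand in for what a mapping placed relative to a 64-bit <code>mmap_base</code> might look like.

```python
# Sketch of why a mapping above 4 GiB is unusable for a 32-bit caller.
FOUR_GIB = 1 << 32


def compat_return_value(addr):
    """Value a 32-bit caller sees: only the low 32 bits of the address."""
    return addr & (FOUR_GIB - 1)


def is_reachable_from_compat(addr, length):
    """True if the whole mapping lies below the 4 GiB boundary."""
    return addr + length <= FOUR_GIB


if __name__ == "__main__":
    mapped = 0x7F1234567000  # hypothetical address chosen by the kernel
    seen = compat_return_value(mapped)
    print(hex(mapped), "->", hex(seen))
    print("reachable:", is_reachable_from_compat(mapped, 4096))
```

The truncated value looks like a valid 32-bit pointer but refers to a different address than the one actually mapped.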
Line 82: Line 79:  
Patches fixing this bug have been posted to LKML but are not yet accepted. See [[Upstream kernel commits]]. If they do not make it into v4.9-stable, the kerndat test for 32-bit C/R will be reworked to check whether the bug is present in the kernel (which is not nice, but acceptable).
   −
==== List of failed tests ====
+
=== List of failed tests ===
    
The table is kept up to date by [[User:Dsafonov]] with the latest kernel/CRIU patches in his environment, some of which may not yet be in tree or may not even have been sent yet.
Line 115: Line 112:  
|}
 
|}
   −
==== Fixes for older kernels ====
+
=== Fixes for older kernels ===
    
For kernels with backported mainline patches for 32-bit C/R (like vzkernel) there are a couple of things to handle, such as different sizes of vdso/vvar (or vvar may not even be present).
   −
==== Fault-inject test for vDSO trampolines ====
+
=== Fault-inject test for vDSO trampolines ===
    
Should ensure that they work. Needs to be done for both native and compat C/R.
   −
==== Kernel patch for vsyscall page ====
+
=== Kernel patch for vsyscall page ===
    
That's an emulated page, not a VMA; it only affects /proc/<pid>/maps for the restored process. It depends on !TIF_IA32 && !TIF_X32. Andy has patches for disabling the emulation on a per-pid basis; for now I ran tests with the <code>vsyscall=none</code> boot parameter because zdtm.py checks maps before/after C/R.
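A comparison of maps before and after C/R trips over such a page in roughly the following way. This is a simplified, hypothetical sketch and not the actual zdtm.py check; the sample maps content, the binary path, and which side shows the <code>[vsyscall]</code> line are all assumptions.

```python
# Hypothetical sketch: find named regions present in only one maps snapshot.
def named_regions(maps_text):
    """Collect the names (last column) of mappings in /proc/<pid>/maps text."""
    regions = set()
    for line in maps_text.splitlines():
        # address perms offset dev inode [name]; anonymous lines have 5 fields
        fields = line.split(None, 5)
        if len(fields) == 6:
            regions.add(fields[5].strip())
    return regions


def maps_name_diff(before, after):
    """Names that appear in exactly one of the two snapshots."""
    return named_regions(before) ^ named_regions(after)


# Made-up sample data for illustration only.
BEFORE = (
    "08048000-08049000 r-xp 00000000 08:01 1234 /zdtm/static/env00\n"
    "f7fd8000-f7fda000 r-xp 00000000 00:00 0 [vdso]\n"
)
AFTER = BEFORE + (
    "ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]\n"
)

if __name__ == "__main__":
    print(maps_name_diff(BEFORE, AFTER))
```

Booting with <code>vsyscall=none</code> removes the emulated page from both snapshots, so a comparison like this one no longer reports a difference.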
 +
 +
== External links ==
 +
* [https://github.com/xemul/criu/issues/43 github issue]
 +
 +
[[Category: Development]]
 +
[[Category: Under the hood]]