Changes

395 bytes removed ,  13:43, 5 December 2017
→‎Needs to be done (TODO): autofs is fixed by Stas, AFAIK
Line 1: Line 1: −
[[Category: Development]]
+
== Compatible applications ==
[[Category: Under the hood]]
  −
 
  −
=== Compatible applications ===
      
On x86_64 there are two types of compatible applications:
 
Line 15: Line 12:  
In the following text, ''compatible'' and ''32-bit'' refer to ia32 applications unless otherwise specified.
   −
=== Difference between native and compat applications ===
+
== Difference between native and compat applications ==
    
From the CPU's point of view, a 32-bit compatibility-mode application differs from a 64-bit application in its current CS (code segment selector): if the L bit in the flags of the corresponding descriptor table entry is set, the CPU runs in 64-bit mode while that segment descriptor is in use. There are some other differences between 32-bit and 64-bit selectors; one can read about them [https://www.malwaretech.com/2014/02/the-0x33-segment-selector-heavens-gate.html in the article "The 0x33 Segment Selector (Heavens Gate)"]. Code selectors for both modes are defined in kernel headers as <code>__USER32_CS</code> and <code>__USER_CS</code> and correspond to descriptors in the GDT (Global Descriptor Table). One can switch from 64-bit mode to compatibility mode by changing the CS value (e.g., with a far jump).
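The selector and L-bit relationship above can be illustrated with a minimal Python sketch. The two selector constants are the values that <code>__USER_CS</code> and <code>__USER32_CS</code> have on x86_64 Linux; the two descriptor constants are assumed flat user code descriptors chosen for illustration, and the helper names are hypothetical.

```python
from typing import NamedTuple

USER_CS = 0x33    # value of __USER_CS on x86_64 Linux
USER32_CS = 0x23  # value of __USER32_CS on x86_64 Linux

# Assumed flat user code descriptors, used here only as sample input.
CODE64_DESC = 0x00AFFB000000FFFF  # flags nibble 0xA: G=1, D=0, L=1
CODE32_DESC = 0x00CFFB000000FFFF  # flags nibble 0xC: G=1, D=1, L=0


class Selector(NamedTuple):
    index: int  # entry number in the descriptor table
    ti: int     # table indicator: 0 = GDT, 1 = LDT
    rpl: int    # requested privilege level


def decode_selector(sel: int) -> Selector:
    """Split a 16-bit segment selector into its three fields."""
    return Selector(index=sel >> 3, ti=(sel >> 2) & 1, rpl=sel & 3)


def l_bit(descriptor: int) -> bool:
    """Return the L flag, which is bit 53 of an 8-byte code descriptor."""
    return bool((descriptor >> 53) & 1)


if __name__ == "__main__":
    print(decode_selector(USER_CS), l_bit(CODE64_DESC))
    print(decode_selector(USER32_CS), l_bit(CODE32_DESC))
```

Both selectors point into the GDT with RPL 3; only the descriptor they select differs, and the L bit of that descriptor decides the mode.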
Line 22: Line 19:  
Both native and compat applications can do 32 or 64-bit syscalls.
 
   −
=== Approaches to C/R compatible applications ===
+
== Mixed-bitness applications ==
 +
 
 +
With the current kernel ABI it is entirely possible to create mixed-bitness applications, which may be ''very'' entangled.
 +
For example, one could set ''both'' 32-bit and 64-bit robust futex list pointers.
 +
Or one can create a multi-threaded application in which some threads execute 32-bit code and others execute 64-bit code.
 +
 
 +
If we ever meet such a mixed-bitness application, support can be added to CRIU quite easily, but it should be guarded by a compile-time config option, as it will add extra syscalls to ordinary C/R, where they aren't needed.
 +
 
 +
At the moment there are no plans to add such support, and it's quite unlikely that we'll find such an application in the real world (outside of synthetic tests).
 +
 
 +
== Approaches to C/R compatible applications ==
    
C/R of compatible applications can be done in several ways. This section describes the pros and cons of each, to explain why C/R of 32-bit tasks is done ''that'' way and not some other.
   −
==== Restore with exec() of 32-bit dummy binary vs from 64-bit CRIU ====
+
=== Restore with exec() of 32-bit dummy binary vs from 64-bit CRIU ===
    
A 32-bit application can be restored with a daemon that runs in 32-bit mode and communicates with the CRIU binary (or with a 32-bit CRIU subprocess).
Line 41: Line 48:  
* another daemon will also be needed for x32
   −
==== Restore with a flag to sigreturn() or arch_prctl() ====
+
=== Restore with a flag to sigreturn() or arch_prctl() ===
    
The initial attempt at 32-bit C/R was rejected by the LKML community for many reasons. It was supposed to swap thread info flags (such as <code>TIF_ADDR32</code>/<code>TIF_IA32</code>/<code>TIF_X32</code>), unmap the native 64-bit vDSO blob from the process's address space and map the compatible 32-bit vDSO, all according to a bit in the sigframe passed to <code>rt_sigreturn()</code> or to a dedicated <code>arch_prctl()</code> call.
Line 55: Line 62:  
After the discussion on LKML, the conclusion was: separate changing the personality (such as thread info flags) from the API for mapping vDSO blobs, remove the TIF_IA32 flag that distinguishes 32-bit from 64-bit tasks, and look at the nature of the syscall instead: compat syscall, x32 syscall or native syscall.
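The idea of looking at the nature of the syscall rather than at a per-task flag can be sketched as follows. This is an illustration only, not kernel code: <code>via_compat_entry</code> is a hypothetical flag meaning "the syscall came in through a 32-bit entry path", and the function name is made up. The constant mirrors <code>__X32_SYSCALL_BIT</code>, which marks x32 syscall numbers.

```python
X32_SYSCALL_BIT = 0x40000000  # mirrors __X32_SYSCALL_BIT


def syscall_kind(nr: int, via_compat_entry: bool) -> str:
    """Classify a syscall by how it was made, not by a task-wide flag.

    nr               -- syscall number as passed by userspace
    via_compat_entry -- hypothetical flag: entered via a 32-bit entry path
    """
    if via_compat_entry:
        return "compat"
    if nr & X32_SYSCALL_BIT:
        return "x32"
    return "native"


if __name__ == "__main__":
    print(syscall_kind(1, via_compat_entry=True))
    print(syscall_kind(X32_SYSCALL_BIT | 1, via_compat_entry=False))
    print(syscall_kind(1, via_compat_entry=False))
```

With this kind of per-syscall check, the same task can issue syscalls of different kinds without any flag having to be flipped in between.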
   −
==== Seizing with two 32-bit and 64-bit parasites ====
+
=== Seizing with two 32-bit and 64-bit parasites ===
    
'''Pros''':
 
Line 66: Line 73:  
* serialization of the parasite's answers: arguments to the parasite differ in size and have to be serialized, which added ugly and less readable C macros
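To show why differently sized arguments force a serialization step, here is a small Python sketch. The argument block (one pointer and one length) and all names are hypothetical; CRIU itself does this in C, so this only illustrates the size mismatch, not the actual macros.

```python
import struct

# Hypothetical argument block: one pointer followed by one length field.
NATIVE_FMT = "<QQ"  # 64-bit parasite: 8-byte pointer, 8-byte length
COMPAT_FMT = "<II"  # 32-bit parasite: 4-byte pointer, 4-byte length


def pack_args(addr: int, length: int, compat: bool) -> bytes:
    """Serialize the arguments in the layout the target parasite expects."""
    fmt = COMPAT_FMT if compat else NATIVE_FMT
    return struct.pack(fmt, addr, length)


def unpack_args(blob: bytes, compat: bool) -> tuple:
    """Deserialize an argument block produced by pack_args()."""
    fmt = COMPAT_FMT if compat else NATIVE_FMT
    return struct.unpack(fmt, blob)


if __name__ == "__main__":
    for compat in (False, True):
        blob = pack_args(0x1000, 16, compat)
        print(compat, len(blob), unpack_args(blob, compat))
```

The same logical arguments occupy 16 bytes for the native parasite and 8 bytes for the compat one, so every exchange has to know which layout is in use.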
   −
==== Current approach ====
+
=== Current approach ===
    +
FIXME
   −
=== Needs to be done (TODO) ===
+
== Needs to be done (TODO) ==
   −
==== Bug with mmaping over 4Gb ====
+
=== Kernel patch for vsyscall page ===
   −
As a 32-bit application is restored from 64-bit CRIU, some task properties that were filled in on <code>exec()</code> are left over, which is quite unusual for a 32-bit task. One of the things left over from the 64-bit binary is the precalculated <code>mmap_base</code>, which is used to find the task's top/bottom address limit during the <code>mmap()</code> syscall. This means that compat <code>sys_mmap()</code> may map a page above the 4Gb boundary and return a 4-byte pointer holding only the low bytes of the address. It looks like no one had used compat mmap in a 64-bit binary before. The result is a broken mmap in the restored 32-bit application, which can map a vma above 4Gb.
+
The vsyscall page is an emulated page, not a vma; it only affects /proc/<pid>/maps of the restored process. Its presence depends on !TIF_IA32 && !TIF_X32. Andy has patches for disabling the emulation on a per-pid basis; for now I ran tests with the <code>vsyscall=none</code> boot parameter because zdtm.py checks maps before/after C/R.
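The effect on a before/after maps comparison can be sketched in Python. This is not how zdtm.py is implemented; the function names and the two sample maps dumps are hypothetical and only show how an extra <code>[vsyscall]</code> line makes the comparison fail.

```python
def parse_maps(text: str) -> list:
    """Return (address range, perms, name) for each line of a maps dump."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip empty or malformed lines
        name = fields[5] if len(fields) > 5 else ""
        entries.append((fields[0], fields[1], name))
    return entries


def maps_diff(before: str, after: str) -> list:
    """Entries present after restore but absent before dump."""
    old = set(parse_maps(before))
    return sorted(e for e in parse_maps(after) if e not in old)


# Hypothetical sample data: the restored task gains a [vsyscall] line.
BEFORE = "00400000-00401000 r-xp 00000000 08:01 1234 /bin/test\n"
AFTER = BEFORE + (
    "ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]\n"
)

if __name__ == "__main__":
    print(maps_diff(BEFORE, AFTER))
```

Booting with <code>vsyscall=none</code> removes the extra line on both sides, which is why the comparison passes in that setup.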
   −
Patches fixing this bug have been posted on LKML but are not yet accepted. See [[Upstream kernel commits]]. If they do not make it into v4.9-stable, the kerndat test for 32-bit C/R will be reworked to check whether the bug is present in the kernel (which is not a nice thing, but ok).
+
=== Error on dumping x32 applications ===
   −
==== List of failed tests ====
+
At the moment we support only compat ia32 applications; an attempt to dump an x32 compat binary should result in an error.
   −
The table is kept up to date by [[User:Dsafonov]] with the latest kernel/CRIU patches in his environment, some of which may not yet be in tree or even sent.
+
=== Continue removing TIF_IA32 from uprobes & Oprofile ===
   −
{| class="wikitable"
+
This flag should be removed, as suggested by Andy & Oleg.
! Name
+
There is quite a lot of work to make the kernel work without it, for a small gain:
! Fail reason
+
the restored ia32 process will be traceable by uprobes, oprofile and the like.
|-
  −
| fpu01 || no ia32 version
  −
|-
  −
| sse00 || no ia32 version
  −
|-
  −
| futex-rl || sys_get_robust_list() should be compat syscall for 32-bit tasks: kernel keeps two different lists: <code>robust_list</code> and <code>compat_robust_list</code> in <code>task_struct</code>
  −
|-
  −
| rtc || no 32-bit version of rtc test library
  −
|-
  −
| vdso01 || no ia32 version
  −
|-
  −
| file_locks08 || ?
  −
|-
  −
| file_locks07 || ?
  −
|-
  −
| socket-tcp6-last-ack || ?
  −
|-
  −
| sse20 || ?
  −
|-
  −
| file_locks06 || ?
  −
|-
  −
| autofs || ?
  −
|-
  −
| sigpending || ?
  −
|-
  −
| fpu00 || ?
  −
|-
  −
| socket-tcp-last-ack || ?
  −
|-
  −
| mmx00 || ?
  −
|}
     −
==== Fixes for older kernels ====
+
== External links ==
 +
* [https://github.com/checkpoint-restore/criu/issues/43 github issue]
   −
For kernels with backported mainline patches for 32-bit C/R (like vzkernel) there are a couple of things to handle, such as different sizes of vdso/vvar (or vvar may not even be present).
+
[[Category: Under the hood]]