== Compatible applications ==

On x86_64 there are two types of compatible applications:
In the following text, ''compatible'' and ''32-bit'' refer to ia32 applications unless otherwise specified.

== Difference between native and compat applications ==

From the CPU's point of view, a 32-bit compatibility-mode application differs from a 64-bit application in the current CS (code segment selector): if the L bit is set in the flags of the corresponding descriptor-table entry, the CPU runs in 64-bit mode while that segment descriptor is in use. There are some other differences between 32-bit and 64-bit selectors; one can read about them [https://www.malwaretech.com/2014/02/the-0x33-segment-selector-heavens-gate.html in the article "The 0x33 Segment Selector (Heavens Gate)"]. The code selectors for both modes are defined in kernel headers as <code>__USER32_CS</code> and <code>__USER_CS</code> and correspond to descriptors in the GDT (Global Descriptor Table). One can switch from 64-bit mode to compatibility mode by changing the CS value (e.g., with a far (long) jump).
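To make the selector/L-bit relationship concrete, here is a small C sketch (not CRIU code). It assumes the x86_64 Linux selector values 0x23 for <code>__USER32_CS</code> and 0x33 for <code>__USER_CS</code>, and packs descriptors the way the kernel's <code>GDT_ENTRY(flags, base, limit)</code> macro does; the flag values 0xa0fb (64-bit code) and 0xc0fb (32-bit code) are, to my knowledge, what the kernel uses for these two GDT entries. The helper names <code>gdt_entry()</code>, <code>desc_is_long_mode()</code>, <code>sel_index()</code>, <code>sel_rpl()</code> and <code>read_cs()</code> are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* x86_64 Linux user code selectors: (GDT index << 3) | RPL 3.
 * These are assumed to mirror __USER32_CS and __USER_CS from kernel headers. */
#define USER32_CS 0x23u /* GDT entry 4: compatibility-mode (32-bit) code */
#define USER_CS   0x33u /* GDT entry 6: 64-bit code */

/* Pack a raw 8-byte descriptor like the kernel's GDT_ENTRY(flags, base, limit):
 * access byte in flags bits 0-7, G/D/L/AVL nibble in flags bits 12-15. */
static uint64_t gdt_entry(uint64_t flags, uint64_t base, uint64_t limit)
{
    assert(limit <= 0xfffff); /* the segment limit is 20 bits wide */
    return ((base & 0xff000000ULL) << (56 - 24)) |
           ((flags & 0x0000f0ffULL) << 40) |
           ((limit & 0x000f0000ULL) << (48 - 16)) |
           ((base & 0x00ffffffULL) << 16) |
           (limit & 0x0000ffffULL);
}

/* The L bit is bit 53 of the descriptor: when set, code runs in 64-bit mode. */
static int desc_is_long_mode(uint64_t desc)
{
    return (int)((desc >> 53) & 1);
}

/* Selector fields: index into the descriptor table and requested privilege. */
static unsigned sel_index(uint16_t sel) { return sel >> 3; }
static unsigned sel_rpl(uint16_t sel)   { return sel & 3u; }

/* Read the CS register of the calling thread (x86 only). */
static uint16_t read_cs(void)
{
    uint16_t cs;
    __asm__ volatile("mov %%cs, %0" : "=r"(cs));
    return cs;
}
```

In a native 64-bit process <code>read_cs()</code> should return 0x33; the same code built with <code>-m32</code> and run on an x86_64 kernel would see 0x23.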
Both native and compat applications can make both 32-bit and 64-bit syscalls.
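As an illustration, a 64-bit process can enter the kernel through the compat (ia32) entry with <code>int $0x80</code>, using the i386 syscall table, as well as through the native <code>syscall</code> instruction with the x86_64 table. The sketch below assumes an x86_64 build and a kernel with IA32 emulation enabled; <code>getpid_native()</code>, <code>getpid_compat()</code> and <code>check_both_entries()</code> are hypothetical names. The syscall numbers (39 on x86_64, 20 on i386) are the <code>getpid</code> entries of the respective tables.

```c
#include <assert.h>
#include <unistd.h>

/* Native 64-bit entry: `syscall` with the x86_64 number in %rax
 * (__NR_getpid == 39). The instruction itself clobbers %rcx and %r11. */
static long getpid_native(void)
{
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "0"(39L)
                     : "rcx", "r11", "memory");
    return ret;
}

/* Compat (ia32) entry from a 64-bit process: `int $0x80` with the i386
 * number in %eax (__NR_getpid == 20). Pointer arguments would be truncated
 * to 32 bits, which does not matter here since getpid() takes none.
 * Older kernels did not preserve %r8-%r11 across int $0x80, hence clobbers. */
static long getpid_compat(void)
{
    long ret;
    __asm__ volatile("int $0x80"
                     : "=a"(ret)
                     : "0"(20L)
                     : "r8", "r9", "r10", "r11", "memory");
    return ret;
}

/* Both entry points must report the same PID as the libc wrapper. */
static void check_both_entries(void)
{
    long pid = (long)getpid();
    assert(getpid_native() == pid);
    assert(getpid_compat() == pid);
}
```

On a kernel without IA32 emulation the <code>int $0x80</code> path faults instead of returning, so this only demonstrates the mixed-syscall case where compat support is built in.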

== Approaches to C/R compatible applications ==

C/R of compatible applications can be implemented in several ways. This section describes the pros and cons of each, to explain why C/R of 32-bit tasks is done ''this'' way and not another.

=== Restore with exec() of 32-bit dummy binary vs from 64-bit CRIU ===

A 32-bit application can be restored by a daemon that runs in 32-bit mode and communicates with the CRIU binary (or by a 32-bit CRIU subprocess).
* would also need another daemon for x32

=== Restore with a flag to sigreturn() or arch_prctl() ===

The initial attempt at 32-bit C/R was rejected by the lkml community for a number of reasons. It would have swapped thread info flags (such as <code>TIF_ADDR32</code>/<code>TIF_IA32</code>/<code>TIF_X32</code>), unmapped the native 64-bit vDSO blob from the process's address space and mapped the compat 32-bit vDSO, all depending on a bit in the sigframe passed to <code>rt_sigreturn()</code> or on a dedicated <code>arch_prctl()</code> call.
The conclusion of the lkml discussion was to separate changing the personality (such as thread info flags) from the API for mapping vDSO blobs, to remove the <code>TIF_IA32</code> flag that distinguishes 32-bit from 64-bit tasks, and to look at the nature of each syscall instead: compat, x32 or native.
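As far as I know, the vDSO-mapping API that resulted is a set of <code>arch_prctl()</code> codes, <code>ARCH_MAP_VDSO_X32</code>, <code>ARCH_MAP_VDSO_32</code> and <code>ARCH_MAP_VDSO_64</code> (Linux 4.9+). The sketch below shows a hypothetical restorer helper, <code>map_compat_vdso()</code>, issuing the raw syscall; the constants are defined locally in case the installed headers are older. The kernel refuses the call while a vdso/vvar mapping still exists, so a real restorer has to unmap the current blobs first, which this sketch deliberately does not do.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* arch_prctl() codes from <asm/prctl.h> (Linux 4.9+), one per vDSO flavour;
 * defined here in case the installed headers predate them. */
#ifndef ARCH_MAP_VDSO_X32
#define ARCH_MAP_VDSO_X32 0x2001
#endif
#ifndef ARCH_MAP_VDSO_32
#define ARCH_MAP_VDSO_32 0x2002
#endif
#ifndef ARCH_MAP_VDSO_64
#define ARCH_MAP_VDSO_64 0x2003
#endif

/* Hypothetical restorer step: map the ia32 vDSO blob at `addr`.
 * Fails with EEXIST while the task still has a vdso/vvar mapping.
 * Returns -1 with errno set on failure (raw syscall, no libc wrapper). */
static long map_compat_vdso(unsigned long addr)
{
    assert(addr % (unsigned long)sysconf(_SC_PAGESIZE) == 0); /* page-aligned */
    return syscall(SYS_arch_prctl, ARCH_MAP_VDSO_32, addr);
}
```

Called from an ordinary process, which already has its native vDSO mapped, the helper is expected to fail rather than map a second blob.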

=== Seizing with two 32-bit and 64-bit parasites ===

'''Pros''':
* serialization of parasite's answers: arguments to the parasite differ in size between 32-bit and 64-bit code, so they have to be serialized, which required ugly and less readable C macros
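The size mismatch is easy to see with a hypothetical parasite argument holding a <code>long</code> and a pointer: on x86_64 both fields are 8 bytes, while an ia32 parasite lays them out as 4-byte fields. The structs and the <code>arg_from_compat()</code> converter below are illustrative names only, not CRIU's actual parasite protocol; they show the kind of per-field widening that the serialization macros had to express.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical parasite argument as a 64-bit parasite lays it out:
 * `long` and pointers are 8 bytes wide on x86_64. */
struct native_arg {
    long  ret;
    void *addr;
};

/* The same argument as an ia32 parasite would lay it out: `long` and
 * pointers are 4 bytes wide there, so fixed-width types are used. */
struct compat_arg {
    int32_t  ret;
    uint32_t addr;
};

static_assert(sizeof(struct compat_arg) == 8,
              "compat layout must match what the 32-bit parasite writes");

/* Widen a compat answer into the native layout used by 64-bit CRIU. */
static struct native_arg arg_from_compat(const struct compat_arg *c)
{
    struct native_arg n;
    n.ret  = c->ret;                     /* sign-extend the return value */
    n.addr = (void *)(uintptr_t)c->addr; /* zero-extend the 32-bit address */
    return n;
}
```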

=== Current approach ===

FIXME

== Needs to be done (TODO) ==

=== Error on dumping x32 apps ===

For now, only compat ia32 applications will be supported; an attempt to dump an x32 compat binary should result in an error.
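One possible way to implement this check (a sketch, not necessarily how CRIU does it) is to classify the task's executable by its ELF header: x32 binaries are ELF32 files with <code>e_machine == EM_X86_64</code>, ia32 binaries are ELF32 with <code>EM_386</code>, and native binaries are ELF64 with <code>EM_X86_64</code>. Since a task can switch CS at runtime, the executable format is only a heuristic. The enum and the functions <code>classify()</code>, <code>classify_file()</code> and <code>check_dumpable()</code> are hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum task_abi { ABI_UNKNOWN, ABI_NATIVE, ABI_IA32, ABI_X32 };

/* ELFCLASS64 + EM_X86_64 -> native, ELFCLASS32 + EM_386 -> ia32,
 * ELFCLASS32 + EM_X86_64 -> x32 (64-bit mode with 32-bit pointers). */
static enum task_abi classify(unsigned char ei_class, uint16_t e_machine)
{
    if (ei_class == ELFCLASS64 && e_machine == EM_X86_64) return ABI_NATIVE;
    if (ei_class == ELFCLASS32 && e_machine == EM_386)    return ABI_IA32;
    if (ei_class == ELFCLASS32 && e_machine == EM_X86_64) return ABI_X32;
    return ABI_UNKNOWN;
}

/* Classify an executable such as /proc/<pid>/exe by its ELF header. */
static enum task_abi classify_file(const char *path)
{
    /* e_ident and e_machine sit at the same offsets in Elf64_Ehdr,
     * and Elf32_Ehdr is the smaller of the two headers. */
    Elf32_Ehdr eh;
    size_t n;
    FILE *f = fopen(path, "rb");

    if (!f)
        return ABI_UNKNOWN;
    n = fread(&eh, 1, sizeof(eh), f);
    fclose(f);
    if (n < sizeof(eh) || memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)
        return ABI_UNKNOWN;
    return classify(eh.e_ident[EI_CLASS], eh.e_machine);
}

/* Dump-time policy: refuse x32 (and unrecognized) tasks with an error. */
static int check_dumpable(enum task_abi abi)
{
    if (abi == ABI_X32) {
        fprintf(stderr, "Error: dumping x32 tasks is not supported\n");
        return -1;
    }
    return abi == ABI_UNKNOWN ? -1 : 0;
}
```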

=== Bug with mmapping over 4GB ===

As a 32-bit application is restored from 64-bit CRIU, some task properties that were set up on <code>exec()</code> are left over, which is quite unusual for a 32-bit task. One of the things left over from the 64-bit binary is the precalculated <code>mmap_base</code>, which is used to find the task's top/bottom address limit during the <code>mmap()</code> syscall. This means that compat <code>sys_mmap()</code> may map a page above 4GB and return a 4-byte pointer containing only the low bytes of the address. Apparently no one had used compat mmap from a 64-bit binary before. The result is a broken <code>mmap()</code> in the restored 32-bit application, which can map a VMA above 4GB.
Patches fixing this bug have been posted on lkml but are not yet accepted; see [[Upstream kernel commits]]. If they do not get into v4.9-stable, the kerndat test for 32-bit C/R will be reworked to check whether the bug is present in the kernel (not a nice thing, but acceptable).
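The arithmetic behind the bug can be sketched in C: a mapping placed at, say, 0x100001000 is reported to the 32-bit task as 0x1000, a completely different address. A check like <code>vma_fits_compat()</code> is what a test could apply to every address a compat <code>mmap()</code> hands back. Both helpers are hypothetical and are not the actual kerndat code.

```c
#include <assert.h>
#include <stdint.h>

/* An ia32 task can only address the low 4GB. */
#define COMPAT_ADDR_LIMIT (1ULL << 32)

/* What a 32-bit task receives when the kernel returns a 64-bit address
 * through a 4-byte return value: only the low 32 bits survive. */
static uint32_t as_compat_ptr(uint64_t addr)
{
    return (uint32_t)addr;
}

/* Does the VMA [start, start + len) lie entirely below 4GB? */
static int vma_fits_compat(uint64_t start, uint64_t len)
{
    assert(len > 0); /* VMAs are never empty */
    return len <= COMPAT_ADDR_LIMIT && start <= COMPAT_ADDR_LIMIT - len;
}
```

The comparison is written as <code>start <= limit - len</code> rather than <code>start + len <= limit</code> so it cannot overflow for addresses near the top of the 64-bit range.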

=== List of failed tests ===

The table is kept up to date by [[User:Dsafonov]] with the latest kernel/CRIU patches in his environment, some of which may not be in the tree yet or may not even have been sent.
|}

=== Fixes for older kernels ===

For kernels with backported mainline patches for 32-bit C/R (like vzkernel), a couple of things remain to be done, such as handling different vdso/vvar sizes (vvar may not even be present).
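To see which sizes a given kernel uses, one can read the special mappings from <code>/proc/self/maps</code>. The C sketch below (the helper <code>special_vma_size()</code> is hypothetical) returns the size of a mapping such as <code>[vdso]</code> or <code>[vvar]</code>, or 0 when the kernel did not create it, which covers the "vvar may not even be present" case.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Size in bytes of the special mapping named `name` ("[vdso]", "[vvar]", ...)
 * in /proc/self/maps, or 0 if the kernel did not create it. */
static unsigned long special_vma_size(const char *name)
{
    char line[4096];
    unsigned long size = 0;
    size_t nn;
    FILE *f;

    assert(name != NULL && name[0] == '[');
    nn = strlen(name);
    f = fopen("/proc/self/maps", "r");
    if (!f)
        return 0;

    while (fgets(line, sizeof(line), f)) {
        unsigned long start, end;
        size_t ln;

        line[strcspn(line, "\n")] = '\0';
        ln = strlen(line);
        /* the name is the last column, preceded by whitespace */
        if (ln <= nn || strcmp(line + ln - nn, name) != 0 ||
            line[ln - nn - 1] != ' ')
            continue;
        if (sscanf(line, "%lx-%lx", &start, &end) == 2) {
            size = end - start;
            break;
        }
    }
    fclose(f);
    return size;
}
```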

=== Fault-inject test for vDSO trampolines ===

A test should ensure that they work. This needs to be done for both native and compat C/R.
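The verification half of such a test could look like the C sketch below: after a restore (ideally one where a fault forces the trampolines to be used), the task calls functions that glibc normally resolves to vDSO entries and checks that the results are sane. My understanding is that the trampolines matter because libc keeps the vDSO entry addresses it resolved before the dump. The function <code>vdso_calls_work()</code> is hypothetical and does not inject any fault itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical post-restore check: call functions usually served from the
 * vDSO and verify the results are plausible. Returns 0 on success. */
static int vdso_calls_work(void)
{
    struct timespec a, b;
    struct timeval tv;
    time_t t;

    if (clock_gettime(CLOCK_MONOTONIC, &a) != 0 ||
        clock_gettime(CLOCK_MONOTONIC, &b) != 0)
        return -1;
    assert(a.tv_nsec < 1000000000L && b.tv_nsec < 1000000000L);
    /* CLOCK_MONOTONIC must never go backwards */
    if (b.tv_sec < a.tv_sec || (b.tv_sec == a.tv_sec && b.tv_nsec < a.tv_nsec))
        return -1;

    if (gettimeofday(&tv, NULL) != 0)
        return -1;
    t = time(NULL);
    /* gettimeofday() and time() should agree to within a second */
    if (t - tv.tv_sec > 1 || tv.tv_sec - t > 1)
        return -1;
    return 0;
}
```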

=== Kernel patch for vsyscall page ===

This is an emulated page, not a VMA, so it only affects /proc/<pid>/maps of the restored process. It depends on !TIF_IA32 && !TIF_X32. Andy has patches for disabling the emulation on a per-pid basis; for now, I run tests with the <code>vsyscall=none</code> boot parameter because zdtm.py compares maps before and after C/R.
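An alternative to booting with <code>vsyscall=none</code> might be to ignore the vsyscall line when comparing maps, since the page is not a real VMA. The C sketch below shows such a filter; <code>is_vsyscall_line()</code> and <code>drop_vsyscall()</code> are hypothetical, and this is not what zdtm.py is known to do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if a /proc/<pid>/maps line describes the emulated vsyscall page. */
static int is_vsyscall_line(const char *line)
{
    assert(line != NULL);
    return strstr(line, "[vsyscall]") != NULL;
}

/* Copy the n maps lines into out, skipping the vsyscall page.
 * Returns how many lines were kept. */
static size_t drop_vsyscall(const char **lines, size_t n, const char **out)
{
    size_t k = 0;

    for (size_t i = 0; i < n; i++)
        if (!is_vsyscall_line(lines[i]))
            out[k++] = lines[i];
    return k;
}
```

Filtering both the pre-dump and post-restore lists this way would let the remaining lines be compared one to one.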

== External links ==
* [https://github.com/xemul/criu/issues/43 GitHub issue]

[[Category: Development]]
[[Category: Under the hood]]