Difference between revisions of "Better logging"

From CRIU
Jump to navigation Jump to search
(This is the documentation of the work done during the GSoC 2020 period, for the project of optimizing logging engine. It contains the work done, usage of the work done and the tasks left pending.)
m (Add link to the wip/flog branch)
 
Line 14: Line 14:
  
 
==== The Current Implementation ====
 
==== The Current Implementation ====
 +
 +
The current implementation for binary logging is available in the [https://github.com/checkpoint-restore/criu/tree/wip/flog <code>wip/flog</code> branch].
  
 
The binary logger works such that it dumps the raw input string with all its arguments into a file. This implementation saves the time required to scan the input string and also the time consumed in coping these arguments into the string. It initially stores all the information in a buffer, and after the buffer fills up, its contents are dumped into a file. This file can later be passed into the decoder, which works independently of the criu execution, and can be read whenever needed.
 
The binary logger works such that it dumps the raw input string with all its arguments into a file. This implementation saves the time required to scan the input string and also the time consumed in coping these arguments into the string. It initially stores all the information in a buffer, and after the buffer fills up, its contents are dumped into a file. This file can later be passed into the decoder, which works independently of the criu execution, and can be read whenever needed.

Latest revision as of 00:56, 29 October 2024

Logging in CRIU should obey the following rules

  1. Default logging should provide enough data for typical investigation of "can't dump"/"can't restore"
  2. It should be possible to add developer-only debugging to see some more details
  3. It should be possible to shut the logger up completely (maybe by specifying the log file as /dev/null)

Binary logging[edit]

The Binary Logging Utility[edit]

The Previous Implementation[edit]

The logging facility available in criu uses printf family functions to store the logs into the file. The issue with the printf family functions is that they take much time to execute. These functions first scan the input string for format specifiers. The arguments passed are then typecast to string and then copied into the input string. This log file is also very often left unused unless some failure occurs.

The Current Implementation[edit]

The current implementation for binary logging is available in the wip/flog branch.

The binary logger works such that it dumps the raw input string with all its arguments into a file. This implementation saves the time required to scan the input string and also the time consumed in coping these arguments into the string. It initially stores all the information in a buffer, and after the buffer fills up, its contents are dumped into a file. This file can later be passed into the decoder, which works independently of the criu execution, and can be read whenever needed.

The binary logging thus saves the CPU execution time during the running of the process, and since these logs are rarely used unless some failure occurs, this conserves a significant amount of time. Although the overall time spent by the binary logger, when running with the decoder, is similar to the regular printer's time.

Working of Binary Logger[edit]

Criu currently supports both, logging using the regular printer and logging in binary. Adding a -b or --binary-log option at the end of a criu command allows it to use the binary printer and log in binary.

The simple loop page shows how to dump a trivial program. To store the logs generated during this process in binary, these are the changes are needed:

Dump it:

# criu dump -t 2221 -vvv -o dump.log -b && echo OK 
OK

Or

# criu dump -t 2221 -vvv -o dump.log --binary-log && echo OK
OK

Restore:

# criu restore -d -vvv -o restore.log --b && echo OK
OK

Or

# criu restore -d -vvv -o restore.log --binary-log && echo OK
OK

The Decoder[edit]

The decoder is placed in the crit folder. The decoder takes the input as the file that needs to be processed and the file where the output needs to be presented. The decoder reads the file and passes the format string and the arguments through a printf family function. The logs are then processed in this function, and the output is presented in the output file provided. Whenever the decoder encounters logs that are not printed by the flog printer into the file, it merely copies it into the output file as it is.

To Decode:

# crit/decoder dump.log decoded_dump.log && echo OK
OK

Improvements and Pending tasks[edit]

This project was added to criu as a part of GSoC 2020. These are a few issues faced and left unresolved during the duration.

Aarch64 and arm issue[edit]

Currently, the binary logging is not supported in the aarch64 architecture and the arm architecture. The decoder collects all the arguments in an array. This array is then typecast to void* and passed in the place of va_list to vsnprintf (The printf family function used to format this raw data). This va_list is accepted as type void* in all other architectures but causes issues with the aarch64 and arm architecture, causing the decoder to fail.

_Generic function issue[edit]

Currently, the binary logger is supported for GCC versions greater than C11. This limitation is caused due to the usage of _Generic function in the binary logging utility. This function is needed to return the type of the argument passed as input to the printf. This function was added to GCC and is supported from version C11 and forth.

Add timestamps and PID[edit]

Currently, the logs are being stored in the file without any other formatting added to it. The logs printed via the regular printer, in addition to the expected print statements, add timestamp and PID into the log file. This feature is yet to be added to the binary printer.

Optimize using .rodata[edit]

To further optimize the binary logger's performance, the string being stored in the file during logging could be referenced using .rodata.

Log messages structuring[edit]

CRIU generates tons of messages. The most critical to troubleshooting is the pr_err-s. Need to introduce types of errors to make troubleshooting easier. E.g. we have the guide what to do when C/R fails. Need to introduce error types into it and print them before the error text. Suggested types of failures are:

  • Restore errors
    • Resource ID conflict (mostly valid for non-containers case)
      • PID mismatch
      • File with the name we need exists
      • CGroups exist (?)
    • Missing item on restore
  • Dump errors
  • Common errors
    • Out of memory
    • Access denied / permission denied
    • Other unexpected/unhandled syscall error
    • Error reading/writing images
    • Proc file format error (do we really expect this thing?)