Difference between revisions of "Better logging"
m |
(This is the documentation of the work done during the GSoC 2020 period, for the project of optimizing logging engine. It contains the work done, usage of the work done and the tasks left pending.) |
||
Line 7: | Line 7: | ||
== Binary logging == | == Binary logging == | ||
− | + | === The Binary Logging Utility === | |
− | The | + | ==== The Previous Implementation ==== |
+ | |||
+ | The logging facility available in criu uses printf family functions to store the logs into the file. The issue with the printf family functions is that they take much time to execute. These functions first scan the input string for format specifiers. The arguments passed are then typecast to string and then copied into the input string. This log file is also very often left unused unless some failure occurs. | ||
+ | |||
+ | ==== The Current Implementation ==== | ||
+ | |||
+ | The binary logger works such that it dumps the raw input string with all its arguments into a file. This implementation saves the time required to scan the input string and also the time consumed in coping these arguments into the string. It initially stores all the information in a buffer, and after the buffer fills up, its contents are dumped into a file. This file can later be passed into the decoder, which works independently of the criu execution, and can be read whenever needed. | ||
+ | |||
+ | The binary logging thus saves the CPU execution time during the running of the process, and since these logs are rarely used unless some failure occurs, this conserves a significant amount of time. Although the overall time spent by the binary logger, when running with the decoder, is similar to the regular printer's time. | ||
+ | |||
+ | === Working of Binary Logger === | ||
+ | |||
+ | Criu currently supports both, logging using the regular printer and logging in binary. Adding a <code>-b or --binary-log</code> option at the end of a criu command allows it to use the binary printer and log in binary. | ||
+ | |||
+ | The [https://criu.org/Simple_loop simple loop] page shows how to dump a trivial program. To store the logs generated during this process in binary, these are the changes are needed: | ||
+ | |||
+ | '''Dump it:''' | ||
+ | |||
+ | # criu dump -t 2221 -vvv -o dump.log -b && echo OK | ||
+ | OK | ||
+ | Or | ||
+ | # criu dump -t 2221 -vvv -o dump.log --binary-log && echo OK | ||
+ | OK | ||
+ | |||
+ | '''Restore:''' | ||
+ | |||
+ | # criu restore -d -vvv -o restore.log --b && echo OK | ||
+ | OK | ||
+ | Or | ||
+ | # criu restore -d -vvv -o restore.log --binary-log && echo OK | ||
+ | OK | ||
+ | |||
+ | === The Decoder === | ||
+ | |||
+ | The decoder is placed in the crit folder. The decoder takes the input as the file that needs to be processed and the file where the output needs to be presented. The decoder reads the file and passes the format string and the arguments through a printf family function. The logs are then processed in this function, and the output is presented in the output file provided. Whenever the decoder encounters logs that are not printed by the flog printer into the file, it merely copies it into the output file as it is. | ||
+ | |||
+ | '''To Decode:''' | ||
+ | # crit/decoder dump.log decoded_dump.log && echo OK | ||
+ | OK | ||
+ | |||
+ | === Improvements and Pending tasks === | ||
+ | |||
+ | This project was added to criu as a part of [https://summerofcode.withgoogle.com/projects/#5262021429297152 GSoC 2020]. These are a few issues faced and left unresolved during the duration. | ||
+ | |||
+ | ==== Aarch64 and arm issue ==== | ||
+ | |||
+ | Currently, the binary logging is not supported in the aarch64 architecture and the arm architecture. | ||
+ | The decoder collects all the arguments in an array. This array is then typecast to void* and passed in the place of va_list to vsnprintf (The printf family function used to format this raw data). | ||
+ | This va_list is accepted as type void* in all other architectures but causes issues with the aarch64 and arm architecture, causing the decoder to fail. | ||
+ | |||
+ | ==== _Generic function issue ==== | ||
+ | |||
+ | Currently, the binary logger is supported for GCC versions greater than C11. This limitation is caused due to the usage of _Generic function in the binary logging utility. This function is needed to return the type of the argument passed as input to the printf. This function was added to GCC and is supported from version C11 and forth. | ||
+ | |||
+ | ==== Add timestamps and PID ==== | ||
+ | |||
+ | Currently, the logs are being stored in the file without any other formatting added to it. The logs printed via the regular printer, in addition to the expected print statements, add timestamp and PID into the log file. This feature is yet to be added to the binary printer. | ||
+ | |||
+ | ==== Optimize using .rodata ==== | ||
+ | |||
+ | To further optimize the binary logger's performance, the string being stored in the file during logging could be referenced using .rodata. | ||
== Log messages structuring == | == Log messages structuring == |
Revision as of 11:33, 31 August 2020
Logging in CRIU should obey the following rules
- Default logging should provide enough data for typical investigation of "can't dump"/"can't restore"
- It should be possible to add developer-only debugging to see some more details
- It should be possible to shut the logger up completely (maybe by specifying the log file as /dev/null)
Binary logging
The Binary Logging Utility
The Previous Implementation
The logging facility available in criu uses printf family functions to store the logs into the file. The issue with the printf family functions is that they take much time to execute. These functions first scan the input string for format specifiers. The arguments passed are then typecast to string and then copied into the input string. This log file is also very often left unused unless some failure occurs.
The Current Implementation
The binary logger works such that it dumps the raw input string with all its arguments into a file. This implementation saves the time required to scan the input string and also the time consumed in coping these arguments into the string. It initially stores all the information in a buffer, and after the buffer fills up, its contents are dumped into a file. This file can later be passed into the decoder, which works independently of the criu execution, and can be read whenever needed.
The binary logging thus saves the CPU execution time during the running of the process, and since these logs are rarely used unless some failure occurs, this conserves a significant amount of time. Although the overall time spent by the binary logger, when running with the decoder, is similar to the regular printer's time.
Working of Binary Logger
Criu currently supports both, logging using the regular printer and logging in binary. Adding a -b or --binary-log
option at the end of a criu command allows it to use the binary printer and log in binary.
The simple loop page shows how to dump a trivial program. To store the logs generated during this process in binary, these are the changes are needed:
Dump it:
# criu dump -t 2221 -vvv -o dump.log -b && echo OK OK
Or
# criu dump -t 2221 -vvv -o dump.log --binary-log && echo OK OK
Restore:
# criu restore -d -vvv -o restore.log --b && echo OK OK
Or
# criu restore -d -vvv -o restore.log --binary-log && echo OK OK
The Decoder
The decoder is placed in the crit folder. The decoder takes the input as the file that needs to be processed and the file where the output needs to be presented. The decoder reads the file and passes the format string and the arguments through a printf family function. The logs are then processed in this function, and the output is presented in the output file provided. Whenever the decoder encounters logs that are not printed by the flog printer into the file, it merely copies it into the output file as it is.
To Decode:
# crit/decoder dump.log decoded_dump.log && echo OK OK
Improvements and Pending tasks
This project was added to criu as a part of GSoC 2020. These are a few issues faced and left unresolved during the duration.
Aarch64 and arm issue
Currently, the binary logging is not supported in the aarch64 architecture and the arm architecture. The decoder collects all the arguments in an array. This array is then typecast to void* and passed in the place of va_list to vsnprintf (The printf family function used to format this raw data). This va_list is accepted as type void* in all other architectures but causes issues with the aarch64 and arm architecture, causing the decoder to fail.
_Generic function issue
Currently, the binary logger is supported for GCC versions greater than C11. This limitation is caused due to the usage of _Generic function in the binary logging utility. This function is needed to return the type of the argument passed as input to the printf. This function was added to GCC and is supported from version C11 and forth.
Add timestamps and PID
Currently, the logs are being stored in the file without any other formatting added to it. The logs printed via the regular printer, in addition to the expected print statements, add timestamp and PID into the log file. This feature is yet to be added to the binary printer.
Optimize using .rodata
To further optimize the binary logger's performance, the string being stored in the file during logging could be referenced using .rodata.
Log messages structuring
CRIU generates tons of messages. The most critical to troubleshooting is the pr_err-s. Need to introduce types of errors to make troubleshooting easier. E.g. we have the guide what to do when C/R fails. Need to introduce error types into it and print them before the error text. Suggested types of failures are:
- Restore errors
- Resource ID conflict (mostly valid for non-containers case)
- PID mismatch
- File with the name we need exists
- CGroups exist (?)
- Missing item on restore
- No TCP connection locks (for non netns case)
- Missing session leader (CLI/opt/--shell-job might help)
- Required file doesn't exist
- Resource ID conflict (mostly valid for non-containers case)
- Dump errors
- Unsupported object
- AIO with events
- Corked UDP socket
- Tasks under strace
- Knots in mount points
- Too many of smth (open files) met
- Unsupported object
- Common errors
- Out of memory
- Access denied / permission denied
- Other unexpected/unhandled syscall error
- Error reading/writing images
- Proc file format error (do we really expect this thing?)