Logging in CRIU should obey the following rules:
- Default logging should provide enough data for a typical investigation of "can't dump" / "can't restore" failures
- It should be possible to add developer-only debugging output that shows more details
- It should be possible to silence the logger completely (maybe by specifying /dev/null as the log file)
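The three rules could map onto a level-gated logger roughly as in this minimal sketch. It assumes four verbosity levels, with DEBUG as the developer-only one; the names (`log_init`, `log_enabled`, `log_msg`, `LVL_*`) are hypothetical and are not CRIU's actual logging API.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical verbosity levels; LVL_DEBUG is the developer-only one. */
enum log_level { LVL_ERROR, LVL_WARN, LVL_INFO, LVL_DEBUG };

/* Default verbosity: enough for "can't dump" / "can't restore" cases. */
static enum log_level cur_level = LVL_INFO;
static FILE *log_out;

/* Open the log file. Passing "/dev/null" keeps every call working but drops all output. */
static int log_init(const char *path, enum log_level level)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (log_out)
		fclose(log_out);
	log_out = f;
	cur_level = level;
	return 0;
}

/* True if a message of this level would be written at the current verbosity. */
static bool log_enabled(enum log_level level)
{
	return log_out != NULL && level <= cur_level;
}

static void log_msg(enum log_level level, const char *fmt, ...)
{
	va_list args;

	assert(fmt != NULL);
	if (!log_enabled(level))
		return; /* above the configured verbosity: skip formatting entirely */
	va_start(args, fmt);
	vfprintf(log_out, fmt, args);
	va_end(args);
}
```

Checking the level before formatting means a disabled debug message costs only a comparison, and pointing the logger at /dev/null silences it without special-casing every call site.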
Binary logging
Let's try to implement the idea of putting log arguments into a buffer in binary form and flushing the buffer at the end if necessary.
Most of the time spent in sprintf goes to converting arguments to strings (e.g. %d). Next comes scanning the format string, and last comes copying the string itself. If we manage to eliminate at least the first part, that would be great.
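A minimal sketch of the binary logging idea, under simplifying assumptions that are ours, not CRIU's: every conversion is `%ld`, a message has at most four arguments, and format strings are literals that outlive the buffer. The `blog_*` ("binary log") names are hypothetical. The hot path copies a pointer and a few integers, so it skips all three sprintf costs; the expensive formatting happens only in `blog_flush`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define BLOG_MAX_ARGS 4
#define BLOG_MAX_RECS 1024

/* One buffered message: the format pointer plus raw argument values. */
struct blog_rec {
	const char *fmt;          /* assumed to be a string literal (static lifetime) */
	long args[BLOG_MAX_ARGS]; /* assumption: every conversion in fmt is %ld */
};

static struct blog_rec blog_buf[BLOG_MAX_RECS];
static size_t blog_count;

/*
 * Hot path: no format scanning, no number-to-text conversion, no string copy.
 * Callers must pass exactly nargs arguments of type long (e.g. 42L).
 */
static void blog_log(const char *fmt, int nargs, ...)
{
	struct blog_rec *r;
	va_list ap;
	int i;

	assert(nargs >= 0 && nargs <= BLOG_MAX_ARGS);
	if (blog_count == BLOG_MAX_RECS)
		return; /* buffer full: this sketch simply drops the record */
	r = &blog_buf[blog_count++];
	r->fmt = fmt;
	memset(r->args, 0, sizeof(r->args));
	va_start(ap, nargs);
	for (i = 0; i < nargs; i++)
		r->args[i] = va_arg(ap, long);
	va_end(ap);
}

/*
 * Slow path, run only when the text is actually needed (e.g. on failure):
 * render every record into out, truncating if it does not fit, then reset.
 */
static void blog_flush(char *out, size_t size)
{
	size_t used = 0, i;

	if (size > 0)
		out[0] = '\0';
	for (i = 0; i < blog_count && used < size; i++) {
		const struct blog_rec *r = &blog_buf[i];
		/* Surplus arguments beyond those fmt consumes are ignored by snprintf. */
		int n = snprintf(out + used, size - used, r->fmt,
				 r->args[0], r->args[1], r->args[2], r->args[3]);

		if (n < 0)
			break;
		used += (size_t)n;
	}
	blog_count = 0;
}
```

Storing the format pointer instead of the text is what keeps logging cheap, but it is also why the format must have static lifetime. `%s` arguments would need the same guarantee or an explicit copy, and a real implementation would need per-argument type information instead of assuming `long`.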
Log messages structuring
CRIU generates tons of messages. The most critical ones for troubleshooting are the pr_err-s. We need to introduce error types to make troubleshooting easier. E.g. we have a guide on what to do when C/R fails; the error types should be introduced into it and printed before the error text. Suggested types of failures are:
- Restore errors
  - Resource ID conflict (mostly valid for non-containers case)
    - PID mismatch
    - File with the name we need exists
    - CGroup exists (?)
  - Missing item on restore
    - No TCP lock (for non-netns case)
    - Missing session leader (--shell-job might help)
    - Required file doesn't exist
- Dump errors
  - Unsupported object
    - AIO with events
    - Corked UDP socket
    - Tasks under strace
    - Knots in mount points
  - Too many of something (e.g. open files) met
- Common errors
  - Out of memory
  - Access denied / permission denied
  - Other unexpected/unhandled syscall error
  - Error reading/writing image files
  - Proc file format error (do we really expect this thing?)
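One way the error types could be wired in, sketched with a hypothetical enum mirroring the categories listed above and a helper that prefixes the error text with a type tag. None of these names (`err_type`, `ET_*`, `err_type_tag`, `format_typed_err`) exist in CRIU; in practice the prefix could instead be folded into `pr_err` itself so that every error message carries its type.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error types mirroring the categories of failures listed above. */
enum err_type {
	ET_RST_RESOURCE_CONFLICT,
	ET_RST_MISSING_ITEM,
	ET_DMP_UNSUPPORTED,
	ET_DMP_TOO_MANY,
	ET_NOMEM,
	ET_ACCESS,
	ET_SYSCALL,
	ET_IMAGE_IO,
	ET_PROC_FORMAT,
	ET_NR, /* number of types, not a real type */
};

/* Tag printed before the error text; the troubleshooting guide could be indexed by it. */
static const char *const err_type_tag[ET_NR] = {
	[ET_RST_RESOURCE_CONFLICT] = "restore: resource ID conflict",
	[ET_RST_MISSING_ITEM]      = "restore: missing item",
	[ET_DMP_UNSUPPORTED]       = "dump: unsupported object",
	[ET_DMP_TOO_MANY]          = "dump: too many objects",
	[ET_NOMEM]                 = "out of memory",
	[ET_ACCESS]                = "access denied",
	[ET_SYSCALL]               = "unexpected syscall error",
	[ET_IMAGE_IO]              = "image file I/O error",
	[ET_PROC_FORMAT]           = "proc file format error",
};

/*
 * Format "Error (<tag>): <message>" into buf. Returns what snprintf returns,
 * i.e. the untruncated length; the message part itself is capped at 255 chars.
 */
static int format_typed_err(char *buf, size_t size, enum err_type type,
			    const char *fmt, ...)
{
	char msg[256];
	va_list ap;

	assert((unsigned)type < (unsigned)ET_NR);
	va_start(ap, fmt);
	vsnprintf(msg, sizeof(msg), fmt, ap);
	va_end(ap);
	return snprintf(buf, size, "Error (%s): %s", err_type_tag[type], msg);
}
```

A fixed tag string makes the logs easy to grep and lets each failure be mapped to the matching section of the C/R troubleshooting guide.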