Difference between revisions of "Better logging"

Revision as of 13:47, 18 August 2017

Logging in CRIU should obey the following rules:

  1. Default logging should provide enough data for typical investigation of "can't dump"/"can't restore"
  2. It should be possible to add developer-only debugging to see some more details
  3. It should be possible to shut the logger up completely (maybe by specifying the log file as /dev/null)

Binary logging

Let's try to implement the idea of putting log arguments into a buffer in binary form and flushing the buffer at the end if necessary.

Most of the time spent in sprintf goes into converting arguments to strings (e.g. %d). Scanning the format string comes next, and copying the string itself comes last. If we manage to eliminate at least the first portion, that would be great.
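A minimal sketch of the idea, not CRIU code: every name below (blog_rec, blog_add, blog_flush and the BLOG_* limits) is hypothetical. It assumes that each argument is passed as a long and that the format string is a literal which stays valid until the flush. String arguments (%s) are not covered, since the pointed-to data may change before the flush.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical limits, chosen only for the sketch. */
#define BLOG_MAX_ARGS 4
#define BLOG_CAPACITY 1024

struct blog_rec {
	const char *fmt;          /* pointer to the format literal, not a copy */
	long args[BLOG_MAX_ARGS]; /* raw argument values, not yet formatted */
};

static struct blog_rec blog_buf[BLOG_CAPACITY];
static size_t blog_used;

/* Hot path: remember the format pointer and the raw integers only. */
static int blog_add(const char *fmt, unsigned int nargs, ...)
{
	struct blog_rec *r;
	unsigned int i;
	va_list ap;

	assert(nargs <= BLOG_MAX_ARGS);
	if (blog_used == BLOG_CAPACITY)
		return -1; /* buffer is full, the record is dropped */

	r = &blog_buf[blog_used++];
	r->fmt = fmt;
	memset(r->args, 0, sizeof(r->args));

	va_start(ap, nargs);
	for (i = 0; i < nargs; i++)
		r->args[i] = va_arg(ap, long); /* assumption: callers pass long */
	va_end(ap);
	return 0;
}

/* Cold path: convert to text only when somebody asks for the log. */
static size_t blog_flush(char *out, size_t size)
{
	size_t i, off = 0;

	if (size)
		out[0] = '\0';

	for (i = 0; i < blog_used && off < size; i++) {
		const struct blog_rec *r = &blog_buf[i];
		/* Arguments the format does not consume are ignored by printf. */
		int n = snprintf(out + off, size - off, r->fmt,
				 r->args[0], r->args[1], r->args[2], r->args[3]);

		if (n < 0)
			break;
		off += (size_t)n;
		if (off >= size) { /* output was truncated */
			off = size - 1;
			break;
		}
	}

	blog_used = 0;
	return off;
}
```

With this layout a call such as blog_add("dumping pid %ld\n", 1, 42L) costs a few stores, and the %ld conversion happens only if blog_flush() is ever called, e.g. when the dump or restore fails.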

Log messages structuring

CRIU generates tons of messages. The most critical ones for troubleshooting are the pr_err messages. We need to introduce error types to make troubleshooting easier. E.g. we have a guide on what to do when C/R fails; the error types should be added to that guide and printed before the error text. Suggested types of failures are:

  • Restore errors
    • Resource ID conflict (mostly valid for non-containers case)
      • PID mismatch
      • File with the name we need exists
      • CGroups exist (?)
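    • Missing item on restore
      • No TCP connection locks (for non netns case)
      • Missing session leader (the --shell-job option might help)
      • Required file doesn't exist
  • Dump errors
    • Unsupported object
      • AIO with events
      • Corked UDP socket
      • Tasks under strace
      • Knots in mount points
      • Too many objects of some kind (e.g. open files) encountered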
  • Common errors
    • Out of memory
    • Access denied / permission denied
    • Other unexpected/unhandled syscall error
    • Error reading/writing images
    • Proc file format error (do we really expect this thing?)
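One way the error types from the list above could be printed before the error text is sketched below. This is only an illustration: the enum, the tag strings, the "Error [tag]: " prefix and format_typed_err() are all hypothetical and do not exist in CRIU.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical error classes following the list above. */
enum err_class {
	ERR_RST_ID_CONFLICT,
	ERR_RST_MISSING_ITEM,
	ERR_DUMP_UNSUPPORTED,
	ERR_COMMON_NOMEM,
	ERR_COMMON_ACCESS,
	ERR_COMMON_SYSCALL,
	ERR_COMMON_IMAGE_IO,
	ERR_COMMON_PROC_FMT,
	ERR_CLASS_MAX,
};

/* Short tags that a troubleshooting guide could be indexed by. */
static const char *const err_class_tag[ERR_CLASS_MAX] = {
	[ERR_RST_ID_CONFLICT]  = "restore/id-conflict",
	[ERR_RST_MISSING_ITEM] = "restore/missing-item",
	[ERR_DUMP_UNSUPPORTED] = "dump/unsupported",
	[ERR_COMMON_NOMEM]     = "common/no-memory",
	[ERR_COMMON_ACCESS]    = "common/access-denied",
	[ERR_COMMON_SYSCALL]   = "common/syscall",
	[ERR_COMMON_IMAGE_IO]  = "common/image-io",
	[ERR_COMMON_PROC_FMT]  = "common/proc-format",
};

/*
 * Writes "Error [<tag>]: <message>" into out.
 * Returns the length the full text needs (as snprintf does),
 * or a negative value on a formatting error.
 */
static int format_typed_err(char *out, size_t size, enum err_class cls,
			    const char *fmt, ...)
{
	const char *tag = "unknown";
	va_list ap;
	int n, m;

	if ((unsigned int)cls < ERR_CLASS_MAX && err_class_tag[cls])
		tag = err_class_tag[cls];

	n = snprintf(out, size, "Error [%s]: ", tag);
	if (n < 0 || (size_t)n >= size)
		return n; /* error, or no room left for the message */

	va_start(ap, fmt);
	m = vsnprintf(out + n, size - (size_t)n, fmt, ap);
	va_end(ap);

	return m < 0 ? m : n + m;
}
```

A fixed tag in front of the message lets a reader of the log jump straight to the matching section of the troubleshooting guide without parsing the free-form text.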