Difference between revisions of "Better logging"

From CRIU
m (Xemul moved page Logging to Better logging without leaving a redirect: This describes what we want, not what we have not)
Line 19: Line 19:
  *** PID mismatch
  *** File with the name we need exists
- *** CGroup exists (?)
+ *** [[CGroups]] exist (?)
  ** Missing item on restore
- *** No TCP lock (for non netns case)
+ *** No [[TCP connection]] locks (for non netns case)
- *** Missing session leader (--shell-job might help)
+ *** Missing session leader ([[CLI/opt/--shell-job]] might help)
  *** Required file doesn't exist
  * Dump errors
- ** Unsupported object
+ ** [[What cannot be checkpointed|Unsupported object]]
- *** AIO with events
+ *** [[AIO]] with events
  *** Corked UDP socket
  *** Tasks under strace
- *** Knots in mount points
+ *** Knots in [[mount points]]
  *** Too many of smth (open files) met
  * Common errors
Line 35: Line 35:
  ** Access denied / permission denied
  ** Other unexpected/unhandled syscall error
- ** Error reading/writing image files
+ ** Error reading/writing [[images]]
  ** Proc file format error (do we really expect this thing?)

Revision as of 08:54, 17 November 2016

Logging in CRIU should obey the following rules:

  1. Default logging should provide enough data for typical investigation of "can't dump"/"can't restore"
  2. It should be possible to add developer-only debugging to see some more details
  3. It should be possible to shut the logger up completely (maybe by specifying the log file as /dev/null)
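
One way to picture the three rules is a logger with a level threshold and a replaceable sink. The sketch below is only an illustration: `struct logger`, `enum log_level`, `log_enabled()` and `log_msg()` are hypothetical names made up for this example and are not CRIU's actual logging API.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical levels: higher value means more verbose output. */
enum log_level { LVL_OFF, LVL_ERROR, LVL_INFO, LVL_DEBUG };

/* Hypothetical logger state: a verbosity threshold and an output sink. */
struct logger {
	enum log_level threshold; /* most verbose level that is still printed */
	FILE *sink;               /* where messages go, e.g. a log file */
	unsigned long emitted;    /* number of messages actually written */
};

/* A message is printed only if its level is within the threshold. */
static int log_enabled(const struct logger *lg, enum log_level lvl)
{
	assert(lg != NULL);
	return lg->sink != NULL && lvl != LVL_OFF && lvl <= lg->threshold;
}

/* Write a preformatted message if its level passes the filter. */
static void log_msg(struct logger *lg, enum log_level lvl, const char *msg)
{
	if (!log_enabled(lg, lvl))
		return;
	fputs(msg, lg->sink);
	lg->emitted++;
}
```

Under these assumptions, a default threshold of `LVL_INFO` would cover rule 1, raising it to `LVL_DEBUG` would cover rule 2, and either setting it to `LVL_OFF` or opening /dev/null as the sink would cover rule 3.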

Binary logging

Let's try to implement the idea of putting log arguments in binary form into a buffer and flushing it at the end if necessary.

Most of the time spent in sprintf goes to converting arguments to strings (e.g. %d). Next comes scanning the format string, and last comes copying the string itself. If we manage to eliminate at least the first portion, that would be great.
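
A minimal sketch of this idea, under stated assumptions: the names `blog_rec`, `blog_add()` and `blog_flush()` are hypothetical, every argument must be passed as a `long`, and only the `%ld` conversion is understood. The hot path stores a pointer to the format string plus the raw argument values. The conversion to text happens only if the buffer is actually flushed.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define BLOG_MAX_ARGS 4
#define BLOG_MAX_RECS 64

/* One buffered record: the format pointer and raw argument values. */
struct blog_rec {
	const char *fmt;          /* must outlive the buffer, e.g. a literal */
	int nargs;
	long args[BLOG_MAX_ARGS]; /* stored in binary form, not as text */
};

static struct blog_rec blog_buf[BLOG_MAX_RECS];
static int blog_nr;

/* Hot path: no string conversion, only copying of raw values. */
static void blog_add(const char *fmt, int nargs, ...)
{
	struct blog_rec *r;
	va_list ap;
	int i;

	assert(nargs >= 0 && nargs <= BLOG_MAX_ARGS);
	if (blog_nr == BLOG_MAX_RECS)
		return; /* sketch only: drop records on overflow */

	r = &blog_buf[blog_nr++];
	r->fmt = fmt;
	r->nargs = nargs;
	va_start(ap, nargs);
	for (i = 0; i < nargs; i++)
		r->args[i] = va_arg(ap, long); /* callers must pass longs */
	va_end(ap);
}

/* Cold path: render all buffered records into out, then reset. */
static size_t blog_flush(char *out, size_t size)
{
	size_t off = 0;
	int i;

	if (size == 0)
		return 0;
	for (i = 0; i < blog_nr; i++) {
		const struct blog_rec *r = &blog_buf[i];
		const char *p = r->fmt;
		int arg = 0;

		while (*p && off + 1 < size) {
			if (strncmp(p, "%ld", 3) == 0 && arg < r->nargs) {
				int n = snprintf(out + off, size - off, "%ld",
						 r->args[arg++]);
				if (n < 0 || (size_t)n >= size - off) {
					off = size - 1; /* output truncated */
					break;
				}
				off += (size_t)n;
				p += 3;
			} else {
				out[off++] = *p++;
			}
		}
	}
	out[off] = '\0';
	blog_nr = 0;
	return off;
}
```

A call would look like `blog_add("dumping pid %ld\n", 1, (long)pid)`. If the dump succeeds and nobody asks for the log, `blog_flush()` is never called and no argument is ever converted to a string.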

Log messages structuring

CRIU generates tons of messages. The most critical for troubleshooting are the pr_err-s. We need to introduce types of errors to make troubleshooting easier. E.g. we have the guide on what to do when C/R fails; we need to introduce error types into it and print them before the error text. Suggested types of failures are:

  • Restore errors
    • Resource ID conflict (mostly valid for non-containers case)
      • PID mismatch
      • File with the name we need exists
      • CGroups exist (?)
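    • Missing item on restore
      • No TCP connection locks (for non netns case)
      • Missing session leader (--shell-job might help)
      • Required file doesn't exist
  • Dump errors
    • Unsupported object
      • AIO with events
      • Corked UDP socket
      • Tasks under strace
      • Knots in mount points
      • Too many of smth (open files) met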
  • Common errors
    • Out of memory
    • Access denied / permission denied
    • Other unexpected/unhandled syscall error
    • Error reading/writing images
    • Proc file format error (do we really expect this thing?)
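
Printing the type before the error text could look roughly like the sketch below. The enum values, the tag strings and `format_typed_err()` are all hypothetical names chosen for this example. They mirror the second level of the list above and are not existing CRIU code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error classes, one per second-level item of the list. */
enum err_class {
	ERR_RESTORE_ID_CONFLICT,
	ERR_RESTORE_MISSING_ITEM,
	ERR_DUMP_UNSUPPORTED,
	ERR_COMMON_NOMEM,
	ERR_COMMON_ACCESS,
	ERR_COMMON_SYSCALL,
	ERR_COMMON_IMAGE_IO,
	ERR_COMMON_PROC_FORMAT,
	ERR_CLASS_MAX
};

/* Short tags a troubleshooting guide could be indexed by. */
static const char *const err_class_tag[ERR_CLASS_MAX] = {
	[ERR_RESTORE_ID_CONFLICT]  = "restore/id-conflict",
	[ERR_RESTORE_MISSING_ITEM] = "restore/missing-item",
	[ERR_DUMP_UNSUPPORTED]     = "dump/unsupported-object",
	[ERR_COMMON_NOMEM]         = "common/out-of-memory",
	[ERR_COMMON_ACCESS]        = "common/access-denied",
	[ERR_COMMON_SYSCALL]       = "common/syscall-error",
	[ERR_COMMON_IMAGE_IO]      = "common/image-io",
	[ERR_COMMON_PROC_FORMAT]   = "common/proc-format",
};

/* Format "Error [<type>]: <text>" into buf; returns snprintf's result. */
static int format_typed_err(char *buf, size_t size, enum err_class c,
			    const char *msg)
{
	assert((unsigned)c < (unsigned)ERR_CLASS_MAX);
	assert(strlen(msg) > 0);
	return snprintf(buf, size, "Error [%s]: %s\n", err_class_tag[c], msg);
}
```

With such a prefix, a user who sees `Error [restore/id-conflict]: PID mismatch` can jump straight to the matching section of the guide instead of searching for the message text.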