Changes

Task-diag (edit)

Revision as of 18:58, 16 February 2016

4,235 bytes added , 18:58, 16 February 2016

created

Line 1: Line 1: +

This article describes a proposed new interface for getting information about running processes (roughly the same info that is now available from <code>/proc/''PID''/*</code> files).

+

== Limitations of /proc/PID interface ==

+

The current interface is a set of files in /proc/PID. While this appears to be simple, there are a number of problems with it.

+

=== Lots of syscalls ===

+

At least three syscalls are required for each PID:

+

<code>open()</code>, <code>read()</code>, and <code>close()</code>.

+

For example, a mere <code>ps ax</code> command performs these 3 syscalls

+

for each of 3 files (<code>stat</code>, <code>status</code>, <code>cmdline</code>)

+

for each process in the system. This results in thousands of syscalls

+

and therefore thousands of user/kernel context switches.
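
The pattern described above can be sketched as follows. This is only an illustration of the open/read/close cost, not the actual <code>ps</code> implementation; the function name <code>read_all_tasks</code> and the <code>proc_root</code> parameter are assumptions made for this example.

```python
import os

# The three per-process files that `ps ax` reads, according to the text above.
FILES = ("stat", "status", "cmdline")


def read_all_tasks(proc_root="/proc"):
    """Read FILES for every numeric directory under proc_root.

    Returns (data, syscalls): data maps (pid, file name) to raw bytes,
    syscalls counts the successful open(), read() and close() calls.
    """
    data = {}
    syscalls = 0
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        for fname in FILES:
            path = os.path.join(proc_root, name, fname)
            try:
                fd = os.open(path, os.O_RDONLY)
            except OSError:
                continue  # the process may have exited in the meantime
            syscalls += 1  # open()
            try:
                data[(int(name), fname)] = os.read(fd, 65536)
                syscalls += 1  # read()
            finally:
                os.close(fd)
                syscalls += 1  # close()
    return data, syscalls
```

Calling <code>read_all_tasks()</code> on a Linux system returns the raw file contents together with the number of syscalls spent on them, which grows as nine per process (three files, three calls each).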

+

=== Variety of formats ===

+

There are many different formats used by files in the <code>/proc/''PID''/</code> hierarchy. Therefore, a separate parser has to be written for each format.
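
To illustrate the point, here is a minimal sketch of three parsers for three of these files. The function names are made up for this example, and only a few fields are extracted: <code>stat</code> is a single space-separated line with the command name in parentheses, <code>status</code> is a list of <code>Key: value</code> lines, and <code>cmdline</code> is a NUL-separated byte string.

```python
def parse_stat(text):
    """/proc/PID/stat: one line, space separated, comm in parentheses."""
    lpar = text.index("(")
    rpar = text.rindex(")")  # comm itself may contain spaces or ')'
    rest = text[rpar + 1:].split()
    return {
        "pid": int(text[:lpar]),
        "comm": text[lpar + 1:rpar],
        "state": rest[0],
        "ppid": int(rest[1]),
        "pgrp": int(rest[2]),
        "sid": int(rest[3]),
    }


def parse_status(text):
    """/proc/PID/status: one 'Key:<tab>value' pair per line."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields


def parse_cmdline(raw):
    """/proc/PID/cmdline: arguments separated (and terminated) by NUL."""
    if raw.endswith(b"\0"):
        raw = raw[:-1]
    if not raw:
        return []
    return [arg.decode(errors="replace") for arg in raw.split(b"\0")]
```

Each function needs its own knowledge of the file layout, and none of that logic can be shared between them.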

+

=== Not enough information ===

+

Example: <code>/proc/''PID''/fd/</code> doesn't contain file open flags or current position,

+

so we had to introduce <code>/proc/''PID''/fdinfo/</code>.
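
As a small illustration, the position and open flags of a descriptor have to be parsed out of the separate <code>fdinfo</code> file. The sketch below assumes the usual <code>pos:</code> and <code>flags:</code> lines, with flags printed in octal; the helper name <code>parse_fdinfo</code> is made up for this example.

```python
def parse_fdinfo(text):
    """Extract the file position and open flags from fdinfo contents."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return {
        "pos": int(info["pos"]),
        "flags": int(info["flags"], 8),  # flags are printed in octal
    }
```

To get the complete picture for one descriptor, a tool therefore has to combine a <code>readlink()</code> on <code>/proc/''PID''/fd/''N''</code> with a read of <code>/proc/''PID''/fdinfo/''N''</code>.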

+

=== Non-extendable formats ===

+

Some formats in /proc/PID are non-extendable. For example,

+

the last column (file name) of <code>/proc/''PID''/maps</code> is optional,

+

therefore there is no way to add more columns without breaking the format.
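
The problem can be shown with a minimal parser sketch (the function name <code>parse_maps_line</code> is made up for this example). A typical parser treats everything after the fifth field as the file name, so a hypothetical new column appended to a line without a file name would be mistaken for one.

```python
def parse_maps_line(line):
    """Parse one line of /proc/PID/maps; the trailing file name is optional."""
    parts = line.rstrip("\n").split(None, 5)
    addr, perms, offset, dev, inode = parts[:5]
    start, end = (int(x, 16) for x in addr.split("-"))
    return {
        "start": start,
        "end": end,
        "perms": perms,
        "offset": int(offset, 16),
        "dev": dev,
        "inode": int(inode),
        # Anything after the inode is taken as the file name.
        "path": parts[5] if len(parts) > 5 else None,
    }


# An anonymous mapping has no file name ...
anon = parse_maps_line("7ffd0000-7ffd1000 rw-p 00000000 00:00 0\n")
# ... so a hypothetical extra column ("sd" is an invented value here)
# would be read as a file name by existing parsers.
extended = parse_maps_line("7ffd0000-7ffd1000 rw-p 00000000 00:00 0 sd\n")
```

Here <code>anon["path"]</code> is <code>None</code>, while <code>extended["path"]</code> is the invented extra column, which is exactly the ambiguity that prevents extending the format.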

+

=== Slow read due to extra info ===

+

Sometimes getting information is slow due to extra attributes

+

that are not always needed. For example, <code>/proc/''PID''/smaps</code>

+

contains the <code>VmFlags</code> field (which can't be added

+

to <code>/proc/''PID''/maps</code>, see previous item),

+

but it also contains page stats that take a long time to generate.

+

<pre>

+

$ time cat /proc/*/maps > /dev/null

+

real 0m0.061s

+

user 0m0.002s

+

sys 0m0.059s

+

$ time cat /proc/*/smaps > /dev/null

+

real 0m0.253s

+

user 0m0.004s

+

sys 0m0.247s

+

</pre>

+

== Proposed solution ==

+

Proposed is the <code>/proc/task_diag</code> file, which operates based on the following principles:

+

* Transactional: write request, read response

+

* Netlink message format (same as used by sock_diag; binary and extendable)

+

* Ability to specify a set of processes to get info about

+

** TASK_DIAG_DUMP_ALL: dump all processes

+

** TASK_DIAG_DUMP_ALL_THREAD: dump all threads

+

** TASK_DIAG_DUMP_CHILDREN: dump children of a specified task

+

** TASK_DIAG_DUMP_THREAD: dump threads of a specified task

+

** TASK_DIAG_DUMP_ONE: dump one task

+

* Optimal grouping of attributes

+

** No attribute in a group should slow down the response for the rest of that group
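
To show why the netlink attribute format is considered binary and extendable, here is a minimal sketch of packing and parsing netlink-style attributes (a 16-bit length, a 16-bit type, then a payload padded to 4 bytes). The attribute type numbers below are invented for illustration and are not the real task_diag values.

```python
import struct

NLA_HDR = struct.Struct("=HH")  # nla_len, nla_type in host byte order

# Hypothetical attribute types, for illustration only.
ATTR_PID = 1
ATTR_COMM = 2
ATTR_FUTURE = 99  # something added by a newer kernel


def nla_align(n):
    """Round n up to the 4-byte netlink attribute alignment."""
    return (n + 3) & ~3


def pack_attr(atype, payload):
    """Build one attribute: header, payload, zero padding."""
    length = NLA_HDR.size + len(payload)  # length excludes the padding
    return NLA_HDR.pack(length, atype) + payload + b"\0" * (nla_align(length) - length)


def parse_attrs(buf):
    """Return {type: payload} for every well-formed attribute in buf."""
    attrs = {}
    off = 0
    while off + NLA_HDR.size <= len(buf):
        length, atype = NLA_HDR.unpack_from(buf, off)
        if length < NLA_HDR.size or off + length > len(buf):
            break  # truncated or corrupt attribute
        attrs[atype] = buf[off + NLA_HDR.size:off + length]
        off += nla_align(length)
    return attrs
```

A reader that only knows <code>ATTR_PID</code> and <code>ATTR_COMM</code> can look those up in the parsed result and ignore <code>ATTR_FUTURE</code>, because every attribute carries its own length. This is the property the text formats in <code>/proc/''PID''/</code> lack.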

+

The following groups are proposed:

+

* TASK_DIAG_BASE

+

: PID, PGID, SID, TID, comm

+

* TASK_DIAG_CRED

+

: UID, GID, groups, capabilities

+

* TASK_DIAG_STAT

+

: per-task and per-process statistics (same as taskstats, not available in /proc)

+

* TASK_DIAG_VMA

+

: mapped memory regions and their access permissions (same as maps)

+

* TASK_DIAG_VMA_STAT

+

: memory consumption for each mapping (same as smaps)

+

=== Performance measurements ===

+

==== Get pid, tid, pgid and comm for 50000 processes ====

+

Existing interface:

+

<pre>

+

$ time ./task_proc_all a

+

real 0m0.279s

+

user 0m0.013s

+

sys 0m0.255s

+

</pre>

+

New interface:

+

<pre>

+

$ time ./task_diag_all a

+

real 0m0.051s

+

user 0m0.001s

+

sys 0m0.049s

+

</pre>

+

==== Using perf tool ====

+

The following is a quote from an email by David Ahern:

+

<pre>

+

> Using the fork test command:

+

> 10,000 processes; 10k proc with 5 threads = 50,000 tasks

+

> reading /proc: 11.3 sec

+

> task_diag: 2.2 sec

+

>

+

> @7,440 tasks, reading /proc is at 0.77 sec and task_diag at 0.096

+

>

+

> 128 instances of sepcjbb, 80,000+ tasks:

+

> reading /proc: 32.1 sec

+

> task_diag: 3.9 sec

+

>

+

> So overall much snappier startup times.

+

</pre>
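
As a rough worked check, the speedups implied by the numbers quoted in this section can be computed directly. The labels are just descriptions of the measurements above; no new data is introduced.

```python
# (old interface seconds, task_diag seconds), taken from the figures above.
measurements = {
    "task_proc_all vs task_diag_all, real time": (0.279, 0.051),
    "perf, 50,000 tasks": (11.3, 2.2),
    "perf, 7,440 tasks": (0.77, 0.096),
    "perf, 80,000+ tasks": (32.1, 3.9),
}


def speedup(old, new):
    """How many times faster the new interface is than the old one."""
    return old / new


for label, (old, new) in measurements.items():
    print(f"{label}: {speedup(old, new):.1f}x")
```

In all four cases the new interface comes out roughly 5 to 8 times faster.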

+

== Alternative (bad) solutions ==

+

The following information is only interesting in a historical context.

+

=== task_diag netlink socket ===

+

This was the original proposal -- create something very similar to sock_diag (aka tcp_diag aka inet_diag).

+

It turned out to be a bad one because:

+

* It's not obvious where to get pid and user namespaces

+

* It's impossible to restrict netlink sockets:

+

** Credentials are saved when a socket is created

+

** Process can drop privileges, but netlink doesn't care

+

** The same socket can be used to get process attributes and to set IP addresses

+

== See also ==

−

~~Pending work on the "~~[[~~upstream~~ kernel commits]]~~" pages.~~

+

* [[Upstream kernel commits]]

[[Category:Development]]

−

~~[[Category:Empty articles]]~~

Kir

Bureaucrats, Administrators

1,072

edits
