[Bug translator/26296] New: delay script-global locking until required

Sourceware - systemtap mailing list
https://sourceware.org/bugzilla/show_bug.cgi?id=26296

            Bug ID: 26296
           Summary: delay script-global locking until required
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: translator
          Assignee: systemtap at sourceware dot org
          Reporter: fche at redhat dot com
  Target Milestone: ---

The scripting language promises atomic execution of handlers that read/write
global variables.  This is implemented by taking read/write locks as
appropriate, early in the probe handler prologue.  It has been repeatedly
observed that this causes perhaps unnecessary overheads (e.g. bug #7033).

We can imagine a change that could maintain the atomic semantics, but handle
the common pattern:

  global bar
  probe foo {
    if(condition) next;
    bar = $var
  }

where a pure filtering predicate that does not read global variables is
expected to frequently skip execution of the critical sections entirely.

Instead of emitting:

prologue:
   lock_all()
body:
   if(condition) goto epilogue;
   bar=$var
epilogue:
   unlock_all()

we could emit:

prologue:
   locked_p = false
body:
   if(condition) goto epilogue;
   if(!locked_p) { lock_all(); locked_p = true; }
   bar=$var
epilogue:
   if (locked_p) unlock_all()

IOW: defer locking to the first moment when any global is actually
read/written, tracking locked-ness in a new context local.  This would need
only a small change to the translator, using only context-free logic.  That
could later be optimized to remove repeated checks/etc. over multiple global
vars in a control-flow / context-aware way.
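
For illustration only, the deferred-locking shape above can be written as a small user-space C sketch.  This is not translator output: lock_all() and unlock_all() here are hypothetical stand-ins that merely count calls, and probe_foo(), bar and var are made-up names mirroring the script example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the lock helpers; they only count calls. */
static int lock_calls = 0;
static int unlock_calls = 0;
static int bar = 0;                  /* stands in for the script global "bar" */

static void lock_all(void)
{
    lock_calls++;
}

static void unlock_all(void)
{
    assert(lock_calls > unlock_calls);   /* never unlock more than we locked */
    unlock_calls++;
}

/* One probe hit: "if (condition) next; bar = $var" with deferred locking. */
static void probe_foo(bool condition, int var)
{
    bool locked_p = false;           /* prologue: nothing locked yet */

    if (condition)
        goto epilogue;               /* filtered out: no lock is ever taken */

    if (!locked_p) {                 /* first global access: lock now */
        lock_all();
        locked_p = true;
    }
    bar = var;

epilogue:
    if (locked_p)                    /* unlock only if we actually locked */
        unlock_all();
}
```

Calling probe_foo(true, 1) leaves both counters at zero, while probe_foo(false, 7) takes and releases the lock exactly once.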

--
You are receiving this mail because:
You are the assignee for the bug.

Re: [Bug translator/26296] New: delay script-global locking until required

Craig Ringer
>
> IOW: defer locking to the first moment when any global is actually
> read/written, tracking locked-ness in a new context local.  This would
> involve
> only a small change to the translator, involving only context-free logic.
> That
> could later be optimized to remove repeated checks/etc. over multiple
> global vars in
> a control-flow / context aware way.
>
>
Even an explicit construct that scopes locking would be handy. Borrow from
Java's "synchronized" perhaps.

The fact that whole probes get locked is a serious limitation for one of my
systemtap use cases, where I inject delays and faults into the target
application. The probe flow is supposed to be something like:

global targets_map;

probe process("foo").mark("some_probe_point") {
  if (pid() in targets_map) {
      kdelay(100000);
  }
}

where kdelay is a simple embedded C wrapper around the kernel function of
the same name. But due to the locking on the global "targets_map", every
hit on "some_probe_point" will block on the lock held by the sleeping
probe. So probes can't inject sleeps or delays to try to trigger race
conditions.

So yes, the ability to take a lock over a narrower scope than the whole
probe would be very desirable.

I've wondered about the feasibility of doing this in embedded C, but
haven't had a chance to explore it properly yet.

This reminds me - is it ever safe to sleep in a systemtap probe, e.g. to
call ksleep()  rather than busy-loop?

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


[Bug translator/26296] delay script-global locking until required

Sourceware - systemtap mailing list
In reply to this post by Sourceware - systemtap mailing list
https://sourceware.org/bugzilla/show_bug.cgi?id=26296

--- Comment #2 from arkady.miasnikov at gmail dot com ---
On Fri, Jul 24, 2020 at 8:23 AM Craig Ringer <[hidden email]> wrote:

>
> >
> > IOW: defer locking to the first moment when any global is actually
> > read/written, tracking locked-ness in a new context local.  This would
> > involve
> > only a small change to the translator, involving only context-free logic.
> > That
> > could later be optimized to remove repeated checks/etc. over multiple
> > global vars in
> > a control-flow / context aware way.
> >
> >
> Even an explicit construct that scopes locking would be handy. Borrow from
> Java's "synchronized" perhaps.
>
> The fact that whole probes get locked is a serious limitation for one of my
> systemtap use cases, where I inject delays and faults into the target
> application. The probe flow is supposed to be something like:
>
> global targets_map;
>
> probe process("foo").mark("some_probe_point") {
>   if (pid() in targets_map) {
>       kdelay(100000);
>   }
> }

Usually there is a lock because of the use of maps/associative
arrays. Use your own C implementation (check the code base I sent you)
... or we can implement inline C support.

You cannot sleep in many probes. Such code does not crash
immediately, but eventually it will.
In some probes it is safe to sleep.

>
> where kdelay is a simple embedded C wrapper around the kernel function of
> the same name. But due to the locking on the global "targets_map", every
> hit on "some_probe_point" will block on the lock held by the sleeping
> probe. So probes can't inject sleeps or delays to try to trigger race
> conditions.
>
> So yes, the ability to take a lock over a narrower scope than the whole
> probe would be very desirable.
>
> I've wondered about the feasibility of doing this in embedded C, but
> haven't had a chance to explore it properly yet.

That's the route you have.

>
> This reminds me - is it ever safe to sleep in a systemtap probe, e.g. to
> call ksleep()  rather than busy-loop?
>
> --

Spinlocks are OK. Calls to sleep() are generally not safe.


[Bug translator/26296] delay script-global locking until required

Sourceware - systemtap mailing list
In reply to this post by Sourceware - systemtap mailing list
https://sourceware.org/bugzilla/show_bug.cgi?id=26296

--- Comment #3 from Frank Ch. Eigler <fche at redhat dot com> ---
> Even an explicit construct that scopes locking would be handy. Borrow from
> Java's "synchronized" perhaps.

If one can come up with easy-to-explain, implementable, safe
semantics, yeah perhaps!

> global targets_map;
>
> probe process("foo").mark("some_probe_point") {
>   if (pid() in targets_map) {
>       kdelay(100000);
>   }
> }

In this example, you need the dual of the subject feature:
release of locks as early as possible.
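
A rough user-space C sketch of that dual, under stated assumptions: lock_map(), unlock_map(), fake_kdelay() and in_targets_map() are hypothetical stand-ins invented for this example, not systemtap runtime functions.  The lock is dropped right after the last access to the global, so the delay runs unlocked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: a flag for the lock on "targets_map" and a
 * fake delay that records whether it ran with the lock still held. */
static bool map_locked = false;
static int delays = 0;
static int delays_under_lock = 0;

static void lock_map(void)
{
    assert(!map_locked);
    map_locked = true;
}

static void unlock_map(void)
{
    assert(map_locked);
    map_locked = false;
}

static void fake_kdelay(void)
{
    delays++;
    if (map_locked)
        delays_under_lock++;
}

/* Toy membership test standing in for "pid() in targets_map". */
static bool in_targets_map(int pid)
{
    return pid == 42;
}

/* One probe hit with the lock dropped right after the last global access. */
static void probe_mark(int pid)
{
    bool hit;

    lock_map();
    hit = in_targets_map(pid);   /* last use of the global in this handler */
    unlock_map();                /* early unlock, before any delay */

    if (hit)
        fake_kdelay();           /* other probe hits are not blocked here */
}
```

With this shape, a handler that is busy delaying no longer holds the lock that other hits on the same probe point need.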


Re: [Bug translator/26296] delay script-global locking until required

Craig Ringer
On Wed, 5 Aug 2020 at 03:59, fche at redhat dot com via Systemtap <
[hidden email]> wrote:

> https://sourceware.org/bugzilla/show_bug.cgi?id=26296
>
> --- Comment #3 from Frank Ch. Eigler <fche at redhat dot com> ---
> > Even an explicit construct that scopes locking would be handy. Borrow from
> > Java's "synchronized" perhaps.
>
> If one can come up with easy-to-explain, implementable, safe
> semantics, yeah perhaps!
>

I'm thinking something like this:

* Explicit locking is scoped to a block
* Locks are acquired against a named global variable
* Within a scope that uses explicit locking, an attempt to access global
variables for which locks have not been explicitly acquired is a semantic
error
* Any exit from a block ("next", "return", throwing an exception, etc.)
releases the lock at escape from the block.
* A warning will be raised during compilation if any given global is
accessed under explicit locking in one part of a script or tapset, but via
implicit probe-level locking in another part.
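
To make the "released on any exit from the block" rule concrete, here is a minimal C sketch.  It assumes the GCC/Clang cleanup attribute, and struct global_lock, SCOPED_LOCK, acquire() and read_target() are names invented for this example; nothing here is existing systemtap syntax or runtime code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-global lock; a real one would be an rwlock. */
struct global_lock { bool held; };

static struct global_lock targets_map_lock = { false };
static int targets_map_value = 5;     /* stands in for the global's data */

static struct global_lock *acquire(struct global_lock *l)
{
    assert(!l->held);
    l->held = true;
    return l;
}

/* Called automatically when the guard variable leaves its block. */
static void release_guard(struct global_lock **guard)
{
    (*guard)->held = false;
}

/* GCC/Clang extension: run release_guard on every exit from the block. */
#define SCOPED_LOCK(name, lock) \
    struct global_lock *name \
        __attribute__((cleanup(release_guard), unused)) = acquire(lock)

/* Returns the value read under the lock, or -1 on the early-exit path. */
static int read_target(bool bail_out)
{
    int seen;

    {
        SCOPED_LOCK(guard, &targets_map_lock);
        if (bail_out)
            return -1;            /* like "next": the lock is still released */
        seen = targets_map_value;
    }                             /* normal exit: lock released here */

    assert(!targets_map_lock.held);
    return seen;
}
```

Both the early return and the normal fall-through leave targets_map_lock released, which is the behaviour the fourth bullet asks for.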

Deadlock protection is a bit interesting. I haven't looked at how systemtap
takes care of that at the moment. If it can detect deadlock and fail
gracefully that's probably sufficient.

Of course it's all handwaving unless I have time to write it, since I don't
get to ask others to. And I'm a bit stuck in C++ error message spam in the
relatively simple patch I wrote for @enum already...


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise

[Bug translator/26296] delay script-global locking until required

Sourceware - systemtap mailing list
In reply to this post by Sourceware - systemtap mailing list
https://sourceware.org/bugzilla/show_bug.cgi?id=26296

Frank Ch. Eigler <fche at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #5 from Frank Ch. Eigler <fche at redhat dot com> ---
commit 25012d82 attempts an algorithmic optimization to the
locking problem.  It should handle both Craig's "early unlock"
and multiple folks' "late lock" needs, without new syntax or
semantics (!).
