Systemtap does not clean up properly

corpaul
We use Systemtap in a continuous integration environment (Jenkins). Sometimes a job is aborted manually, and it seems the Systemtap kernel module is not unloaded when that happens. As a result, several stap modules remain loaded after jobs are aborted, and the system runs out of memory.

Is there a way to prevent/fix this?

--CP

Re: Systemtap does not clean up properly

Jonathan Lebon
Someone else might be able to chime in with a more permanent solution. But for a temporary workaround, you could use something like this script[1] to clean up the modules manually after aborting jobs.

Cheers,

Jonathan

[1] https://sourceware.org/ml/systemtap/2008-q1/msg00051.html

----- Original Message -----

> We use Systemtap in a continuous integration environment (Jenkins).
> Sometimes, a job is aborted manually and it seems the Systemtap kernel
> module is not unloaded then. As a result, several stap modules remain loaded
> after aborting jobs the system runs out of memory.
>
> Is there a way to prevent/fix this?
>
> --CP

Re: Systemtap does not clean up properly

corpaul
Thanks, but we are actually already doing this. Here is a stack trace in
case it helps: http://paste.debian.net/42978/

--CP

On 19-09-13 15:27, Jonathan Lebon wrote:

> Someone else might be able to chime in with a more permanent solution. But for a temporary workaround, you could use something like this script[1] to clean up the modules manually after aborting jobs.
>
> Cheers,
>
> Jonathan
>
> [1] https://sourceware.org/ml/systemtap/2008-q1/msg00051.html
>
> ----- Original Message -----
>> We use Systemtap in a continuous integration environment (Jenkins).
>> Sometimes, a job is aborted manually and it seems the Systemtap kernel
>> module is not unloaded then. As a result, several stap modules remain loaded
>> after aborting jobs the system runs out of memory.
>>
>> Is there a way to prevent/fix this?
>>
>> --CP

Re: Systemtap does not clean up properly

David Smith-19
In reply to this post by corpaul
On 09/19/2013 07:40 AM, corpaul wrote:
> We use Systemtap in a continuous integration environment (Jenkins).
> Sometimes, a job is aborted manually and it seems the Systemtap kernel
> module is not unloaded then. As a result, several stap modules remain loaded
> after aborting jobs the system runs out of memory.
>
> Is there a way to prevent/fix this?

I've got a couple of questions.

1) What version of systemtap are you running?

2) When you say "a job is aborted manually", what exactly do you mean?
How is the job aborted manually?

--
David Smith
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

Re: Systemtap does not clean up properly

Frank Ch. Eigler
In reply to this post by corpaul

c.bezemer wrote:

> Thanks, but we are actually already doing this.

I guess the question is how are you killing the userspace stap
processes, and how are you trying to manually remove the stap_XXXX
modules?  If you just SIGINT the stap* processes, they should clean up
nicely after themselves.  If you SIGKILL, then you take responsibility
for removing the kernel modules, which should be just an rmmod.
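
A minimal sketch of that sequence in Python, for a CI wrapper that
starts stap itself. The function names, the 10-second grace period, and
the assumption that leftover modules can be recognised by a "stap_" name
prefix in /proc/modules are illustrative choices, not part of systemtap.
The rmmod step needs root.

```python
import signal
import subprocess


def stop_stap(proc, grace_seconds=10.0):
    """Send SIGINT to a stap process; escalate to SIGKILL on timeout.

    proc is a subprocess.Popen-like object. Returns True if the process
    exited after SIGINT alone, False if it had to be killed (in which
    case the caller should remove leftover modules itself).
    """
    proc.send_signal(signal.SIGINT)
    try:
        proc.wait(timeout=grace_seconds)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()   # SIGKILL: stap gets no chance to unload its module
        proc.wait()
        return False


def leftover_stap_modules(proc_modules_text):
    """Return module names starting with 'stap_' (assumed naming).

    proc_modules_text is the content of /proc/modules, where the first
    whitespace-separated field on each line is the module name.
    """
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("stap_"):
            names.append(fields[0])
    return names


def remove_modules(names):
    """Run rmmod on each module name; failures are ignored here."""
    for name in names:
        subprocess.run(["rmmod", name], check=False)
```

A job's abort hook could call stop_stap() on the stap process it
launched and, only if that returns False, read /proc/modules and pass
the result of leftover_stap_modules() to remove_modules(). On a shared
machine the prefix match would also catch modules belonging to other
running stap sessions, so a real script would want to narrow it down.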

> Here is a stack trace in case it helps:
> http://paste.debian.net/42978/

That trace appears to show some near-OOM conditions from stap scripts
that are trying to start up, as opposed to anything related to
shutdown.  (The actual pattern indicates a fragmentation problem that
dsmith's commit 3f873e53e should fix.)

- FChE