Cyg_Mutex::check_this() fails

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Cyg_Mutex::check_this() fails

René Schipp von Branitz Nielsen
...Sending this from my private mail account because the eCos mailserver
blocks mail containing confidentiality disclaimers, which I can't prevent
when sending from my work mail account... :( :(

--------------------oOo--------------------

Hello folks,

I have a condition variable, c, tied to a mutex, m, and used like this with
a FIFO, f:

void producer(void)
{
    cyg_mutex_lock(&m);
    copy_to_fifo(&f, some_data);
    cyg_cond_signal(&c);
    cyg_mutex_unlock(&m);
}

void consumer_thread(cyg_addrword_t data) {
    cyg_mutex_lock(&m);
    while (1) {
        while (fifo_empty(&f)) {
            cyg_cond_wait(&c);
        }

        // m is locked here.

        // Empty FIFO.
        while (!fifo_empty(&f)) {
            copy_from_fifo(&f, &some_data);
            cyg_mutex_unlock();

            // Do something with some_data
            ...

            cyg_mutex_lock();
        }
    }
}

The following description refers to line numbers of rev. 1.15 of mutex.cxx.

When the system is under heavy interrupt load, threads may get scheduled in
and out more frequently than when not. Under these circumstances, I
sometimes get an assertion (cyg_assert_msg()) stemming from line 197 of
mutex.cxx, which is Cyg_Mutex::check_this(). Placing a breakpoint in this
function reveals that it happens when the consumer is about to wake up, that
is, in line 651, which is the second half of
Cyg_Condition_Variable::wait_inner( Cyg_Mutex *mx ).

A closer look at wait_inner() shows that when CYG_ASSERTCLASS( mx, "Corrupt
mutex") is invoked, the scheduler is not locked, which in turn means that
Cyg_Mutex::check_this() line 197 is tested non-atomically. Line 197
contains:
    if (( locked && owner == NULL ) return false;

So if the preemptive scheduler schedules the caller of cyg_cond_wait() out
in between the test of "locked" and "owner == NULL", and the mutex state
changes while scheduled out, we have a problem.

As I see it, CYG_ASSERTCLASS(some_obj, "some message") serves two purposes:
  1) check that some_obj is non-NULL and
  2) check that some_obj->check_this() returns TRUE.

IMO, only the first check needs to be made by wait_inner(), because line
#677 attempts to reacquire the mutex (mx->lock()), which itself performs the
check_this() check - and with the scheduler locked.

There are other places in mutex.cxx where CYG_ASSERTCLASS(Cyg_Mutex,
"message") is invoked without the scheduler locked, but I can't judge
whether these are OK or not.

Bottomline is that I suggest to change line #651 of mutex.cxx from
  CYG_ASSERTCLASS( mx, "Corrupt mutex"); to
  CYG_ASSERT(mx, "Invalid mutex pointer"); /* Or some other message */

Or is there anything fundamental, I have missed here?
Comments are appreciated.

Thanks in advance,
René Schipp von Branitz Nielsen
Vitesse Semiconductor Corporation


Reply | Threaded
Open this post in threaded view
|

Re: Cyg_Mutex::check_this() fails

jamescmaki
Hi Rene,

At this point in the code the scheduler is not locked and the mutex is not owned. This means that it is a bad idea to be looking at the internal state of the mutex, i.e., calling CYG_ASSERTCLASS( mx, "Corrupt mutex").

I can reproduce this issue consistently now and I am testing a fix that changes CYG_ASSERTCLASS to CYG_ASSERT. I am hoping that this will resolve the problem instead of just moving the failed assert into Cyg_Mutex::lock().

However even if this does fix the problem, I am still wary. From what I can see, locked and owner are never modified without locking the scheduler. This means that they are both updated atomically. Because of this, when the thread in  wait_inner() resumes it should see both of these variables in a consistent state regardless of whether it holds a lock or not. I am wondering if this can be explained by reordering and by delaying the check until the scheduler is locked again it avoids this problem.

-James

Reply | Threaded
Open this post in threaded view
|

Re: Cyg_Mutex::check_this() fails

jamescmaki
OK I didn't think that through fully.

if (( locked && owner == NULL )

The thread sees locked == true, gets rescheduled, then sees owner == NULL.