A&M Rattler and TCP performance


A&M Rattler and TCP performance

John Doe-11
Hi, all.

I'm developing an embedded application using eCos on a PowerPC target, and
I'm experiencing very poor ethernet (TCP) performance with my target,
wondering if it is an eCos/HAL bug, poor configuration (user error, user
being me...), or a hardware problem.

The problem is only for TCP reads (host->target), giving typically
100-200kBytes/sec. TCP writes are fine, with 6-8MBytes/sec. UDP read/write
also seems OK, giving around 1-2MBytes/sec depending on the application used
for testing.
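
To be concrete about the test setup: the target side is basically just a TCP
sink that accepts one connection and throws the data away. The sketch below
shows the shape of it (simplified, not the exact code I'm running; port 12345
is the data port used in my tests):

#include <network.h>        /* eCos: init_all_network_interfaces() */
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>
#include <unistd.h>

static char buf[4096];

void tcp_sink(void)
{
    struct sockaddr_in sa;
    int srv, s, n;

    init_all_network_interfaces();

    srv = socket(AF_INET, SOCK_STREAM, 0);
    memset(&sa, 0, sizeof(sa));
    sa.sin_len         = sizeof(sa);    /* BSD-style stacks want sin_len set */
    sa.sin_family      = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_ANY);
    sa.sin_port        = htons(12345);
    bind(srv, (struct sockaddr *)&sa, sizeof(sa));
    listen(srv, 1);

    s = accept(srv, NULL, NULL);
    while ((n = recv(s, buf, sizeof(buf), 0)) > 0)
        ;                               /* just discard the data */
    close(s);
    close(srv);
}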

Using tcpdump and ethereal on a minimal test application sending data from
the host to the eCos target, I can see several odd things:

1) The performance is very good in short bursts, typically for a few
milliseconds, before it stalls. The host stops sending data. The target
seems to have ACKed perfectly OK, indicating a TCP window with plenty of
space left. Then there is a delay before the target probably times out and
resends an ACK, and the host continues transferring. The target time-out
occurs exactly every 100ms.
- It seems the host is causing the problem here, since it stops sending for
no apparent reason.
- This only happens when the target is eCos.

2) There are quite a few (~10%) retransmits, but the host detects these and
does fast retransmits, giving no stalls in the data transfers.
- Should I worry about these? I guess this should never happen.

3) Before every stall, ethereal labels the last ACK from the target as a
"TCP window update". I have no clue what to make of this, but it seems very
consistent...

The problem seems to be similar to the one in this thread:
http://sources.redhat.com/ml/ecos-discuss/2002-04/msg00379.html
but since that problem did not seem to be resolved, I'm still stuck.

The thread mentioned above suggested collisions due to misconfiguration of
duplex on the target. My target has an LED to indicate collisions, which
remains unlit.

The target is an Analogue&Micro Rattler with a 200MHz MPC8250 processor.
eCos is from CVS; I have tested versions up to 2005-12-07 without any
difference in behaviour.

eCos/HAL was built with default configuration values. I have also tested
increasing the number of input and output buffers for the ethernet device
driver, and increasing the memory designated for the FreeBSD stack, giving
exactly the same results.

In an attempt to eliminate the host from the search, I have tried both
Fedora Core 2 and Core 4 (Intel, 32 bit). I also tried compiling the same
test application for a Linux target, which gave excellent performance
(11MBytes/sec on a 100Mbit ethernet).
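
(As a sanity check on the Linux number: 100Mbit/s is 12.5MBytes/s on the wire,
and each 1448-byte TCP payload travels in a 1538-byte frame once you add the
TCP/IP headers with timestamps, Ethernet framing, preamble and inter-frame
gap, so the theoretical ceiling is roughly 12.5 * 1448/1538 = 11.8MBytes/sec.
In other words, the Linux target runs at essentially wire speed.)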

As for hardware, I have tested on the A&M board and on a homebrewed board
derived from the A&M schematics, giving the same results.

Does anyone have any ideas? Any tests I should do to narrow down the search
area?

Best regards,
  Ola Bård Langlo

--
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss

Re: A&M Rattler and TCP performance

Andrew Lunn-2
On Thu, Dec 15, 2005 at 01:21:26PM +0000, John Doe wrote:

> Hi, all.
>
> I'm developing an embedded application using eCos on a PowerPC target, and
> I'm experiencing very poor ethernet (TCP) performance with my target,
> wondering if it is an eCos/HAL bug, poor configuration (user error, user
> being me...), or a hardware problem.
>
> The problem is only for TCP reads (host->target), giving typically
> 100-200kBytes/sec. TCP writes are fine, with 6-8MBytes/sec. UDP read/write
> also seems OK, giving around 1-2MBytes/sec depending on the application
> used for testing.
>
> Using tcpdump and ethereal on a minimal test application sending data from
> the host to the eCos target, I can see several odd things:
>
> 1) The performance is very good in short bursts, typically for a few
> milliseconds, before it stalls. The host stops sending data. The target
> seems to have ACKed perfectly OK, indicating a TCP window with plenty of
> space left. Then there is a delay before the target probably times out and
> resends an ACK, and the host continues transferring. The target time-out
> occurs exactly every 100ms.
> - It seems the host is causing the problem here, since it stops sending for
> no apparent reason
> - this only happens when the target is eCos.

Please could you post a trace from tcpdump.

It is unlikely to be a host problem. It is more likely the host is
simply responding to what the target has told it to do.

> 2) there are quite a few (~10%) retransmits, but the host detects these,
> and does fast retransmits, giving no stalls in the data transfers.
> - Should I worry about these? I guess it should never happen.

This is not good. 10% packet loss is way too high. You should
investigate this. You might want to see if you are running out of mbufs
or clusters. This would cause a discard and so a retransmit.
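
Something like the following would let you watch the counters from a low
priority thread. This is an untested sketch and assumes the eCos FreeBSD port
keeps the stock mbstat counters from <sys/mbuf.h>; check the field names in
your tree.

#include <sys/param.h>
#include <sys/mbuf.h>
#include <cyg/infra/diag.h>
#include <cyg/kernel/kapi.h>

extern struct mbstat mbstat;            /* counters kept by the stack */

static char         mon_stack[4096];
static cyg_thread   mon_thread;
static cyg_handle_t mon_handle;

static void mbuf_monitor(cyg_addrword_t data)
{
    for (;;) {
        /* m_drops = failed allocations, m_wait = callers that had to
           sleep waiting for an mbuf/cluster to become free */
        diag_printf("mbufs: drops=%d waits=%d\n",
                    (int)mbstat.m_drops, (int)mbstat.m_wait);
        cyg_thread_delay(100);          /* ~1s at the default 100Hz tick */
    }
}

void start_mbuf_monitor(void)
{
    cyg_thread_create(20, mbuf_monitor, 0, "mbufmon",
                      mon_stack, sizeof(mon_stack),
                      &mon_handle, &mon_thread);
    cyg_thread_resume(mon_handle);
}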
 
> 3) Before every stall, ethereal labels the last ACK from the target as a
> "TCP window update". I have no clue what to make of this, but it seems very
> consistent...

Is it trying to make the window smaller? My guess is it is. Making the
window smaller could indicate it is running out of buffers.


        Andrew


Re: A&M Rattler and TCP performance

John Doe-11

>Please could you post a trace from tcpdump.

OK. Here are a few lines containing both a retransmit and a stall. Line
wrapping will probably make this very ugly, but here goes:

13:26:53.073631 IP (tos 0x0, ttl  64, id 24003, offset 0, flags [DF], proto
6, length: 1500) 192.168.123.123.51054 > 192.168.123.124.12345: P
34753:36201(1448) ack 1 win 5840 <nop,nop,timestamp 2762928138 6772731>
13:26:53.074012 IP (tos 0x0, ttl  64, id 29674, offset 0, flags [DF], proto
6, length: 52) 192.168.123.124.12345 > 192.168.123.123.51054: . [tcp sum ok]
ack 31857 win 17376 <nop,nop,timestamp 6772731 2762928137>
13:26:53.074022 IP (tos 0x0, ttl  64, id 24004, offset 0, flags [DF], proto
6, length: 1500) 192.168.123.123.51054 > 192.168.123.124.12345: P
36201:37649(1448) ack 1 win 5840 <nop,nop,timestamp 2762928138 6772731>
13:26:53.074401 IP (tos 0x0, ttl  64, id 29675, offset 0, flags [DF], proto
6, length: 52) 192.168.123.124.12345 > 192.168.123.123.51054: . [tcp sum ok]
ack 31857 win 17376 <nop,nop,timestamp 6772731 2762928137>
13:26:53.074411 IP (tos 0x0, ttl  64, id 24005, offset 0, flags [DF], proto
6, length: 1500) 192.168.123.123.51054 > 192.168.123.124.12345: .
37649:39097(1448) ack 1 win 5840 <nop,nop,timestamp 2762928139 6772731>
13:26:53.074781 IP (tos 0x0, ttl  64, id 29676, offset 0, flags [DF], proto
6, length: 52) 192.168.123.124.12345 > 192.168.123.123.51054: . [tcp sum ok]
ack 31857 win 17376 <nop,nop,timestamp 6772731 2762928137>
13:26:53.074793 IP (tos 0x0, ttl  64, id 24006, offset 0, flags [DF], proto
6, length: 1500) 192.168.123.123.51054 > 192.168.123.124.12345: .
31857:33305(1448) ack 1 win 5840 <nop,nop,timestamp 2762928139 6772731>
13:26:53.075204 IP (tos 0x0, ttl  64, id 29677, offset 0, flags [DF], proto
6, length: 52) 192.168.123.124.12345 > 192.168.123.123.51054: . [tcp sum ok]
ack 39097 win 10136 <nop,nop,timestamp 6772731 2762928139>
13:26:53.075217 IP (tos 0x0, ttl  64, id 24007, offset 0, flags [DF], proto
6, length: 1500) 192.168.123.123.51054 > 192.168.123.124.12345: .
39097:40545(1448) ack 1 win 5840 <nop,nop,timestamp 2762928140 6772731>
13:26:53.075226 IP (tos 0x0, ttl  64, id 24008, offset 0, flags [DF], proto
6, length: 1500) 192.168.123.123.51054 > 192.168.123.124.12345: .
40545:41993(1448) ack 1 win 5840 <nop,nop,timestamp 2762928140 6772731>
13:26:53.075517 IP (tos 0x0, ttl  64, id 29678, offset 0, flags [DF], proto
6, length: 52) 192.168.123.124.12345 > 192.168.123.123.51054: . [tcp sum ok]
ack 39097 win 17376 <nop,nop,timestamp 6772731 2762928139>
13:26:53.075717 IP (tos 0x0, ttl  64, id 29679, offset 0, flags [DF], proto
6, length: 52) 192.168.123.124.12345 > 192.168.123.123.51054: . [tcp sum ok]
ack 41993 win 14480 <nop,nop,timestamp 6772731 2762928140>
13:26:53.075728 IP (tos 0x0, ttl  64, id 24009, offset 0, flags [DF], proto
6, length: 1500) 192.168.123.123.51054 > 192.168.123.124.12345: P
41993:43441(1448) ack 1 win 5840 <nop,nop,timestamp 2762928140 6772731>
13:26:53.075737 IP (tos 0x0, ttl  64, id 24010, offset 0, flags [DF], proto
6, length: 1500) 192.168.123.123.51054 > 192.168.123.124.12345: .
43441:44889(1448) ack 1 win 5840 <nop,nop,timestamp 2762928140 6772731>
13:26:53.075904 IP (tos 0x0, ttl  64, id 29680, offset 0, flags [DF], proto
6, length: 52) 192.168.123.124.12345 > 192.168.123.123.51054: . [tcp sum ok]
ack 41993 win 17376 <nop,nop,timestamp 6772731 2762928140>
13:26:53.172454 IP (tos 0x0, ttl  64, id 29681, offset 0, flags [DF], proto
6, length: 52) 192.168.123.124.12345 > 192.168.123.123.51054: . [tcp sum ok]
ack 43441 win 17376 <nop,nop,timestamp 6772741 2762928140>
13:26:53.172465 IP (tos 0x0, ttl  64, id 24011, offset 0, flags [DF], proto
6, length: 1500) 192.168.123.123.51054 > 192.168.123.124.12345: .
44889:46337(1448) ack 1 win 5840 <nop,nop,timestamp 2762928237 6772741>
13:26:53.172474 IP (tos 0x0, ttl  64, id 24012, offset 0, flags [DF], proto
6, length: 1500) 192.168.123.123.51054 > 192.168.123.124.12345: .
46337:47785(1448) ack 1 win 5840 <nop,nop,timestamp 2762928237 6772741>
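
Reading the timestamps: the last ACK from the target before the pause is at
13:26:53.075904, and its next packet is at 13:26:53.172454, i.e. a gap of
0.172454 - 0.075904 = 0.0966s (~97ms), which matches the ~100ms time-out I
described. Note also that this late ACK only acknowledges 43441, although the
host had already sent 43441:44889 at .075737, so that segment apparently got
lost somewhere along the way.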

>This is not good. 10% packet loss is way too high. You should
>investigate this. You might want to see if you are running out of mbufs
>or clusters. This would cause a discard and so a retransmit.

I will try this. I think I can enable some diagnostic output when running
out of mbufs. Also, I will try again to increase the number of buffers, and
maybe verify somehow that my change is actually effective.

>Is it trying to make the window smaller? My guess is it is. Making the
>window smaller could indicate it is running out of buffers.

Actually, it seems to make the window larger. In this tcpdump, the window
was increased from 14480 to 17376 bytes(?) by the target, right before the
stall.
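
(The difference is 17376 - 14480 = 2896, i.e. exactly two 1448-byte segments,
so presumably the update just reflects the application having read two
segments' worth of data out of the socket buffer.)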

Best regards,
  Ola Bård Langlo
