[patch][rfa] Don't Generate Code to Support Unused Write Stacks

[patch][rfa] Don't Generate Code to Support Unused Write Stacks

Dave Brolley-2
Hi,

This patch applies only to cgen's SID generation.

When generating the write stacks and supporting code needed to implement
the (delay ...) rtl construct for SID, cgen currently generates a write
stack for every register and memory mode, whether or not it is used.
This is a performance problem during writeback, since the majority of the
generated stacks are never written to. For most ports using (delay ...),
the pc and perhaps one or two other items are all that are ever delayed.
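A minimal C++ sketch of why unused stacks cost time, under stated assumptions: `PendingWrite`, `WriteStack`, `schedule` and `writeback` are hypothetical names, and the code models the idea rather than the C++ that cgen actually emits for SID. A writeback pass that walks every stack pays at least one emptiness check per stack per call, whether or not anything was ever delayed into it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical delayed write: `value` lands in `*target` once the
// simulated cycle count reaches `due`.
struct PendingWrite {
  std::uint64_t due;
  std::uint32_t value;
  std::uint32_t* target;
};

// One write stack per delayable location (a register or a memory mode).
struct WriteStack {
  std::vector<PendingWrite> pending;
};

// Queue a delayed write on a stack.
void schedule(WriteStack& s, std::uint64_t due, std::uint32_t value,
              std::uint32_t* target) {
  assert(target != nullptr);  // a delayed write needs somewhere to land
  s.pending.push_back(PendingWrite{due, value, target});
}

struct WritebackStats {
  std::size_t stacks_examined = 0;
  std::size_t writes_committed = 0;
};

// Commit every write that is due at `now`, in the order it was queued.
// Every stack is examined on every call, even ones that never receive a
// write, so the cost grows with the number of stacks generated.
WritebackStats writeback(std::vector<WriteStack>& stacks, std::uint64_t now) {
  WritebackStats stats;
  for (WriteStack& s : stacks) {
    ++stats.stacks_examined;
    if (s.pending.empty()) continue;  // the common case for unused stacks
    std::vector<PendingWrite> still_pending;
    for (const PendingWrite& w : s.pending) {
      if (w.due <= now) {
        *w.target = w.value;
        ++stats.writes_committed;
      } else {
        still_pending.push_back(w);
      }
    }
    s.pending.swap(still_pending);
  }
  return stats;
}
```

For example, with 39 stacks of which only one is ever written, 38 of the 39 checks in every call find nothing to do.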

This patch keeps track of which hardware and memory modes are actually
used by (delay ...) and generates only the needed write stacks. One
potentially icky part of the patch is that it requires that semantics be
generated first, since this is where the usage information is gathered.
I modified sid/component/cgen-cpu/CGEN.sh.in to ensure this.
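The bookkeeping can be sketched in C++ as follows. This is an illustration, not cgen's Scheme: `Hardware`, `DelayUsage`, `note_delayed_set`, `need_write_stack` and `write_stack_names` are hypothetical stand-ins for `<hardware-base>`, the `used-in-delay-rtl?` flag, `hw-need-write-stack?` and `write-stack-memory-mode-names`, and names are assumed to already be C symbols. Because the usage information only exists after the semantics pass has run, asking for the stack names any earlier yields an empty list, which is why semantics must be generated first.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for cgen's <hardware-base>.
struct Hardware {
  std::string name;                // e.g. "h_pc"
  bool is_register = false;        // plays the role of (register? hw)
  bool used_in_delay_rtl = false;  // plays the role of the new slot
};

// Usage gathered during the semantics pass.
struct DelayUsage {
  std::vector<std::string> memory_modes;  // like write-stack-memory-mode-names

  // Called for each (delay ...) set encountered while generating semantics.
  void note_delayed_set(Hardware& hw, const std::string& mode) {
    if (hw.name == "h_memory") {
      assert(!mode.empty());  // a delayed memory write always has a mode
      // Record each mode only once so a stack is never emitted twice.
      if (std::find(memory_modes.begin(), memory_modes.end(), mode) ==
          memory_modes.end())
        memory_modes.push_back(mode);
    } else {
      hw.used_in_delay_rtl = true;
    }
  }
};

// Counterpart of hw-need-write-stack?: registers actually used in (delay ...).
bool need_write_stack(const Hardware& hw) {
  return hw.is_register && hw.used_in_delay_rtl;
}

// Stack names following the "<name>_writes" and "<mode>_memory_writes"
// patterns in the patch. Only meaningful after the semantics pass.
std::vector<std::string> write_stack_names(const std::vector<Hardware>& hw_list,
                                           const DelayUsage& usage) {
  std::vector<std::string> names;
  for (const Hardware& hw : hw_list)
    if (need_write_stack(hw)) names.push_back(hw.name + "_writes");
  for (const std::string& mode : usage.memory_modes)
    names.push_back(mode + "_memory_writes");
  return names;
}
```

One design choice differs from the diff: the sketch records each memory mode only once, so the same mode delayed by several insns cannot produce duplicate stacks.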

For the internal port for which performance was an issue, this yielded
an improvement of 17%. I have also tested this on mt and m32r, the two
other ports currently using write stacks.

Comments? Concerns? OK to commit?

Dave

2006-07-11  Dave Brolley  <[hidden email]>

        * CGEN.sh.in (fileopts): Place semantic switches first so that they
        are processed first.

2006-07-11  Dave Brolley  <[hidden email]>

        * cpu/sh64-compact.cpu (movual, movual2): New insns.
        (movcol): New insn.
        * cpu/sh.cpu (sh4a-nofpu-models): New pmacro.
        * sid.scm (-op-gen-delayed-set-maybe-trace): If delay is used, note the
        hardware or memory mode that was used.
        * sid-cpu.scm (hw-need-write-stack?): New function.
        (-gen-hw-stream-and-destream-fns): Compute stack-regs. Use it to
        identify hardware which uses write stacks.
        (useful-mode-names): Renamed to write-stack-memory-mode-names.
        Initialized to an empty list.
        (-gen-writestacks, -gen-reset-fn, -gen-unified-write-fn): Use
        hw-need-write-stack?.
        * hardware.scm (used-in-delay-rtl?): New member of <hardware-base>.
        (define-getters <hardware-base>): Define used-in-delay-rtl?.
        (used-in-delay-rtl?): New method of <hardware-base>.
        (hw-used-in-delay-rtl?): New function.


Index: cgen/hardware.scm
===================================================================
RCS file: /cvs/cvsfiles/devo/cgen/hardware.scm,v
retrieving revision 1.49
diff -c -p -r1.49 hardware.scm
*** cgen/hardware.scm 20 Jul 2003 18:06:02 -0000 1.49
--- cgen/hardware.scm 12 Jul 2006 19:23:43 -0000
***************
*** 66,71 ****
--- 66,75 ----
  ; or #f if not computed yet.
  ; This is a derived from the ISA attribute and is for speed.
  (isas-cache . #f)
+
+ ; Flag indicates whether this hw has been used in a (delay ...)
+ ; rtl expression
+ (used-in-delay-rtl? . #f)
  )
       nil)
  )
***************
*** 77,83 ****
     ; ??? These might be more properly named hw-get/hw-set, but those names
     ; seem ambiguous.
     (get . getter) (set . setter)
!    isas-cache)
  )
 
  ; Mode,rank,shape support.
--- 81,87 ----
     ; ??? These might be more properly named hw-get/hw-set, but those names
     ; seem ambiguous.
     (get . getter) (set . setter)
!    isas-cache used-in-delay-rtl?)
  )
 
  ; Mode,rank,shape support.
***************
*** 159,164 ****
--- 163,177 ----
  )
 
  (define (hw-isas hw) (send hw 'get-isas))
+
+ ; Was this hardware used in a (delay ...) rtl expression?
+
+ (method-make!
+  <hardware-base> 'used-in-delay-rtl?
+  (lambda (self) (elm-get self 'used-in-delay-rtl?))
+ )
+
+ (define (hw-used-in-delay-rtl? hw) (send hw 'used-in-delay-rtl?))
 
  ; FIXME: replace pc?,memory?,register?,iaddress? with just one method.
 
Index: cgen/sid-cpu.scm
===================================================================
RCS file: /cvs/cvsfiles/devo/cgen/sid-cpu.scm,v
retrieving revision 1.65
diff -c -p -r1.65 sid-cpu.scm
*** cgen/sid-cpu.scm 18 Jun 2006 17:00:04 -0000 1.65
--- cgen/sid-cpu.scm 12 Jul 2006 19:23:44 -0000
*************** namespace @arch@ {
*** 171,176 ****
--- 171,180 ----
         (not (obj-has-attr? hw 'VIRTUAL)))
  )
 
+ (define (hw-need-write-stack? hw)
+   (and (register? hw) (hw-used-in-delay-rtl? hw))
+ )
+
  ; Subroutine of -gen-hardware-types to generate the struct containing
  ; hardware elements of one isa.
 
*************** namespace @arch@ {
*** 204,209 ****
--- 208,214 ----
  (define (-gen-hw-stream-and-destream-fns)
    (let* ((sa string-append)
  (regs (find hw-need-storage? (current-hw-list)))
+ (stack-regs (find hw-need-write-stack? (current-hw-list)))
  (reg-dim (lambda (r)
     (let ((dims (-hw-vector-dims r)))
       (if (equal? 0 (length dims))
*************** namespace @arch@ {
*** 211,219 ****
   (number->string (car dims))))))
  (write-stacks
   (map (lambda (n) (sa n "_writes"))
!       (append (map (lambda (r) (gen-c-symbol (obj:name r))) regs)
        ;; %redact changeone /sa m/ /sa \(symbol-\>string m\)/ unless renesas-sh-optimizations
!       (map (lambda (m) (sa m "_memory")) useful-mode-names))))
  (stream-reg (lambda (r)
        (let ((rname (sa "hardware." (gen-c-symbol (obj:name r)))))
  (if (hw-scalar? r)
--- 216,224 ----
   (number->string (car dims))))))
  (write-stacks
   (map (lambda (n) (sa n "_writes"))
!       (append (map (lambda (r) (gen-c-symbol (obj:name r))) stack-regs)
        ;; %redact changeone /sa m/ /sa \(symbol-\>string m\)/ unless renesas-sh-optimizations
!       (map (lambda (m) (sa m "_memory")) write-stack-memory-mode-names))))
  (stream-reg (lambda (r)
        (let ((rname (sa "hardware." (gen-c-symbol (obj:name r)))))
  (if (hw-scalar? r)
*************** typedef struct {
*** 382,388 ****
  ;;; begin stack-based write schedule
  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
! (define useful-mode-names '(BI QI HI SI DI UQI UHI USI UDI SF DF))
 
  (define (-calculated-memory-write-buffer-size)
    (let* ((is-mem? (lambda (op) (eq? (hw-sem-name (op:type op)) 'h-memory)))
--- 387,393 ----
  ;;; begin stack-based write schedule
  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
! (define write-stack-memory-mode-names '())
 
  (define (-calculated-memory-write-buffer-size)
    (let* ((is-mem? (lambda (op) (eq? (hw-sem-name (op:type op)) 'h-memory)))
*************** typedef struct {
*** 447,454 ****
 
 
  (define (-gen-writestacks)
!   (let* ((hw (find register? (current-hw-list)))
! (modes useful-mode-names)
  (hw-pairs (map (lambda (h) (list (gen-c-symbol (obj:name h))
     (obj:name (hw-mode h))))
  hw))
--- 452,459 ----
 
 
  (define (-gen-writestacks)
!   (let* ((hw (find hw-need-write-stack? (current-hw-list)))
! (modes write-stack-memory-mode-names)
  (hw-pairs (map (lambda (h) (list (gen-c-symbol (obj:name h))
     (obj:name (hw-mode h))))
  hw))
*************** using namespace cgen;
*** 629,637 ****
  (define (-gen-reset-fn)
    (let* ((sa string-append)
  (objs (append (map (lambda (h) (gen-c-symbol (obj:name h)))
!    (find register? (current-hw-list)))
        (map (lambda (m) (sa (symbol->string m) "_memory"))
!    useful-mode-names)))
  (clr (lambda (elt) (sa "    clear_stacks (" elt "_writes);\n"))))
      (sa
       "  template <typename ST> \n"
--- 634,642 ----
  (define (-gen-reset-fn)
    (let* ((sa string-append)
  (objs (append (map (lambda (h) (gen-c-symbol (obj:name h)))
!    (find hw-need-write-stack? (current-hw-list)))
        (map (lambda (m) (sa (symbol->string m) "_memory"))
!    write-stack-memory-mode-names)))
  (clr (lambda (elt) (sa "    clear_stacks (" elt "_writes);\n"))))
      (sa
       "  template <typename ST> \n"
*************** using namespace cgen;
*** 645,652 ****
       "  }")))
 
  (define (-gen-unified-write-fn)
!   (let* ((hw (find register? (current-hw-list)))
! (modes useful-mode-names)
  (hw-triples (map (lambda (h) (list (gen-c-symbol (obj:name h))
     (obj:name (hw-mode h))
     (length (-hw-vector-dims h))))
--- 650,657 ----
       "  }")))
 
  (define (-gen-unified-write-fn)
!   (let* ((hw (find hw-need-write-stack? (current-hw-list)))
! (modes write-stack-memory-mode-names)
  (hw-triples (map (lambda (h) (list (gen-c-symbol (obj:name h))
     (obj:name (hw-mode h))
     (length (-hw-vector-dims h))))
Index: cgen/sid.scm
===================================================================
RCS file: /cvs/cvsfiles/devo/cgen/sid.scm,v
retrieving revision 1.64
diff -c -p -r1.64 sid.scm
*** cgen/sid.scm 20 Jun 2006 19:54:18 -0000 1.64
--- cgen/sid.scm 12 Jul 2006 19:23:44 -0000
***************
*** 1100,1105 ****
--- 1100,1110 ----
  (idx-args (if (equal? idx "") "" (string-append ", " idx)))
  )
     
+     (if delayval
+ (if (eq? (obj:name hw) 'h-memory)
+    (set! write-stack-memory-mode-names (cons md write-stack-memory-mode-names))
+    (elm-set! hw 'used-in-delay-rtl? #t)))
+
      (string-append
       "  {\n"
 
Index: sid/component/cgen-cpu/CGEN.sh.in
===================================================================
RCS file: /cvs/cvsfiles/devo/sid/component/cgen-cpu/CGEN.sh.in,v
retrieving revision 1.10
diff -c -p -r1.10 CGEN.sh.in
*** sid/component/cgen-cpu/CGEN.sh.in 10 May 2006 20:44:49 -0000 1.10
--- sid/component/cgen-cpu/CGEN.sh.in 12 Jul 2006 19:24:52 -0000
*************** rm -f tmp-semsw-$$.cxx1 tmp-semsw-$$.cxx
*** 78,83 ****
--- 78,85 ----
  rm -f tmp-dec-$$.h1 tmp-dec-$$.h
  rm -f tmp-dec-$$.cxx1 tmp-dec-$$.cxx
 
+ # Do semantics first, if specified, because information about write
+ # stacks collected during this pass is used by other passes.
  fileopts=""
  for f in $filespecs
  do
*************** do
*** 89,96 ****
      decode.cxx) fileopts="$fileopts -D tmp-dec-$$.cxx1" ;;
      model.h) fileopts="$fileopts -N tmp-mod-$$.h1" ;;
      model.cxx) fileopts="$fileopts -M tmp-mod-$$.cxx1" ;;
!     semantics.cxx) fileopts="$fileopts -S tmp-sem-$$.cxx1" ;;
!     sem-switch.cxx) fileopts="$fileopts -X tmp-semsw-$$.cxx1" ;;
      write.cxx) fileopts="$fileopts -W tmp-write-$$.cxx1" ;;
      *) echo "unknown file spec: $f" >&2 ; exit 1 ;;
      esac
--- 91,98 ----
      decode.cxx) fileopts="$fileopts -D tmp-dec-$$.cxx1" ;;
      model.h) fileopts="$fileopts -N tmp-mod-$$.h1" ;;
      model.cxx) fileopts="$fileopts -M tmp-mod-$$.cxx1" ;;
!     semantics.cxx) fileopts="-S tmp-sem-$$.cxx1 $fileopts" ;;
!     sem-switch.cxx) fileopts="-X tmp-semsw-$$.cxx1 $fileopts" ;;
      write.cxx) fileopts="$fileopts -W tmp-write-$$.cxx1" ;;
      *) echo "unknown file spec: $f" >&2 ; exit 1 ;;
      esac

SID for PowerPC

Evgeny Belyanco
Hi!

I found a nice manual:

eCos and SID on ARM PID target HOWTO
http://www.asisi.co.uk/ecos_sid.html

What about PowerPC? Is SID now capable of running PowerPC?

PowerPC is not mentioned in the "library of components"
http://sourceware.org/cgi-bin/cvsweb.cgi/~checkout~/src/sid/component/CATALOG?content-type=text/plain&cvsroot=sid
but there are some PowerPC files in a fresh snapshot.

Here is a post on "PowerPC support for SID":
http://ecos.sourceware.org/ml/sid/2002-q2/msg00014.html


Evgeny Belyanko
**********************************
* E-mail: [hidden email]
**********************************

Re: [patch][rfa] Don't Generate Code to Support Unused Write Stacks

Frank Ch. Eigler
In reply to this post by Dave Brolley-2

brolley wrote:

> [...]
> When generating write stacks and the supporting code in order to
> support the (delay ...) rtl construct for SID, cgen currently generates
> a write stack for all registers and memory modes regardless of whether
> they are used or not. [...]
> For the internal port for which performance was an issue, this yielded
> an improvement of 17%. [...]

Sounds good.  I'm surprised though that your change should cause such
a noticeable improvement.  It may be that the sid-side code to handle
the write queue testing/iteration is rather deficient.  (Try adding
some UNLIKELY markers to the CPU::writeback() function's while()
conditions.)

- FChE
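Frank's suggestion can be sketched as below. `SKETCH_UNLIKELY` is a local stand-in built on the GCC/Clang builtin `__builtin_expect`; SID's own `UNLIKELY` macro may be defined differently, and `DelayedWrite`, `DueWrites` and `drain` are hypothetical names. The hint only tells the compiler that the loop body is rarely entered, so the empty-stack path becomes the fall-through; it does not reduce the number of checks, which is what the patch itself addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Local stand-in for a branch hint; SID's UNLIKELY may differ.
#if defined(__GNUC__)
#define SKETCH_UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#define SKETCH_UNLIKELY(cond) (cond)
#endif

// Hypothetical write that is due in the current cycle.
struct DelayedWrite {
  std::uint32_t value;
  std::uint32_t* target;
};

// Hypothetical per-location list of writes due now.
using DueWrites = std::vector<DelayedWrite>;

// Commit all due writes and return how many were committed. Most stacks
// are empty, so the while() condition is hinted as usually false.
std::size_t drain(std::vector<DueWrites>& stacks) {
  std::size_t committed = 0;
  for (DueWrites& s : stacks) {
    std::size_t i = 0;
    while (SKETCH_UNLIKELY(i < s.size())) {
      assert(s[i].target != nullptr);
      *s[i].target = s[i].value;  // commit in the order the writes were queued
      ++i;
    }
    committed += i;
    s.clear();
  }
  return committed;
}
```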

Re: [patch][rfa] Don't Generate Code to Support Unused Write Stacks

Dave Brolley-2
Frank Ch. Eigler wrote:

> brolley wrote:
>
>> [...]
>> When generating write stacks and the supporting code in order to
>> support the (delay ...) rtl construct for SID, cgen currently generates
>> a write stack for all registers and memory modes regardless of whether
>> they are used or not. [...]
>> For the internal port for which performance was an issue, this yielded
>> an improvement of 17%. [...]
>
> Sounds good.  I'm surprised though that your change should cause such
> a noticeable improvement.  It may be that the sid-side code to handle
> the write queue testing/iteration is rather deficient.  (Try adding
> some UNLIKELY markers to the CPU::writeback() function's while()
> conditions.)

I think the difference is that this port now has 2 write stacks to
manage instead of 39.

Dave

Re: SID for PowerPC

Frank Ch. Eigler
In reply to this post by Evgeny Belyanco
Hi -

> [...]
> What about PowerPC? Is SID now capable of running PowerPC?
> [...]

Sorry, no, we have not completed a port of a PowerPC core
simulator to sid.

- FChE

Re[2]: SID for PowerPC

Evgeny Belyanco
Frank Ch. Eigler
Thursday, July 13, 2006, 11:10:02 PM, you wrote:
>> What about PowerPC? Is SID now capable of running PowerPC?

FCE> Sorry, no, we have not completed a port of a PowerPC core
FCE> simulator to sid.

:(

Are there any plans to complete this (within 2006-2007)?

I have no SID skills yet, but how can I help with this? What parts of the
PPC core simulator are not finished? Or does it need testing?

Evgeny Belyanko

Re: SID for PowerPC

Frank Ch. Eigler
Hi -

> FCE> Sorry, no, we have not completed a port of a PowerPC core
> FCE> simulator to sid.
>
> Are there any plans to complete this (within 2006-2007)?

I am not aware of any.

> I have no SID skills yet, but how can I help with this? [...]

Depending on your interest and expertise, various options include:
- finding a way of using gdb's psim instead
- continuing Johan Rydberg's powerpc cpu model sketch in cgen/cpu/powerpc.cpu

- FChE
Reply | Threaded
Open this post in threaded view
|

Re: [patch][rfa] Don't Generate Code to Support Unused Write Stacks

Dave Brolley-2
In reply to this post by Dave Brolley-2
Frank Ch. Eigler wrote:

> Sounds good.  I'm surprised though that your change should cause such
> a noticeable improvement.  It may be that the sid-side code to handle
> the write queue testing/iteration is rather deficient.  (Try adding
> some UNLIKELY markers to the CPU::writeback() function's while()
> conditions.)

This patch has now been committed.

Dave