[patch][commit] Performance of --gprof option

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[patch][commit] Performance of --gprof option

Dave Brolley-2
Hi,

We have two ports which support pipeline modelling (--final-insn-count,
--trace-count) combined with use of --gprof=<file>,cycles=1 in order to
obtain cycle accurate profiles of the applications being simulated. Our
clients have complained about the poor performance of SID when using
these options. The poor performance is mainly due to the following factors:

o The gprof component is a "regular" client of the target scheduler when
in cycle mode.
o It is necessary to use --insn-count=1 in order to obtain accurate
samples when using --gprof=<file>,cycles=n. Otherwise the gprof
component is not triggered after the given number of cycles.

These combine to slow the simulation because each simulated cycle
requires one entire target scheduler iteration. Also, the gprof
component's "sample" pin is driven once per simulated cycle.

By making two simple changes, I was able to improve the performance of
SID by 30% in this situation.

1) Have the cpu component perform gprof sampling directly.
2) Allow a given number of cycles to be driven on the gprof component's
"sample" pin.

Change number 1 removes the requirement of using --insn-count=1. Gprof
sampling is now done by the cpu component in step_pin_handler and in
stop_after_insns_p.

Change number 2 allows several samples to be accumulated at the same pc
(when there is some kind of stall due to latency), eliminating the need
to drive the "sample" pin once per simulated cycle. This dramatically
reduces the number of times the pin is driven during a typical simulation.

These changes also fix a couple of "bugs" in that the gprof option
wasn't working as advertised (when --help is specified) since

o More samples would be taken than expected in insn-count mode if the
cpu was yielded
o Fewer samples than expected would be taken in cycle mode if
--insn-count=1 was not used.

The correct number of samples is now taken in all situations.

I've committed the attached patch which implements these changes. Tested
on two internal ports which support --gprof. Judging from the lack of
calls to cg_profile*, no other ports are using this feature.

Dave

2006-06-14  Dave Brolley  <[hidden email]>

        * gprof.cxx (gprof_component): Remove sim_sched and sim_sched_event.
        Remove the sim-sched-event attribute and the sim-sched relation.
        (accumulate): Increment the bucket count by the value driven on the
        pin.
        (configure): No longer a subscription to the target scheduler to
        maintain.

2006-06-14  Dave Brolley  <[hidden email]>

        * sidcpuutil.h (basic_cpu::sample_gprof_pin): New pin.
        (sample_gprof): New method of basic_cpu.
        (step_pin_handler): Don't reset current_step_insn_count.
        Call sample_gprof, if requested and save latency.
        (stop_after_insns_p): Likewise.
        (unconfigure_gprof): Now takes no arguments. Remove unused code.
        Disconnect the sample_gprof_pin.
        (configure_gprof): Now takes configuration string. Connect the
        sample_gprof_pin. Determine the sampling mode.
        (gprof_cycles,gprof_counter,gprof_prev_latency): New members of
        basic_cpu.
        (configure): Initialize gprof_counter.
        (basic_cpu): Add sample-gprof pin.

2006-06-14  Dave Brolley  <[hidden email]>

        * commonCfg.cxx (GprofCfg): Don't subscribe to the target scheduler.
        Connect the cpu's sample-gprof pin to our sample pin. No longer any
        relation with the target scheduler.
        (BoardCfg::write_config): Nl longer any relation between gprof and
        the target scheduler.


Index: sid/component/profiling/gprof.cxx
===================================================================
RCS file: /cvs/src/src/sid/component/profiling/gprof.cxx,v
retrieving revision 1.15
diff -c -p -r1.15 gprof.cxx
*** sid/component/profiling/gprof.cxx 11 May 2006 16:04:38 -0000 1.15
--- sid/component/profiling/gprof.cxx 14 Jun 2006 19:28:59 -0000
*************** namespace profiling_components
*** 135,143 ****
      vector<statistics> stats;
      unsigned current_stats;
 
-     component *sim_sched;
-     string sim_sched_event;
-
      string target_attribute;
      component* target_component;
 
--- 135,140 ----
*************** namespace profiling_components
*** 272,278 ****
  return component::ok;
        }
 
!     void accumulate (host_int_4)
        {
  if (! this->target_component) return;
 
--- 269,275 ----
  return component::ok;
        }
 
!     void accumulate (host_int_4 ticks)
        {
  if (! this->target_component) return;
 
*************** namespace profiling_components
*** 300,306 ****
 
  if (quantized < this->stats[current_stats].value_min) this->stats[current_stats].value_min = quantized;
  if (quantized > this->stats[current_stats].value_max) this->stats[current_stats].value_max = quantized;
! this->stats[current_stats].value_hitcount_map [quantized] ++;
        }
 
      void accumulate_call (host_int_4 selfpc_low)
--- 297,303 ----
 
  if (quantized < this->stats[current_stats].value_min) this->stats[current_stats].value_min = quantized;
  if (quantized > this->stats[current_stats].value_max) this->stats[current_stats].value_max = quantized;
! this->stats[current_stats].value_hitcount_map [quantized] += ticks;
        }
 
      void accumulate_call (host_int_4 selfpc_low)
*************** namespace profiling_components
*** 559,596 ****
       }
     current_stats = i;
   }
-
- // If cycles was specified, then we need to be subscribed to the
- // target scheduler
- if (! sim_sched)
-  return;
- if (parts.size () == 2)
-  {
-    host_int_4 cycles;
-    component::status s = parse_attribute (parts[1], cycles);
-    if (s == component::ok)
-      {
- sim_sched->connect_pin (sim_sched_event + "-event", & accumulate_pin);
- sim_sched->set_attribute_value (sim_sched_event + "-time", make_attribute (cycles));
- // Take a sample now to make up for the one which just got cancelled when
- // N-time was set.
- accumulate (1);
- return;
-      }
-  }
       }
-    // No gprof config or cycles was not specified. We will not be triggered by the
-    // target scheduler.
-    if (sim_sched)
-      sim_sched->disconnect_pin (sim_sched_event + "-event", & accumulate_pin);
     return;
   }
        }
 
    public:
      gprof_component ():
-       sim_sched (0),
-       sim_sched_event ("0"),
        target_attribute ("pc"),
        target_component (0),
        output_file_format (endian_unknown),
--- 556,568 ----
*************** namespace profiling_components
*** 605,611 ****
 
  add_pin ("sample", & this->accumulate_pin);
  add_attribute ("sample", & this->accumulate_pin, "pin");
- add_attribute ("sim-sched-event", & this->sim_sched_event, "setting");
  add_pin ("cg-caller", & this->cg_caller_pin);
  add_attribute ("cg-caller", & this->cg_caller_pin, "pin");
  add_pin ("cg-caller-hi", & this->cg_caller_hi_pin);
--- 577,582 ----
*************** namespace profiling_components
*** 653,659 ****
        "setting");
  add_attribute ("output-file-endianness", & this->output_file_format, "setting");
  add_uni_relation ("target-component", & this->target_component);
- add_uni_relation ("sim-sched", & this->sim_sched);
 
  cg_caller_hi_pin.driven (0);
  cg_callee_hi_pin.driven (0);
--- 624,629 ----
Index: sid/include/sidcpuutil.h
===================================================================
RCS file: /cvs/src/src/sid/include/sidcpuutil.h,v
retrieving revision 1.35
diff -c -p -r1.35 sidcpuutil.h
*** sid/include/sidcpuutil.h 11 May 2006 19:29:51 -0000 1.35
--- sid/include/sidcpuutil.h 14 Jun 2006 19:28:59 -0000
*************** namespace sidutil
*** 253,258 ****
--- 253,259 ----
      output_pin cg_callee_pin;
      output_pin cg_jump_pin;
      output_pin cg_return_pin;
+     output_pin sample_gprof_pin;
     
      // tracing
    private:
*************** namespace sidutil
*** 310,316 ****
  recursion_record limit (& this->step_limit);
  if (UNLIKELY(! limit.ok())) return;
 
- this->current_step_insn_count = 0;
  this->yield_p = false;
 
  // Check for triggerpoints due right now; may set yield_p!
--- 311,316 ----
*************** namespace sidutil
*** 320,326 ****
  sid::host_int_8 prev_latency = this->total_latency;
  sid::host_int_8 prev_insn_count = this->total_insn_count;
  if (! this->yield_p)
!  this->step_insns ();
  sid::host_int_8 num_insns = this->total_insn_count - prev_insn_count;
  sid::host_int_8 latency = this->total_latency - prev_latency;
 
--- 320,331 ----
  sid::host_int_8 prev_latency = this->total_latency;
  sid::host_int_8 prev_insn_count = this->total_insn_count;
  if (! this->yield_p)
!  {
!    if (UNLIKELY (this->gprof_configured_p))
!      this->sample_gprof (1);
!    this->gprof_prev_latency = this->total_latency;
!    this->step_insns ();
!  }
  sid::host_int_8 num_insns = this->total_insn_count - prev_insn_count;
  sid::host_int_8 latency = this->total_latency - prev_latency;
 
*************** namespace sidutil
*** 392,401 ****
--- 397,436 ----
        return num;
      }
 
+     void sample_gprof (sid::host_int_4 num_insns)
+     {
+       this->gprof_counter += num_insns;
+
+       // Sample for gprof in insn-count mode?
+       if (this->gprof_cycles == 0)
+ {
+  sid::host_int_4 ticks = this->gprof_counter / this->step_insn_count;
+  if (ticks > 0)
+    {
+      this->sample_gprof_pin.drive (ticks);
+      this->gprof_counter %= this->step_insn_count;
+    }
+  return;
+ }
+
+       // Sample for gprof in cycle mode
+       if ((sid::signed_host_int_8)(this->total_latency) > (sid::signed_host_int_8)(this->gprof_prev_latency))
+ this->gprof_counter += this->total_latency - this->gprof_prev_latency;
+
+       sid::host_int_4 ticks = this->gprof_counter / this->gprof_cycles;
+       if (ticks > 0)
+ {
+  this->sample_gprof_pin.drive (ticks);
+  this->gprof_counter %= this->gprof_cycles;
+ }
+     }
+
      virtual void step_insns () = 0;
      bool stop_after_insns_p (sid::host_int_4 num)
        {
  this->current_step_insn_count += num;
+
+ bool rc;
  if (UNLIKELY(this->yield_p ||
     (this->current_step_insn_count >= this->step_insn_count)))
            {
*************** namespace sidutil
*** 403,414 ****
              // arithmetic overhead in the inner insn-stepping loops.
     this->total_insn_count += this->current_step_insn_count;
     this->current_step_insn_count = 0;
!    return true;
            }
          else
   {
!    return false;
   }
        }
 
      void
--- 438,459 ----
              // arithmetic overhead in the inner insn-stepping loops.
     this->total_insn_count += this->current_step_insn_count;
     this->current_step_insn_count = 0;
!    rc = true; // stop!
            }
          else
+  rc = false; // don't stop
+
+ // Sample for gprof?
+ if (UNLIKELY (this->gprof_configured_p))
   {
!    // Count 1 fewer insns if exiting to account for the one counted on entry
!    if (rc)
!      --num;
!    this->sample_gprof (num);
   }
+ this->gprof_prev_latency = this->total_latency;
+
+ return rc;
        }
 
      void
*************** namespace sidutil
*** 487,493 ****
  return static_cast<cpu_trap_disposition>(trap_disposition_pin.sense ());
        }
 
!     void unconfigure_gprof (const string &gprof_spec, sid::host_int_4 num_cycles)
        {
  if (! gprof_configured_p)
   return;
--- 532,538 ----
  return static_cast<cpu_trap_disposition>(trap_disposition_pin.sense ());
        }
 
!     void unconfigure_gprof ()
        {
  if (! gprof_configured_p)
   return;
*************** namespace sidutil
*** 495,536 ****
  assert (gprof);
  sid::pin *p;
 
- #if 0 // can't happen?
- // If 'cycles' was specified on the --gprof option, then
- // first, sample the address of the branch which caused
- // the reconfig for the given number of cycles.
- if (num_cycles && last_caller && gprof_spec.size () > 6)
-  {
-    vector<string> parts = tokenize (gprof_spec.substr (6), ",");
-    if (parts.size () > 1)
-      {
- p = gprof->find_pin ("sample");
- if (p)
-  {
-    std::string save_pc = this->attribute_value ("pc");
-    if (! save_pc.empty ())
-      {
- sid::component::status s = this->set_attribute_value ("pc", make_numeric_attribute (last_caller));
- if (s == sid::component::ok)
-  for (int i = 0; i < num_cycles; ++i)
-    p->driven (1);
- this->set_attribute_value ("pc", save_pc);
-      }
-  }
-      }
-  }
- #endif
-
  // Then disconnect the call graph notification pins.
  p = gprof->find_pin ("cg-caller");
  if (p) cg_caller_pin.disconnect (p);
  p = gprof->find_pin ("cg-callee");
  if (p) cg_callee_pin.disconnect (p);
 
  gprof_configured_p = false;
        }
 
!     void configure_gprof ()
        {
  if (gprof_configured_p)
   return;
--- 540,557 ----
  assert (gprof);
  sid::pin *p;
 
  // Then disconnect the call graph notification pins.
  p = gprof->find_pin ("cg-caller");
  if (p) cg_caller_pin.disconnect (p);
  p = gprof->find_pin ("cg-callee");
  if (p) cg_callee_pin.disconnect (p);
+ p = gprof->find_pin ("sample");
+ if (p) sample_gprof_pin.disconnect (p);
 
  gprof_configured_p = false;
        }
 
!     void configure_gprof (const string &config)
        {
  if (gprof_configured_p)
   return;
*************** namespace sidutil
*** 551,556 ****
--- 572,593 ----
     p->driven (last_callee);
  }
 
+ p = gprof->find_pin ("sample");
+ if (p)
+  sample_gprof_pin.connect (p);
+
+ vector<string> parts = tokenize (config.substr (6), ",");
+ if (parts.size () == 2)
+  {
+    component::status s = parse_attribute (parts[1], gprof_cycles);
+    gprof_counter = gprof_cycles - 1;
+  }
+ else
+  {
+    gprof_cycles = 0;
+    gprof_counter = step_insn_count - 1;
+  }
+
  gprof_configured_p = true;
        }
 
*************** namespace sidutil
*** 563,568 ****
--- 600,608 ----
      sid::host_int_4 last_caller;
      sid::host_int_4 last_callee;
      bool gprof_configured_p;
+     sid::host_int_4 gprof_cycles;
+     sid::host_int_4 gprof_counter;
+     sid::host_int_8 gprof_prev_latency;
 
      virtual void configure (const string &config)
        {
*************** namespace sidutil
*** 580,588 ****
     gprof->set_attribute_value ("configure!", config);
     // Now do our own configuration
     if (config.size () > 6)
!      configure_gprof ();
     else
!      unconfigure_gprof (config, num_cycles);
     return;
   }
  if (config.size () <= 11)
--- 620,628 ----
     gprof->set_attribute_value ("configure!", config);
     // Now do our own configuration
     if (config.size () > 6)
!      configure_gprof (config);
     else
!      unconfigure_gprof ();
     return;
   }
  if (config.size () <= 11)
*************** namespace sidutil
*** 592,598 ****
     sid::host_int_4 n;
     sid::component::status s = parse_attribute (config.substr (11), n);
     if (s == sid::component::ok)
!      step_insn_count = n;
     return;
   }
  if (config.substr (0, 8) == "verbose=")
--- 632,642 ----
     sid::host_int_4 n;
     sid::component::status s = parse_attribute (config.substr (11), n);
     if (s == sid::component::ok)
!      {
! step_insn_count = n;
! if (gprof_configured_p && gprof_cycles == 0)
!  gprof_counter = step_insn_count - 1;
!      }
     return;
   }
  if (config.substr (0, 8) == "verbose=")
*************** namespace sidutil
*** 838,843 ****
--- 882,888 ----
  public:
      basic_cpu ():
        total_latency (0),
+       current_step_insn_count (0),
        step_limit ("instruction stepping", 1),
        sched_query (this),
        triggerpoint_manager (this),
*************** public:
*** 857,862 ****
--- 902,908 ----
        last_caller (0),
        last_callee (0),
        gprof_configured_p (false),
+       gprof_prev_latency (0),
        core_probe (0),
        main (0)
        {
*************** public:
*** 881,886 ****
--- 927,933 ----
  add_pin ("cg-return", & this->cg_return_pin);
  add_pin ("cg-jump", & this->cg_jump_pin);
  add_pin ("print-insn-summary!", & this->print_insn_summary_pin);
+ add_pin ("sample-gprof", &sample_gprof_pin);
  add_pin ("endian-set!", & this->endian_set_pin);
  add_pin ("eflags-set!", & this->eflags_set_pin);
  add_watchable_pin ("trap", & this->trap_type_pin); // output side
Index: sid/main/dynamic/commonCfg.cxx
===================================================================
RCS file: /cvs/src/src/sid/main/dynamic/commonCfg.cxx,v
retrieving revision 1.12
diff -c -p -r1.12 commonCfg.cxx
*** sid/main/dynamic/commonCfg.cxx 11 May 2006 20:27:02 -0000 1.12
--- sid/main/dynamic/commonCfg.cxx 14 Jun 2006 19:28:59 -0000
*************** GprofCfg::GprofCfg (const string name,
*** 1002,1022 ****
  {
    assert (cpu);
    assert (sess);
-   // Add a subscription to the target scheduler. Even if it's not
-   // used now, it could be used due to dynamic configuration.
-   assert (sess->sim_sched);
-   int slot = sess->sim_sched->add_subscription (this, "sample");
-   sess->sim_sched->set_regular (slot, true);
-   sess->sim_sched->set_time (slot, interval);
-   sess->sim_sched->set_priority (slot, SchedCfg::gprof_priority);
 
!   if (type != simulated_cycles)
!     {
!       // default to instruction_count
!       string ev = sidutil::make_attribute (cpu->get_subscription_number());
!       ev += "-event";
!       conn_pin (sess->sim_sched, ev, this, "sample");
!     }
 
    sess->shutdown_seq->add_output (7, this, "store");
    relate (this, "target-component", cpu);
--- 1002,1009 ----
  {
    assert (cpu);
    assert (sess);
 
!   conn_pin (cpu, "sample-gprof", this, "sample");
 
    sess->shutdown_seq->add_output (7, this, "store");
    relate (this, "target-component", cpu);
*************** GprofCfg::GprofCfg (const string name,
*** 1025,1031 ****
    set (this, "value-attribute", "pc");
    set (this, "bucket-size", "4"); // bytes-per-bucket
    set (this, "output-file", filename);
-   set (this, "sim-sched-event", sidutil::make_attribute (slot));
  }
 
  // Create a gprof component but don't activate it
--- 1012,1017 ----
*************** GprofCfg::GprofCfg (const string name,
*** 1039,1057 ****
  {
    assert (cpu);
    assert (sess);
-   // Add a subscription to the target scheduler. Even if it's not
-   // used now, it could be used due to dynamic configuration.
-   assert (sess->sim_sched);
-   int slot = sess->sim_sched->add_subscription (this, "sample");
-   sess->sim_sched->set_regular (slot, true);
-   sess->sim_sched->set_time (slot, 1);
-   sess->sim_sched->set_priority (slot, SchedCfg::gprof_priority);
 
    sess->shutdown_seq->add_output (7, this, "store");
    relate (this, "target-component", cpu);
    set (this, "value-attribute", "pc");
    set (this, "bucket-size", "4"); // bytes-per-bucket
-   set (this, "sim-sched-event", sidutil::make_attribute (slot));
  }
 
 
--- 1025,1035 ----
*************** void BoardCfg::write_config (Writer &w)
*** 1330,1336 ****
        if (gprof)
        {
   // gprof's configure! attribute will be set by the cpu.
-  Relation (gprof, "sim-sched", sess->sim_sched).write_to (w);
   Relation (cpu, "gprof", gprof).write_to (w);
        }
        if (! gloss->possibly_wrapped ())
--- 1308,1313 ----