University of Maryland
Brought to you by Maryland Memory-Systems Research
Overview of DRAMsim

This is the web site for DRAMsim, hardware-validated, public-domain DRAM system simulation code developed by members of the Systems and Computer Architecture Lab (SCAL) in the Department of Electrical and Computer Engineering at the University of Maryland. DRAMsim is both a standalone simulator and part of the SYSim System-Level Simulation Framework developed in the same group, which will be made available with the release of our book, Memory Systems: Cache, DRAM, Disk--A Holistic Approach to Design, due out Spring 2007 through Morgan Kaufmann.
This research work is directed by Prof. Bruce Jacob, and this simulation code is being developed as part of ongoing research activity on (DRAM-based) memory systems. As of Fall 2005, the DRAM simulation framework represents roughly 20 student-years of development effort, with the most recent version of the code representing the Ph.D. work of Dave Wang, Brinda Ganesh, and several other contributors. The simulator is written up in the following publication (should you wish to cite a source):
The DRAMsim Memory-System Simulation Framework is a set of simulation codes written in C that simulate the behavior of commodity DRAM devices such as SDRAM, DDR SDRAM, DDR2 SDRAM, and DDR3 SDRAM. Support for Direct RDRAM devices has been deprecated due to decreased interest in that memory technology. However, efforts are currently underway to add FB-DIMM support to the simulation framework, and implementations of XDR, RLDRAM, and FCRAM devices remain options for the future.
DRAMsim can be run as a standalone object, using a pre-programmed set of equations to generate a sequence of memory requests with different interarrival times and address patterns. It can also be driven by an address trace, taking timing and address information from the trace instead. DRAMsim has also been ported to act as a transaction-driven, variable-latency DRAM memory system for sim-MASE from the University of Michigan and sim-Alpha from the University of Texas at Austin. Efforts are currently underway to port the simulation framework to a suitable multiprocessor simulation framework (currently GEMS).
DRAMsim has been designed to simulate a sophisticated memory system: transactions can be freely re-ordered, the address mapping scheme and the row-buffer management policy can each be configured independently, and DRAM refresh policies can be flexibly implemented or turned off entirely. The DRAM device and system configurations, timing parameters, and power-consumption parameters can all be set independently and adjusted in configuration files.
Why Do We Need to Simulate DRAM Systems This Accurately?
Modern computer-system performance is increasingly limited by the performance of DRAM-based memory systems. As a result, there is great interest in accurate simulation of DRAM-based memory systems as part of architectural research. Unfortunately, modern DRAM memory systems are difficult to study, because DRAM-system performance depends on many independent variables: workload characteristics such as memory access rate and request sequence, memory-system architecture, memory-system configuration, DRAM access protocol, and DRAM device timing parameters. System architects and design engineers therefore often disagree on the usefulness of a given performance-enhancing feature, since the performance impact of that feature typically depends on all of those same variables.
Figure 1 shows the pipelined scheduling of a DDR2 SDRAM device. Although the simulated memory system uses a closed-page policy and rotates through the available banks on the DRAM device, which should simplify scheduling considerably, scheduling this system is actually more complex than for earlier DRAM systems: new timing parameters such as tRRD and tFAW add to the growing set of timing constraints placed on each successive generation of DRAM devices.
DRAM-based memory systems are characterized primarily by two attributes: row cycle time and device datarate. Presently, DRAM row cycle times are decreasing at approximately 7% per year, while DRAM device datarates are increasing at roughly 100% every three years.
The difference between these scaling trends means that fundamental DRAM device performance characteristics change with every generation, and the changes cannot be accurately predicted by linear extrapolation. The result is that no computer architect can rest easy knowing that a set of microarchitectural techniques yields X% performance improvement on the current-generation memory system, because the same techniques may not be as effective in a future memory system, owing to the differing scaling attributes of DRAM devices.
Our DRAM-system simulation work enables system architects not only to explore the impact of a set of microarchitectural techniques on a given memory system but also to examine the effectiveness of those techniques on a future-generation memory system with future generations of DRAM devices.
Screenshots of Trace Viewers
Along with the simulator, we are making available two bus-trace viewers, which can be helpful in analyzing a sequence of transactions on the DRAM bus. The following are screen shots of the BTV Bus Trace Viewer, developed by Dave Wang.
BTV

BTV is a bus trace viewer that can be easily adapted to provide a macroscopic view of traces. The screen capture shows the temporal characteristics of the SETI@HOME application as viewed from the processor bus. As the application marches through its loops, read-data and write-data requests are generated in proportional ratios, with little I/O or instruction-fetch activity. The screen shot also shows the temporary disruption to the request pattern as the operating system interrupts the SETI@HOME application every 10 ms (the CPU quantum).
The following is a picture of an application with very different characteristics: Quake3.
Additional screenshots can be found here, where we show several different zoom levels for each trace.
The following are screen shots of the VisTool trace viewer, developed by Vincent Chan and Austin Lanham.
VisTool

The DRAM VisTool is a visual tool, developed in Java, that creates a graphical environment reflecting a variety of information about DRAM devices. DRAM VisTool focuses on the interaction between the memory controller and the DRAM system as described by DRAMsim. VisTool works in conjunction with DRAMsim to accurately display memory-system timing diagrams and statistics. The idea is to take the large amount of numerical data generated by DRAMsim and convert it into meaningful graphical form that can be used to better understand the behavior of memory accesses.
The following screen shot shows the typical data view, illustrating the timing of events on the command bus, the data bus, bank utilization, and so on. Note that the view can be scaled to any size.
Each event on the bus can be singled out to show its detailed transaction information:
The tool supports other data views, such as a statistical histogram of activity, similar to the output of BTV:
DRAMsim version 2
Update: January 2006
We have several people in our group who are using the DRAMsim1 code, and it has been modified for use in power simulations as well as FB-DIMM memory-system simulations, among many other things. The code is based on cycle-accurate state-to-state transitions of DRAM devices. We have found that, due to the complexity of this simulation approach and of DRAM device behavior, it takes tremendous work to modify the code and still retain correctness.
Consequently, we are working on a new version of the DRAM simulation core that uses a protocol-table approach; the protocol table works for SDRAM, DDR SDRAM, DDR2 and DDR3, and it accounts for all the subtle differences in internal prefetch lengths, write-delay differences and additive latency differences.
If you're interested in developing DRAM simulation software along with us (and a few others around the world), let us know, and we will make the new code available (it is placed under GPL). The difference is that you would be getting code that is not "full featured" and is not ready to be plugged back into AlphaSim, MASE, or GEMS right now.
Bottom line: if you just want to use DRAMsim with an existing simulator, check out the old (more feature-rich) code below, but bug fixes on that code are a low priority for now, since our effort is going into getting the new code healthy and making it more feature-rich. If you are interested in v2, you can start with the following docs:
Download DRAMsim and Related Information
Note: code and applications are available under the GNU General Public License.
|White paper||-||"DRAMsim: A memory-system simulator." David Wang, Brinda Ganesh, Nuengwong Tuaycharoen, Katie Baynes, Aamer Jaleel, and Bruce Jacob. SIGARCH Computer Architecture News, vol. 33, no. 4, pp. 100-107. September 2005.|
Source code for the DRAMsim simulator.
|ports||-||Interface modules that connect DRAMsim to other well-known simulators:|
Bus Trace Viewer tarball.
The Bus Trace Viewer is a Tcl/Tk based visualization tool that allows the user to look at the address trace to be fed into the simulation framework. It aids in the understanding of the trace workload.
The DRAM VisTool is a visual tool developed in Java that creates a graphical representation of activity on the various busses and in physical resources over the course of time.
Memory address trace from 176.gcc.
This trace can be used as input to drive the DRAMsim memory-system simulation code and can also be viewed with the trace-viewing tool, BTV. The address trace was captured with the L2 cache size set to 4 MB; as a result, there are very few cache misses, and the memory access pattern is very bursty. The trace is completely uninteresting, but it does show the format of the address trace. When used in conjunction with the bus-trace viewing tool, please select "mase" as the trace type.
|DRAMsim manual||-||The University of Maryland Memory-System Simulator. David Wang, Brinda Ganesh, and Bruce Jacob.|
|VisTool manual||-||DRAM VisTool User Manual. Vincent Chan and Austin Lanham.|
|David Tawei Wang||-||
Modern DRAM Memory Systems: Performance Analysis and a High Performance, Power-Constrained DRAM-Scheduling Algorithm
An in-depth treatment of modern, power-limited DRAM systems. The timing parameters tFAW and tRRD, introduced in the DDR2 generation of DDRx SDRAM, have a deleterious effect on system-level performance, significantly limiting sustainable bandwidth. The work characterizes the problem and provides a scheduling algorithm that offers a solution.
The Effects of Out-of-Order Execution on the Memory System.
Studies the deleterious effects of out-of-order execution. Common wisdom holds that bigger is better: to wit, the larger the instruction window, the better the performance. Numerous studies show this result, and, consequently, numerous other studies investigate low-cost ways to implement large instruction windows. This study shows that, once one includes a realistic model of the memory system, nearly all of those projected performance gains fail to materialize.
|Rami Nasr||-||M.S. 2005. FBsim and the Fully Buffered DIMM Memory-System Architecture. (Lutron)|
Power-aware scheduling for next-generation memory systems.
Next-generation memory systems such as the Fully Buffered DIMM operate at speeds that raise the DRAM system's power dissipation until it represents a significant portion of the overall power budget, requiring thermal-dissipation measures such as heat spreaders, fans, and the like. This work investigates software- and hardware-based approaches to reducing power and heat dissipation in these systems.
|Nuengwong (Ohm) Tuaycharoen||-||
SYSim: The complete-system simulator for memory-system performance and power studies.
Building a complete-system simulator with detailed models of cache, DRAM, and disk. Investigating the "systemic" behaviors of the entire memory hierarchy under virtual memory with extremely detailed simulation. Studying the effects of design-parameter variation on performance and power dissipation in memory systems, since a single parameter can change the system characteristics dramatically (> 2X).
Controller issues for networks-on-chip and SoC-based memory systems.
Investigating how to satisfy the memory requirements of masters with varying demand for bandwidth and varying degrees of latency tolerance. Specific questions: How are a real-time component's needs met when the component competes with a dozen other bus masters? How can the DRAM's behavior be used to effectively schedule the commands to improve the system throughput, while at the same time reducing latency and satisfying real-time requirements?
DRAMSIM IN USE:
|2005 HPCA||-||"Using Virtual Load/Store Queues (VLSQs) to reduce the negative effects of reordered memory instructions." Aamer Jaleel and Bruce Jacob. Proc. 11th International Symposium on High Performance Computer Architecture (HPCA 2005), pp. 191-200. San Francisco CA, February 2005.|
|2003 IEEE Micro||-||"A case for studying DRAM issues at the system level." Bruce Jacob. IEEE Micro, vol. 23, no. 4, pp. 44-56. July/August 2003.|
|2001 ISCA||-||"Concurrency, latency, or system overhead: Which has the largest impact on uniprocessor DRAM-system performance?" Vinodh Cuppu and Bruce Jacob. Proc. 28th International Symposium on Computer Architecture (ISCA 2001), pp. 62-71. Goteborg Sweden, June 2001.|
|2001 IEEE-TC||-||"High performance DRAMs in workstation environments." Vinodh Cuppu, Bruce Jacob, Brian Davis, and Trevor Mudge. IEEE Transactions on Computers, vol. 50, no. 11, pp. 1133-1153. November 2001. (TC Special Issue on High-Performance Memory Systems)|
|1999 ISCA||-||"A performance comparison of contemporary DRAM architectures." Vinodh Cuppu, Bruce Jacob, Brian Davis, and Trevor Mudge. Proc. 26th International Symposium on Computer Architecture (ISCA 1999), pp. 222-233. Atlanta GA, May 1999.|
Contact Information

The people who brought you this simulation code can be contacted via email.
Traditional correspondence can be sent to
Prof. Bruce Jacob
Dept. of Electrical & Computer Engineering
University of Maryland
College Park, MD 20742
Acknowledgments

This work is supported by the National Science Foundation, Cray, and IBM.