2007
|
(book)
|
-
Memory Systems: Cache, DRAM, Disk.
Bruce Jacob, Spencer W. Ng, and David T. Wang, with contributions by Samuel Rodriguez.
ISBN 978-0-12-379751-3.
Morgan Kaufmann Publishers, Fall 2007.
|
|
|
This represents the culmination of much of our work.
Currently sitting at about 1000 pages, densely set (~500 words per page), it is roughly
half a million total words.
|
|
HPCA
|
|
|
|
|
2006
|
ISLPED
|
|
|
|
This is the most accurate study of SRAM energy & power yet; Sam's software, vCACTI, is a
major overhaul of existing CACTI-based programs.
Read his thesis.
|
|
HPCA
|
|
|
|
Few besides Aamer are really looking at what sort of sharing patterns exist in the last-level cache.
|
|
IEEE-TC
|
|
|
|
One of the problems with software-managed TLBs is that they can clog up a high-performance,
deeply pipelined, highly out-of-order architecture. TLB misses happen regularly, and to service
them in software, you have to flush the pipe (oops!), which is one of the reasons for the popularity of
hardware-managed TLBs these days. This is a neat trick that gives you the best of both worlds.
|
2005
|
HPCA
|
|
|
|
The funny thing about out-of-order execution is that it is great for general instructions, but it
works poorly for memory instructions. Norm Jouppi was one of the first to notice this, and
Aamer was his intern at the time ... Norm handed the idea off to Aamer, and the study was born.
It took us several years to get people to believe the results.
|
|
ISPASS
|
|
|
|
A really cool set of benchmarks that pounds the memory system.
|
|
SIGARCH
|
-
"DRAMsim: A memory-system simulator."
David Wang, Brinda Ganesh, Nuengwong Tuaycharoen, Katie Baynes, Aamer Jaleel, and Bruce Jacob.
SIGARCH Computer Architecture News, vol. 33, no. 4, pp. 100-107. September 2005.
|
|
|
Simply put, the most accurate DRAM system simulator in the world. At least, that we know of.
|
2003
|
IEEE Micro
|
|
|
|
DRAM systems have become so complex that they resemble highly out-of-order processors.
Lots of concurrency, lots of queueing and scheduling, deep pipelines ... it has gotten to
the point that you really have to take an architectural approach, not just a circuits approach.
|
2001
|
ISCA
|
|
|
|
The first study to show how complicated the memory-system design space is: it is extremely
non-linear (very noisy, not well behaved -- good solutions lie right next to bad solutions),
and the cost of poor analysis is huge -- the variance can be 3x or more from worst case to best
case, even within a group of "high-performance" configurations.
|
|
CASES
|
|
|
|
Explores an interesting design space in which bandwidth is spent to reduce the storage
needed for good performance in DSP applications. Among other things, shows
that you can get excellent performance out of a *very* small number of cache blocks.
|
|
IEEE-TC
|
|
|
|
This is our extended/journal version of the 1999 ISCA study. Among other things,
this adds DDR measurements, cache effects (number of MSHRs), etc.
|
|
IEEE-TC
|
|
|
|
This is our extended/journal version of the 1997 HPCA study. The original title when
we submitted this to IEEE was
"Software-managed address translation and software-managed caches," because we give
a detailed description of how to build a software-managed cache. However, one reviewer
was obstinate about not allowing us to use the term "software-managed cache" in the title,
and also refused to accept the paper until after 2000 (we submitted the article in 1998,
a full three years before it finally showed up in print). Interesting.
|
1999
|
ISCA
|
|
|
|
|
|
CASES
|
|
|
|
The paper presents more thought on the idea of software-managed caches, first
mentioned in the 1998 ASPLOS paper, below, and also discussed in the 1998 CASES
paper. In particular, this paper gives (and is the first to give)
an architecture for a fully associative software-managed cache design.
|
|
ESC
|
|
|
|
An extended abstract that goes into a little bit more detail on software-managed
caches than the 1998 CASES paper below.
The slides for the talk are available on-line in PDF format
and include many details not found in the paper.
|
1998
|
ASPLOS
|
|
|
|
This paper coins the term "software-managed cache" ... first-ever appearance of
the now-common term.
The paper is the first to show the low-level costs of virtual memory, for example
the interrupt costs and the different costs associated with page-table organizations.
|
|
IEEE Computer
|
|
|
|
I've been told that this paper, coupled with the IEEE Micro article
below, is considered by many to be the definitive reference on virtual memory, and
the two papers are what many graduate students use to study for their quals on the
topic.
|
|
IEEE Micro
|
|
|
|
I've been told that this paper, coupled with the IEEE Computer article
above, is considered by many to be the definitive reference on virtual memory, and
the two papers are what many graduate students use to study for their quals on the
topic.
|
|
CASES
|
|
|
|
An extended abstract that introduces the idea of software-managed caches.
The slides for the talk are available on-line in PDF format
and include many details not found in the paper.
|
1997
|
HPCA
|
-
"Software-managed address translation."
Bruce Jacob and Trevor Mudge.
Proc. Third International Symposium on High Performance Computer Architecture (HPCA
1997), pp. 156-167,
San Antonio TX, February 1997.
|
|
|
Starts with the "in-cache address translation" concept and takes it a step further:
what if we dispense with page-table-walking hardware altogether? (SPUR dispensed
with the TLB and used a hardware table walker to probe the cache.) Among other
things, this simplifies the hardware design (potentially making it simpler to verify,
etc.) and gives more flexibility to the operating system (potentially enabling real-time
guarantees, etc.).
|
1996
|
IEEE-TC
|
|
|
|
Yet another analytical cache-modeling paper. The difference: this one is correct.
:)
|