S. Kounev, K.-D. Lange, J. von Kistowski

Systems Benchmarking

For Scientists and Engineers

  • Provides theoretical and practical foundations as well as an in-depth look at modern benchmarks and benchmark development.
  • May serve both as a textbook and a handbook on benchmarking of systems and components used as building blocks of modern information and communication technology applications.
  • Includes concrete applications and case studies based on input from consortia such as the Standard Performance Evaluation Corporation (SPEC) and the Transaction Processing Performance Council (TPC).
"This book should be required reading for anyone interested in making good benchmarks."

– from the Foreword by David Patterson, 2017 ACM A.M. Turing Award Laureate.

DOI: 10.1007/978-3-030-41705-5

Hardcover ISBN: 978-3-030-41704-8

eBook ISBN: 978-3-030-41705-5


This book serves as both a textbook and handbook on the benchmarking of systems and components used as building blocks of modern information and communication technology applications. It provides theoretical and practical foundations as well as an in-depth exploration of modern benchmarks and benchmark development.

The book is divided into two parts: foundations and applications. The first part introduces the foundations of benchmarking as a discipline, covering the three fundamental elements of each benchmarking approach: metrics, workloads, and measurement methodology. The second part focuses on different application areas, presenting contributions in specific fields of benchmark development. These contributions address the unique challenges that arise in the conception and development of benchmarks for specific systems or subsystems, and demonstrate how the foundations and concepts in the first part of the book are being used in existing benchmarks. Further, the book presents a number of concrete applications and case studies based on input from leading benchmark developers from consortia such as the Standard Performance Evaluation Corporation (SPEC) and the Transaction Processing Performance Council (TPC).

Providing both practical and theoretical foundations, as well as a detailed discussion of modern benchmarks and their development, the book is intended as a handbook for professionals and researchers working in areas related to benchmarking. It offers an up-to-date point of reference for existing work as well as the latest results, research challenges, and future research directions. It can also be used as a textbook for graduate and postgraduate students studying any of the many subjects related to benchmarking. While readers are assumed to be familiar with the principles and practices of computer science, as well as software and systems engineering, no specific expertise in any subfield of these disciplines is required.

Samuel Kounev is a Professor of Computer Science and Chair of Software Engineering at the University of Würzburg (Germany). He has been actively involved in the Standard Performance Evaluation Corporation (SPEC), the largest standardization consortium in the area of computer systems benchmarking, since 2002. He serves as the elected chair of the SPEC Research Group, which he initiated in 2010 with the goal of providing a platform for collaborative research efforts between academia and industry in the area of quantitative system evaluation. Samuel is also co-founder and Steering Committee Co-chair of several conferences in the field, including the ACM/SPEC International Conference on Performance Engineering (ICPE) and the IEEE International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS). He has published extensively in the area of systems benchmarking, modeling, and evaluation of performance, energy efficiency, reliability, and security.

Klaus-Dieter Lange is a Distinguished Technologist at Hewlett Packard Enterprise (HPE), where he started his professional career in 1997, with a focus on performance and workload characterization, industry-standard benchmark development, server efficiency, and the design of secure enterprise solutions. He has been active in several of the SPEC Steering and Sub-committees since 2005, serves on the SPEC Board of Directors, and has been on the ICPE Steering Committee since its inception. Klaus is the founding chair of the SPECpower Committee, which, under his technical leadership, developed and maintains, among others, the SPECpower_ssj2008 benchmark, the SPEC PTDaemon, and the Server Efficiency Rating Tool (SERT) suite.

Jóakim von Kistowski is a software architect at DATEV eG, where he focuses on software performance and load testing, driving the adoption of benchmarking methods, software architecture, and software testing and evaluation. Jóakim has a strong SPEC background, actively contributing to the SPECpower Committee and serving as elected chair of the SPEC RG Power Working Group.

In January of 2010, I met Sam and Klaus at the inaugural International Conference on Performance Engineering (ICPE) in San Jose, USA. I gave the keynote address "Software Knows Best: Portable Parallelism Requires Standardized Measurements of Transparent Hardware" to an audience that was half from industry and half from academia. That was by design: in their roles as co-founders and steering committee members of ICPE, they had worked to establish this forum for sharing ideas and experiences between industry and academia. Thus, I was not surprised to see that their book "Systems Benchmarking—For Scientists and Engineers" strikes the same underlying tone: to foster the integration of theory and practice in the field of systems benchmarking. Their work is twofold: Part I can be used as a textbook for graduate students, as it introduces the foundations of benchmarking. It covers:

  • the fundamentals of benchmarking,
  • a refresher of probability and statistics,
  • benchmarking metrics,
  • statistical measurements,
  • experimental design,
  • measurement techniques,
  • operational analysis and basic queueing models,
  • workloads, and
  • benchmark standardization.

Part II features a number of concrete applications and case studies based on input from leading benchmark developers from consortia such as the Standard Performance Evaluation Corporation (SPEC) or the Transaction Processing Performance Council (TPC). It describes a broad range of state-of-the-art benchmarks, their development, and their effective use in engineering and research. In addition to covering classical performance benchmarks—including CPU, energy efficiency, virtualization, and storage benchmarks—the book looks at benchmarks and measurement methodologies for evaluating elasticity, performance isolation, and security aspects. Moreover, some further topics related to benchmarking are covered in detail, such as resource demand estimation.

The authors also share some insightful retrospectives on benchmark development in industry-standard bodies, as they have been active in SPEC for many years. The information about the formation and growth of SPEC and TPC over the last 30 years is valuable when starting new initiatives like Embench or MLPerf.

One of my observations is that benchmarks shape a field, for better or for worse. Good benchmarks are in alignment with real applications, but bad benchmarks are not, forcing engineers to choose between making changes that help end users or making changes that only help with marketing.

This book should be required reading for anyone interested in making good benchmarks.

Berkeley, CA, USA
January 2020

David Patterson
2017 ACM A.M. Turing Award Laureate

I am delighted to write a foreword for this thorough, comprehensive book on theory and practice of benchmarking. I will keep it short, so people can quickly start on the substantial text itself.

Creating good benchmarks is harder than most imagine. Many have been found to have subtle flaws or have become obsolete. In addition, benchmark audiences differ in their goals and needs. Computer system designers use benchmarks to compare potential design choices, so they need benchmarks small enough to simulate before creating hardware. Software engineers need larger examples to help design software and tune its performance. Vendors want realistic benchmarks that deter gimmicks by competitors. They dislike wasting time on those they know to be unrepresentative. Buyers might like to run their own complete workloads, but that is often impractical. They certainly want widely reported, realistic benchmarks they trust that correlate with their own workloads. Researchers like good, relevant examples they can analyze and use in textbooks.

In the 1980s, benchmarks were still often confusing and chaotic, driven by poor examples and much hype. Vendors boasted of poorly defined MIPS, MFLOPS, or transactions, and universities often studied tiny benchmarks. Luckily, the last few decades have seen huge progress, some contributed by the authors themselves. From personal experience, the close interaction of academia and industry has long been very fruitful. The three authors have extensive experience combining academic research, industrial practice, and the nontrivial methods to create good industry-standard benchmarks on which competitors can agree.

I am especially impressed by the pervasive balance of treatments in this book. It aims to serve as both a handbook for practitioners and a textbook for students. It certainly is the former, and if I were still teaching college, I would use it as a text.

It starts with the basics of benchmarks and their taxonomies, then covers the theoretical foundations of benchmarking: statistics, measurements, experimental design, and queueing theory. That is very important; from experience giving guest lectures, I have often found that many computer science students have not studied the relevant statistical methods, even at very good schools. The theory is properly complemented with numerous case studies.

The book explores the current state of the art in benchmark development, but, just as important, provides crucial context by examining decades of benchmark evolution, failures, and successes. It recounts the history of the shift from scattered benchmarks to the more disciplined efforts of industry–academic consortia, such as the Transaction Processing Performance Council (TPC) and especially the Standard Performance Evaluation Corporation (SPEC), both started in late 1988. Much was learned not just about benchmarking technology and good reporting, but also about effective ways to organize such groups. Both organizations are still quite active, three decades later, an eternity in computing. Chapter 10’s history of the SPEC CPU benchmarks’ evolution is especially instructive.

From history and long-established benchmarks, the book then moves to modern topics—energy efficiency, virtualization, storage, web, cloud elasticity, performance isolation in complex data centers, resource demand estimation, and research in software and system security. Some of these topics were barely imaginable for benchmarking when we started SPEC in 1988 just to create reasonable CPU benchmarks!

This is a fine book by experts. It offers many good lessons and is well worth the time to study.

Portola Valley, CA, USA
January 2020

John R. Mashey
SPEC Co-Founder and
Former Silicon Graphics VP/Chief Scientist

Teaching materials (lecture slides, exercises, code examples in R) are available on request.
Please contact Samuel Kounev (samuel.kounev@uni-wuerzburg.de) if you are interested.

The supplementary materials will be further extended and refined. If you are interested in being informed when updated materials are available, you can sign up for our mailing list.

@book{KoLaKi-2020-SystemsBenchmarking,
	author    = {Samuel Kounev and Klaus-Dieter Lange and Jóakim von Kistowski},
	title     = {{Systems Benchmarking}},
	subtitle  = {{For Scientists and Engineers}},
	publisher = {Springer International Publishing},
	year      = {2020},
	edition   = {1},
	isbn      = {978-3-030-41704-8},
	doi       = {10.1007/978-3-030-41705-5},
}