Cockcroft Headroom Plot - Part 3 - Histogram Fixes

August, 2008
by Adrian Cockcroft

About the Author
Adrian Cockcroft

Adrian is best known as the author of four books including Sun Performance and Tuning (2 editions); Resource Management; and Capacity Planning for Internet Services. In his 16 years at Sun he worked in technical sales and marketing, led creation of the BluePrints best practice publishing program, tested very complex integrated systems, was a leader of Sun's Six Sigma program and was the Chief Architect and Product Boss for Sun's High Performance Technical Computing business unit. In this time he gave many training classes and consulted with a wide range of customers, most notably as the on-site capacity planning consultant for the Salt Lake 2002 and Athens 2004 Olympic Games.

Joining eBay in 2004, he initially worked for Operations Architecture, investigating new platforms and providing guidance to the capacity planning groups at eBay and PayPal. As a founding member of eBay Research Labs in 2005, Adrian helped define the initial strategy for the Labs and an Innovation Forum. He researched operations-related platforms and processes, led research into advanced Skype plugin applications, contributed to development of the Skype4Java API and prototyped advanced wireless/mobile applications. During 2006 he published an IEEE paper on simulating large-scale peer-to-peer networks, and a CMG paper on utilization measurement problems.

Adrian has consulted on architecture, scalability and performance for the Bebo.com social network, and is an advisory board member for Infovell and Holocosmos.

In 2007 Adrian joined Netflix as a Director of Web Engineering, directing a team responsible for research and development of scalable personalized web architectures.

Adrian filed two patents on capacity planning techniques while at Sun, and four patents related to peer to peer marketplaces while at eBay.

Adrian has a blog at http://perfcap.blogspot.com where he discusses capacity planning techniques, new computer technology, and how markets and innovation interact. He is also a member of the Homebrew Mobile Phone Club, and several local classic car clubs.

There were some scaling issues with the histograms that needed fixing. Ultimately this made the code look a lot more complex, but it now deals with scaling the plot and the histogram with a fixed zero origin on both axes. It’s important to maintain the zero origin for a throughput vs. response time plot.

The tricky part is that the main plot is automatically oversized from its data range by a few percent, and the histogram uses completely different units. A histogram with 6 bars is drawn with the bars at unit intervals, so it is roughly 6 units wide whatever the data values are, plus some padding. After lots of trial and error, the main plot now uses the upper edge of the last histogram bucket as its maximum value and artificially offsets the histograms by what looks like about the right amount. The plot below uses fixed data as a test. Observe that the first bar includes two points; that’s due to the particular algorithm used by R. Some alternative histogram algorithms are available, but this one seems to be most appropriate for throughput/response time data.

> chp(5:10,5:10)

[Figure: Cockcroft Headroom Plot]
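The bucket behaviour described above can be checked without drawing anything. The following is a minimal sketch, assuming R's default hist() settings and the same fixed test data (5:10); the variable names first_bucket and xrange are chosen here for illustration. It prints the bucket boundaries and counts, shows which points land in the first bucket, and builds the zero-origin range used for the main plot.

```r
# Fixed test data, the same values passed to chp(5:10, 5:10).
x <- 5:10

# Compute the histogram buckets without drawing a plot.
xhist <- hist(x, plot = FALSE)

# Bucket boundaries (in data units) and the number of points in each bucket.
print(xhist$breaks)
print(xhist$counts)

# With the defaults, buckets are closed on the right and the first bucket
# also includes its lower boundary, so it collects two of the test points.
first_bucket <- x[x >= xhist$breaks[1] & x <= xhist$breaks[2]]
print(first_bucket)

# Zero-origin range for the main plot: 0 up to the last histogram break.
xrange <- c(0, xhist$breaks[length(xhist$breaks)])
print(xrange)
```

Because the breaks are in data units while barplot() draws one bar per unit, these two scales have to be reconciled by hand, which is what the offsets in the updated code do.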

The updated code follows.

chp <- function(x,y,xl="Throughput",yl="Response",tl="Throughput Over Time",
                ml="Cockcroft Headroom Plot") {
    xhist <- hist(x,plot=FALSE)
    yhist <- hist(y, plot=FALSE)
    xbf <- xhist$breaks[1]                      # first
    ybf <- yhist$breaks[1]                      # first
    xbl <- xhist$breaks[length(xhist$breaks)]   # last
    ybl <- yhist$breaks[length(yhist$breaks)]   # last
    xcl <- length(xhist$counts)                 # count length
    ycl <- length(yhist$counts)                 # count length
    xrange <- c(0,xbl)
    yrange <- c(0,ybl)
    nf <- layout(matrix(c(2,4,1,3),2,2,byrow=TRUE), c(3,1), c(1,3), TRUE)
    layout.show(nf)
    par(mar=c(5,4,0,0))
    plot(x, y, xlim=xrange, ylim=yrange, xlab=xl, ylab=yl)
    par(mar=c(0,4,3,0))
    barplot(xhist$counts, axes=FALSE,
            xlim=c(xcl*0.03-xbf/((xbl-xbf)/(xcl-0.5)),xcl*0.97),
            ylim=c(0, max(xhist$counts)), space=0, main=ml)
    par(mar=c(5,0,0,1))
    barplot(yhist$counts, axes=FALSE, xlim=c(0,max(yhist$counts)),
            ylim=c(ycl*0.03-ybf/((ybl-ybf)/(ycl-0.5)),ycl*0.97),
            space=0, horiz=TRUE)
    par(mar=c(2.5,1.7,3,1))
    plot(x, main=tl, cex.axis=0.8, cex.main=0.8, type="S")
}
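To see what the offset expressions in the barplot() calls evaluate to, the arithmetic can be pulled out on its own. This is only a sketch: hist_bar_limits is a hypothetical helper name that does not appear in the code above, and the 0.03, 0.97 and 0.5 constants are copied unchanged from the trial-and-error values in chp().

```r
# Hypothetical helper (not part of chp): returns the bar-axis limits that
# chp() passes to barplot() for one histogram object h.
hist_bar_limits <- function(h) {
  bf <- h$breaks[1]                  # first break, in data units
  bl <- h$breaks[length(h$breaks)]   # last break, in data units
  cl <- length(h$counts)             # number of bars, each one unit wide

  # Approximate data units per bar, using the same empirical (cl - 0.5)
  # divisor as chp().
  units_per_bar <- (bl - bf) / (cl - 0.5)

  # Extend the axis to the left (negative bar units) far enough to reach
  # the zero origin of the data, then trim by the few-percent fudge factors.
  lower <- cl * 0.03 - bf / units_per_bar
  upper <- cl * 0.97
  c(lower, upper)
}

# Limits for the fixed test data used in the article.
print(hist_bar_limits(hist(5:10, plot = FALSE)))
```

A negative lower limit means the bars are pushed to the right so that they sit over the matching part of the main plot, whose axis starts at zero.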

[Parts 1 and 2 appeared in the May and July issues respectively.]