Guerrilla Capacity Planning: A Tactical Approach to Planning for Highly Scalable Applications and Services. Neil J. Gunther, Ph.D.



It decreases the risk of waste, but it may result in the loss of possible customers either by stockout or low service levels.

Three clear advantages of this strategy are a reduced risk of overbuilding, greater productivity due to higher utilization levels, and the ability to put off large investments as long as possible. Organizations that follow this strategy often provide mature, cost-sensitive products or services. A match strategy adds capacity in small amounts in response to changing demand in the market. This is a more moderate strategy.

An adjustment strategy adds or reduces capacity in small or large amounts in response to consumer demand or to major changes in product or system architecture.

Capacity

In the context of systems engineering, capacity planning [4] is used during system design and system performance monitoring. Capacity planning is a long-term decision that establishes a firm's overall level of resources. It extends over a time horizon long enough to obtain resources. Capacity decisions affect the production lead time, customer responsiveness, operating cost, and the company's ability to compete.

Inadequate capacity planning can lead to the loss of the customer and business. Excess capacity can drain the company's resources and prevent investments into more lucrative ventures.

The questions of when capacity should be increased and by how much are the critical decisions. Failure to make these decisions correctly can be especially damaging to overall performance when time delays are present in the system. In this example 4. By repeating this process for all the parts that run through a given machine, it is possible to determine the total capacity required to run production.

Capacity available

When considering new work for a piece of equipment or machinery, knowing how much capacity is available to run the work will eventually become part of the overall process.

Typically, an annual forecast is used to determine how many hours per year are required.

Append a decimal point. Scan and locate the last nonzero digit prior to the decimal point. All zeros trailing that digit should be ignored. Let be the number. There are two nonzero digits: Is there a decimal point? So, we append it to produce: Now we scan from left to right and locate the last nonzero digit prior to the decimal point. That is the digit 3. Suppose 0.

There is one nonzero digit: Once again, that is the 5. We start counting from there and include any zeros. As you might expect, this manual process gets rather tedious, and if you do not use it frequently you might forget the algorithm altogether. Surprisingly, the above algorithm is not readily available in tools like calculators, spreadsheets, or mathematical codes.

To help rectify this situation, Appendix D contains the sigdigs algorithm in several languages. It should be clear from those examples how to translate sigdigs into your favorite programming dialect.
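As an illustration only, the counting procedure can be sketched in Python. This is not the Appendix D code: the function name `sigdigs` is borrowed from the text, but the body is a reconstruction that assumes the common convention that trailing zeros count only when the number is written with an explicit decimal point. It works on the number as a string, because a float would discard the trailing zeros that carry the precision information.

```python
def sigdigs(number: str) -> int:
    """Count significant digits in a plain decimal string (no exponent).

    Assumed convention (not taken from the book's appendix):
    - leading zeros are never significant;
    - trailing zeros are significant only if a decimal point is written.
    """
    s = number.strip().lstrip("+-")
    if "." in s:
        # Explicit decimal point: drop leading zeros, keep trailing zeros.
        return len(s.replace(".", "").lstrip("0"))
    # No decimal point: ignore zeros on both ends of the digit string.
    return len(s.strip("0"))


# Hypothetical inputs chosen for illustration.
for text in ["1200", "1020", "0.0050", "100.0", "3.14"]:
    print(text, "->", sigdigs(text))
```

Passing strings rather than floats is the key design choice here: `float("100.0")` and `float("100")` are the same value, so the distinction the algorithm depends on would be lost.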

But, how can that be? The pharmacist may be required to measure out 11 milliliters of a liquid to make up your prescription. A milliliter means one one-thousandth of a liter, so 0. The pharmacist would use a graduated cylinder that has milliliter ml intervals marked on it. If the size of the cylinder held a total of ml of liquid, there would be major intervals marked on the side.

Similarly, 1100 ml is the same as one and one-tenth of a liter. In that case, the pharmacist might use a liter-size graduated cylinder to measure 1100 ml in two steps. First, a full liter (1000 ml) is measured out. Then, the same graduated cylinder is used to measure out one tenth of a liter (a deciliter), and the two volumes are added together to produce 1100 ml.

There are two things to notice about these measurements: In other words, the quantities , 11, and 0. We would like to express it correctly with only 3 sigdigs. We round up 3. But what if the number was 7. Otherwise, it has odd parity. If the number had been 7. Checking the parity compensates for the old rounding bias. In general, for a terminating string of digits: X Y Z we can express the new rounding rules in the following Algorithm.

Examine Y b. If Z is blank or a string of zeros then g. Examine the parity of X h. Drop Y and all trailing digits The old rule corresponds to steps a—c and i , whereas the new procedure introduces the parity checking steps d—h. Using the old rule, you would round down if the next digit was any of 1, 2, 3, or 4 , but you would round up if the next digit was any of 5, 6, 7, 8, or 9. This is where the bias comes from.
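The difference between the old and new rounding rules can be seen with Python's standard `decimal` module. This sketch assumes that the parity-checking rule described above corresponds to the usual round-half-to-even convention (`ROUND_HALF_EVEN`), and that the old rule corresponds to `ROUND_HALF_UP`. The helper name `round_to` and the input values are illustrative, not taken from the text.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_to(value: str, step: str, mode: str) -> Decimal:
    """Round a decimal string to the precision of `step` (e.g. '0.01')."""
    return Decimal(value).quantize(Decimal(step), rounding=mode)


# Hypothetical values: two exact ties and one non-tie.
for v in ["7.235", "7.245", "7.2451"]:
    old = round_to(v, "0.01", ROUND_HALF_UP)    # always rounds a tie upward
    new = round_to(v, "0.01", ROUND_HALF_EVEN)  # a tie goes to the even digit
    print(f"{v}: old rule -> {old}, parity rule -> {new}")
```

The two rules differ only on an exact tie (a 5 followed by nothing but zeros). When the retained digit is already even, the parity rule leaves it alone, which is what removes the upward bias over many roundings.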

Many tools, e. You can check this by setting the Cell Format to General. So, before adding or subtracting measured quantities, round them to the same degree of precision as the least precise number in the group to be summed. Sum the quantities 2. The least precise values have 3 sigdigs.

Setting the sigdigs under their respective columns, and using a diamond to indicate the absence of a digit in that column, we have: Division also frequently produces more digits in the quotient than the original data possessed, if the division is continued to several decimal places. To correct this situation, the following rules are used: Equal Sigdigs.

Unequal Sigdigs. Final Rounding. After performing the multiplication or division, round the result to the same number of sigdigs as appear in the less accurate of the original factors. Converting the timebase to seconds, the product becomes 2. However, applying product rule 2 we retain 4 sigdigs in the second factor, i. Let us examine this claim more closely. From Example 3.

It is actually a guess as to what a department might charge on an hourly basis for its services. For example, try asking your accountant for an estimate of the error on your income tax returns.

The number of transactions per day is also expected to have some errors associated with it. But because it is a quantity that is measured directly by the computer system, the number of sigdigs is likely to be higher than the hourly charge estimate. However, no error margin was provided for that quantity either. You may be thinking we could have reached this conclusion immediately by simply rounding up the published result for D If the value in 3.

Moreover, in 3. This shows the variant of product rule 2 in action. Equation 3. It provides a measure of the vertical displacement between a given data point yi and the regression line. Often, the value of the standard deviation is used to serve as the error bars for the data Sect. Notice that the standard deviation is positive and has the same units as the estimated quantity itself.

In the response time calculation of Sect. An appropriate way to express the error is half that sigdig viz. Error bars associated with throughput measurements 3. The error bars in Fig.

Although not entirely realistic, it is better than having no visual cue for error margins. A measure of the precision of an instrument is given by its uncertainty. A good rule of thumb for determining instrument error IE is: The accuracy of an experimental value is expressed using its percent error Sect. Each of these methods can run into precision problems. One way around this problem is to use interval arithmetic.

The idea behind interval arithmetic is to represent a real number by two numbers, a lower and an upper bound. This helps to avoid classical rounding errors of the type discussed in Sect. The rules for calculating with intervals look like this: Symbolic computation systems, such as Mathematica mentioned in Chap.
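A minimal sketch of interval arithmetic in Python follows. It uses the standard textbook rules for addition, subtraction, and multiplication of closed intervals; the class name `Interval` is illustrative. It is not a rigorous implementation, because it does not round the bounds outward after each floating-point operation as a production interval library would.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A closed interval [lo, hi] standing in for an uncertain real number."""
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        # Smallest sum uses both lower bounds, largest uses both upper bounds.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other: "Interval") -> "Interval":
        # Smallest difference subtracts the other's upper bound.
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other: "Interval") -> "Interval":
        # Signs may differ, so check all four endpoint products.
        products = [
            self.lo * other.lo,
            self.lo * other.hi,
            self.hi * other.lo,
            self.hi * other.hi,
        ]
        return Interval(min(products), max(products))


a = Interval(1.0, 2.0)
b = Interval(-1.0, 3.0)
print(a + b, a - b, a * b)
```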

See e. The Golden Rule in Sect. If you have absorbed the points of this chapter, you now know there are 3 sigdigs in the number You can verify that by manually applying Algorithm 3.

The NIST guidelines at physics. Now, you also appreciate why CPU utilization data is never displayed with more than 2 sigdigs. Ignoring error margins leads to the sin of precision, or possibly worse!

A rat is killed, a man is broken, but a horse splashes. Haldane 4. What Haldane could not have foreseen is that size also matters for computer systems. In this chapter we explore the fundamental concept of scaling with a view to quantifying it for the purposes of doing Guerrilla-style capacity planning. Did you ever wonder how big the giant was? It seems not to be mentioned in the original versions of the story. For that matter, did you ever wonder how big a beanstalk can be?

It not only has to support Jack and the giant on the way down, it also has to support its own weight. Compared with the largest trees known, e. See Examples 4. He recognized, centuries ago, that there was a natural limit to the size of physical structures; the inherent strength of materials does not permit arbitrary dimensions for real objects.

Neither giant beanstalks nor a gorilla with the physical dimensions of King Kong is possible. The simplest notion of scaling is to take every dimension, e. This is called geometric scaling Sect. Of course, the volume V grows as the cubic power of the length, width, and height.

Therefore, if you double the sides of the cube, you increase its volume by a factor of eight! As we shall see, this introduces the concept of power laws into any discussion of scalability.

We shall revisit power law scaling in Chap. Galileo and Haldane recognized that any volume not only occupies space, it has a mass since it is made of some kind of material , and that mass has weight here on earth, anyway. As the volume grows, so does the weight.

At some point, the volume will weigh so much that it will literally crush itself! That suggests that there must be some critical size that an object can have just prior to crushing itself.
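The argument can be made concrete with a few lines of Python. This sketch assumes pure geometric scaling by a factor `k`, with constant density so that weight tracks volume, and with load-bearing strength tracking cross-sectional area. The function name and dictionary keys are illustrative.

```python
def geometric_scaling(k: float) -> dict:
    """Scale factors when every linear dimension is multiplied by k."""
    area = k ** 2      # surfaces and cross-sections grow as the square
    volume = k ** 3    # volume (and weight, at constant density) as the cube
    return {
        "length": k,
        "area": area,
        "volume": volume,
        # Weight carried per unit of supporting area grows linearly with k.
        "load_per_area": volume / area,
    }


print(geometric_scaling(2))   # doubling the sides: volume grows eightfold
print(geometric_scaling(10))  # tenfold scale-up: ten times the load per area
```

Because the load per unit area grows with `k` while the material strength does not, some value of `k` must exist beyond which the structure cannot support itself.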

It turns out that a similar notion of critical size applies to scaling up computer systems and the applications they run. In this chapter we introduce a universal scaling law Sect. This universal scalability law is based on rational functions, rather than power laws. A useful method for the subsequent discussion is known as dimensional analysis. The three engineering dimensions: Dimensions are not the same as units. A common notational convention is to write the dimensions of a quantity using square brackets [ ], e.

The surface area A can be expressed of its length dimension as: Starting with 4. When an organism changes shape in response to size changes i. The density: This follows from 4. The weight is also called the mechanical load. In other words, the strength of a rigid body can be measured by the applied pressure P , or force per unit area: Using 4.

This becomes particularly apparent if we plot both 4. Example 4. One fairy tale has a giant that is ten times as tall as a human and scaled in geometric proportion. If we take the typical human attributes to be: Clearly, such a humanoid would be crushed under their own weight. The critical size for a human involves more than just weight, e. The Guinness record is held by a man 8 ft 4 in tall 2. In the Jack and the Beanstalk story, assume the cloud base is ft (fairly low), and instead of a beanstalk, consider a tree.

Taking the typical tree attributes to be: At tons, not only is a ft tree unlikely to be sustainable, a ft beanstalk is even less likely to be able to bear its own weight, let alone the giant in Example 4. Although perhaps not too far removed from the realm of fairy tales, but much more likely than giants and cloud-climbing beanstalks, is the so-called space elevator www.

This proposed alternative means of putting payloads into earth orbit will be constructed from a ribbon of carbon nanotubes having 30 times the tensile strength of steel. Conceptually, you can make a system bigger or smaller by merely stretching it equally in every dimension (isometrically).

However, it has been documented since the time of Galileo that mechanical and biological systems have inherent limitations to growth and to shrinkage. As we scale up a physical system, intrinsic overheads such as weight begin to dominate and eventually cause the system to fail beyond some critical point.

Therefore, scaling is actually allometric Fig. Remark 4. Indeed, they also scale in a monotonically increasing fashion, not unlike the strength curve in Fig. These limits do not usually lead to catastrophic system failure, but the increasing overhead can degrade system performance Fig.

Ideal linear speedup dashed compared with more realistic nonlinear speedup solid. Unlike Fig. In this chapter we lay down the fundamental concepts and theorems for a universal law of scalability, and then show how to apply them in Chap.

Our approach is universal because it applies to any hardware, including symmetric multiprocessors (SMPs), chip multiprocessors (CMPs) or multicores, and clusters. And, as we shall see in Chap. Since, at root, scalability is intimately tied up with the concept of parallel workloads, we start by reviewing the simple notion of ideal parallelism. The former term is commonly associated with a measure of parallel numerical performance, while the latter is more appropriate for commercial system workloads.

Ideal parallelism. This notion underlies the motivation for an aircraft designer to use a supercomputer—she wants the same complex calculations to be executed faster. This is not, however, the reason that commercial businesses invest in additional processing power. More commonly, a commercial enterprise needs to support more users in such a way that the additional workload does not adversely impact the response times of the current user community.

This, in turn, requires that the capacity of the computing system be increased by scaling it up in proportion to the additional load. We shall denote this scaled-up capacity C(p), where p is the number of processors in the system. Naive parallelism assumes that a workload that runs on a uniprocessor in time T1 can be equally partitioned and executed on p processors in one p-th of the uniprocessor execution time, viz. T1/p. This is tantamount to linear speedup. The remainder of the workload is said to be parallelizable.

[Figure: single-processor execution time T1 divided into serial and parallel portions; the parallel portion shows a smaller reduction.] The uniprocessor time can be split into two portions: The serial fraction takes values in the range: This is depicted by the divisions on the middle bar of Fig.

The reduced execution time to perform the complete computation with p processors is depicted by the shorter bar in the lower right of Fig. Using the above notation, we can write this time reduction as: The speedup: The smaller the denominator Tp can be made, the greater the speedup that can be achieved. Substituting 4. Hereafter, we shall refer to 4. It is noteworthy that 4. In fact, there are no equations at all in that three-page paper.

Its content is entirely empirical. Equation 4. One has to admire the skill in having an equation named after you that you never wrote down.

But Amdahl did not have it all his own way. On this score he lost badly. This speedup function has the characteristic curve shown in Fig. Similar models have been used to express speedup bounds on various types of parallel platforms, e.

According to 4. Amdahl speedup solid compared to linear scalability dashed. Theorem 4. Rewrite 4. Put another way: This conclusion is also consistent with a well-known result in queueing theory See Gunther a, Chap.
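Since the equations did not survive in the text above, the following Python sketch uses the standard form of Amdahl's law written in terms of a serial fraction, S(p) = p / (1 + σ(p − 1)). That form is assumed here, not quoted from the source. The name `sigma` for the serial fraction and the value 0.1 are illustrative.

```python
def amdahl_speedup(p: int, sigma: float) -> float:
    """Amdahl speedup on p processors for serial fraction sigma (0..1).

    Assumed standard form: S(p) = p / (1 + sigma * (p - 1)).
    """
    return p / (1 + sigma * (p - 1))


SIGMA = 0.1  # hypothetical serial fraction, chosen for illustration
for p in (1, 2, 8, 64, 1024):
    print(f"p = {p:5d}  speedup = {amdahl_speedup(p, SIGMA):6.2f}")

# The speedup approaches, but never reaches, the ceiling 1 / sigma.
print("asymptote:", 1 / SIGMA)
```

With `sigma = 0` the function reduces to linear speedup, S(p) = p, which is the ideal-parallelism case discussed earlier.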

This is the basis for the canonical argument favoring mainframes. See Remark 4. Only one transaction generation process runs at a time on a single processor. In general, a processor will not be doing useful work if it has to do other things such as: The uniprocessor throughput is given by: [Figure: per-user response time T1 for a uniprocessor and a multiprocessor, showing serial latency due to p - 1 processors.] The dual-processor capacity is expected to be twice that of the uniprocessor: We know, however, from queueing theory and from measurements on real multiprocessor systems that the dual-processor consistently completes slightly less than 2C(1) transactions in time T1.

Referring to Fig.


To see this in more detail, we use 4. The steps are: Extending this argument by analogy with 4. How can it be explained? Consider each case separately. We can summarize this transformation as: We summarize this transformation as: For scaleup, however, the elapsed time becomes: The form of Css p can easily be derived using Figure The total uniprocessor elapsed time, T 1 , can be trivially rewritten as the sum of two terms viz. As a consequence, the serial portion no longer dominates the speedup ratio.

Rewriting the speedup 4. Workloads of this type are referred to as data-parallel and give rise to the term SPMD (single program multiple data), in contrast to SIMD (single instruction multiple data). Typical sources of multiprocessor overhead include: [Figure: per-user response time T1 for a uniprocessor and a multiprocessor, showing coherency delay between p(p - 1) processors in addition to serial latency due to p - 1 processors.] Multiuser scaleup showing the per-user response time growing linearly with the number of processors due to serial delays cf.

Figure 4. Following the same steps we used to derive 4. The result is: It is a concave function. Comparison with Fig. Elsewhere, 4. Note the position of the square brackets. This is true for the tightly coupled SMP architectures from which 4. In modern loosely coupled systems it is possible to have coherency latencies independent of contention latencies, e.

Universal scalability as given in 4. R(x) is a rational function if it can be expressed as the quotient of two polynomials P(x) and Q(x): $R(x) = P(x)/Q(x)$. That this scaling function is associated with certain fundamental aspects of queueing theory (see Appendix A) has some very important implications for physically achievable computer system scalability.

The term concave function is being used here in the mathematical sense. Concave refers to the fact that the function has a bump shape when viewed from the x-axis, and therefore has a unique maximum see Figs. Conversely, a convex function is bowl-shaped when viewed from the x-axis, and therefore has a unique minimum. For more details, the interested reader should see mathworld. If there were no interaction between the processors the capacity function would scale linearly, i.

The second term in the denominator. The third term in the denominator. It represents the penalty incurred for maintaining consistency of shared writable data see e. Consider a hotel reservation system running on an SMP platform. The reservation database application involves shared writable data. Since the central reservation system supports more than one user terminal or Web browser, and typically a single database instance handles more than one hotel location, this is a perfect application for an SMP.

Let us track what happens to a particular user process. Since there are likely other database processes also wanting to access table rows, the DBMS will only grant the DB row lock to one process while all the others wait in a queue for that same DB lock.

Why not? Even though it has permission from the DBMS and has possession of the DB lock , there is also considerable likelihood that another process already wrote into the same table entry, but it did so while executing on another CPU. It does not take much imagination to realize that a similar argument also explains virtual memory thrashing see e. See Sect. The location on the p-axis of the maximum in 4. See Fig. We seek the value of p when the gradient is zero i.

The following properties of 4. Such behavior is clearly undesirable for scalability. Property ii states that universal scalability in 4. Property iii states that a maximum can exist, even in the absence of contention. As the curves in Fig. [Figure: capacity C(p) versus processors p for the 1-param, 2-param, 3-param, and 2-param' models.] The third parameter is associated with a cubic term in the denominator of Eqn. The details of how this can be accomplished are presented in Chap.
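The universal scalability function and the location of its maximum can be sketched in Python. The equation itself is missing from the text above, so this sketch assumes the widely published two-parameter form C(p) = p / (1 + σ(p − 1) + κ p(p − 1)), in which the second denominator term models contention and the third models coherency. Setting the gradient to zero then gives p* = sqrt((1 − σ)/κ). The parameter names `sigma` and `kappa` and the numerical values are illustrative assumptions.

```python
import math


def usl_capacity(p: float, sigma: float, kappa: float) -> float:
    """Relative capacity C(p) under the assumed two-parameter model."""
    return p / (1 + sigma * (p - 1) + kappa * p * (p - 1))


def usl_peak(sigma: float, kappa: float) -> float:
    """Real-valued p at which dC/dp = 0, i.e. sqrt((1 - sigma) / kappa)."""
    if kappa == 0:
        return math.inf  # no coherency term: C(p) never turns over
    return math.sqrt((1 - sigma) / kappa)


SIGMA, KAPPA = 0.05, 0.001  # hypothetical parameter values
p_star = usl_peak(SIGMA, KAPPA)
print(f"peak near p = {p_star:.1f}, C = {usl_capacity(p_star, SIGMA, KAPPA):.2f}")

# A maximum exists even with zero contention, as long as kappa > 0.
print("sigma = 0 peak:", usl_peak(0.0, KAPPA))
```

The peak is returned as a real number. Because processors come in whole units, a planner would compare the capacities at the two neighboring integers rather than rely on rounding in one direction.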

Since it refers to the remaining fraction of available processor capacity after various computational overheads have been subtracted out, its value lies in the range 0 4. This means that the scalability curve is an inverted parabola Fig. [Figure: Sg(p) versus processors p.] From this we can infer that it represents a single shared-bus model. [Figure: universal law versus processors p.] I was told that performance engineers had been using 4.

In Fig. Table 4. Conjecture 4. This makes them simpler to apply, but as we shall discover in Chap. Moreover, there is nothing in 4. In particular, it includes no terms representing the kind of interconnect technology between processor nodes e. Since the universal model has no intrinsic structure, it can be used to model more general computer systems such as multicores and clusters. The partitioning of p processors among n nodes in Fig.

In order to apply 4. The other important assumption is that the nodes and the processors are homogeneous. This assumption is most likely to hold for a multicore system see Chap. Such undesirable parameter values have been chosen merely to demonstrate the ability of 4. We shall make further use of these ideas in Chap. The key idea is that scaling cannot be unbounded. In a physical system, scaling is bounded by a critical point where the weight exceeds the physical capability of the system to support itself.

We showed that a similar idea exists for computer systems. All of this is expressed in the universal scalability model given by 4. We conjecture that any scalability model based on rational functions requires no more than two parameters. The model is universal in that it is not restricted to a particular architecture or workload—there are no terms in 4.

Conversely, in the case where performance is considered inferior the critical capacity maximum sets in too early , it cannot be used in reverse to resolve which subsystem needs to be tuned to improve it. After all, Gene Amdahl based his conclusions on a tedious analysis of the serial delay in individual test workloads. For a two-parameter model, this looks like a daunting and unpalatable task. But ye of little faith, never fear; mathematical statistics to the rescue!

Scalability, especially application scalability, is a perennial hot topic. See for example the following Web links discussing the scalability of: For the purposes of this chapter, the main points established in Chap.

In other words, each additional processor is assumed to be capable of executing another N processes. Remark 5. All of the examples are presented using Excel Levine et al. More details can be found in Atkison et al. The images consist of 24-bit RGB pixel values (8 bits for each color channel). The inclusion of these models assures that there is a strong correlation between performance on the benchmark and performance in actual use. The benchmark reports the number of ray-geometry intersections performed per second during the ray-tracing phase of the rendering.

Time spent reading the model into memory and performing setup operations is not included in the performance evaluation. Reference images used in the ray-tracing benchmark Table 5. It is generally more illuminating to plot these data as shown in Fig. The homogeneity of the ray-tracing benchmark workload makes it a very useful candidate for analysis using our universal capacity model 5. Table 5.

Ray tracing benchmark results on a NUMA architecture
Processors p: 1, 4, 8, 12, 16, 20, 24, 28, 32, 48, 64
Throughput X(p): 20, 78, ...

The Origin is a nonuniform memory architecture (NUMA) in which the memory is partitioned across each processor. These physically separate memory partitions can be addressed as one logically shared address space.

This means that any processor can make a memory reference to any memory location. Access time depends on the location of a data word in memory Hennessy and Patterson The performance did not increase uniformly with the number of processors. All of the benchmark codes showed considerable variability in each experimental run. Scatter plot of the benchmark throughput data in Table 5. Up to six processors can be interconnected without a router module to the crossbar switch.

Up to eight CPUs can be placed in a single backplane. The great variability of performance in higher numbers of CPUs can be explained by the distribution of the computation across the available nodes of the system.

As the following steps are carried out, they can be incorporated into an Excel spreadsheet like that shown in Fig. Calculate the deviation from linearity Sect.

This is because 5. How many data points do we actually need to estimate the parameters in our universal scalability function? There are two ways to look at this question.

The simplest polynomial is a straight line: We need at least two data points to unambiguously determine the slope and the y-intercept. In Table 5. As we shall see in Sect. From the standpoint of unambiguously determining the interpolating polynomial, we should only need three data points.

However, because the measured throughputs involve systematic and statistical errors see Chap. For this, we need regression analysis. In general, 5. On the other hand, you do not necessarily require a data set as complete as that in Table 5. In any event, it is advisable to have at least four data points to be statistically meaningful.


Hence, four data points should be considered to be the minimal set. Referring to the benchmark data in Table 5. Therefore, that capacity ratio 5. X 1 20 All intermediate values of C p can be calculated in the same way.

This result already informs us that the fully loaded 64-way platform produces less than one quarter of the throughput expected on the basis of purely linear scaling.
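The normalization step is simple enough to check by hand or in a few lines of Python. The sketch below uses only the two throughput values that are legible in the table above, X(1) = 20 and X(4) = 78. The function names are illustrative.

```python
def relative_capacity(x_p: float, x_1: float) -> float:
    """Capacity ratio C(p) = X(p) / X(1)."""
    return x_p / x_1


def efficiency(x_p: float, x_1: float, p: int) -> float:
    """Fraction of ideal linear scaling achieved: C(p) / p."""
    return relative_capacity(x_p, x_1) / p


X1, X4 = 20, 78  # throughputs at p = 1 and p = 4 from the benchmark table
print("C(4)       =", relative_capacity(X4, X1))
print("efficiency =", efficiency(X4, X1, 4))
```

An efficiency close to 1 at small p, falling well below 1 at the largest configuration, is the signature of the sublinear scaling that the regression analysis goes on to quantify.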

The results are summarized in Table 5. We are now in a position to prepare the benchmark data in Excel for nonlinear regression analysis Levine et al. As we shall see in Sect. We can, however, perform regression on a transformed version of 5.

The appropriate transformation is described in the following sections. Overall, 5. The shape of this function is associated with a parabolic curve see, e. Equation 5. This can most easily be accomplished by reorganizing the inverted equation 5. Notice that 5. This change of variables, 5. The transformed variables can be interpreted physically as providing a measure of the deviation from ideal linear scaling discussed in Sect.
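For readers who prefer a script to a spreadsheet, the transformation and fit can be sketched in plain Python. The sketch assumes the two-parameter model C(p) = p / (1 + σ(p − 1) + κ p(p − 1)); with x = p − 1 and y = p/C(p) − 1 this becomes y = a x² + b x with zero intercept, where κ = a and σ = b − a. The equations are not legible in the text above, so these relations are a reconstruction from that assumed model. The data are synthetic, generated from known parameters so that the fit can be checked. They are not the benchmark measurements.

```python
def usl_capacity(p, sigma, kappa):
    """Assumed two-parameter model for relative capacity C(p)."""
    return p / (1 + sigma * (p - 1) + kappa * p * (p - 1))


def fit_usl(measurements):
    """Estimate (sigma, kappa) from (p, X(p)) pairs that include p = 1.

    Fits y = a*x**2 + b*x (no intercept) by least squares, where
    x = p - 1 and y = p / C(p) - 1, then maps back to sigma and kappa.
    Needs at least two distinct points besides p = 1.
    """
    x1 = dict(measurements)[1]
    xs, ys = [], []
    for p, xp in measurements:
        if p == 1:
            continue  # maps to the origin (0, 0); adds nothing to the sums
        c = xp / x1   # normalize throughput to relative capacity
        xs.append(p - 1)
        ys.append(p / c - 1)
    # Normal equations for the two coefficients, solved by Cramer's rule.
    s4 = sum(x ** 4 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s2 = sum(x ** 2 for x in xs)
    sx2y = sum(x * x * y for x, y in zip(xs, ys))
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = s4 * s2 - s3 * s3
    a = (sx2y * s2 - s3 * sxy) / det
    b = (s4 * sxy - s3 * sx2y) / det
    return b - a, a  # sigma, kappa


# Synthetic, noise-free data from hypothetical parameters.
TRUE_SIGMA, TRUE_KAPPA, X1 = 0.05, 0.001, 20.0
data = [(p, X1 * usl_capacity(p, TRUE_SIGMA, TRUE_KAPPA)) for p in (1, 4, 8, 16, 32)]
sigma, kappa = fit_usl(data)
print(f"sigma = {sigma:.4f}, kappa = {kappa:.4f}")
```

With real measurements the recovered coefficients should still be checked for sign: a negative `a` or `b` signals that the data do not fit the model, which a least-squares routine will not report on its own.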

The parabolic shape of the regression curve 5. It is not so apparent in Fig.


This is explained further in Sect. Theorem 5. In order for the scalability function 5. See the discussion in Sect.

We can also use 5. In particular, the requirements: They are requirements in the sense that they cannot be guaranteed by the regression analysis.

They must be checked in each case. Schematic representation of the allowed regression curves 5. Corollary 5. It follows from Theorem 5. Figure 5. It shows why the above requirements must hold. The upper curve is a parabola which meets requirement i and corresponds to the most general case of universal scalability. Another way to understand the above requirements is to read Fig. As the clock-hand moves upward from the X-axis, it starts to sweep out an area underneath it.

As the b clock-hand reaches the middle inclined straight line, imagine another clock-hand belonging to a starting in that position. At that position, the area swept out by the b clock-hand is greater than the area swept out by the a clock-hand, since the latter has only started moving.

Hence, requirement ii is met. As we continue in an anticlockwise direction, the two clock-hands move together but start to bend in Daliesque fashion to produce the upper curve in Fig. Deviations from linearity based on the data in Table 5.

Once you have made the scatter plot, go to the Chart menu item in Excel and choose Add Trendline. This choice will present you with a dialog box Fig. Usually, there is a certain subjective choice in this step, but not in our case.

Therefore, according to Sect. This choice corresponds to the quadratic equation 5. The corresponding dialog box for this choice is shown in Fig. Select each of the checkboxes as follows: Checking the third box causes the corresponding numerical value of R2 to be displayed in the chart as well. Values in the range 0. Options dialog box for the Excel Trendline curve dashed curve as well as the calculated quadratic equation and the R2 value.

This ends the regression analysis, but we still need to generate the complete theoretical scaling curve using 5. The values computed by Excel in Table 5. You are advised to read Appendix B carefully for more details about this issue. This raises a dilemma. It is important from a GCaP standpoint that you learn to perform quantitative scalability analysis as presented in this chapter, and Excel provides a 5.

However, because of its potential precision limitations, known to Microsoft support. Several remarks can be made. This suggests that there are likely to be few cache misses, very little memory paging, and so on.

The maximum in the scalability curve is predicted by 5. Clearly, this is a purely theoretical value, since it is almost seven times greater than the maximum number of physical processors that can be accommodated on the system backplane. Theoretical scalability dashed line predicted by 5. As expected, very few data points actually lie on the curve. We did not have repeated measurements in this case.

Error reporting for the scalability data in Fig. How well does this regression method work when there is less data? We consider the following typical cases: See Chap. Note that the corresponding scatter plot appears in Fig. The projected scalability in Fig. Referring to the platform architecture described in Sect. The temporary adverse impact on scalability observed in the data could be a result of coherency delays to memory references across these multiple buses.

Linear deviation for the benchmark data in Fig. This is another reason that error reporting Sect. Again, these data are expected to correspond to a low-contention region in the scalability curve. Predicted scalability for the limited benchmark data in Table 5.

Clearly, the predicted scalability depends quite sensitively on what is known and what is unknown. This is exactly how it should be Table 5.


The fourth data point in Table 5. Unless the performance measurements are planned with the universal scalability model in mind, it is not uncommon for X(1) to be absent, thus [Figure: throughput in KRays per second versus processors p.] Predicted scalability for the limited benchmark data in Fig. Is there any way around this problem? There are two answers to this question.

Second, you can do a form of regression analysis on the raw data without normalizing it, but this may involve more trial and error than described in this chapter. Worse yet, 5. How is that possible? Moreover, 5. It is in this sense that 5. As validation, in Sect. The procedural steps for applying regression analysis in Excel to estimate the scalability parameters were described in Sect.

Rather than requiring 5. We now move on to examine how these quantitative scalability concepts can be applied to software applications. Section A: For our purposes, the actual code and what it does is unimportant. What matters is that section A is the larger body of code (not all of its lines are displayed here), while section B consists of just 20 lines.

Suppose the objective is to improve the run-time performance of the entire program. The obvious choice is to look for opportunities in the largest body of code.

What performance improvement can be expected under these circumstances? On the other hand, if we were able to reduce the execution time of section B by just a factor of 10, then the overall run-time performance improvement would be: This win follows from the fact that section B is executed nine times more often than section A.
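The arithmetic behind this comparison can be sketched as follows. The numerical results are elided in this text, so the example assumes that "executed nine times more often" translates into section B accounting for 90% of the run time and section A for the remaining 10%. That split, and the function name, are assumptions made for illustration.

```python
def overall_speedup(fraction, factor):
    """Overall speedup when `fraction` of the run time is made `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Assumed run-time split: section A 10%, section B 90%.
frac_a, frac_b = 0.10, 0.90

# Even if section A took no time at all, the gain would be modest.
print(overall_speedup(frac_a, float("inf")))

# Speeding up the small but frequently executed section B by a factor of 10.
print(overall_speedup(frac_b, 10))
```

Under these assumptions, eliminating section A entirely yields roughly an 11% improvement, while a tenfold reduction in section B yields more than a fivefold improvement.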

For many readers, this will be the most likely application of the universal scalability law. Some readers might recognize 6. To understand why Theorem 6.

Corollary 6. From 6. Theorem 6. See Appendix A and Gunther a, b, b for details. We already know that the universal scalability law of Chap. Otherwise, their interpretation remains the same as in Chaps.

Measure the throughput X(N) for a set of user loads N. Calculate the deviation from linearity. When following this procedure, it is very important to keep the following assumptions in mind: Hardware Measurements: Software Measurements: In this case, we measure C(N) as a function of the user load N; the latter being the independent variable. Finally, you may be wondering if there is also a queueing model representation of 6. Indeed, there is. It is the repairman model (Appendix A), but with a load-dependent server.
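The measurement procedure above can be sketched in code. The snippet assumes the two-parameter form C(N) = N / (1 + σ(N − 1) + κN(N − 1)) and the standard transformation x = N − 1, y = N/C(N) − 1, which turns the model into the zero-intercept quadratic y = κx² + (σ + κ)x. It uses a plain least-squares solve rather than Excel, and the function name and the sample data are hypothetical.

```python
def fit_usl(loads, throughputs):
    """Estimate (sigma, kappa) from throughput measurements; requires the N = 1 point."""
    x1 = dict(zip(loads, throughputs))[1]  # X(1), used to normalize
    xs, ys = [], []
    for n, x in zip(loads, throughputs):
        c = x / x1                # relative capacity C(N) = X(N) / X(1)
        xs.append(n - 1)          # transformed load
        ys.append(n / c - 1.0)    # deviation from linearity
    # Least-squares fit of y = a*x^2 + b*x (no intercept) via the normal equations.
    s4 = sum(x ** 4 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s2 = sum(x ** 2 for x in xs)
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    t1 = sum(x * y for x, y in zip(xs, ys))
    det = s4 * s2 - s3 * s3
    a = (t2 * s2 - t1 * s3) / det
    b = (s4 * t1 - s3 * t2) / det
    return b - a, a  # sigma = b - a, kappa = a


# Hypothetical measurements: user loads and throughput at each load.
loads = [1, 2, 4, 8, 16, 32]
throughputs = [100.0, 194.0, 364.0, 640.0, 1010.0, 1320.0]
sigma, kappa = fit_usl(loads, throughputs)
print(sigma, kappa)
```

Because the normalization depends on X(1), a data set that lacks the single-user measurement cannot be used with this sketch as written.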

Associated presentations are available online. The main thrust of his argument can be summarized as follows. Ultimately, Sutter concludes that concurrency-oriented programming will supersede the current paradigm of object-oriented (OO) programming.

Sutter does not explain that, but part of the reason has to do with the thermal barrier that accompanies high clock frequencies in CMOS. The power dissipation occurs from charging and discharging of nodal capacitances found on the output of every logic gate in the circuit.

Each logic transition in the CMOS circuit incurs a voltage change, which draws energy from the power supply. What is worse than 6. As microprocessor geometries continue to shrink, A becomes smaller and 6.

Combine that with increasing clock frequency in 6. To ameliorate this problem, microprocessor manufacturers have decided to place more, lower speed, and therefore lower power, processor cores on the same die. Similar issues are well-known in the context of concurrent programming on symmetric multiprocessors (SMPs). Instead of having multiple processors in a single box, we are beginning to see multiple processors on a single silicon die.

Experience with SMP platforms, both historically e. The only way these values can be known is by system measurement, as we show in the following sections. The analytic method is the same as that described in Chap.

These multiuser activities are emulated by concurrently running multiple copies of scripts containing the shell commands. The relevant performance metric is the throughput measured in scripts per hour. Remark 6. A very important distinguishing feature of the benchmark is that it does not rely on a single metric as does CPU, www. Rather, a graph showing the complete throughput characteristic must be reported. You can download the full report from www.

Table 6. The peak throughput occurs at 72 generators or virtual users. Beyond the peak, the throughput becomes retrograde. In an ironic twist of fate, the measurements in Table 6. Contrast this with the regression analysis in Table 6.

Using 6. The Amdahl bound represents a worst-case scenario where all the users issue their requests simultaneously see Appendix A. Consequently, all N requests get piled up at the CPU. Moreover, all the users have to wait until all the requests have been serviced and returned to their respective owners before any further 6. This is an all-or-nothing situation; all users who have requests are either waiting in the run-queue or thinking. Both these phases represent relatively low throughput.

As I suggest to students in my classes www. The reader should keep in mind that our purpose here is to understand how to apply the universal scalability law to software applications, and not to determine which platform or application combination has the best performance. To demonstrate the point, we have deliberately chosen to analyze older versions of both Microsoft Windows and the Microsoft SQL Server relational database management system.

Although SQL Server 6. SQL Server 7. Microsoft has refreshed the product from the ground up. Numerous architectural enhancements have boosted the scalability of SQL Server 7. Benchmark Factory simulates real database application workloads by generating loads using industry standard benchmarks, e. Baseline measurements were taken on both the Windows NT 4. Even the 8-way throughput appears to have saturated at the same VUser load.

As we shall see in the next section, appearances can be deceiving. Our analysis methods, however, are valid for any system speeds. Universal scalability models dashed lines for the data in Fig. Now, we can perceive a deeper explanation of the throughput measurements. Application scalability is a perennial issue Williams and Smith , and the usual tool for assessing it is a load-test environment that spans a distributed platform involving: PC front-end drivers.

Each tier is connected via a network, such as Base-T switched Ethernet. Testing up through thousands of potential users is expensive, both in terms of licensing fees and the amount of hardware needed to support such intense workloads, and very time consuming. The question naturally arises: can the universal scaling model be applied in the sense of providing the virtual load-test environment described in Chap.

Multitier performance testing environment showing the front-end drivers typically PCs that run the load-test scripts, multiple Web servers running HTTP daemons, application servers e. These results demonstrate clearly how the universal scaling model can be applied to more complex multitier systems and applications.

The load-driver client sends a Web request e. A script e. The business object uses a database connection e. The database server responds with the result of the query. The business object processes the result and sends the response back to the Web server tier. That is the digit 3. To correct this situation, the following rules are used: By the way, if your company does produce a product with a response time like that in Fig.

We see immediately in the fourth column of Table 1. Using 6. Predicted scalability for the limited benchmark data in Table 5.
