
Short Answers (10 points each)

Grading chart:

  • Nested analysis

  • Series analysis

  • First variable

  • Second variable

  • Used invalid values

Consider a program with two loops, controlled by index variables. The first variable increments (by 1 each iteration) from -3 to 20. The second variable increments (by 2 each iteration) from 10 to 20. The program can exit from either loop normally at any value of the loop index. (Ignore the possibility of invalid values of the loop index.)

    • If these were the only control structures in the program, how many paths are there through the program?

      • If the loops are nested

      • If the loops are in series, one after the other

    • If you could control the values of the index variables, what test cases would you run if you were using a domain testing approach?

    • Please explain your answers with enough detail that I can understand how you arrived at the numbers.


  • The first variable has 24 possible values (-3, -2, -1, 0, 1, ..., 20)

  • The second variable has 6 possible values (10, 12, 14, 16, 18, 20)

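a) Analysis if the loops are nested

Suppose loop 1 runs through index values 1, 2, 3 and loop 2 runs through index values 4, 5, and either loop can exit at any of its values. In each path below, "4" means the inner loop exited at 4 and "4,5" means it ran on to 5.

Loop 1 with loop 2 inside it:

2 paths with one pass through the outer loop: (1,4), (1,4,5)

4 paths with two passes: (1,4,2,4), (1,4,2,4,5), (1,4,5,2,4), (1,4,5,2,4,5)

8 paths with three passes: (1,4,2,4,3,4), (1,4,2,4,3,4,5), (1,4,2,4,5,3,4), (1,4,2,4,5,3,4,5), (1,4,5,2,4,3,4), (1,4,5,2,4,3,4,5), (1,4,5,2,4,5,3,4), (1,4,5,2,4,5,3,4,5)

This illustrates the general rule: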

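If $N_1$ = number of values of the outer loop and $N_2$ = number of values of the inner loop,

Number of paths = $\sum_{i=1}^{N_1} N_2^{i}$

Example: 2 + 2*2 + 2*2*2 = 2 + 4 + 8 = 14 for our sample loops ($N_1 = 3$, $N_2 = 2$).

For the actual program ($N_1 = 24$, $N_2 = 6$) the number of paths is therefore $\sum_{i=1}^{24} 6^{i} = \frac{6(6^{24} - 1)}{5} \approx 5.7 \times 10^{18}$.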
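A brute-force count is a quick way to check the general rule against the small example. The sketch below (Python; the function names are hypothetical, not part of the exam) enumerates every combination of inner-loop exit points for 1, 2, ..., N1 passes of the outer loop, assuming, as the question does, that on each pass the inner loop can exit at any of its N2 values independently. Enumeration is only feasible for tiny loops, so the 24 x 6 case is computed from the formula.

```python
from itertools import product


def nested_paths_brute_force(n_outer: int, n_inner: int) -> int:
    """Count paths through two nested loops by listing them.

    The outer loop may exit after 1..n_outer passes; on each pass the inner
    loop independently exits at one of n_inner index values.
    """
    count = 0
    for passes in range(1, n_outer + 1):
        # One inner-loop exit choice per outer pass.
        count += sum(1 for _ in product(range(n_inner), repeat=passes))
    return count


def nested_paths_formula(n_outer: int, n_inner: int) -> int:
    """Sum of n_inner ** i for i = 1..n_outer."""
    return sum(n_inner ** i for i in range(1, n_outer + 1))


print(nested_paths_brute_force(3, 2))  # 14, matching 2 + 4 + 8
print(nested_paths_formula(3, 2))      # 14
print(nested_paths_formula(24, 6))     # 5686057605985940274
```

Python integers are unbounded, so the 24-term sum is exact; it equals the closed form (6**25 - 6) // 5.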
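
b) Analysis if the loops are in series

  • Each of the 24 ways of leaving the first loop can be followed by any of the 6 ways of leaving the second loop, so the total is 24 x 6 = 144 paths.

c) Domain testing (ignore the possibility of invalid values of the loop index)

  • First variable: -3, 0, 20

  • Second variable: 10, 20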
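For loops in series the count is a plain product, and the domain-testing picks are the endpoints of each index range. The sketch below checks both. Treating 0 as an extra "special value" for the first variable is an assumption about why the notes list it, and the helper name is hypothetical.

```python
from itertools import product

first_index = list(range(-3, 21))      # -3, -2, ..., 20: 24 values
second_index = list(range(10, 21, 2))  # 10, 12, ..., 20: 6 values

# In series, a path is fixed by where loop 1 exits and where loop 2 exits.
series_paths = len(list(product(first_index, second_index)))


def domain_tests(values: list[int]) -> list[int]:
    """Boundary picks for an index range: its minimum and maximum, plus 0
    as a special value when the range crosses zero (an assumption)."""
    lo, hi = min(values), max(values)
    return [lo, 0, hi] if lo < 0 < hi else [lo, hi]


print(len(first_index), len(second_index), series_paths)  # 24 6 144
print(domain_tests(first_index))   # [-3, 0, 20]
print(domain_tests(second_index))  # [10, 20]
```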

[Note to the reader: Students who handled the nested analysis well handled everything else well. In contrast, some students who correctly counted the variables’ values, figured out the series, and seemed to handle the material reasonably comfortably, blew the nested analysis.

  • If the nested analysis is done well, it probably deserves more than 4 points out of 10 -- but the student doesn’t need the points.

  • If the nested analysis is not done well, the 4 point allowance serves as a cap on the damage this part of the question can do to the grade for the full question.]

Grading chart (N = highlighting what has not been tested; M = measuring what has been tested):

  • Benefit N: 1 each

  • Risk N: 1 each

  • Benefit M: 1 each

  • Risk M: 1 each

Distinguish between using code coverage to highlight what has not been tested from using code coverage to measure what has been tested. Describe some benefits and some risks of each type of use. (In total, across the two uses, describe three benefits and three risks.)

Grading notes:

Emphasis on what has not been tested: you are looking for such things as blind spots in the testing, or a reality check on the process or on the projected ship date. There is no necessary claim that this is a valid progress measure. You are merely identifying a set of tasks that have not been done.

Emphasis on measurement: the assumption is that coverage is a valid measure of the testing process. You are looking for a status check, the productivity of a staff member or group, or nearness to completion.
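Both uses can come from the same coverage data. In this sketch (Python, with hypothetical statement IDs), the sorted list of unexecuted statements is the "what has not been tested" view, a to-do list of possible blind spots; the percentage is the "what has been tested" view, the number that invites use as a progress measure. The percentage says nothing about work the tool does not count, such as configuration tests.

```python
def coverage_views(all_statements: set[str], executed: set[str]) -> tuple[list[str], float]:
    """Return (statements never executed, percent of statements executed)."""
    not_tested = sorted(all_statements - executed)  # highlights gaps
    percent = 100 * len(all_statements & executed) / len(all_statements)  # measures
    return not_tested, percent


# Hypothetical statement IDs ("function:line") for a tiny program.
statements = {"parse:1", "parse:2", "save:1", "save:2", "print:1"}
ran = {"parse:1", "parse:2", "save:1", "save:2"}

gaps, pct = coverage_views(statements, ran)
print(gaps)  # ['print:1']
print(pct)   # 80.0
```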

Benefits and risks of highlighting the negative:

Benefits:

      • reveal problems with the testing process

      • reveal weaknesses or blind spots of the testing strategy

      • reveal the overall utility of a collection of testing artifacts (no point maintaining a large test suite that achieves only 2% coverage)

      • reveal that a projected ship date cannot be met

Risks:

      • can create a blaming tone

      • might persuade managers to rely on this, in a way that encourages them to use coverage as a measurement of progress later.

Benefits and risks of measurement:

Benefits:

  • for exam purposes, I will accept the notion that we can check nearness to completion of testing with this measure

  • can note progress against a plan

  • can report results in a way managers are used to hearing

  • one factor in a ship decision

Risks:

  • encourages people to do things that are counted rather than things that are more likely to reveal problems

  • discourages people from running tests (e.g., configuration tests) that are not counted

  • gives management a false perception of progress because it omits key tests that are not counted

  • encourages premature release of the product

  • discriminates against testers who run tests that are “redundant” under this measure

[References: Marick’s writings on coverage, such as Classic Testing Mistakes and How to Misuse Code Coverage.]

7. Month field (grading chart)

  • Boundary chart *

  • Realize interdependence

  • 28-day month

  • 29-day month (leap year)

  • 30-day month

  • 31-day month

  • Month range

  • Year range

  • Invalid pairings

  • Invalid max / min

  • Invalid chars (shotgun penalty -- they should not appear)

  • Tests of full field (all 3 values in 1 test)

  • Overall analysis

Total: / 24

Grade: / 10

* = discretionary boost of up to one point allowed for “Clue”, should be rarely given.

Imagine testing a date field. The field is of the form MM/DD/YYYY (two-digit month, two-digit day, four-digit year). Do an equivalence class analysis and identify the boundary tests that you would run in order to test the field. (Don’t bother with non-numeric values for these fields.)

Grading notes--

  • "overall analysis"--refers to the discussion and presentation of the analysis.

  • A boundary chart is not compulsory, but some organized presentation of the material is. These 2 points are for the presentation of the answer.

  • Interdependence: the valid days will differ depending on which month and whether we are in leap year or not.

  • Invalid pairings: there must be tests of invalid combinations, such as Feb 29 in a non-leap-year or June 31. [If the student shows no thinking about invalid combinations, deduct additional points from “overall analysis”]

  • There’s a 10% grading penalty for wasting space on non-numeric values. These don’t belong in the answer (read the question), so the student is writing a shotgun answer. I don’t always have the opportunity to penalize for defocused shotguns, but this is such an obvious situation that I am glad to take advantage of it. It lets me make a point about sticking to the call of the question when I review mid-term test results.

Equivalence classes

  • Valid days: there are 4 equivalence classes for days.

  • Each is 1-x, where x = 28, 29, 30, or 31 depending on the month and whether it is a leap year:

    1. 28-day month

    2. 29-day month (leap year)

    3. 30-day month

    4. 31-day month

  • Valid months: 1-12

  • Years: 0-9999 or some other plausible range

List tests

(a) {28 day month} {0,1,28,29} {0,2000,9999,10000, leap year}

(b) {29 day month} {0,1,29,30} {0,2000,9999,10000, leap year}

(c) {30 day month} {0,1,30,31} {0,2000,9999,10000, leap year}

(d) {31 day month} {0,1,31,32} {0,2000,9999,10000, leap year}

In other words, for tests of type (a), pick a 28 day month (February) and test with one of the numbers in the set {0, 1, 28, 29} and with one of the numbers in the set {0,2000,9999,10000, any leap year}.
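A small validator makes the equivalence classes and the invalid pairings concrete. This is a sketch under stated assumptions, not the expected exam answer: years 0-9999 are treated as valid (one of the "plausible ranges" above), leap years follow the Gregorian rule, and the field has already been split into three integers. Year 10000 cannot actually be typed into a four-digit field, so at the string level it becomes a length test. The representative month and year chosen for each class are hypothetical picks.

```python
def is_leap(year: int) -> bool:
    # Gregorian rule: every 4th year, except centuries not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_month(month: int, year: int) -> int:
    if month == 2:
        return 29 if is_leap(year) else 28
    if month in (4, 6, 9, 11):
        return 30
    return 31


def is_valid(month: int, day: int, year: int) -> bool:
    # Assumed valid ranges: month 1-12, year 0-9999, day 1..days_in_month.
    if not (1 <= month <= 12 and 0 <= year <= 9999):
        return False
    return 1 <= day <= days_in_month(month, year)


# One representative (month, year) per day-count class (hypothetical picks).
CLASSES = {
    "28-day month": (2, 2001),
    "29-day month (leap year)": (2, 2000),
    "30-day month": (6, 2001),
    "31-day month": (1, 2001),
}


def day_boundary_tests():
    """For each class, test days 0, 1, x and x + 1, where x is the last valid day."""
    tests = []
    for name, (month, year) in CLASSES.items():
        last = days_in_month(month, year)
        for day in (0, 1, last, last + 1):
            text = f"{month:02d}/{day:02d}/{year:04d}"
            tests.append((name, text, is_valid(month, day, year)))
    return tests


for name, text, ok in day_boundary_tests():
    print(f"{name:26} {text}  {'valid' if ok else 'invalid'}")
```

Invalid pairings such as 02/29 in 1900 or 06/31 fall out of `days_in_month`, which is exactly the interdependence the grading notes ask students to recognize.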

Minefield question

8. In lecture, I used a minefield analogy to argue that variable tests are better than repeated tests. Provide five counter-examples: contexts in which we are at least as well off reusing the same old tests.

(I’m crediting repeated tests across different configurations, even though the thing that varies here is the configuration; in this sense it is a varying test.)

The following quotes are from a discussion on the Software-Testing mailing list. They were not read in class, but they provide arguments that two senior members of the field considered appropriate. Students should get credit for any of these.

Examples from James Bach:

“You might rationally repeat tests...

“1. if there is a substantially greater probability of a problem happening in an area that is exercised by the tests, compared to other areas. The distribution of problems across a product space is not necessarily uniform.

“2. if any problem that could be discovered by those tests is likely to have substantially more importance than problems in other areas. The distribution of the importance of product behavior is not necessarily uniform.

“3. if they have *some* value and are sufficiently inexpensive compared to the cost of new and different tests. New tests may still be vitally important for the test effort, however.

“4. if the tests you repeat represent the only tests that seem worth doing. This is the virus scanner argument: maybe a repeated virus scan is okay, instead of constantly changing virus tests. However, sometimes we introduce variation because we aren't sure what tests truly are worth doing.

“5. if variation of tests is specifically prohibited by contract or regulation. In other words, the point of your testing may not be to find [...]


“6. if the repeated tests comprise a performance standard that gets its value by comparison with previous executions of the same exact tests. When historical test data is used as an oracle, then you must take care that the tests you perform are comparable to the historical data. Holding tests constant may not be the only way to make results comparable, but it can be the best choice available.

“7. when the discovery of bugs is probabilistic, perhaps due to important variables involved that you can't control in your tests. Performing a test that is, to you, exactly the same as a test you've performed before, may result in discovery of a bug that was always there but not revealed until the uncontrolled variables line up in a certain way. This is the same reason that a gambler at a slot machine has for playing again after losing the first time.”
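Bach's seventh point can be put in numbers under a strong simplifying assumption: if an uncontrolled variable makes a bug show up on any one run with a fixed probability p, and runs are independent, then n identical runs reveal it at least once with probability 1 - (1 - p)^n. The sketch is illustrative only; real intermittent failures are rarely independent or stationary, and the function name is hypothetical.

```python
def detection_probability(p: float, runs: int) -> float:
    """Chance that at least one of `runs` identical executions exposes a bug
    that appears on any single run with probability p (runs assumed independent)."""
    return 1 - (1 - p) ** runs


for runs in (1, 5, 20):
    print(runs, round(detection_probability(0.05, runs), 3))
```

With p = 0.05, a single run almost always misses the bug, while twenty repetitions of the "same" test catch it roughly two times in three.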

From Ross Collard, supplementing Bach:

1. Regression Testing is Insurance

James Bach's original posting (attached) discussed whether it is better to vary or repeat the same test cases. It was not directly about regression testing, but regression testing is related and I would like to address the implications of his posting for regression testing.

Many people think that regression testing is over-rated, and the minefield analogy could be used to help make the case for this position. One implication of the minefield is: "Don't re-test the same conditions."

I don't agree that regression testing is over-rated. Large organizations like Microsoft and Cisco mindlessly re-run the same regression test cases night after night, often at the rate of tens of thousands of test cases per night.

Almost all of these test cases almost always pass -- usually well over 99% pass with reasonably stable and mature systems. The biggest category of regression test case failures is generally the repeats -- the ones we already know about, because they failed before for minor reasons and we are being leisurely in getting around to fixing them. So these test case failures do not provide any new information.

Places like Cisco have made their automated regression testing fast enough and cheap enough that even if the pass rate is 100%, they have not paid too much to gain a sense of confidence. (A caution -- many observers would say that this is a dangerous and false sense of confidence if the regression testing is not highly competent.)

In the movie "Groundhog Day", Bill Murray wakes up each morning, only to have to re-live the same day over and over. Bill could have been a regression tester, because the heart of regression testing is repetition; re-running the same test cases from version to version of a system. As Yogi Berra said: "It's deja vu all over again."

2. Regression Testing is Distinct from Modification Testing

It helps to ensure we are using words in the same way here.

Localized change testing or modification testing addresses what has changed. This testing is narrow in focus, based on the change requests, problem reports, programmers' impact assessments, before / after comparisons (the diff or delta files) from source code control tools, or other sources of information.

3. Regional Impact Testing

After the specific change has been checked, i.e., the system behavior conforms to the expected new behavior described in the change request, or the problem as reported in the problem report has been fixed, some people perform what they call regional impact testing. (Note -- there is usually more than one change request or problem report included in a new system build. Localized testing is done for each one of them.)

Regional impact testing goes beyond the localized change and seeks to test in the perceived high-impact region around the change. This requires that the testers have a reasonable chance of identifying the high-impact region, which usually requires a gray box view of the system architecture -- what connects to what internally.

As an example of regional impact testing, consider two system features which externally appear to be unrelated. If one feature is changed, there is no particular reason to re-test the other. Internally, though, let's say these features are data-coupled, i.e., share some common data. This means that the features interact through the shared data and could interfere with each other. The first feature could run first, corrupt the shared data, and cause problems for the other feature.

Regression testing, at least in this view, is different and follows the localized modification testing and regional impact testing. In other words, the modification itself has already been checked prior to the regression testing.

4. The Mines Move

A significant observation, in my opinion, was stated by Kamesh Pemmaraju in his posting.

He made the point that the minefield analogy is inexact (like all analogies). The locations of the hidden mines are not static. Every night after we have cleared (or more likely, partly cleared) the minefield the enemy is back in there seeding the field with new mines. (With apologies to software engineers -- we know (hope?) their primary objective is not to seed mines.)

As Kamesh said: "In a dynamic environment, new mines are planted and old mines (that were cleared earlier) re-appear and these active mines may now occur in the paths that were traversed before."

5. Changes Often Introduce Bugs

We have ample evidence that changes can introduce bugs, and these bugs are scattered in patterns that do not respect the parts of the system we have already tested. In other words, it is not hard to inadvertently break something which previously was working.

According to Watts Humphrey (I think) of the SEI, the probability of a software engineer inadvertently introducing a defect with a modification is 20% to 50%. To be fair, most of these new defects are trivial, and about two-thirds of them are seen and fixed / removed immediately by the software engineer before they are seen by the system testers.

Capers Jones of Software Productivity Research estimates that for every hundred Y2K fixes, seven new defects were introduced. Y2K fixes, while numerous, were considered very straightforward and low risk, perhaps 1 or 2 on a scale from 1 to 10 of the difficulty of software fixes.

IBM has reported that 9% of all modifications to its MVS mainframe operating system introduce new defects -- and that is just the ones they know about.

For an Alcatel subsidiary which makes high-speed backbone switches for the Internet, the number of modifications which introduce new bugs is over 20%. (I don't have permission to reveal the subsidiary's name.)

We also have the notorious example of DSC Communications, where an inadvertent one-character bug in a three-line-of-code change to an existing two million LOC system came close to putting the company out of business. They did not catch the side effect (the inserted bug) because they did not adequately regression test. Or at least, that was a question raised in a congressional inquiry into the damages the side effect caused.

6. Variations and Equivalence

I do not necessarily disagree with James Bach's heuristic: "It's better to vary tests than to repeat the same tests." As he said, the reason for raising the issue is to help ensure we think through the advantages and disadvantages of varying test cases in a particular situation.

Several people in prior postings pointed out that, because the number of test cases run is typically only a small sample of all possible conditions, it is better to vary the sample on re-runs.

However, the whole idea of equivalence classes (sets of test cases grouped together based on commonality vs. variances, where if the system works for one test case it likely works for all in the set) reduces the importance of varying the test cases.

In theory, this "if it works for one, it works for them all" claim means that there is no point to variations within an equivalence class.

Of course, since the definitions of equivalence classes always involve some assumptions and uncertainties, and our equivalence class groupings are usually imperfect, there is still significant value in varying the test cases.

So while it is worthwhile to decide whether and how to vary the test cases, I do not see these as the most important issues.

The more important question, in my opinion, is whether to re-test, reasonably extensively, existing features and characteristics that should not have been affected by a change -- in other words, whether to regression test at all, regardless of how much the test cases are varied or held constant.

Many, perhaps most, organizations make a change, then test the change itself and a little bit around the change, and do little or no regression testing.

7. Partial Regression Testing

Many test groups do not run a full regression test on every build or version of a system, because it is prohibitively expensive and time-consuming. Ideally, the regression testing would be so fast and cheap that the testers can mindlessly re-run all the test cases, but this is usually not the situation.

The idea in partial regression testing is either (a) to draw a boundary around a change to a system, and to test only within that boundary, or (b) to take a subset of an entire regression test case library, based on intelligent selection criteria for the situation, and re-run only this subset.

The assumption behind the first idea is that the system is decoupled -- any change introduced within the boundary cannot adversely affect anything outside the boundary, i.e., in the untouched remainder of the system.

Some purists think that "partial regression" is a contradiction in terms: a regression test means a complete re-test and thus cannot be partial. Despite this quibble, the concept of a partial regression test can help to determine the appropriate limits for a regression test.

8. Re-Test Coverage Guidelines

To what extent should existing features, which should not have themselves been changed, be re-tested after a modification?

Coverage guidelines embody a strategy for determining how much regression testing to do, based on our best guess at the trade-offs of costs, benefits and risks in a particular testing situation.

To implement these guidelines, each test case in the regression test case library has to be categorized and tagged by its category (a test case can belong to multiple categories). Examples of categories include (a) heavily used features and (b) unusually complex uses of a feature.

For a low-to-moderate risk, low-to-moderate complexity system, fairly typical guidelines for the recommended degree of re-testing after a change (i.e., not including the testing of the change itself), organized by category of regression test cases, are as follows:

Regression test category: degree of coverage (*)

1. Smoke test: 100%
2. Test cases which have failed in the past
   a. Critical errors: 100%
   b. Moderate or minor errors: 10% to 20%
3. Test cases for basic functionality
   a. Area(s) most impacted by changes in this release: 25% to 50%
   b. Positive remaining test cases: 5% to 10%
   c. Negative (robustness) remaining test cases: 10% to 20%
4. Test cases for complex features: 25% to 50%
5. Test cases for frequently or heavily used features: 25% to 50%
6. Test cases for business-critical features: 50% to 100%
7. Bad fix test cases: 100%

[(*) Percentage of all regression test cases in each category which are re-run.]

The overall percentage coverage (percentage of all the test cases in the test case library which are included in this particular regression test run) might be 25%-40%, as a weighted average of the categories listed above.
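The weighted average can be made concrete with a toy library. Every count below is invented for illustration, and each re-run percentage is simply one value picked from the corresponding range in the guideline table. The sketch also assumes each test case sits in exactly one category, whereas the text allows a test case to belong to several (a real tally would de-duplicate). Integer ceiling division avoids floating-point rounding surprises.

```python
# Hypothetical regression library: (category, number of test cases, % to re-run).
# Counts are invented; percentages are picked from the guideline ranges.
LIBRARY = [
    ("1. Smoke test", 50, 100),
    ("2a. Failed before: critical", 40, 100),
    ("2b. Failed before: moderate or minor", 100, 15),
    ("3a. Basic: area most impacted by changes", 200, 40),
    ("3b. Basic: positive remaining", 400, 10),
    ("3c. Basic: negative remaining", 200, 15),
    ("4. Complex features", 100, 40),
    ("5. Heavily used features", 100, 40),
    ("6. Business-critical features", 80, 75),
    ("7. Bad fix", 30, 100),
]


def regression_run(library):
    """Return (test cases selected, library size, overall % re-run).

    Assumes every test case belongs to exactly one category.
    """
    selected = total = 0
    for _category, count, percent in library:
        selected += -(-count * percent // 100)  # ceiling of count * percent / 100
        total += count
    return selected, total, 100 * selected / total


selected, total, overall = regression_run(LIBRARY)
print(f"{selected} of {total} test cases re-run ({overall:.1f}%)")
```

With these made-up numbers the run selects 425 of 1,300 test cases, about 33%, which lands inside the 25%-40% band the text suggests.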

9. Mutations

Douglas Hoffman discussed automated test case mutations in his STAR presentation.

Earlier, I mentioned extracting subsets of test cases from an existing (and probably large) library of regression test cases. These test cases do not have to physically all exist -- there could be a test data generator available to generate variations on demand. So for partial regression testing the test cases effectively can be extracted from a virtual set.

(Mutation analysis, which seems to have fallen into disuse in its initial use, originally generated mutations for a different purpose - to examine the efficacy of test cases.)

10. Repeatability vs. Rotation

If only a subset of the test cases in a particular category within the regression test case library are going to be selected for re-testing, there is the option of selecting and using the exact same test cases from cycle to cycle of regression testing, or rotating the subset of test cases run in each cycle.

Let's assume that of all the existing positive test cases for basic functionality of a low-risk, low-complexity system, the target coverage has been set to 10%. In other words, if there is a total of 150 test cases in this category in the test case library, 15 of them will be included in this regression test.

The question is: on subsequent re-testing of future versions of the system, should the same subset of let's say 15 out of 150 test cases from a given category be re-used, or should the membership of this subset be rotated (varied) from test cycle to test cycle?

This is simply a re-statement of James Bach's original question.

There are two schools of thought on this. One opinion is that the randomly selected test cases should not be rotated but remain constant, because this way there is before-and-after comparability of the same test results from test cycle to test cycle.

With the first option, 15 test cases are selected and run in every regression cycle. The remaining 135 test cases are never utilized, at least not in this series of regression test cycles.

This first option, to re-run the exact same test cases from cycle to cycle, has the virtue of repeatability. The selected test cases are run routinely in every cycle, so that we can compare the results of these same test cases across all the cycles of regression testing.

The same test cases should always produce the same results over time, unless there is a deliberate reason that we should expect a change in results. The discrepancy in results, if it occurs, is what we are interested in. (Comparator tools are cheap and simple, but great for this mindless re-checking.)

As a few people such as Rex Black mentioned, this repetition is sometimes contractually required, especially in areas regulated by U.S. government agencies such as the DOD, FDA, FAA and NRC.

In 5 cycles of regression testing, however, the total coverage does not exceed 10% because the same test cases are always being run.

The other school of opinion is that the members of the subset of test cases taken from each category should be rotated from test cycle to test cycle, in order to provide a broader test coverage over time.

The second option, to rotate the subset of test cases within the same category from test cycle to test cycle, has the virtue of providing broader coverage -- a wider range of the test cases within the category are executed over a series of regression test cycles. However, we lose 100% repeatability -- not all the selected test cases are run in every regression cycle.

With this second option, a different subset of 15 test cases out of the total of 150 is selected and run in each regression cycle. Over a duration of 5 test cycles, a total of 75 different test cases are executed, for a total coverage of 50%, but none of them are repeated.
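The arithmetic of the two options can be checked with a short simulation. This sketch uses round-robin rotation (each cycle takes the next block of 15) as a stand-in for the random rotation described in the text, because round-robin guarantees no overlap; the function name is hypothetical.

```python
def cumulative_coverage(total: int, per_cycle: int, cycles: int, rotate: bool) -> float:
    """Fraction of a category's test cases run at least once over `cycles` regression cycles.

    rotate=False re-runs the same subset every cycle; rotate=True moves to the next
    block of test cases each cycle (round-robin, standing in for random rotation).
    """
    executed: set[int] = set()
    for cycle in range(cycles):
        start = cycle * per_cycle if rotate else 0
        executed.update((start + i) % total for i in range(per_cycle))
    return len(executed) / total


print(cumulative_coverage(150, 15, 5, rotate=False))  # 0.1
print(cumulative_coverage(150, 15, 5, rotate=True))   # 0.5
```

Random rotation would usually land somewhat below 50% after 5 cycles, since independently drawn subsets can overlap.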

11. The Cost of Generating Variations of Outcomes

Deliberate and random or periodic variations are fairly easy to build into regression test cases and can be triggered automatically, and generating variations of test cases can be automated too.

Now I want to be a hypocrite and disagree with myself.

It is easy to generate variations of the input data values and initial conditions.

It is frequently much harder to generate the correct variations of the expected outcomes of these test cases.

Imagine, for example, we have to test a system which computes taxes for the U.S. Internal Revenue Service from input tax returns.

Producing the input variations is dead easy -- we can use a test data generator to bury us in test cases within minutes.

But trying to determine whether we got the right results (the computed amounts of tax due) is the killer. For this we would need an oracle, which in the general case would have to incorporate all the U.S. tax code -- the oracle would be bigger than the system we are trying to test.

12. Parallel or Volume Testing

In some ways our great-grandparents who tested mainframe systems were way ahead of the game. They used unplanned, uncontrolled variations as part of a popular mainframe testing technique called parallel testing.

In a parallel test, which is also called a volume test, a large volume of test cases are "pumped through" the system or the feature being tested, in a before-and-after comparison of the new system version with the prior version. The term back-to-back testing is also used, to describe the situation where the same set of test cases is executed with two versions of the same system. Apart from what is expected to have changed from version to version, everything else should be the same in the new version as in the old one.

Unlike a planned, detailed test, the parallel test cases are usually deliberately neither pre-defined individually nor pre-filtered. The test cases are usually extracted in bulk, wholesale, from a passing (large) stream of live data. This live data stream is continuously changing. After the extracted test data is used in one before-and-after comparison of two system versions, it is typically thrown away and a fresh (and different) extract is used the next time.

A pre-defined set of test cases may reflect biases and contain gaps. The hope of parallel testing is that, with a sufficiently large volume of these test transactions, all significant conditions and combinations of conditions will be checked.

This idea is a little like throwing mud at a politician: with sufficient volume, some must stick. In the words of Lenin: "Quantity has a quality of its own." (He was commenting on the use of tanks in ground warfare.)

Unfortunately, parallel testing carries its own significant dangers, the biggest of which is the hidden carry-forward of pre-existing bugs. The before and after versions of the system contain the same hidden bug and so both behave the same way. New England Tel (now part of Verizon) faced a huge liability and litigation for years of small over-billings which added up to large sums, based on a hidden bug which passed years' worth of parallel testing.

To be fair, the hidden carry-forward of pre-existing bugs is a danger which can occur with any type of before-and-after results comparison, including regression testing where the expected results have not been independently computed.
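The carry-forward danger is easy to reproduce in miniature. In this hypothetical billing sketch (not the New England Tel code, whose details the text does not give), both versions round every fraction of a cent up, while an independently computed expectation, assumed here to specify round-half-up, does not. The back-to-back comparison reports no differences; only the independent oracle exposes the over-billing.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_UP

CENT = Decimal("0.01")


def charge_old(minutes: int, rate: Decimal) -> Decimal:
    # Hypothetical prior version: always rounds a fraction of a cent UP (hidden bug).
    return (minutes * rate).quantize(CENT, rounding=ROUND_UP)


def charge_new(minutes: int, rate: Decimal) -> Decimal:
    # Hypothetical new version: code was restructured, but the bug was carried forward.
    amount = rate * minutes
    return amount.quantize(CENT, rounding=ROUND_UP)


def charge_expected(minutes: int, rate: Decimal) -> Decimal:
    # Independently computed expected result; the spec is assumed to say round half up.
    return (minutes * rate).quantize(CENT, rounding=ROUND_HALF_UP)


def mismatches(cases, actual, reference):
    """Inputs on which `actual` and `reference` disagree."""
    return [case for case in cases if actual(*case) != reference(*case)]


cases = [(minutes, Decimal("0.143")) for minutes in range(1, 11)]
print("back-to-back differences:", mismatches(cases, charge_new, charge_old))
print("oracle differences:", [m for m, _ in mismatches(cases, charge_new, charge_expected)])
```

The back-to-back list comes out empty, while the oracle flags the calls of 1, 4, 7 and 8 minutes, each over-billed by one cent.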

13. Varying the Number of Test Cases per Cycle

For most test projects, it is unrealistic to assume that exactly the same number of test cases will be executed on each build.

As the trust in each subsequent build increases, the testers may be willing to decrease the number of regression test cases for each build. More likely, not all features will be ready to test in the earlier test cycles, so that not all test cases could be run even if we wanted to. Not all the test cases are likely to be ready at the beginning, even if all the features are available to test. A full regression test may deliberately not be run on each build, either because of the frequent turnaround (the quick replacement of the build by the next new version does not allow enough time), or because the regression testing is too expensive. And some builds may be skipped, as the testers may choose to wait for a later build instead of taking the build which is immediately available.

In addition, the testers usually gain more confidence from build to build, as (a) the testers' understanding of what they have increases, and (b) the versions become progressively cleaner.

So far, my discussion has assumed that the number of test cases to re-run is fixed from build to build. This variation in the number of test cases run from cycle to cycle may influence the mix between repeatability vs. rotation.

As the system under test stabilizes / matures, some regression test groups increase the proportion of test cases which are repeated (fixed), but I have not figured out as yet whether this is a good idea or not.

14. Determining the Best Re-Test Strategy

So do we repeat, rotate or use some mix of the two?

I have not done any kind of survey, but in my experience most regression test organizations repeat anywhere from 75% to 100% of the same test cases, and rotate 0% to 25%.

I suspect the amount of rotation (variation) has generally been way too low.

We should repeat test cases only in areas where:

(a) The probabilities of new breakages appear to be high. For example, areas where prior failure rates have been high are good candidates for intense re-testing.

(b) The costs of failure are so high we are not willing to take any chances.

(c) We want comparable results from test cycle to test cycle.

(d) The cost of variations (e.g., independently computing the new expected outcomes) is too high to be effective.

Ross Collard

List three different missions for a test group. How would your testing strategy differ across the three missions?

Grading Notes:

A plausible relationship between the mission and the strategy / activity is sufficient for full credit, but the strategy / activity description has to go beyond a statement of fulfilling the mission. For example, if the mission is “Find defects”, the testing strategy has to say something more than “find defects” such as concentrating on stress tests, concentrating on high risk areas of the product, etc.

Varying definitions of test groups (from course slides)

  • Find defects

  • Maximize bug count

  • Block premature product releases

  • Help managers make ship / no-ship decisions

  • Assess quality

  • Minimize technical support costs

  • Conform to regulations

  • Minimize safety-related lawsuit risk

  • Assess conformance to specification

  • Find safe scenarios for use of the product (find ways to get it to work, in spite of the bugs)

  • Verify correctness of the product

  • Assure quality

“Clue” is discretionary; the typical value is 0.5. If the mission strategies are weak but I give them full credit anyway, I won’t award the clue point (in effect, it was already awarded).

Goodness of tests

List and describe four different dimensions (different “goodnesses”) of “goodness of tests”.

Grading Notes:

1 point for naming each item and 1.5 points for its description

List from course:

  • More powerful

  • More credible

  • Provides better support for troubleshooting

  • Representative of a broader group of tests

  • Is representative of events more likely to be encountered by the customer

  • Is more likely to help the tester or developer develop an insight into the program

  • Is easier to automate, easier to evaluate, more feasible, lower opportunity cost
