
Assessment in the Software Testing Course

Cem Kaner

Workshop on the Teaching of Software Testing

February 2003




Abstract


This report collects my notes on the assessment issues in my software testing course. These notes will serve as the raw data archive for a summary paper to be submitted for traditional publication. The notes include a description of my approach to examinations, sample exam questions, an exam study guide, and examples of my grading scheme for exams. The notes also include sample assignments (with grading notes).

Contents


1. Background: The Software Testing Course

2. Assessment Methods

3. Exams

4. Assignments

5. Bonus Assignments

6. Quizzes

7. Closing Notes, Including Plans for Change

Appendix A: Pool of questions given to students for exam preparation.

Appendix B: Study Guide. (How to study for this type of test. How to answer essay questions.)

Appendix C: Grading analysis for several exam questions.

Appendix D: Sample assignments and grading notes.


1. Background: The Software Testing Course


This report collects my notes on the assessment issues involved in my software testing course. The notes are still somewhat rough. I’ll probably edit them one more time before writing a summary paper for publication. These notes will serve as the raw data archive for that summary paper.

The course itself has been in evolution since 1987, when I began “Tester College” while managing the testing group in Electronic Arts’ Creativity Division. Hung Quoc Nguyen and I then developed a software testing course for the Silicon Valley chapter of the American Society for Quality in 1994. I have taught that course to working professionals frequently since then, in public courses offered by UC Berkeley Extension and UC Santa Cruz Extension, Software Quality Engineering, Software Test Labs, logiGear, and Satisfice, and in onsite classes at large and small software companies (such as Microsoft, Hewlett-Packard, Intel, Quarterdeck, Compaq, PostalSoft, PowerQuest, Symantec, Rational, Kodak, Gilbarco, Aveo, BMC, IDTS, Tideworks, Wind River, Cigital, and many others). A recent version of my commercial course notes is available at http://www.testingeducation.org/coursenotes/kaner_cem/cm_200204_blackboxtesting.

I modified the course for academic use in 2000 and have taught the academic course at Florida Tech five times, modifying it (especially in the assessment) each time. A version of my academic course notes is available at http://www.testingeducation.org/coursenotes/kaner_cem/ac_200108_blackboxtesting.

The academic course focuses on black box software testing. I teach a second course in glass box testing that takes the black box course as a prerequisite.

The black box course has five core topic areas, which I prioritize as follows:

  • Paradigms of software testing: a look at 9 dominant styles of black box testing. Students apply several of these to a sample application, such as StarOffice.

  • Bug advocacy: effective replication, analysis, and reporting of bugs.

  • Test documentation: examples of test documentation components and an overview of requirements analysis to determine what is needed in what context.

  • Additional test design issues: The primary examples are an overview of GUI-level regression testing (and the design of those tests for maintainability) and all-pairs combination testing; a sketch of an all-pairs generator appears after this list.

  • Process and organizational issues: We look primarily at the structure and missions of typical software testing groups and the implications for testing of different software lifecycle models. This material is presented primarily to provide context for those students who have no industrial experience, and to provide exposure to alternative contexts for students who have worked only in one or two companies.
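
To make the all-pairs item concrete: all-pairs testing covers every pair of values across every pair of parameters, which usually takes far fewer test cases than testing all combinations. The greedy generator below is a minimal sketch of my own (the course does not prescribe any particular implementation, and the parameter names are invented for illustration):

```python
from itertools import combinations, product

def allpairs(params):
    """Greedy all-pairs test generation (illustrative, not optimized).

    params: dict mapping parameter name -> list of possible values.
    Returns test cases (dicts) that together cover every pair of
    values across every pair of parameters.
    """
    names = list(params)
    # Every value pair that must appear in at least one test case.
    uncovered = {((a, va), (b, vb))
                 for a, b in combinations(names, 2)
                 for va in params[a] for vb in params[b]}
    tests = []
    while uncovered:
        # Pick the full combination that covers the most uncovered pairs.
        best, best_pairs = None, set()
        for values in product(*(params[n] for n in names)):
            case = dict(zip(names, values))
            covered = {p for p in uncovered
                       if all(case[n] == v for n, v in p)}
            if len(covered) > len(best_pairs):
                best, best_pairs = case, covered
        tests.append(best)
        uncovered -= best_pairs
    return tests

# Hypothetical example: 3 x 3 x 2 = 18 full combinations.
cases = allpairs({"os": ["win", "mac", "linux"],
                  "printer": ["laser", "inkjet", "pdf"],
                  "format": ["a4", "letter"]})
print(len(cases))  # typically 9 or 10 cases instead of 18
```

Greedy selection is not optimal, but it shows why all-pairs suites stay small: the 18 full combinations in this example collapse to roughly 9 or 10 cases.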

Several other topics come and go in the class, depending on student interest, applicability to the sample application that we are testing, and various other factors. These have included state-model-based testing, software test metrics, high volume test automation, status reporting, project planning, quality/cost analysis, failure modes and effects analysis, and finding a job in software testing.

This report focuses on assessment issues, and so I will not further discuss the choice of topics here.

Throughout the course, we apply what we learn to a sample application. So far, we’ve used the TI Interactive Calculator and the word processing and the presentation modules of OpenOffice. Another senior member of our faculty (James Whittaker) also uses sample applications in his testing courses, primarily Microsoft products under development.

I recommend working with the open source products for several reasons:

  • The students’ bug reports are publicly available. This can help them at job interview time (they can point to records of their actual work product). It also encourages them to take the reporting task seriously.

  • Students can see the progress of their bugs through the bug reporting system, watching comments develop on their bug reports as project members report fixes, complain of non-reproducibility, ask for more information, and so on. They get to participate in a series of real-project bug discussions, gaining insight and experience that will be directly applicable on the job.

  • The students’ work is valued and they get personal feedback. This is not true of all open source projects, but an instructor can get a sense of the feedback style on the project by examining the reports already in the bug tracking database before selecting an application.

2. Assessment Methods


Assessment is the course’s primary educational tool.

I give lectures and students (academic and commercial) generally like them, but lectures can only transmit so much information, and students forget them anyway.

I use the lectures to provide a structure for the material and to provide real-life examples, compelling or entertaining stories that will help students understand how or why a technique was used in practice, what the effects of different life cycle models can be, and so on. The lectures create contexts for the material.

During the lectures, I also run several discussions focused on hypotheticals or thought experiments. These are effective learning tools for some students.

My expectation is that most students will do most of their learning while doing homework, assignments, and studying for, writing, and reviewing the results of tests and exams.

I encourage students to work together when they do assignments. In general, encouraging collaboration (students co-sign artifacts that they work on together, with the explicit expectation that more co-authors must produce more work) seems to have been effective in eliminating plagiarism. The collaboration is done openly instead of secretly.

However, because 35% of the final grade comes from the assignments, there is an incentive for a weak student to pair with a stronger student in order to cash in on the high marks the stronger student will earn. The first few times that I taught the course, this was a serious problem and at least two students passed the course who probably should not have. I addressed this problem with the following policy, printed in the syllabus (and reviewed with students in the first class):

To pass the course, you must have a passing average on the mid-term test and the final exam.

  • Undergraduates (CSE 4431): If the average of your mid-term test and your final exam is below 60%, you will fail the course no matter how well you do on the assignments and no matter how many bonus points you have.

  • Graduates (SWE 5410): If the average of your mid-term test and your final exam is below 68%, you will fail the course no matter how well you do on the assignments and no matter how many bonus points you have.

You can earn grades as follows:

  • In-class quizzes: up to 5% (1% per quiz, pass/fail grading)

  • Assignments: 30-35% (depends on the number of quizzes)

  • Mid-term test: 25%

  • Final exam: 40%

  • Bonus assignments (including bug reports): up to 10%

    • Bug reports: up to 5%

Total points available: 110%

I don't grade on a curve. If everyone gets 90% or more, everyone gets an A. (B is 80-89; C is 70-79; D is 60-69; F is 0-59).
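
As a sanity check on how these rules interact, here is a minimal sketch in Python. The weights, thresholds, and letter-grade cutoffs come from the syllabus text above; the function itself and its interface are my own illustration (assignments are fixed at 30% here, alongside the full 5% of quizzes):

```python
def final_grade(quiz_pct, assignment_pct, midterm_pct, final_pct,
                bonus_points, graduate=False):
    """Sketch of the syllabus grading rules; not course-issued code.

    The first four inputs are percentages (0-100) earned in each
    component; bonus_points is already capped at 10.
    """
    # Threshold rule: the exam average alone can fail a student.
    exam_avg = (midterm_pct + final_pct) / 2
    threshold = 68 if graduate else 60  # SWE 5410 vs. CSE 4431
    if exam_avg < threshold:
        return "F"

    # Weighted total: quizzes 5%, assignments 30%, mid-term 25%,
    # final 40%, plus up to 10 bonus points (maximum 110).
    total = (0.05 * quiz_pct + 0.30 * assignment_pct +
             0.25 * midterm_pct + 0.40 * final_pct + bonus_points)

    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if total >= cutoff:
            return letter
    return "F"

print(final_grade(100, 95, 55, 62, 3))  # exam average 58.5 -> "F"
print(final_grade(100, 95, 70, 80, 3))  # passes threshold -> "B"
```

The first call shows the point of the threshold: strong assignment work (95%) cannot rescue a student whose exam average falls below the cutoff.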

Since adopting the threshold policy, I haven’t seen as much obvious imbalance in effort on the assignments. The student who can’t pass the exams can’t pass the course.

3. Exams


To maximize the educational benefits from the mid-term and final exams, I hand out a pool of questions a few weeks before the exam. The exam questions are selected from the pool. The exam is closed book.

3.1 Benefits


The most important benefit of this approach is that it allows the students to think through their answers and prepare them carefully. The exam is merely a production exercise--the student isn’t spending precious time trying to understand each question and think through a strategy for answering it. Because I can assume that the students have thoughtfully developed their answers, I can apply a higher standard when I grade the answers.

Another important benefit is that it gives the students structure for studying together. I encourage students to review and discuss each other’s answers. The questions focus their work and give them something explicit to work on.

A third benefit is that this approach allows me to ask complex questions without unfairly disadvantaging students whose native language is not English. People read at different speeds. Students whose English language skills are still under development need extra time to read and comprehend essay questions and to structure their answers. Because they have that extra preparation time, I don’t have to make allowances for them when I grade.

3.2 Source material


Appendix A lists questions that I’ve used in the course.

Appendix B is an instruction / suggestion sheet that I give students with the questions. The sheet includes two types of guidance:

  • How to study for this type of test

  • How to answer questions on this type of test.

Appendix C provides analyses and answers for several of the exam questions.

3.3 Risks and Problems


The primary problem with this course’s approach is that many students aren’t used to answering essay questions and so they deal with them ineffectively. This problem is not unique to computer science students. For example, teaching first-year law students how to answer essay questions is a critical task, repeated in course after course. Many universities publish guidelines for undergraduates on how to study for, and organize answers to, essay questions.

The problems that I note here apply broadly to essay-format exams among graduate students in computer science. For example, at Florida Tech in Fall 2002, a substantial minority of the students writing the Software Engineering Comprehensive Exam failed or nearly failed because of ineffective essay-answering strategies that yielded

  • weakly structured answers that missed important points or

  • shotgun answers (unfocused, not directly responsive to the question).

3.3.1 Weak Structure


Consider the following question as an example:

Define a scenario test and describe the characteristics of a good scenario test. Imagine developing a set of scenario tests for the Outlining feature of the word processing module of OpenOffice. What research would you do in order to develop a series of scenario tests for Outlining? Describe two scenario tests that you would use and explain why each is a good test. Explain how these tests would relate to your research.

This has several components:

  • Define a scenario test

  • Describe the characteristics of a good scenario test

  • What research would you do in order to develop a series of scenario tests for Outlining?

  • Describe two scenario tests you would use.

  • Explain why each of the two scenario tests is a good test

  • Explain how these two scenario tests would relate to your research

A well-organized answer will have at least six sections, one for each of the bulleted components. You might add two more sections by splitting “Describe two scenario tests you would use” and “Explain why each of the two scenario tests is a good test” into one section per test.

Without structure, it is easy to miss a section and thereby to lose points.

Students must learn to focus their answer to match the “call of the question” (the specific issues raised in the question).

It is a safe bet that a substantial portion of the undergraduate or graduate CS students who attend the software testing class will not yet have sufficiently developed their skills in identifying and responding to the call of the question.

3.3.2 Shotgun Answers


A student using a shotgun strategy responds with a core dump of everything that seems relevant to the general topic. Much of this information might be correct, but if it is non-responsive to the call of the question, it is irrelevant and I will ignore it. However, if some of that irrelevant information is incorrect and I notice the error, I will deduct points for it.

Here’s an example of a question that yielded a lot of shotgunning and not enough points:

Imagine that you are an external test lab, and Sun comes to you with OpenOffice. They want you to test the product. When you ask them what test documentation they want, they say that they want something appropriate but they are relying on your expertise. To decide what test documentation to give them, what questions would you ask (up to 7 questions) and for each answer, how would the answer to that question guide you?

In the course, we looked over a long list of requirements-eliciting questions.6 Students were free to use the ones we discussed or to supply their own.

The question does not call for a definition or discussion of IEEE 829 or for a list of the common test documentation components. It doesn’t call for a description of test matrices or a discussion of how to create them. I got these and much more on a recent exam. Unless this information was couched in terms of a question or the interpretation of a question/answer, it was irrelevant--a waste of the student’s time (and, in a time-limited exam, a tax on the student’s ability to complete the rest of the test in the time available).

3.3.3 Time Management


Students should have fewer exam time management problems when their exams contain only questions from the study guide. After all, they have (in theory) answered each question and they have a sense of how long each question takes to answer.

In practice, many students run out of time the first time they take a test like this, because they don’t realize that time management will be an issue or don’t know how to manage it. As an example of managing a timing problem: if a student develops a “perfect” answer to an essay question but discovers that it will take an hour to write, she will prioritize and reduce the length of the answer in order to fit it within the time available.

3.3.4 Lack of Preparation


I encountered the worst timing problems during my first year of teaching at Florida Tech. I didn’t yet have a reputation among the students, and few other instructors gave students a list of study questions that included all of the questions that would actually appear on the exam. As a result, students in my first two courses didn’t study the particular questions in detail and didn’t develop their own answers. These students performed badly on their mid-term exams; many of them didn’t come close to finishing because they had to read, comprehend, and plan an answer for each question rather than recognizing the question and starting to write an answer they had already prepared.

3.3.5 Weak Group Preparation


The best way to prepare for these tests is for each student to attempt each question on his own. The first attempt should be open book with no time limit. After each student has his own answers, he should compare notes with other students. The diversity of approaches will highlight ambiguities in the question, hidden assumptions on the part of the student, and muddled, disorganized thinking about the structure and call of the question. Independent preparation by several students is essential.

Unfortunately, many students form study groups in which they either:

  • Divide up the questions. One or two students attempt to answer each question and then report back to the group. The rest of the students then attempt to memorize the answers.

  • Attempt to develop the answers as a group, with four or more students arguing and writing together.

Neither of these approaches works well. There are so many questions in the study list that few (or no) students can effectively memorize all the answers. As a result, I often see answer fragments, relevant material mixed with irrelevant (something memorized for a different question), or answers that have been distorted (such as forgotten words, points made so far out of sequence that they don’t make sense, etc.).

The group-think approach works better but often produces weak answers. The group tends to latch onto the first answer that appears to make sense. Or it latches onto the answer advocated by the loudest or most persuasive or most persistent student in the group.

It is much more effective to start from a diverse group of prepared answers, written by people who understand, and can explain, why they prepared their answers the way they did.

I tell students this every term, and every term a significant group of students tries the divide-and-(oops)-don’t-conquer strategy or works only during group study sessions. Most of them learn the lesson the hard way when they write an unsatisfactory mid-term exam.

3.3.6 Weak Answers Propagate Through the Group


Sometimes, the entire class answers a question in a way that is obviously (to me) mistaken or otherwise sub-optimal. I’ve seen several class-specific exam answers like this. By class-specific, I mean that a different class, on encountering the same question, has handled it much better.

3.3.7 Failure to Consult Required Readings


I publish my lecture notes on the web, using a tool called BlackBoard. Along with my lecture notes, I supply copies of several other articles, some in a folder labeled as Required Reading and others in a folder labeled as Recommended Reading.

Surprisingly often, students consult the lecture notes and ignore the required readings. I now choose at least one question that relies on the required readings and not on the lecture, just to remind students (reminder-by-consequence) that they are supposed to read the required readings.

A more subtle problem arises when a question can be answered to a mediocre degree from the lecture notes, but much better from the required readings. In that case, the large majority of the class often gives the mediocre answer. It is tempting for the grader to accept the majority answer as the right one.

3.3.8 Excessively Short Lists are Too Easy to Memorize


One safeguard against students memorizing every answer (relying on other students to generate answers for them) is a question list long enough that there are too many answers to memorize. An excessively short list defeats this safeguard.

3.3.9 Excessively Long Lists and Lists Distributed Too Late Motivate Little Studying


If the list goes to students too late (relative to its length), the list is seen as unreasonable -- impossible -- not worth paying careful attention to.

3.3.10 Prioritization is not Student-Driven


An issue raised with the assessment approach used in this course is that it is seen as micromanaging the study habits of the students. The instructor boils the course down to a relatively short list of questions and the students study these (and nothing else).

I don’t see this as much of a problem. If I include everything that I think is important in the class, then the students study a fairly wide range of material.

Left without guidance, students still prioritize, but their prioritization is based more on rumor and those hints (real or imagined) that they got from the instructor. To some degree, their prioritization is also based on their interests. Students study things that capture their imagination and often learn a great deal from that. The exam structure in this course does not encourage people to take long, fascinating tangents. I try to make up for this, not entirely successfully, with the assignments and bonus point opportunities.

4. Assignments


Appendix D provides sample assignments and grading notes.

The intent of the assignments is to give students practice exercises so they can build skills.

Students are encouraged to test in pairs and to edit each other’s bug reports before filing them in the sample application’s bug database, such as IssueZilla for OpenOffice.

The assignments are useful as far as they go, but they are inefficient. It takes many students three assignments before they get reasonably competent in the style of domain analysis that I teach. (They hand in Assignment 3 about 5 weeks after the start of classes.)

What we need (and are developing) are many more, simpler homework exercises that can be used for practice. My analogy is to the typical calculus course: students do a lot of homework and learn many concepts quickly -- much more quickly than students pick up concepts in the testing course.

5. Bonus Assignments


Students can collect up to 10 bonus points, bringing their maximum possible point count to 110%.

In general, I use bonus assignments to encourage students to improve their communication skills and their system administration skills.

As examples, when we test OpenOffice:

  • One student takes responsibility for providing installation and update technical support for the class. If anyone has problems installing OpenOffice, they go to this student.

  • Another student takes responsibility for coaching people in the mechanics of the bug tracking system.

  • Another student or two might make a class-long presentation on a topic of interest. For example, a student with relevant work experience gave a presentation on failure modes and effects analysis. Another presentation might profile test tools available at sourceforge.org.

Additionally, students can earn bonus points by reporting bugs and editing bug reports.

Here are the rules I include in the syllabus on bonus point bug reports:

Bug Reporting:

The OpenOffice staff are primarily volunteers, like you. These people are not to be abused. Bug reports should be written in a respectful tone. If your reports are disrespectful, sarcastic, or in any other way inappropriate in tone, I will refuse to award any bonus points for any of your bug reports. Some of these volunteers may reject your reports unreasonably. They might reject perfectly good (bad) bugs. Or they might respond sarcastically or disrespectfully. This happens in industry too. You'll have to learn to deal with it. (But don't respond in kind.)

Submitting bug reports is voluntary, but every report you submit will be reviewed and commented on by the OpenOffice staff. This is an important training opportunity.

I will award bonus points (up to 5% of your grade can be for bug reports or bug replications) for bug reports. Here are my standards for awarding bonus points for bugs.

  1. Many of the bug reports in the IssueZilla database don't meet my standards. Many people who work on open source projects never had training in testing. But you do have that training, and the point of this exercise is to give you experience writing bug reports at a professional level. Therefore I will hold you to a professional standard, not to the standard of a part-time volunteer. I suspect that several bugs that are sent to me will not be awarded bonus points.

  2. Reports of already-reported bugs will not qualify for a bonus.

  3. I will award UP TO 1 bonus point per bug report (to a maximum total of 5 points across all bugs submitted). If you develop an acceptable bug report as a group of N people, you will each get 1/N point for that bug.

  4. I will not base the decision on personal demonstrations of bugs. If the bug report, as written, seems unclear, confusing, or insignificant, it does not qualify. The bonus is for the report, not for the bug.

  5. I will not review every bug in the database. I will only look at bugs that you ask me to look at. If I have to wade through mediocre or poor bugs from you, I will give up, even if that means skipping potentially worthy ones. Therefore, please exercise care in pointing me to your submissions. Only send me to your good ones.

Bug Reviews:

The OpenOffice bug database has many bugs that have not been replicated or analyzed. The project needs these replicated and, often, explained in more detail.

I will award bonus points (up to 5% of your grade can be for bug reports or bug replications) for bug reviews.

  1. Each well-reviewed bug is worth up to one-half bonus point (to a maximum of 5). If you do an acceptable review as a group of N people, you will each get 1/(2N) point for that bug.

  2. I will not review every bug in the database. I will only look at bugs that you ask me to look at. If I have to wade through mediocre or poor reviews from you, I will give up, even if that means skipping potentially worthy ones. Therefore, please exercise care in pointing me to your work. Only send me to your good ones.

Assignment 2 presents bug review standards in more detail.

Bug reports and bug reviews that are done as part of an assignment are not eligible for bonus credit.
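
The arithmetic of these caps and splits trips some students up, so here is a worked example. The rules (1 point per report, 1/2 point per review, 1/N splits for groups) are from the syllabus text above; the function is my own sketch, and the shared 5-point ceiling follows my reading of the parenthetical “up to 5% of your grade can be for bug reports or bug replications”:

```python
def bug_bonus(report_group_sizes, review_group_sizes):
    """One student's bug-report/bug-review bonus (my sketch,
    not course-issued code).

    report_group_sizes: one entry per accepted bug report this student
        co-authored; each report is worth up to 1 point, split 1/N.
    review_group_sizes: one entry per accepted bug review; each review
        is worth up to 1/2 point, split 1/(2N).
    Reports and reviews share a single 5-point cap.
    """
    points = (sum(1.0 / n for n in report_group_sizes) +
              sum(0.5 / n for n in review_group_sizes))
    return min(points, 5.0)

# Three solo reports, one pair-authored report, two solo reviews:
# 3(1) + 1/2 + 2(1/2) = 4.5 points.
print(bug_bonus([1, 1, 1, 2], [1, 1]))  # 4.5
```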

6. Quizzes


I also use occasional quizzes -- unannounced tests that are worth about 1% of the final grade each. I mark them pass/fail.

I use quizzes to focus the students’ attention. For example, sometimes I want them to puzzle through a new concept in class or to apply something that we’ve talked about in the last few lectures. Occasionally, I use a quiz or a brief, hand-in homework assignment to help me understand what information has been successfully conveyed to most of the class.

7. Closing Notes, Including Plans for Change


Overall, I think the approach of using evaluation to drive students’ learning experiences has been successful. However, there are some areas for improvement in the course:

  • Student performance on mid-terms is unnecessarily weak. The study guide helps those who rely on it, but too many students ignore it until their mid-term wake-up call. The next time I teach the course, I’ll try giving, well before the mid-term, a lecture on study strategies that shows how I grade test questions (we’ve offered study-strategy lectures already, even with a presentation by a previous student). I’ll probably start with a quiz, using a question from a prior exam, so that students will have thought intensely about the question before I show how I grade it.

  • We’re experimenting with rubrics. These might help some students improve the structure of their assignments and their approach to them.

  • We need more practice materials: a series of exercises, running from simple to complex, that has students work through the routine aspects of each testing technique. My lab has been developing these, but we don’t have a good enough collection yet.

  • The course relies on readings and lecture notes. There isn’t a course text because I haven’t yet found a good one. The next time I teach, I’ll probably require Kaner / Bach / Pettichord’s Lessons Learned in Software Testing and perhaps Whittaker’s How to Break Software or Hendrickson’s lecture notes on Bug Hunting. Over the long term, though, we need to develop a traditional textbook.

Appendix A: Exam Study Guide Questions


Note: I don’t include all of the following questions in every list and I change the list from year to year. I cover different topics from year to year and some of these will be irrelevant in a given year.

It’s important to keep the workload manageable. Depending on your school’s culture, you might make your list longer or shorter--but don’t underestimate your students. Many students will rise to a challenge, especially if they believe you are genuinely interested in their work.

The exams are closed book.

Timing, Coverage and Difficulty of the Exam


The questions in each section below vary in difficulty and length.

In drafting an exam, I answer each question that is a serious candidate for inclusion and clock my answer. To clock an answer, I write it out once, to get my thinking and structure down; then I write a second draft and time that. (Remember, students have been drafting their answers in the course of studying for the exam, so on the exam they are generating the Nth draft.) I allow students twice as long as it took me to handwrite my second draft. For a 75-minute exam, I accumulate questions totaling 55-65 minutes, leaving the extra 10-20 minutes for students who write slowly. For example, the exam might include 4 definitions, 4 short answers, and 2 long answers. Such an exam might offer 100 points worth of questions, but some of my 75-minute exams are out of 95 or 105 -- the total point count is less important than the estimated time and difficulty of the complete exam.
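
The clocking rule is easy to express as arithmetic. The sketch below is my own restatement of the procedure just described (the function name and the sample per-question times are invented):

```python
def exam_time_budget(second_draft_minutes, exam_minutes=75):
    """Check a draft exam against the timing rule described above.

    second_draft_minutes: my timed second-draft writing time for each
    candidate question. Students get twice that long per question, and
    the chosen questions should total 55-65 minutes of a 75-minute
    exam, leaving 10-20 minutes of slack for slow writers.
    """
    total = sum(2 * m for m in second_draft_minutes)
    slack = exam_minutes - total
    return total, slack, 55 <= total <= 65

# Hypothetical times: 4 definitions at 1.5 min each, 4 short answers
# at 2.5 min, 2 long answers at 7.5 min (all second-draft times).
total, slack, fits = exam_time_budget([1.5] * 4 + [2.5] * 4 + [7.5] * 2)
print(total, slack, fits)  # 62.0 minutes budgeted, 13.0 slack, True
```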

I rate questions as Easy, Medium, and Hard and drive the difficulty of the exam by the mix of the ratings.

Finally, I pick the questions in a way that reasonably represents what we covered in class. In some cases, the questions rely on explicitly required readings rather than on material we covered in the lecture. In some cases, the questions on the list cover material that I don’t actually reach in time for the exam. These questions are excluded from the exam.

Ambiguity


One of the advantages of circulating the questions in advance is that the students can challenge them before the exam. Surprisingly, a question might be perfectly clear to the students in one semester but ambiguous to the students in the next semester.

I encourage students to draw ambiguities to my attention. I resolve the ambiguities by sending an electronic mail message to the students. I may exclude a question from the exam if the clarification comes too late or if the answer to the corrected question is too complex.

