Foreword by Alfred V. Aho | xix |
Foreword by Richard A. DeMillo | xxi |
Preface | xxiii |
1.1 The Need for Reliable Software | 3 |
1.2 Software Reliability Engineering Concepts | 5 |
1.3 Book Overview | 8 |
1.4 Basic Definitions | 12 |
1.5 Technical Areas Related to the Book | 19 |
1.5.1 Fault Prevention | 19 |
1.5.2 Fault Removal | 20 |
1.5.3 Fault Tolerance | 20 |
1.5.4 Fault/Failure Forecasting | 21 |
1.5.5 Scope of this Handbook | 21 |
1.6 Summary | 22 |
Problems | 22 |
2.1 Introduction | 27 |
2.2 The Dependability Concept | 28 |
2.2.1 Basic Definitions | 28 |
2.2.2 On the Impairments to Dependability | 28 |
2.2.3 On the Attributes of Dependability | 32 |
2.2.4 On the Means for Dependability | 33 |
2.3 Failure Behavior of an X-Ware System | 35 |
2.3.1 Atomic Systems | 35 |
2.3.2 Systems Made up of Components | 41 |
2.4 Failure Behavior of an X-Ware System with Service Restoration | 49 |
2.4.1 Characterization of System Behavior | 50 |
2.4.2 Maintenance Policies | 51 |
2.4.3 Reliability Modeling | 53 |
2.4.4 Availability Modeling | 60 |
2.5 Situation with Respect to the State-of-the-Art in Reliability Evaluation | 64 |
2.6 Summary | 68 |
Problems | 68 |
3.1 Introduction | 71 |
3.2 Historical Perspective and Implementation | 72 |
3.2.1 Historical Background | 72 |
3.2.2 Model Classification Scheme | 73 |
3.2.3 Model Limitations and Implementation Issues | 76 |
3.3 Exponential Failure Time Class of Models | 77 |
3.3.1 Jelinski-Moranda "De-Eutrophication" Model | 77 |
3.3.2 Nonhomogeneous Poisson Process Model | 80 |
3.3.3 Schneidewind's Model | 82 |
3.3.4 Musa's Basic Execution Time Model | 87 |
3.3.5 Hyperexponential Model | 90 |
3.3.6 Others | 92 |
3.4 Weibull and Gamma Failure Time Class of Models | 93 |
3.4.1 Weibull Model | 93 |
3.4.2 S-Shaped Reliability Growth Model | 95 |
3.5 Infinite Failure Category Models | 98 |
3.5.1 Duane's Model | 98 |
3.5.2 Geometric Model | 99 |
3.5.3 Musa-Okumoto Logarithmic Poisson | 102 |
3.6 Bayesian Models | 104 |
3.6.1 Littlewood-Verrall Reliability Growth Model | 105 |
3.6.2 Other Bayesian Models | 109 |
3.7 Model Relationships | 109 |
3.7.1 Generalized Exponential Model Class | 109 |
3.7.2 Exponential Order Statistic Model Class | 111 |
3.8 Software Reliability Prediction in Early Phases of the Life Cycle | 111 |
3.8.1 Phase-Based Model | 111 |
3.8.2 Predicting Software Defects from Ada Design | 112 |
3.8.3 Rome Laboratory Work | 113 |
3.9 Summary | 114 |
Problems | 115 |
4.1 Introduction | 119 |
4.2 Examples of Model Disagreement and Inaccuracy | 120 |
4.2.1 Simple Short Term Predictions | 120 |
4.2.2 Longer Term Predictions | 123 |
4.2.3 Model Accuracy Varies from Data Source to Data Source | 126 |
4.2.4 Why We Cannot Select the Best Model a Priori | 126 |
4.2.5 Discussion - A Possible Way Forward | 127 |
4.3 Methods of Analyzing Predictive Accuracy | 128 |
4.3.1 Basic Ideas - Recursive Comparison of Predictions with Eventual Outcomes | 128 |
4.3.2 The Prequential Likelihood Ratio (PLR) | 131 |
4.3.3 The U-Plot | 135 |
4.3.4 The Y-Plot | 140 |
4.3.5 Discussion: The Likely Nature of Prediction Errors, and How We Can Detect Inaccuracy | 141 |
4.4 Recalibration | 145 |
4.4.1 The U-Plot as a Means of Detecting 'Bias' | 145 |
4.4.2 The Recalibration Technique | 146 |
4.4.3 Examples of the Power of Recalibration | 147 |
4.5 A Worked Example | 150 |
4.6 Discussion | 156 |
4.6.1 Summary of the Good News: Where We Are Now | 156 |
4.6.2 Limitations of Present Techniques | 159 |
4.6.3 Possible Avenues for Improvement of Methods | 160 |
4.6.4 Best Advice to Potential Users | 162 |
4.7 Summary | 163 |
Problems | 164 |
5.1 Introduction | 167 |
5.2 Concepts | 168 |
5.3 Development Procedure | 170 |
5.3.1 Customer Type List | 173 |
5.3.2 User Type List | 173 |
5.3.3 System Mode List | 174 |
5.3.4 Functional Profile | 176 |
5.3.5 Operational Profile | 183 |
5.4 Test Selection | 194 |
5.4.1 Selecting Operations | 195 |
5.4.2 Regression Test | 196 |
5.5 Special Issues | 197 |
5.5.1 Indirect Input Variables | 197 |
5.5.2 Updating the Operational Profile | 197 |
5.5.3 Distributed Systems | 198 |
5.6 Other Uses | 199 |
5.7 Application to DEFINITY | 200 |
5.7.1 Project Description | 200 |
5.7.2 Development Process Description | 200 |
5.7.3 Describing Operational Profiles | 201 |
5.7.4 Implementing Operational Profiles | 203 |
5.7.5 Conclusion | 204 |
5.8 Application to FASTAR (Fast Automated Restoration) | 204 |
5.8.1 System Description | 204 |
5.8.2 FASTAR: SRE Implementation | 206 |
5.8.3 FASTAR: SRE Benefits | 210 |
5.9 Application to the Power Quality Resource System | 210 |
5.9.1 Project Description | 210 |
5.9.2 Developing the Operational Profile | 211 |
5.9.3 Testing | 213 |
5.9.4 Conclusion | 214 |
5.10 Summary | 215 |
Problems | 215 |
6.1 Introduction | 219 |
6.2 Benefits and Approaches to SRE | 220 |
6.2.1 Importance and Benefits | 221 |
6.2.2 An SRE Success Story | 221 |
6.2.3 SRE Costs | 222 |
6.2.4 SRE Activities | 223 |
6.2.5 Implementing SRE Incrementally | 223 |
6.2.6 Implementing SRE on Existing Projects | 224 |
6.2.7 Implementing SRE on Short-Cycle Projects | 226 |
6.3 SRE during Feasibility and Requirements Phase | 226 |
6.3.1 Feasibility Stage | 226 |
6.3.2 Requirements Stage | 228 |
6.4 SRE during Design and Implementation Phase | 232 |
6.4.1 Design Stage | 232 |
6.4.2 Implementation Stage | 233 |
6.5 SRE during the System Test and Field Trial Phase | 235 |
6.5.1 Determine Operational Profile | 236 |
6.5.2 System Test Stage | 237 |
6.5.3 Field Trial Stage | 241 |
6.6 SRE during Post-Delivery and Maintenance Phase | 242 |
6.6.1 Project Post-Release Staff Needs | 242 |
6.6.2 Monitor Field Reliability vs. Objectives | 243 |
6.6.3 Track Customer Satisfaction | 245 |
6.6.4 Time New Feature Introduction by Monitoring Reliability | 245 |
6.6.5 Guide Product and Process Improvement with Reliability Measures | 246 |
6.7 Getting Started with SRE | 246 |
6.7.1 Prepare Your Organization for SRE | 247 |
6.7.2 Find More Information or Support | 250 |
6.7.3 Do an SRE Self-Assessment | 250 |
6.8 Summary | 252 |
Problems | 253 |
7.1 Introduction | 255 |
7.2 Measurement Framework | 256 |
7.2.1 Establishing Software Reliability Requirements | 259 |
7.2.2 Setting up a Data Collection Process | 266 |
7.2.3 Defining Data to be Collected | 267 |
7.2.4 Choosing a Preliminary Set of Software Reliability Models | 272 |
7.2.5 Choosing Reliability Modeling Tools | 273 |
7.2.6 Model Application and Application Issues | 273 |
7.2.7 Dealing with Evolving Software | 276 |
7.2.8 Practical Limits in Modeling Ultrareliability | 277 |
7.3 Investigation at JPL | 278 |
7.3.1 Project Selection and Characterization | 278 |
7.3.2 Characterization of Available Data | 280 |
7.3.3 Experimental Results | 280 |
7.4 Investigation at Bellcore | 281 |
7.4.1 Project Characteristics | 281 |
7.4.2 Data Collection | 284 |
7.4.3 Application Results | 285 |
7.5 Linear Combination of Model Results | 289 |
7.5.1 Statically-Weighted Linear Combinations | 290 |
7.5.2 Weight Determination Based on Ranking Model Results | 290 |
7.5.3 Weight Determination Based on Changes in Prequential Likelihood | 291 |
7.5.4 Modeling Results | 291 |
7.5.5 Overall Project Results | 292 |
7.5.6 Extensions and Alternatives | 295 |
7.5.7 Long-Term Prediction Capability | 298 |
7.6 Summary | 299 |
Problems | 300 |
8.1 Introduction | 303 |
8.2 Framework | 304 |
8.2.1 Overview | 304 |
8.2.2 Operational vs. Development Phase Evaluation | 306 |
8.2.3 Past Work | 306 |
8.3 Measurement Techniques | 307 |
8.3.1 On-Line Machine Logging | 308 |
8.3.2 Manual Reporting | 310 |
8.4 Preliminary Analysis of Data | 312 |
8.4.1 Data Processing | 312 |
8.4.2 Fault and Error Classification | 314 |
8.4.3 Error Propagation | 317 |
8.4.4 Error and Recovery Distributions | 320 |
8.5 Detailed Analysis of Data | 323 |
8.5.1 Dependency Analysis | 324 |
8.5.2 Hardware-Related Software Errors | 327 |
8.5.3 Evaluation of Software Fault Tolerance | 328 |
8.5.4 Recurrences | 329 |
8.6 Model Identification and Analysis of Models | 333 |
8.6.1 Impact of Failures on Performance | 333 |
8.6.2 Reliability Modeling in the Operational Phase | 335 |
8.6.3 Failure/Error/Recovery Model | 339 |
8.6.4 Multiple Error Model | 344 |
8.7 Impact of System Activity | 345 |
8.7.1 Statistical Models from Measurements | 345 |
8.7.2 Overall System Behavior Model | 348 |
8.8 Summary | 352 |
Problems | 353 |
9.1 Introduction | 359 |
9.2 Measurement and Software | 360 |
9.2.1 Software Defects | 361 |
9.2.2 The Spectrum of Defect Analysis | 364 |
9.3 Principles of ODC | 367 |
9.3.1 The Intuition | 367 |
9.3.2 The Design of Orthogonal Defect Classification | 370 |
9.3.3 Necessary Condition | 371 |
9.3.4 Sufficient Conditions | 373 |
9.4 The Defect-Type Attribute | 374 |
9.5 Relative Risk Assessment Using Defect Types | 376 |
9.5.1 Subjective Aspects of Growth Curves | 377 |
9.5.2 Combining ODC and Growth Modeling | 379 |
9.6 The Defect Trigger Attribute | 384 |
9.6.1 The Trigger Concept | 384 |
9.6.2 System Test Triggers | 387 |
9.6.3 Review and Inspection Triggers | 387 |
9.6.4 Function Test Triggers | 388 |
9.6.5 The Use of Triggers | 389 |
9.7 Multidimensional Analysis | 393 |
9.8 Deploying ODC | 396 |
9.9 Summary | 398 |
Problems | 399 |
10.1 Introduction | 401 |
10.2 Reliability Growth Characterization | 402 |
10.2.1 Definitions of Reliability Growth | 403 |
10.2.2 Graphical Interpretation of the Subadditive Property | 404 |
10.2.3 Subadditive Property Analysis | 406 |
10.2.4 Subadditive Property and Trend Change | 407 |
10.2.5 Some Particular Situations | 408 |
10.2.6 Summary | 409 |
10.3 Trend Analysis | 410 |
10.3.1 Trend Tests | 410 |
10.3.2 Example | 419 |
10.3.3 Typical Results That Can Be Drawn from Trend Analyses | 422 |
10.3.4 Summary | 424 |
10.4 Application to Real Systems | 424 |
10.4.1 Software of System SS4 | 425 |
10.4.2 Software of System S27 | 427 |
10.4.3 Software of System SS1 | 427 |
10.4.4 Software of System SS2 | 429 |
10.4.5 SAV | 429 |
10.5 Extension to Static Analysis | 431 |
10.5.1 Static Analysis Conduct | 431 |
10.5.2 Application | 433 |
10.6 Summary | 433 |
Problems | 435 |
11.1 Introduction | 439 |
11.2 Data Collection Principles | 441 |
11.2.1 Introduction | 441 |
11.2.2 Failures, Faults, and Related Data | 442 |
11.2.3 Time | 444 |
11.2.4 Usage | 445 |
11.2.5 Data Granularity | 446 |
11.2.6 Data Maintenance and Validation | 447 |
11.2.7 Analysis Environment | 448 |
11.3 Data Analysis Principles | 449 |
11.3.1 Plots and Graphs | 450 |
11.3.2 Data Modeling and Diagnostics | 454 |
11.3.3 Diagnostics for Model Determination | 455 |
11.3.4 Data Transformations | 458 |
11.4 Important Topics in Analysis of Field Data | 459 |
11.4.1 Calendar Time | 461 |
11.4.2 Usage Time | 461 |
11.4.3 An Example | 462 |
11.5 Calendar-Time Reliability Analysis | 463 |
11.5.1 Case Study (IBM Corp.) | 464 |
11.5.2 Case Study (Hitachi) | 466 |
11.5.3 Further Examples | 468 |
11.6 Usage-Based Reliability Analysis | 469 |
11.6.1 Case Study (Northern Telecom Telecommunication Systems) | 469 |
11.6.2 Further Examples | 470 |
11.7 Special Events | 472 |
11.7.1 Rare Event Models | 473 |
11.7.2 Case Study (Space Shuttle Flight Software) | 476 |
11.8 Availability | 479 |
11.8.1 Introduction | 479 |
11.8.2 Measuring Availability | 480 |
11.8.3 Empirical Unavailability | 481 |
11.8.4 Models | 483 |
11.9 Summary | 486 |
Problems | 487 |
John Munson (University of Idaho) and Taghi Khoshgoftaar (Florida Atlantic University)
12.1 Introduction | 493 |
12.2 Static Program Complexity | 495 |
12.2.1 Software Metrics | 495 |
12.2.2 A Domain Model of Software Attributes | 496 |
12.2.3 Principal Components Analysis | 497 |
12.2.4 The Usage of Metrics | 499 |
12.2.5 Relative Program Complexity | 500 |
12.2.6 Software Evolution | 502 |
12.3 Dynamic Program Complexity | 504 |
12.3.1 Execution Profile | 505 |
12.3.2 Functional Complexity | 505 |
12.3.3 Dynamic Aspects of Functional Complexity | 507 |
12.3.4 Operational Complexity | 509 |
12.4 Software Complexity and Software Quality | 510 |
12.4.1 An Overview | 510 |
12.4.2 An Application and Its Metrics | 512 |
12.4.3 Multivariate Analysis in Software Quality Control | 514 |
12.4.4 Fault Prediction Models | 518 |
12.4.5 Enhancing Predictive Models with Increased Domain Coverage | 520 |
12.5 Software Reliability Modeling | 523 |
12.5.1 Reliability Modeling with Software Complexity Metrics | 524 |
12.5.2 The Incremental Build Problem | 526 |
12.6 Summary | 527 |
Problems | 527 |
13.1 Introduction | 531 |
13.2 Overview of Software Testing | 532 |
13.2.1 Kinds of Software Testing | 532 |
13.2.2 Concepts from White-Box and Black-Box Testing | 532 |
13.3 Operational Profiles | 534 |
13.3.1 Difficulties in Estimating the Operational Profile | 535 |
13.3.2 Estimating Reliability | 537 |
13.4 Time/Structure Based Software Reliability Estimation | 539 |
13.4.1 Definitions and Terminology | 539 |
13.4.2 Basic Assumptions | 540 |
13.4.3 Testing Methods and Saturation Effect | 541 |
13.4.4 Testing Effort | 541 |
13.4.5 Limits of Testing Methods | 542 |
13.4.6 Empirical Basis of the Saturation Effect | 543 |
13.4.7 Reliability Overestimation due to Saturation | 545 |
13.4.8 Incorporating Coverage in Reliability Estimation | 546 |
13.4.9 Filtering Failure Data Using Coverage Information | 547 |
13.4.10 Selecting the Compression Ratio | 551 |
13.4.11 Handling Rare Events | 553 |
13.5 A Microscopic Model of Software Risk | 554 |
13.5.1 A Testing-Based Model of Risk Decay | 554 |
13.5.2 Risk Assessment: An Example | 555 |
13.5.3 A Simple Risk Computation | 558 |
13.5.4 A Risk Browser | 560 |
13.5.5 The Risk Model and Software Reliability | 561 |
13.6 Summary | 563 |
Problems | 563 |
14.1 Introduction | 567 |
14.2 Present Status | 568 |
14.3 Principles and Terminology | 569 |
14.3.1 Result Verification | 570 |
14.3.2 Redundancy | 574 |
14.3.3 Failures and Faults | 575 |
14.3.4 Adjudication by Voting | 577 |
14.3.5 Tolerance | 578 |
14.4 Basic Techniques | 581 |
14.4.1 Recovery Blocks | 581 |
14.4.2 N-Version Programming | 582 |
14.5 Advanced Techniques | 583 |
14.5.1 Consensus Recovery Block | 583 |
14.5.2 Acceptance Voting | 584 |
14.5.3 N Self-Checking Programming | 584 |
14.6 Reliability Modeling | 585 |
14.6.1 Diversity and Dependence of Failures | 586 |
14.6.2 Data-Domain Modeling | 589 |
14.6.3 Time-Domain Modeling | 594 |
14.7 Reliability in the Presence of Inter-Version Failure Correlation | 596 |
14.7.1 An Experiment | 596 |
14.7.2 Failure Correlation | 598 |
14.7.3 Consensus Voting | 599 |
14.7.4 Consensus Recovery Block | 601 |
14.7.5 Acceptance Voting | 603 |
14.8 Development and Testing of Multi-Version Fault-Tolerant Software | 604 |
14.8.1 Requirements and Design | 605 |
14.8.2 Verification, Validation and Testing | 606 |
14.8.3 Cost of Fault-Tolerant Software | 607 |
14.9 Summary | 609 |
Problems | 609 |
15.1 Introduction | 615 |
15.2 Fault Tree Modeling | 615 |
15.2.1 Cutset Generation | 617 |
15.2.2 Fault Tree Analysis | 619 |
15.3 Fault Trees as a Design Aid for Software Systems | 622 |
15.4 Safety Validation Using Fault Trees | 623 |
15.5 Analysis of Fault Tolerant Software Systems | 627 |
15.5.1 Fault Tree Model for Recovery Block System | 629 |
15.5.2 Fault Tree Model for N-Version Programming System | 630 |
15.5.3 Fault Tree Model for N Self-Checking Programming System | 632 |
15.6 Qualitative Analysis of Fault Tolerant Software | 635 |
15.6.1 Methodology for Parameter Estimation from Experimental Data | 635 |
15.6.2 A Case Study in Parameter Estimation | 639 |
15.6.3 Comparative Analysis of Three Software Fault Tolerant Systems | 642 |
15.7 System-Level Analysis of Hardware and Software System | 645 |
15.7.1 System Reliability/Safety Model for DRB | 647 |
15.7.2 System Reliability/Safety Model for NVP | 648 |
15.7.3 System Reliability/Safety Model for NSCP | 650 |
15.7.4 A Case Study in System-Level Analysis | 651 |
15.8 Summary | 657 |
Problems | 657 |
16.1 Introduction | 661 |
16.2 Reliability Simulation | 662 |
16.2.1 The Need for Dynamic Simulation | 663 |
16.2.2 Dynamic Simulation Approaches | 664 |
16.3 The Reliability Process | 665 |
16.3.1 The Nature of the Process | 666 |
16.3.2 Structures and Flows | 667 |
16.3.3 Interdependencies among Elements | 668 |
16.3.4 Software Environment Characteristics | 669 |
16.4 Artifact-Based Simulation | 669 |
16.4.1 Simulator Architecture | 670 |
16.4.2 Results | 675 |
16.5 Rate-Based Simulation | 676 |
16.5.1 Event Process Statistics | 677 |
16.5.2 Single-Event Process Simulation | 678 |
16.5.3 Recurrent Event Statistics | 679 |
16.5.4 Recurrent Event Simulation | 681 |
16.5.5 Secondary Event Simulation | 682 |
16.5.6 Limited Growth Simulation | 683 |
16.5.7 The General Simulation Algorithm | 684 |
16.6 Rate-Based Reliability | 686 |
16.6.1 Rate Functions of Conventional Models | 686 |
16.6.2 Simulator Architecture | 687 |
16.6.3 Display of Results | 689 |
16.7 The Galileo Project Application | 690 |
16.7.1 Simulation Experiments and Results | 691 |
16.7.2 Comparisons with Other Software Reliability Models | 694 |
16.8 Summary | 696 |
Problems | 697 |
Nachimuthu Karunanithi (Bellcore) and Yashwant Malaiya (Colorado State University)
17.1 Introduction | 699 |
17.2 Neural Networks | 700 |
17.2.1 Processing Unit | 700 |
17.2.2 Architecture | 702 |
17.2.3 Learning Algorithms | 705 |
17.2.4 Backpropagation Learning | 705 |
17.2.5 Cascade-Correlation Learning Architecture | 707 |
17.3 Application of Neural Networks for Software Reliability | 709 |
17.3.1 Dynamic Reliability Growth Modeling | 709 |
17.3.2 Identifying Fault-Prone Modules | 710 |
17.4 Software Reliability Growth Modeling | 710 |
17.4.1 Training Regimes | 712 |
17.4.2 Data Representation Issue | 712 |
17.4.3 A Prediction Experiment | 713 |
17.4.4 Analysis of Neural Network Models | 718 |
17.5 Identification of Fault-Prone Software Modules | 718 |
17.5.1 Identification of Fault-Prone Modules Using Software Metrics | 719 |
17.5.2 Data Set Used | 719 |
17.5.3 Classifiers Compared | 720 |
17.5.4 Data Representation | 722 |
17.5.5 Training Data Selection | 723 |
17.5.6 Experimental Approach | 723 |
17.5.7 Results | 723 |
17.6 Summary | 726 |
Problems | 726 |
Appendix A. Software Reliability Tools | 729 |
Appendix B. Review of Reliability Theory, Analytical Techniques, and Basic Statistics | 747 |
References | 781 |
Index | 821 |