Handbook of Software Reliability Engineering


Foreword by Alfred V. Aho xix
Foreword by Richard A. DeMillo xxi
Preface xxiii


Chapter 1. Introduction

Michael R. Lyu (AT&T Bell Labs.)

1.1 The Need for Reliable Software 3
1.2 Software Reliability Engineering Concepts 5
1.3 Book Overview 8
1.4 Basic Definitions 12
1.5 Technical Areas Related to the Book 19
      1.5.1 Fault Prevention 19
      1.5.2 Fault Removal 20
      1.5.3 Fault Tolerance 20
      1.5.4 Fault/Failure Forecasting 21
      1.5.5 Scope of this Handbook 21
1.6 Summary 22
      Problems 22


Chapter 2. Software Reliability and System Reliability

Jean-Claude Laprie and Karama Kanoun (LAAS-CNRS, France)

2.1 Introduction 27
2.2 The Dependability Concept 28
      2.2.1 Basic Definitions 28
      2.2.2 On the Impairments to Dependability 28
      2.2.3 On the Attributes of Dependability 32
      2.2.4 On the Means for Dependability 33
2.3 Failure Behavior of an X-Ware System 35
      2.3.1 Atomic Systems 35
      2.3.2 Systems Made up of Components 41
2.4 Failure Behavior of an X-Ware System with Service Restoration 49
      2.4.1 Characterization of System Behavior 50
      2.4.2 Maintenance Policies 51
      2.4.3 Reliability Modeling 53
      2.4.4 Availability Modeling 60
2.5 Situation with Respect to the State-of-the-Art in Reliability Evaluation 64
2.6 Summary 68
      Problems 68


Chapter 3. Software Reliability Modeling Survey

William Farr (Naval Surface Warfare Center)

3.1 Introduction 71
3.2 Historical Perspective and Implementation 72
      3.2.1 Historical Background 72
      3.2.2 Model Classification Scheme 73
      3.2.3 Model Limitations and Implementation Issues 76
3.3 Exponential Failure Time Class of Models 77
      3.3.1 Jelinski-Moranda "De-Eutrophication" Model 77
      3.3.2 Nonhomogeneous Poisson Process Model 80
      3.3.3 Schneidewind's Model 82
      3.3.4 Musa's Basic Execution Time Model 87
      3.3.5 Hyperexponential Model 90
      3.3.6 Others 92
3.4 Weibull and Gamma Failure Time Class of Models 93
      3.4.1 Weibull Model 93
      3.4.2 S-Shaped Reliability Growth Model 95
3.5 Infinite Failure Category Models 98
      3.5.1 Duane's Model 98
      3.5.2 Geometric Model 99
      3.5.3 Musa-Okumoto Logarithmic Poisson 102
3.6 Bayesian Models 104
      3.6.1 Littlewood-Verrall Reliability Growth Model 105
      3.6.2 Other Bayesian Models 109
3.7 Model Relationships 109
      3.7.1 Generalized Exponential Model Class 109
      3.7.2 Exponential Order Statistic Model Class 111
3.8 Software Reliability Prediction in Early Phases of the Life Cycle 111
      3.8.1 Phase-Based Model 111
      3.8.2 Predicting Software Defects from Ada Design 112
      3.8.3 Rome Laboratory Work 113
3.9 Summary 114
      Problems 115


Chapter 4. Techniques for Prediction Analysis and Recalibration

Sarah Brocklehurst, Bev Littlewood (City University of London)

4.1 Introduction 119
4.2 Examples of Model Disagreement and Inaccuracy 120
      4.2.1 Simple Short Term Predictions 120
      4.2.2 Longer Term Predictions 123
      4.2.3 Model Accuracy Varies from Data Source to Data Source 126
      4.2.4 Why We Cannot Select the Best Model a Priori 126
      4.2.5 Discussion - a Possible Way Forward 127
4.3 Methods of Analyzing Predictive Accuracy 128
      4.3.1 Basic Ideas - Recursive Comparison of Predictions with Eventual Outcomes 128
      4.3.2 The Prequential Likelihood Ratio (PLR) 131
      4.3.3 The U-Plot 135
      4.3.4 The Y-Plot 140
      4.3.5 Discussion: the Likely Nature of Prediction Errors, and How We can Detect Inaccuracy 141
4.4 Recalibration 145
      4.4.1 The U-Plot as a Means of Detecting 'Bias' 145
      4.4.2 The Recalibration Technique 146
      4.4.3 Examples of the Power of Recalibration 147
4.5 A Worked Example 150
4.6 Discussion 156
      4.6.1 Summary of the Good News: Where We Are Now 156
      4.6.2 Limitations of Present Techniques 159
      4.6.3 Possible Avenues for Improvement of Methods 160
      4.6.4 Best Advice to Potential Users 162
4.7 Summary 163
      Problems 164


Chapter 5. The Operational Profile

John Musa, Bruce Juhlin, Gene Fuoco, Diane Kropfl, and Nancy Irving (AT&T Bell Labs.)

5.1 Introduction 167
5.2 Concepts 168
5.3 Development Procedure 170
      5.3.1 Customer Type List 173
      5.3.2 User Type List 173
      5.3.3 System Mode List 174
      5.3.4 Functional Profile 176
      5.3.5 Operational Profile 183
5.4 Test Selection 194
      5.4.1 Selecting Operations 195
      5.4.2 Regression Test 196
5.5 Special Issues 197
      5.5.1 Indirect Input Variables 197
      5.5.2 Updating the Operational Profile 197
      5.5.3 Distributed Systems 198
5.6 Other Uses 199
5.7 Application to DEFINITY 200
      5.7.1 Project Description 200
      5.7.2 Development Process Description 200
      5.7.3 Describing Operational Profiles 201
      5.7.4 Implementing Operational Profiles 203
      5.7.5 Conclusion 204
5.8 Application to FASTAR (Fast Automated Restoration) 204
      5.8.1 System Description 204
      5.8.2 FASTAR: SRE Implementation 206
      5.8.3 FASTAR: SRE Benefits 210
5.9 Application to the Power Quality Resource System 210
      5.9.1 Project Description 210
      5.9.2 Developing the Operational Profile 211
      5.9.3 Testing 213
      5.9.4 Conclusion 214
5.10 Summary 215
      Problems 215


Chapter 6. Best Current Practice of SRE

Mary Donnelly, Bill Everett, John Musa, and Geoff Wilson (AT&T Bell Labs.)

6.1 Introduction 219
6.2 Benefits and Approaches to SRE 220
      6.2.1 Importance and Benefits 221
      6.2.2 An SRE Success Story 221
      6.2.3 SRE Costs 222
      6.2.4 SRE Activities 223
      6.2.5 Implementing SRE Incrementally 223
      6.2.6 Implementing SRE on Existing Projects 224
      6.2.7 Implementing SRE on Short-Cycle Projects 226
6.3 SRE During Feasibility and Requirements Phase 226
      6.3.1 Feasibility Stage 226
      6.3.2 Requirements Stage 228
6.4 SRE during Design and Implementation Phase 232
      6.4.1 Design Stage 232
      6.4.2 Implementation Stage 233
6.5 SRE during the System Test and Field Trial Phase 235
      6.5.1 Determine Operational Profile 236
      6.5.2 System Test Stage 237
      6.5.3 Field Trial Stage 241
6.6 SRE during Post-Delivery and Maintenance Phase 242
      6.6.1 Project Post-Release Staff Needs 242
      6.6.2 Monitor Field Reliability vs. Objectives 243
      6.6.3 Track Customer Satisfaction 245
      6.6.4 Time New Feature Introduction by Monitoring Reliability 245
      6.6.5 Guide Product and Process Improvement with Reliability Measures 246
6.7 Getting Started with SRE 246
      6.7.1 Prepare Your Organization for SRE 247
      6.7.2 Find More Information or Support 250
      6.7.3 Do an SRE Self-Assessment 250
6.8 Summary 252
      Problems 253


Chapter 7. Software Reliability Measurement Experience

Allen Nikora (Jet Propulsion Laboratory) and Michael R. Lyu (AT&T Bell Labs.)

7.1 Introduction 255
7.2 Measurement Framework 256
      7.2.1 Establishing Software Reliability Requirements 259
      7.2.2 Setting up a Data Collection Process 266
      7.2.3 Defining Data to be Collected 267
      7.2.4 Choosing a Preliminary Set of Software Reliability Models 272
      7.2.5 Choosing Reliability Modeling Tools 273
      7.2.6 Model Application and Application Issues 273
      7.2.7 Dealing with Evolving Software 276
      7.2.8 Practical Limits in Modeling Ultrareliability 277
7.3 Investigation at JPL 278
      7.3.1 Project Selection and Characterization 278
      7.3.2 Characterization of Available Data 280
      7.3.3 Experimental Results 280
7.4 Investigation at Bellcore 281
      7.4.1 Project Characteristics 281
      7.4.2 Data Collection 284
      7.4.3 Application Results 285
7.5 Linear Combination of Model Results 289
      7.5.1 Statically-Weighted Linear Combinations 290
      7.5.2 Weight Determination Based on Ranking Model Results 290
      7.5.3 Weight Determination Based on Changes in Prequential Likelihood 291
      7.5.4 Modeling Results 291
      7.5.5 Overall Project Results 292
      7.5.6 Extensions and Alternatives 295
      7.5.7 Long-Term Prediction Capability 298
7.6 Summary 299
      Problems 300


Chapter 8. Measurement Based Analysis of Software Reliability

Ravi K. Iyer (University of Illinois) and Inhwan Lee (Tandem, Inc.)

8.1 Introduction 303
8.2 Framework 304
      8.2.1 Overview 304
      8.2.2 Operational vs. Development Phase Evaluation 306
      8.2.3 Past Work 306
8.3 Measurement Techniques 307
      8.3.1 On-Line Machine Logging 308
      8.3.2 Manual Reporting 310
8.4 Preliminary Analysis of Data 312
      8.4.1 Data Processing 312
      8.4.2 Fault and Error Classification 314
      8.4.3 Error Propagation 317
      8.4.4 Error and Recovery Distributions 320
8.5 Detailed Analysis of Data 323
      8.5.1 Dependency Analysis 324
      8.5.2 Hardware-Related Software Errors 327
      8.5.3 Evaluation of Software Fault Tolerance 328
      8.5.4 Recurrences 329
8.6 Model Identification and Analysis of Models 333
      8.6.1 Impact of Failures on Performance 333
      8.6.2 Reliability Modeling in the Operational Phase 335
      8.6.3 Failure/Error/Recovery Model 339
      8.6.4 Multiple Error Model 344
8.7 Impact of System Activity 345
      8.7.1 Statistical Models from Measurements 345
      8.7.2 Overall System Behavior Model 348
8.8 Summary 352
      Problems 353


Chapter 9. Orthogonal Defect Classification

Ram Chillarege (IBM Research)

9.1 Introduction 359
9.2 Measurement and Software 360
      9.2.1 Software Defects 361
      9.2.2 The Spectrum of Defect Analysis 364
9.3 Principles of ODC 367
      9.3.1 The Intuition 367
      9.3.2 The Design of Orthogonal Defect Classification 370
      9.3.3 Necessary Condition 371
      9.3.4 Sufficient Conditions 373
9.4 The Defect-Type Attribute 374
9.5 Relative Risk Assessment Using Defect Types 376
      9.5.1 Subjective Aspects of Growth Curves 377
      9.5.2 Combining ODC and Growth Modeling 379
9.6 The Defect Trigger Attribute 384
      9.6.1 The Trigger Concept 384
      9.6.2 System Test Triggers 387
      9.6.3 Review and Inspection Triggers 387
      9.6.4 Function Test Triggers 388
      9.6.5 The Use of Triggers 389
9.7 Multidimensional Analysis 393
9.8 Deploying ODC 396
9.9 Summary 398
      Problems 399


Chapter 10. Trend Analysis

Karama Kanoun and Jean-Claude Laprie (LAAS-CNRS, France)

10.1 Introduction 401
10.2 Reliability Growth Characterization 402
      10.2.1 Definitions of Reliability Growth 403
      10.2.2 Graphical Interpretation of the Subadditive Property 404
      10.2.3 Subadditive Property Analysis 406
      10.2.4 Subadditive Property and Trend Change 407
      10.2.5 Some Particular Situations 408
      10.2.6 Summary 409
10.3 Trend Analysis 410
      10.3.1 Trend Tests 410
      10.3.2 Example 419
      10.3.3 Typical Results That Can Be Drawn from Trend Analyses 422
      10.3.4 Summary 424
10.4 Application to Real Systems 424
      10.4.1 Software of System SS4 425
      10.4.2 Software of System S27 427
      10.4.3 Software of System SS1 427
      10.4.4 Software of System SS2 429
      10.4.5 SAV 429
10.5 Extension to Static Analysis 431
      10.5.1 Static Analysis Conduct 431
      10.5.2 Application 433
10.6 Summary 433
      Problems 435


Chapter 11. Field Data Analysis

Wendell Jones (BNR, Inc.) and Mladen Vouk (NCSU)

11.1 Introduction 439
11.2 Data Collection Principles 441
      11.2.1 Introduction 441
      11.2.2 Failures, Faults, and Related Data 442
      11.2.3 Time 444
      11.2.4 Usage 445
      11.2.5 Data Granularity 446
      11.2.6 Data Maintenance and Validation 447
      11.2.7 Analysis Environment 448
11.3 Data Analysis Principles 449
      11.3.1 Plots and Graphs 450
      11.3.2 Data Modeling and Diagnostics 454
      11.3.3 Diagnostics for Model Determination 455
      11.3.4 Data Transformations 458
11.4 Important Topics in Analysis of Field Data 459
      11.4.1 Calendar Time 461
      11.4.2 Usage Time 461
      11.4.3 An Example 462
11.5 Calendar-Time Reliability Analysis 463
      11.5.1 Case Study (IBM Corp.) 464
      11.5.2 Case Study (Hitachi) 466
      11.5.3 Further Examples 468
11.6 Usage-Based Reliability Analysis 469
      11.6.1 Case Study (Northern Telecom Telecommunication Systems) 469
      11.6.2 Further Examples 470
11.7 Special Events 472
      11.7.1 Rare Event Models 473
      11.7.2 Case Study (Space Shuttle Flight Software) 476
11.8 Availability 479
      11.8.1 Introduction 479
      11.8.2 Measuring Availability 480
      11.8.3 Empirical Unavailability 481
      11.8.4 Models 483
11.9 Summary 486
      Problems 487


Chapter 12. Software Metrics for Reliability Assessment

John Munson (University of Idaho) and Taghi Khoshgoftaar (Florida Atlantic University)

12.1 Introduction 493
12.2 Static Program Complexity 495
      12.2.1 Software Metrics 495
      12.2.2 A Domain Model of Software Attributes 496
      12.2.3 Principal Components Analysis 497
      12.2.4 The Usage of Metrics 499
      12.2.5 Relative Program Complexity 500
      12.2.6 Software Evolution 502
12.3 Dynamic Program Complexity 504
      12.3.1 Execution Profile 505
      12.3.2 Functional Complexity 505
      12.3.3 Dynamic Aspects of Functional Complexity 507
      12.3.4 Operational Complexity 509
12.4 Software Complexity and Software Quality 510
      12.4.1 An Overview 510
      12.4.2 An Application and Its Metrics 512
      12.4.3 Multivariate Analysis in Software Quality Control 514
      12.4.4 Fault Prediction Models 518
      12.4.5 Enhancing Predictive Models with Increased Domain Coverage 520
12.5 Software Reliability Modeling 523
      12.5.1 Reliability Modeling with Software Complexity Metrics 524
      12.5.2 The Incremental Build Problem 526
12.6 Summary 527
      Problems 527


Chapter 13. Software Testing and Reliability

Joseph R. Horgan (Bellcore) and Aditya P. Mathur (Purdue University)

13.1 Introduction 531
13.2 Overview of Software Testing 532
      13.2.1 Kinds of Software Testing 532
      13.2.2 Concepts from White-Box and Black-Box Testing 532
13.3 Operational Profiles 534
      13.3.1 Difficulties in Estimating the Operational Profile 535
      13.3.2 Estimating Reliability 537
13.4 Time/Structure Based Software Reliability Estimation 539
      13.4.1 Definitions and Terminology 539
      13.4.2 Basic Assumptions 540
      13.4.3 Testing Methods and Saturation Effect 541
      13.4.4 Testing Effort 541
      13.4.5 Limits of Testing Methods 542
      13.4.6 Empirical Basis of the Saturation Effect 543
      13.4.7 Reliability Overestimation due to Saturation 545
      13.4.8 Incorporating Coverage in Reliability Estimation 546
      13.4.9 Filtering Failure Data Using Coverage Information 547
      13.4.10 Selecting the Compression Ratio 551
      13.4.11 Handling Rare Events 553
13.5 A Microscopic Model of Software Risk 554
      13.5.1 A Testing-Based Model of Risk Decay 554
      13.5.2 Risk Assessment: An Example 555
      13.5.3 A Simple Risk Computation 558
      13.5.4 A Risk Browser 560
      13.5.5 The Risk Model and Software Reliability 561
13.6 Summary 563
       Problems 563


Chapter 14. Fault-Tolerant Software Reliability Engineering

David McAllister and Mladen Vouk (NCSU)

14.1 Introduction 567
14.2 Present Status 568
14.3 Principles and Terminology 569
      14.3.1 Result Verification 570
      14.3.2 Redundancy 574
      14.3.3 Failures and Faults 575
      14.3.4 Adjudication by Voting 577
      14.3.5 Tolerance 578
14.4 Basic Techniques 581
      14.4.1 Recovery Blocks 581
      14.4.2 N-Version Programming 582
14.5 Advanced Techniques 583
      14.5.1 Consensus Recovery Block 583
      14.5.2 Acceptance Voting 584
      14.5.3 N Self-Checking Programming 584
14.6 Reliability Modeling 585
      14.6.1 Diversity and Dependence of Failures 586
      14.6.2 Data-Domain Modeling 589
      14.6.3 Time-Domain Modeling 594
14.7 Reliability in the Presence of Inter-Version Failure Correlation 596
      14.7.1 An Experiment 596
      14.7.2 Failure Correlation 598
      14.7.3 Consensus Voting 599
      14.7.4 Consensus Recovery Block 601
      14.7.5 Acceptance Voting 603
14.8 Development and Testing of Multi-Version Fault-Tolerant Software 604
      14.8.1 Requirements and Design 605
      14.8.2 Verification, Validation and Testing 606
      14.8.3 Cost of Fault-Tolerant Software 607
14.9 Summary 609
        Problems 609


Chapter 15. Software Reliability Analysis using Fault Trees

Joanne Bechta Dugan (University of Virginia)

15.1 Introduction 615
15.2 Fault Tree Modeling 615
      15.2.1 Cutset Generation 617
      15.2.2 Fault Tree Analysis 619
15.3 Fault Trees as a Design Aid for Software Systems 622
15.4 Safety Validation Using Fault Trees 623
15.5 Analysis of Fault Tolerant Software Systems 627
      15.5.1 Fault Tree Model for Recovery Block System 629
      15.5.2 Fault Tree Model for N-Version Programming System 630
      15.5.3 Fault Tree Model for N Self-Checking Programming System 632
15.6 Qualitative Analysis of Fault Tolerant Software 635
      15.6.1 Methodology for Parameter Estimation from Experimental Data 635
      15.6.2 A Case Study in Parameter Estimation 639
      15.6.3 Comparative Analysis of Three Software Fault Tolerant Systems 642
15.7 System-Level Analysis of Hardware and Software System 645
      15.7.1 System Reliability/Safety Model for DRB 647
      15.7.2 System Reliability/Safety Model for NVP 648
      15.7.3 System Reliability/Safety Model for NSCP 650
      15.7.4 A Case Study in System-Level Analysis 651
15.8 Summary 657
        Problems 657


Chapter 16. Software Reliability Simulation

Robert Tausworthe (Jet Propulsion Laboratory) and Michael R. Lyu (AT&T Bell Labs.)

16.1 Introduction 661
16.2 Reliability Simulation 662
      16.2.1 The Need for Dynamic Simulation 663
      16.2.2 Dynamic Simulation Approaches 664
16.3 The Reliability Process 665
      16.3.1 The Nature of the Process 666
      16.3.2 Structures and Flows 667
      16.3.3 Interdependencies among Elements 668
      16.3.4 Software Environment Characteristics 669
16.4 Artifact-Based Simulation 669
      16.4.1 Simulator Architecture 670
      16.4.2 Results 675
16.5 Rate-Based Simulation 676
      16.5.1 Event Process Statistics 677
      16.5.2 Single-Event Process Simulation 678
      16.5.3 Recurrent Event Statistics 679
      16.5.4 Recurrent Event Simulation 681
      16.5.5 Secondary Event Simulation 682
      16.5.6 Limited Growth Simulation 683
      16.5.7 The General Simulation Algorithm 684
16.6 Rate-Based Reliability 686
      16.6.1 Rate Functions of Conventional Models 686
      16.6.2 Simulator Architecture 687
      16.6.3 Display of Results 689
16.7 The Galileo Project Application 690
      16.7.1 Simulation Experiments and Results 691
      16.7.2 Comparisons with Other Software Reliability Models 694
16.8 Summary 696
        Problems 697


Chapter 17. Neural Networks for SRE

Nachimuthu Karunanithi (Bellcore) and Yashwant Malaiya (Colorado State University)

17.1 Introduction 699
17.2 Neural Networks 700
      17.2.1 Processing Unit 700
      17.2.2 Architecture 702
      17.2.3 Learning Algorithms 705
      17.2.4 Backpropagation Learning 705
      17.2.5 Cascade-Correlation Learning Architecture 707
17.3 Application of Neural Networks for Software Reliability 709
      17.3.1 Dynamic Reliability Growth Modeling 709
      17.3.2 Identifying Fault-Prone Modules 710
17.4 Software Reliability Growth Modeling 710
      17.4.1 Training Regimes 712
      17.4.2 Data Representation Issue 712
      17.4.3 A Prediction Experiment 713
      17.4.4 Analysis of Neural Network Models 718
17.5 Identification of Fault-Prone Software Modules 718
      17.5.1 Identification of Fault-Prone Modules Using Software Metrics 719
      17.5.2 Data Set Used 719
      17.5.3 Classifiers Compared 720
      17.5.4 Data Representation 722
      17.5.5 Training Data Selection 723
      17.5.6 Experimental Approach 723
      17.5.7 Results 723
17.6 Summary 726
        Problems 726


Appendix A. Software Reliability Tools 729


Appendix B. Review of Reliability Theory, Analytical Techniques, and Basic Statistics 747


References 781


Index 821