SoftRel: The Simulation Technique



Overview



The software reliability process simulator SoftRel does not assume

staff, resource, or schedule models; instead, these are supplied as

input quintuples (t_begin, t_end, event, staff, resources).  The simulator

also captures the effects of

interrelationships among activities, and characterizes all events

as piecewise-Poisson Markov processes with explicitly defined event

rate functions, as explained in Chapter 16 of this Handbook.



The set of adjustable parameters input to SoftRel is called the "model";

the set of event status monitors that describes the evolving process at any

given time is called the set of "facts".  The "model" and "facts"

structures are defined so as to accommodate multiple categories, or

classes, of events in the subprocesses of the overall reliability process,

with each "model"-"facts" pair representing a separate class of events.

Because of the usual assumption that event processes are independent,

the same simulation technique could be applied simultaneously by using

separate computer processors running the same algorithms for each class.

If only a single processor were to be used, the same algorithms could

be applied to each class separately, but interleaved in time, or else they

could be run entirely separately.  In entirely separate executions,

the sets of results would be merged later into a proper time sequence.
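
The merging of separately produced results into one time sequence can be
sketched as a two-way merge of logs that are each already in time order.
The record layout "event_record" and the function "merge_event_logs" are
hypothetical names introduced for this illustration; SoftRel's own result
format is not described here.

```c
#include <stddef.h>

/* Hypothetical record of one simulated event from one class. */
typedef struct {
    double time;        /* event time, days */
    int    event_class; /* which "model"-"facts" pair produced it */
} event_record;

/* Merge two logs, each already sorted by time, into out[], which must
   have room for na + nb records.  On equal times the record from log a
   comes first.  Returns the number of records written. */
static size_t merge_event_logs(const event_record *a, size_t na,
                               const event_record *b, size_t nb,
                               event_record *out)
{
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb)
        out[k++] = (b[j].time < a[i].time) ? b[j++] : a[i++];
    while (i < na)
        out[k++] = a[i++];
    while (j < nb)
        out[k++] = b[j++];
    return k;
}
```

Merging more than two classes can be done by applying the same function
repeatedly, since each merged result is again in time order.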



For simplicity, in its initial form, the simulator reported here only

accommodates a single category of events for each of the reliability

subprocesses.  Separate runs using different "model" parameters can be

later merged to simulate performance of a single process that has multiple

failure categories, if desired.  Extension of SoftRel to accommodate the

more general case is not conceptually difficult, but has not yet been

undertaken.  Later versions may include multiple failure categories,

should this feature prove beneficial.



SoftRel simulates two types of failure events, namely, defects in

specification documents and faults in code, all considered to be in

the same seriousness category, as reflected by the single set of "model"

parameters.  As an aside, we note that the seriousness category is often

indicated by the probabilities of observation and outage, and the lengths

of outages: a process with these quantities high will have highly visible

and abortive failures, whereas when these probabilities are low, the

process will have rarely noticed, inconsequential failures.



The "documentation" currently simulated by SoftRel consists only

of requirements, design, interface specifications, and other entities whose

absence or defective nature can introduce faults into subsequently produced code.

Integration and test procedures, management plans, and other ancillary

documentation, when deemed not to correlate directly with fault generation,

are excluded.  The presumption is that the likelihood of a fault

at any given time increases proportionately to the amount of documentation

missing or in error.
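
One possible reading of this proportionality, for the missing-documentation
part only, is a linear rate: a base fault rate plus a term that grows with
the fraction of documentation not yet built.  The sketch below is an
assumption made for illustration, not SoftRel's actual rate function.  The
function name "faults_per_code_unit" is hypothetical, and the arguments
borrow the names of the "model" parameters "faults_per_unit" and
"miss_doc_fault_rate"; the contribution of defective documentation is
omitted.

```c
/* Hypothetical linear form: expected faults injected per new code unit,
   given how much of the documentation goal has been built so far. */
static double faults_per_code_unit(double faults_per_unit,
                                   double miss_doc_fault_rate,
                                   double doc_units_built,
                                   double doc_units_goal)
{
    double missing = 0.0;  /* fraction of documentation still missing */
    if (doc_units_goal > 0.0) {
        missing = 1.0 - doc_units_built / doc_units_goal;
        if (missing < 0.0)
            missing = 0.0; /* more built than planned: nothing missing */
    }
    return faults_per_unit + miss_doc_fault_rate * missing;
}
```

Under this form the rate falls back to the base value "faults_per_unit"
once all planned documentation exists.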



SoftRel does not currently simulate the propagation of missing and

defective requirements into missing and defective design and interface

specifications; both requirements analysis and design activities are

currently combined in the document construction and integration

phases.  All defects occur either in proportion to the amount of new and

reused documentation, to the amount that was changed, deleted, and added,

or to the number of defects that were reworked.




The Simulation Input Parameters

The reliability process in SoftRel is fairly comprehensive with respect
to what really transpires during software development.  The capability
to mirror that process in a simulator requires a large number of
parameters relating to the ways in which people and processes interact.
The SoftRel "model" parameters are the following:

Model parameters (fixed per execution):

  dt                           simulation time increment, days
  workday_fraction             average workdays per calendar day
  doc_new_size                 new documentation units
  doc_reuse_base               reused documentation units
  doc_reuse_deleted            reused documentation units deleted
  doc_reuse_added              documentation units added to reuse base
  doc_reuse_changed            documentation units changed in reuse
  doc_build_rate               new documentation units/workday
  doc_reuse_acq_rate           reused documentation acquisition units/workday
  doc_reuse_del_rate           reused documentation deletion units/workday
  doc_reuse_add_rate           reused documentation addition units/workday
  doc_reuse_chg_rate           reused documentation changed units/workday
  defects_per_unit             defects generated/new documentation unit
  reuse_defect_rate            reused documentation indigenous defects/unit
  del_defect_rate              defects inserted/deleted reused unit
  add_defect_rate              defects inserted/addition to reused unit
  chg_defect_rate              defects inserted/changed reused unit
  hazard_per_defect            documentation hazard units added or removed
                               per defect
  new_doc_inspect_frac         fraction of new documentation inspected
  reuse_doc_inspect_frac       fraction of reuse documentation inspected
  insp_doc_units_per_workday   inspected documentation units/workday
  inspection_limit             relative number of defects that can be
                               removed by inspection
  find_rate_per_defect         rate of defect discovery per hazard unit per
                               documentation unit
  defect_fix_rate              corrected documentation defects/workday
  defect_fix_adequacy          true documentation fixes/correction
  new_defects_per_fix          defects created/correction
  doc_del_per_defect           documentation units deleted/correction
  doc_add_per_defect           documentation units added/correction
  doc_chg_per_defect           documentation units changed/correction
  code_new_size                new code units
  code_reuse_base              reused code units
  code_reuse_deleted           reused code units deleted
  code_reuse_added             code units added to reuse base
  code_reuse_changed           code units otherwise changed in reuse base
  code_build_rate              new code units/workday
  code_reuse_acq_rate          reused code acquired, units/workday
  code_reuse_del_rate          reused code deletions, units/workday
  code_reuse_add_rate          reused code additions, units/workday
  code_reuse_chg_rate          reused code changed, units/workday
  faults_per_unit              faults generated/code unit
  reuse_fault_rate             indigenous reused code faults/code unit
  del_fault_rate               faults inserted/deleted code unit
  add_fault_rate               faults inserted/added code unit
  chg_fault_rate               faults inserted/changed code unit
  faults_per_defect            number of code faults/defect
  miss_doc_fault_rate          faults/code unit generated per missing
                               documentation fraction
  hazard_per_fault             code hazard units added or removed per fault
  new_code_inspect_frac        fraction of new code inspected
  reuse_code_inspect_frac      fraction of reused code inspected
  insp_code_units_per_workday  inspected code units/workday
  find_rate_per_fault          fraction of faults detected per inspected unit
  fault_fix_rate               code faults "corrected"/workday
  fault_fix_adequacy           true fault fixes/"correction"
  new_faults_per_fix           faults created/"correction"
  code_del_per_fault           code units deleted per fault "correction"
  code_add_per_fault           code units added per fault "correction"
  code_chg_per_fault           code units changed per fault "correction"
  tests_gen_per_workday        test cases/workday
  tests_used_per_unit          test cases used/resource unit
  failure_rate_per_fault       failures/resource unit/fault density
  miss_code_fail_rate          failures per resource unit per missing code
                               fraction
  prob_observation             probability that failure is observed
  prob_outage                  probability that a failure causes outage
  outage_time_per_failure      delay caused by failure, days
  analysis_rate                failures analyzed/workday
  analysis_adequacy            faults recognized/fault analyzed
  repair_rate                  fault "repairs"/workday
  repair_adequacy              true repairs/"repair"
  new_faults_per_repair        faults created/"repair"
  validation_rate              "repairs" validated/workday
  find_rate_per_fix            detected bad repairs/repair validation
  retest_rate                  retested faults/resource unit
  retest_adequacy              detected bad repairs/retest/unrepaired fault
  schedule                     schedule_item list:
                               (t_begin, t_end, event, staff, resources)*
                               packets

When the work effort expended by an activity is needed, it may be
computed by using the instantaneous staffing, or work force, function
s(alpha, t) defined for each such activity alpha over the time periods
of applicability.  The corresponding work effort w(alpha, T) over a time
interval (0, T), for example, is

    w(alpha, T) = \int_0^T s(alpha, t) dt

In SoftRel, s(alpha, t) is coded as "staffing(A, p, M)", where "A" is
the activity, "p" points to a "facts" structure, and "M" points to a
"model".

Similarly, if computer CPU time, or another computer resource, is
required in calculating the event-rate functions above, it is found
through the conversion function q(alpha, t), which is defined for each
activity alpha as the CPU or resource utilization per wall-clock day.
The CPU resource usage over the time interval (0, T), for activity
alpha, for example, is

    T_cpu(alpha, T) = \int_0^T q(alpha, t) dt

The function q(alpha, t) in SoftRel appears as "resource(A, p, M)", with
the same arguments as "staffing", above.

The number of wall-clock days may be interpreted either as literal
calendar days, or as actual workdays.  These alternatives are selected
by proper designation of the "model" parameter, "workday_fraction".  A
value of unity signifies that time and effort accumulate on the basis of
one workday effort per schedule day per individual.  A value of 5/7
means that work effort and resource utilization accumulate on the
average only during 5 of the 7 days of the week.
A value of 230/365 denotes that 230 actual workdays span 365 calendar
days.  These compensations are made in the "staffing" and "resource"
functions, above.

Activities of the life cycle are controlled by the staffing function.
No progress in an activity takes place unless it has an allocated work
force.  If, however, staffing is non-zero, event rates involve
s(alpha, t) when work effort dependencies exist, and q(alpha, t) when
CPU dependencies are manifest.

Staffing and computer resource allocations in the "model" are made via
the "schedule" list of "schedule item" packets, each of which contains

  "activity"  =  index of the work activity
  "t_begin"   =  beginning time of the activity, days
  "t_end"     =  ending time of the activity, days
  "staffing"  =  staff level of the activity, persons
  "cpu"       =  resources available, units per day
  "next"      =  pointer to next "schedule item" packet

The entire list is merged by the staffing and resource-utilization
functions, s and q, or "staffing" and "cpu" in the program, to provide
scheduled workforce and computer resources at each instant of time
throughout the process.  Both "staffing" and "cpu" express resource
units per project day.  If the schedule quintuples include weekends,
holidays, and vacations, then staff and resource values must be
compensated so that the integrated staff and resources over the project
schedule are the allocated total effort and resource values.  This is
done via the parameter "workday_fraction" discussed above.
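
As a rough illustration of how a schedule list might yield s(alpha, t)
and the work effort w(alpha, T), the sketch below walks a linked list of
packets, sums the staff of every packet that covers time t, scales the
result by "workday_fraction", and integrates numerically.  The struct
layout follows the packet fields listed above, but the names
"staffing_at" and "work_effort", the simplified argument lists (SoftRel's
own "staffing(A, p, M)" takes "facts" and "model" pointers instead), the
half-open interval convention, the summing of overlapping packets, and
the midpoint rule are all assumptions.

```c
#include <stddef.h>

/* Hypothetical packet layout following the fields listed above. */
typedef struct schedule_item {
    int    activity;            /* index of the work activity */
    double t_begin, t_end;      /* active over [t_begin, t_end), days */
    double staffing;            /* persons */
    double cpu;                 /* resource units per day */
    struct schedule_item *next; /* next packet, or NULL */
} schedule_item;

/* s(alpha, t): sum the staff of every packet for this activity that
   covers time t, then scale by workday_fraction. */
static double staffing_at(const schedule_item *list, int activity,
                          double t, double workday_fraction)
{
    double staff = 0.0;
    for (const schedule_item *p = list; p != NULL; p = p->next)
        if (p->activity == activity && t >= p->t_begin && t < p->t_end)
            staff += p->staffing;
    return staff * workday_fraction;
}

/* w(alpha, T): midpoint-rule approximation of the integral of
   s(alpha, t) over (0, T), using steps of length dt. */
static double work_effort(const schedule_item *list, int activity,
                          double T, double dt, double workday_fraction)
{
    double effort = 0.0;
    long steps = (long)(T / dt + 0.5);
    for (long i = 0; i < steps; i++)
        effort += staffing_at(list, activity, (i + 0.5) * dt,
                              workday_fraction) * dt;
    return effort;
}
```

For example, two packets for the same activity, one with 2 persons over
days 0 to 10 and one with 1 person over days 5 to 10, give 25
person-days of effort when "workday_fraction" is unity, and half that
when it is 0.5.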

Event Status Monitors: Output

The event status indicators of interest, or "facts", during the
reliability process are the time-dependent values

Project Status ("facts" output for each dt iteration):

  active   Project is active if true, else completed
  DU       Total documentation units goal
  DU_t     Total number of documentation units built
  DU_n     New documentation units
  DU_r     Acquired reused documentation units
  DU_rd    Reused documentation deleted units
  DU_ra    Reused documentation additional units
  DU_rc    Reused documentation changed units
  E_d      Human errors putting defects in all documentation
  E_dn     Human errors putting defects in new documentation
  E_dr     Human errors putting defects in reused documentation
  DH       Total documentation hazard
  DH_n     Hazard in new documentation
  DH_r     Hazard in reused documentation
  DI_t     Inspected portion of all documentation
  DI_n     Inspected portion of new documentation
  DI_r     Inspected portion of reused documentation
  D        Documentation defects detected
  d        Documentation defects corrected
  CU       Total code units goal
  CU_t     Total code units built
  CU_n     New code units
  CU_r     Acquired reused code units
  CU_rd    Reused code deleted units
  CU_ra    Reused code additional units
  CU_rc    Reused code changed units
  E_f      Human errors putting faults in all code
  E_fn     Human errors putting faults in new code
  E_fr     Human errors putting faults in reused code
  CH       Total code hazard
  CH_n     New code hazard
  CH_r     Reused code hazard
  CI_t     Inspected portion of all code
  CI_n     Inspected portion of new code
  CI_r     Inspected portion of reused code
  e        Code faults detected in inspection
  h        Code inspection faults corrected (healed)
  C        Test cases prepared
  c        Test cases expended
  F        Failures encountered during testing
  A        Failures analyzed for fault
  f        Faults isolated by testing
  w        Faults needing rework, revalidation, etc.
  u        Number of faulty repairs
  R        Number of fault repairs undertaken
  V        Validations conducted of fault repairs
  RT       Retests conducted
  r        Faults actually repaired
  rr       Faults re-repaired
  outage   Total outage time due to failure
  t        Current accumulated time
  T[N]     Time by activity array
  W[N]     Work effort by activity array
  cpu[N]   CPU/resource usage by activity array

"Documentation units" and "code units" are typically counted in pages of
specifications and lines of source code, but other conventions are
acceptable, provided that rate functions and parameters of the "model"
are consistently defined.

Other status metrics ("facts") of interest are

  "t"       =  Current time.
  "T[i]"    =  Cumulative time consumed by activity "i".
  "W[i]"    =  Cumulative work effort consumed by activity "i".
  "cpu[i]"  =  Cumulative CPU or other computer resource consumed by
               activity "i".
  "outage"  =  Total outage time due to failure.
  "active"  =  Boolean indicator, true if the process has not yet
               terminated.

Note that the time-related quantities above, which are measured in days,
are expressed as elapsed wall-clock time.  Conversions to effort in
workdays and to CPU (or other) computer resource utilization in
resource-days are "model"-related and were addressed previously.
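
The time-related "facts" can be pictured with a small bookkeeping sketch
that advances the clock by one increment dt and accumulates time, work
effort, and resource use for each staffed activity.  Only a subset of
the "facts" is shown.  The struct name "time_facts", the function
"advance_time", the fixed array size, and the choice to accumulate
"T[i]" only while activity "i" is staffed are assumptions made for the
example rather than SoftRel's actual definitions.

```c
#define N_ACTIVITIES 4  /* hypothetical number of activities */

/* Hypothetical subset of the "facts" structure: time bookkeeping only. */
typedef struct {
    int    active;             /* nonzero until the process terminates */
    double t;                  /* current time, days */
    double T[N_ACTIVITIES];    /* cumulative time per activity, days */
    double W[N_ACTIVITIES];    /* cumulative work effort per activity */
    double cpu[N_ACTIVITIES];  /* cumulative resource use per activity */
} time_facts;

/* Advance the clock by dt.  staff[i] and rate[i] are the effective
   staffing and resource levels of activity i during this step. */
static void advance_time(time_facts *f, double dt,
                         const double staff[N_ACTIVITIES],
                         const double rate[N_ACTIVITIES])
{
    for (int i = 0; i < N_ACTIVITIES; i++) {
        if (staff[i] > 0.0) {  /* unstaffed activities make no progress */
            f->T[i]   += dt;
            f->W[i]   += staff[i] * dt;
            f->cpu[i] += rate[i] * dt;
        }
    }
    f->t += dt;
}
```

Because the accumulated values are kept in wall-clock days, any
conversion to workdays would be applied to the staff and rate inputs
before this step, consistent with the role of "workday_fraction".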