SoftRel: The Simulation Technique


Overview



The software reliability process simulator SoftRel does not presuppose any particular staffing, resource, or schedule model; instead, it accepts these as inputs in the form of schedule quintuples. The simulator also captures the effects of interrelationships among activities, and characterizes all events as piecewise-Poisson Markov processes with explicitly defined event-rate functions, as explained in Chapter 16 of this Handbook.
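The idea of a piecewise-Poisson process can be illustrated with a minimal sketch. The event rate is evaluated once per time increment dt and held constant across that increment, so the number of events in it is Poisson-distributed with mean rate × dt. The rate function below is hypothetical. Knuth's multiplication method for Poisson sampling is just one convenient choice. Neither is taken from SoftRel's code.

```python
import math
import random
from typing import Callable, List


def poisson_draw(mean: float, rng: random.Random) -> int:
    """Sample a Poisson(mean) count with Knuth's multiplication method,
    which is adequate for the small means produced by a short dt."""
    if mean <= 0.0:
        return 0
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_events(rate_fn: Callable[[float], float], t_end: float,
                    dt: float, seed: int = 1) -> List[int]:
    """Piecewise-Poisson process: rate_fn is evaluated once per step and held
    constant over [t, t + dt). Returns the cumulative event count after
    each step."""
    rng = random.Random(seed)
    steps = int(round(t_end / dt))
    total, history = 0, []
    for k in range(steps):
        total += poisson_draw(rate_fn(k * dt) * dt, rng)
        history.append(total)
    return history


# Hypothetical rate: 2 events/day for the first 10 days, 0.5 events/day after.
counts = simulate_events(lambda t: 2.0 if t < 10.0 else 0.5, t_end=20.0, dt=0.25)
print(len(counts), counts[-1])
```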



The set of adjustable parameters input to SoftRel is called the "model"; the set of event status monitors that describes the evolving process at any given time is called the set of "facts". The "model" and "facts" structures are defined so as to accommodate multiple categories, or classes, of events in the subprocesses of the overall reliability process, with each "model"-"facts" pair representing a separate class of events. Event processes are usually assumed to be independent. The same simulation technique could therefore be applied to all classes simultaneously, with separate computer processors running the same algorithms for each class. On a single processor, the algorithms could instead be applied to each class in turn, interleaved in time. Alternatively, each class could be run in an entirely separate execution, with the sets of results merged later into a proper time sequence.
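If the classes are run as entirely separate executions, the merge step amounts to a k-way merge of time-ordered logs. The sketch below assumes, hypothetically, that each run produces a list of time-stamped event records already sorted by time. Python's standard `heapq.merge` then interleaves them into one time sequence. The record layout and event labels are illustrative only.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Event(NamedTuple):
    t: float          # simulated time of the event, days
    event_class: str  # which "model"-"facts" pair produced it (hypothetical)
    kind: str         # hypothetical event label, e.g. "failure"


def merge_runs(*runs: Iterable[Event]) -> List[Event]:
    """Merge event logs from separately executed class simulations.
    Each run must already be sorted by time; heapq.merge interleaves
    them lazily without re-sorting the combined data."""
    return list(heapq.merge(*runs, key=lambda e: e.t))


run_a = [Event(1.0, "A", "failure"), Event(4.5, "A", "failure")]
run_b = [Event(0.5, "B", "defect_found"), Event(3.0, "B", "failure")]
merged = merge_runs(run_a, run_b)
print([(e.t, e.event_class) for e in merged])
```

Because each input is already sorted, the merge costs O(n log k) for n records from k runs rather than a full re-sort.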



For simplicity, the initial form of the simulator reported here accommodates only a single category of events for each of the reliability subprocesses. If desired, separate runs using different "model" parameters can later be merged to simulate the performance of a single process that has multiple failure categories. Extending SoftRel to accommodate the more general case is not conceptually difficult, but has not yet been undertaken. Later versions may include multiple failure categories, should this feature prove beneficial.



SoftRel simulates two types of failure events, namely, defects in specification documents and faults in code. All are considered to be in the same seriousness category, as reflected by the single set of "model" parameters. As an aside, we note that the seriousness category is often indicated by the probabilities of observation and outage and by the lengths of outages. A process in which these quantities are high will have highly visible and abortive failures, whereas a process in which they are low will have rarely noticed, inconsequential failures.
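As a rough illustration of how these parameters express seriousness, the sketch below computes two expected quantities for a given number of failures: observed failures and total outage time. It uses the "model" parameters prob_observation, prob_outage, and outage_time_per_failure. Treating each failure independently is an assumption for illustration, and so are the two parameter sets shown.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Seriousness:
    prob_observation: float         # probability that a failure is observed
    prob_outage: float              # probability that a failure causes an outage
    outage_time_per_failure: float  # delay caused by each outage, days


def expected_impact(failures: float, s: Seriousness) -> Tuple[float, float]:
    """Expected observed failures and expected total outage time (days) for a
    given number of failures, treating each failure independently."""
    observed = failures * s.prob_observation
    outage_days = failures * s.prob_outage * s.outage_time_per_failure
    return observed, outage_days


# Two hypothetical parameter sets at opposite ends of the seriousness scale.
severe = Seriousness(prob_observation=0.95, prob_outage=0.5, outage_time_per_failure=2.0)
benign = Seriousness(prob_observation=0.10, prob_outage=0.01, outage_time_per_failure=0.1)
print(expected_impact(40, severe))
print(expected_impact(40, benign))
```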



The "documentation" currently simulated by SoftRel consists only of requirements, design, interface specifications, and other entities whose absence or defective nature can introduce faults into subsequently produced code. Integration and test procedures, management plans, and other ancillary documentation deemed not to correlate directly with fault generation are excluded. The presumption is that the likelihood of a fault at any given time increases in proportion to the amount of documentation missing or in error.
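One way to read this presumption is through the "model" parameter miss_doc_fault_rate, defined as faults per code unit generated per missing documentation fraction. The sketch below assumes that the missing fraction is 1 - DU_t/DU and that its term simply adds to the baseline faults_per_unit. It covers only missing documentation, not documentation in error, and is not SoftRel's actual rate expression.

```python
def missing_doc_fraction(du_built: float, du_goal: float) -> float:
    """Fraction of the documentation goal not yet built, taken here to be
    1 - DU_t/DU and clipped to [0, 1] (an assumption)."""
    if du_goal <= 0:
        return 0.0
    return min(1.0, max(0.0, 1.0 - du_built / du_goal))


def expected_new_code_faults(code_units: float, faults_per_unit: float,
                             miss_doc_fault_rate: float,
                             du_built: float, du_goal: float) -> float:
    """Expected faults injected while producing code_units of new code: the
    baseline faults_per_unit plus a term proportional to the fraction of
    documentation still missing."""
    missing = missing_doc_fraction(du_built, du_goal)
    return code_units * (faults_per_unit + miss_doc_fault_rate * missing)


# Hypothetical values: 1000 code units, documentation complete vs. 60% complete.
print(expected_new_code_faults(1000, 0.02, 0.05, du_built=100, du_goal=100))
print(expected_new_code_faults(1000, 0.02, 0.05, du_built=60, du_goal=100))
```

With only 60% of the documentation in place, the expected fault count in this example doubles, from 20 to 40.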



SoftRel does not currently simulate the propagation of missing and defective requirements into missing and defective design and interface specifications; requirements analysis and design activities are both currently combined in the document construction and integration phases. All defects occur in proportion to one of three quantities: the amount of new and reused documentation; the amount that was changed, deleted, or added; or the number of defects that were reworked.
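These proportionalities can be written as an expected defect count, one term per source. The sketch uses the "model" rate parameters and the matching "facts" (DU_n, DU_r, DU_rd, DU_ra, DU_rc, and d). It is inferred from the parameter units: SoftRel applies such rates incrementally within each dt, and its exact expressions may differ. The numbers in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DocDefectRates:
    defects_per_unit: float     # defects generated/new documentation unit
    reuse_defect_rate: float    # indigenous defects/reused documentation unit
    del_defect_rate: float      # defects inserted/deleted reused unit
    add_defect_rate: float      # defects inserted/addition to reused unit
    chg_defect_rate: float      # defects inserted/changed reused unit
    new_defects_per_fix: float  # defects created/correction


def expected_doc_defects(m: DocDefectRates, du_n: float, du_r: float,
                         du_rd: float, du_ra: float, du_rc: float,
                         d: float) -> float:
    """Expected documentation defects: one term per source, each proportional
    to the corresponding documentation amount or to corrections made (d)."""
    return (m.defects_per_unit * du_n      # new documentation
            + m.reuse_defect_rate * du_r   # acquired reused documentation
            + m.del_defect_rate * du_rd    # deletions from the reuse base
            + m.add_defect_rate * du_ra    # additions to the reuse base
            + m.chg_defect_rate * du_rc    # changes to the reuse base
            + m.new_defects_per_fix * d)   # defects injected by rework


rates = DocDefectRates(0.1, 0.02, 0.05, 0.1, 0.08, 0.05)  # hypothetical values
print(expected_doc_defects(rates, du_n=200, du_r=500, du_rd=20,
                           du_ra=30, du_rc=50, d=40))
```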





The Simulation Input Parameters



The reliability process modeled in SoftRel is fairly comprehensive with respect to what actually transpires during software development. Mirroring that process in a simulator therefore requires a large number of parameters relating to the ways in which people and processes interact.



The SoftRel "model" parameters are the following:



Model parameters (fixed per execution):



dt				simulation time increment, days

workday_fraction		average workdays per calendar day



doc_new_size		  	new documentation units

doc_reuse_base			reused documentation units

doc_reuse_deleted		reused documentation units deleted

doc_reuse_added 		documentation units added to reuse base

doc_reuse_changed		documentation units changed in reuse



doc_build_rate			new documentation units/workday

doc_reuse_acq_rate		reused documentation acquisition units/workday

doc_reuse_del_rate		reused documentation deletion units/workday

doc_reuse_add_rate		reused documentation addition units/workday

doc_reuse_chg_rate		reused documentation changed units/workday



defects_per_unit		defects generated/new documentation unit

reuse_defect_rate		reused documentation indigenous defects/unit

del_defect_rate 		defects inserted/deleted reused unit

add_defect_rate 		defects inserted/addition to reused unit

chg_defect_rate 		defects inserted/changed reused unit

hazard_per_defect		documentation hazard units added or removed

				per defect



new_doc_inspect_frac		fraction of new documentation inspected

reuse_doc_inspect_frac		fraction of reuse documentation inspected

insp_doc_units_per_workday	inspected documentation units/workday

inspection_limit		relative number of defects that can be

				    removed by inspection

find_rate_per_defect	  	rate of defect discovery per hazard unit

				per documentation unit



defect_fix_rate 		corrected documentation defects/workday

defect_fix_adequacy		true documentation fixes/correction

new_defects_per_fix	  	defects created/correction

doc_del_per_defect		documentation units deleted/correction

doc_add_per_defect		documentation units added/correction

doc_chg_per_defect		documentation units changed/correction



code_new_size		  	new code units

code_reuse_base 	  	reused code units

code_reuse_deleted	  	reused code units deleted

code_reuse_added		code units added to reuse base

code_reuse_changed		code units otherwise changed in reuse base



code_build_rate 	  	new code units/workday

code_reuse_acq_rate		reused code acquired, units/workday

code_reuse_del_rate		reused code deletions, units/workday

code_reuse_add_rate		reused code additions, units/workday

code_reuse_chg_rate		reused code changed, units/workday



faults_per_unit 	  	faults generated/code unit

reuse_fault_rate		indigenous reused code faults/code unit

del_fault_rate			faults inserted/deleted code unit

add_fault_rate			faults inserted/added code unit

chg_fault_rate			faults inserted/changed code unit

faults_per_defect		number of code faults/defect

miss_doc_fault_rate		faults/code unit generated per missing

				documentation fraction

hazard_per_fault		code hazard units added or removed per fault



new_code_inspect_frac		fraction of new code inspected

reuse_code_inspect_frac 	fraction of reused code inspected

insp_code_units_per_workday	inspected code units/workday

find_rate_per_fault		fraction of faults detected per inspected unit



fault_fix_rate			code faults "corrected"/workday

fault_fix_adequacy		true fault fixes/"correction"

new_faults_per_fix		faults created/"correction"

code_del_per_fault		code units deleted per fault "correction"

code_add_per_fault		code units added per fault "correction"

code_chg_per_fault		code units changed per fault "correction"



tests_gen_per_workday	  	test cases/workday

tests_used_per_unit		test cases used/resource unit



failure_rate_per_fault		failures/resource unit/fault density

miss_code_fail_rate		failures per resource unit per missing code

				fraction

prob_observation		probability that failure is observed

prob_outage			probability that a failure causes outage

outage_time_per_failure 	delay caused by failure, days



analysis_rate		  	failures analyzed/workday

analysis_adequacy		faults recognized/fault analyzed



repair_rate			fault "repairs"/workday

repair_adequacy 		true repairs/"repair"

new_faults_per_repair		faults created/"repair"



validation_rate 		"repairs" validated/workday

find_rate_per_fix		detected bad repairs/repair validation

retest_rate		     	retested faults/resource unit

retest_adequacy 		detected bad repairs/retest/unrepaired fault



schedule			list of schedule_item packets:

				    (t_begin, t_end, activity, staff, resources)*





When the work effort expended by an activity is needed, it may be computed from the instantaneous staffing, or work force, function s(alpha, t), defined for each activity alpha over the time periods in which that activity applies. The corresponding work effort w(alpha, T) over a time interval (0, T), for example, is



        w(\alpha, T) = \int_0^T s(\alpha, t) \, dt
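In a time-stepped simulator this integral is naturally approximated by accumulating s(alpha, t)·dt at every step. The sketch below applies the left-endpoint rule on a grid of step dt to a hypothetical staffing profile. The same function would approximate the CPU-usage integral defined next if given q(alpha, t) instead. The standalone signature is an assumption and differs from SoftRel's staffing(A, p, M).

```python
from typing import Callable


def work_effort(s: Callable[[float], float], T: float, dt: float) -> float:
    """Approximate w(alpha, T) = integral of s(alpha, t) over (0, T) by the
    left-endpoint rule on a grid of step dt, i.e. the per-step accumulation
    W += s(t) * dt that a time-stepped simulator performs."""
    steps = int(round(T / dt))
    return sum(s(k * dt) for k in range(steps)) * dt


def staff_profile(t: float) -> float:
    """Hypothetical staffing: 3 persons on days [0, 20), 5 persons thereafter."""
    return 3.0 if t < 20.0 else 5.0


print(work_effort(staff_profile, T=40.0, dt=0.5))  # 3*20 + 5*20 = 160.0 person-days
```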



In SoftRel, s(alpha, t) is coded as "staffing(A, p, M)", where "A" is the activity, "p" points to a "facts" structure, and "M" points to a "model" structure.



Similarly, if computer CPU time or another computer resource is required in calculating the event-rate functions, it is found through the conversion function q(alpha, t). This function is defined for each activity alpha as the CPU or other resource utilization per wall-clock day. The CPU resource usage of activity alpha over the time interval (0, T), for example, is



        T_{\mathrm{cpu}}(\alpha, T) = \int_0^T q(\alpha, t) \, dt



The function q(alpha, t) appears in SoftRel as "resource(A, p, M)", with the same arguments as "staffing" above.



The number of wall-clock days may be interpreted either as literal calendar days or as actual workdays. These alternatives are selected by proper designation of the "model" parameter "workday_fraction". A value of unity signifies that time and effort accumulate on the basis of one workday of effort per schedule day per individual. A value of 5/7 means that work effort and resource utilization accumulate, on average, during only 5 of the 7 days of the week. A value of 230/365 denotes that 230 actual workdays span 365 calendar days. These compensations are made in the "staffing" and "resource" functions described above.
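The effect of workday_fraction on accrued effort can be seen with simple arithmetic. Consistent with the paragraph above, the sketch assumes the compensation is a plain scaling of staff by workday_fraction for every calendar day. Exact fractions keep the 5/7 and 230/365 cases exact. The staff level and span are hypothetical.

```python
from fractions import Fraction
from typing import Union

Number = Union[int, float, Fraction]


def accrued_effort(staff: Number, calendar_days: Number,
                   workday_fraction: Number) -> Number:
    """Effort (person-workdays) accrued by a constant staff level held over a
    span of calendar days, with workday_fraction scaling each calendar day
    down to the average fraction of it that is actually worked."""
    return staff * calendar_days * workday_fraction


staff, span = 4, 365  # hypothetical: 4 persons for one calendar year
for label, wf in [("unity", Fraction(1)), ("5/7", Fraction(5, 7)),
                  ("230/365", Fraction(230, 365))]:
    print(label, float(accrued_effort(staff, span, wf)))
```

With 4 persons over 365 calendar days, a workday_fraction of 230/365 yields exactly 920 person-workdays (4 × 230), whereas unity yields 1460.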



Activities of the life cycle are controlled by the staffing function: no progress in an activity takes place unless it has an allocated work force. When staffing is non-zero, event rates involve s(alpha, t) where work-effort dependencies exist, and q(alpha, t) where CPU dependencies exist.



Staffing and computer resource allocations in the "model" are made via the "schedule" list of "schedule item" packets, each of which contains:

        "activity" = index of the work activity
        "t_begin"  = beginning time of the activity, days
        "t_end"    = ending time of the activity, days
        "staffing" = staff level of the activity, persons
        "cpu"      = resources available, units per day
        "next"     = pointer to next "schedule item" packet



The entire list is merged by the staffing and resource-utilization functions, s and q ("staffing" and "resource" in the program), to provide the scheduled work force and computer resources at each instant of time throughout the process. Both the "staffing" and "cpu" fields express resource units per project day. If the schedule quintuples span weekends, holidays, and vacations, then staff and resource values must be compensated so that the integrated staff and resources over the project schedule equal the allocated total effort and resource values. This is done via the "model" parameter "workday_fraction", discussed above.
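The schedule list and the merge performed by s and q can be sketched together. The record below mirrors the "schedule item" fields listed earlier, including the next pointer. The staffing() and resource() functions sum every packet for the requested activity whose interval covers t, then scale the total by workday_fraction. Their simplified signatures (activity, t, list head, workday_fraction) are hypothetical stand-ins for SoftRel's staffing(A, p, M) and resource(A, p, M). Treating each interval as half-open, [t_begin, t_end), is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class ScheduleItem:
    activity: int       # index of the work activity
    t_begin: float      # beginning time of the activity, days
    t_end: float        # ending time of the activity, days
    staffing: float     # staff level of the activity, persons
    cpu: float          # resources available, units per day
    next: Optional[ScheduleItem] = None  # next "schedule item" packet


def packets(head: Optional[ScheduleItem]) -> Iterator[ScheduleItem]:
    """Walk the "schedule" list by following each packet's next pointer."""
    while head is not None:
        yield head
        head = head.next


def _active(head: Optional[ScheduleItem], activity: int,
            t: float) -> Iterator[ScheduleItem]:
    """Packets for this activity whose interval [t_begin, t_end) covers t."""
    return (p for p in packets(head)
            if p.activity == activity and p.t_begin <= t < p.t_end)


def staffing(activity: int, t: float, head: Optional[ScheduleItem],
             workday_fraction: float) -> float:
    """s(alpha, t): merged staff level, compensated by workday_fraction."""
    return sum(p.staffing for p in _active(head, activity, t)) * workday_fraction


def resource(activity: int, t: float, head: Optional[ScheduleItem],
             workday_fraction: float) -> float:
    """q(alpha, t): merged resource rate, compensated by workday_fraction."""
    return sum(p.cpu for p in _active(head, activity, t)) * workday_fraction


# Hypothetical list: activity 0 has two overlapping packets, activity 1 has one.
schedule = ScheduleItem(
    0, 0.0, 30.0, 2.0, 0.5,
    ScheduleItem(0, 20.0, 50.0, 3.0, 1.0,
                 ScheduleItem(1, 30.0, 90.0, 4.0, 2.0)))
print(staffing(0, 25.0, schedule, 1.0))  # overlapping packets: 2 + 3 persons
print(staffing(1, 10.0, schedule, 1.0))  # activity 1 not yet staffed
```

A zero result, as for activity 1 at t = 10 in the example, corresponds to an unstaffed activity. By the rule stated earlier, such an activity makes no progress.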





Event Status Monitors: Output



The event status indicators of interest, or "facts", during the reliability process are the following time-dependent values:



Project Status ("facts" output for each dt iteration):



active      Project is active if true, else completed



DU         Total documentation units goal

DU_t       Total number of documentation units built

DU_n       New documentation units

DU_r       Acquired reused documentation units

DU_rd      Reused documentation deleted units

DU_ra      Reused documentation additional units

DU_rc      Reused documentation changed units

E_d        Human errors putting defects in all documentation

E_dn       Human errors putting defects in new documentation

E_dr       Human errors putting defects in reused documentation

DH         Total documentation hazard

DH_n       Hazard in new documentation

DH_r       Hazard in reused documentation

DI_t       Inspected portion of all documentation

DI_n       Inspected portion of new documentation

DI_r       Inspected portion of reused documentation

D          Documentation defects detected

d          Documentation defects corrected



CU         Total code units goal

CU_t       Total code units built

CU_n       New code units

CU_r       Acquired reused code units

CU_rd      Reused code deleted units

CU_ra      Reused code additional units

CU_rc      Reused code changed units

E_f        Human errors putting faults in all code

E_fn       Human errors putting faults in new code

E_fr       Human errors putting faults in reused code

CH         Total code hazard

CH_n       New code hazard

CH_r       Reused code hazard

CI_t       Inspected portion of all code

CI_n       Inspected portion of new code

CI_r       Inspected portion of reused code

e          Code faults detected in inspection

h          Code inspection faults corrected (healed)



C          Test cases prepared

c          Test cases expended

F          Failures encountered during testing

A          Failures analyzed for faults

f          Faults isolated by testing

w          Faults needing rework, revalidation, etc.

u          Number of faulty repairs

R          Number of fault repairs undertaken

V          Validations conducted of fault repairs

RT         Retests conducted

r          Faults actually repaired

rr         Faults re-repaired

outage     Total outage time due to failure

t          Current accumulated time

T[N]       Time by activity array

W[N]       Work effort by activity array

cpu[N]     CPU/resource usage by activity array





"Documentation units" and "code units" are typically counted as pages of specifications and lines of source code, respectively, but other conventions are acceptable, provided that the rate functions and parameters of the "model" are consistently defined.



Several of these status "facts" merit further explanation:

        "t"      = Current time.
        "T[i]"   = Cumulative time consumed by activity "i".
        "W[i]"   = Cumulative work effort consumed by activity "i".
        "cpu[i]" = Cumulative CPU or other computer resource consumed by activity "i".
        "outage" = Total outage time due to failure.
        "active" = Boolean indicator, true if the process has not yet terminated.



Note that the time-related facts above, which measure time in days, are expressed as elapsed wall-clock time. Conversions to effort in workdays and to CPU (or other) computer resource utilization in resource-days are "model"-related and were addressed previously.