When we talk about high-integrity technical systems, we have technical systems in mind that provide their functions with a reliable level of service and degrade gracefully if that level of service cannot be sustained for whatever reason. Graceful degradation has two aspects. First, the failure is contained in order to minimise its impact on the system and beyond. Second, human operators have reasonable time for corrective actions or for preparing for the loss of function. For most technical systems, a total loss of system function is tolerable, provided the system falls into a fail-safe mode that does not adversely affect other functions. Sometimes fail-operative behaviour is demanded, when a minimum-safe level of functionality must be provided without interruption.
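To make the distinction tangible, here is a minimal sketch of a degradation mode model in Python; the mode names and the decision rule are illustrative assumptions, not part of the theory.

```python
from enum import Enum, auto

class ServiceMode(Enum):
    NOMINAL = auto()         # full level of service
    DEGRADED = auto()        # reduced but contained level of service
    FAIL_OPERATIVE = auto()  # minimum-safe function kept without interruption
    FAIL_SAFE = auto()       # function lost, other functions unaffected

def degrade(current: ServiceMode, fail_operative_required: bool) -> ServiceMode:
    """Hypothetical degradation rule: contain the failure and either keep a
    minimum-safe function alive or fall back to a fail-safe mode."""
    if current is ServiceMode.NOMINAL:
        return ServiceMode.DEGRADED
    if fail_operative_required:
        return ServiceMode.FAIL_OPERATIVE
    return ServiceMode.FAIL_SAFE
```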
How to engineer high-integrity technical systems? If this question is now on your mind and you want answers, you are an intended visitor of this website. The author has spent his whole professional life investigating the intricacies of this question and developing ways forward in innovative practical applications. In the end, all efforts culminated in the Theory of Systemgestaltung, which expresses the art of systems engineering in a consistent set of four comprehensive basic systems engineering narratives. – Sorry, there is no proper one-to-one translation of the German term Gestalt into English. You may simply speak of a theory of systems engineering, as the brand name H·I·T·S Engineering suggests.
To help apply the theory in practice, this website provides an overview of the essentials and additional considerations on the soundness of the approach. The author offers further training and coaching support on request. This page continues below with a popular misconception and a few essential principles for performing systems engineering of high-integrity technical systems.
A theoretical shortcut equates high-integrity technical systems with perfect products and services. The best way to end up with perfection would then be to start with perfection and never leave the path of perfection until the end. This first-time-right approach has a number of downsides. First, it fosters overconfidence in human capabilities, processes, methods and tools. Second, it leads to a pronounced blame-game culture. If something is not perfect, there must be a single point of failure. The purpose of the root cause analysis is then primarily to find someone guilty. Third, if assurance activities never revealed any issues, no evidence of process quality would ever be generated beyond personal and team affirmations. Fourth, the engineering of innovative systems is interlocked with re-shaping existing language and generating new language. The level of perfection of the Gestalt qualities at the end of development is higher than at the beginning. Successful systems engineering is a one-way road towards perfection, in the literal sense. The perfect-world illusion provides an excellent opportunity to recall Henry Louis Mencken (1880-1956): “For every problem there is an answer that is clear, simple and … wrong.”
Imagine you are in charge of the technical approval of a high-integrity technical system with required fail-operative capabilities, since a total loss of function must be extremely improbable because of its catastrophic impact. An independent certification authority has tasked you because of your expertise and experience. For sure, you will take the job seriously, as you know your recommendation may have severe consequences. You have been granted access to all product data generated during development and auditing rights over the engineering team, which comprises more than a hundred engineers from various engineering and other scientific disciplines, some with specific skills at a high level of excellence. Let us now discuss how to proceed.
At first, the vast amount of product data just seems like a huge haystack. Your job is to search for needles and to evaluate their potential to hurt somebody. A sound starting point is the configuration baseline. The configuration baseline refers to the product data that says what the product is. Furthermore, all omissions recording inconsistencies in the product data are mandatory content in order to ensure the consistency of the configuration baseline in total. Evaluating the system architecture, the system functions and the allocation of system functions to system elements is essential for understanding the system. From validation and verification data you extract the criticality of functions and system elements. The omissions are important for two reasons. First, patterns may become recognisable that identify critical technical issues and process steps. Second, the soundness of the allowances and limitations in response to individual omissions, and to the complete set of omissions, needs to be assessed carefully and completely. Unsatisfactory assessment results are a clear no-go and lead to a denial of technical approval.
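As a purely illustrative sketch, a configuration baseline with its recorded omissions might be represented as follows; the field names and the consistency rule are hypothetical and not prescribed by any standard.

```python
from dataclasses import dataclass, field

@dataclass
class Omission:
    """A recorded inconsistency in the product data, with its disposition."""
    identifier: str
    description: str
    affected_items: list[str]
    allowance_or_limitation: str  # the agreed response to the inconsistency

@dataclass
class ConfigurationBaseline:
    """Product data saying what the product is, plus all recorded omissions."""
    items: dict[str, str]  # configuration item id -> approved revision
    omissions: list[Omission] = field(default_factory=list)

    def is_consistent_in_total(self) -> bool:
        # Consistent in total only if every known inconsistency is
        # captured and dispositioned as an omission.
        return all(o.allowance_or_limitation for o in self.omissions)
```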
Now you are aware of the needles the omissions have directed you to. And you have some hints where in the haystack more needles may be found. The existence or non-existence of further, previously unknown needles in the surroundings of known needles provides essential cues about overall process quality. Thus, you may start taking samples from the haystack in the surroundings of known needles and then proceed with samples from all areas of the haystack. Of course, you will never find all needles by taking samples. It is worth assessing the residual risk by considering how needles are produced. Let us start with the simple scenario that a number has not been copied correctly from a requirement into a functional model. Before the functional model was approved for downstream use, there should have been a check, which obviously failed. The mismatch has not been detected by further assurance activities downstream either. This may have a number of reasons, including: First, the datum has little relevance for the function. Second, no test case exists that checks this specific datum. Third, the test case uses the datum from the functional model instead of the requirement. In the first case, the impact severity is low. When detected, the needle may be tolerable after further analysis showing that it does no harm. The second case has a very low probability, assuming that the errors are independent events. If they are not independent of each other, we may have a systematic process issue with the potential to produce further needles in high numbers. The same holds for the third case. The example demonstrates the importance of the actual process quality for granting or denying technical approval.
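To make the independence argument concrete, here is a small hypothetical calculation; the probability figures are invented for illustration only.

```python
# Hypothetical escape probabilities for the second case:
p_copy_error  = 1e-3  # a datum is copied incorrectly into the functional model
p_check_miss  = 1e-2  # the approval check fails to notice the mismatch
p_no_testcase = 1e-2  # no downstream test case exercises this specific datum

# If the events are independent, the needle survives all barriers only with
p_escape_independent = p_copy_error * p_check_miss * p_no_testcase
print(p_escape_independent)  # 1e-07 -- the "very low probability"

# If they are correlated (for example, the same flawed process step feeds
# copy, check and test from one source), the escape probability collapses
# towards the single-event probability and needles appear in high numbers.
p_escape_correlated = p_copy_error
print(p_escape_correlated)   # 1e-03
```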
The theory of process quality focusses on process adherence. Defined processes have to be followed in order to avoid deviations. Arbitrary deviations from defined processes have a high potential to produce needles. If this theory reminds you of the American style of mass production, you are on the right track. The analogy between development and production holds only in the case of perfect initial requirements and mature process definitions. In production, the configuration basis and the manufacturing process are given up front, and the conformance of products and services is verified against these product standards. In development, the appropriateness of the initial requirements always needs to be controlled in a feedback loop by validation activities, leading to rather high development dynamics that increase with the level of innovation and system complexity. Usually, configuration baselines refer to process definitions directly and indirectly, but the information to restore the actual development dynamics is not contained in configuration baselines. For good reasons, configuration baselines concentrate on what the product or service is. The development dynamics and conceptual considerations, such as the evaluation of potential alternative solutions, should be mapped to further product data, but these data rarely provide sufficient evidence to assess the actual process quality later. It is common regulatory practice that authorities in charge of the certification of high-integrity technical systems with high safety and security classifications audit development continuously.
Continuing our example, you will start evaluating process quality with the process definitions referenced in the configuration baseline and proceed to the configuration control records tracing the evolution of selected samples. You may feel a need for some assistance from the development team in understanding missing traces and suspected inconsistencies, and in identifying needles unambiguously. Development managers may promise you corrective actions such as tightening process definitions as well as additional training, briefing and calling the engineers involved to order. This sounds good but may have devastating effects on process and product quality. It is time to rethink your role: you are no longer independent once you start to interfere with the development. If you do not want to become a clearly identifiable contributor to a mess, you should be aware that members of development teams engineering high-integrity technical systems are not lazy, unskilled workers but qualified domain specialists and experts with their own code of ethics. It is wise to establish good working relationships with these people, to try to understand the reasons why they deviated from the defined process, and to profit from their ideas for your own root cause analysis. Together you may generate viable solution alternatives. Maybe your recommendations for consideration by development managers are more successful than the same proposals delivered by their subordinates. By the way, you will gradually turn from an application domain expert into a systems engineering process expert.
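A hypothetical sketch of how the evaluation of selected samples might be supported mechanically: walking the configuration control records and flagging untraced predecessors as candidates for discussion with the development team. The record structure is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ControlRecord:
    """One entry in the configuration control records of a product datum."""
    item_id: str
    revision: str
    predecessor: Optional[str]  # revision this change was derived from
    rationale: str              # why the change was made

def find_missing_traces(records: list[ControlRecord]) -> list[str]:
    """Flag revisions whose predecessor is not itself recorded --
    candidate needles to clarify with the development team."""
    known = {(r.item_id, r.revision) for r in records}
    return [
        f"{r.item_id}@{r.revision}: predecessor {r.predecessor} untraced"
        for r in records
        if r.predecessor is not None and (r.item_id, r.predecessor) not in known
    ]
```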
Most crises in engineering projects are rooted in application-specific challenges. Whenever a crisis culminates close to quality breakdown and project failure, the inability of the engineering processes to cope with the true dynamics of development is a major contributor to the mess. Typically, inappropriate engineering processes set tighter limits on project success than dwindling viable solution alternatives do. With increasing development dynamics, process complexity grows far beyond product complexity. The particular example above is not just a funny caricature of engineering. In reality, massive milestone delays and budget overruns are frequently observable, in most cases caused by inappropriate processes. In conclusion, process adherence is not the only process quality concern. The root cause of deviations of actual processes from defined processes is often found in the process definitions themselves. Before we consider criteria for adequate process definitions in the next section, let us spend a few thoughts on how contemporary process standards have evolved.
When it was recognised that product quality cannot be tested into a product and that process quality cannot be assessed by examining the product as it is, establishing criteria for assessing process quality became important. This resulted in a number of development, assurance and management viewpoints that scan the whole process to find evidence regarding the quality criteria defined in the scope of the particular viewpoint. Process standards with sets of viewpoints and quality criteria evolved. For process implementation, these standards are used to derive plans with process definitions along the viewpoints. Setting up process definitions this way violates an essential systems thinking principle: start with understanding the whole and then proceed to the details. A solution of the whole problem demands that all derived detailed problems are solved. But it is a reductionistic misconception to believe that when all detailed problems are solved, the whole problem is solved as well. In this particular case, the beams from the various viewpoints onto the whole process partly overlap, leading to duplicated, sometimes conflicting process definitions. Some areas of the whole process may be missed completely because no viewpoint considers them. When process activity sequences and data models are designed along the process viewpoints, engineering tools support development with corresponding deficiencies.
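The overlap and gap problem can be illustrated with a deliberately simplified set model; the viewpoint names and process areas are invented for the example.

```python
# Hypothetical process areas of the whole process
whole_process = {"requirements", "architecture", "implementation",
                 "integration", "verification", "validation",
                 "configuration_control", "operations_transition"}

# Hypothetical viewpoints, each scanning only part of the whole process
viewpoints = {
    "development_assurance": {"requirements", "architecture", "verification"},
    "quality_management":    {"verification", "validation", "configuration_control"},
    "project_management":    {"requirements", "integration"},
}

covered = set().union(*viewpoints.values())
gaps = whole_process - covered  # areas no viewpoint considers
overlaps = {area for area in covered
            if sum(area in scope for scope in viewpoints.values()) > 1}
            # areas with duplicated, possibly conflicting process definitions

print("gaps:", gaps)          # {'implementation', 'operations_transition'}
print("overlaps:", overlaps)  # {'requirements', 'verification'}
```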
We start this section with another scenario. Imagine you are now the chief engineer of a small-scale high-integrity control system. The vehicle design has concluded that fail-safe characteristics of your control system are sufficient. But when reviewing the vehicle’s risk analysis, you find a mitigation strategy for a particular risk that lacks supporting evidence. If this mitigation strategy fails, the fail-safe characteristics of your control system will impose a significant reduction of the vehicle’s usability. In your own risk assessment, you define an alternative mitigation measure: develop the control system in accordance with the rigour applicable to fail-operative systems. However, your budget does not cover the additional effort. You define a solution strategy with two main objectives. First, perform the development activities by applying the higher standards, but omit cost-intensive assurance activities that are only required by the higher standards. Complete and traceable records of the development dynamics are indispensable to allow later out-of-time-sequence assurance activities and to resolve all potential doubts about your configuration information. Second, build up the certification authorities’ confidence in your high process quality. In their audits, they should never have the opportunity to identify inconsistencies in your product data that are not captured by omissions. This increases the willingness of the certification authorities to accept your process quality as adequate for a later re-classification to higher process quality levels.
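As a hypothetical illustration of the first objective, an append-only record of the development dynamics could look like the following sketch; the fields are assumptions, not a prescribed format.

```python
from dataclasses import dataclass
import datetime

@dataclass(frozen=True)
class DevelopmentEvent:
    """One entry of an append-only log capturing the development dynamics,
    so that assurance activities can be performed out of time sequence later."""
    timestamp: datetime.datetime
    item_id: str
    activity: str   # e.g. "requirement changed", "alternative rejected"
    rationale: str  # why the step was taken
    evidence_refs: tuple[str, ...]  # links to the product data concerned

log: list[DevelopmentEvent] = []  # appended to, never rewritten
```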
The author recommends the following principles for the implementation of high-quality systems engineering processes satisfying both objectives:
Gaining such experience and acting as described above may qualify you to be appointed as the person responsible for the processes of a large-scale high-integrity technical system in crisis. You may then seize the opportunity to generate further convincing evidence for the benefits of the presented approach.