Improving alarm management with ISA-18.2: Part 2

Siemens Ltd
Thursday, 06 March, 2014


Poor alarm management is one of the leading causes of unplanned downtime and has been a major contributor to some of the worst industrial accidents on record. In Part 1 of this article we discussed the release of the ISA-18.2 standard, which provides a blueprint for creating a safer and more productive plant. In this part we look more closely at how it can be implemented.

As explained in Part 1, in June of 2009 the standard ANSI/ISA-18.2-2009, ‘Management of Alarm Systems for the Process Industries’, was released. We examined its philosophy, the structure it provides around the alarm management life cycle and the detailed design of an alarm management system.

Alarm management is not a ‘do once’ activity - rather it is a process that requires continuous attention. Consequently, the basis of the standard is to follow a life-cycle approach as shown in Figure 1.

Figure 1: The alarm management life cycle(1).

Following the ISA-18.2 standard - implementation

Having completed phases A through D of the alarm management life cycle (philosophy, identification, rationalisation and detailed design), we are ready to proceed to Phase E and implement the rationalised alarm system design.

Implementation (Phase E)

During the implementation phase, the alarms in the control system are put into operation. Testing is a key activity, particularly as new instrumentation (and alarms) is added to the system over time or process design changes are made. Equally important during this phase is the training of the operators of the system, so that they are comfortable with it and trust it to help them do their job. Training the operators with process simulation tools can create a ‘drilled response’ where corrective action is so well reinforced that it is automatic.

Operation and maintenance (Phases F and G)

The standard defines the recommended tools for handling of alarms during operation. One of the most important is called ‘alarm shelving’, which is a tool for the operator to temporarily suppress an alarm, thus removing it from view. Shelving is critical for helping an operator respond effectively during a plant upset by manually hiding less important alarms. Alarms that are shelved will reappear after a preset time period so that they are not forgotten. When shelved, an alarm should be removed from the active list and the indication should be cleared from the HMI graphics and faceplates. Systems that support shelving must provide a display which lists all shelved alarms.
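The shelving behaviour described above can be sketched in a few lines. This is a minimal illustration only, not an implementation from the standard or from any particular control system: the class name `ShelvingRegistry`, its methods and the tag `FI-101.LO` are all hypothetical, and time is passed in explicitly as seconds to keep the example deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class ShelvingRegistry:
    """Hypothetical registry of shelved alarms with automatic expiry."""

    # Maps alarm tag -> time (in seconds) at which the shelf expires.
    expiry_by_tag: dict[str, float] = field(default_factory=dict)

    def shelve(self, tag: str, now_s: float, duration_s: float) -> None:
        """Temporarily suppress an alarm for a preset duration."""
        self.expiry_by_tag[tag] = now_s + duration_s

    def unshelve(self, tag: str) -> None:
        """Manually return an alarm to view before its shelf expires."""
        self.expiry_by_tag.pop(tag, None)

    def is_shelved(self, tag: str) -> bool:
        return tag in self.expiry_by_tag

    def release_expired(self, now_s: float) -> list[str]:
        """Remove and return alarms whose shelving period has elapsed."""
        expired = sorted(
            tag for tag, expiry in self.expiry_by_tag.items() if expiry <= now_s
        )
        for tag in expired:
            del self.expiry_by_tag[tag]
        return expired

    def shelved_list(self, now_s: float) -> list[tuple[str, float]]:
        """List every shelved alarm with its remaining time in seconds."""
        return sorted(
            (tag, expiry - now_s) for tag, expiry in self.expiry_by_tag.items()
        )


registry = ShelvingRegistry()
registry.shelve("FI-101.LO", now_s=0, duration_s=600)
print(registry.shelved_list(now_s=120))     # [('FI-101.LO', 480)]
print(registry.release_expired(now_s=600))  # ['FI-101.LO']
```

In this sketch, `shelved_list` plays the role of the display that lists all shelved alarms, and `release_expired` would be called periodically so that shelved alarms reappear and are not forgotten.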

The standard also documents what should be included in an alarm response procedure. The information fleshed out during rationalisation, such as an alarm’s cause, potential consequence, corrective action and the time to respond, should be made available to the operator. Ideally, this information should be displayed online rather than in written form.
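As an illustration of how this rationalisation output might be held for online display, the following sketch stores the four items named above in a simple record. The class name, field names and example values are assumptions made for the example; they are not defined by ISA-18.2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmResponseProcedure:
    """Hypothetical record of the information captured during rationalisation."""

    tag: str
    cause: str
    consequence: str
    corrective_action: str
    time_to_respond_min: float


def format_for_display(procedure: AlarmResponseProcedure) -> str:
    """Render the procedure as plain text for an operator display."""
    return "\n".join(
        [
            f"Alarm: {procedure.tag}",
            f"Likely cause: {procedure.cause}",
            f"Potential consequence: {procedure.consequence}",
            f"Corrective action: {procedure.corrective_action}",
            f"Time to respond: {procedure.time_to_respond_min} min",
        ]
    )


# Example values are invented for illustration only.
example = AlarmResponseProcedure(
    tag="LI-200.HI",
    cause="Outlet valve stuck closed",
    consequence="Tank overflow",
    corrective_action="Stop feed pump and open bypass",
    time_to_respond_min=10,
)
print(format_for_display(example))
```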

Effective transfer of alarm status information between shifts is important in many facilities. If the operator coming on shift is only provided with a three-line entry in an operator logbook, they may be ill-prepared to address any situation leading up to an incident. To improve shift transition, the system should allow operators to record comments for each alarm.

Maintenance is the stage where an alarm is taken out of service for repair, replacement or testing. The standard describes the procedures that must be followed, including documenting why an alarm was removed from service, the details concerning interim alarms, special handling procedures, as well as what testing is required before it is put back into service. The standard requires that the system be able to show a complete list of alarms that are currently out of service. As a safety precaution, this list should be reviewed before putting a piece of equipment back into operation to ensure that all of the necessary alarms are operational.
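A minimal sketch of an out-of-service list and the pre-start check described above is shown below. The record fields mirror the items the article lists (reason, interim alarm, required testing), but the names `OutOfServiceEntry`, `out_of_service_for` and `ready_to_restart`, and the idea of keying alarms to an equipment name, are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OutOfServiceEntry:
    """Hypothetical record for one alarm that has been taken out of service."""

    tag: str
    equipment: str
    reason: str
    interim_alarm: Optional[str] = None  # substitute alarm, if any
    retest_required: str = ""            # testing needed before return to service


def out_of_service_for(entries, equipment):
    """Return the out-of-service alarms associated with one piece of equipment."""
    return [entry for entry in entries if entry.equipment == equipment]


def ready_to_restart(entries, equipment):
    """True only if no alarm for this equipment is still out of service."""
    return not out_of_service_for(entries, equipment)


entries = [
    OutOfServiceEntry(
        tag="PI-310.HI",
        equipment="Pump P-31",
        reason="Transmitter replacement",
        interim_alarm="PI-311.HI",
        retest_required="Loop check",
    ),
]
print(ready_to_restart(entries, "Pump P-31"))  # False
print(ready_to_restart(entries, "Pump P-32"))  # True
```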

The standard describes three possible methods for alarm suppression, which is any mechanism used to prevent the indication of the alarm to the operator when the base alarm condition is present. All three methods have a place in helping to optimise performance.

| Suppression method per ISA-18.2 | Definition | Relevant phase |
|---|---|---|
| Shelving | A mechanism, typically initiated by the operator, to temporarily suppress an alarm. | Operations |
| Suppressed by design | Any mechanism within the alarm system that prevents the transmission of the alarm indication to the operator based on plant state or other conditions. | Advanced alarm design |
| Out of service | The state of an alarm during which the alarm indication is suppressed, typically manually, for reasons such as maintenance. | Maintenance |

Table 1: Methods of alarm suppression from ISA-18.2.

Monitoring and assessment (Phase H)

The monitoring and assessment section of the standard describes how to analyse the performance of the alarm system against recommended key performance indicators (Table 2). One of the key metrics is the number of alarms that are presented to the operator. In order to provide adequate time to respond effectively, an operator should be presented with no more than one to two alarms every 10 minutes. In many control rooms, operators are inundated with an average of one alarm every minute, which makes it challenging to respond correctly to each alarm. A related metric is the percentage of 10-minute intervals in which the operator received more than 10 alarms, which indicates the presence of an alarm flood.
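The two rate metrics described above can be computed from a list of alarm timestamps. The sketch below is one possible approach under stated assumptions: timestamps are seconds, the reporting period is split into fixed (not sliding) 10-minute windows, and the function name and returned keys are invented for the example.

```python
from collections import Counter

WINDOW_S = 600  # ten minutes, in seconds


def ten_minute_metrics(alarm_times_s, start_s, end_s):
    """Summarise alarm load over fixed 10-minute windows in [start_s, end_s)."""
    n_windows = int((end_s - start_s) // WINDOW_S)
    if n_windows <= 0:
        raise ValueError("reporting period must cover at least one 10-minute window")

    # Ignore any partial window at the end of the period.
    covered_end_s = start_s + n_windows * WINDOW_S
    per_window = Counter(
        int((t - start_s) // WINDOW_S)
        for t in alarm_times_s
        if start_s <= t < covered_end_s
    )

    total = sum(per_window.values())
    flooded = sum(1 for count in per_window.values() if count > 10)
    return {
        "avg_per_10_min": total / n_windows,
        "pct_windows_over_10": 100.0 * flooded / n_windows,
        "max_in_window": max(per_window.values(), default=0),
    }


# One hour of data: a burst of 12 alarms in the first window, one alarm later.
times = list(range(12)) + [1500]
print(ten_minute_metrics(times, start_s=0, end_s=3600))
```

For meaningful results the input would cover a much longer period than this one-hour example.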

ISA-18.2 recommends using no more than three or four different alarm priorities in the system. To help operators know which alarms are most important so they can respond correctly, it is recommended that no more than 5% of the alarms be configured as high priority. The system should make it easy to review the configured alarm priority distribution, for example, by exporting alarm information to a CSV file for analysis in MS Excel.
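As an alternative to a spreadsheet, the exported CSV can be summarised with a short script. The sketch below assumes a hypothetical export with a `priority` column; real export formats vary by system, so the column name would need to be adjusted.

```python
import csv
import io
from collections import Counter


def priority_distribution(csv_text, priority_column="priority"):
    """Return the percentage of configured alarms at each priority."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[priority_column].strip().lower() for row in reader)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {priority: 100.0 * n / total for priority, n in counts.items()}


# Hypothetical export: tag names and column headers are invented.
export = """tag,priority
TI-101.HI,Low
TI-101.HIHI,High
FI-205.LO,Low
PI-310.HI,Medium
"""
print(priority_distribution(export))
# {'low': 50.0, 'high': 25.0, 'medium': 25.0}
```

The result can then be compared with the recommended share of high-priority alarms (no more than 5%).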

Analysis should also include identifying nuisance alarms, which are alarms that annunciate excessively or unnecessarily, or that do not return to normal after the correct response is taken (eg, chattering, fleeting or stale alarms). The system should have the capability of calculating and displaying statistics, such as alarm frequency, average time in alarm, time between alarms and time before acknowledgement. It is not uncommon for the majority of alarms (up to 80%) to originate from a small number of tags (10-20). A frequency analysis makes it easy to identify these ‘bad actors’ and fix them. The ‘average time in alarm’ metric can help identify chattering alarms, which are alarms that repeatedly transition between the alarm state and the normal state in a short period of time.
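A frequency analysis of this kind can be sketched as follows. The function names are hypothetical, and the chattering rule used here (three or more activations of the same tag within 60 seconds) is an assumed threshold chosen for illustration, not a figure taken from this article. The sketch also detects chattering from activation timestamps rather than from the average time in alarm.

```python
from collections import Counter, defaultdict


def top_contributors(tags, n=10):
    """Return (tag, count, percentage of total load) for the n most frequent tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    return [(tag, count, 100.0 * count / total) for tag, count in counts.most_common(n)]


def chattering_tags(events, min_repeats=3, window_s=60):
    """Find tags that activate at least min_repeats times within window_s seconds.

    events is an iterable of (timestamp_s, tag) alarm activations.
    """
    times_by_tag = defaultdict(list)
    for timestamp_s, tag in events:
        times_by_tag[tag].append(timestamp_s)

    chattering = []
    for tag, times in times_by_tag.items():
        times.sort()
        # Slide over each run of min_repeats consecutive activations.
        for i in range(len(times) - min_repeats + 1):
            if times[i + min_repeats - 1] - times[i] <= window_s:
                chattering.append(tag)
                break
    return sorted(chattering)


events = [(0, "LI-1.HI"), (10, "LI-1.HI"), (20, "LI-1.HI"), (0, "TI-2.LO"), (500, "TI-2.LO")]
print(top_contributors([tag for _, tag in events], n=1))  # [('LI-1.HI', 3, 60.0)]
print(chattering_tags(events))                            # ['LI-1.HI']
```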

Another key objective of the monitoring and assessment phase is to identify stale alarms, which are those alarms that remain in the alarm state for an extended period of time (more than 24 hours). The system should allow the alarm display to be filtered, based on time-in-alarm, in order to create a stale alarm list. Alarm display filters should be able to be saved and re-used so that on-demand reports can be easily created.
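A time-in-alarm filter of this kind might look like the sketch below. It assumes the active alarm list is available as a mapping from tag to the time the alarm was raised; the function name, the data layout and the example tags are hypothetical.

```python
from datetime import datetime, timedelta


def stale_alarms(active_alarms, now, threshold=timedelta(hours=24)):
    """Return (tag, time in alarm) for active alarms older than the threshold.

    active_alarms maps each tag to the datetime at which the alarm was raised.
    Results are ordered with the longest-standing alarm first.
    """
    stale = [
        (tag, now - raised)
        for tag, raised in active_alarms.items()
        if now - raised > threshold
    ]
    return sorted(stale, key=lambda item: item[1], reverse=True)


now = datetime(2014, 3, 6, 12, 0)
active = {
    "TI-101.HI": now - timedelta(hours=30),
    "PI-310.LO": now - timedelta(hours=2),
    "LI-200.HI": now - timedelta(hours=48),
}
for tag, age in stale_alarms(active, now):
    print(tag, age)
```

Keeping the threshold as a parameter corresponds to saving the filter for re-use with different time-in-alarm settings.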

Alarm performance metrics based on at least 30 days of data

| Annunciated alarms per time | Target value: very likely to be acceptable | Target value: maximum manageable |
|---|---|---|
| Annunciated alarms per day per operating position | 150 alarms per day | 300 alarms per day |
| Annunciated alarms per hour per operating position | 6 (average) | 12 (average) |
| Annunciated alarms per 10 min per operating position | 1 (average) | 2 (average) |

| Metric | Target value |
|---|---|
| Percentage of hours with >30 alarms | <1% |
| Percentage of 10-min periods with >10 alarms | <1% |
| Maximum number of alarms in a 10-min period | ≤10 |
| Percentage of time the system is in a flood condition | <1% |
| Percentage contribution of the top 10 most frequent alarms to the overall alarm load | <1% to 5% maximum, with action plan to address deficiencies |
| Quantity of chattering and fleeting alarms | Zero, develop action plans to correct any that occur |
| Stale alarms | <5 per day, with action plan to address |
| Annunciated priority distribution | If using three priorities: 80% low, 15% medium, 5% high. If using four priorities: 80% low, 15% medium, 5% high, <1% ‘highest’. Other special-purpose priorities are excluded from the calculation. |
| Unauthorised alarm suppression | Zero alarms suppressed outside of controlled or approved methodologies |
| Unauthorised alarm attribute changes | Zero alarm attribute changes outside of approved methodologies or management of change (MOC) |

Table 2: ISA-18.2 Alarm performance metrics(1).

Management of change (Phase I)

Even the most well-designed alarm system may not prevent problems if there is not strict control over access to configuration changes. Management of change entails the use of tools and procedures to ensure that modifications to the alarm system (such as changing an alarm’s limit) get reviewed and approved prior to implementation. Once the change is approved, the master alarm database should be updated to keep it current.

All changes made through the HMI should be automatically recorded with the timestamp, ‘from’ and ‘to’ values, along with who made the change. The system should provide the capability to set up access privileges (such as who can acknowledge alarms, modify limits or disable alarms) on an individual and a group basis. It is also important to prevent unauthorised configuration changes from the engineering station.
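The combination of access privileges and automatic change recording can be illustrated with a small sketch. Everything here is hypothetical: the class names, the permission string `modify_limits` and the user names are invented, and only individual (not group) privileges are modelled to keep the example short.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeRecord:
    """One audit-trail entry: who changed what, when, and from/to values."""

    timestamp: datetime
    user: str
    tag: str
    attribute: str
    old_value: object
    new_value: object


class AlarmLimits:
    """Hypothetical store of alarm limits with permission checks and audit log."""

    def __init__(self, limits, permissions):
        self._limits = dict(limits)
        self._permissions = permissions  # user -> set of allowed actions
        self.audit_log = []

    def get_limit(self, tag):
        return self._limits[tag]

    def set_limit(self, user, tag, new_value, when=None):
        if "modify_limits" not in self._permissions.get(user, set()):
            raise PermissionError(f"{user} is not allowed to modify alarm limits")
        old_value = self._limits[tag]
        self._limits[tag] = new_value
        self.audit_log.append(
            ChangeRecord(
                timestamp=when or datetime.now(timezone.utc),
                user=user,
                tag=tag,
                attribute="limit",
                old_value=old_value,
                new_value=new_value,
            )
        )


limits = AlarmLimits(
    {"TI-101.HI": 10.0},
    {"engineer": {"modify_limits", "acknowledge"}, "operator": {"acknowledge"}},
)
limits.set_limit("engineer", "TI-101.HI", 12.5)
print(limits.audit_log[0])
```

A call such as `limits.set_limit("operator", "TI-101.HI", 99.99)` raises `PermissionError` in this sketch and leaves both the limit and the audit log unchanged.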

It is good practice to periodically compare the actual running alarm system configuration to the master alarm database to ensure that no unauthorised configuration changes have been made. The system should provide tools to facilitate this comparison in order to make it easy to discover differences (eg, alarm limit has been changed from 10.0 to 99.99). These differences can then be corrected to ensure consistency and traceability.
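A comparison of this kind reduces to a dictionary diff. The sketch below assumes both the master alarm database and the running configuration can be exported as nested mappings of tag to attribute to value; the function name, the attribute names and the `MISSING` marker are invented for the example.

```python
MISSING = "<missing>"  # marker for an attribute present on only one side


def config_differences(master, running):
    """List (tag, attribute, master value, running value) for every mismatch.

    Both arguments map tag -> {attribute: value}.
    """
    differences = []
    for tag in sorted(set(master) | set(running)):
        master_attrs = master.get(tag, {})
        running_attrs = running.get(tag, {})
        for attribute in sorted(set(master_attrs) | set(running_attrs)):
            master_value = master_attrs.get(attribute, MISSING)
            running_value = running_attrs.get(attribute, MISSING)
            if master_value != running_value:
                differences.append((tag, attribute, master_value, running_value))
    return differences


master = {"TI-101": {"hi_limit": 10.0, "priority": "low"}}
running = {
    "TI-101": {"hi_limit": 99.99, "priority": "low"},
    "FI-7": {"hi_limit": 5.0},
}
for difference in config_differences(master, running):
    print(difference)
# ('FI-7', 'hi_limit', '<missing>', 5.0)
# ('TI-101', 'hi_limit', 10.0, 99.99)
```

Here the changed limit on `TI-101` and the alarm `FI-7`, which exists in the running system but not in the master database, are both reported for review.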

Audit (Phase J)

The last phase in the alarm management life cycle is the audit phase. During this phase, periodic reviews are conducted of the alarm management processes that are used in the plant. The operation and performance of the system are compared against the principles and benchmarks documented in the alarm philosophy. The goal is to maintain the integrity of the alarm system and to identify areas of improvement. The alarm philosophy document is modified to reflect any changes resulting from the audit process.

Getting started

No matter whether you are working with an installed system, are looking to migrate or are putting in a new system, the ISA-18.2 standard provides a useful framework for improving your alarm management practices. There is no right or wrong place to start; however, your system will likely dictate which phase of the alarm management life cycle to focus on first. Alarm philosophy is a good place to start for a new system, while monitoring and assessment can be ideal for an existing system. Here are some of the key actions on which to concentrate when starting to adopt ISA-18.2:

  1. Develop an alarm philosophy document to establish the standards for how your organisation will do alarm management.
  2. Rationalise the alarms in the system to ensure that every alarm is necessary, has a purpose and follows the cardinal rule - that it requires an operator response.
  3. Analyse and benchmark the performance of the system and compare it to the recommended metrics in ISA-18.2. Start by identifying nuisance alarms, which can be addressed quickly and easily - this rapid return on investment may help justify additional investment in other alarm management activities.
  4. Implement management of change. Review access privileges and install tools to facilitate periodic comparisons of the actual configuration with the master alarm database.
  5. Audit the performance of the alarm system. Talk with the operators about how well the system supports them. Do they know what to do in the event of an alarm? Are they able to quickly diagnose the problem and determine the corrective action? Also, analyse their ability to detect, diagnose and respond correctly and in time.
  6. Perform a gap analysis on your legacy control system. Identify gaps compared to the standard (eg, lack of analysis tools) and opportunities for improvement. Consider the cost versus benefit of upgrading your system to improve its performance and for compliance with ISA-18.2. In many cases, a modern HMI can be added on top of a legacy control system to provide enhanced alarm management capability without replacing the controller and I/O.

Conclusion

Following the ISA-18.2 standard will become increasingly important as it is adopted by industry, insurance and regulatory bodies. The standard includes recommendations and requirements that can stop poor alarm management, which acts as a barrier to operational excellence. Look for a system that provides a comprehensive set of tools that can help you to follow the alarm management life cycle and address the most common alarm issues - leading to a safer and more efficient plant.

Depending on the capabilities of the native control system, additional third-party tools may be required to deliver the benefits of ISA-18.2. Finding a control system which provides, out of the box, the capabilities demanded by the standard can reduce life-cycle costs and make it easier for personnel to support and maintain. More information can be found at the ISA website www.isa.org, and copies of the standard are free to all ISA members.

References
  1. ANSI/ISA-18.2-2009, Management of Alarm Systems for the Process Industries, www.isa.org
  2. Zapata R and Andow P, Reducing the Severity of Alarm Floods, www.controlglobal.com
  3. EEMUA 191 (2007), Alarm Systems: A Guide to Design, Management and Procurement Edition 2, The Engineering Equipment and Materials Users Association, www.eemua.co.uk
  4. Abnormal Situation Management Consortium, www.asmconsortium.net
  5. NAMUR (Interessengemeinschaft Automatisierungstechnik der Prozessindustrie), www.namur.de

  • All content Copyright © 2024 Westwick-Farrow Pty Ltd