Optimising service: cutting the costs of equipment ownership
Sunday, 15 June, 2008
Irrespective of whether you are a professional service provider, a manufacturer, an educator or a retailer, technology-based equipment plays a significant role in your ability to produce and supply services or goods to your customer. It logically follows, therefore, that failure of technology, plant or equipment in your business will impact upon your ability to supply — either in absolute terms, or by impacting on quality — with a consequent increase in costs to your business, erosion of profitability and loss of credibility with your customer.
While few would argue this point when considering the production or supply of physical goods, it is frequently overlooked, yet no less relevant, in the context of service delivery. The cost of maintaining and repairing equipment will frequently exceed the purchase price many times over during its service life, an observation echoed in Australian Standard AS IEC60300 [1], which deals with equipment dependability management. Any steps that reduce the incidence of equipment failure, and that optimise both the frequency of scheduled maintenance and the uptime of equipment critical to meeting your customers’ expectations, must therefore improve your bottom line.
The costs of equipment ownership aren’t limited to the initial cost of purchase, the cost of consumables used and expenditure on technical support. There are also indirect costs, which can significantly exceed the cost of maintenance or the direct costs of a failure, a point again echoed in Australian Standard AS IEC60300 [2]. Brand value erosion, material wastage, labour costs incurred during idle time when equipment has failed, product give-away when equipment performs off-specification, and loss of goodwill when customers are not satisfied are all examples of indirect costs of ownership arising from maintenance failure.
There are a number of avenues for cost reduction through optimisation of scheduled maintenance intervals and reduction of the incidence of recurrent failure and performance degradation.
Are you over-servicing or under-servicing?
First, a couple of questions for you:
- How often is the performance of equipment used in your business checked?
- Why at that particular interval?
It’s a reasonably safe assumption that your business has equipment performance checks carried out or preventive maintenance completed at intervals that are mostly determined arbitrarily. Quarterly servicing, for example, is frequently implemented in a broad range of industries without any substantiation for the period selection.
The consequences of this arbitrary method of service scheduling are either over-servicing, resulting in increased cost of ownership and unnecessary disruption of operations, or worse still, under-servicing, whereby system performance is allowed to drift outside of control limits, unnoticed until the next service interval has elapsed, with the result that product quality or product repeatability suffers. Whilst in the latter instance savings may be made from reduced direct maintenance costs, the losses resulting from indirect costs will significantly exceed the short-term savings from reducing maintenance frequency.
The indirect costs resulting from poorly planned maintenance scheduling are frequently higher than the direct costs. Unfortunately, the indirect costs are rarely measured and so remain unmanaged: the adage ‘if you aren’t measuring it you aren’t managing it’ holds true. Maintenance needs change with time as equipment ages; consequently, the scheduling of routine maintenance also needs to be adaptive [3].
How then should we optimise the service interval? For any piece of equipment, the optimal service interval will address two distinct aspects of performance:
- The correction of a deviation from optimum or desired performance — This may be either a qualitative or quantitative measure. For example, degradation in the quality of a printer’s output may be expressed qualitatively whilst an allowable error in a measurement of mass will be expressed quantitatively.
- Increasing uptime or equipment availability through breakdown prevention — This is an outcome of preventive maintenance.
Optimisation through performance correction
As a starting point, control limits on measurements or quality — maximum permissible error (MPE) or quality deviation — need to be determined. Control limits should reflect the maximum permissible error or quality variation parameter that can be tolerated without changing the characteristics or attributes of the product beyond the point at which it no longer meets customer expectations or industry-specified standards. The control limits define the characteristics of the product or service that are critical to the customers’ experience.
The key is to ensure that the maintenance intervals are:
- Wide enough that there is some degradation in performance since the preceding service was carried out.
- Narrow enough to ensure that performance has not degraded to or beyond the point of maximum permissible error.
At this point of equilibrium, the effort expended and the costs incurred in carrying out the test and adjustment are justified by the correction achieved, and the cost of ownership is minimised.
The practical application of this approach requires analysis of data collected over time for each piece of equipment being considered. Data on performance relative to MPE (was the device performing within or outside of specification, and how wide was the deviation?) for a number of service intervals becomes the basis for adjustment of the width of the interval between future successive services. The size of the adjustment applied is initially determined empirically, unless historical data is available and drawn from service intervals of varying width from which a relationship in the time-series can be drawn. The following simplified example illustrates the application of the method.
Assuming that no historical data of value is available, and that quarterly service intervals are in place, at the next service interval record:
- Whether the unit under test is outside of MPE (positive variation) or within it (negative variation).
- The size and sign of the variation.
Depending on the results, adjust the scheduled service interval using the following rules:
- If the variation is positive (outside of tolerance), shorten the service interval — go to bi-monthly, for example.
- If the variation is negative (within tolerance), extend the interval — go to four-monthly, for example.
Repeat the process at the next (adjusted) scheduled service. If the variation is large in either direction, make a correspondingly large adjustment to the interval.
Using this method, a point of equilibrium will be reached whereby, at each service interval, the performance of the equipment will be such that adjustment is required but the MPE will not have been exceeded.
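The adjustment rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `next_service_interval`, the proportional `gain`, the floor on the scaling factor and the `min_days`/`max_days` bounds are all hypothetical choices, not values given in the article, and the readings are invented. As the article notes, the size of the adjustment would initially be determined empirically for real equipment.

```python
def next_service_interval(current_days, observed_error, mpe,
                          gain=0.5, min_days=7, max_days=365):
    """Suggest the next service interval in days (hypothetical helper).

    observed_error: deviation found at this service, before adjustment.
    mpe: maximum permissible error, in the same units as observed_error.
    gain, min_days, max_days: illustrative tuning values (assumptions).
    """
    if mpe <= 0:
        raise ValueError("mpe must be positive")
    # Variation relative to MPE: positive = outside tolerance,
    # negative = within tolerance.
    variation = (abs(observed_error) - mpe) / mpe
    # A larger variation in either direction gives a larger adjustment.
    # The factor is floored so that a single very bad reading cannot
    # collapse the interval to almost nothing.
    factor = max(0.25, 1.0 - gain * variation)
    proposed = current_days * factor
    # Keep the result within the assumed practical bounds.
    return round(min(max_days, max(min_days, proposed)))


# Start from quarterly servicing (about 90 days) with an MPE of 1.0 unit.
readings = [1.6, 1.2, 0.7, 0.95]  # invented deviations, one per service visit
interval = 90
for error in readings:
    interval = next_service_interval(interval, error, mpe=1.0)
    print(f"deviation {error}: next service in {interval} days")
```

With a rule of this kind the interval shortens while the deviation found at service exceeds MPE and lengthens once it falls within MPE, so successive intervals settle towards the equilibrium described above.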
There are two additional factors that need to be kept in mind when implementing such a system:
- Single point user tests are not a substitute for performance checks across the measuring range — User tests, which good working practices would normally require, typically test at one point and are intended to highlight significant or catastrophic failures.
- The process should be ongoing — Continuous evaluation and correction are imperative if optimum results — reductions in cost of ownership — are to be maintained; the methodology is relevant because equipment performance does change over time. It is also likely that the size and rate of change over time will be a variable as well.
No matter how rigorous the preventive maintenance regime, equipment will break down and performance will drift. The causes of equipment failure, including the causes of performance degradation, can generally be classified into one of two groups, these being:
- Random: faults that ‘just happen’, for which there is no discernible external cause.
- Systematic: faults for which an external cause can be identified and linked to the consequence, although the link may not be immediately obvious. Systematic faults can be sub-grouped by environmental or operational causes. Most failures that occur are systematic in nature, and there is a view that all faults are systematic [4].
Faults which, to a single user of a particular piece of equipment, may appear to be random in nature are in most instances systematic. This becomes apparent when performance data drawn from a broader population of equipment is analysed.
When a piece of critical equipment fails, the typical reaction is for an urgent call to be made to a service provider. The likelihood is high that the person who arrives to help will not be the same person who previously worked on the item. Under pressure to respond to the immediacy of the situation, the attending technician will get the equipment up and running in as short a time as possible and then move on to the next urgent job on their list.
Without the benefit of knowing what faults the equipment may have had previously or which have occurred on similar equipment at different locations, the attending technician assumes that the current fault is a ‘one-off’ — a random failure — and addresses it as such. The symptoms are removed (the fault repaired) without necessarily identifying or eliminating the root cause.
One of the most significant effects of the above scenario is that recurrent faults keep recurring, because the symptoms are addressed without identifying the cause [5]. The user is satisfied when the immediate fault is repaired, and the technician’s report is filed for posterity.
Analysis of the detail of the fault or analysis of the extent of performance degradation (relevant data can be extracted from the technical reports which most service providers will supply on job completion or with their invoice) can help with identification and separation of random from systematic failures. Addressing causes of failure using methods such as Root Cause Analysis or Cause and Effect Diagrams will assist with creation of an effective means of minimising recurrent failures and will cut ownership costs.
Data collected from multiple service events, ideally from multiple installations of the same type or class of equipment, can be classified according to the nature of the failure reported and the fault found. The elements that contribute to the accuracy of information that can be built from the data collected include:
- The classification methodology used — This is pivotal to the accuracy of information prepared; the number of classes should be large enough that differentiation is readily achieved, yet small enough to eliminate the risk of misclassification.
- The broader the population of similar equipment from which data can be collected, the more meaningful the information compiled will be — Data mining techniques can be applied to uncover a range of non-obvious relationships, which may not be possible when limited or single-site data only is available.
Businesses that operate similarly configured equipment at multiple, dispersed locations stand to benefit from centralised data collection and analysis. Common faults inherent to the equipment type, or failures resulting from methods of use or other operational conditions (systematic faults), are likely to go largely unnoticed when distributed data management is in place or, worse still, when no data management takes place. Similarly, specific direct service cost comparisons (parts and time, for example) can be readily produced when analysed centrally, facilitating negotiation of standardised pricing.
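As a minimal sketch of what centralised analysis might look like, the Python below pools classified fault reports from several sites and flags fault classes that recur across locations as candidates for root cause analysis. The function name `candidate_systematic_faults`, the record layout, the thresholds and the sample data are all hypothetical; a real scheme would depend on the classification methodology chosen.

```python
from collections import defaultdict


def candidate_systematic_faults(reports, min_events=3, min_sites=2):
    """Return fault classes that recur often enough to merit root cause analysis.

    reports: iterable of (site, unit_id, fault_class) tuples taken from
    post-service technical reports.
    min_events, min_sites: illustrative thresholds (assumptions).
    """
    events = defaultdict(int)  # fault_class -> number of service events
    sites = defaultdict(set)   # fault_class -> sites where it was reported
    for site, _unit_id, fault_class in reports:
        events[fault_class] += 1
        sites[fault_class].add(site)
    return sorted(
        fault for fault, count in events.items()
        if count >= min_events and len(sites[fault]) >= min_sites
    )


# Invented sample data pooled from three sites.
reports = [
    ("Sydney", "scale-01", "load cell drift"),
    ("Perth", "scale-07", "load cell drift"),
    ("Perth", "scale-07", "display failure"),
    ("Hobart", "scale-03", "load cell drift"),
    ("Sydney", "scale-02", "power supply"),
]
print(candidate_systematic_faults(reports))  # ['load cell drift']
```

Viewed from any single site, the load cell drift in this invented data set looks like a one-off; only the pooled view shows it at three locations, which is the pattern the article describes.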
While equipment failure is inevitable at some time during the equipment’s service life, information to measure and manage the performance and reliability parameters (which will facilitate minimisation of the incidence of recurrent faults and minimisation of the associated direct and indirect costs) can be built from data that exists in post-service technical reports. Rather than merely filing and forgetting these reports, centralisation of data collection and the application of statistical analysis tools in conjunction with data mining techniques will assist with uncovering a broad range of cost reduction opportunities including reduction of downtime, reduction of the incidence of failure, and early identification of performance degradation. Any improvement in equipment reliability or performance has a multiplying effect on improvement in profitability because of the reduction in indirect costs.
*Ian Joseph is the founder of Infometrics, which specialises in providing technical services, equipment performance data analysis and information services for optimisation of service delivery and equipment performance. Joseph has worked as an instrumentation and control technician in a range of industries around the world, including iron and steel, petrochemical processing and paper manufacturing, before joining the Philips Group’s Scientific and Industrial division where he managed technical service delivery to a diverse automation and scientific client base. He has also worked in product sales and distribution, supplying control systems and engineering support services to a diverse base throughout Australia with emphasis on food, chemical and petrochemical, and pharmaceuticals manufacturing. He has an MBA and is working towards completion of a PhD.
Infometrics Pty Ltd
1. Australian Standard AS IEC60300 Part 3.10, Dependability Management, Application Guide: Maintainability, 2004, p v
2. Australian Standard AS IEC60300 Part 3.14, Dependability Management, Application Guide: Maintenance and Maintenance Support, 2005, p v
3. Australian Standard AS IEC60300 Part 3.14, Dependability Management, Application Guide: Maintenance and Maintenance Support, p 8
4. Goble, W., Hydrocarbon Processing, HPIN Automation Safety, July 2007, p 126
5. Doggett, Dr A M, Journal of Industrial Technology: A Statistical Comparison of Three Root Cause Analysis Tools, Vol 20 #2, February to April 2004, p 2