Autonomous systems: big data in products, services and operations
By Dave Oswill, Product Marketing Manager, MathWorks
Thursday, 01 November, 2018
What data scientists and engineers need to know when working with big data as they move from ‘conceptualisation’ to ‘operationalisation’ of their designs.
Businesses are greatly expanding the autonomous capabilities of their products, services and manufacturing processes to improve reliability and efficiency. Processing big data plays an integral role in developing the prescriptive analytics behind these capabilities.
As a result, data scientists and engineers should pay attention to the following aspects of working with big data as they move from conceptualisation to operationalisation of their designs:
- Accessing data stored in various formats and systems.
- Finding and deriving relevant information in data.
- Using tools that scale to big data for both development and operationalisation.
By remaining mindful of when, where and how these challenges arise during the big data design process, data scientists and engineers will be better able to complete their projects on time and on budget.
Aggregating disparate data sets
One of the first steps in the development of an automated system is to select a scalable tool that can easily provide access to a wide variety of systems and formats used to store and manage big data sets. Data is often scattered, making it time-consuming to collect and categorise. For example, sensor or image data stored in files on a shared drive may need to be combined with metadata stored in SQL or NoSQL databases. Data may also reside in large-scale distributed storage and processing frameworks such as Hadoop and Spark.
In other cases, data in disparate forms (delimited text, spreadsheets, images, videos and proprietary formats) must be used together in order to understand the behaviour of the system and develop a predictive model. Businesses should look to equip their team with data analysis tools that provide a platform and workspace where engineers and scientists can easily access and aggregate big data sets.
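As a rough illustration of the aggregation step, the sketch below joins delimited sensor readings with metadata held in a SQL table. It is a minimal example under stated assumptions, not a recommended architecture: the table name `sensors`, the column names, the sample values and the in-memory SQLite database are all hypothetical stand-ins for a shared drive and a production database.

```python
import csv
import io
import sqlite3


def load_readings(csv_text):
    """Parse delimited sensor readings into a list of dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"sensor_id": row["sensor_id"], "value": float(row["value"])}
        for row in reader
    ]


def load_metadata(conn):
    """Read sensor metadata from a SQL table into a dict keyed by sensor id."""
    rows = conn.execute("SELECT sensor_id, location, unit FROM sensors")
    return {sid: {"location": loc, "unit": unit} for sid, loc, unit in rows}


def aggregate(readings, metadata):
    """Attach metadata to each reading.

    Readings with no matching metadata are kept with None fields,
    so gaps in the metadata stay visible instead of silently vanishing.
    """
    missing = {"location": None, "unit": None}
    return [{**r, **metadata.get(r["sensor_id"], missing)} for r in readings]


# Hypothetical sample standing in for a file on a shared drive.
SAMPLE_CSV = "sensor_id,value\nA1,20.5\nB2,101.3\nC3,7.0\n"

# In-memory database standing in for a production SQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensors (sensor_id TEXT, location TEXT, unit TEXT)")
conn.executemany(
    "INSERT INTO sensors VALUES (?, ?, ?)",
    [("A1", "boiler", "degC"), ("B2", "intake", "kPa")],
)

combined = aggregate(load_readings(SAMPLE_CSV), load_metadata(conn))
for record in combined:
    print(record)
```

Keeping unmatched readings, rather than dropping them as an inner join would, is a deliberate choice here: it surfaces sensors whose metadata has not yet been catalogued.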
Understanding what’s in your data
After the data is collected and aggregated, data scientists and engineers must interpret and transform that data into some form of actionable insight. Although any number of interpretive methods can be used, several broad techniques make it easier for engineers to summarise variables in a data set and uncover meaningful trends:
- Summary visualisations, such as binned scatter plots, make patterns and trends in large data sets easy to see. These plots highlight areas where data points are most concentrated, and a slider control for colour intensity lets the designer explore large data sets interactively and gain insights quickly.
- Filtering and other signal processing techniques help developers detect slow-moving trends or infrequent events, spread across the data, that the theory or model must take into account. They also let developers derive additional information from a data set for use in predictive models or algorithms.
- Programmatically enabled data cleansing allows bad or missing data to be fixed before a valid model or theory is established, and it allows the same data-cleansing algorithm to be deployed in a production application, service or product.
- Feature selection techniques help developers find the data that is most relevant for the theory or model, enabling a more accurate and compact implementation of predictive models or algorithms.
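The four techniques above can each be sketched in a few lines. The functions below are simplified illustrations, not the methods of any particular tool: linear interpolation stands in for data cleansing, a trailing moving average for filtering, a grid of counts for the data behind a binned scatter plot, and ranking by absolute Pearson correlation for feature selection. All function names and sample values are hypothetical.

```python
import math


def fill_missing(values):
    """Replace None entries by linear interpolation between the nearest
    valid neighbours; gaps at either end copy the nearest valid value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no valid samples to interpolate from")
    cleaned = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            cleaned[i] = values[right]
        elif right is None:
            cleaned[i] = values[left]
        else:
            frac = (i - left) / (right - left)
            cleaned[i] = values[left] + frac * (values[right] - values[left])
    return cleaned


def moving_average(values, window):
    """Trailing moving average: a simple low-pass filter for slow trends."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def bin_counts(xs, ys, bins, x_range, y_range):
    """Count points per cell of a bins-by-bins grid (rows follow y, columns
    follow x). Points outside the ranges are clamped to the edge cells."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * bins for _ in range(bins)]
    for x, y in zip(xs, ys):
        col = max(0, min(int((x - x0) / (x1 - x0) * bins), bins - 1))
        row = max(0, min(int((y - y0) / (y1 - y0) * bins), bins - 1))
        grid[row][col] += 1
    return grid


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_features(features, target):
    """Order feature names by absolute correlation with the target."""
    return sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )


# Hypothetical sample values for illustration only.
clean = fill_missing([1.0, None, 3.0, 4.0, None, 6.0])
trend = moving_average(clean, 3)
grid = bin_counts([0.1, 0.2, 0.9], [0.1, 0.15, 0.95], 2, (0, 1), (0, 1))
ranking = rank_features(
    {"noise": [5, 1, 4, 2, 3, 3], "temperature": clean},
    target=[2, 4, 6, 8, 10, 12],
)
```

Because each step is an ordinary function, the same cleansing and filtering code used during exploration could in principle be called unchanged from a production service. The interpolation loop is quadratic in the worst case, so a real big data pipeline would need a more efficient or library-backed version.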
Working with large-scale data
Data processing at scale is another crucial consideration in the design of automated systems. Although many data scientists and engineers are most efficient when working on a familiar workstation, data sets are often too large to be stored locally and require a level of software analysis, modelling and algorithm development that only a cluster-based computing platform can handle. Modelling tools that allow developers to easily move between systems without changing code greatly increase design efficiency.
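One way to picture code that moves between systems unchanged is a map-and-reduce computation in which only the executor differs. In this minimal sketch, which assumes the data arrives as chunks that each fit in memory, a thread pool stands in for a cluster scheduler; the function names are hypothetical and no specific cluster framework is implied.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk):
    """Map step: reduce one chunk to (count, sum, sum of squares)."""
    return len(chunk), sum(chunk), sum(v * v for v in chunk)


def summarise(chunks, map_fn=map):
    """Reduce step: combine per-chunk statistics into mean and variance.

    map_fn decides where the map step runs (built-in map, a thread pool,
    or a cluster client with a compatible map); the analysis code is
    the same in every case.
    """
    n = total = total_sq = 0
    for count, s, sq in map_fn(partial_stats, chunks):
        n += count
        total += s
        total_sq += sq
    mean = total / n
    # Population variance via E[x^2] - E[x]^2: simple, but numerically
    # naive for data with a large mean relative to its spread.
    variance = total_sq / n - mean * mean
    return mean, variance


# Hypothetical chunks standing in for partitions of a large data set.
chunks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

# On a workstation: plain sequential map.
local_result = summarise(chunks)

# Scaled out: same function, different executor.
with ThreadPoolExecutor(max_workers=2) as pool:
    pooled_result = summarise(chunks, map_fn=pool.map)
```

The design point is that `summarise` never holds the full data set in memory and never needs to know where the chunks are processed, which is what allows the same code to run locally during development and on larger infrastructure later.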
Data scientists and engineers should look for a scalable data analysis and modelling tool that builds in enough domain-specific features to allow them to conveniently access data and easily work with it using familiar syntaxes and functions. With a tool that pairs the functions domain experts commonly use with easy-to-use machine learning functionality, engineers can combine their domain knowledge with the techniques of the data scientist, allowing them to make more effective design decisions, quickly deploy their models, and test and validate the accuracy of any given model.
Once a data scientist or engineer has worked through the process of designing a big data system and its associated challenges, one final consideration remains: the ability to rapidly operationalise predictive models and algorithms for enterprise-scale applications.
There are scalable data analysis and modelling tools available on the market that can provide product development teams with the domain-specific tools they need. With these tools, engineers and scientists can rapidly develop and integrate algorithms into their automated and embedded systems without the need to manually recode in another language.
By anticipating these aspects of working with big data, data scientists and engineers will be better able to integrate automated systems into their project chains in order to more quickly adapt to changing environmental and business conditions and address market needs more effectively.