What do the recent “RobbinHood” attack on Baltimore and the “Mamba” attack on the San Francisco Municipal Transportation Agency have in common? Besides crippling the activities of two public authorities, both involved ransomware.
What about the similarities between the attacks on Deutsche Bahn’s stations and on the Lodz city tramway? Not much, besides the fact that both targeted railways. In the first case, the WannaCry virus affected Passenger Information Systems and mostly tarnished the German train operator’s image. In the second, a fourteen-year-old Polish hacker derailed four trams after taking control of the entire network through a modified TV remote control!
Cyberattacks on railways, unlike those in many other sectors, can endanger lives (e.g., twelve people were injured in Poland). This isn’t to say that malware paralyzing a factory’s production line or ransomware blocking access to a company’s server is tolerable, but simply that when lives are at stake, extra precautions should be taken.
Since the origin of train transportation, the protection of life has been embedded in railway DNA through the concept of Safety. All networks and their connected equipment must be assessed according to levels of risk and hazard. Conditions are tested and failures predicted, and the results are translated into one of four Safety Integrity Levels (SIL). Safety-critical systems, such as the railway signaling system, are graded SIL 4 and must be certified by independent assessors before being allowed to operate. This lengthy and complex process is one of the major specificities of railways, and it must be accounted for when implementing Artificial Intelligence to protect railway systems efficiently.
AI and Rule-based Systems
Generally speaking, we can divide AI into two worlds: rule-based systems and machine learning.
Rule-based systems, also known as expert systems, are the simplest form of AI. They employ several optimization techniques to represent knowledge through rules.
In terms of coding, this means applying a series of “if-then” statements to a set of assertions; the statements can be seen as rules that act upon these assertions. Here’s an example. Knowledge: there can only be one master clock that gives time to all devices on the network. Assertion: there can only be one time synchronization per node in a network. Rule: “if more than one time synchronization per slave node, then publish time sync error”.
Anyone can easily understand why networks need one master clock. Without it, we couldn’t associate events in time or make devices work simultaneously. However, not every rule is simple to establish. In fact, in many situations we don’t know what to look for and must use data to find patterns that enable us to identify what types of assertions and rules must be created. While humans are bad at doing this, machines excel at analyzing raw data and proposing patterns. This is where machine learning and its famous subset, deep learning, come into the picture.
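The time synchronization rule can be sketched in a few lines of Python. This is only an illustration: the event format (pairs of slave node and master clock) and the names `check_time_sync`, `node-a` and `clock-1` are assumptions, and "more than one time synchronization" is interpreted here as "more than one clock source seen for the same slave node".

```python
from collections import defaultdict


def check_time_sync(sync_events):
    """Apply the rule to a list of (slave_node, master_clock) pairs.

    The pair format is a hypothetical simplification of real sync traffic.
    """
    masters_per_node = defaultdict(set)
    for slave_node, master_clock in sync_events:
        masters_per_node[slave_node].add(master_clock)

    errors = []
    for node in sorted(masters_per_node):
        masters = sorted(masters_per_node[node])
        # Rule: if more than one time synchronization source per slave node,
        # then publish a time sync error.
        if len(masters) > 1:
            errors.append(f"time sync error: {node} synchronized by {masters}")
    return errors


events = [
    ("node-a", "clock-1"),
    ("node-b", "clock-1"),
    ("node-b", "clock-2"),  # a second, unexpected master clock
]
for message in check_time_sync(events):
    print(message)
```

The knowledge (one master clock) never appears explicitly in the code; it is encoded by the person who wrote the condition, which is exactly what makes such systems easy to audit.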
Machine Learning and Deep Learning
Contrary to rule-based systems, where each decision - expressed through rules - is manually programmed, machine learning doesn’t require human experience to be modeled. In other words, machine learning doesn’t need a programmer to establish these rules.
Whereas deep learning requires no such human input at all, machine learning still does. Indeed, to work properly, someone still has to describe each element of a dataset to the learning program in terms of a series of attributes.
This process is more an art than a science, as the result depends tremendously on finding good features. Here are examples of possible attributes for recognizing spam in e-mails: many misspellings, words or topics out of context (e.g., need a loan or sex), presence and structure of URLs, syntax, etc. This representation of data is then hard-coded as a series of features in a learning algorithm.
However, features aren’t always as obvious as the ones in my example and must in some cases pass through additional statistical processes to be identifiable from the dataset. Methods called feature transformation may be used to turn raw data into features suitable for modeling (e.g., in a comparable format). Other methods, such as feature extraction and selection, create new attributes based on previous ones and filter out irrelevant ones. PCA, linear regression, decision trees, and the Fisher score are a few of the techniques employed to learn from the dataset what to look for.
In contrast to machine learning, deep learning refers to the techniques that extract these features from raw data, all by themselves. Deep learning is especially useful when the variables in the dataset aren’t really observable phenomena that can be quantified or categorized. It relies heavily on a technique called Artificial Neural Networks (ANNs), which is broadly described as a simplistic copy of the neural networks in the human brain, and which I describe thoroughly in my book.
Unlike machine learning, which usually copes well with smaller datasets, ANNs need lots of data. The more we can feed them, the better the results.
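To make the idea of hard-coded features concrete, here is a minimal sketch of hand-written feature extraction for an e-mail. The keyword list, the feature names and the function `extract_features` are all assumptions chosen for illustration; a real spam filter would use far richer attributes.

```python
import re

# Hypothetical keyword list; a real system would curate this carefully.
SUSPICIOUS_TERMS = ("loan", "winner", "free money")
URL_PATTERN = re.compile(r"https?://\S+")
RAW_IP_URL_PATTERN = re.compile(r"https?://\d+\.\d+\.\d+\.\d+")


def extract_features(email_text):
    """Turn raw e-mail text into a small dictionary of numeric/boolean features."""
    text = email_text.lower()
    urls = URL_PATTERN.findall(text)
    return {
        # Presence of URLs
        "url_count": len(urls),
        # Words or topics that are out of context
        "suspicious_term_count": sum(text.count(term) for term in SUSPICIOUS_TERMS),
        # Structure of URLs: a raw IP address instead of a domain name
        "has_raw_ip_url": any(RAW_IP_URL_PATTERN.match(url) for url in urls),
    }


print(extract_features("Need a LOAN? Visit http://192.0.2.7/offer now"))
```

Each key in the returned dictionary is one attribute that a human chose in advance; the learning algorithm only ever sees these values, never the raw text.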
Supervised and Unsupervised Learning
Regardless of how you extract features - through humans (i.e., machine learning) or computers (i.e., deep learning) - you still need a program to come up with a result. If you know what to look for, meaning you already have a result in mind or examples of what the learning algorithm must find, then you can feed this information into the system. In other words, you may train the program.
Going back to my spam detection example, you could submit thousands of spam and genuine e-mails to the learning program and check how accurately your spam detection algorithm performed. If 99% of the answers are correct and you are satisfied with this accuracy level, then you can run the supervised learning program on your e-mail network.
Broadly speaking, a supervised learning algorithm relies on a human being telling the machine what result to expect, creating a structure for solving a problem. Depending on the source, nature, and volume of data, it can be extremely time-consuming to create the labels (i.e., the required output) for the algorithms to be trained with, even more so if you can’t easily define the main features of the dataset. In fact, the quality of a supervised learning process depends on the quality and quantity of the labels used to train the program. Besides this issue, supervised learning integrates human biases and prevents the program from exploring innovative solutions.
Unsupervised learning algorithms don’t use manual labels but attempt to construct their own representations. Basically, these algorithms try to summarize or classify unlabeled data in such a way that they can give valuable insights into patterns, allowing us to check whether the results make sense and can be useful. For instance, an oncologist could feed in thousands of breast scans with or without any feature (e.g., tumors) and check whether the system’s diagnosis is coherent with his or her own experience.
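The train-then-check loop can be sketched as follows. This is a deliberately tiny example under stated assumptions: each e-mail is reduced to a single hypothetical feature (a count of suspicious terms), the "model" is just a threshold learned from labelled examples, and the data is invented for illustration.

```python
def train_threshold(examples):
    """Pick the threshold on one numeric feature that best separates the labels.

    examples: list of (feature_value, is_spam) pairs, i.e. labelled data.
    """
    best_threshold, best_correct = 0, -1
    for threshold in sorted({value for value, _ in examples}):
        correct = sum((value >= threshold) == is_spam for value, is_spam in examples)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def accuracy(threshold, examples):
    """Share of examples for which 'value >= threshold' matches the label."""
    correct = sum((value >= threshold) == is_spam for value, is_spam in examples)
    return correct / len(examples)


# Invented labelled data: (suspicious term count, is it spam?)
training_set = [(0, False), (1, False), (2, False), (3, True), (4, True), (6, True)]
test_set = [(0, False), (2, True), (5, True), (1, False)]

threshold = train_threshold(training_set)
print(threshold, accuracy(threshold, test_set))
```

Accuracy is measured on e-mails the program did not see during training; only if that figure is acceptable would the program be put into production.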
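As a rough illustration of an algorithm building its own grouping without labels, here is a minimal one-dimensional two-cluster k-means sketch. The measurements are invented, the function name `two_means` is an assumption, and real unsupervised systems work on far richer data than a single number per sample.

```python
def two_means(values, iterations=10):
    """Split unlabeled numbers into two groups with a basic k-means loop."""
    # Start from the two extreme values so the result is deterministic.
    centroids = [min(values), max(values)]
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for value in values:
            # Assign each value to its nearest centroid.
            nearest = 0 if abs(value - centroids[0]) <= abs(value - centroids[1]) else 1
            clusters[nearest].append(value)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


measurements = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]  # no labels attached
centroids, clusters = two_means(measurements)
print(clusters)
```

The program is never told what the two groups mean; it is up to the human expert to look at them afterwards and decide whether the split is meaningful.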
The Major Challenge of AI Interpretability
One of the biggest concerns with machine and deep learning is the difficulty of interpreting results, especially in the context of railways. In fact, there are two challenges here: understanding how a system extracts features and how it reaches a result.
Machine learning tends to offer techniques with higher interpretability, because a human being pre-identifies the attributes. A decision tree is a good example of such a technique, as the algorithm follows logical paths that are easy to understand. Deep learning doesn’t provide this assistance, and consequently its results are harder to interpret.
Intuitively, we can deduce that supervised learning, with its labelled data, is easier to understand in either type of learning algorithm, because we know what to look for. Though this is usually the case, some techniques (e.g., Support Vector Machines) make this exercise painfully difficult.
Although deep learning is a deterministic process, whenever it is used in conjunction with unsupervised learning methods, it becomes almost impossible to understand how the results were obtained. This is linked to the inherent nature of an ANN, which computes the gradient needed to adjust the weights and biases in its network layers in order to optimize its utility function. To make things even more complicated, the results of the computation are fed back into the ANN, and parameters are tweaked at each of the thousands of hidden ANN layers, again and again, until a useful pattern can be established.
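The gradient-driven adjustment of weights and biases can be illustrated on the smallest possible case: a single linear neuron with one weight and one bias. This is not a deep network and not any particular product's method; it is a sketch, with invented data, that minimizes a mean squared error (the mirror image of maximizing a utility function).

```python
def train_neuron(samples, learning_rate=0.05, epochs=2000):
    """Fit prediction = weight * x + bias by gradient descent on the mean squared error."""
    weight, bias = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to weight and bias.
        grad_w = sum(2 * (weight * x + bias - y) * x for x, y in samples) / n
        grad_b = sum(2 * (weight * x + bias - y) for x, y in samples) / n
        # Tweak the parameters a little in the direction that reduces the error.
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


samples = [(0, 1), (1, 3), (2, 5), (3, 7)]  # invented data following y = 2x + 1
weight, bias = train_neuron(samples)
print(round(weight, 3), round(bias, 3))
```

With two parameters the outcome is easy to read. With millions of parameters spread over many layers, the same repeated tweaking yields numbers that no longer map onto any rule a human can state, which is the interpretability problem described above.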
Can AI Be Applied for Cybersecurity in Railways?
Yes, technically it can: academic research shows that an AI program’s success rate at detecting malware can be as high as 99%. In practice, it already is, as shown in a recent Capgemini study in which 73% of 850 executives from various industries reported that their companies were testing AI solutions for cybersecurity. In fact, it should be considered indispensable, as organizations can no longer rely solely on cyber analysts. By the way, 61% of the interviewees in this study confessed that their enterprises couldn’t detect breach attempts without the use of AI solutions!
So, if AI technology is becoming indispensable, what should railway CEOs or CIOs look for? To answer this question, we need to go back to the main specificity of railways, namely safety. Any non-safety-related network (e.g., enterprise systems or ERP) may apply the same AI technologies as in any other vertical. For instance, any internet switch running on the operator’s enterprise ERP network could effectively use the same AI solutions as any retailer or industrial company.
There is a caveat, though. Because of their low interpretability, most unsupervised deep learning techniques would probably not be compliance-ready yet in these verticals, nor would they be in railways.
However, for any network with a Safety Integrity Level of 1 or higher, especially SIL 4 systems, it is an entirely different matter. Any threat detected there as a result of a cyberattack could cause derailments and direct injuries. It could also cause a denial of service, indirectly leading to panic and injuries.
Which AI Technology Offers the Best Protection for Safety Networks?
If we leave aside technologies that generally don’t involve AI (e.g., encryption, data access and firewalls), there are two major cyber solutions for protecting safety-related networks: the Intrusion Detection System (IDS) and the Intrusion Prevention System (IPS). Both technologies read the data packets running on a network and compare them to a database of known threats. An IDS differs from an IPS in that it only monitors the data flow and then informs a human being of any abnormal pattern. An IPS, on the other hand, accepts or rejects packets according to its own ruleset.
So, is an IPS better in SIL networks because it eliminates the human factor and can act instantaneously? Not at all, for two reasons.
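The difference between passive monitoring and active filtering can be sketched as follows. The signature set, the packet contents and the function names are all hypothetical; real IDS and IPS products are far more elaborate, and this only shows where each one sits relative to the data flow.

```python
# Hypothetical database of known threat signatures (byte patterns).
KNOWN_THREAT_SIGNATURES = {b"\xde\xad\xbe\xef", b"malicious-payload"}


def matches_known_threat(packet):
    """Return True if the packet contains any known signature."""
    return any(signature in packet for signature in KNOWN_THREAT_SIGNATURES)


def ids_monitor(packets):
    """IDS: passive. Every packet is forwarded; suspicious ones only raise alerts."""
    alerts = [packet for packet in packets if matches_known_threat(packet)]
    return list(packets), alerts


def ips_filter(packets):
    """IPS: active. Suspicious packets are removed from the data flow."""
    return [packet for packet in packets if not matches_known_threat(packet)]


traffic = [b"status: ok", b"cmd malicious-payload", b"status: ok"]

forwarded, alerts = ids_monitor(traffic)
print(len(forwarded), len(alerts))  # the IDS leaves the flow untouched
print(len(ips_filter(traffic)))     # the IPS changes what reaches the network
```

The IDS returns the traffic unchanged and leaves the decision to a human analyst, whereas the IPS alters what the downstream system receives.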
First, you want to avoid stopping a system for nothing, especially if it involves passengers. False positives, as they are called, happen when a system flags a threat that doesn’t exist. They are very detrimental to a technology because they can discredit it. By informing cyber analysts of potential malware rather than acting alone as an IPS does, an IDS adds a human filter who knows the railway environment, protecting the solution from the discredit of too many false positives.
Secondly, safety-critical systems go through a lengthy certification process, which doesn’t consider security at all. The consequence of deploying an active solution such as an IPS, which interferes with the data flow, is that all railway operations would need to go through the certification process again. Even more importantly, it means that the safety case would need to integrate security conditions. Because the human mind is ingenious and will come up with infinite ways of attacking a network, risks that by nature cannot be identified in advance cannot be mitigated. Thus, with an active IPS solution, no SIL 4 system would ever be homologated.
Rule-based or Machine Learning IDS for Safety Critical Systems?
Safety-related systems cannot be tampered with, and consequently only an IDS can be implemented. This also means that an IDS solution cannot read the code directly. Viruses must be detected indirectly by analyzing the behavior of the network and of the devices running on it. By flagging abnormal patterns, we can tell that rules have been broken.
Fortunately, safety-related systems follow standards rigorously. In other words, a cybersecurity solution that can read the protocols will work exclusively with pre-labelled data. Since the data and its features are already structured, the need for machine learning techniques (let alone deep learning) is limited, and rule-based systems are much more efficient.
Having said that, a rule-based IDS still misses one key element to fully master the data flow. The safety related network characteristics must be detailed in order to set a baseline, which will form the list of approved connected assets.
This is extremely important, as a network is only as strong as its weakest point. By the way, it’s amazing to see how little importance is given to such a basic protection. Unfortunately, many companies are finding this out the hard way!
This is why a railway cyber specialist like Cylus systematically offers an asset audit as a first step. During this audit, their IDS launches an auto-discovery process. Using pattern learning techniques, their AI cyber solution is able not only to discover all devices and identify their properties, but also to understand how their features behave over time, creating thresholds for setting rules. Once the baseline is parametrized within these thresholds, any deviation is considered abnormal and raises an alert. This means that with such an IDS solution, no database of known malware code is required. This characteristic is essential for safety-critical networks, because the IDS would in any case be prohibited from actively searching for viruses.
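The principle of learning a baseline and alerting on deviations can be sketched generically. This is not a description of how Cylus or any other vendor implements it: the device names, the single numeric measurement per device, and the "mean plus or minus three standard deviations" threshold are all assumptions made for illustration.

```python
from statistics import mean, stdev


def learn_baseline(observations, tolerance=3.0):
    """Learn an allowed value range per discovered device.

    observations: dict mapping device name -> list of measurements
    collected during a (hypothetical) learning phase.
    """
    baseline = {}
    for device, values in observations.items():
        center, spread = mean(values), stdev(values)
        baseline[device] = (center - tolerance * spread, center + tolerance * spread)
    return baseline


def check(baseline, device, value):
    """Return an alert message, or None if the observation fits the baseline."""
    if device not in baseline:
        # Not in the list of approved connected assets.
        return f"ALERT: unknown device {device}"
    low, high = baseline[device]
    if not low <= value <= high:
        return f"ALERT: {device} value {value} outside [{low:.1f}, {high:.1f}]"
    return None


# Invented learning-phase data, e.g. messages per minute sent by each device.
learning_phase = {"interlocking-1": [100, 102, 98, 101, 99]}
baseline = learn_baseline(learning_phase)

print(check(baseline, "interlocking-1", 100))
print(check(baseline, "interlocking-1", 140))
print(check(baseline, "laptop-x", 5))
```

Note that nothing in this sketch looks inside a packet or consults a malware database: the alert is triggered purely by behavior that departs from what was observed while the baseline was being learned.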
AI Is Here to Stay in Railway Cybersecurity
Whether we like it or not, AI is here to stay in railway cybersecurity protection. The “bad guys” are using AI to create ever more potent malware, actively searching for any network’s most vulnerable point to exploit. Therefore, we need to fight fire with fire. Though rule-based systems, a simpler version of AI, are currently the best protection for safety-related systems, it is highly probable that cyber solutions will evolve and integrate machine learning.
One area of research in particular offers very interesting prospects for combining both AI techniques: Rule-based Machine Learning (RBML). RBML applies a learning algorithm to automatically identify useful rules, which can then be hard-coded. Though still in its infancy, RBML has already been applied in railways, in a risk assessment of railway accidents, to automatically identify relevant safety rules from a base of historical scenarios; such rules might otherwise have been difficult to extract from safety experts.