
A cybersecurity management system (CSMS) is a management system that uses a risk-based approach to identify the organisational processes, responsibilities, and controls for handling cybersecurity-related risks and protecting vehicles from cyberattacks. In the automotive industry, as part of their CSMS, original equipment manufacturers (OEMs) are called upon to verify that their measures against reasonably foreseeable vehicle security risks are adequate. Compliance with UN Regulation No. 155 (UN-R155) on the approval of vehicles with regard to their CSMS therefore requires vehicle manufacturers to demonstrate the sufficiency of their measures from two perspectives: the 'verification' and 'validation' of the security measures derived from TARA (Threat Analysis and Risk Assessment).
'Verification and validation' includes vulnerability testing and penetration testing, which are achieved through a combination of various techniques. This article introduces ‘fuzzing’, a technique for detecting out-of-specification behaviour in order to find known or unknown vulnerabilities in software.
Fuzzing (also known as fuzz testing) is a technique for detecting software vulnerabilities by inputting intentionally malformed data, called fuzz, into a product and observing the output results (behaviour). Fuzzing is a technique that has been widely used for decades, and can be effective for detecting vulnerabilities in the following areas.
CWE: Common Weakness Enumeration
Fuzzing for in-vehicle equipment can be viewed as a test to confirm that the target functionality has been implemented correctly, thereby verifying the sufficiency of security measures. The fuzzing process itself is very simple: intentionally malformed data called fuzz is input to the product, and the resulting behaviour is observed in order to judge whether an abnormality has occurred.
DUT: Device under test
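For illustration, the sketch below shows this loop in its most minimal form. The transport (a UDP socket), the seed message, and the DUT address are all placeholder assumptions for the example; in practice the interface would match whatever the target actually exposes.

```python
import random
import socket

SEED_MESSAGE = bytes.fromhex("0210030000000000")  # placeholder: a known-valid request
DUT_ADDR = ("192.168.0.10", 13400)                # placeholder address/port for the DUT

def mutate(seed: bytes) -> bytes:
    """Flip a few random bytes of a valid message to produce fuzz data."""
    data = bytearray(seed)
    for _ in range(random.randint(1, 3)):
        data[random.randrange(len(data))] = random.randrange(256)
    return bytes(data)

def dut_responds(fuzz: bytes) -> bool:
    """Send one fuzz input; treat a missing reply as an anomaly indicator."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(1.0)
        s.sendto(fuzz, DUT_ADDR)
        try:
            s.recvfrom(4096)
            return True
        except socket.timeout:
            return False

for i in range(10_000):
    fuzz = mutate(SEED_MESSAGE)
    if not dut_responds(fuzz):
        print(f"iteration {i}: no response, possible anomaly; fuzz={fuzz.hex()}")
        break
```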
There are several important points to consider when conducting fuzzing, depending on the equipment and software being tested. These points are particularly important because a failure to address them sufficiently can lead to miscommunication between developers and testers, and between OEMs and suppliers. This article describes several matters on which all stakeholders should reach a common understanding or agreement when putting fuzzing into practice.
Depending on the fuzzing targets and methods used, abnormalities may be detected due to the behaviour of protocol stacks or SoC (system-on-a-chip) firmware in whose development and modification the supplier was not involved. When this occurs, it can be difficult for the supplier to analyse the cause or correct the abnormality on its own.
For this reason, fuzzing should generally be performed only on software that the supplier has developed itself and in which the supplier can analyse the causes of any problems and correct them. However, if it becomes necessary for the supplier to perform fuzzing on software that was developed by another party, it is important to reach an agreement in advance with the relevant OEM on the response policy to be used when an anomaly is detected that can be attributed to such software.
Developers should also consider how much information to give to testers before performing fuzzing. As shown in the figure below, tests can be classified as ‘black box’, ‘grey box’ or ‘white box’ based on the amount of information given to the testers.
Fuzzing for in-vehicle equipment generally assumes a ‘black box’ testing method, in which testers have no knowledge of the software implementation. However, while some fuzzing methods involve, for example, the input of random data with no prerequisite knowledge at all, such methods cannot be said to result in efficient testing. Therefore, in order to ensure efficiency, fuzzing is often performed by using a ‘grey box’ method where testers have some understanding of the undisclosed protocols and data structures handled by the software, such as CAN message specifications.
Grey-box testing, by giving testers an understanding of which protocols and data are interpreted by the software behind a given feature, not only improves testing efficiency but also makes it possible for testers to infer likely implementations and weaknesses. It is also effective in demonstrating that fuzzing has confirmed the correct implementation of the target functionality, as explained at the beginning of this article.
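As a simple illustration of the grey-box approach, the sketch below generates fuzz data from a hypothetical CAN message specification using the python-can library on a SocketCAN interface (both assumptions for the example; the message layout, ID, and checksum scheme are likewise invented). The point is that keeping the counter and checksum valid lets the fuzzed field values reach the parsing logic instead of being rejected by the first plausibility check.

```python
import random
import struct
import can  # python-can; assumes a SocketCAN interface is available

BUS = can.interface.Bus(channel="can0", interface="socketcan")

def build_fuzzed_frame() -> can.Message:
    """Generate a frame following a hypothetical message spec:
    bytes 0-1: vehicle speed (unsigned, big-endian), byte 2: alive counter,
    byte 3: checksum over bytes 0-2; remaining bytes reserved (zero)."""
    speed = random.choice([0, 1, 0x7FFF, 0x8000, 0xFFFF])  # boundary-aware values
    counter = random.randrange(16)
    payload = bytearray(struct.pack(">HB", speed, counter))
    payload.append(sum(payload) & 0xFF)  # hypothetical additive checksum
    payload += b"\x00" * 4
    return can.Message(arbitration_id=0x123, data=bytes(payload), is_extended_id=False)

BUS.send(build_fuzzed_frame())
```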
Some in-vehicle devices operate independently, but most are connected to networks such as the CAN bus and operate by communicating with other ECUs (electronic control units) and sensors. Depending on their specifications, some ECUs may therefore not operate correctly unless these peripheral devices are present and functioning. For this reason, when conducting tests, it may be necessary to prepare an environment in which the ECU can operate correctly. Although the specific environments available will depend on the development phase, test environments can be broadly categorised into the following three types.
So, which of these environment types is the most appropriate for fuzzing? Testing in the actual vehicle environment, as shown in the figure above, is unlikely to be possible, as such testing can generally only be performed by OEMs. Depending on the hardware to be connected, a test bed environment can limit the mobility of the test setup, including where testing can take place and how the setup can be moved. Because fuzzing tests the software of the target device by directly inputting data to the device and observing the resulting output, the physical placement of connection harnesses and of the target devices themselves can hinder testing in an actual vehicle or in any environment involving physical connections; in such environments it is also not easy to automatically judge abnormalities from the outward movements of an actuator. Therefore, unless it is clear that the target ECU will not operate properly without physical equipment, fuzzing can be adequately performed in a test environment that reproduces the peripheral environment using a simulator.
Such a simulation environment makes it easy to create a state in which the system, including peripheral devices, meets the conditions required for operation, and the environment can be shared via configuration files and other means. However, if the tester does not have a simulator, it is necessary to consider in advance whether one, including any required hardware, can be provided.
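As a rough sketch of such a setup, the following example (again assuming python-can and a hypothetical configuration file format) cyclically broadcasts the messages of simulated peripheral ECUs so that the DUT remains in its normal operating state during fuzzing.

```python
import json
import can  # python-can, as above

# Peripheral traffic defined in a shareable configuration file, e.g.:
# [{"id": "0x1A0", "data": "00FF00FF00000000", "period_s": 0.02}, ...]
with open("peripheral_sim.json") as f:
    frames = json.load(f)

bus = can.interface.Bus(channel="can0", interface="socketcan")
tasks = []
for entry in frames:
    msg = can.Message(arbitration_id=int(entry["id"], 16),
                      data=bytes.fromhex(entry["data"]),
                      is_extended_id=False)
    # Broadcast each simulated peripheral message cyclically so the DUT
    # sees the bus traffic it needs to stay operational.
    tasks.append(bus.send_periodic(msg, entry["period_s"]))
```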
One key point when conducting fuzzing is the method to be used to monitor equipment anomalies (alive monitoring). Because fuzzing determines abnormalities based on the behaviour of the target device, it is important to select a monitoring method that matches the specifications of the product. For example, when fuzzing is performed on a control ECU with a CAN interface, the following events can be considered factors for judging an abnormality.
If the target ECU is an actuator or another device that has some control functionality, you can also consider a method that monitors the signal lines for that functionality. In this case, you may need a separate measurement device (e.g. an oscilloscope) suited for the target.
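As one possible illustration of alive monitoring on a CAN interface, the sketch below treats the disappearance of a periodic message from the DUT, or the appearance of error frames, as an anomaly indicator. The message ID and timeout are assumptions for the example.

```python
import can  # python-can

HEARTBEAT_ID = 0x321   # hypothetical: a message the DUT transmits cyclically
TIMEOUT_S = 0.5        # several times the nominal cycle time

def dut_alive(bus: can.BusABC) -> bool:
    """Alive monitoring: the DUT is considered abnormal if its periodic
    message stops arriving, or if an error frame is observed."""
    while True:
        msg = bus.recv(timeout=TIMEOUT_S)
        if msg is None:
            return False                      # periodic message stopped
        if msg.is_error_frame:
            return False                      # bus error symptoms
        if msg.arbitration_id == HEARTBEAT_ID:
            return True                       # expected cyclic frame seen
```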
It is also important to keep in mind that the alive monitoring method must be appropriate to the layer at which the target operates. As an extreme example, if you use ICMP Echo for alive monitoring when fuzzing an application that uses TCP/IP, you will be unable to correctly detect abnormalities in the target application (such as the abnormal termination of a process), because ICMP is implemented by another software program running at a lower layer than the application.
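A liveness check at the appropriate layer might instead look like the following sketch, which probes the application's own TCP port: if the process has terminated abnormally, the connection attempt fails, whereas an ICMP ping would still succeed.

```python
import socket

def app_alive(host: str, port: int, timeout: float = 1.0) -> bool:
    """Check liveness at the layer where the target actually runs: a TCP
    connect (or, better, a real request/response exchange) to the
    application's port fails when the process has died, even though the
    lower network stack would still answer an ICMP Echo."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```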
You can also conduct more efficient testing by considering the operations to be performed when an abnormality occurs. Normally, when a tool detects an abnormality, it immediately ceases operation, because, regardless of the cause, the software is most likely no longer in a normal state at the time the anomaly is identified. To resume testing, operations such as restarting equipment or processes need to be performed. Depending on the target, it may be possible to automate such operations, further reducing the number of work hours needed for fuzzing. If the operations performed on detection of an abnormality can be automated, the tool can execute tests on its own, which is particularly effective for tests that require a great deal of time simply to input data, such as those involving large amounts of fuzz data.
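What such automation might look like is sketched below. The relay-ctl command used to power-cycle the DUT is purely hypothetical; the reset hook should be replaced with whatever mechanism the test bench actually offers, such as a programmable power supply or a debug reset line.

```python
import subprocess
import time

def power_cycle_dut() -> None:
    """Hypothetical reset hook: toggles a relay board through a vendor CLI.
    Replace with the bench's actual reset mechanism."""
    subprocess.run(["relay-ctl", "off"], check=True)  # hypothetical command
    time.sleep(2.0)
    subprocess.run(["relay-ctl", "on"], check=True)
    time.sleep(10.0)  # allow the ECU to finish booting before resuming

def run_campaign(test_cases, execute_and_check, log_anomaly) -> None:
    """Keep fuzzing unattended: on each detected anomaly, record it,
    recover the DUT automatically, and continue with the remaining cases."""
    for case in test_cases:
        if not execute_and_check(case):
            log_anomaly(case)
            power_cycle_dut()
```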
Some of the values set for fuzzing may overlap with those used for boundary value analysis in unit and integration testing. However, because fuzzing is intended to detect abnormal software behaviour that was not intended by the developer, the basic concept behind choosing these values differs from that of boundary value analysis. For example, in equivalence class partitioning, on which boundary value analysis is based, changes in the output of the software are used as the indicator, whereas in fuzzing, values are chosen with regard to how the software interprets the data (such as a value just before or after an integer overflow, or a format string). Thus, while the same data can sometimes be used for both boundary value analysis and fuzzing, the two testing methods are fundamentally different.
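The following illustrative values make the distinction concrete. The boundary values follow from a hypothetical functional specification, while the fuzz values follow from how software typically interprets data.

```python
# Boundary value analysis: values around a specified limit, derived from
# how the output is specified (e.g. a speed field valid up to 250).
boundary_values = [249, 250, 251]

# Fuzzing: values derived from how the software interprets the data,
# independently of the functional specification.
fuzz_values = [
    0x00, 0xFF,                      # extremes of a one-byte field
    0x7FFF_FFFF, 0x8000_0000,        # around a signed 32-bit overflow
    b"%s%s%s%n",                     # format-string metacharacters
    b"A" * 4096,                     # oversized input for length handling
]
```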
As mentioned above, fuzzing is performed by considering the data structure and how the software interprets the data. However, because fuzzing can generally only detect abnormalities at the availability level, it cannot be used to determine the possibility of exploitation of such abnormalities, such as arbitrary code execution or the leakage of confidential information. This is because fuzzing judges anomalies based only on the output obtained in response to a given input.
For example, if a given set of fuzz data causes a process to terminate forcibly, this is judged to be an abnormality based on the surface behaviour of the equipment, i.e. the fact that the process has terminated and no response can be obtained from the corresponding function. However, such an event has multiple possible causes (for example, insufficient resources, memory corruption, or an out-of-specification termination condition due to a design error), and the specific cause cannot be determined from the surface behaviour alone.
Therefore, in order to investigate the cause of the abnormal behaviour, triage the results, and determine whether the abnormality could enable a high-risk attack (such as arbitrary code execution), additional information, such as debugging information from the device, is needed for further analysis.
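As one example of gathering such information, on a Linux-based DUT with debug access (an assumption that does not hold for many deeply embedded targets, where a JTAG/SWD debugger or a core dump would take this role), a backtrace of the affected process could be captured roughly as follows.

```python
import subprocess

def capture_backtrace(pid: int, out_path: str) -> None:
    """Attach gdb in batch mode to record a backtrace of the crashed or
    hung process for later triage by the developer."""
    result = subprocess.run(
        ["gdb", "-batch", "-ex", "bt", "-p", str(pid)],
        capture_output=True, text=True,
    )
    with open(out_path, "w") as f:
        f.write(result.stdout)
```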
Such root-cause analysis is performed by the equipment developer, who can check the source code to identify the location of the root cause. It is therefore important for testers to confirm the reproducibility of abnormalities detected by fuzzing in order to facilitate this post-test analysis. However, if the abnormality stems from a problem that occurs only at certain times or after a certain amount of time has passed, such as a memory leak or a race condition, it may not occur with particular fuzz data and may not be reliably reproducible. When the fuzz data that caused the abnormality cannot be immediately identified, it can sometimes be more efficient for the developer and tester to work together to establish the reproducibility and cause of the problem.
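A simple reproducibility check might replay the logged fuzz input several times, as in the sketch below; a recurrence rate below 100% hints at a timing- or state-dependent cause rather than the single input itself.

```python
def reproducibility(send_and_check, fuzz_record: bytes, trials: int = 10) -> float:
    """Replay a logged fuzz input repeatedly and report how often the
    anomaly recurs. send_and_check is the campaign's own test function:
    it inputs the data, runs alive monitoring, and returns True if the
    DUT behaved normally."""
    failures = sum(1 for _ in range(trials) if not send_and_check(fuzz_record))
    return failures / trials
```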
Fuzzing is one of the most common techniques used to detect software bugs and vulnerabilities. However, to demonstrate the sufficiency of security measures, it is important to reach a common understanding and agree upon an implementation policy with the relevant OEM and other stakeholders in the supply chain, in order to ensure both the quality of test results and accountability. The following are some of the key points that OEMs, suppliers, developers, and testers should be commonly aware of and, if necessary, reach agreement on in advance to ensure the efficiency and effectiveness of fuzzing.