May 24, 2017
High-performance computing (HPC) has opened the door to many innovations in research, academia and businesses. Yet, it has also introduced new cybersecurity challenges, as the computational power has attracted cybercriminals, as well as malicious insiders:
- Cybercriminals have compromised HPC clusters to leverage their extensive computational power by running password-cracking tools in an attempt to decrypt stolen password databases.
- Other cybercriminals have targeted HPC systems to leverage the high bandwidth connections of such systems to launch distributed denial of service (DDoS) attacks.
- Malevolent users have been tempted to misuse computing cycles for their own personal gain, like the U.S. researcher who was caught mining for bitcoins on National Science Foundation HPC systems back in 2014.
- Other insiders have altered data in HPC systems to sabotage the research.
These are not the only security challenges facing HPC systems, as concerns for data integrity and privacy are growing in open science and highly distributed HPC environments. Many questions surround HPC security:
- How can we validate that the hundreds and even thousands of authorized users are appropriately using the HPC resources?
- How can we confirm that data used as the basis for research and analytical work has not been tampered with?
- How can we then trust the end result?
HPC systems have raised many legitimate questions about their security maturity, and many have tried to tackle this security conundrum by applying traditional IT security solutions. But traditional security solutions cannot meet all expectations for cluster security without hindering performance or even blocking legitimate and necessary data transfers.
HPC systems are built as a collection of geographically distributed resources, where open international collaboration is more often than not the norm. Such systems provide resource management and allocation, so that end users can perform mathematical computations and analytics. It is quite challenging to coordinate security across different node platforms and specialized function nodes.
Moreover, some HPC clusters run exotic hardware and software stacks with dispersed ownership, which makes it difficult to track potential vulnerabilities across all these systems. Cluster security is an evolving concept. A single compromised node increases the risk to the rest of the cluster, because many nodes share identical configurations. To secure HPC systems, we must secure the data throughout its lifecycle, and a number of solutions have been developed and implemented specifically for HPC environments.
HPC Secure Architectural Framework
Science DMZ is a security model implemented for HPC systems to optimize network throughput by isolating the scientific HPC systems in a dedicated network enclave. This architecture facilitates the monitoring of access and network flows, as well as access restriction, through router access control lists, Security-Enhanced Linux (SELinux), iptables, etc. A Medical Sciences DMZ is also used, leveraging the Science DMZ framework for computing environments requiring compliance with the Health Insurance Portability and Accountability Act (HIPAA). In such architectures, storage and compute nodes are not connected directly to the internet, and network flows containing sensitive data are encrypted.
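As a rough illustration of the access-restriction idea, the sketch below mimics a default-deny access control list for an enclave. It is not taken from any Science DMZ deployment: the `ALLOWED_FLOWS` table, the port numbers and the function name are hypothetical, and the addresses come from ranges reserved for documentation.

```python
import ipaddress

# Hypothetical allow-list of (source network, destination port) pairs that may
# reach the enclave. All values are placeholders chosen for illustration.
ALLOWED_FLOWS = [
    (ipaddress.ip_network("192.0.2.0/24"), 2811),    # partner site, transfer service
    (ipaddress.ip_network("198.51.100.0/24"), 22),   # collaborator site, SSH
]


def is_flow_allowed(src_ip: str, dst_port: int) -> bool:
    """Return True only if the flow matches an allow-list entry (default deny)."""
    src = ipaddress.ip_address(src_ip)
    return any(src in net and dst_port == port for net, port in ALLOWED_FLOWS)


if __name__ == "__main__":
    print(is_flow_allowed("192.0.2.10", 2811))   # source and port both listed
    print(is_flow_allowed("203.0.113.5", 22))    # source network not listed
```

In practice such rules would live in router ACLs or host firewalls rather than in application code; the point here is only the default-deny matching logic.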
User Authentication and Authorization
It is necessary to implement access control aligned with the least-privilege principle, creating individual user accounts with specific permissions across the HPC environment. Seagate Secure Data Appliance, or tools such as SELinux and Kerberos, provide such access controls. Implementation of additional access control solutions, such as multi-factor authentication (MFA), will depend on the HPC environment and usage. In an open environment involving thousands of international users, MFA is quite difficult to deploy. In addition, HPC users will need longer-lived certificates, as they run mathematical calculations that can take months to complete.
Cloud HPC clusters will require an adapted end-to-end encryption solution that allows decryption only at job execution and is generic enough to be usable on multiple sites. One solution would be to use encryption only when the payload or the privilege level of the operation requires it, instead of encrypting all traffic, thus limiting the impact on HPC performance. Homomorphic encryption is also being considered and tested by a number of European countries to secure data in HPC clusters. With homomorphic encryption, data remains encrypted throughout storage and computation, and applications work on the data without decrypting it. Today, this solution remains very processor intensive and, therefore, expensive. However, it remains promising.
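To make the idea of computing on encrypted data concrete, here is a toy sketch of the Paillier cryptosystem, which is only additively homomorphic: multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts. It is not the scheme the projects mentioned above are testing (the article does not say which they use), the primes are far too small for real use, and all function names are illustrative.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Build a toy Paillier key from two small primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, lam, mu


def encrypt(n: int, m: int) -> int:
    """Encrypt integer m (0 <= m < n) under public modulus n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random blinding factor in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, lam: int, mu: int, c: int) -> int:
    """Recover the plaintext from ciphertext c using the private values."""
    n2 = n * n
    l_value = (pow(c, lam, n2) - 1) // n
    return (l_value * mu) % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Combine two ciphertexts so the result decrypts to the sum of plaintexts."""
    return (c1 * c2) % (n * n)


if __name__ == "__main__":
    n, lam, mu = keygen(1009, 1013)
    total = add_encrypted(n, encrypt(n, 1200), encrypt(n, 345))
    print(decrypt(n, lam, mu, total))  # the sum, computed without decrypting inputs
```

Even in this toy form, every operation is a modular exponentiation over numbers of size n squared, which hints at why the approach is processor intensive at realistic key sizes.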
HPC Systems Security Monitoring
Monitoring HPC clusters is essential to validate that the data traffic is legitimate and to detect any attempts to misuse the system. However, traditional security monitoring solutions cannot ingest the high volume of data flowing through HPC environments. They will either drop data, and so fail to inspect all data flows, or degrade the overall performance of the HPC network.
Although, today, we do not have commercially ready intrusion detection systems (IDS) and intrusion prevention systems (IPS) for HPC, interesting proofs of concept have been conducted by Lawrence Berkeley National Laboratory (LBNL) and the National Energy Research Scientific Computing Center (NERSC); they have deployed an “HPC friendly” IDS solution based on the open-source Bro IDS technology. To facilitate the ingestion of data from very large network feeds, they distributed the data at the packet processing layer, creating an intelligent load-balancing front end that divides the load across a series of worker nodes so that the data can be continuously analyzed in smaller streams.
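One common way to split a large feed across analysis workers is to hash each connection's identifying fields so that all packets of a flow reach the same worker. The sketch below illustrates that general technique only; it is an assumption, not a description of the LBNL/NERSC front end, and the function names are hypothetical.

```python
import hashlib


def flow_key(src_ip: str, src_port: int, dst_ip: str, dst_port: int, proto: str) -> str:
    """Build a direction-independent key so both halves of a connection match."""
    first, second = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return f"{proto}|{first[0]}:{first[1]}|{second[0]}:{second[1]}"


def pick_worker(key: str, num_workers: int) -> int:
    """Map a flow key to a worker index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


if __name__ == "__main__":
    forward = flow_key("192.0.2.10", 40000, "198.51.100.7", 443, "tcp")
    reverse = flow_key("198.51.100.7", 443, "192.0.2.10", 40000, "tcp")
    # Both directions land on the same worker, so it sees the whole conversation.
    print(pick_worker(forward, 8), pick_worker(reverse, 8))
```

Keeping a flow on a single worker matters because intrusion detection logic typically needs to see both directions of a connection to reconstruct it.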
HPC Systems Process Monitoring and Behavior Analytics
Behavior analytics and process monitoring are essential tools to detect, early on, any intrusions or attempts at misuse. Because most users use the resources for specific research needs, activity on HPC clusters tends to be regular and recurring, so deviations from the usual pattern stand out and are easier to detect.
Just as companies are using HPC systems for fraud detection, it is important to run similar analytics on the HPC systems themselves to detect unusual activity. Unusual activities could include:
- increased network latency;
- increasing CPU usage;
- unauthorized jobs bypassing the job scheduler;
- a large number of processes running on a node; and
- source data being rewritten.
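Indicators such as CPU usage or process counts can be compared against a node's own history. The following is a minimal sketch of that idea using a simple standard-deviation test; the threshold, the sample values and the function name are assumptions made for illustration, and a production system would likely use richer models.

```python
import statistics


def is_anomalous(history: list, current: float, threshold: float = 3.0) -> bool:
    """Flag `current` if it deviates from the mean of `history` by more than
    `threshold` sample standard deviations."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return current != mean
    return abs(current - mean) / stdev > threshold


if __name__ == "__main__":
    # Hypothetical CPU usage samples (percent) for one node during routine jobs.
    cpu_history = [40, 42, 41, 39, 43, 40, 41]
    print(is_anomalous(cpu_history, 42))  # close to the usual range
    print(is_anomalous(cpu_history, 95))  # far outside the usual range
```

The same check could be applied per node to process counts or network latency, with a flagged value feeding into the correlation step rather than triggering an alert on its own.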
One key principle of traditional enterprise security, which should also apply to HPC environments, is to build an integrated and intelligence-driven security framework. The output of HPC cybersecurity controls, from network analysis to process monitoring to user activity, must be analyzed and correlated to gain near-real-time visibility into the security risks and potential intrusions on the system.
Given the fast pace of technological change within HPC environments, cybersecurity solutions will need to be co-developed with this community to adapt and to put security-by-design principles into action by integrating security into the hardware and software stacks. HPC environments foster high-performance, collaborative operational models and require a security framework that balances performance and security controls. Therefore, security solutions will need to evolve and appropriately manage the crushing volume of data generated by HPC clusters to efficiently and transparently reinforce the security controls.
Zeina Zakhour is global chief technology officer for Cyber Security at Atos (Atos.net), creating by day (and a few nights) innovative solutions to be a step ahead of cybercriminals. Not an easy task, but she is putting her 16 years of experience in the cybersecurity field to good use. Zeina covers the spectrum of cybersecurity, from security advisory, to integration, managed security services and IoT and big data security.