Industrial Control Systems
Purpose
Help readers better understand ICS networks and offer ideas for protecting them from cyber attacks.
Discuss common weaknesses and vulnerabilities, along with cyber risks for ICS.
Explain the importance of knowing the networks that need to be protected.
Discuss mitigation strategies and defense in depth for a more secure ICS environment.
Basic Concepts
IT
IT refers to anything related to computing technology.
IT Infrastructure Components
The most common IT infrastructure components are: Switches and routers, Firewalls, Remote access, Databases, Clients, Local Area Network (LAN) / Wide Area Network (WAN), Servers, Wireless access.
Switches and Routers
Switches are used when connecting computers, printers, databases, and other networking equipment. To optimize communications and make sure data are going where they need to go, switches provide a high level of control and efficiency within the network. Switches can be used to isolate communications between specific devices and are configured to regulate network traffic, ensuring the network doesn’t get congested by too much information. If they’re not available, network data just don’t flow.
Switches generally come in two different types: Managed switches and unmanaged switches.
Managed switches are fully configurable. They provide tremendous flexibility and usually more capacity than unmanaged switches. They allow the administrator granular control over the network. Management can be done locally or remotely.
Unmanaged switches are switches that you buy, take out of the box, and put on the network. There is no requirement to configure them; in fact, they are often designed so they cannot be configured. An unmanaged switch is simple to use, such as the switch built into the router that your Internet Service Provider may provide for your home network.
Routers act as dispatchers, choosing the best path for information to travel so it is received quickly. While switches connect components within a network, routers connect networks together. Routers determine the best path between networks and maintain routing information so that those paths stay current and accurate.
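As a rough illustration of how a router picks a path, the sketch below performs a longest-prefix-match lookup against a small routing table. The networks, next-hop labels, and the next_hop function are hypothetical and invented for this example; real routers also weigh metrics and routing-protocol updates.

```python
import ipaddress

# Hypothetical routing table: (destination network, next-hop label).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "isp-gateway"),       # default route
    (ipaddress.ip_network("10.0.0.0/8"), "corporate-core"),
    (ipaddress.ip_network("10.20.0.0/16"), "plant-router"),
]


def next_hop(destination: str) -> str:
    """Return the next hop of the most specific route that contains destination."""
    address = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if address in net]
    # The most specific route is the one with the longest prefix length.
    _, best_hop = max(matches, key=lambda item: item[0].prefixlen)
    return best_hop


print(next_hop("10.20.5.7"))
print(next_hop("8.8.8.8"))
```

Because the table includes a default route (0.0.0.0/0), every IPv4 destination matches at least one entry.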
LAN/WAN
A Local Area Network (LAN) supplies networking capability to a group of computers in close proximity, such as in an office building, a school, or a home. A LAN is useful for sharing resources such as files, printers, games, or other applications. A LAN often connects to other LANs, to the Internet, or a wide area network (WAN).
A WAN is a geographically dispersed telecommunications network. The term distinguishes a broader telecommunication structure from a LAN. A WAN may be privately owned or rented, but the term usually includes a connection with, or through, public (shared user) networks. An intermediate form of network in terms of geography is a metropolitan area network (MAN).
From a cybersecurity perspective, most organizations deploy their information infrastructures in a manner that implies trust between all assets connected to it. In a LAN environment, the trust relationship is easier to control because most, if not all, the assets connected to a LAN are managed internally. Securing a WAN is much more complex and requires cross-domain trust, authentication, and management.
Remote Access
Remote access allows a user to connect to a network or system as though they were physically located at the console.
Remote access extends the network outside of physical and network perimeters and allows access from anywhere. Organizations provide remote access services to support telecommuters, remote management and support, and vendor support access.
Remote access components can include modems, remote access servers, virtual links, or any capability that facilitates non-local user access into the IT infrastructure.
Firewall
A firewall is a network security system that controls incoming and outgoing traffic based on an applied rule set. A firewall establishes a barrier between a trusted secure network and another network (e.g., the Internet) that is assumed to be insecure and untrusted.
Administrators use access control lists (ACLs) to create rule sets to ensure only authorized communications occur between networks. Firewalls can be simple or complicated, but almost all have the capability to be actively managed by an administrator. Even the firewall you use to protect your home network has that capability.
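The rule-set idea can be sketched as a first-match ACL evaluation with a default deny. This is a simplified model, not any vendor's firewall syntax; the Rule fields, the addresses, and the evaluate function are assumptions made for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str            # "allow" or "deny"
    source: str            # source network in CIDR notation
    destination: str       # destination network in CIDR notation
    port: Optional[int]    # destination port, or None for any port


# Hypothetical rule set: only the business LAN may reach one server over HTTPS.
RULES = [
    Rule("allow", "10.10.0.0/24", "192.168.50.10/32", 443),
    Rule("deny", "0.0.0.0/0", "192.168.50.0/24", None),
]


def evaluate(rules, src: str, dst: str, port: int) -> str:
    """Return the action of the first matching rule; deny if nothing matches."""
    for rule in rules:
        if (ipaddress.ip_address(src) in ipaddress.ip_network(rule.source)
                and ipaddress.ip_address(dst) in ipaddress.ip_network(rule.destination)
                and rule.port in (None, port)):
            return rule.action
    return "deny"  # default deny: traffic that matches no rule is blocked


print(evaluate(RULES, "10.10.0.15", "192.168.50.10", 443))
print(evaluate(RULES, "10.10.0.15", "192.168.50.10", 502))
```

Rule order matters in a first-match model: placing the broad deny rule first would block the traffic that the allow rule is meant to permit.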
Databases
Businesses demand that users have access to vast stores of information: historical information as well as up-to-the-minute data that can influence current and future business decisions. Having timely access to data, either historical or recent, is vital for ensuring optimum performance.
Databases store information needed for operations, such as customer lists, marketing and sales data, accounts payable and receivable, payroll, and order tracking. Many business decisions are made, and operations function, based on the numbers or values that reside in databases. The networking component of the IT architecture provides communications between databases. The client queries the database for information and uses that information locally, then sends updated information back to the database.
Databases are often configured to reside on servers so clients from across the organization can request data, often in user-customized formats and configurations. Large, centralized databases may exchange information with peripheral secondary databases located across the business domain.
ICS databases hold critical information used to ensure proper set points and functions on devices, or to gather monitoring information used to determine system state. They can include time-stamped data, events, or alarms that are queried or used to populate graphic trends in the human-machine interface (HMI). ICS database security is an important consideration when designing overall defense strategies.
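To make the idea of querying time-stamped data for an HMI trend concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table layout, tag names, and the trend function are assumptions for illustration and do not reflect any particular historian product.

```python
import sqlite3

# In-memory stand-in for a historian database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (ts TEXT NOT NULL, tag TEXT NOT NULL, value REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("2024-01-01T00:00:00", "TANK1.LEVEL", 41.5),
        ("2024-01-01T00:01:00", "TANK1.LEVEL", 42.0),
        ("2024-01-01T00:01:00", "PUMP1.SPEED", 1200.0),
        ("2024-01-01T00:02:00", "TANK1.LEVEL", 42.7),
    ],
)


def trend(connection, tag: str, start: str, end: str):
    """Return (timestamp, value) rows for one tag within a time window."""
    # Parameterized placeholders keep user-supplied values out of the SQL text.
    return connection.execute(
        "SELECT ts, value FROM readings "
        "WHERE tag = ? AND ts BETWEEN ? AND ? ORDER BY ts",
        (tag, start, end),
    ).fetchall()


for row in trend(conn, "TANK1.LEVEL", "2024-01-01T00:01:00", "2024-01-01T00:02:00"):
    print(row)
```

ISO 8601 timestamps stored as text sort in chronological order, which is why the BETWEEN comparison works here.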
Wireless Access
Wireless access allows for the information infrastructure to be expanded quickly and effectively without having to lay network cable, drill holes, or adjust ceiling conduits. Almost every modern IT device has the capability to use wireless, and fewer organizations are building hard-wired networking infrastructures; their business architectures are being built primarily using wireless networking. ICS operators and vendors also use wireless communications to manage, monitor, and control their ICS devices. Many control systems include built-in wireless capabilities.
Wireless access points often function as routers or bridges into the wired network and require specific attention regarding security and reliability. Wireless access points are an attractive target for the cyber adversary, allowing direct access into the infrastructure or devices connected to it. The availability of wireless access points is critical to the effectiveness of the network. If they are rendered inoperable, no one can connect to the network without being on the wire.
Historically, the communications protocols used by wireless systems were not very secure; however, current technology provides the capability to securely deploy and manage these devices if the organization enables it. Sometimes the overhead of applying and managing secure wireless configurations is considered too much for an organization to bear so it employs useful but insecure access points.
Because of the criticality of the devices in an ICS, and the potentially dire consequences of a compromise, ICS wireless architectures should be carefully configured with as many cybersecurity controls applied as possible while still allowing required functionality.
Servers
Web pages, mail, customer service portals, and other information services usually reside on dedicated network hosts called servers (or sometimes, application servers). Clients connect to servers to perform tasks or use services that are common to a group. Servers are the workhorses in the IT infrastructure—they “serve” the applications, databases, stored information, and services to the clients on a network. Because organizations often maintain large information stores on dedicated servers, they typically have a large amount of random access memory (RAM) and storage space.
Modern IT environments may have servers located in any number of physical locations. Connectivity between the servers and clients is supported through the networking infrastructure. As organizations continue to grow and more information needs to be made available to more users, businesses add more servers—and the security footprint of the business grows along with it.
The security of these assets is important because if they are unavailable or the information they contain is corrupt, users and applications cannot work properly. Security protection profiles can vary from server to server, depending on the information stored or processed on the server and the requirements of the business. The protection of the information stored or transmitted by servers, and controlling access to them are vital components of an organization’s cybersecurity strategy.
Clients
Clients are the information resources, such as personal computers, laptops, or smartphones that provide an interface for users to view and manipulate digital information.
Clients are the most common interface between human users and information. Clients often depend on information that resides both locally and on servers that could be elsewhere.
Local applications run on the client, and help us process information or connect to other devices in our networking environment.
In the control system domain, clients are often called HMIs: computers used to control and manage processes in critical infrastructure sectors such as energy, water, and transportation systems.
Client-Server Relationship
The relationship between clients and servers can be confusing because both can also be called a host. The table below helps to clarify this relationship.
Host | Server | Client |
---|---|---|
Always a physical node | Can be a physical node or a software program | Can be a physical node or a software program |
Can run both server and client programs | Installed on a host | Installed on a host |
Provides specific services | Provides specific services to clients | Accesses specific services available from the server |
Serves multiple users and devices | Serves only clients | Stand-alone or part of a client-server network |
UPS Battery Backup
Many data center and control system environments have back-up power on standby. This back-up power supply ranges from a simple off-the-shelf uninterruptible power supply (UPS) under a desk to larger units found in server racks, taking up the same space as a 4U server.
Other back-up power solutions involve a building being equipped to handle its functions, such as a different room with a wall of batteries, or generators ready to kick on.
Virtualization
HMI workstations, historians, databases, and anything else that runs a standard operating system on a workstation or server platform can be virtualized.
IT Cybersecurity Tenets
The International Organization for Standardization (ISO) defines information security as the preservation of confidentiality, integrity, and availability of information. These three tenets are used to select the security controls placed on a system, and help asset owners determine priorities for protecting their critical information and systems.
Confidentiality is defined by the ISO as ensuring that information is accessible only to those authorized to have access.
For example, a credit card transaction on the Internet requires the credit card number to be transmitted from the buyer to the merchant and from the merchant to a transaction processing network. The system attempts to enforce confidentiality by encrypting the card number during transmission, by limiting the places where it might appear, and by restricting access to the places where it is stored. Confidentiality is necessary for maintaining the privacy of the cardholder’s personal information held in the system.
Integrity is maintaining and ensuring the accuracy and consistency of data over its entire life cycle. All characteristics of the data, including business rules, dates, definitions, lineage, and rules for how pieces of data relate must be correct for data to be complete.
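One common way to detect whether data has changed is to compare a cryptographic digest against a previously recorded baseline. The sketch below uses SHA-256 from Python's standard library; the function names and the sample setpoint string are hypothetical. A digest check only helps if the baseline digest itself is stored where an attacker cannot alter it.

```python
import hashlib
import hmac


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, expected_digest: str) -> bool:
    """Report whether data still matches the digest recorded earlier."""
    # compare_digest performs the comparison in constant time.
    return hmac.compare_digest(digest(data), expected_digest)


baseline = digest(b"setpoint=75.0")          # recorded when the data was known good
print(is_unchanged(b"setpoint=75.0", baseline))
print(is_unchanged(b"setpoint=95.0", baseline))
```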
Availability is the proportion of time a system is in a functioning condition. For any information system to serve its purpose, the information must be available when it is needed.
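Expressed as arithmetic, availability is uptime divided by total time. The short sketch below works through one example; the function name and the sample figures (4 hours of downtime in an 8,760-hour year) are illustrative assumptions.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Return the fraction of total time the system was functioning."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total


# Example: 4 hours of downtime over a 365-day year (8,760 hours).
yearly = availability(uptime_hours=8756, downtime_hours=4)
print(f"{yearly:.4%}")
```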
General IT Security
Security controls are the mechanisms used to mitigate vulnerabilities. An example of a security control is patching.
ISO/IEC 27002, the international standard for information security controls, outlines a large catalog of potential controls and control mechanisms.
SANS provides a set of security policy templates that can be used to define policies.
Security Policy
The IT/ICS security policy document should reinforce management’s commitment to information security and contain the following items:
A definition of information security and its importance to the organization.
The intent of the policy regarding goals and principles of information security in conjunction with business strategy and objectives.
The structure of risk assessment and risk management as a framework for establishing controls.
Essential security policies, principles, standards, and compliance requirements, including the Rules of Behavior expected for all computing users.
Specific security roles, responsibilities, accountabilities, and authorities (R2A2s) for IT/ICS security management and implementation, including reporting IT/ICS security incidents.
References to other policies and procedures that identify detailed security processes that everyone in the organization is expected to follow.
Access Control
An access control policy defining user or group rules and rights should be clearly stated. Access controls for both logical and physical assets should be considered together. The policy should consider:
The security requirements of each application/system.
The identification of all information related to the application or system and risks associated with access to the information.
The implementation of Least Privilege concepts for access to systems and applications.
Standard user profiles for common job roles.
The segregation of access control roles, such as access requests, access authorizations, and access administration.
Formal procedures for access requests and approvals.
The removal of access rights, including requirements for notifying administrators when individuals are transferred, their roles or access authorizations change, or they are terminated.
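Several of the considerations above (least privilege, standard profiles per job role, and removal of access rights) can be sketched as a simple role-based check. The role names, permission strings, and user names below are hypothetical and chosen only for illustration.

```python
# Standard profiles: each job role gets only the permissions it needs.
ROLE_PERMISSIONS = {
    "operator": {"hmi:view", "hmi:acknowledge_alarm"},
    "engineer": {"hmi:view", "hmi:acknowledge_alarm", "plc:change_setpoint"},
    "auditor": {"logs:read"},
}

# Hypothetical user-to-role assignments.
USER_ROLES = {"alice": "engineer", "bob": "operator"}


def is_allowed(user: str, permission: str) -> bool:
    """Allow an action only if the user's role explicitly grants it."""
    role = USER_ROLES.get(user)
    return permission in ROLE_PERMISSIONS.get(role, set())


def revoke_access(user: str) -> None:
    """Remove all access when a user transfers or is terminated."""
    USER_ROLES.pop(user, None)


print(is_allowed("bob", "plc:change_setpoint"))
print(is_allowed("alice", "plc:change_setpoint"))
revoke_access("alice")
print(is_allowed("alice", "plc:change_setpoint"))
```

Unknown users and unknown roles fall through to an empty permission set, so the default outcome is denial.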
Asset Management
IT/ICS assets include:
Information such as databases, systems and research information, logs, operational or support procedures, continuity plans, failure and recovery procedures, and archives.
Software assets such as application software, system software, development tools, and utilities.
Physical assets such as computer equipment, communications equipment, removable media, and test and analysis equipment.
Services such as computing and communications services, general utilities such as air conditioning, fire protection, surveillance services, and other services.
People and their capabilities, skills, and experience; in addition to the reputation and image of the organization.
All IT/ICS assets must be clearly identified, inventoried, and maintained. The ownership of assets must be clearly identified, and the asset owner should be responsible for ensuring that IT/ICS assets are appropriately classified based on risk and are periodically reviewed for access restrictions and classification.
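An inventory that records ownership, classification, and review dates might look like the following sketch. The field names, the sample assets, and the one-year review interval are assumptions made for illustration, not requirements from any standard.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Asset:
    name: str
    category: str          # e.g. "information", "software", "physical", "service"
    owner: str             # person or team accountable for the asset
    classification: str    # risk-based label, e.g. "critical" or "routine"
    last_reviewed: date


def overdue_for_review(assets, today: date, max_age_days: int = 365):
    """Return names of assets whose last review is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [asset.name for asset in assets if asset.last_reviewed < cutoff]


INVENTORY = [
    Asset("historian-db", "information", "ops-team", "critical", date(2023, 1, 10)),
    Asset("hmi-station-1", "physical", "control-room", "critical", date(2024, 3, 1)),
]

print(overdue_for_review(INVENTORY, today=date(2024, 6, 1)))
```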
Business Continuity
The business continuity management process is implemented to protect critical business processes from the effects of major failures in systems as they relate to IT and ICS environments. The reliance on automated processes leaves an organization in distress when these automated processes are no longer functional and is exacerbated when no prior planning or instruction exists on how to cope with the event.
Understanding the impacts of an interruption caused by a security incident is an important tool, even for the most seasoned business executive.
Communications and Operational
Operating procedures should be documented, maintained, and made available to all authorized users. These procedures should specify instructions for the execution of each function, to include processing and handling of information, backup and/or restoration instructions, scheduling and interdependencies of work, abnormal execution instructions including support contacts, restart instructions, and the expectations for managing system log information.
Configuration management and change control policies and procedures must be controlled and maintained. Any changes to systems must be tested prior to implementation, and should be independently reviewed and security controls verified. Updates to all documentation associated with the change should also be accomplished before the change is implemented. Logs of all changes containing relevant test procedures and results should be maintained.
Compliance
Individuals must understand the legal ramifications of their activities. Compliance activities fall into categories such as regulatory, legal, statutory, contractual, security, and intellectual property/copyright/trademark issues.
Control of the organization’s legal obligations requires that any of these items be documented and kept up to date. Usually the legal department should be involved in any IT or ICS acquisition associated with any legal or statutory requirement.
Records associated with information, systems, applications, or security should be categorized and maintained, and a retention schedule identified. If the records are stored electronically, procedures to access the data for the retention schedule must be ensured, even when technology changes render obsolete systems unusable. These technology changes normally require a conversion process that should be overseen by the owner of the information.
Human Resources
Security roles, responsibilities, accountabilities, and authorities (R2A2) for employees, contractors, and third-party users must be defined and documented according to the organization’s IT/ICS security policies. Security roles and responsibilities include:
Implement the organization’s security policies.
Protect assets from unauthorized access, disclosure, modification, destruction, or interference.
Execute security processes as required.
Take responsibility for individual actions.
Report security events that pose a risk to the organization.
These R2A2s must be communicated to employees, contractors, and third parties prior to beginning work. Job descriptions provide an effective means of communicating the security responsibilities. In addition, the continual reinforcement of R2A2s via regular training plays an important role in keeping personnel aware of their security obligations.
Information Systems Acquisition, Development, and Maintenance
Procurement specifications for systems must take into consideration the security controls to be incorporated into the system. DHS provides templates for Procurement Specifications that include significant security provisions. They should be used for even the smallest of acquisitions and considered mandatory for large scale, sensitive environments.
A formal testing and acquisition process should be followed by the organization. Contracts should address the identified security requirements. Additional functionality built into the product that may cause a security risk should be disabled/removed.
Many products have been evaluated formally for security and are certified for a particular use. Evidence of security is provided through an “Evaluation Assurance Level” (EAL) rating, which is assigned after independent evaluators review the equipment and software against the formal requirements of Common Criteria (ISO/IEC 15408).
Physical and Environmental
Security perimeters must be used to protect areas containing sensitive IT/ICS facilities. These perimeters should be clearly defined, access controlled via electronic locks or other physical barriers, and monitored via surveillance equipment.
Third-party users (vendors, support personnel, etc.) should be physically separated from the organization’s sensitive facilities. Likewise, public access, delivery and loading areas should be controlled and isolated where feasible.
Risk Assessment
Business exposure to IT/ICS security risks is a balance of cost and potential harm. This includes physical harm to people and assets, loss of reputation, environmental harm, regulatory violations, and others. Each of these potential business exposures has a cost that may be equated to the bottom line in terms of profits.
An organization performs a risk assessment to identify, quantify, and prioritize the risk against criteria for managing the risk and objectives related to the business bottom line. The product of the risk assessment is an estimate of the magnitude of the risk (risk analysis) and the significance of that risk (risk evaluation), along with the identification of potential threats and the current vulnerability to the threats (risk exposure).
Risk assessments are as important to an organization as any other business undertaking and should be a key activity performed on a regular basis. They should involve key stakeholders of the organization, including those with knowledge to assess the risks and prioritize mitigation actions based on the criticality of the asset. These key human resources should include, but are not limited to, CIO, Legal Counsel, Risk Manager, Compliance Manager, Public Relations, and Physical Security.
Step 1: Conducting a Risk Assessment
Conduct a risk assessment by:
Identifying the threat.
Determining the likelihood and impact of the threat.
Identifying the vulnerabilities that could be exploited by the threat.
An effective assessment involves learning as much about the system, its threats, vulnerabilities and impacts as possible, and then analyzing the risks by:
Developing a method to measure them.
Summarizing them.
Communicating the risks.
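The steps above do not prescribe a measurement method. One widely used qualitative approach, assumed here purely for illustration, scores each risk as likelihood multiplied by impact on 1-to-5 scales and then ranks the results. The Risk fields and sample entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    threat: str
    vulnerability: str
    likelihood: int   # 1 (rare) to 5 (almost certain); assumed scale
    impact: int       # 1 (negligible) to 5 (severe); assumed scale

    @property
    def score(self) -> int:
        """Measure the risk as likelihood times impact."""
        return self.likelihood * self.impact


def prioritize(risks):
    """Summarize risks as a list ordered from highest to lowest score."""
    return sorted(risks, key=lambda risk: risk.score, reverse=True)


RISKS = [
    Risk("phishing email", "untrained staff", likelihood=4, impact=3),
    Risk("malware on USB drive", "unrestricted removable media", likelihood=3, impact=5),
    Risk("flood in server room", "ground-floor data center", likelihood=1, impact=4),
]

# Communicate the result as a simple ranked report.
for risk in prioritize(RISKS):
    print(f"{risk.score:>2}  {risk.threat} via {risk.vulnerability}")
```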
Step 2: Cyber Risk Mitigation
Risk mitigation is the process of taking actions to eliminate or reduce the probability of compromising the availability, integrity, and confidentiality of an ICS to acceptable levels. The justification for any risk reduction project is spelled out in the business case.
The business case establishes goals, identifies alternatives, ranks the alternatives, and then picks the best option for implementation.
The Pareto Principle: While looking at alternatives, apply the Pareto principle: 20% of the effort will cure 80% of the problem.
In other words, focus on solutions that provide the best bang for the buck. Many of the cyber risks can be reduced without technology by adopting and enforcing policies and procedures that define how staff interacts with the ICS systems.
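To illustrate ranking alternatives by value for money, the sketch below orders hypothetical mitigation options by estimated risk reduction per unit cost and keeps adding options until a target share of the total reduction is covered. All names, costs, and reduction figures are invented for illustration; real estimates would come from the risk assessment.

```python
# Hypothetical alternatives: (name, relative cost, estimated risk reduction).
ALTERNATIVES = [
    ("enforce password policy", 1, 40),
    ("restrict removable media", 1, 25),
    ("segment control network", 4, 20),
    ("replace legacy HMIs", 10, 10),
    ("rewire plant floor", 20, 5),
]


def best_value_set(alternatives, target_share: float = 0.8):
    """Pick alternatives by reduction per cost until target_share is covered."""
    total_reduction = sum(reduction for _, _, reduction in alternatives)
    ranked = sorted(alternatives, key=lambda alt: alt[2] / alt[1], reverse=True)
    chosen, covered = [], 0.0
    for name, _cost, reduction in ranked:
        if covered >= target_share * total_reduction:
            break
        chosen.append(name)
        covered += reduction
    return chosen


print(best_value_set(ALTERNATIVES))
```

In this made-up data set, the first three options account for 85 of the 100 points of estimated reduction while costing 6 of the 36 cost units, which mirrors the Pareto idea.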
Risk management for ICS cybersecurity is not a project, but a process.
How can you conduct risk evaluation?
Designate an evaluation team
Verify alternatives were already implemented
Conduct periodic reviews of the alternative’s effectiveness
Document the review
Once you have completed implementation of the project, you should conduct periodic reviews to ensure the alternatives have been implemented and are effective.
What questions should managers ask about potential cybersecurity risks?
How could cybersecurity threats affect the different functions of my business, including areas such as supply chain, public relations, finance, and human resources?
What type of critical information could be lost (e.g., trade secrets, customer data, research, personally identifiable information (PII))?
How can my business create long-term resiliency to minimize our cybersecurity risks?
What kind of cyber threat information sharing does my business participate in? With whom does my business exchange this information?
What type of information sharing practices could my business adopt that would help foster community among the different cybersecurity groups where my business is a member?
How can you mitigate cyberthreats?
What is the threshold for notifying executive leadership about cybersecurity threats?
What is the current level of cybersecurity risk for our company?
What is the possible business impact to our company from our current level of cybersecurity risk?
What is our plan to address identified risks?
What cybersecurity training is available for our workforce?
What measures do we employ to mitigate insider threats?
How does our cybersecurity program apply industry standards and best practices?
Are our cybersecurity program metrics measurable and meaningful?
How comprehensive are our cybersecurity incident response plan and our business continuity and disaster recovery plan?
How often do we exercise our plans?
Do our plans incorporate the whole company or are they limited to information technology (IT)?
How prepared is my business to work with federal, state, and local government cyber incident responders and investigators, as well as contract responders and the vendor community?
Best Practices
The best practices listed below can help organizations manage cybersecurity risks. More information on each recommendation is available in the Security Tips Report.
Elevate cybersecurity risk management discussions to the company CEO and leadership team.
Retain a quality workforce.
Evaluate and manage organization-specific cybersecurity risks.
Develop and exercise cybersecurity plans and procedures for incident response, business continuity, and disaster recovery.
Ensure cybersecurity risk metrics are meaningful and measurable.
Maintain situational awareness of cybersecurity threats.
Security Incident Management
Suspected security events must be reported through appropriate channels as quickly as possible. A formal security event reporting procedure is required, along with an incident response and escalation procedure identifying the actions to be taken in the event of a security incident.
A point of contact should be established for reporting and managing security incidents. The reporting procedure generally contains:
Instructions to the individual reporting the incident (e.g., to do nothing that would compromise the ability to perform forensics on the system).
Security incident forms to support reporting.
Steps to be taken by responders in case of a security event.
Feedback processes to ensure those reporting security events are notified of results after the issue has been closed.
Security Governance
Security governance encompasses a set of multi-disciplinary structures, policies, procedures, processes, and controls. It is implemented to manage information at an enterprise level, and supports an organization’s immediate and future regulatory, legal, risk, environmental, and operational requirements.
As defined by Gartner, Inc., security governance is, “the specification of decision rights and an accountability framework to encourage desirable behavior in the valuation, creation, storage, use, archival, and deletion of information. It includes the processes, roles, standards and metrics that ensure the effective and efficient use of information in enabling an organization to achieve its goals.”
Organizations take great pride in their use of technology to advance their reputation and standing with the public and other organizations. However, too much information provides an avenue of risk when it slips into the hands of those seeking to use it for their own benefit or to do harm. A fine balance must be achieved to ensure that publicized information about IT/ICS technology does not provide that avenue of risk.
IT Vulnerabilities
To fully understand cyber risk, we need to understand vulnerabilities. Simply put, vulnerabilities are weaknesses that, if exploited, could result in an undesirable consequence (such as a system compromise). Vulnerabilities usually have a negative impact on the security posture of the system and need to be mitigated to reduce the cyber risk.
Vulnerabilities can usually be mitigated, either by a reconfiguration of the system or by applying a security patch issued by the vendor. People often associate vulnerabilities with weaknesses that give a potential attacker a specific opportunity to compromise a system; but the existence of a vulnerability doesn’t always equate to an opportunity upon which an adversary can capitalize. Many factors contribute to whether an adversary will take advantage of a vulnerability, such as:
The ease with which the vulnerability could be exploited.
Where the adversary needs to be, relative to the system to attack.
Whether the adversary must authenticate to the system to carry out the attack.
Interestingly, system owners can also decide whether to fix a vulnerability based on similar criteria:
What are the tools available to exploit the vulnerability?
How easy is it to exploit the vulnerability?
How much does it cost to fix the vulnerability?
How accurate is the information about this vulnerability?
Is there any collateral damage (in other systems) that can be caused by exploiting this vulnerability?
As the number of cyber vulnerabilities grows, so does the capability to track and score these vulnerabilities. Scoring establishes a common measure of how much concern a vulnerability warrants, as compared to other vulnerabilities measured the same way. Scoring allows organizations to prioritize their cyber risk reduction activities and provides valuable intelligence on the current status of known vulnerabilities, their mitigation strategies, and the constantly evolving changes in levels of difficulty associated with exploiting the vulnerability. The most popular vulnerability scoring system is the Common Vulnerability Scoring System (CVSS). CVSS is maintained by the Forum of Incident Response and Security Teams (FIRST), and CVSS scores for known vulnerabilities are published in the National Vulnerability Database (NVD) hosted at the National Institute of Standards and Technology (NIST).
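CVSS v3.x base scores range from 0.0 to 10.0 and map to qualitative ratings (None, Low, Medium, High, Critical). The sketch below applies that mapping and sorts a list of findings so the highest scores are handled first. The finding identifiers and scores are hypothetical placeholders, not real vulnerability records.

```python
def severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


# Hypothetical findings: (identifier, CVSS base score).
FINDINGS = [("FINDING-A", 5.3), ("FINDING-B", 9.8), ("FINDING-C", 7.5)]

# Prioritize remediation by descending score.
for identifier, score in sorted(FINDINGS, key=lambda item: item[1], reverse=True):
    print(identifier, score, severity(score))
```

A score alone does not settle the decision; operational constraints on patching still have to be weighed for each system.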
The decision to implement a countermeasure to mitigate a vulnerability is not always obvious. Scoring allows the system owner to assess the potential impact in a general sense; however, operational requirements must also be taken into consideration. Updating a system or application or applying a patch may not be feasible as it could alter the functionality, cause a service interruption, or even cause the service or process to fail. This is where defense-in-depth strategies are applied.
In some cases, the vulnerabilities are inherent in the system and users of the technology are not in a position to fix them. For example, known vulnerabilities in some network protocols have been in place since they were designed. Network administrators and security implementers have worked together to compensate for them (as opposed to fixing the root problem).
In addition to insecure protocols, there are inherent vulnerabilities associated with all operating systems. Given that relatively few operating systems support the majority of our global IT infrastructures, any vulnerability in one of those operating systems provides a target-rich environment for cyber attackers. When vulnerabilities are discovered in an operating system, they affect not just one or two IT architectures but all architectures that use it. In some cases, this can affect tens of millions of computers, many of which control critical processes.
OT/ICS
OT
Operational technology (OT) refers to the hardware, software, and systems used to monitor and control events, processes, and devices, and to make adjustments in industrial operations.
ICS
ICS is a general term used to describe the integration of hardware and software with network connectivity in order to support critical infrastructure. An ICS handles process control and monitoring for a facility. It takes inputs from sensors and process instruments and provides outputs based on control functions, in accordance with the approved control strategy for the design.
The structures of ICS architectures are diverse, and depend upon system requirements, process function, and business needs. Vendor solutions sometimes dictate using specific ICS architectures; however, depending on the functionality and the complexity of the control action, there are common elements seen across all ICS architectures.
We should note that the differences between systems are diminishing as the capabilities merge. To reduce the confusion among the various types of control systems, we refer to them by their generic name, industrial control system (ICS).
Uses of ICS
A process is a series of steps taken to achieve a desired result. Purifying water, landing airplanes, and distilling chemicals are all examples of processes. ICS have components that are common to controlling all processes, even if the processes are different. However, because of differences within process environments, there will also be differences in ICS implementations.
For example, one process may be designed to shut off the product flow into a vessel based on the level the product has reached, while another process uses the product weight or a calculation of volumetric flow as a control. Each method has advantages and disadvantages based on costs, safety, environmental impacts, and product quality; but each uses ICS and the ICS used are implemented differently.
ICS and Cybersecurity
Industrial Control Systems (ICS) are critical to the operation of the nation’s infrastructure from the power grid to water distribution. Physical control barriers have been sufficient security in years past, but with the emergence of cyber threats, these barriers are insufficient. Focused cyberattacks can push a system into a dangerous state. We can go from lights on to lights out with a tap of a key.
Most ICS, regardless of their use, are implemented in environments where availability is crucial. Initially, physical security was the primary concern, driven by safety rather than by system protection. Cybersecurity was not considered a problem because ICS were not interconnected and were located within trusted environments. As asset owners demanded products with more sophisticated IT functionality, such as remote access and interconnectivity with standards-based networks, vulnerabilities were introduced. These vulnerabilities are difficult to mitigate in a control system environment.
Exchanging physical security countermeasures for performance and convenience, while keeping availability a priority, makes applying standard protection strategies developed for traditional IT systems to ICS challenging, if not impossible.
Are vulnerabilities easy to mitigate in a control system environment?
No. Vulnerabilities are difficult to mitigate in a control system environment.
Vulnerabilities were introduced as asset owners demanded products with more sophisticated IT functionality, such as remote access and interconnectivity with standards-based networks.
Different ICS Terms
An ICS is … any system that gathers information on an industrial process and modifies, regulates, or manages the process to achieve a desired result.
Industrial control system (ICS) is a collective term used to describe different types of control systems and associated instrumentation, which include the devices, systems, networks, and controls used to operate and/or automate industrial processes. Depending on the industry, each ICS functions differently and is built to electronically manage tasks efficiently. Today the devices and protocols used in an ICS are used in nearly every industrial sector and critical infrastructure such as the manufacturing, transportation, energy, and water treatment industries.
ICS: A computer-based system used within many critical infrastructures to monitor and control sensitive processes and physical functions. “Control Systems” is a generic term applied to hardware, firmware, communications, and software that are used to monitor and control vital functions of physical systems.
ICS refers to the facilities, systems, and equipment that comprise the operational real-time control environment, services, diagnostics, and functional capabilities necessary for the effective and reliable operation of automation systems. ICS are made up of a device, or set of devices, that manage the behavior of other devices.
An ICS is an interconnection of components related in such a manner as to command, direct, or regulate itself or another system. This process could occur within a single factory (e.g., a batch mixing process contained in a chemical plant) or be distributed over a large geographical area (e.g., tracking and coordination of train movement over a busy rail system).
Industrial Control Systems (ICS) include systems used to monitor and control industrial processes.
ICS refers to a broad set of control systems including
SCADA (Supervisory Control and Data Acquisition): A large-scale, distributed measurement and control system (geographically spread out). SCADA systems are used in the transmission and distribution of oil, gas, and water (pipelines) and electricity (power lines).
DCS (Distributed Control System): A system where control is achieved by distributing intelligence (live data) throughout the controlled system, rather than from a single, centrally located unit. DCS are used in power generation, chemical processing, oil refining, and wastewater treatment. A DCS may be confined to a single site, such as a nuclear power station where field controllers spread across the reactor building (basement, ground, and first floor) and the cooling tower communicate with the I/O and send data to the control room.
PCS (Process Control System): A general term that encompasses several types of control systems used in industrial production, including SCADA, DCS, and other smaller control system configurations such as programmable logic controllers (PLC). PCS are used in water treatment, chemical processing, mining, pharmaceuticals, and manufacturing.
EMS (Energy Management System): A system of computer-aided tools used by operators of electric utility grids to monitor, control, and optimize the performance of the generation and/or transmission system. EMS are used in electrical energy and pump optimization. The EMS collects the energy data of the system and optimizes the energy use of the ICS.
AS (Automation System): A technology concerned with performing a process by means of programmed commands combined with automatic feedback control to ensure proper execution of the instructions. The resulting system is capable of operating without human intervention. AS are used in material handling and discrete manufacturing.
SIS (Safety Instrumented System): An engineered set of hardware and software controls commonly used on critical process safety systems. SIS are especially useful in safety shutdown and equipment protection systems. A SIS is a separate system from a DCS, created specifically for safety purposes. For example, as long as temperature, pressure, and other important variables stay within their specified limits, the process keeps running. If not, the SIS will shut down the system.
Any other automated control system: An example of another automated control system is a building automation system (BAS), such as automatic doors, or controls for heating, ventilation, and air conditioning (HVAC).
Supervisory Control & Data Acquisition (SCADA)
SCADA is an acronym for supervisory control and data acquisition, a computer system for gathering and analyzing real-time data. SCADA systems are typically used to control geographically dispersed assets that are often scattered over thousands of square kilometers. In the past, communications between field controllers and host computers were dependent upon serial communications, most typically RS232. Data rates rarely exceeded 9,600 bits per second and resulted in ICS needing to be co-located or include multiple relays.
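To get a feel for what a 9,600 bits per second serial link means in practice, the following is a rough arithmetic sketch. It assumes 8N1 framing (10 bits on the wire per byte) and ignores protocol overhead, turnaround delays, and modem latency. The function names and the polling scenario are illustrative assumptions, not part of any SCADA product.

```python
def serial_transfer_seconds(payload_bytes: int, baud: int = 9600,
                            bits_per_byte: int = 10) -> float:
    """Time to clock payload_bytes onto an asynchronous serial line.

    bits_per_byte=10 assumes 8N1 framing: 1 start + 8 data + 1 stop bit.
    """
    if payload_bytes < 0 or baud <= 0:
        raise ValueError("payload_bytes must be >= 0 and baud must be > 0")
    return payload_bytes * bits_per_byte / baud


def poll_cycle_seconds(num_rtus: int, bytes_per_poll: int,
                       baud: int = 9600) -> float:
    """Lower bound on the time to poll every RTU once on a shared line.

    bytes_per_poll is the request plus response size for one RTU
    (an assumed figure; real sizes depend on the protocol and point count).
    """
    return serial_transfer_seconds(num_rtus * bytes_per_poll, baud)


if __name__ == "__main__":
    print(serial_transfer_seconds(960))   # 960 bytes at 9,600 bps
    print(poll_cycle_seconds(50, 192))    # 50 RTUs, 192 bytes each
```

Under these assumptions the line carries at most 960 bytes per second, so polling even a few dozen field controllers takes several seconds per cycle.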
As digital technology and data transfer rates improved, networks extended to include more remote locations, and asset owners started to migrate their serial SCADA circuits to digital networks. While this migration offers asset owners significant benefits, there are pitfalls. An improperly designed network can be a conduit for cyberattacks.
SCADA systems interface with DCS, PCS, and EMS to monitor and control remote facilities that move products over large distances.
Because a tremendous amount of data is collected, the success of the SCADA system is dependent on the master controller successfully communicating with field controllers, such as RTU, IED, and PLC. If communications fail, the field controllers must individually control the remote facilities until the system re-establishes communications and the RTU or PLC can report to the master station.
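The fallback behavior described above can be sketched as a small state check: the field controller uses the master's setpoint while updates are fresh and reverts to its own local setpoint when they go stale. This is a simplified illustration. The class name, the 30-second timeout, and the single-setpoint model are assumptions, and real RTU or PLC fallback logic is vendor- and site-specific.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FieldController:
    local_setpoint: float                 # value used when the master is unreachable
    comms_timeout_s: float = 30.0         # assumed timeout; site-specific in practice
    master_setpoint: Optional[float] = None
    last_update_s: Optional[float] = None

    def receive_from_master(self, setpoint: float, now_s: float) -> None:
        """Record a setpoint received from the master station."""
        self.master_setpoint = setpoint
        self.last_update_s = now_s

    def active_setpoint(self, now_s: float) -> Tuple[float, str]:
        """Return (setpoint, mode), where mode is 'remote' or 'local'."""
        if self.master_setpoint is None or self.last_update_s is None:
            return self.local_setpoint, "local"
        if now_s - self.last_update_s > self.comms_timeout_s:
            # Communications considered lost: control locally.
            return self.local_setpoint, "local"
        return self.master_setpoint, "remote"
```

Timestamps are passed in explicitly so the behavior can be checked without waiting on a real clock.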
Using SCADA components provides flexibility, in that they can integrate the HMI from one vendor with the PLC or RTU from another vendor, provided they use the same protocol. This means you can replace the HMI software without having to replace the RTU or PLC (and vice-versa).
The SCADA Server is the data server that sends data to the field control devices via communications network using various communication protocols. It facilitates the communication through the system.
SCADA systems are used to transport products such as oil, gas, water, and electricity, as well as people.
SCADA is used in a wide range of industries. Some of the common places that use SCADA for various processes include:
Electrical Delivery
Oil and Gas Delivery
Drinking Water Delivery
Wastewater Removal
Transportation Systems
Distributed Control System (DCS)
DCS were initially developed to support large process industries such as refineries and chemical plants. The DCS controllers are distributed throughout the plant; hence the name distributed control system. They are typically deployed at site facilities over the plant or control area.
DCS are different from a centralized control system where a single controller handles the control functions from a central location. DCS has each machine or group of machines controlled by a dedicated controller. These distributed individual automatic controllers are connected to the field devices.
DCS is best suited for large-scale processing or manufacturing plants where a large number of continuous control loops need to be monitored and controlled. Its biggest advantage is that multiple controllers divide the control tasks: if any part of the DCS fails, the plant can continue to operate irrespective of the failed area (helped by redundant processors, controllers, actuators, and power supplies).
A DCS is typically a single-vendor solution and is generally very large and costly.
Because of its distributed control architecture, DCS has become prominent in large and complex industrial processes. It is used with continuous and batch applications, including:
Papermaking
Fixed Chemical (Chemical Plant)
Water and wastewater treatment
Rail Transit
Power Stations
Petrochemical (Refineries)
Biopharmaceutical
Food and beverage production
SCADA vs DCS
SCADA | DCS |
---|---|
Data-gathering oriented | Process oriented |
Larger geographical areas that use different communication systems, which are generally less reliable than a local area network | Data acquisition and control modules located within a confined area; communication between the various distributed control units is carried out over a local area network (LAN) |
No closed-loop control | Closed-loop control at process control stations and remote terminal units |
Event driven - not scanned regularly, waits for an event to trigger actions | Process-state driven - scans the process regularly and displays to operator, as well as on-demand |
Used in larger geographical locations, such as water management systems, power transmission and distribution control, etc. | Used in installations within a confined space, such as a single plant or factory, for complex control processes |
Process Control Systems (PCS)
A PCS consists of computers, process control equipment, communication networks, and algorithms for maintaining the output of a specific function (process) within a desired range.
PCS, sometimes called ICS, function as pieces of equipment along the production line during manufacturing, testing the process and returning data for monitoring and troubleshooting. PCS are architecturally similar to SCADA systems, but also perform many of the functions of a DCS, and are similarly used at site facilities. PCS supports a variety of manufacturing processes including continuous, batch, and discrete processing.
Many PCS applications overlap with DCS applications. PCS are scalable and used in small power plants, as well as large production facilities. As a general rule, however, DCS implementations are more suitable for large refineries and chemical plants.
PCS use many of the same software packages and hardware components as SCADA systems. This includes PLC and RTU. The main difference between a PCS and a SCADA system is that a PCS communicates with the field controllers using a plant network, while SCADA systems traditionally use serial communications and remote networks for the same task.
PCS are used in manufacturing facilities as well as smaller chemical refineries. They are used at site facilities such as water and wastewater treatment plants, automotive manufacturing, and chemical plants.
Energy Management Systems (EMS)
EMS is a series of processes that enables an organization to use data and information to maintain and improve energy performance, while improving operational efficiencies, decreasing energy intensity, and reducing environmental impacts.
Manage the generation and distribution of energy across the grid
Improve energy performance and operational efficiencies
Energy load forecasting and load shedding (understand the energy forecast based on previous load for the same day or the same weather)
Weather monitoring for load/energy forecasting
Economic generation determination (what it costs to bring generation online. If we know that we are going to have a larger load, we can determine the most cost-effective generation to bring online)
Renewable energy integration - balance load/energy (understand, based on weather, time of day, and load, what type of renewable energy will be available. If we know it is going to be rainy and cloudy, we will not get much energy from solar. But if wind is in the forecast and we know the time of day when it will be available, we can balance that into the energy mix and take some generation offline to incorporate the renewable energy)
Automatic generation control
Regulate generation to maintain load frequency of the grid
Network modeling - define system parameters, price nodes, and market scheduling (what we need, what will be required, and what’s our economic generation)
State estimation - algorithm to estimate the state of the electric power system based on the topology model (for that utility or asset owner, we need to understand what the load is going to be and what is available, then run an algorithm that performs the estimation)
Contingency analysis - “What if” power system planning
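The "economic generation determination" item above can be illustrated with a simple merit-order sketch: bring the cheapest generation online first until the forecast load is covered. This is a simplified illustration. The unit names, capacities, and costs are made up, and the greedy approach ignores ramp rates, start-up costs, minimum output levels, and transmission constraints that a real EMS would take into account.

```python
from typing import Dict, Iterable, NamedTuple


class Unit(NamedTuple):
    name: str
    capacity_mw: float
    cost_per_mwh: float


def merit_order_dispatch(units: Iterable[Unit], load_mw: float) -> Dict[str, float]:
    """Assign load to generating units, cheapest first.

    Returns a mapping of unit name to dispatched MW. Raises ValueError
    if the combined capacity cannot cover the load.
    """
    remaining = load_mw
    dispatch: Dict[str, float] = {}
    for unit in sorted(units, key=lambda u: u.cost_per_mwh):
        if remaining <= 0:
            break
        mw = min(unit.capacity_mw, remaining)
        dispatch[unit.name] = mw
        remaining -= mw
    if remaining > 1e-9:
        raise ValueError("insufficient generation capacity for forecast load")
    return dispatch


# Hypothetical fleet, for illustration only.
FLEET = [
    Unit("gas_peaker", 200, 60.0),
    Unit("hydro", 100, 10.0),
    Unit("coal", 300, 35.0),
]
```

With this made-up fleet, a 350 MW forecast is covered by hydro and coal, and the more expensive gas unit stays offline.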
Automation Systems
Cost effective, Stand alone
Used a lot in warehouses, for example to follow a package through Amazon's Staten Island warehouse
Inventory control
Smoke control
Heating, ventilation, air conditioning (HVAC)
Lighting and access
Hot and chilled water control
Patient monitoring
Robots
Smart Home (centralized thermostat)
Safety Instrumented System (SIS)
Specifically designed to protect personnel, equipment, and the environment by reducing or preventing the likelihood or impact of an emergency event.
Demand or continuous operation
Associated sensors, logic solvers, and control elements
Logic to automatically activate the final elements when an emergency event occurs
Final elements must have the ability to bring the process to a safe state or provide adequate hazard mitigation
Adequate response time
Appropriate safety measures testing
Used in hazardous environments such as nuclear, chemical, and refinery facilities: systems where going outside the safe boundaries could be catastrophic for the environment.
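The core SIS idea, keep running while the monitored variables stay within their specified limits and otherwise drive the process to a safe state, can be sketched as a limit check. This is only an illustration of the comparison logic. The variable names and limits are hypothetical, treating a missing reading as a violation is an assumed fail-safe choice, and a real SIS is engineered, tested, and validated hardware and software rather than a script.

```python
from typing import Dict, List, Tuple

# Hypothetical safe operating limits: name -> (low, high).
SAFE_LIMITS: Dict[str, Tuple[float, float]] = {
    "temperature_c": (10.0, 350.0),
    "pressure_kpa": (0.0, 1200.0),
}


def out_of_limits(readings: Dict[str, float],
                  limits: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the names of variables that are missing or outside their limits."""
    violations = []
    for name, (low, high) in limits.items():
        value = readings.get(name)
        # A missing reading is treated as a violation (assumed fail-safe behavior).
        if value is None or not (low <= value <= high):
            violations.append(name)
    return violations


def sis_action(readings: Dict[str, float],
               limits: Dict[str, Tuple[float, float]] = SAFE_LIMITS) -> str:
    """Return 'TRIP' if any monitored variable is out of limits, else 'RUN'."""
    return "TRIP" if out_of_limits(readings, limits) else "RUN"
```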
OT Infrastructure Components
A new approach to protecting ICS is necessary. Understanding ICS and their components are the first steps to determining the cybersecurity vulnerabilities inherent in the design and operation of these systems and finding ways to protect them.
According to Business Advantage State of Industrial Cybersecurity (2017), 54% of companies experienced an industrial control system security incident within the past 12 months, and 16% had experienced three or more.
Human-Machine Interface (HMI)
Provides a graphical view into the process
Two forms: touch panel or software-based application
Used for controlling, monitoring, alarming, and trending processes
The user interface in a manufacturing or process control system. It provides a graphics-based visualization of an industrial control and monitoring system. Previously called an “MMI” (man-machine interface), an HMI typically resides on a computer that communicates with a specialized computer in the plant, such as a programmable automation controller (PAC), programmable logic controller (PLC) or remote terminal unit (RTU). The HMI generally comes in two forms: either a touch panel or a software-based application that is loaded on a personal computer, workstation, tablet, or smart phone.
HMI workstations are typically located at a centralized or distributed control center, where operators see a complete set of unified control system data presented in a graphical user interface. This allows the operator to have a real-time or near real-time operational view of the process. An operator typically uses the HMI to monitor and control the process. They are also capable of providing historical trends, alarms and event notifications, or support other applications that an operator may use to do their job.
From a security perspective, the HMI system and/or data is an obvious target, as they typically use standard operating systems and are interconnected with outside networks or available through remote access methods. Many HMI have command and control functionality, and if compromised, could allow an attacker to take over a mission-critical process.
Field Controllers
Field controllers are the devices that consolidate inputs and outputs, taking the instructions from the operators to make changes in the field. Controllers can be programmed or updated remotely while deployed in the field. These devices were designed as if they operated in a “trusted” environment (the network map should show which environments are trusted and which are untrusted). Therefore, when given a command, they obey or respond. Most do not authenticate commands to make sure they come from a specific source.
Field controllers collect and process input and output (I/O) data. They also send the process data to the HMI and receive process control commands from the HMI. The field controllers are often located close to the field devices in order to process the information as quickly as possible. For large distributed systems, field controllers may collect and aggregate information from hundreds or thousands of sources.
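The difference between a controller that simply obeys and one that checks where a command came from can be sketched with a message authentication code. This is a conceptual illustration only. The shared key, command strings, and function names are hypothetical, the sketch has no replay protection, and it does not represent how any particular ICS protocol or device handles authentication.

```python
import hashlib
import hmac

# Hypothetical shared secret between the control center and the controller.
SHARED_KEY = b"example-key-not-for-production"


def sign_command(command: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Compute an HMAC-SHA256 tag that the sender attaches to a command."""
    return hmac.new(key, command, hashlib.sha256).digest()


def trusting_controller(command: bytes) -> str:
    """Obeys any command it receives, as described for most field controllers."""
    return f"executed {command.decode()}"


def authenticating_controller(command: bytes, tag: bytes,
                              key: bytes = SHARED_KEY) -> str:
    """Executes the command only if the tag matches the expected HMAC."""
    expected = sign_command(command, key)
    if not hmac.compare_digest(expected, tag):
        return "rejected"
    return f"executed {command.decode()}"
```

The trusting version will act on a command from any source that can reach it, which is why network placement and compensating controls matter so much for these devices.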
Field controllers are embedded microprocessor devices and are designed to withstand the rigors of an industrial environment. Like personal computers, they have a processor and internal memory, but usually do not have a mechanical hard drive. They convert the electrical signal from field devices (input) into a digital signal (1s and 0s), and convert a digital signal to an electrical signal (output).
Embedded Systems
An embedded system is a computer system consisting of hardware and software designed for a specific purpose or dedicated task. (Workstations, laptops, and servers are not embedded systems.)
Embedded systems used in ICS
Programmable Logic Controller, Remote Terminal Unit, DCS controllers, Intelligent Electronic Devices, field devices (HART, Foundation Fieldbus, Profibus, DeviceNet)
Network/communication equipment (Routers, switches, modems, radios, terminal servers, gateways, firewall and other security appliances)
Others (GPS, time synchronization, network printers, hand-held configuration devices, test equipment)
There are many different types of field controllers, and each is designed to support specific processes or sectors. There are four common types of field controllers: remote terminal units (RTU), intelligent electronic devices (IED), programmable logic controllers (PLC), and programmable automation controllers (PAC).
RTU
A remote terminal unit (RTU) is a microprocessor-controlled electronic device that interfaces objects in the physical world to a distributed control system or SCADA (supervisory control and data acquisition) system by transmitting telemetry data to a master system, and by using messages from the master supervisory system to control connected objects. As this interfacing involves the collection of telemetry data, the system is sometimes called a remote telemetry unit. One of the key characteristics of an RTU is that it relays information from a remote location over long distances to a centrally located host using/supporting a variety of communications mediums and ICS protocols.
RTU are capable of executing programs autonomously without having to involve the HMI or operator. This enables RTU to respond quickly to emergencies without operator input. For example, if the RTU program “sees” a high flow rate on one of the input flows, it can issue an output command to shut down a pump. In addition to converting analog or discrete measurements to digital information, RTU are also used as data concentrators and protocol converters. Typically, RTU are used by utilities and other industries that monitor and control geographically dispersed facilities.
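The high-flow example above can be sketched as a single autonomous check that needs no HMI or operator input. The limit value, units, and function name are hypothetical. A real RTU program would be written in the vendor's configuration or programming environment rather than Python.

```python
from typing import Iterable

HIGH_FLOW_LIMIT = 500.0  # hypothetical limit, e.g. litres per minute


def rtu_pump_command(flow_rates: Iterable[float], pump_running: bool,
                     high_flow_limit: float = HIGH_FLOW_LIMIT) -> bool:
    """Decide the pump output for this pass of the RTU program.

    Returns False (shut down the pump) if any input flow exceeds the limit.
    Otherwise the pump keeps its current state.
    """
    if any(flow > high_flow_limit for flow in flow_rates):
        return False
    return pump_running
```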
Sectors using RTUs
Oil and gas: RTUs are used in offshore platforms, onshore oil wells, pipelines
Refineries and chemical plants: RTUs are used in environmental monitoring systems (pollution, air quality, emissions monitoring), outdoor warning sirens
Water and Wastewater: RTUs can be found in distribution systems, aqueducts, water resource management, collection systems
Electric power: RTUs are used in transmission and distribution systems across the country
Mine sites: RTUs are used in conveyor monitoring and control, mine water management, underground equipment monitoring, bore management, and material handling
Transportation Systems: RTUs are used in air traffic control, railroads, and trucking
Summary
Convert analog and discrete measurements to digital information
Contain analog and discrete inputs
Provide numerous communications options
Perform data concentration
Provide protocol conversion
Can execute programs autonomously without involving the HMI. This enables RTUs to respond quickly to emergencies without operator input. For example, if the RTU program sees a high flow rate on one of the input flows, it can issue an output command to shut down the pump.
Intelligent Electronic Device (IED)
An Intelligent Electronic Device (IED) is a term used in the electric power industry to describe microprocessor-based controllers of power system equipment. It is used by the Energy sector to monitor and control electrical power devices such as circuit breakers, capacitors, and transformers. IED receive data from field sensors (I/O) and power equipment and can issue control commands. These commands include simple things such as tripping circuit breakers if they sense anomalies in voltage or current. They can also instruct system output to raise or lower voltage levels in order to maintain the desired level. Common types of IED include protective relaying devices, load tap changer controllers, circuit breaker controllers, capacitor bank switches, re-closer controllers, and voltage regulators.
Many owners/operators leave their IED with their “fresh out of the box” configurations. These default configurations, unfortunately, make it easier for those with ill intent to make changes to the operational parameters of the device. Furthermore, some owners opt to keep the extra communication programming ports active so they can view or make online changes from the shop or control room. Considering that modern IED are fully network aware, and in some cases, may have embedded services that facilitate remote administration, there is a valid concern for the cybersecurity of these devices.
The utilities which operated the power transmission stations were some of the first to use IED. This early use was not to comply with regulatory requirements, but to save money. The use of IED in this instance meant a highly paid technician would not have to drive to a potentially remote transmission station to retrieve data.
Programmable Logic Controller (PLC)
A Programmable Logic Controller, or PLC, is a ruggedized computer used for industrial automation and was created to respond to the needs of the automotive industry. These controllers can automate a specific process, machine function, or even an entire production line. In the 1960s, the automotive industry used relays, timers, and switches, along with extended wiring and cabling runs to control its assembly lines. Every time a model changed, the assembly line required a tear down and rebuild, making the process of auto manufacturing incredibly expensive and time-consuming. Only skilled electricians were qualified to perform this re-purposing of an assembly line.
In 1968, General Motors issued a request for a proposal to replace the vast hardwired relay systems with a computer-based system. A company called Bedford Associates won the proposal and created the PLC. The first PLC were programmed in Ladder Logic. This programming language is designed to mimic the relay diagrams electricians used to wire relays and timers in the older assembly plants. The image shows PLC ladder logic illustrating basic motor start/stop control.
In addition to ladder logic, other PLC programming languages have been developed: structured text, function block diagrams, sequential function charts, and instruction lists.
Over the years, PLC functionality matured, and the devices are now found in other sectors. In fact, there is a new class of field controllers called Programmable Automation Controllers (PAC). PAC combine the best features of RTU, PLC, and Distributed Control System (DCS) controllers into a universal controller for use across multiple sectors. The onboard processor and memory, along with the network capabilities, make this device particularly interesting from a cybersecurity perspective.
Program Execution
A line of code in a PLC program is called a rung.
PLC programs execute from left to right and top to bottom.
Each completion of the program is called a scan.
A PLC will complete many scans in a single second (scan time: 50-60 milliseconds per scan). For comparison, a SCADA system scans approximately every 2 minutes, and home water or energy metering reports approximately every 15-30 minutes.
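The execution model above can be sketched as a loop body: read the inputs, evaluate each rung from top to bottom, and then apply the results. This is a simplified model written in Python for illustration. The rung functions and tag names are hypothetical, and real PLC runtimes add details (I/O image tables, watchdogs, communications servicing) that are left out here.

```python
from typing import Callable, Dict, List, Tuple

# A rung reads the current image and returns (coil name, new value).
Rung = Callable[[Dict[str, bool]], Tuple[str, bool]]


def run_scan(inputs: Dict[str, bool], memory: Dict[str, bool],
             rungs: List[Rung]) -> Dict[str, bool]:
    """Perform one scan and return the updated image."""
    image = dict(memory)
    image.update(inputs)            # 1. read inputs
    for rung in rungs:              # 2. execute rungs top to bottom
        coil, value = rung(image)
        image[coil] = value         # later rungs see this result in the same scan
    return image                    # 3. outputs are taken from the image


def scans_per_second(scan_time_ms: float) -> float:
    """Convert a scan time in milliseconds to scans per second."""
    return 1000.0 / scan_time_ms


# Hypothetical two-rung program.
def rung_tank_full(image: Dict[str, bool]) -> Tuple[str, bool]:
    return "tank_full", image["level_switch"]


def rung_inlet_valve(image: Dict[str, bool]) -> Tuple[str, bool]:
    return "inlet_valve", not image["tank_full"]


PROGRAM: List[Rung] = [rung_tank_full, rung_inlet_valve]
```

At 50 milliseconds per scan this works out to 20 scans per second, which is why a PLC can react far faster than a SCADA poll.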
Programming Concepts
Each rung executes on an “IF-Then” principle
IF the instruction(s) on the left are true then execute the instructions on the right.
Direct/Normally Open Contact
Direct/Normally Open Output Coil
Reverse/Normally Closed Contact
Placing multiple branches (parallel paths) on a single rung = OR
Placing multiple inputs on the same rung = AND
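The AND/OR rules above map directly onto Boolean expressions. The sketch below models the basic motor start/stop rung with a seal-in branch, written in Python purely to illustrate the logic. The function and tag names are hypothetical, and `stop_pressed` is modeled so that the normally closed stop contact is `not stop_pressed`.

```python
def rung_and(*contacts: bool) -> bool:
    """Contacts in series on one rung: all must be true."""
    return all(contacts)


def rung_or(*branches: bool) -> bool:
    """Parallel branches on one rung: any one may be true."""
    return any(branches)


def motor_rung(start_pressed: bool, stop_pressed: bool,
               motor_running: bool) -> bool:
    """IF (Start OR Motor) AND NOT Stop THEN energize the Motor coil.

    The Motor contact in parallel with Start is the seal-in branch that
    keeps the motor running after the start button is released.
    """
    return rung_and(rung_or(start_pressed, motor_running), not stop_pressed)
```

Calling `motor_rung` once per scan with the previous result as `motor_running` reproduces the latching behavior: a momentary start press turns the motor on, and it stays on until stop is pressed.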
Programmable Automation Controller (PAC)
PAC is a term that is loosely used to describe any type of automation controller that incorporates higher-level instructions. The systems are used in ICS for machinery in a wide range of industries, including those involved in critical infrastructure. They provide a highly reliable, high-performance control platform for discrete logic control, motion control, and process control. There is no specific agreement between industry experts as to what differentiates a PAC from a PLC. In any case, defining exactly what constitutes a PAC is not as important as having users understand the types of applications for which each is best suited.
A PAC is geared more toward complex automation system architectures composed of a number of PC-based software applications, including HMI functions, asset management, historian, advanced process control (APC), and others. A PAC is also generally a better fit for applications with extensive process control requirements, as PACs are better able to handle analog I/O and related control functions. A PAC tends to provide greater flexibility in programming, larger memory capacity, better interoperability, and more features and functions in general.
PAC provide a more open architecture and modular design to facilitate communication and interoperability with other devices, networks, and enterprise systems. They can be easily used for communicating, monitoring, and control across various networks and devices because they employ standard protocols and network technologies, such as Ethernet, Open Platform Communications (OPC), and Structured Query Language (SQL).
PACs also offer a single platform that operates in multiple domains, such as motion control, communication, sequential control and process control. Moreover, the modular design of a PAC simplifies system expansion and makes adding and removing sensors and other devices easy, often eliminating the need to disconnect wiring. Their modular design makes it easy to add and effectively monitor and control thousands of I/O points, a task beyond the reach of most PLC.
Technical details - Field Controllers
Processors (X86, PowerPC, ARM, MIPS)
Memory
Non-volatile Memory
Flash memory, EEPROM, EPROM, ROM
Firmware (boot code, real time operating system (RTOS), application program)
Volatile Memory (contents are lost when power is removed, so it is much less susceptible to being manipulated or having data taken from it)
RAM
Variables, stack, buffers
Input/Output
Discrete, Analog (4 to 20 milliamps or 0-10 volts), Fieldbus
Communication Ports
Serial - RS232, RS422/485, USB, modems, radios
Network - Ethernet radio, ControlNet, LonWorks
User interface
Internal
Status lights, small LCD screens (HMIs), keypads, jumpers, dip switches, switches
External
Browsers (allow you to see the status and operation of the devices), Applications (always check whether the applications can be shut down; is there a business use case for them?). Remember: the smaller the attack surface, the better!
Programs
RTOS (QNX Neutrino, VxWorks, Windows CE)
IEC 61131 programming languages - Workbenches (CoDeSys (allows programming in any one of the languages below), ISaGRAF) - Languages
Ladder Logic
Function Block Diagram (FBD)
Sequential Function Chart (SFC)
Structured Text (ST)
Instruction List (IL)
Device Drivers and Device Managers
Ethernet/IP Stacks
RS232/RS-485
Memory Managers
User interfaces
Services (Web server, FTP server, SNMP) (Any business case for these running? If not, turn them off)
Debuggers (data for troubleshooting; are we turning them off after debugging? Often, debuggers are left turned on, exposing data and possible vulnerabilities)
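The "is there a business case for this?" question raised for applications, services, and debuggers can be turned into a simple comparison between what is running and what has been justified. This is a minimal sketch with made-up data. How the list of running services is obtained is device-specific and is not shown, and the function and variable names are illustrative.

```python
from typing import Dict, Iterable, List


def unjustified_services(running: Iterable[str],
                         justified: Dict[str, str]) -> List[str]:
    """Return running services that have no documented business case.

    justified maps a service name to the reason it must stay enabled.
    """
    return sorted(service for service in running if service not in justified)


# Hypothetical controller inventory, for illustration only.
RUNNING = {"web server", "ftp server", "snmp", "debugger"}
JUSTIFIED = {"snmp": "polled by the network monitoring system"}

if __name__ == "__main__":
    for service in unjustified_services(RUNNING, JUSTIFIED):
        print(f"candidate to disable: {service}")
```

Everything the function returns is a candidate for being turned off, which shrinks the attack surface of the controller.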
Field Devices
Field Devices are the instruments and sensors that measure process parameters and the actuators that control the process. This is the interface between the ICS and the physical process, be it the mixing of chemicals, the management of trains, or measuring of pressures in a gas pipeline.
This is the point in the system where information is collected about the process, modifications are made, and the process is controlled. The sensors or measuring instruments are often referred to as input devices because they “input” data into the ICS. In contrast, switches, valves, and other types of actuators that control the process are called output devices. This input and output information is often referred to as I/O.
Field Devices - Input
Sensors, or transmitters, collect data, or input, and are built into control instruments. The sensor may monitor one input point or measure over 100,000 points, such as within large refineries or utility front-end processors. The sensors convert physical parameters, such as temperature, pressure, level, flow, motor speed, valve state, or breaker position to electrical signals. The input device allows the operator to communicate and transmit instructions and data to computers for transmission, processing, display, or storage.
Sensors are commonly described by their type: discrete, analog, and digital
Discrete: Discrete input sensors support binary events including alarms and states. For example, the tank is full, the door is closed, the pressure is too high, or the pump is turned on.
Analog: Analog input sensors (transmitters) measure continuous processes such as flow, level, or pressure within a range: 0-100%, empty to full, 0 to 100 mph. Typically, they transmit this information to field controllers using an analog signal such as a 4 to 20 mA current loop.
Digital: Digital input sensors are similar to both discrete and analog instruments in that they measure continuous processes (such as flows) and support binary events. However, instead of using an analog loop signal or clean contacts, digital sensors relay the data as a digitally encoded signal (the equivalent of 1s and 0s) in an ICS communications protocol format.
Signals generated by discrete and analog field devices are converted to digital format in a networked environment. The digital signals extend the network to the instrument, and consequently, the process.
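As an illustration of how a field controller might turn the 4 to 20 mA analog signal described above into a process value, here is a minimal Python sketch of linear scaling. The function name scale_4_20ma and the 0-100% tank level range are assumptions chosen for the example, not part of any specific controller.

```python
def scale_4_20ma(current_ma, range_min, range_max):
    """Convert a 4-20 mA loop current to engineering units by linear scaling."""
    if not 4.0 <= current_ma <= 20.0:
        # A reading outside the loop range often points to a wiring or sensor fault.
        raise ValueError(f"{current_ma} mA is outside the 4-20 mA range")
    fraction = (current_ma - 4.0) / 16.0  # 4 mA -> 0.0, 20 mA -> 1.0
    return range_min + fraction * (range_max - range_min)


if __name__ == "__main__":
    # Hypothetical tank level transmitter calibrated 0-100 %: 12 mA is mid-scale.
    print(scale_4_20ma(12.0, 0.0, 100.0))
```

Because the bottom of the range is 4 mA rather than 0 mA, a reading near 0 mA can be treated as a fault instead of a valid "empty" measurement, which is why the sketch rejects out-of-range currents.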
Field Devices - Output
An output device is any peripheral that receives data from the field controller.
Discrete: Like their input counterpart, discrete output devices are also binary appliances. For instance, the field controller issues a signal to an output device, such as a circuit breaker, to open or close a breaker. Discrete output devices can communicate directly with discrete input devices. Furthermore, they can make control decisions and are programmable like a field controller.
Analog: The analog output transmits analog signals (voltage or current) that operate controls. Analog outputs are predominantly used to control actuators, valves, and motors in industrial environments. In this case, the field controller will send a varying electrical signal that can open or close the valve as needed.
Digital: A digital output allows a computer to control a voltage. If the computer instructs the output to be high, the output produces a voltage (generally about 5 or 3.3 volts). If the computer instructs the output to be low, it is connected to ground and produces no voltage. Because the signal is digital, these devices can communicate more quickly and reliably, enabling their use in more critical environments and across a wider range of applications. Examples include: alarms, control relays, fans, lights, horns, valves, switches, motor starters, etc.
Servers
Servers store configuration for the ICS and save process data in historians for later retrieval. The servers connect to business networks to allow remote operations, configuration, or information exchanges to improve productivity.
Historians
Collect and store near real-time process information to help employees:
Make better decisions
Trend and analyze system behaviors
Predict future requirements
Ensure safe and reliable operation of their equipment
Application Servers
Used in process control, process view, alarms, event monitoring, and other functions.
Perform complex calculations to optimize functionality. For instance, an energy management application requires pressures, flows, and tank levels from the control system, as well as weather data and electricity pricing, to generate a pump schedule for water distribution. Its objective is to reduce electrical pumping cost while still providing water to the customers.
Serve screens to the HMI for operator analysis
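The energy management example under Application Servers (generating a pump schedule from electricity pricing) can be sketched in a heavily simplified form. This Python sketch assumes a single fixed-rate pump and only considers hourly prices; it ignores tank levels, pressures, and weather, which a real application would treat as constraints. The function name and parameters are hypothetical.

```python
import math


def cheapest_pump_hours(hourly_price, demand_volume, pump_rate):
    """Pick the lowest-priced hours needed to pump demand_volume.

    hourly_price: electricity price for each hour (list index = hour)
    demand_volume: volume to deliver over the period (e.g. cubic meters)
    pump_rate: volume the pump moves in one hour (same units)
    """
    hours_needed = math.ceil(demand_volume / pump_rate)
    if hours_needed > len(hourly_price):
        raise ValueError("demand cannot be met within the available hours")
    # Rank hours by price, keep the cheapest ones, and return them in clock order.
    by_price = sorted(range(len(hourly_price)), key=lambda h: hourly_price[h])
    return sorted(by_price[:hours_needed])


if __name__ == "__main__":
    prices = [0.30, 0.10, 0.20, 0.05]  # illustrative prices for four hours
    print(cheapest_pump_hours(prices, demand_volume=250, pump_rate=100))
```

A greedy choice like this is only a starting point; it shows why the application needs both control system data (how much water is required) and external data (what power costs each hour).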
Database Servers and FEP
Collect and aggregate data sent to the HMI, Engineering workstations or any other servers that store or process operational data.
Engineering Workstations
Interface with the server to accomplish engineering tasks, such as modifying or configuring a database or programming a field controller.
A specialized type of HMI that typically interfaces with the servers to modify the database or controllers to ensure the critical process runs properly.
Safety Systems
Safety systems protect the process, physical equipment, and people from harmful situations that may arise during operations. They provide a critical counteraction when a process goes beyond allowable control parameters. While tripping a safety system results in a loss of productivity, it spares the equipment and people from harm. Safety systems are traditionally designed to be separate from the control systems they protect. However, they frequently share some communications, field devices, alarms, etc.
ICS components
Relationship
Machines installed in industrial plants use a variety of field devices for control and monitoring. These devices connect to field controllers, which connect to a Human Machine Interface (HMI).
ICS Segments In-Depth
ICS are composed of several components such as field devices, field controllers, and HMI. Each of these components can become complex.
Field Devices (Meters, Sensors, Valves, Switches)
Field Controller (PLC, PAC, RTU, IED)
HMI (Workstations, SCADA Server, Emergency Management System)
OT Cybersecurity Tenets
Availability, Integrity, and Confidentiality
Here’s an important fact to keep in mind: you may have heard of the C-I-A (confidentiality, integrity, availability) elements in IT environments, but we must be cautious about how we take security technology developed for IT and implement it in ICS environments.
Availability
The proportion of time a system is in a functioning condition. For any information system to serve its purpose, the information must be available when it is needed.
Ensuring timely and reliable access to, and use of, information.
Integrity:
Maintaining and ensuring the accuracy and consistency of data over its entire life cycle. All characteristics of the data, including business rules, rules for how pieces of data relate, date definitions, and lineage, must be correct for data to be complete.
The probability that data has not been altered in an unauthorized manner. Data integrity covers data in storage, during processing, and while in transit.
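One common way to detect unauthorized alteration of data in transit or in storage is a keyed hash (HMAC). The sketch below uses Python's standard hmac and hashlib modules. The message contents and key are made up for illustration, and many ICS protocols do not carry such a tag natively, so treat this as a generic illustration of an integrity check rather than a description of any particular ICS protocol.

```python
import hashlib
import hmac


def sign(message: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def is_unaltered(message: bytes, key: bytes, tag: str) -> bool:
    """Check the tag in constant time; False means the data or tag changed."""
    return hmac.compare_digest(sign(message, key), tag)


if __name__ == "__main__":
    key = b"example-shared-secret"  # illustrative only; never hard-code real keys
    tag = sign(b"setpoint=100", key)
    print(is_unaltered(b"setpoint=100", key, tag))    # original message
    print(is_unaltered(b"setpoint=11100", key, tag))  # altered message
```

The receiver recomputes the tag with the shared key; if the message was changed by someone without the key, the tags no longer match.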
Confidentiality:
Ensuring that information is accessible only to those authorized to have access.
Preserving authorized restrictions on information access and disclosure, including the means for protecting personal privacy and proprietary information.
For example, a threat actor will usually have one or more of the following objectives when attacking a system:
Steal the data, making confidentiality a concern.
Sabotage or alter the data, making data integrity a concern.
Make the system fail or modify it so only the attacker can use it, making availability a concern.
Tenets Compared
Protecting data confidentiality traditionally has been thought of as the primary goal of cybersecurity. Organizations are extremely concerned with keeping data (such as trade secrets and personally identifiable information) from prying eyes. With the prevalence of identity theft and high-profile information breaches at huge retailers, such as Target and Home Depot, this is not surprising. Breaches also happen to government agencies, as evidenced by the SolarWinds breach.
For ICS asset owners, cybersecurity protection is needed, but not primarily from a confidentiality perspective. Data integrity is somewhat more important for asset owners because the wrong action can be taken by a system if the underlying data is faulty. In fact, there is a general feeling that bad information is worse than no information at all.
That takes us to availability, which is a huge concern for ICS asset owners. In fact, some ICS require in excess of 99.999% uptime (the five 9s) and are on 24 hours per day x 7 days per week x 365 days per year.
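To put the five 9s figure in perspective, the arithmetic below converts an availability percentage into the downtime it allows per year. This is a small Python sketch; the function name is an assumption for illustration. At 99.999% availability over a 365-day year, the budget works out to roughly 5.26 minutes of downtime.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes for 24 x 7 x 365 operation


def allowed_downtime_minutes(availability_percent):
    """Minutes per year a system may be down and still meet the target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows about {minutes:.2f} minutes of downtime per year")
```

Seen this way, a single unplanned reboot or patch window can consume the whole yearly budget, which is why availability dominates ICS security decisions.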
Examples of critical infrastructure required to meet the five 9s standard are:
The 4 lifeline sectors (Water and Wastewater, Energy, Communications, and Transportation)
Electricity (generation, transmission, distribution)
Water treatment
Long-haul oil and natural gas pipelines
Tenet Requirements
Many ICS implementations require shared passwords, and the data transmitted is considered open because the protocols used for ICS are generally not secure. However, confidentiality controls protect against an adversary accessing production system set points, instructions, and other data vital to the sustainability of the organization.
Integrity controls are essential for maintaining system dependability. For example, you would not want an unauthorized person or event to make changes to a system and cause a pump to start when it is supposed to be off, or change a breaker setting that keeps a circuit from overloading. As many control systems involve safety functions, it is vital that the system has the correct information.
Aside from operational uptime requirements, many ICS include resiliency requirements demanding system information be exchanged at millisecond or sub-millisecond rates. Because of this, the impact of a system cycling (losing availability) for even a short time can be catastrophic.
As you can see, all 3 tenets are essential to the successful operation of ICS.
Attacks on ICS
By its nature, critical infrastructure is vital to the health and well-being of our society. Any disruption or damage to our critical infrastructure could result in injury, illness, or even death. But the impacts are not limited to life and limb. They can also affect the environment or the economy.
We are unaware of any cyberattack on an ICS resulting in mass casualties. However, we are aware of several cyber incidents that resulted in catastrophic system failure and/or the destruction of critical infrastructure components.
The risk of a major cyber incident with catastrophic consequences is real, especially considering more ICS are interconnected with Internet-based infrastructures. As we come to understand the security issues arising from the increasing accessibility of ICS, combined with inherent cybersecurity vulnerabilities in the systems, the concept of attackers breaking into, and causing damage to, control systems becomes more plausible.
Motivation
Control systems are considered high-value targets to cyberattackers because the unauthorized access and manipulation of an ICS can result in real-world kinetic events. The impact is tangible.
Another scenario to consider is when adversaries do not require control of the system, but use it to collect specific operational intelligence about production that can be exploited for financial gain or competitive advantage.
Oldsmar Water Treatment plant
In February 2021, an attacker targeted a water treatment plant in Pinellas County, Florida. The plant used TeamViewer software for remote access and assistance. This software was left running, and the attacker was able to connect to the system through this channel.
The attacker increased the sodium hydroxide setting from 100 parts per million (ppm) to about 11,100 ppm. This level is extremely dangerous in a water system.
The plant operator recognized the intrusion, observed the configuration change, immediately reversed the change, and initiated incident response protocols.
Colonial Pipeline
In early May 2021, Colonial Pipeline experienced a ransomware attack. Attackers entered the system via an unused but active VPN account. The attackers stole approximately 100 GB of data and installed ransomware.
An operator noticed the ransom message on a control room system early on the morning of May 7. To stop the spread of the ransomware before it reached critical OT systems, the entire pipeline system was shut down 70 minutes after the initial discovery.
The pipeline delivered about 2.5 million barrels (roughly 100 million gallons) of fuel per day to the southeast states. As word of the attack spread, people rushed to purchase fuel, causing shortages. A federal state of emergency was declared, allowing other means of transportation (road, rail, etc.) to attempt to ease the supply shortage.
Stuxnet
Stuxnet was a game changer because it was the first known malware to specifically target a control system. It is believed to have been introduced by a USB stick.
Stuxnet modifies programs for a specific PLC, hides the changes, and employs sophisticated evasion techniques. It only impacts ICS operating variable frequency drives.
Critical Infrastructure and Key Resources (CIKR) Sectors
Chemical
The Chemical Sector is an integral component of the economy that manufactures, stores, uses, and transports potentially dangerous chemicals upon which a wide range of other critical infrastructure sectors rely. Securing these chemicals against growing and evolving threats requires vigilance from both the private and public sector.
What is its role?
The Chemical Sector manufactures, stores, uses, and transports chemicals.
What is at risk?
Manufacturing plants
Transport systems
Warehousing and storage systems
Chemical end users
Commercial facilities
The Commercial Facilities Sector includes a diverse range of sites that draw large crowds of people for shopping, business, entertainment, or lodging. Facilities within the sector operate on the principle of open public access, meaning the general public can move freely without the deterrent of highly visible security barriers. The majority of these facilities are privately owned and operated, with minimal interaction with the federal government and other regulatory entities.
What is its role?
The Commercial Facilities Sector provides a range of commercial services to the public.
Entertainment and Media (e.g., motion picture studios, broadcast media).
Gaming (e.g., casinos).
Lodging (e.g., hotels, motels, conference centers).
Outdoor Events (e.g., theme and amusement parks, fairs, campgrounds, parades).
Public Assembly (e.g., arenas, stadiums, aquariums, zoos, museums, convention centers).
Real Estate (e.g., office and apartment buildings, condominiums, mixed use facilities, self-storage).
Retail (e.g., retail centers and districts, shopping malls).
Sports Leagues (e.g., professional sports leagues and federations).
What is at risk?
The public
Facilities that are open to the public
Communications
The Communications Sector is an integral component of the economy, underlying the operations of all businesses, public safety organizations, and government. It is critical because it provides an “enabling function” across all critical infrastructure sectors. The sector has evolved from predominantly a provider of voice services into a diverse, competitive, and interconnected industry using terrestrial, satellite, and wireless transmission systems. The transmission of these services has become interconnected; satellite, wireless, and wireline providers depend on each other to carry and terminate their traffic, and companies routinely share facilities and technology to ensure interoperability.
What is its role?
The Communications Sector provides voice services, terrestrial, satellite, wireless and wireline transmission systems.
What is at risk?
Energy Sector
Information Technology Sector
Financial Services Sector
Emergency Services Sector
Transportation Systems Sector
Critical manufacturing
The Critical Manufacturing Sector is crucial to economic prosperity and continuity. A direct attack on or disruption of certain elements of the manufacturing industry could disrupt essential functions at the national level and across multiple critical infrastructure sectors.
Products made by these manufacturing industries are essential to many other critical infrastructure sectors. The Critical Manufacturing Sector focuses on the identification, assessment, prioritization, and protection of nationally significant manufacturing industries that may be susceptible to manmade and natural disasters.
What is its role?
The Critical Manufacturing Sector produces products that are essential to many other critical infrastructure sectors.
What is at risk?
Primary metals manufacturers
Machinery manufacturers
Electrical equipment manufacturers
Appliance and component manufacturers
Transportation equipment manufacturers
Dams
The Dams Sector delivers critical water retention and control services including hydroelectric power generation, municipal and industrial water supplies, agricultural irrigation, sediment and flood control, river navigation for inland bulk shipping, industrial waste management, and recreation. Its key services support multiple critical infrastructure sectors and industries.
What is its role?
The Dams Sector delivers critical water retention and control services.
What is at risk?
Communications Sector
Defense Industrial Base
Energy Sector
Food and Agriculture Sector
Transportation Systems
Water and Wastewater Systems
Defense Industrial Base
The Defense Industrial Base Sector is the worldwide industrial complex that enables research and development, as well as design, production, delivery, and maintenance of military weapons systems, subsystems, and components or parts, to meet U.S. military requirements.
What is its role?
The Defense Industrial Base Sector provides products and services essential to mobilize, deploy and sustain military operations.
What is at risk?
More than 100,000 Defense Industrial Base companies and their subcontractors.
Government-owned/contractor-operated and government-owned/government-operated facilities.
Emergency Services
The Emergency Services Sector is a community of millions of highly-skilled, trained personnel (along with the physical and cyber resources) that provide a wide range of prevention, preparedness, response, and recovery services during both day-to-day operations and incident response.
This sector includes geographically distributed facilities and equipment in both paid and volunteer capacities organized primarily at the federal, state, local, tribal, and territorial levels of government, such as city police departments and fire stations, county sheriff’s offices, Department of Defense police and fire departments, and town public works departments.
The Emergency Services sector also includes private sector resources, such as industrial fire departments, private security organizations, and private emergency medical services providers.
What is its role?
The Emergency Services Sector provides a range of prevention, preparedness, response, and recovery services.
What is at risk?
Federal, state, local, tribal, and territorial facilities.
Private facilities, such as industrial fire departments, private security organizations and emergency medical services providers.
Personnel
Operations.
Energy
The energy infrastructure fuels the economy of the 21st century. Without a stable energy supply, health and welfare are threatened, and the economy cannot function. The Energy Sector is uniquely critical because it provides an “enabling function” across all critical infrastructure sectors.
A country’s energy infrastructure is often owned by a mix of public and private sector entities. It supplies fuels to the transportation industry, electricity to households and businesses, and other sources of energy that are integral to growth and production across the nation.
What is its role?
The Energy Sector supplies fuels to the transportation industry, electricity to households and businesses, and other sources of energy that are integral to growth and production across the nation.
What is at risk?
Households and businesses across the Critical Infrastructure Sectors.
Finance
The Financial Services Sector represents a vital component of our nation’s critical infrastructure. Large-scale power outages, recent natural disasters, and an increase in the number and sophistication of cyberattacks demonstrate the wide range of potential risks facing this sector.
What is its role?
Providing financial services, such as: - Depositing funds and making payments to other parties, - Providing credit and liquidity to customers, - Investing funds for both long and short periods, - Transferring financial risks between customers.
What is at risk?
The public
Depository institutions
Providers of investment products
Insurance companies
Other credit and financing organizations
The providers of the critical financial utilities and services that support these functions
Food and Agriculture
The Food and Agriculture Sector is almost entirely under private ownership and is composed of numerous farms, restaurants, and food manufacturing, processing, and storage facilities.
What is its role?
Provides farming, food services, food manufacturing, processing, and food storage.
What is at risk?
Public’s access to food
Farms
Restaurants
Food manufacturing
Processing
Storage facilities
Government Facilities
The Government Facilities Sector includes a wide variety of buildings, both in the country and overseas, that are owned or leased by federal, state, local, and tribal governments. Many government facilities are open to the public for business activities, commercial transactions, or recreational activities, while others that are not open to the public contain highly sensitive information, materials, processes, and equipment. These facilities include general-use office buildings and special-use military installations, embassies, courthouses, national laboratories, and structures that may house critical equipment, systems, networks, and functions.
In addition to physical structures, this sector includes cyber elements that contribute to the protection of sector assets (e.g., access control systems and closed-circuit television systems) as well as individuals who perform essential functions or possess tactical, operational, or strategic knowledge.
What is its role?
Some facilities are used for business activities, commercial transactions, or recreational activities, while others that are not open to the public contain highly sensitive information, materials, processes, and equipment.
What is at risk?
Public government facilities that are used for business activities, commercial transactions, or recreational activities.
Other government facilities that are not open to the public but contain highly sensitive information, materials, processes, and equipment.
Healthcare and Public Health
The Healthcare and Public Health Sector protects all sectors of the economy from hazards such as terrorism, infectious disease outbreaks, and natural disasters. Because the vast majority of this sector’s assets are privately owned and operated, collaboration and information sharing between the public and private sectors is essential to increasing resilience of the nation’s Healthcare and Public Health critical infrastructure.
What is its role?
Protects all sectors from hazards such as terrorism, infectious disease outbreaks and natural disasters.
What is at risk?
Communications Sector
Emergency Services
Energy Sector
Food and Agriculture Sector
Information Technology Sector
Transportation Systems
Water and Wastewater Systems
Information Technology
The Information Technology Sector is central to the nation’s security, economy, and public health and safety as businesses, governments, academia, and private citizens are increasingly dependent upon its functions. These virtual and distributed functions produce and provide hardware, software, and information technology systems and services, and, in collaboration with the Communications Sector, the Internet. This sector’s complex and dynamic environment makes identifying threats and assessing vulnerabilities difficult and requires that these tasks be addressed in a collaborative and creative fashion.
Information Technology Sector functions are operated by a combination of entities (often owners and operators and their respective associations) that maintain and reconstitute the network, including the Internet. Although information technology infrastructure has a certain level of inherent resilience, its interdependent and interconnected structure presents challenges as well as opportunities for coordinating public and private sector preparedness and protection activities.
What is its role?
Produce and provide hardware, software, and information technology systems and services, and, in collaboration with the Communications Sector, the Internet.
What is at risk?
Public Sectors
Private Sectors
Nuclear Reactors
From the power reactors that provide electricity, to the medical isotopes used to treat cancer patients, the Nuclear Reactors, Materials, and Waste Sector covers most aspects of civilian nuclear infrastructure.
What is its role?
Provide electricity to millions of Americans, materials for medical diagnostics and treatments, depth measurements at oil and gas drilling sites, sterilization at food production facilities, research in academic institutions, and examining packages and cargo at security checkpoints.
What is at risk?
Chemical Sector
Emergency Services
Energy Sector
Healthcare and Public Health Sector
Transportation Systems
Water and Wastewater Systems.
Transportation
Transportation Systems Sector consists of 7 key subsectors: Aviation, Highway and Motor Carrier, Maritime Transportation, Mass Transit and Passenger Rail, Pipeline Systems, Freight Rail, and Postal and Shipping. The nation’s transportation system quickly, safely, and securely moves people and goods through the country and overseas.
What is its role?
Quickly, safely, and securely move people and goods through the country and overseas.
What is at risk?
Aviation
Highway and Motor Carrier
Maritime Transportation System
Mass Transit and Passenger Rail
Pipeline Systems
Freight Rail
Postal and Shipping
Water and waste
The Water and Wastewater Systems Sector is vulnerable to a variety of attacks, including contamination with deadly agents; physical attacks, such as the release of toxic gaseous chemicals; and cyberattacks. The result of any variety of attack could be large numbers of illnesses or casualties and/or a denial-of-service condition that would also impact public health and economic vitality. The sector is also vulnerable to natural disasters.
Safe drinking water is a prerequisite for protecting public health and all human activity. Properly treated wastewater is vital for preventing disease and protecting the environment. Ensuring the supply of drinking water and wastewater treatment and service is essential to modern life and the nation’s economy.
What is its role?
Provides a supply of drinking water and wastewater treatment and service.
What is at risk?
Public drinking water systems
Publicly owned wastewater treatment systems
Emergency Services
Healthcare Sector
Energy Sector
Food and Agriculture Sector
Transportation Systems
CIKR Interdependencies
ICS play a major role in the operations of each sector. Because many of the individual sectors are interdependent, a failure in one sector could cause a significant impact on other sectors, and possibly place national security and safety at risk. It is important to recognize the interdependency between the sectors.
For example, an ICS failure in the Energy sector resulting in electrical blackouts will likely affect other CIKR sectors that depend on electrical power. Such a failure may have cascading effects on other sectors such as transportation, communications, and the water sector, all of which depend upon electrical power.
Interdependencies can create cybersecurity concerns when a failure in any of the dependent processes causes the process to fail.
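The cascading effect described above can be modeled as a walk over a dependency graph: start at the failed sector and collect everything that depends on it, directly or indirectly. The Python sketch below is illustrative only. The Energy relationships follow the example in the text, while the Emergency Services entry and the function name affected_by are assumptions added to show an indirect dependency.

```python
from collections import deque

# Illustrative dependency map: each sector lists the sectors it depends on.
# Only the Energy relationships come from the text; the rest is hypothetical.
DEPENDS_ON = {
    "Transportation": ["Energy"],
    "Communications": ["Energy"],
    "Water": ["Energy"],
    "Emergency Services": ["Communications", "Transportation"],
}


def affected_by(failed, depends_on=DEPENDS_ON):
    """Return every sector that directly or indirectly depends on `failed`."""
    affected = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for sector, needs in depends_on.items():
            if current in needs and sector not in affected:
                affected.add(sector)
                queue.append(sector)
    return sorted(affected)


if __name__ == "__main__":
    print(affected_by("Energy"))
```

In this toy map, a failure in Energy reaches Emergency Services even though Emergency Services does not list Energy directly, which is the essence of a cascading failure.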
Cascading Effects
Northeast Blackout
The impact on critical infrastructure during the 2003 Northeast Blackout is an example of sector interdependency. This power outage affected 55 million people in Canada and the U.S.
Since then, research has been performed to understand the interdependency of critical infrastructure sectors; how the failure in one sector can have a significant impact on other sectors; and how these sectors can protect against cascading threats.
Northeast Blackout impacts
Water Supply:
Some areas lost water pressure because pumps lacked power. This loss of pressure caused potential contamination of the water supply.
Four million customers in 8 counties within the Detroit water system were under a “boil-water advisory” for 4 days after the initial outage.
Macomb County, Michigan, ordered all 2,300 restaurants closed until they were decontaminated.
Twenty people living on the St. Clair River claim to have been sickened after bathing in the river during the blackout.
The accidental release of 310 pounds of vinyl chloride from a Sarnia, Ontario, chemical plant into the river was not revealed until 5 days later.
Cleveland also lost water pressure and instituted an advisory.
New York City reported sewage spills into waterways, requiring beach closures.
Newark experienced major sewage spills into the Passaic and Hackensack Rivers, which flow directly to the Atlantic Ocean.
The City of Kingston, Ontario, lost power to sewage pumps, causing raw waste to be dumped into the Cataraqui River at the base of the Rideau Canal.
Power
With the power fluctuations on the grid, power plants automatically went into “safe mode” to prevent damage in the case of an overload. This put most of the nuclear power plants in the affected area offline until they could recover. In the meantime, all available hydro-electric plants (as well as many coal- and oil-fired electric plants) were brought online, bringing some electrical power to the areas immediately surrounding the plants by the morning of August 15. Homes and businesses in the affected and nearby areas were requested to limit power usage until the grid was back to full power.
Industry
A large number of factories were closed in the affected area, and others outside the area were forced to close or slow work because of supply issues and the need to conserve energy while the grid was being stabilized. At one point, a 7-hour wait developed for trucks crossing the Ambassador Bridge between Detroit and Windsor, Canada because the electronic border check systems were down. Freeway congestion affected the “just in time” supply system in many metropolitan areas. Some industries (including the auto industry) did not return to full production until August 22.
Communication
Cellular communication devices were disrupted due to the loss of backup power at cellular sites, where generators ran out of fuel. Many cell phones failed without a power source for recharge. Wired telephone lines continued to work, although the volume of traffic overwhelmed some systems and millions of home users had only cordless telephones that depended on electricity to function. Most New York and many Ontario radio stations were momentarily knocked off the air but were able to return on backup power.
Cable television subscribers could not receive news, health warnings, or information until power was restored to the cable provider. Those who relied on the Internet were similarly disconnected from news sources for the duration of the blackout, with the exception of dial-up access from laptop computers, which were widely reported to work until the batteries ran out of charge. Information was available by over-the-air TV and radio for those who were equipped to receive TV and/or audio via antenna.
The blackout impacted communications well outside the immediate power outage area. The New Jersey-based Internet operations for Advance Publications were among those knocked out by the blackout, and Internet editions of their newspapers as far removed from the blackout area as The Birmingham News, New Orleans Times-Picayune, and The Oregonian were offline for days.
Amateur radio operators with independent power sources passed emergency communications during the blackout.
Transportation
Railroad service was stopped north of Philadelphia and all trains running into and out of New York City were shut down. Both were able to establish a “bare-bones” all-diesel service by the next morning. Canada’s Via Rail, which serves Toronto and Montreal, suffered service delays; but most routes were still running, and normal service was resumed on most routes by the next morning.
Passenger screenings at affected airports ceased and regional airports were shut down. In New York City, flights were cancelled even after power had been restored to the airports because of difficulties accessing electronic ticket information. Air Canada flights remained grounded on the morning of August 15 due to a lack of reliable power for its Mississauga, Ontario control center. This problem affected all Air Canada service and canceled the most heavily traveled flights to Halifax and Vancouver. At Chicago’s Midway International Airport, Southwest Airlines employees spent 48 hours dealing with the disorder caused by the blackout.
Many gas stations were unable to pump fuel due to lack of electricity. In North Bay, Ontario, a long line of transport trucks was held up, unable to go further west to Manitoba without refueling. In some cities, motorists who simply drove until their cars ran out of gas on the highway compounded traffic problems.
Many oil refineries on the East Coast of the U.S. shut down as a result of the blackout, and were slow to resume gasoline production. As a result, gasoline prices rose significantly across the U.S. In both the U.S. and Canada, gasoline rationing was also considered by the authorities.
Manufacturing (Discrete and Process)
The manufacturing process is used to produce a product, be it electricity, chemicals, plastics, food, pharmaceuticals, or cars. The processes used to manufacture these goods are broadly classified as either discrete or process.
Discrete
Discrete manufacturing results in the creation of products that can be easily differentiated; products you can touch, such as cars, books, toys, furniture, or cell phones. Discrete manufacturing is not continuous, meaning it can be started or stopped at any time, depending on production requirements.
Discrete manufacturing involves creating, assembling, and handling individual components to make a product. Discrete products are easily counted and are measured in units, as opposed to process manufacturing products that are measured by weight or volume. Assembly robots are often used in discrete manufacturing.
Process
Process manufacturing involves using formulas, much like a recipe, to take a set of ingredients to make a final product. Examples of process manufacturing include oil refining, chemical refining, food and beverage production, and pulp and paper production.
Process manufacturing is different from discrete manufacturing because once the final product is produced from individual elements, it cannot be taken apart to get the original components. For example, it would be hard to retrieve the resins, pigments, solvents, and other additives from paint after it was made.
Within process manufacturing, there are 3 types of processes: continuous, batch, and hybrid.
Continuous
Continuous processes require an uninterrupted flow of material from start to finish during the transition from a raw material to a finished product, such as the process used to make chemicals. Generally, a continuous process runs constantly unless interrupted by an unscheduled outage, usually caused by an emergency or equipment failure.
Continuous processes may be shut down for scheduled maintenance, sometimes referred to as turnarounds. Turnarounds are used to keep refineries running in a safe operational state; however, not every sector has scheduled maintenance periods. The turnaround time will vary between sectors.
The ICS used in continuous processes must be flexible to control all phases of the process: from startup, to continuous operations, to emergency shutdowns, to maintenance shutdowns. During continuous operations, such as in a refinery, the ICS is constantly adjusting the valves and pumps to keep the process within specifications.
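The constant adjustment described above can be illustrated with a minimal sketch of a proportional-only control loop acting on a toy tank model. The gain, the process constants, and the function names are assumptions made for illustration; real control systems typically use more elaborate control laws and vendor-specific tooling.

```python
def proportional_step(setpoint, measurement, gain, out_min=0.0, out_max=100.0):
    """Return a valve position (percent open) from a proportional-only control law."""
    error = setpoint - measurement
    output = gain * error
    # Clamp the command to the physical range of the valve.
    return max(out_min, min(out_max, output))


def simulate_tank(setpoint=50.0, level=20.0, gain=4.0, steps=200):
    """Toy tank: inflow follows the valve position, outflow follows the level."""
    for _ in range(steps):
        valve = proportional_step(setpoint, level, gain)
        inflow = 0.05 * valve    # hypothetical process constant
        outflow = 0.02 * level   # hypothetical process constant
        level += inflow - outflow
    return level


if __name__ == "__main__":
    print(f"level after 200 steps: {simulate_tank():.2f}")
```

In this toy model the level settles slightly below the setpoint, which is the usual steady-state offset of proportional-only control. The point of the sketch is only that the controller recomputes the valve position on every cycle.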
Batch
A batch process has a starting and ending point. Batch processes are similar to cooking, in that you have a list of items or ingredients and a procedure (recipe) consisting of a series of steps for mixing the various components to create a product. As one phase of the batch is finished, the system will transition to another phase of the batch process. Pharmaceuticals and specialty chemicals rely heavily on batch automation to create their products.
Many batch manufacturers will procure a batch management system that is used along with the control system. Batch management systems work with the control system to execute batch processes. They are also used to manage recipes and records. Records management is especially important in regulated industries where the operators are required to audit the batch process.
Hybrid
A hybrid process uses a combination of continuous and batch controls. Water treatment is a good example of a hybrid process. Water flows through the treatment plant where disinfectants are injected into the water to kill bacteria. The chemicals cause particles to clump together, where they are removed through sedimentation or filtration.
For the most part, water treatment is a continuous process as it flows through the pre-treatment, filtration, and post-treatment processes. However, as solid particles build up in the filters they need to be flushed. This is typically done through a process called backwashing. Backwashing uses batch control to automatically operate valves and pumps in a series of steps to reverse the flow of water through the filters to remove the particles.
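The backwash sequence described above can be sketched as an ordered recipe of steps that runs to completion or stops at the first failure. The step names, actions, and durations below are hypothetical and are not taken from any particular plant or batch management product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    action: str
    duration_s: int


# Hypothetical backwash recipe, for illustration only.
BACKWASH_RECIPE = [
    Step("isolate", "close inlet and outlet valves", 30),
    Step("reverse_flow", "open backwash valve and start backwash pump", 300),
    Step("settle", "stop pump and close backwash valve", 60),
    Step("return_to_service", "reopen inlet and outlet valves", 30),
]


def run_batch(recipe, execute):
    """Run steps strictly in order; stop at the first step that reports failure.

    `execute` is a caller-supplied function that performs one step and
    returns True on success. The names of completed steps are returned.
    """
    completed = []
    for step in recipe:
        if not execute(step):
            break
        completed.append(step.name)
    return completed


if __name__ == "__main__":
    print(run_batch(BACKWASH_RECIPE, lambda step: True))
```

Returning the list of completed steps makes it easy to record how far a batch progressed, which matters in settings where batch records are audited.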
Process Dependencies
A process relies on a number of upstream and downstream systems to produce a product. These interdependencies create both cyber and physical security concerns. A failure in the supply chain or any dependent process can cause the main process to fail, or result in the manufacturing of inferior products that can fail at a later date.
Most process and discrete manufacturing facilities have a control system that monitors and controls the main process.
Processes are not islands; they do not stand alone. An ICS-controlled process may have numerous dependencies on upstream and downstream systems and processes or energy sources that may pose environmental and safety concerns. A process can be defined for any type of critical infrastructure.
Upstream, Downstream, Processes, Safety
The upstream process provides the feedstock or raw material for the primary process. For example, we cannot refine oil without crude oil, nor can we make polymers without their monomer feedstock. The main process is responsible for producing the final product, whether it is electricity, chemicals, plastics, food, pharmaceuticals, or petrochemicals. Obviously, if the main process fails, we will not be able to produce the end product.
The downstream process is responsible for handling the end product. The end product is either used as an upstream product for other processes, or distributed to customers. If the downstream process fails we may be able to store the product. However, when the storage is full or there is no storage, we will need to shut down the primary process. If the product is electricity, generators will probably trip offline since there is no good method to store electricity and later feed it to the grid.
Processes that require heating or, for that matter, cooling will fail if the energy process cannot provide the heating or cooling resources needed. For instance, a refinery process will shut down if the flow of steam is interrupted. Most processes depend on electrical power; unless the process has a backup energy source, such as generators or batteries, it will fail without electricity. Many processes depend on pressurized air to power control valves. Without air, the control valves will not control the process. The process could have other dependencies, such as solar, hydraulic, wind, or nuclear energy.
A process cannot produce waste indefinitely without some form of waste management. Some waste streams become product streams, but other waste streams must be treated or disposed of. If a process is polluting or leaking hazardous materials, a regulatory agency such as the Environmental Protection Agency may force asset owners to shut down the process. Finally, depending on the hazard of the process and/or environmental regulations, the system may have a safety system.
Safety systems are separate from the control system and are designed to safely shut down the process if the primary control system fails, protecting the people, the environment, and the equipment.
Siemens has provided a very good resource for understanding the different processes involved in industries such as Oil & Gas, Chemicals, Water, Power, and others.
Communication Dependencies
In addition to having process dependencies, most ICS must communicate to other systems or other ICS to function properly. This may be as simple as an operator looking at two different screens from two different ICS and manually adjusting the systems. Or it may require that two or more separate control systems are networked together to share information. The information transfer must occur for proper control to be achieved.
A cyber-based event could interrupt critical communications and cause a process to fail. The failure could have both upstream and downstream consequences, as well as impact the process itself. This is why system and process availability is of paramount importance to control system asset owners.
What do you think the consequences would be if the events described below occurred?
If a safety system monitoring a critical process stops receiving data then it may shut down the process, despite the process operating correctly.
If a community warning system fails following a facility chemical spill then local residents may not be notified to evacuate.
If a leak detection system fails to alert operators then hazardous materials may be released.
If operators don’t receive an alarm that a power line is compromised then they may not be able to take action in time to prevent a blackout.
If a subway control room stops receiving updates from track detection sensors then the trains may be routed to the wrong track, and result in a crash.
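The first scenario above, a safety system that stops receiving data, is often handled with a fail-safe timeout. A minimal sketch follows, assuming a hypothetical 5-second timeout and hypothetical function names; real safety systems are engineered and certified far more rigorously.

```python
def is_stale(last_update_s, now_s, timeout_s):
    """True when no fresh data has arrived within the timeout window."""
    return (now_s - last_update_s) > timeout_s


def safety_action(last_update_s, now_s, timeout_s=5.0):
    """Fail safe: request a shutdown when the monitored data goes stale.

    The 5-second default is an illustrative assumption, not a recommendation.
    """
    return "SHUTDOWN" if is_stale(last_update_s, now_s, timeout_s) else "RUN"


if __name__ == "__main__":
    print(safety_action(last_update_s=100.0, now_s=103.0))
    print(safety_action(last_update_s=100.0, now_s=106.0))
```

The design choice is deliberate: the logic cannot tell a failed sensor from a failed network link, so it treats both as unsafe. This is why an interruption of communications alone can stop a process that is otherwise operating correctly.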
Types of Facilities
Site
The process and discrete manufacturing industries produce products at site facilities. A site facility is usually physically protected within a fence or other enclosure. The tools and personnel who support the equipment are typically located at the site, and can therefore quickly respond to onsite problems.
Transmission
Transmission facilities can span counties, states, or countries. They are the transmission lines or pipelines that carry electricity, oil, gas, and water over long distances. Transmission facilities include the railroads and highways that trains and trucks use to carry goods from the manufacturing facilities to distribution warehouses. Transmission facilities are usually unmanned. Maintenance and operational staff only visit these facilities when scheduled or called out for emergency repairs. Most transmission infrastructure, such as pumps and compressor stations, is in remote locations that are difficult to secure.
Generation
Generation facilities produce energy. They consist of electric generators and auxiliary equipment for converting mechanical, chemical, hydro, wind, solar, or nuclear energy into electric energy. Larger facilities are usually physically secured, but some facilities such as dams, wind or solar farms, and other remote operations are publicly accessible.
Distribution
Distribution facilities are used to distribute products to the customers. They provide the infrastructure to deliver electricity, water, and natural gas to our homes and businesses. Distribution facilities are located throughout the country, and many remote stations are generally unmanned. A fence or other barrier physically protects most distribution stations.
The controls and safety systems within these facilities are accessible through on-site visits and, increasingly, through remote access. Distribution facilities may be monitored and controlled from a central control center, or may be stand-alone systems. Communications to the central control center are crucial for monitoring and controlling many of the more remote distribution facilities; and because the communication paths are routed beyond the fence line, securing the data transmission paths becomes a concern.
IT and ICS
IT vs. ICS Priorities
The protection of data in information systems has traditionally been of primary importance, with the integrity and availability of that data following close behind. Sometimes this hierarchy is referred to as C-I-A (Confidentiality-Integrity-Availability). There are instances in traditional IT domains where the availability and integrity are critical elements (e.g., in real-time financial transactions), but for the most part, organizations are more concerned with keeping data from prying eyes.
The availability requirements for critical infrastructure systems can exceed five 9s (99.999% uptime). Services must be running 24 hours per day, 7 days per week, 365 days per year. Safety and resiliency requirements in ICS demand that system information be exchanged at millisecond or sub-millisecond rates. It is easy to see why the control system domain is much more concerned with availability and integrity than confidentiality. This hierarchy is referred to as A-I-C (Availability-Integrity-Confidentiality).
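To make the "five 9s" figure concrete, the short calculation below converts an availability percentage into the downtime it allows per year. It assumes a 365-day year; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent):
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows {minutes:.2f} minutes of downtime per year")
```

At 99.999% the budget is roughly 5.26 minutes per year, which leaves little room for reboots, patch windows, or scan-induced slowdowns.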
Historically, data on the networks did not mean anything except to the operators, so prying eyes seeing the data had little impact on whether the system was operating normally. The speed at which many of these systems operated suggests that by the time prying eyes did see the data in transit, it would be of no use to them. Although confidentiality does play a role in defending mission-critical control systems, the availability and integrity of the control system and the data in it remain paramount to operations.
In the
IT world: Confidentiality is the Priority
ICS world: Availability is the Priority
The security focus of an
IT group is to protect the system from threats, both intentional and unintentional, and from inside or outside the organization.
ICS is to protect the system from use by unauthorized personnel, and to ensure the system maintains its functionality (availability and integrity) and continues to operate in a safe manner.
Example:
Sales Management: A simple example is a marketing executive trying to get the most recent sales figures. While the executive wants to ensure no unauthorized personnel will have access to the data, they also want that data to be correct. In this case, confidentiality is a priority followed closely by integrity of the data. In terms of availability, it is acceptable if the executive must wait a couple minutes to download the sales figures.
Natural Resources Management: The control systems within a refinery are responsible for cleaning the gas and pushing it into the pipeline. Under high pressure, the pipeline transmits the gas over long distances to delivery nodes. The information about how much gas is being refined or its destination is not secret. In many cases, the information is published online. However, timely and accurate information about the refining process itself, and the management of the pipeline, is critical. In this example, the availability of real-time data from the ICS running the refining process to the ICS managing the pipeline compressor stations is vital to safety and flow control. The integrity of the data is also critical as it allows operators to adjust control system parameters within the refining process and pipeline management activities. If the data reflecting real-time operations and refining for the pipeline are unavailable or wrong, the consequences can be significant.
IT and ICS Security
We will identify security focuses with integration of IT and ICS.
The Changing Landscape
In the past, ICS were specialized stand-alone systems protected by a physical security perimeter (guns, guards, and gates) and controlled by onsite operators with manual switches and controls. In fact, many of the systems with analog/manual controls are still in use.
Today, ICS owners and operators function under constrained budgets and are required to reduce the costs associated with managing and maintaining ICS. Control system technology has moved from using disparate, manual systems to interconnected digital systems and remotely controlled apparatus.
Although this evolution in system design is great for business and productivity, it bridges the air gap separating critical ICS from business and peer networks. While this has provided significant business benefits, it has blurred the boundaries between ICS and traditional security systems.
With the integration of IT and ICS networks, security concerns arise. The new interconnected architectures introduce new vulnerabilities, and there are now significant risks for ICS that were never a consideration before, such as worms, viruses, and unauthorized remote access.
Control system architectures, by their nature, operate with high trust. Security for these systems used to be focused on ensuring only authorized personnel had access to the control environment. Control systems were built with minimal security countermeasures, and asset owners assumed that anyone with access to the control system was authorized to interface with it. Unauthorized access can be a grave concern, as are the consequences of malicious activity on the ICS and its potentially devastating downstream effects.
In the past, there were questions about who was the responsible authority for managing authenticators. IT personnel typically managed and provided login IDs and established clear policies for their use, but many ICS are unable to follow these policies.
Security Goals
The security goals of an ICS and an IT system are different but are based on the same principles. When we think about security, we generally define it using confidentiality, integrity, and availability.
Almost every instance involving the protection of information or information systems will fall into one of these categories. Business objectives often dictate how these categories and the activities supporting them are prioritized.
Preventing unauthorized personnel from viewing protected information (confidentiality) is the main concern when implementing security controls for business systems. However, in a control system environment, availability and integrity exceed confidentiality.
Business system owners are concerned about the inadvertent disclosure of proprietary information. They are also concerned the information is correct, but are generally willing to wait to get the information. In an ICS environment the system must always be available and must send the correct instructions to the system it controls, so confidentiality is not the primary goal. The security focus of an IT group is to protect the system from threats, both intentional and unintentional, and from inside or outside the organization.
The security focus of ICS is to protect the system from use by unauthorized personnel, and to ensure the system maintains its functionality (availability and integrity) and continues to operate in a safe manner.
IT and ICS Communication
We will compare IT and ICS communication. While IT and ICS share similar, if not identical, technologies, the implementation and upkeep of these systems can be drastically different.
Determinism
IT Systems
IT systems generate network traffic on demand when a user requests resources or when maintenance activities are performed. As a result, there is a high amount of irregular traffic. The traffic is generated by any number of IT elements or events and can be sporadic and unpredictable. This is expected in corporate networks where diverse users using disparate resources perform a broad range of activities.
OT Systems
Most control systems are purpose-built and are designed to accomplish specific tasks and to perform those tasks continuously. Their operation is repeatable, predictable, and designed so that fluctuations from normal operations can be easily detected. This creates an environment where the network traffic is highly deterministic. This is important in the control system domain because detection of anomalies and errors is vital to sustaining operations. Having this type of predictability and determinism can make it easy to set up effective intrusion detection system (IDS) monitoring for ICS networks.
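As a rough illustration of why deterministic traffic helps monitoring, the sketch below flags messages whose arrival gap deviates from an expected polling interval. The interval, the tolerance, and the function name are assumptions; a production IDS inspects far more than timing.

```python
def find_timing_anomalies(timestamps_s, expected_interval_s, tolerance_s):
    """Return indices of messages whose gap from the previous message
    differs from the expected polling interval by more than the tolerance."""
    anomalies = []
    for i in range(1, len(timestamps_s)):
        gap = timestamps_s[i] - timestamps_s[i - 1]
        if abs(gap - expected_interval_s) > tolerance_s:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical arrival times for a device polled once per second.
    arrivals = [0.0, 1.0, 2.0, 3.5, 4.5, 4.6]
    print(find_timing_anomalies(arrivals, expected_interval_s=1.0, tolerance_s=0.2))
```

On a corporate network with sporadic, user-driven traffic, a fixed baseline like this would generate constant false alarms. On a polling-based control network, a late or extra message is unusual enough to be worth a look.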
Host Applications
IT Systems
IT systems are used for a variety of purposes. As a result, a broad range of applications are used to support the diverse requirements of the organization.
OT Systems
The applications running in an ICS environment are typically limited to those required to monitor and control a process.
They are limited in functionality and are specific to the task they were designed to perform.
Internet Access
IT Systems
Users of a traditional IT network expect a broad range of services and applications to be available. Therefore, they often expect and are often granted unfettered Internet access.
OT Systems
Internet connectivity to ICS has traditionally been unavailable, primarily because ICS components are usually operated in an isolated environment.
However, the creation of control system networks establishes many different business cases where Internet access is required, especially where remote administration, remote vendor support, and budget limitations are important.
Don’t organizations prohibit their control systems from connecting directly to the Internet?
Although most organizations prohibit their control systems from connecting directly to the Internet, some asset owners support it.
There are two common operational benefits of allowing ICS to be connected to the Internet.
The ability to maintain and support systems with remote staff.
To enable vendors to provide support and updates to the system.
Is it common to see field equipment directly connected to the Internet as part of a larger control system operation?
It is not uncommon to see field equipment directly connected to the Internet as part of a larger control system operation.
These connections are often read-only and provide an operator at a control center real-time or near real-time system information.
IT and ICS Operations
We will compare IT and ICS operations.
Security/CIKR Compliance
There are compliance requirements driving security for IT and ICS that dictate how the systems must be secured.
ICS play a major role in the functioning of critical infrastructure and key resources (CIKR) sectors. These assets, systems, and networks (whether physical or virtual) are so vital that their incapacitation or destruction would have a debilitating effect on security, national economic security, national public health or safety, or any combination of these.
Example: The electric sector is subject to North American Electric Reliability Corporation (NERC) critical infrastructure protection (CIP) requirements. This is a comprehensive set of cybersecurity standards designed to protect critical cyber assets supporting the reliability of the North American bulk power system. Failure to comply with the CIP regulations can result in stiff penalties, and numerous entities have been fined.
The Wind Power Threat Assessment and the sPower incident are useful reading here. Losing connection to remote sites that are part of the power generation network could affect power generation. Because sPower is a power company, it falls under NERC, and the loss of a site could be a regulatory issue. Power generation is part of CIKR; if the remote sites that had this problem were to go offline, that could cause problems delivering power to critical businesses.
Secure System Development
As cybersecurity awareness increases, the implementation of security controls is extended beyond how a system is configured and deployed, to how a system is developed.
Secure application architectures, secure coding, supply chain management, and the procurement of secure applications have also risen in importance in reducing the targets of opportunity for cyber threats.
As many organizations go through the process of obtaining new IT systems as part of their ICS architecture, they include cybersecurity requirements directly in the procurement process. There are a few things that can cause complications.
Is there anything that can cause complications?
The requirements that ICS must operate in high-availability, high-capacity modes make the implementation of some common security countermeasures difficult. This does not mean that cybersecurity countermeasures cannot be implemented; it means that special care must be taken to ensure the countermeasures do not impede the ICS operators from performing specific tasks, or prohibit them from performing urgent or time-sensitive actions.
What are some of the more common differences in control system environments?
One is the use of individual usernames and passwords for each computing resource, along with the requirement that passwords be complex and changed regularly. While some ICS still utilize shared accounts, more now use individual user IDs and passwords effectively and efficiently so as not to impede an operator’s ability to access the system under duress.
Do many control systems require rigorous security practices?
They do not. The continued use of these inferior practices is usually due to the age of the control system, the lack of opportunity to update the system, and the cost associated with trying to retrofit legacy automation systems with contemporary security controls.
ICS owners, regardless of the age of their systems, have several excellent options for implementing effective cybersecurity strategies. In some cases, the static nature of the architecture and the age of the systems can work to their advantage. ICS owners must balance security with functionality requirements and creatively apply best security practices, while enabling ICS operators to perform their duties unimpeded.
So, what happened at sPower?
Firewalls that protect these sites should be on a firmware update cycle. DOE said in a statement sent to Archer News that the event is “related to a known vulnerability that required a previously published software update to mitigate.” That means someone had already found a security hole in the past, the device maker reported it and came up with a patch for it, but the power company may not have applied that patch. Also, if there was an attack (or problem) then they should have a backup way to get into the sites. This should be part of your disaster recovery plan.
Physical Security
Physical security is generally scaled in proportion to the criticality of the information being protected.
In the IT domain, physical security prevents unauthorized access to locations where proprietary information is handled or stored.
Many people are familiar with the concept of physical security as it relates to the electric power grid. Substations, transformer stations, and maintenance offices are usually well protected with fences, cameras, and possibly a security guard.
However, physical security for power stations located in urban centers is everywhere, and easily seen. There are also some unstaffed facilities in remote locations, and security is limited to a single gate or padlock.
Although additional security mechanisms may be protecting critical devices and critical cyber assets within the station yard, getting past perimeter defenses can be a trivial task, usually because of the cost associated with protecting remote assets. These systems can be a significant cyber risk if accessed by an adversary.
What can be done to reduce cyber risk?
Establishing a robust physical security perimeter around ICS assets is critical to reducing cyber risk. Integrating corporate and control system networks is also justified because it provides a more cost-effective management model.
However, the blending of IT and ICS networks can negate the physical security perimeter and open the system up to threats across the world.
Are firewalls part of physical security?
Indirectly. The remote site firewall could be part of the physical security network. There might be cameras, sensors, or other security related controls attached to it. These controls would alarm back to the main control room and warn of any physical security breach.
During an unexpected reboot, sPower should have a backup way to contact the remote site so they can verify physical security controls. In the worst case, they would need to send someone out to check the site, just to make sure it was not a physical security incident.
Change Management
Change management is a challenge for both IT and ICS.
There are two fundamental elements impacting the change management process:
The verification that the change is not going to impact the system in a negative way.
The time to make the change to the system.
So what happens after these changes?
All changes made to an ICS, whether to operator consoles or field devices, must be tested. ICS asset owners cannot simply accept a vendor’s promise that a system patch or application modification will work. ICS administrators must implement the change in a test environment to determine what negative impacts are observed. This can be a long and expensive process, especially if the change is to be implemented across a wide range of production elements.
Is it complicated?
All sPower needed to do was have some change management in place to keep these firewalls up to date. The vendor should also be aware of any issues that firmware in a device might have. If a firmware upgrade was available, then they should have had it on a plan for installation. They did do a good job of testing before deployment, so as not to cause any additional issues.
IT and ICS Support
We will compare IT and ICS support.
Anti-Virus and Anti-Malware
What can Anti-virus and Anti-malware do?
Being able to manage malware effectively is critical to the survivability and security of any network environment. In the IT domain, virus scanners and anti-malware technology are common and are placed on centralized servers, gateways, and individual user desktops. Organizations work hard to ensure their anti-malware defenses are optimized, and usually automate updates to anti-malware solutions across all cyber assets.
But we work in ICS? What does it mean to us?
With ICS, the situation is different. Because of their specialized functions and limited processing capability, many ICS do not have the capability to run anti-virus software. Many anti-virus vendors do not have a product designed for ICS protocols or threats. Signatures are not available for many ICS-centric malicious code variants. Also, timely updates to virus signatures are difficult to deploy because of the “always on” nature of ICS.
Do the computing resources traditionally used by an anti-virus impede control system software from operating at peak efficiency?
Yes, they can. Anti-virus technology can consume significant processing capacity during real-time and scheduled scans. If memory is not available for other critical processes, there is a significant risk that software related to control system operations may not perform properly.
Do control system vendors support the use of anti-virus on their ICS and provide specific guidance on which directories to avoid when scanning?
A few do. Although this may not solve the resource problem, it can mitigate historical issues associated with the anti-virus moving or accidentally deleting files critical to control system operation.
We have anti-virus and anti-malware at our facility.
Is it updated? Virus scanners that are not up-to-date cannot provide protection against the latest threats from malicious code. To ensure anti-virus signatures and anti-malware capabilities are most useful, they need to be constantly updated.
The most efficient way to do this is to have a primary server obtain updates and automatically push them to the critical cyber assets.
Servers and individual workstations may need to be connected to the Internet to obtain the updates. In theory this makes good sense, but directly connecting a server located in the control system domain to the Internet can introduce cyber risk.
That does seem kind of risky. What else can be done?
Deploying a secure file transfer mechanism across the boundary that separates the corporate domain from the control system, or tasking administrative personnel with manually obtaining anti-virus and anti-malware updates and physically applying them to control system assets, may solve some of the issues with maintaining up-to-date signatures for anti-virus solutions.
Manual updates may be cost prohibitive, however, and do not address the requirement that processing on ICS is uninterrupted.
Patch Management
What is Patch Management?
Patch management is closely related to change management and can be a challenge for both IT and ICS administrators.
Did you know the disclosure rate of cybersecurity vulnerabilities is growing, and security patches are released almost daily by responsive vendors? That is usual for someone working with ICS systems. Patches are often applied to IT systems as soon as they are released. However, there is still a possibility that they will damage the system or network. IT domains face an increasingly threatening cyber landscape from both internal users and external adversaries. Maintaining up-to-date security patches is considered a normal best practice in countering those threats.
But many ICS systems are not built to run on contemporary operating systems.
ICS operators rely on the isolation and supplemental physical security for control system operations to minimize the opportunity an attacker may have to exploit a vulnerability.
ICS asset owners must decide whether to implement patches at all. Often they are not applied unless they enhance functionality and sometimes not even then!
I heard a vulnerability is only exploitable when the adversary is sitting at an operator console.
True. The asset owner may consider that the only realistic threat actor to capitalize on this vulnerability would be someone who breaks into the facility control room or a malicious insider. The asset owner may choose not to apply the patch because of the physical security they have in place to protect against an unauthorized individual getting into the control room, as well as their exhaustive personnel screening program designed to mitigate an insider attack.
Yeah, we feel that isolation from the Internet or corporate network reduces our cyber risk to zero.
Careful. With the pervasive use of portable media, such as USB or optical drives, malicious code or hacking software can be carried into the system. An intruder can then use that code or software against older or un-patched operating systems. The result is a growing number of un-patched control systems at risk of cyberattack.
Support Personnel
What do you mean by support personnel? We have a great team!
ICS operators are the link between the system and the function it performs. Did you know that a system is only as secure as its weakest link?
Human error, whether accidental or intentional, can wreak havoc on an ICS and bring the functionality and safety of the system to a halt.
The most prevalent human threats are untrained operators causing accidental infections.
Is it that bad?
Untrained operators may accidentally infect the ICS system or network by not following policies (such as a rule against accessing email or the Internet through the ICS network) and inadvertently introduce malicious code into the system through spam, spyware, or phishing campaigns.
The potential for damage is increased because an untrained user does not know what the indicators of compromise are and may not realize the system is infected until it is too late.
While IT faces the same human threat as ICS, the consequences for ICS can be far reaching and could impact the safety and well-being of millions of people and devastate a company’s finances and industry reputation.
I trust my engineer’s decision 100% and think they are making the right choices.
Just be careful, this could be a vulnerability.
Security testing
What’s the purpose of Security Testing?
The goal of security testing is to gain a comprehensive understanding of the cyber-risk profile of a system. Testing provides insight into system vulnerabilities, as well as workable countermeasures, and helps asset owners determine how best to improve their security. Testing objectives can be broad in nature, and the methods used to perform security testing can be diverse.
What does it focus on?
It focuses on confidentiality, integrity, and availability. Contemporary testing tools and test frameworks are used to categorize the areas where the security of a system may be weak and where the defenses are functioning as intended.
What is the range?
Security testing tools can be configured to model a range of attack types with different capabilities. They are designed to be used on resilient, stable, and robust IT systems. These testing tools scan large numbers of systems for a variety of vulnerabilities, and complete the test over a short period. As a result, security testing can be aggressive.
That sounds like a good thing.
ICS devices and components were not designed to withstand the high volumes or types of traffic used by most testing tools, and intense scanning can often cause undesirable effects. Most ICS operate in highly deterministic and predictable ways, and traffic flows are usually within a well-defined norm.
Aggressive testing can introduce abnormal conditions, causing the system to experience tremendous stress and result in system failure, improper operations, or physical damage to the equipment. Some security testing tools are beginning to incorporate the capability to test control systems.
Is there anything we can do?
Security researchers and vendors are working together to create test frameworks and plug-ins to allow asset owners to obtain vulnerability data, without causing undue stress and damage to the ICS elements being tested.
How effective is it?
The effectiveness of security testing and the impact of testing on the control system are governed by the level of expertise and experience of the tester. Individuals with only testing experience on modern IT systems should not be allowed to test control system environments unless they have been properly trained to minimize the consequences associated with aggressive scanning techniques.
Security tests for ICS should only be performed in a test bed or sandbox because testing within a production environment can often result in unforeseen and undesirable consequences.
Incident Response and Forensics
What is Incident Response?
As part of an effective defense-in-depth strategy, organizations need to be able to respond to cyber incidents and investigate how those incidents happened. Most organizations have an existing incident response capability to manage those cyber incidents that impact normal operations.
We did have something happen, but we are not sure if we need a team to come and investigate?
Incident response and forensics techniques applied to the IT domain are straightforward. Well-documented processes and procedures and numerous recommended practices and guidelines are available to assist organizations in developing their incident response plans and the methods for executing a forensics investigation. Modern IT equipment meets most, if not all, the requirements necessary to carry out an investigation, and provides well-developed capabilities to enable effective incident response and attack mitigation. Most common commercial off-the-shelf tools have been developed with traditional IT in mind.
That is good and everything, but we work in the ICS industry.
Issues can prevent IT security technologies from being effective for ICS incident response. The age of the average ICS can negatively impact both the incident response capability and forensics effectiveness. The technical capabilities required to facilitate appropriate response and investigation techniques, such as event logging, do not exist in many of the control systems in operation today. Luckily, some ICS vendors are making it easier for asset owners to perform incident response and forensics by moving to more contemporary operating systems as the platform for control system solutions.
What does that mean?
By doing so, asset owners investigating control system cyber incidents can take advantage of the response and investigation tools designed for modern computing platforms. Often, both IT and ICS are required to simply reboot an impacted computing resource, reconfigure equipment, or rebuild/restore from a system image to facilitate a quicker restoration of the system.
Rebooting or rebuilding systems for either IT or ICS can result in loss of forensics data. These methods may destroy evidence artifacts that could have been used to determine root cause(s) of the cyber incident.
Well, we think it was just a mechanical malfunction and do not need an investigation.
Just be careful, that could be a vulnerability.
Outsourcing
We did the research and went with the best vendor. We should be good, right?
Competitive technology markets, an increasingly mobile workforce, and resource limitations have caused many organizations to outsource the management and implementation of their IT functions. Cloud computing, infrastructure as a service, software as a service, and security as a service are being used more throughout the IT domain. The benefits of these services could possibly outweigh the risks for companies striving to maintain a competitive advantage, and to ensure their data are available anywhere, any time.
Wait, what risks?
The implementation of IT services is well known and understood; however, ICS poses a much greater challenge to a company offering services outside the confines of the corporate network. ICS operations are much more sensitive to failure than IT systems. ICS also require in-depth knowledge of the operational technologies and their limitations to function at expected performance levels for availability and integrity.
ICS are under more stringent levels of oversight and require experienced technicians to manage and maintain them.
Well, yeah. Like you said, they are expected to function at expected performance levels.
Sadly, outsourcing for ICS operations is not at the same level of maturity as IT operations.
The risks to critical infrastructure and the consequences of service failures should be carefully examined before outsourcing is considered for ICS.
I agree. Let me call my vendor and see what they say.
Just be careful, that could be a vulnerability.
Conclusion
In August 2017, the Saudi plant reported another breach! Reports say that a flaw in the code gave the hackers away before they could do any harm.
A flaw in the code triggered a response from a safety system in June 2017, which brought the plant to a halt. Unfortunately, this first outage was mistakenly attributed to a mechanical glitch. Then, in August 2017, several more systems were tripped, causing another shutdown. After the second outage, the plant’s owners called in investigators, and the malware was finally found.
The hackers appear to have been inside the petrochemical company’s corporate IT network since 2014. From there, they probably found a way into the plant’s own network, most likely through a hole in a poorly configured digital firewall that was supposed to stop unauthorized access. They could then have gotten into an engineering workstation, either by exploiting an unpatched flaw in its Windows code or by intercepting an employee’s login credentials.
ICS vendor Schneider Electric has drawn praise for publicly sharing details of how the hackers targeted its Triconex model at the Saudi plant, including highlighting the zero-day bug that has since been patched.
How do the personnel play into all of this?
Well, first of all, the June Triconex system outage occurred on a Saturday evening. This is a time when most engineers aren’t typically in the plant. Secondly, the petrochemical firm called in Schneider to assist in troubleshooting the Triconex system failure. The vendor pulled logs and diagnostics from the machine, checked the machine’s mechanics, and, after later studying the data in its own lab, addressed what it thought was a mechanical issue.
We were just talking about vendors. How did they play a part?
In June 2017, the end user suspected an issue had occurred with their safety controllers and took one Triconex system offline, completely removing the Main Processors, and sent them to Schneider Electric’s Triconex lab in Lake Forest, Calif. Schneider said “At this stage, the end user wasn’t aware of a cyber incident. At the time, Schneider Electric was not onsite to review the safety controllers.” Once these controllers were removed from power, the memory was cleared and there was no way to conclude that the failure was the result of a cyber incident. Schneider was only able to analyze whether the controllers were working correctly within their safety function. After they completed their analysis, they determined there was no fault with the system components and they returned them to the end user.
What about security testing? Was any of that done?
It is not known what types of security testing occurred, if any. We do know the petrochemical firm called in Schneider to assist in troubleshooting the Triconex system failure. The vendor pulled logs and diagnostics from the machine, checked the machine’s mechanics, and, after later studying the data in its own lab, addressed what it thought was a mechanical issue. No malware was found.
Were any teams called in for investigations?
A forensics and incident response team was not called in during the June incident. However, a team was called in for a rapid-response engagement after the now-infamous August shutdown of the Saudi Arabian firm’s Triconex ESD system. This company was very lucky. If there had not been a flaw in the code, the malware would still have gone undetected. Many believe that something bigger was being planned but was interrupted.
ICS Process Flow
Process Data
Processes are designed to change the chemical or physical properties of upstream materials into more useful downstream products. These changes must be monitored and controlled during this transition. Accurate and timely measurements of the physical and chemical properties of the process are crucial to controlling the process, as well as the outcome. These measurements are referred to as process data.
Numerous field devices are used to generate process data. Flow, level, temperature, and pressure are common sources of process data; however, many other parameters of complex processes must also be measured. For example, a water treatment plant relies on accurate pH and turbidity measurements in addition to flow, level, and pressure to produce clean water. An ICS uses these measurements for adjusting control devices to ensure the product (in this example, treated water) meets specifications.
Process data are not only used for control. For example, the data can also be used to:
Meet regulatory requirements
Track energy costs
Provide design inputs for the process enhancements (i.e., identify and justify capital improvements)
Control inventory/ determine product pricing
ICS Process Flow – Control Loop
A typical ICS contains numerous control loops, human interfaces, as well as remote diagnostics and maintenance tools. They are built using an array of network protocols on layered network architectures, allowing ICS support staff and vendors access to diagnose and correct operational problems.
A control loop, also called single-loop control, is the fundamental building block of industrial control systems. It is the communication used to regulate the process. It consists of a group of components working together as a system to achieve and maintain a desired value of a system variable by manipulating the value of another variable in the control loop.
For example, a field device sensor produces a measurement of a physical property and sends this information as controlled variables to the controller. The controller interprets the signals and generates corresponding manipulated variables, based on set points, which it transmits to the actuators (field devices), such as control valves, breakers, switches, and motors. These field devices are used to directly manipulate the controlled process based on commands from the controller. Control can be fully automated or include a human in the loop.
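The sensor, controller, and actuator roles described above can be sketched in a few lines of Python. This is only an illustrative toy, not how any particular controller is programmed: the Tank model, the proportional gain, and all numeric values are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Tank:
    """Toy process (hypothetical): level rises with inflow, falls with a fixed outflow."""
    level: float = 2.0     # controlled variable
    outflow: float = 0.3   # constant demand drawn from the tank each step

    def step(self, valve_opening: float) -> None:
        # Actuator effect: a fully open valve (1.0) adds 1.0 unit of level per step.
        self.level = max(0.0, self.level + 1.0 * valve_opening - self.outflow)


def read_sensor(tank: Tank) -> float:
    """Field device (input): measure the controlled variable."""
    return tank.level


def controller(measurement: float, set_point: float, gain: float = 0.5) -> float:
    """Proportional controller: turn the error into a valve opening between 0 and 1."""
    error = set_point - measurement
    return min(1.0, max(0.0, gain * error))


def run_loop(set_point: float = 5.0, steps: int = 50) -> list:
    """Run the loop: sensor -> controller -> actuator -> process, repeated."""
    tank = Tank()
    history = []
    for _ in range(steps):
        measurement = read_sensor(tank)                  # controlled variable
        valve = controller(measurement, set_point)       # manipulated variable
        tank.step(valve)                                 # actuator acts on the process
        history.append(tank.level)
    return history


if __name__ == "__main__":
    print(round(run_loop()[-1], 3))
```

With these made-up numbers the level settles a little below the set point, which is the usual behavior of a proportional-only controller; real loops typically add further terms to remove that offset.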
ICS Process Flow – Diagram
This graphic depicts the ICS process flow. We see the field devices that provide the process data. This is where the actual physical process happens, be it the mixing of chemicals or the management of trains, or the measuring of the pressure of gas at a certain point in a pipeline.
A field controller collects information from field devices and assesses, manages, and processes state information about the process.
The HMI monitors the information and presents it to an operator. The operator uses the HMI to observe the process, watch for events and alarms, and make decisions or adjust the system to keep the process stable and safe, as required. The operator function can be performed by a person or by a system such as an EMS, a DCS, or any specialized system that may be unique to a particular sector.
ICS Control Loop – Cascading Control
Sometimes these control loops are nested or cascading. When multiple sensors are available for measuring conditions in a controlled process, a cascade control system can often perform better than a traditional single loop. For example, the steam-supplied water heater shown heats water using cascade control. A second controller has taken over responsibility for manipulating the valve opening, based on measurements from a second sensor monitoring the steam flow rate.
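The nesting in the water heater example can be sketched as two controllers in series: the primary controller looks at water temperature and outputs a steam flow set point, and the secondary controller looks at steam flow and outputs the valve opening. The process model, gains, and limits below are invented for illustration and do not describe a real heater.

```python
def p_controller(set_point, measurement, gain, low, high):
    """Proportional controller whose output is clamped to [low, high]."""
    return min(high, max(low, gain * (set_point - measurement)))


def simulate_cascade(temp_set_point=60.0, steps=200):
    """Toy cascade loop for a steam-supplied water heater (all numbers hypothetical)."""
    water_temp = 20.0   # measured by the first sensor
    steam_flow = 0.0    # measured by the second sensor
    for _ in range(steps):
        # Primary (outer) loop: temperature error becomes a steam flow set point.
        flow_set_point = p_controller(temp_set_point, water_temp,
                                      gain=0.2, low=0.0, high=5.0)
        # Secondary (inner) loop: flow error becomes the valve opening.
        valve = p_controller(flow_set_point, steam_flow,
                             gain=0.5, low=0.0, high=1.0)
        # Toy process: flow lags toward what the valve allows (max flow 5.0),
        # and the water gains heat from steam while losing some to ambient (20.0).
        steam_flow += 0.2 * (5.0 * valve - steam_flow)
        water_temp += 0.5 * steam_flow - 0.01 * (water_temp - 20.0)
    return water_temp, steam_flow


if __name__ == "__main__":
    temp, flow = simulate_cascade()
    print(round(temp, 2), round(flow, 2))
```

Because both controllers here are proportional-only, the temperature settles somewhat below the set point. The point of the sketch is the structure: the valve is driven by the inner loop, so a disturbance in steam flow is corrected before it shows up as a temperature error.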
ICS Communication Flow
How does information move within the control system environment?
Control systems collect information about some process or function and use a communications infrastructure to send that data back to an operator. The operator reviews the data, typically in a graphical format, assesses the operational status of the process, and then tunes the system for optimal performance if required.
The primary purpose of the ICS is to measure and control the process, as shown in the slide.
The major components that achieve these objectives are field devices, field controllers and the HMI.
Field devices are the instruments and sensors that measure process parameters and the actuators that control the process. They are the interface between the ICS and the physical process, be it the mixing of chemicals, the management of trains, or the measuring of pressures in a gas pipeline. This is the point in a system where you actually collect information about the process, as well as modify or control the process. The sensors or measuring instruments are often referred to as input devices, because they input data into the ICS. By contrast, switches, valves, and other types of actuators that control the process are called output devices.
Field controllers are responsible for collecting and processing input and output information, sometimes referred to as I/O. They also send the process data to the HMI and relay process control commands from the operators. Field controllers are often located close to the field devices, because in some applications the information needs to be processed as quickly as possible. For large distributed systems, the field controllers are responsible for collecting and aggregating information from hundreds or thousands of sources. Field controllers are essentially mini computers or embedded systems that are designed to withstand the rigors of an industrial environment. Like personal computers, they have a microprocessor, along with I/O interfaces to the field devices. In the case of input points, they convert the electrical signal from field devices into a digital signal, ones and zeros. For output points, they convert a digital signal into an electrical signal. Many different types of field controllers are on the market and support the various sectors. Examples include PLCs, intelligent electronic devices (IEDs), remote terminal units (RTUs), distributed controllers, and Process Automation Controllers (PACs).
One of the key things to remember is that this technology collects, assesses, manages, and processes state information about the process.
Finally, we look at where this information goes. Servers, HMIs, and engineering workstations take the information from the field controllers and display it in a manner that depicts what is going on in the process. In the case of servers, the information is collected and either used in calculations or stored in a historical database. This capability usually exists at the centralized or distributed control center, where operators see a complete set of unified control system data presented in a graphical user interface. This user interface is usually referred to as the HMI, and it allows the operator to have a real-time, or near real-time, operational view of the process. It is through this interface that the operator can observe the process, watch for events and alarms, and tune the system.
These three components are linked together using networks or communication protocols.
As you can see, the communications between the field devices and field controllers are separate from command and control communications. Numerous protocols (the rules or languages that these devices use to communicate with each other) have been developed specifically for applications or industries by vendors as they sought ways to differentiate their products with capabilities such as speed and throughput. The selection of the communication medium and the protocols depends on the requirements of the system. An organization dealing with the reliability and management of the bulk power system has different data rate requirements than an operation that needs to measure water levels in a massive reservoir once or twice a day.
ICS Data Flow
The data flow through ICS varies by vendor, topology, and protocols. The network can be wired or wireless, but it links the components of the ICS. The HMI receives information from the field controllers, relaying information through communication protocols and providing the operator with a view of what is happening in the process.
This diagram is a simplified representation of an ICS communication network. The process is controlled by an application running inside the field controllers, which communicate with a series of field input and output devices. The field controllers consolidate the data and transmit it to the HMI stations, where it is presented on displays.
ICS collect information about some process or function using a communications infrastructure to send the data back to an operator. The operator reviews the data, typically in a graphical format, assesses the operational status of the process, and tunes the system for optimal performance.
Field Devices are the instruments and sensors that measure process parameters and the actuators that control the process. This is the interface between the ICS and the physical process. These sensors or measuring instruments are often referred to as input devices because they “input” data into the ICS.
Field Controllers are responsible for collecting and processing input and output information, sometimes referred to as I/O. They also send the process data to the human machine interface (HMI) and process control commands from the operators. They are often located close to the field devices.
Servers, HMIs, and engineering workstations take the information from field controllers and display the data in a manner that depicts what is happening in the process. The user interface, usually referred to as the HMI, allows the operator to have a real‐time, or near real‐time, operational view of the process. These three components are linked using networks or communication channels.
Field Devices (Meters, Sensors, Valves, Switches) <——-> Field Controllers (PLC, IED, RTU, Controller, PAC) <———–> HMI (SCADA Server, HMI, Workstations, EMS)
Direct connection or Device level protocols (HART, Foundation Fieldbus, Profibus) <———-> Command and Control Protocols (DNP3, Modbus, Ethernet/IP)
- Field Controllers –> Primary Historian –> Secondary Historian
|—> Configuration Database —> HMI —-> HMI
Protocols (ANSI X3.28, BBC 7200, CDC Type 1/2, Contitel, DCP, DNP, Gedac 7020, ICCP, Landis, Modbus, OPC, ControlNet, DeviceNet, DH+, Profibus, Tejas 3/5, TRW 9550, UCA)
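Of the command and control protocols listed, Modbus is simple enough to show on the wire. The sketch below builds a Modbus TCP "Read Holding Registers" request (function code 0x03), the kind of message a master or HMI might send to a field controller. The field layout follows the public Modbus specification; the transaction ID, unit ID, and register address are made-up example values.

```python
import struct


def build_read_holding_registers(transaction_id, unit_id, start_address, quantity):
    """Build a Modbus TCP request frame for function 0x03 (Read Holding Registers)."""
    if not 1 <= quantity <= 125:
        raise ValueError("quantity must be between 1 and 125 registers")
    # PDU: function code (1 byte), starting address (2 bytes), quantity (2 bytes).
    pdu = struct.pack(">BHH", 0x03, start_address, quantity)
    # MBAP header: transaction ID, protocol ID (always 0), length, unit ID.
    # The length field counts the unit ID byte plus the PDU.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


if __name__ == "__main__":
    # Hypothetical request: unit 17, three registers starting at address 0x006B.
    frame = build_read_holding_registers(1, 17, 0x006B, 3)
    print(frame.hex())
```

Note that nothing in this frame identifies or authenticates the sender. That is typical of older control protocols and is one reason network-level protections matter so much in ICS.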
Indusoft (HMI Software?)
Connected Components Workbench
ICS topology
ICS network topologies are similar to IT topologies. However, there are some fundamental differences critical to ICS applications. ICS networks require redundancy to ensure availability, which is not a common practice in IT. In addition, many older ICS networks use proprietary technologies, which are not used in the IT domain.
Some similarities that an IT network engineer might notice are the serial technologies. RS232, RS485, and Ethernet reside at the physical layer in the OSI stack and are common to both domains. RS232, which is an older technology in the IT domain, is still commonly used for point-to-point communications in RTUs and PLCs. RS485 is the foundation for many proprietary control networks used at site facilities.
Bus
The bus topology is constructed on a single cable, referred to as the bus, that each node on the network connects to. Each of these nodes passively listens for data being transmitted along the bus. If one node wants to transmit data to another node along the bus, it sends out a signal to the entire network, letting everyone know that a transmission is occurring. This transmission then travels down the bus, being ignored by all other nodes until it reaches its destination node and is accepted.
Ring
The ring topology is constructed from a closed loop cable, known as a ring, that each node on the network connects to. In this topology, the network forms a circular shape and data is transmitted clockwise via a token that each node in the network actively listens for. If a node does not want to transmit data, the node will act as a repeater and send the token around the ring. If a node does want to transmit data, it must wait until the token makes its way to the node and is no longer carrying data.
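A minimal sketch of the token passing described above, assuming a simplified ring in which the destination node frees the token as soon as it accepts the data (real token ring networks handle the release differently). Node names and messages are hypothetical.

```python
def simulate_token_ring(nodes, pending, max_hops=20):
    """Pass a token around `nodes` (in ring order) and deliver the `pending` messages.

    pending maps a sender name to a (destination, payload) tuple.
    Returns a log of the interesting events.
    """
    pending = dict(pending)   # work on a copy
    log = []
    frame = None              # None means the token is free (carries no data)
    position = 0
    for _ in range(max_hops):
        node = nodes[position]
        if frame is None:
            if node in pending:
                # The token is free and this node has data: attach it to the token.
                dest, payload = pending.pop(node)
                frame = (node, dest, payload)
                log.append(f"{node} takes the free token and sends to {dest}")
            # Otherwise the node acts as a repeater and forwards the free token.
        else:
            src, dest, payload = frame
            if node == dest:
                log.append(f"{node} accepts {payload!r} from {src}")
                frame = None  # simplification: the destination frees the token
            # Otherwise the node acts as a repeater and forwards the busy token.
        if frame is None and not pending:
            break
        position = (position + 1) % len(nodes)  # the token moves to the next node
    return log


if __name__ == "__main__":
    ring = ["A", "B", "C", "D"]
    for line in simulate_token_ring(ring, {"A": ("C", "start pump"), "C": ("B", "ack")}):
        print(line)
```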
Star
The star topology is constructed from a central device, either a switch, router, or hub, which every other node in the network connects to. In this design, each distinct cable only connects two physical devices, with one end hooking up to a node on the network and the other hooking up to the central device. If one node wants to transmit data to another node, it must send its transmission to the central device, which will then act as a relay station and pass along the transmission to the destination node.
Polling Methods
An ICS master station repeatedly communicates with field controllers through a process called “polling.” The master station sends a request for updates, and a field controller, such as an RTU or PLC, responds by sending back the requested information. Assets are polled to verify they are still functioning as expected. When an asset is polled, it sends data about its state back to the SCADA system, and if everything is OK, it will continue to be polled on a regular schedule. If it’s not OK, it will be polled again to determine whether a technician needs to get to the asset to perform maintenance on it.
The poll can contain a general request, such as a pressure reading, or it can relate to specific metrics. For example: Has the pressure exceeded required parameters? Has the asset overheated? Has it seen events that suggest its life cycle is being degraded? This polling process can happen every couple of seconds, minutes, hours, days, months, or years. It depends on the asset, the role the asset plays in a system, and the risk that asset presents.
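The poll, check, and re-poll behavior can be sketched as follows. This is a toy model, not a real SCADA API: the FieldController class, the pressure limits, and the canned readings are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class FieldController:
    """Hypothetical RTU/PLC stand-in that answers polls with canned pressure readings."""
    name: str
    readings: list

    def respond(self) -> float:
        # Return the next reading; keep repeating the last one once the list runs out.
        if len(self.readings) > 1:
            return self.readings.pop(0)
        return self.readings[0]


def poll_cycle(controllers, low, high):
    """Poll each controller once; re-poll any that report an out-of-range value.

    Returns the names of controllers that were still out of range on the second
    poll, meaning a technician may need to be sent.
    """
    needs_maintenance = []
    for fc in controllers:
        if low <= fc.respond() <= high:
            continue  # everything is OK, wait for the next scheduled poll
        # Not OK: poll again to tell a transient blip from a real problem.
        if not (low <= fc.respond() <= high):
            needs_maintenance.append(fc.name)
    return needs_maintenance


if __name__ == "__main__":
    fleet = [
        FieldController("RTU-1", [55.0]),          # healthy
        FieldController("PLC-2", [130.0, 131.0]),  # persistently high
        FieldController("RTU-3", [130.0, 60.0]),   # one-off spike
    ]
    print(poll_cycle(fleet, low=20.0, high=100.0))
```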
Master/Field Controller Relationships
Some field controllers are assigned a higher priority and are polled more frequently than other field controllers. For example, critical field controllers may be polled twice a second, while lower priority devices are polled once per minute.
Polling rates also depend on the information being reported. The status of regional high voltage transmission lines will be checked more frequently than ambient temperature because the line status can change quickly compared to relatively slower temperature changes.
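One way to picture priority-based polling is a schedule ordered by the next time each point is due. The sketch below uses the rates mentioned above (twice a second and once per minute); the point names and the two-minute window are assumptions for the example.

```python
import heapq


def build_schedule(intervals, duration):
    """Return (time, name) poll events for the first `duration` seconds.

    intervals maps a point name to the number of seconds between its polls.
    """
    # Every point is due at time 0; the heap always yields the next poll that is due.
    queue = [(0.0, name) for name in sorted(intervals)]
    heapq.heapify(queue)
    schedule = []
    while queue:
        due, name = heapq.heappop(queue)
        if due >= duration:
            continue  # outside the window, so do not reschedule this point
        schedule.append((due, name))
        heapq.heappush(queue, (due + intervals[name], name))
    return schedule


if __name__ == "__main__":
    # Hypothetical points: fast-changing line status vs. slow-changing temperature.
    events = build_schedule({"line_status": 0.5, "ambient_temperature": 60.0}, duration=120.0)
    for point in ("line_status", "ambient_temperature"):
        print(point, sum(1 for _, name in events if name == point))
```

Over two minutes the critical point is polled 240 times and the low priority point twice, which shows how quickly the high priority traffic comes to dominate a shared link.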
This image shows several types of relationships between the master and the field controllers. For instance, a master can communicate directly with a single field controller or with multiple field controllers. Multiple masters can also communicate with a common field controller. Most modern control system networks are made up of a blend of communications methods, topologies, architectures, and protocols, which introduces complexity when creating security mitigation strategies.
Physical Media
A number of choices are available to support the physical layer in ICS networks. Selecting the proper physical media depends on a number of factors, including costs, reliability, data rates, distance, topology, and availability. Some options are easier to secure than others.
Leased Line
Leased lines are dedicated communication circuits, usually provided by a phone company. They are popular options for ICS communications to remote facilities because they are ubiquitous and affordable. Typically, the installation costs are minimal, and the monthly recurring costs are reasonable. The two types of leased lines are analog and digital.
Analog leased line circuits are older lines and limit ICS communications to 9,600 bits per second, although there are some technologies that can squeeze higher data rates out of these circuits.
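To get a feel for what 9,600 bits per second means in practice, the short calculation below estimates how long a block of data takes to cross such a line. The 10 bits per byte figure assumes simple asynchronous serial framing (one start bit, eight data bits, one stop bit), which is an assumption and not something stated above, and the 1,200-byte payload is an arbitrary example.

```python
def transfer_seconds(payload_bytes, bits_per_second=9600, bits_per_byte=10):
    """Estimate the time to send `payload_bytes` over a serial line.

    bits_per_byte=10 assumes one start bit, eight data bits, and one stop bit.
    Protocol overhead, retries, and modem latency are ignored.
    """
    return payload_bytes * bits_per_byte / bits_per_second


if __name__ == "__main__":
    # Hypothetical 1,200-byte poll response from a remote facility.
    print(transfer_seconds(1200))  # 1.25
```

At that rate a single 1,200-byte response occupies the line for more than a second, so polling many devices twice a second over one analog circuit is not realistic.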
Digital leased lines support higher data rates, but they are not available in all areas. Leased lines are not as secure as dedicated lines because the asset owner does not control the physical infrastructure or access. In addition, the leased line infrastructure is not isolated from other phone systems, creating a potential vulnerability from outside sources.
Dedicated Lines
Dedicated lines are more secure than leased lines, because they are owned and managed by the asset owner. Unlike leased lines, dedicated lines are not shared with the public, so the exposure is reduced. The capital costs to install a dedicated network can be substantial because of labor and material costs; however, the recurring costs are generally lower than leased lines.
Wired Media - Copper and Fiber
Fiber and copper lines provide the physical layer in an Ethernet-based environment; this medium is often referred to as wired media. These environments are used extensively in ICS networks. They are popular because they offer fast, reliable, and inexpensive services. Fiber optic cabling is known to be harder to tap into than copper-wired media. However, tapping an optic cable is not impossible; therefore, a significant cyber risk still exists.
Power Lines
PowerLine1
Power line carrier systems transmit data on electrical conductors. The main advantage of power line communication is that the power line infrastructure ensures coverage almost everywhere. Power line communication systems are favored by utilities because they allow them to reliably move data over an infrastructure they control.
Electric, gas, and water utilities are adopting power line communication as a means to communicate with meters, enabling them to send and receive information on current consumption, gather diagnostic information, and remotely manage loads. Building owners and managers are also using power line communication to monitor, diagnose, and control loads such as lighting and HVAC systems.
PowerLine2
Broadband over power line (BPL) is a technology that allows data to be transmitted over utility power lines. This technology uses medium wave, short wave, and low-band VHF frequencies, and operates at speeds similar to those of digital subscriber line (DSL).
BPL has existed for many years, but so far, has not been implemented in the United States on a broad scale because of signal degradation, and technical difficulties involving interference with radio signals used for communications by public safety officials during emergencies.
The term used for these traditional systems is Power Line Communications or Power Line Telecommunications (PLT). In Europe, the term Power Line Communications is also used for the Broadband data transmission, whereas in North America, the term Broadband over Power Lines (BPL) is more commonly used.
Wi-Fi
Wi-Fi is a wireless networking protocol that allows devices to communicate without direct cable connections. Wi-Fi is common in local plant operations. Longer distances are also possible with the use of directional antennas. Wi-Fi’s low cost makes it an extremely attractive solution. From a security perspective, vendors are able to provide several different authentication and encryption technologies. For more information, see the Recommended Practice Guide, “Securing WLANs using 802.11i.”
Radio Frequency
Radio Frequency is any of the electromagnetic wave frequencies that lie in the range extending from below 3 kilohertz to about 300 gigahertz and that include the frequencies used for communication signals (as for radio and television broadcasting and cell-phone and satellite transmissions) or radar signals. Radio frequency communication is commonly used locally within a site facility for plant operations, as well as for some long-distance communications to transmission and distribution facilities. The speeds associated with older radio systems are slow and comparable to those seen when using low-speed modems, although some proprietary spread spectrum radios do support Ethernet.
Microwave and cellular
Microwave and cellular data transmission rates are fast, but the installations can be expensive, and they ultimately require line-of-sight to achieve “five nines” availability. Cellular communications leverage existing telephone networks, and many vendors have incorporated either CDMA (Code Division Multiple Access), which is being gradually phased out, or GSM (Global System for Mobile Communications) into their products.
Communication channels
A communication channel is the transmission path that data follows from one device to another. The medium can be a wire or a logical connection. Protocols define the rules for the communication and detail the interactions between these devices. The protocols include mechanisms for how the devices identify each other and make connections, and rules that specify how the data is packaged into the messages that are sent and received. Basically, protocols specify interactions between the communicating entities.
ICS Communication Channels
The three main segments of an ICS – field devices, field controllers, and HMI – are connected using these communication protocols. As illustrated, the communications between the field devices and field controllers are separate from command and control communications. Initially, ICS were isolated systems running proprietary control protocols using specialized hardware and software. Widely available, low-cost Ethernet and Internet Protocol (IP) devices are now replacing the older proprietary technologies.
The communications medium and the protocols used in ICS will depend on the device selected, the requirements of the system, and the business. For example, an organization dealing with the reliability and management of a bulk power system in real time has different data communication requirements than an organization that needs to measure the water level in a massive reservoir once or twice a day.
ICS Common Protocols
There are dozens of protocols used in the ICS domain. Many of these protocols were developed to support a specific technology, and as such, are uncommon or only applicable to a single vendor. Some ICS devices are old—they can be in use 25 or 30 years—and use proprietary protocols developed by vendors that are no longer in business. The owners of these systems often resort to buying used equipment to keep their systems operational.
Most, if not all, common ICS protocols are openly published and available for review. The protocols are typically transmitted in clear text, meaning they are not encrypted. This makes them easy targets for eavesdropping and subject to man-in-the-middle (MiTM) attacks. Many of the older protocols were adapted for a network environment by “wrapping” them in TCP/IP packets. This does not improve security because TCP/IP is not a secure protocol.
The ICS vendor community has been under pressure by the ICS owner/operator community to move toward greater inter-operability, and toward a more common set of protocols for communications. Unfortunately, many of these protocols are not secure by design—they were designed for reliability.
Modbus
Modbus is one of the oldest and most popular ICS protocols in use today, largely because of its openness and simplicity. Modbus is a digital communication protocol for two or more devices to talk to one another. Modbus operates at the application layer of the Open Systems Interconnection (OSI) network model. The physical layer is not specified in the Modbus protocol. As a result, Modbus implementations are not limited to a single communication medium. This frees the communications engineer to select the best physical media for transporting Modbus packets. The specification is openly published, which allows most field controllers to support Modbus, and this has made it very popular.
Modbus
Simple protocol
Low cost development
Minimum hardware requirement to support
Master/slave protocol
Communicates with up to 247 devices
Uses standard TCP/IP protocols
Modbus can be found in:
Industrial Buildings
Commercial Buildings
Infrastructure
Transportation
Energy Applications
Modbus - Master/Slave Architecture
Modbus is a serial communications protocol that defines a message structure to establish master/slave or client/server communication between intelligent devices. This means that a master device talks to all the other devices on the network. It can query them for information or tell them what to do. Unlike most other protocols, however, Modbus is used for both command and control and device level communications.
Modbus - Protocol Versions
There are several versions of the Modbus protocol because the protocol was originally developed for serial connections but has been adapted for the networking world. These include:
Modbus ASCII: Original serial version. Data is transmitted in ASCII characters, which makes it easy to troubleshoot when there are problems
Modbus Plus: An extended, proprietary version that runs on RS485
Modbus RTU: Serial protocol that transmits data in binary form, making data more compact and transmission more efficient than the ASCII version. More commonly used than Modbus ASCII. (Based on Serial communication like RS485, RS422, and RS232.)
Modbus TCP/IP: The TCP/IP encapsulated version of Modbus. (Based on Ethernet.)
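To make the difference between the ASCII and RTU encodings concrete, here is a minimal Python sketch rather than a reference implementation. It frames one request in Modbus ASCII, which starts with a colon, sends two hex characters per byte, adds a longitudinal redundancy check (LRC), and ends with CR/LF. The request values (slave 1, function 3, start register 0, ten registers) and the helper names are chosen only for illustration.

```python
def lrc(data: bytes) -> int:
    """Longitudinal redundancy check: two's complement of the byte sum, modulo 256."""
    return (-sum(data)) & 0xFF


def ascii_frame(body: bytes) -> bytes:
    """Encode address + function + data as a Modbus ASCII frame."""
    payload = body + bytes([lrc(body)])
    return b":" + payload.hex().upper().encode("ascii") + b"\r\n"


# Illustrative request: slave 1, function 0x03, start register 0, quantity 10.
body = bytes([0x01, 0x03, 0x00, 0x00, 0x00, 0x0A])
frame = ascii_frame(body)

print(frame)           # readable text, which is why ASCII is easy to troubleshoot
print(len(frame))      # bytes on the wire for the ASCII encoding
print(len(body) + 2)   # RTU sends the same bytes in binary plus a 2-byte CRC
```

The ASCII frame is roughly twice the size of the RTU frame for the same request, which is the compactness trade-off noted in the list above.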
Modbus – Vulnerabilities
Modbus Flooding Attack
Modbus Flooding: The Modbus protocol, like many control protocols, does not include any mechanisms to protect confidentiality, although there is Cyclical Redundancy Check (CRC) integrity checking. CRC is a common method used by ICS protocols to determine if the data were unintentionally changed during transmission.
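The distinction between integrity checking and protection against an attacker can be sketched in a few lines of Python. The CRC routine below follows the usual Modbus RTU parameters (initial value 0xFFFF, reflected polynomial 0xA001, low byte sent first); the frame contents and helper names are illustrative assumptions.

```python
def crc16_modbus(data: bytes) -> int:
    """CRC-16 as used by Modbus RTU: initial value 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def add_crc(body: bytes) -> bytes:
    """Append the CRC, low byte first, as RTU framing does."""
    return body + crc16_modbus(body).to_bytes(2, "little")


def crc_ok(frame: bytes) -> bool:
    """Recompute the CRC over the body and compare with the trailing two bytes."""
    return crc16_modbus(frame[:-2]) == int.from_bytes(frame[-2:], "little")


# Illustrative frame: slave 1, function 0x06 (write single register), register 1, value 3.
frame = add_crc(bytes([0x01, 0x06, 0x00, 0x01, 0x00, 0x03]))

# Accidental corruption: one bit flips in transit, so the check fails.
noisy = bytearray(frame)
noisy[5] ^= 0x01
print(crc_ok(frame), crc_ok(bytes(noisy)))

# Deliberate change: whoever forges the frame simply recomputes the CRC.
forged = add_crc(bytes([0x01, 0x06, 0x00, 0x01, 0xFF, 0xFF]))
print(crc_ok(forged))
```

The forged frame passes the check because the CRC involves no secret; it catches line noise, not an adversary.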
The original Modbus protocol does not protect the system from malformed packets and out-of-scope data storms. As a result, attacks such as denial of service, session hijacking, and integrity compromise, are easily executed against the Modbus protocol.
One attack example is called Modbus flooding. The aim of the attack is to control the system through a flood of messages, effectively drowning out legitimate commands from the HMI.
Modbus protocol is a master/slave protocol: the master reads and writes slaves’ registers.
Modbus RTU is usually used over RS-485 (a serial network): one master is present with one or more slaves. Each slave has a unique 8-bit address.
Modbus is used to read and write “registers”, which are 16 bits long, and single-bit values. The four data types are:
Holding register: 16-bit; readable and writable
Input register: 16-bit; readable
Coil (Discrete Output): 1-bit long; readable and writable
Discrete input (Status Input): 1-bit long; readable
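As a rough illustration of how these data types appear on the wire, the Python sketch below builds a Modbus TCP read request and decodes a register payload. The read function codes (0x01 to 0x04) and the MBAP header layout come from the public Modbus specification; the function names, transaction ID, and sample payload are hypothetical.

```python
import struct

# Read function codes for the four Modbus data types.
READ_FUNCTION = {
    "coil": 0x01,
    "discrete_input": 0x02,
    "holding_register": 0x03,
    "input_register": 0x04,
}


def build_read_request(transaction_id: int, unit_id: int, kind: str,
                       start: int, quantity: int) -> bytes:
    """Build a Modbus TCP read request (MBAP header + PDU), big-endian fields."""
    pdu = struct.pack(">BHH", READ_FUNCTION[kind], start, quantity)
    # MBAP: transaction id, protocol id (0 for Modbus), length of what follows, unit id.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


def decode_registers(payload: bytes) -> list:
    """Split a register payload into 16-bit unsigned values."""
    return list(struct.unpack(f">{len(payload) // 2}H", payload))


request = build_read_request(1, 1, "holding_register", start=0, quantity=10)
print(request.hex())
print(decode_registers(bytes.fromhex("00640001")))
```

Note that every field in the request is addressing or bookkeeping. Nothing identifies or authenticates the sender, which is consistent with the weaknesses described above.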
Distributed Network Protocol 3 (DNP3)
DNP3 is a communication protocol used in SCADA and remote monitoring systems. DNP3 stands for Distributed Network Protocol, version 3. It is widely used because it is an open protocol, meaning any manufacturer can develop DNP3 equipment that is compatible with other DNP3 equipment. Because DNP3 was designed to support communications with geographically dispersed facilities, it is also used extensively by the oil and gas, water, and wastewater sectors to communicate with distribution and transmission facilities. It supports communications between station computers, RTUs, and IEDs. DNP3 also:
Provides features and functions missing from Modbus
It is an open protocol, therefore numerous vendors support it
Most often uses TCP, but also supports UDP
Uses Port 20000
Traffic is sent in plain text
Does not provide for authentication or authorization
Originally designed to operate on serial communications, but has been migrated to work on IP
Designed primarily for the electrical industry
Supported functions include:
Send request
Accept response
Confirmation, timeouts, error recovery
SCADA/EMS applications
RTU-to-IED communications
Master-to-remote communications
Emerging open architecture standard (Port 20000)
DNP over UDP (User Datagram Protocol) is also available
DNP3 Secure Authentication
DNP3 – DNP3 Application
DNP3 uses the TCP/IP protocol stack and exists on top of the transport layer (TCP or UDP). Three distinct layers contained within the DNP3 application are DNP3 Data Link layer, DNP3 Transport layer, and DNP3 Application layer.
As with Modbus, DNP3 traffic is sent in plaintext, and DNP3 connections are susceptible to session hijacking, denial of service, and other attacks found in modern networking environments. Although the DNP3 protocol was designed to be very reliable, it was not designed to be secure from attacks that could potentially disrupt control systems or disable critical infrastructure.
DNP3 does not natively provide authentication or authorization as a function of the protocol standard; however, the security specification extensions developed for DNP3 are now compliant with the IEC 62351-1 standard (International Electrotechnical Commission) and, when used, mitigate some modern attack methodologies. Even though DNP3 was originally designed to operate on serial-based communications, the migration to IP has been successful and embraced by the ICS community.
Recognize that the assignment of the protocol is a function of the port used, and not necessarily the payload of the packets. For instance, if the screenshots were taken of two devices using a torrent server for music file downloads over port 20000, the protocol would be classified as DNP because DNP is the standard protocol mapped to port 20000 in the IANA (Internet Assigned Numbers Authority) port allocations.
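The port-versus-payload point can be sketched in Python. The table below uses the ports named in this section (20000 for DNP3, 102 for ICCP) plus 502 for Modbus TCP, which is added here as an assumption from the public port registry. The payload check relies on DNP3 link-layer frames starting with the bytes 0x05 0x64; the function names and sample payloads are hypothetical.

```python
# Port-to-protocol table. Ports 20000 and 102 are named in this section;
# 502 for Modbus TCP is an addition for illustration.
PORT_LABELS = {502: "Modbus/TCP", 20000: "DNP3", 102: "ICCP"}

DNP3_START = b"\x05\x64"  # DNP3 link-layer frames begin with these two bytes


def label_by_port(dst_port: int) -> str:
    """Classify traffic the way a port-based tool does: by port number alone."""
    return PORT_LABELS.get(dst_port, "unknown")


def looks_like_dnp3(payload: bytes) -> bool:
    """A very shallow payload check: does the data start like a DNP3 frame?"""
    return payload.startswith(DNP3_START)


# File-sharing traffic that happens to use port 20000 (illustrative payload).
torrent_payload = b"\x13BitTorrent protocol"
print(label_by_port(20000), looks_like_dnp3(torrent_payload))

# A payload that at least starts like DNP3 (remaining bytes are placeholders).
dnp3_like = DNP3_START + bytes(8)
print(label_by_port(20000), looks_like_dnp3(dnp3_like))
```

Both flows receive the DNP3 label from the port lookup, but only the second survives even a two-byte payload check, which is why port-based classification should be treated as a hint rather than proof.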
Inter-Control Center Communications Protocol (ICCP)
Inter-Control Center Communications Protocol (ICCP), also known as the Telecontrol Application Service Element 2 (TASE.2), is a vendor-independent standard protocol. It is designed specifically for real-time data exchange between ISO (Independent System Operator) control centers, power pools, regional control centers, transmission utilities, distribution utilities, and generation facilities over LAN and WAN.
ICCP is based on client-server communication. All data transfers originate with a request from a control center (the client) to another control center that owns and manages the data (the server). ICCP also provides services for data transfer, depending on the type of request. For example, if the client makes a one-time request, the data will be returned as a response.
If the client makes a request for the periodic transfer of data or the transfer of data only when it changes, the client will first establish the reporting mechanism with the server. This will specify reporting conditions such as periodicity for periodic transfers, or other trigger conditions such as report-by-exception only. The server will then send the data as an unsolicited report whenever the reporting conditions are satisfied.
Also known as IEC 60870-6 or TASE.2
Used within the electrical sector between control centers (Port 102)
Data source is mapped at the client and server
Secure version of ICCP incorporates digital certificate authentication and encryption
Some non-SCADA networks are incorporating ICCP into their systems
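The report-by-exception transfer described above can be approximated with a small Python sketch. This is not ICCP itself: the class name, the deadband threshold, and the sample values are assumptions chosen to show how a server might decide when an unsolicited report is due.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ExceptionReporter:
    """Toy stand-in for a server-side report-by-exception rule (hypothetical)."""
    deadband: float                     # minimum change that counts as "changed"
    send: Callable[[float], None]       # how a report reaches the client
    last_sent: Optional[float] = None

    def on_sample(self, value: float) -> None:
        """Send a report only for the first sample or a large enough change."""
        if self.last_sent is None or abs(value - self.last_sent) >= self.deadband:
            self.send(value)
            self.last_sent = value


reports: List[float] = []
reporter = ExceptionReporter(deadband=0.5, send=reports.append)

# Six samples arrive at the server; only meaningful changes are reported.
for sample in [60.0, 60.1, 60.2, 60.7, 60.8, 59.9]:
    reporter.on_sample(sample)

print(reports)
```

A periodic transfer would instead call `send` on a timer regardless of the value, trading bandwidth for a predictable update rate.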
ICCP Security
ICCP provides the ability to read objects, make configuration changes on remote objects, and control objects. Because the protocol is clear text, visibility to network traffic allows an observer to gain important information regarding relationships between clients and servers.
Standard ICCP is inherently insecure; however, a version called Secure ICCP can be used. Sites not using Secure ICCP should consider using OpenSSL, IPSec, and data link encryption to provide inter-node data security for standard ICCP communications.
Fieldbus
Fieldbus is a generic term that describes not one protocol, but a group of industrial digital communication protocols. The idea behind Fieldbus was to eliminate point-to-point links. Basically, Fieldbus works on a network that permits various topologies, such as the ring, branch, star, and daisy chain.
Fieldbus is a LAN dedicated to industrial automation. It replaces centralized control networks with distributed control networks and links isolated devices such as smart sensors, transducers, actuators, and controllers.
A few of the characteristics of the fieldbus include:
Bi-directional – This means it is a duplex port; the data can be transmitted in two directions at the same time.
Multi-drop – This is also referred to as multi-access and can be interpreted as a single bus with many nodes connected to it.
Serial-bus – This means the data is transmitted in small packets in a sequential manner.
Multiple Topologies – Fieldbus works on network structures such as daisy-chain, star, ring, branch, and tree topologies.
Fieldbus – Levels
A simple fieldbus consists of four main levels. Complexity increases with the level number.
Level 4 – The most complex level where all computers and departments are located. This computer-driven level allows data monitoring, file management, and file transfer at a large scale.
Level 3 – This is where high-level data communication happens. Controllers, such as PLC, are connected to each other alongside HMI for complete control of the network.
Level 2 – Increased complexity scale. All sensor bus networks are connected to this network. Variable speed drives and motor control centers are connected to these for individual control over elements.
Level 1 – This level is the least complex and includes all isolated field devices.
Fieldbus – OSI vs Fieldbus Model
There are basically two sections to the fieldbus system: interconnection and application. Interconnection refers to passing of data from one device to another. This is the communication protocol part of fieldbus. The application is the automation function the fieldbus performs.
As seen in the diagram, the OSI model requires data to move sequentially through each of the seven layers. With the fieldbus, the process has been simplified: Layers 3, 4, 5, and 6 are not implemented, which makes fieldbus faster and easier to implement in devices with limited processor power, such as field devices. Fieldbus has no interconnections between networks, which is the purpose of these layers.
Profibus
Profibus is a smart fieldbus technology. It is specifically designed for high-speed serial I/O in factory and building automation applications. It is recognized as the fastest fieldbus in operation. Profibus is an open-standard fieldbus defined by German DIN 19245 Parts 1 & 2. Devices on the system connect to a central line. Once connected, these devices can communicate in an efficient manner, but can go beyond automation messages to participate in self-diagnosis and connection diagnosis.
The data link layer is defined in Profibus as the Fieldbus data link layer (FDL). It is based on a token/bus/floating master system. Profibus is a network made up of two types of devices connected to the bus: master devices and slave devices. It is a bidirectional network, meaning one device (the master) sends a request to a slave, and the slave responds to that request. Bus contention is not a problem because only one master can control the bus at any time, and a slave device must respond immediately to a request from a master. Profibus supports addresses 0-127, but only 0-125 are used because 126 and 127 have special uses and are not assigned to operational devices.
There are three types of Profibus: Fieldbus Message Specification (FMS), Profibus DP (Distributed Peripherals), and Profibus PA (Process Automation). FMS is used for general data acquisition systems. DP is used when fast communications are needed to operate sensors and actuators via a centralized controller. Profibus PA is used in areas where intrinsically safe devices and safe communications are needed, such as to monitor measuring equipment in process automation applications.
It is very easy to connect all three versions together on the same system because the main difference between the versions is the physical layer. This would allow a company to run lower-cost devices in most of the plant with FMS, DP where speed is needed, and PA in those areas requiring intrinsically safe devices.
Of the three versions, PA and DP are those most commonly used. Profibus PA was developed to connect directly with Profibus DP. The graphic below demonstrates how the two systems are connected.
Profinet
Profinet IO is an Ethernet-based fieldbus protocol with real-time capability specified in IEC 61784-2. In Real-Time (RT) mode, send cycles as short as 1 ms are specified. This is achieved by precise timing and direct communication on the MAC layer. If shorter cycle times are required, e.g., in motion control systems, the Isochronous Real-Time (IRT) mode can be applied, which, however, requires special hardware because it uses an adapted MAC layer. Furthermore, there is also a Non-Real-Time (NRT) mode that is based on UDP/IP. It is used for non-time-critical communication, such as diagnostics and configuration. A minimal Profinet IO system consists of at least one PLC and one or more devices as peripheral equipment connected over Ethernet. The standard supports star, tree, and ring topologies as well as a line topology implemented by the integrated switch functionality in the Profinet IO devices.
Profinet IO Device Classes
Profinet IO defines three device roles. The IO Supervisor is an engineering device used for project engineering, diagnostics, and troubleshooting. It is usually a PC, a Human Machine Interface (HMI), or a programming device. The automation routine is executed in the IO Controller, which is typically a PLC. An IO Device is a distributed field device that exchanges data (e.g., sensor values) with one or more IO Controllers. Every Profinet IO setup contains at least one IO Controller and one IO Device.
Configuration
The above figure depicts the eight steps from the configuration to the operational stage.
First, (1) the system is planned with the help of the IO Supervisor. In detail, engineering software is used to model the desired topology as well as the automation process.
Thereafter, (2) the IO Supervisor sets the IP address of the IO Controller and then (3) the device name.
Next, (4) the engineered project setup from (1) is transferred to the IO Controller. After that, the work of the IO Supervisor is finished.
The IO Controller (5) checks the name of the device and (6) assigns the configured IP address.
Before any process data can be exchanged, (7) a logical channel called Application Relationship (AR) has to be established between the IO Controller and the IO Device.
Within an AR, further Communication Relationships (CRs) are set up, as shown in the below figure. For the acyclic transmission of records (e.g., configuration parameters, diagnostics), a RecordDataCR is used over the non-real-time channel, whereas cyclic data exchange and alarms are sent over the real-time channel. The connection is established and (8) the real-time data exchange starts.
Name and IP Assignment Using Profinet DCP
Before setting up a connection, the IO Supervisor assigns names to the IO Devices using the Discovery and basic Configuration Protocol (DCP). The name must be unique for every device of the Ethernet subnet and must comply with DNS conventions. An example setup is illustrated in Fig. 3a. Here, the name “device1” is assigned to the IO Device. First, a DCP Identify request with the desired name is sent by the IO Supervisor to the Profinet IO multicast address. If a device has already been assigned this name, it sends a DCP Identify response immediately. If no response is received within a timeout period (DCP Timeout), the supervisor assumes that the name is not already set. In this case, a DCP Set request is sent to the MAC address of the IO Device to set the desired name “device1”. When the process is successful, it is concluded with a DCP Set response to the supervisor.
The situation is similar for the assignment of the IP address (see Fig. 3b). Initially, a DCP Identify request is sent by the IO Controller to the multicast address to ask if the name “device1” is already assigned. The IO Device answers with a DCP Identify response directly to indicate that the name is assigned to this device. In the next step, an ARP request is broadcast to determine if the desired IP address “192.168.0.10” is already assigned to another device. When no ARP reply is received within a certain time, it is assumed that the address is still available and a DCP Set request is sent to the IO Device containing the desired IP address. If this was successful, the device sends back a DCP Set response to the controller. Another possibility is to set the IP address via DHCP.
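The decision logic of the two DCP exchanges can be modeled in a few lines of Python. This is a toy simulation under stated assumptions, not a DCP implementation: no packets are sent, a missing Identify response or ARP reply is represented by `None` or a failed lookup, and the class name, method names, and MAC addresses are hypothetical.

```python
from typing import Dict, Optional


class Subnet:
    """Toy model of one Ethernet subnet; keys are MAC addresses (made-up values)."""

    def __init__(self) -> None:
        self.names: Dict[str, Optional[str]] = {}
        self.ips: Dict[str, Optional[str]] = {}

    def add_device(self, mac: str) -> None:
        self.names[mac] = None
        self.ips[mac] = None

    def identify(self, name: str) -> Optional[str]:
        """DCP Identify: return the MAC that answers for this name, or None on timeout."""
        for mac, assigned in self.names.items():
            if assigned == name:
                return mac
        return None

    def set_name(self, mac: str, name: str) -> bool:
        """Supervisor side: Identify first, then DCP Set only if nobody answered."""
        if self.identify(name) is not None:
            return False
        self.names[mac] = name
        return True

    def set_ip(self, name: str, ip: str) -> bool:
        """Controller side: Identify by name, ARP-style duplicate check, then DCP Set."""
        mac = self.identify(name)
        if mac is None or ip in self.ips.values():
            return False
        self.ips[mac] = ip
        return True


net = Subnet()
net.add_device("00:0e:8c:00:00:01")
net.add_device("00:0e:8c:00:00:02")

print(net.set_name("00:0e:8c:00:00:01", "device1"))   # name is free, so it is set
print(net.set_name("00:0e:8c:00:00:02", "device1"))   # name already answered for
print(net.set_ip("device1", "192.168.0.10"))          # address is free, so it is set
```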
Open Platform Communication (OPC)
OPC (Open Platform Communication, formerly OLE for Process Control) is a series of standard, manufacturer-independent programming interfaces through which an automation application client such as an HMI can access data coming from remote devices such as PLC, fieldbus devices, or real-time databases. OPC has become the most versatile way to communicate in the automation layer in all types of industry.
OPC is a client/server based communication, which means that you have one or more servers waiting for several clients to make requests. Once the server gets the request, it then answers that request before returning to a waiting state. The client can also tell the server to send updates when the server receives such updates. It is ultimately the client that decides when and what data the server will gather.
According to the OPC Foundation’s website, “OPC is open connectivity via open standards.” For example, an operator pulls up a display on the HMI. The HMI has an OPC client that sends a request to an OPC server to provide the data needed to populate the display. The OPC server, using a protocol such as Modbus or DNP 3.0, obtains the data from the RTU and passes it to the client.
OPC Classic Specification
OPC does not represent a network protocol in the traditional sense, but rather a capability to support the interfacing and interconnection with disparate vendor technologies.
OPC is a set of several specifications for sharing data based on Microsoft technologies COM, DCOM, OLE, and RPC. Microsoft has since replaced these technologies with .NET and no longer supports these legacy technologies. OPC standards based on COM and DCOM are referred to by the OPC foundation as OPC Classic Specifications.
OPC Data Access (DA), the most basic of the protocols and the original OPC Classic specification, defines the exchange of data including values, time, and quality information. The second protocol to be added, Alarms and Events (A&E), defines the exchange of alarm and event message information. It is a subscription service where the client receives all incoming events. The OPC Historical Data Access (HDA) specification defines query methods and analytics that may be applied to historical, time-stamped data, and supports record data sets for one or more points.
OPC Unified Architecture (UA)
These classic specifications have served the industry well, but as technology has evolved, so has the need for OPC specifications. In 2008, the OPC Unified Architecture (UA) was developed as a platform-independent service-oriented architecture to address the issue of platform interoperability by using Web services-oriented architecture (SOA) in place of .NET and DCOM. UA significantly expands the use of OPC to include non-Windows platforms such as field controllers, cellphones, UNIX, and Linux enterprise servers, as well as Windows servers.
The biggest difference between OPC Classic and OPC UA is that OPC UA doesn’t rely on OLE or DCOM technology (Windows), making it possible to implement OPC UA on any platform, such as Apple, Linux, or Windows. Another important UA feature is its ability to use structures and models, so data tags or points can be grouped and given context, making governance and maintenance easier.
Why is OPC so Popular?
Of all the protocols, OPC is the most popular. “To understand why OPC is so popular, consider the example of printer drivers: Under MS-DOS, the developer of each application also had to develop a printer driver for every printer, one for an Epson FX-80, one for the HP LaserJet, and on and on. Microsoft solved the printer driver problem by incorporating printer support into the operating system. Today, printer drivers provided by printer manufacturers serve all applications.
In the industrial automation world, each company writes its HMI software and a proprietary driver for each industrial device (including every PLC brand). Rockwell wrote its HMI and a proprietary driver to each industrial device (including every PLC brand, not just its own) and so on. By adding the OPC specification to OLE technology in Windows, Microsoft is providing the infrastructure to solve the industrial device driver solution as well.”
The image below depicts the 1:1 relationship between devices used in the analogy.
OPC simplifies protocol development by requiring an ICS vendor to produce only an OPC client, forgoing the expense and effort of developing multiple protocols for their products.
OPC Relationships
OPC provides an elegant solution to the protocol problem by introducing the concept of using an OPC client/server architecture in the ICS environment. The client makes a data request to the OPC server, and the server obtains the data from the field controller by communicating with it using its native protocol.
The OPC server supports almost any ICS protocol imaginable, including Modbus, DNP3, ICCP, and Foundation Fieldbus.
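The client/server relationship described here can be sketched as a small Python model. This does not use any real OPC library: the driver functions are stand-ins that return canned values, and the class name, tag names, and device addresses are hypothetical.

```python
from typing import Callable, Dict, Tuple


# Hypothetical stand-ins for native protocol drivers; real ones would talk to hardware.
def modbus_read(address: str) -> float:
    return {"40001": 72.5}[address]


def dnp3_read(address: str) -> float:
    return {"AI:3": 13.8}[address]


class OpcServer:
    """Maps tag names to (driver, device address) so clients never see the protocol."""

    def __init__(self) -> None:
        self._tags: Dict[str, Tuple[Callable[[str], float], str]] = {}

    def add_tag(self, tag: str, driver: Callable[[str], float], address: str) -> None:
        self._tags[tag] = (driver, address)

    def read(self, tag: str) -> float:
        """Look up the tag and fetch the value through its native-protocol driver."""
        driver, address = self._tags[tag]
        return driver(address)


server = OpcServer()
server.add_tag("Plant.Tank1.Level", modbus_read, "40001")
server.add_tag("Substation.Bus1.kV", dnp3_read, "AI:3")

# The client (for example an HMI) uses the same call regardless of field protocol.
for tag in ("Plant.Tank1.Level", "Substation.Bus1.kV"):
    print(tag, server.read(tag))
```

Supporting another field protocol means registering one more driver on the server; the client code does not change, which is the appeal of the architecture.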
ICS Cybersecurity Risk
Risk
Risk Elements
Not all threats are intentional. Weather, material fatigue, or human error all contribute to risk. However, in our case, we will be focusing on intentional threats, or threats from humans who are actively seeking to access or cause harm to control systems.
Threat * Vulnerability * Consequence = Risk
The risk equation is a guideline for understanding the level of risk we are taking. Different elements of the equation, and different factors behind each, contribute to elevated risk; these are the security concerns we should be aware of when integrating our IT with our OT control systems.
Risk is a function of threat, vulnerability, and consequence. The better we understand each of these factors, the easier it is to plan appropriate security measures around control systems. The most complex attribute is threat because it can be intentional or unintentional, natural or man-made. When trying to develop defensive strategies to protect control systems, it is important to understand the threat landscape for appropriate countermeasures or compensating controls to be deployed. The risk equation should not be taken literally as a mathematical formula, but as a model to demonstrate a concept.
Risk is the possibility of something undesirable occurring and we need to understand how to increase or decrease the chances of that happening.
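Although the equation is a conceptual model rather than a formula, a rough scoring exercise can still help compare scenarios against each other. The Python sketch below assumes 1-to-5 ratings assigned by analyst judgment; the scenario names and ratings are invented for illustration, and the resulting numbers are only meaningful for relative ranking.

```python
from typing import NamedTuple


class Scenario(NamedTuple):
    name: str
    threat: int          # 1 (low) to 5 (high), analyst judgment
    vulnerability: int   # same scale
    consequence: int     # same scale

    @property
    def score(self) -> int:
        """Relative risk score: the product of the three ratings."""
        return self.threat * self.vulnerability * self.consequence


scenarios = [
    Scenario("Isolated PLC, locked cabinet", threat=2, vulnerability=2, consequence=4),
    Scenario("Internet-facing HMI", threat=4, vulnerability=5, consequence=4),
    Scenario("Patched historian in DMZ", threat=3, vulnerability=1, consequence=2),
]

ranked = sorted(scenarios, key=lambda s: s.score, reverse=True)
for s in ranked:
    print(f"{s.score:3d}  {s.name}")
```

Because the factors multiply, lowering any one of them (for example, reducing vulnerability through patching) lowers the overall score, which mirrors the idea of decreasing the chances of an undesirable event.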
Threat
We’re mainly concerned about people with malicious intent. A threat is the potential for someone to exploit a particular information system vulnerability. In the context of cybersecurity, this could be a hacker. As technology has improved in recent years, it has also opened up vulnerabilities not only to malicious nations and terrorist organizations, but also to other organized and mainstream threats.
What are the three attributes of a threat? Capability, opportunity, and intent. When all of these requirements are met, in all likelihood the attack will succeed. Applying cybersecurity strategies that help to deter, detect, and mitigate a threat can alter how or where an attacker might have opportunity. Removing the opportunity increases the chances of an attack failing.
In our situation, we may choose different policies and security measures to protect our control systems. Some ideas include:
Network segmentation with strict ingress and egress firewall rule sets.
No externally routable network connections.
Network monitoring, host logging, and maintaining a Collection Management Framework (CMF).
Secure Credential Management, no credential sharing, active directory, or account/group management.
Incident response plan, the ability to detect and declare a cyber incident.
Keep in mind that the Internet, removable media, and email are the main attack vectors for Industrial Control Systems.
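The first idea in the list, segmentation with strict rule sets, amounts to a default-deny allowlist. The Python sketch below shows that logic only; it is not a firewall configuration, and the subnets, host address, port, and names are hypothetical.

```python
import ipaddress
from typing import List, NamedTuple


class Rule(NamedTuple):
    src: str    # source network in CIDR notation
    dst: str    # destination network in CIDR notation
    port: int   # destination TCP port


# Hypothetical rule set: only the HMI subnet may reach one controller on one port.
ALLOW: List[Rule] = [Rule("10.10.0.0/24", "10.20.0.5/32", 502)]


def is_allowed(src_ip: str, dst_ip: str, port: int, rules: List[Rule] = ALLOW) -> bool:
    """Default deny: traffic passes only if it matches an explicit allow rule."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    return any(
        src in ipaddress.ip_network(rule.src)
        and dst in ipaddress.ip_network(rule.dst)
        and port == rule.port
        for rule in rules
    )


print(is_allowed("10.10.0.7", "10.20.0.5", 502))     # HMI to controller, permitted port
print(is_allowed("203.0.113.9", "10.20.0.5", 502))   # external source
print(is_allowed("10.10.0.7", "10.20.0.5", 3389))    # right hosts, unlisted port
```

Anything not listed is dropped, so adding a new connection requires a deliberate rule change rather than relying on someone remembering to block it.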
Vulnerability
A vulnerability is any weakness that can be exploited by an adversary or caused through an accident. In our conversation, we’re mainly concerned about intentional attacks. For example, a hacker may use phishing scams to gain login credentials. They may possibly exploit an older or unpatched vulnerability in a system. From there, they can pivot into different networks, including the control systems, and potentially cause great harm. Mitigating these vulnerabilities can be challenging.
What are some challenges when mitigating vulnerabilities? In ideal situations, asset owners will have a program in place that provides timely information about ICS vulnerabilities. Even with accurate vulnerability information, verifying the applicability of the vulnerability to an ICS can be difficult. Mitigating these vulnerabilities can be even more complex because:
Extensive testing needs to be performed prior to the application of a mitigation (such as applying a patch) to ensure it does not affect critical system functions.
If a patch or update is considered viable, strategic planning and downtime are required to implement it. In high availability control system environments, finding downtime can be challenging.
Even after testing, the system must be monitored to ensure the mitigation is working as intended.
Consequences
Financial loss and damage to our systems can have terrible results!
Historically, consequences have been measured in terms of financial loss, which has been easy to calculate as it relates to IT systems. The calculations have included factors such as lost revenue, asset replacement cost, cost of system repair, etc.
The consequences with ICS are similar, but in many cases, other factors can contribute to the overall consequences. For example, a bridge operator uses a control system to raise and lower a bridge for passing ships. Imagine being locked out of the control system. Failure to raise and lower the bridge for passing ships could result in not only an accident, but also a loss of life and confidence.
Some threats are beyond our control, like a hurricane. However, knowing a hurricane could hit can help business owners assess weak points and come up with a plan to minimize the impact.
Elevated Risk
In the past, the cyber risk, measured in threat, vulnerability, and consequence, was limited since intrusion would most likely originate from an insider accessing the control system.
While there were vulnerabilities in the ICS, the risk was perceived as acceptable because physical controls, such as door locks, were used to prevent unauthorized access.
What elements have contributed to past ICS incidents?
All sorts of things! People, processes, systems, components. Typically, we put those things in one of two groups: Cultural and Technical factors. The highest concentration of factors is technically-based.
Cultural: Cultural factors include any of the people or processes involved with designing, building, operating, and maintaining ICS. Today’s businesses require formerly isolated ICS to be connected with their corporate and customer networks and the Internet.
Technical: Technical factors have to do with the actual systems and components of which ICS are composed.
Cultural Factors
Cultural - People (Owner, IT, etc.)
While process or policy might prevent adequate cyber security, it is important to note that people created those processes. With a lack of knowledge and awareness of cyber security risk, decisions get made that can introduce technical vulnerabilities. Owners and ICS engineers haven’t always perceived that there were credible cyber security threats that justified the added expense of securing their control systems. This was true when these systems were isolated or air-gapped and running on proprietary hardware. However, as people gain a better understanding of the vulnerabilities created by an interconnected ICS, there is an increased awareness of the cybersecurity threats to their systems, which can lead to a lower overall risk.
Cultural - Policies & Procedures
Many subject matter experts consider culture to be the most important factor in developing and maintaining an effective control system cybersecurity program. Previously, processes and policies didn’t allow for threats to be considered, or vulnerabilities to be protected, or for consequences to be mitigated. Working under old assumptions and paradigms created opportunities for someone to access control systems.
Remember:
Be mindful of outdated processes and policies that don’t account for cybersecurity.
Culture allows for technical factors to increase in quantity.
Technical Factors
Technical - Vendors
When using vendors, it is important to be mindful of the vulnerabilities that can be introduced. Although the vendor and research community do an excellent job uncovering system-specific vulnerabilities, the implementation of countermeasures required to mitigate the vulnerability may result in the control system operating in an undesirable or unexpected manner. For example, an antivirus program can cause a 2 to 19 percent slowdown in passive discovery and a 6 to 57 percent slowdown in full-scan mode. In time-critical processing, this is not acceptable.
Be mindful of system-specific vulnerabilities.
Keep in mind that solutions may hinder required functions of the ICS (for example, an antivirus program that locks up a control system or another tool that requires an ICS to be shut down while running an update).
Technical - Cybersecurity
As technologies have developed over time, a gap has been growing in ICS. Originally, ICS designers didn’t factor in cybersecurity as being an issue when control systems, such as those found in water, electrical, and other locations, were put in place. As a result, there are exploitable, technical vulnerabilities at the network and device level. These vulnerabilities are found in both legacy systems and some current designs.
Vulnerabilities in Cybersecurity
Legacy devices (old modems, computers, ICS, etc.)
Current devices (holes in security between ICS and networks)
Technical - Interconnected Networks
As we’ve learned more and seen consequences from other cyber security incidents, there has been a shift in perspective. Asset owners and operators are beginning to understand that interconnected IT and ICS networks can create opportunities for an adversary to gain access to control systems. This can include remote access capabilities, peer-to-peer networking, direct internet connectivity, or network modifications that enhance business performance. When networks are linked, and there is no protection between them, it creates a vulnerability.
Technical - Increasing Threats
Here is an alarming factor: the interest and number of malicious activity groups are on the rise. A 2020 report showed an increase from 11 malicious activity groups to 15. These new ICS activity groups are primarily targeting energy and manufacturing.
Control System Solutions
One of the most important technological issues relating to cybersecurity and ICS is the fact that some vendors create their control system solutions to run on contemporary standard operating systems, such as Microsoft Windows. This means a vulnerability within an ICS may not be in the ICS application itself but in the operating system on which the application is dependent. Do you remember our bridge controller from earlier? His system was running an outdated OS that he hadn’t patched or updated. When an attacker can compromise an operating system, the attacker has compromised every application run on that system since they have control like a regular user.
Security issues created by integrating IT systems with ICS
Being more aware and knowledgeable will help. However, cybersecurity is always being balanced against the business perspective. Merging IT and ICS networks has done things like optimizing the workflow and making access easier and more universal; these all increase revenue. But it also increases our vulnerabilities, which in turn increases risk! In order to counter threats, we need to understand how an adversary thinks. Will it be a targeted attack against a specific system? Or a broad set of systems within a larger corporate enclave? Understanding the intentions of the attack is important since it helps to know how an ICS could be compromised and why.
Security Mitigations
A few things to consider when implementing security mitigations in ICS:
One, communication speeds. IT solutions like cryptography and firewalls can cause latency issues. You don’t want that in most ICS systems.
Two, detection. Anomaly detection that looks for irregularities works better than intrusion detection systems that rely on signatures, because there may be no ICS-specific signatures to monitor.
Three, intrusion prevention isn’t always favored in ICS. This is because intrusion prevention systems respond actively, which could interrupt critical ICS operations and affect data availability and integrity.
Four, resources. Antivirus solutions, while great, can consume a CPU’s capacity, locking an operator out during a system scan for long periods of time.
Five, methods and tools that work in IT environments may not transfer well to ICS.
There can be adverse and irreversible effects on equipment and services.
Although many security mitigation techniques are useful and effective in IT domains, requirements for data availability and integrity force us to revisit how we implement them within an ICS.
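To make the second consideration above more concrete, here is a minimal sketch of anomaly detection: learn what "normal" looks like from a baseline, then flag readings that deviate too far from it. This is only one simple statistical approach (a z-score check), not a description of how any particular ICS product works. The function name, the threshold of 3 standard deviations, and the sample values are assumptions made for this example.

```python
from statistics import mean, stdev


def find_anomalies(baseline, observations, threshold=3.0):
    """Return observations that deviate from the baseline mean by more
    than `threshold` sample standard deviations."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value is unusual.
        return [x for x in observations if x != mu]
    return [x for x in observations if abs(x - mu) / sigma > threshold]


# Hypothetical baseline, e.g. packets per second on a control network segment.
BASELINE = [100, 102, 98, 101, 99, 100, 103, 97]

print(find_anomalies(BASELINE, [101, 99, 140, 104]))  # prints [140]
```

Notice that nothing here requires a signature of a known attack. The check only needs a record of normal behavior, which is why this style of detection can still be useful when no ICS-specific signatures exist.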
Threat
Introduction
Risk and Threat
Risk is a function* of threat, vulnerability, and consequence. The most complex attribute is threat because it can be intentional or unintentional, natural or man-made.
When trying to develop defensive strategies to protect control systems, it is important to understand the threat landscape for appropriate countermeasures or compensating controls to be deployed.
Note
The risk equation should not be taken literally as a mathematical formula, but as a model to demonstrate a concept.
Understanding the Threat
Finding the appropriate balance of effective countermeasures that don’t impact control system operations can be challenging, and in many cases asset owners need to identify levels of acceptable risk to their systems.
Understanding the threat helps asset owners:
Understand the realistic profile of a cyber adversary that could target specific control systems.
Make better informed decisions regarding what assets to protect and how.
Have the right information to fine tune cybersecurity training for specific personnel involved in control system operations.
Define the cybersecurity criteria to be met during system design and when the system is fully operational.
Understand what countermeasures can be deployed to escalate cyber defenses beyond the capability of recognized adversaries.
Design appropriate security monitoring strategies addressing threat aspects with the greatest contribution to cyber risk.
Attributes of Human Threat
What is Threat?
A threat is any person (threat actor), circumstance, or event with the potential to cause loss or damage.
It is important to consider threat relative to capability, opportunity, and intent. From a defensive perspective, if we know the capability of our adversaries and the vulnerabilities that would most likely provide them opportunity to attack, we can create countermeasures removing those opportunities. We can also create defenses requiring capabilities beyond the adversary’s ability to compromise.
If there is a certain condition associated with compromising a control system, and we create countermeasures forcing the adversary to work at a level beyond that condition, the economics suggests the attacker may abandon the attack altogether. Ultimately, understanding capabilities and motives should help improve security postures to create countermeasures appropriate to the risk, while minimizing impacts to business operations.
Human Threat Attributes
The attributes associated with a human threat are capability, intent, and opportunity.
Capability
Capability is the means or resources available to perform an attack. This includes attacker expertise and knowledge, as well as the money and tools for carrying out the attack.
Generally, adversaries will have a static capability and will need to adjust, depending on their intent and the available resources.
For an adversary to determine if current capability is adequate or needs to be modified, the adversary needs to have as much information as possible about the target. Incorrect calculations regarding requirements needed to successfully attack a target can result in too much or not enough capability for the attack.
Intent
Intent is the motive or goal of the attack, and is usually the one attribute cyber defenses cannot impact.
There are many different motives for launching a cyber attack, including curiosity, economic advantage, industrial espionage, national security, revenge, or promoting a cause.
The intended consequence plays a factor in intent as well as the selection of the target.
Opportunity
Opportunity is the set of conditions that need to be met for adversaries to be confident their attack will be successful. This can be related to the actual access an adversary has to a target, as well as access to specific knowledge about the system.
The opportunity also extends beyond access and knowledge of the system to include timing, which has the potential to change the value of a control system as a target. Opportunities are related to the exposure and vulnerability of targets, two things defenders can control.
Attributes
It is important to understand attributes because they are interdependent when it comes to determining whether an adversary may execute an attack. Attributes also allow defenders to create strategies that may thwart attacks.
Alignment of all three attributes—capability, intent, and opportunity—may indicate an attack is imminent. Alignment greatly impacts the probability that a threat actor can execute a successful attack.
As we will see, influencing an adversary’s intent is rarely possible, but improving defensive and detection capabilities to render an adversary’s capability insufficient is always possible. Deploying security countermeasures has a direct effect in removing or changing adversarial opportunities to attack control systems.
Unlike the risk equation, the individual attributes of threat are summed, not multiplied. This means adversaries with a strong intent or motive can still be a threat, even though they may not have the capability or opportunity to launch an attack. Over time, they may acquire or create the capability and opportunity.
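The difference between "summed" and "multiplied" can be sketched in a few lines. As noted earlier, the risk equation is a model for a concept and not a literal formula, so treat the code below the same way. The 0 to 5 scoring scale and both function names are assumptions made purely for illustration.

```python
def risk_score(threat, vulnerability, consequence):
    """Multiplicative model: if any one factor is zero, risk is zero."""
    return threat * vulnerability * consequence


def threat_score(capability, opportunity, intent):
    """Additive model: a single strong attribute still yields a threat."""
    return capability + opportunity + intent


# An adversary with strong intent but, today, no capability or opportunity.
print(threat_score(capability=0, opportunity=0, intent=5))   # prints 5

# A system with no exploitable vulnerability, scored with the risk model.
print(risk_score(threat=5, vulnerability=0, consequence=4))  # prints 0
```

In the additive model the motivated adversary never drops to zero, which matches the point above: they remain a threat while they work on acquiring capability and opportunity.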
Threat is often the least understood and most difficult to quantify because human behavior can be unpredictable, and involves diverse capabilities, intent, and opportunities. Unpredictable behavior creates situations where static countermeasures may not be adequate to protect critical systems.
Threat Actors
Hazards vs. Threats
Threats are not predictable in the same way as hazards, meaning cybersecurity cannot be assessed in the same way as safety. Defense-in-depth strategies can help compensate for the diversity of threat actors and their wide range of capabilities. As such, it is important to recognize that as the threat landscape changes, so must our ability to defend the systems.
Hazards and threats are two distinct, but related items, as shown in the table below.
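Hazards (Safety) | Threats (Security) |
---|---|
Inherent and known dangers | Not predictable, whether man-made or natural |
Probabilities can be calculated from historical failure data | Hard to predict even when historical information exists |
Can be categorized, allowing fairly precise forecasting | Difficult to categorize or predict with precision |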
Hazards
Hazards are considered situations possessing inherent and known dangers. Examples of hazards include electrical, confined space, or flammable. The failure of a piece of electronics that causes a chamber filled with acid to overflow is also an example of a hazard. In general, this acid overflow hazard falls into the category associated with safety. In safety studies, we have proven historical data about equipment failures that is tied to known dangers and risks, and we can calculate probabilities on when undesirable events might happen. In some cases, we can calculate the actual average time it will take for a system or device to fail based on environmental factors and past use cases. But the data used to do this are based on predictable behavior.
The field of industrial automation has historically collected information on hazards that are used to develop safety guidelines. Databases of hazards and historical events are used to determine the probability of a dangerous event occurring. This in turn allows professional certification of systems to meet measurable safety requirements. Things to consider include system lifetime, mean time between failures, and other measurable attributes that can help system owners proactively manage the safety and resiliency of equipment while optimizing performance.
Hazards can be categorized. Certain attributes are associated with different hazards. These attributes offer analysts information that may be used to develop fairly precise forecasting of different types of events, allowing analysts to plan for certain incidents related to safe operation.
Threats
Threats are not predictable. Cyber attackers, weather, animals chewing cables, personal events, or falling trees are all examples of threats. Even if threats are not man-made, it is still hard to accurately predict how and when they will occur. We don’t have data or granular information to help us determine if and when a threat-based event will happen. For human threats, this can be difficult as we usually cannot define the combined value of capability/opportunity/intent. Safety and security have significant roles in the resiliency and reliability of ICS. Safety and security are complementary, but the disciplines themselves are different. It is important to calculate security risk for control systems, and even more important to calculate appropriate proactive and reactive security mitigation strategies.
Threats are also not predictable even when historical information exists. Being able to categorize threats and predict associated incidents with precision is difficult because people do unpredictable things. This unpredictability is often driven by a multitude of factors beyond the control of even the threat actor (e.g., weather, politics, personal events).
Threat Actor Categories
By better understanding the capabilities, intents, and opportunities of human threat actors, we can better design defenses for ICS. The types of threat actors can roughly be divided into three categories: mainstream; organized; and terrorist and nation state.
Mainstream | Organized | Terrorist / Nation State |
---|---|---|
Group 1 | Group 2 | Group 3 |
Mainstream
Group 1 is, historically, the largest threat group, although these mainstream threat actors are generally not well organized. The motivation of this group varies, but traditionally it has been related to notoriety, fame, or attacking a system to attract attention.
The fact that they are not always organized does not negate the fact that their technical skills can be quite advanced. As their notoriety increases, the demand for their services (legal and illegal) increases.
Some cybersecurity researchers attack systems to improve their knowledge of how these systems work, which makes them more efficient programmers or engineers.
Although they usually operate independently, mainstream threats can combine to form small groups with limited organization.
Example:
Event: A Polish teenager modified a TV remote control to change track positions on the Lodz tram system. As a result, he caused four derailments, injuring 12 people.
The teenager had the capability to modify a TV remote control.
His intent was a prank.
His opportunity was his ability to trespass in the tram depots.
Event: In January 2018, Anonymous Calgary Hivemind, a collective of ethical hackers, hacked five to ten Nest smart home security camera accounts and spoke with people on the other end of the camera to let them know the vulnerabilities of their security system.
Impact: The vulnerabilities of smart home products were exposed, and the privacy of consumers was compromised.
Lesson Learned: Secure remote access with solutions such as VPN or two-factor authentication. Use a secured internet connection for your smart devices.
Event: In April 2020, Which?, a consumer group, and their testing partners Context Information Security hacked a Volkswagen Polo SEL TSI and a Ford Focus Titanium Automatic as part of their internal research into security flaws in automobiles.
Impact: Volkswagen Polo: The infotainment unit, which stores the customer’s data, was exposed, and the collision warning system was tampered with. Ford Focus: The system was tricked into displaying a false tire pressure setting by intercepting messages from the tire pressure monitoring system.
Lesson Learned: Automobile control systems are vulnerable to the same kind of attacks that are launched against internet-connected computers, and appropriate cyber safeguards need to be instituted.
Organized
Group 2 consists of more organized threats, typically targeting a particular group or groups.
Group 2’s intent may be financial, revenge, theft of trade secrets, or drawing attention to a cause (hacktivists). Their attacks are more structured and sophisticated than Group 1, but it is not uncommon for Group 2 threats to include membership, capabilities, or skills traditionally found in Group 1.
As the structure of this group grows, there is the possibility of recruitment from Group 1 individuals to become part of a larger and more organized effort.
Example:
Event: Disgruntled traffic engineers who maintained the Los Angeles traffic control computers hacked into the system and modified signals at four major intersections. This caused major gridlock, and it was 4 days before traffic could return to normal.
Their capability was the insider knowledge they had as a result of maintaining the traffic control system.
Their intent stemmed from being disgruntled employees and was thought to be motivated by a pay bargaining dispute between employees and the Engineers and Architects Association.
Their opportunity was their ability to hack into the system.
Event: In November 2020, FIN11, which is a part of the larger group TA505, infected South Korea’s retail giant E-Land Retail with their CLoP ransomware after a year of laying low. TA505 infected servers and stole customer data.
Impact: More than 2 million batches of credit card information were stolen, and 23 store locations were shut down.
Event: In January 2020, Tillamook County experienced a ransomware attack from an international cybercriminal organization called REvil, also known as Sodinokibi or Sodin.
Impact: This ransomware attack shut down the county’s server, internal computer system, website, phone systems, and email networks. The cyberattack also created and encrypted backups of the county’s data, which is essential to official operations. The county spent some of its taxpayer funds to pay a ransom of $300,000.
Event: In December 2020, DoppelPaymer, a cybercrime gang, initiated a ransomware attack on Foxconn’s Mexican facility.
Impact: Encrypted about 1,200 servers; exfiltrated about 100 GB of data; destroyed 20-30 TB of backups.
Event: April 2016, March 2018, and May 2018 - FBI warns that cyber criminals have developed a new attack called CEO Fraud, also known as Business Email Compromise (BEC).
Impact: Losses of over $2.3 billion over the past 3 years. Specifics: Attacker spoofs a message from an executive and tricks employees into wiring funds or sending highly sensitive documents to the attacker. Law enforcement was notified of victims in all 50 U.S. States and 79 countries.
Lesson Learned: Awareness is the best defense you have against these attacks. Have a one-over review of significant transactions. Install anti-spoofing software.
Terrorist/Nation State
Group 3 includes terrorist and nation state elements. The goals of this group’s attacks are to disrupt, terrorize, or eliminate major aspects of society. The impact or consequence of a Group 3 attack could be catastrophic.
Targeted groups include financial institutions, political establishments, military organizations, and media outlets. Intelligence sources are also concerned about utilities and manufacturing facilities.
Nation states with well-funded cyber warfare programs are also a concern.
Both terrorist and nation state threats have significantly more resources than Groups 1 and 2. As a result, Group 3 actors are able to launch more sophisticated attacks.
As Group 3 programs grow, it is expected they will recruit from, or use, capabilities, techniques, and procedures found in Group 1 and Group 2.
Example:
Event: ICS-CERT was notified of the existence of a new malware application called Stuxnet. It is believed to have been introduced through a portable media threat vector (USB stick). It contained more than 4,000 functions and used as much code as some commercial products. Stuxnet modified programs for a specific PLC and hid those changes. This attack was a game changer in the ICS hacking community because it is the first known malware to target a specific ICS configuration.
Though the author of Stuxnet is still unknown, it has all the hallmarks of a Group 3 attack. As one of the ICS-CERT advisories on Stuxnet reported, “The overall sophistication of the Stuxnet malware cannot be overstated.” According to Wikipedia, “The Guardian, the BBC, and The New York Times all reported that experts studying Stuxnet considered the complexity of the code indicates that only a nation state would have the capabilities to produce it.”
Event: Covid-19 Researchers Fight Against Threat Actors: In 2020, Strontium, Zinc and Cerium, alleged Russian and North Korean nation-state backed actors, along with others, launched a series of cyberattacks at companies conducting research and development of the vaccine for Covid-19.
Strontium used a password spray to steal login credentials, while Zinc and Cerium used spear-phishing email lures to steal credentials. Cerium even posed as World Health Organization (WHO) officials in their spear-phishing email lures. The majority of their attacks were prevented due to security protections; however, these events showed that there are no exceptions when it comes to nation-state threat actors and a world crisis.
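A password spray tries a small number of common passwords against many different accounts, so it tends to show up in logs as one source failing to log in to an unusually large number of usernames. The sketch below shows one simple way a defender might look for that pattern. It is a heuristic for illustration only: the function name, the `(source, username)` log format, the threshold of 5 accounts, and the sample data are all assumptions.

```python
from collections import defaultdict


def spray_suspects(failed_logins, min_accounts=5):
    """Return sources whose failed logins span at least `min_accounts`
    distinct usernames. `failed_logins` is an iterable of
    (source, username) pairs taken from authentication logs."""
    accounts_by_source = defaultdict(set)
    for source, username in failed_logins:
        accounts_by_source[source].add(username)
    return sorted(
        source
        for source, users in accounts_by_source.items()
        if len(users) >= min_accounts
    )


# Hypothetical log extract: one source fails against five different accounts,
# another fails six times against a single account (more like a forgotten password).
EVENTS = (
    [("203.0.113.9", user) for user in ("alice", "bob", "carol", "dave", "erin")]
    + [("198.51.100.2", "frank")] * 6
)

print(spray_suspects(EVENTS))  # prints ['203.0.113.9']
```

Counting distinct accounts per source, instead of total failures, is what separates a spray from an ordinary user mistyping a password several times.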
Event: Aramco Attacked with Shamoon: In August 2012 and January 2017, Shamoon, a weaponized malware, was used to wipe hard drives and disrupt operations at organizations in Saudi Arabia.
Impact: 15 state organizations and private companies were affected in 2012. 3 state agencies and 4 private companies were affected in 2017.
Specifics: Iranian-backed hackers were suspected of using Shamoon, also known as W32.DisTrack, to wipe ~35,000 workstation hard drives owned by Aramco Oil Supply. The initial access was believed to be through a phishing email with a PowerShell-laden document used to establish a C2 communications link to the attacker’s server. This link was used to open a remote shell, deploy tools and malware, and execute commands. A coordinated Shamoon outbreak was initiated, and computer hard drives were permanently wiped across the organization.
Insider Threats
Can the security of an ICS be threatened by a trusted insider (an employee or vendor) who has specific knowledge of, and access to, the ICS? Absolutely! Recall that threat attributes are summed.
Even though an insider may not have intent, they certainly have substantial capability and opportunity, which may make them a significant threat.
Unintentional Threats
Based on known ICS cyber incidents to date, the most likely ICS attacks originate from an insider, or from an external adversary who has acquired credentials to operate as a trusted insider.
An insider could be acting alone or as a member of a Group 2 Organized attack, or the more serious Group 3 Terrorist/Nation State attack. The attack may be unintentional or intentional. The causes of an unintentional incident include:
Deceived: social engineering, phishing
Mistakes
Poor training
Careless, taking shortcuts, fatigued
A mistake or failure to follow adopted policies can also cause a cyber incident on an ICS that is as severe as a deliberate attack. A well-trained system administrator is crucial to protecting an ICS from cyber attacks.
Olympic Pipeline Explosion
As an example of an unintentional threat, a 16-inch gasoline pipeline operated by Olympic Pipeline Company ruptured due to a pressure surge caused by a faulty pressure relief valve. The rupture released gasoline into Whatcom Creek in Bellingham, WA. This unfolding tragedy was exacerbated by the inability of the Supervisory Control and Data Acquisition (SCADA) to perform control and monitoring functions. The gasoline in the river was accidentally ignited, which resulted in an explosion and fire.
The explosion resulted in three fatalities, over $45M in property damage, and matching fines of $7.86M against two companies.
The database used by the pipeline SCADA system was modified in real time, without the necessary review to ensure the changes would not impact normal operations and safety of the pipeline.
The unchecked changes were implemented to the live database, causing a critical slowdown in system monitoring. These changes resulted in the SCADA system polling operational data from the pipeline every 6 minutes, rather than every 3 seconds.
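A slowdown like this, from a 3 second polling cycle to 6 minutes, is the kind of degradation a simple staleness check can surface. The sketch below is not how the Olympic SCADA system worked. It is a hypothetical watchdog that flags data points whose last update is much older than the expected polling interval. The function name, tag names, timestamps, and tolerance factor are all assumptions for illustration.

```python
def stale_points(last_update, now, expected_interval_s=3.0, tolerance=3):
    """Return tags whose most recent update is older than
    `tolerance` times the expected polling interval.

    `last_update` maps a tag name to the time (in seconds) of its
    last successful poll; `now` is the current time in the same units."""
    limit = expected_interval_s * tolerance
    return sorted(tag for tag, ts in last_update.items() if now - ts > limit)


# Hypothetical snapshot: one tag was polled 2 seconds ago,
# the other has not been refreshed for about 6 minutes.
LAST_UPDATE = {"pump_pressure": 1000.0, "valve_position": 640.0}

print(stale_points(LAST_UPDATE, now=1002.0))  # prints ['valve_position']
```

A check like this does not fix the underlying problem, but it can alert operators that the values on their screens no longer reflect current conditions.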
Explosion Findings
Although the SCADA system was not directly responsible for the rupture of the pipeline or the explosion, it did contribute to the tragedy because it was not operating properly during a crucial time leading up to and following the pipe rupture.
The findings of the National Transportation Safety Board included:
If the SCADA system computers had remained responsive to the commands of the Olympic controllers, the controller operating the accident pipeline probably would have been able to initiate actions preventing the pressure increase that ruptured the pipeline.
The degraded SCADA performance on the day of the accident likely resulted from the database development work done on the SCADA system.
Had the SCADA database revisions performed shortly before the accident been performed and thoroughly tested on an offline system, instead of the primary online SCADA system, errors resulting from those revisions may have been identified and repaired before they could affect the operation of the pipeline.
Olympic did not adequately manage the development, implementation, and protection of its SCADA system.
Intentional Threats
Motivations for launching an intentional attack on an ICS could be related to those cited earlier, but may also include:
Recruited : blackmailed, bribed, embedded
Revenge : disgruntled, terminated
Curiosity
Financial
Threat: Anthony Levandowski former Google Engineer
Event: August 2019, Levandowski was charged with 33 counts of theft and attempted theft of trade secrets from Google. He stole 14,000 files containing critical information about Google’s autonomous-vehicle research before leaving the company. Between 2009 and 2015, $1.1 billion was spent on the Waymo project to develop the technology that was stolen. Luckily, Waymo, which became a standalone Alphabet subsidiary in 2016, was able to prove the theft of trade secrets and get compensation ($245 million worth of Uber shares). They also entered into an agreement that Uber wouldn’t use the stolen information for their software and hardware.
Threat: Sudhish Kasaba Ramesh former Cisco engineer
Event: On September 24, 2018, Sudhish Kasaba Ramesh, a former engineer at Cisco, breached the company’s cloud infrastructure and infected it with malicious code. This cyberattack resulted in the deletion of 456 virtual machines that were used for the company’s WebEx Teams application. How did the breach affect the company? Nearly 16,000 users were without access to their accounts for two weeks. The company also spent approximately $1.4 million in employee time to audit its infrastructure and fix the damage, and paid a total of $1 million in restitution to affected users. (Ekran, 2020).
Threat: Mario Azar, IT consultant
Mario Azar, an IT consultant for Pacific Energy Resources, successfully disabled an offshore oil platform’s leak-detection system remotely, using his company’s virtual private network (VPN) over the Internet. After receiving his last payment for contract work, Azar petitioned to continue work as a full-time employee, but Pacific Energy Resources declined to hire him.
Azar continued to remotely log into the leak-detection system, which was used to monitor three offshore oil platforms near Huntington Beach, CA. This resulted in impaired computer system monitoring for leaks on all three offshore platforms.
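Cases like this one are a reminder that remote access should end when the working relationship ends. As a minimal sketch of that idea, the code below lists accounts that are still enabled after their contract end date. It assumes a simple, hypothetical account record with `user`, `enabled`, and `contract_end` fields; real directory services expose this information differently.

```python
from datetime import date


def accounts_to_disable(accounts, today):
    """Return users whose accounts are still enabled even though
    their contract end date has already passed."""
    return [
        account["user"]
        for account in accounts
        if account["enabled"]
        and account["contract_end"] is not None
        and account["contract_end"] < today
    ]


# Hypothetical account records for illustration only.
ACCOUNTS = [
    {"user": "contractor_a", "enabled": True, "contract_end": date(2020, 6, 30)},
    {"user": "contractor_b", "enabled": False, "contract_end": date(2020, 6, 30)},
    {"user": "employee_c", "enabled": True, "contract_end": None},
]

print(accounts_to_disable(ACCOUNTS, today=date(2021, 1, 1)))  # prints ['contractor_a']
```

Running a review like this on a schedule fits the secure credential management practices discussed earlier, since it closes the window in which a former insider still has valid VPN credentials.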
Threat Trends
Threats
As ICS developers began to leverage interoperability and open-system connectivity, they moved away from isolated architectures. However, during this transition, many of the systems were still dependent on legacy hardware and software, and the requirements for availability often prohibited asset owners from taking their systems offline for long periods for updates. As a result, appropriate security defenses were not installed, and the ones that were installed often did not provide sufficient defense against modern-day attacks.
ICS defenses have not evolved as quickly as those in the corporate IT world, and in many cases the average ICS is still years behind current levels of cybersecurity found in non-ICS technology.
Because of the rapid integration of technology and networks between corporate IT and control systems, there is a huge push to protect legacy ICS from modern-day attacks. As many of the security countermeasures were developed for environments that set confidentiality as a primary focus, deploying such security mitigation technology into ICS environments (where both availability and integrity are primary objectives) can actually have a negative impact on productivity.
Current Trends
What are the current threat trends? As mentioned previously, Stuxnet was a game changer in that it was the first publicly known malware developed to target a specific ICS, as well as a specific process. After Stuxnet, new variants of the malware have surfaced, in addition to other ICS-specific attacks. For example, Havex malware sought control systems using a specific protocol unique to industrial automation.
The emergence of Advanced Persistent Threat (APT) is a trend that cannot be ignored. APT typically refers to cyber threats from nation states. As the name suggests, this type of threat employs advanced techniques to deploy sophisticated attacks that compromise systems, help advance an attacker’s goals, and avoid detection to stay resident as long as possible. The attacks are not necessarily limited to cyber methods of compromising a target. They may be combined with other intelligence resources to reach a specific goal or objective.
This persistence toward reaching a specific goal is different than the objectives of more opportunistic types of attacks where attackers are looking for any information to exploit for financial (or other) gain. Because these are specific attacks, adversaries execute unique, customized code to achieve a specific objective. These threat actors have extensive resources (capability) and motivation (intent) to reach their goals.
Trends
We have discussed how complex and sophisticated attacks are now being bundled into easy-to-use tools. The Metasploit framework is an example of such a tool, but similar tools are being adapted to allow attackers to accelerate and simplify attacks on ICS environments.
Threat actors are also showing an interest in the vulnerabilities of ICS products and architectures, as evidenced by increased chatter in the hacker community.
Furthermore, the Department of Homeland Security’s Cybersecurity and Infrastructure Security Agency (CISA) is seeing an increase in the number of reported incidents.
The combination of these trends indicates the threat is not going away, but is increasing. Considering this, asset owners need to adopt more comprehensive mitigation strategies to protect their control systems from attack.
Attacker Tools and Techniques
The phases of the attack life cycle include:
Reconnaissance/targeting
Vulnerability assessment
Attack/penetration
Just like a carpenter uses a variety of tools to build a house, an ICS threat actor also uses a number of different tools and techniques to execute an attack.
Specific tools are designed for each specific phase in the attack life cycle. Some tools research the target, some gain and maintain access to a system, and others launch an attack. Part of successfully defending a system depends on understanding your opponent’s capabilities.
Attacker Tools
Just a Portion of the Tools Attackers Use to Compromise Systems
Phishing: Email containing malicious files or links to nefarious websites
Denial of Service (DoS): Makes networks or computer resources unavailable
Social Engineering: Used to get privileged information from an insider on the targeted computer system or network
Zero Day Exploits: Take advantage of vulnerabilities not known to a broad community, and for which no countermeasure or mitigation has been developed
Malware: Malicious software. There are several common classifications of malware:
Backdoor: A method or program for bypassing authentication or obtaining remote access to a computer.
Botnet: A large number of infected computers that generate spam, relay viruses, or execute Denial of Service attacks.
Ransomware: Systems or data are held hostage by a cyber actor until a ransom is paid.
Rootkit: Code that modifies the operating system to maintain privileged access.
Trojan Horse: Malicious software posing as a useful program.
Virus: Parasitic software that copies and inserts itself into a host file or boot sector. Viruses are unintentionally spread when infected files are transferred between computers.
Worm: Malicious software that independently replicates, executes, and travels across the network without a host program.
Evolving Attack Tools
Many modern ICS threats and exploits are due to the rapid research advancement of more complex attack techniques. There is growth in the number of activities correlating to the system attack life cycle, such as:
Reconnaissance/targeting (Censys/EternalBlue/Shodan)
Vulnerability assessment (Sniper/Ettercap)
Attack/penetration (Metasploit, Gleg Agora, Nessus Scripts, Immunity Canvas)
Adversary and Research Capabilities
Now let’s look at adversary trends as the global interest in ICS security increases.
There has been an increase in ICS-specific presentations at conferences worldwide. There has also been an increase in collaboration within news groups specific to the cyber underground, along with more research and publication relating to ICS vulnerabilities.
The interest in ICS cybersecurity has grown tremendously due to several factors, including Stuxnet, an increase in open-source incident reporting, and the number of vulnerabilities being disclosed for coordinated research efforts. Overall, ICS cybersecurity is still fairly immature and is an attractive domain for researchers of all types.
Interest in ICS cybersecurity is driven by:
New independent research
Increasing number of disclosed vulnerabilities
Asset owner requirements
Vendor market differentiators
More understanding about influence of ICS on critical infrastructure
Increase in incident reports
Increase in easy-to-use attack tools
Control System Vulnerabilities
Because we are dealing with industrial automation, control system vulnerability discussions cannot take place without considering the consequences and impact on critical infrastructure. This makes ICS targets appealing to a broad audience and will attract interest from adversaries in all threat actor groups.
Finally, there is a notable increase in the interest in ICS cybersecurity because asset owners are introducing training and compliance efforts to their personnel. This, in turn, drives demand for briefings, conferences, and academic activities—all of which create literature that is available to the community at large.
Vulnerability
We will examine some of the current trends in cybersecurity vulnerabilities contributing directly to cyber risk in ICS. To understand how to protect ICS, we must understand the core elements that can open ICS to cyber threats.
We will focus on the more common vulnerabilities, especially those addressed by using contemporary information technology (IT) security solutions. Our goal is to identify the root causes and their associated countermeasures that can be used to protect control systems. Understanding the root causes of vulnerabilities has a significant impact on our ability to build mitigation strategies, as well as create defenses to protect ICS.
Vulnerabilities are a significant component of risk. They are weaknesses or inadequacies in a system that, if exploited by an attacker, could cause harm or damage to the system.
Identify the Vulnerable Elements in an ICS
What is Vulnerable?
Previously, ICS interconnections focused primarily on enabling communications between the processor and controller, and were isolated. The introduction of modern information technologies and operating systems into the ICS environment has provided much needed system integration capabilities, but at the cost of exposing ICS to security threats previously known only to IT systems.
Consider a simple situation where a globally popular operating system is used as the foundation for deploying mission-critical ICS software applications. Each and every time a vulnerability is found in that operating system, the ICS using it is also vulnerable. As more ICS migrate toward using ubiquitous IT solutions, the risk of having critical infrastructure systems vulnerable to the same types of threats as IT systems rises.
Open Systems Interconnection
Well-known vulnerabilities exist in each of the seven layers of the Open Systems Interconnection (OSI) model. As ICS migrates away from traditional and proprietary protocols, the use of new and efficient operating systems can make them vulnerable to attacks.
The vulnerability lies with the transparency of the OSI model, as opposed to the obscure proprietary ICS frameworks known only to those who had designed them, or to those with specialized training.
As we leverage well-known IT communications protocols (which can be insecure) and use them in a manner that takes advantage of current networking functionality (which can also be insecure), we expose critical infrastructure systems to attack.
ICS Attack Targets
Any hardware or software processing, storing, or transmitting information digitally is vulnerable to cyberattack. Whether the system can be compromised is dependent on whether the vulnerability is exploitable. In other words, any system can be attacked, but not every attack will be successful. In control system environments, several types of digital assets can be targeted by a cyber adversary:
Networking devices
Programmable logic controllers (PLCs)
Remote terminal units (RTUs)
Human-machine interface (HMI) workstations
Data acquisition servers and historians
Engineering workstations
Remote access devices
Authentication and authorization servers
Even integrated safety systems could be vulnerable, particularly if they are connected directly to the control system network.
There are many pathways to communicate with an ICS network and its components using a variety of computing and communications equipment. These pathways can be used by anyone knowledgeable in process equipment, networks, operating systems, and software applications to gain access to the ICS.
Common ICS Vulnerabilities
Common Cybersecurity Vulnerabilities in Industrial Control Systems
Top Vulnerabilities
The vulnerabilities listed below are repeatedly seen during site assessments, CISA Incident Response Investigations, and CSET® assessments.
Credentials management: Includes weak password policies (no passwords, no enforcement of strong passwords, and use of default user names and passwords) and insufficiently protected passwords.
Network design weakness: No security perimeter defined and lack of network segmentation.
Lack of formal documentation: No security policy and procedures, and poor security documentation maintenance.
Weak firewall rules: Firewall bypassed, firewall rules not tailored to ICS traffic, and specific ports on host not restricted to allowable IP address.
Audit and accountability: Lack of security audits and logging.
Permissions and privilege access control: Improper user permissions, open-network shares, and poor security configuration.
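As one illustration of the weak firewall rules item above, the sketch below flags allow rules whose source range is far wider than a handful of ICS hosts would need. The rule format, the addresses, and the 256-address threshold are all hypothetical and do not correspond to any particular firewall product; port 502 is used only as a familiar control system port.

```python
import ipaddress

# Hypothetical, simplified rule format: (action, source CIDR, destination port).
RULES = [
    ("allow", "10.10.5.20/32", 502),  # a single engineering workstation
    ("allow", "0.0.0.0/0", 502),      # any source: not tailored to ICS traffic
    ("deny", "0.0.0.0/0", 23),
]

def overly_permissive(rules, max_hosts=256):
    """Return allow rules whose source network covers more than max_hosts addresses."""
    flagged = []
    for action, source, port in rules:
        network = ipaddress.ip_network(source)
        if action == "allow" and network.num_addresses > max_hosts:
            flagged.append((action, source, port))
    return flagged

print(overly_permissive(RULES))
```

A check like this only finds rules that are too broad on paper; it does not detect a firewall that is bypassed altogether.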
Other ICS Vulnerabilities
CISA assessments have found that vulnerabilities specific to control systems can often be tied to their supporting IT environments. The following issues could be mitigated by implementing contemporary security solutions, or with support from ICS vendors to provide more secure ICS solutions:
Plain text traffic and open protocols
System susceptible to denial of service
Susceptible to buffer overflows (stack and heap)
Use of weak or known passwords
Absence of embedded countermeasures
Dependence on underlying operating system
Advanced features expand vulnerability landscape (new features added without cybersecurity hardening make systems easier to compromise)
Contemporary IT countermeasures not always best fit (IT countermeasures differ from OT needs and can potentially be used in OT, but they need to be regression tested before deployment)
Poor Code Quality:
Use of potentially dangerous functions in proprietary ICS applications.
Vulnerable Web Services:
Poor authentication
Directory traversal enabled
Unauthenticated access to Web server
SQL injection
Poor Network Protocol Implementations:
Lack of input validation: buffer overflow and lack of bounds checking in control system service
Weak or no authentication
Control system protocol using weak integrity checks
ICS product relies on standard IT protocol using weak or poorly implemented encryption
Poor Patch Management:
Unpatched or old versions of third-party software incorporated into ICS software
Unpatched operating systems
Older operating systems
Weak Authentication:
ICS incorporates standard IT protocol using weak or poorly implemented encryption
Use of standard IT protocol with cleartext authentication
Client-side enforcement of server-side security
Improper security configuration
No password required
Weak passwords and requirements
User names and passwords printed in user manuals
Least User Privileges Violation:
Unauthorized directory traversal allowed
Services running with unnecessary privileges
Information Disclosure:
Unencrypted proprietary control system protocol communication
Unencrypted nonproprietary control system protocol communication
Unencrypted services common in IT systems
Open network shares on control system hosts
Weak protection of user credentials
Information leak through unsecured services
Network Design:
Lack of network segmentation
Firewall bypassed
Network Component Configurations:
Access to specific ports on host not restricted to required IP addresses
Port security not implemented on network equipment
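SQL injection, listed above under Vulnerable Web Services, is easiest to see side by side with its usual fix. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The table, tag names, and values are invented for illustration and do not represent any real historian schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (name TEXT, value REAL)")
conn.execute("INSERT INTO tags VALUES ('pump1_speed', 42.0)")
conn.execute("INSERT INTO tags VALUES ('valve3_pos', 0.0)")

def lookup_unsafe(name):
    # Vulnerable: user input is pasted directly into the SQL text.
    query = "SELECT name, value FROM tags WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def lookup_safe(name):
    # Parameterized: the driver treats the input strictly as data.
    return conn.execute(
        "SELECT name, value FROM tags WHERE name = ?", (name,)
    ).fetchall()

malicious = "x' OR '1'='1"
print(len(lookup_unsafe(malicious)))  # the injected condition matches every row
print(len(lookup_safe(malicious)))    # no tag has this literal name
```

The parameterized form addresses only this one weakness. Authentication and access control on the Web service still have to be handled separately.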
Poor Code Quality
One of the primary causes of security vulnerabilities in control system software and firmware is the use of poor programming techniques. Most developers do not intentionally write code with security flaws. However, ICS are developed with a focus on system availability and resiliency. When the requirement for high availability is the top priority, security is often not considered during the development lifecycle.
Code issues are often exacerbated in control systems that may be decades old, and running code that has not been updated since its installation. Changing coding practices or rewriting the source code for a flagship product can be expensive for vendors and customers, and applying patches in an operational environment is often difficult.
ICS owners should request that vendors certify their developers are trained in, and use, secure coding practices as part of their quality control process. ICS owners should also ensure they create the necessary communication paths needed to quickly learn of any code-based security problems, and to receive and deploy patches in an effective way.
Increasing cybersecurity awareness and its importance in the system and software development life cycle helps asset owners and vendors understand the need to proactively build security into the system, which will significantly increase the security of their ICS products.
Network Design Vulnerabilities
ICS networks are typically designed to support real-time data communications. It is common for security best practices to be excluded when designing ICS architectures. The network infrastructure environment within the ICS is usually developed and modified based on business and operational requirements, with little consideration for the potential security impacts of the changes.
Some ICS network architectures use flat networks with no zones, no port security, and weak enforcement of remote access policies.
To compound this problem, ICS networks may be directly connected to corporate environments without firewalls and zones, or allow direct connections to the Internet. Over time, security gaps may have been inadvertently introduced. Without remediation, these gaps introduce vulnerabilities into the ICS.
Flat Network Definition: A computer network design approach that aims to reduce cost, maintenance and administration. Flat networks are designed to reduce the number of routers and switches on a computer network by connecting the devices to a single switch instead of separate switches, or by using network hubs rather than switches to connect devices to each other.
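The difference between a flat and a zoned network can be sketched with Python's ipaddress module. The zone names and address ranges below are hypothetical. The point is only that traffic between two zones must cross a boundary where a firewall can inspect it, while traffic inside one flat zone never does.

```python
import ipaddress

# Hypothetical zone plan; names and ranges are illustrative only.
ZONES = {
    "corporate": ipaddress.ip_network("10.1.0.0/16"),
    "dmz": ipaddress.ip_network("10.2.0.0/24"),
    "control": ipaddress.ip_network("10.3.0.0/24"),
}

def zone_of(address):
    """Return the name of the zone containing address, or None if unlisted."""
    ip = ipaddress.ip_address(address)
    for name, network in ZONES.items():
        if ip in network:
            return name
    return None

def crosses_zones(src, dst):
    """True when src and dst sit in different zones of the plan."""
    return zone_of(src) != zone_of(dst)

print(crosses_zones("10.1.4.7", "10.3.0.15"))   # corporate to control
print(crosses_zones("10.3.0.15", "10.3.0.16"))  # both inside control
```

In a flat design every host would fall into a single entry of ZONES, so no pair of hosts would ever cross a boundary.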
Web-Based and Remote Service Vulnerabilities
Improvements and modifications to control system functionality are usually the result of customer demand. Embedded Web services, remote diagnostic tools, reporting features, and other value-added capabilities traditionally not used in control system solutions are increasingly being seen in the field. While Web-based and remote services allow ICS operators to more efficiently manage, monitor, and control the systems, this approach may introduce significant security vulnerabilities into the control system architecture.
For example, many control system vendors meet the demands of their customers by integrating easy-to-use interfaces for managing equipment, such as simple and inexpensive Web services incorporated directly into their field devices. This allows operators to control and administer critical equipment from anywhere through a Web or Internet browser. Without a proper security analysis of that Web interface, it could be used as an attack vector into field equipment.
Vulnerabilities unique to such remote services are now built into many control system vendor solutions. If exploited, such vulnerabilities can reveal significant information to an attacker or provide them with access to the device itself.
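Directory traversal is one of the ways an embedded Web service can reveal information to an attacker. The sketch below shows a common defensive check: resolve the requested path and refuse anything that lands outside the Web root. The root directory and the function name are hypothetical, and Path.is_relative_to requires Python 3.9 or later.

```python
from pathlib import Path

# Hypothetical document root of a device's embedded Web server.
WEB_ROOT = Path("/var/www/device").resolve()

def resolve_request(requested: str) -> Path:
    """Map a requested relative path to a file under WEB_ROOT, or refuse it."""
    candidate = (WEB_ROOT / requested).resolve()
    if not candidate.is_relative_to(WEB_ROOT):
        raise PermissionError("request escapes the web root")
    return candidate

print(resolve_request("status/index.html"))
try:
    resolve_request("../../etc/passwd")
except PermissionError as err:
    print("blocked:", err)
```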
Vulnerability Reports
Weak firewall rules, poor network design, and lack of event monitoring are prevalent vulnerabilities in the way owners and operators design, implement, configure, and maintain their ICS. These three weaknesses point to an underlying problem that ICS networks are often designed for availability and optimization, rather than security.
Some owners do not have written cybersecurity policies and procedures for their ICS. Effective and comprehensive policies and procedures are the foundation of a solid cybersecurity program.
These are prevalent issues found in many assessments conducted by CISA.
Discuss the Factors Contributing to ICS Vulnerabilities
Vulnerability Factors
The convergence of IT and ICS creates new pathways that can be used to exploit a large number of cyber vulnerabilities. For instance:
Requirements for rapid information exchange, such as data moving from plant operations to executive decision makers, can limit the effectiveness of adequate defense-in-depth strategies or recommended best practices for ICS cybersecurity.
Requirements for near real-time accessibility to critical operations may facilitate the use of remote access solutions focusing on availability and not necessarily security.
To optimize performance and resiliency, asset owners often provide direct access to the operational environment for their vendors and integrators, thus opening possible channels of attack.
Pressure from Industry:
Other factors include industry pressure to downsize, streamline, automate, and cut costs to maintain profit margins–resulting in even more interconnections.
These interconnections are often not well secured, and the trust relationship between the IT and ICS networks could be taken advantage of by an opportunistic attacker.
Technology Architecture
Most new control systems have been migrated away from using serial-based communications and toward Ethernet-based networking technology to provide a more agile communications infrastructure. The use of Transmission Control Protocol/Internet Protocol (TCP/IP) has proved beneficial to both vendors and asset owners in terms of network management. However, the ability to spoof IP packets, and the fact that IPv4 does not check the validity of the source address and source port in a packet’s headers, are among the primary vulnerabilities in the TCP/IP protocol suite. As more ICS use standard communication protocols, the potential for an IP-based compromise escalates.
Many corporations have begun using cloud technology within their enterprises. A compromise of one of these integrated systems would not only have a negative impact on that member of the shared services, but an exploit could expand throughout the entire information infrastructure, including the interconnected ICS infrastructure.
Remote Access
Remote access has become a key component in supporting isolated field devices in geographically challenging locations, helping organizations control costs associated with maintaining disparate, isolated systems. The benefits of using remote access technologies are well established. However, providing these services in an ICS environment has changed the ICS security perimeter from the fence to the world.
Even the use of virtual private networks (VPNs), which are considered an effective countermeasure to many remote access threats, often allows full access to the internal network. VPN traffic is usually invisible to intrusion detection system (IDS) monitoring. Using firewalls and other perimeter controls creates a barrier for traffic flowing from one security zone to another. However, a VPN may provide a way around them. An internal system might communicate over a private network to a hostile system on the outside, allowing an external attack to succeed.
Describe the Root Causes of ICS Cyber Vulnerabilities
Root Causes of Vulnerabilities
Several root causes can be identified as significant contributors to control system cybersecurity vulnerabilities.
Legacy Control Systems
Migration to Information Technology (Platform and Network Vulnerabilities)
Cybersecurity Culture
Device Programming
Connectivity and Network Architecture
Legacy Control Systems
Because replacing an aging control system can be expensive and disruptive to operations, the life cycle of many control systems is 15 years or longer. These legacy systems are not designed to provide protection from modern-day attacks, or may not be updated to provide the protective mechanisms developed since being placed in service. Ongoing assessments, independent cybersecurity research, and self-disclosure from vendors suggest there are inherent security vulnerabilities in ICS that are residual from past engineering and development activities.
These did not start as vulnerabilities, as they were originally system features designed to facilitate the efficient and safe operation of mission-critical systems required to be available all the time (high availability). They were also designed to be used on systems isolated from untrusted networks.
Today, many of these same features could be used to seriously damage the system if used by operators who have become disgruntled, or if an adversary or attacker is able to acquire the role of the authorized user.
Legacy Systems - Compare and Contrast
This table allows us to compare and contrast some legacy control system features.
| System Feature | Historical Justification | Contemporary Impact |
|---|---|---|
| Plain text traffic | Easy integration of disparate solutions | Traffic analysis with malicious intent; expedites unauthorized system penetration |
| No least privilege restrictions in applications | Operators require complete system control | Expedites escalation of unauthorized access |
| No authentication of new applications | Applications installed locally by trusted entity | Easier to run malware and exploit code |
| No check for integrity of HMI data | Data communications on isolated network | Spoof operator console and operational data |
| Guaranteed high availability | Mission criticality requires high availability | Concession for availability leads to poor security in code |
| Easy connectivity to corporate network | Expedite real-time performance processing | Creation of various attack vectors |
Legacy Systems: Plain Text Traffic
Few ICS manufacturers and vendors deploy data obfuscation and cryptography to prevent traffic eavesdropping, and plain text protocols are ubiquitous across almost all control systems. ICS protocols were originally designed for use in isolated environments, and because availability was the highest priority, there was no need to defend these systems from data theft. More importantly, plain text makes it easier to integrate disparate systems. As owners, vendors, and integrators push for interoperability, plain text traffic remains common. Plain text protocols are also simpler and faster to troubleshoot than encrypted protocols.
Adversaries with access to control system networks can potentially perform real-time traffic analysis, as well as harvest network traffic for offline security testing. Considering the trust relationships in control system environments (e.g., between operator consoles and field equipment, or database-to-database), an attacker who has captured a plain text password can exploit these relationships by impersonating trusted cyber assets or injecting data into the data stream, causing an undesirable event.
Access to control traffic in plain text allows a threat actor to execute numerous attacks–including denial of service, man-in-the-middle, session hijacking, and other network-based attacks, ultimately impacting integrity and availability.
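To make the eavesdropping risk concrete, the sketch below decodes a request in Modbus/TCP, a widely used industrial protocol chosen here only as an example (the text above does not name a protocol). The frame bytes are made up for illustration. Nothing in the frame is encrypted or authenticated, so anyone who can observe the bytes can read which register is being written and with what value.

```python
import struct

# Illustrative capture: a Modbus/TCP "Write Single Register" (function 0x06) request.
frame = bytes.fromhex("00 01 00 00 00 06 01 06 00 10 01 f4")

def decode_write_single_register(frame: bytes) -> dict:
    """Decode the MBAP header and PDU of a Write Single Register request."""
    (transaction_id, protocol_id, length,
     unit_id, function, address, value) = struct.unpack("!HHHBBHH", frame)
    if protocol_id != 0 or function != 0x06:
        raise ValueError("not a Modbus/TCP Write Single Register request")
    return {"unit": unit_id, "register": address, "value": value}

print(decode_write_single_register(frame))
```

The same transparency that helps engineers troubleshoot a link lets an adversary learn register layouts and replay or alter commands.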
Legacy Systems: Hard-coded/easy passwords
Password management is a fundamental component of any security program. However, few ICS operators have provisioned their systems with unique passwords supported by robust security policies, such as routine password changes, especially of default passwords. Because ICS are always on, most ICS asset owners use an easily remembered, shared password for all operators; or the default passwords are never changed after installation. While this ensures operators can quickly access the system, it also makes it easy for an attacker to do the same.
Some vendors have designed their systems with hard-coded or unchangeable passwords. Hard-coded passwords are used internally by ICS programs needing authorization to communicate with other computer resources, such as databases, or are used to simplify software installations and program configurations.
For example, initial authentication credentials are exchanged between ICS historians using hard-coded passwords. It is trivial to discover most hard-coded passwords: they are passed in plain text across the network, or openly published in equipment manuals or on the vendor website. Advanced malware has been developed to exploit hard-coded passwords in conjunction with other vulnerabilities, leaving systems using them at risk.
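An asset owner can look for unchanged default credentials before an attacker does. The sketch below compares a small account inventory against a list of published defaults. The device names, accounts, and default list are hypothetical, and a real audit would work from the vendor's documentation and would avoid keeping passwords in plain text as this simplified example does.

```python
# Hypothetical default-credential list and inventory, for illustration only.
KNOWN_DEFAULTS = {("admin", "admin"), ("operator", "operator"), ("root", "1234")}

ACCOUNTS = [
    {"device": "hmi-01", "user": "admin", "password": "admin"},
    {"device": "plc-07", "user": "operator", "password": "T7#qLm2!vX"},
    {"device": "historian", "user": "operator", "password": "operator"},
]

def devices_with_default_credentials(accounts, defaults=KNOWN_DEFAULTS):
    """Return the sorted names of devices still using a published default login."""
    return sorted(
        a["device"] for a in accounts if (a["user"], a["password"]) in defaults
    )

print(devices_with_default_credentials(ACCOUNTS))
```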
Legacy Systems: No Least Privilege
Coding methods previously used by ICS vendors emphasized availability because these systems were historically isolated. Availability, not security, was important for the associated applications and system, so they were run with unlimited privileges. This essentially gave operators complete administrative control of the system.
When the system is operating with administrator-level privileges, both vital and non-vital applications are running with a high level of authority. If threat actors compromise the system, they have administrative privileges to control and damage the applications and processes.
Many processes found in ICS do not need to run with unbounded privileges. Applying the principle of least privilege means running systems, processes, and applications with the minimal amount of authority needed, thus restricting the level of system access should the system be compromised. Typically, this is accomplished by having the user log into the system with a user-level rather than an administrator-level account, which restricts the permissions available to the user.
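At the application level, least privilege can be sketched as a deny-by-default permission table. The roles and actions below are hypothetical; a real ICS would define its own and would normally enforce them in the operating system or the control application rather than in a standalone script.

```python
# Hypothetical roles and actions, for illustration only.
ROLE_PERMISSIONS = {
    "viewer": {"read_tags"},
    "operator": {"read_tags", "acknowledge_alarm", "change_setpoint"},
    "engineer": {"read_tags", "acknowledge_alarm", "change_setpoint", "download_logic"},
}

def is_allowed(role: str, action: str) -> bool:
    """Deny by default: unknown roles and unlisted actions are refused."""
    return action in ROLE_PERMISSIONS.get(role, set())

print(is_allowed("operator", "change_setpoint"))
print(is_allowed("operator", "download_logic"))
```

With a table like this, a compromised operator session can change setpoints but cannot download new controller logic, which limits the damage compared with a session that runs with full administrative rights.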
Legacy Systems: No Authentication
System functionality facilitating the addition of new applications without security checks is a common problem. ICS downtimes are few, and it is critical that local and trusted entities be allowed to install applications without delay.
However, as requirements and capabilities for control systems have matured, new and complicated third-party applications are integrated into the control systems, and not all can be trusted.
The attackers behind the Havex malware replaced the legitimate installation files of third-party ICS software with tainted copies. These copies surreptitiously installed a remote access trojan (RAT) on the computers of targeted companies that use ICS to automate everything from switches in electrical substations to sensitive equipment in nuclear power plants.
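One basic check before installing third-party software is to compare the installer's cryptographic digest with a value obtained from the vendor through a separate, trusted channel. The sketch below uses Python's hashlib for this. The installer contents are placeholder bytes and the function names are hypothetical. A digest check of this kind does not help if the attacker can also change the published digest, which is why vendor code signing is generally the stronger control.

```python
import hashlib
import hmac

def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()

def installer_is_trusted(data: bytes, expected_sha256: str) -> bool:
    """Compare the installer's digest with one obtained out of band."""
    return hmac.compare_digest(sha256_of(data), expected_sha256.lower())

# Placeholder bytes standing in for installer files.
original = b"vendor installer v1.0"
published_digest = sha256_of(original)
tampered = original + b" with extra payload"

print(installer_is_trusted(original, published_digest))
print(installer_is_trusted(tampered, published_digest))
```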
Legacy Systems: No Check
Data integrity checks ensure the ICS information an operator is monitoring is correct and has not been modified. When ICS were isolated with limited connectivity, there was little doubt the readings were accurate. As more ICS become interconnected, the risk of critical operational data being altered increases.
For example, if an HMI is compromised, it could indicate to the ICS operator that a critical valve is closed, when it is actually open. The false information displayed by the system could cause a catastrophic incident. Ensuring data integrity in the HMI is of vital importance, especially as ICS are becoming more interconnected, escalating the risk of compromise.
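One way to detect altered readings is to attach a keyed message authentication code (HMAC) at the source and verify it before display. The sketch below uses Python's hmac module. The key, the reading format, and the function names are assumptions made for illustration, and key distribution and storage, which are the hard part in practice, are out of scope here.

```python
import hashlib
import hmac
import json

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical shared key

def sign_reading(reading: dict, key: bytes = SECRET_KEY) -> str:
    """Return an HMAC-SHA256 tag over a canonical JSON form of the reading."""
    payload = json.dumps(reading, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def reading_is_authentic(reading: dict, tag: str, key: bytes = SECRET_KEY) -> bool:
    """True only if the tag matches the reading under the shared key."""
    return hmac.compare_digest(sign_reading(reading, key), tag)

reading = {"valve": "V-101", "state": "open"}
tag = sign_reading(reading)
forged = {"valve": "V-101", "state": "closed"}

print(reading_is_authentic(reading, tag))
print(reading_is_authentic(forged, tag))
```

This protects data in transit to the HMI. It does not help if the HMI itself is compromised after verification.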
Legacy Systems: Guaranteed High Availability
The primary focus for ICS, especially mission-critical ICS applications, is high availability. Unfortunately, coding a system for high availability differs from coding one to be secure. Designing high-availability systems often creates vulnerabilities easily discovered by an attacker.
Vendors do not want security vulnerabilities in their products any more than users and asset owners do. Yet a vendor may be slow to fix a vulnerability because of the level of effort required. As a result, the application may be vulnerable when the system or application is not designed to, for example, check for abnormal data inputs, which can be exploited using attacks such as denial of service, buffer overflow, or data injection. The public availability of easy-to-use exploit code increases the number of potential attackers by including those who are unskilled, thereby increasing the severity of the vulnerability.
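Checking for abnormal data inputs can be as simple as refusing anything that is too long, not a number, or outside the engineering range. The sketch below validates an operator-supplied setpoint. The 0 to 100 range, the 16-character limit, and the function name are hypothetical values chosen for illustration.

```python
def parse_setpoint(raw: str, low: float = 0.0, high: float = 100.0) -> float:
    """Parse an operator-supplied setpoint, rejecting malformed or out-of-range input."""
    if len(raw) > 16:
        raise ValueError("input too long")
    try:
        value = float(raw)
    except ValueError:
        raise ValueError("not a number") from None
    # NaN fails both comparisons, so it is rejected here as well.
    if not (value >= low and high >= value):
        raise ValueError("setpoint out of range")
    return value

print(parse_setpoint("42.5"))
for bad in ["-1", "1e9", "nan", "abc", "9" * 40]:
    try:
        parse_setpoint(bad)
    except ValueError as err:
        print(repr(bad[:8]), "rejected:", err)
```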
Legacy Systems: Easy Connectivity
As corporate entities and peer locations have a need to obtain real-time control system or process information, new methods for exchanging information between a trusted control system and an untrusted enclave are developed. An organization’s ability to obtain real-time data is important, because it provides management with accurate and timely information to make better decisions regarding the operations of their critical infrastructure systems.
While there is no doubt that interconnecting ICS with business systems has improved productivity, every data channel creates a potential vulnerability, increasing the risk to the control system. This includes database sharing, peer-to-peer communications, VPN access, remote vendor access, and any other conduit allowing direct access to mission-critical control system activities.
Postmortem analysis of cybersecurity incidents often indicates that these information-sharing channels are the vectors used by malware or to facilitate unauthorized remote access.
Migration to COTS
As computer systems developed for businesses increased in power and speed, the ICS community started adopting commercial-off-the-shelf (COTS) hardware and operating systems for monitoring and controlling industrial processes.
While this migration provided control systems with significantly more functionality at lower costs, it also introduced IT vulnerabilities into the ICS. Many of the vulnerabilities in ICS are similar to IT vulnerabilities. Either the vulnerabilities are related to IT vulnerabilities, or they are related to modern networking weaknesses integrated with control systems.
Consider a globally popular operating system used in mission-critical software applications. For each operating system vulnerability discovered, every ICS using that operating system is also vulnerable to the same weakness.
Migration - Vulnerabilities
Platform Vulnerabilities: IT (and by default ICS) vulnerabilities include hardware, software, firmware, applications, and almost any element correlating to one of the seven layers in the OSI model.
Network Vulnerabilities: Protocols originally developed for industrial automation were intended to be deployed in an isolated environment. They were designed for reliability and ease of use. Many of these protocols were devoid of any inherent security countermeasures because the security of the system was correlated with availability and access to the system, and defended by physical countermeasures.
As traditional control system protocols are modified for use in contemporary networking environments, inherently insecure protocols are simply overlaid onto the ICS communication protocols, which increases ICS cyber risks.
What is a Cybersecurity Culture?
Merriam-Webster defines culture as “a way of thinking, behaving, or working that exists in a place or organization (such as a business).” With this definition in mind, think about the shared attitudes, values, goals, and practices associated with ICS cybersecurity in your organization. What are the shared attitudes, values, goals, and practices associated with ICS cybersecurity of your operators? Vendors? Consultants? System integrators?
Historically, the staff responsible for supporting ICS came from an engineering or Operations and Maintenance (O&M) background. With this background, they understood the process and how to safely and reliably control it. Fifteen years ago, the idea that an outsider would attempt to subvert an ICS was a foreign way of thinking for most control system engineers. The field of ICS cybersecurity was in its infancy and not a familiar skill set.
As ICS architectures transition from proprietary closed architectures to open standards, control system engineers need to develop new IT skills in networking, operating systems, and databases to support these new systems.
Staff with IT backgrounds are often brought in to support the transition, and in some cases, replace control system engineers. However, when this happens, IT-focused personnel may not have the process background possessed by the control system engineers, leading to difficulty maintaining reliability and availability.
Cybersecurity Culture
Organizations struggle with how best to protect their ICS, while ensuring safety, availability, and integrity. Should the support staff reside in the ICS division where they can work closely with the ICS staff? Or should this responsibility be handled by the IT department, where they are well versed with best practices and knowledgeable on how to integrate their ICS networks with business networks?
There is also a feeling that successful ICS cybersecurity programs are based solely in technical activities. When the IT department is tasked with securing the control system environment, the solution is usually the deployment of commercial security technologies. However, those technologies have been designed for protecting traditional corporate environments, and are solutions that do not address the requirements of the control system domain.
Multiple threat elements are now combining to significantly increase the ICS threat landscape. Hacktivist groups are acquiring and using specialized search engines to identify Internet-facing control systems, taking advantage of the growing arsenal of exploitation tools developed specifically for ICS.
Asset owners should take these changes in threat landscape seriously, and take immediate defensive action to secure their systems using defense-in-depth principles.
Cybersecurity Culture - Goals
Although many asset owners and vendors have begun to enact cybersecurity practices, policies, and procedures for their control systems, there is no federal mandate to enforce these same requirements across all owners and operators of national critical infrastructures. Developing a strong cybersecurity culture is crucial in helping people protect and defend an ICS.
A cybersecurity culture starts with:
Establishing cybersecurity goals.
Adopting cybersecurity best practices.
Developing, publishing, and enforcing cybersecurity policies and procedures.
Providing continuous cyber-focused training.
Device Programming
The functionality and capability of control system equipment, specifically field devices, have increased tremendously. Many vendors embed additional services in their control systems, including Web, file transfer protocol (FTP), and simple network management protocol (SNMP) services, and other types of functionality designed to enhance operations.
While these features provide many easy-to-use services and increased functionality, they also introduce new vectors for an attacker to remotely configure and program these devices, or to modify the firmware. While this is not an issue for all vendors, it does pose a significant cyber risk. These devices are connected directly to ICS components, which monitor and control process equipment, creating a target-rich environment.
Current device programming technologies used include:
Network enabled
Remotely programmable
Onboard I/O servers and Web servers
FTP and SNMP enabled
Physical cybersecurity devices
Next generation directions
Device Programming - Field Device Issues
Historically, field devices have been deployed on the assumption that they will be placed in a secure environment, one that does not allow unauthorized access to the device in any way. Because unauthorized access was assumed to pose no risk, the devices often use protocols devoid of authentication and authorization.
Why would you need authentication and authorization if the only people accessing the device were trusted users? This lack of access security is an artifact from the legacy control system environment where there really wasn’t a need for access across the network. As a consequence, ICS don’t always support access control.
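Where a device cannot enforce access control itself, a compensating check is sometimes placed in front of it. The Python sketch below shows the idea with an explicit allowlist of which sources may issue which commands. The addresses, command names, and the `Command` type are hypothetical, and a source address can be spoofed, so this illustrates authorization logic only, not authentication.

```python
from dataclasses import dataclass

# Hypothetical allowlist: which source hosts may send which commands.
ALLOWED_COMMANDS = {
    "10.0.1.10": {"read", "write_setpoint"},  # engineering workstation
    "10.0.1.20": {"read"},                    # historian, read-only
}


@dataclass(frozen=True)
class Command:
    source_ip: str
    action: str


def is_permitted(cmd: Command) -> bool:
    """Allow a command only if its source is explicitly authorized for it."""
    return cmd.action in ALLOWED_COMMANDS.get(cmd.source_ip, set())
```

Anything not listed is denied by default, which is the opposite of the legacy assumption that every host on the network is trusted.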
In many architectures, any host can communicate with, and send commands to, any other device, provided they both use the same protocol. The design of some ICS uses a polling system. By design, a device’s onboard processing power may be insufficient to handle large amounts of data.
As a consequence, field devices are vulnerable to catastrophic failures because of data storms, packet flooding, malformed data, or other events caused by regular behaviors. The breakdown of such a critical piece of equipment can result in failure to deliver critical process information and loss of control.
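One way to limit the effect of data storms and packet flooding on a device with little processing power is to rate-limit traffic before it reaches the device. The following Python sketch implements a simple token bucket. The class name and parameters are illustrative assumptions, and time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Allow about `rate` messages per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum tokens that can accumulate
        self.tokens = capacity    # start with a full bucket
        self.last = 0.0           # time of the previous call, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a message arriving at time `now` may pass."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

For example, with `TokenBucket(rate=1.0, capacity=2.0)`, two messages arriving at the same instant pass and a third is dropped until enough time has elapsed to refill a token.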
Connectivity and Network Architecture
Engineers can easily bridge ICS networks and business networks with the adoption of IT designs and architectures. While the benefits may be significant, this push to integrate networks also creates one of the most significant sources of ICS vulnerabilities. ICS networks that were previously physically and electronically isolated may now be integrated and connected with other networks, including the Internet. As a result, exploits previously accessible by physical proximity only can now be delivered to a control system from anywhere in the world.
Many asset owners establish an electronic security perimeter around their ICS to protect from a cyberattack. This perimeter creates a trusted environment within the ICS network and protects assets from direct exposure to untrusted domains.
Historically, the level of trust between ICS domains mirrors the trust level between corporate operations and the Internet. However, as corporate requirements for real-time monitoring and analysis of ICS have become more common, the architecture must be designed to support a more trusted relationship.
To facilitate better communications, we see architectures built with a transitive trust between ICS and corporate networks. This relationship creates a need for more robust protection mechanisms and assurance that an attacker who has a presence on the corporate network cannot use this trusted path to access the control system network.
Connectivity - Communication Mediums
Unlike most IT devices, the assets found within control systems (such as RTUs and PLCs) don’t always have cybersecurity protection capabilities.
Modern networks make it possible to access an ICS remotely to troubleshoot and modify programs. However, remote connectivity also makes it possible to attack the control system by intentionally issuing unauthorized set points, modifying the PLC program, or setting or resetting the PLC in a way that creates a denial-of-service condition.
Wireless (radio, Wi-Fi, microwave):
Insecure wireless access methods provide a significant avenue of attack for adversaries. While many radio systems use encryption to secure communications, assessments have shown that many COTS solutions provide limited or no security regarding wireless access, or have incorporated security countermeasures that have been publicly broken.
Modems:
These devices may be used when operators require analog wire or wireless communications circuits to transmit data. Faster and more reliable digital and network circuits have replaced analog communications.
Nevertheless, modems are still used in many legacy architectures and contain a number of vulnerabilities, including public dial-up numbers and poor access control, putting the DCS at risk.
Public Networks:
The Internet is a public domain, and its open nature makes it the “wild, wild West” of connectivity. Any connection to, or through, public channels puts the entire control system domain at risk from any potential threat in the world.
Corporate networks provide a connection to the Internet to conduct business. This creates a conduit to and from the outside that an attacker can use to launch an attack.
Private Networks (leased lines, private lines, fiber optics):
Physical and electronic access to network infrastructure, especially leased lines, provides a potential vector an attacker could use to compromise a field controller.
These circuits are easy to sabotage. A simple cut makes them unavailable to transmit real-time data.
Connectivity - Trusted Connections
A DCS is not the only system that can be attacked by piercing the electronic perimeter. Any remote access providing engineers and technicians the ability to access the control system from an external network extends the electronic perimeter. VPN connections (the primary method for establishing remote communications), if properly installed and configured, are typically safer than firewall exceptions. However, because the perimeter has been extended to a remote location, the network engineers do not have as much control over the end device as they do with workstations located on company premises.
One final point about trust: just like in the corporate environment, databases and third-party applications (such as document viewers) can be important components in an ICS. However, these third-party applications, if not properly secured and patched, can provide an exploitable vector. Attackers can also exploit the trusted relationship between databases replicating data between the control network and the business network.
Describe Existing DHS Programs Assisting Asset Owners and Vendors in Identifying ICS Vulnerabilities
Questions for Vendors
The Common Vulnerabilities Report also lists vulnerabilities discovered during vendor product assessments.
While ICS owners may not be aware of potential vulnerabilities in their ICS software, they can certainly familiarize themselves with the types of software vulnerabilities discovered in control system software.
They can also use the information to ask vendors what they are doing to prevent poor coding practices that could lead to vulnerabilities in their products.
If customers are knowledgeable about the vulnerabilities in their vendor’s products and demand more secure products, the vendors will respond by putting extra effort into delivering safer products.
You may want to refer to the following documents:
What are you doing to eliminate input validation errors such as buffer overflows, OS and SQL command injection, and cross-site scripting?
Have you addressed code quality issues such as implementing secure coding practices, protocol authentication checks, and security reviews of deployed IT software?
Does your system have any hard-coded passwords?
Do you run your servers as a process or a service?
Have you incorporated latest versions of third-party software?
Do you test operating system patches?
Do you provide security patches for your ICS application as vulnerabilities are discovered?
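To make the SQL injection question above more concrete, the following Python sketch contrasts safe query construction with the attack it prevents. It uses the standard-library `sqlite3` module and a parameterized query. The table layout and tag names are hypothetical.

```python
import sqlite3


def find_tag(conn: sqlite3.Connection, tag_name: str):
    """Look up a tag with a parameterized query.

    The user-supplied value is passed separately from the SQL text,
    so it is treated as data and never concatenated into the statement.
    """
    cur = conn.execute("SELECT name, value FROM tags WHERE name = ?", (tag_name,))
    return cur.fetchall()


# Hypothetical in-memory table of process tags for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (name TEXT, value REAL)")
conn.executemany(
    "INSERT INTO tags VALUES (?, ?)",
    [("FIC-101", 12.5), ("LIC-200", 71.0)],
)
```

A classic injection string such as `x' OR '1'='1` is simply compared against the `name` column as a literal and matches nothing, whereas string concatenation would have returned every row.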
Additional Resources
The National Vulnerability Database (NVD), hosted by the National Institute of Standards and Technology (NIST), is the U.S. government’s repository of standards-based vulnerability management data represented using the Security Content Automation Protocol (SCAP). According to the NIST website, SCAP is a suite of specifications for organizing, expressing, and measuring security-related information in standardized ways. SCAP enables the automation of vulnerability management, security measurement, and compliance.
NVD includes databases of security checklists, security-related software flaws, misconfigurations, product names, and impact metrics. The primary resources for the NVD include:
Common Vulnerabilities and Exposures (CVE) Flaws
National Checklist Program
Vulnerability Notes
Open Vulnerability Assessment Language (OVAL) Queries
Impact Metrics from Common Vulnerability Scoring System (CVSS)
Common Weakness Enumeration (CWE)
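As an illustration of how the impact metrics in the list above are commonly consumed, the Python sketch below maps a CVSS v3.x base score to its qualitative severity rating (None, Low, Medium, High, Critical). The function name is an assumption made for this example. The score bands follow the CVSS v3.x qualitative severity rating scale.

```python
def cvss_v3_severity(score: float) -> str:
    """Map a CVSS v3.x base score (0.0 to 10.0) to its qualitative rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"       # 0.1 to 3.9
    if score < 7.0:
        return "Medium"    # 4.0 to 6.9
    if score < 9.0:
        return "High"      # 7.0 to 8.9
    return "Critical"      # 9.0 to 10.0
```

A base score alone does not capture consequence in a specific plant, so asset owners would still weigh it against their own process and safety context.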
Other simple-to-use programs make it easy for a security analyst or attacker to launch complicated exploits on targeted software. One of these programs is available at Kali.org.
Consequences
Worst Case Scenario
The compromise of an organization’s cybersecurity defenses results in the unauthorized operation of an industrial control system and renders safety systems useless. The attacker releases toxic elements into the environment, exposing workers and the general public to a potentially deadly poison. What do you do?
Introduction
Imaginary exercises can provide a realistic environment for organizations to test a crisis response without actually causing harm. The results offer insight into the seriousness of a cybersecurity event, and determine whether you are prepared.
Recall the Risk Equation: Threat x Vulnerability x Consequence = Risk. Two aspects of cyber risk are threats and vulnerabilities. To fully understand how to calculate risk, we need to look at the third component in the risk equation, consequences.
Note
The risk equation should not be taken literally as a mathematical formula, but rather a model to demonstrate a concept.
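In keeping with that caution, the Python sketch below treats the risk equation only as a way to rank scenarios relative to one another. The 1 to 5 ordinal scales, the scenario names, and the scores are all hypothetical assumptions. The resulting numbers have no meaning beyond their ordering.

```python
def risk_score(threat: int, vulnerability: int, consequence: int) -> int:
    """Combine three ordinal ratings (1 = lowest, 5 = highest) into a relative score."""
    for name, value in (
        ("threat", threat),
        ("vulnerability", vulnerability),
        ("consequence", consequence),
    ):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5")
    return threat * vulnerability * consequence


# Hypothetical scenarios scored as (threat, vulnerability, consequence).
scenarios = {
    "Internet-facing HMI": (4, 4, 5),
    "unpatched historian on the business network": (3, 4, 2),
    "isolated test bench PLC": (1, 3, 1),
}

# Highest relative risk first.
ranked = sorted(scenarios, key=lambda name: risk_score(*scenarios[name]), reverse=True)
```

Because the factors multiply, a low rating on any one of them pulls the overall score down, which matches the idea that all three components must be present for risk to be high.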
Control system owners typically calculate the mean time to failure and other associated function-based metrics to proactively mitigate system problems before such problems can cause an undesirable event. When you factor in a potential cyber vulnerability, the impact extends beyond the simple failure of a piece of equipment and can begin to affect intangible elements, such as the integrity and availability of the data.
Consequences, both real and imagined, are usually the most clearly understood risk attribute because they are specifically associated with an asset. Regardless of the sector, most asset owners understand what could happen if a specific piece of technology should fail, and can determine cascading impacts the failure may cause. Asset owners can also calculate the costs associated with a failure, such as system restoration effort and lost production costs.
Using modern-day networking in ICS environments has introduced an element of complexity in system management and security, making it more difficult to differentiate normal control system behaviors and cyberattack indicators.
Consider the potential impact of a successful cyberattack on your ICS. We usually assume such attacks will be more severe if you are manufacturing a toxic chemical than if you are making simple widgets.
A cyberattack resulting in the release of a toxic chemical that kills 10 people is more significant than a cyberattack that temporarily disables the HVAC in a control room. Or is it?
What if the control system in the HVAC scenario operates equipment manufacturing a widget that is a critical component for an automobile braking system, and can only operate correctly when it is kept below a certain temperature?
What if the attack causes the temperature to rise above tolerance levels, causing a manufacturing fault in the widgets, which were later installed in 5,000 cars?
And what if the brakes in those cars failed, resulting in the deaths of 5,000 people?
The consequences of that widget cyberattack scenario are much more significant than the toxic release example. Determining consequences and downstream risks in control systems can get complicated, and influences which security strategies should be used to protect a system. While it is not intuitive that an HVAC system can be a high-risk system, it may be.
Questions to Consider
With the growing integration between IT and ICS networks, it is difficult to tell where one system starts and the other ends. Operational priorities between the two systems are no longer as clear cut as they used to be, and those priorities can change based on a number of variables, such as system type, process type, market demand, and even the season.
Consider the following:
Should a business system meet the same availability requirements as an ICS if it feeds critical production data to the ICS?
What about the real-time data fed from an ICS to an organization’s system? Is data integrity a concern?
Are the data from the ICS used in confidential reports? If so, should the confidentiality of the data be protected as rigorously as in a business system?
Business Impacts
Most asset owners understand the potential impacts associated with loss of availability to control systems. The consequences are most often described in terms of monetary loss, but they can also be defined in terms of operational cost and long-term organizational impacts.
Control systems influence overall productivity, and when a control system is not working, asset owners are faced with some difficult questions.
What are the operational impacts/cost of having the ICS down for an extended period of time due to an unplanned outage?
Can operators manually control the process with little or no disruption, or does the loss of an ICS require a process shutdown?
How long does it take to shut down and restart the ICS?
What is the impact of the shutdown on product quality?
How does lack of local availability impact upstream or downstream production?
Is there a potential for equipment damage when the ICS is suddenly rendered unavailable?
The answers to these questions are diverse, and depend on whether the control system is involved in purifying water, refining oil, or making cars. The failure of a control system can result in widespread and cascading business impacts as well. For example, consider a catastrophic failure of the control systems used for managing power grid operations:
What do you think the impact would be on other critical sectors?
Which sector would be the first to be impacted?
What are the potential long-term effects on the organization?
Events That Can Lead to Disruptions
Operational Events
System interruptions fall into two broad categories: operational and non-operational. An event that leads to a system interruption or failure most often impacts the availability of the system; but it can impact system integrity and may also lead to a loss of data confidentiality.
Operational events usually occur within the normal working environment and can be mitigated through procedures, training, better materials, better engineering, and improvement processes, such as Kaizen or Six Sigma.
Hardware Breakdowns: These failures may be due to mechanical or chemical wear and tear, faulty engineering, or the age of many ICS still in use. Depending on the function of the device, a hardware failure can cause the control system to stop, or introduce problems forcing an operator to reduce capacity until the problem is resolved. If the failure impacts a safety system, the breakdown will most likely force a shutdown of the entire process until the hardware is repaired or replaced.
Software Bugs: Whether intentionally introduced into the system or an unintentional flaw, bugs can cause system lockups or unexpected behavior, and could require a system reboot or upgrade, or trigger a cyber investigation. All these activities require the system to be offline, and could require a prolonged interruption, such as when forensic analysis is required.
Human Error: Inadequately trained personnel make more mistakes, leading to system interruptions. Human error can cause both physical and logical damage to the asset, and can slow remediation time when the root cause of the failure is not initially known.
Field Device Malfunctions: There are many reasons for input and output device malfunctions (e.g., instruments, control valves). These include improperly specified materials, hysteresis, or malformed code such as the introduction of a virus or a poorly tested software or firmware patch. The failure of a field device can force a process to stop, or reduce the capability of the system to work at full capacity.
Non-Operational Events
A non-operational event is outside the control of the asset owner or operator. Interruptions can be caused by either intentional or unintentional threat actors, such as natural events, small animals, or hackers.
Natural Events: Events such as tornadoes, earthquakes, or floods can cause interruptions to critical processes and damage equipment.
Small Animals: Animals such as squirrels, raccoons, or birds can make their homes in critical areas and cause problems by chewing wiring or bringing in debris that could damage or disable the ICS.
Hackers: Hackers can install malware, wreaking havoc on control systems and resulting in a DoS condition or a compromise of data integrity. This could cause a system-wide failure and require the system be taken offline for analysis and repair.
Responses to both operational and non-operational events can be anticipated. Incident response plans should be developed before an event occurs, and all individuals with a role in response activities should be trained in what to do. Exercises to test response plans should mimic real-world conditions as closely as possible, and be tailored to the unique threats that apply to the sector, location, and system.
Troubleshooting
As ICS operators, we are trained to quickly isolate the problem and begin recovery when an event occurs. Due to our system’s high-availability requirements, it’s necessary that we provide a swift operational response. Usually, there is no time to perform a root cause analysis, and it may be difficult to distinguish between normal system anomalies and a possible cyber incident.
Often, we use a system reboot to quickly return to normal operations. However, this approach can destroy logs and other system information critical to a successful forensic analysis of the event. With system resumption a top priority, adding cybersecurity breaches to the list of potential causes adds another dimension to troubleshooting for our operational support staff.
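One way to reduce that loss is to copy logs off the system before rebooting. The Python sketch below copies every file under a log directory to an evidence location and records a SHA-256 digest of each copy. The function name and directory layout are hypothetical, and which logs exist and where they live depends entirely on the system in question.

```python
import hashlib
import shutil
from pathlib import Path


def preserve_logs(log_dir: Path, evidence_dir: Path) -> dict:
    """Copy all files under log_dir to evidence_dir and return their SHA-256 digests.

    Keys of the returned dict are paths relative to log_dir.
    """
    evidence_dir.mkdir(parents=True, exist_ok=True)
    digests = {}
    # Build the file list first so the copy does not affect the directory walk.
    for src in sorted(p for p in log_dir.rglob("*") if p.is_file()):
        relative = src.relative_to(log_dir)
        dest = evidence_dir / relative
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also attempts to preserve timestamps
        digests[str(relative)] = hashlib.sha256(dest.read_bytes()).hexdigest()
    return digests
```

Recording digests at collection time gives a later forensic analysis a way to confirm the copies have not changed. Whether there is time to run such a step before recovery is an operational decision for the incident response plan.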
Understanding how a system is supposed to work and recognizing when it deviates from normal behavior help us evaluate and troubleshoot ICS problems. Staying current with the cyber vulnerabilities and exploit information provided by the Department of Homeland Security’s Cybersecurity and Infrastructure Security Agency (CISA), vendors, and independent researchers helps us identify potential cyber issues. In addition, incident response planning and training help ensure we respond appropriately.
Let’s look up a video produced by the Chemical Safety Board regarding an explosion at the BP Texas City Refinery. Even though this incident is fairly old, the lessons learned are just as valid today.
Focus on the points below:
How reliant was the operator on information from the Human-Machine Interface (HMI)?
How important are reliable instruments to control system operations?
How important is good communication between operators?
How important is data integrity to ICS operations?
Lessons learned from the Texas City Refinery incident
Data integrity is crucial to the safety and optimization of a control system, as operators make real-time decisions based on what they perceive to be accurate process data.
The failure of critical field equipment can have a detrimental impact on the perceived functional health of the control system.
When designing control system interfaces, engineers should take into consideration how single points of failure can be avoided. Redundancy in networking, error detection, and alarms should be built in.
In many cases, incorrect data can be just as damaging as, and possibly more damaging than, having no data at all.
System operators assume the information provided is correct. In the event the operator is not provided accurate data, secondary safety systems must be properly maintained so the system can either revert to a safe state or shut down.
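The redundancy and error-detection lesson above can be sketched in Python as a simple cross-check of redundant sensors. The code takes the median of three or more readings as the voted value and flags any sensor that disagrees with it by more than a tolerance. The function name and tolerance are illustrative assumptions, not a description of any system involved in the incident.

```python
from statistics import median


def vote(readings: list, tolerance: float):
    """Return (voted_value, suspect_indexes) for redundant sensor readings.

    The voted value is the median, so a single faulty sensor cannot
    drag it far from the readings the other sensors agree on.
    """
    if len(readings) < 3:
        raise ValueError("at least three redundant readings are required")
    voted = median(readings)
    suspects = [i for i, r in enumerate(readings) if abs(r - voted) > tolerance]
    return voted, suspects
```

For readings of 50.1, 49.9, and 72.0 with a tolerance of 1.0, the voted value is 50.1 and the third sensor is flagged, giving the operator both a usable value and an alarm about the disagreement.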
What Do YOU Think?
How did the HMI prevent the operator from having the information needed to recognize there was a serious issue?
Some data displayed was inaccurate.
Some data displayed was incomplete.
Some data was displayed in separate locations.
To what extent did the instruments and control interfaces contribute to this accident, as opposed to human error or negligence? Were they contributors? Were they the sole contributors?
They were contributors, but not the sole contributors.
What were BP staff (managers, executives, boards of directors) instructed to do by the Chemical Safety Board following the explosion?
Monitor process safety performance using appropriate indicators.
Maintain an open and trusting safety culture where near misses are reported and investigated.
Invest sufficient resources to correct problems.
Carefully manage organizational changes and budget decisions to ensure safety is not compromised.
Analyze and correct the underlying causes of human errors, including fatigue and miscommunication.
Ensure equipment and procedures are maintained and up-to-date.
Loss of View, Control, and Denial of Service
View and Control
In this section, we will analyze events demonstrating loss of control, loss of view, and denial of service (DoS). But before we jump right into the type of losses, you need to understand the basic elements of command and control. We will walk through them first, then talk specifically about the losses.
Field Devices <--> Field Controllers <--> HMI
The above highlights the basic elements of command and control within the control system environment.
Field Devices
Field devices are components such as pumps, valves, or sensors. This equipment measures process parameters, or has the capability to perform a kinetic action to support the process function. The equipment will either observe or adjust the process. The pumps or valves provide inputs to the system, and sensors collect outputs from the system.
Field devices that measure a system’s status include meters, sensors, transmitters, and converters. Field devices are where physical changes are made to the system based on operator commands or instructions provided by the field controllers.
Switches, valves, and circuit breakers are considered field devices and allow physical elements of the system to be controlled and modified. Devices that measure the status of a process work with those devices that perform the process to ensure process control. Specific examples of field devices include devices that mix chemicals, manage the movement of trains, or measure pressure in a pipeline. It is important these devices are able to collect, transmit, and interpret data properly.
If they do not provide or process information in a timely or accurate manner, the operator cannot observe or adjust the function as needed. If the data is inaccurate, the operator may make misinformed decisions, and could make changes to the process that damage the equipment, inadvertently causing a service interruption or compromising the safety of the system.
Field Controllers
Field controllers support the process by exchanging data with the field devices, and influence system operations based on readings from the sensors and changes made to the state of the pumps or valves. The field controllers play a critical role: not only do they ensure the process information from the field devices is collected, interpreted, and presented to the operators at the Human-Machine Interface (HMI), they also convey the instructions from the HMI to the field equipment.
Field controllers are responsible for collecting, assessing, and processing the information collected from the field devices and for interfacing with the HMI. For large distributed systems, these controllers are responsible for collecting information from hundreds, or even thousands, of field devices.
Terminology used for these components can differ by sector. For example, sectors dispersed over large areas (Communications, Energy, and Water and Wastewater Sectors) often use the term supervisory control and data acquisition (SCADA) system.
You may also know these technologies as programmable logic controllers, remote terminal units, and intelligent electronic devices. As ICS technology advances, the definition of these terms becomes increasingly unclear, as many of these devices are being designed to perform multiple functions. However, all aggregate data from a process and can provide input into a process.
Field controllers act as intermediaries between two of the most critical assets in a control system, and their availability is essential. If the field controllers are unavailable, the operator cannot get an accurate reading of the process state or control the system from the HMI.
HMI
The HMI is where the operator views a graphic representation of what is happening within the process, and may influence the process if required. The HMI provides the operator with a real-time or near real-time operational view of a process. It allows the operator to observe the process, watch for events and alarms, and make changes to the system as required. An HMI can be a computer display; a control panel with lights, dials, and displays; a tablet or phone application; or any combination of the three.
Command and Control
The speed at which a process operates dictates how the process is controlled. Some processes are slow, do not require rapid status updates, and change at such a rate that a human operator has ample time to correct an issue or modify set points. Examples of slow processes include monitoring the position of a coal train or the water levels in a municipal water reservoir. Other processes work fast, and require technical solutions to help make decisions in milliseconds. Examples include energy management systems for electricity and batch chemical manufacturing.
To meet the needs of these systems, there are two types of command and control configurations: open-loop and closed-loop.
Open-Loop Configuration
In an open-loop configuration, data from a field device is sent to the field controller where it is aggregated, digitized, and sent to the operator for review. In real time, the operator sees process information and determines whether changes need to be made. If there are alerts or alarms, or some system adjustment is required, the operator is notified and uses the HMI to make the changes. The instructions then move back through the field controllers and into the field device. Generally, open-loop operations are used in large-scale, non-time-sensitive control system operations.
For example, field controllers at a large water reservoir receive values from the field equipment that the water levels are higher than normal. An alarm shows on the HMI and an operator can review weather conditions, expected water consumption for the days ahead, and other factors to either dismiss the alarm or send a command to pump more water out of the reservoir.
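The reservoir example can be sketched in a few lines of Python. This is a minimal illustration, not a real controller: the names `Reading`, `field_controller_aggregate`, `hmi_alarm`, and `operator_decision`, the 90 percent alarm threshold, and the sample values are all assumptions made for the example. The point it shows is that the decision step belongs to a human, not to the controller.

```python
from dataclasses import dataclass
from statistics import mean

HIGH_LEVEL_ALARM = 90.0  # percent of capacity; assumed threshold for this sketch


@dataclass
class Reading:
    sensor_id: str
    level_pct: float


def field_controller_aggregate(readings):
    """Aggregate raw field-device values into one figure for the HMI."""
    return mean(r.level_pct for r in readings)


def hmi_alarm(level_pct):
    """The HMI only raises an alarm; it does not act on the process."""
    return level_pct > HIGH_LEVEL_ALARM


def operator_decision(alarm, rain_expected):
    """A human weighs outside factors, then dismisses the alarm or sends a command."""
    if alarm and rain_expected:
        return "PUMP_OUT"
    return "DISMISS"


# Hypothetical level transmitters reporting to the field controller.
readings = [Reading("LT-1", 92.0), Reading("LT-2", 94.0)]
level = field_controller_aggregate(readings)
command = operator_decision(hmi_alarm(level), rain_expected=True)
```

Nothing in the loop changes the process until `operator_decision` returns a command, which is why this arrangement suits slow, non-time-sensitive processes.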
Closed-Loop Configuration
In a closed-loop configuration, operations happen quickly and must be continuous. The field controller usually performs the process management instead of the system operator. In a closed-loop configuration, the information from field devices is collected and processed by the field controller. The field controller determines whether the system is operating within normal parameters. If it is normal, no change is necessary. If the parameters are not correct and a modification is needed, the field controller automatically initiates the change. Information is provided to the operator’s HMI, so the operator can observe the process and is aware of notable events, but the field controller has primary control over the process. The operator may have the option to make changes, but is generally functioning as a monitor for an automated process.
For example, a field controller receives information from a field device that a turbine isn’t spinning fast enough to generate the appropriate level of power. The controller sends a signal to the throttle valve to generate more steam, which in turn increases the speed of the turbine. The operator monitoring the HMI can see that the field controller sent the command to the valve and decide whether to manually adjust it or allow the automatic process to continue.
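The turbine example can be sketched as a simple feedback loop. The code below is a toy under stated assumptions: the setpoint, the gain, and the plant model inside `simulate` are invented for illustration and do not describe any real turbine or controller product.

```python
SETPOINT_RPM = 3600.0  # assumed target turbine speed
GAIN = 0.01            # assumed gain: valve percent added per RPM of error


def controller_step(measured_rpm, valve_pct):
    """One scan of the field controller: compare to the setpoint, then correct."""
    error = SETPOINT_RPM - measured_rpm
    new_valve = valve_pct + GAIN * error
    return max(0.0, min(100.0, new_valve))  # clamp to the valve's physical range


def simulate(rpm, valve_pct, scans):
    """Toy plant model (assumption): speed drifts toward 40 RPM per valve percent."""
    history = []
    for _ in range(scans):
        valve_pct = controller_step(rpm, valve_pct)  # automatic correction
        rpm += 0.5 * (40.0 * valve_pct - rpm)        # the process responds
        history.append((round(rpm, 1), round(valve_pct, 1)))  # values an HMI might show
    return history


history = simulate(rpm=3000.0, valve_pct=75.0, scans=60)
```

The controller acts on every scan without waiting for a person. The `history` list stands in for what the operator would watch on the HMI while the automated process runs.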
Loss of View
If operators lose the capability to view a process through the HMI, they cannot determine whether the process is functioning properly, or whether it has failed. While well-trained operators can sometimes work around a loss of view, it can cause them to make operational decisions based on inaccurate or missing information. The results of those decisions could be harmful to equipment, people, or the environment.
Most control systems are designed so an event resulting in a loss of view can be compensated for by the safety system. However, malicious attackers could compromise an HMI and purposely cause a loss of view to mask their actions. While the operator cannot monitor the system, the attackers could modify the process, change settings, disable systems, and alter or disable the safety functions.
Some industries are required to report when a loss of view occurs. Asset owners can be fined if they operate without view for an extended period. Examples of events that can cause a loss of view include:
A loss of communication between the HMI console and a source for data (field controller or device).
A disruption of electricity or a power surge causing the power to the HMI console to be lost.
A workstation locking the operator out after a certain number of login failures.
An instrument experiencing signal drift or having the signal overwritten. This can be caused by both operational and non-operational events.
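One way to notice a loss of view caused by lost communication is to track how old each displayed value is. The sketch below assumes the HMI records a timestamp for the last update of every tag; the function names, the tag names, and the five-second limit are hypothetical.

```python
STALE_AFTER_S = 5.0  # assumed maximum age of a tag value before it is treated as stale


def view_status(last_update_s, now_s, stale_after_s=STALE_AFTER_S):
    """Classify each tag as 'OK' or 'STALE' based on the age of its last update."""
    return {
        tag: ("STALE" if now_s - ts > stale_after_s else "OK")
        for tag, ts in last_update_s.items()
    }


def loss_of_view(status):
    """Treat the view as lost when there are tags and none of them is being refreshed."""
    return bool(status) and all(state == "STALE" for state in status.values())


# Hypothetical tags with the time (in seconds) of their last update.
status = view_status({"FT-101": 100.0, "LT-7": 93.0}, now_s=101.0)
```

A check like this does not catch signal drift or overwritten values, because those keep arriving on time; it only covers the cases where data stops flowing.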
Loss of Control
A loss of control means that despite being able to see a process, the operator has no ability to control it. A system can experience a loss of control through a failure in the system, an operational failure such as a valve getting stuck, or through a focused attack.
Examples of loss of control include:
A control loop being stuck in a particular mode that does not allow the operator to override it.
A combination of interlocks that fail and result in cascading failures, where Device A has to be operational for Device B to perform a task. A failure in Device A causes a failure in Device B.
A machine-in-the-middle (MitM) attack that changes the values in a field device.
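The interlock example, where a failure in Device A causes a failure in Device B, can be modeled as a dependency map. This is an illustrative sketch only; the `cascade` function and the device names are assumptions, and real interlock logic lives in the controller program.

```python
def cascade(failed, depends_on):
    """Return every device that ends up unavailable after the initial failures.

    depends_on maps a device to the devices that must be operational for it to run.
    """
    down = set(failed)
    changed = True
    while changed:  # repeat until no new device is pulled down
        changed = False
        for device, prerequisites in depends_on.items():
            if device not in down and down.intersection(prerequisites):
                down.add(device)
                changed = True
    return down


# Hypothetical interlocks: B needs A, C needs B, D is independent.
interlocks = {"B": ["A"], "C": ["B"], "D": []}
unavailable = cascade({"A"}, interlocks)
```

Walking the map this way shows how a single stuck device can take control away from the operator for everything downstream of it.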
Denial of Service
A denial-of-service (DoS) condition occurs when a function becomes unusable or unavailable. A resource flood tying up capacity, whether CPU cycles, memory, or bandwidth, will cause a DoS. In ICS, where the availability of system components is critical to operations, any interruption in availability can create undesirable consequences.
A DoS can be caused by intentional or unintentional activities, such as:
A security scan of an ICS network or device (if done improperly) may result in a DoS because it can flood an ICS network with too much traffic and cause devices to shut down or stop communicating.
An ICS server looking for non-existent settings or values may cause the processor to overload, resulting in a DoS on itself.
A program with a memory leak will, over time, eventually use up all available memory and cause a DoS.
Malware that causes perpetual reboots, corrupts the information on the drive, or locks out authorized users makes resources unavailable, resulting in a DoS.
Attack methodologies in IT and ICS
A good defense understands what the offense can do. So, the better you can think like an adversary, the better defenses or security you can set up that are specific to your system.
It is important to know that attack methodologies can include both technical and non-technical methods, and that these methods can be combined to create attacks that are very specific to ICS environments.
To properly defend a control system, it is important to know what specific ICS areas would typically be targeted and how an “attack life cycle” could take shape.
In this section, we will:
Describe the anatomy of a cyberattack.
Recognize how attack methods can apply to control systems.
Security Attacks
Before deciding which defense-in-depth protections an organization needs for its ICS environment, it is important to understand the methods malicious actors use to successfully attack these systems.
There is no single attack methodology that is used by all threat actors, as the exploits used tend to be designed to accommodate the uniqueness of the targets. Each attack will be defined by the desired outcome and the unique attributes of the target, and this is especially true for ICS.
However, because all attacks usually share some essential activities, a general process represents key concepts that are common with cyberattacks. Many of the techniques used by threat agents are the same as those used by security professionals to test networks and systems for vulnerabilities, helping determine which defense-in-depth countermeasures to put into place.
It is a constant “cat and mouse” game, and the challenge is to ensure the information and systems accessible to infiltration are constantly monitored and updated to protect against ever-emerging threats.
Anatomy of a Cyberattack
A cyberattack generally follows a process allowing the attacker to perform reconnaissance or discovery of the targeted business, then develop and execute the attack, and finally use the attacker’s command and control presence to extract data and/or achieve the attacker’s goals on the target system.
Attack Process
Discovery
Characterize systems
Find weaknesses and vulnerabilities
The goal is to find any way possible to get into the system
A threat agent performs reconnaissance by probing the network perimeter to characterize the system. Specifically, to determine:
if there is a firewall and what type
what types of web or other Internet-facing servers are used
whether there are any open communication ports
They may also harvest publicly available corporate information (company principals’ names and email addresses, photos that may show physical security barriers, support personnel names and numbers) to gain any advantage they can for social engineering or email-based attacks.
Attack
Exploit vulnerable people, processes, and components
Once they find a way in, they select an attack method and begin the actual attack
Potential intrusion vectors can range from technical methods, such as brute force or hacking with exploit tools, to showing up at a site dressed as a worker
The goal is to exploit any and all vulnerable people, processes, or components to gain entry
Adversaries may have a direct target in mind, or merely wish to deposit code on any available machine in order to maintain a presence on the network or system and allow for future unauthorized access
Intrusion
Data exfiltration, denial of service, command and control operations
Once they have found their access point, intruders can accomplish their intent through network intrusion - whether it is data exfiltration, creating a denial of service, or taking over command and control of the process, system, or entire network
Once they have compromised a system, they may access it multiple times and potentially use it to access other systems. Many intruders leave residual back doors, accounts, or port openings for future or continued access
Discovery Phase
Let’s walk through the attack process more in-depth, starting with discovery. Asset discovery allows a potential intruder to decide what systems to target, gauge how easy it will be to launch an attack on a particular entity, and find weaknesses in the target’s security. This allows threat actors to determine the best method for launching a successful attack.
Discovery is an iterative and continuous process through the entire attack process. While the information obtained during discovery is used by the attacker when making a target selection and choosing attack methods, it also provides an attacker continuous feedback to understand what is and is not working during the attack. If the attacker is not operating in an opportunistic manner, feedback is vital in ensuring the attack is optimized and proceeding as planned.
There are many ways to gather information about a site and its systems. The initial research is generally accomplished remotely so there is no direct contact with the target. The attacker wants to stay away from the target itself while collecting as much available information as possible.
Physical Surveillance
Physical surveillance methods can range from simply looking at the facility (looking for cameras, locks, fences, guards), to bolder approaches, such as walking into the facility and asking for a tour, or posing as a vendor, maintenance person, or an employee’s visiting friend.
It is amazing how much information one can gather this way. A pile of boxes outside a facility can give a would-be intruder insight into what new equipment the asset owner is installing. Tours can provide a wealth of information, including how well an organization has physically protected a particular site; what systems are on site and where they are located; and how the operations are conducted.
Another way to get information is to befriend an insider. A seemingly innocent conversation about work can allow an intruder to gather valuable information about the control systems at a site and how well the organization protects them.
Open-source Intelligence
Electronic asset discovery methods can range from simply collecting and aggregating seemingly random information accessible through the public domain, to using freely available tools designed to fingerprint systems.
The adversary will use information collected from open sources, public data banks, and other data sources that can provide information to develop a target.
Examples of public domain information could be:
Lists of locations
Vendor partnership announcements
Requests for proposals
Lists of company principals and their contact information
Contact lookup engines
Job openings
Holiday closure calendars.
Public Domain Information Aggregation
Public domain information can provide an intruder with spear-phishing targets and content, times/dates when a plant or office is unattended (and, therefore, not closely monitored), device-specific intelligence, and also gaps in an organization’s skill sets or planned system- or security-focused projects.
When an attacker researches a target, accurate and inaccurate information will be available. The attacker may not always know what is correct, and the age and accuracy of the information must be considered.
For a fee, an attacker can get access to large research and marketing databases to validate or extend information that has been acquired freely, and this is often up-to-date and accurate. However, detailed information about ICS architectures is often harder to acquire using traditional data collection means, and some effort is required to find and validate sensitive ICS data.
Scanning
It is important to map the general network topology with asset IDs, IP addresses, open ports, and services in order to understand the operating systems of all devices on the network.
Threat actors can easily obtain IP address ranges through domain registry searches. Crafting an Internet query designed to return an error message can also yield information about the system (for example, database errors often indicate that someone is using the database).
Many tools exist to map a network, both fee-based and free. As many tools are used by both administrators and attackers alike, the tools may already be installed on the target network for use by the system owner and can easily be leveraged by the attacker. Most discovery tools require access to the network being enumerated, so attackers look to see if these tools are already installed.
Tools can tell an intruder:
what devices, ports, protocols, and services are open
whether a firewall is in place
what type of firewall they use
the name of the network where they keep it.
On an ICS network, attackers often probe for external connections to the ICS through peer or vendor support sites. Because of the prevalence of modems still used to access ICS networks remotely, attackers sometimes perform war dialing. War dialing scans a range of telephone numbers for modem connections. Attackers can use a range of potential telephone exchanges common to an area, or those published on a company website to pinpoint the systems. War dialing was once considered an obsolete discovery method; however, it is seeing a rapid resurgence due to the use of Voice over Internet Protocol (VoIP) and because many critical systems still use legacy dial-up connections for remote control. Published tools leverage VoIP systems to war dial up to a thousand numbers hourly.
“War driving,” which is riding or walking around a facility and scanning for unsecured wireless access points, is a combined electronic/physical discovery method that a would-be intruder can also use. Attackers may also look for radio and satellite connections or microwave communication pathways.
High traffic scans, or a system lockup or failure, may be taken as signs to alert operations or network personnel that an attacker is targeting their system.
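From the defender's side, a burst of connections to many different ports from one source is one such sign. The sketch below flags that pattern in a list of connection events. It assumes the events have already been extracted from firewall or flow logs as `(timestamp_s, source_ip, dest_port)` tuples sorted by time; the function name, window, and threshold are illustrative choices, not values from any product.

```python
from collections import defaultdict


def find_scanners(events, window_s=60, port_threshold=20):
    """Flag sources that touch many distinct ports within a short time window."""
    recent = defaultdict(list)  # source_ip -> [(timestamp_s, dest_port), ...]
    flagged = set()
    for ts, src, port in events:
        # Keep only this source's events that are still inside the window.
        window = [(t, p) for t, p in recent[src] if ts - t <= window_s]
        window.append((ts, port))
        recent[src] = window
        if len({p for _, p in window}) >= port_threshold:
            flagged.add(src)
    return flagged


# Hypothetical traffic: one host sweeping ports, one host polling a single port.
sweep = [(i, "10.0.0.5", 1 + i) for i in range(25)]
polling = [(i, "10.0.0.9", 502) for i in range(30)]
suspects = find_scanners(sorted(sweep + polling))
```

A simple window like this misses a scan that is spread over hours or days, so it complements rather than replaces longer-term traffic baselining.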
Tools
Wireshark is a network traffic sniffer that supports ICS protocols such as Modbus, DNP3, and CIP. On an ICS network, it can be used to look for clear pathways to other systems or for usernames and passwords.
Nessus is a tool that can be used to test for vulnerabilities, and is typically used internally to test an organization’s own vulnerabilities. Nessus has modules called ‘plugins’ that allow the user to test for specific types of vulnerabilities on specific types of systems. Once an attacker has enumerated a target network and determined system types and functions, vulnerabilities in those systems can then be tested.
These plugins are derived from publicly known vulnerabilities, and the Nessus tool can be updated at regular intervals to include the tests for the latest vulnerabilities. With the growing interest in ICS cybersecurity, Nessus now includes many control system plugins that are specific to vendor technologies and vulnerabilities.
Scanning a system’s vulnerabilities takes time, and the time required depends on how the tools are configured. More aggressive scans take less time but introduce a tremendous amount of traffic, causing systems to lock up or fail in some cases.
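The trade-off between scan time and traffic can be seen with some rough arithmetic. The sketch below assumes one probe per port with no retries, which is a simplification; the host count, port count, and probe rates are made-up numbers.

```python
def scan_duration_s(hosts, ports_per_host, probes_per_second):
    """Estimate scan time in seconds, assuming one probe per port and no retries."""
    if probes_per_second <= 0:
        raise ValueError("probes_per_second must be positive")
    return hosts * ports_per_host / probes_per_second


# Hypothetical subnet: 254 hosts, 1,000 ports checked on each.
aggressive_s = scan_duration_s(254, 1000, probes_per_second=1000)  # 254 seconds
cautious_s = scan_duration_s(254, 1000, probes_per_second=10)      # 25,400 seconds
cautious_hours = cautious_s / 3600                                  # about 7 hours
```

The aggressive setting finishes in minutes but puts a hundred times more probes per second on the wire, which is the kind of load that can cause fragile devices to lock up.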
There are many tools and techniques available for gathering and organizing information. Some examples include:
Social Media - Facebook, Instagram, WhatsApp, Tumblr, Twitter, QZone, etc.
Scans.io - Public archive of research data about hosts and sites on the Internet.
Censys.io - Allows users to discover the devices, networks, and infrastructure on the Internet.
Deviceinfo.me - A Web browser security testing, privacy testing, and troubleshooting tool.
robots.txt - Tells search engine crawlers which pages or files the crawler can or cannot request from your site.
Google Hacking Database - Index of search queries (called dorks) used to find publicly available information.
Wayback Machine - Digital library of Internet sites and other cultural artifacts.
BuiltWith Technology Lookup - Web technology information profiler tool.
Shodan.io - Search engine for Internet-connected devices.
Maltego - An interactive data mining tool.
OSINT Framework - A collection of open-source intelligence tools to make intel and data collection tasks easier.
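A robots.txt file is a useful illustration of how public information can leak detail: every path listed as disallowed also tells a reader that the path exists. The sketch below uses Python's standard `urllib.robotparser` module on a made-up file so an asset owner can review what their own robots.txt reveals. The sample content and the `disallowed_paths` helper are assumptions for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are invented for this sketch.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /scada-login/
Disallow: /vendor-portal/
Allow: /
"""


def disallowed_paths(robots_txt):
    """Collect every Disallow path; each one names a location the owner wants hidden."""
    paths = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())

crawler_may_fetch_login = parser.can_fetch("*", "/scada-login/")
revealed = disallowed_paths(SAMPLE_ROBOTS_TXT)
```

Well-behaved crawlers honor the file, but it is not an access control: anyone can read it, so sensitive paths are better protected by authentication than by a Disallow line.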
Using other search capabilities (such as Google Maps) can enhance the attacker’s understanding of the target system’s location. Because many ICS are physical sites in larger geographical areas, many large ICS assets appear on Google Maps displaying details that can aid an attacker.
ICS vs. IT Discovery
Historically, and unlike IT domains, ICS assets do not manage the introduction of large amounts of non-control system traffic very well. This forces the adversary to adjust the tempo of the exploitation activities. However, the attacker can perform these scans over a long period if required. In fact, it is somewhat normal for an attacker to perform scans over a long period - an adversary may study your organization for a long time before deciding to attack.
This brings up a key difference in the discovery between IT and ICS. Although IT networks and systems are typically much larger, the time to scan these systems will be shorter than for an ICS. The ICS environment is far more sensitive than the IT environment. The ICS must be operational 24 hours a day, 7 days a week, 365 days a year, and its equipment also tends to be more sensitive to network scans. In the IT environment, on the other hand, scans can be scheduled to run in off hours. Because they use more modern technology, IT systems are more tolerant of large-scale automated scans and less likely to be affected by them.
Much of the detailed information required by an attacker may be distributed over a broad range of sources. Once that data has been aggregated, it becomes much more powerful and useful to the attacker.
The process of discovery is similar for both industrial control and IT systems. In both cases, the attacker knows the system to be targeted. If the target is a database server or some other information resource that is commonly found on an IT domain, the attacker likely will research information for IT systems. If the attacker is targeting a control system, they will research specific information about vendors to learn more about those vendors’ products and possible vulnerabilities. This may include information about specific hardware and software involved in process automation. Therefore, vendor materials, manuals, and anything related to the security of that hardware or software may be valuable to an attacker.
Detailed information about how a control system is designed and what vendor solution is used is vital knowledge to an attacker. Vendors and integrators, as part of their business development process, often design case studies for past work performed. Although this material is useful for generating new business and describing capabilities, often the material is so detailed that it can provide an adversary with operational data that would otherwise be unavailable.
Even dated information can be useful for an ICS attack because control systems are often operational for many years. Moreover, to meet increasing customer support demands, vendors openly make available installation guidebooks, deployment procedures, and administrative advice. This type of information is valuable to an attacker because it provides insight to a number of things, including default usernames and passwords, architecture addressing schemes, and other information.
Attack Phase
Once the attacker has determined potential intrusion vectors, they determine what weaknesses and vulnerabilities they will target. Many well-known security challenges inherent in control system environments provide an intruder with a “target-rich environment” from which to choose their intrusion vectors. Learn more about a few of these security challenges below.
Configuration Management
Poor configuration management is one of the most common ways an attacker can find an opening into the control system domain.
General purpose operating system (OS) platforms provide numerous processor and network services that automatically run by default. The result is unmonitored, open ports that are actively executing code, leaving the system vulnerable to network exploits, such as buffer overflows. Leaving these unneeded ports and services active allows easy access for would-be intruders.
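A basic configuration check is to compare what is actually listening on a host against an approved baseline. The sketch below assumes the list of observed listeners has already been collected by other means; the baseline contents and the function name are hypothetical, although 502 and 44818 are the TCP ports commonly associated with Modbus/TCP and EtherNet/IP.

```python
# Assumed baseline of services this host is expected to offer.
APPROVED = {
    ("tcp", 502): "Modbus/TCP",
    ("tcp", 44818): "EtherNet/IP",
}


def unexpected_listeners(observed, approved=APPROVED):
    """Return listening (protocol, port) pairs that are not in the approved baseline."""
    return sorted(set(observed) - set(approved))


# Hypothetical inventory from a host: Modbus plus leftover Telnet and FTP services.
observed = [("tcp", 502), ("tcp", 23), ("tcp", 21)]
to_review = unexpected_listeners(observed)
```

Anything returned by the check is a candidate for being disabled or, if it is genuinely needed, documented and added to the baseline.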
Software and Security Updates
Because of the high or constant availability and critical response time requirements inherent in ICS, any software or security update to the system necessitates exhaustive testing. As a result, patching and update activities are typically scheduled far in advance and permitted only on a very infrequent basis.
In addition, ICS components may not tolerate security software because of critical timing requirements. Control system components are often so processor-constrained that running security software itself creates unacceptably high delays in response, threatening system stability. The result is outdated OS revision levels and outdated or no malware protection software. Even if antivirus software is up to date and configured for proper execution, ICSs built on standard platforms are vulnerable to newly discovered malware threats that, once again, cannot be patched in a timely fashion.
Protocols
Technology-related vulnerabilities within TCP/IP-based control system environments leave critical networks and systems open to compromise. Examples of vulnerabilities in IT system technologies that could migrate to control system domains include the susceptibility to malicious software (including viruses, worms, and so forth), escalation of privileges through code manipulation, network reconnaissance and data gathering, covert traffic analysis, and unauthorized intrusions into networks, either through or around perimeter defenses. System vulnerabilities also include hostile mobile code such as malicious active content involving JavaScript, applets, Visual Basic (VB) Script, and Active-X. With a successful intrusion into ICS networks come new issues, such as reverse engineering of control system protocols, exploits leveraging vulnerabilities on operator consoles, and unauthorized access into trusted peer networks and remote facilities.
Outdated, inherently insecure protocols, such as FTP and Telnet, are still commonly used for ICS operations, so passwords are often sent in cleartext. One standard protocol for data communication between control devices, object linking and embedding (OLE) for process control (OPC), commonly runs without authentication. SCADA and ICS communication protocols for control devices, such as Modbus/TCP, EtherNet/IP, and DNP3, do not typically require authentication to remotely execute commands on a control device, and their original specifications provide no encryption.
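The lack of authentication is visible in the message format itself. The sketch below builds the bytes of a classic Modbus/TCP "Write Single Register" request (function code 0x06) using only the standard library. It constructs the frame in memory and does not send anything; the register address and value are arbitrary example numbers.

```python
import struct


def write_single_register_request(transaction_id, unit_id, address, value):
    """Build a Modbus/TCP 'Write Single Register' (function 0x06) request frame."""
    # PDU: function code, register address, register value (all big-endian).
    pdu = struct.pack(">BHH", 0x06, address, value)
    # MBAP header: transaction id, protocol id (0 for Modbus),
    # length of the bytes that follow (unit id + PDU), unit id.
    mbap = struct.pack(">HHHB", transaction_id, 0x0000, len(pdu) + 1, unit_id)
    return mbap + pdu


# Example values only: set register 0x0010 to 1500 on unit 1.
frame = write_single_register_request(1, 1, 0x0010, 1500)
```

Every one of the 12 bytes is addressing or payload. The classic frame has no field for a username, password, or signature, so a device that accepts it cannot tell an operator workstation from any other host that can reach the port.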
Multi-pronged Attack
Sometimes threat actors will plan a multi-pronged attack.
For example, an intruder may decide to use a targeted spear-phishing attack to infiltrate the corporate network and use it as a vector into the control system architecture. The intruder may use both war dialing and war driving to find open modem connections and wireless networks connected to ICS, and then exploit common vulnerabilities in the operating systems, applications, and/or databases discovered, using specific exploits tailored to the system under attack.
A quick Internet search shows that this multi-pronged approach is the norm for cyberattacks.
Vulnerability Information
There is a large amount of publicly available information on known cyber vulnerabilities, whether they are related to IT, to ICS, or other technologies. The rate of discovery and disclosure of vulnerabilities related to ICS continues to increase, as does the rate at which exploit tools for those vulnerabilities are released.
In many cases, the time between a vulnerability being announced and the release of a tool to exploit that vulnerability can be hours rather than days. Furthermore, many hacker sites post walkthroughs and videos on how to exploit vulnerabilities as they are published.
In addition, easy to use point-and-click tools, such as Metasploit, assist an attacker in exploiting a vulnerability. Many after-market providers make modules available for use within Metasploit and other tools, making it easy for attackers to be successful.
It is not always an exploitable vulnerability within the ICS application or system that can lead to compromise. In some cases, the operating system the ICS is running on may have a vulnerability that, if exploited, gives the attacker direct control over the control system application. This underscores the importance of applying security patches to core operating systems in addition to patches specific to the control system.
Research suggests that errors in programming create opportunities for attackers to find unknown vulnerabilities. In fact, some of these vulnerabilities are only known to the researcher and no countermeasure has been developed.
These are known as “Zero-Day” vulnerabilities and have not been publicly disclosed or mitigated.
Slow patch management gives an attacker a greater opportunity for an attack. This risk is more common in ICS than in IT systems.
In addition, ICS software adds more potential vulnerabilities to be exploited because there can be more ICS applications per server than on an IT system. Another area of concern in the ICS environment is embedded systems. These systems are generally installed and never updated, which leaves them vulnerable to all exploits available for their operating system, applications, and firmware.
A great resource for current activity and alerts is the CISA National Cyber Awareness System website.
Intrusion Phase
We now move into the intrusion phase. Once an attacker has exploited a vulnerability to gain access to a target, a mechanism will be created to allow for repeated access in the event the connection is lost.
Maintain Access
Maintaining this access can be accomplished in a number of ways, most of which involve using administrative privileges to create new (stealth) accounts, or creating rogue processes that provide the attacker a “backdoor” into the system.
This can be done at the ICS application level, at the operating system level, or within some process running in the background that may not be related to either the core operating system or the specific control system.
In order to ensure repeatable access, attackers may:
Use open-source rootkits to modify system codes
Install malware on hardware memory
Install malware on the system BIOS.
Depending on where it is installed, this malware can survive a reboot, a full system reinstall, or even a hard drive replacement.
Escalate Access
Many of the techniques used to exploit system vulnerabilities and maintain access will also allow an attacker to gain access to other user accounts on a system (such as a service ID) or to escalate their current privileges.
For example, many Windows systems run certain service IDs or system IDs with local administrator privileges. Attacking applications by using these accounts may give an attacker escalated privileges. Attacking the operating system is one way to gain privileges.
Cross-zone scripting can also escalate privileges and is an example of a technique used to bypass security functions in a browser. There are many examples and techniques associated with privilege escalation. One of the worst consequences is malicious code that patches (“fixes”) systems so that no one else, including an administrator, can delete IDs.
A common operational security error is when organizations do not remove or modify the default user accounts the vendor uses to manage and operate control systems. This provides additional attack vectors that can be used to gain access to resources and escalate privileges.
This vulnerability can be exacerbated when asset owners fail to remove or modify the default user accounts on the actual control system components. These accounts, in combination with shared user IDs that have full administrator access, create a landscape of opportunity for the attacker.
When an attacker has this kind of access to the control system, they can perform numerous unauthorized and undesirable activities, most of which may cause significant impact to an operator’s ability to maintain control system functionality.
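Asset owners can look for these conditions in their own account inventory. The sketch below assumes an exported list of accounts with a flag showing whether the vendor default password was changed; the field names, the list of default usernames, and the sample records are all hypothetical.

```python
# Hypothetical set of usernames that vendors commonly ship as defaults.
VENDOR_DEFAULT_USERNAMES = {"admin", "operator", "service"}


def audit_accounts(accounts):
    """Return (device, username, finding) tuples for risky account configurations."""
    findings = []
    for acct in accounts:
        if acct["username"] in VENDOR_DEFAULT_USERNAMES and not acct["password_changed"]:
            findings.append((acct["device"], acct["username"], "default credentials"))
        if acct["shared"] and acct["admin"]:
            findings.append((acct["device"], acct["username"], "shared administrator ID"))
    return findings


# Made-up inventory records for illustration.
inventory = [
    {"device": "PLC-1", "username": "admin", "password_changed": False,
     "shared": True, "admin": True},
    {"device": "HMI-2", "username": "jsmith", "password_changed": True,
     "shared": False, "admin": False},
]
findings = audit_accounts(inventory)
```

Each finding is a starting point for remediation: change or remove the default account where the device allows it, and replace shared administrator IDs with individual accounts.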
Note
Some control systems are shipped with hard-coded passwords that cannot be changed, and therefore must be protected by other means.
Undetected Access
The removal of elements that can lead to discovery increases the attacker’s opportunity for persistence on the target system. Attackers that use extensive avoidance techniques are sometimes referred to as Advanced Persistent Threats (APTs). Many asset owners believe that the complexity of their control systems, combined with the use of obscure protocols, makes it impossible for an attacker to succeed. This perception often results in inadequate protection for the ICS domain.
An attack can easily go undetected in ICS networks that do not have proper security mitigations or techniques to detect and log attack activities. In addition, some ICS assets do not facilitate logging and capturing system irregularities that could be indicative of a cyber incident or attack.
To ensure the longevity of the attack, it is critical that an attacker goes undetected. Although there are some cases where detection does not matter, most of the time detection results in an attack being stopped by the victim, or countermeasures being implemented that greatly reduce the usefulness of the attacker’s success.
Attackers who do not want to be detected will ensure no artifacts or evidence of system penetration are present. They need to constantly ensure their cyber footprints are removed. Minimizing the attacker’s exposure is part of every aspect of the attack.
Covering Tracks
A range of activities can be performed to hide or remove evidence of system compromise:
Deleting and uninstalling tools or code used in the attack. However, an attacker may leave tools or code on the system for future use.
Removing all audit logs from a system. Attackers will attempt to delete all instances or recordings of their actions while in the system. When log files cannot be removed, attackers will update them with appropriate timestamps or modify their contents so they do not appear suspicious.
Using anonymous proxy servers and encrypted connections to hide their attacks. This makes it difficult for administrators and investigators to determine where the attack came from. Modern proxy servers and anonymizers can hide attack origins and even make it look like the attack came from a different region, country, or continent.
Performing attacks in volatile memory ensures the attack will be optimized, and that all traces of the attack are destroyed if the system is turned off. Although this methodology includes the risk of losing escalated privileges, advanced attackers will ensure they have added system persistence in case they lose their presence because the systems were restarted.
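Because attackers edit or delete log entries, one defensive idea is to make the log tamper-evident by chaining each entry to the one before it with a hash. The sketch below is a minimal illustration using SHA-256 from the standard library; the function names and the sample log lines are assumptions, and it is not a substitute for forwarding logs to a separate, protected system.

```python
import hashlib

GENESIS = "0" * 64  # assumed starting value for the chain


def _digest(prev, entry):
    return hashlib.sha256((prev + entry).encode("utf-8")).hexdigest()


def chain(entries):
    """Return (entry, digest) pairs; each digest covers the entry and the previous digest."""
    prev = GENESIS
    chained = []
    for entry in entries:
        prev = _digest(prev, entry)
        chained.append((entry, prev))
    return chained


def first_tampered_index(chained):
    """Return the index of the first entry that breaks the chain, or None if intact."""
    prev = GENESIS
    for i, (entry, digest) in enumerate(chained):
        if _digest(prev, entry) != digest:
            return i
        prev = digest
    return None


# Made-up log lines for illustration.
log = chain(["login operator1", "setpoint changed", "logout operator1"])
```

Editing or deleting an entry breaks every digest from that point on. An attacker who can rewrite the whole file can also recompute the chain, so the digests only help if copies are kept somewhere the attacker cannot reach.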
Compromise
Once the control system environment is accessed, the attacker may have a number of options to reach their goal. Depending on the level of access the attacker has and which of the attributes of integrity, availability, and confidentiality are to be attacked, several activities can be performed.
Export the operator console screen: Once the attacker gets a graphic of the operator console screen (also referred to as the HMI), they know what kind of process they are dealing with and can plan the type of targeted attack they may launch.
Compromise a trusted partner’s network: An attacker may acquire a trusted connection that has minimal security. Therefore, the attacker could migrate to a partner (vendor, sister utility, etc.).
Change data in the database: Many problems may arise when the attacker changes historical data. Regulatory problems, impacts on how the control system is performing from a safety perspective, or even preventing the company from selling their product are all possibilities.
Insert commands in the application stream: Unexpected actions on the control system may result.
Change the operator’s display: The operator may take an action based on faulty display information. This is an important consequence because any undesirable kinetic activity in the control system would appear to have been caused by the operator (and not the attacker).
Copy data (set points, formulas, etc.): Corporate espionage or stealing data that could be used to craft or advance a targeted attack on other networks or other companies.
Change configuration in the controllers: This is similar to what Stuxnet did. It modified the controller instruction set (i.e., it changed the command set) used to control the field equipment. This type of attack not only changed how the field devices were performing, it also altered the information sent to the operator’s screen, thus reducing attack exposure and aiding in persistence of the attack.
This last point is important. One of the key differences between attacking IT domains and attacking control systems is that ICS environments are usually monitored by operators. They have an interface that shows the status of the system. Monitoring can be accomplished by visual inspection of graphical user interfaces that provide insight into what the system is doing and how it is performing. The interface also provides visual and audible alarms to the operator when the system begins to operate abnormally. It is expected that if there is a change in the system, it will manifest on the operator’s display screen. Unlike attacks on IT systems, attacks on ICS are not limited to just the technology running the process. They also include elements that hide the attack from the operator.
Application of Attack Methods
This section describes a series of attack methods, many of which intruders have launched against critical infrastructure sectors across the world. We will highlight:
Triton
BlackEnergy
Unauthorized Access Attacks
Database and Unauthorized Access
Operation Dragonfly
Man-in-the-Middle (MitM) Attack
BlackEnergy
BlackEnergy is a crimeware toolkit that has evolved significantly since it emerged in 2007. Recent versions of the toolkit use social engineering to trick a user into opening an email or a document attachment that drops a Trojan or infected legitimate executable file on the target computer, resulting in the installation of a malicious software component.
The malware can infect a system by exploiting a standard feature in Windows that elevates the user privilege of a system file. The elevated privilege essentially allows administrative privilege to a user.
This attack scenario is targeting various vendor-specific Human Machine Interface (HMI) products. Organizations with HMI systems directly connected to the Internet are the most susceptible. The malware is highly modular, and threat actors can customize it for deployment to each victim site. Once an HMI host becomes infected, a command and control component executes to perform additional attack operations and communicate back to the attacker with gathered data. In addition, the malware uses the new information to deliver additional attack modules to search out additional resources and new targets of opportunity within the network.
Operation Dragonfly
A campaign called Operation Dragonfly utilized a multi-pronged intrusion chain to establish a presence on the network, perform reconnaissance, and then establish a command-and-control capability to “phone home” to the intruder and allow them to launch other specific attack tools. It employed the “Havex” Remote Access Trojan (RAT) within targeted spear-phishing campaigns against industry asset owners, and it used a watering hole campaign to circumvent the normal practice of users accessing vendor resources.
The next stage in the intrusion involves the enumeration of the asset owners’ OPC servers, specifically targeting a vulnerability in the OPC Classic protocol. This provided the threat actor with the capability of performing deep discovery into the ICS, finding legacy equipment and software and using the information to support the development of additional intrusion vectors.
Over the past several years, more and more organizations have come to rely on the underlying services used in these environments, including Object Linking and Embedding (OLE), the Distributed Component Object Model (DCOM), and remote procedure call (RPC).
Vulnerabilities range from simple system enumeration and password vulnerabilities to more complex remote registry tampering and buffer overflow flaws. These vulnerabilities expose many ICS to critical risks such as the installation of undetected malware, denial-of-service (DoS) attacks, escalated privileges on a host, and/or the accidental shutdown of ICS because of an overload flaw.
Man-in-the-Middle Attack
ICS environments have traditionally been considered protected because they are in a completely separate environment from IT systems (“air gapped”). In ICS networks, asset owners often do not secure the data that flow between servers, resources, and devices because they assume the data reside on a “protected” network.
With more and more organizations connecting ICS networks with business systems, security issues arise from this assumed trust, including the ability for an attacker to reroute data in transit on a network, to capture and analyze critical traffic in plaintext format, or the ability to reverse engineer control protocols in order to gain command over control communications. By combining all these, an attacker could assume exceptionally high control over the data flowing in a network, and ultimately direct both real and “spoofed” traffic to network resources in support of the attacker’s desired outcome. To accomplish this, the attacker executes a Man-in-the-Middle (MitM) exploit.
In any environment, MitM exploits are exceptionally dangerous; in ICS networks, this mode of attack could be catastrophic. A MitM exploit can take advantage of common ICS vulnerabilities such as weak authentication protocols or poor integrity checking in firmware.
The management of addresses in a network, be it a control system or a business LAN, is critical to its effective operation. Address Resolution Protocol (ARP) maintains correct routing by mapping network addresses to physical machine addresses (MAC addresses). Using ARP tables in each of the network devices ensures that computers and other devices “know” how to route their traffic when requesting communication. Manipulation (or poisoning) of the ARP tables is the key goal of the threat actor, because poisoning the ARP tables can force the routing of all network traffic (including control traffic) through the computer the attacker has compromised. This forces all resources on the network to “talk” to the rogue device that the intruder is using instead of the proper machine or device without knowing they are communicating with the attacker. Moreover, the intruder can see, capture, replay, and inject data into the network and have it interpreted as though it were authorized and originating from a trusted source.
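One symptom of ARP poisoning is that a network address that used to map to one MAC address suddenly maps to another. The following is a minimal sketch of checking for that symptom. It assumes the (IP address, MAC address) observations have already been collected by some other means, and all addresses shown are hypothetical. A change is not proof of an attack (a failed device may have been replaced), so each result is a prompt for review rather than a verdict.

```python
def find_arp_conflicts(observations):
    """Return (ip, old_mac, new_mac) for every IP whose MAC address changes.

    `observations` is an iterable of (ip, mac) pairs in the order they were seen.
    """
    known = {}
    conflicts = []
    for ip, mac in observations:
        mac = mac.lower()  # compare MAC addresses case-insensitively
        previous = known.get(ip)
        if previous is not None and previous != mac:
            conflicts.append((ip, previous, mac))
        known[ip] = mac
    return conflicts


# Hypothetical observations: the HMI's address is later claimed by a second MAC.
observed = [
    ("192.168.10.5", "00:1A:2B:3C:4D:5E"),   # HMI
    ("192.168.10.20", "00:1A:2B:3C:4D:6F"),  # PLC
    ("192.168.10.5", "DE:AD:BE:EF:00:01"),   # same IP, different MAC
]
conflicts = find_arp_conflicts(observed)
```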
Assuming a threat actor has gained access onto the control systems network using any of the aforementioned attacks or others, they will use network reconnaissance to determine what resources are available on that network. If the goal of the intrusion is to gain access to and compromise the control domain, the intruder can capture (sniff) plaintext traffic and take it offline for analysis and review. This allows the intruder to view and re-engineer packet and payload content, modify the instruction set to accommodate the goal of the intrusion, and re-inject the new packet into the network.
By using ARP poisoning to collect traffic, the threat actor can establish and maintain control over the communications within the network. If the intruders choose, they can acquire and analyze unique control system protocols, and can see, capture, and manipulate control data. The time required to reverse engineer key control data and to manipulate that data for nefarious purposes can vary depending on the skill of the threat actor and the complexity of the data. By taking the data offline, the intruder is now able to work at a comfortable tempo.
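The capture, modify, and re-inject sequence works because the receiving device has no way to tell a genuine message from an altered or replayed one. The following is a minimal sketch of one general countermeasure: a message authentication code combined with a message counter. It does not describe any particular ICS protocol, and many legacy protocols have no such fields. The key, the message format, and the class name are assumptions made for illustration.

```python
import hashlib
import hmac

SHARED_KEY = b"example-key-not-for-production"  # hypothetical pre-shared key


def sign(counter: int, payload: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Return an HMAC-SHA256 tag over the counter and the payload."""
    message = counter.to_bytes(8, "big") + payload
    return hmac.new(key, message, hashlib.sha256).digest()


class Receiver:
    """Accepts a message only if its tag verifies and its counter is new."""

    def __init__(self, key: bytes = SHARED_KEY):
        self.key = key
        self.last_counter = -1

    def accept(self, counter: int, payload: bytes, tag: bytes) -> bool:
        expected = sign(counter, payload, self.key)
        if not hmac.compare_digest(expected, tag):
            return False  # modified or forged message
        if counter <= self.last_counter:
            return False  # replayed or stale message
        self.last_counter = counter
        return True


receiver = Receiver()
command = b"SET pump1 speed=40"
tag = sign(1, command)

first = receiver.accept(1, command, tag)                  # genuine message
replayed = receiver.accept(1, command, tag)               # same message sent again
altered = receiver.accept(2, b"SET pump1 speed=99", tag)  # payload changed in transit
```

Only the first message is accepted. The replayed copy fails the counter check, and the altered copy fails the tag check because the intruder does not hold the key.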
Summary
An attack life cycle is made up of actions including discovery, attack, and intrusion. No single attack methodology is used by all attackers. An attack is generally designed and customized around the goal the attacker has to accomplish, the capabilities the attacker has, and the accuracy of information about the system. The desired consequence of an attack on an IT system (compromising confidentiality) will differ from that of an attack on an ICS (compromising availability).
Attackers research their targets by looking for information anywhere they can find it and using any means they can to get it. Many tools are available that can assist attackers in learning about the target architecture, the vulnerabilities in the targets, and how to exploit those vulnerabilities. Vulnerabilities in ICS aren’t always in the control system applications, but can exist in the underlying operating systems or computers.
Successful exploitation of a vulnerability can result in system access, creation of back doors, and escalation of privileges (administrative or root); all of which allow attackers to maintain and enhance their presence on a system. An important step in a successful attack is the attacker’s ability to cover their tracks in order to remain undetected.
As ICS grow in complexity and connect to business and external networks, the number of potential security issues and their associated risks grows as well. The wide variety of attack vectors that target multiple resources on control systems can give rise to asynchronous attacks over an extended period of time and could target multiple weaknesses within a control system environment. Organizations cannot depend on a single countermeasure to mitigate all security issues. In order to effectively protect ICS from cyber-based attacks, organizations must apply multiple countermeasures—thus reducing risk using an aggregate of security mitigation techniques.
Remember, the better you can think like an adversary, the better you will be at designing security defenses specific to your system.
Mapping IT Defense-in-Depth Security Solutions to ICS
In defense of Industrial Control Systems (ICS), we work to mitigate vulnerabilities found across multiple sectors. Unfortunately, being able to declare your system properly secured against cyber threats requires more than just a few simple steps. Instead, successfully securing your ICS is a process that requires planning, designing, developing, implementing, and testing recognized security measures that eliminate or mitigate known vulnerabilities.
It also requires the development and enforcement of security policies and procedures, as well as an ongoing commitment of continuous review and improvement of our security infrastructure. In other words, it requires a security culture.
The same strategy used to secure your IT systems is recommended for securing your ICS. That strategy is called defense-in-depth.
Define defense-in-depth.
Create a baseline for defending an ICS.
Describe the security management layer of defense.
Describe the physical security layer of defense.
Security by obscurity
Before we delve into the details of defense-in-depth, let’s briefly discuss one security strategy that does not work: Security by Obscurity.
Some control system engineers and administrators feel their proprietary control systems are immune from cyber attacks. They think because their systems are unique, only highly-trained technicians who understand their systems’ obscure protocols, networks, or operating systems would have the skills to compromise their ICS.
This is not the case. Security by obscurity is a false sense of security. Typically, these systems were not designed with security in mind and are riddled with vulnerabilities. Furthermore, with the Internet, attackers have a wealth of information instantly available to them, even on rarely used protocols or operating systems.
Attackers can easily use this information to exploit these older systems. Even if open-source information is not available, the network protocols are not overly complicated, making it realistic to reverse engineer them. As asset owners replace their older systems with newer IT-based systems, they no longer have the pretense of hiding behind security by obscurity. The new systems use well-known and documented protocols, networks, databases, and operating systems.
Define defense-in-depth
Defense-in-Depth
It has been demonstrated time and time again that there is no silver bullet in cybersecurity. There is not just one magical thing you can do to secure your system and consider yourself safe. Rather, you need to develop multiple layers of defense, or “defense-in-depth.”
The Department of Homeland Security (DHS) document Recommended Practice: Improving Industrial Control System Cybersecurity with Defense-in-Depth Strategies defines defense-in-depth as: “a holistic approach, one that uses specific countermeasures implemented in layers to create an aggregated, risk-based security posture, helps to defend against cybersecurity threats and vulnerabilities that could affect these systems.”
No One Solution
Defense-in-depth was originally a military technique intended to delay, slow, and discourage an attacker. Using countermeasures to protect vital assets, the defense-in-depth strategy attempts to create specific protection mechanisms of identified areas that are known to be weak, vulnerable, and easy to attack.
For defense-in-depth to work, it must include people, technology and operations.
People must be trained and aware of the security environment and reduce at-risk behavior to an acceptable level.
Appropriate technologies should be implemented and adapted to particular needs of the system being protected.
How the operations work within the organization or group should be looked at to ensure at-risk operations are minimized.
If any of the three are left out, a gap will result that weakens the defense.
Layered Approach
Defense-in-depth is a layered approach to defending an ICS. It requires developing defenses for all systems and subsystems in an ICS, including security management, physical security, network security, hardware security, and software security.
Doing one without the others will offer some benefits, but the most synergetic solution will come from doing them all. There are five different layers:
Security Management
Physical Security
Network Security
Hardware Security
Software Security
Create a baseline for defending ICS
Plan the Work
A defensive plan is a process - not a single event. Constant improvement is required because attack vectors keep changing each time a vulnerability is discovered. The process includes the following:
Create or recognize your baseline and audit what you have. Documenting it will establish that baseline and allow you to show progress when you make changes.
Understand your policies. Policies are vital. They need to be written down and formalized. Too often we find companies that have “unwritten” policies.
Implement your policies in your ICS environment. Employees and contractors need to be formally trained as to what the policies are.
Identify your ICS network components. Try to identify all the devices on your network. Look for forgotten devices, such as modems that were used during commissioning, left in place (just in case), and then forgotten.
Compare what you identified to the baseline. Use change control to update your network maps. Identify improvements to your network. Look for the vulnerabilities in your system, what needs to be upgraded, what needs to be patched, what changes need to be made to policies, etc.
Prioritize and implement those changes.
The final step is to go back to the beginning and start all over again, re-establishing your baseline.
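The “compare what you identified to the baseline” step can be sketched as a comparison of two inventories. This is a minimal sketch, assuming each inventory is a simple mapping from device address to a description. The addresses, descriptions, and function name are hypothetical, and a real inventory would carry more detail.

```python
def compare_to_baseline(baseline: dict, discovered: dict) -> dict:
    """Compare two inventories keyed by device address."""
    baseline_keys, discovered_keys = set(baseline), set(discovered)
    return {
        # On the network but not in the documented baseline.
        "unexpected": sorted(discovered_keys - baseline_keys),
        # In the baseline but not found on the network.
        "missing": sorted(baseline_keys - discovered_keys),
        # Present in both, but the recorded description differs.
        "changed": sorted(
            key for key in baseline_keys & discovered_keys
            if baseline[key] != discovered[key]
        ),
    }


baseline = {
    "10.0.0.10": "HMI workstation",
    "10.0.0.20": "PLC, firmware 2.1",
    "10.0.0.30": "Historian",
}
discovered = {
    "10.0.0.10": "HMI workstation",
    "10.0.0.20": "PLC, firmware 2.3",
    "10.0.0.99": "Dial-up modem",
}
report = compare_to_baseline(baseline, discovered)
```

Here the forgotten modem shows up as unexpected, the historian as missing, and the PLC as changed. Each difference should either be corrected or recorded through change control before the baseline is re-established.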
Baseline
A great place to start to develop your baseline is with a software tool that the Department of Homeland Security (DHS) has developed, called the Cyber Security Evaluation Tool (CSET®).
DHS wanted to have some way to assess critical infrastructure and the security posture of the control systems at critical infrastructure facilities. This tool was envisioned as the cornerstone of building that capability. The intent is to provide an evaluation that could be used as a baseline from one year to the next to evaluate if security has improved. The basis for the CSET® tool is derived from currently recognized cybersecurity standards. All requirements in the tool link back to an identified standard. The intent is to collect information that is already available together in one place as a foundation, and then build on it as new standards are created and old ones enhanced.
The tool uses, for example, NIST, NERC CIP, DoD, and international standards. Although not all included standards are directly related to control systems (e.g., NIST SP 800-53, CNSSI 1253), many facilities are required to comply with them, and thus, they are included in the CSET® tool.
CSET
The CSET® tool is designed to assist in evaluating your control system network. The tool can provide solutions, recommendations, and best practices, as well as provide focus in specific standards and components. For example, the updated CSET® solution library contains firewall configuration whitepapers. The CSET® tool provides a baseline security posture that allows comparison between years. However, the tool cannot scan networks and provide a detailed topology analysis.
As with any tool, there are limitations as to what CSET® can do. The most significant limitation of the tool is that it’s only as good as the responses that go into it. The tool cannot fix the gaps it identifies. It is the responsibility of the users to implement solutions.
CSET® will also not indicate whether your answers are correct or incorrect. Nor will it verify compliance with security policies and procedures, or ensure correct implementation of security enhancements.
The Results dashboard displays four charts:
The Assessment Compliance chart indicates the assessment’s overall score, standards-based score, and components-based score.
The Top Ranked Categories chart lists the six highest priority categories based on answers provided during the assessment.
The Standards and Component Summary charts depict how questions were answered by Standard and Diagram Components.
Security Management Layer
Security Management
There is sometimes a mindset that “throwing technology” at ICS vulnerabilities is the solution for reducing ICS risk. While technology is certainly an important component for improving ICS cybersecurity, technology alone will not solve the problem.
Improving cybersecurity requires a sustained effort by management and staff to diligently apply sound cybersecurity practices during the entire life cycle of the ICS - from design, implementation, and operation, to the retirement of the ICS.
This requires establishing a cybersecurity culture in the organization and recognizing that people are crucial in defending and protecting ICS from a cyber attack. It also requires shifting from a reactive mode, which is dealing with cybersecurity issues after they occur, to a proactive mode, which tries to prevent and minimize the impact of a cyber attack before it happens.
Security Model
As part of becoming proactive, the asset owner will need to develop a plan for ICS cybersecurity. The graphic below shows a good example of a security model that can serve as a template for developing such a plan. If you look closely at the model, you will see that it is comprised of several key security strategies. The model depicts security as a process, not a product or even a project. Once again, ICS cybersecurity requires a sustained effort by an organization’s staff.
Map Architecture: The first place to start is mapping the ICS architectures. Having an accurate and well-documented architecture can help an organization deploy effective security countermeasures. Engineers and technicians cannot protect the system if they do not have a good understanding of the architecture and key system components. The documentation from mapping the architecture will be crucial when developing the risk assessment.
Risk Assessment: A process of identifying the threat, determining the likelihood and impact of the threat, and identifying the vulnerabilities that could be exploited by the threat. An effective risk assessment involves learning as much about the system, its threats, vulnerabilities, and impacts as possible.
Digital Asset ID: Identifying digital assets is more than tracking the physical components of the ICS. It also includes assigning values to individual ICS components. For instance, a programmable logic controller (PLC) that monitors and controls a critical pump has a greater value than a PLC that monitors a noncritical tank.
Profile Model: Defines what type of protection is needed for each ICS component. This indicates the level of effort needed to secure a device. Coupling this information with the component’s digital asset value provides an effective method for determining the priorities when allocating limited resources.
Identify / Remove Vulnerabilities: With all components identified, the organization has a better understanding of their specific threats, vulnerabilities, and consequences, and has established priorities. It is now time to remove or mitigate the vulnerabilities found in the hardware, software, and network.
Standardize Policies: Following the mitigation of vulnerabilities, policies and procedures must be developed and enforced to ensure future modifications or newly discovered vulnerabilities or threats do not degrade the system security. Policies are directives, regulations, rules, and practices on how an organization will protect its cyber resources. Procedures provide the details or steps for achieving the goals outlined in the policies.
Incident Response: Planning should also include details on how the organization will respond to an incident if attacked. This means developing the policies and procedures for responding to an attack. This also means having the tools in place for detecting an attack, conducting forensics, troubleshooting, and restoring the system. All stakeholders should have a clear understanding on what they need to do and who they need to notify if attacked. This also includes state and federal agencies.
Training: Ongoing cybersecurity training is essential for protecting an ICS. Users need to understand how their behaviors and actions increase or decrease cyber risk. They should be fully trained on the cybersecurity policies and procedures they must follow when using the system. System administrators and network engineers also need continuous training to keep current with the latest best practices for securing their ICS.
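The idea of coupling a component’s digital asset value with the findings of the risk assessment to set priorities can be sketched as a simple ranking. This is a minimal sketch only. The 1-to-5 scales, the multiplication rule, and the component names are assumptions chosen for illustration; an organization would substitute its own scoring method.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    asset_value: int  # 1 (low) to 5 (critical), assigned during digital asset ID
    likelihood: int   # 1 (unlikely) to 5 (expected), from the risk assessment

    @property
    def priority(self) -> int:
        # Assumed scoring rule: asset value multiplied by likelihood.
        return self.asset_value * self.likelihood


def rank(components):
    """Highest priority first; ties keep their original order."""
    return sorted(components, key=lambda c: c.priority, reverse=True)


components = [
    Component("PLC, noncritical tank", asset_value=2, likelihood=3),
    Component("PLC, critical pump", asset_value=5, likelihood=3),
    Component("Engineering workstation", asset_value=4, likelihood=4),
]
ordered = [c.name for c in rank(components)]
```

With these example values the engineering workstation ranks first, followed by the critical pump PLC and then the noncritical tank PLC.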
Security Policy
Policies and procedures form the foundation for a solid ICS cyber defense and are essential parts of a defense-in-depth strategy. Making it harder to gain access includes establishing the correct policies based on the platform you are using and the organizational structure you have. For example, users should know how organizational policies determine the setup of network domains and how those domains are managed by Active Directory. The following is a summary of the benefits of developing, maintaining, and enforcing policies and procedures:
Establishes expectations for staff, vendors, and contractors on how they should interact with the system
Codifies enforcement of policy breaches
Provides order and organization for routine and nonroutine events
Prevents “reinventing the wheel” when modifying or updating the system
Documents best practices that have been developed over time
Reduces panic and confusion during a crisis
Decreases the time to restore the system following a crippling event
Improves the overall cybersecurity and reliability of an ICS.
While policies and procedures have been widely adopted by IT organizations, ICS groups have been slow to develop policies and procedures for their ICS. There are many reasons for this, including a lack of resources, not knowing where to begin, plus a feeling policies and procedures are too restrictive and bureaucratic.
Recommendations
It is recommended that policies and procedures cover a variety of topics, such as change management, access control, malware prevention, and incident response. Other recommended topics include acceptable use policy, Internet policies and practices, and hiring qualifications (verify background), which are described below.
Acceptable use policy: Acceptable use policies define what users and administrators are allowed and not allowed to do with the computers and networks. They also define penalties or sanctions if the terms of the policy are violated. Typically, a user must sign the document before they are allowed access to the company’s networks.
Internet policies and practices: Policies and practices for accessing the Internet from an ICS should be short and sweet: “No accessing the Internet from an ICS.” The risk of downloading malware from the Internet is too great to let operators browse the Internet from an HMI workstation. If they need access to the Internet, a separate computer on a separate network should be provided specifically for this purpose.
Hiring qualifications: Organizations should have procedures for verifying backgrounds of candidates for sensitive positions, such as control system engineers and system administrators. Background checks should verify a candidate’s technical qualifications and aptitude for securing an ICS. They should also answer questions regarding the candidate’s character, such as whether they have affiliations with hacking organizations or terrorist groups, and whether they are honest and dependable.
Acceptable use policy, Internet policies and practices, and hiring qualifications are just a sample of the policies and procedures that an organization should consider for ensuring the security of its ICS.
Other topics should be considered as well, including policies and procedures for remote access, third-party support, and software backups. Rather than developing the policies from scratch, you may want to look at CSET®. It has several templates for generating ICS cybersecurity policies and procedures.
Do not make the policies so complex or onerous that users look for ways to bypass or circumvent the measure that the policy was trying to enforce. For example, forcing users to change a 16-character complex password every week is not a good idea.
Change Management
ICS are not static systems. They are often changed or modified once they are installed. Depending on the change and how the changes are implemented, they can either enhance or reduce the overall cybersecurity of the ICS. Changes are driven by a variety of factors.
Process equipment modification and additions: Examples include a new tank added to increase the storage capability of refined product, or an old, unreliable recirculation pump that is replaced with a newer model.
Technical obsolescence of ICS components: Examples include a vendor discontinued making a remote terminal unit (RTU) that was widely used by the asset owner, or an ICS software application that is no longer supported by the vendor because they went out of business.
Operation changes: An example is dam operations lowering the minimum operating levels on their lakes due to drought conditions.
Maintenance efficiencies: An example is when technicians need reports on pump run times to trigger preventative maintenance.
Security vulnerabilities: An example is when a buffer overflow is discovered in a third-party application that an attacker could exploit for malicious intent.
Software fixes: An example is when an intermittent software glitch, which causes the ICS computer to reboot, is fixed.
Slow response times: An example is when, as more PLCs are added to the system to monitor and control several new facilities, system response starts to slow.
ICS product enhancements: An example is a new version of an ICS application that provides trend and alarm upgrades that enhance the operator’s ability to control the process.
New regulatory requirement: An example is a state mandate that requires water utilities to collect turbidity data every 10 minutes for 3 years.
Common ICS modifications
Process equipment changes can have a significant impact on ICS. Database and alarm points need to be configured, graphic displays and trends need to be built, and RTU or PLC programs must be written to control and monitor the new process. If these activities are not properly completed and tested, they could impact the operator’s ability to control the new, as well as existing, processes. Even simple operational changes often require ICS configuration modifications. For example, a chemical company lowers the operating temperatures of a process to reduce energy cost. As a result, they need to lower the Hi- and Hi/Hi-temperature alarms and possibly modify the PLC program.
ICS vendors release patches to fix software problems or provide upgrades to extend the functionality of their product. If these patches or upgrades are not tested on our system, they could introduce new cyber vulnerabilities or “break” existing software. PLC and RTU vendors issue firmware upgrades to fix problems. Here again, if the firmware is not fully tested, it could cause compatibility problems with existing software and/or introduce cyber vulnerabilities.
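The need to lower the alarm limits along with the operating temperature can be illustrated with a short sketch. The limit values, the reading, and the function names below are hypothetical, and a real system would configure alarms in the ICS itself rather than in a script.

```python
def validate_limits(hi: float, hihi: float) -> None:
    """Reject a configuration where the Hi/Hi limit is not above the Hi limit."""
    if hihi <= hi:
        raise ValueError("Hi/Hi limit must be greater than the Hi limit")


def alarm_state(value: float, hi: float, hihi: float) -> str:
    """Classify a reading against the Hi and Hi/Hi limits."""
    validate_limits(hi, hihi)
    if value >= hihi:
        return "HIHI"
    if value >= hi:
        return "HI"
    return "NORMAL"


# Hypothetical limits before and after the operating temperature is lowered.
old_limits = {"hi": 180.0, "hihi": 195.0}
new_limits = {"hi": 165.0, "hihi": 180.0}

reading = 170.0  # abnormally warm for the new, cooler process
before = alarm_state(reading, **old_limits)
after = alarm_state(reading, **new_limits)
```

With the old limits still in place, the reading is reported as normal and the operator gets no warning. With the lowered limits, the same reading raises a Hi alarm.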
Third-party software, such as operating systems, databases, office applications, PDF readers, and antivirus software, is used extensively in ICS. Like ICS vendors, these third-party vendors also release patches and upgrades for their products. Patching these applications, especially security patches for operating systems, can significantly reduce ICS vulnerabilities, but they also run the risk of breaking existing software.
Most ICS vendors recommend that users not install operating system patches until after the vendor has tested the patches for compatibility with their ICS software. For example, most vendors recommend that for Windows systems, “Automatic Updates” be turned off.
Network equipment, such as routers and firewalls, may require configuration changes. Firewall rules need to be modified, and routers may need tuning.
In summary, ICS are not static, and there are many components and subcomponents in ICS that are changed during the ICS life cycle. If these changes are not carefully managed, they could introduce vulnerabilities or compatibility issues with existing software. However, avoiding system changes, especially the application of security patches, is not a good strategy for preventing vulnerabilities. If an owner does not apply security patches, the system will become more vulnerable over time as attackers find new methods for exploiting systems.
Change Control
Change control, in addition to policies and procedures, can help avoid some of the pitfalls of modifying ICS. Changes should be managed as a “mini-project,” which includes some, if not all, of the steps of good project management (scope, definition, design, approval, test, implementation, and documentation). Steps for recurring changes should be documented in procedures.
All too often, changes are made to ICS “on-the-fly” without peer review and/or managerial authorization. Furthermore, the changes are neither documented nor adequately tested. While this approach is fast, it can lead to short-term problems if the changes are not compatible with existing software, and long-term problems with ongoing maintenance. It will also make it more difficult to identify unauthorized changes, perform forensics, or recover the system if it is attacked.
Following a documented change control process can prevent many problems. Depending on the modification, the procedure should specify what cybersecurity impact, if any, the change will have on the system, as well as the roles and responsibilities of those authorized to make and approve the change. Furthermore, the procedure should discuss how the change is implemented, tested, and documented.
These procedures should not be so bureaucratic that they significantly slow down “real work,” but they should stress the importance of properly documenting the reason for the change and what changes were made.
Example: Changing an Alarm Limit
Scope: A process engineer was tasked with reducing energy use at the chemical plant.
Design: They determined the temperature for a given process could be safely lowered without impacting product quality. This effort would require a change to the operating procedures and to the Hi- and Hi/Hi- temperature alarm limits in the ICS.
Approval: The process engineer and operations manager approved the change.
Implementation: The process engineer trained operations staff on the new operating procedure and worked with the control system engineer to modify the alarm limits.
Testing: The procedure was successfully tested with a few minor modifications.
Documentation: Documentation was developed that explained the history and reason for the alarm limit change.
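The steps in this example can be captured in a simple change record. The sketch below is illustrative only: the ChangeRecord class, its field names, and the idea of listing steps that are still blank are assumptions made for this example, not features of any particular change control product.

```python
from dataclasses import dataclass, fields


@dataclass
class ChangeRecord:
    """Hypothetical record mirroring the steps in the alarm limit example."""
    scope: str = ""
    design: str = ""
    approval: str = ""
    implementation: str = ""
    testing: str = ""
    documentation: str = ""

    def missing_steps(self):
        """Return the names of steps that have not been filled in yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def is_complete(self):
        """A change is ready to close only when every step is recorded."""
        return not self.missing_steps()


record = ChangeRecord(
    scope="Reduce energy use at the chemical plant",
    design="Lower process temperature; update Hi and Hi/Hi alarm limits",
    approval="Process engineer and operations manager",
)
print(record.missing_steps())
```

A record like this makes it easy to see which steps were skipped when a change was made “on-the-fly.”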
Access Control
Asset owners of ICS should also develop procedures for access control. Access control includes two important concepts: authentication and authorization. Authentication verifies that the person or application is who they say they are, and authorization tells the system what limitations or privileges the user or application has while using computer resources.
Many of the newer ICS are adopting IT directory services, such as Microsoft’s Active Directory. Directory services are great tools for helping system administrators manage user authentication and authorizations, especially on larger systems. They are used to grant access to a system and its resources, manage user IDs, enforce complex passwords and password expirations, and grant and limit privileges from a central location.
Least Privileges is an IT strategy that should be adopted for enhancing ICS cybersecurity. System administrators should grant users only the rights and privileges necessary for monitoring and controlling the process. If users have the responsibility to perform backups or other system administration functions, they should be given a separate user ID for performing these tasks.
This concept also applies to system administrators. Normally, a system administrator uses an account that has the least privileges necessary for supporting the system. If the system administrator needs to perform a task that requires a higher privilege such as installing new software, they would log on with a separate account that grants them the rights to make those changes. Once this task has been completed, they should log off. Privileges should only be escalated when necessary.
Running systems with least privileges prevents malware from accessing system-level functions that it could use to exploit the system. It also limits the damage from mistakes, such as an administrator accidentally running a “dangerous command,” because an account that uses least privileges lacks the rights to carry it out.
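As a minimal sketch of least privileges, the example below maps each account type to the smallest set of actions it needs. The role names and action names are hypothetical and chosen only for illustration; a real system would typically rely on its directory service for this.

```python
# Hypothetical roles and actions; each role holds only what it needs.
ROLE_PRIVILEGES = {
    "operator": {"view_process", "acknowledge_alarm", "change_setpoint"},
    "backup_admin": {"run_backup"},
    "sysadmin_elevated": {"install_software", "manage_accounts"},
}


def is_allowed(role, action):
    """Allow an action only if it was explicitly granted to the role."""
    return action in ROLE_PRIVILEGES.get(role, set())


print(is_allowed("operator", "change_setpoint"))
print(is_allowed("operator", "install_software"))
```

An operator who also performs backups would log on with the separate backup account rather than having backup rights added to the operator role.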
Multiple Layers
User ID and passwords are the typical tokens for gaining access to computer resources. Unfortunately, short or dictionary word passwords are easily “cracked” by password recovery tools such as “Cain & Abel.” Strong or complex passwords make this process more difficult.
Ensuring passwords are protected against a brute force attack is similar to using a safe. Every safe is rated for how long it would take a skilled technician to break in. A good security routine ensures that a guard checks the safe within that time frame. Passwords are no different. A password, depending on the complexity allowed, will take a certain amount of time to crack. Good password policies require changing the password before that time limit is reached. This reduces the risk of having the password cracked.
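The relationship between complexity and cracking time can be estimated with simple arithmetic: the number of possible passwords is the character set size raised to the password length, and dividing by the attacker's guess rate gives the worst-case time. The sketch below assumes a rate of one billion guesses per second, which is purely an illustrative figure; real rates vary widely with the hashing scheme and hardware.

```python
def worst_case_crack_seconds(charset_size, length, guesses_per_second):
    """Time needed to try every possible password of the given length."""
    return charset_size ** length / guesses_per_second


RATE = 1e9  # assumed guesses per second, for illustration only

# 8 lowercase letters versus 8 characters drawn from all 94 printable
# ASCII characters (26 upper + 26 lower + 10 digits + 32 symbols).
lower_only = worst_case_crack_seconds(26, 8, RATE)
all_printable = worst_case_crack_seconds(94, 8, RATE)

print(f"lowercase only: {lower_only / 60:.1f} minutes")
print(f"all printable:  {all_printable / 86400:.1f} days")
```

Under this assumed rate, the lowercase-only password falls in a few minutes while the mixed-character password takes on the order of months, which is why the allowed complexity determines how often a password should be changed.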
The following are characteristics of a complex password:
At least 8 characters
Contains characters from at least 3 of the following 4 categories: uppercase (A-Z), lowercase (a-z), numbers (0-9), and special characters (!@#$%^&*, etc.)
Does not contain a proper name, login ID, email address, initials, or first, middle, or last name
Is not a word found in a dictionary
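The characteristics above can be expressed as a simple check. This is a sketch under stated assumptions: the function name, the banned_terms parameter (standing in for names, login IDs, and similar personal strings), and the tiny dictionary set are all hypothetical, and a production system would normally enforce these rules through its directory service.

```python
import string


def meets_complexity(password, banned_terms=(), dictionary=frozenset()):
    """Check a password against the complexity characteristics listed above."""
    categories = [
        any(c in string.ascii_uppercase for c in password),
        any(c in string.ascii_lowercase for c in password),
        any(c in string.digits for c in password),
        any(c in string.punctuation for c in password),
    ]
    lowered = password.lower()
    return (
        len(password) >= 8
        and sum(categories) >= 3  # at least 3 of the 4 categories
        and not any(term.lower() in lowered for term in banned_terms if term)
        and lowered not in dictionary
    )


WORDS = frozenset({"password", "sunshine"})  # stand-in for a real word list

print(meets_complexity("Tr1ck-yPump", dictionary=WORDS))
print(meets_complexity("password", dictionary=WORDS))
print(meets_complexity("Jsmith2024!", banned_terms=("jsmith",)))
```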
Authentication
There are more secure methods for authenticating than a user ID and password. For instance, a FireID is an administrator account that requires two separate passwords for two different people. Both passwords need to be entered to make any changes to the system.
Two-factor authentication also provides a more secure process for authenticating. This method of access control requires that the person provide at least two of the following three tokens before they gain access to the system:
“Something they know” such as a password
“Something they are” such as a retinal or thumb print scan
“Something they have” such as a key or card.
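The key point is that the two tokens must come from different categories; two passwords are still one factor. The sketch below illustrates this with a hypothetical mapping of token types to categories (the "pin" entry is an added example, not from the list above).

```python
# Hypothetical mapping of token types to the three categories above.
FACTOR_CATEGORIES = {
    "password": "know",
    "pin": "know",
    "retinal_scan": "are",
    "thumbprint": "are",
    "key": "have",
    "card": "have",
}


def is_two_factor(tokens):
    """True when the presented tokens span at least two distinct categories."""
    categories = {FACTOR_CATEGORIES[t] for t in tokens if t in FACTOR_CATEGORIES}
    return len(categories) >= 2


print(is_two_factor(["password", "card"]))
print(is_two_factor(["password", "pin"]))
```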
In addition to the topics we just highlighted, policies and procedures should also define who approves access to the system, who creates and maintains user accounts, and who audits user accounts.
Malware Protection
Many large companies have specialized IT groups dedicated to securing their IT infrastructures. These groups are responsible for patching servers and workstations, detecting viruses, locking down servers and workstations by eliminating unnecessary services and applications, monitoring and storing logs, and creating and maintaining workstation and server images.
These same tasks are also essential for securing ICS. However, because of limited resources or lack of knowledge, many of these security functions are not performed, rarely performed, or poorly performed on ICS.
This is where policy and procedures can help. The policies and procedures should stipulate who is responsible for performing the task (patching, virus scanning, etc.), what they need to do, when or how frequently they should perform the task, where the equipment is located, why do it, and how to do it. This approach will significantly help asset owners secure their ICS by adopting practices that IT uses to secure their networks.
Incident Response
Cyber incident response is the way in which an organization responds to a perceived cyber-related incident that may impact ICS owner assets or their ability to operate. An incorrect response may result in chaotic and reactionary actions that are ineffective or increase damage. Every organization should strive for a smooth, planned response with minimal impact to a company’s operations. Accomplishing this will require plans and procedures that are in place and tested before a cyber incident occurs.
The five phases of incident response are shown below.
Preparation: The preparation phase is proactive. This phase includes building a team, planning a response strategy, documenting that strategy, training the team, and gathering intelligence in preparation for an incident taking place.
Identification: Identification starts when an incident is detected. This phase may be difficult and could require the use of forensic tools. Having trained staff and a documented process for saving data will help during this phase.
Containment: Containment will start as soon as the problem has been identified. Ensure you stop information from leaving your network and malware from spreading. Identify ways to quarantine infected computers, allowing the computer to keep the forensics data, but not infect others.
Clean-up and Recovery: Depending on the severity of the incident, the clean-up and recovery phase may be the most difficult phase. In this phase, remediation, intrusion clean-up, and system recovery take place. Having backups could play an important role in the time it takes to fully recover.
Follow-up: The follow-up phase includes discussing lessons learned with the team, updating documentation, and implementing any necessary security initiatives as a result of the incident.
Testing
Although it may be inconvenient and disruptive to plan for, conduct, and evaluate the results from an incident response drill, it is essential considering the stakes involved. Even the best response plan cannot anticipate all the obstacles that will be faced when a real incident happens, nor can it anticipate how people will react to unforeseen situations. The people who were expected to be available and fill certain roles may be inaccessible. New people may have replaced previously trained workers. Unanticipated events may occur where decisions need to be made with little or no time for analysis.
Many problems that may occur in a real incident may also be present in a test exercise or drill. This means an opportunity is available to review, analyze, and change procedures without suffering the effects of catastrophic decisions or lost production. This is only true, however, if the plan is tested in an environment that closely replicates the production system.
The exercise should mimic real-world conditions as much as is practically possible in order to discover weaknesses in the incident response plan. The closer the exercise is to the actual circumstances of the operating environment, the more problems will be found and resolved before a real event occurs. Actual equipment (test environment) should be used if possible in order to gain accurate insight into how the incident response plan plays out. This may mean working with a vendor to provide temporary equipment specifically for the exercise.
The more realistic the test, the more planning must be done to ensure the test does not disrupt normal processes. As a result, the cost for a more rigorous test increases. The following are types of exercises used to test an incident response plan:
Structured walk-through: This type of exercise is also referred to as a table-top exercise. Exercises are typically conducted in a conference room without touching the ICS. The facilitator will pose a series of problems or scenarios called “injects” for participants to discuss. The scenarios are designed to have participants walk through how they would handle a cyber attack. Where possible, they would use their procedures to manage the attack. This helps the response team identify and fix any deficiencies with the procedure and incident response planning.
Simulation test: An actual cyber attack is launched against a test ICS to see how well staff can detect, contain, remediate, and restore the system.
Full Interruption Test: Cyber researchers run penetration tests on the production ICS to discover vulnerabilities. This can pose significant risks on a live system if the team mishandles the exercise. However, it does provide a realistic exercise and helps researchers identify actual system vulnerabilities.
Forensics
Tools and techniques for executing cyber forensics, which is a post analysis of a cyber incident, are common on modern IT systems. IT networks, through data exchange mechanisms, data storage devices, and general computing components provide the infrastructure for supporting effective cyber forensics.
However, adapting an IT cyber forensics program to an ICS can be a challenge. ICS are not easily configurable to accommodate forensics programs. Nonstandard protocols, legacy architectures, and proprietary technologies make the creation and implementation of ICS cyber forensic programs difficult. The field controllers seldom have logging capability, and if they do, these features are often disabled or there is insufficient memory for storing all crucial events for diagnosing problems.
Oftentimes, the owner or operations staff do not have the skills to collect and analyze data from an ICS cyber incident. Instead, they rely on the vendor/integrator for support. This can delay the analysis, which may result in the loss of critical logs or context needed to analyze the event. Although the systems can alert operators of anomalous behavior, the end user still needs the technical skills to interpret and correlate the data in a timely fashion. Countermeasures can be implemented for any and all of these issues, but the cost associated with making these changes to the control systems is generally too high in both time and testing.
Insufficient or complete lack of logging capabilities: Many ICS components do not have an effective method for collecting data for post incident security analysis; and if they do, many of these features are disabled. However, central logging, which is discussed in Part II, can mitigate some of these issues.
Volatile vs. Non-volatile data storage: Data in volatile memory are lost when the power goes off. However, this type of memory is also fast, which is why it is used to store real-time parameters as well as other data in ICS. Unfortunately, these data are unavailable for forensics analysis when servers, workstations, and PLCs are rebooted, as is often the case following a cyber incident. By contrast, nonvolatile storage, such as hard drives and flash memory, retains data for analysis following a power loss.
Access to impacted resources: It is sometimes difficult to take an ICS component out of service for forensic analysis because it is controlling a critical process.
Time synchronization: For the ICS architectures that do use modern cybersecurity procedures and technologies (firewalls, IDS, IPS, etc.), the forensic data collected by these systems cannot always be correlated with device controller and ICS logs because the time stamps may not be synchronized.
In-house expertise: Post incident analysis is often dependent on the vendor, and any lessons learned are not incorporated into a defense-in-depth strategy.
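The time synchronization problem described above can be illustrated with a short sketch. It assumes the clock offset of a controller relative to a reference clock is already known (here, a hypothetical 120 seconds fast); the event data, function names, and 5-second matching window are all made up for the example.

```python
from datetime import datetime, timedelta


def to_reference_time(timestamp, clock_offset_seconds):
    """Convert a device timestamp to reference time using a measured offset."""
    return timestamp - timedelta(seconds=clock_offset_seconds)


def correlate(events_a, events_b, window_seconds=5):
    """Pair events whose timestamps fall within the window of each other."""
    window = timedelta(seconds=window_seconds)
    return [(a, b) for a in events_a for b in events_b
            if abs(a[0] - b[0]) <= window]


firewall_events = [(datetime(2024, 1, 15, 12, 0, 3), "blocked connection attempt")]
plc_raw = [(datetime(2024, 1, 15, 12, 2, 1), "controller mode changed")]

PLC_OFFSET = 120  # assumed: PLC clock runs 120 seconds ahead of reference
plc_events = [(to_reference_time(t, PLC_OFFSET), msg) for t, msg in plc_raw]

print(len(correlate(firewall_events, plc_raw)))     # uncorrected timestamps
print(len(correlate(firewall_events, plc_events)))  # corrected timestamps
```

Without the correction, the two related events appear almost two minutes apart and are not matched; with it, they line up within the window.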
Example
The Sentient Hyper-Optimized Data Access Network, or SHODAN, is a search engine that catalogs Internet-facing devices, including control systems. It is a great tool for performing reconnaissance. It provides information an attacker would find useful, including ports, host name, country, server operating system, server version, and more.
In February 2011, a researcher was able to identify and easily access an ICS using information gleaned from SHODAN. Fortunately, there was minimal impact to business functionality because the researcher only “looked around.” The researcher reported the vulnerability, and ICS-CERT worked with the asset owner to secure the system.
Example Summary
Summary of recommendations:
Place all control systems assets behind a firewall, separated from the business network
Deploy secure remote access methods such as Virtual Private Networks (VPNs) for remote access
Remove, disable, or rename any default system accounts (where possible)
Implement account lockout policies to reduce the risk from brute force attempts
Implement policies requiring the use of strong passwords
Monitor the creation of administrator level accounts by third party vendors.
Physical Security Layer
Physical Access
Physical security is the second layer of defense. It refers to preventing unwanted physical access or theft of ICS components, such as servers, workstations, laptops, networking equipment, portable media, and device controllers.
Physical security is an important defense-in-depth layer, because an attacker can compromise a system with less effort if they have physical access to the components.
Physical security includes, but is not limited to, restricting and limiting access to buildings, off-site backup centers, and server rooms. For example, if its hard drive is not encrypted, a stolen laptop can be booted from a disc and the data copied without breaking into the laptop. The supporting infrastructure, such as electrical power feeds and communications circuits, should also be protected. An attacker who gains access to the ICS power switches can launch a denial-of-service attack by simply shutting off the power to its critical servers.
Second Layer
Many IT departments do a good job of guarding, locking, and monitoring access to servers and core services. Instead of physical keys, card readers and, in some cases, two-factor authentication (cards and identification) are used to restrict physical access to the corporate IT infrastructure. This makes it easier to revoke and control access (potentially remotely and centrally). Also, many IT functions and core services are in smaller geographical spaces, making physical security easier to manage.
This approach to physical security is not always applied to protecting control systems. While many organizations lock their ICS servers in a room with restricted access, there are many examples where ICS servers are located under a desk in an unsecured control room. In addition, industrial controllers are often dispersed throughout a facility, making it more difficult to control physical access to them. Policies and procedures should be developed for ICS physical security. The policies and procedures should stipulate who has physical access to ICS components and who is responsible for their protection, what systems should be protected, when physical access should be granted, where the components should be located, why these systems should be protected, and how best to secure these components.
Summary till now
Defense-in-depth. A layered approach to defending an ICS. It requires developing defenses for all systems and subsystems in an ICS.
Create and document a baseline. Understand and implement your policies. Identify your ICS network components and compare them to your initial baseline. Use change control to update your network maps. Go back and start all over again.
Physical security. Physical security includes, but is not limited to, restricting and limiting access to buildings, off-site backup centers, and server rooms.
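The baseline comparison mentioned in this summary can be sketched as a simple set comparison. The device names below are hypothetical, and the sketch assumes both the documented baseline and the currently discovered components are available as lists of identifiers.

```python
def compare_to_baseline(baseline, discovered):
    """Return devices that are new and devices that have gone missing."""
    return {
        "unexpected": sorted(set(discovered) - set(baseline)),
        "missing": sorted(set(baseline) - set(discovered)),
    }


baseline = ["hmi-01", "plc-01", "plc-02", "historian-01"]   # documented
discovered = ["hmi-01", "plc-01", "historian-01", "laptop-77"]  # found today

print(compare_to_baseline(baseline, discovered))
```

Anything reported as unexpected or missing should either be explained by a documented change or investigated.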
Network Security Layer
Network security is the third layer of defense in our model. Networks are essential for sharing data and resources. Networks also provide an opportunity for reconnaissance, as well as a pathway for attackers without physical access to deliver and install malware. Network security can reduce the risk of these types of network attacks, but it depends on segmenting the network with well-established security boundaries.
Purdue Model
So how do we begin? To answer this question, we should first discuss the Purdue Enterprise Reference Architecture (PERA) model, or Purdue model for short. This industry-adopted model can be used as a concept to divide your network into different levels, allowing you to lock down communications between the levels to essential traffic. Cyber risk is significantly reduced by defining critical network paths and only allowing specific data to pass through these network interfaces. This prevents noncritical network traffic from entering the control domain.
One of the greatest advantages of using the Purdue model is that organizations can create data pathways into and out of specific levels, and create appropriate policies to secure that data transfer. Using this model, network architects group network devices into levels based on physical location and risk. Network architects then apply proper mitigations based on the associated risk for the devices within each level.
For example, a company would dedicate more resources to defending a high-risk level than to a lower-risk level. The Purdue model maps consequence to countermeasure, because each level will have an appropriate and well-defined risk and will require mitigations that may be inappropriate in other areas.
Network Personnel
Typically, a core group of ICS and IT network engineers, an ICS vendor, and integrators work together to maintain and upgrade the network. They each bring their own set of talents and understanding of the network to develop and deploy the best solution. The ICS vendors are knowledgeable about the networking requirements for their products. The IT group has networking engineers who understand protocol details such as TTL (Time to Live) and bandwidth to ensure the best use of network resources. The ICS engineers should become familiar with how their networks are laid out so they can maintain them in the future.
Finally, the integrators or contractors who deploy the networks can offer insights that improve reliability and possibly save money. Most organizations have a core group of contractors they use for installations. If the group of contractors and integrators has an established relationship with the other groups, questions can be answered faster, and information exchanged more efficiently.
Security Technologies
Modern IT network defenses include security technologies and network management. Many of the devices deployed on IT networks are robust enough to include security. In addition, many third-party devices are available from a variety of vendors, which improve security.
Some organizations use Honeypots or network Canaries to help spot and limit attackers. A network Canary is a device that acts like a normal part of the network, but is heavily monitored to see if any automated scans or attacks are taking place on the device. For example, an extra workstation is placed in a segmented network VLAN where other sensitive information is stored. If that workstation is being scanned or attacked with malicious code, the sensors will notify administrators.
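Because no legitimate traffic should target a canary, detection logic can be very simple: any connection to it is worth an alert. The sketch below assumes flow records are available as dictionaries with "src", "dst", and "port" keys; the addresses and record format are hypothetical.

```python
CANARY_ADDRESSES = {"10.20.30.99"}  # hypothetical canary workstation


def canary_alerts(flows, canaries=CANARY_ADDRESSES):
    """Return every flow whose destination is a canary device."""
    return [flow for flow in flows if flow["dst"] in canaries]


flows = [
    {"src": "10.20.30.15", "dst": "10.20.30.40", "port": 445},
    {"src": "10.20.30.15", "dst": "10.20.30.99", "port": 22},
]

alerts = canary_alerts(flows)
print(alerts)
```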
Network management takes an active approach to security. This includes establishing robust policies and procedures that are audited, such as network configuration management.
Modern switches and gateways have network configuration features such as access control lists (ACL). ACLs prevent users and applications from accessing resources they are not authorized to use.
Networks in IT can be segmented using VLANs, making it more difficult for an attacker to traverse the network.
While many of these techniques are common on IT networks, ICS networks are just starting to deploy these defensive measures.
External Connections
All organizations have some form of external connection for email and Internet access. Many of these connections are also used by an ICS for remote support. At the corporate level, external connections are usually well documented. IT systems use Virtual Private Networks (VPN) that are managed by the corporate IT department. As a result, IT can limit access to certain personnel. They can also employ countermeasures, such as blacklists at the Internet gateway, to prohibit traffic to certain sites and employ encryption to protect data.
Modem connections are rare. If they do exist, they are usually well-documented, password protected, and turned off when not in use. Connections that use public lines are actively monitored. Typically, the vendor will provide a service level agreement that includes security levels.
For Internet services, corporate IT deploys Service-Oriented Architecture (SOA) security for external facing SOA to peer sites or customers. Also, IT uses a wide range of encryption technologies, such as a secure VPN, for securing public lines, external facing servers, and direct connections to remote staff. This is typically the method used by vendors and on-call staff to support the ICS.
Depending on regulatory requirements, not all critical infrastructure sectors in the U.S. allow remote connectivity to their ICS. The regulatory environment (e.g., NRC, NERC and FERC) may differ between sectors. Connectivity that is permitted in one may be disallowed, by law, in another.
Perimeter Components
The perimeter components that help secure a typical IT network include firewalls, intrusion detection systems (IDS), intrusion prevention systems (IPS), and/or unified threat management systems (UTM). ICS network perimeters are predominately secured by firewalls only, although some organizations are starting to deploy IDS on their ICS networks.
Firewalls are the most common first line of defense. There are many types of firewalls. They all have a common purpose of restricting or locking ports and services. Firewalls monitor ingress and egress traffic using specified rules. Firewall rules are an important and often overlooked part of firewall configuration. While ICS firewalls are typically tuned to block nonessential incoming traffic, the outgoing firewall rules are often inadvertently disregarded. Rules should be established for both incoming and outgoing traffic.
IDSs in the enterprise come in two main categories: Network Intrusion Detection Systems (NIDS) and Host Intrusion Detection Systems (HIDS).
NIDS detect anomalies on a network, such as protocol-based attacks on ports, switches, gateways, and hubs, as well as traffic that passes through the network. These anomalies are reported to the network administrator as alerts.
HIDS are designed to monitor, detect, and log the modifications to the files on a file system and are installed on host applications and operating systems. HIDS can be designed to detect file tampering, malicious code, and application-level changes.
Just like security cameras warn guards of a potential break-in or theft, IDSs can notify system administrators of anomalous network traffic. An IDS can be a powerful tool in securing an ICS network.
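The file-tampering detection performed by a HIDS can be sketched as comparing cryptographic hashes against a known-good baseline. The example below is a simplification: it hashes in-memory byte strings so that it is self-contained, whereas a real HIDS would read files from disk and protect the baseline itself. The file names and contents are hypothetical.

```python
import hashlib


def fingerprint(files):
    """Map each file name to the SHA-256 digest of its contents."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def detect_changes(baseline, current):
    """Return names of files that were modified, added, or removed."""
    return {
        "modified": sorted(n for n in baseline
                           if n in current and baseline[n] != current[n]),
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
    }


known_good = fingerprint({"hmi.cfg": b"limit=80", "logic.bin": b"\x01\x02"})
today = fingerprint({"hmi.cfg": b"limit=95", "logic.bin": b"\x01\x02",
                     "dropper.exe": b"\x90\x90"})

print(detect_changes(known_good, today))
```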
Like IDSs, IPSs come in both host and network forms and detect anomalies in traffic. The key difference is that IPSs actively attempt to make changes to or block network packets that violate the IPS rules. IPS can be in line to all traffic on the network. An IPS device and matching packet-layer firewall are widely used to protect the perimeter of modern IT networks. However, control system engineers are reluctant to use IPSs on ICS networks. They are concerned, understandably so, that an IPS will stop critical packets during an emergency.
UTM devices combine many types of network and host-based defenses for easier management. Because they combine multiple defense technologies, they are often less expensive to deploy than each technology individually. UTM devices are more common in small and mid-size businesses.
Purdue Model Revisited
Let’s revisit the Purdue model. Compartmentalization is based on segmenting the network into security levels and locking down traffic to essential functions. Cyber risk is significantly reduced by defining critical network paths and ensuring only specific data are transferred across network interfaces. This prevents noncritical network traffic from entering the control domain.
The Purdue model allows organizations to create data pathways into and out of specific levels. This lets the organization create appropriate policies to secure the information that crosses the level boundaries, and protects the cyber assets in each level. By establishing security levels, security policies are then applied at each level to ensure cybersecurity is preserved between conjoined areas.
When you apply the model, you separate out different functions of the ICS networks. This creates better security and an easy-to-follow network design. Security is increased by eliminating a flat network and increasing the number of hops to get to different network segments.
There is no reason to connect a corporate or remote cyber asset directly to an ICS environment. By using Level 3, often referred to as demilitarized zones or DMZ, asset owners can more securely transfer data from the control system to the corporate network.
Firewall
Firewall Placement
The Purdue model supports a layered firewall model. Firewalls are placed between each level and can be designed for very granular control. The firewalls could be placed in other locations as well as deployed in redundancy for resiliency and security.
The emergence of field-level firewalls that have been designed to specifically protect field devices and monitor field protocols has had a tremendous impact on control system cybersecurity. These firewalls can be situated locally to the plant or at a distance to protect equipment. Outputs from these dedicated field-level firewalls can be collected and tuned, and be used as inputs to UTM or IDS.
Firewall Implementation
Firewalls support communications between trusted environments or between trusted and untrusted environments. Regardless of the types of firewall technology deployed, several key firewall rules should be followed.
The default firewall configuration should be set to “deny all” connections and all ports set to “closed mode.” The firewalls also need to monitor appropriate system events for all domains they protect. This system monitoring should include oversight for embedded servers and services resident in the firewall.
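A “deny all” default means traffic is blocked unless a rule explicitly allows it, in both directions. The following is a minimal sketch of that logic, not the behavior of any particular firewall product; the rule format is an assumption, and the two rules shown (Modbus/TCP on port 502 inbound, NTP on UDP port 123 outbound) are only illustrative choices.

```python
# Hypothetical rule table: (direction, protocol, port, action).
RULES = [
    ("ingress", "tcp", 502, "allow"),  # Modbus/TCP into the control level
    ("egress", "udp", 123, "allow"),   # NTP out to a time server
]


def decide(direction, protocol, port, rules=RULES):
    """Return the action of the first matching rule, or deny by default."""
    for rule_dir, rule_proto, rule_port, action in rules:
        if (rule_dir, rule_proto, rule_port) == (direction, protocol, port):
            return action
    return "deny"  # nothing matched: the default is to deny


print(decide("ingress", "tcp", 502))
print(decide("egress", "tcp", 443))
```

Note that the outbound web connection is denied simply because no egress rule permits it, which is the point of defining rules for both incoming and outgoing traffic.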
Firewalls normally sit at the perimeter of a network and are deployed to ensure information and ICS resources are protected from unauthorized entities. This assumes everybody on the protected network is a trusted user.
Although more common on IT networks, we sometimes see multiple firewalls either in series or parallel that bolster a defense-in-depth strategy.
The firewall should not be considered a silver bullet. The firewall is usually the gatekeeper or border guard at the zone boundary. A compromise or failure at that point could lead to significant impact on communications and data flow.
Firewalls that have embedded servers are also targets for attackers. You should assume that a compromise with one of the embedded servers could result in a compromise of the entire firewall.
Firewall Tuning
Once you have identified the levels and information flow, generating firewall rules at the zone boundaries will be much easier. The data transferred between the levels will use certain ports. Because the data passing from one level to another will vary, the firewall rules will also vary. In other words, there is no universal firewall rule set for ICS networks. The firewall rules must be customized based on the data and equipment being protected.
To reemphasize the point, inbound or ingress traffic should be examined to ensure that no device is attempting to communicate with the ICS network that shouldn’t be. Furthermore, outbound or egress traffic should also be examined or filtered to ensure that only authorized traffic is leaving the ICS network.
A control room may possess several zones within the same room. For instance, the control room may house a business zone that contains servers for the operators to access email or the Internet. Certainly, the control room will have an ICS zone, which includes (among other devices) operator workstations for monitoring processes. Furthermore, there may be a DMZ that has a workstation for engineers or operations managers to view historical data or Internet-based information. Each level boundary should have firewalls that protect that zone from information leakage. This protection is highly dependent on the firewall rules. Therefore, time and effort should be taken to ensure firewalls are properly tuned.
Layered Rule IT
Let’s take a look at a strategy for developing rule sets for an IT firewall. Then, we will focus on ICS firewalls. This strategy for creating and maintaining rule sets not only applies to firewalls but can also be used to configure an IPS, IDS, and UTM.
Rule sets should be built using a layered strategy starting with the vendor’s default rule set.
The next layer involves researching and applying public best practice rule sets.
Next, we start to customize the rule sets. We analyze our corporate traffic and create specific rule sets depending on architecture, technology deployed, and traffic patterns.
Finally, we can apply rules at the host level to protect applications and operating systems. A robust IT security rule set for IPS and IDS also has multiple layers. We will look into building rule sets in greater detail when we apply this technique to the ICS environment.
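The layering described above can be sketched in a few lines of Python. This is only an illustration: the service names and actions are made up, and the assumption that a more specific layer overrides a broader one is ours, not a property of any particular firewall product.

```python
from collections import ChainMap

# Each layer maps a service to an action. All entries are hypothetical.
vendor_default = {"tcp/23": "deny", "tcp/80": "allow", "tcp/443": "allow"}
public_best_practice = {"tcp/80": "deny", "udp/69": "deny"}
corporate_specific = {"tcp/8443": "allow"}
host_level = {"tcp/3389": "deny"}

# ChainMap searches left to right, so the most specific layer is listed first.
effective_rules = ChainMap(
    host_level, corporate_specific, public_best_practice, vendor_default
)


def action_for(service: str) -> str:
    """Return the effective action, denying anything no layer mentions."""
    return effective_rules.get(service, "deny")
```

Keeping the layers as separate objects means the vendor defaults can be refreshed without touching the corporate or host rules built on top of them.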
Firewall Diversity
There are two main types of firewalls: host-based and network-based.
Host-based firewalls are software solutions that operate on servers and workstations, protecting the ports and services of their host. Many operating systems include integrated host-based firewalls, but you can also install third-party host-based firewalls. System administrators can create rule sets that track, allow, and deny incoming and outgoing traffic on the device. Host-based firewalls can also be used on mobile devices and laptops. This is especially important if these devices are used in the ICS domain.
Network-based firewalls can be further classified into those listed below.
Packet filter firewalls analyze the packets passing through them and either permit or deny passage based on pre-established rules. Packet filtering rules are based on port numbers, IP addresses, and other defined data. Usually flexible in assigning rules, this type of firewall is well suited for environments where quick connections are required. It is effective for environments, such as ICS, that need security based on unique applications and protocols.
A circuit-level firewall validates the connection between two hosts before allowing a connection. Traffic is not allowed unless a session is open and valid.
Proxy gateway firewalls, often called Application-level gateways, hide resources on the networks they are protecting. They are primary gateways that act as a proxy for the protected resources, such as workstations and servers. The proxy gateway firewalls filter at the application layer of the OSI model and do not allow any connections if there is no proxy available. This type of firewall is good for analyzing data inside the application (POST, GET, etc.) as well as collecting data about user activities (logon, admin, etc.). They are gateways and require users to direct their connections to the firewall. They also impact network performance because of the latency caused by processing proxy requests and analyzing data. This type of firewall is well suited to separating the business and control LANs as well as providing protection to DMZs and other assets that require application-specific defenses.
Stateful inspection firewalls include many of the features and functions of the other types of firewalls. They filter at the network layer, determine the legitimacy of the sessions, and evaluate contents of the packets at the application layer. Rather than run proxies, stateful inspection firewalls use algorithms to process data at the application layer. These firewalls look at the state of the packets and analyze the packets against pre-observed activities. They also keep track of valid sessions and protect key assets in the control domain. Because many of the vulnerabilities in ICSs are related to trust between servers and devices, being able to track and react to valid and invalid sessions improves system security.
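The session tracking that circuit-level and stateful inspection firewalls rely on can be pictured with a small Python sketch. It is deliberately simplified: the class name, its methods, and the idea of identifying a session by an address pair alone are assumptions for illustration, and real firewalls also track ports, protocol state, and timeouts.

```python
class SessionTable:
    """Toy session table: replies are accepted only for sessions we opened."""

    def __init__(self, trusted_initiators):
        self.trusted_initiators = set(trusted_initiators)
        self.open_sessions = set()

    def outbound(self, src, dst):
        """A trusted host opens a session; remember it so replies can return."""
        if src not in self.trusted_initiators:
            return False
        self.open_sessions.add((src, dst))
        return True

    def inbound(self, src, dst):
        """Accept inbound traffic only if it answers an open session."""
        return (dst, src) in self.open_sessions

    def close(self, src, dst):
        """Forget the session; later packets from the peer are rejected."""
        self.open_sessions.discard((src, dst))
```

In this sketch, a packet from a node that never received a request is rejected even if its address and port would pass a simple packet filter.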
Firewall Implementation
The ICS community does not have an accepted standard for ICS firewalls. Although there is ample guidance on how to choose a firewall, there is no standard. Unless you are dealing with federal systems including the military, there is no accredited evaluation level that needs to be met by the technology.
Not only are firewalls considered to be the backbone of security in critical domains, they are also considered to be the silver bullet, which is not good. This cultural attachment to the firewall as a mandatory and often stand-alone security element can get us into trouble. Assessments have shown that control system architectures have been built using firewalls, but that the firewalls have been deployed to only meet minimal standards associated with security guidelines. Investigating some of these deployments shows that the firewall configurations are actually doing more damage than good, and the access control lists that have been created are serving no purpose. Appropriate firewall configuration is essential to properly securing the network.
Installing firewalls at all external connections increases the layers of security at the network perimeter. Asset owners may want to consider adding a second firewall from another vendor. The two firewalls should have matching rules/configuration and be deployed at the same location. This adds another layer of defense in case the firmware on one of the firewalls is breached. It also gives the network administrator time to patch the firmware in the event of a breach.
Layered Rule ICS
Let’s extend the layered firewall rules for ICS.
The rules become more specific as we move up through the layers from the hosts to the control system. ICS security is significantly enhanced when applying network-specific rules (e.g., who is allowed to talk to whom) and vendor-specific rules (e.g., DCOM should never be used on this system). The top layer denotes ICS protocol-specific rules. These rules monitor ICS command/data protocols as closely as an IDS monitors TCP protocols (FTP, HTTP, etc.).
Good IDS practices build rules from the ground up, starting with broad default rules that become more specific as you move up the layers. There is no reason corporate-specific rules cannot be applied to ICS networks. Host-type rules are specific to the operating system; for instance, Windows system rules versus Linux/UNIX system rules. The technology used for firewalls or an ICS should include default rule sets that are unique to that technology. Just because rules may not be applicable to a system today does not mean they will not be applicable in the future. As with IT, public best practices and corporate specific rule sets can be used to enhance default rule configurations for ICS.
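As an illustration of what an ICS protocol-specific rule at the top layer might look at, the Python sketch below inspects the function code of a Modbus/TCP frame. Modbus is chosen here only as an example protocol, and the policy of alerting on anything other than read requests is an assumption for illustration, not a recommendation for any particular system.

```python
import struct

# Modbus function codes 1-4 are read requests (coils, inputs, registers).
READ_ONLY_FUNCTION_CODES = {1, 2, 3, 4}


def modbus_function_code(frame: bytes) -> int:
    """Return the function code that follows the 7-byte MBAP header."""
    if len(frame) < 8:
        raise ValueError("frame too short for Modbus/TCP")
    # Header fields: transaction id, protocol id, length, unit id, then function code.
    _tid, protocol_id, _length, _unit, function_code = struct.unpack(
        ">HHHBB", frame[:8]
    )
    if protocol_id != 0:
        raise ValueError("protocol id is not Modbus")
    return function_code


def should_alert(frame: bytes) -> bool:
    """Hypothetical policy: alert on any request that is not a read."""
    return modbus_function_code(frame) not in READ_ONLY_FUNCTION_CODES
```

A rule like this sits above the network-specific layer: the flow may be allowed, but the command carried inside it is still checked.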
IDS/IPS
IDS
An IDS monitors the network, logs anomalous traffic, and alerts network administrators to it. Typically, an IDS is coupled with firewalls, which further secure the network.
IDS protection can take significant effort to set up in the corporate IT environment. Because an IDS generates a lot of data and alerts, it must be tweaked and analyzed repeatedly to limit false positives and negatives. This can be challenging in an IT environment because it is difficult to develop the rules to capture all the anomalies with the diverse data flows.
IPS
An IPS blocks malicious network traffic. There is concern about using an IPS in an ICS network because of the possibility of mistaking critical network packets for malicious data, but a growing community believes that IPSs work well in ICS networks. At this point, there is no consensus about using IPSs in ICS networks.
A few creative asset owners use IPSs in the passive mode. These systems generate signatures on the fly that are then used to populate the IDS. This creates an effective warning system when used in tandem with customized alarms and triggers.
IDS/IPS in ICS
Setting up an IDS in a control system environment is much easier than in an IT environment. The traffic between ICS devices, i.e., servers, workstations, and PLCs/RTUs, is more predictable. As a result, it is easier to develop the rule sets to alarm when unexpected or disallowed traffic is detected. This approach saves time and effort, and may provide effective countermeasures instead of forcing administrators to write and create specific signatures for every plausible attack scenario.
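Because ICS traffic is predictable, one way to picture this approach is to record the flows seen during a known-good period and alarm on anything else. The Python sketch below assumes flows are simple (source, destination, port) tuples; the addresses and port are made up for illustration.

```python
def learn_baseline(known_good_traffic):
    """Collect the set of (src, dst, dst_port) flows seen while the system is healthy."""
    return set(known_good_traffic)


def unexpected_flows(baseline, live_traffic):
    """Return flows that never appeared in the baseline, in arrival order."""
    return [flow for flow in live_traffic if flow not in baseline]


# Hypothetical example: an HMI polling a controller, then a stray workstation.
baseline = learn_baseline([("10.0.10.20", "10.0.10.15", 502)])
alarms = unexpected_flows(
    baseline,
    [("10.0.10.20", "10.0.10.15", 502), ("10.0.10.77", "10.0.10.15", 502)],
)
```

Here only the second flow ends up in alarms, without anyone having written a signature for a specific attack.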
Generally speaking, it is important to understand what intrusion detection and intrusion prevention can and cannot do. IDS products are only as good as the rules they use to generate alerts.
What IDS/IPS can do for you
Tell you what it knows about specific information it is programmed to watch
Tell you there may be a problem such as a configuration error
Tell you the system is being attacked by a known method
What IDS/IPS cannot do for you
Tell you if the system was exploited
Tell you what happened on the system console
Alert on new and unique attack techniques
Detect attacks off the network
Perform analysis
IDS Placement
Placement of IDS is similar to placement of firewalls. IDS should be located everywhere there is a network segment or zone that could use monitoring. Think of an IDS as an alarm on a building doorway or window. While it may not deter an attack, it does notify you of anomalous activity.
As with firewalls, IDS can either be network or host-based.
Commercial products for the IT environment may not be appropriate for ICS networks. Many commercial IDS products run on signature-based systems and have hundreds of thousands of signatures in a default store. The protocols, data packets, and network activities in a control environment are different than those found in IT networks. As a result, commercial IT technologies may not be effective in control environments. With the addition of IDS and monitoring capabilities, we have a cohesive defense-in-depth strategy.
Data Flow in ICS
We can place an IDS in a network to examine the data packets that flow between devices. If the rules are properly configured, the IDS will notify the system administrator of any irregularities.
IDS Implementation
One of the more common IDS programs is SNORT. (NOTE: This is not an endorsement of SNORT; it is an acknowledgment that SNORT has a large presence in the IDS market.)
Here is an example of a SNORT rule. Let’s break it down.
On the top line we have “alert” in blue. This tells SNORT to create an alarm.
Next is the protocol type, in red. “IP” means Internet Protocol traffic.
This is followed by the blue source address and port of the packet. This identifies the trigger.
The packet direction is in red. In this case, the rule applies to packets traveling in both directions.
The following field, in blue, is the destination address and port. Notice the exclamation mark in front of the destination address. This is a logical NOT. The rule states that if the HMI communicates with any device other than the controller, the IDS will issue an alert containing the message shown in green.
Finally, the signature ID of the alert is set in red.
HMI to Field Controllers
More than likely, the attacker will perform reconnaissance and launch other probes as well as enumerate the network before compromising the system. These are abnormal activities that, with properly designed rules, would trigger an alert. One of the main characteristics of an ICS is fairly well-established communications between nodes, meaning that ICS nodes have predictable data flows.
As shown in this example, the HMI primarily communicates with the controller, which is why we create a rule that issues an alert if any other node communicates with the HMI.
Let’s look at our example rule with the data flow diagram. The HMI addresses are 10.0.10.20 and 10.0.10.30, while the controller is 10.0.10.15. The rule reads that it will trigger an alert if there is a TCP connection between the HMI and any node or device other than the controller. Following the SNORT numbering scheme, the rule's signature ID is above 1 million, the range set aside for personal/private use. Notice that SNORT will also issue an alert if a new configuration change is downloaded to the HMI from the configuration database.
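To see the logic of this rule in action, here is a small Python sketch that mimics it. RULE_TEXT is only our reconstruction of what such a rule could look like, assuming the TCP form described above; the message text and the exact signature ID are hypothetical. The function reproduces the bidirectional match with the logical NOT on the controller address.

```python
HMI_ADDRESSES = {"10.0.10.20", "10.0.10.30"}
CONTROLLER = "10.0.10.15"

# Hypothetical reconstruction of the rule; msg and sid values are assumptions.
RULE_TEXT = (
    "alert tcp [10.0.10.20,10.0.10.30] any <> !10.0.10.15 any "
    '(msg:"HMI talking to a node other than the controller"; sid:1000001;)'
)


def rule_matches(src: str, dst: str) -> bool:
    """Match in either direction: one end is an HMI, the other is not the controller."""
    forward = src in HMI_ADDRESSES and dst != CONTROLLER
    reverse = dst in HMI_ADDRESSES and src != CONTROLLER
    return forward or reverse
```

Normal polling between an HMI and the controller stays silent, while a connection between an HMI and any other address, such as the configuration database, matches.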
IDS to Nothing
A network Canary is a node on the network that acts as a decoy. Its sole purpose is notifying the network administrator that another device is trying to communicate with the Canary. This is a good indication that an attacker may be scanning the network.
In this example, the IDS server that has an IP address of 10.0.10.1 also acts as a network Canary. The rule is written to alert the network administrator when either the Canary or another device on the network is attempting to establish communications. This rule triggers on abnormal network traffic, i.e., communicating with the IDS. This is a simple approach to establish a Canary without having to deploy extra resources.
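The Canary logic is simple enough to sketch in Python. The packet representation (a pair of source and destination addresses) is an assumption made for illustration.

```python
CANARY_IP = "10.0.10.1"  # the IDS server that doubles as the Canary


def canary_alerts(packets):
    """Return every (src, dst) pair that involves the Canary in either direction."""
    return [(src, dst) for src, dst in packets if CANARY_IP in (src, dst)]
```

Since no device has a legitimate reason to talk to the Canary, every entry this function returns deserves a look.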
Improper Firewall
We can also develop a similar rule to monitor the firewall. This rule triggers an alert if any device connects to the firewall using TCP port 80. This would alert the network administrator that someone is trying to view or change the firewall configuration using a web browser.
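The same condition can be written as a short Python check. The firewall address used here is hypothetical, since the example network does not assign one.

```python
FIREWALL_IP = "10.0.10.254"  # hypothetical address for the firewall
WEB_ADMIN_PORT = 80


def is_web_admin_attempt(protocol: str, dst_ip: str, dst_port: int) -> bool:
    """True when a packet targets the firewall's web interface over TCP port 80."""
    return (
        protocol.lower() == "tcp"
        and dst_ip == FIREWALL_IP
        and dst_port == WEB_ADMIN_PORT
    )
```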
Antivirus
Antivirus and unified threat management (UTM) must be carefully deployed to avoid impacting the operational integrity of an ICS.
Antivirus, especially on older systems, may consume significant resources (CPU and memory) when scanning the system. This could slow down the system, delaying other tasks while the antivirus monitors and controls the scan procedure. To prevent some of these problems, the antivirus scanner may be configured to skip files in directories that are used for fast data loads. Unfortunately, this also provides malicious code a safe place to hide.
To keep current with emerging malware, the antivirus data signatures must be constantly updated. This can be time consuming, especially if the ICS vendor does not allow automatic updates.
There is also concern that a bad signature could shut down a critical service, impacting the ability of the ICS to control the process. Ideally, each signature update should be tested on an engineering workstation that mimics the “live” ICS, commonly referred to as a test system or test bed.
Some ICS vendors have partnered with antivirus vendors to provide solutions for their customers. This may be a problem, especially if the asset owner has a site license with another antivirus vendor, and insists on using their antivirus software in the ICS instead of the antivirus software supported by the vendor.
A UTM solution is an aggregate of individual cybersecurity solutions. It combines firewalls, IDS, IPS, routers, VPNs, and whatever other features the vendor has chosen into a single solution. There is no standard list of features in a UTM.
If there is an antivirus component in a UTM, the signatures for the antivirus component will need to be updated, just like with regular antivirus software.
Malware Preventions
There are many common conduits that attackers use to deliver and install malware on systems. Best practices for preventing malware on a control system network are adopted from IT best practices. However, the IT best practices do need to be modified for ICS networks. Although the corporate domain is used as a buffer between the Internet and the control system, it is common to find email, browsing software, and a number of other technologies that facilitate Internet communications loaded on HMIs.
Modify IT Best Practices for ICS
The common conduits below show how to modify IT best practices for ICS.
Email and Instant Messaging/Internet Relay Chat
Ensure no incoming or outgoing email or IM/IRC traffic is on the control system network. This should be enforced by removing the software and auditing HMIs to ensure the software was not reinstalled intentionally (for internal use) or unintentionally (via patch or upgrade).
Dedicate a separate system if email or IM/IRC traffic is needed for operations (e.g., vendor support).
Configure firewalls, IDS/IPS, and Internet proxies to restrict email and IM/IRC traffic into and out of the control system network.
Internet browsing
Restrict Internet browsing to allowed sites such as vendor support and internal references. Use Internet filters in addition to firewalls to restrict Internet access. Preferably dedicate a separate system for this purpose.
If Internet browsing is required for operations on control systems, use Internet browsing protections such as restricting the use of JavaScript that could run malicious payloads.
Portable systems
Portable devices, such as laptops and cell phones, can be another source of malware. In many cases, the use of laptops on an ICS network is unavoidable, especially when they are required for configuring controllers or troubleshooting networks.
Before joining an ICS network with a portable device, verify that the operating system is patched, antivirus software and data files are up to date, and host-based firewalls are properly configured.
Ideally, a laptop that is not used for office functions, such as email or Internet browsing, should be dedicated for supporting the ICS.
If allowed at all, establish and enforce policies and procedures on the use of visitor computers on the ICS network. Vendors that support several customers may be carriers of malware. They can partially mitigate the spread of malware by burning data to a DVD or CD that will be moved to an ICS. While this is not always practical, it is safer.
Data Transfers
Data transfers, especially over untrusted infrastructure like the Internet, can be used to introduce malware. The following are recommendations to safely transfer data:
Use encrypted or secure protocols (e.g., SSH, VPN, secure FTP, SSL), not cleartext protocols (e.g., FTP, TFTP, HTTP, Telnet).
Use digital signatures and hashes (e.g., MD5, SHA) to ensure files received have not been modified or compromised.
Prohibit drive sharing of systems between the control system network and other networks. Firewall rules and system configurations can enforce this.
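The hash check recommended above can be sketched with Python's standard hashlib module. We use SHA-256 here as one member of the SHA family; the function names are our own, and the published hash is assumed to arrive through a separate, trusted channel such as the vendor's support site.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def file_sha256_hex(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large firmware images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(data: bytes, published_hex: str) -> bool:
    """Compare against the vendor-published hash, ignoring case and whitespace."""
    return hmac.compare_digest(sha256_hex(data), published_hex.strip().lower())
```

A mismatch only tells you the file differs from what was published; it does not say whether the cause was corruption or tampering, so treat any mismatch as a reason not to load the file.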
Portable media
Attackers also use portable media, such as USB sticks, DVDs, and CDs, to deliver malware. These delivery mechanisms can be especially dangerous because they bypass perimeter controls such as firewalls. The following recommendations provide mitigation strategies to allow the safe use of portable media:
Dedicate a separate system for loading files such as firmware upgrades. Once downloaded to the dedicated system, the upgrade should be tested and written to new or “clean” media before it is transferred to control systems from a trusted system.
System administrators should disable or physically restrict USB ports, DVD disc drives, and CD disc drives on ICS workstations and servers. If this is not feasible, audit systems regularly for prohibited software. Cover plates and drive locks can be used to physically restrict portable media.
Scan portable media, before and after use, for malware with antivirus and antispyware software.
Central Logging
Network devices such as routers, workstations, firewalls, and servers generate a variety of logs on the system status. This includes the status of applications, operating systems, system security, hardware, and networks.
The application log records events generated by programs. The application developer determines the types of events written to the application log. For instance, Internet Explorer may document an unplanned module shutdown. The operating system may log a driver failure in the systems event log or an invalid login in the security event log.
Frequent monitoring of event logs can identify and diagnose potential problems. By monitoring and crosschecking logs, a system administrator may be able to spot malicious activity. Frequent monitoring is also an important tool in researching incidents, i.e., forensics.
Reviewing logs from multiple sources can be laborious and time consuming. This task can be simplified by implementing a centralized log server where the events are gathered from all network devices and stored in a single location.
Data protection: An attacker will want to delete or modify events to prevent system administrators from discovering their activities. Because the logs are sent in near real-time to the central log server, it is unlikely an attacker would have time to alter the source log files before they are sent. The log server must be hardened to prevent the attacker from altering the logs once they are stored on the server. Unneeded services must be shut down, unused accounts should be removed, and the server should be physically secured in a locked room.
Log analysis tool: Central log servers can provide tools for analyzing the data that may not be available on the network device–such as searching, sorting, and alarming on events. Furthermore, there are third-party applications that can automatically process and analyze the logs.
Log archiving: It is much easier to archive logs from a central log server than to pull and archive logs from multiple sources.
Log correlation: Storing logs in a central location makes it much easier to get a complete picture of an incident, rather than trying to analyze and correlate logs from multiple locations. An analyst can glean clues surrounding an incident by viewing all events from all sources in chronological order. However, it is crucial that all network devices are time synced to the same time source. Any mismatched or out of sync time stamps can cause confusion or lead to inaccurate diagnosis of an incident.
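To illustrate log correlation, the Python sketch below merges events from several sources into one chronological timeline. The log format (an ISO 8601 timestamp, a space, then the message) and the clock_offset parameter are assumptions for illustration; the offset shows why unsynchronized clocks matter, since a wrong value would reorder the events.

```python
import heapq
from datetime import datetime, timedelta


def parse(source, lines, clock_offset=timedelta(0)):
    """Yield (timestamp, source, message), correcting a known clock offset."""
    for line in lines:
        stamp, message = line.split(" ", 1)
        yield datetime.fromisoformat(stamp) - clock_offset, source, message


def correlate(*streams):
    """Merge already-sorted event streams into one chronological list."""
    return list(heapq.merge(*streams))


# Hypothetical events from two devices.
firewall_log = ["2024-01-01T10:00:05 deny tcp 10.0.10.40 -> 10.0.10.20"]
hmi_log = ["2024-01-01T10:00:01 login failed", "2024-01-01T10:00:09 login ok"]
timeline = correlate(parse("firewall", firewall_log), parse("hmi", hmi_log))
```

The merged timeline places the firewall denial between the failed and successful logins on the HMI, a sequence that is easy to miss when each log is read on its own.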
Log Consolidation
The first step in forensic analysis is to gather data–that is, determine what data would be useful to have, and find a way to store the data.
Then, software solutions such as IDS and SIEM (Security Information and Event Management) may be used to provide real-time analysis of security alerts generated by network hardware and applications.
SIEM is a product capable of gathering, analyzing, and presenting information from network and security devices; identity and access management applications; vulnerability management and policy compliance tools; operating systems, database and application logs; and external threat data. A key focus of SIEM is to monitor and help manage user and service privileges, directory services, and other system configuration changes, as well as to provide log auditing and review, and incident response.
Complications
As with most ICS security solutions, log consolidation also has its drawbacks:
It adds more traffic to the network.
Moving that amount of traffic may alert an attacker to the location of valuable information.
There is always a fine line between keeping the business going and “cool” technology. Adding technology does not mean security is achieved. Similarly, standards compliance does not mean the system is secure.
Hardware Security Layer
The fourth level of our defense model, hardware security, is technology built into the devices designed to prevent unwanted code from executing.
Processors with the No eXecute (NX) bit are examples of hardware security. They segregate areas of memory into different classifications and allow each area to be used only according to its classification. Some areas of memory are for storage only, and a processor with the NX bit prevents any code execution in those areas. This prevents certain types of malicious software from taking over computers by inserting code into another program’s data storage area and running the code within this section of memory (such as through a buffer overflow).
Hardware Security
Accelerator cards that attach to a motherboard on a server provide another example of hardware security. The cards will decrypt encrypted programs, thus offloading this responsibility from the server’s processor and memory. This will speed up the server’s processing, making it more practical to use encryption to improve security.
Modern IT hardware components automatically detect when newer revisions of firmware are available and will update the firmware. This defense-in-depth strategy adds an extra layer of defense on top of a patch management process that also checks for firmware updates.
Some vendors provide systems with Trusted Computing, which is a technology that uses cryptography to help enforce a selected behavior. Trusted Computing vendors supply hardware that uses encryption keys that no one knows, including the owner of the system. Many government agencies (e.g., DoD) require that a Trusted Computing vendor provide their computers.
Software Security Layer
The final layer in our defense-in-depth model is software security. Software countermeasures start with knowing the vulnerabilities of deployed software. IT departments commonly scan systems on their networks for vulnerabilities in operating systems and software packages, and for unapproved software. This is usually a planned, scheduled, and documented event verified by IT staff and management. Scanning helps with patch management. An extensive list of security tools exists to help detect, scan, and deploy patches, security hot fixes, and service packs to operating systems and applications.
Software Security
The IT industry has embraced patch management. There are regularly scheduled patch releases by vendors that IT departments anticipate and deploy in a reactive manner. Deploying patches on ICS is trickier because of the availability requirements, plus patches have been known to “break” ICS.
Modern browsers also employ software security because they are frequently used and exposed to malicious code. Many modern browsers incorporate sandbox-style memory and code execution, making it difficult to hop security zones in the browser.
Patch Management
Part of software development is debugging the code; that is, fixing those problems in the software. Patches have two main purposes: to fix vulnerabilities and to enhance functionality. While enhancing software functionality is important, we are more concerned about reducing vulnerabilities.
The types of software that require patching include the operating systems, the ICS software, and any third-party applications such as Internet services, databases, and document readers.
Unfortunately, patches are often not applied to ICS because the systems are working, and as mentioned previously, patching ICS can cause problems. The attitude is “If it is not broke, why fix it?” Often asset owners will not risk the problems associated with patching if the system is working acceptably.
As with high blood pressure, just because you do not feel symptoms does not mean you do not have a problem. Penetration testing tools, such as Metasploit, make it extremely easy, even for a novice attacker, to exploit vulnerable systems. Furthermore, malware also uses these vulnerabilities to compromise systems.
Patches
Applying patches is crucial for preventing attackers and malware from taking advantage of known software vulnerabilities. Despite the time and risk involved in deploying patches, patching should be part of any program to improve ICS cyber defenses.
Patching older ICS that run on unsupported operating systems may not be an option. In these cases, asset owners may want to consider updating their operating systems or using virtual patching. Virtual patching identifies and stops attacks on the network before they reach a vulnerable server. Tools such as Web application firewalls (WAFs) and IPSs are used for virtual patching. Specific rules are written for these devices to identify and stop known attacks on the network.
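The idea behind virtual patching can be shown with a small Python sketch that checks each request against known attack patterns before it would reach the vulnerable server. The two signatures are simplified, hypothetical examples and are not taken from any WAF or IPS product.

```python
import re

# Hypothetical signatures for attacks against an unpatched server.
KNOWN_ATTACK_SIGNATURES = {
    "directory traversal": re.compile(r"\.\./"),
    "oversized parameter": re.compile(r"=[^&\s]{1024,}"),
}


def virtual_patch(request_line: str):
    """Return the name of the first matching signature, or None to pass the request."""
    for name, pattern in KNOWN_ATTACK_SIGNATURES.items():
        if pattern.search(request_line):
            return name
    return None
```

The server itself is unchanged, so the protection holds only for the attacks the rules describe and only for traffic that actually passes through the filtering device.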
Installing a bad patch on an ICS can have unintended consequences. To reduce this risk, operating system patches should be tested by the ICS vendor and asset owner before deploying them on a live system. Patches should be tested on a sandbox system (test environment), such as an engineering workstation, to ensure they do not cause problems. Using an offline system is the best approach for testing patches, but if that is not an option, you should always have a recovery plan and the technical know-how to handle any problem before installing a patch on a live system. Operations managers should either be aware of the risks of patching a live system or be willing to provide funding for an engineering workstation.
Patch management servers, such as Microsoft’s Windows Server Update Services (WSUS), can help system administrators better manage and deploy operating system patches after they have been tested for compatibility with the ICS. They are especially handy when supporting large ICS consisting of several servers and workstations.
Secure Code Development
Vendors can reduce the need for applying security patches by adopting secure code development. The software development life cycle should include security during each phase of software development. This should include security reviews and coding techniques for each of the following processes:
Testing the software for security errors in programming
Testing and reviewing code to comply with secure coding practices
Using or developing tools to audit and automate secure code techniques
Secure coding is a vendor’s responsibility. However, asset owners can acquaint themselves with secure coding practices and ask their vendors what they are doing to build security into their products.
Cybersecurity can be built into the software development life cycle (SDLC).
Encryption
Encrypting data that flow over networks or data stored in memory and hard drives is another tool that can be used in defending ICS. Encryption prevents an attacker from viewing cleartext data and inserting malicious code in data streams. There are several popular encryption techniques. Depending on the technology, data are encrypted at the network, application, or data layers in the OSI model.
Because encryption and the subsequent decryption process use algorithms to create ciphers, every layer of encryption adds latency because of the overhead to process these algorithms. Many IT products use hardware accelerators to run encryption algorithms in hardware rather than the CPU. This significantly speeds up the encryption and decryption of data.
Secure Sockets Layer and Transport Layer Security (SSL/TLS): This is end-to-end encryption used for Internet-bound traffic.
Hypertext Transfer Protocol Secure (HTTPS): This secures Web-based HTTP communications.
Pretty Good Privacy (PGP): This encrypts and secures data. Each user creates a public and private key pair. Data (such as a file) are encrypted with the recipient’s public key, and the only people who can decrypt the file are those who hold the corresponding private key.
RSA Keys and PKI Certificates: These are types of encryption that use a key assigned to a user or a group of users to authenticate. The RSA key is typically a physical device the user carries with them, and the PKI certificate is attached to their account.
Secure Shell: This is a protocol for secure communication over a network. SSH protocol not only provides confidentiality and integrity using encryption, but it also provides authentication to remote devices.
Wireless Encryption: Wireless encryption, used extensively on wireless networks, is not as robust as the other encryption technologies.
Drawbacks
Knowing encryption’s advantages and disadvantages can help you make an informed choice as to whether to include encryption in your defense-in-depth strategy.
Encrypting and decrypting data are resource intensive and can introduce latency. This can be a problem when processing real-time data. Hardware encryption can mitigate some of these issues although it will increase the cost of the system and the complexity of the network architecture.
One of the big pitfalls in using encryption is that it blinds an IDS. Encryption not only obfuscates the data stream for an attacker, but it also obfuscates the data stream for an IDS, thereby rendering an important cyber defense tool worthless.
With network encryption, data are only protected while they are in transit. If an attacker intercepts the data before they are encrypted or after they are decrypted, the attacker can steal the data or inject malware into the data stream. Encryption can give users a false sense of security if they do not understand its limitations.
Virtual Private Networks (VPNs) let organizations establish a private network over the public Internet, thus avoiding the expense of installing a private communications network or leasing expensive communication circuits. ICS operators use this technology so their staff and vendors can provide remote support of their ICS. VPN circuits use encryption protocols, such as SSL or Internet protocol security (IPSec), for securing the data.
So far, we have mainly focused on network encryption; however, encryption can also be used on hard drives. This is not a common practice for ICS workstations and servers because of latency concerns, but it should be considered for portable laptops used for remote support, especially if theft is a concern. Full encryption encrypts the entire hard drive, while partial encryption is used to only encrypt folders containing sensitive files.
Impacts
How do these cybersecurity countermeasures impact the requirements of confidentiality, integrity, and availability? This chart shows a simplistic view of the observed levels of impact: starting with confidentiality, then integrity, and availability for different technologies of mitigation we have covered: firewalls, IDS, IPS, antivirus, and UTM.
The impact on confidentiality is low regardless of the mitigation; that is, none of the listed mitigations has much effect on confidentiality. By contrast, the listed mitigations, with the exception of intrusion detection systems, have a high impact on availability. High impact as it relates to availability implies that the mitigation introduces a measurable latency that could impede operations. There would also be a high impact if the mitigating device prevented ICS nodes from communicating. For example, an IPS might drop normal ICS communications after mistaking them for a cyber-attack.
Summary
These are things you can do to improve your cybersecurity posture.
Patch management. Fixing known problems can prevent exploits from getting easy access to your systems. Patching is not always easy.
Network security that starts with laying out your network properly. Work with network design engineers to ensure the battlefield (your networks) gives you as much advantage as possible and still fulfills your operational requirements.
Firewalls assist in securing your networks, much like guarded gates secure a fence. If the gates are not guarded, this is analogous to not having appropriate rules in place. If the gates are guarded properly, traffic coming into and out of your network will be appropriately restricted.
The use of IDSs can assist in discovering network anomalies. By monitoring against what you know should be the norm, anomalous behavior can be identified and investigated. In this way, you can keep watch for things that may be out of place.
Multi-Factor Authentication (MFA)
Multi-Factor Authentication (MFA) has no ICS-specific solution: the same technology is applied to both IT and ICS. It uses two or more of the following criteria:
Something you have: Configuration, such as differing credentials or a separate token for ICS, would be the only difference. For example, remote access into the IT enterprise may require MFA with a token code from an app such as Google Authenticator or Duo, while access into the ICS network may require MFA using a token such as a PIV card with a PIN. An access card is something you have: swipe the card to gain access. Adding a PIN (something you know) to the access card adds another layer of access control (defense-in-depth).
Something you are
Something you know
Internet vs. Intranet
The Internet is the public network that carries the World Wide Web (www), while an intranet is private and internal to a company or facility (Internet is a proper noun, while intranet is a common noun). An ICS network is an example of an intranet and could even be considered a sub-intranet, as it is typically internal to a company’s intranet.
Defense in depth for this situation is to access the IT network from the Internet and then, using different credentials, access the ICS network. This provides an additional layer of protection: if the IT credentials are compromised, the ICS credentials remain intact. Additionally, monitoring remote access at two points (into the IT enterprise and into the ICS network) adds further layers of administrative control.
Separation of Duties
Separation of Duties is an administrative control that separates actions a single person can enact. A simple example is an employee who is not able to request a purchase and approve the payment. Approval must be given by another person. An ICS example is a process engineer needing to change process parameters but it requires a control systems operator to load the new configuration. The control systems operator doesn’t have access to the network share where the process configuration files are kept.
Least-Privilege
Least-privilege is an administrative control which grants users only those accesses required to perform their duties.
Example:
A customer can walk through a store but not go to the shipping area
Process operators can view engineering diagrams from the server but can’t edit or upload changes
A Security Guard can view process status on an HMI but cannot make changes to the readings
Security Guards and Surveillance Cameras
Security guards and/or surveillance cameras help provide another layer of security.
Policies, Procedures, and Awareness: Training on passwords, policies, and data classification.
Physical: Locks, fences, and security guards (physical defense-in-depth).
Perimeter: Firewall, VPN, and packet filters.
Internal Network: Firewall, intrusion detection, and encryption.
Host/Accounts: Platform OS, patches, and malware protection.
Application/Program: Single sign-on (SSO), authentication, and authorization.
Data: Database, content, and message security.
Network Discovery and Mapping
Discovery can be passive or active. Passive discovery is much stealthier, while active discovery is aggressive in trying to learn things. In both cases, we are mapping out the environment. We are often asked to understand an in-production environment where nobody who built it is available to ask, documentation is sparse, and there is little guidance on how to handle a given performance issue.
Passive Discovery
What?
What is Passive Discovery?
Using information discovered from local memory of any host, to build a vision of an existing Control system environment.
Practicing safe methods to explore and perform reconnaissance.
Attempt to identify network details without sending network packets.
Why?
Why perform passive network discovery?
Safer practice regarding Control System networks (don’t want to break something).
Can yield information that active discovery may not be practical for, such as data found in various files.
Use tools passively
When exploring a Control System network, practice passive techniques when mapping.
Utilities and commands are not necessarily defined as passive. Using a tool passively is the responsibility of the user.
Daily operation of production Control Systems already creates expected traffic. Try not to interfere with or manipulate pathways when exploring.
Examples and Effects
Neglecting to disable name resolution in commands
Resolution queries could alert an IDS unnecessarily.
Scanning your own host, from the same host (to know what it is running?).
Self inflicted scans will preoccupy a host’s network resources and may alert a host-based IDS.
Restarting services without planning (we often try turning things off and on again without a plan). For example, a watchdog timer may check for an open port; if the restart fails to bring the service back up, the port remains closed.
Watchdog timers (checking for a particular state or change in state) could generate timeout signals, and trigger alarms to an operator. Meaningless errors can appear in logs.
Clearing Cache
Clearing cache will cause bursts of packets to repopulate tables.
Artifacts
Tools
ipconfig, ip, ifconfig
netstat -anob / netstat -pantu
route print / route -n
iptables
tcpdump + wireshark
EtherApe
History + Logs
.bash_history
Browser History
Remote Desktop History
/var/log/messages
/var/log/syslog
Configuration files
crontab -l
/etc/network/interfaces
C:\windows\system32\Drivers\etc\hosts
/etc/resolv.conf
Cache
arp -an
nbtstat -c
ipconfig /displaydns
How?
ARP
Linux
arp -a -i eth0 : performs DNS resolution, which sends network packets to the DNS server asking for name resolution (active).
arp -a -i eth0 -n : does not perform DNS resolution (more passive).
Windows
arp -a
EtherApe is a good tool to understand what traffic is being generated
Explore the ARP table
Control systems can participate using Ethernet.
ARP tables are a great local cache to start investigating.
Use the arp command to view the table.
Take note of the MAC addresses mapped to IPv4 addresses.
Research the vendors discovered from the first 3 bytes of each address (OUI - Organizationally Unique Identifier) and work out what each vendor is known for in control systems (router, PLC, HMI, firewall, cameras?).
Why look at the ARP table?
Displays a list of remote hosts or devices with which the host has recently communicated.
See if there are two ARP tables (which probably means the host has two network interfaces connected to different networks).
Check the table again later. It may change. If it does, this might be an indication of scheduled tasks. Investigate further.
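The ARP review steps above can be sketched in Python. This is a minimal illustration, assuming the table was saved as text from `arp -an` on Linux; the function names (`parse_arp`, `oui`, `diff_tables`) are hypothetical helpers, not part of any tool. It only reads saved text, so it sends no packets.

```python
import re

# Matches Linux `arp -an` lines such as:
#   ? (192.168.1.10) at 00:1d:9c:aa:bb:cc [ether] on eth0
ARP_LINE = re.compile(
    r"\((?P<ip>\d{1,3}(?:\.\d{1,3}){3})\) at "
    r"(?P<mac>[0-9a-fA-F:]{17}) .* on (?P<iface>\S+)"
)


def parse_arp(text):
    """Return {ip: (mac, interface)} from saved `arp -an` output."""
    table = {}
    for line in text.splitlines():
        match = ARP_LINE.search(line)
        if match:  # "<incomplete>" entries have no MAC and are skipped
            table[match["ip"]] = (match["mac"].lower(), match["iface"])
    return table


def oui(mac):
    """First three bytes of the MAC: the vendor prefix to research."""
    return mac.lower()[:8]


def diff_tables(old, new):
    """Compare two snapshots taken at different times."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(ip for ip in shared if old[ip][0] != new[ip][0]),
    }
```

Grouping entries by interface shows whether the host sits on more than one network, and a non-empty diff between snapshots is a prompt to look for scheduled tasks.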
IP
Check IP addressing
Control systems can also participate using IP.
HMI workstations could run PC operating systems. Learn their potential reach into other IP networks.
IP addressing commands can reveal much more than the IP address.
Compare with previously discovered MAC address mappings.
Why look so closely at IP addressing?
PLCs, RTUs and various SCADA devices are often controlled by HMI workstations. Knowing the IP connectivity is important security awareness.
Windows
ipconfig /all
Check the hostname, whether IP routing is enabled (to see if it is a router), subnet, gateway, and DHCP/DNS servers.
Linux
ifconfig -a
ifconfig : shows only the interfaces that are up and configured.
ifconfig -a : shows all interfaces present, whether up or down (we might see VLAN, VPN, or bonded interfaces).
ip a or ip addr show eth0
DNS
cat /etc/resolv.conf
When a host is set to use a DNS server, generally ALL applications can query it.
HMI software becomes configured with network addresses. If the configurations are populated with names instead of numeric IP addresses, then we will be at the mercy of the DNS server.
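As a rough sketch of the DNS check, the snippet below lists the nameservers found in the text of /etc/resolv.conf and tests whether a configured target is a hostname (and so depends on DNS) or a numeric IP address. The helper names and the example hostname are assumptions made for illustration.

```python
import ipaddress


def nameservers(resolv_conf_text):
    """Return the DNS servers listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        if not line or line[0] in "#;":  # skip blank lines and comments
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def depends_on_dns(target):
    """True if the configured target is a name rather than an IP address."""
    try:
        ipaddress.ip_address(target)
        return False
    except ValueError:
        return True


if __name__ == "__main__":
    with open("/etc/resolv.conf") as handle:  # local file read, no packets sent
        print(nameservers(handle.read()))
    print(depends_on_dns("plc1.plant.example"))  # hypothetical HMI target
```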
TCP/UDP Ports
Ports
Review any Listening or Established ports.
Compare TCP and UDP port numbers that may be associated with Control system vendors.
Control System Port Number Examples
BACNet/IP : UDP 47808
DNP3 : TCP 20000, UDP 20000
Ethernet/IP : TCP 44818, UDP 2222, UDP 44818
ICCP : TCP 102
Modbus : TCP 502
Other port ranges
Well-known ports range from 0 - 1023
Registered ports range from 1024 - 49151
Dynamic ports range from 49152 - 65535
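The port list and ranges above can be turned into a small lookup. This is a sketch only: the table holds just the example ports listed in these notes, and `identify` is a hypothetical helper name.

```python
# Example control system ports from the list above: (protocol, port) -> name.
ICS_PORTS = {
    ("udp", 47808): "BACnet/IP",
    ("tcp", 20000): "DNP3",
    ("udp", 20000): "DNP3",
    ("tcp", 44818): "EtherNet/IP",
    ("udp", 2222): "EtherNet/IP",
    ("udp", 44818): "EtherNet/IP",
    ("tcp", 102): "ICCP",
    ("tcp", 502): "Modbus",
}


def port_range(port):
    """Classify a port number into its range."""
    if not 0 <= port <= 65535:
        raise ValueError(f"not a valid port: {port}")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic"


def identify(proto, port):
    """Return (ICS protocol name or None, port range)."""
    return ICS_PORTS.get((proto.lower(), port)), port_range(port)


if __name__ == "__main__":
    for proto, port in [("tcp", 502), ("udp", 47808), ("tcp", 51000)]:
        print(proto, port, identify(proto, port))
```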
netstat
What is netstat?
Tool for looking at a host’s current network sessions and the listening ports it offers.
Why use netstat?
Determine which local servers are TCP or UDP based.
Search for potential connections being made with any known Control Systems.
View all currently Established connections taking place with HMI, Controller, Historian or other hosts.
Windows
netstat -ano -p tcp
-a : all sockets
-n : no name resolution
-b : owning process name
-o : owning process ID
Linux
netstat -pantu
-p : owning process ID
-a : all sockets
-n : no name resolution
-t : tcp
-u : udp
Check the Local Address, Remote Address, and State columns. LISTENING means the host is listening on those port numbers; ESTABLISHED means the host is talking to another host, so check which ports are in use (ICS ports?).
From this we can probably work out what the local machine is used for, for example an HMI (connects to several devices and a database).
Check whether the remote IP addresses are in the same subnet or a different one (for example, a file server, or an HMI accessing files from an outside network). This helps identify the boundaries between subnets.
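The column checks above can be sketched against saved output from the Windows command `netstat -ano -p tcp`. The parser assumes the usual five-column layout (Proto, Local Address, Foreign Address, State, PID); the function names and the local network passed in are assumptions for illustration.

```python
import ipaddress


def parse_netstat_tcp(text):
    """Parse saved `netstat -ano -p tcp` output into a list of dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5 or fields[0] != "TCP":
            continue  # skip headers and anything that is not a TCP row
        local_host, local_port = fields[1].rsplit(":", 1)
        remote_host, remote_port = fields[2].rsplit(":", 1)
        rows.append({
            "local": (local_host.strip("[]"), int(local_port)),
            "remote": (remote_host.strip("[]"), int(remote_port)),
            "state": fields[3],
            "pid": int(fields[4]),
        })
    return rows


def listening_ports(rows):
    """Ports this host is offering to others."""
    return sorted({r["local"][1] for r in rows if r["state"] == "LISTENING"})


def off_subnet_peers(rows, local_network):
    """Established peers that live outside the given local network."""
    net = ipaddress.ip_network(local_network)
    peers = set()
    for row in rows:
        if row["state"] != "ESTABLISHED":
            continue
        host, port = row["remote"]
        if ipaddress.ip_address(host) not in net:
            peers.add((host, port))
    return sorted(peers)
```

Peers returned by `off_subnet_peers` mark where traffic crosses a subnet boundary and are worth tracing further.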
Routing Table
What is a routing table?
A local table of IP network destinations that the host is able to reach.
Why look at the routing table?
Identify router/gateway IP addresses.
Identify network destinations.
Identify individual host destinations.
When viewing a route table, learn to notice the IP address ranges. Determine which ones appear public and which ones appear private.
Make note of any public IP addresses that may appear in configurations found on Control System networks.
Private IPv4 ranges:
10.0.0.0 - 10.255.255.255 /8
172.16.0.0 - 172.31.255.255 /12
192.168.0.0 - 192.168.255.255 /16
If any public IP address appears in the route table, try to understand why the control system needs to talk to that public IP address.
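A minimal sketch of the public-versus-private check, using Python's standard `ipaddress` module and the three private IPv4 ranges listed above. The helper names are hypothetical. Loopback, multicast, link-local, reserved, and unspecified addresses are filtered out as well, since they commonly appear in route tables without being public hosts.

```python
import ipaddress

# The three private IPv4 ranges listed above.
PRIVATE_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_private_range(addr):
    """True if addr falls inside one of the private IPv4 ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in PRIVATE_NETS)


def needs_review(addresses):
    """Addresses that look public and deserve a closer look."""
    flagged = []
    for addr in addresses:
        ip = ipaddress.ip_address(addr)
        special = (ip.is_loopback or ip.is_multicast or ip.is_link_local
                   or ip.is_reserved or ip.is_unspecified)
        if not special and not is_private_range(addr):
            flagged.append(addr)
    return flagged


if __name__ == "__main__":
    # 203.0.113.5 is a documentation address standing in for a public IP.
    print(needs_review(["10.1.2.3", "203.0.113.5", "127.0.0.1"]))
```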
Windows
route print
Linux
route -n
or netstat -rn
A gateway entry of 0.0.0.0 means no gateway is needed: the destination is reached directly through a local interface that has an IP address configured.
Check whether any static IP addresses are set up.
Any host with more than one interface can act as a gateway.
Linux: check /proc/sys/net/ipv4/ip_forward
0 - Not forwarding
1 - Forwarding
Windows: check the IPEnableRouter value under the registry key
HKEY_LOCAL_MACHINE\System\CurrentControlSet\services\Tcpip\Parameters
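The Linux forwarding check can be sketched as below. Reading the file is a local operation and generates no network traffic. The function names are hypothetical, and the Windows registry check is left out of this sketch.

```python
from pathlib import Path

IP_FORWARD = "/proc/sys/net/ipv4/ip_forward"


def interpret_ip_forward(value):
    """Map the file content to a forwarding state: '1' means forwarding."""
    value = value.strip()
    if value not in ("0", "1"):
        raise ValueError(f"unexpected ip_forward value: {value!r}")
    return value == "1"


def forwarding_enabled(path=IP_FORWARD):
    """True if this Linux host is set to forward IPv4 packets."""
    return interpret_ip_forward(Path(path).read_text())


if __name__ == "__main__":
    state = "Forwarding" if forwarding_enabled() else "Not forwarding"
    print(state)
```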
netBIOS
What is netBIOS?
Network Basic Input/Output System (netBIOS) - allows applications on different computers to communicate within a local area network.
Used by Microsoft File and Printer Sharing
How can netBIOS be helpful?
Discovers networks and hosts by looking at the netBIOS cache (nbtstat -c).
The cache contains recently contacted systems.
Check the naming convention of the name. For example: FSWCB1, AD2.
FS/AD might represent FileServer or Active Directory.
Numbers 1,2 might highlight that there could be more than 1 server.
TCPDump/Windump
Captures and analyses network traffic from the command line.
Uses standard libpcap/winpcap to capture/parse network traffic.
Uses Berkeley Packet Filter (BPF) syntax for creating capture filter expressions.
tcpdump can also be active, so use -n to avoid name resolution.
Also, any tool can have vulnerabilities, so it is better to run the tool as a different user with -Z username.
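As an illustration of the points above, the sketch below builds (but does not run) a tcpdump command line with name resolution disabled (-n), privileges dropped to another user (-Z), and a BPF capture filter limited to chosen ports. The interface name, user name, output file, and helper names are all assumptions. Note that with -Z the output file has to be writable by the target user.

```python
import shlex


def ics_capture_filter(ports):
    """Build a BPF expression such as 'tcp port 502 or udp port 47808'."""
    return " or ".join(f"{proto} port {port}" for proto, port in ports)


def tcpdump_command(interface, outfile, user, bpf):
    """Return the argv list for a name-resolution-free capture."""
    return ["tcpdump", "-n", "-i", interface, "-Z", user, "-w", outfile, bpf]


if __name__ == "__main__":
    bpf = ics_capture_filter([("tcp", 502), ("tcp", 20000), ("udp", 47808)])
    argv = tcpdump_command("eth0", "ics.pcap", "pcapuser", bpf)
    # Print for review; run it deliberately once the options are confirmed.
    print(" ".join(shlex.quote(arg) for arg in argv))
```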
wireshark
GUI network protocol analyzer and packet sniffer.
Libpcap standard library for opening and capturing network traffic.
Customizable dissectors (modules) for proprietary protocols.
Security Notes:
Vulnerabilities in Wireshark could leave your system at risk of compromise if used on active networks.
Not required to run with root privileges
Long-term traffic monitoring should be done with
tcpdump
Rule of Thumb: Capture with tcpdump and analyze with Wireshark using a normal user account.
Files and Others
Browser history
Control system facilities may have workstations where various routine operations are performed. If particular personnel are no longer available, we can still explore a frequently used browser to collect information passively.
Address bar pre-populating with any URLs.
Saved usernames and passwords.
Bookmarks or Favorites, relating to Control Systems addresses.
Keystrokes to open recently closed tabs and windows.
Learn how to explore the temporary cache of the specific browser.
.bash_history
What is the .bash_history file?
History file containing a record of executed commands.
Every user of the system has their own history file. It is located in the home directory of each user.
Files starting with a period are hidden by default.
Why look at the .bash_history file?
Routinely executed commands help identify what tasks are performed at the workstation.
Host addresses and filenames could appear with specified commands, such as ssh, wget, ftp, rsync, mail and others.
People make mistakes. It may also contain usernames and passwords.
Use the history command to view the contents.
Check for any new IP addresses, any file extensions that might be of interest, any mail commands (employee addresses or file names), and any mysql commands (username, password, database name, or remote host; if no remote host is present, the mysql server is locally hosted).
It may provide information on local host directories.
Check whether any commands show that a USB drive, HDD, or SSD was connected (any /media entries).
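The history review above can be sketched as a small offline scan of the file's text. The list of commands to watch comes from the notes above; the function name and result keys are assumptions. It only flags lines for a human to read and does not try to interpret them.

```python
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Commands named in the notes above that tend to carry hosts or credentials.
REMOTE_TOOLS = {"ssh", "wget", "ftp", "rsync", "mail", "mysql"}


def review_history(text):
    """Collect IP addresses, remote-access commands, and /media references."""
    ips, remote_commands, media_paths = set(), [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        ips.update(IPV4.findall(line))
        if line.split()[0] in REMOTE_TOOLS:
            remote_commands.append(line)
        if "/media" in line:  # removable drives are often mounted here
            media_paths.append(line)
    return {
        "ips": sorted(ips),
        "remote_commands": remote_commands,
        "media_paths": media_paths,
    }
```

For example, `review_history(open(".bash_history").read())` run from a user's home directory reads the file locally and sends nothing over the network.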
Active Discovery
What?
What is Active network discovery?
Send network packets and wait for a response in order to identify host and network targets
Can be extremely noisy and easily detected
Why?
Why use active discovery methods?
Identify targets that cannot otherwise be identified using passive discovery techniques.
Provides specific service, port and version information for given targets.
Identify vulnerabilities of accessible services.
How?
arp-scan
arp-scan -g 10.10.10.2/24
nmap
Designed to allow system administrators and individuals to scan large networks to determine which hosts are up and what services they are offering.
Network discovery tool that can be used for identifying the systems currently connected to the network.
nmap allows us to audit what services are running on the identified hosts.
Can be dangerous to IT, SCADA and PCS systems, ICSs and embedded devices.
What is Nmap?
Open source tool for network mapping and security auditing.
Why use nmap?
much faster than manual discovery.
can scan an entire network quickly, and offers several options to customize a scan and its results.
How does nmap work?
Hosts on the network
Services (ports)
Operating systems etc.
Two-stage process
Host discovery
Port scanning
nmap - Discovery methods
User Datagram protocol (UDP)
unreliable stateless communication
No handshaking
Transmission Control Protocol (TCP)
Reliable stateful communication
3-way handshake
Internet Control Message protocol (ICMP)
Provides control, troubleshooting, and error messages.
Normally used by ping and trace route commands.
Address resolution protocol (ARP)
Discovers Link Layer addresses of network devices.
Communicates in the bounds of single network.
Three-way handshake
Host Discovery
What is host discovery (HD)?
process of identifying active and interesting hosts on a network.
Why does Nmap do HD?
To significantly reduce the amount of time to complete network scans.
Narrows a set of IP ranges into list of active or interesting hosts to be port scanned.
How does HD work?
Uses combination of ARP, ICMP, TCP SYN, TCP ACK packets to identify active hosts.
Default Host Discovery Settings
LAN: sends an ARP scan (-PR).
WAN (privileged): sends a TCP ACK packet to port 80 (-PA) and an ICMP echo request query (-PE).
WAN (unprivileged): sends a TCP SYN packet (-PS) using the connect() system call instead of a TCP ACK packet.
By default, nmap uses ARP responses for host discovery on the local network. If we want to use ICMP instead, use --send-ip.
Options beginning with -P control host discovery.
Port Scanning
What is port scanning?
process of identifying the status of interesting ports on hosts that are discovered on a network.
Why does nmap do port scanning?
to identify ports that are open on a host
How does port scanning work?
Attempts to communicate with each port in a specified set of ports.
port scans are performed on hosts that were identified as active or interesting during HD.
Nmap Port states
Open: Application on target machine is listening for connections or packets on that port.
Closed: No application listening at the moment
Filtered: Firewall, filter or other network obstacle is blocking the port so that Nmap cannot tell if the port is open or closed. Nmap received no response.
Unfiltered: Port is accessible but nmap not able to determine if open or closed.
Open | Filtered: Unable to determine if open or filtered.
Closed | Filtered: Unable to determine if closed or filtered.
Nmap default port scanning settings.
SYN scan (-sS) for privileged users.
Connect scan (-sT) for unprivileged users.
Options that start with -P are for host discovery; options that start with -s are for port scanning.
Timing and Performance options
What are timing and performance options
Settings used to control scanning delays, timeouts, retries and parallelism.
Why use timing and performance options?
Help speed up scanning process
Slow down scan to avoid IDS detection
Timing and performance options
Manual options are available but templates are usually sufficient
Template timing options offer throttling abilities not available using manual options.
Nmap results
Why save your nmap results?
Easier to analyze and compare scan results (using ndiff)
Results overflow the console window buffer.
Output options
-oN filename.nmap : Output results in normal format
-oX filename.xml : Output results in XML format
-oG filename.gnmap : Output results in grepable format
-oA filename : Output results in all formats
-v : Verbose output
--reason : Shows the reason each port is reported in its state
OS and Version detection
What is OS and version detection?
Identifies the operating system by looking at packet characteristics.
Identifies the version of a service running on a host.
Why use OS and version detection?
Provides information that could help in the selection of exploits and payloads used against a target
How does OS detection work?
Nmap sends a series of TCP and UDP packets to the remote host and examines every bit in the responses.
Nmap compares the results to its database of known OS fingerprints and prints out the OS details if there is a match.
How does Service and Version Detection Work?
After TCP and/or UDP ports are discovered, version detection interrogates those ports.
Database of probes for querying various services and match expressions to recognize and parse responses.
Tries to determine application name, version number, hostname, device type, OS family, and misc. information.
Nmap Address Schemes
Target hosts can be specified in many ways
1.2.3.1-254 : All 254 possible host IP addresses on this subnet.
1.2.3.0/24 : Nearly equivalent to the above, signifying a Class C-sized address block (the CIDR form also covers the .0 and .255 addresses).
1.2.1-4.1-254 : Ranges are allowed for subnets as well.
1.2.0.0/16 : The 16-bit netmask will scan the entire Class B-sized address block.
--exclude : Exclude a host or range.
-sn : Only do the host discovery phase (no port scan).
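To see how many addresses a target specification covers before scanning, the sketch below expands the octet-range form and sizes the CIDR form. It handles only the simple forms shown above (no comma lists), and the function names are hypothetical.

```python
import ipaddress
from itertools import product


def expand_octet_ranges(spec):
    """Expand a target such as '1.2.1-4.1-254' into individual addresses."""
    parts = spec.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four octets: {spec!r}")
    octets = []
    for part in parts:
        low, _, high = part.partition("-")
        high = high or low  # a single value is a range of one
        octets.append(range(int(low), int(high) + 1))
    return [".".join(map(str, combo)) for combo in product(*octets)]


def cidr_size(spec):
    """Number of addresses covered by a CIDR block such as '1.2.3.0/24'."""
    return ipaddress.ip_network(spec).num_addresses


if __name__ == "__main__":
    print(len(expand_octet_ranges("1.2.3.1-254")), cidr_size("1.2.3.0/24"))
```

Checking the size first is a cheap safeguard on control system networks, where a /16 typed by mistake means tens of thousands of probes.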
ICS challenges
Scans can cause computer systems to restart.
Scans can cause embedded devices to freeze or lose configuration, and in some severe cases require vendor involvement.
Nmap considerations
Use connect scan (-sT) to prevent dangling connections.
Don’t use OS detection (-O) or version detection (-sV) (the control system would be running PLCs and RTUs).
Slow the scan down by reducing the rate at which packets are generated and sent by Nmap.
Consider using exclusion lists (--exclude or --excludefile).
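The considerations above can be collected into one command line. The sketch below only builds the argument list so it can be reviewed before anyone runs it. The helper name, the default output base name, and the choice of the -T2 (polite) timing template and -n are assumptions; -O and -sV are deliberately left out.

```python
def cautious_nmap_args(targets, exclude=(), output_base="ics-scan"):
    """Build an nmap argv that follows the ICS considerations above."""
    args = [
        "nmap",
        "-n",   # no name resolution
        "-sT",  # connect scan, avoids dangling half-open connections
        "-T2",  # polite timing template to slow the scan down
    ]
    if exclude:
        args += ["--exclude", ",".join(exclude)]
    args += ["--reason", "-oA", output_base]  # save results in all formats
    args += list(targets)
    return args


if __name__ == "__main__":
    # Hypothetical range; the excluded address stands in for a fragile device.
    print(" ".join(cautious_nmap_args(["10.10.10.1-50"], ["10.10.10.20"])))
```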
Nessus Vulnerability Scanner
Can be dangerous to ICSs.
Plugin modules for various ICS protocols.
Security auditing tool consists of two parts
Server (in charge of the scanning process).
Client (presents the interface to the user).
Nessus ICS Plugins
Areva/Alstom Energy Management System
DNP3 Binary Inputs Access
DNP3 Link Layer Addressing
DNP3 Unsolicited Messaging
ICCP/COTP Protocol
ICCP/COTP TSAP Addressing
LiveData ICCP Server
Matrikon OPC Explorer
Matrikon OPC Server for ControlLogix
Matrikon OPC Server for Modbus
Modbus/TCP Coil Access
Modbus/TCP Discrete Input Access
Modbus/TCP Programming Function Code Access
Network Defense, Detection and Analysis
Identify
Asset and Information inventory
An asset inventory is necessary to understand and manage ICS risk and to determine priorities for security defenses. The asset inventory is critical for understanding the potential impact of an intrusion.
Know your environment
What?
Needs to be protected (PLC, pump, valves, non-electronics still something physical - how it is protected?)
Protection levels are available (What is available by vendors to protect the systems). How data is gathered from the ground-up?
Inter-connections and dependencies are required (what talks to what? For example, a pump talking to a PLC that controls pump speed or flow; if that link is lost, could something fail?)
Why?
Are systems critical (any special use, any special vendor?)
Are assets valuable ($$ and information)(produce gas or oil, electricity?)(Does the information provide insights to business to make decisions?)
Who?
Has responsibility for the asset (Who’s responsible System Admin, SPOC (single point of contact))
How?
Are worst-case scenarios identified if compromised (Do we have any plans in place in terms of outside/inside attacker?)
Are methods available for user access to the asset (Does the person have to visit the control room to access the devices, or can they be accessed remotely or via VPN?)
Does the information flow through the system (where does it start and stop? Does it go to a firewall? To the business IT network?)
Other
Field Devices
Easy to forget in asset inventory - “out of sight, out of mind”.
Field devices may be accessed remotely because it is more convenient or may require that a human being physically visit the remote device. When accessing remotely make sure the communication is secure and the device accessing the field devices is secure.
Security Challenges regarding Field Devices.
No centralized management for older field devices.
May lack security capabilities (maybe serial only, make sure we understand what capabilities they have)
Increased use of portable devices to access field devices (Laptops/Tablets?).
Possible Mitigations
Lock down unneeded services, ports and restrict access (Disable unused ports on the switch).
All devices used to interface with the field devices should be secured and monitored (have anti-virus and properly logged and accounted for).
Think about what devices are present and how they are communicating with central system and how they are controlled?
Least Functionality
Determine necessary ports, protocols, and services (What are the vendor recommendations/talk to the vendors what needs to be open on firewall/router)
Deny all others at the host and firewall
Harden devices (be careful while hardening and test whether everything is working or not; Never test on live system.)
Network access control (What can talk to what or each others? )
Use the data from a scan, such as Nmap, to identify unused ports and services, and turn them all off. This should be done at the host. However, if it cannot be done at the host, use other mitigations, such as a firewall, to block any access to the services or any traffic leaving these hosts on these ports.
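Comparing what a scan observed with what each host is supposed to offer can be sketched as a set difference. Both inputs here are hypothetical dictionaries mapping a host to its ports: one built from scan results, the other from vendor guidance on which ports are required.

```python
def unexpected_services(observed, required):
    """Return {host: [ports]} that are open but not on the required list."""
    findings = {}
    for host, ports in observed.items():
        extra = sorted(set(ports) - set(required.get(host, ())))
        if extra:
            findings[host] = extra
    return findings


if __name__ == "__main__":
    observed = {"10.10.10.20": [80, 502, 21], "10.10.10.30": [44818]}
    required = {"10.10.10.20": [502], "10.10.10.30": [44818]}
    # Candidates to disable at the host, or to block at the firewall.
    print(unexpected_services(observed, required))
```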
Hardening systems using security guidelines or controls will also reduce your attack surface. Work with vendors to determine hardening guidelines/settings for ICS equipment
Least Privileges
Establish user accounts for administrators (separate accounts for engineers and administrators; test that they are able to do their work and perform their responsibilities)
Appropriate use of the escalated privilege function (Check whether the user needs escalated privileges, that their use is logged properly, and that they are used only when really required).
Review work requirements for necessary access requirements
Role-based access (provide appropriate access for appropriate person).
Tools
GrassMarlin (Retired)
GrassMarlin can be used to identify traffic and systems on ICS network.
GrassMarlin is a passive network mapper dedicated to ICS and SCADA networks in support of network security assessments.
GrassMarlin passively maps, and visually displays, an ICS/SCADA network topology while safely conducting device discovery, accounting, and reporting on these critical cyber‐physical systems.
GrassMarlin gives a snapshot of the ICS network including:
Devices part of the network;
Communications between these devices;
Metadata extracted from these communications.
Reads in Zeek Connection logs, PCAP files and PCAP-NG files or can listen on the wire
Protect
IT-OT Convergence
Does IT/OT talk to each other? (They should be able to work together and help each other and whenever they have problems they talk to each other and solve problems by respecting each other.)
What we can do to improve communication between IT and OT teams (invite them to meetings, talk to them regarding something they are expert in and can help (firewall issues))
Human element
Policies and Procedures specific to ICS
Outline rules with regard to securing ICS (What kind of things we need to secure?)
Computer use policy (helps to understand what’s expected and what’s not)
Make security a priority (everyone should be aware of the ICS security)
Training and awareness
Employees are part of your defense (They are the most important people. Employee errors or unintentional actions often lead to up to 50% of incidents).
See something, say something (If they see something that is not right, ask them to mention)
Talk about security in staff meetings (something going on in your network, group or unit and training around security)
Lessons learnt from past incidents
User education is important.
Do regular phishing tests (As an OT person, we can take help of IT department to set this up.).
Explain to users the consequences of clicking bad links (Usually people often don’t understand why it is bad to click on links, if they understand they are more careful.)
OPSEC
Operational Security, or OPSEC, is when we protect unclassified information from leaking out via our own actions and behaviors. The goal of Cybersecurity OPSEC is to minimize your digital footprint /information leakage and to minimize the damage when things go bad. In the best of scenarios you might almost drop off the grid completely. Remember that OPSEC does not replace any other security disciplines - it supplements them.
Always be aware of what your company is presenting to the outside world (what your network looks from outside? Do we have FTP/SSH server accessible from internet? )
Do you know what is on your company’s external webpage and social media feeds?
Are vendors using your company for free advertising?
Are your IP address ranges showing up in Shodan ICS? If you give data to vendors, do you know how they are storing it?
The OPSEC process is categorized into 5 questions/steps. One of the first questions is: who would want access to the data in question, and what needs to be protected?
The OPSEC process
Cybersecurity practices can prevent the disclosure of critical information to threat actors. A primary security goal is to control information about your organization’s capabilities and intentions in order to prevent such information from being exploited. The longer it takes an adversary to obtain critical information, the more time you have to discover problems and block access to the information and your assets. In addition, most of us already use cybersecurity practices in our personal lives without even realizing it.
Cybersecurity practices include:
identifying critical information,
analyzing the threat,
analyzing the vulnerabilities,
assessing risk,
and applying countermeasures. In all steps, view the situation from both friendly and adversarial points of view.
Practicing cybersecurity is a continuous process, not one that “ends” when you complete the fifth step. In fact, the steps do not necessarily have to be followed in a particular order.
Identify Critical Information
What needs to be protected?
What might adversaries want to do? What information will the adversaries need to accomplish their goal? (Be sure to analyse this from both friendly and adversarial points of view.) It is the aggregation of information that can be gathered on a target that poses the threat.
A company’s critical information could include
Network Diagrams
Employee Data - email addresses and work schedule
List of usernames
Social media profiles are often analysed to aggregate information. Profiles are gold mines of information for attackers. They provide an idea of
What people do?
Where they work?
What type of software are used?
Any issues that can be replicated in corporate environments.
Profile pictures are also useful when gathering information.
Before you post comments or share content on support forums and social media, ask yourself: “Does this give an attacker any information they could use to build (or further build) a profile on me or my company?”
Critical Information ICS Examples
Industrial Process Information
Proprietary Information
Patent details
Results of testing data
Can be used for phone introductions and to craft phishing emails.
5 Rules of Thumb
01 Protect sensitive information.
02 Determine what information may be considered sensitive.
03 Consider the who and why.
04 Research yourself.
05 How can this information cause harm? Bring up your concerns to the right people.
Analyse the Threat
Who is the threat?
A threat is a potential danger. It is often defined as any person, circumstance, or event with the potential to cause loss or damage. Threat requires both intent and capability. If one of these isn’t present, there is no threat.
To analyse a threat, we need to identify
Who are the potential adversaries (e.g., competitors, insiders, terrorists)?
What is the adversary’s intent and what capabilities do they have? For example, a disgruntled employee might have different capabilities than a competitor.
What does the adversary already know? For example, what might they know from researching information published on the Internet or in trade journals.
What does the adversary need to know to succeed (e.g., control system commands, how to gain remote or physical access)?
Where is the adversary likely to look to obtain the information (remember, an adversary is apt to go to more than one source)?
Thinking from the adversary’s point of view will help you analyze the threats in your work environment.
Three categories of threat:
natural (fire, flood, tornado, etc.),
unintentional (a power outage caused by a car accident, accidental deletion of important data by an employee),
intentional (insider threats, disgruntled employee, malicious cyber attack)
Some examples of collecting information are observing our actions to detect patterns and predict behavior, using the internet to collect data from social media sites (web pages, blogs, chat groups), going through our trash for sensitive documents accidentally thrown out, eavesdropping on conversations in public, and social engineering sensitive information via forums or support pages.
Analyse the vulnerabilities
What are my vulnerabilities?
Determine the weaknesses (that is, vulnerabilities) that may be exploited by an adversary to gain critical information. Vulnerabilities include:
Inadequate training of employees
Use of unsecured communications
Publishing the control system manufacturer or vendor used
Systems designed without security in mind.
It is important to think like the adversary in this step. One way to discover vulnerabilities is to look for indicators.
Indicators are observable or detectable activities or information that, when looked at by themselves or in conjunction with something else, point to a vulnerability regarding your organization’s operations. For an adversary, indicators are clues that a vulnerability exists and can be exploited.
For example, a fence suddenly put up where one did not exist before could tip off an adversary that something valuable is inside the fence.
Other examples of indicators include: people in unusual places, unfamiliar cars in an employee parking lot, and late-night meetings.
Although indicators are not vulnerabilities by themselves, they can point to or reveal vulnerabilities.
Vulnerabilities can be found in many different places, such as the physical environment of the work area, policies and procedures, or internet use. An adversary can detect a vulnerability through observation: just watching, for example, the security procedures we follow as we enter our building. But the biggest vulnerability is us, people, in what we say and do. Adversaries can and will exploit the human element to circumvent security: “hacking” the human instead of the machine.
Assess Risk
What is the threat level?
Assessing risk involves using the risk formula and conducting risk assessments.
Risk is the likelihood that an adversary will gather and exploit your critical information, thereby having a negative impact on your organisation.
Risk is the product of Threat x Vulnerability x Consequence
Threat: Any person, circumstance or event with the potential to cause loss or damage.
Vulnerability: Any weakness that can be exploited by an adversary or by accident.
Consequence: The negative impact (loss or damage) your organisation would incur if an attack were successful.
Risk increases when any factor increases. If a factor is missing, risk doesn’t exist because zero multiplied by anything is always zero.
For example, if we are certain that no person or organisation is interested in causing our company harm, then there is no threat and therefore no risk (this situation is highly unlikely, because there are always people who act maliciously just because they can).
Similarly, if your network and protection devices (e.g. firewalls) are properly patched with all the latest updates, the vulnerability (and the associated risk) in this area may be greatly reduced.
Finally, if a threat and a vulnerability exist but the consequences are nonexistent or minimal, then the risk is also nonexistent or minimal.
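The risk formula above can be sketched in a few lines of Python. This is only an illustration: rating each factor on a 0.0 to 1.0 scale is an assumption (the notes do not prescribe a scale), and the scenario names and ratings are made up.

```python
def risk_score(threat, vulnerability, consequence):
    """Risk = Threat x Vulnerability x Consequence.

    Assumption: each factor is rated on a 0.0 (absent) to 1.0 (maximum) scale.
    """
    factors = {"threat": threat, "vulnerability": vulnerability, "consequence": consequence}
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return threat * vulnerability * consequence


# Hypothetical ratings (threat, vulnerability, consequence) for three scenarios.
scenarios = {
    "unpatched HMI reachable from the corporate LAN": (0.8, 0.9, 0.9),
    "same HMI behind a patched firewall": (0.8, 0.2, 0.9),
    "no credible threat": (0.0, 0.9, 0.9),
}

# Rank scenarios from highest to lowest risk.
ranked = sorted(scenarios.items(), key=lambda item: risk_score(*item[1]), reverse=True)
for name, ratings in ranked:
    print(f"{risk_score(*ratings):.3f}  {name}")
```

Because the factors are multiplied, a factor rated zero removes the risk entirely, and lowering any single factor lowers the overall score.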
Risk assessment is a process in which you decide if a countermeasure needs to be assigned to a vulnerability based on the level of risk this vulnerability poses to your organization.
When you assess a vulnerability,
also consider the adversary’s intent and capability—is the adversary willing to exploit your vulnerability, and does he or she have the means to do so?
Next, determine the consequences if the vulnerability were successfully exploited. This determines the level of risk.
You then decide if the level of risk warrants the application of one or more countermeasures.
Looking at risk as a function of consequence (as opposed to asset value) may allow for easier calculations applicable to control system environments. Elements critical to the control domain, such as loss of life, time to recover, and environmental impact, can help in these calculations.
Keep in mind that consequences aren’t always something that have an immediate financial impact. The failure of a control system could result in negative media attention.
In cybersecurity, a risk is the potential for loss or damage caused by a threat exploiting a vulnerability. More precisely, risk is a function of vulnerability, threat, and consequence (or impact). If we draw vulnerability, threat, and consequence as a Venn diagram, risk is the intersection of the three. If you are certain that no person or organization is interested in causing your company harm, then there is no threat and therefore no risk (this situation is highly unlikely, because there are always people who act maliciously just because they can). Similarly, if your network and its protection devices (firewalls for example) are properly patched with all the latest updates, the vulnerability (and the associated risk) in this area may be greatly reduced. If vulnerability increases, the risk increases as well.
Apply countermeasures
How should we combat the threats?
A countermeasure can be anything that reduces an adversary’s ability to exploit vulnerabilities. Countermeasures don’t need to be complicated or expensive. For example, locking your car door and removing the keys from the ignition are simple, smart ways to make it harder for someone to steal your car.
Countermeasures are implemented in an order of priority directly proportionate to the risk posed by different weaknesses (the most significant consequences to your mission, operation, or activity). Often implementing several low-cost countermeasures provides the best overall protection.
Consider all possible countermeasures, and then assess the potential effectiveness of each one against a specific vulnerability or multiple vulnerabilities.
A few countermeasures:
Controlling Distribution: Limiting sharing of information to those who need it.
Cyber Protection Tools: Implementing anti-virus software, firewalls, and intrusion detection systems can greatly reduce an adversary’s ability to cause damage.
Speed of Execution: Accelerating the schedule can limit the ability of an adversary to act on the information they have obtained.
Awareness Training: Educating employees about all aspects of cybersecurity practices is one of the most effective countermeasures.
Physical Security: While it may be wise to employ security guard patrols, an organization must also ensure that patrol schedules are somewhat randomized, and shift changes are kept secret in order to prevent an intruder from determining a pattern.
An OPSEC program should maintain a generic list of countermeasures that are available for utilization. Some ideas may include changes in procedure, controlling distribution, cyber protection tools, awareness training, and physical security. Countermeasures are designed by cost, timing, and feasibility to address an acceptable amount of risk. Simplicity, straightforwardness, and inexpensiveness are key to the most effective countermeasure solutions.
Secure Passwords
Adversaries focus on gaining legitimate credentials to traverse the network
NIST SP 800-63B Guidelines (Digital Identity Guidelines - Authentication and Lifecycle Management)
Fewer complexity rules enforced
Expiration of passwords is no longer based on a time schedule (if passwords are good and strong, there may be no need to change them periodically)
Passwords should be screened against dictionary lists and lists of common, easily guessed passwords (tell employees that you will try to guess and crack their passwords, and they will create stronger ones)
Allow paste functionality from password managers (and store your passwords in a safe, secure location)
Industry compliance documents or your organisation policies may differ.
NERC CIP standards (CIP-007-5)
NIST SP 800-53
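The screening guideline above (check candidate passwords against lists of common passwords instead of enforcing composition rules) can be sketched as follows. This is a minimal sketch: the blocklist is a tiny placeholder, the function name is hypothetical, and the 8-character minimum is the SP 800-63B floor for user-chosen passwords; your own policy may require more.

```python
# Placeholder blocklist; a real deployment would load a large list of
# breached and commonly used passwords.
COMMON_PASSWORDS = {"password", "123456", "qwerty", "letmein", "welcome1"}

MIN_LENGTH = 8  # SP 800-63B minimum length for user-chosen passwords


def screen_password(candidate, blocklist=COMMON_PASSWORDS):
    """Return the reasons a candidate is rejected (an empty list means accepted)."""
    problems = []
    if len(candidate) < MIN_LENGTH:
        problems.append(f"shorter than {MIN_LENGTH} characters")
    if candidate.lower() in blocklist:
        problems.append("appears in the common-password list")
    return problems


print(screen_password("qwerty"))
print(screen_password("$Mint$Chip$Vinyl$Tacos$90277"))
```

Note that the check has no uppercase, digit, or symbol rules: length and absence from the blocklist are the only criteria, which matches the "fewer complexity rules" guidance.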
Base password - Think of three of your favorite things.
For example: Let’s say we love ice cream, tacos, and vinyl records, so that gives us MintChipVinylTacos
Now separate each word with your favorite character. Let’s say we love money, so separate the words with a dollar or pound sign.
$Mint$Chip$Vinyl$Tacos$
Now, add a familiar number, like your postal code or your birth year reversed:
$Mint$Chip$Vinyl$Tacos$90277
Now, the above password meets the usual complexity requirements: uppercase, lowercase, numbers, and special characters.
The above can be your base password and that’s a pretty strong password.
Humans usually want the path of least resistance, so it will be tempting to use this new, awesome password for all of your accounts. Don’t do this!
Make them special!
Now you can make each password unique with a special identifier for each sensitive site.
For Facebook, for example, add FB at the beginning or the end, or even split it up.
There is a risk if an attacker obtains one of your passwords and works out your pattern, but overall this is a good way to have long, unique passwords.
Vendor Access
Vendor connections to the ICS Network
One of the most common ways malware and viruses are introduced into ICS environments is the use of media that has been shared or used on systems outside the production environment.
To mitigate that risk consider implementing the following:
Implement a dedicated workstation, not connected to the ICS network and kept up to date with the latest virus and malware definitions, to transfer files and patches to trusted devices.
Do not allow vendor or third-party USB devices in the ICS environment (we have no idea whose USB device it is, where it has been, or what it contains).
Have a device whitelisting application or ability to disable media ports.
Provide security policies to govern use.
Configure your removable media policy to notify your security team whenever someone attempts to use a USB port or unapproved media.
Removable Media
If possible, do not allow personal devices to be used in the ICS network (for example, people charging their phones on the ICS network, or malicious USB devices such as the USB Rubber Ducky or O.MG Cable).
If this is not possible, provide good security policies to manage the use of personal devices, and use company resources to help implement the policies.
Enterprise device management technology can help ensure that only approved assets can be attached to ICS networks and computers.
Lessons learnt from past incidents
Good network segmentation can prevent malware call backs.
Monitor USB usage, especially in the ICS environment (keep an inventory of allowed USB devices: who has them and what they are used for).
Secure Authentication
Multi-factor Authentication
An increasing number of organizations are implementing multi-factor authentication to add a layer of protection (defense-in-depth) to security. By requiring a second authentication method in addition to the standard user name/password method, organizations implement a powerful countermeasure.
Definition:
What the user knows (password), what the user has (security token), and/or what the user is (biometric validation).
Something you know (password or PIN) + something you have (such as access token or security token) + something you are (such as fingerprints or retinal scan)
Single factor authentication increases the attack surface.
Use multi-factor authentication for remote access and critical administrative access.
Can be used with VPN, network device access, administrator access to systems.
Example: Many asset owners use single-factor authentication for remote access. If a user has a vulnerable machine, the attack surface is greatly increased.
Secure VPN access
Limit VPN access to business requirements - vendors, technicians, integrators (who has access to what? If providing access to vendor, terminate VPN as close to edge as possible and provide access to only required systems/segmented network/DMZ. Good idea to define that in vendor contract agreements)
Require company issued and configured systems be used without Admin access (No admin access provided until and unless really required).
If they require admin access or access to a particular resource, work with them to figure out how we can provide that securely. Otherwise, technical users will always figure out a way to achieve it which might result in undocumented access.
VPN security policy should check for patches, a personal firewall, and an antivirus product.
Utilise a jump-box, or a virtual desktop for further network access.
Utilise a second domain controller (Have a separate IT/OT domain controller)
VPN Logs
A VPN appliance provides a wealth of logging information regarding the perimeter of your network. This information can be used to monitor the health of the system and potentially detect malicious activity. It is important to:
Find unusual login attempts: Look for unusual situations, such as the company President logging in from a Starbucks in England, when the President is actually in the middle of a safari in Africa.
Monitor failed authentication attempts: All devices or processes that require identity authentication should log and/or alert when an identity validation attempt fails.
Monitor successful authentication attempts from different sources: If available, all devices or processes should log and/or alert when the same user logs in simultaneously from two different source locations.
Monitor successful authentication under duress: For critical systems, consider deploying an authentication mechanism that supports duress codes. This allows a user under duress to log into a system using a secondary credential, but alerts that the access was performed under duress.
Monitor failed access attempts: All devices or processes that manage access control to communications, data, or services should log and/or alert when access is requested that is not allowed.
Monitor successful access attempts: All devices or processes that manage access control to communications, data, or services should log when access is requested and allowed.
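Two of the checks above (failed authentication attempts, and the same user logging in from two different sources) can be sketched against a list of parsed log events. This is a sketch under assumptions: the event fields (`user`, `source`, `time`, `success`) are hypothetical and depend on your VPN appliance's log format, and "simultaneous" is approximated by a time window because the sample data carries no logout events.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def failed_login_alerts(events, threshold=3):
    """Users with at least `threshold` failed authentication attempts."""
    failures = defaultdict(int)
    for event in events:
        if not event["success"]:
            failures[event["user"]] += 1
    return sorted(user for user, count in failures.items() if count >= threshold)


def multi_source_alerts(events, window=timedelta(minutes=30)):
    """Users who authenticated successfully from two different sources within `window`."""
    alerts = set()
    history = defaultdict(list)  # user -> [(time, source), ...]
    for event in sorted(events, key=lambda e: e["time"]):
        if not event["success"]:
            continue
        for seen_time, seen_source in history[event["user"]]:
            if seen_source != event["source"] and event["time"] - seen_time <= window:
                alerts.add(event["user"])
        history[event["user"]].append((event["time"], event["source"]))
    return sorted(alerts)


# Hypothetical sample events (documentation-range IP addresses).
t0 = datetime(2024, 1, 1, 9, 0)
events = [
    {"user": "alice", "source": "198.51.100.7", "time": t0, "success": True},
    {"user": "alice", "source": "203.0.113.9", "time": t0 + timedelta(minutes=5), "success": True},
    {"user": "bob", "source": "198.51.100.20", "time": t0, "success": False},
    {"user": "bob", "source": "198.51.100.20", "time": t0 + timedelta(minutes=1), "success": False},
    {"user": "bob", "source": "198.51.100.20", "time": t0 + timedelta(minutes=2), "success": False},
    {"user": "carol", "source": "198.51.100.30", "time": t0, "success": True},
]

print("Repeated failures:", failed_login_alerts(events))
print("Multiple sources:", multi_source_alerts(events))
```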
Lessons Learned
Virtual Machine Use Case
Incident: A VM was configured in an ICS environment with the VM hardware (VMware/hardware machine) located in the ICS DMZ. The management interface provided direct connectivity to the corporate network for ease of use. Further, ICS servers in the VM bridged the DMZ firewall to the ICS network.
Lesson: The VM management interface located in the ICS DMZ bridged protected corporate communications into the DMZ. Follow VMware security guidance when setting up VMware systems.
VPN/Password Use case
Incident: A user had a VPN connection and was logged in as administrator. The user’s home PC was dual homed with VPN client and a public interface.
Lesson: Proper configuration of VPN client. Limit VPN access to business requirements. Do not allow users to run as admin.
ICS Network segmentation
The Purdue Enterprise Reference Architecture (PERA) Model is suggested by the DHS Assessment Team as a best practice for segmenting networks.
The PERA model segments industrial control devices into hierarchical “levels” of operations within a facility. Using levels as common terminology breaks down and determines plant wide information flow. Zones establish domains of trust for security access and smaller LANs to shape and manage network traffic.
This model groups levels into the following zones for specific functions:
Enterprise Zone: Levels 4 and 5 handle IT networks, business applications/servers (e.g. email, enterprise resource planning - ERP) as well as intranet.
ICS Demilitarized Zone (IDMZ): This buffer zone provides a barrier between the ICS and Enterprise Zones but allows for data and services to be shared securely. All network traffic from either side of the IDMZ terminates in the IDMZ. No traffic traverses the IDMZ. That is, no traffic directly travels between the Enterprise and ICS Zones.
ICS Zone: Level 3 addresses plant wide applications (e.g., historian, asset management, authentication, patch management), consisting of multiple Cell/Area Zones.
Cell/Area Zone: Levels 0, 1 and 2 manage industrial control devices (e.g., controllers, drives, I/O and HMI) and multi-disciplined control applications (e.g., drive, batch, continuous process, and discrete).
Typical Flat network
Poor asset inventory
Poor boundary protection (HMIs directly connected to the Internet)
Poorly Secured Remote Access
Recommended Secure Network Architecture
Good Asset Inventory and Data flows (How does data flow and what data flow is important/critical (what must always be available))
Good Boundary Protection
Secured Remote monitoring and Access
Isolation of Safety Instrumented Systems (How are safety systems implemented?)
Firewall Implementation
The firewalls are placed at the front line of defense for each of the various zones. These firewalls provide the trusted path for users and applications to communicate with and between all of the various pieces.
There are two complementary principles for segmenting networks.
The first principle includes the general functions of a system:
Serve external customers
Handle facility environmental controls
Support IT
Process HR data
Run/supervise ICS process data
Run/Supervise ICS
The second principle is trust level.
What is the sensitivity of the data/system/data path?
Segmentation should be implemented using firewalls or at least routers with access control lists (ACLs). Some considerations for firewalls:
Know your environment
How does data flow?
How is data used? (What does that data mean?)
Who uses the data? (Who is the owner of the data? Mostly the historian, from an ICS perspective)
Newer next-generation firewalls support multiple ICS protocols/standards.
Trade off efficiency vs. security vs. cost (every device can help or hinder efficiency, and each has a cost)
Often erroneously deployed as the cornerstone of the architecture (a firewall deployment requires months of planning and architecture work)
Firewall Rules
Without rules, a firewall is basically a router.
Block direct traffic from the control network to the corporate network. All ICS traffic should end at the DMZ.
Every protocol permitted between the control network and the DMZ should be explicitly denied between the DMZ and corporate networks (and vice versa).
ICS networks should not be connected directly to the Internet, even if they are protected by a firewall.
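The first two rules above can be checked mechanically against an exported rule set. The sketch below is an assumption-heavy illustration: the zone names, the `(src_zone, dst_zone, protocol, action)` tuple format, and the function name are hypothetical, and it flags a protocol that is merely allowed on both sides of the DMZ instead of verifying an explicit deny.

```python
def audit_rules(rules):
    """Return findings for a list of (src_zone, dst_zone, protocol, action) rules."""
    findings = []
    allowed = [(src, dst, proto) for src, dst, proto, action in rules if action == "allow"]

    # Check 1: no direct traffic between the control and corporate networks.
    for src, dst, proto in allowed:
        if {src, dst} == {"control", "corporate"}:
            findings.append(f"direct {src}->{dst} traffic allowed for {proto}")

    # Check 2: a protocol allowed between control and DMZ must not also be
    # allowed between DMZ and corporate.
    control_dmz = {proto for src, dst, proto in allowed if {src, dst} == {"control", "dmz"}}
    dmz_corporate = {proto for src, dst, proto in allowed if {src, dst} == {"dmz", "corporate"}}
    for proto in sorted(control_dmz & dmz_corporate):
        findings.append(f"protocol {proto} is allowed on both sides of the DMZ")

    return findings


# Hypothetical rule set with two problems.
rules = [
    ("control", "dmz", "opc-ua", "allow"),
    ("dmz", "corporate", "https", "allow"),
    ("control", "corporate", "modbus", "allow"),
    ("corporate", "dmz", "opc-ua", "allow"),
]

for finding in audit_rules(rules):
    print(finding)
```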
Firewall Logs
Firewall logs provide insights into security threats and traffic behaviour regarding the perimeter of your network. This information can be used to monitor the health of the system and potentially detect malicious activity. It is important to:
Identify traffic denied at the firewall - e.g. traffic from inside the network that is bouncing off the firewall (what traffic is trying to get out?)
Identify traffic allowed at the firewall
Identify multiple connections from multiple devices in your network to a few target locations
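As a rough illustration of the first and last items, the sketch below counts denied outbound traffic per internal host and lists external destinations contacted by several internal hosts. The `(action, src_ip, dst_ip)` entry format, the `10.0.0.0/8` internal range, and the function names are assumptions; real firewall logs need parsing first.

```python
import ipaddress
from collections import Counter, defaultdict

INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")  # assumption: internal address range


def is_internal(addr):
    return ipaddress.ip_address(addr) in INTERNAL_NET


def summarize_firewall_log(entries, min_sources=3):
    """Summarize outbound traffic from (action, src_ip, dst_ip) log entries.

    Returns (denied_outbound, fan_in):
      denied_outbound: Counter of internal hosts whose outbound traffic was denied
      fan_in: external destinations contacted by at least `min_sources` internal hosts
    """
    denied_outbound = Counter()
    sources_by_dest = defaultdict(set)
    for action, src, dst in entries:
        if not is_internal(src) or is_internal(dst):
            continue  # only internal-to-external traffic is examined here
        if action == "deny":
            denied_outbound[src] += 1
        elif action == "allow":
            sources_by_dest[dst].add(src)
    fan_in = {
        dst: sorted(srcs)
        for dst, srcs in sources_by_dest.items()
        if len(srcs) >= min_sources
    }
    return denied_outbound, fan_in


# Hypothetical log entries.
entries = [
    ("deny", "10.0.1.5", "203.0.113.50"),
    ("deny", "10.0.1.5", "203.0.113.51"),
    ("allow", "10.0.1.6", "198.51.100.9"),
    ("allow", "10.0.1.7", "198.51.100.9"),
    ("allow", "10.0.1.8", "198.51.100.9"),
    ("allow", "10.0.1.6", "10.0.2.1"),
]

denied, fan_in = summarize_firewall_log(entries)
print("Denied outbound:", dict(denied))
print("Many-to-few destinations:", fan_in)
```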
Data Diode
A data diode is a unidirectional gateway intended to move data from a more secure network to a less secure network.
A data diode creates a physically secure, one-way communication channel from the control system network to the corporate network. Data diodes can be implemented in hardware, software, or a combination of both. The hardware implementation is the most secure because it is physically impossible to send any messages in the reverse direction.
Data Diode vs. Firewalls
Data Diodes
Behaves like a Proxy Server: converts TCP sessions to UDP
Uni-directional communication: reverse tunneling not possible
May cost more than some firewalls
Fewer rules: rules require less auditing
Transmits only the data: no connection between systems.
Firewalls
Two-way communications: tunneling possible.
Rules require more auditing due to complexity of rule set
Cannot enforce true one-way communication. (A data diode's UDP-based transfer, by contrast, is one way and nothing but one way.)
Patch Management
BEFORE PATCHING ANY ICS/OT SYSTEM (PLC/RTU/HMI) ENSURE YOU HAVE A GOOD BAREMETAL BACKUP OR ABILITY TO RESTORE THE SYSTEM TO THE CURRENT STATE!
Patches are intended to:
Fix known vulnerabilities.
Enhance functionality
Software that needs patching includes
Operating System
ICS Application/hardware
Third-party applications
Patch deployment considerations
Test and validate
Offline systems vs. live systems
Work with vendors for patch applicability.
Patching Considerations
Considerations when deciding to patch systems:
How critical is each system to production?
What complications arise in patching critical infrastructure?
What is the cost of a patch?
What is the cost of not applying a patch?
What is the business/security driver in patching?
Do you have a mitigating control in place if you decide patching is not an option?
Potential Patch Complications
Patching can break other software components
Patching can break 3rd party software components
Updating antivirus definitions can inadvertently stop legitimate processes
Sandbox systems are not used directly for production
Balance between waiting to test the patch and applying it before it is fully tested
Systems remain vulnerable until they are patched, or mitigating controls are implemented.
Application whitelisting
Advantages
Blocks most current malware
Prevents use of unauthorized applications (have good software inventory. Process environment is very predictable)
Does not require daily definitions updates
Administrator installation and approval of new applications.
Limitations
Approved applications - compromised in supply chain.
Malware that exploits applications that run in higher-level execution environments, such as Java, may not be found.
Disadvantages
Requires performance overhead
Requires regular maintenance
Causes some users to be annoyed
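One common way to implement application whitelisting is to compare a file's cryptographic hash against a list of approved hashes. The sketch below illustrates only that idea; it is not how any particular whitelisting product works, and the function names and allowlist format are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_allowed(path, allowlist):
    """True if the file's hash is in the set of approved SHA-256 hashes."""
    return sha256_of(path) in allowlist
```

A hash-based check also shows why maintenance is needed: any legitimate update changes the file's hash, so the allowlist must be updated along with the software. It also cannot help when an approved binary was already compromised in the supply chain, since the compromised file is the one that was approved.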
Detect
Identify a cybersecurity event
Intrusion Detection System
ICS environments provide a unique opportunity. Compared to a corporate environment, an ICS environment is a steady state. Once again, you must know your environment. Ask and answer the following questions:
WHAT is normal? (Is this documented?)
You know that host “A” talks to host “B,” but not host “C”…
WHEN does “normal” become abnormal? (indicators that something might be going on?)
Host “A” is now talking to host “C”…WHY?
WHOSE applications and services are on your critical networks?
WHICH protocols are used?
Known IT protocols (DNS traffic, HTTP traffic)
Vendor (Proprietary traffic)
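Because an ICS network is close to a steady state, the "host A talks to host B, but not host C" question can be answered by comparing observed conversations with a documented baseline. A minimal sketch, assuming the baseline is kept as a set of hypothetical `(source, destination)` pairs:

```python
# Hypothetical documented baseline of normal conversations.
BASELINE = {("A", "B"), ("B", "A")}


def unexpected_conversations(observed, baseline=BASELINE):
    """Return observed (src, dst) pairs that are not in the documented baseline."""
    return sorted(set(observed) - set(baseline))


# Host A has started talking to host C.
observed = [("A", "B"), ("B", "A"), ("A", "C"), ("A", "C")]
print(unexpected_conversations(observed))
```

The same comparison can be extended with ports and protocols once those are documented as part of "normal".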
IDS Types
Host: Sensors reside on the host system
Network: What traffic is on your network?
Application: Web application firewall, database firewall, application protocol IDS.
Log: What is happening at the OS level? or at the application level?
Paper: Who came in?
Anomaly: Any combination of the above.
All methods of intrusion detection involve the gathering and analysis of information from various sources within a computer, network, and enterprise to identify possible threats posed by hackers inside or outside the organization.
IDS/IPS Functions
An IDS is not a cure‐all for network security problems. It is an alerting tool to let you know something has happened. An IDS can:
Provide forewarning
Provide forensics data
Provide “situational awareness”
Provide network troubleshooting
Identify policy abuse.
Placing an IDS outside of the firewall can be helpful for situational awareness and forewarning of activities. The IDS can detect scanning or other precursory attack activities that might be dropped by the firewall. An IDS cannot:
Tell you directly if the system was exploited
Monitor actions taken by the system console
Perform analysis of an event (this requires a human analyst).
HIDS
Host-based intrusion detection (HIDS) refers to intrusion detection that takes place on a single host system. HIDS involves installing an agent on the local host that monitors and reports on the system configuration and application activity. Some common abilities of HIDS systems include:
Provides the “victim’s” view
Virus detection/mitigation
Local log analysis
File integrity checking
Policy monitoring
Rootkit detection
Network monitoring from the host viewpoint
Real-time alerting
Active response.
HIDS often have the ability to baseline a host system to detect variations in system configuration. In specific vendor implementations, these HIDS agents also allow connectivity to other security systems. This allows for central management of configuration policy and verification.
HIDS Deployment
HIDS tools are initially deployed in “monitor only” mode. This enables the administrator to create a baseline of the system configuration and activity. Active blocking of applications, system changes, and network activity is limited to only the most egregious activities. The policy can then be tuned based on what is considered “normal activity.” Once a policy is configured, it is then applied and distributed to the hosts. Benefits of central management architecture are:
Can be centrally managed with deployable policies.
Ability to apply changes to many systems at once
Create a “baseline” for known system types/use cases
Central authentication, alerting, and reporting
Central audit logging.
The main two concerns with using any HIDS in an ICS environment are:
Does the operating system even support the use of a HIDS?
Do the hosts have enough hardware capacity to support the HIDS (CPU, memory, network bandwidth, etc.)?
Network Intrusion Detection (NIDS)
A NIDS scans the traffic on its networks and looks for known patterns in the packets.
A NIDS can scan both sides of a conversation and can be reactive by blocking traffic when in IPS mode.
NIDS often does not know if the system is Windows, Linux, or a PLC. From a NIDS perspective traffic is traffic, and it simply reports on what traffic is seen on the network.
NIDS can have a high False-Positive or False-Negative rate based on the information used to generate the signatures.
NIDS are connected to the network via a SPAN/mirror port or a network tap.
When using a SPAN port, the switch sends a copy of all the network packets “seen” on one physical port (or an entire VLAN) to another physical port, where the packets can be captured and/or analyzed.
A network monitoring tap can be used to collect network packets without having to configure a SPAN port on a switch. Think of a tap as a special T‐connection that can read data from the network, but not inject any data of its own into the network traffic.
IDS Sensor Placement
The placement for IDS sensors is important.
Any change in trust zones should have an IDS/IPS deployed
A data diode should be attached to the historian. The IDS can also be deployed here
All points of presence for external communications should have an IDS/IPS deployed
An IDS on either side of firewalls allows you to audit your firewall rules.
NIDS Signature vs. Anomaly Detection
Signature | Anomaly |
---|---|
Ex. Snort, McAfee | |
Watches for specific events | Watches for changes in trends |
Only looks for what it has been told | Learns from gradual changes |
Can deal with any known threat | Can deal with unknowns, but any attack is subject to false negatives (it doesn’t know what attacks are, only that the traffic has changed) |
Unaware of network configuration changes | Sensitive to changes in network devices |
Highly objective inspection | Subjective, prone to misinterpretations |
Predictable behavior | Unpredictable behavior |
Easy to tune manually | |
NetFlow Anomaly Detection
NetFlow is a network protocol developed by Cisco Systems for collecting IP traffic information. NetFlow has become an industry standard for traffic monitoring and is supported by platforms other than Cisco. Routers and switches that have the NetFlow feature enabled produce UDP data streams that are sent to a NetFlow collector (server), where they can be processed and stored.
A flow describes a set of packets sharing these characteristics: source address (src), source port (sport), destination address (dst), destination port (dport), protocol, and type of service.
Flow data include: time, number of bytes, number of packets
Usually sent via UDP or Stream Control Transmission Protocol (SCTP)
Distributed Denial of Service: massive increase in flows
Trojan Horses: “well-known” or unexpected services
Firewall Policy Violation: unexpected inside/outside flow
Example Alerts for Anomaly Detection
Hosts scanning for services:
Are there external hosts poking at more than __ internal addresses?
Are there external hosts poking at more than __ ports on 1 (or more) internal hosts?
Internal infected host scanning/talking to external hosts:
Is some internal host poking at __ external hosts?
Is some internal host poking at __ internal hosts?
Is some internal host poking at dark space (un-allocated Internet address space)?
Internal hosts talking to “Interesting Net blocks” (pick your favorite countries here)
Are there pokes from __ net blocks that may be of interest?
Are there pokes to __ net blocks that may be of interest?
Increased network traffic:
Distributed Denial of Service (DDoS)
Unexpected high volume - Data mining, egress?
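The first two alert questions (an external host poking at more than __ internal addresses, or at more than __ ports on one internal host) can be sketched over a list of flow records. The `Flow` record, the `10.0.0.0/8` internal range, the function names, and the thresholds below are all assumptions for illustration; a real deployment would read records from a NetFlow collector.

```python
import ipaddress
from collections import defaultdict, namedtuple

# Hypothetical flow record using the key fields of a flow.
Flow = namedtuple("Flow", "src sport dst dport protocol")

INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")  # assumption: internal address range


def is_internal(addr):
    return ipaddress.ip_address(addr) in INTERNAL_NET


def address_sweeps(flows, max_hosts):
    """External sources that contacted more than `max_hosts` distinct internal addresses."""
    touched = defaultdict(set)
    for flow in flows:
        if not is_internal(flow.src) and is_internal(flow.dst):
            touched[flow.src].add(flow.dst)
    return {src: len(dsts) for src, dsts in touched.items() if len(dsts) > max_hosts}


def port_sweeps(flows, max_ports):
    """(external source, internal host) pairs with more than `max_ports` distinct ports probed."""
    probed = defaultdict(set)
    for flow in flows:
        if not is_internal(flow.src) and is_internal(flow.dst):
            probed[(flow.src, flow.dst)].add(flow.dport)
    return {pair: len(ports) for pair, ports in probed.items() if len(ports) > max_ports}
```

The thresholds fill in the blanks in the questions above and should be tuned to what is normal for the monitored network.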
Zeek IDS
Open-source
Allows scripting of monitoring policies
Collect logs for analysis (Non-standard ports, Connections, DNS, FTP, Files, HTTP requests, SSL, SMTP activity).
Analyzers for many protocols including Modbus and DNP3
Unexpected protocol level activity.
Logs can be used by several other security products.
IDS vs. IPS
IDS
Watching/ Passive alerting
IPS
Inline, Passive Alerting, Active Response
SNORT
Snort is an open-source network intrusion detection and prevention system. Snort is widely used and has become the standard for IDS/IPS.
Learning to write Snort rules is useful because most IDS/IPS applications will either use the Snort rule format or provide a way to import Snort rules.
If you are able to understand the data flow in your environment, you will be able to design simple anomalous traffic signatures quickly without regard to the actual details of the protocol used.
Snort rules are composed of a rule header and rule options. There are five types of rule options:
Metadata
Payload detection
Non-payload detection
Post-detection
Thresholding and suppression
We will focus on Metadata and payload detection
alert ip ![10.0.10.20,10.0.10.30] any <> [10.0.10.15] any (msg:"ALERT - Field Controller interacts with another node"; reference:url,mysite.org/rule1; reference:cve,2018-0000; sid:3000001; priority:1; rev:1;)
Field | Description |
---|---|
action | alert, log, pass, activate, dynamic, or a custom defined type |
protocol | ip, tcp, udp, icmp, any |
src ip and src port | See below |
direction | ->, <> direction of the traffic that the rule applies to |
dst ip and dst port | See below |
Msg | Used by analyst to quickly identify the signature |
Reference | Can use a predefined tag for a security web site or use “URL” to include any web site reference in the rules |
Sid | The signature ID is used by Snort to uniquely identify rules. We recommend using a number > 3,000,000 |
Priority | Allows the user to set the priority of the rule. Highest - 1, Lowest - 10 |
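To make the header/options split concrete, the sketch below assembles a rule string from the fields in the table. It is only a string-formatting helper with a hypothetical name (`build_rule`); it does not validate the result against Snort, and the sample addresses and SID are made up.

```python
def build_rule(action, protocol, src, sport, direction, dst, dport, options):
    """Assemble a Snort-style rule from header fields and (keyword, value) options.

    Options with a value of None are emitted as bare keywords (e.g. "dnp3_data;").
    String values that need quoting (such as msg) must be passed already quoted.
    """
    if direction not in ("->", "<>"):
        raise ValueError("direction must be '->' or '<>'")
    header = f"{action} {protocol} {src} {sport} {direction} {dst} {dport}"
    body = " ".join(
        f"{keyword};" if value is None else f"{keyword}:{value};"
        for keyword, value in options
    )
    return f"{header} ({body})"


# Hypothetical rule: alert when anything talks to TCP port 502 on a controller.
rule = build_rule(
    "alert", "tcp", "any", "any", "->", "10.0.10.15", "502",
    [("msg", '"Traffic to field controller"'), ("sid", 3000010), ("rev", 1)],
)
print(rule)
```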
Snort Preprocessors for ICS
A number of attacks cannot be detected by signature matching alone in the detection engine, so protocol “examine” preprocessors step up to the plate and detect suspicious activity. These preprocessors include packet fragmentation, TCP stateful inspection, portscans, and many other Network/Application protocol‐specific activities.
Others modify packets by normalizing traffic so that the detection engine can accurately match signatures. These preprocessors defeat attacks that attempt to evade Snort’s detection engine by manipulating traffic patterns.
Snort cycles packets through every preprocessor to discover attacks that require more than one preprocessor to detect them. If Snort simply quit checking for the suspicious attributes of a packet after it had set off a preprocessor alert, attackers could use this deficiency to hide traffic from Snort.
Preprocessor parameters are configured and tuned via the snort.conf file. The snort.conf file lets you add or remove preprocessors as you see fit. Of particular interest to the ICS community are the DNP3 and Modbus preprocessors.
ICS specific: DNP3/Modbus
Other useful preprocessors: SSH, SSL, Portscan, httpinspect
DNP3 Preprocessor Rule Options
dnp3_func: Matches Function Code inside an Application-Layer request/response header
dnp3_ind: Matches on the Internal Indicators flags in Application Response Header (Similar to TCP flags)
dnp3_obj: Matches on request or response object headers
dnp3_data: Reassembled Application-Layer Fragments.
DNP3 Preprocessor Examples
Here are some examples of the new DNP3 preprocessor rule options:
Alerts on DNP3 Write Request:
alert tcp any any -> any 20000 (msg:"DNP3 Write request"; dnp3_func:write; sid:3000001;)
Alerts on reserved_1 OR reserved_2 being set:
alert tcp any 20000 -> any any (msg:"Reserved DNP3 Indicator set"; dnp3_ind:reserved_1,reserved_2; sid:3000002;)
Alerts on Content in Re-assembled Application-Layer Fragment:
alert tcp any any -> any any (msg:"String 'badstuff' in DNP3 message"; dnp3_data; content:"badstuff"; sid:3000003;)
Notice that in the third rule, dnp3_data sets the content buffer to the beginning of the reassembled Application-Layer Fragment and then looks for the content “badstuff”.
Modbus Preprocessor Rule Options
modbus_func: Matches against the Function Code inside of a Modbus Application-Layer request/response header
modbus_unit: Matches against the Unit ID field in a Modbus header
modbus_data: Sets the cursor at the beginning of the Data field in Modbus request/response
Modbus Preprocessor Rule Examples
Alerts on specific Modbus function:
alert tcp any any -> any 502 (msg:"Modbus Write Coils request"; modbus_func:write_multiple_coils; sid:3000004;)
Alerts on unauthorized host:
var MODBUS_ADMIN 192.168.1.2
alert tcp !$MODBUS_ADMIN any -> any 502 (msg:"Modbus command to Unit 01 from unauthorized host"; modbus_unit:1; sid:3000005;)
Alerts on content in the Modbus data field:
alert tcp any any -> any any (msg:"String 'badstuff' in Modbus message"; modbus_data; content:"badstuff"; sid:3000006;)
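To show which bytes the modbus_unit, modbus_func, and modbus_data options refer to, the following Python sketch builds a Modbus/TCP Write Multiple Coils request (function code 15) and then pulls the same three fields back out. The function names are hypothetical and the frame is built by hand for illustration; it is not how Snort's preprocessor is implemented.

```python
import struct

WRITE_MULTIPLE_COILS = 0x0F  # Modbus function code 15


def build_write_coils_frame(transaction_id, unit_id, start_addr, coils):
    """Build a Modbus/TCP Write Multiple Coils request as raw bytes."""
    byte_count = (len(coils) + 7) // 8
    packed = bytearray(byte_count)
    for index, is_on in enumerate(coils):
        if is_on:
            packed[index // 8] |= 1 << (index % 8)  # coils are packed LSB first
    # PDU: function code, start address, coil count, byte count, coil bits
    pdu = struct.pack(">BHHB", WRITE_MULTIPLE_COILS, start_addr,
                      len(coils), byte_count) + bytes(packed)
    # MBAP header: transaction id, protocol id (0), length (unit id + PDU), unit id
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


def inspect_frame(frame):
    """Return the fields the modbus_* rule options look at."""
    _transaction_id, _protocol_id, _length, unit_id = struct.unpack(">HHHB", frame[:7])
    return {
        "unit": unit_id,    # compared by modbus_unit
        "func": frame[7],   # compared by modbus_func
        "data": frame[8:],  # where modbus_data places the cursor
    }


if __name__ == "__main__":
    frame = build_write_coils_frame(1, 1, 0, [True, False, True])
    print(frame.hex())
    print(inspect_frame(frame))
```

Sending a frame like this to TCP port 502 of a test device (never a production controller) is one way to check that the rules above fire as expected.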
Example Rule Variables
ipvar HOME_NET [1.2.3.0/24,10.0.10.0/24]
ipvar EXTERNAL_NET !$HOME_NET
ipvar CANARY 1.2.3.4
ipvar PCS [10.0.10.0/24]
ipvar CORP [1.2.3.0/24]
ipvar HMI [10.0.10.20,10.0.10.30]
ipvar AD 1.2.3.20
ipvar FC 10.0.10.15
ipvar HIST1 [10.0.10.150]
ipvar CONFDB [10.0.10.10]
portvar TAG 2000
portvar TAG_RANGE [2000:2020]
Example Rules
Field Controller (FC) talking to unknown system
alert ip ![$HMI,$HIST1,$CONFDB] any -> $FC any (msg:"ALERT - Field Controller interacts with unknown node"; sid:4000001; priority:1; rev:1;)
Configuration Database talks to unexpected system
alert ip [$CONFDB] any -> ![$FC,$HMI,$HIST1] any (msg:"ALERT - Configuration DB communicates with new system"; sid:4000002; priority:1; rev:1;)
PCS network communication with CORP network, trying to bypass the firewall
alert ip [$PCS,!$HIST1] any -> $CORP any (msg:"PCS network talking to CORP network"; sid:4000003; priority:1; classtype:unknown;)
Configuration Database updates (auditing tool)
log ip [$CONFDB] any -> [$FC,$HMI,$HIST1] any (msg:"AUDIT - Configuration Updates"; sid:4000004; priority:10; rev:1;)
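The logic of the first rule above (alert when anything other than the HMIs, the historian, or the configuration database talks to the Field Controller) can be sketched in Python to test the allow-list before deploying it. This is only a model of the rule's address matching under the example variable values; fc_rule_fires is a hypothetical name, and Snort itself evaluates the rule, not this code.

```python
import ipaddress

# Values copied from the example rule variables.
HMI = ["10.0.10.20", "10.0.10.30"]
HIST1 = ["10.0.10.150"]
CONFDB = ["10.0.10.10"]
FC = "10.0.10.15"

ALLOWED_PEERS = [ipaddress.ip_network(addr) for addr in HMI + HIST1 + CONFDB]


def fc_rule_fires(src, dst):
    """Model of: alert ip ![$HMI,$HIST1,$CONFDB] any -> $FC any"""
    if ipaddress.ip_address(dst) != ipaddress.ip_address(FC):
        return False  # the rule only covers traffic destined for the Field Controller
    src_ip = ipaddress.ip_address(src)
    return not any(src_ip in network for network in ALLOWED_PEERS)


if __name__ == "__main__":
    for src in ("10.0.10.20", "10.0.10.99"):
        print(src, "->", FC, "alert" if fc_rule_fires(src, FC) else "ok")
```

Running known-good flows through a model like this helps confirm that the allow-list is complete, which reduces false alerts once the rule is live.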
LOOKING FOR BAD TRAFFIC
Find traffic involving a canary
alert ip any any <> $CANARY any (msg:"The canary is talking"; sid:4000005; priority:1; classtype:unknown; tag:session,256,packets;)
Monitor for the Field Controller talking to the Internet
alert tcp $FC any -> $EXTERNAL_NET any (msg:"PLC talking to the outside world"; sid:4000007; priority:1; flags:S; classtype:bad-unknown;)
Monitor for AD attempting to connect to the Internet
alert tcp $AD any -> $EXTERNAL_NET any (msg:"AD attempting to talk to the outside world"; sid:4000008; priority:1; flags:S; classtype:bad-unknown;)
Command shell on HMI
alert ip any any -> $HMI any (msg:"cmd.exe on HMI"; content:"cmd.exe"; sid:4000009; priority:1; classtype:unknown;)
Log Sources and Management
Logging Architecture
A central log server can assist in an incident by providing a chronological list of the events surrounding an incident that give the bigger picture.
Multiple systems/sources can send their data to a central log server where it can be correlated with other information.
Correlating with other logs can sometimes make the difference between recognizing an event for what it is (a true or false positive) and acting accordingly. The same data, such as IDS alerts, can provide valuable information to the security analyst.
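As a rough illustration of the chronological view a central log server provides, the Python sketch below merges entries from several sources into one timeline. The sample entries, source names, and the "ISO timestamp, space, message" line format are all assumptions made for the example; real log formats vary and usually need per-source parsers.

```python
from datetime import datetime
import heapq

# Hypothetical sample data: "ISO-8601 timestamp, space, message" per line.
SAMPLE_LOGS = {
    "firewall": [
        "2024-01-05T10:15:02 DENY 10.0.10.15 to 8.8.8.8:53",
        "2024-01-05T10:14:40 ALLOW 1.2.3.20 to 10.0.10.150:443",
    ],
    "ids": ["2024-01-05T10:15:01 ALERT PLC talking to the outside world"],
    "hmi": ["2024-01-05T10:14:55 LOGIN operator1"],
}


def parse_entry(source, line):
    """Turn one log line into a sortable (time, source, message) tuple."""
    timestamp, _, message = line.partition(" ")
    return (datetime.fromisoformat(timestamp), source, message)


def build_timeline(logs_by_source):
    """Merge per-source logs into a single chronological list."""
    streams = [
        sorted(parse_entry(source, line) for line in lines)
        for source, lines in logs_by_source.items()
    ]
    return list(heapq.merge(*streams))


if __name__ == "__main__":
    for when, source, message in build_timeline(SAMPLE_LOGS):
        print(when.isoformat(), source, message)
```

Seen in order, the operator login, the IDS alert, and the firewall deny read as one sequence of events rather than three unrelated entries. This only works if the clocks on all sources are synchronized.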
There are some considerations in centralizing logs:
Properly prioritize the function of log management. Define requirements and goals for log performance and monitoring based on applicable laws, regulations, and existing organizational policies. Then, prioritize goals based on balancing the need to reduce risk with the time and resources necessary to perform log management functions.
Create and maintain a secure log management infrastructure. Identify the needed components and determine how they will interact (e.g., firewall rules, diodes). With the various types of information in one place, the log server becomes both a valuable system to target and a critical system to protect. It should only run the logging service and be in a highly protected area of your network.
Provide appropriate support for staff with log management responsibilities. All efforts to implement log management will be for naught if the staff members who are tasked with log management responsibilities do not receive adequate training, proper tools, or support to do their jobs effectively. The staff members need to understand what situations are normal, bad, and weird. Providing log management tools, documentation, and technical guidance are all critical for the success of log management staff.
Log sources
Firewalls
VPN Servers (may be part of firewall logs)
Operating Systems (e.g., Windows, *nix, Mac)
Proxy Server
Web Servers (e.g., IIS, Apache, Nginx)
Databases (e.g. MS SQL, Oracle, MySQL)
Others (e.g. PLCs, HMIs)
Log Transport
syslog
De facto standard in the IT community
Uses UDP or TCP
Data diode can be used
Encryption can be used
Third-party tools may be necessary for some OS or applications.
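For applications that can log through Python, the standard library's SysLogHandler is one way to forward events to a central syslog server. The sketch below is a minimal example under assumptions: the logger name, the facility choice, and the server address in the usage section are placeholders, and the priority helper simply shows how syslog encodes facility and severity into one number.

```python
import logging
import logging.handlers


def syslog_pri(facility, severity):
    """Syslog priority value: facility * 8 + severity."""
    return facility * 8 + severity


def build_syslog_logger(host, port=514, name="ics-audit"):
    """Return a logger that forwards records to a syslog server over UDP."""
    handler = logging.handlers.SysLogHandler(
        address=(host, port),  # UDP by default; pass socktype=socket.SOCK_STREAM for TCP
        facility=logging.handlers.SysLogHandler.LOG_AUTH,
    )
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # 127.0.0.1 is a placeholder; point this at your own log server.
    audit_log = build_syslog_logger("127.0.0.1")
    audit_log.warning("HMI login failed for user operator1")
    print(syslog_pri(4, 5))  # facility auth (4), severity notice (5)
```

UDP delivery is not guaranteed and plain syslog is unencrypted, so where message loss or eavesdropping matters, TCP with TLS (or a data diode feeding the log server) is usually the better fit.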
Operating System Logs
Operating system logs can be used to monitor the health of the system and detect malicious activity
Windows OS
Security Log
System Log
Third-party agent to send logs to a remote server.
Linux/Unix OS
Syslog transport is part of the OS
auth.log, messages
Security Audit Logging: Web Server Logs
Review daily to determine a baseline
Web server logs will show:
who visited the website
when they visited the website
what they did while viewing the website (including SQL queries)
where they came from
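The who, when, what, and where-from items above map directly onto fields in the widely used "combined" access log format. The Python sketch below parses one such line and applies a very crude check for SQL injection attempts. The sample line, the field names, and the token list are assumptions for illustration; a real deployment would rely on the web server's documented format and on far more robust detection.

```python
import re

FIELDS = ("client", "user", "when", "method", "path",
          "protocol", "status", "size", "referrer", "agent")
# Combined log format: client, identity, user, [time], "request", status, size, "referrer", "agent"
COMBINED_RE = re.compile(
    r'(\S+) \S+ (\S+) \[([^\]]+)\] "(\S+) (\S+) ([^"]+)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"'
)

# Hypothetical sample entry.
SAMPLE_LINE = (
    '10.0.10.99 - - [05/Jan/2024:10:15:02 +0000] '
    '"GET /report?id=1+UNION+SELECT+user,pass+FROM+accounts HTTP/1.1" '
    '200 512 "http://intranet.example/start" "curl/8.0"'
)


def parse_access_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = COMBINED_RE.match(line)
    return dict(zip(FIELDS, match.groups())) if match else None


def looks_like_sqli(path):
    """Crude heuristic only: flags a few well-known injection fragments."""
    lowered = path.lower()
    tokens = ("union+select", "union%20select", "or+1=1", "'--")
    return any(token in lowered for token in tokens)


if __name__ == "__main__":
    record = parse_access_line(SAMPLE_LINE)
    print(record["client"], record["when"], record["path"], record["referrer"])
    print("possible SQL injection:", looks_like_sqli(record["path"]))
```

Reviewing parsed fields daily, as suggested above, makes it easier to spot a client or request pattern that falls outside the baseline.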
Security Audit Logging: Database Logs
User logins and logouts
Database system starts, stops and restarts
Various system failures and errors
User privilege changes
Database structure changes (tables that have been deleted, data that has been changed)
Most other DBA actions
SELECT statements or all database data access (if so configured)
Security Information and Event Management
Capabilities
Data aggregation
Correlation
Alerting
Compliance
Forensic analysis
Honeypots & Canaries
Decoy systems (they sit on your network and try to replicate what your network looks like)
Variant of an IDS
Any traffic seen talking to a Honeypot could be considered malicious
Open-source ICS Honeypots are available: Conpot
Canaries (systems that do not communicate with any other system on your network; if an IDS is watching for ANY traffic to or from the canary, you will get an early warning that something is going on that shouldn’t be)
Respond and Recover
Execute activities taken during and after a cybersecurity event.
The Respond Function supports the ability to contain the impact of a potential cybersecurity event. Examples of outcome Categories within this Function include: Response Planning; Communications; Analysis; Mitigation; and Improvements.
The Recover Function supports timely recovery to normal operations to reduce the impact from a cybersecurity event. Examples of outcome Categories within this Function include: Recovery Planning; Improvements; and Communications.
Incident Response Phases
Preparation –> Identification –> Containment –> Clean-up and Recovery –> Follow-up
Preparation
Build your team
Plan your response
Secure and alternate methods of communication.
Scribe(s) for each group within the team.
Securable room where you can keep accurate and complete information
Access to ALL of the logs and data.
Known, certified clean computer systems to do forensics.
Person with the authority to unplug from the Internet (perhaps your manager or CEO)
Define your strategy.
Create documentation
Train your teams and users
A practiced plan
Gather threat intelligence
Feeds & threat reports
YARA rules and indicators of known malware (know what’s going on in the world)
Use a checklist as a starting point
Compliance and safety officers should review the IR plan.
Incident Response Team
Senior Technical staff
Lead and Forensics Analysts
Scribe(s)
Stakeholders from:
Corporate IT
Control Systems
Subject Matter Experts
Public Relations
Legal Counsel
Law Enforcement (if necessary)
IT and/or financial auditors (optional)
Identification
Starts when an incident is detected (for example, by a Snort or log alert)
Forensics tools
Use the intelligence gathered
Thorough analysis of logs and network traffic
Containment
Find the call back addresses
Stop the information flow leaving the network
Stop the malware from spreading
Clean-up and Recovery
Remediation
Intrusion Clean-up
Affected systems back in service
Follow-up
Incident report
Lessons Learned
Update incident response plan
Update threat intelligence
Implement new security initiatives
Network Forensics
Main purpose: Incident response and Law Enforcement
Items to analyze in packet captures
Pattern matching - match specific values
Conversations - identify all sessions of interest
Exports - export sessions of interest
Tools used in network forensics
Wireshark, NetworkMiner, tcpdump/WinDump, tcpflow, tcpxtract, Argus, YARA, others.
YARA
Main purpose: to help identify and classify malware samples
YARA Rules
consist of a set of strings and a boolean expression
can be found in security alerts and bulletins
can be used by different security tools
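The "set of strings plus a boolean expression" idea can be modeled in a few lines of Python. To be clear, this is a toy model of the concept, not the YARA engine, its rule syntax, or the yara-python bindings; the string names, patterns, and condition are invented for the example.

```python
# Toy model of the idea behind a rule: named strings plus a boolean condition.
# Hypothetical strings; real rules would come from alerts and bulletins.
STRINGS = {
    "$mz": b"MZ",            # DOS/PE executable magic bytes
    "$cmd": b"cmd.exe",
    "$url": b"http://",
}


def condition(found):
    """Boolean expression over which strings were found."""
    return found["$mz"] and (found["$cmd"] or found["$url"])


def match_rule(data, strings=STRINGS, rule_condition=condition):
    """Return True when the data satisfies the rule's condition."""
    found = {name: pattern in data for name, pattern in strings.items()}
    return bool(rule_condition(found))


if __name__ == "__main__":
    sample = b"MZ\x90\x00 ... cmd.exe /c whoami"
    print(match_rule(sample))
    print(match_rule(b"plain text mentioning cmd.exe"))
```

The real tool adds much more (hex patterns, regular expressions, offsets, modules), but the structure of a rule follows the same strings-then-condition shape.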
Cybersecurity Practices
Incorporating cybersecurity practices into your daily life can prevent the disclosure of critical information (CI) to potential adversaries. If you’re thinking, “But I work in a control system environment; control systems don’t store CI,” then consider our definition of CI:
Information that, if disclosed, would have a negative impact on an organization. It includes not only trade secrets and technical specifications, but also sensitive information such as the processes used by systems (e.g., commands and access points), financial data, personnel records, and medical information.
CI also refers to the information that protects assets, such as passwords to access systems or passcodes to enter a building or room. Recipes, formulas, and strategies are usually CI. Even information such as your name, phone number, and email address—especially when all three of these information pieces are together—may be considered sensitive, because it helps an adversary launch a social engineering or phishing attack. In control system environments, the result of CI disclosure may be severe economic impact or loss of life.
Why Do It?
You probably incorporate cybersecurity practices in your personal life without even realizing it. For example, when you have prepared to go on a trip, have you ever done any of the following?
Stopped newspaper deliveries so newspapers wouldn’t pile up outside, letting people know you aren’t home?
Had your mail held by the post office or asked your neighbor to pick up your mail so the mailbox would not fill up?
Connected your porch lights and inside lights to a timer or light sensor so they would go on and off to make it look like someone is home?
Left a car parked in the driveway?
Had someone keep the lawn trimmed?
Asked a friend or neighbor to periodically open and close blinds or curtains?
The CI here is obvious - we do not want a burglar or other “bad guy” to know the house is unoccupied. The more clues we provide to an adversary that the house is unoccupied, the more likely it is the house will be robbed. The same holds true at work. We must reduce or obscure indicators to protect our critical information.
Information collection techniques
Who are these adversaries?
They may be competitors, criminals, spies, unhappy employees, terrorists, or troublemakers. They may be motivated by money, revenge, or political beliefs, to name a few.
There are numerous ways adversaries collect information. Some of the more common methods include social engineering, phishing, accidental disclosure, googling, and dumpster diving.
Social Engineering
Social engineering is a collection of techniques used to manipulate people into revealing sensitive or other critical information. Those who engage in social engineering rely on the natural human tendency to trust. In fact, it’s often easier for an adversary to obtain information by simply asking the right questions than by using technical hacking methods.
Social engineering is sometimes conducted by phone. The caller may pretend to be someone in a position of authority or a telephone or computer technician, gradually pulling information out of the targeted person. Often the adversary will call several employees and piece together enough information to launch an attack. Help desk employees are often targeted by an adversary because they’re trained to be friendly and provide information.
Social engineering can also occur through online social forums, at professional conferences, and at non-work social events, to name a few examples.
The first objective of an adversary attempting social engineering is to convince you that they are in fact a person that you can trust with critical information.
Phishing
Phishing scams may be the most common types of social engineering attacks used today. Most phishing scams demonstrate the following characteristics:
Seek to obtain personal information, such as names, addresses, and social security numbers.
Use link shorteners or embed links that redirect users to suspicious websites in URLs that appear legitimate.
Incorporate threats, fear, and a sense of urgency in an attempt to manipulate the user into acting promptly.
Some phishing emails are more poorly crafted than others, to the extent that their messages often exhibit spelling and grammar errors; but these emails are no less focused on directing victims to a fake website or form where attackers can steal user login credentials and other personal information.
If you receive a suspicious email, normally the best defense is to ignore and delete the message. Your organization may have specific procedures to deal with suspicious email and web pop-ups.
Do
Report impersonated or suspect email.
Be cautious about opening attachments, even from trusted senders.
Take your time. Resist any urge to “act now” despite the offer and the terms.
Restrict who can send mail to email distribution lists.
Check financial statements and credit reports regularly.
Don’t
Send passwords or any sensitive information over email.
Click on “verify your account” or “login links” in any email.
Reply to, click on links in, or open attachments in spam or suspicious email.
Call the number in an unsolicited email or give sensitive data to a caller.
Put critical information on a website, FTP server, social media, etc.
Dumpster Diving
Dumpster diving is the act of rummaging through commercial or residential trash and recycle bins to find useful items (including information) that have been discarded.
At your workplace, adversaries may search for proposal drafts, financial data, architectural designs, and personnel data, both on paper and media such as thumb drives. Bear in mind that dumpster divers aren’t just looking for formal documents—Post-it® Notes, and scraps of notebook paper often contain phone numbers, passwords, and other critical information.
Take care with information that is no longer valuable to you because it may have tremendous value to someone else. Follow your organization’s policies and procedures on proper disposal of information and equipment when they are no longer needed. The following are some common practices:
Shred paper documents, using a cross-cut shredder if possible.
Whenever possible, sanitize or physically destroy hard drives and other electronic devices that store information.
For devices that cannot be sanitized, physically destroy them.
Wireless Security
Devices such as refrigerators, TVs, coffee makers, etc. now have the ability to connect to the Internet, play music, send pictures, alert you of problems, etc. With the inception of these devices, life has never been more convenient. However, these modern-day conveniences can pose some security issues if left unprotected.
Incorporating wireless security practices such as password protection and Wi-Fi encryption can prevent unauthorized access or damage to devices through wireless networks. Examples of Wi-Fi security mechanisms include:
WPS: Wi-Fi Protected Setup (WPS) uses an 8-digit PIN to protect the passing of a secret key between two parties (usually the access point and the connecting device such as a laptop, smartphone, or tablet). It is a setup mechanism rather than an encryption type.
WEP: Wired Equivalent Privacy (WEP) was developed many years ago and has proven to be weak and easily breakable.
WPA: Wi-Fi Protected Access (WPA) was developed as the successor to WEP. It added protections (TKIP) on top of the same underlying RC4 cipher. Unfortunately, it is not much stronger than WEP.
WPA2: Wi-Fi Protected Access version 2 (WPA2) replaces that design with AES-based encryption (CCMP). It is the strongest of the options listed here and the most widely implemented.
Best Practices for using public Wi-Fi
Think before you connect. Before you connect to any public wireless hotspot – like on an airplane or in an airport, hotel, or café – be sure to confirm the name of the network and login procedures with appropriate staff to ensure that the network is legitimate. Cybercriminals can easily create a similarly named network hoping that users will overlook which network is the legitimate one. Additionally, most hotspots are not secure and do not encrypt the information you send over the Internet, leaving it vulnerable to cybercriminals.
Use your mobile network connection. Your own mobile network connection, also known as your wireless hotspot, is generally more secure than using a public wireless network. Use this feature if you have it included in your mobile plan.
Avoid conducting sensitive activities through public networks. Avoid online shopping, banking, and sensitive work that requires passwords or credit card information while using public Wi-Fi.
Keep software up to date. Install updates for apps and your device’s operating system as soon as they are available. Keeping the software on your mobile device up to date helps prevent cybercriminals from taking advantage of known vulnerabilities.
Use strong passwords. Use different passwords for different accounts and devices. Do not choose options that allow your device to remember your passwords. Although it’s convenient to store the password, that potentially allows cybercriminals into your accounts if your device is lost or stolen.
Disable auto-connect features and always log out. Turn off features on your computer or mobile devices that allow you to connect automatically to Wi-Fi. Once you’ve finished using a network or account, be sure to log out.
Ensure your websites are encrypted. When entering personal information over the Internet, make sure the website is encrypted. Encrypted websites use https://. Look for https:// on every page, not just the login or welcome page. Where an encrypted option is available, you can add an “s” to the “http” address prefix and force the website to display the encrypted version.
Information Protection
Identify several methods to protect critical information.
Refer to Passwords
Refer to MFA
Remote Access
Any device that remotely connects to the corporate or control system network provides an opportunity for an adversary to gain access to the device and attack your network.
One preferred defensive method is the use of security tokens. The security token displays a number consisting of six or more alphanumeric characters (sometimes numbers, sometimes combinations of letters and numbers, depending on vendor and model). This number normally changes at pre-determined intervals, usually every 60 seconds. When it is combined with a password, the resulting passcode is considered to be multi-factor authentication.
To ensure this countermeasure is effective, you should never share your security token with anyone else. You should keep it locked away or on your person at all times.
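Many software tokens generate their changing number with the open time-based one-time password algorithm (TOTP, RFC 6238); hardware tokens often use vendor-specific schemes instead. The Python sketch below implements the RFC algorithm with HMAC-SHA1, assuming a shared secret and a configurable time step (the RFC default is 30 seconds, while tokens like those described above may use 60). It illustrates the mechanism and is not the algorithm of any particular product.

```python
import hashlib
import hmac
import struct
import time


def totp(secret, unix_time, step=30, digits=6):
    """Time-based one-time password per RFC 6238 (HMAC-SHA1 variant)."""
    counter = int(unix_time) // step  # number of elapsed time steps
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    # The secret below is the published RFC 6238 test key, not a real credential.
    shared_secret = b"12345678901234567890"
    print(totp(shared_secret, time.time(), step=60))
```

Because the code depends on both the secret inside the token and the current time, a captured code is only useful to an attacker for a short window, which is why the token itself must never be shared.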
Other examples of “something you have” are smart cards and USB tokens. “Something you are” methods (biometrics) use readers or scanners installed on a device such as a computer. They are effective because they use a unique trait (such as a fingerprint) to identify an individual.
Internet and Intranet Access
Your organization probably has policies about what can and cannot be put on public Internet websites. It may even have a review process to ensure sensitive data are not publicly available. However, sometimes seemingly benign information can reveal more information about your organization than it should.
For example, do your job postings mention the control systems and other equipment used? If so, this may be a piece of information an adversary can use in planning an attack.
Also consider information about your organization on other companies’ websites. Do your vendors’ press releases list where they have deployed their products? Do they publish their products’ manuals (which include control commands) on the Internet? A diligent adversary will gather information in as many ways from as many different sources as possible. A simple web search may reveal far more than you might think.
Do not forget about internal (intranet) sites. Remember that threats often come from within an organization. Critical information such as network diagrams and proprietary software code should not be made available to anyone without a need-to-know. Think twice before you publish anything on the Internet or Intranet—and if in doubt, leave it out!
Sanitization, Destruction, and Reuse
Sanitization permanently removes all data from equipment (such as a computer’s hard drive) by overwriting the data to make it unreadable.
Destruction means physically demolishing media to prevent recovery of any of its information.
Reuse refers to transferring equipment to another employee or an outside entity.
Organizations vary widely in requirements for sanitization. At one extreme, some organizations require that all equipment with memory or storage devices be sanitized before being transferred (even to another staff member) or disposed of. At the other extreme, some organizations have no policy in effect at all.
If your organization does not have a specific policy, or has a lax policy, you should at least consider the criticality and sensitivity of the information on the device and determine whether it should be sanitized or destroyed before transferring or disposing of it.
Device Candidates for Sanitization
Any equipment with a storage device needs to be sanitized in certain circumstances. Such devices include:
Desktop and laptop computers
Personally owned equipment that has processed company information
Smartphones
Desk phones that store telephone numbers
Programmable logic controllers (PLCs)
Copiers
Fax machines
Many scientific instruments
Media such as USB drives and removable hard drives
How to Sanitize your Data
When you “permanently” delete files, the operating system makes the space available for future use. New data will eventually overwrite the old data (the “deleted” files), but until those data are overwritten, they can be recovered by someone with the right tools and know-how. Similarly, when you reformat a hard drive, the original data are still there in raw form and can be recovered.
Deleting files, emptying the Recycle Bin, and reformatting the hard drive are not enough!
Sanitization makes the data unrecoverable by overwriting the data. Fortunately, there are tools available to make this fairly easy, at least for standard desktop and laptop computers.
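The difference between deleting and overwriting can be seen in a small Python sketch that replaces a file's contents with random bytes in place. This is a concept demonstration only, and overwrite_file is a hypothetical helper: on SSDs, flash media, and journaling or copy-on-write file systems, an in-place overwrite does not guarantee that old blocks are gone, so whole-device sanitization should use purpose-built tools and your organization's approved procedure.

```python
import os


def overwrite_file(path, passes=1, chunk_size=1024 * 1024):
    """Overwrite a file's contents in place with random bytes.

    Returns the number of bytes overwritten per pass. Concept demo only;
    see the caveats in the surrounding text before relying on it.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                block = min(chunk_size, remaining)
                handle.write(os.urandom(block))
                remaining -= block
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the data to the device
    return size


if __name__ == "__main__":
    import tempfile

    fd, demo_path = tempfile.mkstemp()
    os.write(fd, b"password=hunter2\n" * 10)
    os.close(fd)
    overwrite_file(demo_path)
    os.remove(demo_path)  # delete only after the contents are overwritten
```

The order matters: overwrite first, then delete. Deleting first only releases the space and leaves the original bytes on disk until something else happens to reuse them.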
Protecting Critical Assets
State ways to physically protect critical assets at work, home, and while traveling.
The traditional physical security measures of “guns, guards, and gates” are no longer enough for today’s organizations. Many control system environments have effective physical security measures in place in addition to the traditional “three Gs” listed above.
For example, additional measures could be the use of camera monitoring, electronic entryways that deny access to anyone without the proper credentials, and keypad locks. However, physical protection and control are also the responsibility of individual employees.
This section covers protection measures you can take at work, when traveling, and at home.
Protection Measures At Work
Being vigilant is key to physically protecting information assets. Some of your responsibilities may include:
Know your environment and take appropriate action when something is out of the ordinary.
Be aware of who is behind you (and who may try to “piggyback”) when you are entering a restricted area.
Limit access to systems you are responsible for to those who have a need-to-know.
When appropriate, use a password-protected screensaver or some other lockout method when leaving a system unattended.
Close and lock your office door when you leave for extended periods.
Supervise the use and maintenance of your systems.
Do not leave critical documents or systems (including systems that store critical information) unattended in a publicly accessible area (such as a conference room or building lobby).
Protection Measures When Traveling
When you’re traveling, your information and computer systems (e.g., laptop, smartphone, etc.) are at even greater risk of theft or unauthorized access. Take the following precautions when traveling:
Do not leave systems unattended during travel. If possible, transport your systems in your carry-on bags instead of checked bags.
Pay attention when going through airport security. Thieves may be able to steal your laptop while you are focusing on getting through the security checkpoint.
Whenever possible, don’t leave systems in an unattended hotel room. If you are unable to take your system with you, use the hotel safe if one is available.
Avoid accessing critical information on your laptop or other device on the airplane or other public places. If you must access critical information, use screen filters to prevent the information from being read by others.
Protection Measures At Home
If you use or store work computer systems or information at your home, provide the same level of physical protection that you would at work.
Do not allow others without a need to know to access or use your system or information.
Ensure your home is secure when leaving systems and data. If possible, store the system and data in a locked room or locked storage container when unattended.
Do not leave systems or storage media in your vehicle.
Report the theft of company property from your home in accordance with your organization’s policies.
Defense-in-Depth Approach
Defense-in-depth refers to the use of multiple techniques to help mitigate the risk of one security measure being compromised or circumvented. These techniques are often a combination of information protection and physical protection measures.
One example is a building with an electronic card reader to permit and deny access, and a receptionist in the same building who checks credentials before allowing access. An additional defensive measure would be training all employees to verify building occupants are authorized to be there. With every measure that is added, security becomes “deeper” and risk is lessened.
Maintaining integrity
Identify specific ways to maintain integrity in secured areas.
What is and is not allowed in a secured area, such as a control system environment, varies from organization to organization. This section will cover some of the most common equipment do’s and don’ts.
Computers
In many control system environments, computers that are not needed for control system operations are not allowed in the control room. One reason for this is that email, websites, and files from home are common sources of malware (viruses, Trojan horses, spyware).
Some organizations do not have Internet connections within the control rooms and may allow limited use of computers not related to control room operations within them. When an Internet connection is allowed, it should be on a separate computer for an explicit purpose.
If a laptop is brought into the control center (for example, to install an upgrade), it should be scanned for malware before being connected to a control system device.
Know your organization’s restrictions and adhere to them.
Corporate Security Hole: Employees Forwarding Email to Personal Accounts
Employees forwarding their work email to web-accessible personal accounts is a growing problem. When away from the corporate network, accessing email from these accounts is usually faster and easier than going through the corporate remote email solution.
Only software related to control systems should exist on computers on the control system network. Operating system extras, such as games and any other unneeded software, should be removed.
Many word processing and spreadsheet programs have the ability to run macros, which makes it possible for malicious code (Trojans and other malware) to run and infect a system and any systems connected to it. Do not run macros unless the file comes from a trusted source. Similarly, malicious websites can install malware on a computer without your knowledge.
Additional guidance for applications in the control room
If Internet access is needed to run the control system environment, then it should be accessed from a different network from the control system network.
If Internet traffic is allowed into the control system network (for example, to download software and firmware upgrades), it should be restricted to a single dedicated system, not to control systems. Any downloads should be scanned for malware before installation on a control system device.
Internet traffic should never be allowed out of the control system network.
Removable Media
USB flash drives are a wonderful invention. You can transport large files to a customer’s office and access the data without worrying about compatibility. You can take work home, and you can travel with just the flash drive instead of lugging a laptop around. However, flash drives also present many risks.
Malware. Organizations can greatly reduce the spread of malware on their network by installing antivirus software on email servers and prohibiting certain websites, but the use of flash drives can bypass these safeguards.
Data theft. Any unattended and unlocked computer with a USB port is an easy target for an adversary with a flash drive.
Data Loss. The portability of flash drives also increases the potential for lost data falling into the wrong hands. Most of these devices have little or no security features. If you happen to lose your flash drive, anyone who finds the device may be able to access its data.
Removable media need to be treated with great care. These devices can be inserted into a control system or other system, and either accidentally or intentionally transmit malware or interfere with the system’s function. To prevent malware, the following precautions should be taken:
Media should come from a reputable source such as an employee or trusted vendor.
Media should be scanned for malware before being connected to any device in the control system environment.
Media contents should be reviewed before connection to a control system device.
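One way to support the content review step is to build an inventory of the media on a separate scanning workstation before it goes anywhere near a control system device. The sketch below assumes the media is mounted at a directory path; the function name inventory_media and the list of extensions flagged for review are illustrative assumptions, and the inventory complements a malware scan rather than replacing it.

```python
import hashlib
from pathlib import Path

# Illustrative list of file types that deserve a closer look; adjust to policy.
REVIEW_EXTENSIONS = {
    ".exe", ".dll", ".bat", ".ps1", ".vbs", ".js", ".lnk", ".docm", ".xlsm",
}


def inventory_media(root):
    """List every file under root with its SHA-256 hash and a review flag."""
    root = Path(root)
    report = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        flagged = (
            path.suffix.lower() in REVIEW_EXTENSIONS
            or path.name.lower() == "autorun.inf"
        )
        report.append({
            "file": path.relative_to(root).as_posix(),
            "sha256": digest,
            "needs_review": flagged,
        })
    return report
```

The hashes can be compared against values published by the vendor, where available, and any flagged file can be examined before the media is approved for use.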
Removable media include the following:
USB flash drives
MP3 players
digital cameras
removable hard drives
magnetic tapes
Visitors
If you are hosting or otherwise responsible for a visitor, you should ensure the visitor complies with your organization’s policies. For example, it is rarely appropriate for a visitor to take pictures of your control center with his or her smartphone.
Take care with what information you disclose to visitors, both verbally and through what is visible in your office or the control center. Although it’s natural to want to be helpful and talk about your work to an inquisitive visitor, never reveal critical information.
Recommended Practices
CISA has provided various documents detailing a wide variety of industrial control systems (ICS) topics associated with cyber vulnerabilities and their mitigation.
Recommended Practice: Updating Antivirus in an Industrial Control System
Recommended Practice: Creating Cyber Forensics Plans for Control Systems
Recommended Practice for Patch Management of Control Systems
Configuring and Managing Remote Access for Industrial Control Systems
Department of Homeland Security: Cyber Security Procurement Language for Control Systems
Mitigations for Security Vulnerabilities Found in Control System Networks
Social Engineering
Discovery does not necessarily stop after open-source information and electronic data have been collected. Validation of collected data and its relevance are required to help confirm targets and assist in the selection of attack methods.
As such, the discovery phase can also involve interaction with personnel involved with the target. These personnel can be workers responsible for the control system or other support functions related to regular business operations. Using false personas and pretending to be people in a position of trust, the attacker can interact with personnel and not only validate information that has already been acquired, but also expand the target folder by collecting new information directly from people involved in operations.
This type of attack is commonly referred to as “social engineering” and can manifest through phone calls, email spear-phishing campaigns, and other methods where the adversary convinces the target to release sensitive information.