Industrial Control Systems

Purpose

  • Help to better understand ICS networks and ideas on protecting them from cyber attacks.

  • Discuss common weaknesses and vulnerabilities, along with cyber risks, for ICS.

  • Explain the importance of knowing the networks that need to be protected.

  • Discuss mitigation strategies and defense in depth for a more secure ICS environment.

Basic Concepts

IT

  • IT refers to anything related to computing technology.

IT Infrastructure Components

  • The most common IT infrastructure components are: Switches and routers, Firewalls, Remote access, Databases, Clients, Local Area Network (LAN) / Wide Area Network (WAN), Servers, Wireless access.

Switches and Routers
  • Switches are used when connecting computers, printers, databases, and other networking equipment. To optimize communications and make sure data are going where they need to go, switches provide a high level of control and efficiency within the network. Switches can be used to isolate communications between specific devices and are configured to regulate network traffic, ensuring the network doesn’t get congested by too much information. If they’re not available, network data just don’t flow.

  • Switches generally come in two different types: Managed switches and unmanaged switches.

    • Managed switches are fully configurable. They provide tremendous flexibility and usually more capacity than unmanaged switches. They allow the administrator granular control over the network. Management can be done locally or remotely.

    • Unmanaged switches are switches that you buy, take out of the box, and put on the network. There is no requirement to configure them; in fact, they are often designed so they cannot be configured. An unmanaged switch is simple to use, such as the switch built into the router that your Internet Service Provider may provide for your home network.

  • Routers act as a dispatcher, choosing the best path for information to travel so it’s received quickly. While switches are used to connect components within a network, routers are used to connect networks together. Routers determine the best path for networks to connect and are configured so information is always up-to-date and accurate.

LAN/WAN
  • A Local Area Network (LAN) supplies networking capability to a group of computers in close proximity, such as in an office building, a school, or a home. A LAN is useful for sharing resources such as files, printers, games, or other applications. A LAN often connects to other LANs, to the Internet, or a wide area network (WAN).

  • A WAN is a geographically dispersed telecommunications network. The term distinguishes a broader telecommunication structure from a LAN. A WAN may be privately owned or rented, but the term usually includes a connection with, or through, public (shared user) networks. An intermediate form of network in terms of geography is a metropolitan area network (MAN).

  • From a cybersecurity perspective, most organizations deploy their information infrastructures in a manner that implies trust between all assets connected to it. In a LAN environment, the trust relationship is easier to control because most, if not all, the assets connected to a LAN are managed internally. Securing a WAN is much more complex and requires cross-domain trust, authentication, and management.

Remote Access
  • Remote access allows a user to connect to a network or system as though they were physically located at the console.

  • Remote access extends the network outside of physical and network perimeters and allows access from anywhere. Organizations provide remote access services to support telecommuters, remote management and support, and vendor support access.

  • Remote access components can include modems, remote access servers, virtual links, or any capability that facilitates non-local user access into the IT infrastructure.

Firewall
  • A firewall is a network security system that controls incoming and outgoing traffic based on an applied rule set. A firewall establishes a barrier between a trusted secure network and another network (e.g., the Internet) that is assumed to be insecure and untrusted.

    • Administrators use access control lists (ACLs) to create rule sets to ensure only authorized communications occur between networks. Firewalls can be simple or complicated, but almost all have the capability to be actively managed by an administrator. Even the firewall you use to protect your home network has that capability.
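The rule-set idea described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular firewall product works: the Rule fields, the evaluate function, the first-match-wins ordering, the implicit default deny, and the sample addresses are all assumptions made for the example.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str           # "allow" or "deny"
    src: str              # source network in CIDR notation
    dst: str              # destination network in CIDR notation
    port: Optional[int]   # destination port, or None for any port


def evaluate(rules, src_ip, dst_ip, port):
    """Return the action of the first matching rule; deny if none match."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for rule in rules:
        if (src in ipaddress.ip_network(rule.src)
                and dst in ipaddress.ip_network(rule.dst)
                and (rule.port is None or rule.port == port)):
            return rule.action
    return "deny"  # assumed default: anything not explicitly allowed is blocked


# Hypothetical rule set: an office LAN (10.0.0.0/24) and a protected network.
ACL = [
    Rule("allow", "10.0.0.0/24", "192.168.1.10/32", 443),  # one server, HTTPS only
    Rule("deny", "0.0.0.0/0", "192.168.1.0/24", None),     # block everything else inbound
    Rule("allow", "10.0.0.0/24", "0.0.0.0/0", 80),         # office web browsing
]

if __name__ == "__main__":
    print(evaluate(ACL, "10.0.0.5", "192.168.1.10", 443))
    print(evaluate(ACL, "10.0.0.5", "192.168.1.20", 22))
```

Because the first matching rule decides the outcome, rule order matters: the specific allow is placed before the broad deny.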

Databases
  • Businesses demand that users have access to vast stores of information: historical information as well as up-to-the-minute data that can influence current and future business decisions. Having timely access to data, either historical or recent, is vital for ensuring optimum performance.

  • Databases store information needed for operations, such as customer lists, marketing and sales data, accounts payable and receivable, payroll, and order tracking. Many business decisions are made, and operations function, based on the numbers or values that reside in databases. The networking component of the IT architecture provides communications between databases. The client queries the database for information and uses that information locally, then sends updated information back to the database.

  • Databases are often configured to reside on servers so clients from across the organization can request data, often in user-customized formats and configurations. Large, centralized databases may exchange information with peripheral secondary databases located across the business domain.

  • ICS databases hold critical information used to ensure proper set points and functions on devices, or to gather monitoring information used to determine system state. They can include time-stamped data, events, or alarms that are queried or used to populate graphic trends in the human-machine interface (HMI). ICS database security is an important consideration when designing overall defense strategies.
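As a rough sketch of the time-stamped data mentioned above, the following Python example stores a few readings in an in-memory SQLite database and queries them the way an HMI trend display might. The table layout, the tag names (TANK1_LEVEL, PUMP1_PRESSURE), and the sample values are hypothetical; real historians use their own schemas and interfaces.

```python
import sqlite3


def build_historian():
    """Create an in-memory database with a few hypothetical readings."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (ts TEXT NOT NULL, tag TEXT NOT NULL, value REAL NOT NULL)"
    )
    rows = [
        ("2024-01-01T00:00:00", "TANK1_LEVEL", 41.5),
        ("2024-01-01T00:01:00", "TANK1_LEVEL", 42.0),
        ("2024-01-01T00:02:00", "TANK1_LEVEL", 43.1),
        ("2024-01-01T00:01:00", "PUMP1_PRESSURE", 3.2),
    ]
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    return conn


def trend(conn, tag, start, end):
    """Return (timestamp, value) pairs for one tag, oldest first.

    ISO 8601 timestamps sort correctly as text, so BETWEEN works on strings.
    """
    cur = conn.execute(
        "SELECT ts, value FROM readings "
        "WHERE tag = ? AND ts BETWEEN ? AND ? ORDER BY ts",
        (tag, start, end),
    )
    return cur.fetchall()


if __name__ == "__main__":
    db = build_historian()
    for ts, value in trend(db, "TANK1_LEVEL", "2024-01-01T00:00:00", "2024-01-01T00:02:00"):
        print(ts, value)
```

The query uses bound parameters (the ? placeholders) rather than string concatenation, which is one small database security practice worth keeping in mind.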

Wireless Access
  • Wireless access allows for the information infrastructure to be expanded quickly and effectively without having to lay network cable, drill holes, or adjust ceiling conduits. Almost every modern IT device has the capability to use wireless, and fewer organizations are building hard-wired networking infrastructures; their business architectures are being built primarily using wireless networking. ICS operators and vendors also use wireless communications to manage, monitor, and control their ICS devices. Many control systems include built-in wireless capabilities.

  • Wireless access points often also function as routers and require specific attention regarding security and reliability. Wireless access points are an attractive target for the cyber adversary, allowing direct access into the infrastructure or devices connected to it. The availability of wireless access points is critical to the effectiveness of the network. If rendered inoperable, no one can connect to the network without being on the wire.

  • Historically, the communications protocols used by wireless systems were not very secure; however, current technology provides the capability to securely deploy and manage these devices if the organization enables it. Sometimes the overhead of applying and managing secure wireless configurations is considered too much for an organization to bear so it employs useful but insecure access points.

  • Because of the criticality of the devices in an ICS, and the potentially dire consequences of a compromise, ICS wireless architectures should be carefully configured with as many cybersecurity controls applied as possible while still allowing required functionality.

Servers
  • Web pages, mail, customer service portals, and other information services usually reside on dedicated network hosts called servers (or sometimes, application servers). Clients connect to servers to perform tasks or use services that are common to a group. Servers are the workhorses in the IT infrastructure—they “serve” the applications, databases, stored information, and services to the clients on a network. Because organizations often maintain large information stores on dedicated servers, they typically have a large amount of random access memory (RAM) and storage space.

  • Modern IT environments may have servers located in any number of physical locations. Connectivity between the servers and clients is supported through the networking infrastructure. As organizations continue to grow and more information needs to be made available to more users, businesses add more servers—and the security footprint of the business grows along with it.

  • The security of these assets is important because if they are unavailable or the information they contain is corrupt, users and applications cannot work properly. Security protection profiles can vary from server to server, depending on the information stored or processed on the server and the requirements of the business. The protection of the information stored or transmitted by servers, and controlling access to them are vital components of an organization’s cybersecurity strategy.

Clients
  • Clients are the information resources, such as personal computers, laptops, or smartphones that provide an interface for users to view and manipulate digital information.

  • Clients are the most common interface between human users and information. Clients often depend on information that resides both locally and on servers that could be elsewhere.

  • Local applications run on the client, and help us process information or connect to other devices in our networking environment.

  • In the control system domain, clients are called HMIs: computers used to control and manage processes in critical infrastructure sectors such as energy, water, and transportation systems.

Client-Server Relationship

The relationship between clients and servers can be confusing because both can also be called a host. The table below helps to clarify this relationship.

Host                                    | Server                                       | Client
----------------------------------------|----------------------------------------------|-----------------------------------------------------
Always a physical node                  | Can be a physical node or a software program | Can be a physical node or a software program
Can run both server and client programs | Installed on a host                          | Installed on a host
Provides specific services              | Provides specific services to clients        | Accesses specific services available from the server
Serves multiple users and devices       | Serves only clients                          | Stand-alone or part of a client-server network

UPS Battery Backup
  • Many data center and control system environments have back-up power on standby. This back-up power supply ranges from a simple off-the-shelf uninterruptible power supply (UPS) under a desk to larger units found in server racks, taking up the same space as a 4U server.

  • Other back-up power solutions involve a building being equipped to handle its functions, such as a different room with a wall of batteries, or generators ready to kick on.

Virtualization

HMI workstations, historians, databases, and anything else that runs a standard operating system on a workstation or server platform can be virtualized.

Cybersecurity Tenets

  • Confidentiality is defined by the International Organization for Standardization (ISO) as ensuring that information is accessible only to those authorized to have access.

    • For example, a credit card transaction on the Internet requires the credit card number to be transmitted from the buyer to the merchant and from the merchant to a transaction processing network. The system attempts to enforce confidentiality by encrypting the card number during transmission, by limiting the places where it might appear, and by restricting access to the places where it is stored. Confidentiality is necessary for maintaining the privacy of the cardholder’s personal information held in the system.

  • Integrity is maintaining and ensuring the accuracy and consistency of data over its entire life cycle. All characteristics of the data, including business rules, dates, definitions, lineage, and rules for how pieces of data relate must be correct for data to be complete.

  • Availability is the proportion of time a system is in a functioning condition. For any information system to serve its purpose, the information must be available when it is needed.
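The definition of availability as a proportion of time lends itself to a short worked example. The sketch below assumes availability is measured as uptime divided by total observed time; the function names and the figures used are illustrative only.

```python
def availability(uptime_hours, downtime_hours):
    """Fraction of observed time the system was functioning."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total


def allowed_downtime_hours(target, period_hours):
    """Downtime budget implied by an availability target over a period."""
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be between 0 and 1")
    return (1.0 - target) * period_hours


if __name__ == "__main__":
    # A system that was down 8.76 hours in a 8,760-hour year.
    print(f"{availability(8751.24, 8.76):.4%}")
    # Downtime allowed per year by a 99.9% target.
    print(f"{allowed_downtime_hours(0.999, 8760):.2f} hours")
```

Expressed this way, a 99.9% target over a year leaves a budget of under nine hours of downtime, which helps make an abstract percentage concrete when planning maintenance windows.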

General IT Security

  • Security controls are the mechanisms used to mitigate vulnerabilities. An example of a security control is patching.

  • The information security standard ISO/IEC 27002 outlines a large catalog of potential controls and control mechanisms.

  • SANS provides a set of security policy templates that can be used to define policies.

Security Policy
  • The IT/ICS security policy document should reinforce management’s commitment to information security and contain the following items:

    • A definition of information security and its importance to the organization.

    • The intent of the policy regarding goals and principles of information security in conjunction with business strategy and objectives.

    • The structure of risk assessment and risk management as a framework for establishing controls.

    • Essential security policies, principles, standards, and compliance requirements, including the Rules of Behavior expected for all computing users.

    • Specific security roles, responsibilities, accountabilities, and authorities (R2A2s) for IT/ICS security management and implementation, including reporting IT/ICS security incidents.

    • References to other policies and procedures that identify detailed security processes that everyone in the organization is expected to follow.

Access Control
  • An access control policy defining user or group rules and rights should be clearly stated. Access controls for both logical and physical assets should be considered together. The policy should consider:

    • The security requirements of each application/system.

    • The identification of all information related to the application or system and risks associated with access to the information.

    • The implementation of Least Privilege concepts for access to systems and applications.

    • Standard user profiles for common job roles.

    • The segregation of access control roles, such as access requests, access authorizations, and access administration.

    • Formal procedures for access requests and approvals.

    • The removal of access rights, including requirements for notifying administrators when individuals are transferred, their roles or access authorizations change, or they are terminated.
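Several of the policy points above (standard role profiles, least privilege, and removal of access rights) can be illustrated with a small role-based check. This is a sketch under stated assumptions: the role names, permission strings, and helper functions are invented for the example and do not come from any standard or product.

```python
# Hypothetical standard profiles: each role gets only what the job requires.
ROLE_PROFILES = {
    "operator": {"hmi:view", "hmi:acknowledge_alarm"},
    "engineer": {"hmi:view", "plc:download_logic"},
    "auditor": {"logs:read"},
}


def is_permitted(user_roles, permission, profiles=ROLE_PROFILES):
    """Allow an action only if one of the user's roles explicitly grants it."""
    return any(permission in profiles.get(role, set()) for role in user_roles)


def revoke_all(assignments, user):
    """Remove every role from a user, e.g. on transfer or termination."""
    assignments.pop(user, None)


if __name__ == "__main__":
    assignments = {"alice": ["engineer"], "bob": ["operator"]}
    print(is_permitted(assignments["bob"], "plc:download_logic"))
    revoke_all(assignments, "alice")
    print("alice" in assignments)
```

Unknown roles grant nothing, so the check fails closed; that default is a design choice in this sketch rather than a requirement stated in the policy list.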

Asset Management
  • IT/ICS assets include:

    • Information such as databases, systems and research information, logs, operational or support procedures, continuity plans, failure and recovery procedures, and archives.

    • Software assets such as application software, system software, development tools, and utilities.

    • Physical assets such as computer equipment, communications equipment, removable media, and test and analysis equipment.

    • Services such as computing and communications services, general utilities such as air conditioning, fire protection, surveillance services, and other services.

    • People and their capabilities, skills, and experience; in addition to the reputation and image of the organization.

  • All IT/ICS assets must be clearly identified, inventoried, and maintained. The ownership of assets must be clearly identified, and the asset owner should be responsible for ensuring that IT/ICS assets are appropriately classified based on risk and are periodically reviewed for access restrictions and classification.
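One way to picture the inventory requirement above is a simple record per asset with an owner, a risk classification, and a review date. The field names, the one-year review interval, and the sample entries below are assumptions for illustration; an organization would substitute its own policy values.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Asset:
    name: str
    category: str        # e.g. "information", "software", "physical", "service"
    owner: str           # empty string means no owner has been assigned
    classification: str  # risk-based label chosen by the owner
    last_review: date


def overdue_reviews(assets, today, max_age_days=365):
    """Names of assets whose last review is older than the assumed interval."""
    cutoff = today - timedelta(days=max_age_days)
    return [a.name for a in assets if a.last_review < cutoff]


def unowned(assets):
    """Names of assets with no identified owner."""
    return [a.name for a in assets if not a.owner.strip()]


# Hypothetical inventory entries.
INVENTORY = [
    Asset("Historian DB", "information", "ops-team", "high", date(2022, 1, 1)),
    Asset("HMI-01", "physical", "", "high", date(2024, 5, 1)),
]

if __name__ == "__main__":
    print(overdue_reviews(INVENTORY, date(2024, 6, 1)))
    print(unowned(INVENTORY))
```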

Business Continuity
  • The business continuity management process is implemented to protect critical business processes from the effects of major failures in systems as they relate to IT and ICS environments. The reliance on automated processes leaves an organization in distress when these automated processes are no longer functional and is exacerbated when no prior planning or instruction exists on how to cope with the event.

  • Understanding the impact of an interruption caused by a security incident is an important tool, even for the most seasoned business executive.

Communications and Operational
  • Operating procedures should be documented, maintained, and made available to all authorized users. These procedures should specify instructions for the execution of each function, to include processing and handling of information, backup and/or restoration instructions, scheduling and interdependencies of work, abnormal execution instructions including support contacts, restart instructions, and the expectations for managing system log information.

  • Configuration management and change control policies and procedures must be controlled and maintained. Any changes to systems must be tested prior to implementation, and should be independently reviewed and security controls verified. Updates to all documentation associated with the change should also be accomplished before the change is implemented. Logs of all changes containing relevant test procedures and results should be maintained.

Compliance
  • Individuals must understand the legal ramifications of their activities. Compliance activities fall into categories such as regulatory, legal, statutory, contractual, security, and intellectual property/copyright/trademark issues.

  • Control of the organization’s legal obligations requires that any of these items be documented and kept up to date. Usually the legal department should be involved in any IT or ICS acquisition associated with any legal or statutory requirement.

  • Records associated with information, systems, applications, or security should be categorized and maintained, and a retention schedule identified. If the records are stored electronically, procedures must ensure the data remain accessible throughout the retention period, even when technology changes render older systems unusable. These technology changes normally require a conversion process that should be overseen by the owner of the information.

Human Resources
  • Security roles, responsibilities, accountabilities, and authorities (R2A2) for employees, contractors, and third-party users must be defined and documented according to the organization’s IT/ICS security policies. Security roles and responsibilities include:

    • Implement the organization’s security policies.

    • Protect assets from unauthorized access, disclosure, modification, destruction, or interference.

    • Execute security processes as required.

    • Take responsibility for individual actions.

    • Report security events that pose a risk to the organization.

  • These R2A2s must be communicated to employees, contractors, and third parties prior to beginning work. Job descriptions provide an effective means of communicating the security responsibilities. In addition, the continual reinforcement of R2A2s via regular training plays an important role in keeping personnel aware of their security obligations.

Information Systems Acquisition, Development, and Maintenance
  • Procurement specifications for systems must take into consideration the security controls to be incorporated into the system. DHS provides templates for Procurement Specifications that include significant security provisions. They should be used for even the smallest of acquisitions and considered mandatory for large scale, sensitive environments.

  • A formal testing and acquisition process should be followed by the organization. Contracts should address the identified security requirements. Additional functionality built into the product that may cause a security risk should be disabled/removed.

  • Many products have been evaluated formally for security and are certified for a particular use. Evidence of security is provided through an “Evaluation Assurance Level” document, produced when independent evaluators review equipment and software against the formal requirements of Common Criteria (ISO/IEC 15408).

Physical and Environmental
  • Security perimeters must be used to protect areas containing sensitive IT/ICS facilities. These perimeters should be clearly defined, access controlled via electronic locks or other physical barriers, and monitored via surveillance equipment.

  • Third-party users (vendors, support personnel, etc.) should be physically separated from the organization’s sensitive facilities. Likewise, public access, delivery and loading areas should be controlled and isolated where feasible.

Risk Assessment
  • Business exposure to IT/ICS security risks is a balance of cost and potential harm. This includes physical harm to people and assets, loss of reputation, environmental harm, regulatory violations, and others. Each of these potential business exposures has a cost that may be equated to the bottom line in terms of profits.

  • An organization performs a risk assessment to identify, quantify, and prioritize the risk against criteria for managing the risk and objectives related to the business bottom line. The product of the risk assessment is an estimate of the magnitude of the risk (risk analysis) and the significance of that risk (risk evaluation), along with the identification of potential threats and the current vulnerability to the threats (risk exposure).

  • Risk assessments are as important to an organization as any other business undertaking and should be a key activity performed on a regular basis. They should involve key stakeholders of the organization, including those with knowledge to assess the risks and prioritize mitigation actions based on the criticality of the asset. These key human resources should include, but are not limited to, CIO, Legal Counsel, Risk Manager, Compliance Manager, Public Relations, and Physical Security.
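The identify, quantify, and prioritize steps described above can be sketched with a simple scoring model. The example assumes a common likelihood-times-impact approach on 1 to 5 scales; that formula, the scales, and the sample risks are illustrative assumptions, not a method prescribed by the text.

```python
def risk_score(likelihood, impact):
    """Score a risk as likelihood x impact, each rated 1 (low) to 5 (high)."""
    for value in (likelihood, impact):
        if not 1 <= value <= 5:
            raise ValueError("ratings must be between 1 and 5")
    return likelihood * impact


def prioritize(risks):
    """Return risks ordered from highest to lowest score."""
    return sorted(
        risks,
        key=lambda r: risk_score(r["likelihood"], r["impact"]),
        reverse=True,
    )


# Hypothetical register entries agreed on by the stakeholders.
REGISTER = [
    {"name": "Unpatched HMI workstation", "likelihood": 4, "impact": 4},
    {"name": "Vendor remote access misuse", "likelihood": 2, "impact": 5},
    {"name": "Lost visitor badge", "likelihood": 1, "impact": 3},
]

if __name__ == "__main__":
    for risk in prioritize(REGISTER):
        print(risk_score(risk["likelihood"], risk["impact"]), risk["name"])
```

A numeric score only ranks the entries; the stakeholders still decide which mitigations are worth their cost.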

Security Incident Management
  • Suspected security events must be reported through appropriate channels as quickly as possible. A formal security event reporting procedure is required, along with an incident response and escalation procedure identifying the actions to be taken in the event of a security incident.

  • A point of contact should be established for reporting and managing security incidents. The reporting procedure generally contains:

    • Instructions to the individual reporting the incident (e.g., to do nothing that would compromise the ability to perform forensics on the system).

    • Security incident forms to support reporting.

    • Steps to be taken by responders in case of a security event.

    • Feedback processes to ensure those reporting security events are notified of results after the issue has been closed.

Security Governance
  • Security governance encompasses a set of multi-disciplinary structures, policies, procedures, processes, and controls. It is implemented to manage information at an enterprise level, and supports an organization’s immediate and future regulatory, legal, risk, environmental, and operational requirements.

  • As defined by Gartner, Inc., security governance is, “the specification of decision rights and an accountability framework to encourage desirable behavior in the valuation, creation, storage, use, archival, and deletion of information. It includes the processes, roles, standards and metrics that ensure the effective and efficient use of information in enabling an organization to achieve its goals.”

  • Organizations take great pride in their use of technology to advance their reputation and worthiness to the public and other organizations. However, too much information provides an avenue of risk when this information slips into the hands of those seeking to use it for their own benefit or to do harm. A fine balance must be achieved to ensure IT/ICS information does not provide that avenue of risk.

IT Vulnerabilities

  • To fully understand cyber risk, we need to understand vulnerabilities. Simply put, vulnerabilities are weaknesses that, if exploited, could result in an undesirable consequence (such as a system compromise). Vulnerabilities usually have a negative impact on the security posture of the system and need to be mitigated to reduce the cyber risk.

  • Vulnerabilities can usually be mitigated, either by a reconfiguration of the system or by applying a security patch issued by the vendor. People often associate vulnerabilities with weaknesses that give a potential attacker a specific opportunity to compromise a system; but the existence of a vulnerability doesn’t always equate to an opportunity upon which an adversary can capitalize. Many factors contribute to whether an adversary will take advantage of a vulnerability, such as:

    • The ease with which the vulnerability could be exploited.

    • Where the adversary needs to be, relative to the system, in order to attack it.

    • Whether the adversary must authenticate to the system to carry out the attack.

  • Interestingly, system owners can also decide whether to fix a vulnerability based on similar criteria:

    • What are the tools available to exploit the vulnerability?

    • How easy is it to exploit the vulnerability?

    • How much does it cost to fix the vulnerability?

    • How accurate is the information about this vulnerability?

    • Is there any collateral damage (in other systems) that can be caused by exploiting this vulnerability?

  • As the number of cyber vulnerabilities grows, so does the capability to track and score them. Scoring establishes a common measure of how much concern a vulnerability warrants, as compared to other vulnerabilities measured the same way. Scoring allows organizations to prioritize their cyber risk reduction activities and provides valuable intelligence on the current status of known vulnerabilities, their mitigation strategies, and the constantly evolving levels of difficulty associated with exploiting them. The most popular vulnerability scoring system is the Common Vulnerability Scoring System (CVSS). The standard is maintained by the Forum of Incident Response and Security Teams (FIRST), and the National Institute of Standards and Technology (NIST) publishes CVSS scores for known vulnerabilities in its National Vulnerability Database (NVD).

  • The decision to implement a countermeasure to mitigate a vulnerability is not always obvious. Scoring allows the system owner to assess the potential impact in a general sense; however, operational requirements must also be taken into consideration. Updating a system or application or applying a patch may not be feasible as it could alter the functionality, cause a service interruption, or even cause the service or process to fail. This is where defense-in-depth strategies are applied.

  • In some cases, the vulnerabilities are inherent in the system and users of the technology are not in a position to fix them. For example, known vulnerabilities in some network protocols have been in place since they were designed. Network administrators and security implementers have worked together to compensate for them (as opposed to fixing the root problem).

  • In addition to insecure protocols, there are inherent vulnerabilities associated with all operating systems. Given that relatively few operating systems support the majority of our global IT infrastructures, a vulnerability in any one of them provides a target-rich environment for cyber attackers. When a vulnerability is discovered in an operating system, it impacts not just one or two IT architectures; it impacts all architectures that use that operating system. In some cases, this can impact tens of millions of computers, many of which control critical processes.
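To make the scoring discussion above more concrete, the sketch below maps a CVSS v3.x base score to its qualitative severity rating (None, Low, Medium, High, Critical) and orders findings by score. The finding identifiers are placeholders, not real vulnerability records, and sorting purely by score is a simplification; as noted above, operational requirements also influence what gets fixed first.

```python
def cvss_severity(score):
    """Map a CVSS v3.x base score (0.0 to 10.0) to its qualitative rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def patch_order(findings):
    """Order (identifier, score) pairs from highest to lowest score."""
    return sorted(findings, key=lambda f: f[1], reverse=True)


# Placeholder findings for illustration only.
FINDINGS = [("FINDING-A", 5.3), ("FINDING-B", 9.8), ("FINDING-C", 7.5)]

if __name__ == "__main__":
    for ident, score in patch_order(FINDINGS):
        print(ident, score, cvss_severity(score))
```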

OT/ICS

OT

  • OT refers to hardware and software used to monitor events, processes, and devices and make adjustments in industrial operations.

  • Operational Technology (OT) refers to systems used to monitor and control industrial operations.

ICS

  • ICS is a general term used to describe the integration of hardware and software with network connectivity in order to support critical infrastructure. An ICS is a system that handles process control and monitoring for the facility. It takes inputs from sensors and process instruments and provides output based on control functions in accordance with the approved design control strategy.

  • The structures of ICS architectures are diverse, and depend upon system requirements, process function, and business needs. Vendor solutions sometimes dictate using specific ICS architectures; however, depending on the functionality and the complexity of the control action, there are common elements seen across all ICS architectures.

  • We should note that the differences between systems are diminishing as the capabilities merge. To reduce the confusion among the various types of control systems, we refer to them by their generic name, industrial control system (ICS).

Different ICS Terms

  • Industrial control system (ICS) is a collective term used to describe different types of control systems and associated instrumentation, which include the devices, systems, networks, and controls used to operate and/or automate industrial processes. Depending on the industry, each ICS functions differently and is built to electronically manage tasks efficiently. Today the devices and protocols used in an ICS are used in nearly every industrial sector and critical infrastructure, such as the manufacturing, transportation, energy, and water treatment industries.

  • ICS: A computer-based system used within many critical infrastructures to monitor and control sensitive processes and physical functions. “Control Systems” is a generic term applied to hardware, firmware, communications, and software that are used to monitor and control vital functions of physical systems.

  • ICS refers to the facilities, systems, and equipment that comprise the operational real-time control environment, services, diagnostics, and functional capabilities necessary for the effective and reliable operation of automation systems. ICS are made up of a device, or set of devices, that manage the behavior of other devices.

  • An ICS is an interconnection of components related in such a manner as to command, direct, or regulate itself or another system. This process could occur within a single factory (e.g., a batch mixing process contained in a chemical plant) or be distributed over a large geographical area (e.g., tracking and coordination of train movement over a busy rail system).

  • Industrial Control Systems (ICS) include systems used to monitor and control industrial processes.

  • ICS refers to a broad set of control systems, including:

    • SCADA (Supervisory Control and Data Acquisition): A large-scale, distributed measurement and control system (geographically spread out). SCADA systems are used in the transmission and distribution of oil, gas, and water (pipelines) and electricity (power lines).

    • DCS (Distributed Control System): A system where control is achieved by the distribution of live data (intelligence) throughout the controlled system, rather than from a centrally located single unit. DCS are used in power generation, chemical processing, oil refining, and wastewater treatment. A DCS may be confined to a single site. For example, in a nuclear power station, control is distributed among areas such as the reactor basement (ground and first floor), the cooling tower, and field controllers that communicate with I/O and send data to the control room.

    • PCS (Process Control System): A general term that encompasses several types of control systems used in industrial production, including SCADA, DCS, and other smaller control system configurations such as programmable logic controllers (PLC). PCS are used in water treatment, chemical processing, mining, pharmaceuticals, and manufacturing.

    • EMS (Energy Management System): A system of computer-aided tools used by operators of electric utility grids to monitor, control, and optimize the performance of the generation and/or transmission system. EMS are used in electrical energy and pump optimization.

    • AS (Automation System): A technology concerned with performing a process by means of programmed commands combined with automatic feedback control to ensure proper execution of the instructions. The resulting system is capable of operating without human intervention. AS are used in material handling and discrete manufacturing.

    • SIS (Safety Instrumented System): An engineered set of hardware and software controls commonly used on critical process safety systems. SIS are especially useful in safety shutdown and equipment protection systems. An SIS is a separate system from a DCS, created specifically for safety purposes. For example, as long as the monitored variables (temperature, pressure, and other important variables) are within specified limits, the SIS takes no action. If they are not, the SIS will shut down the system.

    • Any other automated control system: An example of another automated control system is a building automation system (BAS), such as automatic doors, or controls for heating, ventilation, and air conditioning (HVAC).

Supervisory Control & Data Acquisition (SCADA)
  • SCADA is an acronym for supervisory control and data acquisition, a computer system for gathering and analyzing real-time data. SCADA systems are typically used to control geographically dispersed assets that are often scattered over thousands of square kilometers. In the past, communications between field controllers and host computers depended on serial communications, most typically RS232. Data rates rarely exceeded 9,600 bits per second, which meant ICS needed to be co-located or include multiple relays.

  • As digital technology and data transfer rates improved, networks extended to include more remote locations, and asset owners started to migrate their serial SCADA circuits to digital networks. While this migration offers asset owners significant benefits, there are pitfalls. An improperly designed network can be a conduit for cyberattacks.

  • Because a tremendous amount of data is collected, the success of the SCADA system is dependent on the master controller successfully communicating with field controllers, such as RTU, IED, and PLC. If communications fail, the field controllers must individually control the remote facilities until the system re-establishes communications and the RTU or PLC can report to the master station.

  • Using SCADA components provides flexibility, in that asset owners can integrate the HMI from one vendor with the PLC or RTU from another vendor, provided they use the same protocol. This means you can replace the HMI software without having to replace the RTU or PLC (and vice versa).

  • SCADA is used in a wide range of industries. Some of the common places that use SCADA for various processes include:

    • Electrical Delivery

    • Oil and Gas Delivery

    • Drinking Water Delivery

    • Wastewater Removal

    • Transportation Systems

Distributed Control System (DCS)
  • DCS were initially developed to support large process industries such as refineries and chemical plants. The DCS controllers are distributed throughout the plant; hence the name distributed control system. They are typically deployed at site facilities over the plant or control area.

  • DCS are different from a centralized control system where a single controller handles the control functions from a central location. DCS has each machine or group of machines controlled by a dedicated controller. These distributed individual automatic controllers are connected to the field devices.

  • DCS is best suited for large-scale processing or manufacturing plants where a large number of continuous control loops need to be monitored and controlled. Its biggest advantage is that multiple controllers divide the control tasks: if any part of the DCS fails, the rest of the plant can continue to operate irrespective of the failed area.

  • Because of its distributed architecture, DCS has become prominent in large and complex industrial processes, including:

    • Papermaking

    • Fixed Chemical

    • Water Treatment

    • Rail Transit

    • Power Stations

    • Petrochemical

    • Biopharmaceutical

SCADA vs DCS

  SCADA | DCS
  Data-gathering oriented | Process oriented
  Larger geographical areas that use different communication systems, which are generally less reliable than a local area network | Data acquisition and control modules located within a confined area; communication between the various distributed control units is carried out over a local area network (LAN)
  No closed-loop control | Closed-loop control at process control stations and remote terminal units
  Event driven: not scanned regularly, waits for an event to trigger actions | Process-state driven: scans the process regularly and displays it to the operator, as well as on demand
  Used in larger geographical locations, such as water management systems, power transmission and distribution control, etc. | Used in installations within a confined space, such as a single plant or factory, for complex control processes
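
The event-driven versus process-state-driven distinction in the comparison above can be illustrated with a short Python sketch. The function names, tag names, and the deadband parameter are assumptions made for illustration only; real SCADA and DCS products implement these ideas in vendor-specific ways.

```python
def scan_all(values):
    """Process-state driven (DCS style): report every value on every scan."""
    return dict(values)


def report_by_exception(previous, current, deadband):
    """Event driven (SCADA style): report only values that changed noticeably.

    A tag is reported if it is new or if it moved by more than the
    deadband (a hypothetical tuning parameter) since the last report.
    """
    return {
        tag: value
        for tag, value in current.items()
        if tag not in previous or abs(value - previous[tag]) > deadband
    }


previous = {"flow": 10.0, "level": 5.0}
current = {"flow": 10.1, "level": 7.0}

full_scan = scan_all(current)                               # both tags sent
events = report_by_exception(previous, current, deadband=0.5)  # only "level" sent
```

Reporting only changes keeps traffic low on slow or unreliable long-distance links, while a regular full scan suits a fast local network.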

Process Control Systems (PCS)
  • PCS, sometimes called ICS, function as pieces of equipment along the production line during manufacturing, testing the process and returning data for monitoring and troubleshooting. PCS are architecturally similar to SCADA systems, but also perform many of the functions of a DCS, and are similarly used at site facilities. PCS supports a variety of manufacturing processes including continuous, batch, and discrete processing.

  • Many PCS applications overlap with DCS applications. PCS are scalable and used in small power plants, as well as large production facilities. As a general rule, however, DCS implementations are more suitable for large refineries and chemical plants.

  • PCS use many of the same software packages and hardware components as SCADA systems. This includes PLC and RTU. The main difference between a PCS and a SCADA system is that a PCS communicates with the field controllers using a plant network, while SCADA systems traditionally use serial communications and remote networks for the same task.

OT Infrastructure Components

Human-Machine Interface (HMI)
  • The user interface in a manufacturing or process control system. It provides a graphics-based visualization of an industrial control and monitoring system. Previously called an “MMI” (man-machine interface), an HMI typically resides on a computer that communicates with a specialized computer in the plant, such as a programmable automation controller (PAC), programmable logic controller (PLC) or remote terminal unit (RTU). The HMI generally comes in two forms: either a touch panel or a software-based application that is loaded on a personal computer, workstation, tablet, or smart phone.

  • HMI workstations are typically located at a centralized or distributed control center, where operators see a complete set of unified control system data presented in a graphical user interface. This allows the operator to have a real-time or near real-time operational view of the process. An operator typically uses the HMI to monitor and control the process. They are also capable of providing historical trends, alarms and event notifications, or support other applications that an operator may use to do their job.

  • From a security perspective, the HMI system and its data are an obvious target, as HMI typically use standard operating systems and are interconnected with outside networks or available through remote access methods. Many HMI have command and control functionality and, if compromised, could allow an attacker to take over a mission-critical process. The SCADA server is the data server that sends data to the field control devices over the communications network using various communication protocols, facilitating communication through the system. The Energy Management System (EMS) collects the energy data of the system and optimizes the energy use of the ICS.

Field Controllers
  • The devices that consolidate inputs and outputs, taking instructions from the operators to make changes in the field. Controllers can be programmed or updated in the field or remotely. These devices were designed as if they were in a “trusted” environment (the network map should show which environments are trusted and which are untrusted). Therefore, when given a command, they obey or respond. Most do not authenticate commands to make sure they come from a specific source.

  • Field controllers collect and process input and output (I/O) data. They send the process data to the HMI and relay process control commands from the HMI to the field devices. The field controllers are often located close to the field devices in order to process the information as quickly as possible. For large distributed systems, field controllers may collect and aggregate information from hundreds or thousands of sources.

  • Field controllers are embedded microprocessor devices and are designed to withstand the rigors of an industrial environment. Like personal computers, they have a processor and internal memory, but usually do not have a mechanical hard drive. They convert the electrical signal from field devices (input) into a digital signal (1s and 0s), and convert a digital signal to an electrical signal (output).

  • There are many different types of field controllers, and each is designed to support specific processes or sectors. There are four common types of field controllers: remote terminal units (RTU), intelligent electronic devices (IED), programmable logic controllers (PLC), and programmable automation controllers (PAC).
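
The point made earlier in this section, that most field controllers obey any command without checking its source, can be contrasted with a controller that does check. The sketch below uses a shared-key message authentication code from the Python standard library. The key, the command strings, and the function names are hypothetical, and this is only one possible approach, not a description of how any particular ICS protocol handles authentication.

```python
import hashlib
import hmac

# Hypothetical shared secret, for illustration only.
SHARED_KEY = b"example-key-for-illustration-only"


def sign(command: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Compute an HMAC-SHA256 tag that the sender attaches to a command."""
    return hmac.new(key, command, hashlib.sha256).digest()


def controller_accepts(command: bytes, tag: bytes, key: bytes = SHARED_KEY) -> bool:
    """Accept the command only if its tag matches the expected value."""
    return hmac.compare_digest(sign(command, key), tag)


command = b"OPEN_VALVE_7"
tag = sign(command)

accepted = controller_accepts(command, tag)            # genuine command
tampered = controller_accepts(b"CLOSE_VALVE_7", tag)   # altered command
```

A controller designed for a trusted environment effectively skips the check and acts on every command it receives.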

RTU
  • A remote terminal unit (RTU) is a microprocessor-controlled electronic device that interfaces objects in the physical world to a distributed control system or SCADA (supervisory control and data acquisition) system by transmitting telemetry data to a master system, and by using messages from the master supervisory system to control connected objects. As this interfacing involves the collection of telemetry data, the system is sometimes called a remote telemetry unit. One of the key characteristics of an RTU is that it relays information from a remote location over long distances to a centrally located host using/supporting a variety of communications mediums and ICS protocols.

  • RTU are capable of executing programs autonomously without having to involve the HMI or operator. This enables RTU to respond quickly to emergencies without operator input. For example, if the RTU program “sees” a high flow rate on one of the input flows, it can issue an output command to shut down a pump. In addition to converting analog or discrete measurements to digital information, RTU are also used as data concentrators and protocol converters. Typically, RTU are used by utilities and other industries that monitor and control geographically dispersed facilities.

  • Sectors using RTUs

    • Oil and gas – RTUs are used in offshore platforms, onshore oil wells, pipelines

    • Refineries and chemical plants – RTUs are used in environmental monitoring systems (pollution, air quality, emissions monitoring), outdoor warning sirens

    • Water and Wastewater – RTUs can be found in distribution systems, aqueducts, water resource management, collection systems

    • Electric power – RTUs are used in transmission and distribution systems across the country

    • Mine sites – RTUs are used in conveyor monitoring and control, mine water management, underground equipment monitoring, bore management, and material handling

    • Transportation Systems – RTUs are used in air traffic control, railroads, and trucking
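
The autonomous behavior described above, where an RTU shuts down a pump when it sees a high flow rate, can be sketched in a few lines of Python. The limit value and the function name are assumptions for illustration; an actual RTU would be programmed in its vendor's own environment.

```python
HIGH_FLOW_LIMIT = 500.0  # hypothetical limit, in litres per minute


def rtu_scan(flow_rate, pump_running):
    """One pass of the RTU program: return the pump command for this scan."""
    if flow_rate > HIGH_FLOW_LIMIT:
        # React locally, without waiting for the HMI or the operator.
        return False
    # Otherwise leave the pump in its current state.
    return pump_running


pump = True
pump = rtu_scan(300.0, pump)   # normal flow: pump keeps running
pump = rtu_scan(650.0, pump)   # high flow: RTU stops the pump on its own
```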

Intelligent Electronic Device (IED)
  • An Intelligent Electronic Device (IED) is a term used in the electric power industry to describe microprocessor-based controllers of power system equipment. It is used by the Energy sector to monitor and control electrical power devices such as circuit breakers, capacitors, and transformers. IED receive data from field sensors (I/O) and power equipment and can issue control commands. These commands include simple things such as tripping circuit breakers if they sense anomalies in voltage or current. They can also instruct system output to raise or lower voltage levels in order to maintain the desired level. Common types of IED include protective relaying devices, load tap changer controllers, circuit breaker controllers, capacitor bank switches, re-closer controllers, and voltage regulators.

  • Many owners/operators leave their IED with their “fresh out of the box” configurations. These default configurations, unfortunately, make it easier for those with ill intent to make changes to the operational parameters of the device. Furthermore, some owners opt to keep the extra communication programming ports active so they can view or make online changes from the shop or control room. Considering that modern IED are fully network aware, and in some cases, may have embedded services that facilitate remote administration, there is a valid concern for the cybersecurity of these devices.

  • The utilities which operated the power transmission stations were some of the first to use IED. This early use was not to comply with regulatory requirements, but to save money. The use of IED in this instance meant a highly paid technician would not have to drive to a potentially remote transmission station to retrieve data.
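
As a rough illustration of the raise or lower voltage commands mentioned above, the following Python sketch shows the decision a load tap changer controller might make. The target voltage, the bandwidth, and the function name are hypothetical; real IED also apply time delays and other settings that are omitted here.

```python
def tap_command(measured_v, target_v, bandwidth_v):
    """Return 'raise', 'lower', or 'hold' for a load tap changer.

    The voltage is allowed to drift within a band centered on the target;
    a command is issued only when the measurement leaves that band.
    """
    half_band = bandwidth_v / 2
    if measured_v < target_v - half_band:
        return "raise"
    if measured_v > target_v + half_band:
        return "lower"
    return "hold"
```

For example, with a 120 V target and a 2 V bandwidth, a reading of 118.0 V would produce a raise command and a reading of 120.5 V would be left alone.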

Programmable Logic Controller (PLC)
  • Over the years, PLC functionality matured, and the devices are now found in sectors beyond the automotive industry for which they were created. In fact, there is a new class of field controllers called Programmable Automation Controllers (PAC). PAC combine the best features of RTU, PLC, and Distributed Control System (DCS) controllers into a universal controller for use across multiple sectors. The onboard processor and memory, along with the network capabilities, make this device particularly interesting from a cybersecurity perspective.

  • A Programmable Logic Controller, or PLC, is a ruggedized computer used for industrial automation and was created to respond to the needs of the automotive industry. These controllers can automate a specific process, machine function, or even an entire production line. In the 1960s, the automotive industry used relays, timers, and switches, along with extended wiring and cabling runs to control its assembly lines. Every time a model changed, the assembly line required a tear down and rebuild, making the process of auto manufacturing incredibly expensive and time-consuming. Only skilled electricians were qualified to perform this re-purposing of an assembly line.

  • In 1968, General Motors issued a request for a proposal to replace the vast hardwired relay systems with a computer-based system. A company called Bedford Associates won the proposal and created the PLC. The first PLC were programmed in Ladder Logic. This programming language is designed to mimic the relay diagrams electricians used to wire relays and timers in the older assembly plants. The image shows PLC ladder logic illustrating basic motor start/stop control.

  • In addition to ladder logic, other PLC programming languages have been developed. The languages in common use are ladder logic, structured text, function block diagram, sequential function chart, and instruction list.
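
The basic motor start/stop control mentioned above is usually drawn as a single ladder rung with a seal-in contact. The Python sketch below expresses the same logic as a Boolean function evaluated once per scan. The function and argument names are assumptions for illustration, and the stop input is modeled as a logical "stop requested" flag rather than as the normally closed contact typically used in real wiring.

```python
def motor_rung(start_pressed, stop_pressed, motor_running):
    """Evaluate one scan of a start/stop seal-in rung.

    Rung logic: (START or MOTOR) and not STOP -> MOTOR
    The MOTOR contact in parallel with START keeps the motor running
    after the start button is released.
    """
    return (start_pressed or motor_running) and not stop_pressed


motor = False
motor = motor_rung(True, False, motor)    # operator presses start
motor = motor_rung(False, False, motor)   # start released, motor stays sealed in
still_running = motor
motor = motor_rung(False, True, motor)    # operator presses stop
```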

Programmable Automation Controller (PAC)
  • PAC is a term that is loosely used to describe any type of automation controller that incorporates higher-level instructions. The systems are used in ICS for machinery in a wide range of industries, including those involved in critical infrastructure. They provide a highly reliable, high-performance control platform for discrete logic control, motion control, and process control. There is no specific agreement between industry experts as to what differentiates a PAC from a PLC. In any case, defining exactly what constitutes a PAC is not as important as having users understand the types of applications for which each is best suited.

  • A PAC is geared more toward complex automation system architectures composed of a number of PC-based software applications, including HMI functions, asset management, historian, advanced process control (APC), and others. A PAC is also generally a better fit for applications with extensive process control requirements, as PACs are better able to handle analog I/O and related control functions. A PAC tends to provide greater flexibility in programming, larger memory capacity, better interoperability, and more features and functions in general.

  • PAC provide a more open architecture and modular design to facilitate communication and interoperability with other devices, networks, and enterprise systems. They can be easily used for communicating, monitoring, and control across various networks and devices because they employ standard protocols and network technologies, such as Ethernet, Open Platform Communications (OPC), and Structured Query Language (SQL).

  • PACs also offer a single platform that operates in multiple domains, such as motion control, communication, sequential control and process control. Moreover, the modular design of a PAC simplifies system expansion and makes adding and removing sensors and other devices easy, often eliminating the need to disconnect wiring. Their modular design makes it easy to add and effectively monitor and control thousands of I/O points, a task beyond the reach of most PLC.

Field Devices
  • The instruments and sensors that measure process parameters and the actuators that control the process. This is the interface between the ICS and the physical process, be it the mixing of chemicals, the management of trains, or measuring of pressures in a gas pipeline.

  • This is the point in the system where information is collected about the process, modifications are made, and the process is controlled. The sensors or measuring instruments are often referred to as input devices because they “input” data into the ICS. In contrast, switches, valves, and other types of actuators that control the process are called output devices. This input and output information is often referred to as I/O.

Field Devices - Input
  • Sensors, or transmitters, collect data (input) and are built into control instruments. A system may monitor a single input point or more than 100,000 points, as in large refineries or utility front-end processors. The sensors convert physical parameters, such as temperature, pressure, level, flow, motor speed, valve state, or breaker position, to electrical signals. These input devices pass data to the field controllers and computers for transmission, processing, display, or storage.

  • Sensors are commonly described by their type: discrete, analog, and digital.

    • Discrete: Discrete input sensors support binary events including alarms and states. For example, the tank is full, the door is closed, the pressure is too high, or the pump is turned on.

    • Analog: Analog input sensors (transmitters) measure continuous processes such as flow, level, or pressure within a range: 0-100%, empty to full, 0 to 100 mph. Typically, they transmit this information to field controllers using an analog signal such as a 4-20 mA current loop.

    • Digital: Digital input sensors are similar to both discrete and analog instruments in that they measure continuous processes (such as flows) and support binary events. However, instead of using an analog loop signal or clean contacts, digital sensors relay the data using a digitally encoded ICS communications protocol (a signal equivalent to 1s and 0s).

  • Signals generated by discrete and analog field devices are converted to digital format in a networked environment. The digital signals extend the network to the instrument, and consequently, the process.
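
To make the analog-to-digital step concrete, the sketch below shows how a field controller might scale a 4-20 mA input to an engineering range such as 0-100%. The function name and the strict range check are assumptions for illustration; real devices often tolerate small excursions outside 4-20 mA before flagging a fault.

```python
def ma_to_engineering(current_ma, low, high):
    """Linearly scale a 4-20 mA signal to the range [low, high].

    4 mA maps to `low`, 20 mA maps to `high`. A reading outside
    4-20 mA is treated here as a wiring or sensor fault.
    """
    if not 4.0 <= current_ma <= 20.0:
        raise ValueError("signal outside 4-20 mA; possible wiring or sensor fault")
    return low + (current_ma - 4.0) * (high - low) / 16.0


half_full = ma_to_engineering(12.0, 0.0, 100.0)  # mid-scale signal -> 50.0
```

Using 4 mA rather than 0 mA as the bottom of the range lets the controller distinguish a true zero reading from a broken wire.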

Field Devices - Output

An output device is any peripheral that receives data from the field controller.

  • Discrete: Like their input counterpart, discrete output devices are also binary appliances. For instance, the field controller issues a signal to an output device, such as a circuit breaker, to open or close a breaker. Discrete output devices can communicate directly with discrete input devices. Furthermore, they can make control decisions and are programmable like a field controller.

  • Analog: The analog output transmits analog signals (voltage or current) that operate controls. Analog outputs are predominantly used to control actuators, valves, and motors in industrial environments. In this case, the field controller will send a varying electrical signal that can open or close the valve as needed.

  • Digital: A digital output allows you to control a voltage with a computer. If the computer instructs the output to be high, the output will produce a voltage (generally about 5 or 3.3 volts). If the computer instructs the output to be low, it is connected to ground and produces no voltage. As a result, they can communicate more quickly and reliably, thus enabling their use in environments that are more critical, covering a wider range of applications. Examples include: alarms, control relays, fans, lights, horns, valves, switches, motor starters, etc.

Servers

Servers store the configuration for the ICS and save process data in historians for later retrieval. The servers connect to business networks to allow remote operations, configuration, or information exchanges to improve productivity.

Engineering Workstations

Engineering workstations are a specialized type of HMI. They typically interface with the servers to modify the database or controllers to ensure the critical process runs properly. As we gain an understanding of the similarities between IT and ICS architectures, we will have greater success mapping traditional cybersecurity issues into the ICS domain.

Safety Systems

Safety systems protect the process, physical equipment, and people from harmful situations that may arise during operations. They provide a critical counteraction in industrial operations in case a process goes beyond allowable control parameters. While this would result in a loss of productivity, it would spare the equipment and people harm. Safety systems are traditionally designed to be separated from the control systems they protect. However, they frequently share some communications, field devices, alarms, etc.

ICS components
Relationship

Machines installed in industrial plants use a variety of field devices for control and monitoring. These devices connect to field controllers, which connect to Human Machine Interface (HMI).

                                  |-----------------Human Machine interface
                                  V
Field Device (Actuator) <---- Controller  <--------- Field Device (Sensor)
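
The relationship in the diagram can also be sketched in Python: a sensor reading goes into the controller, the controller decides the actuator output, and the HMI reads status and writes setpoints. The class name, the tank-filling scenario, and the values are hypothetical and serve only to illustrate the data flow.

```python
class TankController:
    """Field controller: reads a level sensor, drives an inlet valve."""

    def __init__(self, setpoint):
        self.setpoint = setpoint   # written by the HMI
        self.last_level = None     # read back by the HMI
        self.valve_open = False    # output to the actuator

    def scan(self, sensor_level):
        """One scan: take the sensor input, decide the actuator output."""
        self.last_level = sensor_level
        self.valve_open = sensor_level < self.setpoint
        return self.valve_open


controller = TankController(setpoint=5.0)
first = controller.scan(3.2)     # sensor -> controller -> actuator: valve opens
controller.setpoint = 2.0        # HMI operator lowers the setpoint
second = controller.scan(3.2)    # same reading now closes the valve
```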
ICS Segments In-Depth

ICS are composed of several components such as field devices, field controllers, and HMI. Each of these components can become complex. The list below gives common examples of each device.

  • Field Devices (Meters, Sensors, Valves, Switches)

  • Field Controller (PLC, PAC, RTU, IED)

  • HMI (Workstations, SCADA Server, Energy Management System)

Cybersecurity Tenets

Availability, Integrity, and Confidentiality

Here’s an important fact to keep in mind: maybe you’ve heard of the C-I-A elements in IT environments? In ICS environments the priority is typically reversed to availability, integrity, then confidentiality. It is important to be cautious about how we utilize security technology developed for IT and how we implement it in ICS environments.

  • Availability: The proportion of time a system is in a functioning condition. For any information system to serve its purpose, the information must be available when it is needed.

  • Integrity: Maintaining and ensuring the accuracy and consistency of data over its entire life cycle. All characteristics of the data, including business rules, rules for how pieces of data relate, dates, definitions, and lineage, must be correct for data to be complete.

  • Confidentiality: Ensuring that information is accessible only to those authorized to have access.
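
Since availability is defined above as a proportion of time, it can be computed directly. The following is a minimal Python sketch; the function name and the sample figures are assumptions chosen for illustration.

```python
def availability(uptime_hours, downtime_hours):
    """Return the fraction of total time the system was functioning."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total


# Hypothetical year of 8,760 hours with 8.76 hours of downtime:
# the result is about 0.999, often described as "three nines".
yearly = availability(8751.24, 8.76)
```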

Uses of ICS

  • A process is a series of steps taken to achieve a desired result. Purifying water, landing airplanes, and distilling chemicals are all examples of processes. ICS have components that are common to controlling all processes, even if the processes are different. However, because of differences within process environments, there will also be differences in ICS implementations.

  • For example, one process may be designed to shut off the product flow into a vessel based on the level the product has reached, while another process uses the product weight or a calculation of volumetric flow as a control. Each method has advantages and disadvantages based on costs, safety, environmental impacts, and product quality; but each uses ICS and the ICS used are implemented differently.

Examples of ICS attacks

Oldsmar Water Treatment plant
  • In February 2021, an attacker targeted a water treatment plant in Oldsmar, in Pinellas County, Florida. The plant was using the TeamViewer software for remote access and assistance. This software was left running, and the attacker was able to connect to the system through this channel.

  • The attacker raised the sodium hydroxide setting from 100 parts per million (ppm) to about 11,100 ppm. This level is extremely dangerous in a water system.

  • The plant operator recognized the intrusion, observed the configuration change, immediately reversed the change, and initiated incident response protocols.
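
One way to think about this incident is that a setpoint change of that size could be rejected automatically before it takes effect. The Python sketch below checks a requested setpoint against an absolute limit and a maximum step size. The limits and names are hypothetical, and this is not a description of the controls actually in place at the plant.

```python
def accept_setpoint(current_ppm, requested_ppm, max_ppm=200.0, max_step_ppm=50.0):
    """Accept a new setpoint only if it is in range and not too large a jump.

    max_ppm and max_step_ppm are illustrative values, not real plant limits.
    """
    if not 0.0 <= requested_ppm <= max_ppm:
        return False
    return abs(requested_ppm - current_ppm) <= max_step_ppm


small_change = accept_setpoint(100.0, 120.0)      # modest adjustment
attack_change = accept_setpoint(100.0, 11100.0)   # change like the one in the incident
```

With these illustrative limits the modest adjustment is accepted and the jump to 11,100 ppm is rejected, regardless of who requested it.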

Colonial Pipeline
  • In early May 2021, Colonial Pipeline experienced a ransomware attack. Attackers entered the system via an unused but active VPN account. The attackers stole approximately 100 GB of data and installed ransomware.

  • An operator noticed the ransom message on a control room system early the morning of May 7. To stop the spread of the ransomware before it reached critical OT systems, the entire pipeline system was shut down 70 minutes after the initial discovery.

  • The pipeline delivered roughly 2.5 million barrels (about 100 million gallons) of fuel per day to the southeast states. As word of the attack spread, people rushed to purchase fuel, causing shortages. A federal state of emergency was declared, allowing other means of transportation (road, rail, etc.) to attempt to ease the supply shortage.

Stuxnet
  • Stuxnet was a game changer because it was the first known malware to specifically target a control system. It is believed to have been introduced by a USB stick.

  • Stuxnet modifies programs for a specific PLC, hides the changes, and employs sophisticated evasion techniques. It only impacts ICS operating variable frequency drives.

Critical infrastructure and Key Resources (CIKR) Sectors

Chemical

The Chemical Sector is an integral component of the economy that manufactures, stores, uses, and transports potentially dangerous chemicals upon which a wide range of other critical infrastructure sectors rely. Securing these chemicals against growing and evolving threats requires vigilance from both the private and public sector.

Commercial facilities

The Commercial Facilities Sector includes a diverse range of sites that draw large crowds of people for shopping, business, entertainment, or lodging. Facilities within the sector operate on the principle of open public access, meaning the general public can move freely without the deterrent of highly visible security barriers. The majority of these facilities are privately owned and operated, with minimal interaction with the federal government and other regulatory entities.

Communications

The Communications Sector is an integral component of the economy, underlying the operations of all businesses, public safety organizations, and government. It is critical because it provides an “enabling function” across all critical infrastructure sectors. The sector has evolved from predominantly a provider of voice services into a diverse, competitive, and interconnected industry using terrestrial, satellite, and wireless transmission systems. The transmission of these services has become interconnected; satellite, wireless, and wireline providers depend on each other to carry and terminate their traffic, and companies routinely share facilities and technology to ensure interoperability.

Critical manufacturing
  • The Critical Manufacturing Sector is crucial to economic prosperity and continuity. A direct attack on or disruption of certain elements of the manufacturing industry could disrupt essential functions at the national level and across multiple critical infrastructure sectors.

  • Products made by these manufacturing industries are essential to many other critical infrastructure sectors. The Critical Manufacturing Sector focuses on the identification, assessment, prioritization, and protection of nationally significant manufacturing industries that may be susceptible to manmade and natural disasters.

Dams
  • The Dams Sector delivers critical water retention and control services including hydroelectric power generation, municipal and industrial water supplies, agricultural irrigation, sediment and flood control, river navigation for inland bulk shipping, industrial waste management, and recreation. Its key services support multiple critical infrastructure sectors and industries.

Defense Industrial Base

The Defense Industrial Base Sector is the worldwide industrial complex that enables research and development, as well as design, production, delivery, and maintenance of military weapons systems, subsystems, and components or parts, to meet U.S. military requirements.

Emergency Services
  • The Emergency Services Sector is a community of millions of highly-skilled, trained personnel (along with the physical and cyber resources) that provide a wide range of prevention, preparedness, response, and recovery services during both day-to-day operations and incident response.

  • This sector includes geographically distributed facilities and equipment in both paid and volunteer capacities organized primarily at the federal, state, local, tribal, and territorial levels of government, such as city police departments and fire stations, county sheriff’s offices, Department of Defense police and fire departments, and town public works departments.

  • The Emergency Services sector also includes private sector resources, such as industrial fire departments, private security organizations, and private emergency medical services providers.

Energy
  • The energy infrastructure fuels the economy of the 21st century. Without a stable energy supply, health and welfare are threatened, and the economy cannot function. The Energy Sector is uniquely critical because it provides an “enabling function” across all critical infrastructure sectors.

  • A country’s energy infrastructure is often owned by both the public and private sectors, supplying fuels to the transportation industry, electricity to households and businesses, and other sources of energy that are integral to growth and production across the nation.

Finance

The Financial Services Sector represents a vital component of our nation’s critical infrastructure. Large-scale power outages, recent natural disasters, and an increase in the number and sophistication of cyberattacks demonstrate the wide range of potential risks facing this sector.

Food and Agriculture

The Food and Agriculture Sector is almost entirely under private ownership and is composed of multiple farms, restaurants, and food manufacturing, processing, and storage facilities.

Government Facilities
  • The Government Facilities Sector includes a wide variety of buildings, both in the country and overseas, that are owned or leased by federal, state, local, and tribal governments. Many government facilities are open to the public for business activities, commercial transactions, or recreational activities, while others that are not open to the public contain highly sensitive information, materials, processes, and equipment. These facilities include general-use office buildings and special-use military installations, embassies, courthouses, national laboratories, and structures that may house critical equipment, systems, networks, and functions.

  • In addition to physical structures, this sector includes cyber elements that contribute to the protection of sector assets (e.g., access control systems and closed-circuit television systems) as well as individuals who perform essential functions or possess tactical, operational, or strategic knowledge.

Healthcare and Public Health
  • The Healthcare and Public Health Sector protects all sectors of the economy from hazards such as terrorism, infectious disease outbreaks, and natural disasters. Because the vast majority of this sector’s assets are privately owned and operated, collaboration and information sharing between the public and private sectors is essential to increasing resilience of the nation’s Healthcare and Public Health critical infrastructure.

Information Technology
  • The Information Technology Sector is central to the nation’s security, economy, and public health and safety as businesses, governments, academia, and private citizens are increasingly dependent upon its functions. These virtual and distributed functions produce and provide hardware, software, and information technology systems and services, and (in collaboration with the Communications Sector) the Internet. This sector’s complex and dynamic environment makes identifying threats and assessing vulnerabilities difficult and requires that these tasks be addressed in a collaborative and creative fashion.

  • Information Technology Sector functions are operated by a combination of entities (often owners and operators and their respective associations) that maintain and reconstitute the network, including the Internet. Although information technology infrastructure has a certain level of inherent resilience, its interdependent and interconnected structure presents challenges as well as opportunities for coordinating public and private sector preparedness and protection activities.

Nuclear Reactors
  • From the power reactors that provide electricity, to the medical isotopes used to treat cancer patients, the Nuclear Reactors, Materials, and Waste Sector covers most aspects of civilian nuclear infrastructure.

Transportation
  • Transportation Systems Sector consists of 7 key subsectors: Aviation, Highway and Motor Carrier, Maritime Transportation, Mass Transit and Passenger Rail, Pipeline Systems, Freight Rail, and Postal and Shipping. The nation’s transportation system quickly, safely, and securely moves people and goods through the country and overseas.

Water and waste
  • The Water and Wastewater Systems Sector is vulnerable to a variety of attacks, including contamination with deadly agents; physical attacks, such as the release of toxic gaseous chemicals; and cyberattacks. The result of any variety of attack could be large numbers of illnesses or casualties and/or a denial-of-service condition that would also impact public health and economic vitality. The sector is also vulnerable to natural disasters.

  • Safe drinking water is a prerequisite for protecting public health and all human activity. Properly treated wastewater is vital for preventing disease and protecting the environment. Ensuring the supply of drinking water and wastewater treatment and service is essential to modern life and the nation’s economy.

CIKR Interdependencies

  • ICS play a major role in the operations of each sector. Because many of the individual sectors are interdependent, a failure in one sector could cause a significant impact on other sectors, and possibly place national security and safety at risk. It is important to recognize the interdependency between the sectors.

  • For example, an ICS failure in the Energy sector resulting in electrical blackouts will likely affect other CIKR sectors that depend on electrical power. Such a failure may have cascading effects on other sectors such as transportation, communications, and the water sector, all of which depend upon electrical power.

  • Interdependencies can create cybersecurity concerns when a failure in any of the dependent processes causes the process to fail.
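The cascading effect described above can be sketched as a walk over a dependency map. This is a minimal illustration only: the Energy entries follow the example in the text, while the other entries, the sector names used as keys, and the function name affected_sectors are assumptions made for the sketch.

```python
from collections import deque

# Hypothetical dependency map: each sector lists the sectors that depend on it.
# Only the Energy entries come from the example above; the rest are illustrative.
DEPENDENTS = {
    "Energy": ["Transportation", "Communications", "Water"],
    "Communications": ["Emergency Services"],
    "Water": ["Healthcare"],
}


def affected_sectors(failed, dependents=DEPENDENTS):
    """Return every sector reachable from the failed one, in breadth-first order."""
    seen = {failed}
    order = []
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


print(affected_sectors("Energy"))
```

With this made-up map, a failure in Energy reaches the three sectors that depend on it directly and then the sectors that depend on those, which is the pattern the blackout example below illustrates.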

Cascading Effects

Northeast Blackout

  • The impacts to critical infrastructure during the 2003 Northeast Blackout are an example of sector interdependency. This power outage affected 55 million people in Canada and the U.S.

  • Since then, research has been performed to understand the interdependency of critical infrastructure sectors; how the failure in one sector can have a significant impact on other sectors; and how these sectors can protect against cascading threats.

Northeast Blackout impacts

Water Supply:
  • Some areas lost water pressure because pumps lacked power. This loss of pressure caused potential contamination of the water supply.

    • Four million customers in 8 counties within the Detroit water system were under a “boil-water advisory” for 4 days after the initial outage.

    • Macomb County, Michigan, ordered all 2,300 restaurants closed until they were decontaminated.

    • Twenty people living on the St. Clair River claim to have been sickened after bathing in the river during the blackout.

    • The accidental release of 310 pounds of vinyl chloride from a Sarnia, Ontario chemical plant into the river was not revealed until 5 days later.

    • Cleveland also lost water pressure and instituted an advisory.

    • New York City reported sewage spills into waterways, requiring beach closures.

    • Newark experienced major sewage spills into the Passaic and Hackensack Rivers, which flow directly to the Atlantic Ocean.

    • The City of Kingston, Ontario, lost power to sewage pumps, causing raw waste to be dumped into the Cataraqui River at the base of the Rideau Canal.

Power
  • With the power fluctuations on the grid, power plants automatically went into “safe mode” to prevent damage in the case of an overload. This put most of the nuclear power plants in the affected area offline until they could recover. In the meantime, all available hydro-electric plants (as well as many coal- and oil-fired electric plants) were brought online, bringing some electrical power to the areas immediately surrounding the plants by the morning of August 15. Homes and businesses in the affected and nearby areas were requested to limit power usage until the grid was back to full power.

Industry
  • A large number of factories were closed in the affected area, and others outside the area were forced to close or slow work because of supply issues and the need to conserve energy while the grid was being stabilized. At one point, a 7-hour wait developed for trucks crossing the Ambassador Bridge between Detroit and Windsor, Canada because the electronic border check systems were down. Freeway congestion affected the “just in time” supply system in many metropolitan areas. Some industries (including the auto industry) did not return to full production until August 22.

Communication
  • Cellular communication devices were disrupted due to the loss of backup power at cellular sites, where generators ran out of fuel. Many cell phones failed without a power source for recharge. Wired telephone lines continued to work, although the volume of traffic overwhelmed some systems and millions of home users had only cordless telephones that depended on electricity to function. Most New York and many Ontario radio stations were momentarily knocked off the air but were able to return on backup power.

  • Cable television subscribers could not receive news, health warnings, or information until power was restored to the cable provider. Those who relied on the Internet were similarly disconnected from news sources for the duration of the blackout, with the exception of dial-up access from laptop computers, which were widely reported to work until the batteries ran out of charge. Information was available by over-the-air TV and radio for those who were equipped to receive TV and/or audio via antenna.

  • The blackout impacted communications well outside the immediate power outage area. The New Jersey-based Internet operations for Advance Publications were among those knocked out by the blackout, and Internet editions of their newspapers as far removed from the blackout area as The Birmingham News, New Orleans Times-Picayune, and The Oregonian were offline for days.

  • Amateur radio operators with independent power sources passed emergency communications during the blackout.

Transportation
  • Railroad service was stopped north of Philadelphia and all trains running into and out of New York City were shut down. Both were able to establish a “bare-bones” all-diesel service by the next morning. Canada’s Via Rail, which serves Toronto and Montreal, suffered service delays; but most routes were still running, and normal service was resumed on most routes by the next morning.

  • Passenger screenings at affected airports ceased and regional airports were shut down. In New York City, flights were cancelled even after power had been restored to the airports because of difficulties accessing electronic ticket information. Air Canada flights remained grounded on the morning of August 15 due to a lack of reliable power for its Mississauga, Ontario control center. This problem affected all Air Canada service and canceled the most heavily traveled flights to Halifax and Vancouver. At Chicago’s Midway International Airport, Southwest Airlines employees spent 48 hours dealing with the disorder caused by the blackout.

  • Many gas stations were unable to pump fuel due to lack of electricity. In North Bay, Ontario, a long line of transport trucks was held up, unable to go further west to Manitoba without refueling. In some cities, motorists who simply drove until their cars ran out of gas on the highway compounded traffic problems.

  • Many oil refineries on the East Coast of the U.S. shut down as a result of the blackout, and were slow to resume gasoline production. As a result, gasoline prices rose significantly across the U.S. In both the U.S. and Canada, gasoline rationing was also considered by the authorities.

Types of Facilities

Site

The process and discrete manufacturing industries produce products at site facilities. A site facility is usually physically protected within a fence or other enclosure. The tools and personnel who support the equipment are typically located at the site, and can therefore quickly respond to onsite problems.

Transmission

Transmission facilities can span counties, states, or countries. They are the transmission lines or pipelines that carry electricity, oil, gas, and water over long distances. Transmission facilities include the railroads and highways that trains and trucks use to carry goods from the manufacturing facilities to distribution warehouses. Transmission facilities are usually unmanned. Maintenance and operational staff only visit these facilities when scheduled or called out for emergency repairs. Most transmission infrastructure, such as pumps and compressor stations, is in remote locations, which are also difficult to secure.

Generation

Generation facilities produce energy. They consist of electric generators and auxiliary equipment for converting mechanical, chemical, hydro, wind, solar, or nuclear energy into electric energy. Larger facilities are usually physically secured, but some facilities such as dams, wind or solar farms, and other remote operations are publicly accessible.

Distribution
  • Distribution facilities are used to distribute products to the customers. They provide the infrastructure to deliver electricity, water, and natural gas to our homes and businesses. Distribution facilities are located throughout the country, and many remote stations are generally unmanned. A fence or other barrier physically protects most distribution stations.

  • The controls and safety systems within these facilities are accessible through on-site visits and, increasingly, through remote access. Distribution facilities may be monitored and controlled from a central control center, or may be stand-alone systems. Communications to the central control center are crucial for monitoring and controlling many of the more remote distribution facilities; and because the communication paths are routed beyond the fence line, securing the data transmission paths becomes a concern.

Manufacturing (Discrete and Process)

The manufacturing process is used to produce a product, be it electricity, chemicals, plastics, food, pharmaceuticals, or cars. The processes used to manufacture these goods are broadly classified as either discrete or process.

Discrete
  • Discrete manufacturing results in the creation of products that can be easily differentiated; products you can touch, such as cars, books, toys, furniture, or cell phones. Discrete manufacturing is not continuous, meaning it can be started or stopped at any time, depending on production requirements.

  • Discrete manufacturing involves creating, assembling, and handling individual components to make a product. Discrete products are easily counted and are measured in units, as opposed to process manufacturing products that are measured by weight or volume. Assembly robots are often used in discrete manufacturing.

Process
  • Process manufacturing involves using formulas, much like a recipe, to take a set of ingredients to make a final product. Examples of process manufacturing include oil refining, chemical refining, food and beverage production, and pulp and paper production.

  • Process manufacturing is different from discrete manufacturing because once the final product is produced from individual elements, it cannot be taken apart to get the original components. For example, it would be hard to retrieve the resins, pigments, solvents, and other additives from paint after it was made.

  • Within process manufacturing, there are 3 types of processes: continuous, batch, and hybrid.

Continuous
  • Continuous processes require an uninterrupted flow of material from start to finish during the transition from a raw material to a finished product, such as the process used to make chemicals. Generally, a continuous process runs constantly unless interrupted by an unscheduled outage, usually caused by an emergency or equipment failure.

  • Continuous processes may be shut down for scheduled maintenance, sometimes referred to as turnarounds. Turnarounds are used to keep refineries running in a safe operational state; however, not every sector has scheduled maintenance periods. The turnaround time will vary between sectors.

  • The ICS used in continuous processes must be flexible to control all phases of the process: from startup, to continuous operations, to emergency shutdowns, to maintenance shutdowns. During continuous operations, such as in a refinery, the ICS is constantly adjusting the valves and pumps to keep the process within specifications.

Batch
  • A batch process has a starting and ending point. Batch processes are similar to cooking, in that you have a list of items or ingredients and a procedure (recipe) consisting of a series of steps for mixing the various components to create a product. As one phase of the batch is finished, the system will transition to another phase of the batch process. Pharmaceuticals and specialty chemicals rely heavily on batch automation to create their products.

  • Many batch manufacturers will procure a batch management system that is used along with the control system. Batch management systems work with the control system to execute batch processes. They are also used to manage recipes and records. Records management is especially important in regulated industries where the operators are required to audit the batch process.

Hybrid
  • A hybrid process uses a combination of continuous and batch controls. Water treatment is a good example of a hybrid process. Water flows through the treatment plant where disinfectants are injected into the water to kill bacteria. The chemicals cause particles to clump together, where they are removed through sedimentation or filtration.

  • For the most part, water treatment is a continuous process as it flows through the pre-treatment, filtration, and post-treatment processes. However, as solid particles build up in the filters they need to be flushed. This is typically done through a process called backwashing. Backwashing uses batch control to automatically operate valves and pumps in a series of steps to reverse the flow of water through the filters to remove the particles.
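The backwash sequence described above can be pictured as a small batch recipe that runs its steps in a fixed order. The sketch below is illustrative only: the step names, their order, the hold times, and the helper names run_batch and simulated_step are assumptions, not taken from any particular control system.

```python
# Hypothetical backwash recipe. Step names, order, and hold times are assumptions.
BACKWASH_RECIPE = [
    ("close_inlet_valve", 5),
    ("open_drain_valve", 5),
    ("start_backwash_pump", 300),
    ("stop_backwash_pump", 5),
    ("close_drain_valve", 5),
    ("open_inlet_valve", 5),
]


def run_batch(recipe, execute_step):
    """Run each step in order and stop at the first step that reports failure.

    Returns the names of the steps that completed.
    """
    completed = []
    for name, hold_seconds in recipe:
        if not execute_step(name, hold_seconds):
            break
        completed.append(name)
    return completed


def simulated_step(name, hold_seconds):
    """Stand-in for real valve and pump commands; always succeeds."""
    print(f"{name} (hold {hold_seconds}s)")
    return True


finished = run_batch(BACKWASH_RECIPE, simulated_step)
```

Stopping at the first failed step is one possible design choice; a real batch system would typically also define how to return the filter to a safe state.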

Process Dependencies
  • A process relies on a number of upstream and downstream systems to produce a product. These interdependencies create both cyber and physical security concerns. A failure in the supply chain or any dependent process can cause the main process to fail, or result in the manufacturing of inferior products that can fail at a later date.

  • Most process and discrete manufacturing facilities have a control system that monitors and controls the main process.

  • Processes are not islands; they do not stand alone. An ICS-controlled process may have numerous dependencies on upstream and downstream systems and processes or energy sources that may pose environmental and safety concerns. A process can be defined for any type of critical infrastructure.

Upstream, Downstream, Processes, Safety
  • Upstream is the material that provides the feedstock or raw material for the primary process. For example, we cannot refine oil without crude oil, nor can we make polymers without their monomer feedstock. The main process is responsible for producing the final product, whether it is electricity, chemicals, plastics, food, pharmaceuticals, or petrochemicals. Obviously, if the main process fails, we will not be able to produce the end product.

  • The downstream process is responsible for handling the end product. The end product is either used as an upstream product for other processes, or distributed to customers. If the downstream process fails we may be able to store the product. However, when the storage is full or there is no storage, we will need to shut down the primary process. If the product is electricity, generators will probably trip offline since there is no good method to store electricity and later feed it to the grid.

  • Processes that require heating or cooling will fail if the energy process cannot provide the heating or cooling resources needed. For instance, a refinery process will shut down if the flow of steam is interrupted. Most processes depend on electrical power; unless the process has a backup energy source, such as generators or batteries, it will fail without electricity. Many processes depend on pressurized air for power and control valves. Without air, the control valves will not control the process. The process could have other dependencies, such as solar, hydraulic, wind, or nuclear energy.

  • A process cannot produce waste indefinitely without some form of waste management. Some waste streams become product streams, but other waste streams must be treated or disposed of. If a process is polluting or leaking hazardous materials, a regulatory agency such as the Environmental Protection Agency may force asset owners to shut down the process. Finally, depending on the hazard of the process and/or environmental regulations, the system may have a safety system.

  • Safety systems are separate from the control system and are designed to safely shut down the process if the primary control system fails, protecting the people, the environment, and the equipment.

Communication Dependencies
  • In addition to having process dependencies, most ICS must communicate to other systems or other ICS to function properly. This may be as simple as an operator looking at two different screens from two different ICS and manually adjusting the systems. Or it may require that two or more separate control systems are networked together to share information. The information transfer must occur for proper control to be achieved.

  • A cyber-based event could interrupt critical communications and cause a process to fail. The failure could have both upstream and downstream consequences, as well as impact the process itself. This is why system and process availability is of paramount importance to control system asset owners.

What do you think the consequences would be if the events described below occurred?

  • If a safety system monitoring a critical process stops receiving data then it may shut down the process, despite the process operating correctly.

  • If a community warning system fails following a facility chemical spill then local residents may not be notified to evacuate.

  • If a leak detection system fails to alert operators then hazardous materials may be released.

  • If operators don’t receive an alarm that a power line is compromised then they may not be able to take action in time to prevent a blackout.

  • If a subway control room stops receiving updates from track detection sensors then the trains may be routed to the wrong track, and result in a crash.
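The first scenario above, where a safety system shuts down a healthy process because its data stopped arriving, can be sketched as a stale-data check. This is a simplified illustration: the function name safety_action, the two-second timeout, and the "TRIP"/"RUN" labels are assumptions chosen for the example.

```python
def safety_action(last_update_s, now_s, timeout_s=2.0):
    """Return "TRIP" when the newest reading is older than the timeout, else "RUN".

    The check only sees the age of the data. It cannot tell a failed process
    from a failed communication path, so both lead to the same action.
    """
    data_age_s = now_s - last_update_s
    return "TRIP" if data_age_s > timeout_s else "RUN"


# Readings arrive normally: the process keeps running.
print(safety_action(last_update_s=100.0, now_s=100.5))

# The communication link is interrupted for 10 seconds: the process is shut down
# even though nothing is wrong with the process itself.
print(safety_action(last_update_s=100.0, now_s=110.0))
```

This is why an interruption of communications alone, without any damage to the process, can still become an availability problem.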

IT and ICS

IT vs. ICS Priorities

  • The protection of data in information systems has traditionally been of primary importance, with the integrity and availability of that data following close behind. Sometimes this hierarchy is referred to as C-I-A (Confidentiality-Integrity-Availability). There are instances in traditional IT domains where the availability and integrity are critical elements (e.g., in real-time financial transactions), but for the most part, organizations are more concerned with keeping data from prying eyes.

  • The critical infrastructure systems requirements for availability can exceed five 9s (99.999% uptime). Services must be running 24 hours per day X 7 days per week X 365 days per year. Safety and resiliency requirements in ICS demand that system information be exchanged at millisecond or sub-millisecond rates. It is easy to see why the control system domain is much more concerned with availability and integrity than confidentiality. This hierarchy is referred to as A-I-C (Availability-Integrity-Confidentiality).

  • Historically, data on the networks did not mean anything except to the operators, so prying eyes seeing the data had little impact on whether the system was operating normally. The speed at which many of these systems operated suggests that by the time prying eyes did see the data in transit, it would be of no use to them. Although confidentiality does play a role in defending mission-critical control systems, the availability and integrity of the control system and the data in it remain paramount to operations.

  • In the IT world: Confidentiality is the Priority

  • In the ICS world: Availability is the Priority

  • The security focus of an IT group is to protect the system from threats, both intentional and unintentional, and from inside or outside the organization.

  • The security focus of ICS is to protect the system from use by unauthorized personnel, and to ensure the system maintains its functionality (availability and integrity) and continues to operate in a safe manner.
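To put the "five 9s" figure mentioned above in perspective, the allowed downtime per year can be worked out directly. The sketch assumes a 365-day year, and the function name max_downtime_minutes is made up for the illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes(availability_percent):
    """Minutes per year a system may be down while still meeting the target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for percent in (99.9, 99.99, 99.999):
    print(f"{percent}% uptime allows about {max_downtime_minutes(percent):.2f} minutes of downtime per year")
```

At 99.999% uptime the budget is a little over five minutes per year, which leaves very little room for reboots, patching windows, or security tools that interrupt operations.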

Example:

  • Sales Management: A simple example is a marketing executive trying to get the most recent sales figures. While the executive wants to ensure no unauthorized personnel will have access to the data, they also want that data to be correct. In this case, confidentiality is a priority followed closely by integrity of the data. In terms of availability, it is acceptable if the executive must wait a couple minutes to download the sales figures.

  • Natural Resources Management: The control systems within a refinery are responsible for cleaning the gas and pushing it into the pipeline. Under high pressure, the pipeline transmits the gas over long distances to delivery nodes. The information about how much gas is being refined or its destination is not secret. In many cases, the information is published online. However, the timely and accurate information about the refining process itself, and the management of the pipeline, is critical. In this example, the availability of real-time data from the ICS running the refining process to the ICS managing the pipeline compressor stations is vital to safety and flow control. The integrity of the data is also critical as it allows operators to adjust control system parameters within the refining process and pipeline management activities. If the data reflecting real-time refining and pipeline operations are unavailable or wrong, it can have significant negative consequences.

IT and ICS Security

Identify security focuses with integration of IT and ICS

The Changing Landscape

In the past, ICS were specialized stand-alone systems protected by a physical security perimeter (guns, guards, and gates) and controlled by onsite operators with manual switches and controls. In fact, many of the systems with analog/manual controls are still in use.

Today, ICS owners and operators function under constrained budgets and are required to reduce the costs associated with managing and maintaining ICS. Control system technology has moved from using disparate, manual systems to interconnected digital systems and remotely controlled apparatus.

Although this evolution in system design is great for business and productivity, it bridges the air gap separating critical ICS from business and peer networks. While this has provided significant business benefits, it has blurred the boundaries between ICS and traditional security systems.

With the integration of IT and ICS networks, security concerns arise. The new interconnected architectures introduce new vulnerabilities, and there are now significant risks for ICS that were never a consideration before, such as worms, viruses, and unauthorized remote access.

Control system architectures, by their nature, operate with high trust. Security for these systems used to be focused on ensuring only authorized personnel had access to the control environment. Control systems were built with minimal security countermeasures, and asset owners assumed that anyone with access to the control system was authorized to interface with it. Unauthorized access can be a grave concern, as are the consequences of malicious activity on the ICS and its potentially devastating downstream effects.

In the past, there were questions about who was the responsible authority for managing authenticators. IT personnel typically managed and provided login IDs and established clear policies for their use, but many ICS are unable to follow these policies.

Security Goals

The security goals of an ICS and an IT system are different but are based on the same principles. When we think about security, we generally define it using confidentiality, integrity, and availability.

Almost every instance involving the protection of information or information systems will fall into one of these categories. Business objectives often dictate how these categories and the activities supporting them are prioritized.

Preventing unauthorized personnel from viewing protected information (confidentiality) is the main concern when implementing security controls for business systems. However, in a control system environment, availability and integrity exceed confidentiality.

Business system owners are concerned about the inadvertent disclosure of proprietary information. They are also concerned the information is correct, but are generally willing to wait to get the information. In an ICS environment the system must always be available and must send the correct instructions to the system it controls, so confidentiality is not the primary goal. The security focus of an IT group is to protect the system from threats, both intentional and unintentional, and from inside or outside the organization.

The security focus of ICS is to protect the system from use by unauthorized personnel, and to ensure the system maintains its functionality (availability and integrity) and continues to operate in a safe manner.

IT and ICS Communication

Compare IT and ICS communication

While IT and ICS share similar, if not identical, technologies, the implementation and upkeep of these systems can be drastically different.

Determinism
  • IT Systems

    • IT systems generate network traffic on demand when a user requests resources or when maintenance activities are performed. As a result, there is a high amount of irregular traffic. The traffic is generated by any number of IT elements or events and can be sporadic and unpredictable. This is expected in corporate networks where diverse users using disparate resources perform a broad range of activities.

    • IT systems are used for a variety of purposes. As a result, a broad range of applications are used to support the diverse requirements of the organization. Therefore, they often expect and are often granted unfettered Internet access.

  • OT Systems

    • Most control systems are purpose-built and are designed to accomplish specific tasks and to perform those tasks continuously. Their operation is repeatable, predictable, and designed so that fluctuations from normal operations can be easily detected. This is what creates an environment where the network traffic is highly deterministic. This is important in the control system domain because detection of anomalies and errors is vital to sustaining operations. Having this type of predictability and determinism can make it easy to set up effective intrusion detection system (IDS) monitoring for ICS networks.

    • The applications running in an ICS environment are typically limited to those required to monitor and control a process. They are limited in functionality and are specific to the task they were designed to perform.

    • Internet connectivity to ICS has traditionally been unavailable, primarily because ICS components are usually operated in an isolated environment. However, the creation of control system networks establishes many different business cases where Internet access is required, especially where remote administration, remote vendor support, and budget limitations are important.

    • Don’t organizations prohibit their control systems from connecting directly to the Internet? Although most organizations prohibit their control systems from connecting directly to the Internet, some asset owners support it. There are two common operational benefits of allowing ICS to be connected to the Internet: the ability to maintain and support systems with remote staff, and the ability for vendors to provide support and updates to the system.

    • Is it common to see field equipment directly connected to the Internet as part of a larger control system operation? It is not uncommon to see field equipment directly connected to the Internet as part of a larger control system operation. These connections are often read-only and provide an operator at a control center real-time or near real-time system information. It was having email on the system and not being careful that was the downfall here.
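Because control system traffic is repeatable and predictable, as described under OT Systems above, a monitor can compare observed traffic against a known baseline and flag anything else. The sketch below illustrates the idea only and is not a real IDS: the addresses, the contents of the baseline, and the function name unexpected_flows are assumptions, while port 502 is the registered Modbus/TCP port.

```python
# Hypothetical baseline of expected flows: (source, destination, destination port).
# The addresses are made up; port 502 is the registered Modbus/TCP port.
BASELINE = {
    ("10.0.1.10", "10.0.2.20", 502),
    ("10.0.1.10", "10.0.2.21", 502),
}


def unexpected_flows(observed, baseline=BASELINE):
    """Return the observed flows that are not part of the known-good baseline."""
    return [flow for flow in observed if flow not in baseline]


observed = [
    ("10.0.1.10", "10.0.2.20", 502),   # normal polling of a controller
    ("10.0.1.55", "10.0.2.20", 502),   # unknown host talking to a controller
    ("10.0.2.21", "203.0.113.7", 443), # controller reaching outside the network
]

for flow in unexpected_flows(observed):
    print("ALERT: unexpected flow", flow)
```

The same approach would produce a flood of alerts on a corporate IT network, where traffic is sporadic and user-driven; it is the determinism of ICS traffic that makes a short allowlist practical.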

IT and ICS Operations
Security/CIKR Compliance

ICS play a major role in the functioning of critical infrastructure and key resources (CIKR) sectors. These assets, systems, and networks (whether physical or virtual) are so vital to the United States that their incapacitation or destruction would have a debilitating effect on security, national economic security, national public health or safety, or any combination.

The electric sector is subject to North American Electric Reliability Corporation (NERC) critical infrastructure protection (CIP) requirements. This is a comprehensive set of cybersecurity standards designed to protect critical cyber assets supporting the reliability of the North American bulk power system. Failure to comply with the CIP regulations can result in stiff penalties, and numerous entities have been fined.

sPower incident: First, losing connection to remote sites that are part of the power generation network could affect power generation. As sPower is a power company, it falls under NERC, and loss of a site could be a regulatory issue. Power generation is part of CIKR. If the remote sites that had this problem were to go offline, that could cause problems delivering power to critical businesses.

https://inl.gov/content/uploads/2024/02/INL-Wind-Threat-Assessment-v5.0.pdf

Secure System Development
  • As cybersecurity awareness increases, the implementation of security controls is extended beyond how a system is configured and deployed, to how a system is developed. Secure application architectures, secure coding, supply chain management, and the procurement of secure applications have also risen in importance in reducing the targets of opportunity for cyber threats.

  • As many organizations go through the process of obtaining new IT systems as part of their ICS architecture, they include cybersecurity requirements directly in the procurement process. There are a few things that can cause complications. The requirement that ICS operate in high-availability, high-capacity modes makes the implementation of some common security countermeasures difficult. This does not mean that cybersecurity countermeasures cannot be implemented; it means that special care must be taken to ensure the countermeasures do not impede the ICS operators from performing specific tasks, or prohibit them from performing urgent or time-sensitive actions.

  • What are some of the more common differences in control system environments? One is the use of individual usernames and passwords for each computing resource, and the requirement that passwords must be complex and changed regularly. While some ICS still utilize shared accounts, more now use individual user IDs and passwords effectively and efficiently so as not to impede an operator’s ability to access the system under duress.

  • Do many control systems require rigorous security practices? They do not. The continued use of these inferior practices is usually due to the age of the control system, the lack of opportunity to update the system, and the cost associated with trying to retrofit legacy automation systems with contemporary security controls.

  • ICS owners, regardless of the age of their systems, have several excellent options for implementing effective cybersecurity strategies. In some cases, the static nature of the architecture and the age of the systems can work to their advantage. ICS owners must balance security with functionality requirements and creatively apply best security practices, while enabling ICS operators to perform their duties unimpeded.

  • So, what happened at sPower? Firewalls that protect these sites should be on a firmware update cycle. DOE said in a statement sent to Archer News that the event is “related to a known vulnerability that required a previously published software update to mitigate.” That means someone had already found a security hole in the past, the device maker reported it and came up with a patch for it, but the power company may not have applied that patch. Also, if there was an attack (or problem) then they should have a backup way to get into the sites. This should be part of your disaster recovery plan.

Physical Security
  • Physical security is generally scaled in proportion to the criticality of the information being protected. In the IT domain, physical security prevents unauthorized access to locations where proprietary information is handled or stored.

  • However, many people are familiar with the concept of physical security as it relates to the electric power grid. Substations, transformer stations, and maintenance offices are usually well protected with fences, cameras, and possibly a security guard. Yes, but physical security for power stations located in urban centers is everywhere, and easily seen. There are also some unstaffed facilities in remote locations, and security is limited to a single gate or padlock.

  • Although additional security mechanisms may be protecting critical devices and critical cyber assets within the station yard, getting past perimeter defenses can be a trivial task, usually because of the cost associated with protecting remote assets. These systems can be a significant cyber risk if accessed by an adversary.

  • What can be done to reduce cyber risk? Establishing a robust physical security perimeter around ICS assets is critical to reducing cyber risk. Integrating corporate and control system networks is also justified because it provides a more cost-effective management model.

  • However, the blending of IT and ICS networks can negate the physical security perimeter and open the system up to threats across the world. Are firewalls part of physical security? Indirectly. The remote site firewall could be part of the physical security network. There might be cameras, sensors, or other security related controls attached to it. These controls would alarm back to the main control room and warn of any physical security breach.

  • During an unexpected reboot, sPower should have a backup way to contact the remote site so they can verify physical security controls. In the worst case, they would need to send someone out to check the site, just to make sure it was not a physical security incident.

Change Management
  • Change management is a challenge for both IT and ICS. This is correct and there are two fundamental elements impacting the change management process.

    • The verification that the change is not going to impact the system in a negative way.

    • The time to make the change to the system.

  • So what happens after these changes? All changes made to an ICS, whether to operator consoles or field devices, must be tested. ICS asset owners cannot simply accept a vendor’s promise that a system patch or application modification will work. ICS administrators must implement the change in a test environment to determine what negative impacts are observed. This can be a long and expensive process, especially if the change is to be implemented across a wide range of production elements.

  • All sPower needed to do was have some change management in place to keep these firewalls up to date. The vendor should also be aware of any issues that firmware in a device might have. If a firmware upgrade was available, then they should have had it on a plan for installation. They did do a good job of testing before deployment, so as not to cause any additional issues.
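
A change management plan like the one described above can start with a simple inventory check. The sketch below is only an illustration in Python. The device names, the dotted version format, and the minimum patched version are all hypothetical and do not come from the sPower incident.

```python
# Hypothetical inventory check: flag devices whose firmware is older than
# the version that contains the vendor's fix. All names and versions are
# illustrative assumptions, not real incident data.

def parse_version(text):
    """Turn a dotted version string such as '9.8.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

def devices_needing_update(inventory, minimum_version):
    """Return the names of devices running firmware below minimum_version."""
    required = parse_version(minimum_version)
    return [
        name
        for name, installed in inventory.items()
        if parse_version(installed) < required
    ]

inventory = {
    "site-a-firewall": "9.8.1",
    "site-b-firewall": "9.10.0",
    "site-c-firewall": "9.8.4",
}

outdated = devices_needing_update(inventory, "9.8.4")
print(outdated)
```

Versions are compared as tuples of integers so that 9.10.0 is treated as newer than 9.8.4, which a plain string comparison would get wrong.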

IT and ICS Support
Anti-Virus and Anti-Malware
  • Being able to manage malware effectively is critical to the survivability and security of any network environment. In the IT domain, virus scanners and anti-malware technology are common and are placed on centralized servers, gateways, and individual user desktops. Organizations work hard to ensure their anti-malware defenses are optimized, and usually automate updates to anti-malware solutions across all cyber assets.

  • The situation is different with ICS. Because of their specialized functions and limited processing capability, many ICS do not have the capability to run anti-virus software. Many anti-virus vendors do not have a product designed for ICS protocols or threats. Signatures are not available for many ICS-centric malicious code variants. Also, timely updates to virus signatures are difficult to deploy because of the “always on” nature of ICS.

  • Do the computing resources traditionally used by an anti-virus impede control system software from operating at peak efficiency? Yes it can. Anti-virus technology can consume significant processing capacity during real-time and scheduled scans. If memory is not available for other critical processes, there is a significant risk software related to control system operations may not perform properly.

  • I thought control system vendors support the use of anti-virus on their ICS and provide specific guidance on which directories to avoid when scanning. Some do. Although this may not solve the resource problem, it can mitigate historical issues associated with the anti-virus moving, or accidentally deleting, files critical to control system operation.

  • Virus scanners that are not up-to-date cannot provide protection against the latest threats from malicious code. To ensure anti-virus signatures and anti-malware capabilities are most useful, they need to be constantly updated.

  • The most efficient way to do this is to have a primary server obtain updates and automatically push them to the critical cyber assets.

  • Servers and individual workstations may need to be connected to the Internet to obtain the updates. In theory this makes good sense, but directly connecting a server located in the control system domain to the Internet can introduce cyber risk. That does seem kind of risky. What else can be done? Well, deploying a secure file transfer mechanism across the boundary that separates the corporate domain from the control system, or tasking administrative personnel with manually obtaining anti-virus and anti-malware updates and physically applying them to control system assets, may solve some of the issues with maintaining up-to-date signatures for anti-virus solutions.

  • Manual updates may be cost prohibitive, however, and do not address the requirement that processing on ICS is uninterrupted.
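
Whichever transfer method is used, an update file can be checked before it crosses into the control system domain. This is a minimal sketch. It assumes the vendor publishes a SHA-256 digest for each signature file, which the notes above do not state, and the function name and file contents are hypothetical.

```python
import hashlib

def is_update_intact(file_bytes, published_sha256):
    """Compare the SHA-256 digest of the received bytes with the published digest."""
    actual = hashlib.sha256(file_bytes).hexdigest()
    return actual == published_sha256.lower()

# Stand-in for a signature file fetched on the corporate side of the boundary.
update = b"example anti-virus signature payload"
# In practice this digest would come from the vendor, not be computed locally.
published = hashlib.sha256(update).hexdigest()

print(is_update_intact(update, published))                # unmodified copy
print(is_update_intact(update + b"tampered", published))  # altered copy
```

A digest check only shows the file was not altered in transit. It does not replace testing the update in a non-production environment before applying it.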

Patch Management
  • Patch management is closely related to change management and can be a challenge for both IT and ICS administrators.

  • Did you know the disclosure rate of cybersecurity vulnerabilities is growing, and security patches are released almost daily by responsive vendors? That is usual for someone working with ICS systems. Patches are often applied on the IT systems as soon as they are released. However, there is still potential they will damage the system or network. IT domains face an increasingly threatening cyber landscape from both internal users and external adversaries. Maintaining up-to-date security patches is considered a normal best practice in countering those threats.

  • But many ICS systems are not built to run on contemporary operating systems. ICS operators rely on the isolation and supplemental physical security for control system operations to minimize the opportunity an attacker may have to exploit a vulnerability.

  • ICS asset owners must decide whether to implement patches at all. Often they are not applied unless they enhance functionality and sometimes not even then! I heard a vulnerability is only exploitable when the adversary is sitting at an operator console. The asset owner may consider that the only realistic threat actor to capitalize on this vulnerability would be someone who breaks into the facility control room or a malicious insider. The asset owner may choose not to apply the patch because of the physical security they have in place to protect against an unauthorized individual getting into the control room, as well as their exhaustive personnel screening program designed to mitigate an insider attack. Yeah, we feel that isolation from the Internet or corporate network reduces our cyber risk to zero.

  • With the pervasive use of portable media, such as USB or optical drives, malicious code or transport hacking software can be introduced into the system. Then an intruder can use the code or software against older or un-patched operating systems. The result is a growing number of un-patched control systems at risk of cyberattack.

Support Personnel
  • ICS operators are the link between the system and the function it performs. Did you know that a system is only as secure as its weakest link?

  • Human error, whether accidental or intentional, can wreak havoc on an ICS and bring the functionality and safety of the system to a halt. The most prevalent human threats are untrained operators causing accidental infections. Untrained operators may accidentally infect the ICS system or network by not following policies (such as not accessing email or the Internet through the ICS network) and inadvertently introduce malicious code into the system through spam, spyware, or phishing campaigns. The potential for damage is increased because an untrained user does not know what the indicators of compromise are and may not realize the system is infected until it is too late.

  • While IT faces the same human threat as ICS, the consequences for ICS can be far reaching and could impact the safety and well-being of millions of people and devastate a company’s finances and industry reputation.

Security testing
  • The goal of security testing is to gain a comprehensive understanding of the cyber-risk profile of a system. Testing provides insight into system vulnerabilities, as well as workable countermeasures, and helps asset owners determine how best to improve their security. Testing objectives can be broad in nature, and the methods used to perform security testing can be diverse.

  • It focuses on confidentiality, integrity, and availability. Contemporary testing tools and test frameworks are used to categorize the areas where the security of a system may be weak and where the defenses are functioning as intended.

  • Security testing tools can be configured to model a range of attack types with different capabilities. They are designed to be used on resilient, stable, and robust IT systems. These testing tools scan large numbers of systems for a variety of vulnerabilities, and complete the test over a short period. As a result, security testing can be aggressive.

  • ICS devices and components were not designed to withstand the high volumes or types of traffic used by most testing tools, and intense scanning can often cause undesirable effects. Most ICS operate in highly deterministic and predictable ways, and traffic flows are usually within a well-defined norm.

  • Aggressive testing can introduce abnormal conditions, causing the system to experience tremendous stress and result in system failure, improper operations, or physical damage to the equipment. Some security testing tools are beginning to incorporate the capability to test control systems. Security researchers and vendors are working together to create test frameworks and plug-ins to allow asset owners to obtain vulnerability data, without causing undue stress and damage to the ICS elements being tested.

  • How effective is it? The effectiveness of security testing and the impact of testing on the control system are governed by the level of expertise and experience of the tester. Individuals with only testing experience on modern IT systems should not be allowed to test control system environments unless they have been properly trained to minimize the consequences associated with aggressive scanning techniques.

  • Security tests for ICS should only be performed in a test bed or sandbox because testing within a production environment can often result in unforeseen and undesirable consequences.

Incident Response and Forensics
  • As part of an effective defense-in-depth strategy, organizations need to be able to respond to cyber incidents and investigate how those incidents happened. Most organizations have an existing incident response capability to manage those cyber incidents that impact normal operations.

  • Incident response and forensics techniques applied to the IT domain are straightforward. Well-documented processes and procedures and numerous recommended practices and guidelines are available to assist organizations in developing their incident response plans and the methods for executing a forensics investigation. Modern IT equipment meets most, if not all, the requirements necessary to carry out an investigation, and provides well-developed capabilities to enable effective incident response and attack mitigation. Most common commercial off-the-shelf tools have been developed with traditional IT in mind.

  • Issues can prevent IT security technologies from being effective for ICS incident response. The age of the average ICS can negatively impact both the incident response capability and forensics effectiveness. The technical capabilities required to facilitate appropriate response and investigation techniques, such as event logging, do not exist in many of the control systems in operation today. Luckily, some ICS vendors are making it easier for asset owners to perform incident response and forensics by moving to more contemporary operating systems as the platform for control system solutions.

  • By doing so, asset owners investigating control system cyber incidents can take advantage of the response and investigation tools designed for modern computing platforms. Often, both IT and ICS are required to simply reboot an impacted computing resource, reconfigure equipment, or rebuild/restore from a system image to facilitate a quicker restoration of the system.

  • Rebooting or rebuilding systems for either IT or ICS can result in loss of forensics data. These methods may destroy evidence artifacts that could have been used to determine root cause(s) of the cyber incident.

Outsourcing
  • Competitive technology markets, an increasingly mobile workforce, and resource limitations have caused many organizations to outsource the management and implementation of their IT functions. Cloud computing, infrastructure as a service, software as a service, and security as a service are being used more throughout the IT domain. These benefits could possibly outweigh the risks for companies striving to maintain a competitive advantage, and to ensure their data are available anywhere, any time.

  • The implementation of IT services is well known and understood; however, ICS poses a much greater challenge to a company offering services outside the confines of the corporate network. ICS operations are much more sensitive to failure than IT systems. ICS also require in-depth knowledge of the operational technologies and their limitations to function at expected performance levels for availability and integrity.

  • ICS are under more stringent levels of oversight and require experienced technicians to manage and maintain them.

  • Sadly, outsourcing for ICS operations is not at the same level of maturity as IT operations.

  • The risks to critical infrastructure and the consequences of service failures should be carefully examined before outsourcing is considered for ICS.

Conclusion
  • In August 2017, the Saudi plant reported another breach! Reports say that a flaw in the code gave the hackers away before they could do any harm.

  • A flaw in the code triggered a response from a safety system in June, which brought the plant to a halt. Unfortunately, the first outage was mistakenly attributed to a mechanical glitch. Then in August 2017, several more systems were tripped, causing another shutdown. After the second outage, the plant’s owners called in investigators. The malware was finally found. The hackers appear to have been inside the petrochemical company’s corporate IT network since 2014.

  • From the corporate network, they probably found a way into the plant’s own network, most likely through a hole in a poorly configured digital firewall that was supposed to stop unauthorized access. They then could have gotten into an engineering workstation, either by exploiting an unpatched flaw in its Windows code or by intercepting an employee’s login credentials.

  • ICS vendor Schneider Electric has drawn praise for publicly sharing details of how the hackers targeted its Triconex model at the Saudi plant, including highlighting the zero-day bug that has since been patched.

  • How do the personnel play into all of this?

  • Well, first of all, the June Triconex system outage occurred on a Saturday evening. This is a time when most engineers aren’t typically in the plant. Secondly, the petrochemical firm called in Schneider to assist in troubleshooting the Triconex system failure. The vendor pulled logs and diagnostics from the machine, checked the machine’s mechanics, and, after later studying the data in its own lab, addressed what it thought was a mechanical issue.

  • We were just talking about vendors. How did they play a part?

  • In June 2017, the end user suspected an issue had occurred with their safety controllers and took one Triconex system offline, completely removing the Main Processors, and sent them to Schneider Electric’s Triconex lab in Lake Forest, Calif. Schneider said “At this stage, the end user wasn’t aware of a cyber incident. At the time, Schneider Electric was not onsite to review the safety controllers.” Once these controllers were removed from power, the memory was cleared and there was no way to conclude that the failure was the result of a cyber incident. Schneider was only able to analyze whether the controllers were working correctly within their safety function. After they completed their analysis, they determined there was no fault with the system components and they returned them to the end user.

  • What about security testing? Was any of that done?

  • It is not known what types of security testing occurred, if any. We do know the petrochemical firm called in Schneider to assist in troubleshooting the Triconex system failure. The vendor pulled logs and diagnostics from the machine, checked the machine’s mechanics, and, after later studying the data in its own lab, addressed what it thought was a mechanical issue. No malware was found.

  • Were any teams called in for investigations?

  • A forensics and incident response team was not called in during the June incident. However, a team was called in for a rapid-response engagement after the now-infamous August shutdown of the Saudi Arabian firm’s Triconex ESD system. This company was very lucky. If there wasn’t a flaw in the code, it would have still gone undetected. Many believe that something bigger was being planned but was interrupted.

Embedded Systems

  • An embedded system is a computer system consisting of hardware and software designed for a specific purpose or dedicated task. (Workstations, laptops, and servers are not embedded systems.)

  • Embedded systems used in ICS

    • Programmable Logic Controller, Remote Terminal Unit, DCS controllers, Intelligent Electronic Devices, field devices (HART, Foundation Fieldbus, Profibus, DeviceNet)

    • Network/communication equipment (Routers, switches, modems, radios, terminal servers, gateways, firewall and other security appliances)

    • Others (GPS, time synchronization, network printers, hand-held configuration devices, test equipment)

Field Controllers

  • Processors (X86, PowerPC, ARM, MIPS)

Memory
  • Non-volatile Memory

    • Flash memory, EEPROM, EPROM, ROM

    • Firmware (boot code, real time operating system (RTOS), application program)

  • Volatile Memory (contents are lost when power is removed; much less susceptible to manipulation or data extraction)

    • RAM

    • Variables, stack, buffers

Input/Output
  • Discrete, Analog (4 to 20 milliamps or 0-10 volts), Fieldbus
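
An analog input is usually converted to engineering units by linear scaling. The sketch below assumes the common 4 to 20 mA current loop convention and a hypothetical tank level range of 0 to 100 percent. The function name and ranges are illustrative only.

```python
def scale_current(milliamps, eng_low, eng_high, ma_low=4.0, ma_high=20.0):
    """Linearly map a loop current onto an engineering range."""
    fraction = (milliamps - ma_low) / (ma_high - ma_low)
    return eng_low + fraction * (eng_high - eng_low)

# Hypothetical level transmitter ranged 0-100 % over 4-20 mA.
for reading in (4.0, 12.0, 20.0):
    print(reading, "mA ->", scale_current(reading, 0.0, 100.0), "%")
```

One reason the live-zero convention is used is that a reading of 0 mA can be told apart from a valid bottom-of-range signal, so a broken wire is detectable.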

Communication Ports
  • Serial - RS232, RS422/485, USB, modems, radios

  • Network - Ethernet radio, ControlNet, LonWorks

User interface
Internal
  • Status lights, small LCD screens (HMIs), keypads, jumpers, dip switches, switches

External
  • Browsers (allow you to see the status and operation of the devices), Applications (always check whether the applications can be shut down; is there a business use case for them?). Remember, the smaller the attack surface area the better!

Programs
  • RTOS (QNX Neutrino, VxWorks, Windows CE)

  • IEC 61131 programming languages. Workbenches: CoDeSys (allows programming in any one of the languages below), ISaGRAF. Languages:

    • Ladder Logic

    • Function Block Diagram (FBD)

    • Sequential Function Chart (SFC)

    • Structured Text (ST)

    • Instruction List (IL)

  • Device Drivers and Device Managers

    • Ethernet/IP Stacks

    • RS232/RS-485

    • Memory Managers

    • User interfaces

  • Services (Web server, FTP server, SNMP) (Any business case for these running? If not, turn them off)

  • Debuggers (data for troubleshooting; are we turning them off after debugging? Often, debuggers are left turned on, exposing data and possible vulnerabilities)

Programmable Logic Controller

Program Execution
  • A line of code in a PLC program is called a rung.

  • A PLC program executes from left to right and top to bottom.

  • Each completion of the program is called a scan.

  • A PLC will complete many scans in a single second (scan time: 50-60 milliseconds per scan). By comparison, a SCADA system scan rate is approximately 2 minutes, and home metering (water/energy) is approximately 15-30 minutes.
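
The scan cycle can be sketched as a loop that snapshots the inputs, evaluates every rung from top to bottom, and then returns the outputs. This is a simplified Python illustration, not vendor PLC code, and the tag names (start_button, stop_button, motor, lamp) are hypothetical. It also works out how many scans fit in one second at a 50 millisecond scan time.

```python
def run_scan(inputs, rungs):
    """One scan: copy the inputs, evaluate rungs in order, return the outputs."""
    image = dict(inputs)                 # input image, frozen for this scan
    for output_tag, logic in rungs:      # rungs execute top to bottom
        image[output_tag] = logic(image)
    return {tag: image[tag] for tag, _ in rungs}

# Each rung is (output tag, function of the current image).
rungs = [
    ("motor", lambda im: im["start_button"] and not im["stop_button"]),
    ("lamp", lambda im: im["motor"]),    # uses the result of the rung above
]

outputs = run_scan({"start_button": True, "stop_button": False}, rungs)
print(outputs)

scan_time_ms = 50
scans_per_second = 1000 // scan_time_ms
print(scans_per_second)
```

Because rungs run in order within a scan, the second rung sees the motor value written by the first rung during the same scan.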

Programming Concepts
  • Each rung executes on an “IF-Then” principle

  • IF the instruction(s) on the left are true then execute the instructions on the right.

  • Direct/Normally Open Contact

  • Direct/Normally Open Output Coil

  • Reverse/Normally Closed Contact

  • Placing multiple parallel branches on a single rung = OR

  • Placing multiple inputs in series on the same rung = AND
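
The contact and branch rules above map directly onto Boolean logic. The sketch below models them in Python, treating each contact as a function of the current tag states. The helper names (no_contact, nc_contact, series, parallel) and the tag names are invented for illustration and are not part of any PLC product.

```python
def no_contact(tag):
    """Normally open contact: passes power when the tag is True."""
    return lambda state: state[tag]

def nc_contact(tag):
    """Normally closed contact: passes power when the tag is False."""
    return lambda state: not state[tag]

def series(*contacts):
    """Contacts in series on the same rung act as AND."""
    return lambda state: all(c(state) for c in contacts)

def parallel(*branches):
    """Parallel branches on a rung act as OR."""
    return lambda state: any(b(state) for b in branches)

# IF (start OR motor_running) AND NOT stop THEN energize the motor coil.
rung = series(
    parallel(no_contact("start"), no_contact("motor_running")),
    nc_contact("stop"),
)

print(rung({"start": True, "motor_running": False, "stop": False}))
print(rung({"start": False, "motor_running": True, "stop": False}))
print(rung({"start": True, "motor_running": True, "stop": True}))
```

The rung shown is a common seal-in pattern. Once the motor is running, its own status keeps the rung true after the start button is released, until the stop contact opens.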

ICS Process Flow

Process Data

  • Processes are designed to change the chemical or physical properties of upstream materials to more useful downstream products. These changes must be monitored and controlled during this transition. Accurate and timely measurements of the physical and chemical properties of the process are crucial to controlling the process, as well as the outcome. These measurements are referred to as process data.

  • Numerous field devices are used to generate process data. Flow, level, temperature, and pressure are common sources of process data; however, many other parameters of complex processes must also be measured. For example, a water treatment plant relies on accurate pH and turbidity measurements in addition to flow, level, and pressure to produce clean water. An ICS uses these measurements for adjusting control devices to ensure the product (in this example, treated water) meets specifications.

  • Process data are not only used for control. For example, the data can also be used to:

    • Meet regulatory requirements

    • Track energy costs

    • Provide design inputs for the process enhancements (i.e., identify and justify capital improvements)

    • Control inventory/ determine product pricing

ICS Process Flow – Control Loop

  • A typical ICS contains numerous control loops, human interfaces, as well as remote diagnostics and maintenance tools. They are built using an array of network protocols on layered network architectures, allowing ICS support staff and vendors access to diagnose and correct operational problems.

  • A control loop (single-loop control) is the fundamental building block of industrial control systems. It is the communication used to regulate the process. It consists of a group of components working together as a system to achieve and maintain a desired value of a system variable by manipulating the value of another variable in the control loop.

  • For example, a field device sensor produces a measurement of a physical property and sends this information as controlled variables to the controller. The controller interprets the signals and generates corresponding manipulated variables, based on set points, which it transmits to the actuators (field devices), such as control valves, breakers, switches, and motors. These field devices are used to directly manipulate the controlled process based on commands from the controller. Control can be fully automated or include a human in the loop.
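
The sensor, controller, and actuator sequence described above can be sketched as a small simulation. This is a simplified illustration only. It assumes a proportional-only controller, a hypothetical filling tank with no outflow, and made-up gain and process constants. Real loops typically use tuned PID control.

```python
def proportional_controller(setpoint, measurement, gain):
    """Return a valve opening (0-100 %) proportional to the error."""
    error = setpoint - measurement
    return max(0.0, min(100.0, gain * error))

def simulate_loop(setpoint, level, steps):
    """Repeat sensor -> controller -> actuator -> process for a number of steps."""
    history = []
    for _ in range(steps):
        measurement = level                      # sensor reads the controlled variable
        valve = proportional_controller(setpoint, measurement, gain=5.0)  # manipulated variable
        level = level + 0.01 * valve             # process: inflow through the valve raises the level
        history.append(level)
    return history

history = simulate_loop(setpoint=50.0, level=40.0, steps=100)
print(round(history[0], 2), round(history[-1], 2))
```

As the level approaches the set point, the error shrinks and the controller closes the valve, so the level settles near the set point instead of overshooting.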

ICS Process Flow – Diagram

  • This graphic depicts the ICS process flow. We see the field devices that provide the process data. This is where the actual physical process happens, be it the mixing of chemicals or the management of trains, or the measuring of the pressure of gas at a certain point in a pipeline.

  • A field controller collects information from field devices and assesses, manages, and processes state information about the process.

  • The HMI monitors the information and presents it to an operator. The operator uses the HMI to observe the process, watch for events and alarms, and make decisions or adjust the system to keep the process stable and safe, as required. The operator function can be performed by a person, or by a system such as an EMS, DCS, or any specialized system that may be unique to a particular sector.

ICS Control Loop – Cascading Control

  • Sometimes these control loops are nested and/or cascading. When multiple sensors are available for measuring conditions in a controlled process, a cascade control system can often perform better than a traditional single loop. For example, the steam-supplied water heater shown heats water using cascade control. A second controller has taken over responsibility for manipulating the valve opening based on measurements from a second sensor monitoring the steam flow rate.
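
In a cascade arrangement, the output of the first (outer) controller becomes the set point of the second (inner) controller. The sketch below shows one calculation step for the water heater example. It assumes proportional-only controllers, and the gains, limits, and units are hypothetical.

```python
def p_controller(setpoint, measurement, gain, low, high):
    """Proportional controller with its output clamped to [low, high]."""
    error = setpoint - measurement
    return max(low, min(high, gain * error))

def cascade_step(temp_setpoint, water_temp, steam_flow):
    """Outer loop sets the steam flow set point; inner loop sets the valve opening."""
    # Outer (primary) loop: water temperature error -> desired steam flow.
    flow_setpoint = p_controller(temp_setpoint, water_temp, gain=2.0, low=0.0, high=20.0)
    # Inner (secondary) loop: steam flow error -> valve opening in percent.
    valve_opening = p_controller(flow_setpoint, steam_flow, gain=10.0, low=0.0, high=100.0)
    return flow_setpoint, valve_opening

flow_setpoint, valve_opening = cascade_step(temp_setpoint=60.0, water_temp=55.0, steam_flow=6.0)
print(flow_setpoint, valve_opening)
```

Because the inner loop reacts to the steam flow measurement directly, it can correct a steam supply disturbance before that disturbance shows up as a change in water temperature.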

ICS Data Flow

  • The data flow through ICS varies by vendor, topology, and protocols. The network can be wired or wireless, but it links the components of the ICS. The HMI receives information from the field controllers, relaying information through communication protocols and providing the operator with a view of what is happening in the process.

  • This diagram is a simplified representation of an ICS communication network. The process is controlled by an application running inside the field controllers, which communicate with a series of field input and output devices. The field controllers consolidate the data and transmit it to the HMI stations, where it is presented on displays.

  • ICS collect information about some process or function using a communications infrastructure to send the data back to an operator. The operator reviews the data, typically in a graphical format, assesses the operational status of the process, and tunes the system for optimal performance.

  • Field Devices are the instruments and sensors that measure process parameters and the actuators that control the process. This is the interface between the ICS and the physical process. These sensors or measuring instruments are often referred to as input devices because they “input” data into the ICS.

  • Field Controllers are responsible for collecting and processing input and output information, sometimes referred to as I/O. They also send the process data to the human machine interface (HMI) and process control commands from the operators. They are often located close to the field devices.

  • Servers, HMIs, and engineering workstations take the information from field controllers and display the data in a manner that depicts what is happening in the process. The user interface, usually referred to as the HMI, allows the operator to have a real‐time, or near real‐time, operational view of the process. These three components are linked using networks or communication channels.

    • Field Devices (Meters, Sensors, Valves, Switches) <--> Field Controllers (PLC, IED, RTU, Controller, PAC) <--> HMI (SCADA Server, HMI, Workstations, EMS)

    • Direct connection or Device level protocols (HART, Foundation Fieldbus, Profibus) <--> Command and Control Protocols (DNP3, Modbus, Ethernet/IP)

    • Field Controllers --> Primary Historian --> Secondary Historian

      |--> Configuration Database --> HMI --> HMI

  • Protocols (ANSI X3.28, BBC 7200, CDC Type 1/2, Contitel, DCP, DNP, Gedac 7020, ICCP, Landis, Modbus, OPC, ControlNet, DeviceNet, DH+, Profibus, Tejas 3/5, TRW 9550, UCA)

  • Indusoft (HMI Software?)

  • Connected Components Workbench

https://www.rockwellautomation.com/en-us/products/hardware/allen-bradley/programmable-controllers/micro-controllers/micro800-family/micro850-controllers.html

https://www.plcfiddle.com/

ICS topology

  • ICS network topologies are similar to IT topologies. However, there are some fundamental differences critical to ICS applications. ICS networks require redundancy to ensure availability, which is not a common practice in IT. In addition, many older ICS networks use proprietary technologies, which are not used in the IT domain.

  • Some similarities that an IT network engineer might notice are the serial technologies. RS232, RS485, and Ethernet reside at the physical layer in the OSI stack and are common to both domains. RS232, which is an older technology in the IT domain, is still commonly used for point-to-point communications in RTUs and PLCs. RS485 is the foundation for many proprietary control networks used at site facilities.

Bus

The bus topology is constructed on a single cable, referred to as the bus, that each node on the network connects to. Each of these nodes passively listens for data being transmitted along the bus. If one node wants to transmit data to another node along the bus, it sends out a signal to the entire network, letting everyone know that a transmission is occurring. This transmission then travels down the bus, being ignored by all other nodes until it reaches its destination node and is accepted.

Ring

The ring topology is constructed from a closed loop cable, known as a ring, that each node on the network connects to. In this topology, the network forms a circular shape and data is transmitted clockwise via a token that each node in the network actively listens for. If a node does not want to transmit data, the node will act as a repeater and send the token around the ring. If a node does want to transmit data, it must wait until the token makes its way to the node and is no longer carrying data.

Star

The star topology is constructed from a central device, either a switch, router, or hub, which every other node in the network connects to. In this design, each distinct cable only connects two physical devices, with one end hooking up to a node on the network and the other hooking up to the central device. If one node wants to transmit data to another node, it must send its transmission to the central device, which will then act as a relay station and pass along the transmission to the destination node.
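The practical difference between the three layouts shows up when one device fails. The following is a minimal Python sketch, not a network simulator: it models each topology as an adjacency map and asks which end nodes one node can still reach after a single failure. The helper names (`bus`, `ring`, `star`, `peers_reachable`), the four-node network, and the simplification that the ring is a single one-way loop with no bypass are all assumptions made for illustration.

```python
from collections import deque

def bus(nodes):
    # Every node taps one shared cable; the cable is modeled as a passive vertex.
    graph = {"cable": set(nodes)}
    graph.update({n: {"cable"} for n in nodes})
    return graph

def ring(nodes):
    # Each node repeats traffic to its clockwise neighbor only (one-way loop).
    return {n: {nodes[(i + 1) % len(nodes)]} for i, n in enumerate(nodes)}

def star(nodes, center="switch"):
    # Every node has exactly one cable, which runs to the central device.
    graph = {center: set(nodes)}
    graph.update({n: {center} for n in nodes})
    return graph

def peers_reachable(graph, start, failed, nodes):
    """Return the end nodes that `start` can still reach when `failed` is down."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in graph[queue.popleft()]:
            if nxt != failed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(n for n in seen if n in nodes and n != start)

nodes = ["A", "B", "C", "D"]
print("bus, node C down:     ", peers_reachable(bus(nodes), "A", "C", nodes))
print("ring, node C down:    ", peers_reachable(ring(nodes), "A", "C", nodes))
print("star, node C down:    ", peers_reachable(star(nodes), "A", "C", nodes))
print("star, switch down:    ", peers_reachable(star(nodes), "A", "switch", nodes))
```

Under these assumptions, a failed end node barely matters on a bus or star, a failed repeater cuts off everything downstream of it on a simple ring, and a failed central device isolates every node on a star. Real ICS rings often add bypass or redundancy mechanisms that this sketch does not model.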

Polling Methods

  • An ICS master station repeatedly communicates with field controllers through a process called “polling.” The master station sends a request for updates, and a field controller, such as an RTU or PLC, responds by sending back the requested information. Assets are polled to verify they are still functioning as expected. When an asset is polled, it sends data about its state back to the SCADA system, and if everything is OK, it continues to be polled on its regular schedule. If it is not OK, it is polled again to determine whether a technician needs to visit the asset to perform maintenance.

  • The poll can contain a general request such as a pressure reading or it can be related to specific metrics. An example could be that the pressure has exceeded required parameters. Has it overheated? Has it seen events that would cause you to believe the life cycle is being degraded? This polling process can happen every couple of seconds, every couple of minutes, every couple of hours, days, months or years. It is dependent on the asset, the role the asset plays in a system, and the risk that asset presents.
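The poll, check, and re-poll cycle described above can be sketched in a few lines of Python. This is an illustration only: `poll_cycle`, `make_asset`, the pressure limits, and the number of follow-up polls are hypothetical, and the "asset" is a canned list of readings rather than a real RTU or PLC.

```python
def poll_cycle(read_asset, low, high, follow_up_polls=2):
    """Poll once; if the reading is out of range, re-poll before raising a flag."""
    reading = read_asset()
    if low <= reading <= high:
        return "ok"                      # stay on the regular polling schedule
    for _ in range(follow_up_polls):
        reading = read_asset()           # poll again to confirm the problem
        if low <= reading <= high:
            return "ok-after-repoll"     # transient condition, no site visit
    return "dispatch-technician"

def make_asset(readings):
    """Simulated field controller that returns canned pressure readings in order."""
    iterator = iter(readings)
    return lambda: next(iterator)

print(poll_cycle(make_asset([50]), low=0, high=100))
print(poll_cycle(make_asset([150, 60]), low=0, high=100))
print(poll_cycle(make_asset([150, 160, 170]), low=0, high=100))
```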

Master/Field Controller Relationships
  • Some field controllers are assigned a higher priority where they are polled more frequently than other field controllers. For example, critical field controllers may be polled twice a second, while lower priority devices are polled once per minute.

  • Polling rates also depend on the information being reported. The status of regional high voltage transmission lines will be checked more frequently than ambient temperature because the line status can change quickly compared to relatively slower temperature changes.

  • Masters and field controllers can be related in several ways. For instance, a master can communicate directly with a single field controller or with multiple field controllers. Multiple masters can also communicate with a common field controller. Most modern control system networks are made up of a blend of communications methods, topologies, architectures, and protocols, which introduces complexity when creating security mitigation strategies.
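Priority-based polling, where critical controllers are polled twice a second and low-priority ones once a minute, amounts to giving each device its own polling interval. A minimal sketch using a priority queue over virtual time follows; the device names and the `build_schedule` helper are made up for the example, and no real communication takes place.

```python
import heapq
from collections import Counter

def build_schedule(intervals, horizon):
    """Return a time-ordered list of (time, device) polls up to `horizon` seconds."""
    heap = [(0.0, name) for name in sorted(intervals)]
    heapq.heapify(heap)
    schedule = []
    while heap:
        when, name = heapq.heappop(heap)
        if when > horizon:
            continue                     # past the horizon: do not reschedule
        schedule.append((when, name))
        heapq.heappush(heap, (when + intervals[name], name))
    return schedule

# Assumed intervals: line status twice a second, ambient temperature once a minute.
intervals = {"line_status_rtu": 0.5, "ambient_temp_rtu": 60.0}
polls = Counter(name for _, name in build_schedule(intervals, horizon=60.0))
print(polls)
```

Over one minute the critical device is polled far more often than the low-priority one, which is also a reminder that the polling plan itself determines how much traffic a shared channel must carry.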

Physical Media

A number of choices are available to support the physical layer in ICS networks. Selecting the proper physical media depends on a number of factors, including costs, reliability, data rates, distance, topology, and availability. Some options are easier to secure than others.

Leased Line
  • Leased lines are dedicated communication circuits, usually provided by a phone company. They are popular options for ICS communications to remote facilities because they are ubiquitous and affordable. Typically, the installation costs are minimal, and the monthly recurring costs are reasonable. The two types of leased lines are analog and digital.

  • Analog leased line circuits are older lines and limit ICS communications to 9,600 bits per second, although there are some technologies that can squeeze higher data rates out of these circuits.

  • Digital leased lines support higher data rates, but they are not available in all areas. Leased lines are not as secure as dedicated lines because the asset owner does not control the physical infrastructure or access. In addition, the leased line infrastructure is not isolated from other phone systems, creating a potential vulnerability from outside sources.
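To get a feel for what a 9,600 bits-per-second analog circuit means for polling, the arithmetic can be worked through directly. The sketch below assumes asynchronous serial framing of 10 bits per byte (one start bit, eight data bits, one stop bit) and an 8-byte request with a 25-byte response; those sizes are illustrative assumptions, and turnaround and propagation delays are ignored.

```python
def transfer_time(num_bytes, bits_per_second, bits_per_byte=10):
    """Seconds needed to clock `num_bytes` onto a serial line (framing bits included)."""
    return num_bytes * bits_per_byte / bits_per_second

request_bytes, response_bytes = 8, 25          # assumed message sizes
round_trip = transfer_time(request_bytes + response_bytes, 9600)
max_polls_per_second = 1 / round_trip

print(f"one request/response pair: {round_trip * 1000:.1f} ms")
print(f"upper bound on polls per second: {max_polls_per_second:.1f}")
```

Even before any processing delay, a line at this rate tops out at a few dozen such exchanges per second, shared across every field controller on the circuit.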

Dedicated Lines

Dedicated lines are more secure than leased lines, because they are owned and managed by the asset owner. Unlike leased lines, dedicated lines are not shared with the public, so the exposure is reduced. The capital costs to install a dedicated network can be substantial because of labor and material costs; however, the recurring costs are generally lower than leased lines.

Wired Media – Copper and Fiber

Fiber and copper lines provide the physical layer in an Ethernet-based environment; this medium is often referred to as wired media. These environments are used extensively in ICS networks because they offer fast, reliable, and inexpensive service. Fiber optic cabling is harder to tap than copper-wired media. However, tapping a fiber optic cable is not impossible; therefore, a significant cyber risk still exists.

PowerLines

PowerLine1

  • Power line carrier systems transmit data on electrical conductors. The main advantage of power line communication is that the power line infrastructure ensures coverage almost everywhere. Power line communication systems are favored by utilities because they allow data to move reliably over an infrastructure the utilities control.

  • Electric, gas, and water utilities are adopting power line communication as a means to communicate with meters, enabling them to send and receive information on current consumption, gather diagnostic information, and remotely manage loads. Building owners and managers are also using power line communication to monitor, diagnose, and control loads such as lighting and HVAC systems.

PowerLine2
  • Broadband over power line (BPL) is a technology that allows data to be transmitted over utility power lines. This technology uses medium wave, short wave, and low-band VHF frequencies, and operates at speeds similar to those of digital subscriber line (DSL).

  • BPL has existed for many years, but so far, has not been implemented in the United States on a broad scale because of signal degradation, and technical difficulties involving interference with radio signals used for communications by public safety officials during emergencies.

  • The term used for these traditional systems is Power Line Communications or Power Line Telecommunications (PLT). In Europe, the term Power Line Communications is also used for the Broadband data transmission, whereas in North America, the term Broadband over Power Lines (BPL) is more commonly used.

Wi-Fi
  • Wi-Fi is a wireless networking protocol that allows devices to communicate without direct cable connections. Wi-Fi is common in local plant operations. Longer distances are also possible with the use of directional antennas. Wi-Fi’s low cost makes it an extremely attractive solution. From a security perspective, vendors are able to provide several different authentication and encryption technologies. For more information, see the Recommended Practice Guide, “Securing WLANs using 802.11i.”

Radio Frequency

Radio frequency is any of the electromagnetic wave frequencies that lie in the range extending from below 3 kilohertz to about 300 gigahertz and that include the frequencies used for communication signals (as for radio and television broadcasting and cell-phone and satellite transmissions) or radar signals. Radio frequency communications are commonly used locally within a site facility for plant operations, as well as for some long-distance communications to transmission and distribution facilities. The speeds associated with older radio systems are slow and analogous to those seen when using low-speed modems, although some proprietary spread spectrum radios do support Ethernet.

Microwave and cellular

Microwave and cellular data transmission rates are fast, but the installations can be expensive, and they ultimately require line-of-sight to achieve “five nines” availability. Cellular communications leverage existing telephone networks, and many vendors have incorporated either CDMA (Code Division Multiple Access), which is being gradually phased out, or GSM (Global System for Mobile Communications), along with their 3G and 4G successors, into their products.
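"Five nines" means 99.999% availability. Converting an availability figure into allowed downtime makes the target concrete; the short sketch below does that arithmetic, assuming a 365-day year (the function name is illustrative).

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes in a 365-day year

def downtime_minutes_per_year(availability):
    """Allowed downtime per year for a given availability fraction."""
    return (1 - availability) * MINUTES_PER_YEAR

for label, availability in [("three nines", 0.999),
                            ("four nines", 0.9999),
                            ("five nines", 0.99999)]:
    minutes = downtime_minutes_per_year(availability)
    print(f"{label}: {minutes:.2f} minutes of downtime per year")
```

At five nines the link may be unavailable for only a little over five minutes in an entire year, which is why path obstructions matter so much for microwave installations.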

Communication channels

A communication channel is the transmission path that data follows from one device to another. The medium can be a physical wire or a logical connection. Protocols define the rules for the communication and detail the interactions between these devices. They include mechanisms for how devices identify each other and make connections, as well as rules that specify how data is packaged, sent, and received. In short, protocols specify the interactions between the communicating entities.

ICS Communication Channels

The three main segments of an ICS – field devices, field controllers, and HMI – are connected using these communication protocols. The communications between the field devices and field controllers are separate from command and control communications. Initially, ICS were isolated systems running proprietary control protocols using specialized hardware and software. Widely available, low-cost Ethernet and Internet Protocol (IP) devices are now replacing the older proprietary technologies.

The communications medium and the protocols used in ICS will depend on the device selected, the requirements of the system, and the business. For example, an organization that manages the reliability of a bulk power system in real time has different data communication requirements than an organization that needs to measure the water level in a massive reservoir once or twice a day.

ICS Common Protocols

  • There are dozens of protocols used in the ICS domain. Many of these protocols were developed to support a specific technology, and as such, are uncommon or only applicable to a single vendor. Some ICS devices are old—they can be in use 25 or 30 years—and use proprietary protocols developed by vendors that are no longer in business. The owners of these systems often resort to buying used equipment to keep their systems operational.

  • Most, if not all, common ICS protocols are openly published and available for review. The protocols are typically transmitted in clear text, meaning they are not encrypted. This makes them easy targets for eavesdropping and subject to man-in-the-middle (MiTM) attacks. Many of the older protocols were adapted for a network environment by “wrapping” them in TCP/IP packets. This does not improve security because TCP/IP is not a secure protocol.

  • The ICS vendor community has been under pressure by the ICS owner/operator community to move toward greater inter-operability, and toward a more common set of protocols for communications. Unfortunately, many of these protocols are not secure by design—they were designed for reliability.

Modbus

Modbus is one of the oldest and most popular ICS protocols in use today, largely because of its openness and simplicity. Modbus is a digital communication protocol that lets two or more devices talk to one another. It sits at the application layer of the Open Systems Interconnection (OSI) network model and does not specify a physical layer. As a result, Modbus implementations are not limited to a single communication medium, which frees the communications engineer to select the best physical media for transporting Modbus packets. The specification is openly published, which allows most field controllers to support Modbus, and this has made it very popular.

Modbus

  • Simple protocol

  • Low cost development

  • Minimum hardware requirement to support

  • Master/slave protocol

  • Communicates with up to 247 devices

  • Uses standard TCP/IP protocols

Modbus can be found in:

  • Industrial Buildings

  • Commercial Buildings

  • Infrastructure

  • Transportation

  • Energy Applications

Modbus – Master/Slave Architecture

Modbus is a serial communications protocol that defines a message structure to establish master/slave or client/server communication between intelligent devices. This means that a master device talks to all the other devices on the network. It can query them for information or tell them what to do. Unlike most other protocols, however, Modbus is used for both command and control and device level communications.

Modbus – Protocol Versions

There are several versions of the Modbus protocol because the protocol was originally developed for serial connections but has been adapted for the networking world. These include:

  • Modbus ASCII – Original serial version. Data is transmitted in ASCII characters, which makes it easy to troubleshoot when there are problems.

  • Modbus Plus – An extended, proprietary version that runs on RS485

  • Modbus RTU – Serial protocol that transmits data in binary form, making data more compact and transmission more efficient than the ASCII version. More commonly used than Modbus ASCII. (Based on Serial communication like RS485, RS422, and RS232.)

  • Modbus TCP/IP – The TCP/IP encapsulated version of Modbus. (Based on Ethernet.)
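The difference between the ASCII and RTU serial encodings is easiest to see by framing the same request both ways. The sketch below is an illustration rather than a full Modbus stack: it encodes a Read Holding Registers request (function code 0x03, starting address 0, quantity 10) for unit 1. RTU sends raw bytes followed by a CRC-16 (low byte first), while ASCII sends hexadecimal characters between a colon and CR/LF with an LRC checksum. The helper names are assumptions made for the example.

```python
def crc16_modbus(data: bytes) -> int:
    """CRC-16 as used by Modbus RTU (reflected polynomial 0xA001, initial 0xFFFF)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc

def lrc(data: bytes) -> int:
    """Longitudinal redundancy check used by Modbus ASCII (two's complement of the sum)."""
    return (-sum(data)) & 0xFF

def rtu_frame(unit: int, pdu: bytes) -> bytes:
    body = bytes([unit]) + pdu
    crc = crc16_modbus(body)
    return body + bytes([crc & 0xFF, crc >> 8])      # CRC low byte goes first

def ascii_frame(unit: int, pdu: bytes) -> bytes:
    body = bytes([unit]) + pdu
    hex_text = (body + bytes([lrc(body)])).hex().upper()
    return b":" + hex_text.encode("ascii") + b"\r\n"

# Read Holding Registers: function 0x03, start address 0x0000, quantity 0x000A
pdu = bytes([0x03, 0x00, 0x00, 0x00, 0x0A])
rtu = rtu_frame(1, pdu)
asc = ascii_frame(1, pdu)
print("RTU  :", rtu.hex(" "), f"({len(rtu)} bytes)")
print("ASCII:", asc, f"({len(asc)} bytes)")
```

The ASCII frame is readable on a terminal but roughly twice the size of the binary RTU frame, which matches the trade-off described in the list above.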

Modbus – Authentication & Authorization
  • To facilitate interoperability in modern networks, the Modbus Application Protocol (MBAP) header is dropped onto the TCP/IP stack at the application layers in both the OSI and Advanced Research Projects Agency (ARPA) models. This creates a cybersecurity situation where an insecure protocol is using an insecure transport mechanism to perform mission critical and vital operations.

  • The TCP/IP Modbus payloads provide enough intelligence to analyze the traffic. It is also easy to create a list of Modbus-aware field controllers, which attackers can leverage. In addition, no authentication or authorization is required to communicate with a Modbus device. The default port that Modbus/TCP uses is 502. The original performance requirements that have been ported over to new transport mechanisms have not taken into consideration the impact of non-structured data on delicate field controllers.
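Building a Modbus/TCP request by hand shows why the protocol is so easy to analyze and to speak. The sketch below packs the seven-byte MBAP header (transaction identifier, protocol identifier, length, unit identifier) in front of a Read Holding Registers request and then decodes it again, as a passive observer could. No socket is opened; the function names are illustrative, and the point to notice is that the frame has no field for credentials or a session token.

```python
import struct

MODBUS_TCP_PORT = 502   # default port noted above; not used here because nothing is sent

def modbus_tcp_request(transaction_id: int, unit_id: int, pdu: bytes) -> bytes:
    """Prefix a Modbus PDU with the MBAP header (all fields big-endian)."""
    length = len(pdu) + 1                       # unit identifier + PDU
    header = struct.pack(">HHHB", transaction_id, 0, length, unit_id)
    return header + pdu

def parse_modbus_tcp(adu: bytes) -> dict:
    """Decode the header fields and function code of a Modbus/TCP frame."""
    transaction_id, protocol_id, length, unit_id = struct.unpack(">HHHB", adu[:7])
    return {
        "transaction_id": transaction_id,
        "protocol_id": protocol_id,             # always 0 for Modbus
        "length": length,
        "unit_id": unit_id,
        "function_code": adu[7],
    }

# Read Holding Registers: function 0x03, start address 0, quantity 10
pdu = struct.pack(">BHH", 0x03, 0, 10)
adu = modbus_tcp_request(transaction_id=1, unit_id=1, pdu=pdu)
print(adu.hex(" "))
print(parse_modbus_tcp(adu))
```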

Modbus – Vulnerabilities

Modbus Flooding Attack

  • Modbus Flooding: The Modbus protocol, like many control protocols, does not include any mechanisms to protect confidentiality, although there is Cyclical Redundancy Check (CRC) integrity checking. CRC is a common method used by ICS protocols to determine if the data were unintentionally changed during transmission.

  • The original Modbus protocol does not protect the system from malformed packets and out-of-scope data storms. As a result, attacks such as denial of service, session hijacking, and integrity compromise, are easily executed against the Modbus protocol.

  • One attack example is called Modbus flooding. The aim of the attack is to take control of the system by flooding it with messages, effectively drowning out legitimate commands from the HMI.

  • Modbus protocol is a master/slave protocol: the master reads and writes slaves’ registers.

  • Modbus RTU is usually used over RS-485 (a serial network): one master is present with one or more slaves. Each slave has a unique 8-bit address.

  • Modbus reads and writes data in four kinds of objects: two kinds of 16-bit registers and two kinds of single-bit values:

    • Holding register: 16-bit; readable and writable

    • Input register: 16-bit; readable

    • Coil (Discrete Output): 1-bit long; readable and writable

    • Discrete input (Status Input): 1-bit long; readable
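The four object types above differ only in width (1 bit or 16 bits) and in whether the master may write to them. A toy in-memory model makes those rules explicit. This is a sketch for illustration only: the class name, the table size, and the choice of Python exceptions are assumptions, and a real slave would report a Modbus exception response rather than raise.

```python
class ModbusDataModel:
    """Toy model of the four Modbus data tables (not a real device)."""

    WIDTH_BITS = {
        "coil": 1,
        "discrete_input": 1,
        "input_register": 16,
        "holding_register": 16,
    }
    WRITABLE = {"coil", "holding_register"}

    def __init__(self, size=16):
        self.tables = {name: [0] * size for name in self.WIDTH_BITS}

    def read(self, table, address):
        return self.tables[table][address]

    def write(self, table, address, value):
        if table not in self.WRITABLE:
            raise PermissionError(f"{table} is read-only for the master")
        limit = (1 << self.WIDTH_BITS[table]) - 1    # 1 for bits, 65535 for registers
        if not 0 <= value <= limit:
            raise ValueError(f"{value} does not fit in {self.WIDTH_BITS[table]} bit(s)")
        self.tables[table][address] = value

device = ModbusDataModel()
device.write("holding_register", 0, 0xFFFF)   # largest 16-bit value
device.write("coil", 3, 1)
print(device.read("holding_register", 0), device.read("coil", 3))
```

Note that nothing in the model asks who is writing; as described earlier, the protocol itself has no notion of an authenticated master.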

Distributed Network Protocol 3 (DNP3)

  • DNP3 is a communication protocol used in SCADA and remote monitoring systems. DNP3 stands for Distributed Network Protocol, version 3. It is widely used because it is an open protocol, meaning any manufacturer can develop DNP3 equipment that is compatible with other DNP3 equipment. Because DNP3 was designed to support communications with geographically dispersed facilities, it is also used extensively by the oil and gas, water, and wastewater sectors to communicate with distribution and transmission facilities. It supports communications between station computers, RTUs, and IEDs. DNP3 also:

    • Provides features and functions missing from Modbus

    • It is an open protocol, therefore numerous vendors support it

    • Most often uses TCP, but also supports UDP

    • Uses Port 20000

    • Traffic is sent in plain text

    • Does not provide for authentication or authorization

    • Originally designed to operate on serial communications, but has been migrated to work on IP

DNP3 – DNP3 Application
  • DNP3 uses the TCP/IP protocol stack and exists on top of the transport layer (TCP or UDP). Three distinct layers contained within the DNP3 application are DNP3 Data Link layer, DNP3 Transport layer, and DNP3 Application layer.

  • Like Modbus traffic, DNP3 traffic is sent in plaintext, so DNP3 connections are susceptible to session hijacking, denial of service, and other attacks found in modern networking environments. Although the DNP3 protocol was designed to be very reliable, it was not designed to be secure from attacks that could potentially disrupt control systems or disable critical infrastructure.

  • DNP3 does not natively provide authentication or authorization as a function of the protocol standard; however, the security specification extensions developed for DNP3 are now compliant with the IEC 62351-1 standard (International Electrotechnical Commission) and, when used, provide mitigation against some modern attack methodologies. Even though DNP3 was originally designed to operate on serial-based communications, the migration to IP has been successful and embraced by the ICS community.

  • Recognize that the assignment of the protocols is a function of the port used, and not necessarily the payload of the packets. For instance, if a packet capture showed two devices using a torrent server for music file downloads on port 20000, the traffic would be classified as DNP3 because DNP3 is the standard protocol mapped to port 20000 in the IANA (Internet Assigned Numbers Authority) port assignments.
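The port-versus-payload point can be shown with a small classifier. In this sketch `classify_by_port` labels traffic purely from the destination port, the way many analyzers do by default, while `looks_like_dnp3` checks whether the payload begins with the two DNP3 data link start bytes (0x05 0x64). Both function names and the sample payloads are illustrative, and a two-byte check is only a heuristic, not a protocol validator.

```python
PORT_LABELS = {502: "Modbus/TCP", 20000: "DNP3"}

def classify_by_port(port: int) -> str:
    """Label traffic from the port number alone."""
    return PORT_LABELS.get(port, "unknown")

def looks_like_dnp3(payload: bytes) -> bool:
    """Heuristic payload check: DNP3 link frames start with 0x05 0x64."""
    return payload[:2] == b"\x05\x64"

# A BitTorrent handshake starts with the byte 19 followed by this string.
torrent_payload = b"\x13BitTorrent protocol"
# Hypothetical bytes that merely begin with the DNP3 start sequence.
dnp3_like_payload = bytes([0x05, 0x64, 0x05, 0xC0, 0x01, 0x00, 0x00, 0x04])

for name, payload in [("torrent", torrent_payload), ("dnp3-like", dnp3_like_payload)]:
    print(name, "| by port:", classify_by_port(20000),
          "| payload check:", looks_like_dnp3(payload))
```

Both flows are labeled DNP3 by port, but only one passes even a minimal payload check, which is why port-based labels in a capture should be treated as a hint rather than a fact.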

Inter-Control Center Communications Protocol (ICCP)

  • Inter-Control Center Communications Protocol (ICCP), also known as the Telecontrol Application Service Element 2 (TASE.2), is a vendor-independent standard protocol. It is designed specifically for real-time data exchange between ISO (Independent System Operator) control centers, power pools, regional control centers, transmission utilities, distribution utilities, and generation facilities over LAN and WAN.

  • ICCP is based on client-server communication. All data transfers originate with a request from a control center (the client) to another control center that owns and manages the data (the server). ICCP also provides services for data transfer, depending on the type of request. For example, if the client makes a one-time request, the data will be returned as a response.

  • If the client makes a request for the periodic transfer of data or the transfer of data only when it changes, the client will first establish the reporting mechanism with the server. This will specify reporting conditions such as periodicity for periodic transfers, or other trigger conditions such as report-by-exception only. The server will then send the data as an unsolicited report whenever the reporting conditions are satisfied.
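The three transfer styles described above (a one-time request, periodic reporting, and report-by-exception) can be sketched with a small in-memory server. This is a conceptual illustration and not the TASE.2 API: the class, method names, point name, and the use of integer "virtual seconds" are all assumptions.

```python
class ControlCenterServer:
    """Toy data owner that answers reads and sends unsolicited reports."""

    def __init__(self):
        self.values = {}
        self.subscriptions = []          # (point, callback, period or None)

    def read(self, point):
        """One-time request: the value is returned as the response."""
        return self.values[point]

    def subscribe(self, point, callback, period=None):
        """period=None means report-by-exception (only when the value changes)."""
        self.subscriptions.append((point, callback, period))

    def update(self, point, value):
        changed = self.values.get(point) != value
        self.values[point] = value
        if changed:
            for sub_point, callback, period in self.subscriptions:
                if sub_point == point and period is None:
                    callback(point, value)

    def tick(self, now):
        """Send periodic reports whose period divides the current virtual second."""
        for point, callback, period in self.subscriptions:
            if period is not None and now % period == 0:
                callback(point, self.values[point])

server = ControlCenterServer()
server.update("tie_line_mw", 100)
periodic, exceptions = [], []
server.subscribe("tie_line_mw", lambda p, v: periodic.append(v), period=2)
server.subscribe("tie_line_mw", lambda p, v: exceptions.append(v))

for now in range(1, 5):                  # virtual seconds 1 through 4
    if now == 3:
        server.update("tie_line_mw", 105)
    server.tick(now)

print("one-time read:", server.read("tie_line_mw"))
print("periodic reports:", periodic)
print("exception reports:", exceptions)
```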

ICCP Security
  • ICCP provides the ability to read objects, make configuration changes on remote objects, and control objects. Because the protocol is clear text, visibility to network traffic allows an observer to gain important information regarding relationships between clients and servers.

  • Standard ICCP is inherently insecure; however, a version called Secure ICCP can be used. Sites not using Secure ICCP should consider using OpenSSL, IPSec, and data link encryption to provide inter-node data security for standard ICCP communications.

Fieldbus

  • Fieldbus is a generic term that describes not one protocol, but a group of industrial digital communication protocols. The idea behind Fieldbus was to eliminate point-to-point links. Fieldbus works on a network that permits various topologies, such as ring, branch, star, and daisy chain.

  • Fieldbus is a LAN dedicated to industrial automation. It replaces centralized control networks with distributed control networks and links otherwise isolated devices such as smart sensors, transducers, actuators, and controllers.

  • A few of the characteristics of the fieldbus include:

    • Bi-directional – This means it is a duplex port; the data can be transmitted in two directions at the same time.

    • Multi-drop – This is also referred to as multi-access and can be interpreted as a single bus with many nodes connected to it.

    • Serial-bus – This means the data is transmitted in small packets in a sequential manner.

    • Multiple Topologies – Fieldbus works on network structures such as daisy-chain, star, ring, branch, and tree topologies.

Fieldbus – Levels
  • A simple fieldbus consists of four main levels. As the levels increase the level of complexity increases.

  • Level 4 – The most complex level where all computers and departments are located. This computer-driven level allows data monitoring, file management, and file transfer at a large scale.

  • Level 3 – This is where high-level data communication happens. Controllers, such as PLC, are connected to each other alongside HMI for complete control of the network.

  • Level 2 – Increased complexity scale. All sensor bus networks are connected to this network. Variable speed drives and motor control centers are connected to these for individual control over elements.

  • Level 1 – This level is the least complex and includes all isolated field devices.

Fieldbus – OSI vs Fieldbus Model
  • There are basically two sections to the fieldbus system: interconnection and application. Interconnection refers to passing of data from one device to another. This is the communication protocol part of fieldbus. The application is the automation function the fieldbus performs.

  • The OSI model requires data to move sequentially through each of the seven layers. With fieldbus, the process has been simplified: Layers 3, 4, 5, and 6 are omitted to make fieldbus faster and easier to implement in devices with limited processor power, such as field devices. Fieldbus has no interconnections between networks, which is the purpose of these layers.
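The layer reduction can be written out explicitly. The snippet below simply lists the seven OSI layers and filters out the four that fieldbus leaves out, using the standard OSI layer names; the variable names are illustrative.

```python
OSI_LAYERS = {
    1: "Physical",
    2: "Data Link",
    3: "Network",
    4: "Transport",
    5: "Session",
    6: "Presentation",
    7: "Application",
}
OMITTED_BY_FIELDBUS = {3, 4, 5, 6}

fieldbus_stack = [name for number, name in sorted(OSI_LAYERS.items())
                  if number not in OMITTED_BY_FIELDBUS]
print(fieldbus_stack)
```

What remains is a three-layer stack: the physical and data link layers move bits and frames on the bus, and the application layer carries the automation messages.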

Profibus

  • Profibus is a smart fieldbus technology. It is specifically designed for high-speed serial I/O in factory and building automation applications. It is recognized as the fastest fieldbus in operation. Profibus is an open-standard fieldbus defined by German DIN 19245 Parts 1 & 2. Devices on the system connect to a central line. Once connected, these devices can communicate in an efficient manner, but can go beyond automation messages to participate in self-diagnosis and connection diagnosis.

  • The data link layer is defined in Profibus as the Fieldbus data link layer (FDL). It is based on a token/bus/floating master system. Profibus is a network made up of two types of devices connected to the bus: master devices and slave devices. It is a bidirectional network, meaning one device (the master) sends a request to a slave, and the slave responds to that request. Bus contention is not a problem because only one master can control the bus at any time, and a slave device must respond immediately to a request from a master. Profibus supports addresses from 0 to 127, but only 0 to 125 are used for operational devices because 126 and 127 have special uses.

  • There are three types of Profibus: Fieldbus Message Specification (FMS), Profibus DP (Decentralized Peripherals), and Profibus PA (Process Automation). FMS is used for general data acquisition systems. DP is used when fast communications are needed to operate sensors and actuators via a centralized controller. Profibus PA is used in areas where intrinsically safe devices and safe communications are needed, such as to monitor measuring equipment in process automation applications.

  • It is very easy to connect all three versions together on the same system because the main difference between the versions is the physical layer. This would allow a company to run lower-cost devices in most of the plant with FMS, DP where speed is needed, and PA in those areas requiring intrinsically safe devices.

  • Of the three versions, PA and DP are those most commonly used. Profibus PA was developed to connect directly with Profibus DP. The graphic below demonstrates how the two systems are connected.
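Two of the Profibus rules described above lend themselves to a short sketch: the operational address range (0 to 125, with 126 and 127 set aside) and the idea that only the master currently holding the token may use the bus. The code below is a simplified illustration; the function names are hypothetical, and passing the token in ascending address order is an assumption of this sketch rather than something stated in the text.

```python
def is_operational_address(address: int) -> bool:
    """True for addresses that can be assigned to operational devices."""
    return 0 <= address <= 125

def token_rotation(master_addresses, rounds=1):
    """Yield the master that holds the token at each step of a logical ring."""
    for address in master_addresses:
        if not is_operational_address(address):
            raise ValueError(f"address {address} is reserved or out of range")
    ring = sorted(master_addresses)          # assumed ascending-address order
    for _ in range(rounds):
        for address in ring:
            yield address                    # only this master may poll its slaves now

print([is_operational_address(a) for a in (0, 125, 126, 127)])
print(list(token_rotation([5, 1, 3], rounds=2)))
```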

Open Platform Communication (OPC)

  • OPC (Open Platform Communications, formerly OLE for Process Control) is a series of standard, manufacturer-independent programming interfaces through which an automation application client such as an HMI can access data coming from remote devices such as PLCs, fieldbus devices, or real-time databases. OPC has become the most versatile way to communicate in the automation layer in all types of industry.

  • OPC is a client/server based communication, which means that you have one or more servers waiting for several clients to make requests. Once the server gets the request, it then answers that request before returning to a waiting state. The client can also tell the server to send updates when the server receives such updates. It is ultimately the client that decides when and what data the server will gather.

  • According to the OPC Foundation’s website, “OPC is open connectivity via open standards.” For example, an operator pulls up a display on the HMI. The HMI has an OPC client that sends a request to an OPC server to provide the data needed to populate the display. The OPC server, using a protocol such as Modbus or DNP 3.0, obtains the data from the RTU and passes it to the client.

OPC Classic Specification
  • OPC does not represent a network protocol in the traditional sense, but rather a capability to support the interfacing and interconnection with disparate vendor technologies.

  • OPC is a set of several specifications for sharing data based on Microsoft technologies COM, DCOM, OLE, and RPC. Microsoft has since replaced these technologies with .NET and no longer supports these legacy technologies. OPC standards based on COM and DCOM are referred to by the OPC foundation as OPC Classic Specifications.

  • OPC Data Access (DA), the most basic of the protocols and the original OPC Classic specification, defines the exchange of data including values, time, and quality information. The second protocol to be added, Alarms and Events (A&E), defines the exchange of alarm and event message information. It is a subscription service where the client receives all incoming events. The OPC Historical Data Access (HDA) specification defines query methods and analytics that may be applied to historical, time-stamped data, and supports record data sets for one or more points.
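The value, time, and quality triple that Data Access exchanges, and the kind of time-window query that Historical Data Access supports, can be pictured with a small data structure. This is a conceptual sketch, not the OPC API: `TagSample`, `good_values_between`, the quality strings, and the sample readings are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class TagSample:
    value: float
    timestamp: datetime
    quality: str            # assumed labels: "good", "bad", or "uncertain"

def good_values_between(history, start, end):
    """Historical-style query: good-quality values inside a time window."""
    return [s.value for s in history
            if start <= s.timestamp <= end and s.quality == "good"]

t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
history = [
    TagSample(101.2, t0, "good"),
    TagSample(0.0, t0 + timedelta(seconds=10), "bad"),     # e.g. a sensor fault
    TagSample(101.9, t0 + timedelta(seconds=20), "good"),
    TagSample(102.4, t0 + timedelta(seconds=30), "good"),
]

window = good_values_between(history, t0, t0 + timedelta(seconds=20))
print(window)
```

Carrying quality alongside the value lets a client ignore a reading such as the zero above instead of treating it as a real process value.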

OPC Unified Architecture (UA)
  • These classic specifications have served the industry well, but as technology has evolved, so has the need for new OPC specifications. In 2008, the OPC Unified Architecture (UA) was developed as a platform-independent service-oriented architecture (SOA) to address the issue of platform interoperability by using web services in place of .NET and DCOM. UA significantly expands the use of OPC to include non-Windows platforms such as field controllers, cellphones, and UNIX and Linux enterprise servers, as well as Windows servers.

  • The biggest difference between OPC Classic and OPC UA is that OPC UA doesn’t rely on OLE or DCOM technology (Windows), making it possible to implement OPC UA on any platform, such as Apple, Linux, or Windows. Another important UA feature is its ability to use structures and models, so data tags or points can be grouped and given context, making governance and maintenance easier.

OPC Relationships
  • OPC provides an elegant solution to the protocol problem by introducing the concept of using an OPC client/server architecture in the ICS environment. The client makes a data request to the OPC server, and the server obtains the data from the field controller by communicating with it using its native protocol.

  • The OPC server supports almost any ICS protocol imaginable, including Modbus, DNP3, ICCP, and Foundation Fieldbus.
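The client/server relationship described above, in which the client asks for a tag by name and the server fetches it over whatever native protocol the field controller speaks, can be sketched as a small dispatch table. Everything here is hypothetical: `OpcStyleServer`, the tag names, the addresses, and the "drivers", which are plain dictionary lookups standing in for real Modbus or DNP3 communication.

```python
class OpcStyleServer:
    """Toy server that hides the native protocol behind a tag name."""

    def __init__(self):
        self.drivers = {}        # protocol name -> function(address) -> value
        self.tags = {}           # tag name -> (protocol name, native address)

    def register_driver(self, protocol, read_function):
        self.drivers[protocol] = read_function

    def map_tag(self, tag, protocol, address):
        self.tags[tag] = (protocol, address)

    def read(self, tag):
        protocol, address = self.tags[tag]
        return self.drivers[protocol](address)

# Stand-ins for field controllers reached over two different protocols.
modbus_registers = {40001: 72}
dnp3_points = {("analog_input", 3): 13.8}

server = OpcStyleServer()
server.register_driver("modbus", lambda address: modbus_registers[address])
server.register_driver("dnp3", lambda address: dnp3_points[address])
server.map_tag("pump1.speed_percent", "modbus", 40001)
server.map_tag("feeder2.voltage_kv", "dnp3", ("analog_input", 3))

# The HMI-side client only ever uses tag names.
print(server.read("pump1.speed_percent"), server.read("feeder2.voltage_kv"))
```

The design choice worth noting is that the client never sees a register number or point index, so swapping a controller for one that speaks a different protocol only changes the tag mapping on the server.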

ICS Cybersecurity Risk

  • The risk equation is a guideline for understanding the level of risk we are taking. Different elements of the equation and different factors contribute to elevated risk, including the security concerns that arise from integrating IT with OT control systems.

  • Risk is a function of threat, vulnerability, and consequence. The most complex attribute is threat because it can be intentional or unintentional, natural or man-made. When trying to develop defensive strategies to protect control systems, it is important to understand the threat landscape for appropriate countermeasures or compensating controls to be deployed. The risk equation should not be taken literally as a mathematical formula, but as a model to demonstrate a concept.

  • Risk is the possibility of something undesirable occurring and we need to understand how to increase or decrease the chances of that happening.

  • For cybersecurity, there are three elements of the equation: threat, vulnerability, and consequence. The better we understand each of these factors, the easier it is to plan appropriate security measures around control systems.
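As noted above, the risk equation is a conceptual model rather than a literal formula. With that caveat, a rough ordinal scoring can still help compare scenarios. The sketch below multiplies three qualitative ratings; the rating scale, the multiplication, and the example scenarios are assumptions chosen purely for illustration, and the resulting numbers are only meaningful relative to each other.

```python
LEVELS = {"none": 0, "low": 1, "medium": 2, "high": 3}

def risk_score(threat: str, vulnerability: str, consequence: str) -> int:
    """Relative risk ranking from three qualitative ratings (not a real metric)."""
    return LEVELS[threat] * LEVELS[vulnerability] * LEVELS[consequence]

before = risk_score("high", "medium", "high")   # exposed, partially hardened system
after = risk_score("high", "low", "high")       # same system after reducing vulnerability
no_path = risk_score("high", "none", "high")    # no exploitable vulnerability at all

print(before, after, no_path)
```

The model captures one useful intuition: defenders usually cannot change the threat, but lowering vulnerability or consequence lowers the overall ranking, and removing any one factor entirely removes the risk in this simplified view.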

Threat

  • We’re mainly concerned about people with malicious intent. A threat is the potential for someone to exploit a particular information system vulnerability. In the context of cybersecurity, this could be a hacker. As technology has improved in recent years, it has also opened up vulnerabilities to not only malicious nation-states and terrorist organizations, but also other organized and mainstream threats.

  • A threat is any person (threat actor), circumstance, or event with the potential to cause loss or damage.

  • It is important to consider threat relative to capability, opportunity, and intent. From a defensive perspective, if we know the capability of our adversaries and the vulnerabilities that would most likely provide them opportunity to attack, we can create countermeasures removing those opportunities. We can also create defenses requiring capabilities beyond the adversary’s ability to compromise.

  • If there is a certain condition associated with compromising a control system, and we create countermeasures forcing the adversary to work at a level beyond that condition, the economics suggests the attacker may abandon the attack altogether. Ultimately, understanding capabilities and motives should help improve security postures to create countermeasures appropriate to the risk, while minimizing impacts to business operations.

  • The three attributes of a threat are capability, opportunity, and intent. When all of these requirements are met, in all likelihood the attack will succeed. Applying cybersecurity strategies that help to deter, detect, and mitigate a threat can alter how or where an attacker might have opportunity. Removing the opportunity increases the chances of an attack failing.

  • In your situation, you may choose different policies and security measures to enact to protect your control systems. Some ideas include:

    • Network segmentation with strict ingress and egress firewall rule sets.

    • No externally routable network connections.

    • Network monitoring, host logging, and maintaining a Collection Management Framework (CMF).

    • Secure Credential Management, no credential sharing, active directory, or account/group management.

    • Incident response plan, the ability to detect and declare a cyber incident.

    • Keep in mind that the Internet, removable media, and email are the main attack vectors for Industrial Control Systems.
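The first idea in the list, segmentation with strict ingress and egress rules, usually comes down to a default-deny policy: traffic passes only if a rule explicitly allows it. The sketch below is a minimal model of that check. The `Rule` type, the addresses, and the single allowed flow are hypothetical and do not represent any particular firewall product.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    src: str   # source network in CIDR notation
    dst: str   # destination network in CIDR notation
    port: int  # destination TCP port


# Hypothetical rule set: only hosts in the DMZ may reach one historian over HTTPS.
ALLOW_RULES = [
    Rule(src="10.10.20.0/24", dst="10.10.30.5/32", port=443),
]


def is_allowed(src_ip: str, dst_ip: str, port: int, rules=ALLOW_RULES) -> bool:
    """Return True only if some rule explicitly permits the flow (default deny)."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for rule in rules:
        if (src in ipaddress.ip_network(rule.src)
                and dst in ipaddress.ip_network(rule.dst)
                and port == rule.port):
            return True
    return False  # nothing matched, so the flow is dropped


print(is_allowed("10.10.20.7", "10.10.30.5", 443))  # permitted by the rule
print(is_allowed("10.10.20.7", "10.10.30.5", 502))  # wrong port, denied
```

Keeping the rule list short and explicit makes it easier to review which paths into the control network actually exist.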

  • Finding the appropriate balance of effective countermeasures that don’t impact control system operations can be challenging, and in many cases asset owners need to identify levels of acceptable risk to their systems.

  • Understanding the threat helps asset owners:

    • Understand the realistic profile of a cyber adversary that could target specific control systems.

    • Make better informed decisions regarding what assets to protect and how.

    • Have the right information to fine tune cybersecurity training for specific personnel involved in control system operations.

    • Define the cybersecurity criteria to be met during system design and when the system is fully operational.

    • Understand what countermeasures can be deployed to escalate cyber defenses beyond the capability of recognized adversaries.

    • Design appropriate security monitoring strategies addressing threat aspects with the greatest contribution to cyber risk.

The attributes associated with a human threat are capability, intent, and opportunity.

Capability
  • Capability is the means or resources available to perform an attack. This includes attacker expertise and knowledge, as well as the money and tools for carrying out the attack.

  • Generally, adversaries will have a static capability and will need to adjust, depending on their intent and the available resources.

  • For an adversary to determine if current capability is adequate or needs to be modified, the adversary needs to have as much information as possible about the target. Incorrect calculations regarding requirements needed to successfully attack a target can result in too much or not enough capability for the attack.

Intent
  • Intent is the motive or goal of the attack, and is usually the one attribute cyber defenses cannot impact.

  • There are many different motives for launching a cyber attack, including curiosity, economic advantage, industrial espionage, national security, revenge, or promoting a cause.

  • The intended consequence plays a factor in intent as well as the selection of the target.

Opportunity
  • Opportunity is the set of conditions that need to be met for adversaries to be confident their attack will be successful. This can be related to the actual access an adversary has to a target, as well as access to specific knowledge about the system.

  • The opportunity also extends beyond access and knowledge of the system to include timing, which has the potential to change the value of a control system as a target. Opportunities are related to the exposure and vulnerability of targets, two things defenders can control.

Attributes
  • It is important to understand attributes because they are interdependent when it comes to determining whether an adversary may execute an attack. Attributes also allow defenders to create strategies that may thwart attacks.

  • Alignment of all three attributes—capability, intent, and opportunity—may indicate an attack is imminent. Alignment greatly impacts the probability that a threat actor can execute a successful attack.

  • As we will see, influencing an adversary’s intent is rarely possible, but improving defensive and detection capabilities to render an adversary’s capability insufficient is always possible. Deploying security countermeasures has a direct effect in removing or changing adversarial opportunities to attack control systems.

  • Unlike the risk equation, the individual attributes of threat are summed, not multiplied. This means adversaries with a strong intent or motive can still be a threat, even though they may not have the capability or opportunity to launch an attack. Over time, they may acquire or create the capability and opportunity.

  • Threat is often the least understood and most difficult to quantify because human behavior can be unpredictable, and involves diverse capabilities, intent, and opportunities. Unpredictable behavior creates situations where static countermeasures may not be adequate to protect critical systems.
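The difference between summing and multiplying the threat attributes can be seen with a small sketch. The 0 to 5 rating scale and the example values are assumptions chosen for illustration only.

```python
# Illustrative only: the 0-5 rating scale is an assumption, not a standard.

def summed_threat(capability: int, opportunity: int, intent: int) -> int:
    """Threat attributes are summed, so one strong attribute still counts."""
    return capability + opportunity + intent


def multiplied_threat(capability: int, opportunity: int, intent: int) -> int:
    """Shown for contrast: a product collapses to zero if any attribute is absent."""
    return capability * opportunity * intent


# Hypothetical adversary with strong intent but no capability or opportunity yet.
motivated_outsider = (0, 0, 5)
print(summed_threat(*motivated_outsider))      # 5: still registers as a threat
print(multiplied_threat(*motivated_outsider))  # 0: would be wrongly dismissed
```

With a sum, an adversary who currently lacks capability or opportunity remains on the radar, which matches the point that these may be acquired over time.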

Hazards vs. Threats

Threats are not predictable in the same way as hazards, meaning cybersecurity cannot be assessed in the same way as safety. Defense-in-depth strategies can help compensate for the diversity of threat actors and their wide range of capabilities. As such, it is important to recognize that as the threat landscape changes, so must our ability to defend the systems.

Hazards and threats are two distinct, but related items, as compared below.

Hazards (Safety)

  • Understood and known danger

  • Predictable based on historical trending

  • Create probabilities of incident

Threats (Security)

  • Not always well understood

  • Not so predictable

Hazards
  • Hazards are considered situations possessing inherent and known dangers. Examples of hazards include electrical, confined space, or flammable. The failure of a piece of electronics that causes a chamber filled with acid to overflow is also an example of a hazard. In general, this acid overflow hazard falls into the category associated with safety. In safety studies, we have proven historical data about equipment failures that is tied to known dangers and risks, and we can calculate probabilities on when undesirable events might happen. In some cases, we can calculate the actual average time it will take for a system or device to fail based on environmental factors and past use cases. But the data used to do this are based on predictable behavior.

  • The field of industrial automation has historically collected information on hazards that are used to develop safety guidelines. Databases of hazards and historical events are used to determine the probability of a dangerous event occurring. This in turn allows professional certification of systems to meet measurable safety requirements. Things to consider include system lifetime, mean time between failures, and other measurable attributes that can help system owners proactively manage the safety and resiliency of equipment while optimizing performance.

  • Hazards can be categorized. Certain attributes are associated with different hazards. These attributes offer analysts information that may be used to develop fairly precise forecasting of different types of events, allowing analysts to plan for certain incidents related to safe operation.
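To show why hazards lend themselves to calculation, the sketch below estimates mean time between failures (MTBF) from historical data and converts it into a probability of failure over a period. The fleet size, hours, and failure count are hypothetical, and the constant failure rate (exponential model) is an assumption added here, not something stated in the text.

```python
import math


def mtbf_hours(total_operating_hours: float, failures: int) -> float:
    """Mean time between failures from observed operating hours and failure count."""
    if failures <= 0:
        raise ValueError("at least one observed failure is needed to estimate MTBF")
    return total_operating_hours / failures


def failure_probability(period_hours: float, mtbf: float) -> float:
    """Probability of at least one failure in the period, assuming a constant failure rate."""
    return 1.0 - math.exp(-period_hours / mtbf)


# Hypothetical history: 10 identical valves, each run for one year (8760 h), 4 failures total.
mtbf = mtbf_hours(total_operating_hours=10 * 8760, failures=4)
p_one_year = failure_probability(period_hours=8760, mtbf=mtbf)
print(f"MTBF = {mtbf:.0f} h, P(failure within one year) = {p_one_year:.2f}")
```

No comparable calculation exists for a human adversary, because there is no stable failure rate to measure.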

Threats
  • Threats are not predictable. Cyber attackers, weather, animals chewing cables, personal events, or falling trees are all examples of threats. Even when threats are not man-made, it is still hard to accurately predict how and when they will occur. We don’t have data or granular information to help us determine if and when a threat-based event will happen. For human threats, this is especially difficult because we usually cannot define the combined value of capability, opportunity, and intent. Safety and security have significant roles in the resiliency and reliability of ICS. Safety and security are complementary, but the disciplines themselves are different. It is important to calculate security risk for control systems, and even more important to determine appropriate proactive and reactive security mitigation strategies.

  • Threats are also not predictable even when historical information exists. Being able to categorize threats and predict associated incidents with precision is difficult because people do unpredictable things. This unpredictability is often driven by a multitude of factors beyond the control of even the threat actor (e.g., weather, politics, personal events).

Threat Actor Categories

By better understanding the capabilities, intents, and opportunities of human threat actors, we can better design defenses for ICS. The types of threat actors can roughly be divided into three categories: mainstream; organized; and terrorist and nation state.

Mainstream
  • Group 1 is, historically, the largest threat group, although these mainstream threat actors are generally not well organized. The motivation of this group varies, but traditionally it has been related to notoriety, fame, or attacking a system to attract attention.

  • The fact that they are not always organized does not mean their technical skills are limited; those skills can be quite advanced. As their notoriety increases, the demand for their services (legal and illegal) increases.

  • Some cybersecurity researchers attack systems to improve their knowledge of how these systems work, which makes them more efficient programmers or engineers.

  • Although they usually operate independently, mainstream threats can combine to form small groups with limited organization.

  • Example:

    • Group 1: A Polish teenager modified a TV remote control to change tram track positions in Lodz. As a result, he caused four derailments, injuring 12 people.

    • The teenager had the capability to modify a TV remote control.

    • His intent was a prank.

    • His opportunity was his ability to trespass in the tram depots.

    • Source: http://www.theregister.co.uk/2008/01/11/tram_hack/

Organized
  • Group 2 consists of more organized threats, typically targeting a particular group or groups.

  • Group 2’s intent may be financial, revenge, theft of trade secrets, or drawing attention to a cause (hacktivists). Their attacks are more structured and sophisticated than Group 1, but it is not uncommon for Group 2 threats to include membership, capabilities, or skills traditionally found in Group 1.

  • As the structure of this group grows, there is the possibility of recruitment from Group 1 individuals to become part of a larger and more organized effort.

  • Example:

    • Group 2: Disgruntled traffic engineers who maintained the Los Angeles traffic control computers hacked into the system and modified signals at four major intersections. This caused major gridlock and it was 4 days before traffic could return to normal.

    • Their capability was the insider knowledge they had as a result of maintaining the traffic control system.

    • Their intent was that of disgruntled employees and was thought to be motivated by a pay bargaining dispute involving the Engineers and Architects Association.

    • Their opportunity was their ability to hack into the system.

    • Source: www.theregister.co.uk/2008/11/06/traffic_control_system_sabotage/

Terrorist/Nation State
  • Group 3 includes terrorist and nation state elements. The goals of this group’s attacks are to disrupt, terrorize, or eliminate major aspects of society. The impact or consequence of a Group 3 attack could be catastrophic.

  • Targeted groups include financial institutions, political establishments, military organizations, and media outlets. Intelligence sources are also concerned about utilities and manufacturing facilities.

  • Nation states with well-funded cyber warfare programs are also a concern.

  • Both terrorist and nation state threats have significantly more resources than Groups 1 and 2. As a result, Group 3 actors are able to launch more sophisticated attacks.

  • As Group 3 programs grow, it is expected they will recruit from, or use, capabilities, techniques, and procedures found in Group 1 and Group 2.

  • Example:

    • Group 3: ICS-CERT was notified of the existence of a new malware application called Stuxnet. It is believed to have been introduced through a portable media threat vector (USB stick). It contained more than 4,000 functions and used as much code as some commercial products. Stuxnet modified programs for a specific PLC and hid those changes. This attack was a game changer in the ICS hacking community because it is the first known malware to target a specific ICS configuration.

    • Though the author of Stuxnet is still unknown, it has all the hallmarks of a Group 3 attack. As one of the ICS-CERT advisories on Stuxnet reported, “The overall sophistication of the Stuxnet malware cannot be overstated.” According to Wikipedia, “The Guardian, the BBC, and The New York Times all reported that experts studying Stuxnet considered the complexity of the code indicates that only a nation state would have the capabilities to produce it.”

    • Source: http://www.cbsnews.com/news/report-stuxnet-delivered-to-iranian-nuclear-plant-on-thumb-drive/

Insider Threats
  • Can the security of an ICS be threatened by a trusted insider (an employee or vendor) who has specific knowledge of, and access to, the ICS? Absolutely! Recall that threat attributes are summed.

  • Even though an insider may not have intent, they certainly have substantial capability and opportunity, which may make them a significant threat.

Unintentional Threats
  • Based on known ICS cyber incidents to-date, the most likely ICS attacks originate from an insider, or from an external adversary who has acquired credentials to operate as a trusted insider.

  • An insider could be acting alone or as a member of a Group 2 Organized attack, or the more serious Group 3 Terrorist/Nation State attack. The attack may be unintentional or intentional. The causes of an unintentional incident include:

    • Deceived – social engineering, phishing

    • Mistakes

    • Poor training

    • Careless, taking shortcuts, fatigued

  • A mistake or failure to follow adopted policies can also cause a cyber incident on an ICS that is as severe as a deliberate attack. A well-trained system administrator is crucial to protecting an ICS from cyber attacks.

  • Example:

    • As an example of an unintentional threat, a 16-inch gasoline pipeline operated by Olympic Pipeline Company ruptured due to a pressure surge caused by a faulty pressure relief valve. The rupture released gasoline into Whatcom Creek in Bellingham, WA. This unfolding tragedy was exacerbated by the inability of the Supervisory Control and Data Acquisition (SCADA) to perform control and monitoring functions. The gasoline in the river was accidentally ignited, which resulted in an explosion and fire.

    • The explosion resulted in three fatalities, over $45M in property damage, and matching fines of $7.86M against two companies.

    • The database used by the pipeline SCADA system was modified in real time, without the necessary review to ensure the changes would not impact normal operations and safety of the pipeline.

    • The unchecked changes were implemented to the live database, causing a critical slowdown in system monitoring. These changes resulted in the SCADA system polling operational data from the pipeline every 6 minutes, rather than every 3 seconds.

  • Explosion Findings: Although the SCADA system was not directly responsible for the rupture of the pipeline or the explosion, it did contribute to the tragedy because it was not operating properly during a crucial time leading up to and following the pipe rupture. The findings of the National Transportation Safety Board included:

    • If the SCADA system computers had remained responsive to the commands of the Olympic controllers, the controller operating the accident pipeline probably would have been able to initiate actions preventing the pressure increase that ruptured the pipeline.

    • The degraded SCADA performance on the day of the accident likely resulted from the database development work done on the SCADA system.

    • Had the SCADA database revisions performed shortly before the accident been performed and thoroughly tested on an offline system, instead of the primary online SCADA system, errors resulting from those revisions may have been identified and repaired before they could affect the operation of the pipeline.

    • Olympic did not adequately manage the development, implementation, and protection of its SCADA system.
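The scale of the polling slowdown in this example is easy to underestimate. The short calculation below uses the two intervals given above (3 seconds and 6 minutes); the 10-minute observation window is a hypothetical value chosen only to make the comparison concrete.

```python
NORMAL_POLL_SECONDS = 3
DEGRADED_POLL_SECONDS = 6 * 60  # 6 minutes

# How many times slower the degraded polling was.
slowdown_factor = DEGRADED_POLL_SECONDS / NORMAL_POLL_SECONDS


def samples_in(window_seconds: int, poll_seconds: int) -> int:
    """Number of complete polls that fit in the observation window."""
    return window_seconds // poll_seconds


WINDOW_SECONDS = 10 * 60  # hypothetical 10-minute window during an upset
normal_samples = samples_in(WINDOW_SECONDS, NORMAL_POLL_SECONDS)
degraded_samples = samples_in(WINDOW_SECONDS, DEGRADED_POLL_SECONDS)
print(slowdown_factor, normal_samples, degraded_samples)
```

In that window an operator would normally see 200 updates, but only one under the degraded polling rate.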

Intentional Threats
  • Motivations for launching an intentional attack on an ICS could be related to those cited earlier, but may also include:

    • Recruited – blackmailed, bribed, embedded

    • Revenge – disgruntled, terminated

    • Curiosity

    • Financial

  • As an example of revenge, Mario Azar, an IT consultant for Pacific Energy Resources, successfully disabled an offshore oil platform’s leak-detection system remotely, using his company’s virtual private network (VPN) over the Internet. After receiving his last payment for contract work, Azar petitioned to continue work as a full-time employee, but Pacific Energy Resources declined to hire him.

  • Azar continued to remotely log into the leak-detection system, which was used to monitor three offshore oil platforms near Huntington Beach, CA. This resulted in impaired computer system monitoring for leaks on all three offshore platforms.

  • As ICS developers began to leverage interoperability and open-system connectivity, they moved away from isolated architectures. However, during this transition, many of the systems were still dependent on legacy hardware and software, and the requirements for availability often prohibited asset owners from taking their systems offline for long periods for updates. As a result, appropriate security defenses were not installed, and the ones that were installed often did not provide sufficient defense against modern-day attacks.

  • ICS defenses have not evolved as quickly as those in the corporate IT world, and in many cases the average ICS is still years behind current levels of cybersecurity found in non-ICS technology.

  • Because of the rapid integration of technology and networks between corporate IT and control systems, there is a huge push to protect legacy ICS from modern-day attacks. As many of the security countermeasures were developed for environments that set confidentiality as a primary focus, deploying such security mitigation technology into ICS environments (where both availability and integrity are primary objectives) can actually have a negative impact on productivity.

Attacker Tools and Techniques

The phases of the attack life cycle are reconnaissance/targeting, vulnerability assessment, and attack/penetration.

  • Just like a carpenter uses a variety of tools to build a house, an ICS threat actor also uses a number of different tools and techniques to execute an attack.

  • Specific tools are designed for each specific phase in the attack life cycle. Some tools research the target, some gain and maintain access to a system, and others launch an attack. Part of successfully defending a system depends on understanding your opponent’s capabilities.

  • Phishing: Email containing malicious files or links to nefarious websites

  • Denial of Service (DoS): Makes networks or computer resources unavailable

  • Social Engineering: Used to get privileged information from an insider on the targeted computer system or network

  • Zero Day Exploits: Take advantage of vulnerabilities not known to a broad community, and for which no countermeasure or mitigation has been developed

  • Malware: Malicious software

Malware

There are several common classifications of malware. Like the tools and techniques just reviewed, the intent is not to delve into details, but to give you an understanding of the basic definitions of these terms.

  • Backdoor: A method or program for bypassing authentication or obtaining remote access to a computer.

  • Botnet: A large number of infected computers that generate spam, relay viruses, or execute Denial of Service attacks.

  • Ransomware: Systems or data are held hostage by a cyber actor until a ransom is paid.

  • Rootkit: Code that modifies the operating system to maintain privileged access.

  • Trojan Horse: Malicious software posing as a useful program.

  • Virus: Parasitic software that copies and inserts itself into a host file or boot sector. They are unintentionally spread by transferring the infected file between computers.

  • Worm: Malicious software that independently replicates, executes, and travels across the network without a host program.

  • Many modern ICS threats and exploits are due to the rapid research advancement of more complex attack techniques. There is growth in the number of activities correlating to the system attack life cycle, such as:

    • Reconnaissance/targeting (Censys/EternalBlue/Shodan)

    • Vulnerability assessment (Sniper/Ettercap)

    • Attack/penetration (Metasploit, Gleg Agora, Nessus Scripts, Immunity Canvas)

Adversary and Research Capabilities

  • Now let’s look at adversary trends as the global interest in ICS security increases.

  • There has been an increase in ICS-specific presentations at conferences worldwide. There has also been an increase in collaboration within news groups specific to the cyber underground, along with more research and publication relating to ICS vulnerabilities.

  • The interest in ICS cybersecurity has grown tremendously due to several factors, including Stuxnet, an increase in open-source incident reporting, and the number of vulnerabilities being disclosed for coordinated research efforts. Overall, ICS cybersecurity is still fairly immature and is an attractive domain for researchers of all types.

  • Interest in ICS cybersecurity is driven by:

    • New independent research

    • Increasing number of disclosed vulnerabilities

    • Asset owner requirements

    • Vendor market differentiators

    • More understanding about influence of ICS on critical infrastructure

    • Increase in incident reports

    • Increase in easy-to-use attack tools

Control System Vulnerabilities

  • Because we are dealing with industrial automation, control system vulnerability discussions cannot take place without considering the consequences and impact on critical infrastructure. This makes ICS targets appealing to a broad audience and will attract interest from adversaries in all threat actor groups.

  • Finally, there is a notable increase in the interest in ICS cybersecurity because asset owners are introducing training and compliance efforts to their personnel. This, in turn, drives demand for briefings, conferences, and academic activities—all of which create literature that is available to the community at large.

Vulnerability

  • A vulnerability is any weakness that can be exploited by an adversary or caused through an accident. In our conversation, we’re mainly concerned about intentional attacks. For example, a hacker may use phishing scams to gain login credentials. They may possibly exploit an older or unpatched vulnerability in a system that you use. From there, they can pivot into different networks, including the control systems, and potentially cause great harm. Mitigating these vulnerabilities can be challenging.

  • What are some challenges when mitigating vulnerabilities? In ideal situations, asset owners will have a program in place that provides timely information about ICS vulnerabilities. Even with accurate vulnerability information, verifying the applicability of the vulnerability to an ICS can be difficult. Mitigating these vulnerabilities can be even more complex because:

    • Extensive testing needs to be performed prior to the application of a mitigation (such as applying a patch) to ensure it does not affect critical system functions.

    • If a patch or update is considered viable, strategic planning and downtime are required to implement it. In high availability control system environments, finding downtime can be challenging.

    • Even after testing, the system must be monitored to ensure the mitigation is working as intended.

Consequences

  • Financial loss and damage to our systems can have terrible results!

  • Historically, consequences have been measured in terms of financial loss and have been easy to calculate as they relate to IT systems. The calculations have included factors such as lost revenue, asset replacement cost, cost of system repair, etc.

  • The consequences with ICS are similar, but in many cases, other factors can contribute to the overall consequences. For example, a bridge operator uses a control system to raise and lower a bridge for passing ships. Imagine being locked out of the control system. Failure to raise and lower the bridge for passing ships could result in not only an accident, but a loss of life and confidence.

  • Some threats are beyond our control, like a hurricane. However, knowing a hurricane could hit can help business owners assess weak points and come up with a plan to minimize the impact.

Elevated Risk

  • Historically, when ICS were isolated, the cyber risk, measured in threat, vulnerability, and consequence, was limited since an intrusion would most likely originate from an insider accessing the control system.

  • While there were vulnerabilities in the ICS, the risk was perceived as acceptable because physical controls, such as door locks, were used to prevent unauthorized access.

  • What elements have contributed to past ICS incidents?

  • All sorts of things! People, processes, systems, components. Typically, we put those things in one of two groups: Cultural and Technical factors. The highest concentration of factors is technically-based.

  • Cultural - Cultural factors include any of the people or processes involved with designing, building, operating, and maintaining ICS. Today’s businesses require formerly isolated ICS to be connected with their corporate and customer networks and the Internet.

  • Technical - Technical factors have to do with the actual systems and components of which ICS are composed.

Cultural Factors
Cultural - People (Owner, IT, etc.)

While process or policy might prevent adequate cybersecurity, it is important to note that people created those processes. With a lack of knowledge and awareness of cybersecurity risk, decisions get made that can introduce technical vulnerabilities. Owners and ICS engineers haven’t always perceived that there were credible cybersecurity threats that justified the added expense of securing their control systems. This was true when these systems were isolated or air-gapped and running on proprietary hardware. However, as people gain a better understanding of the vulnerabilities created by an interconnected ICS, there is an increased awareness of the cybersecurity threats to their systems, which can lead to a lower overall risk.

Cultural - Policies & Procedures

Many subject matter experts consider culture to be the most important factor in developing and maintaining an effective control system cybersecurity program. Previously, processes and policies didn’t allow for threats to be considered, vulnerabilities to be protected, or consequences to be mitigated. Working under old assumptions and paradigms created opportunities for someone to access control systems.

Remember:

  • Be mindful of outdated processes and policies that don’t account for cybersecurity.

  • Culture allows for technical factors to increase in quantity.

Technical Factors
Technical - Vendors

When using vendors, it is important to be mindful of the vulnerabilities that can be introduced. Although the vendor and research community do an excellent job uncovering system-specific vulnerabilities, the implementation of countermeasures required to mitigate the vulnerability may result in the control system operating in an undesirable or unexpected manner. For example, an antivirus program can cause a 2 to 19 percent slowdown in passive discovery and a 6 to 57 percent slowdown in full-scan mode. In time-critical processing, this is not acceptable.

  • Be mindful of system-specific vulnerabilities

  • Keep in mind that solutions may hinder required functions of the ICS (for example, an antivirus program that locks up a control system, or another tool that requires an ICS to be shut down while running an update).

Technical - Cybersecurity

As technologies have developed over time, a security gap has been growing in ICS. Originally, ICS designers didn’t factor in cybersecurity as an issue when control systems, such as those found in water, electrical, and other facilities, were put in place. As a result, there are exploitable technical vulnerabilities at the network and device level. These vulnerabilities are found in both legacy systems and some current designs.

Vulnerabilities in Cybersecurity

  • Legacy devices (old modems, computers, ICS, etc.)

  • Current devices (holes in security between ICS and networks)

Technical - Interconnected Networks

As we’ve learned more and seen the consequences of other cybersecurity incidents, there has been a shift in perspective. Asset owners and operators are beginning to understand that interconnected IT and ICS networks can create opportunities for an adversary to gain access to control systems. This can include remote access capabilities, peer-to-peer networking, direct Internet connectivity, or network modifications that enhance business performance. When networks are linked and there is no protection between them, it creates a vulnerability.

Security issues created by integrating IT systems with ICS

  • Being more aware and knowledgeable will help. However, cybersecurity is always being balanced against the business perspective. Merging IT and ICS networks has optimized workflows and made access easier and more universal; these all increase revenue. But it also increases our vulnerabilities, which in turn increases risk! In order to counter threats, we need to understand how an adversary thinks. Will it be a targeted attack against a specific system? Or a broad set of systems within a larger corporate enclave? Understanding the intentions of the attack is important since it helps to know how an ICS could be compromised and why.

Technical - Increasing Threats

An alarming factor is that interest in ICS, and the number of malicious activity groups targeting them, is on the rise. A 2020 report showed an increase from 11 malicious activity groups to 15. These new ICS activity groups are primarily targeting energy and manufacturing.

One of the most important technological issues relating to cybersecurity and ICS is the fact that some vendors create their control system solutions to run on contemporary standard operating systems, such as Microsoft Windows. This means a vulnerability within an ICS may not be in the ICS application itself but in the operating system on which the application is dependent. Do you remember our bridge controller from earlier? His system was running an outdated OS that he hadn’t patched or updated. When an attacker can compromise an operating system, the attacker has compromised every application run on that system since they have control like a regular user.

Security Mitigations

A few things to consider when implementing security mitigations in ICS:

  • One, communication speeds. IT solutions like cryptography and firewalls can cause latency issues, which most ICS systems cannot tolerate.

  • Two, detection. Anomaly detection that looks for irregularities works better than intrusion detection systems that rely on signatures, since there may be no ICS-specific signatures to monitor.

  • Three, intrusion prevention isn’t always favored in ICS. Its active response could interrupt critical ICS operations and affect data availability and integrity.

  • Four, resources. Antivirus solutions, while great, can consume a CPU’s capacity, locking an operator out for long periods of time during a system scan.

  • Five, methods and tools that work in IT environments may not transfer well to ICS. There can be adverse and irreversible effects on equipment and services.

Although many security mitigation techniques are useful and effective in IT domains, requirements for data availability and integrity force us to revisit how we implement them within an ICS.

Network Discovery and Mapping

Discovery can be passive or active: passive discovery is much stealthier, while active discovery is aggressive in trying to learn things. In both cases, we are mapping out the environment. We are often asked to understand an in-production environment where there is no one left to ask, little documentation, and few suggestions on how to handle a particular performance issue.

Passive Discovery

What?

What is Passive Discovery?

  • Using information discovered from local memory of any host, to build a vision of an existing Control system environment.

  • Practicing safe methods to explore and perform reconnaissance.

  • Attempt to identify network details without sending network packets.

Why?

Why perform passive network discovery?

  • Safer practice regarding Control System networks (don’t want to break something).

  • Can yield information that active discovery may not be practical for, such as data found in various files.

  • Use tools passively

    • When exploring a Control System network, practice passive techniques when mapping.

    • Utilities and commands are not necessarily passive by definition. Using a tool passively is the responsibility of the user.

    • The daily operation of production Control Systems already creates expected traffic. Try not to interfere with or manipulate pathways when exploring.

  • Examples and Effects

    • Neglecting to disable name resolution in commands.

      • Resolution queries could alert an IDS unnecessarily.

    • Scanning your own host from the same host (to learn what it is running).

      • Self-inflicted scans will preoccupy a host’s network resources and may alert a host-based IDS.

    • Restarting services without planning (we often try turning things off and on again). For example, a watchdog timer may be checking for an open port, and if the restart fails to bring the service back up, the port remains closed.

      • Watchdog timers (which check for a particular state or change in state) could generate timeout signals and trigger alarms to an operator. Meaningless errors can appear in logs.

    • Clearing Cache

      • Clearing cache will cause bursts of packets to repopulate tables.

Artifacts

Tools
  • ipconfig, ip, ifconfig

  • netstat -anob/ netstat -pantu

  • route print/ route -n

  • iptables

  • tcpdump + wireshark

  • EtherApe

History + Logs
  • .bash_history

  • Browser History

  • Remote Desktop History

  • /var/log/messages

  • /var/log/syslog

Configuration files
  • crontab -l

  • /etc/network/interfaces

  • C:\windows\system32\Drivers\etc\hosts

  • /etc/resolv.conf

Cache
  • arp -an

  • nbtstat -c

  • ipconfig /displaydns

How?

ARP
Linux

arp -a -i eth0 performs name resolution, which sends network packets to the DNS server asking for names (this makes it active).

arp -a -i eth0 -n skips name resolution (more passive).

Windows

arp -a

EtherApe is a good tool to understand what traffic is being generated

  • Explore the ARP table

    • Control systems can participate using ethernet.

    • The ARP table is a great local cache to start with.

    • Use the arp command to view the table.

    • Take note of the MAC addresses mapped to IPv4 addresses.

    • Look up the vendor from the first 3 bytes of each MAC address (the OUI, Organizationally Unique Identifier) and work out what that vendor is known for in control systems (router, PLC, HMI, firewall, cameras?).

  • Why look at the ARP table?

    • Display a list of remote hosts or devices with which the host has recently communicated.

    • See if there are two ARP tables (which probably means the host has two network interfaces connected to different networks).

    • Check the table again later. It may change. If it does, this might be an indication of scheduled tasks. Investigate further.
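The ARP table review described above can be sketched in a few lines of Python. This is only an illustration: it assumes the usual Linux `arp -an` line layout, and the vendor table is hypothetical (a real lookup would use the IEEE OUI registry). It parses text that has already been captured, so it sends no packets itself.

```python
import re

# Hypothetical OUI-to-vendor table, for illustration only.
EXAMPLE_OUI_VENDORS = {
    "00:11:22": "Example PLC Vendor (hypothetical)",
    "aa:bb:cc": "Example HMI Vendor (hypothetical)",
}

# Matches lines such as: ? (10.10.10.5) at 00:11:22:33:44:55 [ether] on eth0
ARP_LINE = re.compile(
    r"\((?P<ip>\d{1,3}(?:\.\d{1,3}){3})\)\s+at\s+"
    r"(?P<mac>(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2})"
)


def parse_arp_output(text):
    """Return (ip, mac, oui) tuples; incomplete entries are skipped."""
    entries = []
    for line in text.splitlines():
        match = ARP_LINE.search(line)
        if match:
            mac = match.group("mac").lower()
            entries.append((match.group("ip"), mac, mac[:8]))
    return entries


def vendor_for(oui):
    """Look up the first three bytes of a MAC address in the example table."""
    return EXAMPLE_OUI_VENDORS.get(oui, "unknown")


SAMPLE = """\
? (10.10.10.5) at 00:11:22:33:44:55 [ether] on eth0
? (10.10.10.9) at <incomplete> on eth0
? (10.10.10.20) at AA:BB:CC:00:00:01 [ether] on eth0
"""

for ip, mac, oui in parse_arp_output(SAMPLE):
    print(ip, mac, vendor_for(oui))
```

Saving the parsed entries and comparing them with a later run is one way to notice the table changes mentioned above.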

IP
  • Check IP addressing

    • Control systems can also participate using IP.

    • HMI workstations may run PC operating systems. Learn their potential reach into other IP networks.

    • IP addressing commands can reveal much more than the IP address.

    • Compare with previously discovered MAC address mappings.

  • Why look so closely at IP addressing? PLCs, RTUs, and various SCADA devices are often controlled by HMI workstations. Knowing the IP connectivity is important for security awareness.

Windows

ipconfig /all

  • Check the hostname, whether IP routing is enabled (to see if it’s acting as a router), subnet, gateway, and DHCP/DNS servers.

Linux

ifconfig -a

  • ifconfig on its own shows only the interfaces that are up.

  • ifconfig -a shows all interfaces that are present, whether up or down (we might see VLAN, VPN, or bonded interfaces).

ip a or ip addr show eth0

DNS

cat /etc/resolv.conf

  • When a host is set to use a DNS server, generally ALL applications can query it.

  • HMI software is configured with network addresses. If the configurations use names instead of numeric IP addresses, we are at the mercy of the DNS server.

TCP/UDP Ports
  • Ports

    • Review any Listening or Established ports.

    • Compare TCP and UDP port numbers that may be associated with Control System vendors.

  • Control System Port Number Examples

    • BACnet/IP : UDP 47808

    • DNP3 : TCP 20000, UDP 20000

    • EtherNet/IP : TCP 44818, UDP 2222, UDP 44818

    • ICCP : TCP 102

    • Modbus : TCP 502

  • Well-known ports range from 0 - 1023

  • Registered ports range from 1024 - 49151

  • Dynamic ports range from 49152 - 65535
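The port ranges and protocol examples above can be captured in a small lookup. This is a sketch: the table holds only the ports listed in these notes, and the function names are made up for illustration.

```python
# ICS protocol ports taken from the examples listed in these notes.
ICS_PORTS = {
    ("udp", 47808): "BACnet/IP",
    ("tcp", 20000): "DNP3",
    ("udp", 20000): "DNP3",
    ("tcp", 44818): "EtherNet/IP",
    ("udp", 2222): "EtherNet/IP",
    ("udp", 44818): "EtherNet/IP",
    ("tcp", 102): "ICCP",
    ("tcp", 502): "Modbus",
}


def port_range(port):
    """Classify a port number as well-known, registered, or dynamic."""
    if not 0 <= port <= 65535:
        raise ValueError(f"not a valid port number: {port}")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic"


def describe(proto, port):
    """Return a one-line description of a protocol/port pair."""
    proto = proto.lower()
    name = ICS_PORTS.get((proto, port), "no ICS match in table")
    return f"{proto}/{port}: {port_range(port)}, {name}"


print(describe("tcp", 502))
print(describe("udp", 47808))
print(describe("tcp", 51234))
```

A port match is only a hint: any service can be bound to any port, so treat the result as a lead to verify.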

netstat

What is netstat?

  • Tool for looking at a host’s current network sessions and the listening ports it offers.

Why use netstat?

  • Determine which local servers are TCP or UDP based.

  • Search for potential connections being made with any known Control Systems.

  • View all currently Established connections taking place with HMI, Controller, Historian or other hosts.

Windows

netstat -ano -p tcp

-a: all sockets; -n: no name resolution; -b: owning process name; -o: owning process ID

Linux

netstat -pantu

-p: owning process ID and name; -a: all sockets; -n: no name resolution; -t: TCP; -u: UDP

  • Check the Local Address, Remote Address, and State columns. Listening means the host is accepting connections on that port; Established means the host is talking to another host, so check which ports are involved (ICS ports?).

  • From this we can often work out what the local machine is used for, for example an HMI (which connects to several devices and a database).

  • Check whether the IP addresses are in the same subnet or in different ones (for example, a file server, or an HMI accessing files from an outside network). This helps identify the different subnets and the boundaries between them.
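As a rough sketch of the review described above, the following Python reads saved `netstat -pantu` output and lists established TCP sessions whose remote port matches one of the ICS ports from these notes. The sample text, process names, and function name are hypothetical, and the column layout is assumed to be the common Linux one.

```python
# TCP ports for ICS protocols, taken from the examples in these notes.
ICS_TCP_PORTS = {102: "ICCP", 502: "Modbus", 20000: "DNP3", 44818: "EtherNet/IP"}


def established_ics_sessions(netstat_text):
    """Return (local, remote_ip, protocol) for established ICS-port sessions."""
    sessions = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expect: Proto Recv-Q Send-Q Local Foreign State [PID/Program]
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state != "ESTABLISHED":
            continue
        remote_ip, _, remote_port = remote.rpartition(":")
        if remote_port.isdigit() and int(remote_port) in ICS_TCP_PORTS:
            sessions.append((local, remote_ip, ICS_TCP_PORTS[int(remote_port)]))
    return sessions


# Hypothetical sample output for illustration.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      812/sshd
tcp        0      0 10.10.10.5:51234        10.10.20.7:502          ESTABLISHED 1432/hmi
tcp        0      0 10.10.10.5:40022        10.10.10.9:3306         ESTABLISHED 1432/hmi
udp        0      0 0.0.0.0:47808           0.0.0.0:*                           990/bacnetd
"""

for session in established_ics_sessions(SAMPLE):
    print(session)
```

In this sample the host talks Modbus to a device in another subnet and also to a database port, which fits the HMI pattern described above.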

Routing Table

What is a routing table?

  • A local table of IP network destinations that the host is able to reach.

Why look at the routing table?

  • Identify router/gateway IP addresses.

  • Identify network destinations.

  • Identify individual host destinations.

  • When viewing a route table, learn to notice the IP address ranges. Determine which ones appear public and which ones appear private.

  • Make note of any public IP addresses that may appear in configurations found on Control System networks.

  • Private IPv4 ranges:

    • 10.0.0.0 - 10.255.255.255 /8

    • 172.16.0.0 - 172.31.255.255 /12

    • 192.168.0.0 - 192.168.255.255 /16

  • If a public IP address appears in the route table, try to understand why the control system needs to talk to it.
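A minimal sketch of the public/private check, using Python's standard `ipaddress` module and the three private ranges listed above. The function names and the sample addresses are illustrative only.

```python
import ipaddress

# The three private IPv4 ranges listed above.
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_private_range(address):
    """True if the address falls inside one of the listed private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_NETWORKS)


def addresses_to_review(addresses):
    """Return the addresses that are outside the listed private ranges."""
    return [a for a in addresses if not is_private_range(a)]


# Hypothetical addresses copied from a route table.
found = ["10.1.2.3", "203.0.113.10", "192.168.1.1", "172.31.255.255"]
print(addresses_to_review(found))
```

Special entries such as 0.0.0.0 or 127.0.0.1 are outside these three ranges too, so filter them out before treating a result as a public address.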

Windows

route print

Linux

route -n or netstat -rn

  • A gateway entry of 0.0.0.0 indicates a directly connected network, reached through a local interface that has an IP address configured on it.

  • Check whether any static IP addresses are set up.

  • Any host with more than one interface can act as a gateway.

    • Linux: check /proc/sys/net/ipv4/ip_forward - 0 - Not forwarding - 1 - Forwarding.

    • Windows: Registry : HKEY_LOCAL_MACHINE\System\CurrentControlSet\services\Tcpip\Parameters

      • Check value IPEnableRouter
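The Linux forwarding check can be done by reading the file directly, which generates no network traffic. A small sketch, with hypothetical function names:

```python
from pathlib import Path

IP_FORWARD_PATH = Path("/proc/sys/net/ipv4/ip_forward")


def interpret_ip_forward(raw):
    """Translate the file content into the meaning given in the notes."""
    value = raw.strip()
    if value == "0":
        return "not forwarding"
    if value == "1":
        return "forwarding"
    return f"unexpected value: {value!r}"


def read_ip_forward(path=IP_FORWARD_PATH):
    """Return the interpretation, or None if the file cannot be read."""
    try:
        return interpret_ip_forward(path.read_text())
    except OSError:
        return None


print(read_ip_forward())
```

On a host with several interfaces, a result of "forwarding" suggests it may be routing between networks and deserves a closer look.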

netBIOS

What is netBIOS?

  • Network Basic Input/Output System (netBIOS) - allows applications on different computers to communicate within a local area network.

  • Used by Microsoft File and Printer Sharing

How can netBIOS be helpful?

  • Discover networks and hosts by looking at the netBIOS cache (nbtstat -c).

  • The cache contains recently contacted systems.

  • Check the naming convention of the names. For example: FSWCB1, AD2.

    • FS/AD might represent FileServer or Active Directory.

    • Numbers 1,2 might highlight that there could be more than 1 server.

TCPDump/Windump
  • Captures and analyses common network traffic from the command line.

  • Uses standard libpcap/winpcap to capture/parse network traffic.

  • Uses Berkeley Packet Filter (BPF) syntax for creating capture filter expressions.

  • tcpdump can also generate traffic of its own, so use -n to avoid name resolution.

  • Any tool can have vulnerabilities, so it is better to run it as a different, unprivileged user with -Z username.
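To tie the points above together, here is a sketch that builds a BPF capture filter and a tcpdump argument list in Python without running anything. The interface, file, and user names are placeholders; `-n`, `-i`, `-w`, and `-Z` are standard tcpdump options.

```python
def bpf_for_ports(ports):
    """Build a BPF expression from (protocol, port) pairs."""
    return " or ".join(f"{proto} port {port}" for proto, port in ports)


def tcpdump_argv(interface, outfile, user, ports):
    """Return an argument list for a capture limited to the given ports.

    -n avoids name resolution, -w writes raw packets to a file, and
    -Z drops privileges to the named user.
    """
    return [
        "tcpdump", "-n",
        "-i", interface,
        "-w", outfile,
        "-Z", user,
        bpf_for_ports(ports),
    ]


# Placeholder values for illustration.
argv = tcpdump_argv(
    interface="eth0",
    outfile="ics_capture.pcap",
    user="pcapuser",
    ports=[("tcp", 502), ("tcp", 20000), ("udp", 47808)],
)
print(" ".join(argv[:-1]), repr(argv[-1]))
```

Passing the list to a process launcher as separate arguments keeps the filter expression as a single argument. The user named with -Z typically needs write access to the directory holding the output file.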

wireshark
  • GUI network protocol analyzer and packet sniffer.

  • Uses the standard libpcap library for opening and capturing network traffic.

  • Customizable dissectors (modules) for proprietary protocols.

  • Security Notes:

  • Vulnerabilities in Wireshark could leave your system at risk of compromise if used on active networks.

  • Not required to run with root privileges

  • Long-term traffic monitoring should be done with “tcpdump”

  • Rule of Thumb: Capture with tcpdump and analyze with Wireshark using a normal user account.

Files and Others

Browser history

Control system facilities may have workstations where various routine operations are performed. If particular personnel are no longer available, we can still explore a frequently used browser to collect information passively.

  • Address bar pre-populating with any URLs.

  • Saved usernames and passwords.

  • Bookmarks or Favorites, relating to Control Systems addresses.

  • Keystrokes to open recently closed tabs and windows.

  • Learn how to explore the temporary cache of the specific browser.

.bash_history

What is .bash_history file?

  • History file containing a record of executed commands.

  • Every user of the system has their own history file. It is located in the home directory of each user.

  • Files starting with a period appear hidden by default.

Why look at .bash_history file?

  • Routinely executed commands help identify what tasks are performed at the workstation.

  • Host addresses and filenames could appear with specified commands, such as ssh, wget, ftp, rsync, mail, and others.

  • People make mistakes. It may also contain usernames and passwords.

  • Use the history command to view the contents.

  • Check for any new IP address or any file extensions that might be of interest or any mail commands (employee addresses/file names) or any mysql commands (username, password or database name, or remote host (if not present that means mysql server is locally hosted)).

  • It may provide info on local hosts directories.

  • Check whether any commands show that a USB drive, HDD, or SSD was connected (any /media entries).
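The checks above can be sketched as a small script that summarizes history lines. The command list and sample lines are illustrative assumptions; in practice the lines would come from a copy of a user's .bash_history file.

```python
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Commands that usually involve a remote host or account (illustrative list).
REMOTE_COMMANDS = {"ssh", "scp", "wget", "ftp", "rsync", "mail", "mysql"}


def summarize_history(lines):
    """Collect IP addresses, remote-access commands, and /media paths."""
    addresses, remote_commands, media_paths = set(), [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        addresses.update(IPV4.findall(line))
        if line.split()[0] in REMOTE_COMMANDS:
            remote_commands.append(line)
        if "/media/" in line:
            media_paths.append(line)
    return {
        "addresses": sorted(addresses),
        "remote_commands": remote_commands,
        "media_paths": media_paths,
    }


# Hypothetical history content for illustration.
SAMPLE = [
    "ssh operator@10.10.10.7",
    "ls /media/usb0/backups",
    "wget http://192.168.1.20/firmware.bin",
    "cd /opt/hmi",
]

summary = summarize_history(SAMPLE)
for key, values in summary.items():
    print(key, values)
```

Working on a copy of the file rather than running commands in the user's shell avoids adding new entries to the history being examined.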

Active Discovery

What?

What is Active network discovery?

  • Send network packets and wait for a response in order to identify host and network targets

  • Can be extremely noisy and easily detected

Why?

Why use active discovery methods?

  • Identify targets that cannot be otherwise identified using passive discovery techniques.

  • Provides specific service, port, and version information for a given target.

  • Identify vulnerabilities of accessible services.

How?

arp-scan

arp-scan -g 10.10.10.2/24

nmap
  • Designed to allow system administrators and individuals to scan large networks to determine which hosts are up and what services they are offering.

  • A network discovery tool that can be used to identify the systems currently connected to the network.

  • nmap allows auditing of the services running on the identified hosts.

  • Can be dangerous to IT, SCADA and PCS systems, ICSs and embedded devices.

What is Nmap?

  • Open source tool for network mapping and security auditing.

Why use nmap?

  • much faster than manual discovery.

  • can scan an entire network quickly, and offers several options to customize a scan and its results.

How does nmap work?

  • Hosts on the network

  • Services (ports)

  • Operating systems etc.

Two-stage process

  • Host discovery

  • Port scanning

nmap - Discovery methods
  • User Datagram protocol (UDP)

    • unreliable stateless communication

    • No handshaking

  • Transmission Control Protocol (TCP)

    • Reliable stateful communication

    • 3-way handshake

  • Internet Control Message protocol (ICMP)

    • Provides control, troubleshooting, and error messages.

    • Normally used by ping and trace route commands.

  • Address resolution protocol (ARP)

    • Discovers Link Layer addresses of network devices.

    • Communicates in the bounds of single network.

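Three-way handshake
  • The client sends SYN, the server replies with SYN-ACK, and the client completes the connection with ACK.
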
Host Discovery
  • What is host discovery (HD)?

    • process of identifying active and interesting hosts on a network.

  • Why does Nmap do HD?

    • To significantly reduce the amount of time to complete network scans.

    • Narrows a set of IP ranges into list of active or interesting hosts to be port scanned.

  • How does HD work?

    • Uses combination of ARP, ICMP, TCP SYN, TCP ACK packets to identify active hosts.

  • Default Host Discovery Settings

    • LAN: sends an ARP scan (-PR).

    • WAN (privileged): sends a TCP ACK packet to port 80 (-PA) and an ICMP echo request (-PE).

    • WAN (unprivileged): sends a TCP SYN packet (-PS) using the connect() system call instead of a TCP ACK packet.

    • By default, nmap uses ARP for host discovery on the local network. To use IP-level probes such as ICMP instead, use --send-ip.

    • Host discovery options start with -P.

Port Scanning
  • What is port scanning?

    • process of identifying the status of interesting ports on hosts that are discovered on a network.

  • Why does nmap do port scanning?

    • to identify ports that are open on a host

  • How does port scanning work?

    • attempts to communicate with each port in a specified set of ports.

    • port scans are performed on hosts that were identified as active or interesting during HD.

  • Nmap Port states

    • Open: Application on target machine is listening for connections or packets on that port.

    • Closed: No application listening at the moment

    • Filtered: Firewall, filter or other network obstacle is blocking the port so that Nmap cannot tell if the port is open or closed. Nmap received no response.

    • Unfiltered: Port is accessible but nmap not able to determine if open or closed.

    • Open | Filtered: Unable to determine if open or filtered.

    • Closed | Filtered: Unable to determine if closed or filtered.

  • Nmap default port scanning settings

    • SYN scan (-sS) for privileged users.

    • Connect scan (-sT) for unprivileged users.

  • Options that start with -P control host discovery; options that start with -s select the port scanning technique.

Timing and Performance options
  • What are timing and performance options?

    • Settings used to control scanning delays, timeouts, retries and parallelism.

  • Why use timing and performance options?

    • Help speed up scanning process

    • Slow down scan to avoid IDS detection

  • Timing and performance options

    • Manual options are available but templates are usually sufficient

    • Timing templates (-T0 through -T5) offer throttling abilities not available through the manual options.

Nmap results
  • Why save your nmap results?

    • Easier to analyze and compare scan results (using ndiff).

    • Results overflow the console window buffer.

  • Output options

    • -oN filename.nmap: Output results in normal format

    • -oX filename.xml : Output results in XML format

    • -oG filename.gnmap: Output results in grepable format

    • -oA filename: Output results in all formats.

    • -v: Verbose output

  • --reason shows why each port is reported in its state.
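Saved XML output (-oX) is convenient to process with a script. The sketch below uses Python's standard XML parser on a trimmed, hypothetical sample that follows the usual host/address/ports/port/state layout of Nmap XML; real files contain many more elements and attributes.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample in the style of Nmap's -oX output.
SAMPLE_XML = """<nmaprun>
  <host>
    <status state="up"/>
    <address addr="10.10.10.7" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="502"><state state="open" reason="syn-ack"/></port>
      <port protocol="tcp" portid="80"><state state="closed" reason="conn-refused"/></port>
    </ports>
  </host>
</nmaprun>"""


def open_ports(xml_text):
    """Map each IPv4 address to a list of (protocol, port) that are open."""
    root = ET.fromstring(xml_text)
    result = {}
    for host in root.findall("host"):
        address = host.find("address[@addrtype='ipv4']")
        if address is None:
            continue
        ports = []
        for port in host.findall("ports/port"):
            state = port.find("state")
            if state is not None and state.get("state") == "open":
                ports.append((port.get("protocol"), int(port.get("portid"))))
        result[address.get("addr")] = ports
    return result


print(open_ports(SAMPLE_XML))
```

Keeping the dictionaries from two scans and comparing them gives a simple view of what changed, similar in spirit to ndiff.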

OS and Version detection
  • What is OS and version detection?

    • Identifies the operating system by looking at packet characteristics.

    • Identifies the version of a service running on a host.

  • Why use OS and version detection?

    • Provides information that could help in the selection of exploits and payloads used against a target

  • How does OS detection work?

    • Nmap sends a series of TCP and UDP packets to the remote host and examines every bit in the responses.

    • Nmap compares the results to its database of known OS fingerprints and prints out the OS details if there is a match.

  • How does Service and Version Detection Work?

    • After TCP and/or UDP ports are discovered, version detection interrogates those ports.

    • Database of probes for querying various services and match expressions to recognize and parse responses.

    • Tries to determine application name, version number, hostname, device type, OS family, and misc. information.

Nmap Address Schemes
  • Target hosts can be specified in many ways

    • 1.2.3.1-254: All 254 possible IP addresses on this subnet.

    • 1.2.3.0/24: Nearly equivalent to the above (it also covers .0 and .255), signifying a Class C address block.

    • 1.2.1-4.1-254: Ranges are allowed for subnets as well.

    • 1.2.0.0/16: The 16-bit netmask will scan the entire Class B address block.

  • --exclude: exclude a host or range.

  • -sn: only perform the host discovery phase (no port scan).
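To see how many addresses a target specification covers before scanning, the octet-range notation can be expanded with a few lines of Python and compared against CIDR blocks using the standard `ipaddress` module. This sketch handles only simple `low-high` ranges, not every form Nmap accepts, and the function names are made up.

```python
import ipaddress
from itertools import product


def expand_octet(spec):
    """Turn '3' or '1-254' into the range of values it covers."""
    if "-" in spec:
        low, high = spec.split("-")
        return range(int(low), int(high) + 1)
    return range(int(spec), int(spec) + 1)


def expand_range_target(target):
    """Expand a target such as 1.2.1-4.1-254 into individual addresses."""
    parts = target.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four octets: {target}")
    octets = [expand_octet(part) for part in parts]
    return [".".join(map(str, combo)) for combo in product(*octets)]


print(len(expand_range_target("1.2.3.1-254")))
print(ipaddress.ip_network("1.2.3.0/24").num_addresses)
print(len(expand_range_target("1.2.1-4.1-254")))
print(ipaddress.ip_network("1.2.0.0/16").num_addresses)
```

The count is a quick sanity check on scope: a /16 covers tens of thousands of addresses, which matters a great deal on fragile networks.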

ICS challenges
  • Scans can cause a computer system to restart.

  • Scans can cause embedded devices to freeze or lose configuration, and in some severe cases recovery requires vendor involvement.

  • Nmap considerations

    • Use connect scan (-sT) to prevent dangling connections.

    • Don’t use OS detection (-O) or version detection (-sV) (control systems run embedded devices such as PLCs and RTUs, which these probes can disrupt).

    • Slow the scan down by reducing the rate at which Nmap generates and sends packets.

    • Consider using exclusion lists (--exclude or --excludefile).
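The considerations above can be turned into a helper that only assembles a cautious Nmap command line (it does not run it). The defaults here, such as the polite timing template and a rate of 5 packets per second, are assumptions for illustration, not recommended values for any particular site.

```python
def cautious_nmap_argv(targets, exclude=None, max_rate=5, output_base="ics_scan"):
    """Build an nmap argument list following the ICS considerations.

    -sT: connect scan; -n: no name resolution; -T2: polite timing;
    --max-rate: cap packets per second; -oA: save all output formats.
    OS detection (-O) and version detection (-sV) are left out on purpose.
    """
    argv = [
        "nmap", "-sT", "-n", "-T2",
        "--max-rate", str(max_rate),
        "-oA", output_base,
    ]
    if exclude:
        argv += ["--exclude", ",".join(exclude)]
    return argv + list(targets)


# Hypothetical targets and exclusions.
argv = cautious_nmap_argv(["10.10.10.0/24"], exclude=["10.10.10.5", "10.10.10.6"])
print(" ".join(argv))
```

Reviewing the printed command with the system owner before running it, and trying it against a test system first, fits the caution these notes call for.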

Nessus Vulnerability Scanner
  • Can be dangerous to ICSs.

  • Plugin modules for various ICS protocols.

  • Security auditing tool that consists of two parts:

    • Server (in charge of the scanning process).

    • Client (presents the interface to the user).

Nessus ICS Plugins
  • Areva/Alstom Energy Management System

  • DNP3 Binary Inputs access

  • DNP3:

    • Link layer addressing DNP3

    • Unsolicited Messaging

  • ICCP

    • ICCP/COTP protocol

    • ICCP/COTP

    • TSAP Addressing

    • LiveData ICCP Server

  • Matrikon OPC Explorer

  • Matrikon OPC Server for ControlLogix

  • Matrikon OPC Server for Modbus

  • Modbus/TCP

    • Coil access

    • Discrete Input Access Programming

    • Function Code Access

Network Defense, Detection and Analysis

Identify

Asset and Information inventory

An asset inventory is necessary to understand and manage ICS risk and determine priorities for security defenses. The asset inventory is critical for understanding the potential impact of an intrusion.

Know your environment
What?
  • Needs to be protected (PLC, pump, valves, non-electronics still something physical - how it is protected?)

  • Protection levels are available (What is available by vendors to protect the systems). How data is gathered from the ground-up?

  • Inter-connections and dependencies are required (what talks to what?, pump talking to PLC (controlling pump speed or flow) if not it might cause something to fail?)

Why?
  • Are systems critical (any special use, any special vendor?)

  • Are assets valuable ($$ and information)(produce gas or oil, electricity?)(Does the information provide insights to business to make decisions?)

Who?
  • Has responsibility for the asset (Who’s responsible System Admin, SPOC (single point of contact))

How?
  • Are worst-case scenarios identified if compromised (Do we have any plans in place in terms of outside/inside attacker?)

  • Are methods available for user access to the asset (Does the person have to visit the control room to access the devices, or can they be accessed remotely or via VPN?)

  • Does the information flow through the system (where does it start and stop? Does it go to a firewall? To the business IT network?)

Other
Field Devices
  • Easy to forget in asset inventory - “out of sight, out of mind”.

  • Field devices may be accessed remotely because it is more convenient or may require that a human being physically visit the remote device. When accessing remotely make sure the communication is secure and the device accessing the field devices is secure.

  • Security Challenges regarding Field Devices.

    • No centralized management for older field devices.

    • May lack security capabilities (maybe serial only, make sure we understand what capabilities they have)

    • Increased use of portable devices to access field devices (Laptops/Tablets?).

  • Possible Mitigations

    • Lock down unneeded services, ports and restrict access (Disable unused ports on the switch).

    • All devices used to interface with the field devices should be secured and monitored (have anti-virus and properly logged and accounted for).

    • Think about what devices are present and how they are communicating with central system and how they are controlled?

Least Functionality

  • Determine necessary ports, protocols, and services (What are the vendor recommendations/talk to the vendors what needs to be open on firewall/router)

  • Deny all others at the host and firewall

  • Harden devices (be careful while hardening and test whether everything is working or not; Never test on live system.)

  • Network access control (What can talk to what or each others? )

  • Use the data from a scan, such as Nmap, to identify unused ports and services, and disable them. This should be done at the host. However, if it cannot be done at the host, use other mitigations, such as a firewall, to block any access to the services or any traffic leaving these hosts on these ports.

  • Hardening systems using security guidelines or controls will also reduce your attack surface. Work with vendors to determine hardening guidelines/settings for ICS equipment
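The comparison between what a scan found and what is actually needed is a set difference. A minimal sketch, where the port lists are hypothetical and the required list would come from vendor guidance:

```python
def ports_to_disable(observed_open, required):
    """Return open (protocol, port) pairs that are not on the required list."""
    return sorted(set(observed_open) - set(required))


# Hypothetical data: ports found open on a host and ports the vendor requires.
observed = [("tcp", 502), ("tcp", 23), ("tcp", 80), ("udp", 161)]
required = [("tcp", 502)]

for proto, port in ports_to_disable(observed, required):
    print(f"candidate to disable or block: {proto}/{port}")
```

The result is a list of candidates only; confirm each one with the vendor and test on a non-production system before disabling anything.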

Least Privileges

  • Establish user accounts for administrators (separate accounts for engineers and administrators; test that they are able to do their work and perform their responsibilities).

  • Appropriate use of the escalated privilege function (check whether the user needs escalated privileges, that their use is logged properly, and that they are used only when really required).

  • Review work requirements for necessary access requirements

  • Role-based access (provide appropriate access for appropriate person).

Tools

GrassMarlin (Retired)
  • GrassMarlin can be used to identify traffic and systems on ICS network.

  • GrassMarlin is a passive network mapper dedicated to ICS and SCADA networks in support of network security assessments.

  • GrassMarlin passively maps, and visually displays, an ICS/SCADA network topology while safely conducting device discovery, accounting, and reporting on these critical cyber‐physical systems.

GrassMarlin gives a snapshot of the ICS network including:

  • Devices part of the network;

  • Communications between these devices;

  • Metadata extracted from these communications.

  • Reads in Zeek Connection logs, PCAP files and PCAP-NG files or can listen on the wire

Protect

IT-OT Convergence

  • Does IT/OT talk to each other? (They should be able to work together and help each other and whenever they have problems they talk to each other and solve problems by respecting each other.)

  • What we can do to improve communication between IT and OT teams (invite them to meetings, talk to them regarding something they are expert in and can help (firewall issues))

Human element

  • Policies and Procedures specific to ICS

    • Outline rules with regard to securing ICS (What kind of things we need to secure?)

    • Computer use policy (helps to understand what’s expected and what’s not)

  • Make security a priority (everyone should be aware of the ICS security)

  • Training and awareness

    • Employees are part of your defense (They are the most important people. Employee errors or unintentional actions often lead to up to 50% of incidents).

    • See something, say something (If they see something that is not right, ask them to mention)

    • Talk about security in staff meetings (something going on in your network, group or unit and training around security)

  • Lessons learnt from past incidents

    • User education is important.

    • Do regular phishing tests (As an OT person, we can take help of IT department to set this up.).

    • Explain to users the consequences of clicking bad links (people often don’t understand why clicking on links is bad; once they understand, they are more careful).

OPSEC

  • Operational Security, or OPSEC, is when we protect unclassified information from leaking out via our own actions and behaviors. The goal of Cybersecurity OPSEC is to minimize your digital footprint and information leakage, and to minimize the damage when things go bad. In the best of scenarios you might almost drop off the grid completely. Remember that OPSEC does not replace any other security disciplines - it supplements them.

  • Always be aware of what your company is presenting to the outside world (what does your network look like from outside? Do we have an FTP/SSH server accessible from the internet?)

  • Do you know what is on your company’s external webpage and social media feeds?

  • Are vendors using your company for free advertising?

  • Are your IP address ranges showing up in Shodan ICS? If you give data to vendors, do you know how they are storing it?

  • The OPSEC process is organized into 5 questions/steps. One of the first questions is: who would want access to the data in question, and what needs to be protected?

The OPSEC process
  • Cybersecurity practices can prevent the disclosure of critical information to threat actors. A primary security goal is to control information about your organization’s capabilities and intentions in order to prevent such information from being exploited. The longer it takes an adversary to obtain critical information, the more time you have to discover problems and block access to the information and your assets. In addition, most of us already use cybersecurity practices in our personal lives without even realizing it.

  • Cybersecurity practices include: identifying critical information, analyzing the threat, analyzing the vulnerabilities, assessing risk, and applying countermeasures. In all steps, view the situation from both friendly and adversarial points of view.

  • Practicing cybersecurity is a continuous process, not one that “ends” when you complete the fifth step. In fact, the steps do not necessarily have to be followed in a particular order.

Identify Critical Information
  • What needs to be protected?

  • What might adversaries want to do? What information will the adversaries need to accomplish their goal? (Be sure to analyse this from both friendly and adversarial points of view.) It is the aggregation of information that can be gathered on a target that poses the threat.

  • A company’s critical information could include:

    • Network diagrams

    • Employee data: email addresses and work schedules

    • Lists of usernames

  • Social media profiles are often analysed to aggregate information. Profiles are gold mines of information for attackers. They provide an idea of:

    • What people do.

    • Where they work.

    • What types of software are used.

    • Any issues that can be replicated in corporate environments.

    • Profile pictures are also useful when gathering information.

  • Before you post comments or share content on support forums and social media, ask yourself: “Does this give an attacker any information they could use to build (or further build) a profile on me or my company?”

Analyse the Threat

Who is the threat?

  • A threat is a potential danger. It is often defined as any person, circumstance, or event with the potential to cause loss or damage. Threat requires both intent and capability. If one of these isn’t present, there is no threat.

  • To analyse a threat, we need to identify

    • Who are the potential adversaries (e.g., competitors, insiders, terrorists)?

    • What is the adversary’s intent and what capabilities do they have? For example, a disgruntled employee might have different capabilities than a competitor.

    • What does the adversary already know? For example, what might they know from researching information published on the Internet or in trade journals.

    • What does the adversary need to know to succeed (e.g., control system commands, how to gain remote or physical access)?

    • Where is the adversary likely to look to obtain the information (remember, an adversary is apt to go to more than one source)?

  • Thinking from the adversary’s point of view will help you analyze the threats in your work environment.

Analyse the vulnerabilities

What are my vulnerabilities?

  • Determine the weaknesses (that is, vulnerabilities) that may be exploited by an adversary to gain critical information. Vulnerabilities include:

    • Inadequate training of employees

    • Use of unsecured communications

    • Publishing the control system manufacturer or vendor used

    • Systems designed without security in mind.

  • It is important to think like the adversary in this step. One way to discover vulnerabilities is to look for indicators.

    • Indicators are observable or detectable activities or information that, when looked at by themselves or in conjunction with something else, point to a vulnerability regarding your organization’s operations. For an adversary, indicators are clues that a vulnerability exists and can be exploited.

    • For example, a fence suddenly put up where one did not exist before could tip off an adversary that something valuable is inside the fence. Other examples of indicators include: people in unusual places, unfamiliar cars in an employee parking lot, and late-night meetings. Although indicators are not vulnerabilities by themselves, they can point to or reveal vulnerabilities.

Assess Risk

What is the threat level?

Assessing risk incorporates using the risk formula and conducting risk assessments

  • Risk is the likelihood that an adversary will gather and exploit your critical information, thereby having a negative impact on your organisation.

  • Risk is the product of Threat x Vulnerability x Consequence

    • Threat: Any person, circumstance or event with the potential to cause loss or damage.

    • Vulnerability: Any weakness that can be exploited by an adversary or by accident.

    • Consequence: The negative impact (loss or damage) your organisation would incur if an attack were successful.

  • Risk increases when any factor increases. If a factor is missing, risk doesn’t exist because zero multiplied by anything is always zero.

    • For example, if we are certain that no person or organisation is interested in causing your company harm, then there is no threat and therefore no risk (this situation is highly unlikely, because there are always people who act maliciously just because they can).

    • Similarly, if your network and protection devices (e.g. firewalls) are properly patched with all the latest updates, the vulnerability (and the associated risk) in this area may be greatly reduced.

    • Finally, if a threat and a vulnerability exist but the consequences are nonexistent or minimal, then the risk is also nonexistent or minimal.

  • Risk assessment is a process in which you decide if a countermeasure needs to be assigned to a vulnerability based on the level of risk this vulnerability poses to your organization.

  • When you assess a vulnerability, also consider the adversary’s intent and capability—is the adversary willing to exploit your vulnerability, and does he or she have the means to do so? Next, determine the consequences if the vulnerability were successfully exploited. This determines the level of risk. You then decide if the level of risk warrants the application of one or more countermeasures.

  • Looking at risk as a function of consequence (as opposed to asset value) may allow for easier calculations applicable to control system environments. Elements critical to the control domain, such as loss of life, time to recover, and environmental impact, can help in these calculations.

  • Keep in mind that consequences aren’t always something that have an immediate financial impact. The failure of a control system could result in negative media attention.
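The risk formula above can be sketched in a few lines of Python. This is only an illustration: the 0.0 to 1.0 scoring scale, the function name `risk_score`, and the sample findings are assumptions made for the example, not part of any standard.

```python
def risk_score(threat: float, vulnerability: float, consequence: float) -> float:
    """Risk = Threat x Vulnerability x Consequence, each scored 0.0 to 1.0 (assumed scale)."""
    for factor in (threat, vulnerability, consequence):
        if not 0.0 <= factor <= 1.0:
            raise ValueError("each factor must be between 0.0 and 1.0")
    return threat * vulnerability * consequence


# Hypothetical findings: (name, threat, vulnerability, consequence)
findings = [
    ("Unpatched HMI reachable from corporate LAN", 0.8, 0.9, 0.9),
    ("Vendor name published on public website", 0.6, 0.3, 0.2),
    ("Isolated test PLC with default password", 0.0, 1.0, 0.5),
]

# Rank findings so countermeasures go to the highest risk first.
ranked = sorted(findings, key=lambda f: risk_score(*f[1:]), reverse=True)
for name, t, v, c in ranked:
    print(f"{risk_score(t, v, c):.3f}  {name}")
```

The third finding scores zero because its threat factor is zero, which mirrors the point that a missing factor removes the risk.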

Apply countermeasures

How should we combat the threats?

  • A countermeasure can be anything that reduces an adversary’s ability to exploit vulnerabilities. Countermeasures don’t need to be complicated or expensive. For example, locking your car door and removing the keys from the ignition are simple, smart ways to make it harder for someone to steal your car.

  • Countermeasures are implemented in an order of priority directly proportionate to the risk posed by different weaknesses (the most significant consequences to your mission, operation, or activity). Often implementing several low-cost countermeasures provides the best overall protection.

  • Consider all possible countermeasures, and then assess the potential effectiveness of each one against a specific vulnerability or multiple vulnerabilities.

A few countermeasures:

  • Controlling Distribution: Limiting sharing of information to those who need it.

  • Cyber Protection Tools: Implementing anti-virus software, firewalls, and intrusion detection systems can greatly reduce an adversary’s ability to cause damage.

  • Speed of Execution: Accelerating the schedule can limit the ability of an adversary to act on the information they have obtained.

  • Awareness Training: Educating employees about all aspects of cybersecurity practices is one of the most effective countermeasures.

  • Physical Security: While it may be wise to employ security guard patrols, an organization must also ensure that patrol schedules are somewhat randomized, and shift changes are kept secret in order to prevent an intruder from determining a pattern.

Secure Passwords

  • Adversaries focus on gaining legitimate credentials to traverse the network

  • NIST SP 800-63B Guidelines (Digital Identity Guidelines - Authentication and Lifecycle Management)

    • Fewer complexity rules enforced

    • Expiration of passwords is no longer based on a time schedule (if the passwords are strong, there may be no need to change them on a fixed schedule)

    • Passwords should be screened against lists of dictionary words and common, easily guessed passwords (tell employees that the organisation will try to guess and crack their passwords, so that they create strong ones)

    • Allow paste functionality from password managers (and store your passwords in a safe, secure location)

  • Industry compliance documents or your organisation policies may differ.

    • NERC CIP standards (CIP-007-5)

    • NIST SP 800-53

  • Base password - Think of three of your favorite things.

    • For example: Let’s say we love ice cream, tacos, and vinyl records, which gives us MintChipVinylTacos

    • Now separate each word with your favorite character. Let’s say we love money so separate it with dollar or pound. $Mint$Chip$Vinyl$Tacos$.

    • Now, add a familiar number like your postal code or reversed birth year, like $Mint$Chip$Vinyl$Tacos$90277

    • Now, the above password meets all requirements like upper, lowercase, numbers, special characters etc.

    • The above can be your base password and that’s a pretty strong password.

    • People tend to take the path of least resistance, so it will be tempting to use this new, awesome password for all of your accounts. Don’t do this!

    • Make them special!

      • Now you can make them unique with a special identifier for each sensitive site.

      • Maybe for Facebook: add FB at the beginning, at the end, or even split it up.

      • This carries some risk if an attacker obtains one password and works out your pattern, but it is mostly a good way to have long, unique passwords.
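The base-password recipe and the blocklist screening mentioned above can be sketched as follows. The function names and the tiny blocklist are hypothetical, and the example password from these notes is now public and should not be reused.

```python
def base_password(words, separator, number):
    """Join capitalised words with a separator, wrap them in it, then append a number."""
    body = separator.join(w.capitalize() for w in words)
    return f"{separator}{body}{separator}{number}"


def site_password(base, site_tag):
    """Prefix a short per-site tag so each account gets a distinct password."""
    return f"{site_tag}{base}"


def is_blocklisted(password, blocklist):
    """Case-insensitive check against a list of common or easily guessed passwords."""
    return password.lower() in {p.lower() for p in blocklist}


# Hypothetical blocklist; a real one would hold many thousands of entries.
COMMON_PASSWORDS = ["password", "123456", "qwerty", "Summer2024"]

base = base_password(["mint", "chip", "vinyl", "tacos"], "$", "90277")
print(base)                                     # $Mint$Chip$Vinyl$Tacos$90277
print(site_password(base, "FB"))                # FB$Mint$Chip$Vinyl$Tacos$90277
print(is_blocklisted("summer2024", COMMON_PASSWORDS))
```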

Vendor Access

Vendor connections to the ICS Network
  • One of the most common ways malware and viruses are introduced into ICS environments is the use of media that has been shared or used on systems outside the production environment.

    • To mitigate that risk consider implementing the following:

      • Implement a dedicated workstation, not connected to the ICS network and kept up to date with the latest virus and malware definitions, to scan and transfer files and patches to trusted devices.

  • Do not allow vendor or third-party USB devices in the ICS environment (we have no idea whose USB device it is, where it has been, or what it contains).

  • Have a device whitelisting application or ability to disable media ports.

  • Provide security policies to govern use.

  • Configure your removable media policy to notify your security team whenever someone attempts to use a USB port or unapproved media.

Removable Media
  • If possible, do not allow personal devices to be used in the ICS network (for example, people charging their phones from ICS equipment, or malicious USB devices such as the USB Rubber Ducky or O.MG Cable).

  • If this is not possible, provide good security policies to manage the use of personal devices, and use company resources to help implement the policies.

  • Enterprise device management technology can help ensure that only approved assets can be attached to ICS networks and computers.

  • Lessons learnt from past incidents

    • Good network segmentation can prevent malware call backs.

    • Monitor USB usage, especially in the ICS environment (keep an inventory of allowed USB devices, who has them, and what they are used for).

Secure Authentication
Multi-factor Authentication
  • An increasing number of organizations are implementing multi-factor authentication to add a layer of protection (defense-in-depth) to security. By requiring a second authentication method in addition to the standard user name/password method, organizations implement a powerful countermeasure.

  • Definition:

    • Authentication that combines two or more of: something the user knows (password or PIN), something the user has (such as an access token or security token), and something the user is (biometric validation, such as a fingerprint or retinal scan).

  • Single factor authentication increases the attack surface.

  • Use multi-factor authentication for remote access and critical administrative access.

  • Can be used with VPN, network device access, administrator access to systems.

  • Example: Many asset owners use single-factor authentication for remote access. If a user has a vulnerable machine, the attack surface is greatly increased.
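As one concrete example of a "something you have" factor, the sketch below implements a time-based one-time password (TOTP, RFC 6238) on top of HOTP (RFC 4226) using only the Python standard library. The `verify_login` helper and its parameters are hypothetical; a production system would normally rely on a vetted authentication product rather than hand-written code.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at_time=None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the current time step."""
    now = time.time() if at_time is None else at_time
    return hotp(secret, int(now // step), digits)


def verify_login(password_ok: bool, submitted_code: str, secret: bytes, at_time=None) -> bool:
    """Grant access only if BOTH factors check out (hypothetical helper)."""
    expected = totp(secret, at_time)
    return password_ok and hmac.compare_digest(submitted_code, expected)


# RFC 4226 test secret; a real deployment uses a random per-user secret.
secret = b"12345678901234567890"
print(totp(secret, at_time=59))
```

A stolen password alone fails `verify_login`, because the attacker also needs the device that holds the shared secret.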

Secure VPN access
  • Limit VPN access to business requirements - vendors, technicians, integrators (who has access to what? If providing access to vendor, terminate VPN as close to edge as possible and provide access to only required systems/segmented network/DMZ. Good idea to define that in vendor contract agreements)

  • Require company issued and configured systems be used without Admin access (No admin access provided until and unless really required).

    • If they require admin access or access to a particular resource, work with them to figure out how we can provide that securely. Otherwise, technical users will always figure out a way to achieve it which might result in undocumented access.

  • VPN security policy should check for patches, a personal firewall, and an antivirus product.

  • Utilise a jump-box, or a virtual desktop for further network access.

  • Utilise a second domain controller (Have a separate IT/OT domain controller)

VPN Logs

A VPN appliance provides a wealth of logging information regarding the perimeter of your network. This information can be used to monitor the health of the system and potentially detect malicious activity. It is important to:

  • Find unusual login attempts: Look for unusual situations, such as the company President logging in from a Starbucks in England, when the President is actually in the middle of a safari in Africa.

  • Monitor failed authentication attempts: All devices or processes that require identity authentication should log and/or alert when an identity validation attempt fails.

  • Monitor successful authentication attempts from different sources: If available, all devices or processes should log and/or alert when the same user logs in simultaneously from two different source locations.

  • Monitor successful authentication under duress: For critical systems, consider deploying an authentication mechanism that supports duress codes. This allows a user under duress to log into a system using a secondary credential, but alerts that the access was performed under duress.

  • Monitor failed access attempts: All devices or processes that manage access control to communications, data, or services should log and/or alert when access is requested that is not allowed.

  • Monitor successful access attempts: All devices or processes that manage access control to communications, data, or services should log when access is requested and allowed.
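Two of the checks above (repeated failed authentication and the same user logging in from different sources) can be sketched against parsed VPN events. The event tuple layout, thresholds, and the 30-minute window are assumptions for illustration; real VPN appliances each have their own log format.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical parsed VPN events: (timestamp, user, source_ip, outcome)
events = [
    (datetime(2024, 1, 8, 9, 0), "alice", "203.0.113.10", "success"),
    (datetime(2024, 1, 8, 9, 5), "alice", "198.51.100.7", "success"),
    (datetime(2024, 1, 8, 9, 6), "bob", "192.0.2.44", "failure"),
    (datetime(2024, 1, 8, 9, 7), "bob", "192.0.2.44", "failure"),
    (datetime(2024, 1, 8, 9, 8), "bob", "192.0.2.44", "failure"),
]


def failed_login_alerts(events, threshold=3):
    """Return users whose failed attempts reach the threshold."""
    failures = defaultdict(int)
    for _, user, _, outcome in events:
        if outcome == "failure":
            failures[user] += 1
    return sorted(u for u, n in failures.items() if n >= threshold)


def multi_source_alerts(events, window=timedelta(minutes=30)):
    """Return users with successful logins from different sources within the window."""
    alerts = set()
    seen = {}  # user -> list of (timestamp, source) for earlier successes
    for ts, user, src, outcome in sorted(events):
        if outcome != "success":
            continue
        for prev_ts, prev_src in seen.get(user, []):
            if prev_src != src and ts - prev_ts <= window:
                alerts.add(user)
        seen.setdefault(user, []).append((ts, src))
    return sorted(alerts)


print("Repeated failures:", failed_login_alerts(events))
print("Multiple sources:", multi_source_alerts(events))
```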

Lessons Learned
  • Virtual Machine Use Case

  • Incident: A VM was configured in an ICS environment with the VM hardware (VMware host machine) located in the ICS DMZ. The management interface provided direct connectivity to the corporate network for ease of use. Further, ICS servers in the VM bridged the DMZ firewall to the ICS network.

  • Lesson: The VM management interface, located in the ICS DMZ, bridged protected corporate communications into the DMZ. Follow VMware security guidance when setting up VMware systems.

  • VPN/Password Use case

  • Incident: A user had a VPN connection and was logged in as administrator. The user’s home PC was dual homed with VPN client and a public interface.

  • Lesson: Proper configuration of VPN client. Limit VPN access to business requirements. Do not allow users to run as admin.

ICS Network segmentation

  • The Purdue Enterprise Reference Architecture (PERA) Model is suggested by the DHS Assessment Team as a best practice for segmenting networks.

  • The PERA model segments industrial control devices into hierarchical “levels” of operations within a facility. Using levels as common terminology breaks down and determines plant wide information flow. Zones establish domains of trust for security access and smaller LANs to shape and manage network traffic.

  • This model groups levels into the following zones for specific functions:

    • Enterprise Zone: Levels 4 and 5 handle IT networks, business applications/servers (e.g. email, enterprise resource planning - ERP) as well as intranet.

    • ICS Demilitarized Zone (IDMZ): This buffer zone provides a barrier between the ICS and Enterprise Zones but allows for data and services to be shared securely. All network traffic from either side of the IDMZ terminates in the IDMZ. No traffic traverses the IDMZ. That is, no traffic directly travels between the Enterprise and ICS Zones.

    • ICS Zone: Level 3 addresses plant wide applications (e.g., historian, asset management, authentication, patch management), consisting of multiple Cell/Area Zones.

    • Cell/Area Zone: Levels 0, 1 and 2 manage industrial control devices (e.g., controllers, drives, I/O and HMI) and multi-disciplined control applications (e.g., drive, batch, continuous process, and discrete).

  • Typical Flat network

    • Poor asset inventory

    • Poor boundary protection (HMI’s directly connected to the Internet)

    • Poorly Secured Remote Access

  • Recommended Secure Network Architecture

    • Good Asset Inventory and Data flows (How does data flow and what data flow is important/critical (what must always be available))

    • Good Boundary Protection

    • Secured Remote monitoring and Access

    • Isolation of Safety Instrumented Systems (How are safety systems implemented?)
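The zone grouping described above, together with the rule that no traffic travels directly between the Enterprise and ICS Zones, can be sketched as a small lookup. The names `ZONE_BY_LEVEL`, `zone_of`, and `crosses_idmz` are hypothetical and only illustrate the idea.

```python
# Purdue levels mapped to the zones described above.
ZONE_BY_LEVEL = {
    0: "Cell/Area", 1: "Cell/Area", 2: "Cell/Area",
    3: "ICS",
    4: "Enterprise", 5: "Enterprise",
}
ICS_SIDE = {"ICS", "Cell/Area"}


def zone_of(endpoint):
    """Endpoint is either a Purdue level (0-5) or the string 'IDMZ'."""
    return "IDMZ" if endpoint == "IDMZ" else ZONE_BY_LEVEL[endpoint]


def crosses_idmz(src, dst):
    """True if a flow goes directly between the Enterprise side and the ICS side."""
    zones = {zone_of(src), zone_of(dst)}
    return "Enterprise" in zones and bool(zones & ICS_SIDE)


# A Level 4 server pulling data straight from a Level 3 historian violates the model;
# the same data relayed through a server in the IDMZ does not.
print(crosses_idmz(4, 3))        # True
print(crosses_idmz(4, "IDMZ"))   # False
print(crosses_idmz("IDMZ", 3))   # False
```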

Firewall Implementation

The firewalls are placed at the front line of defense for each of the various zones. These firewalls provide the trusted path for users and applications to communicate with and between all of the various pieces.

  • There are two complementary principles for segmenting networks.

    • The first principle includes the general functions of a system:

      • Serve external customers

      • Handle facility environmental controls

      • Support IT

      • Process HR data

      • Run/supervise ICS process data

      • Run/Supervise ICS

    • The second principle is trust level.

      • What is the sensitivity of the data/system/data path?

  • Segmentation should be implemented using firewalls or at least routers with access control lists (ACLs). Some considerations for firewalls:

    • Know your environment

      • How does data flow?

      • How is data used? (What does that data mean?)

      • Who uses the data? (Who is the owner of the data? From an ICS perspective, mostly the historian)

    • Newer next-generation firewalls support multiple ICS protocols/standards.

    • Trade-off: efficiency vs. security vs. cost (every device can help or hinder efficiency, and each has a cost)

    • Often erroneously deployed as the cornerstone of the architecture (a proper deployment requires months of planning and architecture work)

Firewall Rules

Without rules, a firewall is basically a router.

  • Block direct traffic from the control network to the corporate network. All ICS traffic should end at the DMZ.

  • Every protocol permitted between the control network and the DMZ should be explicitly denied between the DMZ and corporate networks (and vice versa).

  • ICS networks should not be connected directly to the Internet, even if they are protected by a firewall.
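The three guidelines above can be checked mechanically against an exported rule set. The sketch below assumes a simplified, hypothetical representation of permitted flows as (source zone, destination zone, protocol) tuples; real firewall exports are vendor specific and would need to be parsed into this shape first.

```python
# Hypothetical PERMIT rules: (source_zone, destination_zone, protocol)
permitted = [
    ("control", "dmz", "opc-ua"),
    ("dmz", "corporate", "https"),
    ("corporate", "dmz", "https"),
    ("control", "corporate", "smb"),   # direct control/corporate traffic
    ("dmz", "corporate", "opc-ua"),    # same protocol on both sides of the DMZ
]


def audit_rules(permitted):
    """Return human-readable findings for the three guidelines above."""
    findings = []
    # Protocols allowed between the control network and the DMZ (either direction).
    ics_dmz = {p for s, d, p in permitted if {s, d} == {"control", "dmz"}}
    for src, dst, proto in permitted:
        pair = {src, dst}
        if pair == {"control", "corporate"}:
            findings.append(f"direct control/corporate traffic permitted: {src}->{dst} {proto}")
        elif pair == {"dmz", "corporate"} and proto in ics_dmz:
            findings.append(f"{proto} allowed on both sides of the DMZ: {src}->{dst}")
        elif pair == {"control", "internet"}:
            findings.append(f"control network exposed to the Internet: {src}->{dst} {proto}")
    return findings


for finding in audit_rules(permitted):
    print(finding)
```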

Firewall Logs
  • Firewall logs provide insight into security threats and traffic behaviour at the perimeter of your network. This information can be used to monitor the health of the system and potentially detect malicious activity. It is important to:

    • Identify traffic denied at the firewall - e.g. traffic from inside the network that is bouncing off the firewall (what traffic is trying to get out?)

    • Identify traffic allowed at the firewall

    • Identify multiple connections from multiple devices in your network to a few target locations
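The checks above can be sketched against parsed firewall log entries. The tuple layout, the internal address prefix, and the threshold are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical parsed firewall log entries: (action, source_ip, destination_ip, dest_port)
log = [
    ("deny", "10.0.10.20", "198.51.100.9", 443),
    ("deny", "10.0.10.30", "198.51.100.9", 443),
    ("deny", "10.0.10.15", "198.51.100.9", 443),
    ("allow", "10.0.10.150", "10.0.20.5", 1433),
]


def denied_outbound(log, internal_prefix="10.0.10."):
    """Traffic from inside the network that bounced off the firewall."""
    return [e for e in log if e[0] == "deny" and e[1].startswith(internal_prefix)]


def common_targets(log, min_sources=3):
    """Destinations contacted by at least min_sources distinct internal devices."""
    sources = defaultdict(set)
    for _, src, dst, _ in log:
        sources[dst].add(src)
    return {dst: sorted(srcs) for dst, srcs in sources.items() if len(srcs) >= min_sources}


print(len(denied_outbound(log)), "denied outbound connections")
print(common_targets(log))
```

Several control devices all trying to reach the same external address is the kind of pattern that can indicate malware calling back.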

Data Diode

  • A data diode is a unidirectional gateway intended to move data from a more secure network to a less secure network.

  • A data diode creates a physically secure, one-way communication channel from the control system network to the corporate network. Data diodes can be implemented in hardware, software, or a combination of both. The hardware implementation is the most secure because it is physically impossible to send any messages in the reverse direction.

Data Diode vs. Firewalls
Data Diodes
  • Behaves like a Proxy Server: converts TCP sessions to UDP

  • Uni-directional communication: reverse tunneling not possible

  • May cost more than some firewalls

  • Fewer rules: rules require less auditing

  • Transmits only the data: no connection between systems.

Firewalls
  • Two-way communications: tunneling possible.

  • Rules require more auditing due to complexity of rule set

  • Cannot create a physically one-way channel. UDP is connectionless, but a firewall that passes it still offers a two-way path.

Patch Management

  • BEFORE PATCHING ANY ICS/OT SYSTEM (PLC/RTU/HMI) ENSURE YOU HAVE A GOOD BARE-METAL BACKUP OR THE ABILITY TO RESTORE THE SYSTEM TO THE CURRENT STATE!

  • Patches are intended to:

    • Fix known vulnerabilities.

    • Enhance functionality

  • Software that needs patching includes

    • Operating System

    • ICS Application/hardware

    • Third-party applications

  • Patch deployment considerations

    • Test and validate

    • Offline systems vs. live systems

    • Work with vendors for patch applicability.

Patching Considerations

Considerations when deciding to patch systems:

  • How critical is each system to production?

  • What complications arise in patching critical infrastructure?

  • What is the cost of a patch?

  • What is the cost of not applying a patch?

  • What is the business/security driver in patching?

  • Do you have a mitigating control in place if you decide patching is not an option?

Potential Patch Complications
  • Patching can break other software components

  • Patching can break 3rd party software components

  • Updating antivirus definitions can inadvertently stop legitimate processes

  • Sand box systems are not used directly for production

  • Balance in waiting to test the patch and applying a patch before it is fully tested

    • Systems remain vulnerable until they are patched, or mitigating controls are implemented.

Application whitelisting

Advantages
  • Blocks most current malware

  • Prevents use of unauthorized applications (have good software inventory. Process environment is very predictable)

  • Does not require daily definitions updates

  • Administrator installation and approval of new applications.

Limitations
  • Approved applications - compromised in supply chain.

  • Malware that exploits applications that run in higher-level execution environments such as Java may not be found.

Disadvantages
  • Requires performance overhead

  • Requires regular maintenance

  • Causes some users to be annoyed
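One common way to implement application whitelisting is to compare a cryptographic hash of each executable against a list of approved hashes. The sketch below illustrates that idea on in-memory byte strings; the names and sample "binaries" are hypothetical, and commercial products typically combine hashes with publisher certificates and path rules.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-in for the contents of an approved executable.
approved_binary = b"hmi-runtime v1.0"
allowlist = {sha256_hex(approved_binary)}


def may_execute(binary: bytes, allowlist: set) -> bool:
    """Allow execution only if the binary's hash is on the approved list."""
    return sha256_hex(binary) in allowlist


print(may_execute(approved_binary, allowlist))                 # True
print(may_execute(b"hmi-runtime v1.0 + implant", allowlist))   # False
```

Because the decision depends on the approved list rather than on malware signatures, no daily definition updates are needed, but the list itself must be maintained whenever software is patched.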

Detect

Identify a cybersecurity event

Intrusion Detection System

  • ICS environments provide a unique opportunity. Compared to a corporate environment, an ICS environment is a steady state. Once again, you must know your environment. Ask and answer the following questions:

    • WHAT is normal? (Is this documented?)

      • You know that host “A” talks to host “B,” but not host “C”…

    • WHEN does “normal” become abnormal? (indicators that something might be going on?)

      • Host “A” is now talking to host “C”…WHY?

    • WHOSE applications and services are on your critical networks?

    • WHICH protocols are used?

      • Known IT protocols (DNS traffic, HTTP traffic)

      • Vendor (Proprietary traffic)
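Because ICS traffic is so predictable, the "what is normal" question can be captured as a documented baseline of conversations and compared with what is observed. A minimal sketch, assuming conversations are recorded as hypothetical (source, destination, protocol) tuples:

```python
# Documented baseline: host A talks Modbus to host B, and nothing else.
baseline = {("A", "B", "modbus")}

# Hypothetical observations from a network capture.
observed = [
    ("A", "B", "modbus"),
    ("A", "C", "modbus"),   # A is now talking to C... why?
    ("A", "B", "http"),     # new protocol between known hosts
]


def abnormal_conversations(observed, baseline):
    """Return observed conversations that are not part of the baseline."""
    return sorted(set(observed) - baseline)


for src, dst, proto in abnormal_conversations(observed, baseline):
    print(f"Not in baseline: {src} -> {dst} ({proto})")
```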

IDS Types
  • Host: Sensors reside on the host system

  • Network: What traffic is on your network?

  • Application: Web application firewall, database, firewall, application protocol IDS.

  • Log: What is happening at the OS level? or at the application level?

  • Paper: Who came in?

  • Anomaly: Any combination of the above.

All methods of intrusion detection involve the gathering and analysis of information from various sources within a computer, network, and enterprise to identify possible threats posed by hackers inside or outside the organization.

IDS/IPS Functions

An IDS is not a cure‐all for network security problems. It is an alerting tool to let you know something has happened. An IDS can:

  • Provide forewarning

  • Provide forensics data

  • Provide “situational awareness”

  • Provide network troubleshooting

  • Identify policy abuse.

Placing an IDS outside of the firewall can be helpful for situational awareness and forewarning of activities. The IDS can detect scanning or other precursory attack activities that might be dropped by the firewall. An IDS cannot:

  • Tell you directly if the system was exploited

  • Monitor actions taken by the system console

  • Perform analysis of an event (requires a human being to analyse).

HIDS

Host-based intrusion detection (HIDS) refers to intrusion detection that takes place on a single host system. HIDS involves installing an agent on the local host that monitors and reports on the system configuration and application activity. Some common abilities of HIDS systems include:

  • Provides the “victims” view

  • Virus detection/mitigation

  • Local log analysis

  • File integrity checking

  • Policy monitoring

  • Rootkit detection

  • Network monitoring from the host viewpoint

  • Real-time alerting

  • Active response.

HIDS often have the ability to baseline a host system to detect variations in system configuration. In specific vendor implementations, these HIDS agents also allow connectivity to other security systems. This allows for central management of configuration policy and verification.

HIDS Deployment

HIDS tools are initially deployed in “monitor only” mode. This enables the administrator to create a baseline of the system configuration and activity. Active blocking of applications, system changes, and network activity is limited to only the most egregious activities. The policy can then be tuned based on what is considered “normal activity.” Once a policy is configured, it is then applied and distributed to the hosts. Benefits of central management architecture are:

  • Can be centrally managed with deployable policies.

  • Ability to apply changes to many systems at once

  • Create a “baseline” for known system types/use cases

  • Central authentication, alerting, and reporting

  • Central audit logging.

The main two concerns with using any HIDS in an ICS environment are:

  • Does the operating system even support the use of a HIDS?

  • Do the hosts have enough hardware capacity to support the HIDS (CPU, memory, network bandwidth, etc.)?

Network Intrusion Detection (NIDS)
  • A NIDS scans traffic on its network and looks for known patterns in the traffic (packets).

  • A NIDS can scan both sides of a conversation and can be reactive by blocking traffic when in IPS mode.

  • NIDS often does not know if the system is Windows, Linux, or a PLC. From a NIDS perspective traffic is traffic, and it simply reports on what traffic is seen on the network.

  • NIDS can have a high False-Positive or False-Negative rate based on the information used to generate the signatures.

  • NIDS are connected to the network via a SPAN/mirror port or a network tap.

    • When using a SPAN port, the switch sends a copy of all the network packets “seen” on one physical port (or an entire VLAN) to another physical port, where the packets can be captured and/or analyzed.

    • A networking monitoring tap can be used to collect network packets without having to configure a span port on a switch. Think of a tap as a special T‐connection that can read data from the network, but not inject any data of its own into the network traffic.

IDS Sensor Placement

The placement for IDS sensors is important.

  • Any change in trust zones should have an IDS/IPS deployed

  • A data diode should be attached to the historian. The IDS can also be deployed here

  • All points of presences for the external communications should have an IDS/IPS deployed

  • An IDS on either side of firewalls allows you to audit your firewall rules.

NIDS Signature vs. Anomaly Detection

Signature
  • Ex. Snort, McAfee

  • Watches for specific events

  • Only looks for what it has been told

  • Can deal with any known threat

  • Unaware of network configuration changes

  • Highly objective inspection

  • Predictable behavior

  • Easy to tune manually

Anomaly
  • Watches for changes in trends

  • Learns from gradual changes

  • Can deal with unknowns, but any attack is subject to false-negative (Doesn’t know what attacks are, just knows it’s a change in traffic)

  • Sensitive to changes in network devices

  • Subjective, prone to misinterpretations

  • Unpredictable behavior

Netflow Anomaly Detection

NetFlow is a network protocol developed by Cisco Systems for collecting IP traffic information. NetFlow has become an industry standard for traffic monitoring and is supported by platforms other than Cisco. Routers and switches that have the NetFlow feature enabled produce UDP data streams that are sent to a NetFlow collector (server), where they can be processed and stored.

  • Describes a set of packets sharing these characteristics: src, sport, dst, dport, protocol, type of service.

  • Data include: time, number of bytes, number of packets

  • Usually sent via UDP or Stream Control Transmission Protocol

  • Distributed Denial of Service

    • Massive increase in flows

  • Trojan Horses

    • “Well-known” or unexpected services

  • Firewall Policy Violation

    • Unexpected inside/outside flow

Example Alerts for Anomaly Detection
  • Hosts scanning for services:

    • Are there external hosts poking at more than __ internal addresses?

    • Are there external hosts poking at more than __ ports on 1 (or more) internal hosts?

  • Internal infected hosts scanning or talking to external hosts:

    • Is some internal host poking at __ external hosts?

    • Is some internal host poking at __ internal hosts?

    • Is some internal host poking at dark space (un-allocated Internet address space)?

  • Internal hosts talking to “Interesting Net blocks” (pick your favorite countries here)

    • Are there pokes from __ net blocks that may be of interest?

    • Are there pokes to __ net blocks that may be of interest?

  • Increased network traffic:

    • Distributed Denial of Service (DDOS)

    • Unexpected high volume - Data mining, egress?
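The "hosts scanning for services" questions above translate directly into counting distinct targets per source in flow records. The sketch below assumes simplified, hypothetical flow tuples of (source, destination, destination port) and an assumed internal range; the thresholds stand in for the blanks in the questions and must be tuned per site.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network

INTERNAL = ip_network("10.0.10.0/24")  # assumed internal address range


def is_internal(addr: str) -> bool:
    return ip_address(addr) in INTERNAL


def horizontal_scanners(flows, max_hosts):
    """External sources touching more than max_hosts distinct internal addresses."""
    targets = defaultdict(set)
    for src, dst, _ in flows:
        if not is_internal(src) and is_internal(dst):
            targets[src].add(dst)
    return sorted(s for s, hosts in targets.items() if len(hosts) > max_hosts)


def vertical_scanners(flows, max_ports):
    """External sources touching more than max_ports ports on one internal host."""
    ports = defaultdict(set)
    for src, dst, dport in flows:
        if not is_internal(src) and is_internal(dst):
            ports[(src, dst)].add(dport)
    return sorted({src for (src, _), p in ports.items() if len(p) > max_ports})


# Hypothetical flow records.
flows = (
    [("203.0.113.5", f"10.0.10.{i}", 502) for i in range(1, 6)]       # one port, five hosts
    + [("198.51.100.8", "10.0.10.15", port) for port in range(20, 30)]  # ten ports, one host
    + [("10.0.10.20", "10.0.10.15", 502)]                              # normal internal traffic
)

print(horizontal_scanners(flows, max_hosts=3))
print(vertical_scanners(flows, max_ports=5))
```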

Zeek IDS
  • Open-source

  • Allows scripting of monitoring policies

  • Collect logs for analysis (Non-standard ports, Connections, DNS, FTP, Files, HTTP requests, SSL, SMTP activity).

  • Analyzers for many protocols including Modbus and DNP3

  • Unexpected protocol level activity.

  • Logs can be used by several other security products.
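Zeek's default logs are tab-separated text with a `#fields` header line, which makes them easy to post-process. The sketch below parses a small in-memory sample shaped like a conn.log and flags services on unexpected ports. The sample records and the `EXPECTED` set are assumptions for illustration, and only a subset of the real conn.log columns is shown.

```python
def parse_zeek_tsv(lines):
    """Yield one dict per record from a Zeek TSV log, using the #fields header."""
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            yield dict(zip(fields, line.split("\t")))


# Hypothetical sample built with explicit tabs so the separators are unambiguous.
sample = [
    "\t".join(["#fields", "ts", "uid", "id.orig_h", "id.orig_p",
               "id.resp_h", "id.resp_p", "proto", "service"]),
    "\t".join(["1700000000.0", "C1", "10.0.10.20", "49152",
               "10.0.10.15", "502", "tcp", "modbus"]),
    "\t".join(["1700000005.0", "C2", "10.0.10.20", "49153",
               "10.0.10.15", "8080", "tcp", "http"]),
]

# Assumed baseline of (responder port, service) pairs for this network.
EXPECTED = {("502", "modbus")}

unexpected = [r for r in parse_zeek_tsv(sample)
              if (r["id.resp_p"], r["service"]) not in EXPECTED]
for r in unexpected:
    print(r["uid"], r["id.orig_h"], "->", r["id.resp_h"], r["id.resp_p"], r["service"])
```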

IDS vs. IPS

IDS

  • Watching/ Passive alerting

IPS

  • Inline, Passive Alerting, Active Response

SNORT

  • Snort is an open-source network intrusion detection and prevention system. Snort is widely used and has become the standard for IDS/IPS.

  • Learning to write Snort rules is useful because most IDS/IPS applications will either use the Snort rule format or provide a way to import Snort rules.

  • If you are able to understand the data flow in your environment, you will be able to design simple anomalous traffic signatures quickly without regard to the actual details of the protocol used.

  • Snort rules are composed of a rule header and rule options. There are five types of rule options:

    • Metadata

    • Payload detection

    • Non-payload detection

    • Post-detection

    • Thresholding and suppression

  • We will focus on Metadata and payload detection

    alert ip ![10.0.10.20,10.0.10.30] any <> [10.0.10.15] any (msg:"ALERT - Field Controller interacts with another node"; reference:url,mysite.org/rule1; reference:cve,2018-0000; sid:3000001; priority:1; rev:1;)

  • action: alert, log, pass, activate, dynamic, or a custom defined type

  • protocol: ip, tcp, udp, icmp, any

  • src ip and src port: See below

  • direction: ->, <> (direction of the traffic that the rule applies to)

  • dst ip and dst port: See below

  • msg: Used by the analyst to quickly identify the signature

  • reference: Can use a predefined tag for a security web site or use “URL” to include any web site reference in the rules

  • sid: The signature ID is used by Snort to uniquely identify rules. We recommend using a number > 3,000,000

  • priority: Allows the user to set the priority of the rule. Highest - 1, Lowest - 10
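To make the header/options split concrete, the sketch below breaks a rule string into the fields described above. It is a simplified illustration, not Snort's own parser: it assumes a single-line rule, no spaces inside address lists, and no escaped quotes or semicolons inside option values.

```python
import re


def parse_rule(rule: str) -> dict:
    """Split a Snort rule into header fields and a list of (option, value) pairs."""
    header, _, rest = rule.partition("(")
    action, proto, src, sport, direction, dst, dport = header.split()
    body = rest.rsplit(")", 1)[0]
    # Split on semicolons that are not inside double quotes.
    parts = re.findall(r'(?:[^;"]|"[^"]*")+', body)
    options = []
    for part in parts:
        key, _, value = part.strip().partition(":")
        if key:
            options.append((key.strip(), value.strip().strip('"')))
    return {
        "action": action, "protocol": proto,
        "src": src, "src_port": sport,
        "direction": direction,
        "dst": dst, "dst_port": dport,
        "options": options,
    }


rule = ('alert ip ![10.0.10.20,10.0.10.30] any <> 10.0.10.15 any '
        '(msg:"ALERT - Field Controller interacts with another node"; '
        'reference:url,mysite.org/rule1; sid:3000001; priority:1; rev:1;)')

parsed = parse_rule(rule)
print(parsed["action"], parsed["protocol"], parsed["direction"])
for name, value in parsed["options"]:
    print(f"  {name}: {value}")
```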

Snort Preprocessors for ICS
  • A number of attacks cannot be detected by signature matching alone in the detection engine, so protocol “examine” preprocessors step up to the plate and detect suspicious activity. These preprocessors include packet fragmentation, TCP stateful inspection, portscans, and many other Network/Application protocol‐specific activities.

  • Others modify packets by normalizing traffic so that the detection engine can accurately match signatures. These preprocessors defeat attacks that attempt to evade Snort’s detection engine by manipulating traffic patterns.

  • Snort cycles packets through every preprocessor to discover attacks that require more than one preprocessor to detect them. If Snort simply quit checking for the suspicious attributes of a packet after it had set off a preprocessor alert, attackers could use this deficiency to hide traffic from Snort.

  • Preprocessor parameters are configured and tuned via the snort.conf file. The snort.conf file lets you add or remove preprocessors as you see fit. Of particular interest to the ICS community are the DNP3 and Modbus preprocessors.

  • ICS Specific: DNP3/Modbus

  • Other useful preprocessors: SSH, SSL, Portscan, http_inspect

DNP3 Preprocessor Rule Options
  • dnp3_func: Matches Function Code inside an Application-Layer request/response header

  • dnp3_ind: Matches on the Internal Indicators flags in Application Response Header (Similar to TCP flags)

  • dnp3_obj: Matches on request or response object headers

  • dnp3_data: Reassembled Application-Layer Fragments.

DNP3 Preprocessor Examples

Here are some examples of the new DNP3 preprocessor rule options:

  • Alerts on DNP3 Write Request:

    • alert tcp any any -> any 20000 (msg:"DNP3 Write request"; dnp3_func:write; sid:3000001;)

  • Alerts on reserved_1 OR reserved_2 being set:

    • alert tcp any 20000 -> any any (msg:"Reserved DNP3 Indicator set"; dnp3_ind:reserved_1,reserved_2; sid:3000002;)

  • Alerts on Content in Re-assembled Application-Layer Fragment:

    • alert tcp any any -> any any (msg:"String 'badstuff' in DNP3 message"; dnp3_data; content:"badstuff"; sid:3000003;)

    • Notice in the third rule, dnp3_data sets the content buffer to the beginning of the Re-assembled Application-Layer Fragment then looks for the content: “badstuff”

Modbus Preprocessor Rule Options
  • modbus_func: Matches against the Function Code inside of a Modbus Application-Layer request/response header

  • modbus_unit: Matches against the Unit ID field in a Modbus header

  • modbus_data: Sets the cursor at the beginning of the Data field in Modbus request/response

Modbus Preprocessor Rule Examples
  • Alerts on specific Modbus function:

    • alert tcp any any -> any 502 (msg:"Modbus Write Coils  request"; modbus_func:write_multiple_coils; sid:3000004;)

  • Alerts on unauthorized host

    • var MODBUS_ADMIN 192.168.1.2

    • alert tcp !$MODBUS_ADMIN any -> any 502 (msg:"Modbus command to Unit 01 from unauthorized host";  modbus_unit:1; sid:3000005;)

  • Alerts on Content in modbus data field

    • alert tcp any any -> any any (msg:"String 'badstuff' in Modbus message"; modbus_data; content:"badstuff"; sid:3000006;)

Example Rule Variables
  • ipvar HOME_NET [1.2.3.0/24,10.0.10.0/24]

  • ipvar EXTERNAL_NET !$HOME_NET

  • ipvar CANARY 1.2.3.4

  • ipvar PCS [10.0.10.0/24]

  • ipvar CORP [1.2.3.0/24]

  • ipvar HMI [10.0.10.20,10.0.10.30]

  • ipvar AD 1.2.3.20

  • ipvar FC 10.0.10.15

  • ipvar HIST1 [10.0.10.150]

  • ipvar CONFDB [10.0.10.10]

  • portvar TAG 2000

  • portvar TAG_RANGE [2000:2020]

Example Rules
  • #Field Controller (FC) talking to unknown system

    • alert ip ![$HMI,$HIST1,$CONFDB] any -> $FC any (msg:"ALERT - Field Controller interacts with unknown node"; sid:4000001; priority:1; rev:1;)

  • #Configuration Database talks to unexpected system

    • alert ip [$CONFDB] any -> ![$FC,$HMI,$HIST1] any (msg:"ALERT - Configuration DB communicates with new system"; sid:4000002; priority:1; rev:1;)

  • # PCS network communication with CORP network, trying to bypass the firewall

    • alert ip [$PCS,!$HIST1] any -> $CORP any (msg:"PCS network talking to CORP network"; sid:4000003; priority:1; classtype:unknown;)

  • #Configuration Database updates (auditing tool)

    • log ip [$CONFDB] any -> [$FC,$HMI,$HIST1] any (msg:"AUDIT - Configuration Updates"; sid:4000004; priority:10; rev:1;)

  • # LOOKING FOR BAD TRAFFIC

  • # Find traffic involving a canary

    • alert ip any any <> $CANARY any (msg:"The canary is talking"; sid:4000005; priority:1; classtype:unknown; tag:session,256,packets;)

  • #Monitor for the Field Controller talking to the Internet

    • alert tcp $FC any -> $EXTERNAL_NET any (msg:"PLC talking to the outside world"; sid:4000007; priority:1; flags:S; classtype:bad-unknown;)

  • # Monitor for AD attempting to connect to the Internet

    • alert tcp $AD any -> $EXTERNAL_NET any (msg:"AD attempting to talk to the outside world"; sid:4000008; priority:1; flags:S; classtype:bad-unknown;)

  • #Command shell on HMI

    • alert ip any any -> $HMI any (msg:"cmd.exe on HMI"; content:"cmd.exe"; sid:4000009; priority:1; classtype:unknown;)
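
The allowlist logic behind the first rule (Field Controller talking to an unknown system) can be sketched outside of Snort as well. This is only an illustration of the idea, assuming the addresses from the example rule variables; the function name `fc_peer_is_unknown` is hypothetical.

```python
import ipaddress

# Addresses copied from the example rule variables above.
HMI = ["10.0.10.20", "10.0.10.30"]
HIST1 = ["10.0.10.150"]
CONFDB = ["10.0.10.10"]
FC = "10.0.10.15"

ALLOWED_FC_PEERS = {ipaddress.ip_address(a) for a in HMI + HIST1 + CONFDB}


def fc_peer_is_unknown(src: str, dst: str) -> bool:
    """Return True when traffic to the Field Controller comes from an unlisted host."""
    if ipaddress.ip_address(dst) != ipaddress.ip_address(FC):
        return False  # not traffic to the Field Controller
    return ipaddress.ip_address(src) not in ALLOWED_FC_PEERS


print(fc_peer_is_unknown("10.0.10.20", FC))  # HMI is an expected peer
print(fc_peer_is_unknown("10.0.10.99", FC))  # unlisted host
```

Because ICS networks tend to have a small, stable set of talkers, listing the expected peers and alerting on everything else is usually more practical than it would be on a corporate network.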

Log Sources and Management

Logging Architecture
  • A central log server can assist in an incident by providing a chronological list of the events surrounding an incident that give the bigger picture.

  • Multiple systems/sources can send their data to a central log server where it can be correlated with other information.

  • Correlating with other logs can sometimes make the difference between recognizing an event for what it is (a true or false alarm) and then acting accordingly. The same data can also give the security analyst valuable context for alerts from other tools, such as an IDS.
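
A minimal sketch of the "chronological list" idea follows: events from several sources are merged into one timeline by timestamp. The event tuples, host names, and dates are invented for illustration, and the sketch assumes each source is already sorted by time.

```python
import heapq
from datetime import datetime

# Hypothetical, already-parsed events: (timestamp, source, message).
firewall = [
    (datetime(2024, 1, 5, 10, 0, 1), "firewall", "ALLOW 1.2.3.50 -> 10.0.10.20:3389"),
    (datetime(2024, 1, 5, 10, 0, 9), "firewall", "DENY 10.0.10.15 -> 8.8.8.8:53"),
]
windows = [
    (datetime(2024, 1, 5, 10, 0, 3), "hmi-security", "Logon success: operator"),
]
ids = [
    (datetime(2024, 1, 5, 10, 0, 7), "ids", "cmd.exe on HMI"),
]


def timeline(*sources):
    """Merge per-source event lists (each sorted by time) into one ordered list."""
    return list(heapq.merge(*sources, key=lambda event: event[0]))


for when, source, message in timeline(firewall, windows, ids):
    print(when.isoformat(), source, message)
```

Read top to bottom, the merged output tells a story that no single log shows on its own, which is the main argument for centralizing.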

There are some considerations in centralizing logs:

  • Properly prioritize the function of log management. Define requirements and goals for log performance and monitoring based on applicable laws, regulations, and existing organizational policies. Then, prioritize goals based on balancing the need to reduce risk with the time and resources necessary to perform log management functions.

  • Create and maintain a secure log management infrastructure. Identify the needed components and determine how they will interact (e.g., firewall rules, diodes). With the various types of information in one place, the log server becomes both a valuable target and a critical system to protect. It should only run the logging service and be in a highly protected area of your network.

  • Provide appropriate support for staff with log management responsibilities. All efforts to implement log management will be for naught if the staff members who are tasked with log management responsibilities do not receive adequate training, proper tools, or support to do their jobs effectively. The staff members need to understand what situations are normal, bad, and weird. Providing log management tools, documentation, and technical guidance are all critical for the success of log management staff.

Log sources
  • Firewalls

  • VPN Servers (may be part of firewall logs)

  • Operating Systems (e.g., Windows, *nix, Mac)

  • Proxy Server

  • Web Servers (e.g., IIS, Apache, Nginx)

  • Databases (e.g. MS SQL, Oracle, MySQL)

  • Others (e.g. PLCs, HMIs)

Log Transport
syslog
  • De facto standard in the IT community

  • Uses UDP or TCP

  • Data diode can be used

  • Encryption can be used

  • Third-party tools may be necessary for some operating systems or applications.
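
To make the transport concrete, here is a small sketch of what a syslog datagram looks like on the wire. The priority value in angle brackets is the facility code times 8 plus the severity code. The host name, tag, message text, and the helper names `build_syslog_message` and `send_udp` are assumptions; the timestamp is left out for brevity.

```python
import socket

# Facility and severity codes as defined for syslog (RFC 3164 / RFC 5424).
FACILITY_AUTH = 4
FACILITY_LOCAL0 = 16
SEVERITY_WARNING = 4


def build_syslog_message(facility: int, severity: int, host: str, tag: str, text: str) -> bytes:
    """Build a minimal BSD-style syslog payload; the timestamp is omitted for brevity."""
    priority = facility * 8 + severity  # PRI value carried in angle brackets
    return f"<{priority}>{host} {tag}: {text}".encode("utf-8")


def send_udp(message: bytes, server: str, port: int = 514) -> None:
    """Send one datagram to a log server (the server address is an assumption)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (server, port))


payload = build_syslog_message(FACILITY_LOCAL0, SEVERITY_WARNING,
                               "hmi01", "plc-gw", "unexpected write request")
print(payload)
```

In practice an existing client is usually the better choice; Python's standard library, for example, ships `logging.handlers.SysLogHandler` for this purpose.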

Operating System Logs
  • Operating system logs can be used to monitor the health of the system and detect malicious activity

  • Windows OS

    • Security Log

    • System Log

    • Third-party agent to send logs to a remote server.

  • Linux/Unix OS

    • Syslog transport part of OS

    • auth.log, messages

Security Audit Logging Web Server Logs
  • Review daily to determine a baseline

  • Web server logs will show:

    • who visited the website

    • when they visited the website

    • what they did while viewing the website (including SQL queries)

    • where they came from
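
As an illustration of the who, when, what, and where-from items above, the sketch below pulls those fields out of one access-log line. It assumes the server writes the widely used "combined" log layout; the sample line and the name `parse_access_line` are made up for the example.

```python
import re

# Pattern for the "combined" access-log layout (assumed server configuration).
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_line(line: str) -> dict:
    """Return who, when, what, and where-from for one access-log line."""
    match = COMBINED.match(line)
    if match is None:
        raise ValueError("line is not in combined log format")
    return {
        "who": match["host"],
        "when": match["time"],
        "what": match["request"],
        "from": match["referer"],
        "status": int(match["status"]),
    }


# Hypothetical log line for illustration.
sample = ('1.2.3.77 - - [05/Jan/2024:10:00:07 +0000] '
          '"GET /report.php?id=1%20OR%201=1 HTTP/1.1" 200 512 '
          '"http://example.com/start" "Mozilla/5.0"')
print(parse_access_line(sample))
```

The request field in the sample carries a query string that looks like an SQL injection attempt, the kind of entry a daily review against a known baseline is meant to surface.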

Security Audit Logging Database Logs
  • User logins and logouts

  • Database system starts, stops and restarts

  • Various system failures and errors

  • User privilege changes

  • Database structure changes (tables that have been deleted, data that has been changed)

  • Most other DBA actions

  • Selected or all database data access (if configured to log it)

Security Information and Event Management

Capabilities

  • Data aggregation

  • Correlation

  • Alerting

  • Compliance

  • Forensic analysis

Honeypots & Canaries

  • Decoy systems (they sit on your network and try to replicate what your real systems look like)

  • Variant of an IDS

  • Any traffic seen talking to a Honeypot could be considered malicious

  • Open-source ICS Honeypots are available: Conpot

  • Canaries (systems that do not communicate with any other system on your network. If an IDS is watching for ANY traffic to/from the canary, you will get an early warning that something is going on that shouldn’t be).

Respond and Recover

Execute activities taken during and after a cybersecurity event.

  • The Respond Function supports the ability to contain the impact of a potential cybersecurity event. Examples of outcome Categories within this Function include: Response Planning; Communications; Analysis; Mitigation; and Improvements.

  • The Recover Function supports timely recovery to normal operations to reduce the impact from a cybersecurity event. Examples of outcome Categories within this Function include: Recovery Planning; Improvements; and Communications.

  • Incident Response Phases

    • Preparation -> Identification -> Containment -> Clean-up and Recovery -> Follow-up

Preparation

  • Build your team

  • Plan your response

    • Secure and alternate methods of communication.

    • Scribe(s) for each group within the team.

    • Securable room where you can keep accurate and complete information

    • Access to ALL of the logs and data.

    • Known, certified clean computer systems to do forensics.

    • Person with the authority to unplug from the internet (maybe your manager, CEO?)

  • Define your strategy.

  • Create documentation

  • Train your teams and users

    • A practiced plan

  • Gather threat intelligence

    • Feeds & threat reports

    • YARA rules and indicators of known malware (know what’s going on in the world)

  • Use a checklist as a starting point

  • Compliance and safety officers should review the IR plan.

Incident Response Team
  • Senior Technical staff

  • Lead and Forensics Analysts

  • Scribe(s)

  • Stakeholders from:

    • Corporate IT

    • Control Systems

    • Subject Matter Experts

    • Public Relations

    • Legal Counsel

    • Law Enforcement (if necessary)

    • IT and/or financial auditors (optional)

Identification

  • Starts when incident is detected (snort/log alert?)

  • Forensics tools

  • Use the intelligence gathered

  • Thorough analysis of logs and network traffic

Containment

  • Find the call back addresses

  • Stop the information flow leaving the network

  • Stop the malware from spreading

Clean-up and Recovery

  • Remediation

  • Intrusion Clean-up

  • Affected systems back in service

Follow-up

  • Incident report

  • Lessons Learned

    • Update incident response plan

    • Update threat intelligence

    • Implement new security initiatives

Network Forensics
  • Main purpose: Incident response and Law Enforcement

  • Items to analyze in packet captures

    • Pattern matching - match specific values

    • Conversations - identify all sessions of interest

    • Exports: export sessions of interest

  • Tools used in network forensics

    • Wireshark, Network Miner, Tcpdump/windump, tcpflow, tcpxtract, argus, YARA, others.
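
The "conversations" and "pattern matching" steps can be sketched without any capture library. The packet summaries below are invented; in practice they would come from a tool such as those listed above. The helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical packet summaries: (src_ip, src_port, dst_ip, dst_port, payload).
packets = [
    ("10.0.10.20", 49152, "10.0.10.15", 502, b"\x00\x01\x00\x00\x00\x06\x01\x03"),
    ("10.0.10.15", 502, "10.0.10.20", 49152, b"\x00\x01\x00\x00\x00\x05\x01\x03"),
    ("1.2.3.4", 4444, "10.0.10.20", 445, b"cmd.exe /c whoami"),
]


def conversation_key(src, sport, dst, dport):
    """Order the two endpoints so both directions map to the same session."""
    return tuple(sorted([(src, sport), (dst, dport)]))


def group_conversations(pkts):
    """Group packets into bidirectional conversations."""
    sessions = defaultdict(list)
    for src, sport, dst, dport, payload in pkts:
        sessions[conversation_key(src, sport, dst, dport)].append(payload)
    return sessions


def sessions_matching(pkts, pattern: bytes):
    """Pattern matching: return the conversations whose payloads contain pattern."""
    return [key for key, payloads in group_conversations(pkts).items()
            if any(pattern in p for p in payloads)]


print(len(group_conversations(packets)))
print(sessions_matching(packets, b"cmd.exe"))
```

Once the sessions of interest are identified this way, they are the ones worth exporting for closer analysis.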

YARA
  • Main purpose: to help identify and classify malware samples

  • YARA rules

    • consist of a set of strings and a boolean expression

    • can be found in security alerts and bulletins

    • can be used by different security tools
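
To show what "a set of strings and a boolean expression" means in practice, here is a toy Python stand-in. It is not YARA syntax and does not use the YARA engine; the rule name, strings, and condition are hypothetical.

```python
# Toy stand-in for a YARA rule: named strings plus a boolean condition.
# Rule name, strings, and condition are hypothetical.
RULE = {
    "name": "suspicious_dropper",
    "strings": {
        "$a": b"cmd.exe /c",
        "$b": b"powershell -enc",
        "$mz": b"MZ",
    },
    # Similar in spirit to the condition: $mz and ($a or $b)
    "condition": lambda hit: hit["$mz"] and (hit["$a"] or hit["$b"]),
}


def rule_matches(rule: dict, data: bytes) -> bool:
    """Check which strings occur in data, then apply the boolean condition."""
    hits = {name: pattern in data for name, pattern in rule["strings"].items()}
    return bool(rule["condition"](hits))


print(rule_matches(RULE, b"MZ\x90\x00 ... cmd.exe /c calc"))
print(rule_matches(RULE, b"plain text mentioning cmd.exe /c"))
```

Real YARA rules offer much more (hex patterns, regular expressions, offsets, modules), but the split between a strings section and a condition is the same.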

Cybersecurity Practices

  • Incorporating cybersecurity practices into your daily life can prevent the disclosure of critical information (CI) to potential adversaries. If you’re thinking, “But I work in a control system environment; control systems don’t store CI,” then consider our definition of CI:

  • Information that, if disclosed, would have a negative impact on an organization. It includes not only trade secrets and technical specifications, but also sensitive information such as the processes used by systems (e.g., commands and access points), financial data, personnel records, and medical information.

  • CI also refers to the information that protects assets, such as passwords to access systems or passcodes to enter a building or room. Recipes, formulas, and strategies are usually CI. Even information such as your name, phone number, and email address—especially when all three of these information pieces are together—may be considered sensitive, because it helps an adversary launch a social engineering or phishing attack. In control system environments, the result of CI disclosure may be severe economic impact or loss of life.

Why Do It?

  • You probably incorporate cybersecurity practices in your personal life without even realizing it. For example, when you have prepared to go on a trip, have you ever done any of the following?

    • Stopped newspaper deliveries so newspapers wouldn’t pile up outside, letting people know you aren’t home?

    • Had your mail held by the post office or asked your neighbor to pick up your mail so the mailbox would not fill up?

    • Connected your porch lights and inside lights to a timer or light sensor so they would go on and off to make it look like someone is home?

    • Left a car parked in the driveway?

    • Had someone keep the lawn trimmed?

    • Asked a friend or neighbor to periodically open and close blinds or curtains?

  • The CI here is obvious - we do not want a burglar or other “bad guy” to know the house is unoccupied. The more clues we provide to an adversary that the house is unoccupied, the more likely it is the house will be robbed. The same holds true at work. We must reduce or obscure indicators to protect our critical information.

Information collection techniques

  • Who are these adversaries?

    • They may be competitors, criminals, spies, unhappy employees, terrorists, or troublemakers. They may be motivated by money, revenge, or political beliefs, to name a few.

    • There are numerous ways adversaries collect information. Some of the more common methods include social engineering, phishing, accidental disclosure, googling, and dumpster diving.

Social Engineering

  • Social engineering is a collection of techniques used to manipulate people into revealing sensitive or other critical information. Those who engage in social engineering rely on the natural human tendency to trust. In fact, it’s often easier for an adversary to obtain information by simply asking the right questions than by using technical hacking methods.

  • Social engineering is sometimes conducted by phone. The caller may pretend to be someone in a position of authority or a telephone or computer technician, gradually pulling information out of the targeted person. Often the adversary will call several employees and piece together enough information to launch an attack. Help desk employees are often targeted by an adversary because they’re trained to be friendly and provide information.

  • Social engineering can also occur through online social forums, at professional conferences, and at non-work social events, to name a few examples.

  • The first objective of an adversary attempting social engineering is to convince you that they are in fact a person that you can trust with critical information.

Thoughts: What your employees do on their personal social media poses little to no risk to your organization.

  • Social media is a place where people let their guard down. It’s what your employees check on their lunch break; it’s what they do when they arrive home from work and before they go to sleep at night. On social media sites, where the atmosphere is casual, the tendency to let certain information slip is greater, which brings risk.

  • The information your employees freely post to social media can (and probably will) be used against them. Many times, attackers will use social media as a reconnaissance tool to socially engineer their targets. Suddenly, the fact you publicly tweeted that you went to a leadership conference can be used to craft a targeted phishing email containing a malicious link. While the Nigerian princes of yesteryear might instantly raise eyebrows, if an email is customized to the recipient, the likelihood of the intended response increases.

  • Solving the problem: First, be pragmatic and realize that social media will always be attractive to attackers. But there are ways you can reduce the attack surface. Educate your employees on how much they should expose on social media as well as how to make the best use of available privacy settings.

Thoughts: It’s best to have one person tasked with maintaining, monitoring, and acting as an administrator for your various social media accounts.

  • In theory, this is a best practice – especially for smaller organizations that may lack a dedicated social staff. However, there are security risks with having one person with all the social media tribal knowledge. This risk is amplified when the social media manager mixes personal with professional.

  • For example, if your sole administrator has their personal account attached to your corporate accounts, and their personal account is hacked, you will land in some hot water by extension. Not only does this threaten security, but it also has the potential to threaten your brand image as well. If even a few incendiary tweets come from your corporate account, it could push clients away and lead to negative media attention.

  • Solving the Problem: Designate one person as the “main administrator,” but make sure that other employees – key executives, human resources, or the marketing department – have access to the social media information available. Furthermore, store the passwords to all your corporate accounts in a shared password manager. No employee should be able to easily rattle off any password, and none of your corporate social media passwords should be simple. A password manager can keep your passwords secure as well as help generate stronger ones.
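
As a rough idea of what "generating stronger passwords" involves, the sketch below draws characters from a cryptographically secure random source. The length limit and character set are illustrative assumptions, not a policy recommendation from this material.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Return a random password drawn from letters, digits, and punctuation."""
    if length < 12:
        raise ValueError("choose at least 12 characters")  # illustrative minimum
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(generate_password())
```

A password manager does essentially this and then stores the result, so nobody needs to memorize or "rattle off" the value.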

Thoughts: Social media is keeping pace with advancements in security.

  • It is, but don’t let this lull you into a false sense of safety. The responsibility for security does not rest with the social media sites. At the end of the day it’s your problem to own. The controls only work as well as they are used.

  • Solving the Problem: You can stay ahead of the threat by implementing (and enforcing) a social media policy at your organization. While social media policies are traditionally concerned with how employees should conduct themselves and how they should associate themselves with the organization, security needs to be part of the equation as well. A robust social media policy will incorporate security concerns – password guidelines as well as who can access the account – alongside guidelines that are geared toward brand standards.

Phishing

  • Phishing scams may be the most common types of social engineering attacks used today. Most phishing scams demonstrate the following characteristics:

    • Seek to obtain personal information, such as names, addresses, and social security numbers.

    • Use link shorteners or embed links that redirect users to suspicious websites in URLs that appear legitimate.

    • Incorporate threats, fear, and a sense of urgency in an attempt to manipulate the user into acting promptly.

  • Some phishing emails are more poorly crafted than others, to the extent that their messages often exhibit spelling and grammar errors; but these emails are no less focused on directing victims to a fake website or form where attackers can steal user login credentials and other personal information.

  • If you receive a suspicious email, normally the best defense is to ignore and delete the message. Your organization may have specific procedures to deal with suspicious email and web pop-ups.

  • Do

    • Report impersonated or suspect email.

    • Be cautious about opening attachments, even from trusted senders.

    • Take your time. Resist any urge to “act now” despite the offer and the terms.

    • Restrict who can send mail to email distribution lists.

    • Check financial statements and credit reports regularly.

  • Don’t

    • Send passwords or any sensitive information over email.

    • Click on “verify your account” or “login” links in any email.

    • Reply to, click on links in, or open attachments in spam or suspicious email.

    • Call the number in an unsolicited email or give sensitive data to a caller.

    • Put critical information on a website, FTP server, social media, etc.
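
One of the phishing characteristics listed above, a link whose visible text looks legitimate while the real target is somewhere else, lends itself to a simple check. The sketch below is a heuristic only; the function name and the example addresses are hypothetical.

```python
from urllib.parse import urlparse


def link_looks_deceptive(visible_text: str, href: str) -> bool:
    """Flag a link whose visible text names one host while the target is another."""
    text = visible_text.strip()
    if " " in text or "." not in text:
        return False  # visible text is not a URL, so there is nothing to compare
    shown = urlparse(text if "://" in text else "http://" + text)
    target = urlparse(href)
    if not shown.hostname or not target.hostname:
        return False
    return shown.hostname != target.hostname


print(link_looks_deceptive("https://www.mybank.example/login", "http://203.0.113.9/login"))
print(link_looks_deceptive("www.mybank.example", "https://www.mybank.example/account"))
```

A check like this will miss shortened links and look-alike domains, so it complements the Do and Don’t guidance rather than replacing it.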

Dumpster Diving

  • Dumpster diving is the act of rummaging through commercial or residential trash and recycle bins to find useful items (including information) that have been discarded.

  • At your workplace, adversaries may search for proposal drafts, financial data, architectural designs, and personnel data, both on paper and media such as thumb drives. Bear in mind that dumpster divers aren’t just looking for formal documents—Post-it® Notes, and scraps of notebook paper often contain phone numbers, passwords, and other critical information.

  • Take care with information that is no longer valuable to you because it may have tremendous value to someone else. Follow your organization’s policies and procedures on proper disposal of information and equipment when they are no longer needed. The following are some common practices:

    • Shred paper documents, using a cross-cut shredder if possible.

    • Whenever possible, sanitize or physically destroy hard drives and other electronic devices that store information (this is discussed in more detail in the “Information Protection” lesson).

    • For devices that cannot be sanitized, physically destroy them.

Wireless Security

Devices such as refrigerators, TVs, coffee makers, etc. now have the ability to connect to the Internet, play music, send pictures, alert you of problems, etc. With the inception of these devices, life has never been more convenient. However, these modern-day conveniences can pose some security issues if left unprotected.

Incorporating wireless security practices such as password protection and Wi-Fi encryption can help prevent unauthorized access or damage to devices through wireless networks. Examples of Wi-Fi security mechanisms include:

  • WPS: Wi-Fi Protected Setup uses an 8-digit PIN to protect the passing of a secret key between two parties (usually the access point and the connecting device such as a laptop, smartphone, or tablet). It is a setup aid rather than an encryption type.

  • WEP: Wired Equivalent Privacy (WEP) was developed many years ago and has proven to be weak and easily breakable.

  • WPA: Wi-Fi Protected Access (WPA) was developed as the successor to WEP. It added stronger key handling on top of the same underlying cipher, and it is no longer considered adequate protection.

  • WPA2: Wi-Fi Protected Access version 2 (WPA2) replaces the earlier cipher with AES-based encryption. It is considerably stronger than WEP and WPA and is the most widely implemented.

Best Practices for using public Wi-Fi
  • Think before you connect. Before you connect to any public wireless hotspot – like on an airplane or in an airport, hotel, or café – be sure to confirm the name of the network and login procedures with appropriate staff to ensure that the network is legitimate. Cybercriminals can easily create a similarly named network hoping that users will overlook which network is the legitimate one. Additionally, most hotspots are not secure and do not encrypt the information you send over the Internet, leaving it vulnerable to cybercriminals.

  • Use your mobile network connection. Your own mobile network connection, also known as your wireless hotspot, is generally more secure than using a public wireless network. Use this feature if you have it included in your mobile plan.

  • Avoid conducting sensitive activities through public networks. Avoid online shopping, banking, and sensitive work that requires passwords or credit card information while using public Wi-Fi.

  • Keep software up to date. Install updates for apps and your device’s operating system as soon as they are available. Keeping the software on your mobile device up to date helps prevent cybercriminals from taking advantage of known vulnerabilities.

  • Use strong passwords. Use different passwords for different accounts and devices. Do not choose options that allow your device to remember your passwords. Although it’s convenient to store the password, that potentially allows cybercriminals into your accounts if your device is lost or stolen.

  • Disable auto-connect features and always log out. Turn off features on your computer or mobile devices that allow you to connect automatically to Wi-Fi. Once you’ve finished using a network or account, be sure to log out.

  • Ensure your websites are encrypted. When entering personal information over the Internet, make sure the website is encrypted. Encrypted websites use https://. Look for https:// on every page, not just the login or welcome page. Where an encrypted option is available, you can add an “s” to the “http” address prefix and force the website to display the encrypted version.
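
The last tip, adding an “s” to the “http” prefix, amounts to a small URL rewrite. The sketch below shows the idea; the function name `to_https` is hypothetical, and the rewrite only helps where the site actually offers an encrypted version.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url: str) -> str:
    """Rewrite an http:// address to https://; other schemes are left alone."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return url
    return urlunsplit(parts._replace(scheme="https"))


print(to_https("http://example.com/login?next=/home"))
```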

Information Protection

Identify several methods to protect critical information.

  • Refer to Passwords

  • Refer to MFA

Remote Access

Any device that remotely connects to the corporate or control system network provides an opportunity for an adversary to gain access to the device and attack your network.

  • One preferred defensive method is the use of security tokens. The security token displays a number consisting of six or more alphanumeric characters (sometimes numbers, sometimes combinations of letters and numbers, depending on vendor and model). This number normally changes at pre-determined intervals, usually every 60 seconds. When it is combined with a password, the resulting passcode is considered to be multi-factor authentication.

  • To ensure this countermeasure is effective, you should never share your security token with anyone else. You should keep it locked away or on your person at all times.

  • Other examples of “something you have” are smart cards and USB tokens. “Something you are” methods, by contrast, use readers or scanners installed on a device such as a computer. They are effective because they use a unique trait (such as a fingerprint) to identify an individual.
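
To illustrate how a token can show a number that changes at a fixed interval, the sketch below follows the openly published HOTP/TOTP approach (RFC 4226 and RFC 6238). Commercial tokens may use different, vendor-specific algorithms; the 60-second interval mirrors the text above, and the secret is the public RFC test key, not a real credential.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226)."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def token_code(secret: bytes, now: float, interval: int = 60, digits: int = 6) -> str:
    """Time-based code: the counter is the number of elapsed intervals."""
    return hotp(secret, int(now // interval), digits)


# Shared secret below is the published RFC 4226 test key, not a real credential.
SECRET = b"12345678901234567890"
print(token_code(SECRET, time.time()))
```

Because both the token and the server derive the code from the same secret and the current time, the displayed number proves possession of the token without the secret ever being typed or transmitted.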

Internet and Intranet Access

  • Your organization probably has policies about what can and cannot be put on public Internet websites. It may even have a review process to ensure sensitive data are not publicly available. However, sometimes seemingly benign information can reveal more information about your organization than it should. For example, do your job postings mention the control systems and other equipment used? If so, this may be a piece of information an adversary can use in planning an attack.

  • Also consider information about your organization on other companies’ websites. Do your vendors’ press releases list where they have deployed their products? Do they publish their products’ manuals (which include control commands) on the Internet? A diligent adversary will gather information in as many ways from as many different sources as possible. A simple web search may reveal far more than you might think.

  • Do not forget about internal intranet sites. Remember that threats often come from within an organization. Critical information such as network diagrams and proprietary software code should not be made available to anyone without a need-to-know. Think twice before you publish anything on the Internet or intranet. If in doubt, leave it out!

Sanitization, Destruction, and Reuse

  • Sanitization permanently removes all data from equipment (such as a computer’s hard drive) by overwriting the data to make it unreadable.

  • Destruction means physically demolishing media to prevent recovery of any of its information.

  • Reuse refers to transferring equipment to another employee or an outside entity.

  • Organizations vary widely in requirements for sanitization. At one extreme, some organizations require that all equipment with memory or storage devices be sanitized before being transferred (even to another staff member) or disposed of. At the other extreme, some organizations have no policy in effect at all.

  • If your organization does not have a specific policy, or has a lax one, you should at least consider the criticality and sensitivity of the information on the device and determine whether it should be sanitized or destroyed before transferring or disposing of it.

Device Candidates for Sanitization

Any equipment with a storage device needs to be sanitized in certain circumstances. Such devices include:

  • Desktop and laptop computers

  • Personally owned equipment that has processed company information

  • Smartphones

  • Desk phones that store telephone numbers

  • Programmable logic controllers (PLCs)

  • Copiers

  • Fax machines

  • Many scientific instruments

  • Media such as USBs and removable hard drives

How to Sanitize your Data
  • When you “permanently” delete files, the operating system makes the space available for future use. New data will eventually overwrite the old data (the “deleted” files), but until those data are overwritten, they can be recovered by someone with the right tools and know-how. Similarly, when you reformat a hard drive, the original data are still there in raw form and can be recovered.

  • Deleting files, emptying the Recycle Bin, and reformatting the hard drive are not enough!

  • Sanitization makes the data unrecoverable by overwriting the data. Fortunately, there are tools available to make this fairly easy, at least for standard desktop and laptop computers.
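
The difference between deleting and overwriting can be shown with a short sketch. It overwrites a file's bytes in place before removing it. This is an illustration only: on SSDs, flash media, and journaling or copy-on-write file systems an in-place overwrite may not reach the original storage blocks, so dedicated sanitization tools remain the right choice. The function names and the sample file are hypothetical.

```python
import os
import secrets
import tempfile


def overwrite_file(path: str, passes: int = 1, chunk: int = 64 * 1024) -> None:
    """Overwrite a file's contents in place with random bytes."""
    size = os.path.getsize(path)
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                block = secrets.token_bytes(min(chunk, remaining))
                handle.write(block)
                remaining -= len(block)
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the overwrite to storage


def overwrite_and_delete(path: str, passes: int = 1) -> None:
    """Overwrite first, then remove the directory entry."""
    overwrite_file(path, passes)
    os.remove(path)


# Demonstration on a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "recipe.txt")
    with open(target, "wb") as handle:
        handle.write(b"secret formula " * 100)
    overwrite_file(target)
    with open(target, "rb") as handle:
        still_readable = b"secret formula" in handle.read()
    overwrite_and_delete(target)
    removed = not os.path.exists(target)
print(still_readable, removed)
```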

Protecting Critical Assets

State ways to physically protect critical assets at work, home, and while traveling.

  • The traditional physical security measures of “guns, guards, and gates” are no longer enough for today’s organizations. Many control system environments have effective physical security measures in place in addition to the traditional “three Gs” listed above.

  • For example, additional measures could be the use of camera monitoring, electronic entryways that deny access to anyone without the proper credentials, and keypad locks. However, physical protection and control are also the responsibility of individual employees.

  • This section covers protection measures you can take at work, when traveling, and at home.

Protection Measures At Work

Being vigilant is key to physically protecting information assets. Some of your responsibilities may include:

  • Know your environment and take appropriate action when something is out of the ordinary.

  • Be aware of who is behind you (and who may try to “piggyback”) when you are entering a restricted area.

  • Limit access to systems you are responsible for to those who have a need-to-know.

  • When appropriate, use a password-protected screensaver or some other lockout method when leaving a system unattended.

  • Close and lock your office door when you leave for extended periods.

  • Supervise the use and maintenance of your systems.

  • Do not leave critical documents or systems (including systems that store critical information) unattended in a publicly accessible area (such as a conference room or building lobby).

Protection Measures When Travelling

When you’re traveling, your information and computer systems (e.g., laptop, smartphone, etc.) are at even greater risk of theft or unauthorized access. Take the following precautions when traveling:

  • Do not leave systems unattended during travel. If possible, transport your systems in your carry-on bags instead of checked bags.

  • Pay attention when going through airport security. Thieves may be able to steal your laptop while you are focusing on getting through the security checkpoint.

  • Whenever possible, don’t leave systems in an unattended hotel room. If you are unable to take your system with you, use the hotel safe if one is available.

  • Avoid accessing critical information on your laptop or other device on the airplane or other public places. If you must access critical information, use screen filters to prevent the information from being read by others.

Protection Measures At Home

If you use or store work computer systems or information at your home, provide the same level of physical protection that you would at work.

  • Do not allow others without a need to know to access or use your system or information.

  • Ensure your home is secure when leaving systems and data. If possible, store the system and data in a locked room or locked storage container when unattended.

  • Do not leave systems or storage media in your vehicle.

  • Report the theft of company property from your home in accordance with your organization’s policies.

Defense-in-Depth Approach

  • Defense-in-depth refers to the use of multiple techniques to help mitigate the risk of one security measure being compromised or circumvented. These techniques are often a combination of information protection and physical protection measures.

  • One example is a building with an electronic card reader to permit and deny access, and a receptionist in the same building who checks credentials before allowing access. An additional defensive measure would be training all employees to verify building occupants are authorized to be there. With every measure that is added, security becomes “deeper” and risk is lessened.

Maintaining integrity

Identify specific ways to maintain integrity in secured areas.

  • What is and is not allowed in a secured area, such as a control system environment, varies from organization to organization. This section will cover some of the most common equipment do’s and don’ts.

Computers

  • In many control system environments, computers that are not needed for control system operations are not allowed in the control room. One reason for this is that email, websites, and files from home are common sources of malware (viruses, Trojan horses, spyware).

  • Some organizations do not have Internet connections within the control rooms and may allow limited use of computers not related to control room operations within them. When an Internet connection is allowed, it should be on a separate computer for an explicit purpose.

  • If a laptop is brought into the control center (for example, to install an upgrade), it should be scanned for malware before being connected to a control system device.

  • Know your organization’s restrictions and adhere to them.

Corporate Security Hole: Employees Forwarding Email to Personal Accounts

  • Employees forwarding their work email to web-accessible personal accounts is a growing problem. When away from the corporate network, accessing email from these accounts is usually faster and easier than going through the corporate remote email solution.

  • Only software related to control systems should exist on computers on the control system network. Operating system extras, such as games and any other unneeded software, should be removed.

  • Many word processing and spreadsheet programs can run macros, which makes it possible for malicious code (Trojans and other malware) to run and infect a system and any systems connected to it. Do not run macros unless the file comes from a trusted source. Similarly, malicious websites can install malware on a computer without your knowledge.
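
The rule that only control-system software should exist on control system computers can be checked with a simple inventory comparison. The following is a minimal sketch, not a complete audit tool: the approved list, the package names, and the function name find_unapproved are all hypothetical, and a real inventory would come from the host's package manager or asset database.

```python
# Hypothetical allowlist of software approved for control system computers.
APPROVED_SOFTWARE = {"hmi-client", "historian-agent", "plc-programming-suite"}


def find_unapproved(installed, approved=APPROVED_SOFTWARE):
    """Return installed package names that are not on the approved list, sorted."""
    # Normalize case and whitespace so "HMI-Client" matches "hmi-client".
    normalized = {name.strip().lower() for name in installed}
    return sorted(normalized - approved)


if __name__ == "__main__":
    # Example inventory (made up); games and media players should be flagged.
    inventory = ["HMI-Client", "historian-agent", "solitaire", "media-player"]
    for name in find_unapproved(inventory):
        print(f"Not approved for control system use: {name}")
```

Anything the check reports is a candidate for removal, subject to your organization's own policy.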

Additional guidance for applications in the control room

  • If Internet access is needed to run the control system environment, it should be provided through a network that is separate from the control system network.

  • If Internet traffic is allowed into the control system network (for example, to download software and firmware upgrades), it should be restricted to a single dedicated system, not to control systems. Any downloads should be scanned for malware before installation on a control system device.

  • Internet traffic should never be allowed out of the control system network.
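
The guidance above can be read as a small set of connection rules. The sketch below encodes the bullets literally and is only an illustration: the address range, the address of the dedicated download system, and the function name is_allowed are assumptions, and a real firewall is stateful and would be configured through its own rule language rather than Python.

```python
import ipaddress

# Hypothetical addressing, chosen only for illustration.
CONTROL_NET = ipaddress.ip_network("10.10.0.0/24")
DOWNLOAD_HOST = ipaddress.ip_address("10.10.0.50")  # the single dedicated system


def is_allowed(src, dst):
    """Decide whether a new connection from src to dst fits the guidance above."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    src_inside = src_ip in CONTROL_NET
    dst_inside = dst_ip in CONTROL_NET

    if src_inside and dst_inside:
        return True  # traffic that stays inside the control system network
    if dst_inside:
        # Inbound from outside: only the dedicated download system may receive it.
        return dst_ip == DOWNLOAD_HOST
    if src_inside:
        return False  # outbound from the control system network is denied
    return True  # neither end is in the control network; out of scope here
```

For example, is_allowed("203.0.113.9", "10.10.0.7") is rejected because the destination is a control system device rather than the dedicated download system.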

Removable Media

  • USB flash drives are a wonderful invention. You can transport large files to a customer’s office and access the data without worrying about compatibility. You can take work home, and you can travel with just the flash drive instead of lugging a laptop around. However, flash drives also present many risks.

    • Malware. Organizations can greatly reduce the spread of malware on their network by installing antivirus software on email servers and prohibiting certain websites, but the use of flash drives can bypass these safeguards.

    • Data theft. Any unattended and unlocked computer with a USB port is an easy target for an adversary with a flash drive.

    • Data loss. The portability of flash drives also increases the potential for lost data falling into the wrong hands. Most of these devices have few or no security features. If you lose your flash drive, anyone who finds it may be able to access its data.

  • Removable media need to be treated with great care. These devices can be inserted into a control system or other system and, either accidentally or intentionally, transmit malware or interfere with the system’s function. To prevent malware infection, the following precautions should be taken:

    • Media should come from a reputable source such as an employee or trusted vendor.

    • Media should be scanned for malware before being connected to any device in the control system environment.

    • Media contents should be reviewed before connection to a control system device.

  • Removable media include the following:

    • USB flash drives

    • MP3 players

    • digital cameras

    • removable hard drives

    • magnetic tapes
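
One way to support the "review the media contents" precaution is to list every file on the media with a cryptographic hash and compare it with what the sender says should be there. This is a minimal sketch under stated assumptions: the function names are hypothetical, the expected manifest is assumed to come from the trusted source, and the check complements a malware scan rather than replacing it.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(media_root):
    """Map each file's path (relative to media_root) to its SHA-256 digest."""
    root = Path(media_root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def unexpected_files(manifest, expected):
    """Return files that are absent from the expected manifest or whose digest differs."""
    return sorted(
        name for name, digest in manifest.items() if expected.get(name) != digest
    )
```

A reviewer would call build_manifest on the mounted media, then pass the result and the vendor-supplied manifest to unexpected_files; any name returned deserves a closer look before the media is connected to a control system device.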

Visitors

  • If you are hosting or otherwise responsible for a visitor, you should ensure the visitor complies with your organization’s policies. For example, it is rarely appropriate for a visitor to take pictures of your control center with a smartphone.

  • Take care with what information you disclose to visitors, both verbally and through what is visible in your office or the control center. Although it’s natural to want to be helpful and talk about your work to an inquisitive visitor, never reveal critical information.