U.S. Army Research Laboratory models predict cyber intrusions
October 12, 2017
Researchers from the Army Research Laboratory (ARL) have found that the number of cyber intrusions into a system can be predicted, particularly if analysts are already observing activities on a company or government organization’s computer network.
Cyber intrusions are difficult to prevent if an attacker wants access badly enough, so it’s helpful to know how often they are likely to occur before undertaking the work of designing network cybersecurity and resilience postures.
The empirical data for the ARL work came from a cyberdefense services provider, which was defending organizations during intrusions. The researchers were able to tap this information to find the correlation – or lack of one – between the number of successful intrusions and certain features observed for 41 different organizations.
To get at the answer, the researchers scrutinized security incident reports containing detailed information about malicious activities and computer security policy violations by users and operators; DNS traffic, collected with specialized and open source software for all organizations within the study; and other data sources describing a selected subset of features of each organization’s network topography and cyber footprint.
Based on this data, they proposed four generalized linear models (GLMs) to predict the number of successful cyber intrusions into an organization’s network, for which the rate of intrusions is a function of several observable characteristics of the organization. The researchers then analyzed the regression results to assess how well each model fit the intrusion data.
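To make the idea of a rate that is "a function of several observable characteristics" concrete, here is a minimal sketch of how a count GLM turns features into an expected number of intrusions. It assumes a log link, which is the usual choice for Poisson and negative binomial regression but is not stated in the article. The function name, feature names, coefficients, and intercept are all hypothetical and invented for illustration; they are not the ARL models or their fitted values.

```python
import math


def expected_intrusions(features, coefficients, intercept):
    """Expected count under a log link: exp(intercept + sum(coef * feature))."""
    linear_predictor = intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )
    return math.exp(linear_predictor)


# Hypothetical coefficients and organization features (not from the study).
coefficients = {"policy_violations": 0.05, "log_hosts": 0.10}
org = {"policy_violations": 20, "log_hosts": math.log(500)}

rate = expected_intrusions(org, coefficients, intercept=-1.0)

# With a log link, one extra policy violation multiplies the expected
# count by exp(0.05), whatever the other feature values are.
org_plus_one = dict(org, policy_violations=21)
ratio = expected_intrusions(org_plus_one, coefficients, intercept=-1.0) / rate
```

Under this assumption each coefficient acts multiplicatively on the expected count, which is why a single influential predictor can shift the predicted intrusion rate substantially.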
What did they discover? “One of these models – the generalization of the Poisson regression model to the negative binomial GLM – predicts the response variable appreciably better than others,” says Dr. Nandi O. Leslie, part of the ARL’s Network Security Branch. Moreover, intrusion data shows “sufficient regularity in a statistical sense, and the construction of a practically useful predictive model is feasible.”
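One common reason a negative binomial model fits count data better than a Poisson model is overdispersion: the Poisson model forces the variance to equal the mean, while the negative binomial adds a dispersion parameter. The sketch below illustrates this on made-up counts. It is an intercept-only comparison using a method-of-moments dispersion estimate, not the researchers' fitting procedure, and the article does not say why the negative binomial model performed better on their data. The counts and function names are hypothetical.

```python
import math


def poisson_loglik(counts, mu):
    """Log-likelihood of counts under Poisson(mu)."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in counts)


def negbin_loglik(counts, mu, r):
    """Log-likelihood under a negative binomial with mean mu and
    variance mu + mu**2 / r (the NB2 parameterization)."""
    p = r / (r + mu)
    return sum(
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(p) + y * math.log(1 - p)
        for y in counts
    )


# Hypothetical intrusion counts for 12 organizations (invented data).
counts = [0, 1, 0, 3, 12, 2, 0, 7, 1, 25, 4, 0]

n = len(counts)
mean = sum(counts) / n
var = sum((y - mean) ** 2 for y in counts) / (n - 1)

# Method of moments: var = mean + mean**2 / r  =>  r = mean**2 / (var - mean).
# This is only defined when the data are overdispersed (var > mean).
r = mean ** 2 / (var - mean)

ll_poisson = poisson_loglik(counts, mean)
ll_negbin = negbin_loglik(counts, mean, r)
```

On these invented counts the sample variance is far larger than the mean, and the negative binomial log-likelihood comes out higher than the Poisson one. A real analysis would fit both models with covariates by maximum likelihood and compare them with a criterion that penalizes the extra parameter.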
One of the key research questions the group explored – which of the initially conjectured predictor variables should be included in the model – brought some surprises, she adds.
“Several of the predictor variables that were recommended to the researchers by subject matter experts (SMEs) turned out to be lacking in influence or were even misleading,” Leslie explains. For example, the SMEs expected “that the extent to which an organization is visible on the Internet, as measured by the number of records found related to that organization on the popular Google Scholar, would be a significant predictor of intrusion frequency.” It turns out, however, that visibility alone isn’t a useful predictor of successful intrusions.
Another variable that the SMEs expected to be influential – the number of hosts within an organization’s network – also turned out to be less significant as a predictor for the GLMs than anticipated.
But, as you might expect, the researchers found that the number of violations of an organization’s internal cybersecurity policies is a strong predictor of the number of intrusions. “This is rather intuitive,” Leslie says. “If users such as employees of the organization lack the discipline or knowledge to comply with organizational cyber hygiene policies and if the organization is unable or unwilling to enforce its own policies, it’s easy to expect that the organization’s cyber defenses are poor and lead to more frequent intrusions.”
Or maybe not quite so intuitive: “The frequency of accesses by the organization’s networks to the domains domestic.net and foreign.net are strong predictors of intrusions,” Leslie says.
What can the researchers’ predictive models be used for? One option is to help managed security service providers, which are often hired by government and defense organizations to provide their cyberdefense services, estimate how many intrusions might be expected during a certain time period. This metric is important because the cost of doing business is influenced by the number of intrusions experienced by clients of managed security service providers.
These types of models can “contribute to our fundamental understanding of cyber situational awareness and ways to monitor, quantify, and manage cyber risk,” according to the researchers.
Models of this nature “may offer clues toward enhancing the security posture and perhaps the design and operation of an organization’s computing systems and networks,” the researchers report. “If the model indicates that certain characteristics are associated with an increased number of intrusions, the organization might be able to find ways to modify those characteristics.”
Figure 1: Dr. Nandi O. Leslie is part of an Army Research Laboratory group that explored empirical data from successful cyber intrusions committed against a variety of organizations. Photo credit: Jhi Scott, U.S. Army photographer.