Can Machine Learning Save Us From Cybercriminals?

Mike Cobb

**Editor’s Note: Learn more about helping your customers protect themselves against cybercriminals in our new report, “Solving Your Cybersecurity Talent Shortage.”**

By Michael Cobb

Almost nine out of 10 respondents in a recent Center for Strategic and International Studies survey said that cybersecurity technology could help compensate for skill shortages, with just over half believing that, in five years, cybersecurity solutions will be able to meet the majority of their needs.

That’s a lot of optimism, and potential sales.

To cash in, most vendors are pinning their hopes on big data analytics, machine learning and artificial intelligence to give their products the edge over cybercriminals. These features are being added to many top-end security solutions, and are being heavily marketed by vendors as an effective strategy to reduce the time it takes to detect and respond to cyberattacks, thanks to automated detection and remediation.

The problem: Experience tells us that attackers adapt their tactics whenever new security controls are introduced. Can these technologies really change the threat landscape, alleviate the cybersecurity skills gap and win the war against cybercriminals?

Traditional security solutions rely on signatures, rules, filters and blacklists to stop malware and attackers from taking over a network. This approach is effective at detecting known malicious code and activities but is increasingly ineffective against modern attacks. Advanced analytics and neural networks have been used by banks to detect fraud for more than 20 years, and this form of analysis has recently been harnessed to protect enterprise computer networks and data.

Even so, more advanced detection techniques are needed, and data security requires more than algorithms that can check byte and packet counts to spot if someone is suddenly working outside of office hours.

Machine learning — giving computers the ability to learn without being explicitly programmed — is viewed by many as the most efficient and effective way to detect attacks and risky behavior, and overcome the limitations of older security information and event-management products. Automated and iterative algorithms allow a program to probe data for obscure structures and use predictive analytics to recognize potential threats that would go unnoticed using human analysis alone.

The volume of data and events generated by security systems today is beyond the capacity of human experts to parse, but machine learning systems can actually benefit from very large volumes of data, with an important caveat.

Security solutions that incorporate machine learning still need improvement in several areas and, surprisingly, one is the same problem faced by human analysts: the sheer amount of data, particularly unstructured and hybrid data sets. Even small networks generate millions of logged events every day that need to be stored and analyzed. Many attacks are carried out over several months through discrete steps, often concealed in the guise of legitimate requests and commands. This means analysis has to reach back over huge amounts of historical data to find and correlate attack-related events. Analyzing this much data over prolonged periods can introduce performance issues unless only a small set of attributes is examined. Attackers understand this and can adapt their tactics to slip past the analysis.

Another problem is that to determine if there is a suspicious deviation in network usage, the system needs a clean baseline, and the current hypothesis is that most networks are already compromised. Even baselining a “clean” network is no easy matter. Network traffic is constantly changing and evolving, making it difficult to gauge whether activity is normal or malicious.
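To make the baseline problem concrete, here is a minimal sketch in Python of the simplest kind of deviation check: compare a new reading against the mean and standard deviation of past traffic. The byte counts, the function names and the three-standard-deviation threshold are all assumptions chosen for illustration, not taken from any particular product. The point is only that a baseline built from already-compromised history can hide the very activity it is meant to expose.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Summarize past hourly byte counts as (mean, sample standard deviation)."""
    return mean(history), stdev(history)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value that sits more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical hourly outbound byte counts (in MB) for one host.
clean_history = [100, 110, 95, 105, 90, 100, 108, 92]
# The same history, but with two earlier exfiltration bursts nobody noticed.
tainted_history = clean_history + [900, 950]

clean_baseline = build_baseline(clean_history)
tainted_baseline = build_baseline(tainted_history)

new_observation = 900  # another large outbound transfer
flagged_with_clean = is_anomalous(new_observation, clean_baseline)
flagged_with_tainted = is_anomalous(new_observation, tainted_baseline)

print("flagged against clean baseline:  ", flagged_with_clean)
print("flagged against tainted baseline:", flagged_with_tainted)
```

With the clean history, the large transfer stands out immediately. Once earlier malicious bursts are part of the history, the same transfer falls within what the baseline treats as normal variation and goes unflagged.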

The main argument so far against security solutions powered by unsupervised machine learning, though, is that they spit out too many false positives, resulting in alert fatigue and missed critical events.
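A little arithmetic shows why false positives pile up so quickly. The sketch below uses made-up numbers: a million logged events in a day, 100 of them malicious, and a hypothetical detector that catches 99 percent of attacks while wrongly flagging 1 percent of benign events. None of these figures come from a real product or study; they are assumptions to illustrate the base-rate effect.

```python
def alert_breakdown(total_events, malicious_events, detection_rate, false_positive_rate):
    """Return (true alerts, false alerts, share of alerts that are real attacks)."""
    benign_events = total_events - malicious_events
    true_alerts = malicious_events * detection_rate
    false_alerts = benign_events * false_positive_rate
    precision = true_alerts / (true_alerts + false_alerts)
    return true_alerts, false_alerts, precision


# Hypothetical day of traffic and a hypothetical detector (all values are assumptions).
true_alerts, false_alerts, precision = alert_breakdown(
    total_events=1_000_000,
    malicious_events=100,
    detection_rate=0.99,
    false_positive_rate=0.01,
)

print(f"real attacks flagged:          {true_alerts:.0f}")
print(f"benign events flagged:         {false_alerts:.0f}")
print(f"share of alerts that are real: {precision:.1%}")
```

Under these assumptions, only about 1 percent of the alerts point to a real attack, even though the detector itself sounds highly accurate. Because attacks are so rare compared with normal activity, a small false-positive rate still buries analysts in noise.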

These difficulties make relying entirely on new security technologies like big data analytics, machine learning and artificial intelligence to spot and prioritize complex attacks impractical. But human analysis-based solutions clearly can’t keep up with the huge volume of data that needs to be analyzed. What’s the answer?

Hybrid Time

Reducing the high rates of undetected attacks and delayed responses demands a combination of human effort supported by machine learning to automate the process of recognizing patterns hidden in increasingly large and complex datasets.

MIT’s Computer Science and Artificial Intelligence Lab (CSAIL) is developing a system called AI2, a cybersecurity platform that combines machine learning and the experience of security experts to continually improve its ability to find real breaches while reducing false positives. AI2 works by analyzing security logs and flagging anything it “thinks” is suspicious. This filtered data is passed on for human analysis, with legitimate threats generating feedback to AI2.

This approach is proving to improve detection rates significantly and reduce false positives compared with unsupervised anomaly detectors.
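The general idea of an analyst-in-the-loop detector can be sketched in a few lines of Python. This is a toy illustration only, not AI2's actual design: the event features, the starting weights, the alert threshold and the simple weight-lowering rule are all assumptions made for this example. The detector flags events whose score exceeds a threshold, a (simulated) analyst confirms or rejects each alert, and rejected alerts lower the weights of the features that triggered them.

```python
def score(event, weights):
    """Suspicion score: weighted sum of the event's numeric features."""
    return sum(weights.get(name, 0.0) * value for name, value in event["features"].items())


def flag(events, weights, threshold):
    """Return the events whose score exceeds the alert threshold."""
    return [event for event in events if score(event, weights) > threshold]


def apply_feedback(flagged, weights, analyst_confirms, step=0.5):
    """Lower the weights behind every alert the analyst rejects."""
    for event in flagged:
        if not analyst_confirms(event):
            for name, value in event["features"].items():
                weights[name] = weights.get(name, 0.0) - step * value


# Hypothetical log events with hand-picked features (assumptions for illustration).
events = [
    {"id": "backup-job", "features": {"off_hours": 1.0, "bytes_out": 1.0, "new_destination": 0.0}},
    {"id": "exfiltration", "features": {"off_hours": 1.0, "bytes_out": 1.0, "new_destination": 1.0}},
    {"id": "normal-login", "features": {"off_hours": 0.0, "bytes_out": 0.1, "new_destination": 0.0}},
]
weights = {"off_hours": 1.0, "bytes_out": 1.0, "new_destination": 1.0}
THRESHOLD = 1.5


def analyst_confirms(event):
    """Stand-in for a human analyst who knows which event is a real attack."""
    return event["id"] == "exfiltration"


first_round = [event["id"] for event in flag(events, weights, THRESHOLD)]
apply_feedback(flag(events, weights, THRESHOLD), weights, analyst_confirms)
second_round = [event["id"] for event in flag(events, weights, THRESHOLD)]

print("alerts before feedback:", first_round)
print("alerts after feedback: ", second_round)
```

In this toy run, the nightly backup job is flagged alongside the real attack at first. After one round of analyst feedback, the features the two events share carry less weight, so only the event with the unfamiliar destination still triggers an alert. A real system would need far more care to avoid tuning out genuine attacks along with the noise.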

For full visibility into emerging threats, we can’t rely just on data gathered from endpoints and network traffic, either. A lot of clues and pointers exist in unstructured data like social media posts, news stories and research reports. To capture that, IBM is looking to use the natural language processing capabilities of its artificial intelligence platform Watson to hunt through and learn from unstructured data to identify new threats.

Hopefully these types of projects will lead to improved attack-detection techniques and security solutions. However, no security technology can stop all cyberattacks. Malicious hackers will continue to use social engineering to circumvent even the most advanced analytic security systems, just as they have circumvented fraud systems in the banking world. Even so, advances in machine learning and its application to information security should decrease the time to detection, which at the moment is woefully slow.

We need to do something, and soon. The growth of the Internet of Things is creating more data, more attack vectors and more attacks. Security teams already can’t cope, so machine learning technology will play an important role in not only reducing workloads but also providing better-quality information from which to prioritize activities.

Machine learning systems certainly won’t replace humans, but they will make people far more efficient, and make life harder for the enemy.

Mike Cobb, CISSP-ISSAP, is a 20-year veteran of IT security with a passion for making industry best practices easier to understand and implement. As an adviser on security controls and information handling practices to companies and government agencies large and small, Cobb has helped numerous organizations achieve ISO 27001 certification and successfully migrate data and services to the cloud.

