Best practices to strengthen machine learning security

Machine learning security is business critical

ML security has the same goal as all cybersecurity measures: to reduce the risk of sensitive data being exposed. If an attacker disrupts your ML model or the data it uses, that model can produce false results, which at best undermine the benefits of ML and, at worst, negatively impact your business or customers.

“Leaders should care because there’s nothing worse than doing the wrong thing very quickly and confidently,” said Zach Hanif, vice president of machine learning platforms at Capital One. And although Hanif works in a regulated industry — financial services — that requires additional layers of governance and security, he says any company adopting ML should take the opportunity to review its security practices.

Devon Rollins, vice president of cyber engineering and machine learning at Capital One, adds: “Securing mission-critical applications requires a differentiated level of protection. Many large-scale deployments of ML tools are expected to be critical, given the role they play for the business and how they directly impact user outcomes.”

Novel security considerations to be aware of

While best practices for securing ML systems are similar to those for software or hardware systems, greater ML adoption also brings new considerations. “Machine learning adds another layer of complexity,” explains Hanif. “That means companies need to consider the many points in a machine learning workflow that can represent entirely new vectors.” These core workflow elements include the ML models themselves, the documentation and systems around those models, the data they use, and the use cases they enable.

It is also imperative that ML models and supporting systems are designed with security in mind from the start. It’s not uncommon for engineers to rely on freely available, open-source libraries developed by the software community, rather than coding every single aspect of their program. These libraries are often designed by software developers, mathematicians, or academics who may not be very familiar with writing secure code. “The people and skills required to develop high-performance or cutting-edge ML software may not always overlap with safety-centric software development,” adds Hanif.

According to Rollins, this underscores the importance of vetting the open-source code libraries used in ML models. Developers should treat confidentiality, integrity, and availability as the framework for an information security policy. Confidentiality means that data assets are protected from unauthorized access; integrity refers to the quality and accuracy of data; and availability ensures that the right authorized users can easily access the data needed for the job at hand.

In addition, ML input data can be manipulated to compromise a model. One risk is manipulating inference: changing input data to trick the model. Because ML models interpret data differently than the human brain, data can be manipulated in ways that are imperceptible to humans but still alter the results. For example, all it may take to break a computer vision model is changing a pixel or two in an image of a stop sign. The human eye would still see a stop sign, but the ML model may no longer categorize it as one. Alternatively, an attacker could probe a model by sending a series of different inputs and observing the outputs to learn how it works. By observing how input affects the system, Hanif explains, external actors could figure out how to disguise a malicious file so it evades detection.
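To make the pixel-perturbation idea concrete, here is a deliberately tiny sketch in the spirit of the fast gradient sign method, run against a toy linear classifier standing in for a vision model. Every weight, input, and threshold here is invented for illustration; real attacks target far larger models, but the mechanism is the same.

```python
import numpy as np

# Toy stand-in for a vision model: a linear classifier over a flattened
# 8x8 "image" with pixel values in [0, 1]; score > 0 means "stop sign".
# Weights are random because this is an illustration, not a real model.
rng = np.random.default_rng(0)
w = rng.normal(size=64)

def predict(image):
    score = w @ (image - 0.5)          # center pixels around mid-gray
    return "stop sign" if score > 0 else "other"

# A clean input the model classifies correctly, with a small margin.
x = 0.5 + 0.01 * np.sign(w)
print(predict(x))                      # -> stop sign

# FGSM-style perturbation: nudge every pixel by 2% of the pixel range in
# the direction that lowers the "stop sign" score (for a linear model,
# the gradient of the score is just w). The change is far too small for
# a human to notice, but it flips the prediction.
epsilon = 0.02
x_adv = x - epsilon * np.sign(w)
print(predict(x_adv))                  # -> other
```

The point of the sketch is the asymmetry: a perturbation bounded at 2% per pixel is visually negligible, yet because the model sums tiny contributions across every pixel, those nudges accumulate into a decisive change in the score.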

Another risk vector is the data used to train the system. A third party could “poison” the training data so that the machine learns something wrong. As a result, the trained model will make mistakes — for example, automatically identifying all stop signs as give-way signs.
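The poisoning scenario above can be sketched with a minimal example: a 1-nearest-neighbour classifier over made-up 2-D features standing in for image embeddings. All clusters, labels, and numbers are invented; the sketch only shows how a few mislabeled training points can redirect predictions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up 2-D features standing in for embeddings of road-sign images:
# stop signs cluster near (0, 0), give-way signs near (1, 1).
X_train = np.vstack([
    rng.normal([0.0, 0.0], 0.05, size=(50, 2)),   # stop signs
    rng.normal([1.0, 1.0], 0.05, size=(50, 2)),   # give-way signs
])
y_train = ["stop"] * 50 + ["give-way"] * 50

def predict_1nn(x, X, y):
    # 1-nearest-neighbour: return the label of the closest training point.
    return y[int(np.argmin(np.linalg.norm(X - x, axis=1)))]

test_stop = np.array([0.0, 0.0])                  # a typical stop sign
print(predict_1nn(test_stop, X_train, y_train))   # -> stop

# Poisoning: the attacker slips a handful of genuine stop-sign feature
# vectors into the training set, mislabeled as "give-way".
X_bad = np.vstack([X_train, np.zeros((5, 2))])
y_bad = y_train + ["give-way"] * 5

# The same stop sign is now misclassified by the poisoned model.
print(predict_1nn(test_stop, X_bad, y_bad))       # -> give-way
```

A nearest-neighbour model makes the effect easy to see, but the same principle applies to trained models: corrupted labels shift whatever decision boundary the training procedure learns.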

