Given the power of AI systems and the increasing role they play in making important decisions about our lives, homes and societies, they are surprisingly under-studied.
Thanks to the thriving field of AI audits, that’s starting to change. When they are working well, these audits allow us to reliably verify how well a system is working and to figure out how to mitigate any potential bias or damage.
Notably, a 2018 audit of commercial facial recognition systems by AI researchers Joy Buolamwini and Timnit Gebru found that the systems did not recognize dark-skinned people as well as white people. For dark-skinned women, the error rate was up to 34%. As AI researcher Abeba Birhane points out in a new essay in Nature, the audit has “spurred a body of critical work that has uncovered the bias, discrimination and oppression of facial analysis algorithms.” The hope is that by doing these types of audits on different AI systems, we will be better able to root out problems and have a broader discussion about how AI systems are affecting our lives.
Regulators are catching up, which is partly driving the demand for audits. A new law in New York City, effective January 2024, will require all AI-powered hiring tools to be bias tested. In the European Union, major tech companies will be required to conduct annual audits of their AI systems from 2024, and upcoming AI legislation will mandate audits of “high-risk” AI systems.
It’s a big ambition, but there are some massive obstacles. There is no common understanding of what an AI audit should look like, and there are not enough people with the right skills to conduct one. The few audits that happen today are mostly ad hoc and vary widely in quality, Alex Engler, who studies AI governance at the Brookings Institution, told me. One example he cited is AI recruitment company HireVue, which indicated in a press release that an external audit had found its algorithms to be unbiased. It turned out to be nonsense – the audit hadn’t actually examined the company’s models and was subject to a non-disclosure agreement, meaning there was no way to verify the results. It was basically nothing more than a PR stunt.
One way the AI community is trying to address the shortage of auditors is through bias bounty competitions, which work in a similar way to cybersecurity bug bounties: they challenge people to develop tools to identify and mitigate algorithmic biases in AI models. One such competition was launched just last week, organized by a group of volunteers including Rumman Chowdhury, Twitter’s ethical AI lead. The team behind it hopes it will be the first of many.
It’s a good idea to incentivize people to learn the skills needed to conduct audits – and also to start developing standards for what audits should look like by showing what methods work best.
The increase in these audits suggests that we may one day see cigarette-pack-style warnings that AI systems could harm your health and safety. Other sectors, such as chemicals and food, carry out regular audits to ensure products are safe to use. Could something like this become the norm in AI?