At a time when more and more processes are going digital, owing to the belief that machines are free of the biases present in human reasoning, new research suggests that this belief may not hold in every scenario.
In a paper — Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification — presented at the Conference on Fairness, Accountability and Transparency, the researchers examine three commercial facial recognition programs developed by Microsoft, IBM and Face++.
The programs are general-purpose facial analysis systems, which can be used for a range of face-perception tasks such as matching faces across different photos, classifying faces, and assessing characteristics like gender, age and mood.
Commonly, such software is built into smartphones. In the United States, the technology is used by law-enforcement and healthcare agencies in determining “who is hired, fired, granted a loan, or for how long an individual spends in prison”.
In other words, algorithms are now carrying out tasks that have traditionally been performed by humans.
The researchers characterised two existing facial analysis benchmarks, IJB-A and Adience, using a dermatologist-approved six-point Fitzpatrick Skin Type classification system. This characterisation revealed that the existing datasets were overwhelmingly composed of lighter-skinned subjects, and that they defined gender with the binary labels of male and female, failing to capture the complexities of transgender identities.
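The kind of dataset audit described above can be illustrated with a short sketch. The records, field names and grouping below are hypothetical, not the authors' code; the only assumption taken from the paper is that Fitzpatrick types I-III are grouped as “lighter” and IV-VI as “darker”:

```python
from collections import Counter

# Hypothetical subject records, each labelled with a Fitzpatrick type (I-VI).
subjects = [
    {"id": 1, "fitzpatrick": "II"},
    {"id": 2, "fitzpatrick": "I"},
    {"id": 3, "fitzpatrick": "V"},
    {"id": 4, "fitzpatrick": "III"},
    {"id": 5, "fitzpatrick": "VI"},
]

# Types I-III grouped as lighter, IV-VI as darker, as in the audit.
LIGHTER = {"I", "II", "III"}

def composition(records):
    """Return the share of lighter- and darker-skinned subjects."""
    groups = Counter(
        "lighter" if r["fitzpatrick"] in LIGHTER else "darker"
        for r in records
    )
    total = len(records)
    return {group: count / total for group, count in groups.items()}

print(composition(subjects))  # {'lighter': 0.6, 'darker': 0.4}
```

Run over a real benchmark's labels, a tally like this is what exposed the skew: the audited datasets were roughly 80% lighter-skinned.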
While the highest error rate in determining the gender of light-skinned men was 0.8%, the error rate for darker-skinned women ballooned to more than 20% for one program, and more than 34% for the other two. All three programs were least accurate on darker-skinned women.
Notably, even the Fitzpatrick classification system — developed for assessing the risk of skin cancer — was found to be skewed towards lighter skin, with three of its six categories applicable to people perceived as white.
“What’s really important here is the method, and how it applies to the other applications,” Joy Buolamwini of MIT, and the lead author of the paper, said in a statement released by MIT.
Because the existing datasets over-represented lighter males and under-represented darker individuals, the researchers introduced sub-groups of darker females, darker males, lighter females and lighter males. Instead of evaluating the accuracy of the software only by gender or by skin type, they also examined its accuracy for each of the four sub-groups.
These labels were used to classify the Pilot Parliaments Benchmark, comprising 1,270 photos of individuals from three African countries (Rwanda, Senegal and South Africa) and three European countries (Iceland, Finland and Sweden).
While all three programs classified male subjects more accurately than female subjects (8.1-20.6% difference in error rate) and lighter-skinned individuals more accurately than darker-skinned individuals (11.8-19.2% difference in error rate), all classifiers performed worst on darker-skinned female subjects (20.8-34.7% error rate).
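The intersectional evaluation behind these numbers can be sketched as follows. The predictions below are invented for illustration, not drawn from the benchmark; the point is only that error rates are tallied per gender-and-skin-type sub-group rather than for gender or skin type alone:

```python
from collections import defaultdict

# Hypothetical classifier outputs: (true gender, skin group, predicted gender).
results = [
    ("female", "darker",  "male"),
    ("female", "darker",  "female"),
    ("male",   "darker",  "male"),
    ("female", "lighter", "female"),
    ("male",   "lighter", "male"),
    ("male",   "lighter", "male"),
]

def subgroup_error_rates(rows):
    """Compute the error rate for each (skin group, gender) sub-group."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for true_gender, skin, predicted in rows:
        key = (skin, true_gender)
        totals[key] += 1
        if predicted != true_gender:
            errors[key] += 1
    return {key: errors[key] / totals[key] for key in totals}

for key, rate in sorted(subgroup_error_rates(results).items()):
    print(key, f"{rate:.1%}")
```

An aggregate accuracy figure averages over these sub-groups and can look respectable even when one sub-group fares far worse, which is exactly the disparity the breakdown surfaces.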
However, skin type alone may not be fully responsible for misclassification. According to the paper, “darker skin may be highly correlated with facial geometries or gender display norms that were less represented in the training data of the evaluated classifiers.”
Moreover, earlier studies have shown that face recognition systems developed in Western nations and those developed in Asian countries each tend to perform better on their respective populations.
“The same data-centric techniques that can be used to try to determine somebody’s gender are also used to identify a person when you’re looking for a criminal suspect or to unlock your phone,” Buolamwini told MIT News.
The study raises questions about the accuracy of these algorithms’ outcomes if they continue to be trained on data that is racially biased and skewed towards white males.
Acknowledging the existence of such biases, Anupam Guha, a computational linguistics and AI researcher, told The Indian Express, “It’s not that AI or technology itself is biased. But the data that this AI is trained on is inherently racist and misogynistic. The datasets are made according to certain demographics, thus making them intrinsically biased.”