A Data Labeller Spots AI Bias Against Their Community

“AI is like a child. If we feed it wrong information, it will learn wrong things,” says Nehma, the protagonist of the film Humans in the Loop. She says this to her supervisor, pointing out that a training video at her data annotation centre wrongly labels a caterpillar as a pest. It actually doesn’t harm the plant, she says, as it only eats away the rotten bits. Amid many ethical conversations around the use of artificial intelligence, the film brings to light the invisible workforce of data labellers (in this case, women in Jharkhand) who annotate the data that feeds AI systems run by the West.

In an essay titled ‘Human Touch’ for the publication FiftyTwo, Karishma Mehrotra writes that the “data problem” is central to the machine learning architectures that have revolutionised AI in the last few years. The more data you have (images, videos, text) and the more precisely it is labelled, the more sophisticated the algorithm is likely to be. In fact, you have probably done some basic data labelling yourself: when Google’s reCAPTCHA asks you to mark all the boxes in a grid that contain a traffic light, you are adding to the mass of labelled information. And you are, knowingly or unknowingly, feeding AI bias.

India is one of the world’s largest markets for data annotation labour. According to the IT industry body NASSCOM, the Indian AI market is projected to reach 126 billion by 2030. By 2021, roughly 70,000 people worked in the field, which had an estimated market size of $250 million. Around 60 percent of its revenues came from the United States, while only 10 percent of the demand came from India.

The NASSCOM report found that over 80 percent of data annotation employees come from rural, semi-rural and underserved backgrounds. Over 90 percent of the industry’s players are based in Tier II and Tier III cities, such as Ranchi, Shillong, Vizag, Bhubaneswar and Yemmiganur, and they employ 50 percent or more women at any given time. According to a paper that investigated data labelling work practices in India, the industry’s rapid growth did not benefit individual annotators as much as it did other stakeholders. For instance, the average annotator contract lasted between 12 and 18 months, which offered little stability and few opportunities for long-term career development.


Humans in the Loop is a Kurukh-Hindi film directed by Aranya Sahay and inspired by Karishma Mehrotra’s essay for FiftyTwo. Focusing on the intersection of technology and society, it was developed as part of the Storiculture Impact Fellowship, which supports socially relevant storytelling in media, and made in collaboration with the Museum of Imagined Futures (MOIF). MOIF is a platform for creators, researchers, and activists who use storytelling to question the systems shaping our world.

Synopsis: Nehma, an Adivasi woman, returns to her ancestral village with her children, Dhaanu (12) and Guntu (1), after her divorce. She begins work as a ‘data labeller’, training AI models to recognise objects in images and videos. Finding AI childlike in its learning process, she imagines seeing the world through its eyes, a connection she longs for with Dhaanu. Even as Nehma struggles to give Dhaanu, who is forever tempted to flee to the city, a reason to stay, she notices AI adopting human biases, some of which echo prejudices against her community. Ultimately, Nehma realises she’s fighting not just for Dhaanu’s custody and future, but also for how technology and the world see people like her.

In conversation with The Third Eye, Aranya Sahay discusses the making of his film, the extensive research and writing it demanded and why he chose to tell the story the way he did.

TTE: How did you come to filmmaking?

Aranya Sahay: I come from the discipline of social sciences and have studied political science. My mother is a sociologist. I enjoy research, especially when it allows me to embed myself elsewhere… So, I think [filmmaking is] coming from that perspective of wanting to get out of my own milieu and just understand India, just understand wherever you live. There are so many layers that you live with.

I went to FTII (Film and Television Institute of India), made a couple of short films, and later documentaries. This is my first feature film. At FTII, I made a film called Song For Babasaheb.

Song For Babasaheb is a documentary on Shahiri singing in Maharashtra. Shahiri is the oldest form of protest art in Maharashtra. Maharashtra is particularly interesting because the state’s politics are, in many ways, shaped by two dominant forces: centre-right ideology and a strong current of anti-caste politics.

I was in Pune trying to make sense of the area and its politics, and I came upon Dattavadi. People from the Vidarbha region settled in that pocket due to a shortage of water. The entire basti (informal settlement) is along a canal. So, the desire to understand patterns of settlement and migration, not just academically but as the stuff of life, started shaping my enquiry. I think that became the basis of any kind of filmmaking I wanted to do.

Before making the film, I received a short grant with which I travelled to Jharkhand and lived there for a year, researching and writing. There, I met Biju Toppo, an ethnographic filmmaker and three-time National Award winner who runs Akhra, a film production house working in the field of culture and communication in Jharkhand.

Biju introduced me to writers, filmmakers and art conservationists from Jharkhand. This was beyond factual research; it was more anecdotal, philosophical, ancestral and historical. I understood Jharkhand better through their insights and experiences, and applied them to the story. It took me a year to turn all the research into the script of the film.

TTE: Tell us more about data labelling, the job you have used as an entry point into some of the themes in this film. What does it entail?

It’s a great premise for a story, but very often, a great premise is a curse. Your hands are tied because all people talk about is the premise: ‘Wow, humko nahi pata tha ki yeh hota hai’ (‘Wow, we had no idea this happens’). I don’t want a film to just do that.

For instance, there’s something called ‘content moderation’, where people have to go through very violent videos of murder and mark them as unsafe under community standards. I wanted to take the narrative there, but then I realised that I would be limiting it. It’s easy; violence is an easy way of catching and retaining attention. So, then I was like, okay, wait, there’s something more here. So, I began to understand data labelling itself.

If a human being is going through thousands of photos and videos, and based on that repetitive tagging, an algorithm learns the basic difference between a chair and a table, isn’t that the same as parenting? When our children are growing up, we teach them to differentiate between colours and objects, and then we also impose our morality and ways of navigating the world on them. I saw a similar pattern in AI.

But here the premise became an Adivasi mother trying to raise an AI child, primarily on first-world data and first-world labels.

And labels are extremely crucial because labels are also culturally contextual, right? A pest for one person is not a pest for another. The word ‘pest’ is a value-laden term. The woman in the film who is labelling the data has seen the creature protecting the plant, so why should she accept that it is harmful? So, that is the enquiry.

Is AI really the clean slate people think it is, or is it a descendant of our biases, weaknesses and knowledge systems? There is a school of thought that says AI will eventually become a long-lived civilisation that can go interplanetary, even intergalactic. But it is born out of the data we feed it; we’re only custodians of all the knowledge it’s growing up on.

I used to see AI as something that requires careful handling — and it still does. I shared a belief held by many tech thinkers: that organic life on Earth is fragile. A single catastrophic event like an asteroid impact or nuclear exchange could wipe out humanity and much of the planet’s biodiversity. In that context, if AI were capable of surviving such a catastrophe, it could become the inheritor of Earth.

And if AI is to inherit the Earth, should it represent only a single worldview or way of life?

Voyager 1, which is still jetting through space, carries a golden record that doesn’t have more than three or four ragas, but it has many country songs from the US. We know that ragas, too, are not representative of everything that is India. Even in folk music here, you have to go much deeper to understand the kind of music that exists. AK Ramanujan wrote about this rather beautifully: little traditions, larger traditions and their interplay. Little traditions are just as important. And as he noted, so many ragas come from field songs: women sing in the fields, and that’s how a raga is born. Have we studied this? Do we even know that it’s part of the evolution of music? Absolutely not.

Also, my view has shifted significantly. We’re now in the midst of a global arms race between the West and China to achieve AGI (artificial general intelligence). And in such a race, the focus often shifts from thoughtful development to coming first. That raises a sobering possibility: we may end up creating an AGI that doesn’t save us, but instead becomes the reason we’re wiped out.

But to say all of these things, you have to tell an emotional story. Otherwise, all you get is intellectual discourse.

TTE: How did you ensure that the indigenous ecological knowledge that you’re situating in the film is part of the character’s psyche? What, essentially, are you saying about knowledge at the end of the film?

It happened through a lot of conversations with Adivasi women. I was talking to an Oraon tribal artist named Philomina Tirkey Imam, wife of Bulu Imam, a conservationist of Adivasi art who is credited with finding 70 lost Neolithic rock art sites. She said, ‘When you walk on grass, you think you’re entitled to it. But when we walk on grass, we thank it for letting us walk on it.’ So, when we look at it from that perspective, there is deep faith. We’re born and raised with deep-seated beliefs that become a part of our psyche. How these beliefs translate into newer interactions is what the film is really about. An Adivasi person (not necessarily all Adivasis) believes that there is a larger philosophical idea: that there is life in everything.

Personally, I believe that because there is a certain sentience to everything around us, the universe punishes you if you don’t respect it. When Nehma, an Adivasi woman whose father taught her that there is life in everything, sees that AI figure, she inherently believes that it’s alive. A non-Adivasi may not even go there.

So, when she realises that it’s alive and it’s a child, then you have to teach it; you have to raise it correctly. Otherwise, we can just dispassionately disconnect from AI, because it is automation, it is only… data. It is cold. It’s so distant from the body and you can’t see it.

It's something that you try to understand and still, you can't.

TTE: Do you think that ultimately, the film has a positive take on the AI discourse?

Fairly positive, yes, and there are two or three reasons for that. The first is that you have to separate the digital labour from the AI aspect of things. The data work, when you look at it from outside, seems exploitative. But when you speak to these women, they have a job where there was no job. There’s dignity in sitting indoors, in front of a computer, doing this work, instead of going back to exploitative agricultural work.

Secondly, it comes from a personal-political desire to always look at possibilities. It’s a proposal towards hope, a shift away from the constant refrain of kuch nahi ho sakta (nothing can be done), that things will never move in the right direction. But do we deny hope? How can we deny hope?

And lastly, this film is about identity and knowledge systems, but one can also talk about AI and governance, AI and warfare, AI and policing. That’s something that I wanted to do, but couldn’t do in this film.

But I’m writing something entirely new, which is about predictive policing and how technology enables it. So, that kind of a thing, where the buck stops with AI to dispassionately decide a person’s fate… that’s a story for another film.

And then I think that the human loop becomes important. We feed them and they feed us.

We, as creative people, have to make meaning through association.

TTE: Any responses from your screenings that you would like to share?

There was this one Q&A where some things stayed with me. Someone asked: do you think AI and Adivasi lives are actually congruent? Do you really think so? I was like, that’s what I am trying to say. So, she said, but have you considered that the minerals powering AI are being taken from those very areas? That is something to think about.

TTE: What are some of the discussions you hope this film will generate for educators or people who are engaged in the work of teaching, learning and knowledge?

As reflected in the film, I want the right to speak your truth and tell your own story to be the central theme. However, many other ideas also move alongside this theme.

One of the most important is the coexistence between the natural order and a tech-based world, where we can increasingly listen to indigenous perspectives in order to find a way forward.

Discussion and workshop around the film ‘Humans in the Loop’

An Adivasi single mother who works as a data labeller, her adolescent daughter (upset with her mother) and her infant son (still learning to walk), against the backdrop of a data labelling centre: all of them come together in a fictional story set in Jharkhand to ask the question, ‘What counts as knowledge?’

The following TTE workshop module helps start a conversation around the making of knowledge in the age of AI: who makes it and who has the final say on it? It also brings in conversations around identity, structures, lived experiences, ancestral wisdom, ecology, labour, gender, and the ethics and bias of artificial intelligence.

Suggested questions:

Do you remember a story, a poem, anything from your childhood that you still carry with you? What is the memory you hold of it and why?
Share an anecdote or an experience when you saw yourself in what you were learning or being taught. Were you visible? If yes, where do you find yourself in the pyramid of privilege? If not, why do you think that is?
Take some time to recollect and share something you know, from your own experience or your community’s, to be knowledge, but that is not considered so.
As students who grew up on textbooks, what does it feel like to turn to AI for knowledge? What feelings does it evoke?
In the age of misinformation and disinformation, how do you look at the relationship between truth and knowledge? What do you believe to be true? What do you believe to be false?
If you were given a chance to step into the teacher’s shoes and draft a lesson or curriculum about existing in the modern world, what would the chapters be?

Activity:

1. Clip from Humans in the Loop (Your Image is Being Generated via AI)

Everyone is asked to use ChatGPT or another AI tool to see how it depicts them or their community. Feed it prompts that are specific about geographical location, caste/community, gender, et cetera.

What do you see?
What feelings does it evoke?
What is the image’s relationship to truth?
Discuss the prompt that you used: what words and instructions did it contain, and how specific was it? Which identities did you foreground, and which did you hide, in making your digital portrait?
Based on the response, how did you feel? What showed up? Can you identify a pattern or a bias? What does it show and hide? How would you like to be represented?

The idea behind the activity is to think about representation, but also to locate ourselves on a platform that we inform with our own biases, even as we remain works in progress.

2. Telling Stories Through Objects

We live in a world where data is labelled, sorted, and stored. What if we treated data as physical objects? This activity aims to label objects with feelings. How can we pay attention to the ordinary objects around us, not just for what they are, but for how we see and feel them?

Look around the room and choose 3 to 5 objects in your immediate space. Try choosing an object either made by hand or one that has a certain living quality to it.
For each object, create a label or an annotation covering, for instance, its shape, size, colour, texture, utility, memory and emotion.
Imagine the object as a living entity: What does it long for? What does it remember? What would it say if it could speak?
Write a short story for each object, like a biography. Think of it as a way to introduce the object to the world. You can also add fiction to it.

If you try this workshop, do write to us at [email protected] and tell us how it went.

Shivam Rastogi is the Video Producer and Image Editor at The Third Eye. He is a photographer and filmmaker who is constantly trying to make sense of the world from behind the camera. A graduate in Journalism and Mass Communication from IP University and a postgraduate from the Mass Communication Research Centre, Jamia Millia Islamia, Shivam has previously worked with Film Companion and Memesys Culture Lab.
Samiya is a writer and researcher. A love for stories led her to specialise in English Literature at Lady Shri Ram College for Women and Creative Writing at Ambedkar University. Cities, culture and storytelling are her prime interests. Presently, she manages social media and edits pieces at The Third Eye. Some of her favourite things are being completely zapped by a good line of poetry, rediscovering long-lost stationery in her cupboard, and staunchly advocating for hot chocolate.