Can an Algorithm Decide Who Belongs? Maharashtra’s AI Experiment
Maharashtra’s Chief Minister recently announced that the state is working with IIT Bombay on an AI-enabled tool intended to help identify allegedly “illegal” Bangladeshi migrants and Rohingya refugees. Framed as a tech-forward response to a complex governance challenge, the tool would reportedly conduct preliminary screenings and then pass cases to the police for document verification to determine nationality.
What the tool aims to do
Officials describe the system as a triage layer: an algorithm flags individuals based on certain signals, and human authorities then verify documents. On paper, a human-in-the-loop approach can reduce administrative burden and speed up case processing. But when the stakes involve liberty, identity, and the risk of wrongful detention or deportation, even “preliminary” automation carries significant consequences.
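To make the stakes concrete, here is a minimal sketch of that triage pattern, written in Python with assumed names and an assumed threshold; nothing in it reflects the actual system’s signals, scoring, or data.

```python
# A minimal sketch of the triage pattern described above, using assumed names;
# nothing here reflects the actual system's signals, threshold, or data.
from dataclasses import dataclass

@dataclass
class ScreeningCase:
    person_id: str
    model_score: float          # algorithm's confidence that the case warrants review
    outcome: str = "not flagged"

def triage(case: ScreeningCase, threshold: float = 0.8) -> ScreeningCase:
    """Route a case to human document verification only if the model score clears a threshold."""
    if case.model_score >= threshold:
        case.outcome = "sent to police for document verification"
    return case

# The threshold alone determines who faces scrutiny, before any human sees the case,
# which is why even "preliminary" automation is not consequence-free.
print(triage(ScreeningCase(person_id="A-001", model_score=0.86)))
```

Even in this toy version, a single numeric cutoff decides who gets pulled aside for verification, long before any officer exercises judgment.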
The technical puzzle: language isn’t a border
Experts are already asking a fundamental question: how would such a system reliably distinguish among regional variations, dialects, and accents within the Bengali language and related linguistic communities? Linguists caution that language does not neatly map onto political boundaries. Bengali encompasses diverse dialects; speakers often code-switch with Hindi, Assamese, or English; and varieties like Sylheti or Chittagonian blur the line between what is a dialect and what is a separate language. Meanwhile, the Rohingya language is itself a distinct variety with internal diversity and features that overlap with neighboring varieties such as Chittagonian.
Any AI that leans on speech, accent, or vocabulary as proxies for nationality risks conflating culture with citizenship. Even advanced language-identification models struggle with accent drift, multilingual environments, noisy audio, and domain shifts—conditions common in real-world policing. The likelihood of false positives is not a theoretical edge case; it’s a foreseeable outcome.
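The arithmetic behind that claim is simple base-rate math. The sketch below uses purely illustrative numbers (a screener that is right 95% of the time in both directions, applied to a population where genuine matches are rare) to show how the majority of flags can still be wrong.

```python
# A minimal sketch applying Bayes' rule: even a "mostly accurate" screener
# produces mostly false flags when the group it seeks is a small share of
# those screened. All numbers are illustrative assumptions, not measurements.

def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Share of flagged people who actually belong to the target group."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# Illustrative: 95% sensitivity and specificity, with 1 in 200 screened people a genuine match.
ppv = positive_predictive_value(sensitivity=0.95, specificity=0.95, prevalence=0.005)
print(f"Share of flags that are correct: {ppv:.1%}")   # ~8.7%, so over 90% of flags are wrong
```

Under those assumed conditions, more than nine out of ten people flagged would be flagged in error, and every one of them would still face the downstream scrutiny the flag triggers.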
From bias to blowback
When language, look, or location become triggers for “preliminary” scrutiny, bias can harden into practice. People who are Indian citizens—especially Bengali-speaking communities—could face disproportionate checks. That, in turn, can chill lawful movement, erode trust in public institutions, and stigmatize entire groups. The risk escalates if flagged individuals are detained while documents are verified, or if records are incomplete, outdated, or contested.
Courts worldwide have warned against automated determinations in high-stakes contexts. Even when a human officer “reviews” an algorithm’s output, automation bias can nudge decisions toward the machine’s suggestion. Without rigorous safeguards, “AI-assisted” can become “AI-decided” in practice.
Data, privacy, and due process
What data will this system collect—voice samples, images, geolocation, contact networks? How long will it be stored? Will individuals know they were screened? Can they challenge or correct records? Transparent answers are vital. Absent clear legal mandates and independent oversight, an AI pilot can morph into a shadow database with long retention and limited accountability.
Responsible deployment would require strong data minimization, strict purpose limitation, clear redress mechanisms, routine deletion schedules, and public documentation of model behavior. Periodic external audits—testing for accuracy, disparate impact, and false positives across subgroups—are essential, not optional.
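As an illustration of what such an audit would compute, the sketch below tallies false-positive rates per dialect group from hypothetical labelled evaluation records; the field names, groups, and figures are assumptions, and a real audit would need independently collected ground truth and far larger samples.

```python
# A minimal sketch of the subgroup audit described above, over hypothetical
# labelled evaluation records; field names and groups are assumptions.
from collections import defaultdict

def false_positive_rates(records):
    """False-positive rate per subgroup: non-matches that the model nonetheless flagged."""
    flagged_negatives = defaultdict(int)  # wrongly flagged, per group
    total_negatives = defaultdict(int)    # all non-matches screened, per group
    for r in records:
        if not r["is_actual_match"]:
            total_negatives[r["dialect_group"]] += 1
            if r["model_flagged"]:
                flagged_negatives[r["dialect_group"]] += 1
    return {g: flagged_negatives[g] / total_negatives[g] for g in total_negatives}

# Hypothetical evaluation records (independent ground truth vs. model output).
records = [
    {"dialect_group": "Standard Bengali", "model_flagged": False, "is_actual_match": False},
    {"dialect_group": "Standard Bengali", "model_flagged": True,  "is_actual_match": False},
    {"dialect_group": "Sylheti",          "model_flagged": True,  "is_actual_match": False},
    {"dialect_group": "Sylheti",          "model_flagged": True,  "is_actual_match": False},
    {"dialect_group": "Sylheti",          "model_flagged": False, "is_actual_match": False},
]

for group, rate in false_positive_rates(records).items():
    print(f"{group}: false-positive rate {rate:.0%}")
# A persistent gap between groups is exactly the disparate impact an external audit should surface.
```

The point of publishing such numbers is accountability: if one dialect community is wrongly flagged at a markedly higher rate, that is evidence the system is encoding bias rather than detecting nationality.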
The global cautionary tale
Other countries’ experiments with predictive policing and border tech offer sobering lessons: systems often underperform outside lab conditions, amplify existing biases, and prove hard to roll back once entrenched in workflows. Vendors and agencies may invoke “security” to avoid transparency, leaving the public to accept assurances without evidence. Litigation, public backlash, and costly retrenchments frequently follow.
What responsible use would require—at minimum
- Clear legal basis and narrowly defined purpose, publicly disclosed.
- No automatic or sole reliance on AI outputs for decisions affecting rights; human review must be meaningful and documented.
- Independent pre-deployment testing and ongoing audits for accuracy and disparate impact across dialects, regions, and communities.
- Transparency reports: model design goals, data sources, performance metrics, error rates, and limits.
- Data protection: strict collection limits, security controls, retention schedules, and the right to notice, explanation, and appeal.
- Community consultation, especially with affected linguistic and migrant communities, plus civil society observers.
- A sunset clause and pilot scope limits; continuation contingent on proven benefits outweighing documented risks.
The line between identification and identity
The Maharashtra initiative reflects a broader trend: states turning to AI to manage complex social questions. But identity and belonging are not merely technical classifications; they are legal statuses intertwined with history, migration, documentation practices, and human dignity. When an algorithm flags someone as suspect based on how they speak—or where they live—it risks substituting statistical pattern-matching for the nuanced, evidence-based assessments that justice requires.
Beyond the algorithm
If the goal is fair and efficient administration, there are proven non-technical pathways: invest in document systems that are accessible and reliable; streamline verification procedures; create bilateral frameworks with neighboring countries for lawful movement and repatriation; and strengthen oversight of policing practices. AI should, at most, play a tightly governed support role—not become an arbiter of who belongs.
As Maharashtra explores this AI experiment with IIT Bombay, the real test will not be whether the model can cluster accents. It will be whether the state can uphold constitutional protections, prevent discrimination, and ensure that technology serves people—rather than the other way around.