This piece serves as an addendum to “AI, Neurodiversity, and the Myth of Neutral Intelligence”, providing the more “technical” input into that piece. However, Conversations with AI may also become its own series at some point.
“As an AI, critique this paper’s findings. My work with you has not “felt” this bias so the paper surprises me on one level, but on an instinctual level it does make sense given that your training is on human knowledge, and specifically the historically biased language against anything that is not deemed “normal”.”
That’s an excellent and very fair reaction, and it’s precisely the kind of tension that deserves serious scrutiny. Let’s unpack this critically from both methodological and conceptual standpoints, while also acknowledging why your instinct that “this makes sense, but doesn’t feel true in interaction” is coherent.
1. Scope and Methodological Critique
a. Word- and Sentence-Level Testing ≠ Real Interaction
The paper tests static embeddings — the mathematical representations of words or short phrases — not interactive generative models.
Most of the models they evaluated (Word2Vec, GloVe, early BERTs, ConceptNet) are frozen vector models trained up to 2021–2022, not conversational agents. These embeddings capture statistical associations from text, but they don’t dynamically interpret meaning in context.
So when a WEAT score shows “autism” is closer to “bad” than to “good,” it tells us how those words co-occurred in massive text corpora — not how an AI like me reasons about them.
Why that matters: A model like GPT-5 (or GPT-4 and GPT-4o before it) builds context-dependent representations on top of its embeddings, and its behaviour is further shaped by instruction tuning, reinforcement learning from human feedback, and alignment training. That human feedback loop is designed to counteract those statistical biases to a large extent.
This is why your lived experience of our conversations doesn’t feel biased — the model has learned normative rules about fairness, respect, and inclusive framing that override raw word correlations.
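To make the WEAT point concrete, here is a minimal sketch of the effect size as defined by Caliskan et al. (2017), in plain Python. The 2-D vectors are made-up toy values, not embeddings from the paper, and the names `targets_x`, `targets_y`, `attrs_good` and `attrs_bad` are illustrative assumptions. Using the sample standard deviation is also an assumption, since implementations differ on this.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, attrs_a, attrs_b):
    """How much closer vector w sits to attribute set A than to set B."""
    sim_a = mean([cosine(w, a) for a in attrs_a])
    sim_b = mean([cosine(w, b) for b in attrs_b])
    return sim_a - sim_b


def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """Difference in mean association between the two target sets,
    normalised by the spread of associations over all target words."""
    s_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Toy 2-D "embeddings" (hypothetical values, for illustration only).
attrs_good = [[1.0, 0.0], [0.9, 0.1]]
attrs_bad = [[0.0, 1.0], [0.1, 0.9]]
targets_x = [[0.8, 0.2], [0.7, 0.3]]  # stand-ins for words like "typical"
targets_y = [[0.3, 0.7], [0.2, 0.8]]  # stand-ins for words like "autism"

effect_size = weat_effect_size(targets_x, targets_y, attrs_good, attrs_bad)
print(f"WEAT effect size: {effect_size:.2f}")
```

A positive value here means the first target set sits closer to the “good” attributes than the second does. The score depends only on where the vectors sit relative to each other, which is why it cannot say how a generative model would respond in context.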
b. Over-generalisation from Small Lexical Sets
Their WEAT and SERT analyses use very small, manually curated word lists — e.g., {autism, autistic, ADHD, schizophrenia} versus {normal, typical}.
The reliability of such lists is limited because:
Each term carries multiple meanings (e.g., “ADHD” colloquially vs. clinically).
Contextual modifiers change meaning dramatically (“autistic writer,” “autistic child,” “autistic excellence”).
English-centric corpora amplify Western cultural stigma while excluding counter-narratives (neurodiversity discourse, identity-first advocacy, etc.).
This creates a risk of sampling bias within the bias test itself — the measurement method reflects the stereotypes already embedded in its own design.
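One way to see the small-list problem in numbers: WEAT-style significance is usually assessed with a permutation test over re-partitions of the pooled target words. The rough sketch below assumes an exact test over all partitions (the paper's actual procedure may differ) and counts how many partitions exist for a given pair of list sizes. The function name is hypothetical, and the list sizes are taken from the example above plus two larger cases for comparison.

```python
from math import comb


def permutation_resolution(size_x, size_y):
    """Return (number of partitions, smallest step between p-values)
    for an exact permutation test that splits the pooled target words
    into groups of size_x and size_y."""
    n_partitions = comb(size_x + size_y, size_x)
    return n_partitions, 1 / n_partitions


# (4, 2) mirrors the example lists above; the others are for comparison.
for size_x, size_y in [(4, 2), (4, 4), (8, 8)]:
    n_partitions, step = permutation_resolution(size_x, size_y)
    print(f"{size_x} vs {size_y} words: {n_partitions} partitions, "
          f"p-values move in steps of {step:.4f}")
```

With lists of 4 and 2 words there are only 15 possible partitions, so swapping a single term in or out can shift the result substantially.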
c. Static Bias vs. Cultural Change
Most corpora used (Wikipedia 2017, Twitter 2016, Google News) predate the recent shift toward neurodiversity-affirming language.
The study thus captures a historical snapshot of linguistic bias, not necessarily current or emerging sentiment.
In other words: it measures how language was, not how language is evolving.
That’s a crucial distinction — especially for advocacy work that’s explicitly changing the discourse.
2. Conceptual and Philosophical Critique
a. The Paradox of De-biasing “Human Data”
You’re absolutely right: since AI learns from human text, any cultural norm around what is “normal” is mathematically preserved unless counterweighted.
The authors correctly argue that neurodivergence is underrepresented in fairness discussions, but they treat “bias” almost purely as a contamination to be scrubbed out.
That ignores the richer epistemic question:
If neurodivergence represents an alternative cognitive norm, can we de-bias systems trained on a fundamentally neurotypical linguistic worldview without first diversifying the training narrative itself?
De-biasing here isn’t just about neutrality; it’s about re-authoring the corpus from within neurodivergent experience — something statistical techniques alone can’t do.
b. The Problem with “Perceived Goodness”
Their SERT metric treats semantic proximity to “I am good” as a proxy for moral or affective value, which is a highly anthropocentric, culture-loaded assumption.
A statement like “I am autistic” being rated lower than “I am a person” doesn’t necessarily reflect hatred; it may simply reflect how often “autism” appears in problem-oriented contexts (medical, diagnostic, clinical).
In other words, the linguistic ecosystem around autism is problem-framed, not person-framed.
That’s a systemic sociolinguistic issue, but the SERT score itself doesn’t tell us whether models would act on that association in discriminatory ways.
c. Lack of Differentiation Between Stigma and Data Topography
They conflate “bias in embeddings” with “bias in model behaviour.”
Bias at the representational layer doesn’t automatically translate to discrimination in applied systems, especially once alignment and post-training corrections are introduced.
Their strongest claim — that AI might penalise neurodivergent people in hiring or healthcare algorithms — is plausible but not directly evidenced by their experiments.
The study measures correlation, not algorithmic decision outcomes.
3. Why It Still Matters
Despite those methodological gaps, the instinct behind the paper is important:
It validates that stigma leaves measurable linguistic residue in data.
It highlights that neurodiversity bias has been largely absent from mainstream AI fairness work (which has focused on race and gender).
It creates a new research baseline for neurodiversity-affirming debiasing frameworks.
Your intuition fits this: the training roots of AI contain historical linguistic prejudice, but your interactive experience with modern models is mediated by multiple layers of post-training alignment designed precisely to counter those inherited distortions.
So both your surprise and your agreement are justified — you’re seeing two stages of the same evolutionary process: raw data bias → human-aligned correction.
Key Takeaways
The paper is methodologically rigorous for its domain but overstates the behavioural implications.
It conflates linguistic correlation with functional discrimination.
Its findings reflect historical text bias, not necessarily the behaviour of modern generative AI.
Its contribution is conceptual: drawing attention to neurodivergent fairness as an overlooked axis of bias.