Media Bias Bias-Mitigated Dataset (MBBMD): A Hierarchical, Perspectivist, and Counterfactually-Augmented Corpus for Bias Detection in Spanish News

F.J. Rodrigo-Ginés; J. Carrillo-de-Albornoz; L. Plaza

doi:10.5281/zenodo.19160881

Procesamiento del Lenguaje Natural (SEPLN), 2026. doi:10.5281/zenodo.19160881

Abstract

Media bias manifests through subtle editorial, discursive, and linguistic mechanisms that shape public perception without explicit falsehoods. Research on automatic media bias detection has focused largely on English resources and on binary, document-level labels, overlooking the hierarchical and perspectivist nature of bias. This article presents the Media Bias Bias-Mitigated Dataset (MBBMD), a new corpus for Spanish news designed to address these limitations. MBBMD integrates three annotation levels: binary document-level bias, multilabel document-level bias, and fine-grained sentence-level manifestations. The annotation process combines a perspectivist document-level scheme, which preserves annotator disagreement, with a deterministic sentence-level procedure. The dataset also incorporates systematic Counterfactual Data Augmentation (CDA), enabling analyses of how the counterfactually altered factors influence perceived bias. The sentence-level component comprises 2,348 annotated sentences covering six linguistic manifestations of media bias.

Keywords

Media bias; Spanish NLP; Perspectivist annotation; Counterfactual Data Augmentation; Hierarchical annotation
