Media Bias Bias-Mitigated Dataset (MBBMD): A Hierarchical, Perspectivist, and Counterfactually-Augmented Corpus for Bias Detection in Spanish News
Procesamiento del Lenguaje Natural (SEPLN) 2026, 2026. doi:10.5281/zenodo.19160881
Abstract
Media bias manifests through subtle editorial, discursive, and linguistic mechanisms that shape public perception without explicit falsehoods. Research on automatic media bias detection has focused largely on English resources and on binary, document-level labels, overlooking the hierarchical and perspectivist nature of bias. This article presents the Media Bias Bias-Mitigated Dataset (MBBMD), a new corpus of Spanish news designed to address these limitations. MBBMD integrates three annotation levels: binary document-level bias, multilabel document-level bias, and fine-grained sentence-level manifestations. The annotation process combines a perspectivist document-level scheme, which preserves annotator disagreement, with a deterministic sentence-level procedure. The dataset also incorporates systematic Counterfactual Data Augmentation (CDA), enabling analyses of how the counterfactually altered factors influence perceived bias. The sentence-level component comprises 2,348 annotated sentences covering six linguistic manifestations of media bias.