Meneame Media Bias Dataset: Interaction Features and Bias Labels

Name: Meneame Media Bias Dataset: Interaction Features and Bias Labels
Published: 2026

F. Rodrigo-Ginés; J. Carrillo-de-Albornoz; L. Plaza

doi:10.5281/zenodo.19182446

Zenodo dataset, 2026. doi:10.5281/zenodo.19182446

Abstract

A processed dataset of news articles submitted to Meneame (a Spanish social news aggregator), with automatic media bias labels and rich interaction features derived from user comments.

Contents

articles_with_features.parquet: 14,995 articles with 38 columns, including bias labels (from DistilBERT trained on MBBMD), interaction features (karma statistics, comment engagement metrics), and metadata (outlet, tags, timestamp).
articles_labeled.parquet: Articles with bias probability scores.
karma_features.parquet: Advanced karma distribution features per article (entropy, Gini, bimodality, skewness) for 183K+ articles.
comments_with_sentiment.parquet: 20K comment sample with sentiment (POS/NEG/NEU) and emotion (joy, anger, sadness, fear) scores from pysentimiento/robertuito.
user_profiles.parquet: User-level bias exposure metrics.
user_outlet_interactions.parquet: Bipartite graph data (user-outlet comment counts).

Pipeline

Data was collected from meneame.net (2005-2021) and processed through a 5-step pipeline: ingestion, filtering, automatic bias labeling (franfj/fdtd_media_bias_E), interaction feature extraction, and statistical analysis. See the GitHub repository for full reproducibility.

Key Statistics

14,995 articles from 2,868 media outlets
13.2M comments from 96K unique users
61.5% of articles labeled as biased (automatic labeling)
Timespan: 2005-2021
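The karma distribution features named for karma_features.parquet (entropy, Gini, skewness) can be illustrated with a minimal sketch. The record does not give the authors' exact definitions, so the functions below are textbook formulations chosen for illustration only. The function names and the sample karma values are hypothetical, bimodality is not covered, and the Gini function assumes non-negative inputs.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    if n == 0:
        return 0.0
    counts = Counter(values)
    # Sum p * log2(1/p) over the distinct observed values.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly equal)."""
    if any(v < 0 for v in values):
        # Assumption: callers shift or clip negative karma beforehand.
        raise ValueError("gini() expects non-negative values")
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted formula on sorted data:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


# Hypothetical comment karma values for one article (not taken from the dataset).
karma = [12, 30, 30, 45, 120]
features = {
    "entropy": shannon_entropy(karma),
    "gini": gini(karma),
    "skewness": skewness(karma),
}
print(features)
```

A high Gini value would indicate that a few comments hold most of an article's karma, while high entropy would indicate that karma values are spread over many distinct levels. The values stored in the released file may follow different conventions (for example, binned entropy or sample-corrected skewness).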

Keywords

media bias, meneame, social news, NLP, user interactions, sentiment analysis, Spanish, disinformation

