Meneame Media Bias Dataset: Interaction Features and Bias Labels
Zenodo dataset, 2026. doi:10.5281/zenodo.19182446
Abstract
A processed dataset of news articles submitted to Meneame (a Spanish social news aggregator), with automatic media bias labels and rich interaction features derived from user comments.

Contents
- articles_with_features.parquet: 14,995 articles with 38 columns, including bias labels (from DistilBERT trained on MBBMD), interaction features (karma statistics, comment engagement metrics), and metadata (outlet, tags, timestamp).
- articles_labeled.parquet: Articles with bias probability scores.
- karma_features.parquet: Advanced karma distribution features per article (entropy, Gini, bimodality, skewness) for 183K+ articles.
- comments_with_sentiment.parquet: 20K comment sample with sentiment (POS/NEG/NEU) and emotion (joy, anger, sadness, fear) scores from pysentimiento/robertuito.
- user_profiles.parquet: User-level bias exposure metrics.
- user_outlet_interactions.parquet: Bipartite graph data (user-outlet comment counts).

Pipeline
Data was collected from meneame.net (2005-2021) and processed through a 5-step pipeline: ingestion, filtering, automatic bias labeling (franfj/fdtd_media_bias_E), interaction feature extraction, and statistical analysis. See the GitHub repository for full reproducibility.

Key Statistics
- 14,995 articles from 2,868 media outlets
- 13.2M comments from 96K unique users
- 61.5% of articles labeled as biased (automatic labeling)
- Timespan: 2005-2021
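To make the karma distribution features more concrete, the following is a minimal sketch of how two of them (Gini and entropy) could be computed per article from comment karma values. It is not the dataset authors' implementation: the function names, the `(article_id, karma)` row layout, the clipping of negative karma to zero for the Gini coefficient, and the choice to compute entropy over distinct karma values are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def gini(values):
    """Gini coefficient: 0 means karma is spread evenly, values near 1 mean
    it is concentrated in a few comments.
    Assumption: negative karma is clipped to 0, since the standard formula
    expects non-negative values."""
    x = sorted(max(v, 0) for v in values)
    n, total = len(x), sum(x)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending-sorted values.
    weighted = sum(rank * v for rank, v in enumerate(x, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of distinct
    karma values. Assumption: no binning is applied."""
    counts = Counter(values)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def karma_features(rows):
    """Group hypothetical (article_id, karma) rows and compute both features
    for each article."""
    by_article = defaultdict(list)
    for article_id, karma in rows:
        by_article[article_id].append(karma)
    return {
        article_id: {"gini": gini(k), "entropy": shannon_entropy(k)}
        for article_id, k in by_article.items()
    }


# Toy data: article 1 has all its karma in one comment, article 2 is uniform.
rows = [(1, 0), (1, 0), (1, 0), (1, 4), (2, 1), (2, 1), (2, 1), (2, 1)]
features = karma_features(rows)
```

In this toy example, article 1 gets a higher Gini coefficient than article 2 because a single comment holds all of its karma. The values stored in karma_features.parquet may be defined differently, so the released columns should be treated as authoritative.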
Related work
- A benchmark of expert-level academic questions to assess AI capabilities. Nature, 2026
- Modeling the impact of research data unavailability on science. Journal of Informetrics, 2026
- Agentic-JEPA: A Self-Supervised World Model for Planning in Text-Based Agent Environments. Preprint, 2026
- Media Bias Bias-Mitigated Dataset (MBBMD): A Hierarchical, Perspectivist, and Counterfactually-Augmented Corpus for Bias Detection in Spanish News. Procesamiento del Lenguaje Natural (SEPLN) 2026, 2026