New publication: A geolocated dataset of German news articles
July 2, 2025
Online
Dr. Lukas Kriesch and Dr. Sebastian Losacker have published their new article "A geolocated dataset of German news articles" in Scientific Data, a Nature Portfolio journal.
In the article, they present a geolocated dataset of 50 million German news articles (2016–2024) extracted from the Common Crawl Foundation's news dataset. Using natural language processing (NLP) methods, namely a named-entity recognition model and SBERT, the authors geocoded the articles and converted them into text embeddings that support semantic search within the dataset. By linking news content to geographic locations, the dataset enables large-scale regional analyses of public discourse.
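To illustrate what semantic search over geolocated embeddings could look like, here is a minimal sketch in Python. It is not the authors' pipeline: the record fields (`title`, `place`, `embedding`), the three-dimensional toy vectors, and the `search` helper are all assumptions made for this example. In practice the vectors would come from an SBERT model and have several hundred dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical records: field names and vectors are invented for illustration
# and do not reflect the published dataset's schema.
articles = [
    {"title": "Flood protection plans", "place": "Passau", "embedding": [0.9, 0.1, 0.0]},
    {"title": "Local football results", "place": "Dortmund", "embedding": [0.0, 0.2, 0.9]},
    {"title": "River levels rising", "place": "Köln", "embedding": [0.8, 0.3, 0.1]},
]

def search(query_embedding, records, top_k=2, place=None):
    """Rank records by similarity to the query, optionally restricted to one place."""
    candidates = [r for r in records if place is None or r["place"] == place]
    ranked = sorted(
        candidates,
        key=lambda r: cosine_similarity(query_embedding, r["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]

# A toy query vector standing in for the embedding of a flood-related search phrase.
query = [1.0, 0.2, 0.0]
results = search(query, articles)
for r in results:
    print(r["place"], "-", r["title"])
```

Passing `place="Dortmund"` restricts the ranking to articles geocoded to that location, which is the kind of regional filtering that the link between content and geography makes possible.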
With this dataset, the authors make a valuable contribution to computer-assisted social research and create new possibilities for analyzing social trends. The methodology is also transferable to news data from other countries, thus opening up diverse perspectives for international comparative research. The dataset is available for download.




