Text and data mining comprises the development and application of methods that extract knowledge relevant to the social sciences from unstructured texts or data streams.
Our research on Text and Data Mining
- Detection of statistical regularities in data and text and alignment of these regularities with variables of interest such as political leaning or gender
- Combination of digital behavioral data and survey data to create new types of user models
- Semantic enrichment and analysis of collaboratively generated documents (e.g., Wikipedia articles or scientific publications) and the social dynamics of the creation process (e.g., conflicts, productivity)
- Statistical modelling of sequential human behavior (e.g., the decisions made when navigating on the web or individual movement in urban surroundings)
- Detection, disambiguation, and linking of entities of interest to the social sciences in academic publications (especially references to research data)
- Extraction of key information from texts and (semi-)automatic indexing
- Dahou, Abdelhalim Hafedh, and Mohamed Amine Cheragui. 2023. "DzNER: A large Algerian named entity recognition dataset." Natural Language Processing Journal 3 (June 2023): 100005. doi: https://doi.org/10.1016/j.nlp.2023.100005.
- Fröhling, Leon, Lukas Birkenmaier, and Jessica Daikeler. 2023. "Garbage in - Garbage out? : Datenqualität im Umgang mit digitalen Verhaltensdaten." Easy social sciences 2023 (68): 21-31. doi: https://doi.org/10.15464/easy.2023.03.
- Dahou, Abdelhalim Hafedh. 2021. "A3C: Arabic Anaphora Annotated Corpus." Proceedings of the 4th International Conference on Natural Language and Speech Processing (ICNLSP 2021), 147–155. Association for Computational Linguistics.
- Dahou, Abdelhalim Hafedh, and Mohamed Amine Cheragui. 2022. "Impact of Normalization and Data Augmentation in NER for Algerian Arabic Dialect." Modelling and Implementation of Complex Systems: Proceedings of the 7th International Symposium, MISC 2022, Mostaganem, Algeria, October 30-31, 2022, 249-262. Springer International Publishing. doi: https://doi.org/10.1007/978-3-031-18516-8_18.
- Ben Aichaoui, Shaimaa, Nawel Hiri, Abdelhalim Hafedh Dahou, and Mohamed Amine Cheragui. 2022. "Automatic Building of a Large Arabic Spelling Error Corpus." SN Computer Science 2 (4): 108. doi: https://doi.org/10.1007/s42979-022-01499-x.
Title | Start | End | Funder |
---|---|---|---|
Kompetenzzentrum Datenqualität in den Sozialwissenschaften (KODAQS) | 2023-11-15 | 2026-11-14 | Bund |
NFDI for Data Science and Artificial Intelligence (NFDI4DS) | 2021-10-01 | 2026-09-30 | DFG |
NFDI for Business, Economic and Related Data (BERD@NFDI) | 2021-10-01 | 2026-09-30 | DFG |
Dehumanization Online: Measurement and Consequences (Professorinnenprogramm) (DeHum) | 2021-01-01 | 2026-09-30 | SAW (Leibniz) |
Find out more about our consulting and services:
- Analyzing Digital Behavioral Data: Methods, tools, frameworks, and infrastructures for analyzing digital behavioral data.
- GESIS Guides to Digital Behavioral Data: Expertise and hands-on advice on the acquisition and analysis of digital behavioral data and the computational methods needed.