Tasks
Desegma is structured into two main sub-tasks. The first involves detecting machine-generated text (MGT) at the document level; the second involves segmenting a document into its human-written and machine-generated parts.
SubTask A: MGT Detection in the Wild
In the first sub-task, we explicitly simulate these challenging conditions. Test documents are:
- sampled from semantic domains that differ from those in the training set
- generated by undisclosed large language models (LLMs)
- produced both by vanilla pre-trained models and by LLMs that have been fine-tuned to better mimic the linguistic distribution of human-written texts
The task is structured as a binary classification problem and is defined as follows: given a piece of text, assign it the label 0 if the text was written by a human, and 1 otherwise.
Different mixtures of samples in the test data will represent different levels of complexity. A simple shift in semantic domain will represent the easier setting, while texts generated by the DPO fine-tuned LLMs will represent the more complex samples.
The performance of proposed solutions will be evaluated via binary pairwise accuracy and F1-score.
Label: 0
Machine Text: Viktor Orban , dopo anni di duri scontri diplomatici, sono pronti a unire le loro forze per riscrivere l'agendia di Bruxelles. Il leader di Fratelli d'Italia Giorgia Meloni e il presidente del governo ungherese si sono visti a Vienna, in un vertice di "centrodestra, identità italiana e sovranità italiana". Dopo un incontro di ben tre ore i due hanno spiegato come si possano conciliare politicamente le due visioni d'Europa
Label: 1
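As an illustration of the evaluation for this sub-task, the following is a minimal sketch of accuracy and F1-score over 0/1 labels. It assumes that label 1 (machine-generated) is the positive class for F1; the function names are hypothetical, and the sketch does not attempt to reproduce the organizers' official scorer or the exact "pairwise" variant of accuracy mentioned above.

```python
from typing import Sequence


def accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of documents whose predicted label matches the gold label."""
    if len(gold) != len(pred) or len(gold) == 0:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1_score(gold: Sequence[int], pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive class (assumed here to be label 1, i.e. MGT)."""
    if len(gold) != len(pred) or len(gold) == 0:
        raise ValueError("gold and pred must be non-empty and of equal length")
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold_labels = [0, 1, 1, 0]  # 0 = human-written, 1 = machine-generated
    predictions = [0, 1, 0, 0]
    print(accuracy(gold_labels, predictions))
    print(f1_score(gold_labels, predictions))
```

Reporting both metrics is useful when the test mixture is unbalanced, since accuracy alone can hide poor recall on the machine-generated class.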
SubTask B: Human - Machine Text Segmentation
In the second sub-task, participants are required to detect the boundary between the human-written text and the machine-generated continuation by identifying the index of the character that marks the beginning of the MGT content. Each data sample will consist of a variable-length human-written prompt, always followed by a variable-length continuation produced by the model. Unlike traditional MGT detection tasks that require document-level binary classification, this sub-task focuses on localization: participants must pinpoint the beginning of the text generated by the LLM.
The task is defined as follows: Given a piece of text, return the index of the first character that is generated by an LLM.
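To make the target concrete, here is a minimal sketch of how a sample and its gold index relate. It assumes 0-based character indexing, which the task description does not state explicitly; under that assumption the target index equals the length of the human-written prefix. The helper names are hypothetical.

```python
from typing import Tuple


def build_sample(human_prefix: str, machine_continuation: str) -> Tuple[str, int]:
    """Concatenate the two parts and return (text, boundary index).

    With 0-based indexing (an assumption), the first machine-generated
    character sits at position len(human_prefix).
    """
    text = human_prefix + machine_continuation
    return text, len(human_prefix)


def split_at(text: str, boundary: int) -> Tuple[str, str]:
    """Split a document into (human part, machine part) at a predicted index."""
    if not 0 <= boundary <= len(text):
        raise ValueError("boundary must lie within the text")
    return text[:boundary], text[boundary:]


if __name__ == "__main__":
    text, target = build_sample("Hello, ", "world")
    print(target)
    print(split_at(text, target))
```

A system for this sub-task only needs to output the integer boundary; `split_at` simply shows what that integer implies about the document.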
To ensure statistically robust evaluation, the length of the human-written substring will vary considerably. This setup simulates real-world scenarios in which MGT may be inserted into otherwise human-written content. The same techniques described for the previous sub-task will be used to generate continuations of varying complexity.
The performance of proposed solutions will be evaluated via Mean Absolute Error (MAE).
Machine Continuation: al 10 agosto la data in cui il tribunale arbitrale di Amburgo esaminerà le informazioni che l'Italia intende raccogliere in India per scagionare i Marò . Nei giorni scorsi su vari quotidiani erano uscite indiscrezioni circa la data di un eventuale incontro tra i due fucilieri di Marina e i loro avvocati e i funzionari del ministero dell'Interno indiano, che dovrebbero rilasciare a loro una sorta di "licenza" temporanea così che i due marinai possano recarsi in India.
Target Character Index: 103
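The Mean Absolute Error used for this sub-task can be sketched as follows. This is an illustrative implementation, not the official scorer: it assumes one gold index and one predicted index per document, both counted in characters, and the function name is hypothetical.

```python
from typing import Sequence


def mean_absolute_error(gold_idx: Sequence[int], pred_idx: Sequence[int]) -> float:
    """Average absolute distance, in characters, between predicted and gold boundaries."""
    if len(gold_idx) != len(pred_idx) or len(gold_idx) == 0:
        raise ValueError("gold_idx and pred_idx must be non-empty and of equal length")
    return sum(abs(g - p) for g, p in zip(gold_idx, pred_idx)) / len(gold_idx)


if __name__ == "__main__":
    gold = [103, 50]       # gold boundary indices (illustrative values)
    predicted = [100, 60]  # a system's predicted boundaries
    print(mean_absolute_error(gold, predicted))
```

Lower values are better: an MAE of 0 means every boundary was located exactly, while larger values mean predictions land further from the true start of the machine-generated text.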