The DADC Shared Task

The DADC Shared Task this year will focus on the Extractive Question Answering (QA) task. We hope to expand to other NLP-related tasks in future iterations of the competition.

How to Participate

Track 1: Better Annotators

Participants will submit 100 "official" question answering (QA) examples through the Dynabench platform. The collected dataset will form part of the evaluation set for Tracks 2 and 3. The objective is to find as many model-fooling examples as possible: the winning team will be the one with the highest validated model error rate (vMER).

Participation Instructions

Create an "official" Dynabench account for your team (and share your account username with us when filling out the Team Registration Form).
Test out the Dynabench QA interface. Important: DO NOT use your "official" shared task account for this. We recommend that you either create personal test accounts or switch to Sandbox mode in the interface. You are free to interact with the model as often as you like from your test accounts to try to identify model failure patterns.
The actual submission will occur over a 2-week Track 1 Example Creation window. The organisers will provide a new set of annotation passages, but you will be competing against the current best QA model, which will remain the same as during the preceding testing phase. Some rules regarding the official submission phase:
- You should only submit 100 examples from your official participation account. If you submit more than 100, only the first 100 will be taken into consideration.
- You will NOT be allowed to retract examples.
- You MUST provide an explanation for every model-fooling example. Explanations are not required for other examples.
- You are NOT allowed to use sandbox mode on passages for which you will submit questions to the competition.
Once the Track 1 Example Creation window is over, your submitted examples will be validated by an expert validator. Please refer to the Validation Instructions below for more information.
The winning team will be the one with the highest validated model error rate (vMER).
By participating, you agree to make any of the data you submit available for public release.

After validation and cleaning, this dataset will form part of the validation and test sets for the other competition tracks.
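The page does not spell out how vMER is computed, so the following is only a minimal sketch under one plausible reading: the number of examples that both fooled the model and passed expert validation, divided by the number of examples counted (capped at the first 100, as the rules above state). The `Example` record and its field names are hypothetical and do not reflect the actual Dynabench export format.

```python
from dataclasses import dataclass
from typing import List

MAX_OFFICIAL_EXAMPLES = 100  # from the rules: only the first 100 are considered


@dataclass
class Example:
    # Hypothetical record layout, for illustration only.
    fooled_model: bool  # the model's answer was judged incorrect
    validated: bool     # an expert validator accepted the example


def validated_model_error_rate(examples: List[Example]) -> float:
    """Assumed definition: validated model-fooling examples / examples counted."""
    counted = examples[:MAX_OFFICIAL_EXAMPLES]  # ignore anything past the cap
    if not counted:
        return 0.0
    hits = sum(1 for ex in counted if ex.fooled_model and ex.validated)
    return hits / len(counted)
```

Under this reading, a team whose first 100 examples include 40 model-fooling questions, 30 of which survive validation, would score 0.30.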

Validation Instructions

Questions must have only one valid answer in the passage.
The shortest span which correctly answers the question is selected.
Questions can be correctly answered from a span in the passage and DO NOT require a Yes or No answer.
Questions can be answered from the content of the passage and DO NOT rely on expert external knowledge but can rely on commonsense knowledge (e.g. knowing that the sky is blue).
DO NOT ask questions about the passage structure such as "What is the third word in the passage?".
There should be NO duplicate (or very similar) model-fooling questions. To ensure this, we require that all your model-fooling examples for the same passage have different answers.
You MUST provide an explanation for every model-fooling example. Explanations are not required for other examples.
If the interface suggests that you didn't fool the model, but you actually consider the model to be fooled, please prepend the text MODELFOOLED to your explanation and then provide the explanation for why the model is fooled as normal.

Valid examples should be questions about the content of the passage (not its structure) that the model answers incorrectly and a sufficiently well-trained human answers correctly.

These instructions may be updated by the workshop organisers prior to the start of the Track 1 Example Creation window.
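Two of the rules above are mechanical enough to check before submitting: every model-fooling example needs an explanation, and model-fooling examples for the same passage must have different answers. The sketch below is an informal self-check, not the organisers' validation procedure. The dictionary keys (`id`, `passage_id`, `answer`, `model_fooled`, `explanation`) are assumed names chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List

FOOLED_PREFIX = "MODELFOOLED"  # override marker named in the instructions


def counts_as_fooled(example: Dict) -> bool:
    """True if the interface flagged the model as fooled, or the annotator
    overrode the interface by prefixing the explanation with MODELFOOLED."""
    explanation = example.get("explanation") or ""
    return bool(example.get("model_fooled")) or explanation.startswith(FOOLED_PREFIX)


def find_rule_violations(examples: List[Dict]) -> List[str]:
    """Return human-readable problems for model-fooling examples only."""
    problems = []
    answers_seen = defaultdict(set)  # passage_id -> normalised answers
    for ex in examples:
        if not counts_as_fooled(ex):
            continue  # the two rules checked here apply to model-fooling examples
        explanation = ex.get("explanation") or ""
        if explanation.startswith(FOOLED_PREFIX):
            explanation = explanation[len(FOOLED_PREFIX):]  # drop the marker
        if not explanation.strip():
            problems.append(f"{ex['id']}: missing explanation")
        # Compare answers case-insensitively with whitespace collapsed.
        answer = " ".join(ex["answer"].lower().split())
        if answer in answers_seen[ex["passage_id"]]:
            problems.append(f"{ex['id']}: repeats answer for passage {ex['passage_id']}")
        answers_seen[ex["passage_id"]].add(answer)
    return problems
```

The answer normalisation is deliberately crude. It catches exact repeats but not paraphrased or overlapping spans, which a human validator may still treat as duplicates.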

Track 2: Better Training Data

In this data-centric track, participants will submit 10,000 training examples (in SQuAD v1.1 JSON format, see https://huggingface.co/datasets/adversarial_qa#dataset-structure). These examples can be selected from existing datasets, expert-annotated, crowdsourced, or synthetically generated. The workshop organisers will then train ELECTRA-Large models and evaluate them on the data collected in Track 1. The team with the highest word-overlap F1 score on the test set will be considered the winner.

To facilitate participation in this task, we make a variety of resources available including datasets, question generator models, and general tools and utilities at https://github.com/dadcworkshop/shared-task-resources.

Participation Instructions

Each team must submit 10,000 training examples (in SQuAD v1.1 JSON format, see https://huggingface.co/datasets/adversarial_qa#dataset-structure). Submissions in the wrong format will be disqualified.
It is the team’s responsibility to ensure that all question IDs are unique.
Data can be self-annotated, selected from existing datasets, synthetically generated, crowdsourced, etc. Please also refer to the DADC Shared Task resources.
The submitted data cannot include passages from the test set. These will be made available by the start of the Track 1 Example Creation window. Any examples for which we find significant overlap will be removed.
The workshop organisers will then train five (5) ELECTRA-Large models on your provided training dataset using the default 🤗 HuggingFace hyper-parameters and different random seeds (these will not be disclosed in advance).
We will create a validation set and a test set from the Track 1 Dataset. The validation set will be used to select the model checkpoint, and the test set to evaluate the models.
We will evaluate word-overlap F1 score of each model on the Track 1 Test Dataset.
Your official result will be the median score of the three best-performing models (of the five trained).
The team with the highest word-overlap F1 score on the test set will be considered the winner.
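To make the scoring steps above concrete, here is a minimal sketch of word-overlap F1 and of the "median of the best three of five" aggregation. The F1 function follows the normalisation commonly used for SQuAD v1.1 evaluation (lowercasing, stripping punctuation and articles). Whether the organisers use exactly this normalisation is an assumption, and the function names are illustrative.

```python
import re
import string
from collections import Counter
from statistics import median
from typing import List

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between one predicted and one reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def example_f1(prediction: str, ground_truths: List[str]) -> float:
    """Score against several references by taking the best match."""
    return max(f1_score(prediction, gt) for gt in ground_truths)


def official_result(model_scores: List[float], keep: int = 3) -> float:
    """Median of the `keep` highest scores among the trained models."""
    best = sorted(model_scores, reverse=True)[:keep]
    return median(best)
```

With five models and `keep=3`, the median of the top three is simply the second-highest score, so one unusually lucky or unlucky seed has limited influence on the official result.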

Submission Instructions

For Track 2 participation, upload your 10,000 training example dataset at https://forms.gle/icJ5cijHtd9b4UmL6
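Since submissions in the wrong format are disqualified, a quick local check before uploading may be worthwhile. The sketch below assumes the nested SQuAD v1.1 file layout (`data` → `paragraphs` → `qas` → `answers`). The linked dataset card appears to show a flattened per-example view instead, so confirm with the organisers which layout they expect. The function names are illustrative, not part of any official tooling.

```python
import json
from typing import Dict, List

EXPECTED_EXAMPLES = 10_000  # from the Track 2 rules


def check_squad_dataset(dataset: Dict, expected: int = EXPECTED_EXAMPLES) -> List[str]:
    """Return a list of problems found in a nested SQuAD v1.1-style dict."""
    problems = []
    seen_ids = set()
    total = 0
    for article in dataset.get("data", []):
        for paragraph in article.get("paragraphs", []):
            context = paragraph["context"]
            for qa in paragraph.get("qas", []):
                total += 1
                qid = qa["id"]
                if qid in seen_ids:
                    problems.append(f"duplicate id: {qid}")
                seen_ids.add(qid)
                for ans in qa["answers"]:
                    start, text = ans["answer_start"], ans["text"]
                    # The answer text should appear in the context at answer_start.
                    if context[start:start + len(text)] != text:
                        problems.append(f"{qid}: answer span does not match context")
    if total != expected:
        problems.append(f"expected {expected} examples, found {total}")
    return problems


def check_squad_file(path: str) -> List[str]:
    """Load a JSON file from `path` and run the checks above."""
    with open(path, encoding="utf-8") as handle:
        return check_squad_dataset(json.load(handle))
```

An empty list means none of these particular checks failed. It does not guarantee that the submission will be accepted, and it does not test for overlap with the test-set passages.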

Important Dates


May 22, 2022	Team Registration Deadline
May 2 - 22, 2022	Official Example Creation Window for Track 1
June 13, 2022	Track 2 Submission Deadline
June 13, 2022	System Description Paper (Optional) Submission Deadline
June 17, 2022	System Description Paper Notification of Acceptance
June 27, 2022	System Description Paper Camera-Ready Deadline
July 14, 2022	Workshop Dates & Result Announcement 🏆