The First Workshop on Dynamic Adversarial Data Collection (DADC) at NAACL 2022 in Seattle, Washington.
The DADC Workshop '22 will be held on 14 July 2022 and is co-located with NAACL at the Hyatt Regency in Seattle, Washington. Along with the announcement of our shared task results, we have a fantastic lineup of keynote talks and a diverse and controversial panel discussion planned. It's going to be a great event; see you there!
The workshop will be held in room 708 Sol Duc. For further details, see the Underline link.
09:00 – 09:10: Opening remarks
09:10 – 09:45: Invited Talk 1: Anna Rogers
09:45 – 10:20: Invited Talk 2: Jordan Boyd-Graber
10:20 – 10:35: Collaborative Progress: MLCommons Introduction
10:35 – 10:50: Coffee Break
10:50 – 11:10: Best Paper Talk: Margaret Li and Julian Michael
11:10 – 11:45: Invited Talk 3: Sam Bowman
11:45 – 12:20: Invited Talk 4: Sherry Tongshuang Wu
12:20 – 13:20: Lunch
13:20 – 13:55: Invited Talk 5: Lora Aroyo
13:55 – 14:55: Panel on The Future of Data Collection moderated by Adina Williams. Panelists: Anna Rogers, Jordan Boyd-Graber, Sam Bowman, Sherry Tongshuang Wu, Lora Aroyo, Douwe Kiela & Swabha Swayamdipta.
14:55 – 15:10: Coffee Break
15:10 – 15:20: Shared Task Introduction: Max Bartolo
15:20 – 15:30: Shared Task Presentations: Team Fireworks
15:30 – 15:40: Shared Task Presentations: Team Longhorns
15:40 – 15:50: Shared Task Presentations: Team Supersamplers
15:50 – 16:50: Poster Session
16:50 – 17:00: Closing Remarks
18:30 – 21:30: The DADC Social Event
This talk provides an overview of the current landscape of resources for Question Answering and Reading Comprehension, highlighting current lacunae to be addressed in future work. I will also present a new taxonomy of "skills" targeted by QA/RC datasets and discuss various ways in which questions may be unanswerable.
Dynamic and/or adversarial data collection can be quite useful as a way of collecting training data for machine-learning models, identifying the conditions under which these models fail, and conducting online head-to-head comparisons between models. However, it is essentially impossible to use these practices to build usable static benchmark datasets for evaluating or comparing future models. I defend this position with a mix of conceptual and empirical arguments, focusing on the claims (i) that adversarial data collection can skew the distribution of phenomena in ways that make the data unrepresentative of the intended task, and (ii) that adversarial data collection can arbitrarily shift the rankings of models on the resulting test sets to disfavor systems that are qualitatively similar to the current state of the art.
I'll discuss two examples of our work putting experienced writers in front of a retrieval-driven adversarial authoring system: question writing and fact-checking. For question answering, we develop a retrieval-based adversarial authoring platform and create incentives to get people to use our system in the first place, write interesting questions humans can answer, and challenge a QA system. While the best humans lose to computer QA systems on normal questions, computers struggle to answer our adversarial questions. We then turn to fact-checking, creating a new game (Fool Me Twice) to solicit difficult-to-verify claims, which can be either true or false, and to test how difficult the claims are for both humans and computers. We argue that the focus on retrieval is important for knowledge-based adversarial examples because it highlights diverse information, prevents frustration in authors, and takes advantage of users' expertise.
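As a rough illustration of the retrieval side of such an authoring setup (not the actual platform described in this talk), the sketch below assumes a tiny passage collection and the open-source rank_bm25 package to surface the evidence a draft question currently points to:

```python
# Minimal sketch: surface supporting passages to a question author with BM25
# retrieval, so they can see what evidence a QA system is likely to rely on.
# The toy corpus and the draft question are illustrative placeholders only.
from rank_bm25 import BM25Okapi

passages = [
    "The Treaty of Tordesillas divided newly discovered lands between Spain and Portugal in 1494.",
    "The Treaty of Versailles formally ended the First World War in 1919.",
    "The Louisiana Purchase transferred territory from France to the United States in 1803.",
]

# BM25Okapi expects a tokenized corpus (here, simple whitespace tokenization).
bm25 = BM25Okapi([p.lower().split() for p in passages])

draft_question = "Which 1494 treaty split the New World between two Iberian powers?"
top_evidence = bm25.get_top_n(draft_question.lower().split(), passages, n=1)

# Show the author the passage their draft question currently points to, so
# they can revise it to stay answerable by humans while challenging the model.
print(top_evidence[0])
```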
The efficacy of machine learning (ML) models depends on both algorithms and data. Training data defines what we want our models to learn, and testing data provides the means by which their empirical progress is measured. Benchmark datasets define the entire world within which models exist and operate, yet research continues to focus on critiquing and improving the algorithmic aspect of the models rather than the data with which our models operate. If "data is the new oil," we are still missing work on the refineries by which the data itself could be optimized for more effective use. In this talk, I will discuss data excellence and lessons learned from software engineering for bringing the same care and rigor to assessing data quality.
Assistive models have been shown to be useful for supporting humans in creating challenging datasets, but how exactly do they help? In this talk, I will discuss the different roles of assistive models in counterfactual data collection (i.e., perturbing existing text inputs to gain insight into task model decision boundaries) and the characteristics associated with these roles. I will use three examples (CheckList, Polyjuice, Tailor) to demonstrate how our objectives shift when we perturb texts for evaluation, explanation, and improvement, and how that shift changes the corresponding assistive models from enhancing human goals (requiring model controllability) to competing with human bias (requiring careful data reranking). I will conclude by exploring additional roles that these models can play to become more effective.
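As a loose illustration of the perturb-and-compare idea behind counterfactual data collection (not the speaker's actual tooling), the sketch below assumes a generic Hugging Face sentiment pipeline as the task model and a hand-written edit standing in for an assistive model such as Polyjuice:

```python
# Minimal sketch of counterfactual perturbation: minimally edit an input and
# compare model predictions to probe the task model's decision boundary.
# A generic Hugging Face sentiment pipeline is a stand-in task model; the
# hand-written perturbation is a placeholder for an assistive model's output.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

original = "The acting was brilliant and the plot kept me hooked."
perturbed = "The acting was brilliant but the plot lost me halfway through."

for text in (original, perturbed):
    pred = classifier(text)[0]
    print(f"{pred['label']:>8} ({pred['score']:.2f})  {text}")

# If a meaning-changing edit fails to flip the label (or a meaning-preserving
# edit flips it), the pair is an informative counterfactual example.
```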
The DADC Shared Task this year will focus on the Extractive Question Answering (QA) task. We have tracks focusing on better annotators, better training data and better models.
Specific details and a call for participation can be found here.
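For readers new to the task format, the sketch below illustrates the extractive QA setting, in which a model must predict an answer span from a given passage. It assumes a generic Hugging Face question-answering pipeline and an illustrative passage, not the shared task models or data:

```python
# Minimal sketch of extractive QA: given a passage and a question, the model
# predicts an answer span (text plus character offsets) from the passage.
# The pipeline below is a generic stand-in, not a shared task submission.
from transformers import pipeline

qa_model = pipeline("question-answering")

context = (
    "The First Workshop on Dynamic Adversarial Data Collection (DADC) "
    "was co-located with NAACL 2022 in Seattle, Washington."
)
question = "Where was the DADC workshop held?"

result = qa_model(question=question, context=context)
print(result["answer"], result["start"], result["end"], result["score"])
```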
Please join our mailing list to get important updates. If you have any questions, please contact dadc-workshop@googlegroups.com.
We would like to thank our generous sponsors for their help and support.