DCH-2: Training Data for the DialEval-2 Task

For details of the NTCIR-16 Dialogue Evaluation Task (DialEval-2), see here.

Overview of the DCH-2 Dataset

Training Data

The Chinese training dataset contains 4,390 customer-helpdesk dialogues (4,090 for training + 300 for dev) crawled from Weibo. All of these dialogues were annotated by 19 annotators.

In DCH-2, the translation of all the Chinese dialogues is complete, so the English dataset also contains 4,390 dialogues. All the English dialogues were manually translated from the Chinese dataset, and the English dataset shares the same annotations as the Chinese dataset.

Training set
- dch2_train_cn.json (4,090 Chinese dialogues: 3,700 as training set + 390 as dev set at DialEval-1)
- dch2_train_en.json (4,090 English dialogues: 2,251 as training set + 390 as dev set at DialEval-1 + 1,449 newly translated)
Dev set
- dch2_dev_cn.json (300 dialogues, used as test set at DialEval-1)
- dch2_dev_en.json (300 dialogues, used as test set at DialEval-1)

Test Data

The test data will be released in December 2021, according to the task schedule.

Annotation

We hired 19 Chinese students to annotate the training/dev dataset in 2018. In 2019, the test dataset of DialEval-1 was annotated by another group of annotators. Thus, there may be a gap between the training data and the test data, because dialogue annotation is quite subjective.

Format of the JSON file

Each file is in JSON format with UTF-8 encoding.

Following are the top-level fields:

id
turns: array of turns from the customer and the helpdesk (see details below)
annotations: a list of annotations provided by the 19 annotators. Each annotation consists of two fields: nugget and quality.

Each element of the turns field contains the following fields:

sender: the speaker of this turn (either customer or helpdesk)
utterances: the utterances (possibly more than one) sent in this turn. Note that some utterances are empty strings because we didn’t crawl emoji and photos.

Each element of annotations contains the following fields:

nugget: The list of nugget types for each turn (see details below).
quality: A dictionary of the subjective dialogue quality scores: A-score, S-score, and E-score (see details below).
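
As a minimal sketch of how the fields above fit together, the Python snippet below parses a small inline sample and summarizes each dialogue. The sample dialogue and its values are invented for illustration. It also assumes that each file holds a top-level JSON array of dialogues, that nugget holds one label per turn, and that the quality keys are named A, S, and E; check these against the released files.

```python
import json

# Hypothetical sample in the layout described above; all values are invented.
SAMPLE = """
[
  {
    "id": "example-0001",
    "turns": [
      {"sender": "customer", "utterances": ["My phone will not charge.", ""]},
      {"sender": "helpdesk", "utterances": ["Please try another cable."]}
    ],
    "annotations": [
      {"nugget": ["CNUG0", "HNUG"], "quality": {"A": 1, "S": 0, "E": 1}},
      {"nugget": ["CNUG0", "HNUG*"], "quality": {"A": 2, "S": 1, "E": 1}}
    ]
  }
]
"""


def summarize(dialogue):
    """Return (id, number of turns, non-empty utterances, number of annotations)."""
    turns = dialogue["turns"]
    # Empty strings stand for emoji or photos that were not crawled, so skip them.
    non_empty = sum(1 for t in turns for u in t["utterances"] if u.strip())
    return dialogue["id"], len(turns), non_empty, len(dialogue["annotations"])


# For a real file, one could instead open it with encoding="utf-8" and call json.load.
dialogues = json.loads(SAMPLE)
for d in dialogues:
    print(summarize(d))
```

With the sample above, the single dialogue has two turns, two non-empty utterances, and two annotations.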

Nugget Types

CNUG0: Customer trigger (problem stated)
CNUG*: Customer goal (solution confirmed)
HNUG*: Helpdesk goal (solution stated)
CNUG: Customer regular nugget
HNUG: Helpdesk regular nugget
CNaN: Customer Not-a-Nugget
HNaN: Helpdesk Not-a-Nugget
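
Since every label starts with C (customer) or H (helpdesk), one possible sanity check is to confirm that each turn's label matches its sender. The sketch below assumes one label per turn and that the labels are stored exactly as spelled in the list above; the helper name mismatched_turns is hypothetical.

```python
# Nugget labels from the list above, grouped by the speaker they apply to.
CUSTOMER_NUGGETS = {"CNUG0", "CNUG*", "CNUG", "CNaN"}
HELPDESK_NUGGETS = {"HNUG*", "HNUG", "HNaN"}


def mismatched_turns(senders, nuggets):
    """Return the indices of turns whose nugget label does not fit the sender."""
    allowed = {"customer": CUSTOMER_NUGGETS, "helpdesk": HELPDESK_NUGGETS}
    return [
        i
        for i, (sender, label) in enumerate(zip(senders, nuggets))
        if label not in allowed.get(sender, set())
    ]


senders = ["customer", "helpdesk", "customer"]
print(mismatched_turns(senders, ["CNUG0", "HNUG*", "CNUG*"]))  # labels fit the senders
print(mismatched_turns(senders, ["CNUG0", "CNUG", "HNaN"]))    # turns 1 and 2 do not fit
```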

Dialogue Quality

A-score: Task Accomplishment (Has the problem been solved? To what extent?)
S-score: Customer Satisfaction of the dialogue (not of the product/service or the company)
E-score: Dialogue Effectiveness (Do the speakers interact effectively to solve the problem efficiently?)

Scale: [2, 1, 0, -1, -2]
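
Because each dialogue carries scores from several annotators, one possible way to summarize them is a per-score distribution over the scale. The sketch below is only an illustration of that idea, not the task's official procedure; the quality keys A, S, and E and the four sample annotations are assumptions.

```python
from collections import Counter

SCALE = [2, 1, 0, -1, -2]


def score_distribution(scores):
    """Return the fraction of annotators choosing each value, in SCALE order."""
    if not scores:
        raise ValueError("at least one score is required")
    counts = Counter(scores)
    return [counts[value] / len(scores) for value in SCALE]


def quality_distributions(annotations):
    """Map each quality key to its score distribution across annotators."""
    keys = annotations[0]["quality"].keys()
    return {
        key: score_distribution([a["quality"][key] for a in annotations])
        for key in keys
    }


# Hypothetical annotations from four annotators (values invented).
annotations = [
    {"quality": {"A": 2, "S": 1, "E": 1}},
    {"quality": {"A": 1, "S": 1, "E": 0}},
    {"quality": {"A": 2, "S": 0, "E": 1}},
    {"quality": {"A": 2, "S": -1, "E": 1}},
]
for key, dist in quality_distributions(annotations).items():
    print(key, dist)
```

Keeping the full distribution, rather than a single averaged score, preserves the disagreement between annotators.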

Note

To protect privacy, some sensitive information in the dialogue data has been masked. For example, we replaced telephone numbers with 123456789 and email addresses with XXX@YYY.com.

Access to the dataset

To obtain the dataset, please fill in and submit our user agreement form.

Conditions and Terms

See here.

Have questions?

Please contact: dialeval2org@list.waseda.jp