
Natural language inference data for AI models
Description
Book Introduction
Humans require a variety of intellectual abilities to listen to, understand, and respond to others.
Our everyday conversations rest on a vast accumulation of linguistic and extralinguistic knowledge, and inference supplies the countless logical puzzle pieces that make this complex intellectual processing possible.
This book begins with a reflection on why AI language models, which now perform remarkably well in human-like conversation, still fail to demonstrate human-like reasoning in many areas.
There is no disagreement that the inference ability of AI models improves when reliable training data is provided, but much remains to be discovered about how such 'Natural Language Inference (NLI)' data should be structured.
This book surveys research trends in the NLI datasets proposed for training the natural language inference ability of AI language models, and examines which characteristics of natural language must be considered to overcome the limitations currently being pointed out.
To design an NLI dataset specifically for Korean, it also discusses in depth which linguistic properties unique to Korean should be described.
This book is divided into three chapters:
Chapter 1 examines current research trends in building natural language inference datasets. Chapter 2 discusses the linguistic phenomena deemed important for building such datasets, based on the syntactic and semantic properties of Korean and categorized into 78 types.
Finally, Chapter 3 introduces the proposed natural language inference schema KOLINS and the Korean inference dataset KOLIN (version V_1.0), constructed on the basis of these type-specific attributes, and evaluates the dataset's performance.
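The 78-type inventory can be pictured as a small lookup table. The following is a minimal Python sketch, not the book's own tooling: the four code prefixes (A, P, M, L) and the number of types under each are read off the table of contents, while the dictionary layout and the function name `type_codes` are assumptions made purely for illustration.

```python
# Number of types per code prefix, as listed in the book's table of contents.
# The dictionary layout and the function name are illustrative assumptions.
TYPE_COUNTS = {
    "A": 23,  # Argument Transformation Schema (A01-A23)
    "P": 15,  # Predicate Transformation Schema (P01-P15)
    "M": 21,  # schema of section III (M01-M21)
    "L": 19,  # Vocabulary and Knowledge Transformation Schema (L01-L19)
}


def type_codes(counts=TYPE_COUNTS):
    """Return every type code (e.g. 'A01') in table-of-contents order."""
    return [
        f"{prefix}{index:02d}"
        for prefix, total in counts.items()
        for index in range(1, total + 1)
    ]


codes = type_codes()
print(len(codes))           # 78
print(codes[0], codes[-1])  # A01 L19
```

Enumerating the codes this way gives a quick check that the four groups add up to the 78 types mentioned in the description.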
This book is intended for developers and researchers who wish to build AI models for various Korean-specific tasks, for data linguistics researchers interested in constructing language data for natural language understanding, and for researchers in theoretical linguistics and Korean language studies who wish to study the lexical, syntactic, and semantic properties involved in inferential relations in Korean.
It began as a text based on university lectures and research, but it has also come to serve as a practical schema for constructing training datasets for fine-tuning language models.
By reflecting on the types of Korean linguistic properties classified and proposed in this study, it is expected that the syntactic and semantic phenomena that language models are particularly weak at understanding can be identified, and that data augmentation tailored to those weaknesses will become possible.
We hope this will provide an opportunity to appreciate once again why such a 'symbolic approach' must be used alongside other approaches to overcome the limitations of current language models.
Table of Contents
Preface | Author's Preface ii
Book Structure | Table of Contents iv
Chapter 1. Natural Language Inference Dataset Research Trends 1
1 Natural Language Inference 3
1.1 Definition of Natural Language Inference 3
1.2 Natural Language Inference (NLI) and Early Linguistic Considerations 8
2 Natural Language Inference Datasets and Benchmarks 29
2.1 Recognizing Textual Entailment (RTE) Dataset 30
2.2 The Emergence of Large-Scale Natural Language Inference Benchmarks 32
2.3 Study of Datasets Based on Vocabulary, Logic, and Syntax 44
2.4 Study of Natural Language Inference Datasets Based on Common Sense and Context 60
2.5 Study of Domestic Natural Language Inference Learning Datasets 79
3 Approaches to Building Natural Language Inference Datasets 86
3.1 Web Document-Based Premise and Crowdworker Hypothesis 86
3.2 Designing a Dataset Considering Linguistic Features 88
Chapter 2. A Study on Korean Inference Data Based on Linguistic Properties 91
I. Argument Transformation Schema 97
1 A01 Intersection of Subject and 'N-and' Argument 100
2 A02 Intersection of Object and 'N-and' Argument 104
3 A03 Intersection of Subject and 'N-to/to' Argument 106
4 A04 Intersection of Object and 'N-to/to' Argument 109
5 A05 Intersection of Object and Subject 110
6 A06 Transformation of the genitive case into a nominative case argument 114
7 A07 Transformation of the accusative case into an objective case argument 116
8 A08 Transformation of the nominative case of an embedded sentence into the objective case of the main clause 118
9 A09 Transformation of the nominative case into an unmarked argument 120
10 A10 Transformation of the objective case into an unmarked argument 122
11 A11 Postpositional Variations of the Adverbial 125
12 A12 Deletion of the nominative argument 130
13 A13 Deletion of argument in nominative case 132
14 A14 Deletion of objective argument 141
15 A15 Deletion of the argument in the objective case 142
16 A16 Argument deletion in homonymous argument structure 147
17 A17 Deletion of the adverbial argument 149
18 A18 Deletion of the argument of the nominal pronoun 151
19 A19 AND Coordination of Noun Phrases 155
20 A20 OR Coordination of Noun Phrases 158
21 A21 External Coordination of Nominative Arguments 161
22 A22 Objective Argument's External Coordination 163
23 A23 External Coordination of the Adverbial Argument 165
II. Predicate Transformation Schema 167
24 P01 Negation of Verb Phrase Predicates 170
25 P02 Negation of Adjective Phrase Predicate 172
26 P03 Negation of Noun Phrase Predicates 176
27 P04 Double Negative Predicate Sentence 178
28 P05 Passive sentence transformation of predicate 184
29 P06 Tense transformation of predicates 188
30 P07 Periphrastic construction of predicates 190
31 P08 Complementary clause transformation of predicate 194
32 P09 Nominalization of embedded predicates 196
33 P10 Adverbialization of Adjective Predicates 199
34 P11 Deletion of implicative verbs 203
35 P12 Deletion of factive verbs 210
36 P13 Deletion of the verb of service 215
37 P14 AND Coordination of Predicates 218
38 P15 OR Coordination of Predicates 221
III. Modifier Component Transformation Schema 227
39 M01 Variation of Quantifiers/Time Expressions 230
40 M02 Upward-Monotone Existential Quantifier Variation 235
41 M03 Downward-Monotone Universal Quantifier Variation 237
42 M04 Non-monotonic variation of the adjective 239
43 M05 Cross-transformation of two adjectives 240
44 M06 Positional variation of adjectives 243
45 M07 Deletion variation of adjectives 245
46 M08 AND Coordination of Adjectives 248
47 M09 OR Coordination of Adjectives 252
48 M10 Subject Nominative Relative Clause Transformation 258
49 M11 Subject Non-Nominative Relative Clause Transformation 263
50 M12 Nominative Relative Clause Transformation of Non-Subject Arguments 266
51 M13 Non-subject argument non-nominative relative clause transformation 269
52 M14 Variation of sentences containing conditional adverbial clauses 272
53 M15 Variation of sentences containing concessive adverbial clauses 274
54 M16 Transformation of sentences containing purpose adverbial clauses 276
55 M17 Transformation of sentences containing causal adverbial clauses 278
56 M18 Transformation of sentences containing adverbial clauses of time 280
57 M19 Variation of sentences containing descriptive adverbial clauses 283
58 M20 Conformity/Attitude Expression Sentence Adverbial Transformation 287
59 M21 Variations of Adverbs of Uncertainty 289
IV. Vocabulary and Knowledge Transformation Schema 293
60 L01 Synonymous Vocabulary Transformation of Nouns 296
61 L02 Synonymous vocabulary transformations in the non-noun category 298
62 L03 Antonymous Vocabulary Transformation of Nouns 301
63 L04 Antonymous Vocabulary Variation in the Non-Noun Category 304
64 L05 Metaphorical and Idiomatic Synonyms 308
65 L06 Derivative transformation by negative prefix 310
66 L07 Noun Hypernym and Hyponym Lexical Transformation 313
67 L08 Hypernym and Hyponym Lexical Variation in the Non-Noun Category 316
68 L09 Noun Partial/Whole Word Lexical Transformation 319
69 L10 Metonymic lexical transformation of nouns 322
70 L11 Transformation based on cultural and religious knowledge 325
71 L12 Variations based on geographical knowledge 327
72 L13 Variations based on historical knowledge 329
73 L14 Transformation based on artistic knowledge 331
74 L15 Transformation based on legal and social knowledge 334
75 L16 Transformation based on economic and sports knowledge 336
76 L17 Modification based on mathematical knowledge 338
77 L18 Modification based on scientific and medical knowledge 340
78 L19 Variation based on general knowledge 342
Chapter 3. KOLINS Schema & KOLIN Dataset 345
1 KOLINS Korean Inference Data Schema 347
2 Building the KOLIN Korean Language Inference Dataset 353
3 KOLIN dataset performance evaluation 364
References 371
Product Details
- Date of issue: January 20, 2025
- Page count and size: 386 pages | 148 × 210 × 30 mm
- ISBN13: 9791198899156
- ISBN10: 1198899158