Analogical generalisation processes in language acquisition

Project details

ID

61735

Type

Principal investigator

Babarczy Anna

Hungarian title

Analógikus általánosítási folyamatok a gyereknyelvben

English title

Analogical generalisation processes in language acquisition

Hungarian keywords

language acquisition, lexicon, syntax, statistical learning, overgeneralisation

English keywords

language acquisition, lexicon, syntax, statistical learning, overgeneralisation

Declared classification

Linguistics (College of Humanities and Social Sciences)	60%
Psychology (College of Humanities and Social Sciences)	30%
Computer Science (College of Engineering and Natural Sciences)	10%

Review panel

Linguistics

Host institution

Department of Cognitive Science (Budapest University of Technology and Economics)

Project start

2006-02-01

Project end

2009-05-31

Current funding (million HUF)

3.276

FTE (research-year equivalent)

0.70

Status

Completed project

Summary (Hungarian version)

With the rise of generative linguistic theory, a central question became how the mental grammar that generates exactly the set of sentences of the native language being acquired, no more and no less, can develop in the human mind. Although this formulation presupposes a linguistic theory that describes the mental grammar as a system of rules, the question arises equally if language is characterised as a set of constructions: how can exactly the set of constructions of the target language be acquired? Within this area, the research plan addresses the problem of “more”: which errors indicating overgeneralisation are characteristic of child language, and what exactly is the mechanism that narrows a mental grammar that is more permissive than it should be? The starting point of the research is the hypothesis that a probability-based, input-driven statistical learning mechanism plays a fundamental role in language acquisition. The research uses two tools. Our first aim is a detailed analysis of a Hungarian child language corpus, to confirm and extend the empirical results reported in international child language research. The second phase of the research is the theoretical elaboration and computational modelling of the learning mechanism in the light of these empirical results.

Summary (English version)

A central question of the generative paradigm in theoretical linguistics is how a mental grammar develops in the human mind that generates precisely the set of acceptable sentences of the target language, neither more nor less. The problem arises not only for linguistic theories that characterise grammar as a system of rules but also for those that view language as a set of construction templates: how does the child acquire precisely the target set of templates? Within this area, the proposal addresses the problem of “more”: what kinds of overgeneralisation errors are characteristic of child language, and what is the nature of the mechanism that allows this overgeneral mental grammar to be restricted? The proposed research builds on the hypothesis that at least some aspects of language acquisition are driven by a probabilistic, input-based statistical learning mechanism. We plan to pursue these aims with two research methods. First, we intend to analyse a Hungarian child language corpus in line with relevant international empirical studies. In the second stage of the project, a theoretical model of the learning mechanism will be developed and implemented on the basis of our empirical results.

Final report

Research results (Hungarian version)

We investigated the acquisition of lexical knowledge, that is, of the predicate-argument structures permitted by the adult grammar. The research method was to compare statistics obtained from the analysis of child language data with the output of various computational learning mechanisms. Hungarian child language corpora available from the CHILDES database, together with corpora compiled within the project, were analysed for the correctness of the predicate-argument structures they contain, using an annotation scheme developed for the purposes of the research. The analysis yielded a shallow U-shaped curve, suggesting that an initial conservative learning mechanism is replaced by an analogical generalising mechanism that temporarily leads to errors. To simulate the child's language acquisition mechanisms, we built an application for the automatic extraction of argument frames. First, we adapted the statistical machine-learning method developed by Brent to Hungarian. Learning proceeds from an annotated corpus, on the basis of the morphological features of the arguments. Brent's method is a strictly conservative learning algorithm in which argument frames are acquired only on the basis of sufficient positive input, so it did not produce a U-shaped curve comparable to the child language data. In a second step, we modified the learning algorithm so that it no longer excluded the possibility of generalisation and overgeneralisation. This model is closer to the patterns observed in child language, but it requires substantially more input. Narrowing the target grammar improved the results.
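The shallow U-shaped curve can be made concrete with a minimal Python sketch: given counts of correct and total argument frames per age bin, it computes per-bin accuracy and measures how far accuracy dips below the surrounding bins. The `AgeBin` type, the `dip_depth` measure and the counts are hypothetical illustrations; the report does not state how the curve was quantified.

```python
from dataclasses import dataclass


@dataclass
class AgeBin:
    label: str    # hypothetical age-range label, e.g. "2;0-2;5"
    correct: int  # argument frames annotated as correct in this bin
    total: int    # all annotated argument frames in this bin


def accuracies(bins: list[AgeBin]) -> list[float]:
    """Proportion of correct argument frames per age bin."""
    return [b.correct / b.total for b in bins]


def dip_depth(acc: list[float]) -> float:
    """Largest drop of an interior point below the lower of the best accuracy
    before it and the best accuracy after it; 0.0 means no U-shape."""
    best = 0.0
    for i in range(1, len(acc) - 1):
        rim = min(max(acc[:i]), max(acc[i + 1:]))
        best = max(best, rim - acc[i])
    return best


# Illustrative counts only; these are NOT the project's data.
bins = [AgeBin("early", 95, 100), AgeBin("middle", 90, 100), AgeBin("late", 97, 100)]
depth = dip_depth(accuracies(bins))
```

A small positive depth corresponds to a shallow dip, while a curve that only rises gives 0.0. Taking the lower of the two rims means a dip only counts if accuracy recovers afterwards, which is what distinguishes a U-shape from a simple decline.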

Research results (English version)

We investigated children’s acquisition of predicate-argument structures. Our method compared the results of a statistical analysis of child language corpora with the output of various machine-learning algorithms. A Hungarian child language corpus was built from newly collected data and from data available in the CHILDES database. The corpus was annotated using a grammar developed for the project, and the argument frames produced by the children were analysed for accuracy. The results showed a shallow U-shaped curve, suggesting that an initial conservative learning strategy was followed by an analogical generalisation mechanism, which resulted in a temporary dip in performance. The mechanisms of child learning were modelled by a series of computational models of argument frame acquisition. Model 1 used Brent’s statistical learning algorithm adapted to Hungarian. The learning mechanism relied on morphological cues extracted from a pre-annotated corpus. The model used a strictly conservative learning algorithm, in which argument frames were added to the lexicon only if sufficient positive evidence was found. Model 1 failed to produce a U-shaped learning curve. Model 2 used a less conservative learning algorithm that allowed generalisation and, thus, overgeneralisation. Its output was closer to the patterns observed in child language, but the system required substantially more input. The model’s performance was improved by narrowing the target grammar.
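Brent's filter is usually described as a binomial hypothesis test: a frame is attributed to a verb only if the number of times the verb co-occurs with that frame's cue would be unlikely if every such co-occurrence were a cue error. The minimal Python sketch below illustrates that conservative decision rule, assuming this binomial formulation. The frame labels (named after hypothetical case-marked arguments), the counts, the cue error rate and the significance level `alpha` are illustrative assumptions, not values from the project.

```python
from math import comb


def binomial_tail(m: int, n: int, p: float) -> float:
    """Probability of seeing m or more cue hits in n verb tokens
    if each token shows the cue by mistake with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(m, n + 1))


def accept_frame(cue_hits: int, verb_tokens: int,
                 error_rate: float, alpha: float = 0.02) -> bool:
    """Conservative decision: add the frame to the verb's lexical entry only
    when the observed cue count is too high to be explained by noise alone."""
    if verb_tokens == 0 or cue_hits == 0:
        return False  # no positive evidence, so nothing is learned
    return binomial_tail(cue_hits, verb_tokens, error_rate) < alpha


# Hypothetical counts for one verb: how often each frame cue co-occurred with it.
cue_counts = {"NOM_ACC": 5, "NOM_DAT": 1}
verb_tokens = 20
error_rate = 0.05  # assumed cue error rate, not a value from the report

lexicon = {frame for frame, hits in cue_counts.items()
           if accept_frame(hits, verb_tokens, error_rate)}
```

With these made-up counts, five hits in twenty tokens pass the test and one hit does not, so only `NOM_ACC` is added. Because the rule only attributes a frame to a verb on that verb's own evidence, it never extends a frame from one verb to another, which is consistent with the report's observation that Model 1 produced no U-shaped curve.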

Full text of the final report

https://www.otka-palyazat.hu/download.php?type=zarobeszamolo&projektid=61735

Decision outcome

Yes

Publications

Babarczy Anna: Számítógépes nyelvészet [Computational linguistics]. In: Kovács–Szamarasz (eds.), Látás, nyelv, emlékezet [Vision, language, memory]. Typotex, 2006

Babarczy Anna, Gábor Bálint, Hamp Gábor, Rung András: Argumentumstruktúrák gépi azonosítása [Automatic identification of argument structures]. In: Alexin–Csendes (eds.), IV. Magyar Számítógépes Nyelvészeti Konferencia Kiadványa [Proceedings of the 4th Hungarian Conference on Computational Linguistics]. Szeged, 2006

Serény András, Simon Eszter, Babarczy Anna: A model of learning verb argument frames in Hungarian. 1st Dubrovnik Conference on Cognitive Science: Language and the Brain, 2009

Fidler Ashley, Babarczy Anna: Expanding Locative Case Marking beyond Spatial Contexts in Child Hungarian. Boston University Conference on Language Development, 2008

Serény András, Simon Eszter, Babarczy Anna: Automatic acquisition of Hungarian subcategorization frames. 9th International Symposium of Hungarian Researchers on Computational Intelligence and Informatics, 2008
