Enhancement of deep learning based semantic representations with acoustic-prosodic features for automatic spoken document summarization and retrieval

Project details

Identifier

124413

Type

Principal investigator

Szaszák György

Hungarian title

Mély tanulás alapú szemantikus reprezentációk kiterjesztése prozódiai-akusztikai jellemzőkkel beszéd automatikus tartalmi kivonatolásában és összefoglalásában

English title

Enhancement of deep learning based semantic representations with acoustic-prosodic features for automatic spoken document summarization and retrieval

Hungarian keywords

mély tanulás, beágyazás, szemantikus térbeli reprezentáció, prozódia, tartalmi kivonatolás

English keywords

deep learning, embedding, semantic space representation, speech prosody, speech summarization

Classification

Computer Science (College of Engineering and Natural Sciences)	60 %
Ortelius classification: Applied informatics
Linguistics (College of Humanities and Social Sciences)	40 %
Ortelius classification: Computational linguistics

Panel

Informatics and Electrical Engineering

Research institution

Department of Telecommunications and Artificial Intelligence (Budapest University of Technology and Economics)

Participants

Beke András
Gosztolya Gábor
Kiss Gábor
Makrai Márton
Sztahó Dávid
Vicsi Klára

Project start

2017-12-01

Project end

2023-05-31

Current funding (million HUF)

34.873

FTE (full-time equivalent in researcher-years)

10.11

Status

Closed project

Hungarian summary

Summary of the research and its aims for experts
Describe here the major aims of the research for an expert in the field.
The research plans to explore new possibilities and technologies in natural language processing and speech technology, in particular in the machine extraction and summarization of speech. Using the deep learning toolkit, we plan to develop new models with a high capacity for abstraction, which can uncover more complex semantic and structural relationships and representations. In parallel, we identify and improve the weak points of the current state-of-the-art processing chain: in the speech processing front-end, we develop a tokenization procedure based on a coherent prosodic analysis that does not rely on individual markers alone; in connection with this, we automatically restore the punctuation marks missing from the speech recognizer output, thereby supporting the analysis built on it; and we extend the modelling to represent stress, thereby identifying the words of higher information value for summarization. In text processing, we investigate novel embedding architectures and sequence-to-sequence mappings with convolutional and recurrent neural networks and their combinations. We unify and merge the abstract representations built for the verbal (syntactic and semantic relations) and non-verbal (stress, prosodic phrasing) components, creating a unique unified model that handles and integrates both information sources (verbal and non-verbal) on the same principle, in encoder-based neural networks (and in their projection layers). We also evaluate the newly developed methods in automatic speech extraction and summarization, using objective and subjective methods.

What is the major research question?
Describe here briefly the problem to be solved by the research, the starting hypothesis, and the questions addressed by the experiments.
The research targets the narrowest bottlenecks of automatic speech summarization, the points where processing loses information, in order to significantly improve the performance of speech extraction and summarization with new methods. The research is built around 10 central hypotheses, which are explained in detail in the research plan:
(1) We extract the information structure conveyed in speech by prosodically "decoding" entire utterance segments.
(2) We also build this prosodic model in an abstract implementation with an encoder-decoder neural network (essentially implementing a prosodic embedding).
(3) We develop automatic punctuation restoration for the output of automatic speech recognizers (based on 1 and 2).
(4) We also test hypotheses (1-3) on spontaneous speech.
(5) We introduce semantic-space mappings implemented with word embeddings into the thematic term calculation process of the automatic speech summarization system.
(6) We investigate embeddings obtained by separating word stems and suffix sequences, and their possible uses in automatic summarization.
(7) We investigate the fusion of the embeddings built under (6), via concatenation, parallel training or ensembling.
(8) We evaluate hypotheses (5-7) in automatic speech extraction (on news material and also on spontaneous narratives and dialogues). Evaluation with objective and subjective metrics, and statistical validation of the results.
(9) We combine prosodic information with word embeddings in automatic speech summarization.
(10) Research on sequence-to-sequence summarization and end-to-end system integration.

What is the significance of the research?
Describe briefly the new perspectives that the results open in basic research and the societal applications for which they lay the scientific foundation. Present the unique features and strengths of your proposal compared with domestic and international competitors in the given field.
Speech summarization belongs to the field of machine speech understanding. From a research perspective, our goal is to advance the state of the art significantly in both engineering and linguistic terms. We take into account features derived from both speech and text analysis, which is a known but rarely applied technology. In our view, the reason is that prosody is not treated as an organizing force in its own right and used to uncover a coherent structure that can be interpreted in context. In Hypothesis 1, which is supported by our preliminary results, we develop this novel technique. A further new element in Hypotheses 3 and 4 is that we use this technology for automatic punctuation and for the analysis of spontaneous speech. In Hypothesis 2 we implement it with a deep neural network, since international trends suggest better results for deep learning based approaches to machine learning. In Hypotheses 5-8 we deal with word embeddings; as a novelty, we develop methods that are efficient and less data-hungry for agglutinative languages as well. The analysis of the toolkit in processing spontaneous speech (and its automatic transcripts) is also significant. The combination of prosodic and semantic features formulated in Hypothesis 9 is, to our knowledge, a unique initiative. In Hypothesis 10, following international trends, we organize the modules developed during the research into a rapidly developing, very promising, unified architecture.
In terms of societal benefit, speech summarization is gaining importance because of the surge in the amount of data available only in audio form. Our results can be put to good use in the automatic minuting of meetings, in summarizing, keyword tagging and indexing audio documents, and in data mining applications, meeting a real technological need. These applications are already part of our everyday lives and are expected to become even more important in the future. The possible areas of use go beyond this, however: through metadata extraction, they are equally in demand in recommender systems, personal assistants and question answering systems.

Summary and aims of the research for the public
Describe here the major aims of the research for a lay audience with a general education. This summary is especially important for the NRDI Office in order to inform decision-makers, the media and interested members of the public.
Handling audio recordings is part of our everyday life. Technologies built on artificial intelligence are already in our hands with a smartphone, providing the everyday user with a range of useful tools, from personal assistants that understand several languages, through search in audio recordings, to personalized recommender systems, for example for music. In our research we further develop the technologies underlying these tools and extend their functions, on the basis of artificial neural networks. The Hungarian language receives special attention in our research: because of its agglutinative nature, transferring technologies developed mostly for English requires modification and preparation. Extracting or summarizing the content of audio recordings is at least as useful as doing so for longer written documents, and can be put to good use in both everyday and business life, for example through the automatic minuting of meetings and consultations. Communicating by speech is often more convenient and faster than in writing. Yet we are sometimes forced to write more than we would like, because the procedures for data processing, storage, automatic retrieval and categorization are all text based. The technologies developed in the research make it possible to carry out these tasks by speech, or more precisely to reduce them directly to text analysis, which in many cases is a more efficient, more natural and more complete alternative to text processing.

English summary

Summary of the research and its aims for experts
Describe the major aims of the research for experts.
The proposed research explores new techniques in speech and natural language processing, with a special focus on automatic speech summarization. Besides applying the deep learning paradigm to explore novel, abstract modelling and extractive technologies that can handle sophisticated structural and contextual representations and dependencies, we identify and attack bottleneck points in the existing state-of-the-art speech and text processing pipeline. These involve both the speech processing front-end (tokenization aided by advanced prosodic models, not just raw acoustic-prosodic features; punctuation restoration allowing for more accurate analyses prior to summarization; extending modelling capabilities by detecting prosodic stress and hence retrieving elements with higher informational value) and the text representation (using advanced word-embedding and sequence-to-sequence modelling approaches; combining convolutional and recurrent deep neural networks, with special attention to the very promising and recently emerged adversarial and attention mechanism approaches). We attempt a high-level fusion of verbal (syntax and semantics in speech or speech-to-text) and non-verbal (stress and phrasing information inferred from speech prosody) abstract representations, hence proposing a novel synthetic model capable of capturing both information streams in a uniform, abstract feature representation and model, applying the encoder-decoder deep learning paradigm. We evaluate the proposed approaches and components within an automatic speech summarization system using objective and subjective tests.

What is the major research question?
Describe here briefly the problem to be solved by the research, the starting hypothesis, and the questions addressed by the experiments.
The research attacks bottleneck points (where information is lost in current state-of-the-art) in speech summarization, with the goal of advancing summarization (both extractive and abstractive). The 10 hypotheses explained in detail in the research plan are expected to solve the following issues via scientific research:
(1) By imposing a coherent and consistent prosodic structure onto entire speech utterances, we are able to reconstruct the information structure.
(2) Port this abstract prosodic model onto an encoder-decoder based neural network representation (creating an embedding for feature representation).
(3) Punctuation recovery for the output of automatic speech recognizers.
(4) Evaluate (1-3) on spontaneous speech.
(5) Exploit word embeddings for grouping words with similar meaning in the semantic space, and introduce this knowledge into the thematic term calculation process.
(6) Analyse word embeddings for stems and suffix sequences and try them in speech summarization.
(7) Fusion of embeddings from (6) via concatenation, co-training or ensemble approach.
(8) Evaluate (5-7) in speech summarization tasks (broadcast news and narratives or dialogues). Objective and subjective metrics, statistical validation of results.
(9) Combine prosodic information (prosodic embeddings) with word embeddings and evaluate in speech summarization.
(10) Sequence-to-sequence approach for abstractive summarisation and end-to-end fusion of components.
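
Hypotheses (7) and (9) mention fusing embeddings by concatenation. The sketch below illustrates that idea only under stated assumptions: the vectors are toy values, the function names (`concat_fusion`, `rank_sentences`) are hypothetical, and the centroid-similarity scoring is a common extractive heuristic used here for illustration, not the method of the project.

```python
import math

def concat_fusion(word_vec, prosody_vec):
    """Fuse a word embedding and a prosodic embedding by concatenation."""
    return list(word_vec) + list(prosody_vec)

def mean_vector(vectors):
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_sentences(sentences):
    """Rank sentence indices by similarity to the document centroid.

    `sentences` is a list of sentences, each a list of fused token vectors.
    """
    sentence_vecs = [mean_vector(tokens) for tokens in sentences]
    centroid = mean_vector(sentence_vecs)
    scores = [cosine(vec, centroid) for vec in sentence_vecs]
    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)

# Toy data (assumed): 2-dimensional word vectors, 1-dimensional stress feature.
doc = [
    [concat_fusion([1.0, 0.0], [0.0])],  # sentence 0
    [concat_fusion([1.0, 0.0], [0.0])],  # sentence 1, same topic as sentence 0
    [concat_fusion([0.0, 1.0], [0.0])],  # sentence 2, off-topic
]
print(rank_sentences(doc))
```

An extractive summary would then keep the top-ranked sentences. Concatenation is the simplest of the fusion options listed in (7); co-training or ensembling would need a trained model rather than fixed vectors.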

What is the significance of the research?
Describe the new perspectives opened by the results achieved, including the scientific basics of potential societal applications. Please describe the unique strengths of your proposal in comparison to your domestic and international competitors in the given field.
Speech summarization can be regarded as a subfield of automatic speech understanding. From a basic research perspective, the current proposal is intended to considerably advance the state of the art in both engineering and linguistic aspects. We also attempt to represent both spoken and textual features in our approach, which is a known but less frequently exploited possibility. We believe that the main issue blocking the full exploitation of speech prosody in speech summarization lies in relying on distinct markers instead of looking for a coherent structure at the utterance level. Hypothesis 1, based on encouraging preliminary results, proposes this internationally unique contribution. Hypotheses 3 and 4 also formulate new scientific ambitions by applying this approach to automatic punctuation (necessary for correct textual analysis) and to spontaneous speech. Hypothesis 2 proposes a neural network based implementation for Hypothesis 1, as deep learning has shown superior performance in many machine learning tasks. Hypotheses 5-8 address research on word embeddings. A unique novel contribution here is a resource-efficient and fast approach for agglutinative languages like Hungarian, together with the extension of the experiments to spontaneous speech. Hypothesis 9 is a completely new idea in combining word and acoustic embeddings. Hypothesis 10 fits into international trends, and ports all our planned contributions to a framework that is expected to develop rapidly in the coming years thanks to improved deep learning techniques.
Regarding benefits for the society, speech summarization is of high importance as available spoken data quantity increases drastically. It can be used in summarizing meetings, spoken documents, but also for voice mining etc. These features are penetrating our everyday life and can soon be as widespread as searching for text patterns over the Internet. Often summarisation is the only way of processing spoken documents in reasonable time. Applications even go further whereby metadata extraction is used: recommendation systems, personal assistants, question answering, etc. can all benefit from the achievements of the current proposal.

Summary and aims of the research for the public
Describe here the major aims of the research for an audience with average background information. This summary is especially important for NRDI Office in order to inform decision-makers, media, and others.
Accessing and processing spoken data (audio recordings) has become part of our everyday life. Artificial intelligence already brings these technologies close to us, even via smartphones: from personal assistants with whom we can communicate in any spoken language, through voice-based queries and summarization of voice documents, to recommendation systems that help us explore music to our taste, for example. The proposed research aims to further develop these technologies and the capabilities they can provide, by applying modern deep learning methods and by focusing also on the Hungarian language, which, due to its agglutinating nature (it prefers suffixes to prepositions), needs special treatment in some cases. Preparing an excerpt or a summary of a spoken document can be as useful as reading just the abstracts of text documents, both in personal and in business life, where automatic preparation and summarization of the minutes of a meeting can become available as well. It is often faster and more convenient to communicate in speech, but we are often forced to type what was spoken to be able to store, process or later recall or search the data. With the proposed technologies, these activities will become more natural, efficient and complete.

Final report

Research results (in Hungarian)

In our research, in the context of summarization and content extraction, we broadly investigated the capabilities of algorithms and implementations for processing the textual and spoken forms of human language, and developed them further. We also took significant steps towards integrating textual and spoken language interfaces, which are traditionally handled separately. We worked on both extractive and abstractive summarization, primarily for Hungarian, by adapting and then substantially further developing methods that focus mostly on English. On the spoken language side, our results on the use of prosody are unique internationally as well. We considered it important to analyze and eliminate technological bottlenecks and to support the results thoroughly with objective and subjective tests. In addition to the progress made in summarizing spoken documents, we also built a punctuation model directly usable in automatic speech recognition, which reaches the level of industrial applicability. Our results also proved very useful in audio-based diagnostics and analysis, and above all in research on these.

Research results (in English)

In our research, in the context of spoken document summarization and retrieval, we extensively investigated the capabilities of algorithms and implementations for processing textual and spoken versions of human language, and developed them further. We have also taken significant steps towards the integration of text and spoken language interfaces, which are traditionally handled separately. We also dealt with extractive and abstractive summarization, adapting and further developing the procedures focusing initially on English, to obtain optimal performance when used for Hungarian. In terms of the use of prosody on the spoken language side, we obtained unique results on the international scene. We considered it important to analyze and eliminate technological bottlenecks, and to thoroughly validate the results with objective and subjective tests. In addition to the progress made in summarizing the content of spoken language documents, we also created a punctuation recovery algorithm that can be used directly in automatic speech recognition, reaching the level of industrial applicability. Our results have proven to be very useful in audio-based diagnostics and analysis, and especially in their research.

Full text of the final report

https://www.otka-palyazat.hu/download.php?type=zarobeszamolo&projektid=124413

Decision result

yes

List of publications

Máté Ákos Tündik, Balázs Tarján, György Szaszák: A low latency sequential model and its user-focused evaluation for automatic punctuation of ASR closed captions, Computer Speech & Language 63 (2020): 101076., 2020

Balázs Tarján, György Szaszák, Tibor Fegyó, Péter Mihajlik: On the Effectiveness of Neural Text Generation Based Data Augmentation for Recognition of Morphologically Rich Speech, TSD 2020, 2020

Balázs Tarján, György Szaszák, Tibor Fegyó, Péter Mihajlik: N-gram Approximation of LSTM Recurrent Language Models for Single-pass Recognition of Hungarian Call Center Conversations, CogInfoCom 2019, 2019

Gábor Gosztolya: Using the Fisher Vector Representation for Audio-based Emotion Recognition, Acta Polytechnica Hungarica, Vol. 17, No. 6, pp. 7-23, 2020

Mercedes Vetráb, Gábor Gosztolya: Investigating the Corpus Independence of the Bag-of-Audio-Words Approach, International Conference on Text, Speech, and Dialogue. Springer, Cham, 2020. p. 285-293., 2020

Gábor Gosztolya: Very Short-term Conflict Intensity Estimation Using Fisher Vectors, Interspeech 2020, 2020

Tündik Máté Ákos, Szaszák György: ASR-hibaterjedés vizsgálata a gépi beszédértés szemszögéből, MSZNY, 2020

Vetráb Mercédesz, Gosztolya Gábor: Az akusztikus szózsák eljárás korpuszfüggetlenségének vizsgálata, MSZNY 2020, 2020

Pintér Ádám, Tóth László, Gosztolya Gábor: Mély neuronhálós akusztikus modellek súlyinicializálásának vizsgálata, MSZNY 2020, 2020

Gosztolya, G., Grósz, T., Tóth, L.: Social Signal Detection by Probabilistic Sampling DNN Training, IEEE Transactions on Affective Computing, Vol. 10, No. 1, pp. 164-178, 2020., 2020

Gosztolya, G., Balogh, R., Imre, N., Egas-López, J.V., Hoffmann, I., Vincze, V., Tóth, L., Devanand, D.P., Pákáski, M., Kálmán, J.: Cross-Lingual Detection of Mild Cognitive Impairment Based On Temporal Parameters of Spontaneous Speech, Computer Speech & Language, Vol. 69, article no. 101215, 2021, 2021

Gosztolya, G., Busa-Fekete, R.: Ensemble Bag-of-Audio-Words Representation Improves Paralinguistic Classification Accuracy, IEEE/ACM Transactions on Audio Speech and Language Processing, Vol. 29, pp. 477-488, 2021, 2021

José Vicente Egas-López, Gábor Gosztolya: Using the Fisher Vector Approach for Cold Identification, Acta Cybernetica, vol.25, no. 2., pp. 223-232, 2021, 2021

Egas-López, J.V., Vetráb, M., Tóth, L., Gosztolya, G.: Identifying Conflict Escalation and Primates by Using Ensemble X-Vectors and Fisher Vector Features, Proceedings of Interspeech, pp. 476-480, Brno, Czech Republic, 2021., 2021

Egas-López, J.V., Balogh, R., Imre, N., Tóth, L., Vincze, V., Pákáski, M., Kálmán, J., Hoffmann, I., Gosztolya, G.: Enyhe kognitív zavar detektálása beszédhangból x-vektor reprezentáció használatával, Proceedings of the 2021 Hungarian Computational Linguistics Conference (MSZNY), pp. 147-156, Szeged, Hungary, 2021., 2021

Egas-López, J.V., Balogh, R., Imre, N., Hoffmann, I., Szabó, M.K., Tóth, L., Pákáski, M., Kálmán, J., Gosztolya, G.: Automatic Screening of Mild Cognitive Impairment and Alzheimer's Disease by Means of Posterior-Thresholding Hesitation Representation, Computer Speech & Language, Vol. 75, article no. 101377, 2022

Gosztolya, G.: Optimizing Class Priors to Improve the Detection of Social Signals in Audio Data, Engineering Applications of Artificial Intelligence, Vol. 107, article no. 104541, 2022

Gosztolya, G.: Estimating the Degree of Conflict in Speech by Employing Bag-of-Audio-Words and Fisher Vectors, Expert Systems with Applications, Vol. 205, article no. 117613, 2022

Imre, N., Balogh, R., Gosztolya, G., Tóth, L., Hoffmann, I., Várkonyi, T., Lengyel, Cs., Pákáski, M., Kálmán, J.: Temporal Speech Parameters Indicate Early Cognitive Decline in Elderly Patients With Type 2 Diabetes Mellitus, Alzheimer Disease & Associated Disorders, Vol. 36, No. 2, pp. 148-155, 2022, 2022

Vincze, V., Szabó, M.K., Hoffmann, I., Tóth, L., Pákáski, M., Kálmán, J., Gosztolya, G.: Linguistic Parameters of Spontaneous Speech for Identifying Mild Cognitive Impairment and Alzheimer Disease, Computational Linguistics, Vol. 48, No. 1, pp. 119-153, 2022, 2022

Vetráb, M., Gosztolya, G.: Using the Bag-of-Audio-Words Approach for Emotion Recognition, Acta Universitatis Sapientiae Informatica, Vol. 14, No. 1, pp. 1-21, 2022, 2022

Egas-López, J.V., Kiss, G., Sztahó, D., Gosztolya, G.: Automatic Assessment of the Degree of Clinical Depression from Speech Using X-Vectors, Proceedings of ICASSP, pp. 8502-8506, Singapore, 2022, 2022

Vetráb, M., Egas-López, J.V., Balogh, R., Imre, N., Hoffmann, I., Tóth, L., Pákáski, M., Kálmán, J., Gosztolya, G.: Using Spectral Sequence-to-Sequence Autoencoders to Assess Mild Cognitive Impairment, Proceedings of ICASSP, pp. 6467-6471, Singapore, 2022., 2022

Egas-López, J.V., Gosztolya, G.: Identification of Subjects Wearing a Surgical Mask from Their Speech by Means of X-vectors and Fisher Vectors, Proceedings of MDAI, pp. 108-118, Barcelona, Catalonia, Spain, 2022., 2022

Márton Makrai, Ákos Máté Tündik, Balázs Indig, György Szaszák: Towards abstractive summarization in Hungarian, Berend Gábor. XVIII. Magyar Számítógépes Nyelvészeti Konferencia : MSZNY 2022. (2022) ISBN:9789633068489 pp. 505-519, 2022

György, Szaszák ; Máté, Ákos Tündik ; Branislav, Gerazov: Prosodic stress detection for fixed stress languages using formal atom decomposition and a statistical hidden Markov hybrid, SPEECH COMMUNICATION 102 pp. 14-26., 2018

Máté, Ákos Tündik ; György, Szaszák: Joint Word- and Character-level Embedding CNN-RNN Models for Punctuation Restoration, In: Sallai, Gyula (ed.) 9th IEEE International Conference on CogInfoCom, pp. 135-140., 2018

Valér, Kaszás ; Máté, Ákos Tündik ; György, Szaszák: A semantic space approach for automatic summarization of documents, In: Sallai, Gyula (ed.) 9th IEEE International Conference on Cognitive Infocommunications, pp. 153-158., 2018

Máté, Ákos Tündik ; György, Szaszák ; Gábor, Gosztolya ; András, Beke: User-centric Evaluation of Automatic Punctuation in ASR Closed Captioning, Proc. Interspeech, pp. 2628-2632., 2018

Tündik, Máté Ákos ; Tarján, Balázs ; Szaszák, György: Televíziós feliratok írásjeleinek visszaállítása rekurrens neurális hálózatokkal, In: Vincze, Veronika (ed.) XIV. Magyar Számítógépes Nyelvészeti Konferencia, pp. 183-195., 2018

Döbrössy Bálint, Makrai Márton, Tarján Balázs, Szaszák György: Investigating Sub-Word Embedding Strategies for the Morphologically Rich and Free Phrase-Order Hungarian, In: Isabelle, Augenstein; Spandana, Gella; Sebastian, Ruder; Katharina, Kann; Burcu, Can; Johannes, Welbl; Alexis, Conneau; Xiang, Ren; Marek, Rei (eds.) Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), Association for Computational Linguistics (2019) pp. 187-193., 2019

Szaszák György, Tündik Máté Ákos: Leveraging a Character, Word and Prosody Triplet for an ASR Error Robust and Agglutination Friendly Punctuation Approach, In: Gernot, Kubin; Zdravko, Kačič (eds.) The 20th Annual Conference of the International Speech Communication Association (2019) pp. 2988-2992., 2019

Tündik Máté Ákos, Kaszás Valér, Szaszák György: On the Effects of Automatic Transcription and Segmentation Errors in Hungarian Spoken Language Processing, PERIODICA POLYTECHNICA-ELECTRICAL ENGINEERING AND COMPUTER SCIENCE x: pp. 1-9., 2019

Tündik Máté Ákos, Kaszás Valér, Szaszák György: Assessing the Semantic Space Bias Caused by ASR Error Propagation and its Effect on Spoken Document Summarization, In: Gernot, Kubin; Zdravko, Kačič (eds.) The 20th Annual Conference of the International Speech Communication Association (2019) pp. 1333-1337., 2019

Tündik Máté Ákos, Szaszák György: Kombinált központozási megoldások magyar nyelvre pehelysúlyú neurális hálózatokkal, In: Berend, G; Gosztolya, G; Vincze, V (eds.) XV. Magyar Számítógépes Nyelvészeti Konferencia, Szegedi Tudományegyetem, Informatikai Intézet (2019) pp. 275-286., 2019

Szaszák György: An Audio-based Sequential Punctuation Model for ASR and its Effect on Human Readability, Acta Polytechnica Hungarica, 2019

Project events

2022-03-03 11:45:54

Change of participants