Developing artificial intelligence tools for solving bioinformatical problems

Details of project

Identifier

127909

Type

Principal investigator

Grolmusz, Vince

Title in Hungarian

Mesterséges intelligenciai eszközök fejlesztése bioinformatikai problémák megoldására

Title in English

Developing artificial intelligence tools for solving bioinformatical problems

Keywords in Hungarian

bioinformatika, hálózatok, neurális hálók

Keywords in English

bioinformatics, networks, neural nets

Discipline

Information Technology (Council of Physical Sciences)	90 %
Ortelius classification: Applied informatics
Mathematics (Council of Physical Sciences)	5 %
Ortelius classification: Computational mathematics
Bioinformatics (Council of Medical and Biological Sciences)	5 %

Panel

Informatics and Electrical Engineering

Department or equivalent

Department of Computer Science (Eötvös Loránd University)

Participants

Fellner, Máté
Rec, Tamás
Takács, Kristóf
Varga, Bálint

Starting date

2018-09-01

Closing date

2022-11-30

Funding (in million HUF)

48.000

FTE (full time equivalent)

4.30

State

closed project

Summary in Hungarian (English translation)

Summary of the research and its aims for experts
Describe the major aims of the research for experts.
With the spread of high-throughput biological experiments around the turn of the millennium, the biomedical research community was extremely optimistic: many were convinced that collecting very large amounts of data would quickly answer many biological and medical questions. It soon became clear, however, that without a strong mathematical background these experiments (e.g. gene sequencing, metagenomic sequencing, mass spectrometry, DNA chips, differential gel electrophoresis) yield only very noisy data sets that are hard to store and even harder to process, and from which extracting useful and, above all, reliable knowledge is a difficult task. This is also shown by the fact that while in the 1990s (before high-throughput experiments became widespread) the US FDA approved on average 34 new drug molecules per year, in the first decade of the new millennium it approved only 24 per year on average (source: https://goo.gl/e5v43N). Certain mathematical methods for processing and representing large-scale data spread quickly, but some of them amounted to only a superficial, oversimplified use of the relevant mathematical field, or occasionally one based on unscientific assumptions. Since its founding in 2005, the ELTE PIT Bioinformatics Research Group has been developing mathematically well-founded, non-trivial methods for the mathematically rigorous processing of large-scale data. In the present project the group develops new methods for processing data sets that can be represented by graphs. The methods include new, innovative uses of artificial neural networks and the study of deep graph-theoretical parameters in biological graphs.

What is the major research question?
Describe here briefly the problem to be solved by the research, the starting hypothesis, and the questions addressed by the experiments.
The birth of graph theory is usually dated to the 1741 publication of Leonhard Euler's paper on the problem of the Königsberg bridges. Graph theory began to develop at an astonishing pace in the second half of the 20th century. Many important graph parameters were introduced not because they are trivially computable, but because of their depth and interest. When analyzing biological graphs, biologists rarely use such deep graph parameters; they usually apply ad hoc defined, easily computable edge densities and degree distributions. However, the more complex, deeper, even NP-hard graph parameters can often (though not always easily) be computed for the smaller graphs of a few hundred vertices considered in biology, and with them, as we have already shown in many cases, we can capture biological properties that more superficial analyses cannot.
The study of graph parameters (even deep ones) has a significant drawback: these parameters are graph-isomorphism invariant, that is, they depend only on the graph as a mathematical object and do not take into account the "identity" of the vertices. This matters in biological applications because the vertices of biological graphs are not interchangeable: it is important which vertex denotes which protein, gene, or brain area.
In the present project we develop methods, among others with the help of neural networks, that are much deeper than the widely used procedures based on degree distributions and edge counting, and that can handle the question of graph-isomorphism invariance with several tools.

What is the significance of the research?
Describe the new perspectives opened by the results achieved, including the scientific basics of potential societal applications. Please describe the unique strengths of your proposal in comparison to your domestic and international competitors in the given field.
The results of the research will be well suited to analyzing large biological data sets, yielding better, more accurate, deeper results. With their help we can involve more mathematicians in the evaluation of large biological data sets, and we can show biologists less versed in mathematics that the mathematical toolbox is useful not only within mathematics.

Summary and aims of the research for the public
Describe here the major aims of the research for an audience with average background information. This summary is especially important for NRDI Office in order to inform decision-makers, media, and others.
With the spread of high-throughput biological experiments around the turn of the millennium, the medical and biological research community was very optimistic: many were convinced that collecting very large amounts of data would quickly answer many biological and medical questions. It soon became clear, however, that without a strong mathematical background these experiments (e.g. gene sequencing, sequencing of metagenomic samples, mass spectrometry, the use of DNA and RNA chips, differential gel electrophoresis and other proteomic experiments) yield only very noisy data sets that are hard to store and even harder to process, and from which extracting useful and, above all, reliable knowledge is a very difficult task. Certain mathematical methods for processing and representing large-scale data spread quickly, but some of them offered only a superficial, oversimplified treatment of the relevant mathematical field, or occasionally one based on unscientific assumptions. Since its founding in 2005, the ELTE PIT Bioinformatics Research Group has been developing mathematically well-founded, non-trivial methods for the rigorous processing of large-scale, mainly biological data. In the present project we develop new methods for processing data sets that can be represented by graphs.

Summary

Summary of the research and its aims for experts
Describe the major aims of the research for experts.
With the dawn of high-throughput biological experimental techniques, an enormous amount of biomedical data is produced and deposited every single day. Scientists believed that these methods would bring new drugs and therapies within one or two decades. It turned out that without deep, robust and mathematically well-founded data analysis techniques, these data are nothing but a noisy, unstructured, hard-to-store and even harder-to-analyze heap of digits. Numerous efforts have been made to develop novel mathematical methods for filtering, restructuring and analyzing these biological data. Unfortunately, there are very few success stories: in the 1990s, before these techniques, the FDA approved on average 34 new drug molecules per year, while in the first decade of the new millennium it approved on average only 24 per year (source: https://goo.gl/e5v43N). Apart from a few robust and successful methods, most newly suggested analytical processes were simplistic, trivial, or sometimes simply based on false assumptions. Our research group has long been working on the development of mathematically well-founded, deep analytical methods for biological graphs. In the present project we will develop new methods for biological graph analysis, involving artificial neural nets and deep graph-theoretical properties.

What is the major research question?
Describe here briefly the problem to be solved by the research, the starting hypothesis, and the questions addressed by the experiments.
While deep graph-theoretical ideas have been proven to characterize several biological properties well, they have an inherent weakness: graph-theoretical properties are graph-isomorphism invariants, while the nodes of biological graphs have a clear identity of their own; therefore, graph-isomorphism-invariant methods miss many characteristics of these networks. Among other possible solutions, we suggest applying artificial neural network (ANN) based deep learning algorithms to these graphs, and once an ANN is trained satisfactorily, we will disentangle the factors of variation using adversarial training. The results will be non-trivial subgraphs that imply biological properties of the graphs analyzed.
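The weakness of isomorphism-invariant parameters can be illustrated with a minimal sketch. The region names and the two toy graphs below are hypothetical and are not taken from the project's data; the degree sequence stands in for any isomorphism-invariant parameter, and the labeled edge set stands in for an analysis that respects node identity.

```python
from collections import Counter


def degree_sequence(edges):
    """Sorted vertex degrees: unchanged by any relabeling of the vertices."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sorted(deg.values())


def labeled_edge_set(edges):
    """Undirected edges keyed by vertex names: sensitive to vertex identity."""
    return {frozenset(edge) for edge in edges}


# Hypothetical toy "connectome": a path over four named regions.
graph_a = [
    ("hippocampus", "amygdala"),
    ("amygdala", "thalamus"),
    ("thalamus", "precuneus"),
]

# graph_b is graph_a with two labels swapped, so the graphs are isomorphic,
# yet different named regions are wired together.
relabel = {
    "hippocampus": "thalamus",
    "thalamus": "hippocampus",
    "amygdala": "amygdala",
    "precuneus": "precuneus",
}
graph_b = [(relabel[u], relabel[v]) for u, v in graph_a]

same_invariant = degree_sequence(graph_a) == degree_sequence(graph_b)
same_labeled = labeled_edge_set(graph_a) == labeled_edge_set(graph_b)
print(same_invariant, same_labeled)
```

Here the invariant cannot tell the two graphs apart, while the labeled comparison can: in one graph the precuneus is attached to the thalamus, in the other to the hippocampus.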

What is the significance of the research?
Describe the new perspectives opened by the results achieved, including the scientific basics of potential societal applications. Please describe the unique strengths of your proposal in comparison to your domestic and international competitors in the given field.
The results of the present project will be applied to large biological networks: our methods can yield better, more robust and more accurate results.

Summary and aims of the research for the public
Describe here the major aims of the research for an audience with average background information. This summary is especially important for NRDI Office in order to inform decision-makers, media, and others.
With the dawn of high-throughput biological experimental techniques, an enormous amount of biomedical data is produced and deposited every single day. Scientists believed that these methods would bring new drugs and therapies within one or two decades. It turned out that without deep, robust and mathematically well-founded data analysis techniques, these data are nothing but a noisy, unstructured, hard-to-store and even harder-to-analyze heap of digits. Numerous efforts have been made to develop novel mathematical methods for filtering, restructuring and analyzing these biological data. Unfortunately, there are very few success stories: in the 1990s, before these techniques, the FDA approved on average 34 new drug molecules per year, while in the first decade of the new millennium it approved on average only 24 per year (source: https://goo.gl/e5v43N). Apart from a few robust and successful methods, most newly suggested analytical processes were simplistic, trivial, or sometimes simply based on false assumptions. Our research group has long been working on the development of mathematically well-founded, deep analytical methods for biological graphs. In the present project we will develop new methods for biological graph analysis, involving artificial neural nets and deep graph-theoretical properties.

Final report

Results in Hungarian (English translation)

We are closing an extremely successful project: of 19 refereed, English-language journal publications, 5 appeared in D1 journals and 9 in Q1 journals; the cumulative impact factor is 59.989. We created two webservers: the Budapest Amyloid Predictor (https://pitgroup.org/bap) and the PDB_Amyloid list (https://pitgroup.org/amyloid/). In addition, we extended our already existing https://braingraph.org webserver with several hundred thousand new, augmented connectomes. We developed many fundamentally new methods, for example for directing the connectomes and for finding implicator edges, and we introduced robust, error-tolerant exact analyses.

Results in English

We have concluded a very successful project.
Number of refereed, international journal publications in the course of the project: 19, of which:
Number of D1 journal publications: 5
Number of Q1 journal publications: 9
Cumulative impact factor of our journal publications in project K 127909: 59.989
We have introduced the Budapest Amyloid Predictor webserver (https://pitgroup.org/bap) and the PDB_Amyloid list (https://pitgroup.org/amyloid/).
Our existing webserver, https://braingraph.org, was extended with several hundred thousand new augmented connectomes.

Full text

https://www.otka-palyazat.hu/download.php?type=zarobeszamolo&projektid=127909

Decision

Yes

List of publications

Szalkai B, Grolmusz V: MetaHMM: A webserver for identifying novel genes with specified functions in metagenomic samples, GENOMICS 111: (4) pp. 883-885., 2019

Szalkai B, Varga B, Grolmusz V: Mapping correlations of psychological and structural connectome properties of the dataset of the human connectome project with the maximum spanning tree method, BRAIN IMAGING AND BEHAVIOR 13: (5) pp. 1185-1192., 2019

Takács K., Varga B., Grolmusz V.: PDB_Amyloid: an extended live amyloid structure list from the PDB, FEBS OPEN BIO 9: (1) pp. 185-190., 2019

László Keresztes, Evelin Szögi, Bálint Varga, Viktor Farkas, András Perczel, Vince Grolmusz: Succinct Amyloid and Non-Amyloid Patterns in Hexapeptides, ACS Omega Vol. 7, No. 40, 35532-35537, 2022

Laszlo Keresztes, Evelin Szogi, Balint Varga, Vince Grolmusz: Identifying Super-Feminine, Super-Masculine and Sex-Defining Connections in the Human Braingraph, Cognitive Neurodynamics, Vol. 15. No. 6. pp. 949-959, 2021

Kristof Takacs, Vince Grolmusz: The multiple alignments of very short sequences, FASEB BioAdvances 2021;3:523-530, 2021

Balint Varga, Vince Grolmusz: The braingraph.org Database with more than 1000 Robust Human Structural Connectomes in Five Resolutions, Cognitive Neurodynamics Vol. 15, No. 5, pp. 915-919, 2021

Balázs Szalkai, Bálint Varga, Vince Grolmusz: The Graph of our Mind, Brain Sciences Vol. 11, No. 3. 342, 2021

László Keresztes, Evelin Szögi, Bálint Varga, Viktor Farkas, András Perczel, Vince Grolmusz: The Budapest Amyloid Predictor and its Applications, Biomolecules, 11(4) 500, 2021

Kristóf Takács, Vince Grolmusz: On the Border of the Amyloidogenic Sequences: Prefix Analysis of the Parallel Beta Sheets in the PDB_Amyloid Collection, Journal of Integrative Bioinformatics, 2021

Fellner Máté, Varga Bálint, Grolmusz Vince: The Frequent Network Neighborhood Mapping of the human hippocampus shows much more frequent neighbor sets in males than in females, PLOS ONE 15: (1) e0227910, 2020

Fellner Máté, Varga Bálint, Grolmusz Vince: Good neighbors, bad neighbors: the frequent network neighborhood mapping of the hippocampus enlightens several structural factors of the human intelligence, SCIENTIFIC REPORTS 10: (1) 11967, 2020

Balázs Szalkai, Vince Grolmusz: SCARF: A Biomedical Association Rule Finding Webserver, Journal of Integrative Bioinformatics, Vol. 19, No. 1, pp. 20210035, 2022

László Keresztes, Evelin Szögi, Bálint Varga, Vince Grolmusz: Introducing and Applying Newtonian Blurring: An Augmented Dataset of 126,000 Human Connectomes at braingraph.org, Scientific Reports, 12:3102, 2022

László Keresztes, Evelin Szögi, Bálint Varga, Vince Grolmusz: Discovering Sex and Age Implicator Edges in the Human Connectome, Neuroscience Letters Vol. 791, 136913, 2022

Muntasir Kamal, Levon Tokmakjian, Jessica Knox, Peter Mastrangelo, Jingxiu Ji, Hao Cai, Jakub Wojciechowski, Michael P. Hughes, Kristof Takacs, Xiaoquan Chu, Jianfeng Pei, Vince Grolmusz, Malgorzata Kotulska, Julie Deborah Forman-Kay, Peter J. Roy: A Spatiotemporal Reconstruction of the C. elegans Pharyngeal Cuticle Reveals a Structure Rich in Phase-Separating Proteins, eLife, 2022

Máté Fellner, Bálint Varga, Vince Grolmusz: The Frequent Subgraphs of the Connectome of the Human Brain, Cognitive Neurodynamics Vol. 13, No. 5, pp. 453-460 (2019) https://doi.org/10.1007/s11571-019-09535-y https://rdcu.be/bAHoe, 2019

Balázs Szalkai, Csaba Kerepesi, Bálint Varga, Vince Grolmusz: High-Resolution Directed Human Connectomes and the Consensus Connectome Dynamics, PLOS ONE, Vol. 14, No. 4: e0215473 (2019) https://doi.org/10.1371/journal.pone.0215473, 2019

Máté Fellner, Bálint Varga, Vince Grolmusz: The Frequent Complete Subgraphs in the Human Connectome, In: Rojas I., Joya G., Catala A. (eds) Advances in Computational Intelligence. IWANN 2019. Lecture Notes in Computer Science, Vol 11507. pp. 908-920, Springer, 2019

Balint Varga, Vince Grolmusz: The braingraph.org Database with more than 1000 Robust Human Structural Connectomes in Five Resolutions, arXiv preprint arXiv:2008.13273, 2020

Fellner Máté, Varga Bálint, Grolmusz Vince: The Frequent Complete Subgraphs in the Human Connectome, PLOS ONE 15(8): e0236883 (2020), 2020

Kristof Takacs, Vince Grolmusz: On the Border of the Amyloidogenic Sequences: Prefix Analysis of the Parallel Beta Sheets in the PDB Amyloid Collection, arXiv preprint arXiv:2003.02942, 2020

Laszlo Keresztes, Evelin Szogi, Balint Varga, Vince Grolmusz: Identifying Super-Feminine, Super-Masculine and Sex-Defining Connections in the Human Braingraph, arXiv preprint arXiv:1912.02291, 2019

Events of the project

2022-01-04 16:52:32

Change of participants
