Structural effects in cross-classified data

Details of project

Identifier

106154

Type

Principal investigator

Rudas, Tamás

Title in Hungarian

Strukturális hatások keresztosztályozott adatokban

Title in English

Structural effects in cross-classified data

Keywords in Hungarian

kontingencia tábla, okság, paraméterezés, variációs függetlenség, kezelési besorolás, grafikus modell

Keywords in English

contingency table, causality, parameterization, variation independence, treatment allocation, graphical model

Discipline

Sociology (Council of Humanities and Social Sciences)	50 %
Ortelius classification: Sociology
Economics (Council of Humanities and Social Sciences)	50 %
Ortelius classification: Statistics

Panel

Society

Department or equivalent

Institute of Empirical Studies (Eötvös Loránd University)

Participants

Klimova, Anna
Németh, Renáta

Starting date

2013-01-01

Closing date

2016-12-31

Funding (in million HUF)

8.236

FTE (full time equivalent)

1.60

state

closed project

Summary in Hungarian (English translation)

Summary of the research and its aims for experts
Describe here the major aims of the research for an expert in the field.
The aim of the research is to develop statistical models and estimation procedures for quantifying effects, in particular causal effects, in categorical data. The definition of the models and the estimation of the effects are based on appropriate parameterizations of the joint distributions. Building on recent results in the graphical, marginal and relational analysis of categorical data, we will separate the effect of treatment allocation from the effect of treatment by using variation independent parameters. Obviously, this separation is possible in designed experiments with randomization and is not possible in observational studies. It is known that the separation is also possible for data collection procedures in which treatment allocation is strongly ignorable. In these cases, propensity score matching is widely used. For these data collection procedures we will develop a parameterization in which the characteristics of the subjects have neither a direct nor an indirect effect on the observed response, and only the direct effect of the treatment exists. We will also address the question of whether there are other data collection procedures that have this separability property. To identify them, we will examine a number of data collections carried out for scientific and public policy purposes.

What is the major research question?
Describe here briefly the problem the research is intended to answer, the starting hypothesis of the research, and the questions answered by the experiments.
This research seeks to answer two questions. First, how can the various data collection procedures be positioned between designed experiments and observational studies? Data collection procedures in which allocation to treatments has the so-called strong ignorability property are suitable for drawing causal conclusions. We wish to decide whether every data collection procedure that is suitable for drawing causal conclusions has this property, or whether there are other designs that also lie between the designed experiment and the observational study.
Second, how can distributions that cross-classify individual characteristics, treatments and the responses to them be parameterized so that the effect of the treatments, net of the effect of differences among individuals and allocations, can be identified with certain parameters? Such parameterizations, when the data are collected by an appropriate procedure, are suitable for reading off from the data the causal effect of treatments or the effect attributable to them.

What is the significance of the research?
Describe briefly the new perspectives that the results open up in basic research and the societal applications for which they lay the scientific foundation. Show the unique features and strengths of your proposal compared with your domestic and international competitors in the research field.
Questions of causality and of the quantification of causal effects are of central importance in science and in public policy. The methods to be developed in this research are particularly suited to applications in sociology and in policy decisions, because in these fields treatments and the responses to them are usually categorical. Because of the categorical nature of the data, concepts related to causality have to be operationalized differently than in cases where the effects are quantitative. The results of the project will give researchers explicit rules that they must observe if they wish to draw causal conclusions from their data. We hope to establish a finer description of the structure of data collection procedures than the three categories of designed experiment, study with strongly ignorable allocation, and observational study. These results will be directly applicable in social statistics, including official statistics, and in policy making, for designing and evaluating experiments that provide adequate evidence for selecting the most effective social interventions.

Summary and aims of the research for the public
Describe in this section the major aims of the research for lay readers with a general education. This summary is especially important for the NRDI Office for informing decision-makers, the media and interested parties.
The aim of the research is to develop data analysis methods suitable for testing the effects of various treatments and for estimating the size of those effects. These questions are of central importance in the social sciences and in evidence-based policy making. The effectiveness of a social intervention, for example a retraining program for the unemployed, has to be evaluated before a decision is made about the efficacy of the intervention. Similar techniques can be applied in evaluating the effectiveness of an advertising campaign, or in understanding scientific questions such as how, in different societies, a father's educational level affects his son's income. The basic problem is that the different treatments (different retraining programs, different advertisements, fathers with different levels of education) are received or chosen by different people, and the difference in the observed reactions (employment status 6 months after retraining, purchase of the advertised product after the campaign, the son's income) between those who receive the treatment and those who do not may be partly a consequence of the treatment, but partly also of the differences between the treated and the untreated. For example, those who plan to buy a certain product presumably pay more attention to advertisements for it, so if a larger proportion of those who actually bought the product remember the advertisement than of those who did not buy it, this does not mean that they bought it because they saw the advertisement. The research develops procedures that make it possible to separate the two kinds of effect and to quantify them, and characterizes the data collection procedures under which this separation is possible.

Summary

Summary of the research and its aims for experts
Describe the major aims of the research for experts.
This research aims at developing statistical models and estimation techniques to quantify effects, in particular causal effects, in categorical data. The definition of such models and the quantification of the effects rely on appropriately defined parameterizations of the joint distribution. Recent advances in graphical, marginal and relational modeling of categorical data will be used to separate the effect of treatment allocation from that of treatment, using parameterizations that consist of variation independent parameters. Obviously, such a separation is possible in the case of designed experiments with random allocation and is not possible in observational studies. Data collection designs that make such a separation possible include designs with strongly ignorable treatment assignment, where propensity score matching is widely used. For such designs, a parameterization of the joint distribution will be developed in which the characteristics of the individuals have no direct or indirect effect on the response, and the treatment has a direct effect on the response. A further question to be addressed is whether other designs with such separability exist. To identify such designs, several scientific and policy-oriented data collection exercises will be investigated.
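As a small illustration of what variation independence means, the following Python sketch uses the simplest case, a 2×2 table. It is a well-known property of such tables that the two marginal probabilities and the odds ratio can be chosen freely and independently of each other, and that each choice determines exactly one table. The function names and the example values are assumptions made for this illustration; they are not taken from the project.

```python
import math


def table_from_margins_and_odds_ratio(a, b, theta):
    """Build the 2x2 distribution with P(X=1)=a, P(Y=1)=b and odds ratio theta.

    Returns the cell probabilities (p11, p10, p01, p00).
    Hypothetical helper, written only to illustrate variation independence.
    """
    if not (0 < a < 1 and 0 < b < 1 and theta > 0):
        raise ValueError("margins must lie in (0, 1) and theta must be positive")
    if math.isclose(theta, 1.0):
        # Odds ratio 1 means independence, so the cell is the product of margins.
        p11 = a * b
    else:
        # p11 solves (theta-1)*p11^2 - s*p11 + theta*a*b = 0;
        # the smaller root is the one that keeps all four cells positive.
        s = 1 + (theta - 1) * (a + b)
        disc = s * s - 4 * theta * (theta - 1) * a * b
        p11 = (s - math.sqrt(disc)) / (2 * (theta - 1))
    return p11, a - p11, b - p11, 1 - a - b + p11


def odds_ratio(p11, p10, p01, p00):
    """Odds ratio of a 2x2 table of cell probabilities."""
    return (p11 * p00) / (p10 * p01)


if __name__ == "__main__":
    # Any combination of margins and odds ratio yields a valid table.
    for a, b, theta in [(0.5, 0.5, 9.0), (0.1, 0.9, 0.2), (0.7, 0.3, 50.0)]:
        cells = table_from_margins_and_odds_ratio(a, b, theta)
        print(a, b, theta, [round(p, 4) for p in cells])
```

The point of the sketch is that fixing the margins places no restriction on the odds ratio, and vice versa, which is what allows a parameter of one kind to be interpreted separately from parameters of the other kind.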

What is the major research question?
Describe here briefly the problem to be solved by the research, the starting hypothesis, and the questions addressed by the experiments.
The research aims to answer two questions. First, how can data collection designs be positioned between designed experiments and observational studies? Designs with strongly ignorable treatment assignment lie between these two and are appropriate for making causal inferences. We seek to decide whether all designs that are appropriate for causal inference are characterized by strongly ignorable treatment assignment, or whether there are other such designs between designed experiments and observational studies. Second, how can parameterizations of joint distributions of cross-classifications containing individual characteristics, treatments and responses be constructed so that the effect of treatments, net of the effects of individuals or of the treatment assignments, can be identified with some of the parameters? Such parameterizations, in the case of appropriate data collection designs, may be used to read off the causal or attributable effects of treatments.

What is the significance of the research?
Describe the new perspectives opened by the results achieved, including the scientific basics of potential societal applications. Please describe the unique strengths of your proposal in comparison to your domestic and international competitors in the given field.
The questions of causality and the quantification of causal effects are of central importance in science and in policy making. The methods to be developed in this research will be especially suited to applications in sociology and policy making, where the choice between treatments and the possible outcomes or responses are mostly categorical. This categorical nature of the data requires the concepts of causality to be operationalized differently from cases when the effects are quantitative. The results of this project will provide researchers with explicit rules to observe in their data collection designs if causal conclusions are aimed at. It is hoped that a finer description of the structure of data collection procedures will be achieved than the three types of designed experiment, design with strongly ignorable treatment assignment, and observational study. In social statistics, including official statistics, and in policy making, the results will be directly applicable to designing and evaluating experiments that can provide the necessary evidence to select the most efficient methods of social intervention.

Summary and aims of the research for the public
Describe here the major aims of the research for an audience with average background information. This summary is especially important for NRDI Office in order to inform decision-makers, media, and others.
This research develops methods for data analysis that are appropriate for testing the existence of effects of various treatments and for estimating the magnitude of such effects. These questions are of central importance in the social sciences and also in evidence-based policy making. The effects of a social intervention, for example providing retraining to unemployed people, should be carefully evaluated before a decision can be made about the efficacy of that intervention. Similar techniques are relevant in assessing the results of commercial campaigns or in understanding scientific questions, like how, in different societies, a father’s educational level may affect his son’s income. The key problem here is that different treatments (e.g., different retraining programs, or TV commercials, or different fathers) are given to or received by different individuals, and the difference in the observed response (e.g., employment 6 months after retraining, or purchasing of goods after the campaign, or income of the son) is partly due to the different treatments but also to the differences among the individuals who received them. For example, those who are interested in buying a product may pay more attention to commercials promoting it, so the fact that a larger fraction of those who buy the product than of those who do not buy it can recall having seen the commercial does not imply that consumers buy the product because of having seen the commercial. This research will develop methods that can separate these two kinds of effects and quantify the net effect of treatment on response, and will characterize the data collection designs under which such a separation is possible.

Final report

Results in Hungarian (English translation)

The main aim of the research was to develop procedures for modeling structural effects in cross-classified data. In measuring and modeling effects that can also be interpreted causally, we dealt primarily with procedures that make it possible to avoid Simpson's paradox. We showed that, under weak conditions, there is only one definition of effect, based on linear contrasts, that never leads to paradoxical conclusions. However, even this operationalization cannot be considered appropriate in every case, and the existence of the paradox may be attributed to the fact that we try to use the same procedure for measuring the strength of effects in very many different situations. Correct measurement of an effect also requires taking into account the mode of data collection, the precise meaning of the favorable and unfavorable outcomes, and the nature of the policy decision to be made. In modeling association structures, we generalized log-linear models to take into account various types of missing data and coordinate-free effects. We developed software for applying the procedures. The results of the research have so far been published in 5 articles in journals with impact factor, of which 4 are classified Q1 and 1 Q2.

Results in English

The main aim of this research was the development of methods to model structural effects for cross-classified data. In the area of measuring and modeling effects that may also be given a causal interpretation, we concentrated on methods of avoiding Simpson’s paradox. We showed that, under mild conditions, there is only one measure of effect, based on linear contrast, that never commits the paradox. However, even this operationalization fails to be universally applicable, and the existence of the paradox may be attributed to the fact that the same measure is being used in several substantially different setups. To properly measure an effect, the mode of data collection, the precise meanings of the positive and negative outcomes, and the type of policy decision to be made also need to be taken into account. In the area of modeling association structures, we developed a generalization of the log-linear model that can take various types of missing data and also coordinate-free effects into account. We developed software to implement these methods. The results have so far been published in 5 papers in journals with impact factor, of which 4 have Q1 and 1 has Q2 classification.
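For readers unfamiliar with Simpson’s paradox, the following Python sketch shows how it can arise in a retraining example like the one used in the public summary. The counts are hypothetical and were invented for this illustration only; they are not project data. The difference in proportions is used here simply as a familiar measure of effect, and it is not claimed to be the measure referred to in the report.

```python
# Hypothetical counts, invented for illustration only (not project data).
# stratum -> group -> (number employed after 6 months, group size)
counts = {
    "short-term unemployed": {"retrained": (81, 87), "not retrained": (234, 270)},
    "long-term unemployed": {"retrained": (192, 263), "not retrained": (55, 80)},
}


def proportion(successes, total):
    """Share of successes in a group."""
    return successes / total


def difference_within(stratum):
    """Employment rate of the retrained minus that of the others, in one stratum."""
    groups = counts[stratum]
    return proportion(*groups["retrained"]) - proportion(*groups["not retrained"])


def pooled_difference():
    """The same difference after ignoring the strata and pooling all individuals."""
    pooled = {}
    for group in ("retrained", "not retrained"):
        successes = sum(counts[s][group][0] for s in counts)
        total = sum(counts[s][group][1] for s in counts)
        pooled[group] = proportion(successes, total)
    return pooled["retrained"] - pooled["not retrained"]


stratum_diffs = {s: difference_within(s) for s in counts}
pooled_diff = pooled_difference()

if __name__ == "__main__":
    for stratum, diff in stratum_diffs.items():
        print(f"{stratum}: difference = {diff:+.3f}")
    print(f"all individuals pooled: difference = {pooled_diff:+.3f}")
```

In these made-up data the retrained group does better within each stratum, yet worse when the strata are pooled, because most retrained individuals belong to the stratum with the lower employment rate. The reversal comes from how individuals were allocated to treatment, not from the treatment itself.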

Full text

https://www.otka-palyazat.hu/download.php?type=zarobeszamolo&projektid=106154

Decision

Yes

List of publications

Tamás Rudas: Independence models for non rectangular data, Cost Action Workshop, April 16-17, Paris (2016), 2016

Klimova, A., Rudas, T.: Extended relational models, 7th CFE 2013 and 6th ERCIM p125. Senate House, University of London, UK, Dec. 14-16, 2013., 2013

Tamás Rudas: Estimation and testing in relational models, IX. International Conference of the ERCIM WG on Computational and Methodological Statistics, December 9-11, Seville, 2016

Tamás Rudas: On variants of the iterative scaling algorithm, XXII. International Conference on Computational Statistics, August 23-26, 2016 Oviedo, 2016

Tamás Rudas: On generalizations of the log-linear model, October 14, University of Washington, 2016

Tamás Rudas: Model based analysis of incomplete data with non-ignorable missing data mechanism, October 10, Columbia University, 2016

Tamás Rudas: Model based analysis of incomplete data with non-ignorable missing data mechanism, October 7, New York University, 2016

Rudas, T., Klimova, A.: Log-linear models on non-product spaces, In 45e Journées de Statistique de la SFdS (JDS 2013) p40, Toulouse, France, May 27-31, 2013, 2013

Rudas, T.: How to avoid Simpson’s paradox in treatment selection based on observational data, In Causal inference in health and social sciences (UK-CIM 2013) p17, Manchester, UK, May 14-15., 2013

Rudas, T., Klimova, A.: Log-linear models on non-product spaces, In Márkus, L. & Prokaj, V. (Eds), Abstracts of the 29-th European Meeting of Statisticians (EMS 2013) p258, Budapest, Hungary, July 20-25, 2013, 2013

Németh, R., Rudas, T.: On sociological applications of discrete graphical models, In Márkus, L. & Prokaj, V. (Eds), Abstracts of the 29-th European Meeting of Statisticians (EMS 2013) p227. Budapest, Hungary, July 20-25, 2013, 2013

Anna Klimova: Learning Hypergraphs from Discrete Data, Prague Stochastics, August 25-29, Prague., 2014

Tamás Rudas: Model based analysis of incomplete data with non-ignorable missing data mechanism, VII Conference of the European Association of Methodology, July 27-29, 2016 Palma, 2016

Klimova, A., Rudas, T.: Extended relational models, In Programme and Abstracts of 7th CFE 2013 and 6th ERCIM p125. Senate House, University of London, UK, Dec. 14-16., 2013

Rudas, T., Klimova, A.: On the closure of relational models, Journal of Multivariate Analysis, 143: pp. 440-452., 2016

Klimova, A., Rudas T.: Testing the fit of relational models, arXiv:1612.02416, 2016

Tamás Rudas: Some current challenges for statistical methodology, Conference of European Statistics Stakeholders 2016, October 20-21, Budapest, 2016

Rudas, T., Klimova, A.: Independence on non-product spaces, In Programme and Abstracts of the 7th CFE 2013 and 6th ERCIM p126. Senate House, University of London, UK, Dec. 14-16, 2013

Tamás Rudas: Effects and interactions, VI European Congress of Methodology, July 23-25, Utrecht, 2014

Tamás Rudas: On directionally collapsible parameterizations of multivariate binary distributions, ERCIM 2014, December 6-8, Pisa, 2014

Anna Klimova, Caroline Uhler, Tamás Rudas: Faithfulness of discrete distributions to graphs and hypergraphs, ERCIM 2014, December 6-8, Pisa, 2014

Tamás Rudas: Directionally Collapsible Measures of Association, Center for Statistics and the Social Sciences, University of Washington, 2014

Anna Klimova, Caroline Uhler, Tamás Rudas: Parametric faithfulness of discrete data, Algebraic Statistics 2014, May 19-22, Illinois Institute of Technology, Chicago, 2014

Anna Klimova: Properties of Measures of Association for Binary Distributions, Prague Stochastics, August 25-29, Prague, 2014

Tamás Rudas: Properties of Measures of Association for Binary Distributions, Prague Stochastics, August 25-29, Prague, 2014

Klimova, A., Rudas, T.: Iterative Scaling in Curved Exponential Families, Scandinavian Journal of Statistics (accepted for publication: January 2015), 2015

Klimova, A., Uhler, C., Rudas, T.: Faithfulness and learning of hypergraphs from discrete distributions, Computational Statistics and Data Analysis (accepted for publication: 2015), 2015

Rudas, T.: Log-linear and marginal models, In: Wright, J (ed) International Encyclopedia of Social and Behavioral Sciences 2nd ed, Elsevier, 2014

Klimova, A., Rudas, T.: On the closure of relational models, arXiv: 1408.2489 (uploaded: January 2015), 2015

Rudas, T.: Directionally collapsible parameterizations of multivariate binary distributions, arXiv: 1501.00600, 2014

Klimova, A., Uhler, C., Rudas, T.: Faithfulness and learning of hypergraphs from discrete distributions, arXiv: 1404.6617, 2014

Klimova, A., Rudas, T.: Iterative Scaling in Curved Exponential Families, Scandinavian Journal of Statistics 42, 832-847., 2015

Klimova, A., Uhler, C., Rudas, T.: Faithfulness and learning of hypergraphs from discrete distributions, Computational Statistics and Data Analysis 87, 57-72., 2015

Tamás Rudas: On the measurement of effects and interactions, XIV Congreso de Metodología de las Ciencias Sociales y de la Salud, Palma, July 22-24, 2015

Tamás Rudas: Testing the Fit of Relational Models for Contingency Tables, Eighth International Workshop on Simulation, Vienna, September 21-25, 2015

Tamás Rudas: Simpson's paradox and a linear concept of association, 8th International Conference of the ERCIM WG on Computational and Methodological Statistics, London, December 12-14, 2015

Renáta Németh, Tamás Rudas: Confounding in Causal Analysis in Case of Binary Responses, 12th Conference of the European Sociological Association, Prague, August 25-28, 2015

Rudas, T.: Effects and interactions, Methodology-European Journal of Research Methods for the Behavioral and Social Sciences. 142-149., 2015

Rudas, T.: Directionally collapsible parameterizations of multivariate binary distributions, Statistical Methodology 27, 132-145., 2015

Eszter Milibák: Okságmegközelítések a statisztikában - Elemzés és kritika, MSc thesis, ELTE TáTK, Survey Statistics MSc (supervisor: Tamás Rudas), 2015
