LIAM2* A sneak preview...
*Life-cycle Income Analysis Model
Gijs Dekkers
Federal Planning Bureau and Katholieke Universiteit Leuven
Paper presented at the « Treasury Brown Bag Lunch Meeting », Ministero dell'Economia e delle Finanze, Rome, February 14th, 2011
LIAM 2: the foundations
  LIAM by Cathal O'Donoghue
    Used in the AIM project, developing MIDAS for Belgium, Italy and Germany
    Updating, extending and considerable problem solving by Geert Bryon (FPB)
  PROGRESS-MiDaL project (Grant VS/2009/0569)
    FPB (Be): development, application and testing
      Gaëtan de Menten: development
      Geert Bryon: application and testing
      Gijs Dekkers: management, data and a bit of application
    CEPS/INSTEAD (Lux): testing
    IGSS (Lux): investment, testing
    Cathal O'Donoghue (Teagasc, Ire), Howard Redway (Ministry of Work and Pensions, UK): comments and conceptual assistance
Overview of this sneak preview
  Current features
  Current performance
  Demonstration
  TODO(?)
Current features
  Input
    Simulation: a text file
    Alignment: CSV files
    Initial field data: an hdf5 file
  Output
    hdf5 file
  Converters
    Old data format (tab-separated text files)
    hdf5
The setup of a model
  constants
    per_period:
      PARAMETER-NAME: float
  entities
    entity-name1 (e.g. Household):
      fields
      links
      processes
    entity-name2 (e.g. Person):
      fields
      links
      processes
  simulation:
    init:
    processes:
      entity: [list of processes, separated by commas]
    input:
      path: "path name"
      file: "file name.h5"
    output:
      path: "path name"
      file: "file name.h5"
    start_period:
    periods:
Current features
  Language: Python
    High level, concise, readable, easy interface with C
    Lots of 3rd-party libraries (especially scientific tools)
  But uses some efficient (open source) libraries written mostly in C
    NumPy
    Numexpr
    PyTables
Current features
  Can declare "fields" with a type
    float, int, bool
  Evaluate simple expressions
    Arithmetic operators: +, -, *, /, **, %
      0.51 * age * age ** 2 – * age ** 3
    Comparison operators: <, <=, ==, !=, >=, >
      age < 20
    Boolean operators: and, or, not
      not male and (age >= 15) and (age <= 50)
    Conditional expressions: if(condition, iftrue, iffalse)
      if(age < 65, earnings, pension)
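The conditional expression on this slide is evaluated for every individual at once. A minimal sketch of that behaviour in plain Python follows; the helper name `if_` and the sample data are assumptions made for illustration, not LIAM2's own implementation.

```python
# Hypothetical per-individual columns; the names follow the slide's example.
age = [30, 70, 64, 65]
earnings = [2000.0, 0.0, 1500.0, 0.0]
pension = [0.0, 1200.0, 0.0, 1100.0]


def if_(condition, iftrue, iffalse):
    """Element-wise conditional: take iftrue[i] where condition[i] holds, else iffalse[i]."""
    return [t if c else f for c, t, f in zip(condition, iftrue, iffalse)]


# Equivalent of: if(age < 65, earnings, pension)
below_65 = [a < 65 for a in age]
income = if_(below_65, earnings, pension)
print(income)
```

Each individual gets the value from the branch that matches his or her own condition, so the result has one entry per individual.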
Current features
  Store
    fields for each period (if the field is declared)
      age: "age + 1"
    as temporaries (the value is lost after each period)
      ischild: "age < 18"
  Macros (re-evaluated wherever they appear)
    ISCHILD: "age < 18"
  Difference with temporaries:
    ischild: "age < 18"
    before1: "if(ischild, 1, 2)"
    before2: "if(ISCHILD, 1, 2)"
    # before1 == before2
    age: "age + 1"
    after1: "if(ischild, 1, 2)"
    after2: "if(ISCHILD, 1, 2)"
    # after1 != after2 !!
    # after1 == before1
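The difference between a temporary and a macro can be mimicked in plain Python: a temporary is a value computed once, a macro behaves like a function that is called again at each use. This is only an analogy under that assumption; the function `is_child_macro` and the sample ages are hypothetical.

```python
def is_child_macro(age):
    """Macro analogue: recomputed from the current ages on every use."""
    return [a < 18 for a in age]


def one_or_two(flags):
    """Equivalent of if(flag, 1, 2) for every individual."""
    return [1 if flag else 2 for flag in flags]


age = [16, 17, 40]

# Temporary analogue: computed once and stored.
ischild = [a < 18 for a in age]

before1 = one_or_two(ischild)
before2 = one_or_two(is_child_macro(age))

# The process age: "age + 1"
age = [a + 1 for a in age]

after1 = one_or_two(ischild)              # still based on the old ages
after2 = one_or_two(is_child_macro(age))  # sees the updated ages

print(before1, before2, after1, after2)
```

The second individual turns 18 when the ages are incremented, so only the macro-based result changes.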
Current features
  Functions
    Per individual
      abs, log, exp
      clip
        0.25 * clip(age ** 3, 0, )
      round
        round(age / 10.0, 2)
      min/max
        min(age, 99)
        max(pension, benefit)
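As a rough illustration of what these per-individual functions compute, here is a plain-Python sketch. The `clip` helper, its bounds (18 and 99) and the sample data are assumptions chosen for the example; they are not taken from the slide.

```python
def clip(values, lower, upper):
    """Limit every value to the interval [lower, upper]."""
    return [max(lower, min(v, upper)) for v in values]


# Hypothetical sample data.
age = [5, 42, 103]
pension = [0.0, 900.0, 1200.0]
benefit = [300.0, 0.0, 1500.0]

clipped_age = clip(age, 18, 99)                          # bounds are illustrative
decades = [round(a / 10.0, 2) for a in age]              # round(age / 10.0, 2)
capped_age = [min(a, 99) for a in age]                   # min(age, 99)
best = [max(p, b) for p, b in zip(pension, benefit)]     # max(pension, benefit)

print(clipped_age, decades, capped_age, best)
```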
Current features
  Functions
    Aggregate functions
      grpcount, grpsum, grpavg, grpstd, grpmax, grpmin
        abs(age - grpavg(age))
      normal: random numbers with a normal distribution
        normal(loc=0.0, scale=grpstd(errsal))
      Some functions accept a filter argument
        abs(age - grpavg(age, filter=male), filter=not male)
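An aggregate function reduces a whole column to a single value, which can then be combined with per-individual values. The sketch below mimics grpavg with a filter in plain Python. How LIAM2 treats individuals excluded by an outer filter is not stated on the slide; returning None for them is an assumption of this example.

```python
from statistics import mean


def grpavg(values, filter=None):
    """Average over the individuals selected by filter (all individuals if None)."""
    if filter is None:
        return mean(values)
    return mean(v for v, keep in zip(values, filter) if keep)


# Hypothetical sample data.
age = [20, 30, 40, 50]
male = [True, False, True, False]

avg_male_age = grpavg(age, filter=male)

# Distance of each woman's age from the average male age.
# Assumption: filtered-out individuals (men) get None.
deviation = [None if m else abs(a - avg_male_age) for a, m in zip(age, male)]

print(avg_male_age, deviation)
```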
Current features
  lag/value_for_period
    Only simple expressions and explicitly saved aggregates for now
      value_for_period(inwork and not male, 2002)
      lag(sum_twr)
  matching: match two sets of individuals (aka marriage market)
    Matches individuals from set 1 with individuals from set 2
    Follows a particular order (given by an expression)
    For each individual in set 1, computes the score of all (unmatched) individuals in set 2 and takes the best-scoring one
      matching(set1filter=to_marry and not male,
               set2filter=to_marry and male,
               orderby=difficult_match)
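The matching procedure described on this slide can be sketched as a greedy loop. This is an illustration under stated assumptions, not LIAM2's code: the function `match`, the scoring function `age_closeness`, the sample data and the choice to process the highest `difficult_match` value first are all hypothetical.

```python
def match(set1, set2, orderby, score):
    """Greedy matching: returns a dict mapping ids in set1 to ids in set2."""
    unmatched = list(set2)
    pairs = {}
    # Assumption: individuals with the highest orderby value are handled first.
    for person in sorted(set1, key=orderby, reverse=True):
        if not unmatched:
            break
        # Score every still-unmatched candidate and keep the best one.
        best = max(unmatched, key=lambda candidate: score(person, candidate))
        pairs[person["id"]] = best["id"]
        unmatched.remove(best)
    return pairs


def age_closeness(a, b):
    """Hypothetical score: the smaller the age gap, the higher the score."""
    return -abs(a["age"] - b["age"])


women = [
    {"id": 1, "age": 30, "difficult_match": 0.2},
    {"id": 2, "age": 55, "difficult_match": 0.9},
]
men = [{"id": 10, "age": 28}, {"id": 11, "age": 57}, {"id": 12, "age": 40}]

pairs = match(women, men, orderby=lambda p: p["difficult_match"], score=age_closeness)
print(pairs)
```

In this sketch every individual in set 1 scans all remaining candidates in set 2, so the work grows roughly with the product of the two set sizes.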
Current features
  Many-to-one links
    partner.age
    grpavg(partner.age - age)
    partner.father.age
    partner.get(earnings + benefits)
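A many-to-one link can be thought of as a field holding the id of another individual, so that partner.father.age means "follow partner, then father, then read age". A minimal sketch of that idea, assuming a dictionary of persons keyed by id; the helpers `follow` and `get` and the use of None for a missing link are choices made for this example only.

```python
# Hypothetical data: each person stores the ids of the linked individuals.
persons = {
    1: {"age": 40, "partner": 2, "father": None},
    2: {"age": 38, "partner": 1, "father": 3},
    3: {"age": 70, "partner": None, "father": None},
}


def follow(persons, person_id, links):
    """Follow a chain of many-to-one links; None if a link is missing."""
    current = person_id
    for link in links:
        if current is None:
            return None
        current = persons[current][link]
    return current


def get(persons, person_id, links, field):
    """Value of `field` for the individual reached through `links`."""
    target = follow(persons, person_id, links)
    return None if target is None else persons[target][field]


partner_age = get(persons, 1, ["partner"], "age")                     # partner.age
partner_father_age = get(persons, 1, ["partner", "father"], "age")    # partner.father.age
no_partner = get(persons, 3, ["partner"], "age")                      # missing link

print(partner_age, partner_father_age, no_partner)
```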
Current features
  One-to-many links
    countlink(link, filter)
      countlink(persons)
      countlink(children, age < 18)
    sumlink(link, expr, filter)
      sumlink(persons, earnings, age >= 18)
    avglink(link, expr, filter)
      avglink(children, age, not male)
    minlink/maxlink(link, expr, filter)
      minlink(children, age, not male)
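One-to-many link functions aggregate over the individuals attached to each target, for example over the persons of each household. The sketch below shows the idea for countlink and sumlink in plain Python; the data layout (a household id stored on each person) and the function signatures are assumptions for illustration.

```python
def sumlink(members, key, expr, filter=lambda m: True):
    """Sum expr(member) per value of member[key], over members passing the filter."""
    totals = {m[key]: 0 for m in members}  # targets with no matching member keep 0
    for m in members:
        if filter(m):
            totals[m[key]] += expr(m)
    return totals


def countlink(members, key, filter=lambda m: True):
    """Count members per value of member[key], over members passing the filter."""
    return sumlink(members, key, lambda m: 1, filter)


# Hypothetical persons, each linked to a household through "hh".
persons = [
    {"id": 1, "hh": 100, "age": 40, "earnings": 3000.0},
    {"id": 2, "hh": 100, "age": 12, "earnings": 0.0},
    {"id": 3, "hh": 200, "age": 67, "earnings": 500.0},
    {"id": 4, "hh": 100, "age": 19, "earnings": 800.0},
]

household_size = countlink(persons, "hh")
nb_children = countlink(persons, "hh", filter=lambda m: m["age"] < 18)
adult_earnings = sumlink(persons, "hh", lambda m: m["earnings"],
                         filter=lambda m: m["age"] >= 18)

print(household_size, nb_children, adult_earnings)
```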
Current features
  Regressions
    Logit
      logit_regr(expr, filter, align)
    Continuous (expr + normal(0, 1) * mult + error)
      cont_regr(expr, filter, align, mult, error_var)
    Clipped continuous (always positive)
      clip_regr(expr, filter, align, mult, error_var)
    Log continuous (exponential of continuous)
      log_regr(expr, filter, align, mult, error_var)
  Alignment
    Fixed percentage or two-dimensional table in a CSV file
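To give an idea of how a logit score and a fixed-percentage alignment can work together, here is a deterministic sketch: it turns a linear score into a probability and then selects the highest-ranked individuals until the target share is reached. This ranking approach is one common way to align, used here as an assumption; the slide does not say how LIAM2 does it. The coefficients, the helper names and the sample data are hypothetical.

```python
import math


def logistic(x):
    """Logistic function, mapping a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def align(probabilities, target_share):
    """Select the round(target_share * n) individuals with the highest probabilities."""
    n_selected = round(target_share * len(probabilities))
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    chosen = set(ranked[:n_selected])
    return [i in chosen for i in range(len(probabilities))]


age = [25, 35, 45, 55]
scores = [0.05 * a - 2.0 for a in age]   # hypothetical regression expression
probabilities = [logistic(s) for s in scores]

# Fixed-percentage alignment: exactly half of the individuals are selected.
selected = align(probabilities, target_share=0.5)
print(selected)
```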
Current features
  Lifecycle functions
    new: create new individuals
      new('person', filter=to_give_birth)
    remove: remove individuals from the dataset
      remove(dead)
      remove(nb_persons == 0)
  Miscellaneous functions
    show: print anything to the console
      show(grpcount(age >= 18))
      show(grpcount(not dead), grpavg(age, filter=not dead))
Current features
  Miscellaneous functions
    dump: produce a table with the expressions given as arguments
      show(dump(age, age / 10, filter=id < 20))
    groupby (aka "pivot table"): group individuals by their value for the given expressions, and optionally compute an expression for each group
      show(groupby((age / 10, gender)))
      show(groupby((agegroup, gender, inwork), grpcount()))
      show(groupby(agegroup, grpavg(income)))
      show(groupby((inwork, gender), id, filter=age < 10))
    csv: write a table to a CSV file
      csv(dump(age, age / 10, gender), suffix='age')
    Show: interactive assessment of results on the command line
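The groupby function behaves like a pivot table: individuals are grouped by the values of one or more expressions, and a count or another aggregate is computed per group. A minimal plain-Python sketch of that behaviour follows; the function signature and the sample rows are assumptions made for this example.

```python
from collections import defaultdict


def groupby(rows, keys, expr=None):
    """Group rows by the given keys; count each group, or apply expr to it."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    if expr is None:
        return {key: len(group) for key, group in groups.items()}
    return {key: expr(group) for key, group in groups.items()}


def average_income(group):
    """Hypothetical per-group aggregate, in the spirit of grpavg(income)."""
    return sum(row["income"] for row in group) / len(group)


# Hypothetical sample data.
rows = [
    {"agegroup": 20, "male": True, "income": 1000.0},
    {"agegroup": 20, "male": False, "income": 2000.0},
    {"agegroup": 20, "male": True, "income": 3000.0},
    {"agegroup": 30, "male": False, "income": 4000.0},
]

counts = groupby(rows, ["agegroup", "male"])
avg_income = groupby(rows, ["agegroup"], average_income)
print(counts, avg_income)
```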
Current Performance
  For a simple model:
    birth (using alignment data from MIDAS)
    chronic ill (using a fixed percentage alignment)
    marriage market
    earnings (using macro alignment)
      Or at least what I think macro alignment is...
    death (using alignment data from MIDAS)
Current Performance
  10,000 persons, 20 periods
    2.65 s (on a Dell Latitude laptop computer)
  100,000 persons, 20 periods
    29 s
  1,000,000 persons, 20 periods
    16 min 31 s, of which approx. 83% is spent in the marriage market
    ~180 MB RAM
    897 MB output file
      could be compressed if needed
  For a complete model with 100,000 persons
    probably under 10 min
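The timings on this slide can be put side by side with a little arithmetic. The sketch below only restates the reported figures; the variable names are illustrative and no new measurements are introduced.

```python
# Reported run times in seconds, keyed by number of persons (20 periods each).
timings = {10_000: 2.65, 100_000: 29.0, 1_000_000: 16 * 60 + 31}

total_1m = timings[1_000_000]             # 16 min 31 s expressed in seconds
marriage_market = 0.83 * total_1m         # approx. share spent in the marriage market
everything_else = total_1m - marriage_market

# How much slower each run is than the previous, 10 times smaller, one.
slowdown_10k_to_100k = timings[100_000] / timings[10_000]
slowdown_100k_to_1m = total_1m / timings[100_000]

print(total_1m, round(marriage_market), round(everything_else))
print(round(slowdown_10k_to_100k, 1), round(slowdown_100k_to_1m, 1))
```

Going from 10,000 to 100,000 persons multiplies the run time by about 11, while going from 100,000 to 1,000,000 multiplies it by about 34, so the run time grows faster than the population at the largest size.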
TODO
  Automated tests (aka "unit tests")
  Documentation
    User manual
    Code
  Speed optimizations
  Clean up the code
