
Analysis Scripts

This section documents the statistical analysis used to generate the model parameters, organized as a collection of STATA scripts. Each script typically corresponds to the parameter files of one specific microWELT module. The first script calls all other scripts, so executing it alone is enough to reproduce all model parameters.

0_Base_File.do

This file sets the paths to the data and parameter folders and defines global values for the simulation, such as its start and end year. Note that the eurostatuse command in STATA requires a non-relational directory path for the folder where EUROSTAT data are stored. After all paths and globals are set, the file calls all parameter-generating do-files, which are stored in the dofile directory.

Directories that must be defined:

  • dofile contains all parameter do-files
  • param defines where parameter files shall be stored
  • cfe contains data from the cohort fertility database
  • eurostat is where EUROSTAT data for population projection parameters or population characteristics are stored. This must be a non-relational path.
  • startpop contains the starting population micro dataset
  • lfsin contains EUROSTAT’s European Labour force survey scientific use files
  • adhoc contains the use files of the ad-hoc modules of EUROSTAT’s European Labour Force Survey
  • save sets the path to where STATA saves datafiles

Global values must be set for:

  • yearmax defines the last year of the simulation period
  • startyear is the first year of the simulation
  • lfs_y defines the year for which LFS-based calculations are made. This may, but need not, correspond to the startyear of the simulation.
  • Geosample names the countries included in the analysis

Other globals define birth cohort limits used in different parameter files.



clear all

*******************************************
* set paths to dofile and parameter folder:
*******************************************
*gl dofile   "\\int.wsr.at\Nabu\ext_Projekte\Weltransim_PN-7316\parameterfiles\dofiles"
*gl param    "\\int.wsr.at\Nabu\ext_Projekte\Weltransim_PN-7316\parameterfiles\datfiles"
gl param    "\\int.wsr.at\Nabu\ext_Projekte\Weltransim_PN-7316\TESTPARAMETERS\paratest"
gl dofile   "\\int.wsr.at\Nabu\ext_Projekte\Weltransim_PN-7316\TESTPARAMETERS"

**************************************************
* set paths to folders containing various datasets:
**************************************************
gl cfe      "\\int.wsr.at\Nabu\ext_Projekte\Weltransim_PN-7316\parameterfiles\cfe-database"
gl eurostat "D:\horvath\EUROSTAT"
gl startpop "\\int.wsr.at\Nabu\ext_Projekte\Weltransim_PN-7316\startpop"
gl lfsin    "\\int.wsr.at\Nabu\restriktive_Daten\EC\WELTRANSIM_RPP_167-2017-ECHP_LFS_EU-SILC\Daten\LFS\vol-2\YearlyFiles"
gl adhoc    "\\int.wsr.at\Nabu\restriktive_Daten\EC\WELTRANSIM_RPP_167-2017-ECHP_LFS_EU-SILC\Daten\LFS\vol-1\AdhocModules\LFS_ahm_2009"
gl save     "\\int.wsr.at\Nabu\restriktive_Daten\EC\WELTRANSIM_RPP_167-2017-ECHP_LFS_EU-SILC\Auswertungen\TEST"
*gl save     "\\int.wsr.at\Nabu\restriktive_Daten\EC\WELTRANSIM_RPP_167-2017-ECHP_LFS_EU-SILC\Auswertungen"

gl share_input "\\int.wsr.at\Nabu\ext_Projekte\Weltransim_PN-7316\SHARE_TMP"

************************************************
* define simulation period, sex and maximum age:
************************************************
gl yearmax "2150"
gl startyear "2010"
gl lfs_y "2014"
gl Sex "F M"
gl maxage 105

************************
* define country sample:
************************
gl Geosample "AT ES FI"

*****************************************
* globals needed in MaleChildlessness.do:
*****************************************
gl YOB_MALEFERT_L 1920
gl YOB_MALEFERT_U 2050

********************************
* used in refined fertility do:
********************************
gl yob_birth_low 1960
gl yob_birth_high 2050
gl yob_low 1900
gl yob_high 1959

********************************************************************************
* run do-files generating parameterfiles. !! need to run startpop files first !!
********************************************************************************

do $dofile\PersonCore.do

do $dofile\BaseMortality.do
    * UK data no longer contained in eurostat population projections!
    * needs eurostat data (unzipped) in a non-relational directory:
    * proj_19naasmr <-- variable name proj_19naasmr must be updated in dofile when new projections are used

do $dofile\RefinedMortality.do
    * data for Spain are from "Defunciones 2012-14_MR.xlsx"
    * data for AT, ES and FI: "Inequalities-in-longevity-by-education-in-OECD-countries.xlsx"

do $dofile\BaseFertility.do
    * UK data no longer contained in eurostat population projections!
    * needs eurostat data (unzipped) in a non-relational directory:
    * proj_19naasfr <-- variable name proj_19naasfr must be updated in dofile when new projections are used

do $dofile\RefinedFertility.do
    * data for AT, ES and FI from cohort fertility database http://www.cfe-database.org/database/
    * data for UK from Berrington et al. 2015

do $dofile\BaseEduc.do

do $dofile\EducationPattern.do

do $dofile\SchoolEnrolment.do

do $dofile\RefinedEducFate.do

do $dofile\FemalePartnership.do

do $dofile\PartnerMatching.do

do $dofile\NetMigration.do
    * UK data no longer contained in eurostat population projections!
    * UK data taken from the Office for National Statistics (hard coded in dofile!)

do $dofile\Emigration.do

do $dofile\Immigration.do

do $dofile\FamilyLinks.do

do $dofile\MaleChildlessness.do

BaseEduc.do

Creates the parameters for education progression rates by sex and year of birth contained in BaseEduc_2010.dat. They are derived by calculating the education shares of the population aged 30 to 34 (assuming that the highest education level is typically achieved before age 30) and then dividing the cumulated share of those achieving at least education level “Y” by the cumulated share of those achieving at least the next lower level “X”.

This results in progression rates reflecting the probability that someone who has education level “X” progresses to the higher education level “Y”.

The parameter file contains two education progression rates:

  • EducProg1 gives the probability of progressing from low to medium education
  • EducProg2 gives the probability of progressing from medium to high education
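
To make the ratio of cumulated shares concrete, the following minimal Python sketch computes both progression rates from three education shares. It is an illustration only and not part of the STATA scripts; the function name progression_rates and the example shares are hypothetical and not taken from the EU-LFS.

```python
def progression_rates(share_low, share_med, share_high):
    """Return (EducProg1, EducProg2) from the three education shares.

    Each rate is the cumulated share reaching at least the higher level
    divided by the cumulated share reaching at least the level below it.
    """
    at_least_low = share_low + share_med + share_high  # everyone
    at_least_med = share_med + share_high
    at_least_high = share_high

    prog1 = at_least_med / at_least_low    # low -> medium
    prog2 = at_least_high / at_least_med   # medium -> high
    return prog1, prog2


# Hypothetical shares for one country and sex (they sum to 1)
prog1, prog2 = progression_rates(0.20, 0.50, 0.30)
print(round(prog1, 3), round(prog2, 3))
```

With these example shares, roughly 80% progress from low to medium education and 37.5% of those progress further from medium to high.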

Data Source: EU-LFS 2014 Quarter 1-4

Version: March 2018

Author(s): Tom Horvath



clear all

loc Ctry $Geosample

* ------------------------------------------------------------------------------
* 1. IMPORT CSV DATA TO STATA FORMAT
* ------------------------------------------------------------------------------

* Yearly file
* --------------------------------------
display("$Geosample")
foreach country of glo Geosample {

    import delimited ///
       using "$lfsin/`country'_YEAR_1998_onwards/`country'${lfs_y}_y.csv", ///
       clear delimiters(",") varnames(1) asdouble stripquotes(yes)

    save "$save/`country'${lfs_y}_y.dta", replace

}

* ------------------------------------------------------------------------------
* 2. GET FILES TOGETHER
* ------------------------------------------------------------------------------

* Append files
* --------------------------------------

clear
set obs 1

foreach country of glo Geosample {
    ap using "$save/`country'${lfs_y}_y.dta", nol force
    rm "$save/`country'${lfs_y}_y.dta"
}

drop if _n == 1

save "$save/lfs_sample_${lfs_y}_y.dta", replace

use "$save/lfs_sample_${lfs_y}_y.dta", clear

* recode education variables
*---------------------------

recode hat11lev                                                     ///
    ( 100/200 = 1)                                                  ///
    ( 300/499 = 2)                                                  ///
    ( 500/800 = 3)                                                     ///
    , gen(last_edu)
        cap lab def last_edu_VL 1 "Low" 2 "Med" 3 "High"
        lab val last_edu last_edu_VL
        lab var last_edu "highest edu"


ge intwgt=int(1000*coeff)
drop if intwgt == .
drop if last_edu == . | last_edu == 0
ge female = sex == 2
ge male = sex == 1

keep if age >= 25 & age < 35

bys country female: egen tot_pop = total (intwgt)
bys country female last_edu: egen edu_pop = total (intwgt)

ge edu_share = edu_pop / tot_pop

bys country female last_edu: keep if _n == 1

keep country female last_edu edu_share

gsort country female -last_edu
by country female: ge cumsum = sum(edu_share)
by country female: ge prog = cumsum/cumsum[_n+1] if _n<_N

keep if prog !=.

gsort country last_edu -female

drop edu_share cumsum

save $save\base_educ.dta, replace

********************************************************************************
* Program for writing dat file
********************************************************************************

loc Ctry $Geosample

cap pr drop baseeduc2dat
pr de baseeduc2dat

syntax , GEO(string)
cap file close _all

use  $save\base_educ.dta, clear

keep if country == "`geo'"

qui sum prog
loc nedu = r(N)

loc startyear = $startyear

loc do $dofile

    * Create file
    tempname file
    file open `file' using "$param/`geo'_BaseEduc_`startyear'.dat", w replace

    * Header
    file write `file' "// generated on `c(current_date)' at `c(current_time)'" _n
    file write `file' "// generated by `do'\BaseEduc.do"
    file write `file' _n

    file write `file' "parameters" _n
    file write `file' "{" _n // Open parameters

    //EN Education progression probability low -> medium
    file write `file' _tab "double" _tab "EducProg1[SEX][YOB_EDUC_PROG1] = { " _n
    file write `file' _tab _tab "(61)" _tab (prog[1]) ", " _n
    file write `file' _tab _tab "(61)" _tab (prog[2]) ", " _n
    file write `file' _tab "}; " _n

     //EN Education progression probability medium -> high
    file write `file' _tab "double" _tab "EducProg2[SEX][YOB_EDUC_PROG2] = {" _n
    file write `file' _tab _tab "(71)" _tab (prog[3]) ",  " _n
    file write `file' _tab _tab "(71)" _tab (prog[4]) ",  " _n
    file write `file' _tab "}; " _n

    file write `file' _tab "};" _n

file close _all
end

********************************************************************************
* WRITE .DAT FILES

foreach geo of local Ctry{
baseeduc2dat , geo("`geo'")
}

BaseFertility.do

Creates the age-specific fertility rates contained in BaseFertility_2010.dat from historic data and projections provided by EUROSTAT. Projected fertility rates correspond to the baseline scenario of the EUROSTAT population projections.

Two parameters are produced:

  • AgeSpecificFertility: contains the fertility rates of females by age (15 to 49) over the entire simulation period (2010 to 2150)
  • SexRatio: gives the number of males born per 100 females in each year. This is currently set to 101 for each country.
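
According to the comments in the script, the rates of the last projected year are held constant for all later years up to yearmax. A minimal Python sketch of that extension step, shown for a single age, might look as follows. It is an illustration and not part of the STATA scripts; the function name extend_constant and the example rates are hypothetical.

```python
def extend_constant(rates_by_year, last_year):
    """Carry the rate of the final available year forward to last_year."""
    final_year = max(rates_by_year)
    extended = dict(rates_by_year)
    for year in range(final_year + 1, last_year + 1):
        extended[year] = rates_by_year[final_year]
    return extended


# Hypothetical fertility rates for one age; projections end in 2100
rates = {2099: 0.050, 2100: 0.060}
extended = extend_constant(rates, 2103)
print(sorted(extended.items()))
```

Years already covered by the data are left untouched; only the years after the last available one are filled in.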

Data Source:

  • Fertility rate projections by EUROSTAT (proj_19naasfr)
  • Historic fertility rates (demo_frate)

Version: Jan 2019

Author(s): Marian Fink, Tom Horvath



cd $eurostat

loc Ctry $Geosample_POP

********************************************************************************
* PREPARE FERTILITY DATA
********************************************************************************

* Prepare FR for years before 2015
* --------------------------------

clear
foreach geo of local Ctry{
display("`geo'")
}

eurostatuse demo_frate, long geo(`Ctry') noerase

drop if age == "TOTAL" | age == "Y10-14" | age == "Y15-19"| age == "Y20-24"  ///
    | age == "Y25-29" | age == "Y30-34" | age == "Y35-39" | age == "Y40-44" ///
    | age == "Y45-49" | age == "Y_GE50"

drop if time < $startyear

save "demo_frate.dta", replace

* prepare FR for years 2015 onwards (up to 2080)
* ----------------------------------------------
clear

eurostatuse proj_19naasfr, long geo(`Ctry') noerase

drop if age ==  "Y_LT15" | age == "Y_GE50" | age == "TOTAL"

save  "proj_19naasfr.dta", replace

* Get files together
* ------------------
use "proj_19naasfr.dta", clear

g proj = 1

ap using "demo_frate.dta"
    replace proj = 0 if mi(proj)

drop unit*

drop *label

replace age = subinstr(age, "Y", "", .)

destring age , replace

ge value = proj_19naasfr if proj == 1
    replace value = demo_frate if proj == 0

drop proj_19naasfr demo_frate

rename value frate

rename time year

replace projection = "NULL" if proj == 0

drop proj

sort geo year age projection

keep geo year age projection frate

compress

save "fertility_data.dta", replace

********************************************************************************
* PREPARE DATA for DTA-FILE
* - Reshape file
* - Expand data to cover years up to 2150 (assume constant FR from 2080 onwards)
********************************************************************************

use "fertility_data.dta", clear

g byte  proj = 1 if projection == "BSL"
replace proj = 2 if projection == "LFRT"
replace proj = 0 if projection == "NULL"

drop projection

reshape wide frate, i( year geo age ) j( proj )

ren frate0 frate
ren frate1 frate_BSL
cap ren frate2 frate_LFRT

qui su year
loc y = r(max)
loc n = $yearmax - `y' + 1
display `n'
expand `n' if year == `y' , gen(copy)

qui su year
loc y = r(max)

sort geo year age copy

by geo year age: replace year = year + _n - 1 if year == `y'

replace frate_BSL = frate if !mi(frate )
cap replace frate_LFRT = frate if !mi(frate )

drop frate copy

save "fertility2dat.dta", replace

use "fertility2dat.dta", clear
********************************************************************************
* PROGRAM FOR WRITING .DAT FILES

cap pr drop fertility2dat
pr de fertility2dat

    syntax , GEO(string) FRate(string)

    clear all

    cap file close _all

    loc do $dofile

    * Create file
    tempname file
    loc year $startyear
    display(`year')
    file open `file' using "$param/`geo'_BaseFertility_`year'.dat", w replace

    * File header
    file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
    file write `file' "// generated by `do'\BaseFertility.do"
    file write `file' _n

    file write `file' "parameters" _n
    file write `file' "{" _n // Open parameters

    ** Parameter table for fertility rates
    file write `file' _tab "//EN Age distribution of fertility" _n
    file write `file' _tab "double" _tab "AgeSpecificFertility[FERTILE_AGE_RANGE][SIM_YEAR_RANGE] = {" _n // open fertility table

    * Get values for fertility rates
    use "fertility2dat.dta", clear

    keep if geo == "`geo'" // Country

    if "`frate'" == "BSL" {
        cap drop frate_LFRT
        ren frate_BSL frate
    }
    sort age year

    loc age1 = age[1]
    loc ageN = age[_N]
    loc y1   = 1
    loc y11  = 2
    loc yN   = $yearmax - ($startyear - 1) //2150 - 2009
    loc yN1  = `yN' - 1  //2150 - 2009 - 1
    display `y1'
    display `y11'
    display `yN'
    display `yN1'
    * Write fertility rate values into parameter file
    forv a = `age1' / `ageN' {
        *display `y1'
        file write `file' _tab _tab (frate[`y1']) ", "

        forv y = `y11' / `yN1' {
        *display `y'
            file write `file' (frate[`y']) ", "
        }

        file write `file' (frate[`yN']) ", " _n

        drop if age == `a'
    }

    file write `file' _tab "};" _n // close fertility table

    ** Parameter table for the sex ratio
    file write `file' _tab "//EN Sex ratio (male per 100 female)" _n
    file write `file' _tab "double" _tab "SexRatio[SIM_YEAR_RANGE] = {" _n // open sex ratio table
    file write `file' _tab "(`yN') 101," _n
    file write `file' _tab "};" _n // close sexratio table
    file write `file' "};" _n  // Close parameters

    file close _all

end

********************************************************************************
* WRITE .DAT FILES
foreach geo of local Ctry{
fertility2dat , geo("`geo'") fr("BSL")
}

BaseMortality.do

Creates the sex- and age-specific mortality hazard rates contained in BaseMortality_2010.dat from life tables provided by EUROSTAT. The age- and sex-specific mortality rates in the life tables are used to compute mortality hazards (exits to death). Hazard rates are calculated by the formula hx = -log(1 - qx), where qx is the age- and sex-specific mortality rate for a given year.

The parameter file contains the parameter:

  • MortalityTable[SEX][AGE_RANGE][SIM_YEAR_RANGE] containing the age- and sex-specific mortality hazards over the entire simulation period (2010 to 2150).
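
The conversion from mortality rate to hazard can be illustrated with a short Python sketch. It is not part of the STATA scripts; the function names and the example value are hypothetical. The inverse shown here, qx = 1 - exp(-hx), holds when the hazard is treated as constant within a one-year age interval, which is an assumption of this sketch rather than a statement from the scripts.

```python
import math


def hazard_from_rate(qx):
    """Convert a one-year mortality rate qx into a hazard hx = -log(1 - qx)."""
    if not 0.0 <= qx < 1.0:
        raise ValueError("qx must lie in [0, 1)")
    return -math.log(1.0 - qx)


def rate_from_hazard(hx):
    """Inverse conversion, assuming a constant hazard over the year."""
    return 1.0 - math.exp(-hx)


qx = 0.01                      # hypothetical mortality rate
hx = hazard_from_rate(qx)
print(hx)                      # slightly larger than qx
print(rate_from_hazard(hx))    # recovers qx up to rounding error
```

For small rates the hazard is close to qx itself; the difference grows at high ages, where qx is large.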

Data Source:

  • Mortality rate projections by EUROSTAT (proj_19naasmr)

Version: Jan 2019

Author(s): Marian Fink, Tom Horvath



loc Ctry $Geosample_POP
loc Sex $Sex
loc simstart $startyear
*loc Yearmax $yearmax

*gl maxagerange 105

cd $eurostat

********************************************************************************
* Load Mortality DATA
********************************************************************************
clear all

eurostatuse proj_19naasmr, long geo(`Ctry') noerase

replace age = "Y0" if age == "Y_LT1"

drop if age ==  "Y_GE100"

replace age = subinstr(age, "Y", "", .)
destring age , replace

rename time year

g byte  proj = 1 if projection == "BSL"
replace proj = 2 if projection == "LMRT"

* Calculate Hazardrate:
rename proj_19naasmr qx
g double hx = -log(1-qx)   // hazard rate

keep geo sex age year hx proj

sort geo year age sex proj

reshape wide hx, i(geo year age sex) j(proj)

ren hx1 mrate_BSL
cap ren hx2 mrate_LMRT

sort geo sex age year

* -----------------------------------------
* Extend mortality rates for age 100 to 105
* -----------------------------------------

qui sum age
loc maxage = r(max)
loc b = $maxage - `maxage' + 1
display `b'
expand `b' if age == `maxage', gen(copy)

sort geo sex year age copy
by geo sex year: replace age = _n -1 if age == `maxage'

* -----------------------------------------
* Extend data to year 2150
* -----------------------------------------
qui su year
loc y = r(max)
loc n = $yearmax - `y' + 1
display `n'
cap drop copy
expand `n' if year == `y' , gen(copy)

sort geo sex year age copy

by geo sex year age: replace year = year + _n - 1 if year == `y'

drop copy
* -----------------------------------------
* Extend data for years 2010 - 2014
* -----------------------------------------
sort geo sex age year

qui su year
loc y = r(min)
loc n = `y' - $startyear + 1

display `n'
expand `n' if year == `y' , gen(copy)

gsort geo sex year age -copy

by geo sex year age: replace year = year - `n' + _n if year == `y' & copy == 1

sort geo sex age year

cap drop copy

save "hr_mortality_19.dta", replace

********************************************************************************
* PROGRAM FOR WRITING .DAT FILES

cap pr drop mortality2dat
pr de mortality2dat

    syntax , GEO(string) MRate(string)

    clear all

    cap file close _all

    * Create file
    tempname file
    loc year = $startyear
    loc do $dofile

    file open `file' using "$param/`geo'_BaseMortality_`year'.dat", w replace

    * File header
    file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
    file write `file' "// generated by `do'\BaseMortality.do"
    file write `file' _n

    file write `file' "parameters" _n
    file write `file' "{" _n // Open parameters

    ** Parameter table for mortality hazard rates
    file write `file' _tab "//EN Mortality hazard by age" _n
    file write `file' _tab "double" _tab "MortalityTable[SEX][AGE_RANGE][SIM_YEAR_RANGE] = {" _n // open mortality table

    * Get values for mortality rates
    use "hr_mortality_19.dta", clear

    keep if geo == "`geo'" // Country

    if "`mrate'" == "BSL" {
        cap drop mrate_LMRT
        ren mrate_BSL frate
    }
    sort sex age year

    loc age1 = age[1]
    loc ageN = age[_N]
    loc y1   = 1 //2010 - 2009
    loc y11  = 2 //2010 - 2009 + 1
    loc yN   = $yearmax - ($startyear - 1) //2150 - 2009
    loc yN1  = `yN' - 1  //2150 - 2009 - 1
    display `yN'
    display `yN1'

    * Write mortality rate values into parameter file
    ge male = sex == "M"
    forv s = 0 / 1 {
    forv a = `age1' / `ageN' {

        file write `file' _tab _tab (frate[`y1']) ", "

        forv y = `y11' / `yN1' {
            file write `file' (frate[`y']) ", "
        }

        file write `file' (frate[`yN']) ", " _n

        drop if age == `a' & male == `s'
    }
    }

    file write `file' _tab "};" _n // close life table

    file write `file' "};" _n  // Close parameters

    file close _all

end

********************************************************************************
* WRITE .DAT FILES

foreach geo of local Ctry{
mortality2dat , geo("`geo'") mr("BSL")
}

EducationPattern.do

This script derives generic education patterns, stored in EducPattern_2010.dat, that reflect different pathways through the education system. For each of the three modeled education levels (low, medium and high) we allow for 12 different pathways along which pupils can move between different education states.

There are currently 13 possible education states, not all of which are used in the current version:

  • EP_LOW: current education state is “Low”
  • EP_MED_DUAL: current education state is “Medium Dual”
  • EP_MED_VOC: current education state is “Medium Vocational”
  • EP_MED_GEN: current education state is “Medium General”
  • EP_OUT1: first “Out of school” episode
  • EP_HIGH1_FT: first episode of attending higher education as full time student
  • EP_HIGH1_PT: first episode of attending higher education as part time student
  • EP_OUT2: second “Out of school” episode
  • EP_HIGH2_FT: second episode of attending higher education as full time student
  • EP_HIGH2_PT: second episode of attending higher education as part time student
  • EP_OUT3: third “Out of school” episode
  • EP_HIGH3_FT: third episode of attending higher education as full time student
  • EP_HIGH3_PT: third episode of attending higher education as part time student

While there are up to 12 possible paths for each finally achieved education level, we currently model only three paths per level. Their probabilities are derived by simply assuming that all paths leading to a given education outcome are equally likely.
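
The equal-probability assumption can be sketched in Python as follows. The sketch is illustrative and not part of the STATA scripts; the function name path_probabilities, the example share and the example completion ages are hypothetical. Paths that are not modeled are marked by a completion age of 0 and receive probability 0.

```python
def path_probabilities(level_share, completion_ages):
    """Split the share of one education level equally over its modeled paths.

    completion_ages has one entry per possible path; 0 marks an unused path.
    """
    n_paths = sum(1 for age in completion_ages if age != 0)
    if n_paths == 0:
        return [0.0 for _ in completion_ages]
    return [level_share / n_paths if age != 0 else 0.0
            for age in completion_ages]


# Hypothetical: 30% of women reach high education; 3 of 12 paths are used
ages = [23, 25, 27] + [0] * 9
probs = path_probabilities(0.30, ages)
print(probs[:4])
```

Each of the three modeled paths receives one third of the level's share, so the probabilities over all paths and levels still sum to the total of the education shares.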

For simplicity we also use only the states EP_LOW, EP_MED_GEN and EP_HIGH1_FT, corresponding to time spent in low, medium and high education. To derive the number of years spent in each of these states, we use information from the EUROSTAT Labour Force Survey on the highest educational level attained and the age at which it was obtained, and derive the distribution of age at completion for each education level (low, medium and high). For simplicity we evaluate the age at the end of education at three points of the respective age distribution: the median and the 33rd and 66th percentiles. This gives three possible completion ages for each level of education. Assuming that pupils enter the school system at age 6, these ages translate into three possible values for the years spent in the education system for each level of education.

In a final step we split years spent in education into years spent in different education states.

  • For those who achieve low education, the number of years spent in EP_LOW is simply one of the three possible numbers of years spent in education.
  • For those achieving medium education, we deduct the average value of EP_LOW from their total years spent in education to derive the number of years spent in medium education (EP_MED_GEN).
  • For those achieving high education, we again assume that they spend the average number of years in low education before advancing to medium and high education. The number of years spent in medium education is again derived from the mean number of years spent in medium education by those ending up with medium education. Deducting the mean numbers of years spent in low and in medium education from their total years in education gives the number of years spent in high education (EP_HIGH1_FT).
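
The three rules above can be traced with a small worked example in Python. It is a sketch under stated assumptions and not part of the STATA scripts: the function names and the completion ages are hypothetical, and values are rounded half up at the end.

```python
import math

SCHOOL_ENTRY_AGE = 6  # pupils are assumed to enter school at age 6


def round_half_up(x):
    """Round a non-negative number to the nearest integer, halves upward."""
    return int(math.floor(x + 0.5))


def split_years(ages_low, ages_med, ages_high):
    """Return (years in low, years in medium, years in high) per path."""
    years_low = [a - SCHOOL_ENTRY_AGE for a in ages_low]
    mean_low = sum(years_low) / len(years_low)

    # Medium: total years in education minus the average time in low
    years_med = [a - SCHOOL_ENTRY_AGE - mean_low for a in ages_med]
    mean_med = sum(years_med) / len(years_med)

    # High: total years minus average time in low and in medium
    years_high = [a - SCHOOL_ENTRY_AGE - mean_low - mean_med
                  for a in ages_high]

    paths = {
        "low": [(y, 0, 0) for y in years_low],
        "medium": [(mean_low, y, 0) for y in years_med],
        "high": [(mean_low, mean_med, y) for y in years_high],
    }
    return {level: [tuple(round_half_up(v) for v in p) for p in ps]
            for level, ps in paths.items()}


# Hypothetical completion ages at the 33rd, 50th and 66th percentiles
result = split_years([15, 16, 17], [18, 19, 20], [23, 25, 27])
print(result["low"])     # [(9, 0, 0), (10, 0, 0), (11, 0, 0)]
print(result["medium"])  # [(10, 2, 0), (10, 3, 0), (10, 4, 0)]
print(result["high"])    # [(10, 3, 4), (10, 3, 6), (10, 3, 8)]
```

In this example the average time in low education is 10 years and the average time in medium education is 3 years, so someone completing high education at age 25 spends 19 - 10 - 3 = 6 years in EP_HIGH1_FT.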

The parameter file contains the parameters:

  • EducPattern[EDUC_LEVEL3][EDUC_PATTERN_RANGE][EDUC_PATTERN] containing the number of years spent in each education state by highest level of education
  • EducPatternDist[SEX][EDUC_LEVEL3][EDUC_PATTERN_RANGE] containing the probability of each path for a given finally achieved education level

Data Source: EU-LFS 2014 Quarter 1 - 4

Version: March 2018

Author(s): Tom Horvath




loc Ctry $Geosample

* ------------------------------------------------------------------------------
* 1. IMPORT CSV DATA TO STATA FORMAT
* ------------------------------------------------------------------------------

* Yearly file
* --------------------------------------
/*
foreach country of glo Geosample {

    import delimited ///
       using "$lfsin/`country'_YEAR_1998_onwards/`country'${lfs_y}_y.csv", ///
       clear delimiters(",") varnames(1) asdouble stripquotes(yes)

    save "$save/`country'${lfs_y}_y.dta", replace

}

* ------------------------------------------------------------------------------
* 2. GET FILES TOGETHER
* ------------------------------------------------------------------------------

* Append ad hoc module files
* --------------------------------------

clear
set obs 1

foreach country of glo Geosample {
    ap using "$save/`country'${lfs_y}_y.dta", nol
    rm "$save/`country'${lfs_y}_y.dta"
}

drop if _n == 1

loc file = strtoname("$countries")
save "$save/`file'_${lfs_y}_y.dta", replace
*/

use "$save/lfs_sample_${lfs_y}_y.dta", clear

* recode education variables
*---------------------------

recode hat11lev                                                     ///
    ( 100/200 = 1)                                                  ///
    ( 300/499 = 2)                                                  ///
    ( 500/800 = 3)                                                     ///
    , gen(last_edu)
        cap lab def last_edu_VL 1 "Low" 2 "Med" 3 "High"
        lab val last_edu last_edu_VL
        lab var last_edu "highest edu"

recode educlevl                                                   ///
    (   0/2 = 1)                                                  ///
    (   3/4 = 2)                                                  ///
    (   5/8 = 3)                                                     ///
    (   9   = 4)                                                  ///
    , gen(cur_edu)
        cap lab def cur_edu_VL 1 "Low" 2 "Med" 3 "High" 4 "out"
        lab val cur_edu cur_edu_VL
        lab var cur_edu "current edu"

rename hatyear year_edu_attained

ge age_edu_attained = age - (year - year_edu_attained) if year_edu_attained != 9999

* in education
cap drop inedu
ge inedu = educstat == 1 | educstat == 3
    la def inedu_VL 0 "no edu" 1 "yes edu"
    la val inedu inedu_VL

* get probabilities
*------------------
ge intwgt=int(1000*coeff)
drop if intwgt == .
drop if last_edu == .
ge female = sex == 2
ge male = sex == 1

keep if age == 32 & intwgt != . & last_edu != 0

bys country female: egen ntot = total(intwgt)
bys country female last_edu: egen nedu = total(intwgt)

ge p_edu = nedu/ntot
sum p_edu

ge p_edu_fem = p_edu if female == 1

ge p_edu_male = p_edu if female == 0

bys country last_edu: egen p_edu_f = min(p_edu_fem)
by country last_edu: egen p_edu_m = min(p_edu_male)

    drop p_edu_fem p_edu_male p_edu

* ------------------------------------------------------------------------------
* Prepare 12 paths per education level:
* ------------------------------------------------------------------------------
cap drop ageq*
ge ageq1 = 0
ge ageq2 = 0
ge ageq3 = 0
ge ageq4 = 0
ge ageq5 = 0
ge ageq6 = 0
ge ageq7 = 0
ge ageq8 = 0
ge ageq9 = 0
ge ageq10 = 0
ge ageq11 = 0
ge ageq12 = 0

* at the moment only three paths are relevant: age at completion is divided into
* three age ranges

foreach country of glo Geosample {
 foreach edu of numlist 1 2 3 {
        _pctile age_edu [fw=intwgt] if country == "`country'" & age == 32 & last_edu == `edu', p(33, 50, 66)
        replace ageq1 = r(r1) if country == "`country'" & age == 32 & last_edu == `edu'
        replace ageq2 = r(r2) if country == "`country'" & age == 32 & last_edu == `edu'
        replace ageq3 = r(r3) if country == "`country'" & age == 32 & last_edu == `edu'
        replace ageq4 = 0 if country == "`country'" & age == 32 & last_edu == `edu'
        replace ageq5 = 0 if country == "`country'" & age == 32 & last_edu == `edu'
        replace ageq6 = 0 if country == "`country'" & age == 32 & last_edu == `edu'
        replace ageq7 = 0 if country == "`country'" & age == 32 & last_edu == `edu'
        replace ageq8 = 0 if country == "`country'" & age == 32 & last_edu == `edu'
        replace ageq9 = 0 if country == "`country'" & age == 32 & last_edu == `edu'
        replace ageq10 = 0 if country == "`country'" & age == 32 & last_edu == `edu'
        replace ageq11 = 0 if country == "`country'" & age == 32 & last_edu == `edu'
        replace ageq12 = 0 if country == "`country'" & age == 32 & last_edu == `edu'

 }
}

bys country last_edu: keep if _n == 1

keep country last_edu ageq* p_edu*

* generate patterns
reshape long ageq, i(country last_edu)

* correct implausible values
replace ageq = 14 if ageq < 14 & ageq > 0

* ------------------------------------------------------------------------------
* Prepare 13 possible education states per finally achieved education level:
* ------------------------------------------------------------------------------

ge low = 0
ge med_dual = 0
ge med_voc = 0
ge med_gen = 0
ge out1 = 0
ge high1_ft = 0
ge high1_pt = 0
ge out2 = 0
ge high2_ft = 0
ge high2_pt = 0
ge out3 = 0
ge high3_ft = 0
ge high3_pt = 0

sort country last_edu _j

replace low = ageq if last_edu == 1
*replace med_dual = ageq if last_edu == 2
*replace med_voc = ageq if last_edu == 2
replace med_gen = ageq if last_edu == 2
*replace high1_pt = ageq if last_edu == 3
replace high1_ft = ageq if last_edu == 3
*replace high2_pt = ageq if last_edu == 3
*replace high2_ft = ageq if last_edu == 3
*replace high3_pt = ageq if last_edu == 3
*replace high3_ft = ageq if last_edu == 3

* ------------------------------------------------------------------------------
* subtract school starting age from age at completion i.o. to get years of total
* education duration
* ------------------------------------------------------------------------------

replace low = low - 6 if low > 0
*replace med_dual = med_gen - 6 if med_gen > 0
*replace med_voc = med_gen - 6 if med_gen > 0
replace med_gen = med_gen - 6 if med_gen > 0
*replace high1_pt = high1_pt - 6 if high1_pt > 0
replace high1_ft = high1_ft - 6 if high1_ft > 0
*replace high2_pt = high2_pt - 6 if high2_pt > 0
*replace high2_ft = high2_ft - 6 if high2_ft > 0
*replace high3_pt = high3_pt - 6 if high3_pt > 0
*replace high3_ft = high3_ft - 6 if high3_ft > 0

* ------------------------------------------------------------------------------
* for each education level higher than "low" generate "years in low" (only for
* those paths with probability >0 <--> ageq >0)
* ------------------------------------------------------------------------------
cap drop tmp
ge tmp = 0
foreach c of global Geosample{
qui: sum low if low > 0 & country == "`c'"
replace tmp = r(mean) if ageq > 0 & country == "`c'"
}
by country: replace low = tmp if low == 0 & ageq > 0    // every path above low starts with the average duration of low

* subtract years in low from age at completion of medium education levels
replace med_dual = med_dual - low if med_dual > 0 & ageq > 0    // subtract duration of the low phase
replace med_voc = med_voc - low if med_voc > 0 & ageq > 0    // subtract duration of the low phase
replace med_gen = med_gen - low if med_gen > 0 & ageq > 0    // subtract duration of the low phase

* derive mean years spent in medium education:
cap drop tmp
ge tmp = 0
foreach c of global Geosample{
qui: sum med_gen if med_gen > 0 & country == "`c'"
replace tmp = r(mean) if ageq > 0 & country == "`c'"
}

by country:  replace med_gen = tmp if med_gen == 0 & high1_ft > 0 & ageq > 0 // subtract minimum duration for PS

replace high1_pt = high1_pt - med_gen - low if high1_pt > 0 & ageq > 0
replace high1_ft = high1_ft - med_gen - low if high1_ft > 0 & ageq > 0
replace high2_pt = high2_pt - med_gen - low if high2_pt > 0 & ageq > 0
replace high2_ft = high2_ft - med_gen - low if high2_ft > 0 & ageq > 0
replace high3_pt = high3_pt - med_gen - low if high3_pt > 0 & ageq > 0
replace high3_ft = high3_ft - med_gen - low if high3_ft > 0 & ageq > 0

cap drop tmp

/* -----------------------------------------------------------------------------
derive probability of each path assuming equal probability for each path for given
education outcome
* ----------------------------------------------------------------------------*/
replace p_edu_f = 0 if ageq == 0 // no probability for paths not considered
replace p_edu_m = 0 if ageq == 0 // no probability for paths not considered

ge tmp = 0
replace tmp = 1 if ageq != 0

by country last_edu: egen npaths=total(tmp)

replace p_edu_f = p_edu_f / npaths if npaths>0
replace p_edu_m = p_edu_m / npaths if npaths>0

rename country geo

replace low         =round(low)
replace med_dual     =round(med_dual)
replace med_voc     =round(med_voc)
replace med_gen     =round(med_gen)
replace out1         =round(out1)
replace high1_ft     =round(high1_ft)
replace high1_pt    =round(high1_pt)
replace out2         =round(out2)
replace high2_ft     =round(high2_ft)
replace high2_pt     =round(high2_pt)
replace out3         =round(out3)
replace high3_ft    =round(high3_ft)
replace high3_pt    =round(high3_pt)

save "$save/educpattern2dat.dta", replace

********************************************************************************
* PROGRAM FOR WRITING .DAT FILES

cap pr drop pattern2dat
pr de pattern2dat

    syntax , GEO(string)

    clear all

    cap file close _all

    loc startyear = $startyear

    loc do $dofile

    * Create file
    tempname file
    file open `file' using "$param/`geo'_EducPattern_`startyear'.dat", w replace

    * File header
    file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
    file write `file' "// generated by `do'\EducationPattern.do"
    file write `file' _n

    file write `file' "parameters" _n
    file write `file' "{" _n // Open parameters

    ** Parameter table for education patterns
    file write `file' _tab "//EN Education Pattern" _n
    file write `file' _tab "int" _tab "EducPattern[EDUC_LEVEL3][EDUC_PATTERN_RANGE][EDUC_PATTERN] = {" _n // open education pattern table

    * Get values for education patterns
    use "$save/educpattern2dat.dta", clear

    keep if geo == "`geo'" // Country

    sort geo last_edu _j

    loc edu1 = 1
    loc eduMax = last_edu[_N]
    display `eduMax'

    qui sum _j
    loc pathmax = r(max)
    display `pathmax'
    loc pathmax_min1 = `pathmax' - 1
    display `pathmax_min1'

    forv a = `edu1' / `eduMax' {
    file write `file' _tab "// EL_`a'" _n

    file write `file' _tab _tab (low[1]) ", " (med_dual[1]) ", " (med_voc[1]) ", " (med_gen[1]) ", "
    file write `file' (out1[1]) ", " (high1_ft[1]) ","  (high1_pt[1]) ", "
    file write `file' (out2[1]) ", " (high2_ft[1]) ","  (high2_pt[1]) ", "
    file write `file' (out3[1]) ", " (high3_ft[1]) ","  (high3_pt[1]) ", "

    forv s = 2 / `pathmax_min1' {
        file write `file'  (low[`s']) ", " (med_dual[`s']) ", " (med_voc[`s']) ", " (med_gen[`s']) ", "
        file write `file'  (out1[`s']) ", " (high1_ft[`s']) ","  (high1_pt[`s']) ", "
        file write `file'  (out2[`s']) ", " (high2_ft[`s']) ","  (high2_pt[`s']) ", "
        file write `file'  (out3[`s']) ", " (high3_ft[`s']) ","  (high3_pt[`s']) ", "
        }

    file write `file'  (low[`pathmax']) ", " (med_dual[`pathmax']) ", " (med_voc[`pathmax']) ", " (med_gen[`pathmax']) ", "
    file write `file'  (out1[`pathmax']) ", " (high1_ft[`pathmax']) ","  (high1_pt[`pathmax']) ", "
    file write `file'  (out2[`pathmax']) ", " (high2_ft[`pathmax']) ","  (high2_pt[`pathmax']) ", "
    file write `file'  (out3[`pathmax']) ", " (high3_ft[`pathmax']) ","  (high3_pt[`pathmax']) ", " _n

    drop if last_edu == `a'
    }
    file write `file' "};" _n // close education pattern table


    file write `file' _tab "int" _tab "SchoolEntryAge = 6; //EN School entry age" _n

    file write `file' _tab "double" _tab    "StartSchoolYear = 0.66; //EN Start of school year" _n
    file write `file' _tab "//EN Education Pattern Distribution" _n
    file write `file' _tab "cumrate" _tab "EducPatternDist[SEX][EDUC_LEVEL3][EDUC_PATTERN_RANGE] = {" _n

    * Get values for education pattern probabilities (by sex)
    use "$save/educpattern2dat.dta", clear

    keep if geo == "`geo'" // Country

    reshape long p_edu_, i(last_edu _j) j(sex) string

    sort sex last_edu _j

    tostring last_edu, replace force

    replace last_edu = "Low" if last_edu == "1"
    replace last_edu = "Medium" if last_edu == "2"
    replace last_edu = "High" if last_edu == "3"

    replace sex = "FEMALE" if sex == "f"
    replace sex = "MALE" if sex == "m"

    loc Sex "FEMALE MALE"
    loc Edu "Low Medium High"

    foreach sex of loc Sex {
        foreach edu of loc Edu{
            file write `file' _tab "//" "`sex'""_""`edu'" _n
            forv i = 1/`pathmax'{
            file write `file' _tab _tab (p_edu_[`i']) ","
            }
            file write `file' _n
            drop if last_edu == "`edu'" & sex == "`sex'"
        }
    }
    file write `file' _tab "};" _n // close education pattern distribution table

    file write `file' "};" _n  // Close parameters
    file close _all

end

********************************************************************************
* WRITE .DAT FILES

foreach geo of local Ctry{
pattern2dat , geo("`geo'")
}

cap log c
exit

Emigration.do

Emigration-related parameters stored in Emigration_2010.dat are produced from EUROSTAT data on the total number of emigrants by sex and age and from EUROSTAT population data. We derive emigration rates by 5-year age group and sex. Additionally, the model allows the total number of emigrants to be set in the parameter EmigrationTotal. This parameter is not relevant in this application and is simply set to zero, since EUROSTAT published only total net migration in its population projections. If projections of the total number of emigrants become available, they can easily be added to the parameter file.
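The rate calculation can be sketched on a few made-up numbers. The counts below are invented for illustration only, and the variable names n_emi and n_pop are assumptions rather than the names delivered by the EUROSTAT extracts. The sketch groups single years of age into 5-year groups with an open-ended top group and divides emigrants by population.

```stata
* Toy data: invented counts for one country and sex (not EUROSTAT values)
clear
input age n_emi n_pop
 0  10 1000
 4  20 1000
 5  30 2000
 9  30 2000
97   5  500
end

* 5-year age group index: 0 = ages 0-4, 1 = ages 5-9, ...; top group 18 = 90+
gen age5part = floor(age/5)
replace age5part = 18 if age5part > 18

* Sum emigrants and population within each group, then take the ratio
collapse (sum) n_emi n_pop, by(age5part)
gen emi_share = n_emi / n_pop

list, noobs
```

Collapsing both counts before dividing keeps numerator and denominator on the same aggregation level, which is the property the parameter relies on.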

The parameter file contains three parameters:

  • EmigrationSettings, which this script sets to ES_AGERATES
  • EmigrationRates[SEX][AGE5_PART] gives the emigration rate of the population by sex and 5-year age group
  • EmigrationTotal[SIM_YEAR_RANGE] gives the total number of emigrants per year, which is set to zero in this application

Data Source:

  • EUROSTAT migr_emi2 containing the total number of emigrants by sex and age
  • EUROSTAT demo_pjan containing the size of the resident population by sex and age, aggregated to 5-year age groups in the script

Version: March 2018

Author(s): Tom Horvath



loc Ctry $Geosample
loc Sex $Sex
loc simstart $startyear
loc yearmax $yearmax
loc agemax $maxage
*gl maxagerange 105

cd $eurostat

clear
eurostatuse migr_emi2, long geo(`Ctry') noerase
keep if agedef == "COMPLET"

drop agedef agedef_label age_label unit_label sex_label geo_label flags

replace age = "Y0" if age == "Y_LT1"

drop if age ==  "Y_GE100" | age == "TOTAL" | age == "UNK"

replace age = subinstr(age, "Y", "", .)
destring age , replace

rename time year

rename migr_emi2 n_emi

ge age5part=floor(age/5)
    replace age5part = 18 if age5part >18

bys geo sex year age5part: egen n_emi5=total(n_emi)

bys geo sex year age5part: keep if _n==1

keep if year>=2014

bys geo sex age5part: egen n_emi_avg=mean(n_emi5)

bys geo sex age5part: keep if _n == 1

save $eurostat\emi_avg.dta, replace

loc Ctry $Geosample

clear

eurostatuse demo_pjan, long geo(`Ctry') noerase

drop unit *_label flags

rename time year

keep if year >=2014 & year < 2019

drop if sex == "T"

replace age = "Y0" if age == "Y_LT1"

drop if age ==  "Y_GE100"
drop if age == "TOTAL" | age == "UNK" | age == "Y_OPEN"

replace age = subinstr(age, "Y", "", .)
destring age , replace

ge age5part=floor(age/5)
    replace age5part = 18 if age5part >18

bys geo sex year age5part: egen n_pop5=total(demo_pjan)

bys geo sex year age5part: keep if _n==1

keep if year>=2014

bys geo sex age5part: egen n_pop_avg=mean(n_pop5)

bys geo sex age5part: keep if _n == 1

mmerge geo sex age5part using $eurostat\emi_avg.dta, type(1:1)

tab _merge

ge emi_share = n_emi_avg / n_pop_avg    // use the multi-year averages computed above

sort geo sex age5

keep geo sex age5 emi_share

save $eurostat\emigration_rates.dta, replace
********************************************************************************
* PROGRAM FOR WRITING .DAT FILES

cap pr drop emig2dat
pr de emig2dat

    syntax , GEO(string)

    clear all

    cap file close _all
    use "$eurostat\emigration_rates.dta", clear

    keep if geo == "`geo'" // Country
    ge male = sex == "M"

    * Create file
    tempname file
    loc startyear = $startyear
    loc yearmax = $yearmax
    loc do $dofile
    loc simyears = $yearmax - $startyear +1

    qui sum age
    loc minage = r(min)+1        //need to start at position 1 not 0
    loc age_limit = r(max)+1

    file open `file' using "$param/`geo'_Emigration_`startyear'.dat", w replace

    * File header
    file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
    file write `file' "// generated by `do'\Emigration.do"
    file write `file' _n

    file write `file' "parameters" _n
    file write `file' "{" _n // Open parameters
    file write `file' _tab "EMIGR_SET" _tab    "EmigrationSettings = ES_AGERATES;"_n
    file write `file' _tab "//EN Emigration rates by sex and age" _n
    file write `file' _tab "double" _tab "EmigrationRates[SEX][AGE5_PART] = {" _n

    forvalues s = 0(1)1{
        forvalues a = `minage'(1)`age_limit'{
        file write `file' _tab (emi_share[`a']) ","
        }
        file write `file' _n
        drop if male == `s'
    }
    file write `file' _tab "};" _n

    file write `file' _tab "//Total number of emigrants" _n
    file write `file' _tab "long" _tab "EmigrationTotal[SIM_YEAR_RANGE] = {" _n
    file write `file' _tab "(`simyears') 0, "_n
    file write `file' _tab "};" _n
    file write `file' _tab "}; " _n

    file close _all

end

********************************************************************************
* WRITE .DAT FILES
loc Ctry $Geosample
foreach geo of local Ctry{
cap emig2dat , geo("`geo'")
}

FemalePartnership.do

This script estimates the two parameters contained in the FemalePartnerships_2010.dat file. The first gives the probability that a woman living with children is in a partnership, depending on the mother's education, her age group, and the age of the youngest child in the household. It is estimated with a probit model in which partnership status is explained by the mother's age, a country-specific effect of the mother's education, and a country-specific effect of the age of the youngest child in the family. The second parameter, for women not living with children, is also estimated with a probit model, in which partnership status is explained by a fourth-order age polynomial and a country-specific education effect.
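The second model can be illustrated with a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the data are random draws, the coefficients of the latent index are arbitrary, the age polynomial is shortened to a quadratic, and factor-variable notation (i. and ##) is used in place of the xi: prefix that the script itself uses.

```stata
* Simulated data: all values are invented for illustration
clear
set seed 20201001
set obs 3000
gen ctry = 1 + (runiform() < 0.5)          // two hypothetical countries
gen edu  = ceil(3 * runiform())            // education: 1 low, 2 medium, 3 high
gen age  = 15 + floor(66 * runiform())     // ages 15 to 80
gen age2 = age^2

* Partnership status from a latent index with an arbitrary age profile
gen byte inUnion = (rnormal() < (-3 + 0.15*age - 0.0015*age2 + 0.2*edu))

* Probit with an age polynomial and a country-specific education effect
probit inUnion c.age c.age2 i.edu##i.ctry

* Predicted probability of being in a partnership for every observation
predict p, pr
summarize p
```

Because predict fills in p for every row with non-missing covariates, the same call also yields probabilities for age and education combinations that were added as blank rows and have no observed outcome.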

Results are stored in the parameter file as two parameters:

  • InUnionProbWithChildren[EDUC_LEVEL3][CHILD_AGEGR][MOTH_AGEGR]
  • InUnionProbNoChildren[PART_AGE_RANGE][EDUC_LEVEL3]

Data Source:

  • starting population file based on EUSILC

Version: October 2020

Author(s): Tom Horvath



use $startpop/startpop.dta, clear

loc Ctry $Geosample

drop if idhh ==.

rename deh isced

ge intwgt=round(dwt)
drop if intwgt == .

* ---------------------------------
* recode education and sex variable
* ---------------------------------

recode isced                                                          ///
    ( 1/2 = 1)                                                  ///
    ( 3/4 = 2)                                                  ///
    ( 5/6 = 3)                                                 ///
    , gen(edu)

    cap lab def edu_VL 1 "Low" 2 "Med" 3 "High"
    la val edu edu_VL

ge fem = sex == 0

replace idmother = 0 if idmother ==.

replace idfather = 0 if idfather ==.

* -----------------------
* Define person in Union:
* -----------------------

ge inUnion = 0
    replace inUnion = 1 if idpartner >0 & idpartner !=.

* generate mother_tag:
* --------------------
order country idhh idperson idpartner idfather idmother age

gsort country idhh idmother

by country idhh: ge anymoth = idmother > 0

order country idhh idperson idpartner idfather idmother age anymoth

gsort country idhh anymoth -age // youngest child with a mother at end of list

cap drop multiple_mothers
ge multiple_mothers = 0
by country idhh: replace multiple_mothers = 1 if idmother[_n-1]!= 0 & idmother != idmother[_n-1] & _n > 1
order multiple_mothers

tab multiple_mothers

by country idhh: ge id_mother = idmother[_N] // keep only mother of youngest child in household (other mothers most likely grandmothers)

cap drop is_mother
ge is_mother = 0
by country idhh: replace is_mother = 1 if idperson == id_mother

order country idhh idpartner is_mother idmother age

tab inUnion is_mother

* -------------------------------
* generate age of youngest child
* -------------------------------

sort country idhh age

ge age_youngest = .

by country idhh: replace age_youngest = age[1] if idmother >0 | idfather > 0

by country idhh: replace age_youngest = age_youngest[1]

tab age_youngest is_mother

cap drop childAgeGr

recode age_youngest (0 /1 = 1 "0")(2/3 = 2 "1-2") (4/5 = 3 "3-5") (6/8 = 4 "6-8") ///
                (9/11 = 5 "9-11") (12/14 = 6 "12-14") (15/24 = 7 "15-24") (25/99 = -3), ge(childAgeGr)

tab age_youngest childAge
tab childAge is_mother
replace childAgeGr = -3 if is_mother == 0 | childAgeGr == . // only mothers have children

* ------------------------------------
* reduce sample to persons of interest
* ------------------------------------

keep if fem == 1 // need only women

keep if age >= 15

ge mark = is_mother == 1 & childAgeGr == -3
tab childAge is_mother

recode age (0/19 = 1 "<20") (20/24 = 2 "20-24") (25/29 = 3 "25-29") (30/34 = 4 "30-34") ///
                    (35/39 = 5 "35-39") (40/99 = 6 "40+"), ge(ownAgeGr)


qui sum ownAgeGr
gl N_Age_Gr = r(max)
display $N_Age_Gr

save "$save\fempart2dat.dta", replace

* ------------------------------------------------------------------------------
* Prepare data for InUnionProbability with children
* ------------------------------------------------------------------------------

* generate blank matrix for all possible ownageXchildAgeXedu combinations:
* ------------------------------------------------------------------------
use "$save\fempart2dat.dta", clear


qui tab country
loc nGeo = r(r) // store the number of countries, not the text "r(r)"
display `nGeo'
scalar a = `nGeo'

qui sum ownAgeGr
loc nAge = r(max)
display `nAge'

qui sum edu
loc nEdu = r(max)
display `nEdu'

qui sum childAgeGr
loc nChild = r(max)
display `nChild'

clear

set obs 1
ge country = "A"

expand a

loc i = 1
foreach geo of glo Geosample {
 replace country = "`geo'" if _n == `i'
 loc i = `i'+1
}

expand `nAge' // expand to age_groups

bys country: ge ownAgeGr = _n

expand `nEdu' // expand to edu groups

bys country ownAgeGr: ge edu = _n

expand `nChild' // expand to child age groups

bys country ownAgeGr edu: ge childAgeGr = _n

save "$save\blank_withkids.dta", replace

* load starting population data and estimate probabilities for being inUnion:
* ----------------------------------------------------------
use "$save\fempart2dat.dta", clear

keep if is_mother == 1 // keep women living with their children

drop if edu == . | edu == 0 // make sure everyone has education information

sum country edu ownAgeGr childAgeGr inUnion
sort country edu ownAgeGr childAgeGr inUnion

* calculate observed shares inUnion by groups:
* --------------------------------------------
by country edu ownAgeGr childAgeGr: egen ntot = total(intwgt)
by country edu ownAgeGr childAgeGr inUnion: egen totInUnion = total(intwgt)
ge shareUnion = totInUnion/ntot if inUnion == 1

sort country edu ownAgeGr childAgeGr shareUnion
order country edu ownAgeGr childAgeGr shareUnion
by country edu ownAgeGr childAgeGr: replace shareUnion = shareUnion[1]

* merge blank matrix in order to be able to estimate prob of union for those
* combinations not available in data:
* --------------------------------------------------------------------------
mmerge country ownAgeGr edu childAgeGr using "$save\blank_withkids.dta"

tab _merge

ge insample = _merge == 3

ge age2 = ownAgeGr*ownAgeGr

xi: probit inUnion ownAgeGr i.edu*i.country i.childAgeGr*i.country [fw = intwgt] if  insample == 1
est store estprob
predict p, pr

order country edu ownAgeGr age2 childAgeGr inUnion ntot totInUnion shareUnion p

sort country _merge

replace inUnion = 1 if missing(inUnion)

* keep only one obs per group:
* ----------------------------

bys country edu ownAgeGr childAgeGr inUnion: keep if _n ==1
bys country edu ownAgeGr childAgeGr: keep if _n == 1

keep country edu ownAge childAge shareUnion p ntot

qui sum ownAge
loc ownAgeMax = r(max)

reshape wide shareUnion p ntot, i(country edu childAge)  j(ownAge)

forv i = 1/`ownAgeMax' {
    replace shareUnion`i' = 0 if missing(shareUnion`i')
}

drop if childAgeGr == -3

* set implausible values to 0 (no birth before age 15):
* ----------------------------------------------------
by country edu: replace p1 = 0 if childAgeGr > 3    // no Mother <20 child > 5 years
by country edu: replace p2 = 0 if childAgeGr > 5    // no mother 20-24 with youngest child older than 11
by country edu: replace p3 = 0 if childAgeGr > 6    // no mother 25-29 with youngest child aged 15 or older

order country edu p* share*

drop ntot*

rename country geo

save "$save\fempart_withkids.dta", replace

* ------------------------------------------------------------------------------
* Prepare data for InUnionProbability without children
* ------------------------------------------------------------------------------

* generate blank matrix for all possible age*edu combinations:
* ------------------------------------------------------------------------

use "$save\fempart2dat.dta", clear

qui tab country
loc nGeo = r(r) // store the number of countries, not the text "r(r)"
display `nGeo'
scalar a = `nGeo'

qui sum edu
loc nEdu = r(max)
display `nEdu'

clear

set obs 1

ge country = "A"

expand a

loc i = 1
foreach geo of glo Geosample{
 replace country = "`geo'" if _n == `i'
 loc i = `i'+1
}

loc agerange = 80 - 15 + 1
display `agerange'

expand `agerange'

bysort country : ge age = 14 + _n

tab age country

expand `nEdu'

bys country age: ge edu = _n

save "$save\blank_nokids.dta", replace

* load starting population data and estimate probabilities for being inUnion:
* ----------------------------------------------------------
use "$save\fempart2dat.dta", clear

keep if is_mother == 0 & age <=80 // keep women not living with  children

drop if edu == . | edu == 0 // make sure everyone has education information

sum country edu age inUnion
sort country edu age inUnion

* calculate observed shares inUnion by groups:
* --------------------------------------------
by country edu age: egen ntot = total(intwgt)
by country edu age inUnion: egen totInUnion = total(intwgt)
ge shareUnion = totInUnion/ntot if inUnion == 1

sort country edu age shareUnion

by country edu age: replace shareUnion = shareUnion[1]

* merge blank matrix in order to be able to estimate prob of union for those
* combinations not available in data:
* --------------------------------------------------------------------------
mmerge country edu age using "$save\blank_nokids.dta"

tab _merge

ge insample = _merge == 3

ge age2 = age*age
ge age3 = age2*age
ge age4 = age3*age

xi: probit inUnion age age2 age3 age4 i.edu*i.country [fw = intwgt] if  insample == 1
est store estprob
predict p, pr

order country edu age age2 inUnion ntot totInUnion shareUnion p

sort country _merge

replace inUnion = 1 if missing(inUnion)

* keep only one obs per group:
* ----------------------------

bys country edu age inUnion: keep if _n ==1
bys country edu age: keep if _n == 1

keep country edu age shareUnion p ntot

qui sum age
loc ownAgeMax = r(max)

reshape wide shareUnion p ntot, i(country age)  j(edu)

rename age tmp

by country: ge age = 14 + _n

order country age p* share*

rename p1 low
rename p2 medium
rename p3 high

drop ntot* tmp

rename country geo

save "$save\fempart_nokids.dta", replace

********************************************************************************
* PROGRAM FOR WRITING .DAT FILES

cap pr drop fempartner2dat
pr de fempartner2dat

    syntax , GEO(string)

    clear all

    cap file close _all

    loc startyear $startyear
    loc do $dofile

    * Create file
    tempname file
    file open `file' using "$param/`geo'_FemalePartnerships_`startyear'.dat", w replace

    * File header
    file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
    file write `file' "// generated by `do'\FemalePartnership.do"
    file write `file' _n

    file write `file' "parameters" _n
    file write `file' "{" _n // Open parameters

    ** Parameter table for partnership probabilities of women living with children
    file write `file' _tab "//EN Probability to be in a partnership - Females living with children" _n
    file write `file' _tab "double" _tab "InUnionProbWithChildren[EDUC_LEVEL3][CHILD_AGEGR][MOTH_AGEGR] = {" _n // open table for women with children

    * Get values for partnerships with kids
    use "$save\fempart_withkids.dta", clear

    keep if geo == "`geo'" // Country

    sort geo edu childAgeGr

    qui sum edu
    loc eduMax = r(max)
    display `eduMax'
    loc eduMax_low = `eduMax' - 1

    qui sum childAgeGr
    loc nChild = r(max)
    display `nChild'
    loc nChildLow = `nChild' - 1

    loc ageMax = $N_Age_Gr
    loc ageMax1 = `ageMax' - 1
    display `ageMax'

    forv edu = 1 / `eduMax' {

            file write  `file' _tab _tab  "// EN Edu `edu'" _n

            forv cage = 1 / `nChild' {
                file write `file' _tab _tab (p1[`cage']) ", "
                forv mage = 2 / `ageMax1' {
                    file write `file' (p`mage'[`cage']) ", "
                }
                file write `file' (p`ageMax'[`cage']) "," _n
            }
            drop if edu == `edu'
            }

    file write `file' _tab "};" _n // close fem with kids table

    file write `file' _n

    file write `file' _tab "//EN Probability to be in a partnership - Females not living with children" _n
    file write `file' _tab "double"  _tab "InUnionProbNoChildren[PART_AGE_RANGE][EDUC_LEVEL3] = {" _n

    use "$save\fempart_nokids.dta", clear

    keep if geo == "`geo'"

    loc max = _N
    display `max'
    forv a = 1/ `max'{
        file write `file' _tab _tab (low[`a']) ","
        file write `file' (medium[`a']) ","
        file write `file' (high[`a']) "," _n
    }

    file write `file' _tab "};" _n // close fem with no kids table

    file write `file' "};" _n  // Close parameters

    file close _all

end

********************************************************************************
* WRITE .DAT FILES
********************************************************************************

foreach geo of local Ctry{
fempartner2dat, geo("`geo'")
}

cap log c
exit

Immigration.do

This file produces the parameters contained in Immigration_2010.dat. ImmigrationAgeSexAll gives the age distribution of newly arriving immigrants by sex and is based on EUROSTAT data. AgeOfImmigrantMother gives the age distribution of mothers at birth and is calculated from the starting population file.
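An age distribution of this kind is a set of shares that add up to one within each sex. The sketch below shows the idea on invented counts; the variable names follow the script, but the numbers are not EUROSTAT values and the three ages stand in for the full age range.

```stata
* Toy data: invented immigrant counts by sex and age (not EUROSTAT values)
clear
input str1 sex age n_immi
"F" 0 10
"F" 1 30
"F" 2 60
"M" 0 20
"M" 1 20
"M" 2 40
end

* Share of each age in the total for the same sex
bysort sex: egen n_tot = total(n_immi)
gen immi_share = n_immi / n_tot

* Check that the shares of each sex add up to one
bysort sex: egen check = total(immi_share)
assert abs(check - 1) < 1e-6

sort sex age
list sex age immi_share, noobs sepby(sex)
```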

The last parameter, ImmigrationTotal, contains the total number of immigrants per simulation year. It is not relevant in this application and is simply set to zero, since EUROSTAT published only total net migration in its population projections. If projections of the total number of immigrants become available, they can easily be added to the parameter file.

Currently the parameter file contains three parameters:

  • AgeOfImmigrantMother[FERTILE_AGE_RANGE] giving the age distribution of mothers at birth
  • ImmigrationAgeSexAll[SEX][AGE_RANGE] giving the age distribution of immigrants
  • ImmigrationTotal[SIM_YEAR_RANGE] gives the total number of immigrants per year which is set to zero in this application
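The parameters listed above are written to the .dat file with Stata's file commands. The following is a minimal sketch of that pattern for the zero-filled ImmigrationTotal entry. It writes to a temporary file instead of the parameter folder, and the start and end years are hypothetical. The script writes the entry as a count in parentheses followed by a value; that the model reads this as "repeat the value that many times" is an assumption about the .dat format.

```stata
* Sketch: write a small parameter block to a temporary file and show it.
* The file location and the simulation years are illustrative only.
tempname fh
tempfile out
local simyears = 2070 - 2010 + 1    // hypothetical end and start year

file open `fh' using "`out'", write text replace
file write `fh' "parameters" _n
file write `fh' "{" _n
file write `fh' _tab "long" _tab "ImmigrationTotal[SIM_YEAR_RANGE] = {" _n
file write `fh' _tab _tab "(`simyears') 0," _n
file write `fh' _tab "};" _n
file write `fh' "};" _n
file close `fh'

* Display the generated text
type "`out'"
```

Closing the handle by name rather than with file close _all leaves any other open file handles untouched.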

Data Source:

  • starting population file based on EUSILC
  • EUROSTAT migr_imm8

Version: March 2019

Author(s): Tom Horvath



loc Ctry $Geosample
loc Sex $Sex
loc simstart $startyear
loc yearmax $yearmax
loc agemax $maxage
*gl maxagerange 105

*Prepare data for eurostat based parameters

cd $eurostat

clear

eurostatuse migr_imm8, long geo(`Ctry') noerase

keep if agedef == "COMPLET"

drop agedef agedef_label age_label unit_label sex_label geo_label flags_migr_imm8

replace age = "Y0" if age == "Y_LT1"

drop if age ==  "Y_GE100"
drop if age == "TOTAL" | age == "UNK"

replace age = subinstr(age, "Y", "", .)
destring age , replace

rename time year
rename migr_imm8 n_immi

drop if sex == "T"
keep if year >= $startyear & year < 2019

sort geo year sex age

bys geo sex age: egen n_mig_avg = mean(n_immi)
by geo sex age: keep if _n == 1

by geo sex: egen n_tot = total(n_mig_avg)
ge immi_share = n_mig_avg/n_tot

sort geo sex age

qui sum age
local agelimit = r(max)
local nexp = $maxage - `agelimit' +1
display `nexp'
expand `nexp' if age == `agelimit'
sort geo sex age
by geo sex: replace age = age[_n-1] +1 if _n > `agelimit'
replace immi_share = 0 if age > `agelimit'
save $eurostat\immi_shares.dta, replace


*Prepare data for eusilc based parameters (starting population file)

use $startpop/startpop.dta, clear

* mark families with children under 18 living with their mother (the foreign_born restriction is commented out below)
cap drop is_mother

ge is_mother = 0
gsort country idfamily -idmother
by country idfamily: replace is_mother = 1 if idperson == idmother[1]

ge mother_tag = is_mother //& foreign_born

ge child_tag = age < 18 & is_child

bys country idfamily: egen fam_with_mother = max(mother_tag)

by country idfamily: egen child_0_18 = max(child_tag)

ge sample = fam_with_mother == 1 & child_0_18 == 1

* derive age of mother at birth
ge mother_age_tmp = age if is_mother == 1

bys country idfamily: egen mother_age = min(mother_age_tmp)

keep if mother_age <= 41

ge mother_age_at_birth = mother_age - age if child_tag == 1

order country idfamily sample mother_age* age fam_with_mother child_0_*

keep if child_tag == 1 & mother_age_at_birth >= 16 & mother_age_at_birth <= 40

keep if sample == 1

bys country: egen pop = total(dwt)
bys country mother_age_at_birth: egen age_gr = total(dwt)
by country mother_age_at_birth: keep if _n == 1
ge share_age = age_gr / pop

keep country mother_age_at_birth share_age

save $save/age_immi_mother.dta, replace

********************************************************************************
* PROGRAM FOR WRITING .DAT FILES

cap pr drop immig2dat
pr de immig2dat

    syntax , GEO(string)

    clear all

    cap file close _all

    * Create file
    tempname file
    loc startyear = $startyear
    loc yearmax = $yearmax
    loc simyears = $yearmax - $startyear +1

    file open `file' using "$param/`geo'_Immigration_`startyear'.dat", w replace

    * File header
    file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
    file write `file' "// generated by Immigration.do"
    file write `file' _n

    file write `file' "parameters" _n
    file write `file' "{" _n // Open parameters

    file write `file' _tab "//EN Age distribution of mothers at birth" _n
    file write `file' _tab "cumrate" _tab    "AgeOfImmigrantMother[FERTILE_AGE_RANGE] = {" _n
    use $save/age_immi_mother.dta, clear
    keep if country == "`geo'"
    qui sum mother_age
    loc minage = r(min)    //16
    loc maxage = r(max) //40
    loc anz_age = 50 - `minage' + 1 // 49-16+1
    display `anz_age'
    forvalues i = 1(1)`anz_age' {
        if `minage' + `i' - 1 <= `maxage' {
        file write `file' (share_age[`i']) ","
        }
        else {
        file write `file' "0,"
        }
    }
    file write `file' _tab "};" _n

    use "$eurostat\immi_shares.dta", clear

    keep if geo == "`geo'" // Country
    ge male = sex == "M"
    qui sum age
    loc minage = r(min)+1        //need to start at position 1 not 0
    loc age_limit = r(max)+1

    file write `file' _tab "//EN Age-Sex distribution of immigrants" _n
    file write `file' _tab "double" _tab "ImmigrationAgeSexAll[SEX][AGE_RANGE] = {" _n

    forvalues s = 0(1)1{
        forvalues a = `minage'(1)`age_limit'{
        file write `file' _tab (immi_share[`a']) ","
        }
        file write `file' _n
        file write `file' _n
        drop if male == `s'
    }
    file write `file' _tab "};" _n

    file write `file' _tab "//Total number of immigrants" _n
    file write `file' _tab "long" _tab "ImmigrationTotal[SIM_YEAR_RANGE] = {" _n
    file write `file' _tab "(`simyears') 0, "_n
    file write `file' _tab "};" _n
    file write `file' _tab "}; " _n

    file close _all

end

********************************************************************************
* WRITE .DAT FILES
loc Ctry $Geosample
foreach geo of local Ctry{
cap immig2dat , geo("`geo'")
}

MaleChildlessness.do

This file produces a parameter for male childlessness by year of birth and education level, stored in the parameter file MaleChildlessness_2010.dat. The share of men remaining childless is calculated from SHARE data for AT and ES and from the cohort fertility database for FI.

Currently the parameter file contains one parameter:

  • MaleCohortChildlessness[YOB_MALEFERT][BIRTH1_GROUP]
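The weighted share of childless men by education level can be sketched as follows. The respondents and weights are invented, the variable name wgt is an assumption, and the sketch sums the weights of childless men directly instead of keeping only the childless rows as the script does.

```stata
* Toy data: invented respondents with survey weights (not SHARE values)
clear
input educ childless wgt
0 1 100
0 0 300
1 1  50
1 0 450
2 0 200
2 1 200
end

* Weighted share of childless men within each education level
bysort educ: egen pop = total(wgt)
bysort educ: egen n_childless = total(wgt * childless)
gen share_childless = n_childless / pop

* Keep one row per education level
bysort educ: keep if _n == 1
keep educ share_childless
list, noobs
```

Summing wgt * childless keeps an education level in the result even when it contains no childless men, in which case the share is zero rather than the row being dropped.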

Data Source:

  • SHARE survey data (AT, ES)
  • Cohort fertility database, Finnish population register 2015 (FI)

Version: March 2019

Author(s): Tom Horvath



clear all

loc Ctry $Geosample

cd $share_input

use weltransim_share, clear

***********************************************************
* Countries
***********************************************************

tab country

ge geo = substr(mergeid,1,2)

ge keep = 0
foreach geo of loc Ctry {
replace keep = 1 if geo == "`geo'"
}

keep if keep == 1

***********************************************************
* Men only
***********************************************************

keep if gender == 1

***********************************************************
* Create socio-demographic variables
***********************************************************

// Childless
g byte childless = ch001_ == 0

// Education
ren isced1997_r isced97
drop if isced97 < 0 | mi(isced97)
drop if isced97 == 95 // still in school
drop if isced97 == 97 // other
cap drop educ
recode isced97 (0/2=0) (3/4=1) (5/6=2), gen(educ)
la de educ 0 "low" 1 "medium" 2 "high"
la val educ educ
la var educ "education"

// Age group
drop if int_year < 0 | mi(int_year)
drop if yrbirth < 0 | mi(yrbirth)
g int age = int_year - yrbirth
drop if age < 0 | mi(age)
egen age_group = cut( age ), at( 0, 55, 60, 65, 70, 75, 80, 150 ) icodes
la de age_group 0 "0-54" 1 "55-59" 2 "60-64" 3 "65-69" 4 "70-74" 5 "75-79" 6 "80+"
la val age_group age_group


***********************************************************
* Share of childless men by education
***********************************************************

table age_group educ, by(country)
table age_group educ [w=int(cciw_w4)], c(mean childless) by(country)
table educ [w=int(cciw_w4)] if age>=40, c(mean childless) by(country)
tab country educ [w=int(cciw_w4)], sum(childless) noobs nost nof

keep if age >= 40 & age <= 65

bys country educ: egen pop = total(cciw_w4)
bys country educ childless: egen tmp = total(cciw_w4)

keep if childless == 1

ge share_childless = tmp/pop

bys country educ: keep if _n==1

keep geo educ share_childless

rename geo country

save $save\male_childlessness.dta, replace

/* #############################################################################
                                    Finland
############################################################################# */

import delimited using "$cfe\Finland_Population Register 2015.csv", varnames(1) clear

destring cohort, ge(birthyear) force

keep if sex == "M"

qui sum birthyear
loc last_cohort = r(max)

drop if birthyear > 2015 - 40
drop if birthyear < 2015 - 65
drop if birthyear == .

keep if origin == "Total" // do not distinguish natives and foreign-born here

ge educ = 0
replace educ = 1 if edu ==  "ISCED3A-4A"
replace educ = 2 if edu == "ISCED5B-6"

keep birthyear educ women_total parity_0

collapse (sum) women_total parity_0, by(educ)

ge share_childless = parity_0/women_total

ge country = "FI"

save $save\male_childlessness_FI.dta, replace

use $save\male_childlessness.dta, clear

append using $save\male_childlessness_FI.dta

save $save\male_childlessness.dta, replace

********************************************************************************
* PROGRAM FOR WRITING .DAT FILES

cap pr drop maleChild2dat
pr de maleChild2dat

syntax , GEO(string)
cap file close _all

use $save\male_childlessness.dta, clear

keep if country == "`geo'"
sort educ // values are written below in education order (low, medium, high)
loc startyear $startyear
loc do $dofile
qui sum educ
loc minedu = r(min)
loc maxedu = r(max)
loc anz_edu = `maxedu'-`minedu'+1
display `anz_edu'

loc cohort_l $YOB_MALEFERT_L
loc cohort_u $YOB_MALEFERT_U
loc anz = `cohort_u' - `cohort_l' +1
display `anz'

    * Create file
    tempname file
    *file open `file' using "$param\BaseEduc_AT_2016.dat", w replace
    file open `file' using "$param/`geo'_MaleChildlessness_`startyear'.dat", w replace

    * Header
    file write `file' "// generated on `c(current_date)' at `c(current_time)'" _n
    file write `file' "// generated by `do'\MaleChildlessness.do"
    file write `file' _n

    file write `file' "parameters" _n
    file write `file' "{" _n // Open parameters
    file write `file' "//EN Male cohort childlessness" _n
    file write `file' _tab "double" _tab "MaleCohortChildlessness[YOB_MALEFERT][BIRTH1_GROUP] = {" _n
    file write `file' _tab _tab "(`anz') {"
    forvalues i = 1(1)`anz_edu' {
        file write `file' (share_childless[`i']) ", "
    }
    file write `file' _tab "}, " _n
    file write `file' _tab "}; " _n
    file write `file' _n
    file write `file' "};" _n

file close _all
end

********************************************************************************
* WRITE .DAT FILES
loc Ctry $Geosample
foreach geo of local Ctry{
maleChild2dat , geo("`geo'")
}
/*
parameters {

     //EN Male cohort childlessness
    double    MaleCohortChildlessness[YOB_MALEFERT][BIRTH1_GROUP] = {
        (131) {
            0.18, 0.14, 0.1,
        },
    };
};
*/
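
The share calculation in the script above reduces to a weighted mean of the childlessness indicator within each country and education group. The following Python sketch (not part of the STATA scripts) reproduces that arithmetic on made-up records; the function name and the example weights are assumptions for illustration only.

```python
def weighted_childless_share(records):
    """Weighted share of childless persons.

    records: list of (weight, childless) tuples for one country/education cell,
    where childless is True or False.
    """
    total_weight = sum(weight for weight, _ in records)
    childless_weight = sum(weight for weight, childless in records if childless)
    return childless_weight / total_weight


# Hypothetical cell: three respondents with survey weights 100, 300 and 100
example_cell = [(100.0, True), (300.0, False), (100.0, False)]
share = weighted_childless_share(example_cell)
print(share)
```

Unlike the STATA script, which keeps only cells that contain at least one childless respondent, this sketch returns 0 for a cell without childless respondents.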

NetMigration.do

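Data from EUROSTAT population projections are used to produce the parameters on net migration by sex, age and simulation year, which are stored in the parameter file NetMigration_2010.dat. For the years before 2019, net migration is calculated from observed immigration and emigration; from 2019 onwards, the EUROSTAT projections are used.

The parameter file contains two parameters: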

  • MIGRATION_SETTINGS: model choice parameter
  • NetMigrationSexPeriodAge[SEX][SIM_YEAR_RANGE][AGE_RANGE]: total number of net migrants by sex, age and simulation year (2010 to 2150)
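
For the UK, which is not covered by the EUROSTAT projection, the script distributes an annual net migration total over sex and age using the cell shares observed in a reference year. The following Python sketch (not part of the STATA scripts) illustrates that step; the function names and all numbers are made up for illustration.

```python
def cell_shares(net_mig_by_cell):
    """Share of each (sex, age) cell in the reference-year total."""
    total = sum(net_mig_by_cell.values())
    return {cell: value / total for cell, value in net_mig_by_cell.items()}


def distribute_total(shares, annual_total):
    """Distribute an annual total over cells and round to whole persons."""
    return {cell: round(share * annual_total) for cell, share in shares.items()}


# Hypothetical reference year with two ages per sex
reference_year = {("F", 20): 300, ("F", 21): 200, ("M", 20): 400, ("M", 21): 100}
shares = cell_shares(reference_year)
projected = distribute_total(shares, 2000)
print(projected)
```

Because each cell is rounded separately, the distributed values need not add up exactly to the annual total in general.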

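Data Source:

  • EUROSTAT tables proj_19nanmig (net migration from the population projections), migr_imm8 (immigration) and migr_emi2 (emigration)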

Version: Oct. 2020

Author(s): Tom Horvath



loc Ctry $Geosample
loc Sex $Sex
loc simstart $startyear
loc yearmax $yearmax
loc agemax $maxage
*gl maxagerange 105

cd $eurostat

********************************************************************************
* Load net migration DATA
********************************************************************************
clear all

eurostatuse proj_19nanmig, long  noerase

keep if geo == "AT" | geo == "ES" | geo == "FI" // UK is not available in the projection data

drop projection_label unit unit_label sex_label age_label geo_label flags_proj_19nanmig

keep if projection == "BSL"

replace age = "Y0" if age == "Y_LT1"

drop if age ==  "Y_GE100"
drop if age == "TOTAL"

replace age = subinstr(age, "Y", "", .)
destring age , replace

rename time year

drop if sex == "T"

rename proj_19nanmig net_mig

save $eurostat\net_mig2019.dta, replace

clear

eurostatuse migr_imm8, long geo(`Ctry') noerase

keep if agedef == "COMPLET"

drop agedef agedef_label age_label unit_label sex_label geo_label flags_migr_imm8

keep if geo == "AT" | geo == "ES" | geo == "FI" |geo == "UK" // UK is kept here (observed data)

replace age = "Y0" if age == "Y_LT1"

drop if age ==  "Y_GE100"
drop if age == "TOTAL" | age == "UNK"

replace age = subinstr(age, "Y", "", .)
destring age , replace

rename time year
rename migr_imm8 n_immi

drop if sex == "T"
keep if year >= $startyear & year < 2019

sort geo year sex age

save $eurostat\immi_upto_2019.dta, replace

clear
eurostatuse migr_emi2, long geo(`Ctry') noerase
keep if agedef == "COMPLET"

drop agedef agedef_label age_label unit_label sex_label geo_label flags

keep if geo == "AT" | geo == "ES" | geo == "FI" |geo == "UK" // UK is kept here (observed data)

replace age = "Y0" if age == "Y_LT1"

drop if age ==  "Y_GE100"
drop if age == "TOTAL" | age == "UNK"

replace age = subinstr(age, "Y", "", .)
destring age , replace

rename time year

rename migr_emi2 n_emi

drop if sex == "T"
keep if year >= $startyear & year < 2019

sort geo year sex age

save $eurostat\emi_upto_2019.dta, replace
mmerge geo year sex age using $eurostat\immi_upto_2019.dta, type(1:1)
ge net_mig = n_immi-n_emi

save $eurostat\net_mig_upto_2019.dta, replace

use $eurostat\net_mig_upto_2019.dta, clear

append using $eurostat\net_mig2019.dta

replace net_mig = 0 if missing(net_mig)

* expand data to last simyear
qui sum year
loc ylow = r(min)
loc ymax = r(max)
loc nexpand = $yearmax -`ymax' +1
display `nexpand'

loc yearchange = `ymax' - `ylow' + 1
display `yearchange'

expand `nexpand' if year == `ymax'
sort geo sex age year
by geo sex age: replace year = year[_n-1]+1 if _n> `yearchange'

replace net_mig = 0 if year > `ymax' | missing(net_mig)

sort geo sex year age
save $eurostat\net_mig.dta, replace

use $eurostat\net_mig.dta, clear
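* Share of each sex-age cell in the total net migration of a year
bys geo year: egen tot_mig = total(net_mig)

ge sh_mig = net_mig/tot_mig
* Keep only the 2018 shares and copy them to all years of the same sex-age cell
replace sh_mig =. if year != 2018

bys geo sex age: egen sh_mig_hyp = min(sh_mig)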

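* UK: not covered by the projection data. Extend the series to the last
* simulation year and distribute hard-coded annual totals using the 2018 shares
loc nexp = $yearmax - 2018 + 1
expand `nexp' if geo == "UK" & year == 2018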
sort geo sex age year

by geo sex age: replace year = year[_n-1] + 1 if year == 2018

replace net_mig = round(sh_mig_hyp * 265000) if year == 2019 & geo == "UK"
replace net_mig = round(sh_mig_hyp * 255000) if year == 2020 & geo == "UK"
replace net_mig = round(sh_mig_hyp * 236000) if year == 2021 & geo == "UK"
replace net_mig = round(sh_mig_hyp * 224000) if year == 2022 & geo == "UK"
replace net_mig = round(sh_mig_hyp * 213000) if year == 2023 & geo == "UK"
replace net_mig = round(sh_mig_hyp * 201000) if year == 2024 & geo == "UK"
replace net_mig = round(sh_mig_hyp * 190000) if year >= 2025 & geo == "UK"

sort geo sex year age
save $eurostat\net_mig.dta, replace

********************************************************************************
* PROGRAM FOR WRITING .DAT FILES

cap pr drop netmig2dat
pr de netmig2dat

    syntax , GEO(string)

    clear all

    cap file close _all
    use "$eurostat\net_mig.dta", clear

    keep if geo == "`geo'" // Country
    ge male = sex == "M"

    * Create file
    tempname file
    loc startyear = $startyear
    loc yearmax = $yearmax
    loc do $dofile

    qui sum age
    loc minage = r(min)
    loc age_limit = r(max)


    loc agemax $maxage

    file open `file' using "$param/`geo'_Netmigration_`startyear'.dat", w replace

    * File header
    file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
    file write `file' "// generated by `do'\NetMigration.do"
    file write `file' _n

    file write `file' "parameters" _n
    file write `file' "{" _n // Open parameters
    file write `file' _tab "//EN Migration Settings" _n
    file write `file' _tab "MIGRATION_SETTINGS" _tab "MigrationSettings = MSE_NET;" _n

    file write `file' _tab "//EN Net migration by age and sex" _n
    file write `file' _tab "double" _tab "NetMigrationSexPeriodAge[SEX][SIM_YEAR_RANGE][AGE_RANGE] = {" _n // open net migration table


    forvalues s = 0(1)1{
        forvalues t = `startyear'(1)`yearmax'{
            forvalues i = 0(1)`agemax' {

                if `i' <= `age_limit' {
                    loc j = `i' + 1
                    file write `file' (net_mig[`j']) ", "
                }
                else if `i' > `age_limit' {
                    file write `file' "0 , "
                }
            }
        file write `file' _n
        drop if year == `t' & male == `s'
        }
    file write `file' _n
    file write `file' _n
    }

    file write `file' _tab "}; " _n

    file write `file' _tab "};" _n


    file close _all

end

********************************************************************************
* WRITE .DAT FILES
loc Ctry $Geosample
foreach geo of local Ctry{
cap netmig2dat , geo("`geo'")
}

PartnerMatching.do

Based on the starting population file, two parameters stored in PartnerMatching_2010.dat are produced: first, the distribution of partners’ education levels by own education, and second, the age distribution of females’ partners depending on their own age.

The first is taken from a cross tabulation of the partners’ education levels observed in the starting population file for couples in which the woman is aged 25 to 35. The second is derived by smoothing the observed age distributions of females’ partners, assuming a normal distribution of partners’ age with mean and standard deviation as observed in the data. To achieve smoother distributions, these calculations are based on 5-year age groups before the parameter is disaggregated to the single-year age categories used in the model parameter.

The parameter file contains two parameters:

  • PartnerEducation[EDUC_LEVEL3][EDUC_LEVEL3] giving the education distribution of partners by females’ highest level of education
  • PartnerAgeDistribution[PART_AGE_RANGE][MALE_PART_AGE_RANGE] giving the age distribution of females’ partners by own age
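
The smoothing step can be illustrated with a short Python sketch (not part of the STATA scripts). It assumes, as the script does, that the probability mass for each partner age is the difference between the normal CDF at that age and at the previous age, and that the mean of a 5-year group is shifted by -2 to +2 years for its five single ages. The function names and the example mean and standard deviation are hypothetical, and the rounding to three decimals applied in the script is omitted.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def single_year_means(group_mean):
    """Shift the mean of a 5-year age group by -2..+2 for its five single ages."""
    return [group_mean + shift for shift in range(-2, 3)]


def partner_age_distribution(mean, sd, age_min=15, age_max=105):
    """Probability mass per partner age as differences of the normal CDF."""
    probs = {}
    previous_cdf = 0.0
    for age in range(age_min, age_max + 1):
        current_cdf = normal_cdf(age, mean, sd)
        probs[age] = current_cdf - previous_cdf
        previous_cdf = current_cdf
    return probs


# Hypothetical 5-year group: mean partner age 30, standard deviation 5
distributions = [partner_age_distribution(m, 5.0) for m in single_year_means(30.0)]
print(round(sum(distributions[2].values()), 6))
```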

Data Source:

  • starting population file based on EU-SILC data

Version: Oct. 2020

Author(s): Tom Horvath



use $startpop/startpop.dta, clear

loc Ctry $Geosample

rename deh isced

ge intwgt=round(dwt)
drop if intwgt == .

drop if age <= 15

keep if idpartner != 0 & idpartner != . // keep couples only

tab country

* recode education and sex variable
recode isced                                                          ///
    ( 1/2 = 1)                                                  ///
    ( 3/4 = 2)                                                  ///
    ( 5/6 = 3)                                                 ///
    , gen(edu)

    cap lab def edu_VL 1 "Low" 2 "Med" 3 "High"
    la val edu edu_VL

ge fem = sex == 0

ge edu_spouse = 0
ge age_spouse = 0

order country idhh idperson  fem edu edu_spouse idpartner

gsort country idhh -fem

bys country idhh: replace edu_spouse = edu[_n+1] if _n < _N & idpartner == idperson[_n+1]
bys country idhh: replace edu_spouse = edu[_n+2] if _n + 1 <_N & idpartner == idperson[_n+2]
bys country idhh: replace edu_spouse = edu[_n+3] if _n + 2 <_N & idpartner == idperson[_n+3]
bys country idhh: replace edu_spouse = edu[_n+4] if _n + 3 <_N & idpartner == idperson[_n+4]
    la val edu_spouse edu_VL

bys country idhh: replace age_spouse = age[_n+1] if _n < _N & idpartner == idperson[_n+1]
bys country idhh: replace age_spouse = age[_n+2] if _n + 1 < _N & idpartner == idperson[_n+2]
bys country idhh: replace age_spouse = age[_n+3] if _n + 2 < _N & idpartner == idperson[_n+3]
bys country idhh: replace age_spouse = age[_n+4] if _n + 3 < _N & idpartner == idperson[_n+4]
    la val age_spouse age_VL

keep if fem == 1 & age_spouse != 0

save "$save\partner2dat.dta", replace

* -----------------------------------------------------------------
* Generate dataset for the educational distribution of females' spouses
* -----------------------------------------------------------------
use "$save\partner2dat.dta", clear

keep if edu != 0 & edu_spouse != 0 & age >= 25 & age <= 35

drop if edu ==. | edu_spouse == .

rename country geo

sort geo edu edu_spouse

by geo edu: egen totnum = total(intwgt)

by geo edu edu_spouse: egen totedu = total(intwgt)

ge edu_share = totedu / totnum

by geo edu edu_spouse: keep if _n == 1

keep geo edu edu_spouse edu_share

save "$save\partnerEdu2dat.dta", replace

* -----------------------------------------------------------------
* Generate dataset for the age distribution of females' spouses
* --> normal distribution of spouse's age with mean and standard deviation of
* spouse's age as observed in the data
* -----------------------------------------------------------------

foreach geo of glo Geosample {
use "$save\partner2dat.dta", clear

recode age (15/19 = 17) (20/24 = 22) (25/29 = 27) (30/34 = 32) (35/39 = 37) ///
            (40/44 = 42) (45/49 = 47) (50/54 = 52) (55/59 = 57) (60/64 = 62) (65/69 = 67) ///
            (70/74 = 72) (75/79 = 77) (80/84 = 82) (85/89 = 87), ge(age5)

keep if country == "`geo'"

bys country age5: egen m_age = mean(age_spouse)

bys country age5: egen sd_age = sd(age_spouse)

bys country age5: keep if _n == 1

keep country *age*

by country: replace m_age = m_age[_n+1] - 3 if age5 == 17 & m_age > m_age[_n+1] // some unlikely values here

by country: replace sd_age = 5 if age5 == 17 & sd_age > 9 // some unlikely values here

by country: replace sd_age = 5 if age5 == 22 & sd_age > 9 // some unlikely values here

expand 5

sort age5

ge agey = 14 + _n // age of females in 1-year categories

by age5: ge dif_parameter = _n - 3

replace m_age = m_age + dif_parameter

drop dif_parameter

cap drop age_*
forv a = 15/105{
 ge age_`a' = 0
 replace age_`a' = round(normal((`a'-m_age)/sd_age),0.001)
}

ge age_cor_15 = age_15

forv a = 16/105{
loc amin = `a'-1
 ge age_cor_`a' = 0
 replace age_cor_`a' = age_`a' - age_`amin'
}

replace age_cor_15 = age_cor_16 if age_cor_15 > age_cor_16

forv a = 15 / 104{
    loc aplus = `a'+1
    replace age_cor_`a' = 0 if age_cor_`aplus' == 0
}

egen check = rowtotal(age_cor_*)

sum check

keep agey age_cor*

keep if agey <=80

save "$save\partner2dat_`geo'.dta", replace
}

********************************************************************************
* PROGRAM FOR WRITING .DAT FILES

cap pr drop partner2dat
pr de partner2dat

    syntax , GEO(string)

    clear all

    cap file close _all
    loc startyear $startyear
    loc do $dofile

    * Create file
    tempname file
    file open `file' using "$param/`geo'_partnerMatching_`startyear'.dat", w replace

    * File header
    file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
    file write `file' "// generated by `do'\PartnerMatching.do"
    file write `file' _n

    file write `file' "parameters" _n
    file write `file' "{" _n // Open parameters

    ** Parameter table for partner education
    file write `file' _tab "//EN Partner Education" _n
    file write `file' _tab "cumrate" _tab "PartnerEducation[EDUC_LEVEL3][EDUC_LEVEL3] = {" _n // open partner education table

    * Get values for partner education shares
    use "$save\partnerEdu2dat.dta", clear

    keep if geo == "`geo'" // Country

    sort geo edu edu_spouse

    qui sum edu
    loc eduMax = r(max)
    display `eduMax'
    loc eduMax_low = `eduMax' - 1

    qui sum edu_spouse
    loc edu_SMax = r(max)
    display `edu_SMax'
    loc edu_SMax_low = `edu_SMax' - 1

    forv s = 1 / `eduMax' {

        file write `file' _tab _tab (edu_share[1]) ", "

        forv a = 2 / `edu_SMax_low' {
            file write `file' (edu_share[`a']) ", "
        }

        file write `file' (edu_share[`edu_SMax']) ", " _n

        drop if edu == `s'
    }


    file write `file' _tab "};" _n // close educ table

    file write `file' _n

    ** Parameter table for partner age distribution
    file write `file' _tab "//EN Distribution of partner ages by age of female partner" _n
    file write `file' _tab "double" _tab "PartnerAgeDistribution[PART_AGE_RANGE][MALE_PART_AGE_RANGE] = {" _n // open partner age distribution table

    * Get values for Age distribution
    use "$save\partner2dat_`geo'.dta", clear

    sort agey

    loc agemin = 1
    loc agemax = _N
    loc agemax_min1 = `agemax' - 1

    loc partage_min = 15
    loc partage_max_min1 = 104
    loc partage_max = 105

    forv a = `agemin' / `agemax' {    // one row per female age (agey = 15 to 80)

        forv pa = `partage_min' / `partage_max_min1' {

            file write `file' (age_cor_`pa'[`a']) ","

        }
        file write `file' (age_cor_`partage_max'[`a']) "," _n
    }

    file write `file' _tab "};" _n // close age distribution table

    file write `file' "};" _n  // Close parameters

    file close _all

end


********************************************************************************
* WRITE .DAT FILES
*

foreach geo of local Ctry{
partner2dat, geo("`geo'")
}

cap log c
exit

PersonCore.do

This file creates the parameter file PersonCore_2010.dat, which contains key information on the starting population file, such as its name and sample size.

The parameter file contains the following parameters:

  • MicroDataInputFile contains the name of the starting population micro data file
  • MicroDataInputFileSize gives the number of observations contained in the starting population file
  • StartPopSize is the real population size in the start year of the simulation
  • StartPopSampleSize is the default value for the number of persons simulated
  • WriteMicrodata is a boolean parameter indicating whether a micro data output file should be produced
  • TimeMicroOutput[OUTPUT_TIMES] gives the years for which the micro data file shall be produced
  • MicroRecordFileName gives the name of the micro data output file

Data Source:

  • starting population file based on EU-SILC data

Version: Oct. 2020

Author(s): Tom Horvath



use $startpop/startpop.dta, clear
drop if idhh ==.

loc Ctry $Geosample

cap pr drop generate_dat
pr de generate_dat


    syntax , GEO(string) POPSAMPLEsize(integer)

    loc geo = upper("`geo'")
    qui count if country == "`geo'"
    loc size = r(N)

    cap drop totpop
    bys country: egen totpop = total(weight)
    qui sum totpop if country == "`geo'"
    loc realpopsize = r(max)

    loc startyear = $startyear

    loc do $dofile

    cap file close _all

    tempname file
    //file open `file' using "$para/StartPop`geo'`year'.dat", w replace
    file open `file' using "$param/`geo'_PersonCore_`startyear'.dat", w replace

    file write `file' "// generated on `c(current_date)' at `c(current_time)'" _n
    file write `file' "// generated by `do'\PersonCore.do"

    file write `file' _n

    file write `file' "parameters" _n
    file write `file' "{" _n // Open parameters

    * Parameters for starting population
    *file write `file' _tab "file" _tab "MicroDataInputFile = " _char(34) "pop_2010_`geo'.csv" _char(34) ";" _n
    file write `file' _tab "file" _tab "MicroDataInputFile = " _char(34) "final_pop_2010_`geo'.csv" _char(34) ";" _n
    file write `file' _tab "long" _tab "MicroDataInputFileSize = `size';" _n
    file write `file' _tab "double" _tab "StartPopSize = `realpopsize';" _n
    file write `file' _tab "double" _tab "StartPopSampleSize = `popsamplesize';" _n
    file write `file' _n
    file write `file' _tab "//EN Write micro-data output file Y/N" _n
    file write `file' _tab "logical" _tab "WriteMicrodata = FALSE;" _n
    file write `file' _tab "//EN Time(s) of micro-data output" _n
    file write `file' _tab "double" _tab "TimeMicroOutput[OUTPUT_TIMES] = {" _n
    file write `file' _tab "2010, 2050, 10," _n
    file write `file' _tab "};" _n
    file write `file' _tab "//EN File name micro-data output file" _n
    file write `file' _tab "file" _tab "MicroRecordFileName =" _char(34) "MicroDataOutput.csv" _char(34) ";"_n
    file write `file' "};" _n  // Close parameters

    file close _all

end

foreach geo of local Ctry{
generate_dat, geo("`geo'") popsample(75000)
}

RefinedEducFate.do

We use information on the highest educational level attained and on parents’ highest level of education to estimate the probability of attaining low, medium or high education, given that the maximum of the parents’ highest education is low, medium or high. The resulting parameters are stored in the parameter file RefinedEducFate. They are taken from a cross tabulation of own education and parents’ highest education for persons aged 25 to 35, based on EU-LFS data from the 2009 ad-hoc module.

The parameter file contains the following parameters:

  • EducModel is the model choice parameter indicating whether the refined education model shall be used
  • EducFirstCohortRefinedModel indicates from which birth cohort onwards the refined education model shall be used
  • EducProg1Odds[EDUC_GROUP][SEX] gives the odds ratio of progressing from low to medium or high education by parents’ highest level of education and sex
  • EducProg2Odds[EDUC_GROUP][SEX] gives the odds ratio of progressing from medium to high education by parents’ highest level of education and sex
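
The odds ratios can be illustrated with a short Python sketch (not part of the STATA scripts). It follows the logic of the script: cumulative shares of reaching at least medium and at least high education are converted to odds and then divided by the odds of the lowest parental education group. The function names and the education shares are made up for illustration.

```python
def progression_odds(shares):
    """Odds of reaching at least medium and at least high education.

    shares: (low, medium, high) shares of own education, summing to 1.
    Shares of exactly 1 would lead to a division by zero and are not handled.
    """
    low, medium, high = shares
    p_medium_or_high = medium + high
    odds_prog1 = p_medium_or_high / (1.0 - p_medium_or_high)
    odds_prog2 = high / (1.0 - high)
    return odds_prog1, odds_prog2


def odds_ratios(shares_by_parent_educ, reference="low"):
    """Odds relative to the reference group of parental education."""
    ref1, ref2 = progression_odds(shares_by_parent_educ[reference])
    result = {}
    for parent_educ, shares in shares_by_parent_educ.items():
        odds1, odds2 = progression_odds(shares)
        result[parent_educ] = (odds1 / ref1, odds2 / ref2)
    return result


# Hypothetical education shares (low, medium, high) by parents' education
example = {
    "low": (0.5, 0.3, 0.2),
    "medium": (0.25, 0.5, 0.25),
    "high": (0.1, 0.3, 0.6),
}
ratios = odds_ratios(example)
print(ratios)
```

By construction, the reference group has an odds ratio of 1 for both progression steps.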

Data Source:

  • EU-LFS 2009 adhoc module

Version: Oct. 2020

Author(s): Tom Horvath and Marian Fink




* ------------------------------------------------------------------------------
* 1. IMPORT CSV DATA TO STATA FORMAT
* ------------------------------------------------------------------------------

* Ad hoc module
* --------------------------------------
loc Ctry $Geosample

foreach country of glo Geosample {

    import delimited using "$adhoc/`country'2009_y.csv", ///
       clear delimiters(",") varnames(1) asdouble stripquotes(yes)

    keep age hat97lev sex ahm2009_stopdate ahm2009_parhat country hhnum hhseqnum qhhnum year coeff

    g byte quarter = real( substr( qhhnum, 2, 1 ) )

    g pid = country + strofreal(year) + qhhnum + strofreal(hhseqnum)

    drop qhhnum

    ren ahm2009_parhat parhat
    ren ahm2009_stopdate stopdate

    recode parhat (.=-1) (9=-2)

    replace stopdate = -1 if stopdate == 0
    replace stopdate = -2 if stopdate == 999999

    recode stopdate parhat (-1=.a) (-2=.b) (-3=.c) (-4=.d) (-5=.e) (-9=.i)

    save "$save/ahm_`country'2009_y.dta", replace
}


* ------------------------------------------------------------------------------
* 2. GET FILES TOGETHER
* ------------------------------------------------------------------------------

* Append ad hoc module files
* --------------------------------------

clear
set obs 1

foreach country of glo Geosample {
    ap using "$save/ahm_`country'2009_y.dta", nol
    rm "$save/ahm_`country'2009_y.dta"
}

drop if _n == 1

save "$save/ahm_2009_y.dta", replace

* recode education variable
* --------------------------------------
recode hat97lev (0 11 21 22 = 1 "L") (30 31 32 41 42 = 2 "M") ///
    (51 52 60 = 3 "H"), gen(edu_lev)

ge intwgt=int(1000*coeff)

* ------------------------------------------------------------------------------
* 3. PROBABILITIES
* ------------------------------------------------------------------------------

replace sex = 0 if sex == 2

drop if missing(parhat) | missing(edu_lev) //touse == 0

drop if age < 25 | age >35

egen educ_groups = group(parhat edu_lev sex country)

decode edu_lev, ge(edu_str)

collapse (sum) intwgt (firstnm) parhat edu_lev edu_str sex country, by(educ_groups)

drop educ_groups

rename country geo

rename parhat edu_par

sort geo sex edu_par edu_lev

by geo sex edu_par: egen sum_edu = total(intwgt)

ge edu_share = intwgt/sum_edu

drop sum_edu

gsort geo sex edu_par -edu_lev

by geo sex edu_par: ge edu_cum = sum(edu_share)

ge odds = edu_cum/(1-edu_cum)

sort geo sex edu_lev edu_par

by geo sex edu_lev: ge odds_ratio = odds/odds[1]

sort geo edu_lev edu_par  sex

drop if edu_lev == 1

save "$save/educ2dat.dta", replace

use "$save/educ2dat.dta", clear
********************************************************************************
* PROGRAM FOR WRITING .DAT FILES

cap pr drop transprob2dat
pr de transprob2dat

    syntax , GEO(string)

    clear all

    cap file close _all
    loc startyear $startyear
    loc do $dofile

    * Create file
    tempname file
    file open `file' using "$param/`geo'_RefinedEducFate_`startyear'.dat", w replace

    * Get values for odds ratios
    use "$save/educ2dat.dta", clear
    keep if geo == "`geo'" // Country

    * File header
    file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
    file write `file' "// generated by `do'\RefinedEducFate.do"
    file write `file' _n

    file write `file' "parameters" _n
    file write `file' "{" _n // Open parameters
    file write `file' _tab "EDUC_MODEL  EducModel = EM_BASE; //EN Model Selection" _n
    file write `file' _tab "int" _tab "EducFirstCohortRefinedModel = `startyear'; //EN First birth cohort to apply refined model" _n

    qui sum edu_par
    loc edparmax = r(max)
    loc edparmax1 = `edparmax' - 1

    qui sum edu_lev
    loc ownedmax = r(max)
    loc ownedmax1 = `ownedmax' - 1

    ge sexstr = "F"
        replace sexstr = "M" if sex == 1

    ge edustr = "Medium" if edu_lev == 2
        replace edustr = "High" if edu_lev == 3


    loc i = 1
    forv own = 2/`ownedmax'{

        loc edstr = edustr

        file write `file' _tab _tab "//EN Odds of achieving `edstr'" _n
        file write `file' _tab _tab "double" _tab "EducProg`i'Odds[EDUC_GROUP][SEX] = {" _n

            file write `file'  _tab _tab  (odds_ratio[1]) ", "(odds_ratio[2]) "," _n
            file write `file'  _tab _tab (odds_ratio[3]) ", "(odds_ratio[4]) "," _n
            file write `file'  _tab _tab (odds_ratio[5]) ", "(odds_ratio[6]) "," "};" _n
            drop if edu_lev == `own'
            loc i = `i' +1
            display(`i')
        }


    file write `file' "};" _n  // Close parameters

    file close _all

end

********************************************************************************
* WRITE .DAT FILES

foreach geo of local Ctry{
transprob2dat, geo("`geo'")
}

cap log c
exit

RefinedFertility.do

This file creates first birth rates by age and highest level of education and a parameter on females’ childlessness by highest level of education, both stored in RefinedFertility.dat. The first parameter is estimated from the starting population file: a probit model estimates the probability that a woman has a first birth in a given year depending on her age and education. Data on female childlessness are taken from the cohort fertility database for AT, ES and FI and from a publication by Berrington et al. (2015) for the UK.

The parameter file contains the following parameters:

  • selected_fertility_model is a model choice parameter indicating whether the refined fertility model shall be used
  • CalibrateChildlessness indicates whether childlessness shall be calibrated in order to meet predefined target values
  • ChildlessnessYob[YOB_START50][BIRTH1_GROUP] gives the target values for female childlessness
  • ChildlessnessYobTargets[YOB_BIRTH1][BIRTH1_GROUP]
  • FirstBirthCohortRates[BIRTH1_GROUP][FERTILE_AGE_RANGE][YOB_BIRTH1] contains the first birth rates by education level, age and birth cohort
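
The prediction step of the probit model can be illustrated with a short Python sketch (not part of the STATA scripts). It assumes the specification used in the script, with age, age squared and education as covariates. The coefficients below are purely hypothetical and are not estimates from the data.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def first_birth_probability(age, educ, coef):
    """Predicted first birth probability from a probit index."""
    index = (coef["const"] + coef["age"] * age
             + coef["age2"] * age * age + coef["educ"] * educ)
    return normal_cdf(index)


# Hypothetical coefficients, chosen only to give a hump-shaped age profile
hypothetical_coef = {"const": -8.0, "age": 0.45, "age2": -0.008, "educ": -0.1}

# Predicted probabilities for medium education (educ = 1), ages 15 to 50
rates = {age: first_birth_probability(age, 1, hypothetical_coef)
         for age in range(15, 51)}
peak_age = max(rates, key=rates.get)
print(peak_age)
```

The quadratic age term produces the hump-shaped age profile that first birth rates typically show.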

Data Source on first birth rates:

  • Starting population file based on EU-SILC

Data Source on childlessness:

  • Cohort fertility database (AT, ES, FI)

  • Berrington et al. 2015 (UK)

Version: Oct. 2020

Author(s): Tom Horvath and Marian Fink



/* #############################################################################
                                    AUSTRIA
############################################################################# */

import delimited using "$cfe\Austria_Census 2001.csv", varnames(1) clear

loc yob_low $yob_low
loc yob_high $yob_high

destring cohort, ge(birthyear) force

keep if sex == "F"

qui sum birthyear
loc last_cohort = r(max)

keep if birthyear >= `yob_low' | birthyear == `last_cohort'
drop if birthyear > `yob_high'
keep if origin == "Total"

ge edu3 = 0
replace edu3 = 1 if edu ==  "ISCED3A-4A" | edu == "ISCED3B"
replace edu3 = 2 if edu == "ISCED5B-6"

keep birthyear edu3 women_total parity_0

collapse (sum) women_total parity_0, by(birthyear edu3)

ge share_childless = parity_0/women_total

qui sum birthyear
loc min_year = r(min)
loc max_year = r(max)
display `min_year'

loc n_exp_low = `min_year' - `yob_low' +1
display `n_exp_low'
expand `n_exp_low' if birthyear == `min_year'

sort edu birthyear

by edu: replace birthyear = `yob_low' if _n==1

by edu: replace birthyear = birthyear[_n-1] + 1 if _n > 1 & birthyear <= `min_year'

sort birthyear edu

loc n_exp_high = `yob_high' - `max_year' +1
display `n_exp_high'
expand `n_exp_high' if birthyear == `max_year'

sort edu birthyear

by edu: replace birthyear = birthyear[_n-1] + 1 if birthyear >= `max_year'

sort birthyear edu

save $save\childless_females_AT.dta, replace

/* #############################################################################
                                    Spain
############################################################################# */

import delimited using "$cfe\Spain_Census 2011.csv", varnames(1) clear

loc yob_low $yob_low
loc yob_high $yob_high

destring cohort, ge(birthyear) force

keep if sex == "F"

qui sum birthyear
loc last_cohort = r(max)

keep if birthyear >= `yob_low' | birthyear == `last_cohort'
drop if birthyear > `yob_high'
drop if birthyear == .

keep if origin == "Total"

ge edu3 = 0
replace edu3 = 1 if edu ==  "ISCED3B-4A" | edu == "ISCED3C"
replace edu3 = 2 if edu == "ISCED5B-6"

keep birthyear edu3 women_total parity_0

collapse (sum) women_total parity_0, by(birthyear edu3)

ge share_childless = parity_0/women_total


qui sum birthyear
loc min_year = r(min)
loc max_year = r(max)
display `min_year'

loc n_exp_low = `min_year' - `yob_low' +1
display `n_exp_low'
expand `n_exp_low' if birthyear == `min_year'

sort edu birthyear

by edu: replace birthyear = `yob_low' if _n==1

by edu: replace birthyear = birthyear[_n-1] + 1 if _n > 1 & birthyear <= `min_year'

sort birthyear edu

loc n_exp_high = `yob_high' - `max_year' +1
display `n_exp_high'
expand `n_exp_high' if birthyear == `max_year'

sort edu birthyear

by edu: replace birthyear = birthyear[_n-1] + 1 if birthyear >= `max_year'

sort birthyear edu

save $save\childless_females_ES.dta, replace

/* #############################################################################
                                    Finland
############################################################################# */

import delimited using "$cfe\Finland_Population Register 2015.csv", varnames(1) clear

loc yob_low $yob_low
loc yob_high $yob_high

destring cohort, ge(birthyear) force

keep if sex == "F"

qui sum birthyear
loc last_cohort = r(max)

keep if birthyear >= `yob_low' | birthyear == `last_cohort'
drop if birthyear > `yob_high'
drop if birthyear == .

keep if origin == "Total"

ge edu3 = 0
replace edu3 = 1 if edu ==  "ISCED3A-4A"
replace edu3 = 2 if edu == "ISCED5B-6"

keep birthyear edu3 women_total parity_0

collapse (sum) women_total parity_0, by(birthyear edu3)

ge share_childless = parity_0/women_total
qui sum birthyear
loc min_year = r(min)
loc max_year = r(max)
display `min_year'

loc n_exp_low = `min_year' - `yob_low' +1
display `n_exp_low'
expand `n_exp_low' if birthyear == `min_year'

sort edu birthyear

by edu: replace birthyear = `yob_low' if _n==1

by edu: replace birthyear = birthyear[_n-1] + 1 if _n > 1 & birthyear <= `min_year'

sort birthyear edu

loc n_exp_high = `yob_high' - `max_year' +1
display `n_exp_high'
expand `n_exp_high' if birthyear == `max_year'

sort edu birthyear

by edu: replace birthyear = birthyear[_n-1] + 1 if birthyear >= `max_year'

sort birthyear edu
save $save\childless_females_FI.dta, replace

/* #############################################################################
                                    UK
NOTE no data for UK in cfe-database
use \\int.wsr.at\Nabu\ext_Projekte\Weltransim_PN-7316\parameterfiles\cfe-database\Berrington_ea__2015_ChildlessnessUK.pdf
instead

           Low      Medium   High
1940-49    0.084    0.120    0.185
1950-59    0.100    0.149    0.206
1960-69    0.102    0.139    0.220

############################################################################# */

use $save\childless_females_FI.dta, clear
drop women_total parity_0
replace share_childless = 0.084    if edu == 0 & birthyear <1950
replace share_childless = 0.120    if edu == 1 & birthyear <1950
replace share_childless = 0.185 if edu == 2 & birthyear <1950

replace share_childless = 0.1    if edu == 0 & birthyear <1960 & birthyear >=1950
replace share_childless = 0.149    if edu == 1 & birthyear <1960 & birthyear >=1950
replace share_childless = 0.206    if edu == 2 & birthyear <1960 & birthyear >=1950

replace share_childless = 0.102    if edu == 0 & birthyear >= 1960
replace share_childless = 0.139    if edu == 1 & birthyear >= 1960
replace share_childless = 0.22    if edu == 2 & birthyear >= 1960

save $save\childless_females_UK.dta, replace

********************************************************************************
********************************************************************************
*        Derive first birth rates from startpop files
********************************************************************************
********************************************************************************

use  $startpop/startpop.dta, clear

loc Ctry $Geosample

drop if idfamily ==.

*rename deh isced

ge intwgt=round(dwt)
drop if intwgt == .

rename idfamily famid

sort country famid age
by country famid: egen n_kids = total(role == 2)
by country famid: ge age_youngest = age[1]

keep if female == 1 & role < 2

cap drop sample
ge sample = n_kids == 0 | (n_kids == 1 & age_youngest == 1)
tab country sample

ge first_birth = n_kids == 1 & age_youngest == 1

tab age country if sample == 1 & age >= 15 & age < 50 [fw=intwgt], sum(first_birth) noobs nost nof

keep if sample == 1 & age >= 15 & age <= 50

ge age2 = age*age

ge first_pr = 0
loc Ctry $Geosample
foreach geo of local Ctry{
probit first_birth age age2 educ [fw=intwgt] if country == "`geo'"
predict first_pr_`geo', pr
replace first_pr = first_pr_`geo' if country == "`geo'"
}

tab age country if sample == 1 & age >= 15 & age < 50 [fw=intwgt], sum(first_pr) noobs nost nof

bys country age educ: egen pop = total(intwgt)

bys country age educ first_birth: egen first_births = total(intwgt)

gsort country age educ -first_birth

by country age educ: keep if _n == 1

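ge share_first = first_births/pop
* if the retained row has first_birth == 0, no first birth was observed in this cell
* (first_births then holds the weight of women without a first birth, not of births)
replace share_first = 0 if first_birth == 0
replace share_first = 0 if missing(share_first)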

keep country age educ share_first first_pr

sort country age educ

save $save\first_births.dta, replace

* prepare grid for parameter table
qui tab country
loc nGeo = r(r) // evaluate now, before later commands overwrite r()
display `nGeo'
scalar a = `nGeo'

qui sum age
loc minage = r(min)
loc maxage = r(max)
loc nAge = `maxage' - `minage' + 1
display `minage'
display `maxage'

qui sum edu
loc nEdu = r(max) + 1
display `nEdu'

clear

set obs 1

ge country = "A"

expand a

loc i = 1
foreach geo of glo Geosample {
 replace country = "`geo'" if _n == `i'
 loc i = `i'+1
}

expand `nAge' // expand to age_groups

bys country: ge age = _n + `minage' - 1

expand `nEdu' // expand to edu groups

bys country age: ge educ = _n - 1

save "$save\grid_firstbirths.dta", replace
mmerge country age educ using "$save/first_births.dta", type(1:1)
tab _merge

sort country edu age

replace first_pr = 0 if missing(first_pr)

// if no first birth is observed at a single age x (both neighbouring ages are non-zero), take the value from age x-1
by country edu: replace first_pr = first_pr[_n-1] if first_pr == 0 & first_pr[_n-1] != 0 & first_pr[_n+1] != 0 & _n>1 & _n < _N

save $save\first_births.dta, replace

********************************************************************************
* PROGRAM FOR WRITING .DAT FILES

cap pr drop refFert2dat
pr de refFert2dat

syntax , GEO(string)
cap file close _all

use  $save\childless_females_`geo'.dta, clear

loc year $startyear
loc do $dofile

qui tab birthyear
loc anz = r(N)
display `anz'

    * Create file
    tempname file
    file open `file' using "$param/`geo'_RefinedFertility_`year'.dat", w replace

    * Header
    file write `file' "// generated on `c(current_date)' at `c(current_time)'" _n
    file write `file' "// generated by `do'\RefinedFertility.do"
    file write `file' _n

    file write `file' "parameters" _n
    file write `file' "{" _n // Open parameters
    file write `file' "//EN Fertility model selection" _n
    file write `file' _tab "SELECTED_FERTILITY_MODEL" _tab    "selected_fertility_model = SFM_MACRO;"_n
    file write `file' _n
    file write `file' _tab "logical" _tab    "CalibrateChildlessness = TRUE; //EN Calibrate cohort childlessness to targets y/n" _n
    file write `file' _n
    file write `file' _tab "//EN Childlessness in older population (female)" _n
    file write `file' _tab "double" _tab "ChildlessnessYob[YOB_START50][BIRTH1_GROUP] = {" _n // YOB_START50 1904 1963
    forvalues i = 1(1)`anz' {
    file write `file' _tab (share_childless[`i']) ", "
    }
    file write `file' _tab "}; " _n
    file write `file' _tab "//EN Calibration Targets Cohort Childlessness" _n
    file write `file' _tab "double" _tab    "ChildlessnessYobTargets[YOB_BIRTH1][BIRTH1_GROUP] = {" _n
    loc yob_bir_l $yob_birth_low
    loc yob_bir_h $yob_birth_high
    loc n_years = `yob_bir_h' - `yob_bir_l' + 1
    qui sum birthyear
    loc lastyear = r(max)
    keep if birthyear == `lastyear'
    file write `file' _tab "(`n_years'){" (share_childless[1]) "," (share_childless[2]) "," (share_childless[3]) ",}" _n
    file write `file' _tab "}; " _n
    file write `file' _n
    file write `file' "//EN First Birth Rates" _n
    file write `file' "double" _tab    "FirstBirthCohortRates[BIRTH1_GROUP][FERTILE_AGE_RANGE][YOB_BIRTH1] = {"_n //15-49
    use $save\first_births.dta, clear
    keep if country == "`geo'"
    *keep if country == "AT"
    qui sum edu
    loc nedu = r(max)
    display `nedu'
    qui sum age
    loc agemin = r(min)
    loc agemax = r(max)
    display `agemax'

    loc yob_bir_l $yob_birth_low
    loc yob_bir_h $yob_birth_high
    loc n_years = `yob_bir_h' - `yob_bir_l' + 1

    forvalues educ = 0(1)`nedu' {
        *display `nedu'
        forvalues a = `agemin'(1)`agemax'{
            loc x = `a'-`agemin' + 1
            *display `x'
            file write `file' "(`n_years')" (first_pr[`x']) "," _n
        }
        drop if educ == `educ'
        file write `file' _n
    }
    file write `file' _tab "};" _n
    file write `file' _tab "};" _n

file close _all
end

********************************************************************************
* WRITE .DAT FILES
loc Ctry $Geosample
foreach geo of local Ctry{
refFert2dat , geo("`geo'")
}
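
The gap-filling rule applied to first_pr before first_births.dta is saved can be illustrated on toy data. The following is a minimal sketch, assuming made-up values for one education group; only the variable names educ, age and first_pr are taken from the script. It shows that an isolated zero is replaced by the value of the preceding age, while a run of two or more zeros is left unchanged.

```stata
* Toy data: values are made up for illustration only
clear
input educ age first_pr
0 20 0.04
0 21 0
0 22 0.06
0 23 0
0 24 0
0 25 0.07
end

sort educ age

* Same condition as in the script: fill only if both neighbours are non-zero
by educ: replace first_pr = first_pr[_n-1] if first_pr == 0 ///
    & first_pr[_n-1] != 0 & first_pr[_n+1] != 0 & _n > 1 & _n < _N

list, noobs
```

In this sketch age 21 takes the value of age 20, whereas ages 23 and 24 stay at zero because each has a zero neighbour.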

RefinedMortality.do

Parameters on sex- and education-specific remaining life expectancy at ages 25 and 65 are produced from OECD data and XXX for Spain, and are stored in RefinedMortality.dat.

The parameter file contains the following parameters:

  • SelectedMortalityModel is a model choice parameter indicating whether the refined mortality model shall be used.
  • LifeExpectancy[SEX][LIFE_EXPECT][SIM_YEAR_RANGE][MORTALITY_GROUP] gives the remaining life expectancy by sex, age (25 and 65), simulation year and education level (low, medium or high).
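
The layout of the LifeExpectancy table can be sketched with a small helper program that writes one file per call. This is only an illustration: the program name le2dat_demo and its option names are hypothetical and not part of the microWELT scripts, the output goes to a temporary file, and the number of years (141) is an assumed value. The life expectancy values are the ones the script uses for AT.

```stata
* Sketch only: le2dat_demo and its option names are hypothetical
cap program drop le2dat_demo
program define le2dat_demo
    syntax , outfile(string) nyears(integer) fyoung(string) fold(string) myoung(string) mold(string)
    tempname fh
    file open `fh' using "`outfile'", write replace
    file write `fh' "parameters" _n "{" _n
    file write `fh' _tab "double" _tab "LifeExpectancy[SEX][LIFE_EXPECT][SIM_YEAR_RANGE][MORTALITY_GROUP] = {" _n
    * One row per sex and age; the (n) prefix carries the number of simulation years
    file write `fh' _tab _tab "// Female, LE_25" _n _tab _tab "(`nyears') { `fyoung' }," _n
    file write `fh' _tab _tab "// Female, LE_65" _n _tab _tab "(`nyears') { `fold' }," _n
    file write `fh' _tab _tab "// Male, LE_25" _n _tab _tab "(`nyears') { `myoung' }," _n
    file write `fh' _tab _tab "// Male, LE_65" _n _tab _tab "(`nyears') { `mold' }," _n
    file write `fh' _tab "};" _n "};" _n
    * Close the handle so the file is flushed to disk
    file close `fh'
end

* Usage with the AT values; 141 years is an assumed simulation length
tempfile out
le2dat_demo, outfile("`out'") nyears(141) ///
    fyoung("57.58, 59.29, 60.63,") fold("20.58, 21.49, 22.34,") ///
    myoung("51.40, 54.03, 57.83,") mold("16.79, 17.92, 20.04,")
type "`out'"
```

Compared with repeating the write statements for every country, a program of this kind keeps the table layout in one place and closes each file handle explicitly.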

Data Source:

  • for AT, FI and UK: Murtin, F., et al. (2017), “Inequalities in longevity by education in OECD countries: Insights from new OECD estimates”, OECD Statistics Working Papers, No. 2017/02, OECD Publishing, Paris
  • for ES: Requena, Miguel (2017) La desigualdad ante la muerte: educación y esperanza de vida en España. Perspectives Demogràfiques Nr. 06.

Version: April 2019

Author(s): Tom Horvath and Marian Fink



loc startyear $startyear
loc endyear $yearmax

loc do $dofile

clear all

cap file close _all

* ------------------------------------------------------------------------------
* Create file AUT
* ------------------------------------------------------------------------------
tempname file

file open `file' using "$param/AT_RefinedMortality_`startyear'.dat", w replace

    * File header
    file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
    file write `file' "// generated by `do'\RefinedMortality.do"
    file write `file' _n

        file write `file' "parameters" _n
        file write `file' "{" _n // Open parameters

        file write `file' _tab "SELECTED_MORTALITY_MODEL SelectedMortalityModel = SMM_MICRO_ALIGNED; //EN Child mortality model selection" _n

        loc numbyears = `endyear' - `startyear' + 1
        display `numbyears'

            file write `file' _tab  "//EN Period life expectancy" _n
            file write `file' _tab "double" _tab "LifeExpectancy[SEX][LIFE_EXPECT][SIM_YEAR_RANGE][MORTALITY_GROUP] = {" _n // open life expectancy table

            file write `file' _tab "// Female" _n
                file write `file' _tab _tab "// LE_25" _n
                file write `file' _tab _tab "(`numbyears') { 57.58,    59.29,    60.63,}," _n
                file write `file' _tab _tab "// LE_65" _n
                file write `file' _tab _tab "(`numbyears') { 20.58,    21.49,    22.34,}," _n

            file write `file' _tab "// Male" _n
                file write `file' _tab _tab "// LE_25" _n
                file write `file' _tab _tab "(`numbyears') { 51.40,    54.03,    57.83,}," _n
                file write `file' _tab _tab "// LE_65" _n
                file write `file' _tab _tab "(`numbyears') { 16.79,    17.92,    20.04,}," _n

    file write `file' _tab "};" _n
file write `file' "};"
file close `file' // close before the handle name is reused for the next country

* ------------------------------------------------------------------------------
* Create file FI
* ------------------------------------------------------------------------------
tempname file

file open `file' using "$param/FI_RefinedMortality_`startyear'.dat", w replace

    * File header
    file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
    file write `file' "// generated by `do'\RefinedMortality.do"
    file write `file' _n

        file write `file' "parameters" _n
        file write `file' "{" _n // Open parameters

        file write `file' _tab "SELECTED_MORTALITY_MODEL SelectedMortalityModel = SMM_MICRO_ALIGNED; //EN Child mortality model selection" _n

        loc numbyears = `endyear' - `startyear' + 1
        display `numbyears'

            file write `file' _tab  "//EN Period life expectancy" _n
            file write `file' _tab "double" _tab "LifeExpectancy[SEX][LIFE_EXPECT][SIM_YEAR_RANGE][MORTALITY_GROUP] = {" _n // open life expectancy table

            file write `file' _tab "// Female" _n
                file write `file' _tab _tab "// LE_25" _n
                file write `file' _tab _tab "(`numbyears') { 56.18,    59.06,    60.94,}," _n
                file write `file' _tab _tab "// LE_65" _n
                file write `file' _tab _tab "(`numbyears') { 20.68,    21.56,    22.70,}," _n

            file write `file' _tab "// Male" _n
                file write `file' _tab _tab "// LE_25" _n
                file write `file' _tab _tab "(`numbyears') { 48.98,    52.51,    56.56,}," _n
                file write `file' _tab _tab "// LE_65" _n
                file write `file' _tab _tab "(`numbyears') { 16.57,    17.60,    19.40,}," _n

    file write `file' _tab "};" _n
file write `file' "};"
file close `file' // close before the handle name is reused for the next country

* ------------------------------------------------------------------------------
* Create file ES (no real data at the moment; Italian data are used instead)
* ------------------------------------------------------------------------------
tempname file

file open `file' using "$param/ES_RefinedMortality_`startyear'.dat", w replace

    * File header
    file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
    file write `file' "// generated by `do'\RefinedMortality.do"
    file write `file' _n

        file write `file' "parameters" _n
        file write `file' "{" _n // Open parameters

        file write `file' _tab "SELECTED_MORTALITY_MODEL SelectedMortalityModel = SMM_MICRO_ALIGNED; //EN Child mortality model selection" _n

        loc numbyears = `endyear' - `startyear' + 1
        display `numbyears'

            file write `file' _tab  "//EN Period life expectancy" _n
            file write `file' _tab "double" _tab "LifeExpectancy[SEX][LIFE_EXPECT][SIM_YEAR_RANGE][MORTALITY_GROUP] = {" _n // open life expectancy table


            file write `file' _tab "// Female" _n
                file write `file' _tab _tab "// LE_25" _n
                file write `file' _tab _tab "(`numbyears') { 60.02,    61.01,    61.72,}," _n
                file write `file' _tab _tab "// LE_65" _n
                file write `file' _tab _tab "(`numbyears') { 22.33,    22.75,    23.27,}," _n

            file write `file' _tab "// Male" _n
                file write `file' _tab _tab "// LE_25" _n
                file write `file' _tab _tab "(`numbyears') { 53.60,    55.47,    57.29,}," _n
                file write `file' _tab _tab "// LE_65" _n
                file write `file' _tab _tab "(`numbyears') { 18.18,    18.77,    19.52,}," _n

    file write `file' _tab "};" _n
file write `file' "};"
file close `file' // close before the handle name is reused for the next country

* ------------------------------------------------------------------------------
* Create file UK
* ------------------------------------------------------------------------------
tempname file

file open `file' using "$param/UK_RefinedMortality_`startyear'.dat", w replace

    * File header
    file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
    file write `file' "// generated by `do'\RefinedMortality.do"
    file write `file' _n

        file write `file' "parameters" _n
        file write `file' "{" _n // Open parameters

        file write `file' _tab "SELECTED_MORTALITY_MODEL SelectedMortalityModel = SMM_MICRO_ALIGNED; //EN Child mortality model selection" _n

        loc numbyears = `endyear' - `startyear' + 1
        display `numbyears'

            file write `file' _tab  "//EN Period life expectancy" _n
            file write `file' _tab "double" _tab "LifeExpectancy[SEX][LIFE_EXPECT][SIM_YEAR_RANGE][MORTALITY_GROUP] = {" _n // open life expectancy table

            file write `file' _tab "// Female" _n
                file write `file' _tab _tab "// LE_25" _n
                file write `file' _tab _tab "(`numbyears') { 56.74,    59.64,    60.73,}," _n
                file write `file' _tab _tab "// LE_65" _n
                file write `file' _tab _tab "(`numbyears') { 19.74,    21.88,    22.52,}," _n

            file write `file' _tab "// Male" _n
                file write `file' _tab _tab "// LE_25" _n
                file write `file' _tab _tab "(`numbyears') { 53.84,    56.65,    58.19,}," _n
                file write `file' _tab _tab "// LE_65" _n
                file write `file' _tab _tab "(`numbyears') { 17.61,    19.40,    20.55,}," _n

    file write `file' _tab "};" _n
file write `file' "};"
file close `file' // close the last file so it is flushed to disk
/*
From Data Table:

    AT Age 25
    F 57.58    59.29    60.63
    M 51.40    54.03    57.83
    Age 65
    F 20.58    21.49    22.34
    M 16.79    17.92    20.04

    FI Age 25
    F 56.18    59.06    60.94
    M 48.98    52.51    56.56
    Age 65
    F 20.68    21.56    22.70
    M 16.57    17.60    19.40

    UK Age 25
    F 56.74    59.64    60.73
    M 53.84    56.65    58.19
    Age 65
    F 19.74    21.88    22.52
    M 17.61    19.40    20.55
*/

SchoolEnrolment.do

Based on the starting population file, we derive the share of people in education by age (18 to 30) and sex; these shares are stored in SchoolEnrolment.dat. The enrolment rates by age and sex are assumed to remain constant throughout the simulation period.

The parameter file contains the following parameters:

  • SchoolEnrolmentRates[SEX][AGE_EDUC_ALIGN][SIM_YEAR_RANGE] gives school enrolment rates by sex, age and simulation year (2010 to 2150)
  • AlignSchoolEnrolmentRates is a model choice parameter indicating whether enrolment rates shall be aligned
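
The enrolment rates are weighted shares of a 0/1 indicator within each country, age and sex cell. A minimal sketch of this calculation on toy data follows; the values are made up, and the use of collapse is an assumption made for brevity (the script below uses egen totals instead). Only the variable names are taken from the script.

```stata
* Toy data: values are made up for illustration only
clear
input str2 country age female in_edu intwgt
"AT" 18 1 1 30
"AT" 18 1 0 10
"AT" 18 0 1 20
"AT" 18 0 0 20
"AT" 30 1 0 40
end

* The weighted mean of a 0/1 indicator is the weighted share in education
collapse (mean) in_edu_share = in_edu [fw = intwgt], by(country age female)

* Same ordering as in the script: females first, then by age
gsort country -female age
list, noobs
```

With collapse, a cell in which nobody is in education is kept with a share of zero, so every sex has one row per age.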

Data Source:

  • Starting population file based on EU-SILC

Version: March 2020

Author(s): Tom Horvath



use $startpop/startpop.dta, clear

loc Ctry $Geosample

drop if idhh ==.

rename deh isced

ge intwgt=round(dwt)
drop if intwgt == .

keep if age >= 15 & age <= 30

ge in_edu = dec > 0 & !missing(dec) // missing values compare as larger than any number in Stata

tab age country [fw = round(intwgt)] if age>=15 & age <=30, sum(in_edu) noob nof nost

bys country age female: egen pop = total(intwgt)
bys country age female in_edu: egen pop_inedu = total(intwgt)

keep if in_edu == 1

ge in_edu_share = pop_inedu/pop

bys country age female: keep if _n == 1

keep country age female in_edu_share

gsort country -female age

save $save/schoolEnrolment.dta, replace

********************************************************************************
* PROGRAM FOR WRITING .DAT FILES

cap pr drop inedu2dat
pr de inedu2dat

syntax , GEO(string)
cap file close _all

use  $save/schoolEnrolment.dta, clear

keep if country == "`geo'"
loc startyear $startyear
loc do $dofile

qui sum age
loc minage = r(min)
loc maxage = r(max)
loc anz_age = `maxage'-`minage'+1
display `anz_age'

loc simyears = $yearmax - $startyear +1
display `simyears'

    * Create file
    tempname file
    file open `file' using "$param/`geo'_SchoolEnrolment_`startyear'.dat", w replace

    * Header
    file write `file' "// generated on `c(current_date)' at `c(current_time)'" _n
    file write `file' "// generated by `do'\SchoolEnrolment.do"
    file write `file' _n

    file write `file' "parameters" _n
    file write `file' "{" _n // Open parameters

    file write `file' _tab "//EN School enrolment rates" _n
    file write `file' _tab "double" _tab "SchoolEnrolmentRates[SEX][AGE_EDUC_ALIGN][SIM_YEAR_RANGE] = { " _n
    file write `file' _tab _tab "//Female" _n
    forvalues i = 1(1)`anz_age' {
    file write `file' _tab _tab "(`simyears')" _tab (in_edu_share[`i']) ", "
    }
    file write `file' _n
    drop if female == 1
    file write `file' _tab _tab "//Male" _n
    forvalues i = 1(1)`anz_age' {
    file write `file' _tab _tab "(`simyears')" _tab (in_edu_share[`i']) ", "
    }
    file write `file' _n
    file write `file' _tab "}; " _n
    file write `file' _tab "//EN Align school enrolment rates" _n
    file write `file' _tab "logical" _tab    "AlignSchoolEnrolmentRates = TRUE;" _n
    file write `file' _tab "};" _n

file close _all
end

********************************************************************************
* WRITE .DAT FILES

foreach geo of local Ctry{
inedu2dat , geo("`geo'")
}