Analysis Scripts
This section documents the statistical analysis used to generate the model parameters. The analysis is organized as a collection of STATA scripts, each of which typically corresponds to the parameter files of one specific microWELT module. The first script calls all the others, so running it alone reproduces all model parameters.
0_Base_File.do
This file sets the paths to the data and parameter folders and defines global values for the simulation, such as its start and end year. Note that the eurostatuse command in STATA requires a non-relational directory path for the folder in which EUROSTAT data are stored. After the paths and globals are set, all parameter-generating do-files are called. These do-files are stored in the dofile directory.
Directories that must be defined:
- dofile contains all parameter do-files
- param defines where parameter files are stored
- cfe contains data from the cohort fertility database
- eurostat contains the EUROSTAT data for population projection parameters or population characteristics. This must be a non-relational path.
- startpop contains the starting population micro dataset
- lfsin contains EUROSTAT’s European Labour force survey scientific use files
- adhoc contains EUROSTAT’s ad-hoc modules of the European Labour force survey use files
- save sets the path to where STATA saves datafiles
Global values must be set for:
- yearmax defines the last year of the simulation period
- startyear is the first year of the simulation
- lfs_y defines the year for which LFS-based calculations are made. This may, but need not, be the same as the startyear of the simulation.
- Geosample names the countries included in the analysis
Other globals define birth cohort limits used in different parameter files.
clear all
*******************************************
* set paths to dofile and parameter folder:
*******************************************
*gl dofile "\\int.wsr.at\Nabu\ext_Projekte\Weltransim_PN-7316\parameterfiles\dofiles"
*gl param "\\int.wsr.at\Nabu\ext_Projekte\Weltransim_PN-7316\parameterfiles\datfiles"
gl param "\\int.wsr.at\Nabu\ext_Projekte\Weltransim_PN-7316\TESTPARAMETERS\paratest"
gl dofile "\\int.wsr.at\Nabu\ext_Projekte\Weltransim_PN-7316\TESTPARAMETERS"
**************************************************
* set paths to folders containing diverse datasets:
**************************************************
gl cfe "\\int.wsr.at\Nabu\ext_Projekte\Weltransim_PN-7316\parameterfiles\cfe-database"
gl eurostat "D:\horvath\EUROSTAT"
gl startpop "\\int.wsr.at\Nabu\ext_Projekte\Weltransim_PN-7316\startpop"
gl lfsin "\\int.wsr.at\Nabu\restriktive_Daten\EC\WELTRANSIM_RPP_167-2017-ECHP_LFS_EU-SILC\Daten\LFS\vol-2\YearlyFiles"
gl adhoc "\\int.wsr.at\Nabu\restriktive_Daten\EC\WELTRANSIM_RPP_167-2017-ECHP_LFS_EU-SILC\Daten\LFS\vol-1\AdhocModules\LFS_ahm_2009"
gl save "\\int.wsr.at\Nabu\restriktive_Daten\EC\WELTRANSIM_RPP_167-2017-ECHP_LFS_EU-SILC\Auswertungen\TEST"
*gl save "\\int.wsr.at\Nabu\restriktive_Daten\EC\WELTRANSIM_RPP_167-2017-ECHP_LFS_EU-SILC\Auswertungen"
gl share_input "\\int.wsr.at\Nabu\ext_Projekte\Weltransim_PN-7316\SHARE_TMP"
************************************************
* define simulation period, sex and maximum age:
************************************************
gl yearmax "2150"
gl startyear "2010"
gl lfs_y "2014"
gl Sex "F M"
gl maxage 105
************************
* define country sample:
************************
gl Geosample "AT ES FI"
*****************************************
* globals needed in MaleChildlessness.do:
*****************************************
gl YOB_MALEFERT_L 1920
gl YOB_MALEFERT_U 2050
********************************
* used in refined fertility do:
********************************
gl yob_birth_low 1960
gl yob_birth_high 2050
gl yob_low 1900
gl yob_high 1959
********************************************************************************
* run do-files generating parameterfiles. !! need to run startpop files first !!
********************************************************************************
do $dofile\PersonCore.do
do $dofile\BaseMortality.do
* UK data no longer contained in eurostat population projections!
* needs eurostat data (unzipped) in a non-relational directory:
* proj_19naasmr <-- variable name proj_19naasmr must be updated in dofile when new projections are used
do $dofile\RefinedMortality.do
* data for Spain are from "Defunciones 2012-14_MR.xlsx"
* data for AT, ES and FI: "Inequalities-in-longevity-by-education-in-OECD-countries.xlsx"
do $dofile\BaseFertility.do
* UK data no longer contained in eurostat population projections!
* needs eurostat data (unzipped) in a non-relational directory:
* proj_19naasfr <-- variable name proj_19naasfr must be updated in dofile when new projections are used
do $dofile\RefinedFertility.do
* data for AT, ES and FI from cohort fertility database http://www.cfe-database.org/database/
* data for UK from Berrington et al. 2015
do $dofile\BaseEduc.do
do $dofile\EducationPattern.do
do $dofile\SchoolEnrolment.do
do $dofile\RefinedEducFate.do
do $dofile\FemalePartnership.do
do $dofile\PartnerMatching.do
do $dofile\NetMigration.do
* UK data no longer contained in eurostat population projections!
* UK data taken from Office for National Statistics (hard coded in dofile!)
do $dofile\Emigration.do
do $dofile\Immigration.do
do $dofile\FamilyLinks.do
do $dofile\MaleChildlessness.do
BaseEduc.do
Creates the education progression rates by sex and year of birth contained in BaseEduc_2010.dat. The rates are derived from the education shares of the population aged 30 to 34, assuming that the highest education level is typically achieved before age 30 (note that the age filter in the script below is age >= 25 & age < 35). For each education level “X”, the cumulated share of people achieving at least level “X” is divided by the cumulated share of people achieving at least the level directly below “X”.
The result is a progression rate: the probability that someone who has reached a given education level progresses to the next higher education level.
The parameter file contains two education progression rates:
- EducProg1 gives the probability of progressing from low to medium education
- EducProg2 gives the probability of progressing from medium to high education
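The division of cumulated shares described above can be illustrated with a few lines of Python. This is only a sketch for checking the arithmetic, not part of the STATA scripts. The function name and the example shares are made up for illustration.

```python
def progression_rates(share_low, share_med, share_high):
    """Return (EducProg1, EducProg2) from the shares of the three education levels.

    Hypothetical helper: the shares are normalised, cumulated from the highest
    level downwards, and each cumulated share is divided by the cumulated share
    of the level directly below it.
    """
    total = share_low + share_med + share_high
    at_least_low = 1.0                                 # everybody has at least "low"
    at_least_med = (share_med + share_high) / total
    at_least_high = share_high / total
    prog1 = at_least_med / at_least_low                # low -> medium
    prog2 = at_least_high / at_least_med               # medium -> high
    return prog1, prog2


# Made-up shares: 20% low, 50% medium, 30% high
prog1, prog2 = progression_rates(0.2, 0.5, 0.3)
print(prog1, prog2)
```

Multiplying the two rates gives back the share of the population with high education, which is a convenient consistency check on generated parameters.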
Data Source: EU-LFS 2014 Quarter 1-4
Version: March 2018
Author(s): Tom Horvath
clear all
loc Ctry $Geosample
* ------------------------------------------------------------------------------
* 1. IMPORT CSV DATA TO STATA FORMAT
* ------------------------------------------------------------------------------
* Yearly file
* --------------------------------------
display("$Geosample")
foreach country of glo Geosample {
import delimited ///
using "$lfsin/`country'_YEAR_1998_onwards/`country'${lfs_y}_y.csv", ///
clear delimiters(",") varnames(1) asdouble stripquotes(yes)
save "$save/`country'${lfs_y}_y.dta", replace
}
* ------------------------------------------------------------------------------
* 2. GET FILES TOGETHER
* ------------------------------------------------------------------------------
* Append files
* --------------------------------------
clear
set obs 1
foreach country of glo Geosample {
ap using "$save/`country'${lfs_y}_y.dta", nol force
rm "$save/`country'${lfs_y}_y.dta"
}
drop if _n == 1
save "$save/lfs_sample_${lfs_y}_y.dta", replace
use "$save/lfs_sample_${lfs_y}_y.dta", clear
* recode education variables
*---------------------------
recode hat11lev ///
( 100/200 = 1) ///
( 300/499 = 2) ///
( 500/800 = 3) ///
, gen(last_edu)
cap lab def last_edu_VL 1 "Low" 2 "Med" 3 "High"
lab val last_edu last_edu_VL
lab var last_edu "highest edu"
ge intwgt=int(1000*coeff)
drop if intwgt == .
drop if last_edu == . | last_edu == 0
ge female = sex == 2
ge male = sex == 1
keep if age >= 25 & age < 35
bys country female: egen tot_pop = total (intwgt)
bys country female last_edu: egen edu_pop = total (intwgt)
ge edu_share = edu_pop / tot_pop
bys country female last_edu: keep if _n == 1
keep country female last_edu edu_share
gsort country female -last_edu
by country female: ge cumsum = sum(edu_share)
by country female: ge prog = cumsum/cumsum[_n+1] if _n<_N
keep if prog !=.
gsort country last_edu -female
drop edu_share cumsum
save $save\base_educ.dta, replace
********************************************************************************
* Program for writing dat file
********************************************************************************
loc Ctry $Geosample
cap pr drop baseeduc2dat
pr de baseeduc2dat
syntax , GEO(string)
cap file close _all
use $save\base_educ.dta, clear
keep if country == "`geo'"
qui sum prog
loc nedu = r(N)
loc startyear = $startyear
loc do $dofile
* Create file
tempname file
file open `file' using "$param/`geo'_BaseEduc_`startyear'.dat", w replace
* Header
file write `file' "// generated on `c(current_date)' at `c(current_time)'" _n
file write `file' "// generated by `do'\BaseEduc.do"
file write `file' _n
file write `file' "parameters" _n
file write `file' "{" _n // Open parameters
//EN Education progression probability low -> medium
file write `file' _tab "double" _tab "EducProg1[SEX][YOB_EDUC_PROG1] = { " _n
file write `file' _tab _tab "(61)" _tab (prog[1]) ", " _n
file write `file' _tab _tab "(61)" _tab (prog[2]) ", " _n
file write `file' _tab "}; " _n
//EN Education progression probability medium -> high
file write `file' _tab "double" _tab "EducProg2[SEX][YOB_EDUC_PROG2] = {" _n
file write `file' _tab _tab "(71)" _tab (prog[3]) ", " _n
file write `file' _tab _tab "(71)" _tab (prog[4]) ", " _n
file write `file' _tab "}; " _n
file write `file' _tab "};" _n
file close _all
end
********************************************************************************
* WRITE .DAT FILES
foreach geo of local Ctry{
baseeduc2dat , geo("`geo'")
}
BaseFertility.do
Creates the age-specific fertility rates contained in BaseFertility_2010.dat, based on historic data and projections provided by EUROSTAT. Projected fertility rates correspond to the baseline variant of the EUROSTAT population projections.
Two parameters are produced:
- AgeSpecificFertility: contains the fertility rate of females by age (15 to 49) over the entire simulation period (2010 to 2150)
- SexRatio: gives the number of males born per 100 females born in each year. This is currently set to 101 for each country.
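AgeSpecificFertility covers the whole simulation period, while the EUROSTAT projections end earlier. The script below closes this gap by copying the rates of the last available year forward to the final simulation year. A minimal Python sketch of that step follows. It is not part of the STATA scripts, and the function name and the rates are made up for illustration.

```python
def extend_to_yearmax(rates_by_year, yearmax):
    """Hold the rate of the last available year constant up to yearmax (inclusive).

    Hypothetical helper mirroring the 'expand' step of the do-file for one
    country and one age.
    """
    last_year = max(rates_by_year)
    extended = dict(rates_by_year)
    for year in range(last_year + 1, yearmax + 1):
        extended[year] = rates_by_year[last_year]
    return extended


# Made-up fertility rates for a single age
example = {2098: 0.101, 2099: 0.102, 2100: 0.103}
extended = extend_to_yearmax(example, 2150)
print(len(extended), extended[2150])
```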
Data Source:
- Fertility rate projections by EUROSTAT (proj_19naasfr)
- Historic fertility rates (demo_frate)
Version: Jan 2019
Author(s): Marian Fink, Tom Horvath
cd $eurostat
loc Ctry $Geosample_POP
********************************************************************************
* PREPARE FERTILITY DATA
********************************************************************************
* Prepare FR for years before 2015
* --------------------------------
clear
foreach geo of local Ctry{
display("`geo'")
}
eurostatuse demo_frate, long geo(`Ctry') noerase
drop if age == "TOTAL" | age == "Y10-14" | age == "Y15-19"| age == "Y20-24" ///
| age == "Y25-29" | age == "Y30-34" | age == "Y35-39" | age == "Y40-44" ///
| age == "Y45-49" | age == "Y_GE50"
drop if time < $startyear
save "demo_frate.dta", replace
* prepare FR for years 2015 onwards (up to 2080)
* ----------------------------------------------
clear
eurostatuse proj_19naasfr, long geo(`Ctry') noerase
drop if age == "Y_LT15" | age == "Y_GE50" | age == "TOTAL"
save "proj_19naasfr.dta", replace
* Get files together
* ------------------
use "proj_19naasfr.dta", clear
g proj = 1
ap using "demo_frate.dta"
replace proj = 0 if mi(proj)
drop unit*
drop *label
replace age = subinstr(age, "Y", "", .)
destring age , replace
ge value = proj_19naasfr if proj == 1
replace value = demo_frate if proj == 0
drop proj_19naasfr demo_frate
rename value frate
rename time year
replace projection = "NULL" if proj == 0
drop proj
sort geo year age projection
keep geo year age projection frate
compress
save "fertility_data.dta", replace
********************************************************************************
* PREPARE DATA for DTA-FILE
* - Reshape file
* - Expand data to cover years up to 2150 (assume constant FR from 2080 onwards)
********************************************************************************
use "fertility_data.dta", clear
g byte proj = 1 if projection == "BSL"
replace proj = 2 if projection == "LFRT"
replace proj = 0 if projection == "NULL"
drop projection
reshape wide frate, i( year geo age ) j( proj )
ren frate0 frate
ren frate1 frate_BSL
cap ren frate2 frate_LFRT
qui su year
loc y = r(max)
loc n = $yearmax - `y' + 1
display `n'
expand `n' if year == `y' , gen(copy)
qui su year
loc y = r(max)
sort geo year age copy
by geo year age: replace year = year + _n - 1 if year == `y'
replace frate_BSL = frate if !mi(frate )
cap replace frate_LFRT = frate if !mi(frate )
drop frate copy
save "fertility2dat.dta", replace
use "fertility2dat.dta", clear
********************************************************************************
* PROGRAM FOR WRITING .DAT FILES
cap pr drop fertility2dat
pr de fertility2dat
syntax , GEO(string) FRate(string)
clear all
cap file close _all
loc do $dofile
* Create file
tempname file
loc year $startyear
display(`year')
file open `file' using "$param/`geo'_BaseFertility_`year'.dat", w replace
* File header
file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
file write `file' "// generated by `do'\BaseFertility.do"
file write `file' _n
file write `file' "parameters" _n
file write `file' "{" _n // Open parameters
** Parameter table for fertility rates
file write `file' _tab "//EN Age distribution of fertility" _n
file write `file' _tab "double" _tab "AgeSpecificFertility[FERTILE_AGE_RANGE][SIM_YEAR_RANGE] = {" _n // open fertility table
* Get values for fertility rates
use "fertility2dat.dta", clear
keep if geo == "`geo'" // Country
if "`frate'" == "BSL" {
cap drop frate_LFRT
ren frate_BSL frate
}
sort age year
loc age1 = age[1]
loc ageN = age[_N]
loc y1 = 1
loc y11 = 2
loc yN = $yearmax - ($startyear - 1) //2150 - 2009
loc yN1 = `yN' - 1 //2150 - 2009 - 1
display `y1'
display `y11'
display `yN'
display `yN1'
* Write fertility rate values into parameter file
forv a = `age1' / `ageN' {
*display `y1'
file write `file' _tab _tab (frate[`y1']) ", "
forv y = `y11' / `yN1' {
*display `y'
file write `file' (frate[`y']) ", "
}
file write `file' (frate[`yN']) ", " _n
drop if age == `a'
}
file write `file' _tab "};" _n // close fertility table
** Parameter table for sex ratio
file write `file' _tab "//EN Sex ratio (male per 100 female)" _n
file write `file' _tab "double" _tab "SexRatio[SIM_YEAR_RANGE] = {" _n // open sex ratio table
file write `file' _tab "(`yN') 101," _n
file write `file' _tab "};" _n // close sexratio table
file write `file' "};" _n // Close parameters
file close _all
end
********************************************************************************
* WRITE .DAT FILES
foreach geo of local Ctry{
fertility2dat , geo("`geo'") fr("BSL")
}
BaseMortality.do
Creates the sex- and age-specific mortality hazard rates contained in BaseMortality_2010.dat from life tables provided by EUROSTAT. We use the age- and sex-specific mortality rates contained in the life tables to compute mortality hazards (exits to death). Hazard rates are calculated with the formula hx = -log(1-qx), where qx is the age- and sex-specific mortality rate for a given year.
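The conversion from a mortality rate to a hazard can be checked with a short Python sketch. It is not part of the STATA scripts, and the function names and the example value of qx are made up for illustration.

```python
import math


def hazard_from_qx(qx):
    """Convert a one-year mortality rate qx into a hazard: hx = -log(1 - qx)."""
    if not 0.0 <= qx < 1.0:
        raise ValueError("qx must lie in [0, 1)")
    return -math.log(1.0 - qx)


def survival_one_year(hx):
    """Probability of surviving one year under a constant hazard hx."""
    return math.exp(-hx)


qx = 0.02                      # made-up mortality rate
hx = hazard_from_qx(qx)
print(hx, survival_one_year(hx))
```

Applying a constant hazard hx over one year reproduces the survival probability 1 - qx, which is why the hazard is always slightly larger than qx itself.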
The parameter file contains the parameter:
- MortalityTable[SEX][AGE_RANGE][SIM_YEAR_RANGE] containing age- and sex-specific mortality hazards over the entire simulation period (2010 to 2150).
Data Source:
- Mortality rate projections by EUROSTAT (proj_19naasmr)
Version: Jan 2019
Author(s): Marian Fink, Tom Horvath
loc Ctry $Geosample_POP
loc Sex $Sex
loc simstart $startyear
*loc Yearmax $yearmax
*gl maxagerange 105
cd $eurostat
********************************************************************************
* Load Mortality DATA
********************************************************************************
clear all
eurostatuse proj_19naasmr, long geo(`Ctry') noerase
replace age = "Y0" if age == "Y_LT1"
drop if age == "Y_GE100"
replace age = subinstr(age, "Y", "", .)
destring age , replace
rename time year
g byte proj = 1 if projection == "BSL"
replace proj = 2 if projection == "LMRT"
* Calculate Hazardrate:
rename proj_19naasmr qx
g double hx = -log(1-qx) // hazard rate
keep geo sex age year hx proj
sort geo year age sex proj
reshape wide hx, i(geo year age sex) j(proj)
ren hx1 mrate_BSL
cap ren hx2 mrate_LMRT
sort geo sex age year
* -----------------------------------------
* Extend mortality rates for age 100 to 105
* -----------------------------------------
qui sum age
loc maxage = r(max)
loc b = $maxage - `maxage' + 1
display `b'
expand `b' if age == `maxage', gen(copy)
sort geo sex year age copy
by geo sex year: replace age = _n -1 if age == `maxage'
* -----------------------------------------
* Extend data to year 2150
* -----------------------------------------
qui su year
loc y = r(max)
loc n = $yearmax - `y' + 1
display `n'
cap drop copy
expand `n' if year == `y' , gen(copy)
sort geo sex year age copy
by geo sex year age: replace year = year + _n - 1 if year == `y'
drop copy
* -----------------------------------------
* Extend data for years 2010 - 2014
* -----------------------------------------
sort geo sex age year
qui su year
loc y = r(min)
loc n = `y' - $startyear + 1
display `n'
expand `n' if year == `y' , gen(copy)
gsort geo sex year age -copy
by geo sex year age: replace year = year - `n' + _n if year == `y' & copy == 1
sort geo sex age year
cap drop copy
save "hr_mortality_19.dta", replace
********************************************************************************
* PROGRAM FOR WRITING .DAT FILES
cap pr drop mortality2dat
pr de mortality2dat
syntax , GEO(string) MRate(string)
clear all
cap file close _all
* Create file
tempname file
loc year = $startyear
loc do $dofile
file open `file' using "$param/`geo'_BaseMortality_`year'.dat", w replace
* File header
file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
file write `file' "// generated by `do'\BaseMortality.do"
file write `file' _n
file write `file' "parameters" _n
file write `file' "{" _n // Open parameters
** Parameter table for mortality hazard rates
file write `file' _tab "//EN Mortality hazard by age" _n
file write `file' _tab "double" _tab "MortalityTable[SEX][AGE_RANGE][SIM_YEAR_RANGE] = {" _n // open mortality table
* Get values for mortality rates
use "hr_mortality_19.dta", clear
keep if geo == "`geo'" // Country
if "`mrate'" == "BSL" {
cap drop mrate_LMRT
ren mrate_BSL frate
}
sort sex age year
loc age1 = age[1]
loc ageN = age[_N]
loc y1 = 1 //2010 - 2009
loc y11 = 2 //2010 - 2009 + 1
loc yN = $yearmax - ($startyear - 1) //2150 - 2009
loc yN1 = `yN' - 1 //2150 - 2009 - 1
display `yN'
display `yN1'
* Write mortality rate values into parameter file
ge male = sex == "M"
forv s = 0 / 1 {
forv a = `age1' / `ageN' {
file write `file' _tab _tab (frate[`y1']) ", "
forv y = `y11' / `yN1' {
file write `file' (frate[`y']) ", "
}
file write `file' (frate[`yN']) ", " _n
drop if age == `a' & male == `s'
}
}
file write `file' _tab "};" _n // close life table
file write `file' "};" _n // Close parameters
file close _all
end
********************************************************************************
* WRITE .DAT FILES
foreach geo of local Ctry{
mortality2dat , geo("`geo'") mr("BSL")
}
EducationPattern.do
Here we derive the generic education patterns stored in EducPattern_2010.dat. Each pattern reflects a different pathway through the education system. For each of the three modeled education levels (low, medium and high) we allow for 12 different pathways along which pupils can move between different education states.
There are currently 13 possible education states - not all of them are used in the current version:
- EP_LOW: current education state is “Low”
- EP_MED_DUAL: current education state is “Medium Dual”
- EP_MED_VOC: current education state is “Medium Vocational”
- EP_MED_GEN: current education state is “Medium General”
- EP_OUT1: first “Out of school” episode
- EP_HIGH1_FT: first episode of attending higher education as full time student
- EP_HIGH1_PT: first episode of attending higher education as part time student
- EP_OUT2: second “Out of school” episode
- EP_HIGH2_FT: second episode of attending higher education as full time student
- EP_HIGH2_PT: second episode of attending higher education as part time student
- EP_OUT3: third “Out of school” episode
- EP_HIGH3_FT: third episode of attending higher education as full time student
- EP_HIGH3_PT: third episode of attending higher education as part time student
While there are up to 12 possible paths for each finally achieved education level, we currently model only three paths per level. Their probabilities are derived by simply assuming that all paths leading to a given education outcome are equally likely.
For simplicity we also use only the states EP_LOW, EP_MED_GEN and EP_HIGH1_FT, corresponding to time spent in low, medium and high education. To derive the number of years spent in each of these states, we use information from the EUROSTAT Labour Force Survey on the highest educational level attained and the age at which it was attained, and derive the distribution of the age at completion for each education level (low, medium and high). For simplicity we evaluate the age at completion at three points of this distribution: the median and the 33rd and 66th percentiles. This gives three possible completion ages for each level of education. Assuming that pupils enter the school system at age 6, these ages translate into three possible values for the number of years spent in the education system for each level of education.
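The two simplifications above, equal path probabilities and completion age minus school entry age, can be sketched in Python as follows. This is an illustration only, not part of the STATA scripts. The function names, the example ages and the example share are made up. The lower bound of 14 on the completion age is taken from the plausibility correction in the script below.

```python
SCHOOL_ENTRY_AGE = 6      # stated in the text
MIN_COMPLETION_AGE = 14   # plausibility floor applied in the do-file


def years_in_education(completion_ages):
    """Turn completion ages (33rd percentile, median, 66th percentile) into years in education."""
    return [max(age, MIN_COMPLETION_AGE) - SCHOOL_ENTRY_AGE for age in completion_ages]


def path_probabilities(level_share, n_paths):
    """Split the population share of one education level equally over its modeled paths."""
    return [level_share / n_paths] * n_paths


# Made-up completion ages for one education level and a made-up share of 30%
years = years_in_education([13, 15, 16])
probs = path_probabilities(0.3, 3)
print(years, probs)
```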
In a final step we split years spent in education into years spent in different education states.
- For those who achieve low education, the number of years spent in EP_LOW is simply one of the three possible numbers of years spent in education.
- For those achieving medium education, we deduct the average value of EP_LOW from their total years spent in education to derive the number of years spent in medium education (EP_MED_GEN).
- For those achieving high education, we again assume that they spend the average number of years in low education before advancing to medium and high education. The number of years spent in medium education is the mean number of years spent in medium education by those who end up with medium education. Deducting the mean numbers of years spent in low and in medium education from the total gives the number of years spent in high education (EP_HIGH1_FT).
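The three rules above can be written as a short Python sketch. It is an illustration under made-up numbers, not part of the STATA scripts, and the function name is hypothetical. Each pattern is returned as a tuple (years in EP_LOW, years in EP_MED_GEN, years in EP_HIGH1_FT) and rounded at the end, as in the script.

```python
def split_into_states(total_low, total_med, total_high):
    """Split total years in education into years per state for each path.

    Each argument is a list with the total years in education of the three
    paths that end in low, medium and high education respectively.
    """
    mean_low = sum(total_low) / len(total_low)
    # Medium paths: average low duration first, the remainder is medium education
    med_years = [total - mean_low for total in total_med]
    mean_med = sum(med_years) / len(med_years)

    low_paths = [(round(y), 0, 0) for y in total_low]
    med_paths = [(round(mean_low), round(y), 0) for y in med_years]
    # High paths: average low and average medium duration, the remainder is high education
    high_paths = [
        (round(mean_low), round(mean_med), round(total - mean_low - mean_med))
        for total in total_high
    ]
    return low_paths, med_paths, high_paths


# Made-up total years in education for the three paths of each level
low_paths, med_paths, high_paths = split_into_states([8, 9, 10], [12, 13, 14], [16, 17, 19])
print(low_paths, med_paths, high_paths)
```

With these inputs the average low duration is 9 years and the average medium duration is 4 years, so a high path with 17 total years spends 9, 4 and 4 years in the three states.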
The parameter file contains the parameters:
- EducPattern[EDUC_LEVEL3][EDUC_PATTERN_RANGE][EDUC_PATTERN] containing the number of years spent in each education state by highest level of education
- EducPatternDist[SEX][EDUC_LEVEL3][EDUC_PATTERN_RANGE] containing the probability of each path for a given finally achieved education level
Data Source: EU-LFS 2014 Quarter 1 - 4
Version: March 2018
Author(s): Tom Horvath
loc Ctry $Geosample
* ------------------------------------------------------------------------------
* 1. IMPORT CSV DATA TO STATA FORMAT
* ------------------------------------------------------------------------------
* Yearly file
* --------------------------------------
/*
foreach country of glo Geosample {
import delimited ///
using "$lfsin/`country'_YEAR_1998_onwards/`country'${lfs_y}_y.csv", ///
clear delimiters(",") varnames(1) asdouble stripquotes(yes)
save "$save/`country'${lfs_y}_y.dta", replace
}
* ------------------------------------------------------------------------------
* 2. GET FILES TOGETHER
* ------------------------------------------------------------------------------
* Append ad hoc module files
* --------------------------------------
clear
set obs 1
foreach country of glo Geosample {
ap using "$save/`country'${lfs_y}_y.dta", nol
rm "$save/`country'${lfs_y}_y.dta"
}
drop if _n == 1
loc file = strtoname("$countries")
save "$save/`file'_${lfs_y}_y.dta", replace
*/
use "$save/lfs_sample_${lfs_y}_y.dta", clear
* recode education variables
*---------------------------
recode hat11lev ///
( 100/200 = 1) ///
( 300/499 = 2) ///
( 500/800 = 3) ///
, gen(last_edu)
cap lab def last_edu_VL 1 "Low" 2 "Med" 3 "High"
lab val last_edu last_edu_VL
lab var last_edu "highest edu"
recode educlevl ///
( 0/2 = 1) ///
( 3/4 = 2) ///
( 5/8 = 3) ///
( 9 = 4) ///
, gen(cur_edu)
cap lab def cur_edu_VL 1 "Low" 2 "Med" 3 "High" 4 "out"
lab val cur_edu cur_edu_VL
lab var cur_edu "current edu"
rename hatyear year_edu_attained
ge age_edu_attained = age - (year - year_edu_attained) if year_edu_attained != 9999
* in education
cap drop inedu
ge inedu = educstat == 1 | educstat == 3
la def inedu_VL 0 "no edu" 1 "yes edu"
la val inedu inedu_VL
* get probabilities
*------------------
ge intwgt=int(1000*coeff)
drop if intwgt == .
drop if last_edu == .
ge female = sex == 2
ge male = sex == 1
keep if age == 32 & intwgt != . & last_edu != 0
bys country female: egen ntot = total(intwgt)
bys country female last_edu: egen nedu = total(intwgt)
ge p_edu = nedu/ntot
sum p_edu
ge p_edu_fem = p_edu if female == 1
ge p_edu_male = p_edu if female == 0
bys country last_edu: egen p_edu_f = min(p_edu_fem)
by country last_edu: egen p_edu_m = min(p_edu_male)
drop p_edu_fem p_edu_male p_edu
* ------------------------------------------------------------------------------
* Prepare 12 paths per education level:
* ------------------------------------------------------------------------------
cap drop age*q
ge ageq1 = 0
ge ageq2 = 0
ge ageq3 = 0
ge ageq4 = 0
ge ageq5 = 0
ge ageq6 = 0
ge ageq7 = 0
ge ageq8 = 0
ge ageq9 = 0
ge ageq10 = 0
ge ageq11 = 0
ge ageq12 = 0
* at the moment only three paths are relevant: age at completion is divided into
* three age ranges
foreach country of glo Geosample {
foreach edu of numlist 1 2 3 {
_pctile age_edu [fw=intwgt] if country == "`country'" & age == 32 & last_edu == `edu', p(33, 50, 66)
replace ageq1 = r(r1) if country == "`country'" & age == 32 & last_edu == `edu'
replace ageq2 = r(r2) if country == "`country'" & age == 32 & last_edu == `edu'
replace ageq3 = r(r3) if country == "`country'" & age == 32 & last_edu == `edu'
replace ageq4 = 0 if country == "`country'" & age == 32 & last_edu == `edu'
replace ageq5 = 0 if country == "`country'" & age == 32 & last_edu == `edu'
replace ageq6 = 0 if country == "`country'" & age == 32 & last_edu == `edu'
replace ageq7 = 0 if country == "`country'" & age == 32 & last_edu == `edu'
replace ageq8 = 0 if country == "`country'" & age == 32 & last_edu == `edu'
replace ageq9 = 0 if country == "`country'" & age == 32 & last_edu == `edu'
replace ageq10 = 0 if country == "`country'" & age == 32 & last_edu == `edu'
replace ageq11 = 0 if country == "`country'" & age == 32 & last_edu == `edu'
replace ageq12 = 0 if country == "`country'" & age == 32 & last_edu == `edu'
}
}
bys country last_edu: keep if _n == 1
keep country last_edu ageq* p_edu*
* generate patterns
reshape long ageq, i(country last_edu)
* correct implausible values
replace ageq = 14 if ageq < 14 & ageq > 0
* ------------------------------------------------------------------------------
* Prepare 13 possible education states per finally achieved education level:
* ------------------------------------------------------------------------------
ge low = 0
ge med_dual = 0
ge med_voc = 0
ge med_gen = 0
ge out1 = 0
ge high1_ft = 0
ge high1_pt = 0
ge out2 = 0
ge high2_ft = 0
ge high2_pt = 0
ge out3 = 0
ge high3_ft = 0
ge high3_pt = 0
sort country last_edu _j
replace low = ageq if last_edu == 1
*replace med_dual = ageq if last_edu == 2
*replace med_voc = ageq if last_edu == 2
replace med_gen = ageq if last_edu == 2
*replace high1_pt = ageq if last_edu == 3
replace high1_ft = ageq if last_edu == 3
*replace high2_pt = ageq if last_edu == 3
*replace high2_ft = ageq if last_edu == 3
*replace high3_pt = ageq if last_edu == 3
*replace high3_ft = ageq if last_edu == 3
* ------------------------------------------------------------------------------
* subtract school starting age from age at completion i.o. to get years of total
* education duration
* ------------------------------------------------------------------------------
replace low = low - 6 if low > 0
*replace med_dual = med_gen - 6 if med_gen > 0
*replace med_voc = med_gen - 6 if med_gen > 0
replace med_gen = med_gen - 6 if med_gen > 0
*replace high1_pt = high1_pt - 6 if high1_pt > 0
replace high1_ft = high1_ft - 6 if high1_ft > 0
*replace high2_pt = high2_pt - 6 if high2_pt > 0
*replace high2_ft = high2_ft - 6 if high2_ft > 0
*replace high3_pt = high3_pt - 6 if high3_pt > 0
*replace high3_ft = high3_ft - 6 if high3_ft > 0
* ------------------------------------------------------------------------------
* for each education level higher than "low" generate "years in low" (only for
* those paths with probability >0 <--> ageq >0)
* ------------------------------------------------------------------------------
cap drop tmp
ge tmp = 0
foreach c of global Geosample{
qui: sum low if low > 0 & country == "`c'"
replace tmp = r(mean) if ageq > 0 & country == "`c'"
}
by country: replace low = tmp if low == 0 & ageq > 0 // every path above low starts with the average duration of low
* subtract years in low from age at completion of medium education levels
replace med_dual = med_dual - low if med_dual > 0 & ageq > 0 // subtract duration of the low phase
replace med_voc = med_voc - low if med_voc > 0 & ageq > 0 // subtract duration of the low phase
replace med_gen = med_gen - low if med_gen > 0 & ageq > 0 // subtract duration of the low phase
* derive mean years spent in medium education:
cap drop tmp
ge tmp = 0
foreach c of global Geosample{
qui: sum med_gen if med_gen > 0 & country == "`c'"
replace tmp = r(mean) if ageq > 0 & country == "`c'"
}
by country: replace med_gen = tmp if med_gen == 0 & high1_ft > 0 & ageq > 0 // subtract minimum duration for PS
replace high1_pt = high1_pt - med_gen - low if high1_pt > 0 & ageq > 0
replace high1_ft = high1_ft - med_gen - low if high1_ft > 0 & ageq > 0
replace high2_pt = high2_pt - med_gen - low if high2_pt > 0 & ageq > 0
replace high2_ft = high2_ft - med_gen - low if high2_ft > 0 & ageq > 0
replace high3_pt = high3_pt - med_gen - low if high3_pt > 0 & ageq > 0
replace high3_ft = high3_ft - med_gen - low if high3_ft > 0 & ageq > 0
cap drop tmp
/* -----------------------------------------------------------------------------
derive probability of each path assuming equal probability for each path for given
education outcome
* ----------------------------------------------------------------------------*/
replace p_edu_f = 0 if ageq == 0 // no probability for those paths not considered
replace p_edu_m = 0 if ageq == 0 // no probability for those paths not considered
ge tmp = 0
replace tmp = 1 if ageq != 0
by country last_edu: egen npaths=total(tmp)
replace p_edu_f = p_edu_f / npaths if npaths>0
replace p_edu_m = p_edu_m / npaths if npaths>0
rename country geo
replace low =round(low)
replace med_dual =round(med_dual)
replace med_voc =round(med_voc)
replace med_gen =round(med_gen)
replace out1 =round(out1)
replace high1_ft =round(high1_ft)
replace high1_pt =round(high1_pt)
replace out2 =round(out2)
replace high2_ft =round(high2_ft)
replace high2_pt =round(high2_pt)
replace out3 =round(out3)
replace high3_ft =round(high3_ft)
replace high3_pt =round(high3_pt)
save "$save/educpattern2dat.dta", replace
********************************************************************************
* PROGRAM FOR WRITING .DAT FILES
cap pr drop pattern2dat
pr de pattern2dat
syntax , GEO(string)
clear all
cap file close _all
loc startyear = $startyear
loc do $dofile
* Create file
tempname file
file open `file' using "$param/`geo'_EducPattern_`startyear'.dat", w replace
* File header
file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
file write `file' "// generated by `do'\EducationPattern.do"
file write `file' _n
file write `file' "parameters" _n
file write `file' "{" _n // Open parameters
** Parameter table for education patterns
file write `file' _tab "//EN Education Pattern" _n
file write `file' _tab "int" _tab "EducPattern[EDUC_LEVEL3][EDUC_PATTERN_RANGE][EDUC_PATTERN] = {" _n // open education pattern table
* Get values for education patterns
use "$save/educpattern2dat.dta", clear
keep if geo == "`geo'" // Country
sort geo last_edu _j
loc edu1 = 1
loc eduMax = last_edu[_N]
display `eduMax'
qui sum _j
loc pathmax = r(max)
display `pathmax'
loc pathmax_min1 = `pathmax' - 1
display `pathmax_min1'
forv a = `edu1' / `eduMax' {
file write `file' _tab "// EL_`a'" _n
file write `file' _tab _tab (low[1]) ", " (med_dual[1]) ", " (med_voc[1]) ", " (med_gen[1]) ", "
file write `file' (out1[1]) ", " (high1_ft[1]) "," (high1_pt[1]) ", "
file write `file' (out2[1]) ", " (high2_ft[1]) "," (high2_pt[1]) ", "
file write `file' (out3[1]) ", " (high3_ft[1]) "," (high3_pt[1]) ", "
forv s = 2 / `pathmax_min1' {
file write `file' (low[`s']) ", " (med_dual[`s']) ", " (med_voc[`s']) ", " (med_gen[`s']) ", "
file write `file' (out1[`s']) ", " (high1_ft[`s']) "," (high1_pt[`s']) ", "
file write `file' (out2[`s']) ", " (high2_ft[`s']) "," (high2_pt[`s']) ", "
file write `file' (out3[`s']) ", " (high3_ft[`s']) "," (high3_pt[`s']) ", "
}
file write `file' (low[`pathmax']) ", " (med_dual[`pathmax']) ", " (med_voc[`pathmax']) ", " (med_gen[`pathmax']) ", "
file write `file' (out1[`pathmax']) ", " (high1_ft[`pathmax']) "," (high1_pt[`pathmax']) ", "
file write `file' (out2[`pathmax']) ", " (high2_ft[`pathmax']) "," (high2_pt[`pathmax']) ", "
file write `file' (out3[`pathmax']) ", " (high3_ft[`pathmax']) "," (high3_pt[`pathmax']) ", " _n
drop if last_edu == `a'
}
file write `file' "};" _n // close education pattern table
file write `file' _tab "int" _tab "SchoolEntryAge = 6; //EN School entry age" _n
file write `file' _tab "double" _tab "StartSchoolYear = 0.66; //EN Start of school year" _n
file write `file' _tab "//EN Education Pattern Distribution" _n
file write `file' _tab "cumrate" _tab "EducPatternDist[SEX][EDUC_LEVEL3][EDUC_PATTERN_RANGE] = {" _n
* Get values for education pattern probabilities (by sex)
use "$save/educpattern2dat.dta", clear
keep if geo == "`geo'" // Country
reshape long p_edu_, i(last_edu _j) j(sex) string
sort sex last_edu _j
tostring last_edu, replace force
replace last_edu = "Low" if last_edu == "1"
replace last_edu = "Medium" if last_edu == "2"
replace last_edu = "High" if last_edu == "3"
replace sex = "FEMALE" if sex == "f"
replace sex = "MALE" if sex == "m"
loc Sex "FEMALE MALE"
loc Edu "Low Medium High"
foreach sex of loc Sex {
foreach edu of loc Edu{
file write `file' _tab "//" "`sex'""_""`edu'" _n
forv i = 1/`pathmax'{
file write `file' _tab _tab (p_edu_[`i']) ","
}
file write `file' _n
drop if last_edu == "`edu'" & sex == "`sex'"
}
}
file write `file' _tab "};" _n // close education pattern distribution table
file write `file' "};" _n // Close parameters
file close _all
end
********************************************************************************
* WRITE .DAT FILES
foreach geo of local Ctry{
pattern2dat , geo("`geo'")
}
cap log c
exit
Emigration.do
Emigration-related parameters stored in Emigration_2010.dat are produced from EUROSTAT data on the total number of emigrants by sex and age and from EUROSTAT population data. We derive emigration rates by 5-year age group and sex. Additionally, the model allows the total number of emigrants to be defined in the parameter EmigrationTotal. This parameter is not relevant in this application and is simply set to zero, since EUROSTAT publishes only total net migration in its population projections. If projections of the total number of emigrants become available, they can easily be added to the parameter file.
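The rate derivation described above can be sketched on a toy dataset. This is a minimal illustration with made-up numbers, not EUROSTAT data; the variable names n_pop and emi_rate are chosen for the example only.

```stata
* Illustrative sketch: emigration rates by 5-year age group (made-up numbers)
clear
input str2 geo str1 sex age n_emi n_pop
"AT" "F" 0 10 1000
"AT" "F" 3 20 1000
"AT" "F" 7  5 2000
"AT" "F" 9 15 2000
end
* map single years of age to 5-year groups (0 = ages 0-4, 1 = ages 5-9, ...)
generate age5part = floor(age/5)
* sum emigrants and residents within each group
collapse (sum) n_emi n_pop, by(geo sex age5part)
* emigration rate = emigrants / resident population
generate emi_rate = n_emi / n_pop
list, noobs
```

In this toy example the first group has 30 emigrants among 2000 residents and the second has 20 among 4000.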
The parameter file contains three parameters:
- EmigrationSettings, which this script sets to ES_AGERATES
- EmigrationRates[SEX][AGE5_PART] gives the emigration rate of the population by sex and 5-year age group
- EmigrationTotal[SIM_YEAR_RANGE] gives the total number of emigrants per year, which is set to zero in this application
Data Source:
- EUROSTAT migr_emi2 containing the total number of emigrants by sex and age
- EUROSTAT demo_pjan containing the size of the resident population by sex and 5-year age group
Version: March 2018
Author(s): Tom Horvath
loc Ctry $Geosample
loc Sex $Sex
loc simstart $startyear
loc yearmax $yearmax
loc agemax $maxage
*gl maxagerange 105
cd $eurostat
clear
eurostatuse migr_emi2, long geo(`Ctry') noerase
keep if agedef == "COMPLET"
drop agedef agedef_label age_label unit_label sex_label geo_label flags
replace age = "Y0" if age == "Y_LT1"
drop if age == "Y_GE100" | age == "TOTAL" | age == "UNK"
replace age = subinstr(age, "Y", "", .)
destring age , replace
rename time year
rename migr_emi2 n_emi
ge age5part=floor(age/5)
replace age5part = 18 if age5part >18
bys geo sex year age5part: egen n_emi5=total(n_emi)
bys geo sex year age5part: keep if _n==1
keep if year>=2014
bys geo sex age5part: egen n_emi_avg=mean(n_emi5)
bys geo sex age5part: keep if _n == 1
save $eurostat\emi_avg.dta, replace
loc Ctry $Geosample
clear
eurostatuse demo_pjan, long geo(`Ctry') noerase
drop unit *_label flags
rename time year
keep if year >=2014 & year < 2019
drop if sex == "T"
replace age = "Y0" if age == "Y_LT1"
drop if age == "Y_GE100"
drop if age == "TOTAL" | age == "UNK" | age == "Y_OPEN"
replace age = subinstr(age, "Y", "", .)
destring age , replace
ge age5part=floor(age/5)
replace age5part = 18 if age5part >18
bys geo sex year age5part: egen n_pop5=total(demo_pjan)
bys geo sex year age5part: keep if _n==1
keep if year>=2014
bys geo sex age5part: egen n_pop_avg=mean(n_pop5)
bys geo sex age5part: keep if _n == 1
mmerge geo sex age5part using $eurostat\emi_avg.dta, type(1:1)
tab _merge
ge emi_share = n_emi_avg / n_pop_avg // ratio of the multi-year averages computed above
sort geo sex age5
keep geo sex age5 emi_share
save $eurostat\emigration_rates.dta, replace
********************************************************************************
* PROGRAM FOR WRITING .DAT FILES
cap pr drop emig2dat
pr de emig2dat
syntax , GEO(string)
clear all
cap file close _all
use "$eurostat\emigration_rates.dta", clear
keep if geo == "`geo'" // Country
ge male = sex == "M"
* Create file
tempname file
loc startyear = $startyear
loc yearmax = $yearmax
loc do $dofile
loc simyears = $yearmax - $startyear +1
qui sum age
loc minage = r(min)+1 //need to start at position 1 not 0
loc age_limit = r(max)+1
file open `file' using "$param/`geo'_Emigration_`startyear'.dat", w replace
* File header
file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
file write `file' "// generated by `do'\Emigration.do"
file write `file' _n
file write `file' "parameters" _n
file write `file' "{" _n // Open parameters
file write `file' _tab "EMIGR_SET" _tab "EmigrationSettings = ES_AGERATES;"_n
file write `file' _tab "//EN Emigration rates by sex and age" _n
file write `file' _tab "double" _tab "EmigrationRates[SEX][AGE5_PART] = {" _n
forvalues s = 0(1)1{
forvalues a = `minage'(1)`age_limit'{
file write `file' _tab (emi_share[`a']) ","
}
file write `file' _n
drop if male == `s'
}
file write `file' _tab "};" _n
file write `file' _tab "//Total number of emigrants" _n
file write `file' _tab "long" _tab "EmigrationTotal[SIM_YEAR_RANGE] = {" _n
file write `file' _tab "(`simyears') 0, "_n
file write `file' _tab "};" _n
file write `file' _tab "}; " _n
file close _all
end
********************************************************************************
* WRITE .DAT FILES
loc Ctry $Geosample
foreach geo of local Ctry{
cap emig2dat , geo("`geo'")
}
FamilyLinks.do
This file produces two parameters relevant for family dynamics, stored in the parameter file FamilyLinks_2010.dat: one gives the probability that children stay with their mother when their parents' partnership dissolves, and one gives the probability that young adults (age 18 to 25) in education leave home.
For the first parameter, we identify lone parents in the starting population file and derive the share of females within this group, which we interpret as the probability that children stay with their mother after their parents' union dissolves.
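As a minimal sketch of this share calculation, assume a toy sample of lone parents with a 0/1 variable female and an integer weight intwgt (the numbers are invented for illustration):

```stata
* Illustrative sketch: weighted share of mothers among lone parents
clear
input female intwgt
1 120
1  80
0  50
end
* weighted totals: all lone parents, and lone mothers only
egen pop = total(intwgt)
egen fem = total(intwgt * female)
* share of lone parents who are mothers: 200 / 250
generate share_mother = fem / pop
display share_mother[1]
```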
For the second parameter, we identify students aged 18 to 25 in the starting population file and determine whether they live with their parents (in which case their family role is “child”). We then run a simple probit model of the probability of living with one’s parents on age and age squared. The predicted probabilities of living at home are converted into age-specific probabilities of leaving home: one minus the predicted value at the youngest age and, at each later age, one minus the ratio of the predicted value to the predicted value at the previous age.
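The step from predicted probabilities of living at home to age-specific probabilities of leaving home can be sketched as follows. The stay_pr values here are hypothetical stand-ins for probit predictions, not estimates from the data.

```stata
* Illustrative sketch: from share living at home to leaving probabilities
clear
input age stay_pr
18 0.90
19 0.81
20 0.648
end
sort age
* at the first age, the leaving probability is the share not living at home
generate p_leave = 1 - stay_pr
* afterwards: share of those at home at the previous age who have left
replace p_leave = 1 - stay_pr/stay_pr[_n-1] if _n > 1
* guard against negative values if the predicted share rises with age
replace p_leave = 0 if p_leave < 0
list, noobs
```

With these inputs the leaving probabilities are 0.1 at ages 18 and 19 and 0.2 at age 20.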
The parameter file contains two parameters:
- ProbStayWithMother containing the probability to stay with the mother after dissolution
- ProbLeaveHome[AGE_18_25] containing the probability to leave home while studying (age 18 to 25)
Data Source:
- starting population file based on EUSILC
Version: October 2020
Author(s): Tom Horvath
use $startpop/startpop.dta, clear
loc Ctry $Geosample
drop if idhh ==.
ge intwgt=round(dwt)
drop if intwgt == .
* Probability of staying with mother after union dissolution
replace idfather = 0 if role < 2
replace idmother = 0 if role < 2
bys country idfamily: egen maxrole = max(role)
by country idfamily: gen partner_in_fam_tmp = idpartner!=0
by country idfamily: egen partner_in_fam = max(partner_in_fam_tmp)
by country idfamily: egen mother_in_family_tmp = max(idmother)
by country idfamily: ge mother_in_family = mother_in_family_tmp != 0
by country idfamily: egen father_in_family_tmp = max(idfather)
by country idfamily: ge father_in_family = father_in_family_tmp != 0
ge sample =0
replace sample = 1 if partner_in_fam == 0 & role != 2 & maxrole == 2 & age>=20 & age<40
ge sample_kids =0
replace sample_kids = 1 if partner_in_fam == 0 & role == 2 & maxrole == 2 //& age>=20 & age<40
tab mother_in_family country [fw = round(intwgt)] if sample_kids == 1 & age<18, nofreq col
tab father_in_family country [fw = round(intwgt)] if sample_kids == 1 & age<18, nofreq col
bys country sample: egen pop = total(intwgt)
bys country sample female: egen fem = total(intwgt)
ge share_mother = .
replace share_mother = fem/pop if sample == 1 & female == 1
tab country, sum(share_mother)
by country: egen share_stay_mother = min(share_mother)
tab country, sum(share_stay_mother)
tab female country [fw = round(intwgt)] if sample == 1, nofreq col
* Probability to leave home (students):
gen student=.
replace student=1 if dec!=0 & age>17 & age<26
gen home=0
replace home=1 if is_child==1 & dec!=0
ge age2 = age*age
tab age country [fw=round(intwgt)] if student == 1 & age>17 & age<26, sum(home) noob nost nof
cap drop stay_pr
ge stay_pr = .
foreach ctry of global Geosample{
probit home age age2 if student == 1 & age>17 & age<26 & country == "`ctry'"
predict stay_pr_`ctry', pr
replace stay_pr = stay_pr_`ctry' if country == "`ctry'"
}
tab age country [fw=round(intwgt)] if student == 1 & age>17 & age<26, sum(home) noob nost nof
tab age country [fw=round(intwgt)] if student == 1 & age>17 & age<26, sum(stay_pr) noob nost nof
keep if student == 1 & age>17 & age<26
bys country age: keep if _n == 1
keep country age stay_pr share_stay_mother
ge p_leave = 1- stay_pr
by country: replace p_leave = 1- stay_pr/stay_pr[_n-1] if _n>1
replace p_leave = 0 if p_leave < 0
save $save/FamilyLinks.dta, replace
********************************************************************************
* PROGRAM FOR WRITING .DAT FILES
cap pr drop links2dat
pr de links2dat
syntax , GEO(string)
cap file close _all
use $save/FamilyLinks.dta, clear
keep if country == "`geo'"
loc startyear $startyear
loc do $dofile
qui sum age
loc minage = r(min)
loc maxage = r(max)
loc anz_age = `maxage'-`minage'+1
display `anz_age'
* Create file
tempname file
*file open `file' using "$param\BaseEduc_AT_2016.dat", w replace
file open `file' using "$param/`geo'_FamilyLinks_`startyear'.dat", w replace
* Header
file write `file' "// generated on `c(current_date)' at `c(current_time)'" _n
file write `file' "// generated by `do'\FamilyLinks.do"
file write `file' _n
file write `file' "parameters" _n
file write `file' "{" _n // Open parameters
file write `file' "//EN Probability to stay with mother after union dissolution" _n
file write `file' "double" _tab "ProbStayWithMother = " (share_stay_mother[1]) ";" _n
file write `file' "//EN Probability to leave home (students)" _n
file write `file' _tab "double" _tab "ProbLeaveHome[AGE_18_25] = { " _n
forvalues i = 1(1)`anz_age' {
file write `file' _tab _tab (p_leave[`i']) ", "
}
file write `file' _n
file write `file' _tab "}; " _n
file write `file' "};" _n
file close _all
end
********************************************************************************
* WRITE .DAT FILES
foreach geo of local Ctry{
links2dat , geo("`geo'")
}
FemalePartnership.do
Here the two parameters contained in the FemalePartnerships_2010.dat file are estimated. The first gives the probability that a woman living with children is in a partnership, depending on her education, her age, and the age of the youngest child in the household. It is estimated with a probit model in which the probability of being in a partnership is explained by the mother's age, a country-specific effect of the mother's education, and the age of the youngest child in the family. The second parameter, for women not living with children, is also estimated with a probit model, in which partnership status is explained by an age polynomial and a country-specific education effect.
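One reason for using model predictions rather than observed shares is that some combinations of characteristics may not occur in the data. The sketch below illustrates the idea on synthetic data: a probit is fitted on the observed cells and then used to predict a cell that is empty. It uses Stata's built-in fillin command to create the missing combination, which is a substitution for the expand and mmerge approach used in the script; the variable name ageGr and all coefficients are invented for the example.

```stata
* Illustrative sketch: predict union probabilities for unobserved cells
clear
set seed 12345
set obs 2000
generate edu   = 1 + floor(3*runiform())
generate ageGr = 1 + floor(6*runiform())
generate inUnion = runiform() < normal(-0.5 + 0.2*ageGr + 0.1*edu)
* drop one cell to mimic a combination that is not observed in the data
drop if edu == 3 & ageGr == 1
* fit the model on the observed data
probit inUnion ageGr i.edu
* add the empty cell back: fillin creates all edu x ageGr combinations
fillin edu ageGr
* predictions are available for every cell, observed or not
predict p, pr
bysort edu ageGr: keep if _n == 1
keep edu ageGr p
list, noobs sepby(edu)
```

The resulting table has one row per education and age group combination, including the cell that had no observations.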
Results are stored in the parameter file as two parameters:
- InUnionProbWithChildren[EDUC_LEVEL3][CHILD_AGEGR][MOTH_AGEGR]
- InUnionProbNoChildren[PART_AGE_RANGE][EDUC_LEVEL3]
Data Source:
- starting population file based on EUSILC
Version: October 2020
Author(s): Tom Horvath
use $startpop/startpop.dta, clear
loc Ctry $Geosample
drop if idhh ==.
rename deh isced
ge intwgt=round(dwt)
drop if intwgt == .
* ---------------------------------
* recode education and sex variable
* ---------------------------------
recode isced ///
( 1/2 = 1) ///
( 3/4 = 2) ///
( 5/6 = 3) ///
, gen(edu)
cap lab def edu_VL 1 "Low" 2 "Med" 3 "High"
la val edu edu_VL
ge fem = sex == 0
replace idmother = 0 if idmother ==.
replace idfather = 0 if idfather ==.
* -----------------------
* Define person in Union:
* -----------------------
ge inUnion = 0
replace inUnion = 1 if idpartner >0 & idpartner !=.
* generate mother_tag:
* --------------------
order country idhh idperson idpartner idfather idmother age
gsort country idhh idmother
by country idhh: ge anymoth = idmother > 0
order country idhh idperson idpartner idfather idmother age anymoth
gsort country idhh anymoth -age // youngest mother at end of list
cap drop multiple_mothers
ge multiple_mothers = 0
by country idhh: replace multiple_mothers = 1 if idmother[_n-1]!= 0 & idmother != idmother[_n-1] & _n > 1
order multiple_mothers
tab multiple_mothers
by country idhh: ge id_mother = idmother[_N] // keep only mother of youngest child in household (other mothers most likely grandmothers)
cap drop is_mother
ge is_mother = 0
by country idhh: replace is_mother = 1 if idperson == id_mother
order country idhh idpartner is_mother idmother age
tab inUnion is_mother
* -------------------------------
* generate age of youngest child
* -------------------------------
sort country idhh age
ge age_youngest = .
by country idhh: replace age_youngest = age[1] if idmother >0 | idfather > 0
by country idhh: replace age_youngest = age_youngest[1]
tab age_youngest is_mother
cap drop childAgeGr
recode age_youngest (0/1 = 1 "0-1") (2/3 = 2 "2-3") (4/5 = 3 "4-5") (6/8 = 4 "6-8") ///
(9/11 = 5 "9-11") (12/14 = 6 "12-14") (15/24 = 7 "15-24") (25/99 = -3), ge(childAgeGr)
tab age_youngest childAge
tab childAge is_mother
replace childAgeGr = -3 if is_mother == 0 | childAgeGr == . // only mothers have children
* ------------------------------------
* reduce sample to persons of interest
* ------------------------------------
keep if fem == 1 // need only women
keep if age >= 15
ge mark = is_mother == 1 & childAgeGr == -3
tab childAge is_mother
recode age (0/19 = 1 "<20") (20/24 = 2 "20-24") (25/29 = 3 "25-29") (30/34 = 4 "30-34") ///
(35/39 = 5 "35-39") (40/99 = 6 "40+"), ge(ownAgeGr)
qui sum ownAgeGr
gl N_Age_Gr = r(max)
display $N_Age_Gr
save "$save\fempart2dat.dta", replace
* ------------------------------------------------------------------------------
* Prepare data for InUnionProbability with children
* ------------------------------------------------------------------------------
* generate blank matrix for all possible ownageXchildAgeXedu combinations:
* ------------------------------------------------------------------------
use "$save\fempart2dat.dta", clear
qui tab country
loc nGeo = r(r) // evaluate r(r) now rather than storing the text "r(r)"
display `nGeo'
scalar a = `nGeo'
qui sum ownAgeGr
loc nAge = r(max)
display `nAge'
qui sum edu
loc nEdu = r(max)
display `nEdu'
qui sum childAgeGr
loc nChild = r(max)
display `nChild'
clear
set obs 1
ge country = "A"
expand a
loc i = 1
foreach geo of glo Geosample {
replace country = "`geo'" if _n == `i'
loc i = `i'+1
}
expand `nAge' // expand to age_groups
bys country: ge ownAgeGr = _n
expand `nEdu' // expand to edu groups
bys country ownAgeGr: ge edu = _n
expand `nChild' // expand to child age groups
bys country ownAgeGr edu: ge childAgeGr = _n
save "$save\blank_withkids.dta", replace
* load prepared data and estimate probabilities for being inUnion:
* ----------------------------------------------------------
use "$save\fempart2dat.dta", clear
keep if is_mother == 1 // keep women living with their children
drop if edu == . | edu == 0 // make sure everyone has education information
sum country edu ownAgeGr childAgeGr inUnion
sort country edu ownAgeGr childAgeGr inUnion
* calculate observed shares inUnion by groups:
* --------------------------------------------
by country edu ownAgeGr childAgeGr: egen ntot = total(intwgt)
by country edu ownAgeGr childAgeGr inUnion: egen totInUnion = total(intwgt)
ge shareUnion = totInUnion/ntot if inUnion == 1
sort country edu ownAgeGr childAgeGr shareUnion
order country edu ownAgeGr childAgeGr shareUnion
by country edu ownAgeGr childAgeGr: replace shareUnion = shareUnion[1]
* merge blank matrix in order to be able to estimate prob of union for those
* combinations not available in data:
* --------------------------------------------------------------------------
mmerge country ownAgeGr edu childAgeGr using "$save\blank_withkids.dta"
tab _merge
ge insample = _merge == 3
ge age2 = ownAgeGr*ownAgeGr
xi: probit inUnion ownAgeGr i.edu*i.country i.childAgeGr*i.country [fw = intwgt] if insample == 1
est store estprob
predict p, pr
order country edu ownAgeGr age2 childAgeGr inUnion ntot totInUnion shareUnion p
sort country _merge
replace inUnion = 1 if missing(inUnion)
* keep only one obs per group:
* ----------------------------
bys country edu ownAgeGr childAgeGr inUnion: keep if _n ==1
bys country edu ownAgeGr childAgeGr: keep if _n == 1
keep country edu ownAge childAge shareUnion p ntot
qui sum ownAge
loc ownAgeMax = r(max)
reshape wide shareUnion p ntot, i(country edu childAge) j(ownAge)
forv i = 1/`ownAgeMax' {
replace shareUnion`i' = 0 if missing(shareUnion`i')
}
drop if childAgeGr == -3
* set implausible values to 0 (no birth before age 15):
* ----------------------------------------------------
by country edu: replace p1 = 0 if childAgeGr > 3 // no Mother <20 child > 5 years
by country edu: replace p2 = 0 if childAgeGr > 5 // no Mother 20-24 child > 9
by country edu: replace p3 = 0 if childAgeGr > 7 // no Mother 25-29 child > 15
order country edu p* share*
drop ntot*
rename country geo
save "$save\fempart_withkids.dta", replace
* ------------------------------------------------------------------------------
* Prepare data for InUnionProbability without children
* ------------------------------------------------------------------------------
* generate blank matrix for all possible age*edu combinations:
* ------------------------------------------------------------------------
use "$save\fempart2dat.dta", clear
qui tab country
loc nGeo = r(r) // evaluate r(r) now rather than storing the text "r(r)"
display `nGeo'
scalar a = `nGeo'
qui sum edu
loc nEdu = r(max)
display `nEdu'
clear
set obs 1
ge country = "A"
expand a
loc i = 1
foreach geo of glo Geosample{
replace country = "`geo'" if _n == `i'
loc i = `i'+1
}
loc agerange = 80 - 15 + 1
display `agerange'
expand `agerange'
bysort country : ge age = 14 + _n
tab age country
expand `nEdu'
bys country age: ge edu = _n
save "$save\blank_nokids.dta", replace
* load prepared data and estimate probabilities for being inUnion:
* ----------------------------------------------------------
use "$save\fempart2dat.dta", clear
keep if is_mother == 0 & age <=80 // keep women not living with children
drop if edu == . | edu == 0 // make sure everyone has education information
sum country edu age inUnion
sort country edu age inUnion
* calculate observed shares inUnion by groups:
* --------------------------------------------
by country edu age: egen ntot = total(intwgt)
by country edu age inUnion: egen totInUnion = total(intwgt)
ge shareUnion = totInUnion/ntot if inUnion == 1
sort country edu age shareUnion
by country edu age: replace shareUnion = shareUnion[1]
* merge blank matrix in order to be able to estimate prob of union for those
* combinations not available in data:
* --------------------------------------------------------------------------
mmerge country edu age using "$save\blank_nokids.dta"
tab _merge
ge insample = _merge == 3
ge age2 = age*age
ge age3 = age2*age
ge age4 = age3*age
xi: probit inUnion age age2 age3 age4 i.edu*i.country [fw = intwgt] if insample == 1
est store estprob
predict p, pr
order country edu age age2 inUnion ntot totInUnion shareUnion p
sort country _merge
replace inUnion = 1 if missing(inUnion)
* keep only one obs per group:
* ----------------------------
bys country edu age inUnion: keep if _n ==1
bys country edu age: keep if _n == 1
keep country edu age shareUnion p ntot
qui sum age
loc ownAgeMax = r(max)
reshape wide shareUnion p ntot, i(country age) j(edu)
rename age tmp
by country: ge age = 14 + _n
order country age p* share*
rename p1 low
rename p2 medium
rename p3 high
drop ntot* tmp
rename country geo
save "$save\fempart_nokids.dta", replace
********************************************************************************
* PROGRAM FOR WRITING .DAT FILES
cap pr drop fempartner2dat
pr de fempartner2dat
syntax , GEO(string)
clear all
cap file close _all
loc startyear $startyear
loc do $dofile
* Create file
tempname file
file open `file' using "$param/`geo'_FemalePartnerships_`startyear'.dat", w replace
* File header
file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
file write `file' "// generated by `do'\FemalePartnership.do"
file write `file' _n
file write `file' "parameters" _n
file write `file' "{" _n // Open parameters
** Parameter table for partnership probabilities of females living with children
file write `file' _tab "//EN Probability to be in a partnership - Females living with children" _n
file write `file' _tab "double" _tab "InUnionProbWithChildren[EDUC_LEVEL3][CHILD_AGEGR][MOTH_AGEGR] = {" _n // open fem with kids table
* Get values for partnerships with kids
use "$save\fempart_withkids.dta", clear
keep if geo == "`geo'" // Country
sort geo edu childAgeGr
qui sum edu
loc eduMax = r(max)
display `eduMax'
loc eduMax_low = `eduMax' - 1
qui sum childAgeGr
loc nChild = r(max)
display `nChild'
loc nChildLow = `nChild' - 1
loc ageMax = $N_Age_Gr
loc ageMax1 = `ageMax' - 1
display `ageMax'
forv edu = 1 / `eduMax' {
file write `file' _tab _tab "// EN Edu `edu'" _n
forv cage = 1 / `nChild' {
file write `file' _tab _tab (p1[`cage']) ", "
forv mage = 2 / `ageMax1' {
file write `file' (p`mage'[`cage']) ", "
}
file write `file' (p`ageMax'[`cage']) "," _n
}
drop if edu == `edu'
}
file write `file' _tab "};" _n // close fem with kids table
file write `file' _n
file write `file' _tab "//EN Probability to be in a partnership - Females not living with children" _n
file write `file' _tab "double" _tab "InUnionProbNoChildren[PART_AGE_RANGE][EDUC_LEVEL3] = {" _n
use "$save\fempart_nokids.dta", clear
keep if geo == "`geo'"
loc max = _N
display `max'
forv a = 1/ `max'{
file write `file' _tab _tab (low[`a']) ","
file write `file' (medium[`a']) ","
file write `file' (high[`a']) "," _n
}
file write `file' _tab "};" _n // close fem with no kids table
file write `file' "};" _n // Close parameters
file close _all
end
********************************************************************************
* WRITE .DAT FILES
********************************************************************************
foreach geo of local Ctry{
fempartner2dat, geo("`geo'")
}
cap log c
exit
Immigration.do
This file produces the parameters contained in Immigration_2010.dat. The first parameter gives the age distribution of newly arriving immigrants, based on EUROSTAT data. The second parameter gives the age distribution of mothers at birth, calculated from the starting population file.
The last parameter contains the total number of immigrants per year of the simulation. This parameter is not relevant in this application and is simply set to zero, since EUROSTAT publishes only total net migration in its population projections. If projections of the total number of immigrants become available, they can easily be added to the parameter file.
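The age distribution of immigrants is a set of shares that add up to one within each sex. A minimal sketch with invented counts (three ages only, for brevity) shows the calculation and a simple consistency check; the variable name check is hypothetical.

```stata
* Illustrative sketch: age distribution of immigrants by sex (made-up counts)
clear
input str1 sex age n_immi
"F" 0  50
"F" 1  30
"F" 2  20
"M" 0  40
"M" 1  40
"M" 2 120
end
* total immigrants per sex and each age's share of that total
bysort sex: egen n_tot = total(n_immi)
generate immi_share = n_immi / n_tot
* consistency check: shares add up to one within each sex
bysort sex: egen check = total(immi_share)
assert abs(check - 1) < 1e-6
list sex age immi_share, noobs sepby(sex)
```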
Currently the parameter file contains three parameters:
- AgeOfImmigrantMother[FERTILE_AGE_RANGE] giving the age distribution of mothers at birth
- ImmigrationAgeSexAll[SEX][AGE_RANGE] giving the age distribution of immigrants
- ImmigrationTotal[SIM_YEAR_RANGE] gives the total number of immigrants per year which is set to zero in this application
Data Source:
- starting population file based on EUSILC
- EUROSTAT migr_imm8
Version: March 2019
Author(s): Tom Horvath
loc Ctry $Geosample
loc Sex $Sex
loc simstart $startyear
loc yearmax $yearmax
loc agemax $maxage
*gl maxagerange 105
*Prepare data for eurostat based parameters
cd $eurostat
clear
eurostatuse migr_imm8, long geo(`Ctry') noerase
keep if agedef == "COMPLET"
drop agedef agedef_label age_label unit_label sex_label geo_label flags_migr_imm8
replace age = "Y0" if age == "Y_LT1"
drop if age == "Y_GE100"
drop if age == "TOTAL" | age == "UNK"
replace age = subinstr(age, "Y", "", .)
destring age , replace
rename time year
rename migr_imm8 n_immi
drop if sex == "T"
keep if year >= $startyear & year < 2019
sort geo year sex age
bys geo sex age: egen n_mig_avg = mean(n_immi)
by geo sex age: keep if _n == 1
by geo sex: egen n_tot = total(n_mig_avg)
ge immi_share = n_mig_avg/n_tot
sort geo sex age
qui sum age
local agelimit = r(max)
local nexp = $maxage - `agelimit' +1
display `nexp'
expand `nexp' if age == `agelimit'
sort geo sex age
by geo sex: replace age = age[_n-1] +1 if _n > `agelimit'
replace immi_share = 0 if age > `agelimit'
save $eurostat\immi_shares.dta, replace
*Prepare data for eusilc based parameters (starting population file)
use $startpop/startpop.dta, clear
* mark families with children under 18 who live with their mother
cap drop is_mother
ge is_mother = 0
gsort country idfamily -idmother
by country idfamily: replace is_mother = 1 if idperson == idmother[1]
ge mother_tag = is_mother //& foreign_born
ge child_tag = age < 18 & is_child
bys country idfamily: egen fam_with_mother = max(mother_tag)
by country idfamily: egen child_0_18 = max(child_tag)
ge sample = fam_with_mother == 1 & child_0_18 == 1
* derive age of mother at birth
ge mother_age_tmp = age if is_mother == 1
bys country idfamily: egen mother_age = min(mother_age_tmp)
keep if mother_age <= 41
ge mother_age_at_birth = mother_age - age if child_tag == 1
order country idfamily sample mother_age* age fam_with_mother child_0_*
keep if child_tag == 1 & mother_age_at_birth >= 16 & mother_age_at_birth <= 40
keep if sample == 1
bys country: egen pop = total(dwt)
bys country mother_age_at_birth: egen age_gr = total(dwt)
by country mother_age_at_birth: keep if _n == 1
ge share_age = age_gr / pop
keep country mother_age_at_birth share_age
save $save/age_immi_mother.dta, replace
********************************************************************************
* PROGRAM FOR WRITING .DAT FILES
cap pr drop immig2dat
pr de immig2dat
syntax , GEO(string)
clear all
cap file close _all
* Create file
tempname file
loc startyear = $startyear
loc yearmax = $yearmax
loc simyears = $yearmax - $startyear +1
file open `file' using "$param/`geo'_Immigration_`startyear'.dat", w replace
* File header
file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
file write `file' "// generated by Immigration.do"
file write `file' _n
file write `file' "parameters" _n
file write `file' "{" _n // Open parameters
file write `file' _tab "//EN Age distribution of mothers at birth" _n
file write `file' _tab "cumrate" _tab "AgeOfImmigrantMother[FERTILE_AGE_RANGE] = {" _n
use $save/age_immi_mother.dta, clear
keep if country == "`geo'"
qui sum mother_age
loc minage = r(min) //16
loc maxage = r(max) //40
loc anz_age = 50 - `minage' + 1 // 49-16+1
display `anz_age'
forvalues i = 1(1)`anz_age' {
if `minage' + `i' - 1 <= `maxage' {
file write `file' (share_age[`i']) ","
}
else if `minage' + `i' - 1 > `maxage'{
file write `file' "0,"
}
}
file write `file' _tab "};" _n
use "$eurostat\immi_shares.dta", clear
keep if geo == "`geo'" // Country
ge male = sex == "M"
qui sum age
loc minage = r(min)+1 //need to start at position 1 not 0
loc age_limit = r(max)+1
file write `file' _tab "//EN Age-Sex distribution of immigrants" _n
file write `file' _tab "double" _tab "ImmigrationAgeSexAll[SEX][AGE_RANGE] = {" _n
forvalues s = 0(1)1{
forvalues a = `minage'(1)`age_limit'{
file write `file' _tab (immi_share[`a']) ","
}
file write `file' _n
file write `file' _n
drop if male == `s'
}
file write `file' _tab "};" _n
file write `file' _tab "//Total number of immigrants" _n
file write `file' _tab "long" _tab "ImmigrationTotal[SIM_YEAR_RANGE] = {" _n
file write `file' _tab "(`simyears') 0, "_n
file write `file' _tab "};" _n
file write `file' _tab "}; " _n
file close _all
end
********************************************************************************
* WRITE .DAT FILES
loc Ctry $Geosample
foreach geo of local Ctry{
cap immig2dat , geo("`geo'")
}
MaleChildlessness.do
This file produces a parameter for male childlessness by birth year and education level, stored in the parameter file MaleChildlessness_2010.dat. The share of males remaining childless is calculated from SHARE data for AT and ES and from the cohort fertility database for FI.
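The share of childless men by education level is a weighted mean of a 0/1 childlessness indicator. The sketch below shows this on invented data using collapse with analytic weights, which is a compact alternative to computing the weighted totals separately; the weight variable w is hypothetical.

```stata
* Illustrative sketch: weighted share of childless men by education
clear
input educ childless w
0 1 10
0 0 30
1 1  5
1 0 45
end
* weighted mean of a 0/1 indicator = weighted share of childless men
collapse (mean) share_childless = childless [aweight = w], by(educ)
list, noobs
```

Here the share is 10/40 for the first education group and 5/50 for the second.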
Currently the parameter file contains one parameter:
- MaleCohortChildlessness[YOB_MALEFERT][BIRTH1_GROUP]
Data Source:
- SHARE data for AT and ES
- Cohort fertility database for FI http://www.cfe-database.org/database/
Version: March 2019
Author(s): Tom Horvath
clear all
loc Ctry $Geosample
cd $share_input
use weltransim_share, clear
***********************************************************
* Countries
***********************************************************
tab country
ge geo = substr(mergeid,1,2)
ge keep = 0
foreach geo of loc Ctry {
replace keep = 1 if geo == "`geo'"
}
keep if keep == 1
***********************************************************
* Men only
***********************************************************
keep if gender == 1
***********************************************************
* Create socio-demographic variables
***********************************************************
// Childless
g byte childless = ch001_ == 0
// Education
ren isced1997_r isced97
drop if isced97 < 0 | mi(isced97)
drop if isced97 == 95 // still in school
drop if isced97 == 97 // other
cap drop educ
recode isced97 (0/2=0) (3/4=1) (5/6=2), gen(educ)
la de educ 0 "low" 1 "medium" 2 "high"
la val educ educ
la var educ "education"
// Age group
drop if int_year < 0 | mi(int_year)
drop if yrbirth < 0 | mi(yrbirth)
g int age = int_year - yrbirth
drop if age < 0 | mi(age)
egen age_group = cut( age ), at( 0, 55, 60, 65, 70, 75, 80, 150 ) icodes
la de age_group 0 "0-54" 1 "55-59" 2 "60-64" 3 "65-69" 4 "70-74" 5 "75-79" 6 "80+"
la val age_group age_group
***********************************************************
* Share of childless men by education
***********************************************************
table age_group educ, by(country)
table age_group educ [w=int(cciw_w4)], c(mean childless) by(country)
table educ [w=int(cciw_w4)] if age>=40, c(mean childless) by(country)
tab country educ [w=int(cciw_w4)], sum(childless) noobs nost nof
keep if age >= 40 & age <= 65
bys country educ: egen pop = total(cciw_w4)
bys country educ childless: egen tmp = total(cciw_w4)
keep if childless == 1
ge share_childless = tmp/pop
bys country educ: keep if _n==1
keep geo educ share_childless
rename geo country
save $save\male_childlessness.dta, replace
/* #############################################################################
Finland
############################################################################# */
import delimited using "$cfe\Finland_Population Register 2015.csv", varnames(1) clear
destring cohort, ge(birthyear) force
keep if sex == "M"
qui sum birthyear
loc last_cohort = r(max)
drop if birthyear > 2015 - 40
drop if birthyear < 2015 - 65
drop if birthyear == .
keep if origin == "Total" // do not distinguish natives and foreign-born here
ge educ = 0
replace educ = 1 if edu == "ISCED3A-4A"
replace educ = 2 if edu == "ISCED5B-6"
keep birthyear educ women_total parity_0
collapse (sum) women_total parity_0, by(educ)
ge share_childless = parity_0/women_total
ge country = "FI"
save $save\male_childlessness_FI.dta, replace
use $save\male_childlessness.dta, clear
append using $save\male_childlessness_FI.dta
save $save\male_childlessness.dta, replace
********************************************************************************
* PROGRAM FOR WRITING .DAT FILES
cap pr drop maleChild2dat
pr de maleChild2dat
syntax , GEO(string)
cap file close _all
use $save\male_childlessness.dta, clear
keep if country == "`geo'"
loc startyear $startyear
loc do $dofile
qui sum educ
loc minedu = r(min)
loc maxedu = r(max)
loc anz_edu = `maxedu'-`minedu'+1
display `anz_edu'
loc cohort_l $YOB_MALEFERT_L
loc cohort_u $YOB_MALEFERT_U
loc anz = `cohort_u' - `cohort_l' +1
display `anz'
* Create file
tempname file
*file open `file' using "$param\BaseEduc_AT_2016.dat", w replace
file open `file' using "$param/`geo'_MaleChildlessness_`startyear'.dat", w replace
* Header
file write `file' "// generated on `c(current_date)' at `c(current_time)'" _n
file write `file' "// generated by `do'\MaleChildlessness.do"
file write `file' _n
file write `file' "parameters" _n
file write `file' "{" _n // Open parameters
file write `file' "//EN Male cohort childlessness" _n
file write `file' _tab "double" _tab "MaleCohortChildlessness[YOB_MALEFERT][BIRTH1_GROUP] = {" _n
file write `file' _tab _tab "(`anz') {"
forvalues i = 1(1)`anz_edu' {
file write `file'(share_childless[`i']) ", "
}
file write `file' _tab "}, " _n
file write `file' _tab "}; " _n
file write `file' _n
file write `file' "};" _n
file close _all
end
********************************************************************************
* WRITE .DAT FILES
loc Ctry $Geosample
foreach geo of local Ctry{
maleChild2dat , geo("`geo'")
}
/*
parameters {
//EN Male cohort childlessness
double MaleCohortChildlessness[YOB_MALEFERT][BIRTH1_GROUP] = {
(131) {
0.18, 0.14, 0.1,
},
};
};
*/
NetMigration.do¶
Data from EUROSTAT population projections are used to produce the parameters on net migration by sex, age and simulation year, stored in the parameterfile NetMigration_2010.dat.
The parameterfile contains two parameters:
- MIGRATION_SETTINGS: model choice parameter
- NetMigrationSexPeriodAge[SEX][SIM_YEAR_RANGE][AGE_RANGE]: total number of net migrants by sex, age and simulation year (2010 to 2150)
Data Source:
- AT, ES and FI: Eurostat population projections (proj_19nanmig) and historic data (2010 to 2018) from EUROSTAT data on realized net migration (migr_imm8)
- UK: https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationprojections/bulletins/nationalpopulationprojections/2018based
Version: Oct. 2020
Author(s): Tom Horvath
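Two steps of this script lend themselves to a small illustration: net migration is computed as immigration minus emigration per sex and age cell, and for the UK a projected total is distributed over the cells using the cell shares of a reference year. The sketch below uses made-up counts and a hypothetical projected total of 1000; it is not the official ONS figure and only mirrors the logic.

```stata
* Toy data: one country and one year, two sexes x two ages (made-up counts)
clear
input str1 sex byte age double n_immi double n_emi
"F" 0 120 100
"F" 1 300 250
"M" 0 110 100
"M" 1 400 380
end

* Net migration per cell = immigrants minus emigrants
gen double net_mig = n_immi - n_emi

* Share of each sex-age cell in total net migration of the reference year
egen double tot_mig = total(net_mig)
gen double sh_mig = net_mig / tot_mig

* Distribute a hypothetical projected total over the cells
* (1000 is an assumption for illustration only)
local projected_total = 1000
gen double net_mig_proj = round(sh_mig * `projected_total')
list, noobs

* Self-check: net cells are 20, 50, 10, 20, so shares are 0.2, 0.5, 0.1, 0.2
assert net_mig_proj == 200 if sex == "F" & age == 0
assert net_mig_proj == 500 if sex == "F" & age == 1
```

Because each cell is rounded separately, the projected cells need not add up exactly to the projected total in real data.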
loc Ctry $Geosample
loc Sex $Sex
loc simstart $startyear
loc yearmax $yearmax
loc agemax $maxage
*gl maxagerange 105
cd $eurostat
********************************************************************************
* Load net migration DATA
********************************************************************************
clear all
eurostatuse proj_19nanmig, long noerase
keep if geo == "AT" | geo == "ES" | geo == "FI" // UK is not available in proj_19nanmig
drop projection_label unit unit_label sex_label age_label geo_label flags_proj_19nanmig
keep if projection == "BSL"
replace age = "Y0" if age == "Y_LT1"
drop if age == "Y_GE100"
drop if age == "TOTAL"
replace age = subinstr(age, "Y", "", .)
destring age , replace
rename time year
drop if sex == "T"
rename proj_19nanmig net_mig
save $eurostat\net_mig2019.dta, replace
clear
eurostatuse migr_imm8, long geo(`Ctry') noerase
keep if agedef == "COMPLET"
drop agedef agedef_label age_label unit_label sex_label geo_label flags_migr_imm8
keep if geo == "AT" | geo == "ES" | geo == "FI" |geo == "UK" // UK is included here
replace age = "Y0" if age == "Y_LT1"
drop if age == "Y_GE100"
drop if age == "TOTAL" | age == "UNK"
replace age = subinstr(age, "Y", "", .)
destring age , replace
rename time year
rename migr_imm8 n_immi
drop if sex == "T"
keep if year >= $startyear & year < 2019
sort geo year sex age
save $eurostat\immi_upto_2019.dta, replace
clear
eurostatuse migr_emi2, long geo(`Ctry') noerase
keep if agedef == "COMPLET"
drop agedef agedef_label age_label unit_label sex_label geo_label flags
keep if geo == "AT" | geo == "ES" | geo == "FI" |geo == "UK" // UK is included here
replace age = "Y0" if age == "Y_LT1"
drop if age == "Y_GE100"
drop if age == "TOTAL" | age == "UNK"
replace age = subinstr(age, "Y", "", .)
destring age , replace
rename time year
rename migr_emi2 n_emi
drop if sex == "T"
keep if year >= $startyear & year < 2019
sort geo year sex age
save $eurostat\emi_upto_2019.dta, replace
mmerge geo year sex age using $eurostat\immi_upto_2019.dta, type(1:1)
ge net_mig = n_immi-n_emi
save $eurostat\net_mig_upto_2019.dta, replace
use $eurostat\net_mig_upto_2019.dta, clear
append using $eurostat\net_mig2019.dta
replace net_mig = 0 if missing(net_mig)
* expand data to last simyear
qui sum year
loc ylow = r(min)
loc ymax = r(max)
loc nexpand = $yearmax -`ymax' +1
display `nexpand'
loc yearchange = `ymax' - `ylow' + 1
display `yearchange'
expand `nexpand' if year == `ymax'
sort geo sex age year
by geo sex age: replace year = year[_n-1]+1 if _n> `yearchange'
replace net_mig = 0 if year > `ymax' | missing(net_mig)
sort geo sex year age
save $eurostat\net_mig.dta, replace
use $eurostat\net_mig.dta, clear
bys geo year: egen tot_mig = total(net_mig)
ge sh_mig = net_mig/tot_mig
replace sh_mig =. if year != 2018
bys geo sex age: egen sh_mig_hyp = min(sh_mig)
loc nexp = $yearmax - 2018 + 1
expand `nexp' if geo == "UK" & year == 2018
sort geo sex age year
by geo sex age: replace year = year[_n-1] + 1 if year == 2018
replace net_mig = round(sh_mig_hyp * 265000) if year == 2019 & geo == "UK"
replace net_mig = round(sh_mig_hyp * 255000) if year == 2020 & geo == "UK"
replace net_mig = round(sh_mig_hyp * 236000) if year == 2021 & geo == "UK"
replace net_mig = round(sh_mig_hyp * 224000) if year == 2022 & geo == "UK"
replace net_mig = round(sh_mig_hyp * 213000) if year == 2023 & geo == "UK"
replace net_mig = round(sh_mig_hyp * 201000) if year == 2024 & geo == "UK"
replace net_mig = round(sh_mig_hyp * 190000) if year >= 2025 & geo == "UK"
sort geo sex year age
save $eurostat\net_mig.dta, replace
********************************************************************************
* PROGRAM FOR WRITING .DAT FILES
cap pr drop netmig2dat
pr de netmig2dat
syntax , GEO(string)
clear all
cap file close _all
use "$eurostat\net_mig.dta", clear
keep if geo == "`geo'" // Country
ge male = sex == "M"
* Create file
tempname file
loc startyear = $startyear
loc yearmax = $yearmax
loc do $dofile
qui sum age
loc minage = r(min)
loc age_limit = r(max)
loc agemax $maxage
file open `file' using "$param/`geo'_Netmigration_`startyear'.dat", w replace
* File header
file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
file write `file' "// generated by `do'\NetMigration.do"
file write `file' _n
file write `file' "parameters" _n
file write `file' "{" _n // Open parameters
file write `file' _tab "//EN Migration Settings" _n
file write `file' _tab "MIGRATION_SETTINGS" _tab "MigrationSettings = MSE_NET;" _n
file write `file' _tab "//EN Net migration by age and sex" _n
file write `file' _tab "double" _tab "NetMigrationSexPeriodAge[SEX][SIM_YEAR_RANGE][AGE_RANGE] = {" _n // open net migration table
forvalues s = 0(1)1{
forvalues t = `startyear'(1)`yearmax'{
forvalues i = 0(1)`agemax' {
if `i' <= `age_limit' {
loc j = `i' + 1
file write `file' (net_mig[`j']) ", "
}
else if `i' > `age_limit' {
file write `file' "0 , "
}
}
file write `file' _n
drop if year == `t' & male == `s'
}
file write `file' _n
file write `file' _n
}
file write `file' _tab "}; " _n
file write `file' _tab "};" _n
file close _all
end
********************************************************************************
* WRITE .DAT FILES
loc Ctry $Geosample
foreach geo of local Ctry{
cap netmig2dat , geo("`geo'")
}
PartnerMatching.do¶
Based on the starting population file, two parameters stored in PartnerMatching_2010.dat are produced: first, the distribution of partners’ education levels by own education, and second, the age distribution of females’ partners depending on the females’ own age.
While the first is simply taken from a cross tabulation of education levels of couples in which the female partner is aged 25 to 35, as observed in the starting population file, the latter is derived by smoothing the observed age distributions of females’ partners, assuming a normal distribution of partners’ age with mean and standard deviation as observed in the data. To achieve smoother distributions, these calculations are based on 5-year age groups before the parameter is disaggregated to the single-year age categories used in the model parameter.
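The smoothing idea can be sketched for a single female age group: evaluate the normal CDF at each possible partner age and take differences of adjacent values to obtain the probability mass per single year of age. The mean of 30 and standard deviation of 5 are hypothetical, and the sketch uses a long layout and skips the rounding to 0.001 that the script applies.

```stata
* Hypothetical moments of partners' age for one female age group
local m_age  = 30
local sd_age = 5

* One row per possible partner age (15 to 105)
clear
set obs 91
gen int partner_age = 14 + _n

* Cumulative normal probability up to each partner age
gen double cdf = normal((partner_age - `m_age') / `sd_age')

* Probability mass of a single year of age = difference of adjacent CDF values;
* the first age simply takes the CDF value itself
gen double prob = cdf - cdf[_n-1]
replace prob = cdf in 1

* Self-checks: masses are non-negative and add up to the CDF at the top age
assert prob >= 0
quietly summarize prob
assert abs(r(sum) - cdf[_N]) < 1e-10
```

The script stores the same quantities in wide form (one variable age_cor_15 to age_cor_105 per partner age), which matches the row-by-row layout needed when writing the parameter table.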
The parameterfile contains two parameters:
- PartnerEducation[EDUC_LEVEL3][EDUC_LEVEL3] giving the education distribution of partners by females’ highest level of education
- PartnerAgeDistribution[PART_AGE_RANGE][MALE_PART_AGE_RANGE] giving the age distribution of females’ partners by own age
Data Source:
- starting population file based on EU-SILC data
Version: Oct. 2020
Author(s): Tom Horvath
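The education parameter is a row-wise share table: within each level of the woman's own education, the weighted shares of partners' education levels sum to one. A minimal sketch on made-up, already aggregated toy data (one row per combination; the variable name wgt and all numbers are assumptions):

```stata
* Toy couples: woman's education, partner's education, weight (made up)
clear
input byte edu byte edu_spouse double wgt
1 1 30
1 2 10
2 1 10
2 2 40
2 3 50
end

* Weighted number of couples per own education level
bysort edu: egen double totnum = total(wgt)
* Share of each partner education level within own education level
gen double edu_share = wgt / totnum
list, noobs

* Self-check: shares sum to one within each own education level
bysort edu: egen double check = total(edu_share)
assert abs(check - 1) < 1e-8
```

In the toy data the combination edu 1 with edu_spouse 3 has no row. The writing program below indexes edu_share by position, so it appears to rely on every combination being present in the real data.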
use $startpop/startpop.dta, clear
loc Ctry $Geosample
rename deh isced
ge intwgt=round(dwt)
drop if intwgt == .
drop if age <= 15
keep if idpartner != 0 & idpartner != . // keep couples only
tab country
* recode education and sex variable
recode isced ///
( 1/2 = 1) ///
( 3/4 = 2) ///
( 5/6 = 3) ///
, gen(edu)
cap lab def edu_VL 1 "Low" 2 "Med" 3 "High"
la val edu edu_VL
ge fem = sex == 0
ge edu_spouse = 0
ge age_spouse = 0
order country idhh idperson fem edu edu_spouse idpartner
gsort country idhh -fem
bys country idhh: replace edu_spouse = edu[_n+1] if _n < _N & idpartner == idperson[_n+1]
bys country idhh: replace edu_spouse = edu[_n+2] if _n + 1 <_N & idpartner == idperson[_n+2]
bys country idhh: replace edu_spouse = edu[_n+3] if _n + 2 <_N & idpartner == idperson[_n+3]
bys country idhh: replace edu_spouse = edu[_n+4] if _n + 3 <_N & idpartner == idperson[_n+4]
la val edu_spouse edu_VL
bys country idhh: replace age_spouse = age[_n+1] if _n < _N & idpartner == idperson[_n+1]
bys country idhh: replace age_spouse = age[_n+2] if _n + 1 < _N & idpartner == idperson[_n+2]
bys country idhh: replace age_spouse = age[_n+3] if _n + 2 < _N & idpartner == idperson[_n+3]
bys country idhh: replace age_spouse = age[_n+4] if _n + 3 < _N & idpartner == idperson[_n+4]
la val age_spouse age_VL
keep if fem == 1 & age_spouse != 0
save "$save\partner2dat.dta", replace
* -----------------------------------------------------------------
* Generate dataset for educational distribution of females' spouses
* -----------------------------------------------------------------
use "$save\partner2dat.dta", clear
keep if edu != 0 & edu_spouse != 0 & age >= 25 & age <= 35
drop if edu ==. | edu_spouse == .
rename country geo
sort geo edu edu_spouse
by geo edu: egen totnum = total(intwgt)
by geo edu edu_spouse: egen totedu = total(intwgt)
ge edu_share = totedu / totnum
by geo edu edu_spouse: keep if _n == 1
keep geo edu edu_spouse edu_share
save "$save\partnerEdu2dat.dta", replace
* -----------------------------------------------------------------
* Generate dataset for females' spouses age distribution
* --> normal distribution of spouse's age by mean and standard deviation of
* spouses age observed in data
* -----------------------------------------------------------------
foreach geo of glo Geosample {
use "$save\partner2dat.dta", clear
recode age (15/19 = 17) (20/24 = 22) (25/29 = 27) (30/34 = 32) (35/39 = 37) ///
(40/44 = 42) (45/49 = 47) (50/54 = 52) (55/59 = 57) (60/64 = 62) (65/69 = 67) ///
(70/74 = 72) (75/79 = 77) (80/84 = 82) (85/89 = 87), ge(age5)
keep if country == "`geo'"
bys country age5: egen m_age = mean(age_spouse)
bys country age5: egen sd_age = sd(age_spouse)
bys country age5: keep if _n == 1
keep country *age*
by country: replace m_age = m_age[_n+1] - 3 if age5 == 17 & m_age > m_age[_n+1] // some unlikely values here
by country: replace sd_age = 5 if age5 == 17 & sd_age > 9 // some unlikely values here
by country: replace sd_age = 5 if age5 == 22 & sd_age > 9 // some unlikely values here
expand 5
sort age5
ge agey = 14 + _n // age of females in 1-year categories
by age5: ge dif_parameter = _n - 3
replace m_age = m_age + dif_parameter
drop dif_parameter
cap drop age_*
forv a = 15/105{
ge age_`a' = 0
replace age_`a' = round(normal((`a'-m_age)/sd_age),0.001)
}
ge age_cor_15 = age_15
forv a = 16/105{
loc amin = `a'-1
ge age_cor_`a' = 0
replace age_cor_`a' = age_`a' - age_`amin'
}
replace age_cor_15 = age_cor_16 if age_cor_15 > age_cor_16
forv a = 15 / 104{
loc aplus = `a'+1
replace age_cor_`a' = 0 if age_cor_`aplus' == 0
}
egen check = rowtotal(age_cor_*)
sum check
keep agey age_cor*
keep if agey <=80
save "$save\partner2dat_`geo'.dta", replace
}
********************************************************************************
* PROGRAM FOR WRITING .DAT FILES
cap pr drop partner2dat
pr de partner2dat
syntax , GEO(string)
clear all
cap file close _all
loc startyear $startyear
loc do $dofile
* Create file
tempname file
file open `file' using "$param/`geo'_partnerMatching_`startyear'.dat", w replace
* File header
file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
file write `file' "// generated by `do'\PartnerMatching.do"
file write `file' _n
file write `file' "parameters" _n
file write `file' "{" _n // Open parameters
** Parameter table for partner education
file write `file' _tab "//EN Partner Education" _n
file write `file' _tab "cumrate" _tab "PartnerEducation[EDUC_LEVEL3][EDUC_LEVEL3] = {" _n // open partner education table
* Get values for partner education shares
use "$save\partnerEdu2dat.dta", clear
keep if geo == "`geo'" // Country
sort geo edu edu_spouse
qui sum edu
loc eduMax = r(max)
display `eduMax'
loc eduMax_low = `eduMax' - 1
qui sum edu_spouse
loc edu_SMax = r(max)
display `edu_SMax'
loc edu_SMax_low = `edu_SMax' - 1
forv s = 1 / `eduMax' {
file write `file' _tab _tab (edu_share[1]) ", "
forv a = 2 / `edu_SMax_low' {
file write `file' (edu_share[`a']) ", "
}
file write `file' (edu_share[`edu_SMax']) ", " _n
drop if edu == `s'
}
file write `file' _tab "};" _n // close educ table
file write `file' _n
** Parameter table for partner age distribution
file write `file' _tab "//EN Distribution of partner ages by age of female partner" _n
file write `file' _tab "double" _tab "PartnerAgeDistribution[PART_AGE_RANGE][MALE_PART_AGE_RANGE] = {" _n // open partner age distribution table
* Get values for Age distribution
use "$save\partner2dat_`geo'.dta", clear
sort agey
loc agemin = 1
loc agemax = _N
loc agemax_min1 = `agemax' - 1
loc partage_min = 15
loc partage_max_min1 = 104
loc partage_max = 105
forv a = `agemin' / `agemax' { // agey = 15 to 80, i.e. rows 1 to 66
forv pa = `partage_min' / `partage_max_min1' {
file write `file' (age_cor_`pa'[`a']) ","
}
file write `file' (age_cor_`partage_max'[`a']) "," _n
}
file write `file' _tab "};" _n // close age distribution table
file write `file' "};" _n // Close parameters
file close _all
end
********************************************************************************
* WRITE .DAT FILES
*
foreach geo of local Ctry{
partner2dat, geo("`geo'")
}
cap log c
exit
PersonCore.do¶
This file creates the parameterfile PersonCore_2010.dat, which contains basic information on the starting population file, such as its name and sample size.
The parameterfile contains the following parameters:
- MicroDataInputFile contains the name of the starting population micro data file
- MicroDataInputFileSize gives the number of observations contained in the starting population file
- StartPopSize is the real population size in the start year of the simulation
- StartPopSampleSize is the default value for number of persons simulated
- WriteMicrodata is a boolean parameter indicating whether a micro data output file should be produced
- TimeMicroOutput[OUTPUT_TIMES] gives the years for which the micro data file shall be produced
- MicroRecordFileName gives the name of the micro data file
Data Source:
- starting population file based on EU-SILC data
Version: Oct. 2020
Author(s): Tom Horvath
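All scripts in this section write their .dat files with STATA's file command. The pattern can be sketched in isolation as follows; the sketch writes to a temporary file rather than to the param directory, and the record count of 1234 is a hypothetical value.

```stata
* Write a small parameter file to a temporary location
tempfile out
tempname fh
* Hypothetical number of records in the starting population file
local size = 1234

file open `fh' using "`out'", write text replace
file write `fh' "parameters" _n
file write `fh' "{" _n
file write `fh' _tab "long" _tab "MicroDataInputFileSize = `size';" _n
file write `fh' "};" _n
file close `fh'

* Show the result
type "`out'"

* Self-check: read the first line back
file open `fh' using "`out'", read text
file read `fh' line
file close `fh'
assert "`line'" == "parameters"
```

Closing the specific handle instead of using file close _all is a design choice of the sketch; it avoids closing files that other code may still have open.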
use $startpop/startpop.dta, clear
drop if idhh ==.
loc Ctry $Geosample
cap pr drop generate_dat
pr de generate_dat
syntax , GEO(string) POPSAMPLEsize(integer)
loc geo = upper("`geo'")
qui count if country == "`geo'"
loc size = r(N)
cap drop totpop
bys country: egen totpop = total(weight)
qui sum totpop if country == "`geo'"
loc realpopsize = r(max)
loc startyear = $startyear
loc do $dofile
cap file close _all
tempname file
//file open `file' using "$para/StartPop`geo'`year'.dat", w replace
file open `file' using "$param/`geo'_PersonCore_`startyear'.dat", w replace
file write `file' "// generated on `c(current_date)' at `c(current_time)'" _n
file write `file' "// generated by `do'\PersonCore.do"
file write `file' _n
file write `file' "parameters" _n
file write `file' "{" _n // Open parameters
* Parameters for starting population
*file write `file' _tab "file" _tab "MicroDataInputFile = " _char(34) "pop_2010_`geo'.csv" _char(34) ";" _n
file write `file' _tab "file" _tab "MicroDataInputFile = " _char(34) "final_pop_2010_`geo'.csv" _char(34) ";" _n
file write `file' _tab "long" _tab "MicroDataInputFileSize = `size';" _n
file write `file' _tab "double" _tab "StartPopSize = `realpopsize';" _n
file write `file' _tab "double" _tab "StartPopSampleSize = `popsamplesize';" _n
file write `file' _n
file write `file' _tab "//EN Write micro-data output file Y/N" _n
file write `file' _tab "logical" _tab "WriteMicrodata = FALSE;" _n
file write `file' _tab "//EN Time(s) of micro-data output" _n
file write `file' _tab "double" _tab "TimeMicroOutput[OUTPUT_TIMES] = {" _n
file write `file' _tab "2010, 2050, 10," _n
file write `file' _tab "};" _n
file write `file' _tab "//EN File name micro-data output file" _n
file write `file' _tab "file" _tab "MicroRecordFileName =" _char(34) "MicroDataOutput.csv" _char(34) ";"_n
file write `file' "};" _n // Close parameters
file close _all
end
foreach geo of local Ctry{
generate_dat, geo("`geo'") popsample(75000)
}
RefinedEducFate.do¶
We use information on the highest educational level attained and on the parents’ highest level of education to estimate the probability of attaining low, medium or high education, given that the higher of the two parents’ education levels is low, medium or high. The resulting parameters are stored in the parameterfile RefinedEducFate. They are taken from a cross tabulation of own education and parents’ highest education based on EU-LFS data from the 2009 ad-hoc module.
The parameterfile contains the following parameters:
- EducModel is the model choice parameter indicating whether the refined education model shall be used
- EducFirstCohortRefinedModel indicates from which birth cohort onwards the refined education model shall be used
- EducProg1Odds[EDUC_GROUP][SEX] gives the odds ratio of progressing from low to medium or high education by parents’ highest level of education and sex
- EducProg2Odds[EDUC_GROUP][SEX] gives the odds ratio of progressing from medium to high education by parents’ highest level of education and sex
Data Source:
- EU-LFS 2009 adhoc module
Version: Oct. 2020
Author(s): Tom Horvath and Marian Fink
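The odds ratios are derived in three steps: cumulate the education shares from the highest level downwards, turn the cumulative shares into odds, and divide by the odds of the lowest parental education group. A minimal sketch on made-up shares for two parental groups follows; sex and country are left out, and all numbers are assumptions.

```stata
* Toy shares of own education (1 low, 2 medium, 3 high) by parental education
clear
input byte edu_par byte edu_lev double edu_share
1 1 0.5
1 2 0.3
1 3 0.2
2 1 0.2
2 2 0.4
2 3 0.4
end

* Cumulate shares from the highest level downwards:
* edu_cum at level 2 = P(medium or high), at level 3 = P(high)
gsort edu_par -edu_lev
by edu_par: gen double edu_cum = sum(edu_share)

* Odds of reaching at least the given level (not defined for level 1)
gen double odds = edu_cum / (1 - edu_cum) if edu_lev > 1

* Odds ratio relative to the lowest parental education group
sort edu_lev edu_par
by edu_lev: gen double odds_ratio = odds / odds[1]
drop if edu_lev == 1
list, noobs

* Self-check: reference group has ratio 1; medium-or-high odds are 4 vs 1
assert odds_ratio == 1 if edu_par == 1
assert abs(odds_ratio - 4) < 1e-8 if edu_par == 2 & edu_lev == 2
```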
* ------------------------------------------------------------------------------
* 1. IMPORT CSV DATA TO STATA FORMAT
* ------------------------------------------------------------------------------
* Ad hoc module
* --------------------------------------
loc Ctry $Geosample
foreach country of glo Geosample {
import delimited using "$adhoc/`country'2009_y.csv", ///
clear delimiters(",") varnames(1) asdouble stripquotes(yes)
keep age hat97lev sex ahm2009_stopdate ahm2009_parhat country hhnum hhseqnum qhhnum year coeff
g byte quarter = real( substr( qhhnum, 2, 1 ) )
g pid = country + strofreal(year) + qhhnum + strofreal(hhseqnum)
drop qhhnum
ren ahm2009_parhat parhat
ren ahm2009_stopdate stopdate
recode parhat (.=-1) (9=-2)
replace stopdate = -1 if stopdate == 0
replace stopdate = -2 if stopdate == 999999
recode stopdate parhat (-1=.a) (-2=.b) (-3=.c) (-4=.d) (-5=.e) (-9=.i)
save "$save/ahm_`country'2009_y.dta", replace
}
* ------------------------------------------------------------------------------
* 2. GET FILES TOGETHER
* ------------------------------------------------------------------------------
* Append ad hoc module files
* --------------------------------------
clear
set obs 1
foreach country of glo Geosample {
ap using "$save/ahm_`country'2009_y.dta", nol
rm "$save/ahm_`country'2009_y.dta"
}
drop if _n == 1
save "$save/ahm_2009_y.dta", replace
* recode education variable
* --------------------------------------
recode hat97lev (0 11 21 22 = 1 "L") (30 31 32 41 42 = 2 "M") ///
(51 52 60 = 3 "H"), gen(edu_lev)
ge intwgt=int(1000*coeff)
* ------------------------------------------------------------------------------
* 5. PROBABILITIES
* ------------------------------------------------------------------------------
replace sex = 0 if sex == 2
drop if missing(parhat) | missing(edu_lev) //touse == 0
drop if age < 25 | age >35
egen educ_groups = group(parhat edu_lev sex country)
decode edu_lev, ge(edu_str)
collapse (sum) intwgt (firstnm) parhat edu_lev edu_str sex country, by(educ_group)
drop educ_group
rename country geo
rename parhat edu_par
sort geo sex edu_par edu_lev
by geo sex edu_par: egen sum_edu = total(intwgt)
ge edu_share = intwgt/sum_edu
drop sum_edu
gsort geo sex edu_par -edu_lev
by geo sex edu_par: ge edu_cum = sum(edu_share)
ge odds = edu_cum/(1-edu_cum)
sort geo sex edu_lev edu_par
by geo sex edu_lev: ge odds_ratio = odds/odds[1]
sort geo edu_lev edu_par sex
drop if edu_lev == 1
save "$save/educ2dat.dta", replace
use "$save/educ2dat.dta", clear
********************************************************************************
* PROGRAM FOR WRITING .DAT FILES
cap pr drop transprob2dat
pr de transprob2dat
syntax , GEO(string)
clear all
cap file close _all
loc startyear $startyear
loc do $dofile
* Create file
tempname file
file open `file' using "$param/`geo'_RefinedEducFate_`startyear'.dat", w replace
* Get values for education odds ratios
use "$save/educ2dat.dta", clear
keep if geo == "`geo'" // Country
* File header
file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
file write `file' "// generated by `do'\RefinedEducFate.do"
file write `file' _n
file write `file' "parameters" _n
file write `file' "{" _n // Open parameters
file write `file' _tab "EDUC_MODEL EducModel = EM_BASE; //EN Model Selection" _n
file write `file' _tab "int" _tab "EducFirstCohortRefinedModel = `startyear'; //EN First birth cohort to apply refined model" _n
qui sum edu_par
loc edparmax = r(max)
loc edparmax1 = `edparmax' - 1
qui sum edu_lev
loc ownedmax = r(max)
loc ownedmax1 = `ownedmax' - 1
ge sexstr = "F"
replace sexstr = "M" if sex == 1
ge edustr = "Medium" if edu_lev == 2
replace edustr = "High" if edu_lev == 3
loc i = 1
forv own = 2/`ownedmax'{
loc edstr = edustr
file write `file' _tab _tab "//EN Odds of achieving `edstr'" _n
file write `file' _tab _tab "double" _tab "EducProg`i'Odds[EDUC_GROUP][SEX] = {" _n
file write `file' _tab _tab (odds_ratio[1]) ", "(odds_ratio[2]) "," _n
file write `file' _tab _tab (odds_ratio[3]) ", "(odds_ratio[4]) "," _n
file write `file' _tab _tab (odds_ratio[5]) ", "(odds_ratio[6]) "," "};" _n
drop if edu_lev == `own'
loc i = `i' +1
display(`i')
}
file write `file' "};" _n // Close parameters
file close _all
end
********************************************************************************
* WRITE .DAT FILES
foreach geo of local Ctry{
transprob2dat, geo("`geo'")
}
cap log c
exit
RefinedFertility.do¶
This file creates first birth rates by age and highest level of education and a parameter on female childlessness by highest level of education, both stored in RefinedFertility.dat. The first parameter is estimated from the starting population file: a probit model estimates the probability that a woman has a first birth in a given year depending on her age and education. Data on female childlessness are taken from the cohort fertility database for AT, ES and FI and from Berrington et al. (2015) for the UK.
The parameterfile contains the following parameters:
- selected_fertility_model is a model choice parameter indicating whether the refined fertility model shall be used
- CalibrateChildlessness indicates whether childlessness shall be calibrated in order to meet predefined target values
- ChildlessnessYob[YOB_START50][BIRTH1_GROUP] gives the target values for female childlessness
- ChildlessnessYobTargets[YOB_BIRTH1][BIRTH1_GROUP]
- FirstBirthCohortRates[BIRTH1_GROUP][FERTILE_AGE_RANGE][YOB_BIRTH1] contain the first birth rates by education level, age and birth cohort
Data Source on first birth rates:
- Starting population file based on EU-SILC
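The probit step can be sketched on simulated data: the first-birth indicator is regressed on age, age squared and education, and the predicted probabilities serve as smoothed first birth rates. The data-generating coefficients below are made up and the sketch ignores survey weights, so it only illustrates the mechanics, not the estimates used in microWELT.

```stata
* Simulated women aged 15 to 50 with three education levels (made-up process)
clear
set seed 12345
set obs 5000
gen int age = 15 + floor(36 * runiform())
gen byte educ = floor(3 * runiform())
gen double age2 = age * age

* Hypothetical latent index: first births peak around age 29
gen double xb = -6 + 0.35 * age - 0.006 * age2 + 0.1 * educ
gen byte first_birth = rnormal() < xb

* Probit of first birth on age, age squared and education
probit first_birth age age2 educ
predict double first_pr, pr

* Self-check: predictions are valid probabilities
assert first_pr >= 0 & first_pr <= 1

* Compare observed and predicted rates by age
tabstat first_birth first_pr, by(age) statistics(mean) nototal
```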
Data Source on childlessness:
- for AT, ES and FI: http://www.cfe-database.org/database/
- for UK: Berrington, A., Stone, J., Beaujouan, E. (2015): Educational differences in timing and quantum of childbearing in Britain: A study of cohorts born 1940−1969. Demographic Research, Volume 33, Article 26, pages 733−764, https://www.demographic-research.org/Volumes/Vol33/26/
Version: Oct. 2020
Author(s): Tom Horvath and Marian Fink
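The childlessness data cover only some birth cohorts, so the script copies the first observed cohort backwards and the last observed cohort forwards until the full cohort range is filled. A minimal sketch of this expand technique for a single education group follows; the cohort limits 1948 and 1955 and all shares are hypothetical.

```stata
* Toy data: childlessness observed for cohorts 1950 to 1952 only (made up)
clear
input int birthyear double share_childless
1950 0.10
1951 0.12
1952 0.15
end

* Hypothetical target range of cohorts
local yob_low  = 1948
local yob_high = 1955

quietly summarize birthyear
local min_year = r(min)
local max_year = r(max)

* Copy the first observed cohort backwards to yob_low
local n_low = `min_year' - `yob_low' + 1
expand `n_low' if birthyear == `min_year'
sort birthyear
replace birthyear = `yob_low' + _n - 1 if birthyear == `min_year'

* Copy the last observed cohort forwards to yob_high
local n_high = `yob_high' - `max_year' + 1
expand `n_high' if birthyear == `max_year'
sort birthyear
replace birthyear = birthyear[_n-1] + 1 if birthyear == `max_year' & _n > 1
list, noobs

* Self-checks: cohorts 1948 to 1955 are complete, edge values are held constant
assert _N == 8
assert birthyear == 1947 + _n
assert abs(share_childless - 0.10) < 1e-8 if birthyear <= 1950
assert abs(share_childless - 0.15) < 1e-8 if birthyear >= 1952
```

The second replace relies on STATA updating observations in order, so each copied row sees the already incremented year of the row before it.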
/* #############################################################################
AUSTRIA
############################################################################# */
import delimited using "$cfe\Austria_Census 2001.csv", varnames(1) clear
loc yob_low $yob_low
loc yob_high $yob_high
destring cohort, ge(birthyear) force
keep if sex == "F"
qui sum birthyear
loc last_cohort = r(max)
keep if birthyear >= `yob_low' | birthyear == `last_cohort'
drop if birthyear > `yob_high'
keep if origin == "Total"
ge edu3 = 0
replace edu3 = 1 if edu == "ISCED3A-4A" | edu == "ISCED3B"
replace edu3 = 2 if edu == "ISCED5B-6"
keep birthyear edu3 women_total parity_0
collapse (sum) women_total parity_0, by(birthyear edu3)
ge share_childless = parity_0/women_total
qui sum birthyear
loc min_year = r(min)
loc max_year = r(max)
display `min_year'
loc n_exp_low = `min_year' - `yob_low' +1
display `n_exp_low'
expand `n_exp_low' if birthyear == `min_year'
sort edu birthyear
by edu: replace birthyear = `yob_low' if _n==1
by edu: replace birthyear = birthyear[_n-1] + 1 if _n > 1 & birthyear <= `min_year'
sort birthyear edu
loc n_exp_high = `yob_high' - `max_year' +1
display `n_exp_high'
expand `n_exp_high' if birthyear == `max_year'
sort edu birthyear
by edu: replace birthyear = birthyear[_n-1] + 1 if birthyear >= `max_year'
sort birthyear edu
save $save\childless_females_AT.dta, replace
/* #############################################################################
Spain
############################################################################# */
import delimited using "$cfe\Spain_Census 2011.csv", varnames(1) clear
loc yob_low $yob_low
loc yob_high $yob_high
destring cohort, ge(birthyear) force
keep if sex == "F"
qui sum birthyear
loc last_cohort = r(max)
keep if birthyear >= `yob_low' | birthyear == `last_cohort'
drop if birthyear > `yob_high'
drop if birthyear == .
keep if origin == "Total"
ge edu3 = 0
replace edu3 = 1 if edu == "ISCED3B-4A" | edu == "ISCED3C"
replace edu3 = 2 if edu == "ISCED5B-6"
keep birthyear edu3 women_total parity_0
collapse (sum) women_total parity_0, by(birthyear edu3)
ge share_childless = parity_0/women_total
qui sum birthyear
loc min_year = r(min)
loc max_year = r(max)
display `min_year'
loc n_exp_low = `min_year' - `yob_low' +1
display `n_exp_low'
expand `n_exp_low' if birthyear == `min_year'
sort edu birthyear
by edu: replace birthyear = `yob_low' if _n==1
by edu: replace birthyear = birthyear[_n-1] + 1 if _n > 1 & birthyear <= `min_year'
sort birthyear edu
loc n_exp_high = `yob_high' - `max_year' +1
display `n_exp_high'
expand `n_exp_high' if birthyear == `max_year'
sort edu birthyear
by edu: replace birthyear = birthyear[_n-1] + 1 if birthyear >= `max_year'
sort birthyear edu
save $save\childless_females_ES.dta, replace
/* #############################################################################
Finland
############################################################################# */
import delimited using "$cfe\Finland_Population Register 2015.csv", varnames(1) clear
loc yob_low $yob_low
loc yob_high $yob_high
destring cohort, ge(birthyear) force
keep if sex == "F"
qui sum birthyear
loc last_cohort = r(max)
keep if birthyear >= `yob_low' | birthyear == `last_cohort'
drop if birthyear > `yob_high'
drop if birthyear == .
keep if origin == "Total"
ge edu3 = 0
replace edu3 = 1 if edu == "ISCED3A-4A"
replace edu3 = 2 if edu == "ISCED5B-6"
keep birthyear edu3 women_total parity_0
collapse (sum) women_total parity_0, by(birthyear edu3)
ge share_childless = parity_0/women_total
qui sum birthyear
loc min_year = r(min)
loc max_year = r(max)
display `min_year'
loc n_exp_low = `min_year' - `yob_low' +1
display `n_exp_low'
expand `n_exp_low' if birthyear == `min_year'
sort edu birthyear
by edu: replace birthyear = `yob_low' if _n==1
by edu: replace birthyear = birthyear[_n-1] + 1 if _n > 1 & birthyear <= `min_year'
sort birthyear edu
loc n_exp_high = `yob_high' - `max_year' +1
display `n_exp_high'
expand `n_exp_high' if birthyear == `max_year'
sort edu birthyear
by edu: replace birthyear = birthyear[_n-1] + 1 if birthyear >= `max_year'
sort birthyear edu
save $save\childless_females_FI.dta, replace
/* #############################################################################
UK
NOTE no data for UK in cfe-database
use \\int.wsr.at\Nabu\ext_Projekte\Weltransim_PN-7316\parameterfiles\cfe-database\Berrington_ea__2015_ChildlessnessUK.pdf
instead
Low Medium High
1940-49 0.084 0.120 0.185
1950-59 0.1 0.149 0.206
1960-69 0.102 0.139 0.22
############################################################################# */
use $save\childless_females_FI.dta, clear
drop women_total parity_0
replace share_childless = 0.084 if edu == 0 & birthyear <1950
replace share_childless = 0.120 if edu == 1 & birthyear <1950
replace share_childless = 0.185 if edu == 2 & birthyear <1950
replace share_childless = 0.1 if edu == 0 & birthyear <1960 & birthyear >=1950
replace share_childless = 0.149 if edu == 1 & birthyear <1960 & birthyear >=1950
replace share_childless = 0.206 if edu == 2 & birthyear <1960 & birthyear >=1950
replace share_childless = 0.102 if edu == 0 & birthyear >= 1960
replace share_childless = 0.139 if edu == 1 & birthyear >= 1960
replace share_childless = 0.22 if edu == 2 & birthyear >= 1960
save $save\childless_females_UK.dta, replace
********************************************************************************
********************************************************************************
* Derive first birth rates from startpop files
********************************************************************************
********************************************************************************
use $startpop/startpop.dta, clear
loc Ctry $Geosample
drop if idfamily ==.
*rename deh isced
ge intwgt=round(dwt)
drop if intwgt == .
rename idfamily famid
sort country famid age
by country famid: egen n_kids = total(role == 2)
by country famid: ge age_youngest = age[1]
keep if female == 1 & role < 2
cap drop sample
ge sample = n_kids == 0 | (n_kids == 1 & age_youngest == 1)
tab country sample
ge first_birth = n_kids == 1 & age_youngest == 1
tab age country if sample == 1 & age >= 15 & age < 50 [fw=intwgt], sum(first_birth) noobs nost nof
keep if sample == 1 & age >= 15 & age <= 50
ge age2 = age*age
ge first_pr = 0
loc Ctry $Geosample
foreach geo of local Ctry{
probit first_birth age age2 educ [fw=intwgt] if country == "`geo'"
predict first_pr_`geo', pr
replace first_pr = first_pr_`geo' if country == "`geo'"
}
tab age country if sample == 1 & age >= 15 & age < 50 [fw=intwgt], sum(first_pr) noobs nost nof
bys country age educ: egen pop = total(intwgt)
bys country age educ first_birth: egen first_births = total(intwgt)
gsort country age educ -first_birth
by country age educ: keep if _n == 1
ge share_first = first_births/pop
replace share_first = 0 if missing(share_first)
keep country age educ share_first first_pr
sort country age educ
save $save\first_births.dta, replace
* prepare grid for parameter table
qui tab country
loc nGeo = r(r)
display `nGeo'
scalar a = `nGeo'
qui sum age
loc minage = r(min)
loc maxage = r(max)
loc nAge = `maxage' - `minage' + 1
display `minage'
display `maxage'
qui sum edu
loc nEdu = r(max) + 1
display `nEdu'
clear
set obs 1
ge country = "A"
expand a
loc i = 1
foreach geo of glo Geosample {
replace country = "`geo'" if _n == `i'
loc i = `i'+1
}
expand `nAge' // expand to age_groups
bys country: ge age = _n + `minage' - 1
expand `nEdu' // expand to edu groups
bys country age: ge educ = _n - 1
save "$save\grid_firstbirths.dta", replace
mmerge country age educ using "$save/first_births.dta", type(1:1)
tab _merge
sort country edu age
replace first_pr = 0 if missing(first_pr)
// if no first birth is observed at age x (but at ages x-1 and x+1), take the value from age x-1
by country edu: replace first_pr = first_pr[_n-1] if first_pr == 0 & first_pr[_n-1] != 0 & first_pr[_n+1] != 0 & _n>1 & _n < _N
save $save\first_births.dta, replace
********************************************************************************
* PROGRAM FOR WRITING .DAT FILES
cap pr drop refFert2dat
pr de refFert2dat
syntax , GEO(string)
cap file close _all
use $save\childless_females_`geo'.dta, clear
loc year $startyear
loc do $dofile
qui tab birthyear
loc anz = r(N)
display `anz'
* Create file
tempname file
file open `file' using "$param/`geo'_RefinedFertility_`year'.dat", w replace
* Header
file write `file' "// generated on `c(current_date)' at `c(current_time)'" _n
file write `file' "// generated by `do'\RefinedFertility.do"
file write `file' _n
file write `file' "parameters" _n
file write `file' "{" _n // Open parameters
file write `file' "//EN Fertility model selection" _n
file write `file' _tab "SELECTED_FERTILITY_MODEL" _tab "selected_fertility_model = SFM_MACRO;"_n
file write `file' _n
file write `file' _tab "logical" _tab "CalibrateChildlessness = TRUE; //EN Calibrate cohort childlessness to targets y/n" _n
file write `file' _n
file write `file' _tab "//EN Childlessness in older population (female)" _n
file write `file' _tab "double" _tab "ChildlessnessYob[YOB_START50][BIRTH1_GROUP] = {" _n // YOB_START50 1904 1963
forvalues i = 1(1)`anz' {
file write `file' _tab (share_childless[`i']) ", "
}
file write `file' _tab "}; " _n
file write `file' _tab "//EN Calibration targets cohort childlessness" _n
file write `file' _tab "double" _tab "ChildlessnessYobTargets[YOB_BIRTH1][BIRTH1_GROUP] = {" _n
loc yob_bir_l $yob_birth_low
loc yob_bir_h $yob_birth_high
loc n_years = `yob_bir_h' - `yob_bir_l' + 1
qui sum birthyear
loc lastyear = r(max)
keep if birthyear == `lastyear'
file write `file' _tab "(`n_years'){" (share_childless[1]) "," (share_childless[2]) "," (share_childless[3]) ",}" _n
file write `file' _tab "}; " _n
file write `file' _n
file write `file' "//EN First Birth Rates" _n
file write `file' "double" _tab "FirstBirthCohortRates[BIRTH1_GROUP][FERTILE_AGE_RANGE][YOB_BIRTH1] = {"_n //15-49
use $save\first_births.dta, clear
keep if country == "`geo'"
*keep if country == "AT"
qui sum edu
loc nedu = r(max)
display `nedu'
qui sum age
loc agemin = r(min)
loc agemax = r(max)
display `agemax'
loc yob_bir_l $yob_birth_low
loc yob_bir_h $yob_birth_high
loc n_years = `yob_bir_h' - `yob_bir_l' + 1
forvalues educ = 0(1)`nedu' {
*display `nedu'
forvalues a = `agemin'(1)`agemax'{
loc x = `a'-`agemin' + 1
*display `x'
file write `file' "(`n_years')" (first_pr[`x']) "," _n
}
drop if educ == `educ'
file write `file' _n
}
file write `file' _tab "};" _n
file write `file' _tab "};" _n
file close _all
end
********************************************************************************
* WRITE .DAT FILES
loc Ctry $Geosample
foreach geo of local Ctry{
refFert2dat , geo("`geo'")
}
RefinedMortality.do¶
Parameters on sex- and education-specific remaining life expectancy at ages 25 and 65 are produced from OECD data (AT, FI, UK) and from Requena (2017) for Spain, and stored in RefinedMortality.dat.
The parameter file contains the following parameters:
- SelectedMortalityModel is a model choice parameter indicating whether the refined mortality model shall be used.
- LifeExpectancy[SEX][LIFE_EXPECT][SIM_YEAR_RANGE][MORTALITY_GROUP] gives the remaining life expectancy by sex, age (25 and 65), simulation year and education level (low, medium or high).
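The layout of the LifeExpectancy table can be illustrated with a minimal sketch that writes the rows from a small dataset instead of hard-coded lines. This is not the do-file's own approach: the dataset layout, the variable names (sex, le_age, low, medium, high), the handle name and the start and end years are assumptions made for the example. The values are the AT figures used in the do-file, and the output goes to a temporary file.

```stata
* Sketch only: toy values copied from the AT block of RefinedMortality.do.
* The dataset layout and the variable names are assumptions for this example.
clear
input str6 sex le_age low medium high
"Female" 25 57.58 59.29 60.63
"Female" 65 20.58 21.49 22.34
"Male"   25 51.40 54.03 57.83
"Male"   65 16.79 17.92 20.04
end

* Hypothetical simulation period; the do-file reads these from globals
local startyear 2010
local endyear   2150
local numbyears = `endyear' - `startyear' + 1

tempname fh
tempfile out
file open `fh' using "`out'", write text replace
file write `fh' "double LifeExpectancy[SEX][LIFE_EXPECT][SIM_YEAR_RANGE][MORTALITY_GROUP] = {" _n
forvalues i = 1/`=_N' {
    * One row per sex and age; the (N) prefix repeats the three
    * education-group values for every simulation year
    file write `fh' _tab "// " (sex[`i']) " LE_" (le_age[`i']) _n
    file write `fh' _tab "(`numbyears') { " %5.2f (low[`i']) ", " %5.2f (medium[`i']) ", " %5.2f (high[`i']) ",}," _n
}
file write `fh' "};" _n
file close `fh'

* Show the generated table in the Results window
type "`out'"
```

Keeping the values in a dataset means the writing loop is the same for every country; only the input rows change.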
Data Source:
- for AT, FI and UK: Murtin, F., et al. (2017), “Inequalities in longevity by education in OECD countries: Insights from new OECD estimates”, OECD Statistics Working Papers, No. 2017/02, OECD Publishing, Paris
- for ES: Requena, Miguel (2017) La desigualdad ante la muerte: educación y esperanza de vida en España. Perspectives Demogràfiques Nr. 06.
Version: April 2019
Author(s): Tom Horvath and Marian Fink
loc startyear $startyear
loc endyear $yearmax
loc do $dofile
clear all
cap file close _all
* ------------------------------------------------------------------------------
* Create file AUT
* ------------------------------------------------------------------------------
tempname file
file open `file' using "$param/AT_RefinedMortality_`startyear'.dat", w replace
* File header
file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
file write `file' "// generated by `do'\RefinedMortality.do"
file write `file' _n
file write `file' "parameters" _n
file write `file' "{" _n // Open parameters
file write `file' _tab "SELECTED_MORTALITY_MODEL SelectedMortalityModel = SMM_MICRO_ALIGNED; //EN Child mortality model selection" _n
loc numbyears = `endyear' - `startyear' + 1
display `numbyears'
file write `file' _tab "//EN Period life expectancy" _n
file write `file' _tab "double" _tab "LifeExpectancy[SEX][LIFE_EXPECT][SIM_YEAR_RANGE][MORTALITY_GROUP] = {" _n // open life expectancy table
file write `file' _tab "// Female" _n
file write `file' _tab _tab "// LE_25" _n
file write `file' _tab _tab "(`numbyears') { 57.58, 59.29, 60.63,}," _n
file write `file' _tab _tab "// LE_65" _n
file write `file' _tab _tab "(`numbyears') { 20.58, 21.49, 22.34,}," _n
file write `file' _tab "// Male" _n
file write `file' _tab _tab "// LE_25" _n
file write `file' _tab _tab "(`numbyears') { 51.40, 54.03, 57.83,}," _n
file write `file' _tab _tab "// LE_65" _n
file write `file' _tab _tab "(`numbyears') { 16.79, 17.92, 20.04,}," _n
file write `file' _tab "};" _n
file write `file' "};"
file close `file'
* ------------------------------------------------------------------------------
* Create file FI
* ------------------------------------------------------------------------------
tempname file
file open `file' using "$param/FI_RefinedMortality_`startyear'.dat", w replace
* File header
file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
file write `file' "// generated by `do'\RefinedMortality.do"
file write `file' _n
file write `file' "parameters" _n
file write `file' "{" _n // Open parameters
file write `file' _tab "SELECTED_MORTALITY_MODEL SelectedMortalityModel = SMM_MICRO_ALIGNED; //EN Child mortality model selection" _n
loc numbyears = `endyear' - `startyear' + 1
display `numbyears'
file write `file' _tab "//EN Period life expectancy" _n
file write `file' _tab "double" _tab "LifeExpectancy[SEX][LIFE_EXPECT][SIM_YEAR_RANGE][MORTALITY_GROUP] = {" _n // open life expectancy table
file write `file' _tab "// Female" _n
file write `file' _tab _tab "// LE_25" _n
file write `file' _tab _tab "(`numbyears') { 56.18, 59.06, 60.94,}," _n
file write `file' _tab _tab "// LE_65" _n
file write `file' _tab _tab "(`numbyears') { 20.68, 21.56, 22.70,}," _n
file write `file' _tab "// Male" _n
file write `file' _tab _tab "// LE_25" _n
file write `file' _tab _tab "(`numbyears') { 48.98, 52.51, 56.56,}," _n
file write `file' _tab _tab "// LE_65" _n
file write `file' _tab _tab "(`numbyears') { 16.57, 17.60, 19.40,}," _n
file write `file' _tab "};" _n
file write `file' "};"
file close `file'
* ------------------------------------------------------------------------------
* Create file ES !!!! no real data at the moment, Italian data implemented
* ------------------------------------------------------------------------------
tempname file
file open `file' using "$param/ES_RefinedMortality_`startyear'.dat", w replace
* File header
file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
file write `file' "// generated by `do'\RefinedMortality.do"
file write `file' _n
file write `file' "parameters" _n
file write `file' "{" _n // Open parameters
file write `file' _tab "SELECTED_MORTALITY_MODEL SelectedMortalityModel = SMM_MICRO_ALIGNED; //EN Child mortality model selection" _n
loc numbyears = `endyear' - `startyear' + 1
display `numbyears'
file write `file' _tab "//EN Period life expectancy" _n
file write `file' _tab "double" _tab "LifeExpectancy[SEX][LIFE_EXPECT][SIM_YEAR_RANGE][MORTALITY_GROUP] = {" _n // open life expectancy table
file write `file' _tab "// Female" _n
file write `file' _tab _tab "// LE_25" _n
file write `file' _tab _tab "(`numbyears') { 60.02, 61.01, 61.72,}," _n
file write `file' _tab _tab "// LE_65" _n
file write `file' _tab _tab "(`numbyears') { 22.33, 22.75, 23.27,}," _n
file write `file' _tab "// Male" _n
file write `file' _tab _tab "// LE_25" _n
file write `file' _tab _tab "(`numbyears') { 53.60, 55.47, 57.29,}," _n
file write `file' _tab _tab "// LE_65" _n
file write `file' _tab _tab "(`numbyears') {18.18, 18.77, 19.52,}," _n
file write `file' _tab "};" _n
file write `file' "};"
file close `file'
* ------------------------------------------------------------------------------
* Create file UK
* ------------------------------------------------------------------------------
tempname file
file open `file' using "$param/UK_RefinedMortality_`startyear'.dat", w replace
* File header
file write `file' "// created on `c(current_date)' at `c(current_time)'" _n
file write `file' "// generated by `do'\RefinedMortality.do"
file write `file' _n
file write `file' "parameters" _n
file write `file' "{" _n // Open parameters
file write `file' _tab "SELECTED_MORTALITY_MODEL SelectedMortalityModel = SMM_MICRO_ALIGNED; //EN Child mortality model selection" _n
loc numbyears = `endyear' - `startyear' + 1
display `numbyears'
file write `file' _tab "//EN Period life expectancy" _n
file write `file' _tab "double" _tab "LifeExpectancy[SEX][LIFE_EXPECT][SIM_YEAR_RANGE][MORTALITY_GROUP] = {" _n // open life expectancy table
file write `file' _tab "// Female" _n
file write `file' _tab _tab "// LE_25" _n
file write `file' _tab _tab "(`numbyears') { 56.74, 59.64, 60.73,}," _n
file write `file' _tab _tab "// LE_65" _n
file write `file' _tab _tab "(`numbyears') { 19.74, 21.88, 22.52,}," _n
file write `file' _tab "// Male" _n
file write `file' _tab _tab "// LE_25" _n
file write `file' _tab _tab "(`numbyears') { 53.84, 56.65, 58.19,}," _n
file write `file' _tab _tab "// LE_65" _n
file write `file' _tab _tab "(`numbyears') { 17.61, 19.40, 20.55,}," _n
file write `file' _tab "};" _n
file write `file' "};"
file close `file'
/*
From Data Table:
AT Age 25
F 57.58 59.29 60.63
M 51.40 54.03 57.83
Age 65
F 20.58 21.49 22.34
M 16.79 17.92 20.04
FI Age 25
F 56.18 59.06 60.94
M 48.98 52.51 56.56
Age 65
F 20.68 21.56 22.70
M 16.57 17.60 19.40
UK Age 25
F 56.74 59.64 60.73
M 53.84 56.65 58.19
Age 65
F 19.74 21.88 22.52
M 17.61 19.40 20.55
*/
SchoolEnrolment.do¶
Based on the starting population file, we derive the share of people in education by age (18 to 30) and sex, which is stored in SchoolEnrolment.dat. These enrolment rates by age and sex are assumed to remain stable throughout the simulation period.
The parameter file contains the following parameters:
- SchoolEnrolmentRates[SEX][AGE_EDUC_ALIGN][SIM_YEAR_RANGE] gives school enrolment rates by sex, age and simulation year (2010 to 2150)
- AlignSchoolEnrolmentRates is a model choice parameter indicating whether enrolment rates shall be aligned
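As an illustration of how a weighted enrolment share by age and sex can be obtained, the following minimal sketch uses invented toy data with the variable names from the do-file (age, female, in_edu, intwgt). It uses collapse with frequency weights, which is an alternative to the egen-based calculation in the do-file and not the do-file's own method. One difference is that cells in which nobody is enrolled stay in the result with a share of 0.

```stata
* Sketch only: toy micro data; variable names follow the do-file,
* the values are invented for this example.
clear
input age female in_edu intwgt
18 1 1 30
18 1 0 10
18 0 1 20
18 0 0 20
19 1 0 25
19 0 1 10
19 0 0 30
end

* The weighted mean of a 0/1 indicator is the weighted enrolment share.
* Cells without any enrolled person are kept with a share of 0.
collapse (mean) in_edu_share = in_edu [fw = intwgt], by(age female)

* Same order as the do-file: females first, then by age
gsort -female age
list, noobs sepby(female)
```

Because every age and sex cell is kept, the number of rows per sex equals the number of ages, which is what a writing loop over a fixed age range relies on.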
Data Source:
- Starting population file based on EU-SILC
Version: March 2020
Author(s): Tom Horvath
use $startpop/startpop.dta, clear
loc Ctry $Geosample
drop if idhh ==.
rename deh isced
ge intwgt=round(dwt)
drop if intwgt == .
keep if age >= 15 & age <= 30
ge in_edu = dec > 0
tab age country [fw = round(intwgt)] if age>=15 & age <=30, sum(in_edu) noob nof nost
bys country age female: egen pop = total(intwgt)
bys country age female in_edu: egen pop_inedu = total(intwgt)
keep if in_edu == 1
ge in_edu_share = pop_inedu/pop
bys country age female: keep if _n == 1
keep country age female in_edu_share
gsort country -female age
save $save/schoolEnrolment.dta, replace
********************************************************************************
* PROGRAM FOR WRITING .DAT FILES
cap pr drop inedu2dat
pr de inedu2dat
syntax , GEO(string)
cap file close _all
use $save/schoolEnrolment.dta, clear
keep if country == "`geo'"
loc startyear $startyear
loc do $dofile
qui sum age
loc minage = r(min)
loc maxage = r(max)
loc anz_age = `maxage'-`minage'+1
display `anz_age'
loc simyears = $yearmax - $startyear +1
display `simyears'
* Create file
tempname file
file open `file' using "$param/`geo'_SchoolEnrolment_`startyear'.dat", w replace
* Header
file write `file' "// generated on `c(current_date)' at `c(current_time)'" _n
file write `file' "// generated by `do'\SchoolEnrolment.do"
file write `file' _n
file write `file' "parameters" _n
file write `file' "{" _n // Open parameters
file write `file' _tab "//EN School enrolment rates" _n
file write `file' _tab "double" _tab "SchoolEnrolmentRates[SEX][AGE_EDUC_ALIGN][SIM_YEAR_RANGE] = { " _n
file write `file' _tab _tab "//Female" _n
forvalues i = 1(1)`anz_age' {
file write `file' _tab _tab "(`simyears')" _tab (in_edu_share[`i']) ", "
}
file write `file' _n
drop if female == 1
file write `file' _tab _tab "//Male" _n
forvalues i = 1(1)`anz_age' {
file write `file' _tab _tab "(`simyears')" _tab (in_edu_share[`i']) ", "
}
file write `file' _n
file write `file' _tab "}; " _n
file write `file' _tab "//EN Align school enrolment rates" _n
file write `file' _tab "logical" _tab "AlignSchoolEnrolmentRates = TRUE;" _n
file write `file' _tab "};" _n
file close _all
end
********************************************************************************
* WRITE .DAT FILES
foreach geo of local Ctry{
inedu2dat , geo("`geo'")
}