To merge different modules and/or waves of the SHARE data at the individual level, mergeid is the key person identifier; it does not vary across waves. If the data are to be merged at the household level, one of the hhid`w' variables (where `w' stands for the respective wave) should be used as the key identifier.
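As an illustration of a person-level merge, here is a minimal sketch in Python with pandas. The two data frames are toy stand-ins for two modules of the same wave; the identifier values and the columns yrbirth and health are invented for the example, and only mergeid comes from the text above.

```python
import pandas as pd

# Toy stand-ins for two modules of one wave (all values are made up).
module_a = pd.DataFrame({
    "mergeid": ["A-01", "A-02", "B-01"],
    "yrbirth": [1940, 1943, 1950],      # hypothetical variable
})
module_b = pd.DataFrame({
    "mergeid": ["A-01", "B-01"],
    "health": [2, 3],                   # hypothetical variable
})

# Each module holds one row per person, so a one-to-one merge is expected.
# validate= raises an error if mergeid is duplicated in either frame, and
# indicator= adds a _merge column showing where each row came from.
merged = module_a.merge(
    module_b, on="mergeid", how="outer",
    validate="one_to_one", indicator=True,
)
print(merged)
```

For a household-level merge the same call would presumably use the relevant hhid column as the key, with a many-to-one validation if one side is at the person level.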
The coverscreen (cv_r) includes all members of the household, including ineligible and non-responding household members as well as end-of-life interviews conducted with proxy respondents for respondents who died between waves. The number of cases in the other regular CAPI modules is lower because they only include persons who were interviewed. Household members without an interview can be identified by the variable interview in the cv_r module, which takes the value 0 for those who did not do an interview.
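A minimal sketch of this filter in Python with pandas, assuming the cv_r module has already been loaded into a data frame. The identifiers are invented; only the variable interview and its value 0 follow the description above.

```python
import pandas as pd

# Toy coverscreen with one household member who was not interviewed.
cv_r = pd.DataFrame({
    "mergeid": ["A-01", "A-02", "B-01"],
    "interview": [1, 0, 1],
})

# Household members without an interview have interview == 0.
no_interview = cv_r.loc[cv_r["interview"] == 0, "mergeid"].tolist()
print(no_interview)
```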
In SHARE, partners can be identified by the variable mergeidp`w' (where `w' stands for the respective wave), which holds the mergeid of a respondent's partner. Each couple has a couple identifier stored in the variable coupleid`w'. The coupleid is generated from the mergeid of both partners and is therefore unique to each couple, and it stays fixed across waves as long as the couple stays the same.
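One way to use the partner identifier is to attach a partner's characteristics to each respondent's row. The following Python sketch with pandas assumes wave 4 and a hypothetical variable age; the identifier values are invented. It also assumes that a missing partner is stored as a missing value, which may differ in the released data.

```python
import pandas as pd

w = 4  # wave number, chosen for illustration
partner_key = f"mergeidp{w}"

people = pd.DataFrame({
    "mergeid": ["A-01", "A-02", "B-01"],
    partner_key: ["A-02", "A-01", None],   # B-01 has no partner
    "age": [70, 68, 81],                   # hypothetical variable
})

# Build a lookup keyed by the partner's own mergeid, then rename the key so
# that it lines up with each respondent's mergeidp column.
lookup = people[["mergeid", "age"]].rename(
    columns={"mergeid": partner_key, "age": "age_partner"}
)
with_partner = people.merge(lookup, on=partner_key, how="left")
print(with_partner)
```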
The reason for this is that time-constant variables are only asked in the baseline interview. The baseline interview is the first SHARE interview of each respondent. SHARE's sample is refreshed from time to time in several countries, which is why the baseline interview is not necessarily the Wave 1 interview.
Height is one example of such time-constant variables. To use these variables in waves later than the one in which the baseline interview took place, users have to transfer the information by first merging the waves together and then assigning the baseline values to the later waves. Furthermore, some questions for the longitudinal sample are only asked if there was a change since the last interview, e.g. marital status. This also leads to a large number of missing values in the respective variables.
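The transfer of baseline information to later waves can be sketched as follows in Python with pandas. The example assumes the waves have already been stacked into one long data frame with a wave column, and that missing values have been converted to proper missing values rather than numeric missing codes. The column names wave and height and all values are illustrative.

```python
import pandas as pd

# Long-format toy panel: height is only recorded in the baseline interview.
panel = pd.DataFrame({
    "mergeid": ["A-01", "A-01", "A-01", "B-01", "B-01"],
    "wave":    [1, 2, 4, 2, 4],          # B-01 joined in a refresher sample
    "height":  [172.0, None, None, 165.0, None],
})

# Sort by person and wave, then carry the last known value forward
# within each person.
panel = panel.sort_values(["mergeid", "wave"])
panel["height_filled"] = panel.groupby("mergeid")["height"].ffill()
print(panel)
```

Forward filling is only appropriate for variables that really are constant over time. For variables that are asked only after a change, such as marital status, the same approach carries the last reported state forward, which rests on the assumption that no change went unreported.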
When a respondent does not know (DK) or refuses (RF) the answer to a question about amounts of money, an unfolding sequence of bracket questions usually starts. The aim of unfolding brackets is to obtain at least a range in which, for example, the respondent's income is located.
There are three entry points; the starting point is chosen randomly. The public release includes the country-specific bracket values (in Euros) and the respondent's final category. When a DK or RF is given during the unfolding bracket sequence, the value for the final category is set to either DK or RF. The name of unfolding bracket variables contains “ub” after the module identifier and question number (see question 5.6 for the general naming format). For more information on unfolding brackets see SHARE Release Guide 7.0.0.
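The idea of narrowing down a range can be illustrated with a simplified Python sketch. It is not the CAPI routing itself: the function name unfold, the answer labels "less" and "more", the lower bound of 0 and the bracket values are all assumptions made for the example, and any further answer options in the real instrument are ignored.

```python
import math
import random

def unfold(brackets, entry, answer):
    """Return (lower, upper) bounds, or "DK"/"RF" if the sequence is aborted.

    brackets: ascending threshold amounts
    entry:    index of the first threshold asked
    answer:   callable taking a threshold, returning "less", "more", "DK" or "RF"
    """
    lo, hi = -1, len(brackets)          # indices of the tightest known bounds
    i = entry
    while lo < i < hi:                  # stop once no untested threshold is left
        reply = answer(brackets[i])
        if reply in ("DK", "RF"):
            return reply                # final category is set to DK or RF
        if reply == "less":
            hi, i = i, i - 1            # amount is below this threshold
        else:
            lo, i = i, i + 1            # amount is above this threshold
    lower = brackets[lo] if lo >= 0 else 0
    upper = brackets[hi] if hi < len(brackets) else math.inf
    return (lower, upper)

# Simulated respondent whose true (unreported) amount is 2000.
def respondent(threshold):
    return "less" if 2000 < threshold else "more"

brackets = [500, 1500, 3000]            # made-up bracket values
entry = random.randrange(len(brackets)) # random entry point
print(unfold(brackets, entry, respondent))
```

In this sketch the resulting range does not depend on the entry point; only the number of questions asked does.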
The naming of variables is harmonised across waves. Variable names in the CAPI instrument data use the following format: mmXXXyyy_LL. “mm” is the module identifier, e.g. DN for the demographics module, “XXX” refers to the question number, e.g. 001, and “yyy” are optional characters for dummy variables (indicated by “d”), euro conversion (indicated by “e”) or unfolding brackets (indicated by “ub”). The separation character “_” is followed by “LL”, optional digits that indicate a category or loop (“outer loop”).
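The naming format can be expressed as a regular expression, which is useful for selecting variables by module or question number. The Python sketch below is a simplified reading of mmXXXyyy_LL: the function parse_name and the group names are invented for the example, the underscore is treated as optional, and names in the released data may deviate from this pattern.

```python
import re

PATTERN = re.compile(
    r"^(?P<module>[a-z]{2})"      # mm: module identifier, e.g. dn
    r"(?P<question>\d{3})"        # XXX: question number, e.g. 026
    r"(?P<kind>ub|d|e)?"          # yyy: optional marker (ub, d or e)
    r"(?:_(?P<loop>\d*))?$"       # _LL: optional category or loop digits
)

def parse_name(name):
    """Split a variable name into its parts, or return None if it does not fit."""
    match = PATTERN.match(name.lower())
    return match.groupdict() if match else None

print(parse_name("DN026_1"))
print(parse_name("mergeid"))      # identifiers do not follow the format
```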
The ado-file sharetom is a programme that recodes missing values and labels them appropriately. If users want to apply sharetom.ado we recommend executing it immediately after opening the data file or after merging the modules needed. Note that sharetom is updated from time to time. The current version is sharetom5.
For longitudinal analyses of children, users cannot rely on the order of the children in the CH module. It is necessary to match them on gender and year of birth, which leads to correct merges in most cases. There are several reasons for this. First, respondents are supposed to report on their children in a defined order, but they do not necessarily do so. Second, partners may change, and respondents are always supposed to report on both partners' children. Third, reporting errors can never be ruled out.
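Matching on gender and year of birth can be sketched as follows in Python with pandas. The example assumes the child information of each wave has been reshaped to one row per child; the column names child_pos, gender and yrbirth and all values are illustrative. Children who cannot be told apart on the match keys, such as same-sex twins, are dropped before matching, which is one possible design choice and not a SHARE recommendation.

```python
import pandas as pd

keys = ["mergeid", "gender", "yrbirth"]

# Same three children, reported in a different order in the later wave.
ch_early = pd.DataFrame({
    "mergeid": ["A-01"] * 3,
    "child_pos": [1, 2, 3],
    "gender": [1, 2, 2],
    "yrbirth": [1970, 1972, 1975],
})
ch_late = pd.DataFrame({
    "mergeid": ["A-01"] * 3,
    "child_pos": [1, 2, 3],
    "gender": [2, 1, 2],
    "yrbirth": [1972, 1970, 1975],
})

def unique_on_keys(df):
    # Keep only children that are uniquely identified by the match keys.
    return df[~df.duplicated(subset=keys, keep=False)]

matched = unique_on_keys(ch_early).merge(
    unique_on_keys(ch_late), on=keys,
    suffixes=("_early", "_late"), validate="one_to_one",
)
print(matched)
```

The columns child_pos_early and child_pos_late then show which position each child held in the two waves.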
In Wave 4, children named by the respondents in the CH module cannot be linked directly to the SP and FT modules. The reason is a change in the so-called list with relations, which comprises all persons in the respondents' social environment. Information on persons receiving or providing social support or financial transfers from/to the respondents is based on this list. Unlike in Waves 1 and 2, in which the list included up to 9 children, the list with relations in Wave 4 includes up to 7 social network members and just one 'other child' option. Only those children named by the respondents as members of their social network are explicitly listed for the interviewer on the screen. It is thus not possible to specify children for questions on social support or financial transfers who are not named as social network members (for whatever reason).
The questions DN026_1 and DN026_2 record whether the respondents' natural parents are still alive (dn026_1 for the respondents' mother and dn026_2 for the respondents' father). The routing for these variables involves information from previous waves for respondents who already participated, as well as information from the social network module. As with other variables in SHARE, the number of missing values can be reduced by merging the Wave 4 data with previous waves. Based on the assumption that persons belonging to the respondent's social network are still alive, the proportion of missing values can be reduced further by using the sn005* variables. Unfortunately, the routing for DN026_1 and DN026_2 did not work adequately for all respondents in the Wave 4 questionnaire. Not every respondent who should have been asked was actually asked, so a large number of missing values remains.
To compensate for this shortcoming, all participants with missing information in DN026_1 and DN026_2 received these questions in Wave 5.