Newsletter
Subscribe to the SHARE newsletter by sending an e-mail to info@share-project.org






Frequently Asked Questions

In order to search for keywords, please use your browser's search function (CTRL + F).

1. Data Access

1.1 Who can use the data?
1.2 Where can I download the data? 
1.3 Which data formats are available?
1.4 What can I do if I lost my username and/or password for downloading the data?

2. Documentation 

2.1 Which documentation files exist?
2.2 Which types of questionnaires are used in SHARE?  

3. Methodology

3.1 How are the data collected?
3.2 Who is eligible?
3.3 Why are there different types of respondents?
3.4 What are proxy-interviews?
3.5 How are issues of attrition dealt with?
3.6 Is there a data set that links administrative data and the SHARE data?
3.7 How is SHARE ethically approved?

4. Structure and Content

4.1 What information does the SHARE questionnaire contain?
4.2 What is the content of SHARELIFE?
4.3 What is easySHARE?
4.4 Does SHARE contain information on race/ethnicity?
4.5 What kind of information is provided by the interviewer observation module (IV)?
4.6 What is a "drop off" questionnaire?
4.7 What are "vignettes"?
4.8 What kind of physical measurements are included in SHARE?
4.9 How is mortality assessed?

5. Handling of the Data

5.1 How can I merge the data?
5.2 Why does the case number of the coverscreen not match with the other modules?
5.3 How can I identify partners?
5.4 Why do some variables like education or height contain so many missing values?
5.5 What are “unfolding brackets”?
5.6 What is the general naming format of variables?
5.7 What is the ado-file sharetom good for?
5.8 Is longitudinal analysis about respondents' children possible?
5.9 Can the children of the CH module be linked to information on social networks (SN), social support (SP) and financial transfers (FT)?
5.10 Why do the variables on whether natural parent is still alive (dn026_1 and dn026_2) contain so many missing values in Wave 4?

6. Generated Variables

6.1 What is the purpose of generated variables and which generated variables are provided?
6.2 What is the allwaves-coverscreen good for?
6.3 Can SHARE data be linked to administrative data?
6.4 What is the content of gv_exrates?
6.5 What is the Job Episodes Panel (JEP)?
6.6 Which generated health variables are provided?
6.7 How is education measured in SHARE?
6.8 What information does the gv_isco module contain?
6.9 What kind of geographical information is available in the gv_housing module?
6.10 How are social networks captured in SHARE?
6.11 How is deprivation measured?
6.12 Does SHARE contain a measure for social security wealth?
6.13 When do I need gv_grossnet?
6.14 What information does gv_children contain?
6.15 Which generated variables are stored in gv_dbs?

7. Weights

7.1 There are many SHARE papers in which researchers don't use weights in data analyses. Is there a general strategy regarding this topic?
7.2 Which weights should be used for cross-sectional analyses and which for longitudinal analyses?
7.3 In section 8.7 of the Wave 4 Innovations & Methodology it is mentioned that a weighting do-file and a Stata command cweight.ado are available for SHARE users. Where can I find them?
7.4 What is the difference between the weights across waves?
7.5 Can we drop sample observations with missing weights?

8. Imputations

8.1 What is the imputation method? 
8.2 Can the imputed variables be used for longitudinal analysis?
8.3 Why does the imputations module contain so many cases?
8.4 Why do the imputed variables in Release 5.0.0 onwards differ from those of previous releases?
8.5 Why are there two variables for household income?
8.6 Are monetary amounts in the imputation data set Euro converted?

 

1. Data Access

1.1 Who can use the data?

After filling out a user statement (available here), every person with a scientific affiliation can download the data free of charge, provided the data are used for purely scientific purposes. More information on the conditions for data access is available here.


1.2 Where can I download the data?

SHARE data are distributed by CentERdata, which is located on the Tilburg University campus in the Netherlands. The download procedure and the conditions for data access are described here.


1.3 Which data formats are available?

SHARE data are provided in Stata and SPSS format. easySHARE is additionally available for the software R. For use with other statistical software, users have to convert the data themselves.


1.4 What can I do if I lost my username and/or password for downloading the data?

If you lost your username and/or password for downloading the data, please enter your email address here and you will receive a reminder email with your password. If you do not remember the email address you used for registration, please contact the SHARE Research Data Center.


 

2. Documentation

2.1 Which documentation files exist?

A set of documentation files is offered to facilitate the use of the SHARE data. The data resource profile published in the International Journal of Epidemiology provides a compact overview of SHARE. In addition to the wave- and country-specific questionnaires, the SHARE Release Guide 6.0.0 is specifically directed to researchers working with the data.

Except for Wave 2, there are also wave-specific methodology volumes; methodological changes in Wave 2 are briefly summarized in chapter 8 of the First Results Book (FRB) of Wave 2. Because SHARELIFE diverges from the other waves, its documentation is not based on the documentation files of previous waves. Furthermore, a tool for identifying country-specific deviations is currently available for Waves 1 and 2, and a further tool identifies deviations between Waves 1, 2, 4 (and 5). Table 1 contains the links to the essential documentation files of SHARE.

Table 1: Overview of documentation files for Waves 1 to 6

Release Guides: SHARE Release Guide 6.0.0 (PDF); SHARELIFE Release Guide 6.0.0 (PDF) for Wave 3 (SHARELIFE)
Questionnaires: Waves 1, 2, 3 (SHARELIFE), 4, 5 and 6
Deviations between countries: Waves 1 and 2
Deviations between waves: Waves 1, 2, 4, 5 and 6
Methodology: Waves 1, 3 (SHARELIFE), 4 and 5; for Wave 2, see chapter 8 of the Wave 2 First Results Book (FRB)
Scales Manual: Scales and Multi-Item Indicators (PDF)
Data Resource Profile: Börsch-Supan A. et al. (2013): Data Resource Profile: The Survey of Health, Ageing and Retirement in Europe (SHARE), Int J of Epidemiology

 

 

2.2 Which types of questionnaires are used in SHARE?

SHARE applies a concept of ex-ante harmonization: one common generic questionnaire is translated into the 31 national languages (in some countries more than one language is used) using an internet-based translation tool and is processed automatically in a common CAPI instrument. The generic questionnaire and the country-specific questionnaire versions can be downloaded from the SHARE website (see the links in Table 1). However, some variables that differ strongly between countries require country-specific measurement and ex-post harmonization, for example in the areas of education (ISCED) or occupation (ISCO, NACE).

Apart from the generic and country-specific questionnaires, there are also special questionnaire types such as the coverscreens, drop offs, vignettes and end-of-life questionnaires. The coverscreen is the first module of each interview. It collects basic demographic information about every person currently living in the household and is completed by only one household member on behalf of all household members. The interview usually ends with the self-completion of a paper & pencil questionnaire, the so-called drop off (see question 4.6). Another special self-completion questionnaire is the vignettes questionnaire (see question 4.7), which was collected in Waves 1 and 2; vignettes are intended to improve cross-national comparability. If a respondent died between waves, SHARE tries to conduct an end-of-life interview (see question 4.9) with a proxy respondent. The end-of-life questionnaire mainly contains information on the respondent's living circumstances in the year before death and on the circumstances of death.


3. Methodology

3.1 How are the data collected?

SHARE data collection is based on computer-assisted personal interviewing (CAPI): interviewers conduct face-to-face interviews using a laptop computer on which the CAPI instrument is installed. Personal interviews are necessary for SHARE because they make it possible to carry out physical tests and collect biomarkers. The exceptions are the drop off and vignettes questionnaires, which are completed on paper, and the end-of-life interviews, which can also be conducted via CATI (computer-assisted telephone interviewing). For more details on SHARE data collection, see the methodology volume by Börsch-Supan, A. and H. Jürges (2005).


3.2 Who is eligible?

The SHARE target population consists of all persons aged 50 years and over at the time of sampling who have their regular domicile in the respective SHARE country. A person is excluded if she or he is incarcerated, hospitalized or out of the country during the entire survey period, is unable to speak the country's language(s) or has moved to an unknown address. In Wave 1, all household members born in 1954 or earlier were eligible for an interview. From the second wave onwards, new countries and refreshment samples have only one selected respondent per household, who must have been born in 1956 or earlier in Wave 2, 1960 or earlier in Wave 4, 1962 or earlier in Wave 5 and 1964 or earlier in Wave 6. In addition, in all waves, current partners living in the same household are interviewed regardless of their age.

All SHARE respondents who were interviewed in any previous wave are part of the longitudinal sample. If they have a new partner living in the household, the new partner is also eligible for an interview, regardless of age. Age-eligible respondents who participated are traced and re-interviewed if they move within the country, and end-of-life interviews are conducted if they die. Younger partners, new partners and partners who never participated in SHARE are not traced and are not eligible for an end-of-life interview.
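The birth-year rules above can be expressed as a small lookup. The following is a minimal sketch for illustration only: the dictionary and both function names are hypothetical helpers, not part of the SHARE data or tools. It covers only the age rule and the partner exception; the other exclusion criteria (domicile, language, incarceration, etc.) are ignored, and Wave 3 (SHARELIFE) is left out because the text gives no cutoff for it.

```python
# Hypothetical helper: birth-year cutoffs for new (baseline) respondents,
# copied from the FAQ text. Wave 3 (SHARELIFE) has no cutoff listed there.
BIRTH_YEAR_CUTOFF = {1: 1954, 2: 1956, 4: 1960, 5: 1962, 6: 1964}


def age_eligible(birth_year: int, wave: int) -> bool:
    """True if a person born in `birth_year` meets the age rule of `wave`."""
    if wave not in BIRTH_YEAR_CUTOFF:
        raise ValueError(f"no birth-year cutoff listed for wave {wave}")
    return birth_year <= BIRTH_YEAR_CUTOFF[wave]


def eligible_for_interview(birth_year: int, wave: int,
                           is_current_partner: bool = False) -> bool:
    """Current partners living in the household are eligible regardless of age."""
    return is_current_partner or age_eligible(birth_year, wave)


print(age_eligible(1960, 4))   # True: 1960 is the Wave 4 cutoff
print(age_eligible(1961, 4))   # False
print(eligible_for_interview(1975, 6, is_current_partner=True))  # True
```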


3.3 Why are there different types of respondents?

To save time and reduce the respondents' interview burden, the CAPI main questionnaire is designed so that not every eligible household member is asked every questionnaire module. Household respondents answer questions on housing, household income and consumption on behalf of all household members. Financial respondents answer the financial transfer and asset questions, and family respondents answer the questions on children and social support, in both cases on behalf of the couple. The respondent types are indicated by the variables hou_resp (household respondent), fin_resp (financial respondent) and fam_resp (family respondent) in the cv_r module as well as in the technical variables module. The SHARELIFE questionnaire does not differentiate between respondent types.


3.4 What are proxy-interviews?

If physical and/or cognitive limitations make it too difficult for respondents to complete the interview themselves, they can be assisted by a so-called proxy respondent (“partly proxy” interview). If the proxy respondent answers the entire questionnaire in lieu of the respondent, the interview is referred to as a “fully proxy” interview. Examples of conditions under which proxy interviewing is allowed are hearing loss, speaking problems, Alzheimer's disease and difficulties in concentrating for the whole duration of the interview. Proxy respondents are also asked to give end-of-life interviews if a respondent has died. Some questionnaire modules are defined as non-proxy sections because they cannot be answered by other persons: cognitive functioning, mental health (partly), grip strength, walking speed, activities and expectations. For all other sections, the end of each module records who answered it: (1) respondent only, (2) respondent and proxy or (3) proxy only.


3.5 How are issues of attrition dealt with?

Sample attrition means that respondents drop out of the survey over time, for many different reasons. For a longitudinal sample drawn randomly at the beginning of the data collection, attrition would not pose any problems if it occurred randomly, which is not the case in reality. Besides refreshing the sample in several countries (which also depends on funding), SHARE deals with sample attrition by making special efforts to re-interview respondents who participated in previous waves and by providing calibrated weights. Under certain conditions, these weights may help to reduce the potential selectivity bias generated by sample attrition and unit nonresponse.


3.6 Is there a data set that links administrative data and the SHARE data?

Survey data can cover a wide range of topics. However, a survey cannot cover all topics of interest, and the information provided by respondents may be incomplete or inaccurate. Administrative data are more accurate but usually limited to certain topics. Linking survey data with administrative data is a way to combine the best of both worlds. Upon respondents' written consent, administrative data of the German Pension Fund can be linked to the survey data of the German subsample of SHARE (SHARE-RV). For more information on this project, see question 6.3.


 

3.7 How is SHARE ethically approved?

The SHARE study is subject to continuous ethics review. During Waves 1 to 4, SHARE was reviewed and approved by the Ethics Committee of the University of Mannheim. Wave 4 and the continuation of the project were reviewed and approved by the Ethics Council of the Max Planck Society. In addition, the country implementations of SHARE were reviewed and approved by the respective ethics committees or institutional review boards whenever this was required. The numerous reviews covered all aspects of the SHARE study, including sub-projects, and confirmed that the project complies with the relevant legal norms and that the project and its procedures agree with international ethical standards. Please see the overview and summary of the ethics approvals for more information.


 

 

4. Structure and Content

4.1 What information does the SHARE questionnaire contain?

The SHARE interview consists of various thematic blocks or modules. Prior to the main interview, the coverscreen (cv_r module) is completed by one household member on behalf of the household. The main questionnaire consists of the CAPI modules listed in Table 2. Not every module was part of every wave, both to pick up contemporary issues and because of changes to the questionnaire and time constraints.

Table 2: Questionnaire modules of Waves 1, 2, 4, 5 and 6

 

Questionnaire-Modules

Wave 1

Wave 2

Wave 4

Wave 5

Wave 6

CV_R

Coverscreen on individual level

X

X

X

X

X

DN

Demographics and Networks

X

X

X

X

X

SN

Social Networks

 

 

X

 

X

CH

Children

X

X

X

X

X

PH

Physical Health

X

X

X

X

X

BR

Behavioral Risks

X

X

X

X

X

CF

Cognitive Function

X

X

X

X

X

MH

Mental Health

X

X

X

X

X

HC

Health Care

X

X

X

X

X

EP

Employment and Pensions

X

X

X

X

X

IT

Computer Use

 

 

 

X

 

MC

Mini Childhood

 

 

 

X

 

GS

Grip Strength

X

X

X

X

X

WS

Walking Speed

X

X

 

 

 

CS

Chair Stand

 

X

 

X

 

BS

Blood Spot

 

 

 

 

X

PF

Peak Flow

 

X

X

 

X

SP

Social Support

X

X

X

X

X

FT

Financial Transfers

X

X

X

X

X

HO

Housing

X

X

X

X

X

HH

Household Income

X

X

X

X

X

CO

Consumption

 X

 X

X

X

AS

Assets

X

X

X

X

X

AC

Activities

X

X

X

X

X

EX

Expectations

X

X

X

X

X

IV

Interviewer Observations

 X

X

X

X

 

Special Questionnaire Modules

 

 

 

 

 

XT

End-of-Life Interview

 

X

X

X

 

DO

Drop-off

X

X

X

X

X

VI

Vignettes

X

X

 

 

 

TC

Technical Variables

X

X

X

X

X

 


4.2 What is the content of SHARELIFE?

The SHARELIFE questionnaire has a different focus than the regular waves. It covers all important areas of the respondents' life histories, ranging from childhood conditions, partners and children through housing, financial and employment history to detailed questions on health and health care. Table 3 lists the questionnaire modules of SHARELIFE. Additionally, some single questions on household income (HH) and present physical health (PH) are included.


Table 3: Questionnaire modules of SHARELIFE

CV_R: Coverscreen on individual level
ST: Demographics
RC: Retrospective Children
RP: Retrospective Partner
AC: Accommodation Section
CS: Childhood Section
RE: Retrospective Employment
WQ: Work Quality
DQ: Disability
FS: Financial History Section
HS: Health Section
HC: Health Care
GL: General Life
GS: Grip Strength
IV: Interviewer Observations
XT: End-of-Life Interview


 

4.3 What is easySHARE?

easySHARE is a simplified, HRS-adapted dataset for student training and for researchers who have little experience in quantitative analyses of complex survey data. easySHARE stores information on all respondents from all currently released data collection waves in one single dataset. For the subset of variables covered in easySHARE, the complexity was considerably reduced. easySHARE is stored as a long-format panel dataset. In addition to the data and the release guide, the download zip files include the Stata programme that was used to extract easySHARE from the regular SHARE data. This allows users to retrace how each variable was extracted and modified and makes it easier to add or change information. It can also serve as an example of how to create an analysis dataset yourself. For more information, please click here.


 

4.4 Does SHARE contain information on race/ethnicity?

No, SHARE does not contain information about race/ethnicity. However, it contains the respondents' country of birth (dn004_ + dn005c) and citizenship (dn007_ + dn008c), both available in the demographics module. From Wave 5 onwards, SHARE also includes the country of birth of the respondents' parents (dn504c + dn505c), which enables the identification of second-generation migrants. Citizenship and country of birth are coded according to ISO 3166-1 (numeric-3).


4.5 What kind of information is provided by the interviewer observation module (IV)?

This module is answered by the interviewer right after finishing the interview. It contains information on the interviewing experience which is important in order to understand the circumstances under which the interview was conducted.


4.6 What is a “drop off” questionnaire?

In Waves 1, 2, 4, 5 and 6, the interview ends with the self-completion of a paper & pencil questionnaire. This questionnaire includes additional questions on, e.g., mental and physical health, health care and social networks. Part of the content of the drop off questionnaire is country-specific; the Wave 4 drop off questionnaire in particular contains many country-specific questions in addition to a generic part on health and health care.

In the drop off data, generic variables have names starting with “q”, while country-specific variables carry the country code as a prefix, e.g. “at_” for Austria. The drop offs differ across waves because new questions were added and others were dropped. In addition, some questions of the Wave 1 drop off were asked in the CAPI in Wave 2 (see the appendix of the SHARE Release Guide 6.0.0).


4.7 What are “vignettes”?

For the vignettes, extra samples were drawn in eight countries in Wave 1 (BE, DE, FR, GR, IT, NL, SP, SW) and in eleven countries in Wave 2 (BE, CZ, DK, DE, FR, GR, IT, NL, PL, SP, SW) to administer a special self-completion questionnaire with anchoring vignette questions, which are intended to improve cross-national comparability. Two versions (types) of the questionnaire were randomly assigned to the respondents. They differ with regard to question order and the gender of the people described in the statements. The variable “type” contains information on the vignette type, and the variable labels show which question of the other type each variable corresponds to.


4.8 What kind of physical measurements are included in SHARE?

Physical measurements and biomarkers are part of SHARE because of their promising scientific value. Standard health questions are often subject to the respondents' evaluation or perception. Objective measurements can help (1) to validate respondents' self-reports, (2) to understand the complex relationships between social status and health and their physiological pathways and (3) to identify pre-disease pathways. SHARE combines self-reports on health with four physical performance measurements: grip strength (GS), walking speed (WS), peak flow (PF) and chair stand (CS). Additionally, dried blood spot (DBS) samples were collected in SHARE Wave 6 in 12 countries: Belgium, Denmark, Estonia, France (subsample only), Germany, Greece, Israel, Italy, Slovenia, Spain, Sweden and Switzerland. The DBS data are not yet available. Please subscribe to the SHARE users' newsletter and/or check our homepage to be informed as soon as the data become available.


4.9 How is mortality assessed?

SHARE requires interviewers to have the death of a respondent confirmed by a proxy respondent. If a respondent has died, interviewers try to conduct an end-of-life interview with a proxy respondent, who can be a family member, a household member, a neighbor or any other person from the deceased respondent's close social network. The end-of-life interview mainly contains information on the circumstances of death, such as time and cause of death. The variables are stored in the xt module from Wave 2 onwards. Apart from the end-of-life interview, the gv_allwaves_cv_r module contains the variables deadoralive, deceased_year, deceased_month and deceased_age.


 

5. Handling of the Data

5.1 How can I merge the data?

To merge different modules and/or waves of the SHARE data at the individual level, use mergeid, the key person identifier, which does not vary across waves. To merge the data at the household level, use one of the hhid`w’ variables (where `w’ stands for the respective wave) as the key identifier.
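SHARE itself is distributed as Stata and SPSS files, so in practice this is done with those packages' own merge commands. The Python below is only a language-neutral sketch of the logic of a 1:1 merge on mergeid, assuming each module is a list of row dictionaries. The function name, the example identifiers and the columns female and height are hypothetical.

```python
def merge_1to1(left, right, key="mergeid"):
    """Inner 1:1 merge of two tables (lists of row dicts) on `key`.

    Raises ValueError if `key` is not unique in either table, because within
    one module and wave each mergeid should identify exactly one person.
    If both tables share a non-key column, the value from `right` wins.
    """
    right_index = {}
    for row in right:
        if row[key] in right_index:
            raise ValueError(f"duplicate {key} in right table: {row[key]!r}")
        right_index[row[key]] = row

    merged, seen = [], set()
    for row in left:
        k = row[key]
        if k in seen:
            raise ValueError(f"duplicate {key} in left table: {k!r}")
        seen.add(k)
        if k in right_index:  # keep only persons present in both modules
            merged.append({**row, **right_index[k]})
    return merged


# Made-up rows standing in for two modules of the same wave.
dn = [{"mergeid": "XX-000001-01", "female": 1},
      {"mergeid": "XX-000001-02", "female": 0}]
ph = [{"mergeid": "XX-000001-01", "height": 165},
      {"mergeid": "XX-000002-01", "height": 180}]

print(merge_1to1(dn, ph))
# [{'mergeid': 'XX-000001-01', 'female': 1, 'height': 165}]
```

Checking uniqueness before merging catches the most common mistake, namely accidentally stacking two waves before a merge that should be 1:1.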


5.2 Why does the case number of the coverscreen not match with the other modules?   

The coverscreen (cv_r) includes all members of the household, including ineligible and non-responding household members. The number of cases in the other regular CAPI modules is lower because these modules only include persons who were interviewed. Household members without an interview can be identified by the variable interview in the cv_r module, which takes the value 0 for those who were not interviewed.
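As a small sketch of this check, assuming rows are held as Python dictionaries: only the code 0 for "not interviewed" comes from the text above; the value 1 for interviewed persons and all identifiers are made up for illustration.

```python
# Made-up coverscreen rows: one household with three members, one of whom
# was not interviewed (interview == 0).
cv_r = [
    {"mergeid": "XX-000001-01", "interview": 1},
    {"mergeid": "XX-000001-02", "interview": 1},
    {"mergeid": "XX-000001-03", "interview": 0},
]
# Made-up rows from a regular CAPI module: only interviewed persons appear.
dn = [{"mergeid": "XX-000001-01"}, {"mergeid": "XX-000001-02"}]

not_interviewed = {r["mergeid"] for r in cv_r if r["interview"] == 0}
interviewed = {r["mergeid"] for r in cv_r} - not_interviewed

print(len(cv_r), len(dn))                          # 3 2
print(interviewed == {r["mergeid"] for r in dn})   # True
```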


5.3 How can I identify partners?

In SHARE, partners can be identified by the variable mergeidp`w’ (where `w’ stands for the respective wave), which contains the mergeid of the respondent's partner. Each couple has a coupleid, stored in the variable coupleid`w’. The coupleid is generated from the mergeids of both partners and is therefore unique to each couple and fixed across waves as long as the couple stays the same.
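A typical use of mergeidp`w’ is attaching a partner's characteristics to each respondent's row. The sketch below assumes one wave's data as a list of dictionaries, uses mergeidp6 as an illustrative wave-specific column name, and assumes (hypothetically) that an empty string marks respondents without a partner; the helper name and the age column are made up.

```python
def add_partner_value(rows, var, partner_col="mergeidp6", id_col="mergeid"):
    """Copy each respondent's partner's value of `var` into `var + '_p'`.

    Rows whose partner identifier is empty, or whose partner has no row
    in `rows` (e.g. a partner who was not interviewed), get None.
    """
    by_id = {r[id_col]: r for r in rows}
    out = []
    for r in rows:
        partner = by_id.get(r.get(partner_col))
        new_row = dict(r)
        new_row[var + "_p"] = partner[var] if partner is not None else None
        out.append(new_row)
    return out


rows = [
    {"mergeid": "XX-000001-01", "mergeidp6": "XX-000001-02", "age": 67},
    {"mergeid": "XX-000001-02", "mergeidp6": "XX-000001-01", "age": 64},
    {"mergeid": "XX-000002-01", "mergeidp6": "", "age": 80},
]
for r in add_partner_value(rows, "age"):
    print(r["mergeid"], r["age_p"])
# XX-000001-01 64
# XX-000001-02 67
# XX-000002-01 None
```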


5.4 Why do some variables like education or height contain so many missing values?

The reason is that time-constant variables are only asked in the baseline interview, i.e. the first SHARE interview of each respondent. SHARE's sample is refreshed from time to time in several countries, which is why the baseline interview is not necessarily the Wave 1 interview.

Height is one example of such a time-constant variable. To use these variables in waves later than the baseline interview, users have to transfer the information by first merging the waves and then assigning the baseline values to the later waves. Furthermore, some questions for the longitudinal sample, e.g. on marital status, are only asked if there was a change since the last interview. This also leads to a large number of missing values in the respective variables.
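The "merge, then assign to later waves" step is essentially a forward fill within each person. The sketch below assumes the waves have already been stacked into a long-format list of dictionaries with hypothetical columns mergeid, wave and height, and that SHARE's missing-value codes have already been turned into None; the helper name is made up.

```python
def carry_forward(rows, var, id_col="mergeid", wave_col="wave"):
    """Fill missing values (None) of a time-constant `var` from the same
    person's most recent earlier wave with a non-missing value.

    `rows` is a long-format list of dicts, one per person and wave.
    Returns new dicts sorted by person and wave; the input is not modified.
    """
    last_seen = {}
    out = []
    for r in sorted(rows, key=lambda row: (row[id_col], row[wave_col])):
        new_row = dict(r)
        if new_row.get(var) is None:
            new_row[var] = last_seen.get(r[id_col])   # None if never observed
        else:
            last_seen[r[id_col]] = new_row[var]
        out.append(new_row)
    return out


rows = [
    {"mergeid": "XX-000001-01", "wave": 4, "height": None},
    {"mergeid": "XX-000001-01", "wave": 1, "height": 172},
    {"mergeid": "XX-000001-01", "wave": 2, "height": None},
    {"mergeid": "XX-000002-01", "wave": 5, "height": None},
]
for r in carry_forward(rows, "height"):
    print(r["mergeid"], r["wave"], r["height"])
# XX-000001-01 1 172
# XX-000001-01 2 172
# XX-000001-01 4 172
# XX-000002-01 5 None
```

The second person keeps a missing value because no non-missing baseline value exists in the data passed in.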


5.5 What are “unfolding brackets”?

When a respondent does not know (DK) or refuses to give (RF) the answer to a question about amounts of money, an unfolding sequence of bracket questions usually starts. The aim of unfolding brackets is to obtain at least a range in which, e.g., the respondent's income lies.

There are three entry points, and the starting point is chosen randomly. The public release includes the country-specific bracket values (in Euros) and the respondent's final category. If a DK or RF is given during the unfolding bracket sequence, the final category is set to DK or RF, respectively. The names of unfolding bracket variables contain “ub” after the module identifier and question number (see question 5.6 for the general naming format). For more information on unfolding brackets, see chapter 10.5 of the SHARE Release Guide 6.0.0.
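The general idea of an unfolding bracket sequence with a random entry point can be sketched as follows. This is a simplified, generic illustration, not SHARE's actual CAPI routing: the function name, the callback interface and the bracket values are all hypothetical, and the real routing and country-specific values are documented in the Release Guide.

```python
import random


def unfold(thresholds, entry, is_more_than):
    """Run a simplified unfolding-bracket sequence.

    thresholds   -- sorted bracket values, e.g. three amounts in euros
    entry        -- index of the threshold asked first (the entry point)
    is_more_than -- callback answering "Is the amount more than t?" with
                    True, False, "DK" or "RF"

    Returns the final bracket index 0..len(thresholds), where bracket k means
    thresholds[k-1] < amount <= thresholds[k] (open-ended at both ends),
    or "DK"/"RF" if the respondent stops answering during the sequence.
    """
    lo, hi = 0, len(thresholds)      # the amount lies in a bracket in [lo, hi]
    i = entry
    while lo < hi:
        reply = is_more_than(thresholds[i])
        if reply in ("DK", "RF"):
            return reply             # final category set to DK or RF
        if reply:
            lo = i + 1               # amount is above thresholds[i]
            i = lo                   # next: ask the next-higher threshold
        else:
            hi = i                   # amount is at or below thresholds[i]
            i = hi - 1               # next: ask the next-lower threshold
    return lo


thresholds = [100, 500, 1000]               # made-up bracket values
respondent = lambda t: 700 > t              # true (unreported) amount: 700
entry = random.randrange(len(thresholds))   # random entry point
print(unfold(thresholds, entry, respondent))    # 2, whatever the entry point
print(unfold(thresholds, 0, lambda t: "DK"))    # DK
```

Whichever entry point is drawn, the sequence ends in the same bracket; only the number of questions asked differs.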


5.6 What is the general naming format of variables?

The naming of variables is harmonized across waves. Variable names in the CAPI instrument data follow the format mmXXXyyy_LL: “mm” is the module identifier, e.g. DN for the demographics module; “XXX” is the question number, e.g. 001; “yyy” are optional characters for dummy variables (indicated by “d”), euro conversion (indicated by “e”) or unfolding brackets (indicated by “ub”). The separator “_” is followed by the optional characters “LL”, which indicate a category or loop (“outer loop”).
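The naming pattern can be captured in a regular expression, which is handy for selecting all variables of one module or question. This is a hypothetical parser written from the description above, not an official SHARE tool; it deliberately accepts any lowercase suffix (such as the “c” in dn005c, which appears elsewhere in this FAQ) rather than only d, e and ub.

```python
import re

# Hypothetical parser for the mmXXXyyy_LL pattern described above.
VAR_NAME = re.compile(
    r"^(?P<module>[a-z]{2})"     # mm: two-letter module identifier, e.g. dn
    r"(?P<question>\d{3})"       # XXX: three-digit question number
    r"(?P<suffix>[a-z]+\d*)?"    # yyy: optional suffix such as d, e or ub
    r"(?:_(?P<loop>\w*))?$",     # _LL: optional separator plus category/loop
    re.IGNORECASE,
)


def parse_name(name):
    """Split a variable name into its parts; unused parts come back as None."""
    m = VAR_NAME.match(name)
    if m is None:
        raise ValueError(f"{name!r} does not follow the mmXXXyyy_LL pattern")
    return m.groupdict()


for name in ["dn026_1", "dn004_", "dn005c"]:
    print(name, parse_name(name))
# dn026_1 {'module': 'dn', 'question': '026', 'suffix': None, 'loop': '1'}
# dn004_ {'module': 'dn', 'question': '004', 'suffix': None, 'loop': ''}
# dn005c {'module': 'dn', 'question': '005', 'suffix': 'c', 'loop': None}
```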


5.7 What is the ado-file sharetom good for?

The ado-file sharetom is a programme that recodes missing values and labels them appropriately. If users want to apply sharetom.ado, we recommend running it immediately after opening the data file or after merging the required modules. Note that sharetom is updated from time to time. The current version is sharetom5.


5.8 Is longitudinal analysis about respondents' children possible?

For longitudinal analyses of children, users cannot rely on the order of the children in the CH module. Instead, children have to be matched on gender and year of birth, which leads to correct matches in most cases. There are several reasons for this. First, respondents are supposed to report on their children in a defined order, but they do not necessarily do so. Second, partners may change, and respondents are always supposed to report on both partners' children. Third, reporting errors can never be ruled out.
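Matching on gender and year of birth can be sketched as below for the children reported by one respondent in two waves. The field names gender, yob and pos and their codes are hypothetical stand-ins, not SHARE variable names. The design choice is conservative: keys that are not unique in both waves (for example same-sex twins) are left unmatched for manual inspection rather than guessed.

```python
from collections import Counter


def match_children(kids_a, kids_b):
    """Match one respondent's children across two waves on (gender, year of birth).

    Only keys that occur exactly once in each wave are matched; ambiguous
    keys (e.g. same-sex twins) stay unmatched. Returns (child_a, child_b) pairs.
    """
    def key(child):
        return (child["gender"], child["yob"])

    count_a = Counter(key(c) for c in kids_a)
    count_b = Counter(key(c) for c in kids_b)
    by_key_b = {key(c): c for c in kids_b}
    return [
        (c, by_key_b[key(c)])
        for c in kids_a
        if count_a[key(c)] == 1 and count_b[key(c)] == 1
    ]


# Made-up data: the same four children reported in a different order.
wave4 = [{"gender": 2, "yob": 1968, "pos": 1}, {"gender": 1, "yob": 1971, "pos": 2},
         {"gender": 1, "yob": 1975, "pos": 3}, {"gender": 1, "yob": 1975, "pos": 4}]
wave5 = [{"gender": 1, "yob": 1971, "pos": 1}, {"gender": 2, "yob": 1968, "pos": 2},
         {"gender": 1, "yob": 1975, "pos": 3}, {"gender": 1, "yob": 1975, "pos": 4}]

pairs = match_children(wave4, wave5)
print([(a["pos"], b["pos"]) for a, b in pairs])   # [(1, 2), (2, 1)]
```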


5.9 Can the children of the CH module be linked to information on social support (SP) and financial transfers (FT)?

In Wave 4, children named by the respondents in the CH module cannot be linked directly to the SP and FT modules. The reason is a change in the so-called list of relations, which comprises all persons in the respondents' social environment; information on persons receiving or providing social support or financial transfers from/to the respondents is based on this list. Unlike in Waves 1 and 2, where the list included up to 9 children, the list of relations in Wave 4 includes up to 7 social network members and just one 'other child' option. Only those children whom the respondents named as members of their social network are explicitly listed for the interviewer on the screen. It is therefore not possible to specify children who were not named as social network members (for whatever reason) in the questions on social support or financial transfers.


5.10 Why do the variables on whether natural parent is still alive (dn026_1 and dn026_2) contain so many missing values in Wave 4?

The questions DN026_1 and DN026_2 record whether the respondents' natural parents are still alive (dn026_1 for the mother and dn026_2 for the father). The routing for these variables uses information from previous waves for respondents who already participated, as well as information from the social network module. As for other variables in SHARE, the number of missing values can be reduced by merging the Wave 4 data with previous waves. Based on the assumption that persons belonging to the respondent's social network are still alive, the proportion of missing values can be reduced further by using the sn005* variables. Unfortunately, the routing for DN026_1 and DN026_2 did not work adequately for all respondents in the Wave 4 questionnaire: not every respondent who should have been asked was actually asked, so a high number of missing values remains.

To compensate for this shortcoming, all participants with missing information in DN026_1 and DN026_2 were asked these questions in Wave 5.
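The two strategies described above for filling gaps in dn026_1/dn026_2 can be combined in one rule. This is a hypothetical sketch: it assumes the answers have already been recoded to True (alive), False (not alive) or None (missing) and that membership in the social network has been derived from the sn005* variables beforehand; the actual SHARE codes must be checked in the documentation. Note the asymmetry: a parent reported as deceased in an earlier wave is still deceased, while "alive" in an earlier wave says nothing about the current wave.

```python
def infer_parent_alive(current, earlier_waves, in_social_network):
    """Combine sources on whether a natural parent is alive.

    current           -- True/False/None answer in the current wave
    earlier_waves     -- answers from previous waves (True/False/None)
    in_social_network -- True if the parent is named in the current wave's
                         social network module

    Returns True, False or None (still unknown).
    """
    if current is not None:
        return current       # a direct answer always takes precedence
    if in_social_network:
        return True          # assumption from the FAQ: network members are alive
    if False in earlier_waves:
        return False         # a parent reported as deceased earlier stays deceased
    return None              # "alive" in an earlier wave is not informative now


print(infer_parent_alive(None, [True, None], True))   # True
print(infer_parent_alive(None, [False], False))       # False
print(infer_parent_alive(None, [True], False))        # None
```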


 

6. Generated Variables

6.1 What is the purpose of generated variables and which generated variables are provided?

To give users an easy and fast entry into cross-national data and to make working with the data convenient, certain variables are provided ready-made, especially those that allow valid comparisons between countries, such as the International Standard Classification of Education (ISCED). Besides internationally standardized variables, there are further generated variables that ease or enhance working with the SHARE data. Table 4 gives an overview of all generated variable modules.

Table 4: Generated variable modules (content and waves in which each module is available)

gv_allwaves_cv_r: Coverscreen information across waves (cross-wave module)
gv_linkage: Linkage to Statutory Pension Insurance data, Germany only (cross-wave module)
gv_exrates: Exchange rates for all waves, incl. nominal and ppp-adjusted exchange rates (cross-wave module)
gv_job_episodes_panel: Job Episodes Panel (cross-wave module)
gv_health: Physical and mental health variables and indices like BMI, EURO-D depression scale, etc. (Waves 1, 2, 4, 5 and 6)
gv_isced: International Standard Classification of Education (ISCED-97; in Wave 5 additionally ISCED-11) (Waves 1, 2, 4, 5 and 6)
gv_isco: Classification of occupations via ISCO and of industries via NACE codes (Wave 1)
gv_housing: Housing and NUTS codes (Waves 1, 2, 4, 5 and 6)
gv_networks: Information on social networks (Waves 4 and 6)
gv_deprivation: Indices for material and social deprivation (Wave 5)
gv_ssw: Social security wealth (Wave 4)
gv_children: Combined children information (Wave 6)
gv_dbs: Dried Blood Spots (Wave 6)
gv_weights: Cross-sectional sampling design weights and calibrated weights (Waves 1 to 6)
gv_longitudinal_weights: Longitudinal weights (cross-wave module)
gv_imputations: Imputations based on the fully conditional specification (FCS) (Waves 1, 2, 4, 5 and 6)


 

6.2 What is the allwaves-coverscreen good for?

This module is a dataset with merged and enriched information from all waves. gv_allwaves_cv_r allows users to monitor, in a straightforward way, household composition, changes of status (Is a respondent part of a couple or not? Is he or she dead or alive? etc.) and the type of interviews conducted.


 

6.3 Can SHARE data be linked to administrative data?

Upon respondents' written consent, administrative data of the German Pension Fund can be linked to the survey data of the German subsample of SHARE. Beginning in Wave 3, all respondents of the German subsample have been asked for consent to link their survey data with administrative data of the German Pension Fund. This longitudinal dataset includes very detailed information on respondents' employment histories. The module gv_linkage provides initial information on who gave consent to link their data with the pension fund data. To get access to the administrative data, researchers have to submit an additional form directly to the data center of the German Pension Fund. Further information on access conditions as well as the user guide and codebook for SHARE-RV is available here.


 

6.4 What is the content of gv_exrates?

This module contains the currencies (including pre-Euro currencies) and exchange rates for non-Euro countries. It stores both nominal exchange rates and exchange rates adjusted for purchasing power parity (ppp-adjusted).


 

6.5 What is the Job Episodes Panel (JEP)?

The JEP is a generated dataset that rearranges information taken from Waves 1 to 3 of SHARE into a ready-to-use “long panel”. It contains the labour market status of each SHARELIFE respondent throughout her/his life. A detailed description of the methodology and assumptions underlying the construction of the dataset is available in SHARE Working Paper 11-2013: “Working life histories from SHARELIFE: a retrospective panel”, by Agar Brugiavini, Danilo Cavapozzi, Giacomo Pasini, and Elisabetta Trevisan. When publishing results based on the SHARE Job Episodes Panel, please include the additional disclaimer described in the corresponding documentation file (PDF), which is available when downloading the data.
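The idea of a "long panel" can be illustrated by expanding retrospective episodes into one record per calendar year. The following is a minimal sketch. It assumes each episode is given as a hypothetical (start year, end year, status) triple and that episodes may not overlap. The actual JEP construction rules, for example for overlapping or incompletely dated spells, are those described in the working paper cited above.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (start_year, end_year, status); not the JEP file structure
Episode = Tuple[int, int, str]


def episodes_to_panel(episodes: List[Episode]) -> Dict[int, str]:
    """Expand non-overlapping life episodes into one status per calendar year."""
    panel: Dict[int, str] = {}
    for start, end, status in sorted(episodes):
        if end < start:
            raise ValueError(f"episode ends before it starts: {(start, end, status)}")
        for year in range(start, end + 1):  # end year is inclusive
            if year in panel:
                raise ValueError(f"overlapping episodes in {year}")
            panel[year] = status
    return panel
```

For instance, `episodes_to_panel([(1960, 1962, "in education"), (1963, 1965, "employed")])` yields six yearly records, one per year from 1960 to 1965.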


 

6.6 Which generated health variables are provided?

The gv_health module contains a broad range of generated health variables and health-related indices describing the respondents’ physical and mental health status. Most of the variables are comparable to those of the US Health and Retirement Study (HRS). Physical health variables include, for example, the US version of self-perceived health (sphus), the body mass index (bmi), the number of chronic diseases (chronic), an index of mobility (mobility) and limitations with instrumental activities of daily living (iadl). Mental health variables include, for example, the EURO-D depression scale (eurod), a measure of orientation to date (orienti) and a numeracy score for mathematical performance (numeracy).
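The body mass index follows the standard definition: weight in kilograms divided by the square of height in metres. This minimal sketch assumes weight in kg and height in cm as inputs. The function name is illustrative, and SHARE's own derivation of bmi (for example, its treatment of implausible values) is not reproduced here.

```python
def body_mass_index(weight_kg: float, height_cm: float) -> float:
    """Standard BMI: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100.0  # convert centimetres to metres
    return weight_kg / height_m ** 2
```

For example, `body_mass_index(70, 175)` gives about 22.9.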


 

6.7 How is education measured in SHARE?

Educational systems differ widely across countries, so a standard coding is required for international comparisons. The gv_isced module contains the 1997 International Standard Classification of Education (ISCED-97). It is provided not only for the respondents’ own educational level but also for that of their children, their former spouses and the interviewers (the latter only in Wave 1). In Waves 1 and 2, the education of up to four selected children was asked; Wave 4 contains the ISCED-97 values for all children. In 2011, UNESCO Member States adopted a revision of ISCED that takes into account significant changes in education systems worldwide since the 1997 revision. From Wave 5 onwards, both ISCED versions are provided in the SHARE data. Furthermore, the educational level of the respondents’ parents is included in Waves 5 and 6.
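Many analyses collapse the seven ISCED-97 levels into broader groups. This minimal sketch assumes the common low/medium/high grouping (levels 0 to 2, 3 to 4, and 5 to 6). That grouping is a widespread convention, not something the SHARE release prescribes. Any code outside 0 to 6, such as missing-value codes or residual categories, is returned as None.

```python
from typing import Optional

# ISCED-97 levels collapsed into three broad groups (a common convention, not a SHARE rule)
ISCED97_GROUPS = {
    0: "low",     # pre-primary education
    1: "low",     # primary education
    2: "low",     # lower secondary education
    3: "medium",  # upper secondary education
    4: "medium",  # post-secondary non-tertiary education
    5: "high",    # first stage of tertiary education
    6: "high",    # second stage of tertiary education
}


def isced97_group(level: int) -> Optional[str]:
    """Map an ISCED-97 level to low/medium/high; any other code yields None."""
    return ISCED97_GROUPS.get(level)
```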


 

6.8 What information does the gv_isco module contain?

Respondents are asked about their own occupation as well as that of their former partner and their parents. For Wave 1, this information is coded according to the International Standard Classification of Occupations (ISCO-88) of the International Labour Organization (ILO). To classify the corresponding industries, the gv_isco module additionally contains a slightly modified version of the Statistical Classification of Economic Activities in the European Community (NACE, version 4 rev. 1 1993).
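ISCO-88 is hierarchical: the first digit of a four-digit unit-group code identifies the major group. This minimal sketch assumes four-digit codes stored either as integers or as digit strings. Zero-padding guards against integer storage dropping the leading 0 of armed-forces codes such as 0110. The function and dictionary names are illustrative.

```python
# ISCO-88 major groups, keyed by the first digit of the four-digit code
ISCO88_MAJOR_GROUPS = {
    "0": "Armed forces",
    "1": "Legislators, senior officials and managers",
    "2": "Professionals",
    "3": "Technicians and associate professionals",
    "4": "Clerks",
    "5": "Service workers and shop and market sales workers",
    "6": "Skilled agricultural and fishery workers",
    "7": "Craft and related trades workers",
    "8": "Plant and machine operators and assemblers",
    "9": "Elementary occupations",
}


def isco88_major_group(code) -> str:
    """Return the ISCO-88 major group for a four-digit unit-group code."""
    # Zero-pad so that an integer such as 110 is read as 0110 (armed forces)
    digits = str(code).zfill(4)
    if len(digits) != 4 or not digits.isdigit():
        raise ValueError(f"not a four-digit ISCO-88 code: {code!r}")
    return ISCO88_MAJOR_GROUPS[digits[0]]
```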


6.9 What kind of geographical information is available in the gv_housing module?

SHARE provides codes from the "Nomenclature of Territorial Units for Statistics" (NUTS), a hierarchical classification system for dividing up the economic territory of the EU. These codes indicate the territorial unit in which each SHARE household was located at the time of sampling and are available at different levels:

• NUTS 1: major socio-economic regions
• NUTS 2: basic regions for the application of regional policies
• NUTS 3: small regions for specific diagnoses

Because of privacy legislation, not every NUTS level is available for every country. For Germany, for example, only NUTS 1 is provided.
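Because NUTS codes are hierarchical, a code at a finer level contains its coarser parents as prefixes. Each code consists of a two-letter country code followed by one character per level (for example DE1 at NUTS 1, DE11 at NUTS 2, DE111 at NUTS 3). This minimal sketch truncates a code to a coarser level. That can help when bringing countries released at different NUTS levels to a common level; codes can only be made coarser, never finer. The function name is illustrative.

```python
def nuts_at_level(code: str, level: int) -> str:
    """Truncate a NUTS code (e.g. 'DE111') to NUTS level 1, 2 or 3."""
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    code = code.strip().upper()
    wanted = 2 + level  # two-letter country prefix plus one character per level
    if len(code) < wanted:
        raise ValueError(f"{code!r} is coarser than NUTS {level}")
    return code[:wanted]
```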


 

6.10 How are social networks captured in SHARE?

The CAPI module on social networks (SN) was introduced in the fourth wave of SHARE as an innovative means of measuring the personal social environment. The module was included again in Wave 6. Its approach goes beyond the more common role-relational method of measuring social networks, which mostly relies on socio-demographic proxies. The SN module contains a detailed description of the respondents’ personal social networks. Each respondent can name up to seven persons whom she/he considers to be her/his confidants. The module records the role relationship of each social network member and obtains information on each named person's gender, residential proximity to the respondent, frequency of contact and level of emotional closeness. Information from the SN module can be linked to the social support (SP) and financial transfers (FT) modules.

The generated variables module “gv_networks” stores variables that summarize information on the different attributes of the network. In Wave 6, the variables additionally summarize panel information and provide full information on each social network member.
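The kind of summary stored in gv_networks can be illustrated with a small sketch. Each named network member is represented by a record with a hypothetical emotional-closeness score (higher meaning closer) and a hypothetical flag for at least weekly contact. From these records the code derives network size, mean closeness and the number of members with frequent contact. The field names and scales are assumptions for illustration, not the actual SHARE variable names or codings.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional

MAX_MEMBERS = 7  # respondents can name at most seven confidants


@dataclass
class NetworkMember:
    # Hypothetical fields; see the SHARE codebook for the real SN variables
    closeness: int        # assumed scale, higher values = emotionally closer
    weekly_contact: bool  # assumed flag: contact at least once a week


def summarize_network(members: List[NetworkMember]) -> dict:
    """Summarize one respondent's social network (illustrative only)."""
    if len(members) > MAX_MEMBERS:
        raise ValueError("the SN module records at most seven members")
    mean_closeness: Optional[float] = (
        mean(m.closeness for m in members) if members else None
    )
    return {
        "size": len(members),
        "mean_closeness": mean_closeness,
        "n_weekly_contact": sum(m.weekly_contact for m in members),
    }
```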


 

6.11 How is deprivation measured?

This module is available in Wave 5 and contains three variables on material and social deprivation: depmat, depsoc and depsev. depmat is an aggregate measure of the material conditions of older individuals in Europe based on 11 items that refer to two broad domains: the inability to afford basic needs and financial difficulties. depsoc is an index of social deprivation based on 15 items. depsev is a single two-dimensional indicator that identifies those with high levels of deprivation in both dimensions. The threshold is the 75th percentile of the total distribution of each deprivation index; individuals whose deprivation measures place them above the threshold in both dimensions are classified as “severely deprived”.
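The depsev rule can be sketched directly from the description above. This minimal sketch assumes unweighted percentiles computed over whatever sample is passed in, and it reads "above the threshold" as strictly greater. Whether SHARE's thresholds use weights or are computed by country is not stated here, so treat both choices as assumptions.

```python
from statistics import quantiles
from typing import List, Sequence


def severe_deprivation(depmat: Sequence[float], depsoc: Sequence[float]) -> List[int]:
    """Flag individuals above the 75th percentile of both deprivation indices."""
    if len(depmat) != len(depsoc):
        raise ValueError("both indices must cover the same individuals")
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is the 75th percentile
    mat_cut = quantiles(depmat, n=4, method="inclusive")[2]
    soc_cut = quantiles(depsoc, n=4, method="inclusive")[2]
    return [int(m > mat_cut and s > soc_cut) for m, s in zip(depmat, depsoc)]
```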


 

6.12 Does SHARE contain a measure for social security wealth?

Since Release 5.0.0, SHARE Wave 4 includes a new generated module containing two measures of individual accrued social security wealth (SSW): SSW_nw and SSW_gw. The former is based on the net wages earned by individuals during their working careers. The latter is based on approximately grossed-up wages and additionally takes into account minimum pension benefits whenever the individual is entitled to them. Note that no information from the JEP was required to compute the SSW of retirees, so SSW_nw and SSW_gw are equal for this group.


 

6.13 When do I need gv_grossnet?

In SHARE Wave 1, income variables were collected before taxes and social insurance contributions. In the following waves, income variables were collected after taxes and social contributions, to capture the notion of take-home pay. To make the income measures comparable across waves and to facilitate longitudinal analyses, the module gv_gross_net contains net income measures derived from the gross incomes reported in SHARE Wave 1. The instrument chosen to carry out this task is EUROMOD, the EU tax-benefit microsimulation model.

A detailed description of the dataset and the method used is available in the SHARE Working Paper 25-2016.


6.14 What information does gv_children contain?

Information on the respondents’ children is collected in various parts of the SHARE questionnaire. The variables in the gv_children module were generated to make this information more easily accessible to SHARE users. The module combines information from the Wave 6 CAPI modules CH, SN, SP and FT. Please be aware that the gv_children variables aggregate information from Wave 6 only, not from previous waves.


 

6.15 Which generated variables are stored in gv_dbs?

In addition to the CAPI variables included in the BS module, some generated variables are provided in gv_dbs. The most important one is dbs_values_exp (“Expected availability of laboratory results”). Results will only be available if (a) there is proof of written consent by the respondent, (b) the DBS sample can be linked to the CAPI interview via its barcode number, and (c) the DBS filter card contains enough blood for at least one analysis. If all of these conditions are met, dbs_values_exp = 1. Further variables in gv_dbs are spots_nr (“Number of blood spots collected”), which ranges from 0 to 5, and spots_co (“Number of blood spots filling pre_printed circle”). The latter indicates how many of the blood spots contain enough blood to cover the pre-printed circle (1 cm in diameter) on the blood collection card.
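The rule behind dbs_values_exp is a simple conjunction of the three conditions. This sketch uses hypothetical boolean inputs standing in for the underlying checks. The released variable may carry further codes (for example for missing information) that are not modelled here. The second function is a plausibility check on the spot counts that follows from their definitions, since spots_co counts a subset of the spots counted in spots_nr.

```python
def dbs_values_expected(written_consent: bool, barcode_linked: bool, enough_blood: bool) -> int:
    """1 if laboratory results can be expected, i.e. all three conditions hold, else 0."""
    return int(written_consent and barcode_linked and enough_blood)


def check_spot_counts(spots_nr: int, spots_co: int) -> None:
    """Plausibility check: spots_nr lies in 0..5 and spots_co cannot exceed it."""
    if not 0 <= spots_nr <= 5:
        raise ValueError(f"spots_nr out of range: {spots_nr}")
    if not 0 <= spots_co <= spots_nr:
        raise ValueError(f"spots_co must lie between 0 and spots_nr: {spots_co}")
```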


 

7. Weights

7.1 There are many SHARE papers in which researchers do not use weights in their data analyses. Is there a general strategy on this topic?

It is not easy to give a general strategy for this question. We refer SHARE users to the paper by Solon, Haider and Wooldridge (2013). The authors distinguish between two types of empirical research: (i) research directed at estimating population descriptive statistics, and (ii) research directed at estimating causal effects (where weighting may be motivated, for example, by achieving more precise estimates by correcting for heteroskedasticity, achieving consistent estimates by correcting for endogenous sampling, or identifying average partial effects in the presence of unmodeled heterogeneity of effects). For the former, weighting is called for to make the analysis sample representative of the target population. Using weighted sample statistics is intuitive and not controversial: population statistics can be consistently estimated by weighted sample statistics. For the latter, the question of whether and how to weight is more nuanced. Researchers have to be clear about the reason for using weighted estimation, think carefully about whether that reason really applies, and double-check with appropriate diagnostics. In situations where researchers might be inclined to weight, it is often useful to report both weighted and unweighted estimates and to discuss what the contrast implies for the interpretation of the results. It is also advisable to use robust standard error estimates.
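Reporting weighted and unweighted estimates side by side is straightforward for a descriptive statistic such as a mean. This minimal sketch assumes the values and the corresponding calibrated weights (for example cciw_w4) are already available as plain lists, and the function names are illustrative. It covers point estimates only; robust or design-based standard errors need a proper survey-estimation routine.

```python
from typing import Dict, Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sample mean: sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_w = sum(weights)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_w


def compare_weighted_unweighted(values: Sequence[float], weights: Sequence[float]) -> Dict[str, float]:
    """Return both estimates and their contrast, for reporting side by side."""
    unweighted = sum(values) / len(values)
    weighted = weighted_mean(values, weights)
    return {"unweighted": unweighted, "weighted": weighted, "difference": weighted - unweighted}
```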


7.2 Which weights should be used for cross-sectional analyses and which for longitudinal analyses?

SHARE provides calibrated cross-sectional and longitudinal weights. For cross-sectional analyses, the calibrated weight to be used depends on the basic sample unit of analysis. For example, in wave 4, this is the variable cciw_w4 if the basic sample unit is the individual and cchw_w4 if the basic sample unit is the household.

For longitudinal analyses, the calibrated weight to be used depends on both the wave combination of interest (i.e. the waves used to form the panel) and the basic sample unit of analysis. For example, for the fully balanced panel (wave combination 1-2-3-4-5-6), this is variable cliw_a if the basic sample unit is the individual, and clhw_a if the basic sample unit is the household.

For longitudinal analyses based on other wave combinations, users are required to compute their own calibrated weights. To support users in this nontrivial methodological task, we provide a Stata ado-file called cweight.ado, which implements the calibration procedure of Deville and Särndal (1992), and Stata do-files that illustrate step by step how to compute calibrated longitudinal weights at the individual and household level. Further information is available here.
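To illustrate what calibration does, the sketch below implements raking ratio adjustment (iterative proportional fitting). Raking corresponds to one of the distance functions in the Deville and Särndal (1992) calibration family. It is not a transcription of cweight.ado, whose options are not reproduced here. Initial weights are rescaled until the weighted totals match known population totals for each calibration variable. In SHARE these variables are age, gender and the NUTS 1 region, as listed in 7.5.

```python
from typing import Dict, Hashable, List, Sequence


def rake(
    weights: Sequence[float],
    groups: Dict[str, Sequence[Hashable]],
    targets: Dict[str, Dict[Hashable, float]],
    max_iter: int = 100,
    tol: float = 1e-10,
) -> List[float]:
    """Raking ratio calibration (iterative proportional fitting).

    weights: initial (e.g. design) weights, one per observation
    groups:  calibration variable -> category of each observation
    targets: calibration variable -> population total per category
    """
    w = [float(x) for x in weights]
    for _ in range(max_iter):
        max_change = 0.0
        for var, cats in groups.items():
            # Current weighted total in each category of this variable
            totals: Dict[Hashable, float] = {}
            for wi, c in zip(w, cats):
                totals[c] = totals.get(c, 0.0) + wi
            # Scale each category so that its weighted total hits the population target
            factors = {c: targets[var][c] / t for c, t in totals.items()}
            max_change = max(max_change, max(abs(f - 1.0) for f in factors.values()))
            w = [wi * factors[c] for wi, c in zip(w, cats)]
        if max_change < tol:  # all margins already matched in this pass
            break
    return w
```

Raking only adjusts marginal totals, so every calibration variable must have a population total for each category present in the sample, and the targets must be mutually consistent (same grand total).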


7.3 In section 8.7 of the Wave 4 Innovations & Methodology it is mentioned that a weighting do-file and a Stata command cweight.ado are available for SHARE users. Where can I find them?

The ado file cweight.ado as well as the other files provided for generating longitudinal calibrated weights can be downloaded from the regular SHARE download website (“Generate Calibrated Weights Using Stata 1.0.0”). Please note that we are currently working on an update of these files that will be available as soon as possible. 


7.4 What is the difference between the weights across waves?

Sampling design weights may differ across waves because of changes in the national sampling designs. Calibrated cross-sectional and longitudinal weights, by contrast, are computed with the procedure of Deville and Särndal (1992) in all waves. The other main differences with respect to previous waves are that: (i) we no longer distinguish between alternative variants of the SHARE sample (i.e. main sample alone, vignette sample alone and the two samples combined); (ii) we no longer provide calibrated cross-sectional weights for non-responding partners, because of substantive changes in the imputation procedure used in Wave 4; (iii) we no longer provide calibrated longitudinal weights for all possible wave combinations of the panel.


 

7.5 Can we drop sample observations with missing weights?

Missing values in sampling design and calibrated weights may be due to (i) age-ineligibility (i.e. respondents younger than 50 years), (ii) missing sampling frame information, (iii) missing information on the set of calibration variables (age, gender, NUTS1 regional code), or (iv) respondents not belonging to the selected balanced sample (only for calibrated longitudinal weights). Observations with missing weights due to (i) are not problematic if we want to make inference on the 50+ population. Since there are very few observations with missing weights due to (ii) and (iii), these observations can in general be dropped for substantive analyses of the SHARE data. Observations with missing longitudinal weights due to (iv) can be more problematic if the process generating the missing observations is not missing-at-random (given the chosen set of conditioning variables). Note that, to compensate for attrition, users may draw on a larger set of conditioning variables by exploiting the information available from the starting wave. Alternative methods, such as weights based on the propensity score and sample selection models, could also be used to impose weaker assumptions on the missing-data mechanism associated with attrition.
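One simple propensity-based adjustment estimates the response propensity as the weighted retention rate within adjustment cells formed from starting-wave variables. Each retained respondent's weight is then divided by that propensity. This minimal sketch assumes the cells have already been defined; the cell definition is a hypothetical user choice. Cells without any retained respondent would have to be collapsed first. A model-based propensity score, for example from a logit model, follows the same logic with a finer conditioning set.

```python
from collections import defaultdict
from typing import Hashable, List, Optional, Sequence


def attrition_adjusted_weights(
    base_weights: Sequence[float],
    cells: Sequence[Hashable],
    responded: Sequence[bool],
) -> List[Optional[float]]:
    """Inflate weights of retained respondents by 1 / (weighted retention rate in their cell).

    cells: adjustment class built from starting-wave variables (hypothetical choice)
    Returns None for sample members lost to attrition.
    """
    total = defaultdict(float)     # weighted starting-wave sample per cell
    retained = defaultdict(float)  # weighted retained sample per cell
    for w, c, r in zip(base_weights, cells, responded):
        total[c] += w
        if r:
            retained[c] += w
    # Estimated response propensity in a cell = retained[c] / total[c]
    return [w * total[c] / retained[c] if r else None
            for w, c, r in zip(base_weights, cells, responded)]
```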


 

8. Imputations

8.1 What is the imputation method? 

Items can be imputed either sequentially by a simple hot-deck method or jointly by the fully conditional specification (FCS) method. Hot-deck imputations are carried out separately by country, while FCS imputations are carried out by country and sample type (singles and 3rd respondents; couples with both partners interviewed; and all couples, with and without non-responding partners). For each wave and country, the FCS method is used only for the monetary variables that have at least 100 donor observations in sample 1 (singles and 3rd respondents) and 150 donor observations in sample 2 (couples with both partners interviewed) and sample 3 (all couples, with and without non-responding partners). Regardless of the imputation method, SHARE provides five multiple imputations of the missing values of each variable.
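The donor principle behind hot-deck imputation, and the idea of several implicates, can be sketched as a random hot-deck within cells. Here the cells stand in for countries. Each missing value receives the value of a randomly drawn observed donor from the same cell, and the whole procedure is repeated five times. SHARE's production procedure is more elaborate than this sketch. Note also a design caveat: drawing donors independently per implicate ignores parameter uncertainty. A proper multiple-imputation hot-deck would use, for example, the approximate Bayesian bootstrap.

```python
import random
from typing import Dict, Hashable, List, Optional, Sequence


def hot_deck(
    values: Sequence[Optional[float]],
    cells: Sequence[Hashable],
    m: int = 5,
    seed: int = 0,
) -> List[List[float]]:
    """Random hot-deck within cells, repeated m times (one completed list per implicate)."""
    rng = random.Random(seed)
    # Observed values form the donor pool of their cell
    donors: Dict[Hashable, List[float]] = {}
    for v, c in zip(values, cells):
        if v is not None:
            donors.setdefault(c, []).append(v)
    implicates = []
    for _ in range(m):
        completed = []
        for v, c in zip(values, cells):
            if v is None:
                if c not in donors:
                    raise ValueError(f"no donors in cell {c!r}")
                completed.append(rng.choice(donors[c]))  # draw a random donor
            else:
                completed.append(v)  # observed values are kept unchanged
        implicates.append(completed)
    return implicates
```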


8.2 Can the imputed variables be used for longitudinal analysis?

Yes, but there are several potential problems. First, users have to check whether the variables of interest have been imputed in all waves of interest and whether the underlying information is fully comparable across waves. Second, users should be aware that the imputation model does not include lagged variables from the previous wave as predictors of the missing values in the current wave. This implies that the imputation model could be less general than the model used to analyze the imputed longitudinal data (see Meng 1994 for a discussion of this uncongeniality issue). To address this issue, we plan to use a more general imputation model in future releases of the SHARE data.


8.3 Why does the imputations module contain so many cases?

SHARE provides multiple imputations of the missing values so that users can account for the additional variability induced by the imputation process when assessing the precision of their estimators (see Rubin 1987). Multiple imputation implies that there are m > 1 imputed values for each missing value; in SHARE, m = 5. Thus, there are 5 independent imputations, indexed by the variable implicat. The copies of each observation differ only in the imputed values; observed values, and hence complete cases, are identical across implicats. Users who want to rely on single imputation (despite our recommendation to take into account the variability induced by the imputation process) can select just one of the five available implicats. Since they are five independent draws from the estimated distribution of the missing values, there is no specific reason to prefer one particular implicat over the others.
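Accounting for the imputation variability means analysing each implicat separately and pooling the results. Rubin's (1987) combining rules do this. The pooled estimate is the mean of the m estimates. Its total variance is T = W + (1 + 1/m) B, where W is the average within-imputation variance and B is the between-imputation variance of the estimates. This minimal sketch assumes a point estimate and its squared standard error have already been computed on each of the five implicats. The degrees-of-freedom adjustment needed for exact confidence intervals is omitted.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def combine_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Pool m implicate-specific results with Rubin's (1987) rules.

    estimates: point estimate from each implicat
    variances: squared standard error from each implicat
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two implicats with matching variances")
    q_bar = mean(estimates)   # pooled point estimate
    w = mean(variances)       # within-imputation variance
    b = variance(estimates)   # between-imputation variance (divides by m - 1)
    total = w + (1 + 1 / m) * b
    return q_bar, total
```

For example, estimates 1, 2, 3, 4, 5 with a variance of 1 each give a pooled estimate of 3 and a total variance of 1 + 1.2 × 2.5 = 4.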


8.4 Why are the imputed variables from Release 5.0.0 onwards different from those of previous releases?

As discussed in the documentation, there are differences in the underlying raw data as well as important innovations in the imputation procedure. Regarding the latter, the main differences with respect to the procedure adopted for the Wave 1 and 2 data in previous releases are: (i) the way of dealing with the problem of non-responding partners, (ii) the use of a smaller set of aggregated variables, (iii) a lower number of predictors, and (iv) the use of two alternative measures of total household income. The aim of these changes is a more reliable imputation model, but it is difficult to assess the implications of all these differences for substantive analyses of the SHARE data.


8.5 Why are there two variables for household income?

Since Wave 2, SHARE has collected data on two different definitions of total household income: thinc is the sum of the imputed individual incomes of all household members, while thinc2 is the measure of total household income collected through question HH017.
In our view, the choice between these two alternative measures is not obvious, and therefore we let users decide which of the two is more suitable for their research questions. Moreover, our imputation model exploits both measures. We therefore strongly encourage users to carry out sensitivity analyses with both measures. This may help us to understand which of the two measures can be considered more reliable on scientific grounds.
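A basic sensitivity check computes the same statistic with both income measures and compares the results. This minimal sketch assumes thinc and thinc2 are available as plain lists for the same households. The function name is hypothetical, and the statistic, here the median, stands in for whatever quantity or model the analysis actually uses.

```python
from statistics import median
from typing import Callable, Dict, Sequence


def income_sensitivity(
    statistic: Callable[[Sequence[float]], float],
    thinc: Sequence[float],
    thinc2: Sequence[float],
) -> Dict[str, float]:
    """Compute the same statistic on both household income measures and their ratio."""
    a = statistic(thinc)
    b = statistic(thinc2)
    return {"thinc": a, "thinc2": b, "ratio": b / a if a else float("nan")}
```

For example, `income_sensitivity(median, thinc, thinc2)` shows how far the median household income shifts when switching from one measure to the other.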


8.6 Are monetary amounts in the imputation data set Euro converted?

Yes, all monetary variables are expressed in annual Euro. Consequently, to adjust for purchasing power parity (PPP) in non-Euro countries, monetary amounts first have to be converted back into local currency using the variable exrate and then turned into PPP-adjusted amounts by dividing the local-currency amount by the PPP exchange rate.
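The two-step conversion can be written as a small function. This minimal sketch assumes that exrate is expressed in local currency units per Euro, so the Euro amount is multiplied by it. It also assumes the PPP exchange rate is in local currency units per PPP unit. Both conventions are assumptions to check against the definitions in gv_exrates before use, and the function name is illustrative.

```python
def euro_to_ppp(amount_eur: float, exrate: float, ppp_rate: float) -> float:
    """Convert an annual Euro amount into a PPP-adjusted amount.

    Assumes exrate = local currency units per Euro and
    ppp_rate = local currency units per PPP unit (verify in gv_exrates).
    """
    if exrate <= 0 or ppp_rate <= 0:
        raise ValueError("rates must be positive")
    local = amount_eur * exrate  # step 1: back into local currency
    return local / ppp_rate      # step 2: divide by the PPP exchange rate
```

Under these assumptions, 100 Euro with exrate = 10 and a PPP rate of 8 corresponds to 1000 units of local currency and 125 PPP-adjusted units.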
