
Frequently Asked Questions



1.  About SHARE

1.1 What is SHARE? 

SHARE (Survey of Health, Ageing and Retirement in Europe) is a multidisciplinary and cross-national ex-ante harmonized panel survey. It provides micro data on physical and psychological health, socio-economic status, demographic characteristics, and social and family networks and support of about 123,000 individuals aged 50 or over and their (younger) partners. It is partly harmonized with the U.S. Health and Retirement Study (HRS) and the English Longitudinal Study of Ageing (ELSA) and has become a role model for several ageing surveys worldwide: KLoSA in South Korea, JSTAR in Japan, CHARLS in China, LASI in India and ELSI in Brazil. For a general overview of SHARE, see the data resource profile published in the International Journal of Epidemiology.


1.2 When and where are the SHARE data collected?

To date, SHARE has collected more than 293,000 interviews in four panel waves on current living circumstances and one wave on retrospective life histories (SHARELIFE). The first wave was collected in 2004/2005, the second in 2006/2007, SHARELIFE in 2008/2009, the fourth wave mainly in 2011 and the fifth wave in 2013. So far, 20 countries have participated in SHARE. However, not all countries took part in every wave, and the timing of data collection differs between countries. Table 1 shows which countries participated and when their data were collected in Waves 1 to 5.

Table 1: Participation of countries in SHARE Wave 1 to Wave 5

| Country ID | Language ID | Country & language | Wave 1 | Wave 2 | Wave 3 (SHARELIFE) | Wave 4 | Wave 5 |
|---|---|---|---|---|---|---|---|
| 11 | 11 | Austria | 2004 | 2006/07 | 2008/09 | 2011 | 2013 |
| 12 | 12 | Germany | 2004 | 2006/07 | 2008/09 | 2011/12 | 2013 |
| 13 | 13 | Sweden | 2004 | 2006/07 | 2008/09 | 2011 | 2013 |
| 14 | 14 | Netherlands | 2004 | 2007 | 2008/09 | 2011 | 2013 |
| 15 | 15 | Spain (Castilian) | 2004 | 2006/07 | 2008/09 | 2011 | 2013 |
| 15 | 39 | Spain/Girona (Catalan) | - | - | - | - | 2013 |
| 15 | 40 | Spain/Girona (Castilian) | - | - | - | - | 2013 |
| 16 | 16 | Italy | 2004 | 2006/07 | 2008/09 | 2011 | 2013 |
| 17 | 17 | France | 2004/05 | 2006/07 | 2009 | 2011 | 2013 |
| 18 | 18 | Denmark | 2004 | 2006/07 | 2008/09 | 2011 | 2013 |
| 19 | 19 | Greece | 2004/05 | 2007 | 2008/09 | - | - |
| 20 | 20 | Switzerland (German) | 2004 | 2006/07 | 2008/09 | 2011 | 2013 |
| 20 | 21 | Switzerland (French) | 2004 | 2006/07 | 2008/09 | 2011 | 2013 |
| 20 | 22 | Switzerland (Italian) | 2004 | 2006/07 | 2008/09 | 2011 | 2013 |
| 23 | 23 | Belgium (French) | 2004/05 | 2006/07 | 2008/09 | 2011 | 2013 |
| 23 | 24 | Belgium (Flemish) | 2004/05 | 2006/07 | 2008/09 | 2011 | 2013 |
| 25 | 25 | Israel (Hebrew) | 2005/06 | 2009/10 | - | - | 2013 |
| 25 | 26 | Israel (Arabic) | 2005/06 | 2009/10 | - | - | 2013 |
| 25 | 27 | Israel (Russian) | 2005/06 | 2009/10 | - | - | 2013 |
| 28 | 28 | Czech Republic | - | 2006/07 | 2008/09 | 2011 | 2013 |
| 29 | 29 | Poland | - | 2006/07 | 2008/09 | 2011/12 | - |
| 30 | 30 | Ireland | - | 2007 | 2009/10/11 | - | - |
| 31 | 41 | Luxembourg (French) | - | - | - | - | 2013 |
| 31 | 42 | Luxembourg (German) | - | - | - | - | 2013 |
| 32 | 32 | Hungary | - | - | - | 2011 | - |
| 33 | 33 | Portugal | - | - | - | 2011 | - |
| 34 | 34 | Slovenia | - | - | - | 2011 | 2013 |
| 35 | 35 | Estonia (Estonian or Russian) | - | - | - | 2010/11 | 2013 (XT only) |
| 35 | 36 | Estonia (Estonian) | - | - | - | - | 2013 |
| 35 | 37 | Estonia (Russian) | - | - | - | - | 2013 |



1.3 How is SHARE organized?

SHARE is an enterprise by researchers for researchers. Many people in various teams all over Europe and beyond are involved in organizing SHARE. The main actors are the country teams, the area teams, the teams providing weights and imputations, the programmers, and the central coordination team. SHARE is coordinated in Germany at the Munich Center for the Economics of Aging (MEA), Max Planck Institute for Social Law and Social Policy. The central coordination team can be contacted via info@share-project.org. Country teams play a crucial role in SHARE, and not only in tasks that require knowledge of the national language or other country-specific issues. The country teams and their leaders are listed here. Area coordinators are responsible for the central research fields of SHARE: economics, health, health care and social networks. A list of the coordinators and members of the research areas is provided here. Weights and imputations are managed by our expert teams in Italy. The programming of the instrument and the data distribution are carried out by CentERdata, located at Tilburg University, Netherlands. The governance of the scientific work to build up SHARE involves three separate bodies: a legal entity called SHARE-ERIC (European Research Infrastructure Consortium), a research consortium formed by the scientists who carry out the scientific work in SHARE, and a Scientific Monitoring Board, which is independent of the two other bodies and advises both SHARE-ERIC and the Research Consortium.


1.4 How is SHARE funded? 

SHARE data collection has been funded primarily by the European Commission through the 5th, 6th and 7th Framework Programmes as well as Horizon 2020. Additional funding from the German Ministry of Education and Research, the U.S. National Institute on Aging and various national sources is gratefully acknowledged. For a full list of funding institutions, see this page.

1.5 Where can I get information on ethics approval of SHARE?

Until July 2011, SHARE was reviewed and approved by the Ethics Committee of the University of Mannheim. Since then, the Ethics Council of the Max Planck Society for the Advancement of Science (MPG) has been responsible for ethical reviews and the approval of the study. Further information on the ethics approvals of SHARE is provided upon individual request. Please address your inquiries concerning this matter to info@share-project.org.


1.6 What is SHARELIFE?

SHARELIFE is the third wave of data collection in SHARE and focuses on respondents’ life histories. SHARELIFE gathered more detailed information on important areas of the respondents’ lives, ranging from childhood conditions, the respondents’ partners and children, housing, financial and employment history to detailed questions on health and health care. SHARELIFE thus complements the SHARE panel data with life history information that enhances our understanding of how early life experiences and events throughout life influence the circumstances of older people. The SHARELIFE data can be linked to the regular SHARE waves.



1.7 What is easySHARE?

easySHARE is a simplified, HRS-adapted dataset for student training and for researchers who have little experience in quantitative analyses of complex survey data. easySHARE stores information on all respondents and all currently released data collection waves in one single dataset. For the subset of variables covered in easySHARE, the complexity was considerably reduced. easySHARE is stored as a long-format panel dataset, i.e. with one row per respondent and wave. In addition to the data and the release guide, the download zip files include the Stata programme that was used to extract easySHARE from the regular SHARE data. This allows users to retrace how each variable was extracted and modified and makes it easier to add or change information. It can also serve as an example of how to create an analysis dataset yourself. For more information please click here.
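To illustrate what a long-format panel looks like, here is a minimal Python sketch that turns per-wave values into one row per respondent and wave. It is not the Stata programme shipped with easySHARE; the mergeid values, the data and the use of sphus as the example variable are hypothetical.

```python
def wide_to_long(wide, variable):
    """Reshape {mergeid: {wave: value}} into long-format rows.

    Each output row is a dict with "mergeid", "wave" and `variable`.
    Waves in which a respondent has no value are simply absent,
    so no empty rows are created.
    """
    rows = []
    for mergeid in sorted(wide):
        for wave in sorted(wide[mergeid]):
            rows.append({"mergeid": mergeid, "wave": wave, variable: wide[mergeid][wave]})
    return rows


# Hypothetical respondents and values, for illustration only.
example = {
    "AT-000001-01": {1: 3, 2: 4, 4: 4},
    "AT-000002-01": {2: 2, 5: 3},
}
long_rows = wide_to_long(example, "sphus")
for row in long_rows:
    print(row)
```

Each respondent contributes as many rows as waves in which they appear, which is why the long format handles unbalanced panels without padding.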


1.8 What is the job episodes panel (JEP)?

The JEP is a generated dataset that rearranges information taken from Waves 1 to 3 of SHARE in order to create a ready-to-use “long panel”. It contains the labour market status of each SHARELIFE respondent throughout her/his life. A detailed description of the methodology and assumptions underlying the construction of the dataset is available in the SHARE working paper 11-2013: “Working life histories from SHARELIFE: a retrospective panel”, by Agar Brugiavini, Danilo Cavapozzi, Giacomo Pasini, and Elisabetta Trevisan. When publishing results based on the SHARE job episodes panel, please use an additional disclaimer as described in the corresponding documentation file (PDF), which is available when downloading the data.


2. Data Access

2.1 Who can use the data?

After filling out a user statement (PDF), every person with a scientific affiliation can download the data free of charge, as long as the data are used for purely scientific purposes. More information on the conditions for data access is available here.


2.2 Where can I download the data?

SHARE data are distributed by CentERdata, which is located on the Tilburg University campus in the Netherlands. The download procedure and the conditions for data access are described here.


2.3 Which data formats are available?

SHARE data are provided in Stata and SPSS formats. easySHARE is additionally available for the software R. Users of other statistical software have to convert the data themselves.


2.4 What can I do if I have lost my username and/or password for downloading the data?

If you have lost your username and/or password for downloading the data, please enter your email address here and you will receive a reminder email with your password. If you do not remember the email address you used for registration, please contact Josette Janssen: jjanssen@uvt.nl


3. Documentation

3.1. Which documentation files exist?

A set of documentation files is offered to facilitate the use of the SHARE data. The data resource profile published in the International Journal of Epidemiology provides a compact overview of SHARE. In addition to the wave- and country-specific questionnaires, release guide 5.0.0 (PDF) is specifically directed at researchers working with the data. Except for Wave 2, there are also wave-specific methodology volumes; methodological changes in Wave 2 are briefly summarized in chapter 8 of the First Results Book (FRB) of Wave 2. Because SHARELIFE diverges from the other waves, its documentation is not based on the documentation files of previous waves. Furthermore, a tool for identifying country-specific deviations is currently available for Waves 1 and 2, and a tool giving an overview of deviations between Waves 1, 2, 4 (and 5) is also provided. Table 2 contains the links to the essential documentation files of SHARE.

Table 2: Documentation files

| Documentation file | Wave 1 | Wave 2 | Wave 3 (SHARELIFE) | Wave 4 | Wave 5 |
|---|---|---|---|---|---|
| Data Resource Profile (all waves) | Börsch-Supan A. et al. (2013): Data Resource Profile: The Survey of Health, Ageing and Retirement in Europe (SHARE), Int J of Epidemiology | | | | |
| Questionnaires | W1-Questionnaire | W2-Questionnaire | SHARELIFE-Questionnaire | W4-Questionnaire | W5-Questionnaire |
| Release Guides | SHARE Release Guide 5.0.0 (PDF) | SHARE Release Guide 5.0.0 (PDF) | SHARELIFE Release Guide 5.0.0 (PDF) | SHARE Release Guide 5.0.0 (PDF) | SHARE Release Guide 5.0.0 (PDF) |
| Methodology | General Methodology | Chapter 8 of W2 FRB | SHARELIFE Methodology | Wave 4 Innovations & Methodology | SHARE Wave 5: Innovations & Methodology |
| Deviations between countries | Country-specific deviations W1 | Country-specific deviations W2 | - | - | - |
| Deviations between waves | Deviations between Waves 1, 2, 4 and 5 | Deviations between Waves 1, 2, 4 and 5 | - | Deviations between Waves 1, 2, 4 and 5 | Deviations between Waves 1, 2, 4 and 5 |



3.2. Which types of questionnaires are used in SHARE?

SHARE applies a concept of ex-ante harmonization: there is one common generic questionnaire that is translated into the 27 national languages (some countries use more than one language) using an internet-based translation tool and is processed automatically in a common CAPI instrument. The generic questionnaire and the country-specific questionnaire versions can be downloaded from the SHARE website (see the links in Table 2). However, some variables that differ strongly across countries require country-specific measurement and ex-post harmonization, for example in the areas of education (ISCED) or occupation (ISCO, NACE).

Apart from the generic and country-specific questionnaires, there are also special questionnaire types such as the coverscreens, drop offs, vignettes and end-of-life questionnaires. The coverscreen is the first module of each interview. It collects basic demographic information about every person currently living in the household and is completed by only one member of the household. The interview usually ends with the self-completion of a paper & pencil questionnaire, the so-called drop off (see question 5.5). Another special self-completion questionnaire is the vignettes questionnaire (see question 5.6), which was collected in Waves 1 and 2. Vignettes are intended to improve cross-national comparability. If a respondent has died between waves, SHARE tries to conduct an end-of-life interview (see question 5.8) with a proxy respondent. The end-of-life questionnaire mainly contains information on the respondent’s life circumstances in the year before death and on the circumstances of death.


3.3. Where can I find a codebook?

A codebook covering all SHARE waves does not exist yet. The SHARE central team in Munich, together with the programmers from CentERdata, is currently working on a web tool that will allow users to generate module- and wave-specific codebooks. Until this tool is available, the Institute for Research and Information in Health Economics (IRDES) offers a codebook for Wave 2 that is accessible here. In general, information on content, coding and routing can be derived from the SHARE questionnaires and release guides (see Table 2).


4. Methodology

4.1 How are the data collected?

SHARE data collection is based on computer-assisted personal interviewing (CAPI). The interviewers conduct face-to-face interviews using a laptop computer on which the CAPI instrument is installed. Personal interviews are necessary for SHARE because they make it possible to carry out physical tests. The exceptions are the drop off and vignettes questionnaires, which are completed with paper & pencil, and the end-of-life interviews, which can also be conducted via CATI (computer-assisted telephone interviewing). For more details on SHARE data collection, see the methodology of Börsch-Supan, A. and H. Jürges (2005).

4.2 Who is eligible?

The SHARE target population consists of all persons aged 50 years and over at the time of sampling who have their regular domicile in the respective SHARE country. A person is excluded if she or he is incarcerated, hospitalized or out of the country during the entire survey period, is unable to speak the country’s language(s) or has moved to an unknown address. In Wave 1, all household members born in 1954 or earlier were eligible for an interview. Starting in Wave 2, there is only one primary respondent per household for new countries or refreshment samples. In addition, in all waves, current partners living in the same household are interviewed regardless of their age.

All SHARE respondents who were interviewed in any previous wave are part of the longitudinal sample. If they have a new partner living in the household, the new partner is eligible for an interview as well, regardless of age. Age-eligible respondents who have participated are traced and re-interviewed if they move within the country, and end-of-life interviews are conducted if they die. Younger partners, new partners and partners who never participated in SHARE are not traced and are not eligible for an end-of-life interview.
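The Wave 1 age rule and the partner rule described above can be written as a small predicate. This Python sketch is illustrative only: the function name and parameters are hypothetical, and the exclusion criteria are collapsed into a single flag that the caller has to determine.

```python
def is_eligible_wave1(birth_year, partner_of_age_eligible=False, excluded=False):
    """Sketch of Wave 1 eligibility for one household member.

    birth_year: the person's year of birth.
    partner_of_age_eligible: True if the person is the current partner,
        living in the same household, of an age-eligible member.
    excluded: True if an exclusion criterion applies (e.g. incarcerated or
        hospitalized for the entire survey period, unable to speak the
        country's language(s)); determining this is left to the caller.
    """
    if excluded:
        return False
    # Household members born in 1954 or earlier were age-eligible in Wave 1;
    # current partners are eligible regardless of age.
    return birth_year <= 1954 or partner_of_age_eligible


print(is_eligible_wave1(1950))                                # True
print(is_eligible_wave1(1960))                                # False
print(is_eligible_wave1(1960, partner_of_age_eligible=True))  # True
```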


4.3 Why are there different types of respondents?

For time reasons, the CAPI main questionnaire is designed so that not every eligible household member is asked every questionnaire module. Household respondents answer the questions on housing, household income and consumption on behalf of all household members. Financial respondents answer the questions on financial transfers and assets on behalf of the couple, and family respondents answer the questions on children and social support, also on behalf of the couple. The respondent types are indicated by the variables hou_resp (household respondent), fin_resp (financial respondent) and fam_resp (family respondent) in the cv_r module as well as in the technical variables module. The SHARELIFE questionnaire does not differentiate between respondent types. Note that in Wave 4 and Wave 5 the financial respondent and the household respondent are in practice the same person.
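Because household-level questions are answered once per household, analyses of household variables usually keep only the household respondent's row. A minimal Python sketch, assuming rows are held as dicts and that hou_resp == 1 marks the household respondent (check the coding in your release); the key hhid stands in for the relevant household identifier, and the data are hypothetical.

```python
def household_respondent_rows(rows):
    """Keep at most one row per household: the household respondent's.

    Assumes hou_resp == 1 identifies the household respondent; the
    key "hhid" stands in for the relevant household identifier.
    """
    seen = set()
    kept = []
    for row in rows:
        if row.get("hou_resp") == 1 and row["hhid"] not in seen:
            seen.add(row["hhid"])
            kept.append(row)
    return kept


rows = [
    {"mergeid": "A-01", "hhid": "A", "hou_resp": 1},
    {"mergeid": "A-02", "hhid": "A", "hou_resp": 0},
    {"mergeid": "B-01", "hhid": "B", "hou_resp": 1},
]
print([r["mergeid"] for r in household_respondent_rows(rows)])  # ['A-01', 'B-01']
```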


4.4 What are proxy-interviews?

If physical and/or cognitive limitations make it too difficult for a respondent to complete the interview her-/himself, the sample respondent can be assisted by a so-called proxy respondent (“partly proxy” interview). If the proxy respondent answers the entire questionnaire in lieu of the respondent, the interview is referred to as a “fully proxy” interview. Examples of conditions under which proxy interviewing is allowed are hearing loss, speaking problems, Alzheimer’s disease and difficulties in concentrating for the whole interview. Proxy respondents are also asked to give end-of-life interviews if a respondent has died. Some questionnaire modules are defined as non-proxy sections because they cannot be answered by other persons: the cognitive functioning, mental health (partly), grip strength, walking speed, activities and expectations modules. At the end of each of the other modules, a question records who answered that module: (1) respondent only, (2) respondent and proxy or (3) proxy only.


4.5 How are issues of attrition dealt with?

Sample attrition means that respondents drop out of the survey over time. For a longitudinal sample drawn randomly at the beginning of the data collection process, attrition would pose no challenge if it occurred randomly, but in reality it does not. Besides refreshing the sample in several countries (which also depends on funding), SHARE deals with sample attrition by devoting special effort to re-interviewing respondents who participated in previous waves and by providing calibrated weights. Under certain conditions, these weights may help to reduce the potential selectivity bias generated by sample attrition and unit nonresponse.
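As a minimal illustration of how such weights enter an estimate, the Python sketch below computes a weighted mean. It assumes the calibrated weight for each respondent has already been merged in (the weight variable names are not given here); it does not reproduce SHARE's calibration procedure and ignores variance estimation. The data are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean, skipping observations with a missing value (None)
    or a missing/zero weight."""
    total = 0.0
    weight_sum = 0.0
    for value, weight in zip(values, weights):
        if value is None or not weight:
            continue
        total += value * weight
        weight_sum += weight
    if weight_sum == 0:
        raise ValueError("no observations with a usable value and weight")
    return total / weight_sum


# Hypothetical data: an indicator variable and calibrated weights.
print(weighted_mean([1, 0, 1, None], [2.0, 1.0, 1.0, 3.0]))  # 0.75
```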


4.6 Is there a data set that links administrative data and the SHARE data?

Survey data can cover a wide range of topics. However, a survey cannot cover every topic of interest, and the information provided by respondents may be incomplete or inaccurate. Administrative data are usually more accurate but limited to a specific topic. Linking survey data with administrative data is a way to combine the best of both worlds. With respondents’ written consent, administrative data of the German Pension Fund can be linked to the survey data of the German subsample of SHARE (SHARE-RV). For more information on this project, see question 7.7.


5. Structure and Content

5.1 What information does the SHARE questionnaire contain?

The SHARE interview consists of various thematic blocks or modules. Prior to the main interview, the coverscreen (cv_r module) is completed by one household member on behalf of the household. The main questionnaire consists of various CAPI modules, which are listed in Table 3. Because modules were added to address contemporary issues and others were altered or dropped owing to time constraints, not every module was part of every wave.

Table 3: Questionnaire modules of Waves 1, 2, 4 and 5

 

Questionnaire-Modules

Wave 1

Wave 2

Wave 4

Wave 5

CV_R

Coverscreen on individual level


X


X


X


X

DN

Demographics and Networks


X


X


X


X

SN

Social Networks

 

 

X

 

CH

Children

X

X

X

X

PH

Physical Health

X

X

X

X

BR

Behavioral Risks

X

X

X

X

CF

Cognitive Function

X

X

X

X

MH

Mental Health

X

X

X

X

HC

Health Care

X

X

X

X

EP

Employment and Pensions

X

X

X

X

IT

IT Module

 

 

 

X

MC

Mini Childhood

 

 

 

X

GS

Grip Strength

X

X

X

X

WS

Walking Speed

X

X

 

 

CS

Chair Stand

 

X

 

X

PF

Peak Flow

 

X

X

 

SP

Social Support

X

X

X

X

FT

Financial Transfers

X

X

X

X

HO

Housing

X

X

X

X

HH

Household Income

X

X

X

X

CO

Consumption

 X

 X

X

AS

Assets

X

X

X

X

AC

Activities

X

X

X

X

EX

Expectations

X

X

X

X

IV

Interviewer Observations

 X

X

X

XT

End-of-Life Interview

 

X

X

X



5.2 What is the content of the SHARELIFE?

The SHARELIFE questionnaire has a different focus from the regular waves. It covers all important areas of the respondents’ life histories, ranging from childhood conditions, partners and children through housing, financial and employment history to detailed questions on health and health care. Table 4 lists the questionnaire modules of SHARELIFE. In addition, a few individual questions on household income (HH) and present physical health (PH) are included.

Table 4: Questionnaire modules of SHARELIFE

| Module | Content |
|---|---|
| CV_R | Coverscreen on individual level |
| ST | Demographics |
| RC | Retrospective Children |
| RP | Retrospective Partner |
| AC | Accommodation Section |
| CS | Childhood Section |
| RE | Retrospective Employment |
| WQ | Work Quality |
| DQ | Disability |
| FS | Financial History Section |
| HS | Health Section |
| HC | Health Care |
| GL | General Life |
| GS | Grip Strength |
| IV | Interviewer Observations |
| XT | End-of-Life Interview |



5.3 Does SHARE contain information on race/ethnicity?

SHARE does not ask about race or ethnicity directly. It only contains the respondents’ country of birth (dn004_ + dn005c) and citizenship (dn007_ + dn008c), both available in the demographics module. From Wave 5 onwards, SHARE data also include the country of birth of the respondents’ parents.


5.4 What kind of information is provided by the interviewer observation module (IV)?

This module is answered by the interviewer right after finishing the interview. It contains information on the interviewing experience which is important in order to understand the circumstances under which the interview was conducted.


5.5 What is a “drop off” questionnaire?

In Waves 1, 2, 4 and 5 the interview ends with the self-completion of a paper & pencil questionnaire. This questionnaire includes additional questions on, for example, mental and physical health, health care and social networks. Part of the content of the drop off questionnaire is country-specific; the Wave 4 drop off questionnaire in particular contains many country-specific questions in addition to a generic part on health and health care. In Wave 5 only three countries conducted a drop-off questionnaire: Austria, the Czech Republic and Israel. Generic variables have names starting with “q”; country-specific variables carry the country code as a prefix, e.g. “at_” for Austria. The drop-offs differ across waves because new questions were added and others were no longer asked. In addition, some questions of the Wave 1 drop off were asked in the CAPI interview in Wave 2 (see the appendix of SHARE Release Guide 5.0.0).
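The naming convention for drop-off variables can be checked mechanically. The Python sketch below sorts variable names into generic and country-specific ones; it assumes country-specific names start with a two-letter code followed by an underscore, as in “at_”, and the example names are hypothetical. A generic name that happened to begin with two letters and an underscore would be misclassified, so treat the result as a first pass.

```python
import re

# Country-specific drop-off variables: two-letter country code plus "_".
COUNTRY_PREFIX = re.compile(r"^([a-z]{2})_")


def classify_dropoff_variable(name):
    """Return ("country-specific", code), ("generic", None) or ("other", None)."""
    lowered = name.lower()
    match = COUNTRY_PREFIX.match(lowered)
    if match:
        return ("country-specific", match.group(1))
    if lowered.startswith("q"):
        return ("generic", None)
    return ("other", None)


for name in ["q2", "at_q5", "mergeid"]:  # hypothetical names
    print(name, classify_dropoff_variable(name))
```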


5.6 What are “vignettes”?

For the vignettes, extra samples were drawn in eight countries in Wave 1 (BE, DE, FR, GR, IT, NL, SP, SW) and in eleven countries in Wave 2 (BE, CZ, DK, DE, FR, GR, IT, NL, PL, SP, SW) in order to administer a special self-completion questionnaire with anchoring vignette questions. These are intended to improve cross-national comparability. Two questionnaire types were randomly assigned to the respondents; they differ in question order and in the gender of the people described in the statements. The variable “type” contains the vignette type. The variable labels show which questions correspond to those of the other type.


5.7 What kind of physical measurements are included in SHARE? 

Physical measurements and biomarkers are part of SHARE because they promise considerable scientific value. Standard health questions are often subject to the respondents’ own evaluation or perception. Objective measurements can help (1) to validate respondents’ self-reports, (2) to understand the complex relationships between social status and health and their physiological pathways, and (3) to identify pre-disease pathways. SHARE combines self-reports on health with four physical performance measurements: grip strength (GS), walking speed (WS), peak flow (PF) and chair stand (CS).


5.8 How is mortality assessed?

SHARE requires interviewers to have the death of a respondent confirmed by a proxy respondent. If the respondent has died, interviewers try to conduct an end-of-life interview with a proxy respondent. The proxy respondent can be a family member, a household member, a neighbor or any other person from the closer social network of the deceased respondent. The end-of-life interview mainly contains information on the circumstances of death, such as the time and cause of death. The variables are stored in the xt module from Wave 2 onwards. Apart from the end-of-life interview, the gv_allwaves_cv_r module contains the variables deadoralive, deceased_year, deceased_month and deceased_age.


6. Handling of the Data

6.1 How can I merge the data?

To merge different modules and/or waves of the SHARE data at the individual level, use mergeid as the key person identifier. If the data are to be merged at the household level, one of the hhid(#) variables should be used as the key identifier. Both mergeid and hhid are constant across waves. The additional wave-specific household identifiers hhid# (where # is the corresponding wave) also denote household splits.
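A merge on mergeid is an ordinary key join. The Python sketch below does an inner join of two modules held as lists of dicts; the values (including the dn004_ code) are hypothetical, and in practice you would use the merge facilities of your statistical software.

```python
def merge_on_mergeid(left, right):
    """Inner-join two lists of row dicts on mergeid.

    `right` must contain at most one row per mergeid (e.g. one module of
    one wave). On duplicate column names, values from `right` win.
    """
    index = {}
    for row in right:
        key = row["mergeid"]
        if key in index:
            raise ValueError(f"duplicate mergeid in right-hand data: {key}")
        index[key] = row
    merged = []
    for row in left:
        match = index.get(row["mergeid"])
        if match is not None:
            merged.append({**row, **match})
    return merged


dn = [{"mergeid": "AT-000001-01", "dn004_": 1}, {"mergeid": "AT-000002-01", "dn004_": 5}]
gv_health = [{"mergeid": "AT-000001-01", "bmi": 24.2}]
print(merge_on_mergeid(dn, gv_health))
```

The duplicate check guards against silently multiplying rows, which is the usual symptom of joining on the wrong key.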


6.2 How can non-responding household members be identified?  

Non-responding household members can be identified by the variable interview in the cv_r module. It takes the value 0 for household members who were not interviewed.
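For example, with the cv_r data held as a list of dicts (a hypothetical in-memory layout with hypothetical IDs), the non-respondents can be listed like this in Python:

```python
def non_respondents(cv_r_rows):
    """Return the mergeids of household members with interview == 0."""
    return [row["mergeid"] for row in cv_r_rows if row.get("interview") == 0]


cv_r = [
    {"mergeid": "AT-000001-01", "interview": 1},
    {"mergeid": "AT-000001-02", "interview": 0},
]
print(non_respondents(cv_r))  # ['AT-000001-02']
```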


6.3 How can I identify partners? 

In SHARE, partners can be identified by the variable mergeidp, which is the mergeid of the partner. Couples can be identified by the variable coupleid#. This couple identifier remains constant across waves if the couple stays together; if partners change, coupleid# changes as well.
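Within one wave, mergeidp lets you attach a partner's answers to each respondent's row. A minimal Python sketch; the output column suffix “_partner” and the data are hypothetical, and respondents whose partner has no row (or who have no partner) get None.

```python
def attach_partner_value(rows, variable):
    """Copy the partner's value of `variable` into each row as
    `variable + "_partner"`, using mergeidp to find the partner."""
    by_id = {row["mergeid"]: row for row in rows}
    result = []
    for row in rows:
        partner = by_id.get(row.get("mergeidp"))
        new_row = dict(row)
        new_row[variable + "_partner"] = partner.get(variable) if partner else None
        result.append(new_row)
    return result


rows = [
    {"mergeid": "AT-000001-01", "mergeidp": "AT-000001-02", "bmi": 24.2},
    {"mergeid": "AT-000001-02", "mergeidp": "AT-000001-01", "bmi": 27.9},
    {"mergeid": "AT-000002-01", "mergeidp": None, "bmi": 22.0},
]
for row in attach_partner_value(rows, "bmi"):
    print(row["mergeid"], row["bmi_partner"])
```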

6.4 Why do some variables like education or height contain so many missing values?

The reason is that time-constant variables are only asked in the baseline interview, which is each respondent’s first SHARE interview. Because SHARE’s sample is refreshed from time to time in several countries, the baseline interview is not necessarily the Wave 1 interview.

Height and education (given that SHARE participants are mainly aged 50 or over) are examples of such time-constant variables. To use these variables in waves later than the baseline wave, users have to transfer the information by first merging the waves and then assigning the baseline values to the later waves. Furthermore, some questions for the longitudinal sample, e.g. on marital status, are only asked if there has been a change since the last interview. This also leads to a large number of missing values in the respective variables.
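The “merge, then assign” step can be sketched as follows in Python. It assumes the waves have already been stacked into one list of rows with mergeid and wave, and it treats only None as missing; SHARE's own missing-value codes (e.g. for don't know or refusal) would need to be handled first. The variable name height and the data are illustrative.

```python
def carry_forward(records, variable):
    """Fill `variable` in later waves from the baseline (earliest observed) value.

    records: list of dicts with "mergeid", "wave" and optionally `variable`.
    Only rows from waves after the baseline wave are filled.
    """
    baseline = {}
    for row in sorted(records, key=lambda r: r["wave"]):
        value = row.get(variable)
        if value is not None and row["mergeid"] not in baseline:
            baseline[row["mergeid"]] = (row["wave"], value)

    filled = []
    for row in records:
        new_row = dict(row)
        base = baseline.get(row["mergeid"])
        if new_row.get(variable) is None and base is not None and row["wave"] > base[0]:
            new_row[variable] = base[1]
        filled.append(new_row)
    return filled


records = [
    {"mergeid": "A", "wave": 1, "height": 170},
    {"mergeid": "A", "wave": 2, "height": None},
    {"mergeid": "A", "wave": 4},
    {"mergeid": "B", "wave": 2, "height": None},
]
for row in carry_forward(records, "height"):
    print(row["mergeid"], row["wave"], row.get("height"))
```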


6.5 What are “unfolding brackets”?

When a respondent does not know (DK) or refuses (RF) the answer to a question about amounts of money, usually an unfolding sequence of bracket questions starts. The aim of unfolding brackets is to obtain at least a range in which, e.g., the respondent’s income lies.

There are three entry points, and the starting point is chosen randomly. The scientific release includes the country-specific bracket values (in euros) and the respondent’s final category. When a DK or RF is given during the unfolding bracket sequence, the value for the final category is set to DK or RF. The names of unfolding bracket variables contain “ub” after the module identifier and question number (see question 6.6 for the general naming format). For more information on unfolding brackets, see chapter 11.5 of SHARE Release Guide 5.0.0.
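The logic of an unfolding bracket sequence with three entry points can be sketched in Python. This is a generic illustration, not SHARE's CAPI routing: the bracket values are hypothetical, the answer function stands in for the respondent, and the sequence simply moves up after a “more than” answer and down otherwise until the range is fixed.

```python
import random


def unfolding_bracket(thresholds, is_above, rng=random):
    """Run one unfolding bracket sequence.

    thresholds: sorted bracket values (three entry points here).
    is_above: callable(threshold) -> True ("more than"), False, or None for DK/RF.
    Returns (lower, upper), where None means unbounded, or "DK/RF".
    """
    low, high = -1, len(thresholds)        # current bracket lies between these indices
    k = rng.randrange(len(thresholds))     # random entry point
    while low < k < high:
        answer = is_above(thresholds[k])
        if answer is None:                 # DK or RF ends the sequence
            return "DK/RF"
        if answer:
            low, k = k, k + 1              # amount is above: ask the next higher value
        else:
            high, k = k, k - 1             # amount is not above: ask the next lower value
    lower = thresholds[low] if low >= 0 else None
    upper = thresholds[high] if high < len(thresholds) else None
    return lower, upper


brackets = [1000, 5000, 20000]             # hypothetical bracket values in euros
print(unfolding_bracket(brackets, lambda t: 7000 > t, random.Random(1)))  # (5000, 20000)
```

Whatever the random entry point, a consistent respondent ends in the same bracket; the entry point only changes which questions are asked on the way.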


6.6 What is the general naming format of variables?

The naming of variables is harmonized across waves. Variable names in the CAPI instrument data use the format mmXXXyyy_LL: “mm” is the module identifier, e.g. DN for the demographics module; “XXX” is the question number, e.g. 001; and “yyy” are optional characters for dummy variables (indicated by “d”), euro conversion (indicated by “e”) or unfolding brackets (indicated by “ub”). The separator “_” is followed by the optional characters “LL”, which indicate a category or loop (“outer loop”).
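The naming format can be checked with a regular expression. This Python sketch is an approximation based on the description above: it accepts any letters as the optional suffix (so names such as dn005c also parse) and makes the “_LL” part optional; technical variables such as mergeid or hou_resp do not follow the format and return None.

```python
import re

# mm = two-letter module, XXX = three-digit question number,
# yyy = optional letters (e.g. "d", "e", "ub"), then optional "_" + LL.
VARIABLE_NAME = re.compile(
    r"^(?P<module>[a-z]{2})(?P<question>\d{3})(?P<suffix>[a-z]*)(?:_(?P<loop>\w*))?$",
    re.IGNORECASE,
)


def parse_variable_name(name):
    """Split a CAPI variable name into its parts, or return None."""
    match = VARIABLE_NAME.match(name)
    return match.groupdict() if match else None


print(parse_variable_name("dn026_1"))
# {'module': 'dn', 'question': '026', 'suffix': '', 'loop': '1'}
print(parse_variable_name("dn005c"))
# {'module': 'dn', 'question': '005', 'suffix': 'c', 'loop': None}
print(parse_variable_name("mergeid"))  # None
```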


6.7 What is the ado-file sharetom for?

The ado-file sharetom is a Stata programme that recodes missing values and labels them appropriately. If you want to apply sharetom.ado, we recommend running it immediately after opening the data file or after merging the required modules. Note that sharetom is updated from time to time; the current version is sharetom5.


6.8 Is longitudinal analysis about respondents´ children possible?

For longitudinal analyses of children, users cannot rely on the order of the children in the CH module. Instead, children have to be matched on gender and year of birth, which leads to correct matches in most cases. There are several reasons for this. First, respondents are supposed to report on their children in a defined order, but they may not do so. Second, partners may change, and respondents are always supposed to report on both partners’ children. Third, reporting errors can never be excluded.
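Matching on gender and year of birth can be sketched as follows in Python. Only pairs with a unique (gender, year of birth) combination in both waves are matched, so, for example, same-sex twins are left unmatched for manual review; the keys gender, yob and id are hypothetical stand-ins for the corresponding CH variables.

```python
from collections import defaultdict


def match_children(children_a, children_b):
    """Match one respondent's children between two waves on (gender, yob).

    Returns (child_a, child_b) pairs; keys that are ambiguous (more than one
    child with the same gender and year of birth) or unmatched are skipped.
    """
    def index(children):
        groups = defaultdict(list)
        for child in children:
            groups[(child["gender"], child["yob"])].append(child)
        return groups

    a_groups, b_groups = index(children_a), index(children_b)
    pairs = []
    for key, a_list in a_groups.items():
        b_list = b_groups.get(key, [])
        if len(a_list) == 1 and len(b_list) == 1:
            pairs.append((a_list[0], b_list[0]))
    return pairs


wave1 = [{"id": "w1-a", "gender": 1, "yob": 1970}, {"id": "w1-b", "gender": 2, "yob": 1973}]
wave2 = [{"id": "w2-x", "gender": 2, "yob": 1973}, {"id": "w2-y", "gender": 1, "yob": 1970}]
for a, b in match_children(wave1, wave2):
    print(a["id"], "->", b["id"])
```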


6.9 Can the children of the CH module be linked to information on social support (SP) and financial transfers (FT)?

In Wave 4, children named by the respondents in the CH module cannot be linked directly to the SP and FT modules. The reason is a change in the so-called list of relations, which comprises all persons in the respondents’ social environment. Information on persons receiving social support or financial transfers from, or providing them to, the respondents is based on this list. Unlike in Waves 1 and 2, where the list included up to 9 children, the list of relations in Wave 4 includes up to 7 social network members and just one 'other child' option. Only children named by the respondents as members of their social network are explicitly listed for the interviewer on the screen. It is thus not possible to specify, for questions on social support or financial transfers, children who were not named as social network members (for whatever reason).


6.10 Why do the variables on whether the natural parents are still alive (dn026_1 and dn026_2) contain so many missing values in Wave 4?

The questions DN026_1 and DN026_2 record whether the respondents’ natural parents are still alive (dn026_1 for the respondent’s mother and dn026_2 for the respondent’s father). The routing for these variables uses information from previous waves for respondents who already participated, as well as information from the social network module. As with other variables in SHARE, the number of missing values can be reduced by merging the Wave 4 data with previous waves. Assuming that persons belonging to the respondent’s social network are still alive, the proportion of missing values can be reduced further using the sn005* variables. Unfortunately, the routing for DN026_1 and DN026_2 did not work adequately for all respondents in the Wave 4 questionnaire. Not every respondent who should have been asked was indeed asked, so a large number of missing values remains.

To compensate for this shortcoming, all participants with missing information in DN026_1 and DN026_2 were asked these questions in Wave 5.


7. Generated Variables

7.1 What is the purpose of generated variables and which generated variables are provided?

To allow an easy and fast entry into the cross-national data and convenient work with it, certain variables are provided ready to use, especially those that allow valid comparisons between countries, such as the International Standard Classification of Education (ISCED). Besides internationally standardized variables, there are further generated variables that ease or enhance working with the SHARE data. Table 5 gives an overview of all generated variable modules.

Table 5: Generated variable modules

| Generated variable module | Content | W1 | W2 | W3 | W4 | W5 |
|---|---|---|---|---|---|---|
| gv_allwaves_cv_r | Coverscreen information across waves | cross-wave | cross-wave | cross-wave | cross-wave | cross-wave |
| gv_linkage | Linkage to Statutory Pension Insurance data (Germany only) | cross-wave | cross-wave | cross-wave | cross-wave | cross-wave |
| gv_exrates | Exchange rates for all waves, incl. nominal and PPP-adjusted exchange rates | cross-wave | cross-wave | cross-wave | cross-wave | cross-wave |
| gv_job_episodes_panel | | cross-wave | cross-wave | cross-wave | cross-wave | cross-wave |
| gv_health | Physical and mental health variables and indices like BMI, EURO-D depression scale, etc. | X | X | | X | X |
| gv_isced | International Standard Classification of Education (ISCED-97; in Wave 5 additionally ISCED-11) | X | X | | X | X |
| gv_isco | Classification of occupations via ISCO and of industries via NACE codes | X | | | | |
| gv_housing | Housing and NUTS codes | X | X | | X | X |
| gv_networks | Information on social networks | | | | X | |
| gv_deprivation | Indices for material and social deprivation | | | | | X |
| gv_ssw | Social security wealth | | | | X | |
| gv_weights | Cross-sectional sampling design weights and calibrated weights | X | X | X | X | X |
| gv_longitudinal_weights | Longitudinal weights | cross-wave | cross-wave | cross-wave | cross-wave | cross-wave |
| gv_imputations | Imputations based on the fully conditional specification (FCS) | X | X | | X | X |


7.2 Which generated health variables are provided?

The gv_health module contains a broad range of generated health variables and health-related indices describing the respondents' physical and mental health. Most of the variables are comparable to the US Health and Retirement Study (HRS). Physical health variables include, e.g., the US version of self-perceived health (sphus), the body mass index (bmi), the number of chronic diseases (chronic), an index of mobility (mobility) and limitations with instrumental activities of daily living (iadl). Mental health variables include, e.g., the EURO-D depression scale (eurod), a measure of orientation to date (orienti) and a numeracy score for mathematical performance (numeracy).
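The variables bmi and eurod are already computed in gv_health; the sketch below only illustrates the arithmetic behind them, using the standard definitions. BMI is weight in kilograms divided by squared height in metres, and the EURO-D score counts how many of its 12 symptom items are present (range 0 to 12). The assumptions are that height is given in centimetres and that the 12 EURO-D items have already been recoded to 0/1; the sketch does not reproduce how SHARE handles missing items.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight in kilograms divided by squared height in metres."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def euro_d(items: list[int]) -> int:
    """EURO-D depression score: number of the 12 symptom items present (0-12)."""
    if len(items) != 12:
        raise ValueError("EURO-D uses exactly 12 items")
    if any(item not in (0, 1) for item in items):
        raise ValueError("each item must be coded 0 (absent) or 1 (present)")
    return sum(items)


print(round(bmi(70, 175), 1))                          # 22.9
print(euro_d([1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0]))    # 3
```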

 


7.3 What kind of geographical information is available in the gv_housing module?

SHARE provides the Nomenclature of Territorial Units for Statistics (NUTS), a hierarchical classification system for dividing up the economic territory of the EU. It indicates in which territorial unit a SHARE household was located at the moment of sampling and is available at three levels:

NUTS 1: major socio-economic regions 

NUTS 2: basic regions for the application of regional policies 

NUTS 3: small regions for specific diagnoses. 

Because of privacy legislation, not every NUTS level is available for every country; for Germany, for example, only NUTS 1 is provided.
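Because NUTS is hierarchical, a finer code contains the coarser codes as prefixes: a two-letter country code followed by one character per level. When pooling countries for which different levels are released, one option is to truncate all codes to the coarsest common level. The sketch below assumes the NUTS codes are available as strings; the helper name is hypothetical and the example code is only an illustration.

```python
# NUTS code lengths: 2-letter country code plus one character per level.
NUTS_LENGTH = {0: 2, 1: 3, 2: 4, 3: 5}


def nuts_at_level(code: str, level: int) -> str:
    """Truncate a NUTS code to a coarser level (e.g. NUTS 3 -> NUTS 1)."""
    code = code.strip().upper()
    target = NUTS_LENGTH[level]
    if len(code) < target:
        raise ValueError(f"{code!r} is coarser than NUTS {level}")
    return code[:target]


print(nuts_at_level("DE212", 2))  # DE21
print(nuts_at_level("DE212", 1))  # DE2
```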

 


7.4 How is education measured in SHARE?

Education systems differ greatly across countries, so a standard coding is required for international comparisons. The gv_isced module contains the 1997 International Standard Classification of Education (ISCED-97). It is provided not only for the respondents' own educational level but also for that of their children and former spouses, as well as for the interviewers (the latter only in Wave 1). In Waves 1 and 2, the education of up to four selected children was asked; Wave 4 contains ISCED-97 values for all children. In 2011, UNESCO Member States adopted a revision of ISCED that takes into account significant changes in education systems worldwide since the previous revision in 1997. From Wave 5 onwards, both ISCED versions are provided in the SHARE data. Wave 5 also includes the educational level of the respondents' parents.
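Many analyses collapse the seven ISCED-97 levels into a few broader groups. The sketch below uses a common low (0 to 2), medium (3 to 4) and high (5 to 6) grouping; this grouping is an analyst's convention, not a SHARE-provided variable. The SHARE codebook may also contain codes outside 0 to 6 (e.g. for special cases or missing values), which this hypothetical helper deliberately rejects so that they are handled explicitly.

```python
ISCED97_LEVELS = {
    0: "pre-primary education",
    1: "primary education",
    2: "lower secondary education",
    3: "upper secondary education",
    4: "post-secondary non-tertiary education",
    5: "first stage of tertiary education",
    6: "second stage of tertiary education",
}


def education_group(isced97: int) -> str:
    """Collapse ISCED-97 levels into low (0-2), medium (3-4) and high (5-6)."""
    if isced97 not in ISCED97_LEVELS:
        # special or missing codes must be recoded before grouping
        raise ValueError(f"not an ISCED-97 level: {isced97}")
    if isced97 <= 2:
        return "low"
    if isced97 <= 4:
        return "medium"
    return "high"


print(education_group(3), "-", ISCED97_LEVELS[3])  # medium - upper secondary education
```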

 


7.5 What information does the gv_isco module contain?

Respondents are asked about their own occupation and about their former partner's and their parents' occupations. For Wave 1, this information is coded according to the International Standard Classification of Occupations (ISCO-88) of the International Labour Organization (ILO). To classify the corresponding industries, the gv_isco module additionally contains a slightly modified version of the Statistical Classification of Economic Activities in the European Community (NACE, version 4 rev. 1 1993).
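ISCO-88 codes are hierarchical: the first digit of a four-digit code identifies one of ten major groups. The sketch below aggregates detailed codes to major groups. It assumes the codes are stored as digit strings; if they are stored as numbers, they should be zero-padded to four digits first, otherwise armed forces codes (leading 0) would be misclassified. The helper name is hypothetical.

```python
ISCO88_MAJOR_GROUPS = {
    "0": "Armed forces",
    "1": "Legislators, senior officials and managers",
    "2": "Professionals",
    "3": "Technicians and associate professionals",
    "4": "Clerks",
    "5": "Service workers and shop and market sales workers",
    "6": "Skilled agricultural and fishery workers",
    "7": "Craft and related trades workers",
    "8": "Plant and machine operators and assemblers",
    "9": "Elementary occupations",
}


def isco88_major_group(code: str) -> str:
    """Return the ISCO-88 major group (first digit) of an occupation code."""
    code = code.strip()
    if not code.isdigit() or len(code) > 4:
        raise ValueError(f"not an ISCO-88 code: {code!r}")
    return ISCO88_MAJOR_GROUPS[code[0]]


print(isco88_major_group("2221"))  # Professionals
```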

 


7.6 How are social networks captured in SHARE?

The CAPI module on social networks (SN) was introduced in the fourth wave of SHARE as an innovative means of measuring the personal social environment. It goes beyond the more common role-relational method, which measures social networks mostly through socio-demographic proxies. The SN module contains a detailed description of the respondents' personal social networks: each respondent can name up to seven persons whom she or he considers confidants. For each named person, the module records the role relationship to the respondent, gender, residential proximity, frequency of contact and level of emotional closeness. Information from the SN module can be linked to the social support (SP) and financial transfers (FT) modules.

The gv_networks module stores a total of 96 generated variables. Through the list of social network members generated in the SN questionnaire module, the SN module is closely linked to the children (CH), social support (SP), financial transfers (FT) and demographics (DN) modules, and gv_networks combines the information gathered in these modules. More details on how social networks are captured in SHARE are provided in chapter 14.1 of the SHARE Release Guide 5.0.0.
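To illustrate what kind of summary measures can be derived from the network roster, the sketch below computes network size, the share of kin and mean emotional closeness for one respondent. gv_networks already provides ready-made network variables; this is not its code. The field names, the relationship labels and the 1 to 4 closeness scale (1 = not very close, 4 = extremely close) are assumptions for illustration and should be checked against the SN codebook.

```python
from dataclasses import dataclass


@dataclass
class NetworkMember:
    relationship: str   # hypothetical labels, e.g. "child", "spouse", "friend"
    closeness: int      # assumed scale: 1 = not very close ... 4 = extremely close


# Hypothetical set of labels counted as kin.
KIN = {"spouse", "partner", "child", "parent", "sibling", "grandchild", "other relative"}


def summarise_network(members: list[NetworkMember]) -> dict:
    """Network size, share of kin and mean emotional closeness for one respondent."""
    if len(members) > 7:
        raise ValueError("the SN module records at most seven confidants")
    size = len(members)
    if size == 0:
        return {"size": 0, "share_kin": None, "mean_closeness": None}
    n_kin = sum(m.relationship in KIN for m in members)
    return {
        "size": size,
        "share_kin": n_kin / size,
        "mean_closeness": sum(m.closeness for m in members) / size,
    }


print(summarise_network([NetworkMember("child", 4), NetworkMember("friend", 3)]))
# {'size': 2, 'share_kin': 0.5, 'mean_closeness': 3.5}
```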

 


7.7 What is the gv_linkage module?

With the respondents' written consent, administrative data of the German Pension Fund can be linked to the survey data of the German SHARE subsample. Beginning in Wave 3, all respondents in the German subsample are asked for consent to link their survey data with administrative data of the German Pension Fund. This longitudinal administrative dataset includes very detailed information on the respondents' employment histories. The module gv_linkage provides initial information on which respondents gave consent to link their data with the pension fund. To get access to the administrative data, researchers have to submit an additional form directly to the data center of the German Pension Fund. Further information on access conditions, as well as a user guide and codebook for SHARE-RV, is available here.

 


8. Weights

8.1 Many SHARE papers do not use weights in their data analyses. Is there a general strategy on this topic?

It is not easy to give a general strategy for this question. We refer SHARE users to the recent paper by Solon, Haider and Wooldridge (2013). The authors distinguish between two types of empirical research: (i) research aimed at estimating population descriptive statistics, and (ii) research aimed at estimating causal effects, where weighting may serve, e.g., to achieve more precise estimates by correcting for heteroskedasticity, to achieve consistent estimates by correcting for endogenous sampling, or to identify average partial effects in the presence of unmodeled heterogeneity of effects. For the former, weighting is called for to make the analysis sample representative of the target population. The choice of weighted sample statistics is intuitive and uncontroversial: population statistics can be consistently estimated by weighted sample statistics. For the latter, the question of whether and how to weight is more nuanced. Researchers have to be clear about their reason for using weighted estimation, think carefully about whether that reason really applies, and double-check with appropriate diagnostics. In situations where researchers might be inclined to weight, it is often useful to report both weighted and unweighted estimates and to discuss what the contrast implies for the interpretation of the results. It is also advisable to use robust standard errors.
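For the descriptive case, the weighted estimator of a population mean is the sum of weight times value divided by the sum of weights. The sketch below computes it next to the unweighted mean, in the spirit of reporting both. The toy data are hypothetical; in practice the weights would be a SHARE weight variable such as cciw_w4. Standard errors (robust or design-based) are not covered here.

```python
def unweighted_mean(values: list[float]) -> float:
    """Plain sample mean."""
    return sum(values) / len(values)


def weighted_mean(values: list[float], weights: list[float]) -> float:
    """Weighted sample mean: sum(w_i * y_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    return sum(w * y for w, y in zip(weights, values)) / total_weight


# Hypothetical toy data: the third respondent represents twice as many people.
y = [10.0, 20.0, 30.0]
w = [1.0, 1.0, 2.0]
print(unweighted_mean(y), weighted_mean(y, w))  # 20.0 22.5
```

A noticeable gap between the two numbers is exactly the kind of contrast the paragraph above suggests discussing.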

 


8.2 Which weights should be used for cross-sectional analyses and which for longitudinal analyses?

SHARE provides calibrated cross-sectional and longitudinal weights. For cross-sectional analyses, the calibrated weight to be used depends on the basic sample unit of analysis: in Wave 4, for example, it is the variable cciw_w4 if the unit is the individual and cchw_w4 if the unit is the household. For longitudinal analyses, the calibrated weight depends on both the wave combination of interest (i.e. the waves used to form the panel) and the basic sample unit of analysis. For the fully balanced panel (wave combination 1-2-3-4-5), for example, it is the variable cliw_a if the unit is the individual and clhw_a if the unit is the household. For longitudinal analyses based on other wave combinations, users need to compute their own calibrated weights. To support users in this nontrivial methodological task, we provide a Stata ado-file called cweight.ado, which implements the calibration procedure of Deville and Särndal (1992), and Stata do-files that illustrate step by step how to compute calibrated longitudinal weights at the individual and the household level. Further information is available here.
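To convey the idea of calibration, the sketch below implements raking ratio adjustment (iterative proportional fitting), which is one member of the family of calibration estimators described by Deville and Särndal (1992). It is not a port of cweight.ado, which may use a different distance function and options; for actual SHARE weights, the provided ado-file and do-files should be used. Calibration variables in SHARE are age, gender and NUTS1 region; the two margins and their population totals below are hypothetical.

```python
def rake(weights, categories, targets, max_iter=100, tol=1e-10):
    """Raking ratio calibration (iterative proportional fitting).

    weights    -- initial (design) weights, one per unit
    categories -- one dict per unit mapping each calibration variable to its level
    targets    -- {variable: {level: known population total}}
    """
    w = [float(x) for x in weights]
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for var, margin in targets.items():
            # current weighted total for each level of this variable
            totals = {}
            for wi, cat in zip(w, categories):
                totals[cat[var]] = totals.get(cat[var], 0.0) + wi
            # scale weights so that this margin matches its population total
            for i, cat in enumerate(categories):
                factor = margin[cat[var]] / totals[cat[var]]
                w[i] *= factor
                largest_adjustment = max(largest_adjustment, abs(factor - 1.0))
        if largest_adjustment < tol:
            break  # all margins reproduced
    return w


# Hypothetical sample of four units and hypothetical population margins.
units = [{"sex": "m", "age": "50-64"}, {"sex": "m", "age": "65+"},
         {"sex": "f", "age": "50-64"}, {"sex": "f", "age": "65+"}]
targets = {"sex": {"m": 40, "f": 60}, "age": {"50-64": 50, "65+": 50}}
print(rake([1, 1, 1, 1], units, targets))  # [20.0, 20.0, 30.0, 30.0]
```

Raking keeps the calibrated weights positive, which is one reason it is a popular choice; linear calibration can produce negative weights.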

 


8.3 In section 8.7 of the Wave 4 Innovations & Methodology it is mentioned that a weighting do-file and a Stata command cweight.ado are available for SHARE users. Where can I find them?

The ado file cweight.ado as well as the other files provided for generating longitudinal calibrated weights can be downloaded from the regular SHARE download website (“Generate Calibrated Weights Using Stata 1.0.0”). Please note that we are currently working on an update of these files that will be available as soon as possible.  

 


8.4 What is the difference between the weights across waves?

Sampling design weights may differ across waves because of changes in the national sampling designs. Calibrated cross-sectional and longitudinal weights, by contrast, are computed with the procedure of Deville and Särndal (1992) in all waves. The other main differences with respect to previous waves are that: (i) we no longer distinguish between alternative variants of the SHARE sample (i.e. main sample alone, vignette sample alone, and the two samples combined); (ii) we no longer provide calibrated cross-sectional weights for non-responding partners, because of a substantive change in the imputation procedure used in Wave 4; (iii) we no longer provide calibrated longitudinal weights for all possible wave combinations of the panel.

 


 

8.5 Can we drop sample observations with missing weights?

Missing values in sampling design and calibrated weights may be due to (i) age-ineligibility (i.e. respondents younger than 50), (ii) missing sampling frame information, (iii) missing information on the set of calibration variables (age, gender, NUTS1 regional code), or (iv) respondents not belonging to the selected balanced sample (only for calibrated longitudinal weights). Observations with missing weights due to (i) are not problematic if we want to make inferences about the 50+ population. Since there are very few observations with missing weights due to (ii) and (iii), these observations can in general be dropped for substantive analyses of the SHARE data. Observations with missing longitudinal weights due to (iv) can be more problematic if the process generating the missing observations is not missing at random (given the chosen set of conditioning variables). Note that, to compensate for attrition, users may draw on a larger set of conditioning variables by exploiting the information available from the starting wave. Alternative methods, such as weights based on the propensity score and sample selection models, could also be used to impose weaker assumptions on the missing data mechanism associated with attrition.
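One simple way to use starting-wave information against attrition is a weighting-class adjustment, a cell-based version of propensity-score weighting: within cells defined by starting-wave characteristics, the response propensity is estimated as the weighted share of baseline units that are still observed, and respondents' weights are divided by it. The sketch below is an illustration under a missing-at-random-within-cells assumption, not a SHARE procedure; the cell labels and data are hypothetical. In practice the propensity would often come from a logistic regression on many starting-wave variables.

```python
from collections import defaultdict


def attrition_adjusted_weights(cells, base_weights, responded):
    """Weighting-class adjustment for attrition.

    Within each cell, the response propensity is estimated as the weighted
    share of baseline units that responded again; respondents' weights are
    divided by that propensity and attriters get weight 0.
    """
    total = defaultdict(float)
    responding = defaultdict(float)
    for cell, w, r in zip(cells, base_weights, responded):
        total[cell] += w
        if r:
            responding[cell] += w
    adjusted = []
    for cell, w, r in zip(cells, base_weights, responded):
        if not r:
            adjusted.append(0.0)
        else:
            propensity = responding[cell] / total[cell]
            adjusted.append(w / propensity)
    return adjusted


# Hypothetical baseline sample: cells built from starting-wave information.
cells = ["A", "A", "B", "B", "B"]
print(attrition_adjusted_weights(cells, [1, 1, 2, 2, 2], [True, False, True, True, True]))
# [2.0, 0.0, 2.0, 2.0, 2.0]
```

By construction, the adjusted weights of the respondents in each cell sum to the cell's baseline weight total.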

 


9. Imputations

9.1 What is the imputation method? 

Items can be imputed either sequentially by a simple hot-deck method or jointly by the fully conditional specification (FCS) method. Hot-deck imputations are carried out separately by country, while FCS imputations are carried out by country and sample type (sample 1: singles and 3rd respondents; sample 2: couples with both partners interviewed; sample 3: all couples, with and without non-responding partners). For each wave and country, the FCS method is used only for monetary variables with at least 100 donor observations in sample 1 and at least 150 donor observations in each of samples 2 and 3. Regardless of the imputation method, SHARE provides five multiple imputations of the missing values on each variable.
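The sketch below encodes the donor-count rule stated above and shows the simplest form of hot-deck imputation: each missing value is replaced by a randomly drawn observed value (a donor) from the same country, repeated five times to produce five implicats. This is an illustration only; SHARE's actual hot-deck may use finer imputation classes, and the function names are hypothetical.

```python
import random


def use_fcs(donors_sample1: int, donors_sample2: int, donors_sample3: int) -> bool:
    """Donor-count rule stated in the FAQ for using FCS on a monetary variable."""
    return donors_sample1 >= 100 and donors_sample2 >= 150 and donors_sample3 >= 150


def hot_deck(values, countries, m=5, seed=12345):
    """Random hot-deck imputation within country, repeated m times.

    values    -- observed values, with None marking missing items
    countries -- country of each observation (the imputation class)
    Returns m completed copies of `values`. Raises KeyError if a country
    with missing values has no donors.
    """
    rng = random.Random(seed)
    donors = {}
    for v, c in zip(values, countries):
        if v is not None:
            donors.setdefault(c, []).append(v)
    implicates = []
    for _ in range(m):
        completed = [v if v is not None else rng.choice(donors[c])
                     for v, c in zip(values, countries)]
        implicates.append(completed)
    return implicates


imps = hot_deck([100.0, None, 300.0, None], ["AT", "AT", "DE", "DE"])
print(len(imps), imps[0])  # 5 [100.0, 100.0, 300.0, 300.0]
```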

 


9.2 Can the imputed variables be used for longitudinal analysis?

Yes, but several problems may arise. First, users have to check whether the variables of interest have been imputed in all waves of interest and whether the underlying information is fully comparable across waves. Second, users should be aware that the imputation model does not include lagged variables from the previous wave as predictors of the missing values in the current wave. This implies that the imputation model could be less general than the model used to analyze the imputed longitudinal data (see Meng 1994 for a discussion of this uncongeniality issue). To address this issue, we plan to use a more general imputation model in future releases of the SHARE data.

 


9.3 Why does the imputations module contain so many cases?

SHARE provides multiple imputations of the missing values so that users can account for the additional variability induced by the imputation process when assessing the precision of their estimators (see Rubin 1987). Multiple imputation implies that there are m > 1 imputed values for each missing value. In SHARE, m = 5, so the module contains five independent imputations, indexed by the variable implicat, and every respondent therefore appears five times. The five copies differ only in the imputed values and are identical for the complete cases. Users who want to rely on single imputation (despite our recommendation to account for the variability induced by the imputation process) can select just one of the five implicats. Since these are five independent draws from the estimated distribution of the missing values, there is no specific reason to prefer any particular implicat.
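Accounting for the imputation variability means running the analysis once per implicat and combining the five results with Rubin's (1987) rules: the combined estimate is the average of the five estimates, and its total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The sketch below implements these rules for a single scalar estimate; multiple-imputation routines in statistical packages usually do this automatically. The numbers in the usage line are made up.

```python
from statistics import mean, variance


def rubin_combine(estimates: list[float], variances: list[float]) -> tuple[float, float]:
    """Combine m completed-data results with Rubin's (1987) rules.

    estimates -- point estimate from each implicat
    variances -- squared standard error from each implicat
    Returns (combined estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two implicats with matching variances")
    q_bar = mean(estimates)              # combined point estimate
    within = mean(variances)             # average within-imputation variance
    between = variance(estimates)        # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical results from the five SHARE implicats.
print(rubin_combine([1.0, 2.0, 3.0, 4.0, 5.0], [1.0] * 5))  # (3.0, 4.0)
```

The standard error of the combined estimate is the square root of the total variance; using a single implicat would ignore the between-imputation term.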

 


9.4 Why are the imputed variables in Release 5.0.0 onwards different from the ones of previous releases? 

As discussed in the documentation, there are differences in the basic raw data as well as important innovations in the imputation procedure. Compared with the procedure adopted for the Waves 1 and 2 data in previous releases, the major differences are: (i) the way of dealing with the problem of non-responding partners, (ii) the use of a smaller set of aggregated variables, (iii) a lower number of predictors, and (iv) the use of two alternative measures of total household income. These changes aim at a more reliable imputation model, but it is difficult to assess the implications of all these differences for substantive analyses of the SHARE data.

 


9.5 Why are there two variables for household income?

Since Wave 2, SHARE has collected data on two different definitions of total household income: thinc is the sum of the individual (imputed) incomes of all household members, while thinc2 is the measure of total household income collected through question HH017. In our view, the choice between these two measures is not obvious, so we let users decide which of them is more suitable for their research questions. Moreover, our imputation model exploits both measures (see Section 2.4). We therefore strongly encourage users to carry out sensitivity analyses with both measures. This may help us understand which of the two can be considered more reliable on scientific grounds.

 


9.6 Are monetary amounts in the imputation data set converted into euros?

Yes, all monetary variables are expressed as annual amounts in euros. Consequently, to adjust for purchasing power parity (PPP) in non-euro countries, monetary amounts first have to be converted back into local currency using the variable exrate and then turned into PPP-adjusted amounts by dividing the local currency amount by the PPP exchange rate.
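The two-step conversion can be written as a one-line formula. The sketch below assumes that exrate is expressed in local currency units per euro (so that multiplying converts euros back into local currency) and that the PPP exchange rate is expressed in local currency units per PPP unit; both directions should be verified in the codebook and in the gv_exrates module before use. The rates in the example are hypothetical.

```python
def euro_to_ppp(amount_eur: float, exrate: float, ppp_rate: float) -> float:
    """Convert a SHARE euro amount into a PPP-adjusted amount.

    exrate   -- assumed: local currency units per euro (SHARE variable exrate)
    ppp_rate -- assumed: local currency units per PPP unit
    """
    local_currency = amount_eur * exrate   # step 1: back to local currency
    return local_currency / ppp_rate       # step 2: divide by the PPP exchange rate


# Hypothetical rates for a non-euro country.
print(euro_to_ppp(1000.0, exrate=10.0, ppp_rate=12.5))  # 800.0
```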

 
