FAQ

1. What does SHARE mean?

SHARE - Survey of Health, Ageing and Retirement in Europe - is a multidisciplinary and cross-national panel database. It provides micro data on health, socio-economic status and social and family networks of more than 45,000 individuals aged 50 or over.

2. How can I access the data?

  • Who can use the data?
  • Where can I download the data?
  • Which data formats are available?

2.1 Who can use the data?

The data collected in the 2004 SHARE baseline study are available for scientific use, free of charge.

2.2 Where can I download the data?

Click on "Access the data" on the SHARE website. There you will find information about the conditions for data usage and hyperlinks to the documents that you must sign to accept the conditions. After signing the document, please send it to Josette Janssen at CentERdata (Tilburg, NL). CentERdata is responsible for the technical implementation of the CAPI questionnaire and collects the data centrally. CentERdata will provide you with a login code and a password.

2.3 Which data formats are available?

The data are available in Stata and SPSS formats.

3. Which countries participate in the survey?

2004 Baseline Study (Wave 1): Eleven countries contribute data to SHARE - Austria, Belgium, Denmark, France, Germany, Greece, Italy, The Netherlands, Spain, Sweden and Switzerland.

2005-06: Further data are collected in Israel.

2006-07 Wave 2: The Czech Republic, Ireland and Poland join SHARE.

2008-09 SHARELIFE (Wave 3): SHARELIFE collects detailed retrospective life-histories.

2010-2011 Wave 4: SHARE aims to include all EU member countries. For the fourth wave Estonia, Hungary, Luxembourg and Portugal are scheduled to participate in data collection.

4. How are the data collected?

SHARE data collection is based on a computer-assisted personal interview (CAPI) and a self-completion paper & pencil questionnaire (the “drop off”, see question 8.6). In the first and the second wave the so-called “vignettes” (question 8.7) were also used.

5. What kind of samples are available?

Samples are full probability samples. In most countries we have two separate samples: the main and the vignette sample. Respondents in the main sample complete a CAPI interview and a “drop off” (question 8.6). In the vignettes sample, the respondents answer the CAPI instrument and a “vignette” (question 8.7). To which sample a household belongs can be seen from the variable named “samptype” in the weights file (for details see: SHARE_Release_Guide_2.3.1).

6. Who is eligible?

All household members aged 50 and over are eligible, as are their spouses, regardless of age.

7. Where are the questionnaires available?

You can download the generic questionnaire and the country-specific versions from the SHARE website. The generic questionnaire serves as a model for the translation of the country-specific questionnaires. However, some internationally highly diverse variables such as education require country-specific measurement. Documentation of the country-specific deviations is also available.

There are different questionnaires: the main questionnaire, the “drop off” (question 8.6) and the “vignettes” (question 8.7). For a description of the different questionnaire types see question 8 on data modules.

8. Which data modules exist in SHARE?

There are three types of data modules in SHARE:

1. the main questionnaire (CAPI),
2. the drop off questionnaire and vignettes (self-completion paper & pencil) and
3. generated variables.

8.1 What kind of information is contained in the main questionnaire?

There are 21 different CAPI-modules.

Table 1: Data modules and different types of respondents

Order: chronological order of the CAPI questionnaire

8.2 What does "coverscreen" mean?

The interview starts with the coverscreen, which is completed by one member of the household only. The coverscreen collects basic demographic information about everyone currently living in the household (gender, month and year of birth, relationship to coverscreen informant, marital status), no matter whether this person is eligible or not. There are two coverscreens: one on the household level (filename: CV_H) and one on the individual level (filename: CV_R). There is one CV_H record for each household and one CV_R record for each individual in that household.

8.3 What is “grip strength” (GS)?

This type of physical measurement involves recording the respondent’s maximum handgrip strength with the aid of a dynamometer.

8.4 What is “walking speed” (WS)?

Walking speed is a type of physical measurement. The respondent is asked to walk a certain distance. The time it takes the respondent to complete this activity is measured.

8.5 What kind of information is contained in the interviewer observations module?

This module concerns the interviewing experience and is answered by the interviewer after the interview. These questions are important in order to understand the circumstances under which the interview was conducted.

8.6 What is a “drop off” questionnaire?

In the main sample, the interview ends with the self-completion of a paper & pencil questionnaire. This questionnaire includes additional questions about issues like mental and physical health, health care and social networks. The Israeli drop off includes additional questions that are not asked in other countries. These variables have the prefix “il”. The numeration of these questions also differs.

8.7 What are “vignettes”?

Extra samples (“vignettes sample”) were taken in eight countries (Belgium, France, Germany, Greece, Italy, The Netherlands, Spain and Sweden) in order to collect a special self-completion questionnaire with anchoring vignette questions. These are supposed to improve cross-national comparability. Two types were randomly assigned to the respondents: type A and type B. They differ with regard to question order and gender of the people described in the statements. The variable “type” contains information on the vignette type. The variable label shows which type B questions correspond to type A.

8.8 What kind of generated variables are available?

The SHARE dataset contains various generated variables that are documented in SHARE1rel2-0-1_GV_generated variables.pdf. The generated variables are related to education (ISCED-97 coding), health, housing & regional information (NUTS), occupations and industries (ISCO88 and NACE coding) and social support & household composition. Additionally, SHARE provides weights (for details see: SHARE_Release_Guide_2.3.1) and generated financial variables in the imputation files (see question 8.9).

8.9 What kind of imputed variables are available?

Imputation procedures were run five times to estimate values where information is missing, especially for variables on financial issues (e.g. income). All five implicates (imputed datasets) are included. Consequently, each case appears five times in the imputations data file. The implicates can be identified by the variable “implicat”.
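
A common way to work with several implicates is to compute the statistic of interest within each implicate and then average the results. The sketch below illustrates this for a simple mean. Only the variable “implicat” comes from the description above; the variable name "income" and all values are invented for illustration, and this is not necessarily the procedure recommended by SHARE.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows from the imputations file: each case appears once per
# implicate. "income" and all values are illustrative, not real SHARE data.
rows = [
    {"implicat": k, "income": base + 10.0 * (k - 1)}
    for k in range(1, 6)
    for base in (100.0, 200.0)
]

def combined_mean(rows, var):
    """Average `var` within each implicate, then average across implicates."""
    by_implicate = defaultdict(list)
    for row in rows:
        by_implicate[row["implicat"]].append(row[var])
    per_implicate = [mean(values) for _, values in sorted(by_implicate.items())]
    return mean(per_implicate)

estimate = combined_mean(rows, "income")
```

Averaging only gives the point estimate. A correct standard error also has to account for the variation between implicates, which this sketch does not attempt.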

9. Why are there different types of respondents?

The SHARE CAPI main questionnaire is designed in such a way that not every eligible household member has to answer every CAPI module (see table 1).

9.1 Who is the Financial Respondent?

The financial respondent is identified by the questions CM002 and CM003 at the start of the Demographics module (DN). The financial respondent answers the modules FT (financial transfers) and AS (assets). In a one-person household, the respondent is always the financial respondent. In multi-person households, the number of financial respondents may vary: respondents living without a partner in multi-person households are always financial respondents. Eligible couples, i.e. spouses and partners, may decide in CM002 to answer questions about their finances separately (this can be retrieved from “finsep”). It is also possible that only one spouse answers the questions. In this case he or she is identified in CM003 as the financial respondent for the couple. The financial respondent is indicated by the dummy variable “dumfinr” in the coverscreen (cv_r).

9.2 Who is the Household Respondent?

The household respondent is picked in CV038 to answer on behalf of the whole household all questions about household features (HO, HH, CO). The household respondent is indicated by the dummy “dumhhr” in the coverscreen (cv_r).

9.3 Who is the family respondent?

Family respondents answer the questions of the CH module and the first part of the SP module (SP001 to SP017). They are selected by the chronological order of interviews per couple: the first person of the couple to be interviewed is the family respondent. Family respondents are indicated by the dummy variable “dumfamr” in the coverscreen (cv_r).

10. Are proxy-interviews allowed?

Proxy interviews are conducted where physical or cognitive limitations make it too difficult for a selected respondent to complete the interview himself or herself. The following sections cannot be answered by a proxy: Cognitive Function (CF), Mental Health (MH; partly), Grip Strength (GS), Walking Speed (WS), Activities (AC) and Expectations (EX). Each data module contains a variable that indicates who answered the questions in that module.

11. How can I merge the datasets?

To merge data always use sampid2 AND cvid as keys.

  • On the household level (CV_H): Its key identifier is “sampid2”
  • On the individual level (CV_R): Its key identifiers are “sampid2” AND “cvid”

Datasets based on the main part of the instrument have two case identifiers other than sampid2:

  • cvid: indicates the corresponding identifier from the coverscreen interview. The cvid also covers non-eligible or non-responding household members. These members have no respid because they do not appear in the data sets resulting from the main part of the instrument.
  • respid (respondent identifier): indicates the case identifier for respondents in all data files on the individual level (eligible responding persons only).
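
As an illustration of merging on the two keys, the sketch below joins a miniature coverscreen extract with a miniature module extract on “sampid2” and “cvid”. The identifier values, the variable names "gender_example" and "answer_example", and the helper merge_on_keys are all hypothetical. Real files contain many more variables and would normally be merged in Stata or SPSS.

```python
# Hypothetical miniature extracts; identifier formats and values are made up.
cv_r = [
    {"sampid2": "H1", "cvid": 1, "gender_example": 1},
    {"sampid2": "H1", "cvid": 2, "gender_example": 2},  # not interviewed
    {"sampid2": "H2", "cvid": 1, "gender_example": 2},
]
module = [
    {"sampid2": "H1", "cvid": 1, "respid": 1, "answer_example": 5},
    {"sampid2": "H2", "cvid": 1, "respid": 1, "answer_example": 3},
]

def merge_on_keys(left, right, keys=("sampid2", "cvid")):
    """Left join: keep every row of `left`, add matching columns from `right`."""
    index = {tuple(row[k] for k in keys): row for row in right}
    merged = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys), {})
        merged.append({**match, **row})  # values from `left` win on conflicts
    return merged

merged = merge_on_keys(cv_r, module)
```

Household members who appear only in the coverscreen keep their row but receive no respid and no module variables, which mirrors the description of cvid and respid above.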

12. How are the names of variables and dummies constructed (naming conventions)?

12.1 What is the general naming format?

In general, the variable names in the main instrument data use the following format:
MMXXX_YY

MM     module identifier, e.g. DN
XXX     question number, e.g. 001
_      separation character, for dummies (question 12.2) replaced by 'd'
YY     optional digit(s) for category or loop indication; in case two digits are used the first digit refers to the 'outer loop' and the second digit refers to the 'inner loop'. When the loop number goes beyond 9 and there is only one character available, the numbering continues with a (=10), b (=11) etc.
The format slightly deviates in case of variables with euro amounts (question 13) or variables for the “unfolding brackets” (question 14).
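
The general format can be split into its parts with a regular expression, as in the sketch below. It covers only the plain MMXXX_YY form, not dummies, euro amounts or unfolding brackets. The function names are hypothetical, and apart from DN001 the example names are invented to fit the pattern rather than taken from the data.

```python
import re

# MM = two letters, XXX = three digits, optional "_" followed by one or two
# characters (digits, or letters a, b, ... for numbers above 9).
NAME_RE = re.compile(
    r"^(?P<module>[A-Z]{2})(?P<question>\d{3})(?:_(?P<suffix>[0-9A-Z]{1,2}))?$",
    re.IGNORECASE,
)

def loop_number(ch):
    """Decode one character: '0'..'9' are literal, 'a' = 10, 'b' = 11, ..."""
    ch = ch.lower()
    return int(ch) if ch.isdigit() else ord(ch) - ord("a") + 10

def parse_name(name):
    """Split a variable name in MMXXX_YY format into its components."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not in MMXXX_YY format: {name!r}")
    suffix = match.group("suffix") or ""
    return {
        "module": match.group("module").upper(),
        "question": match.group("question"),
        # One entry per character: category or loop, or outer and inner loop.
        "indices": [loop_number(ch) for ch in suffix],
    }

print(parse_name("DN001"))
print(parse_name("XX012_1a"))
```

With two characters after the underscore, the first entry of "indices" is the outer loop and the second is the inner loop.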

12.2 How are dummy variables named?

Answers to all questions that allow for multiple responses have dummy variables as final data. E.g. question BR005 ("What do or did you smoke") has three answer categories:

1. Cigarettes
2. Pipe
3. Cigars or cigarillos

The data set contains three dummies: BR005d1, BR005d2 and BR005d3 corresponding to the three categories. A value “1” means that the respondent chose the particular category as answer, and in case of “0” the respondent did not choose the particular category as answer.

In case the respondent answers with “none of these”, “don’t know” or “refusal”, the naming of the dummy variables has the following structure:

MMXXXdno     “none of these”
MMXXXdrf     “refusal”
MMXXXddk     “don’t know”

In case the question requires loop indication, the second to last digit indicates the number of the loop (#), the last letter stands for the answer categories:

MMXXXd#n     “none of these”
MMXXXd#r     “refusal”
MMXXXd#d     “don’t know”
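
The dummy naming scheme for questions without loops can be sketched as follows, using BR005 and its three categories from the example above. The helper names are hypothetical, and the sketch does not check whether a given question actually offers all three special answers.

```python
# Suffixes for the special answers, as described in the text.
SPECIAL = {"no": "none of these", "rf": "refusal", "dk": "don't know"}

def dummy_names(question, n_categories):
    """List the dummy names for a multiple-response question without loops."""
    numbered = [f"{question}d{k}" for k in range(1, n_categories + 1)]
    special = [f"{question}d{code}" for code in SPECIAL]
    return numbered + special

def chosen_categories(record, question, labels):
    """Return the labels whose dummy equals 1 in one respondent's record."""
    return [
        label
        for k, label in enumerate(labels, start=1)
        if record.get(f"{question}d{k}") == 1
    ]

labels = ["Cigarettes", "Pipe", "Cigars or cigarillos"]
record = {"BR005d1": 1, "BR005d2": 0, "BR005d3": 1}  # illustrative values
print(dummy_names("BR005", len(labels)))
print(chosen_categories(record, "BR005", labels))
```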

13. How are the amounts of money collected and made comparable (euro conversion)?

All answers about amounts of money are converted into Euro values. For non-Euro countries a frozen exchange rate is chosen. For Euro countries the Euro value is either the given value or the converted pre-Euro value because respondents in Euro countries were given the option to report in either Euro or the pre-Euro currency.

The variable name follows the general format (question 12.1), except that the separation character '_' is replaced by 'e'. Any digits that follow indicate loop numbers.

When the respondent gives a “don’t know” (DK) or “refusal” (RF) as an answer to a question indicating a financial amount, the following values are included in the dataset:
 -9999998: Refusal
 -9999999: Don't know
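
Because these two codes are large negative numbers, they should be separated from real amounts before any calculation. A minimal sketch, with a hypothetical helper name:

```python
REFUSAL = -9999998
DONT_KNOW = -9999999

def split_amount(value):
    """Return (amount, reason); amount is None when value is a missing code."""
    if value == REFUSAL:
        return None, "refusal"
    if value == DONT_KNOW:
        return None, "don't know"
    return float(value), None

# Illustrative values only.
for raw in (1500, REFUSAL, DONT_KNOW):
    print(raw, split_amount(raw))
```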

Exchange rates are documented in the SHARE_Release_Guide_2.3.1.

14. What are “unfolding brackets”?

When a respondent does not know (DK) or refuses (RF) the answer to a question about amounts of money, usually an unfolding sequence of bracket questions starts. There are three entry points, and the starting point is chosen randomly. All details of the sequence are stored in the dataset. However, in the public release only a few (summary) variables are included. For all sequences we have the country-specific bracket values (in Euros) and the final category where the respondent ended. When a DK or RF is given during the unfolding bracket sequence, the value for the final category is set to either DK or RF.

The format of the summarizing unfolding bracket variable is as follows:
MMXXXubY
with MM     module identifier, e.g. HC
XXX         question number, e.g. 045
Y        optional digit for loop indication

The variable indicating where the respondent finally ends can take seven values:
1. Less than low entry point
2. About low entry point
3. Between low and mid entry point
4. About mid entry point
5. Between mid and high entry point
6. About high entry point
7. More than high entry point.

The country-specific bracket values are indicated as:
MMXXXv1, MMXXXv2, and MMXXXv3
In the case of a loop, the digit for the loop indication precedes the 'v1', 'v2', and 'v3'.
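
The seven final categories can be translated into bounds on the unknown amount using the three bracket values. The sketch below is one possible reading: it treats the “about” categories as equal to the entry point, which is a simplifying assumption, and the function name and example values are hypothetical.

```python
def bracket_bounds(category, v1, v2, v3):
    """Map a final category (1-7) to (lower, upper) bounds in euros.

    None means the interval is open on that side. Any other category
    value, e.g. a DK or RF code, yields None.
    """
    bounds = {
        1: (None, v1),  # less than low entry point
        2: (v1, v1),    # about low entry point (treated as exactly v1)
        3: (v1, v2),    # between low and mid entry point
        4: (v2, v2),    # about mid entry point
        5: (v2, v3),    # between mid and high entry point
        6: (v3, v3),    # about high entry point
        7: (v3, None),  # more than high entry point
    }
    return bounds.get(category)

# Illustrative bracket values, not taken from any country.
print(bracket_bounds(3, 1000, 5000, 20000))
```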

15. Which information does the variable “PHrandom” contain?

There are two types of answer categories for the question about self-perceived health, which each respondent is asked twice (at the beginning and at the end of the PH module). Whether the selection is PH002/PH052 or PH003/PH053 is randomized. The variable phrandom indicates which type is chosen: 1 for PH002/PH052, 2 for PH003/PH053.

16. How are the children in the child loop selected?

Questions CH009 to CH020 in module CH about children are asked for at most four children. When there are more than four children, the CAPI program selects four of them as follows:

1. Sort children in ascending order by

  • minor (defined as 0 for all children aged 18 and over and 1 for all others),
  • geographical proximity (CH007) and
  • birth year.

2. Pick the first four children. When all sorting variables are equal, the CAPI program chooses a child randomly. The variables “chselch1” up to “chselch4” contain the numbers of the children who were selected by the program.
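
The selection rule can be sketched as a sort followed by a cut-off. This is an illustration, not the CAPI program itself: the field names, the proximity codes and the way age is derived from the birth year are assumptions.

```python
import random

def select_children(children, current_year, rng=random, max_children=4):
    """Return the numbers of the selected children.

    Each child is a dict with hypothetical keys "number", "birth_year" and
    "proximity" (standing in for the CH007 answer code).
    """
    def minor(child):
        # 0 for children aged 18 and over, 1 for all others (age approximated)
        return 0 if current_year - child["birth_year"] >= 18 else 1

    shuffled = list(children)
    rng.shuffle(shuffled)  # sorted() is stable, so ties end up in random order
    ranked = sorted(
        shuffled,
        key=lambda c: (minor(c), c["proximity"], c["birth_year"]),
    )
    return [c["number"] for c in ranked[:max_children]]

children = [  # illustrative values
    {"number": 1, "birth_year": 1970, "proximity": 3},
    {"number": 2, "birth_year": 1990, "proximity": 1},
    {"number": 3, "birth_year": 1975, "proximity": 1},
    {"number": 4, "birth_year": 1980, "proximity": 1},
    {"number": 5, "birth_year": 1965, "proximity": 5},
]
selected = select_children(children, current_year=2004)
```

Shuffling before a stable sort is one simple way to break ties at random. In this example child 2 is the only minor and is therefore ranked last and not selected.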

17. What kind of weights are available?

SHARE provides two different sets of weights:

  • weights computed on the basis of respondents only (data file: share1rel2-0-1_gv_weight_resp_only)
  • weights computed including non-responding partners (included in data file: share1rel2-0-1_imputations)

SHARE includes three different kinds of weights:
1. design weights,
2. calibrated household weights and
3. calibrated individual weights.

In countries with vignette samples (Sweden, Belgium, Spain, France, Germany, Greece, Italy, and the Netherlands) each weight exists in three variants:
1. for the main sample,
2. the vignette sample and
3. for the two samples combined.

The variable samptype indicates to which sample a household belongs. In Sweden there is also a sample supplementary to the main sample. It was treated as part of the main sample. (for details see: SHARE_Release_Guide_2.3.1)

18. How do I use the weights?

Which weights to use depends on the specific research question. (see: SHARE_Release_Guide_2.3.1)

18.1 Which weights should I use?

For most purposes, use the calibrated weights. In most countries the weights are calibrated against the total national population by age group and gender. (for details see: SHARE_Release_Guide_2.3.1)

18.2 Why do I need the design weights?

If you would like to do your own calibration you need the design weights. (for details see: SHARE_Release_Guide_2.3.1)

18.3 When do I use household, when individual weights?

Household weights are useful for inference to a population of households, individual weights for inference to a population of individuals. (for details see: SHARE_Release_Guide_2.3.1)

18.4 Why are some weights missing?

A weight may be missing because the observation is a partner younger than 50. Alternatively, data needed for calibration may have been missing. (for details see: SHARE_Release_Guide_2.3.1)
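
As a rough illustration of how a weight enters an estimate, the sketch below computes a weighted mean and skips cases whose weight is missing. The variable names "y" and "w" and all values are hypothetical. Dropping cases with missing weights is an assumption made for this example, and the Release Guide should be consulted for the appropriate treatment.

```python
def weighted_mean(records, var, weight):
    """Weighted mean of `var`, ignoring cases with a missing value or weight."""
    numerator = 0.0
    denominator = 0.0
    for record in records:
        w = record.get(weight)
        value = record.get(var)
        if w is None or value is None:
            continue  # e.g. a partner under 50 without a calibrated weight
        numerator += w * value
        denominator += w
    if denominator == 0:
        raise ValueError("no cases with a usable weight")
    return numerator / denominator

records = [  # illustrative values only
    {"y": 1.0, "w": 1.0},
    {"y": 3.0, "w": 3.0},
    {"y": 100.0, "w": None},
]
result = weighted_mean(records, "y", "w")
```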