There is no general answer to this question. We refer SHARE users to the paper by Solon, Haider and Wooldridge (2013). The authors distinguish between two types of empirical research: (i) research directed at estimating population descriptive statistics, and (ii) research directed at estimating causal effects. For the former, weighting is called for to make the analysis sample representative of the target population. Using weighted sample statistics is intuitive and uncontroversial: population statistics can be consistently estimated by weighted sample statistics. For the latter, the question of whether and how to weight is more nuanced. Possible motives for weighting include achieving more precise estimates by correcting for heteroskedasticity, achieving consistent estimates by correcting for endogenous sampling, and identifying average partial effects in the presence of unmodeled heterogeneity of effects. Researchers have to be clear about their reason for using weighted estimation, think carefully about whether that reason really applies, and double-check with appropriate diagnostics. In situations where researchers might be inclined to weight, it is often useful to report both weighted and unweighted estimates and to discuss what the contrast implies for the interpretation of the results. It is also advisable to use robust standard error estimates.
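As an illustration of reporting weighted and unweighted descriptive statistics side by side, the following minimal Python sketch computes both means for a toy sample. The data values and the names `outcome`, `weight` and `weighted_mean` are hypothetical and are not part of the SHARE release; in practice the weight would be one of the calibrated weights described in this section.

```python
def weighted_mean(values, weights):
    """Weighted sample mean: sum(w_i * y_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * y for y, w in zip(values, weights)) / total_weight


# Hypothetical toy data: a binary outcome and a calibrated weight per respondent.
outcome = [1.0, 0.0, 1.0, 1.0]
weight = [1000.0, 3000.0, 500.0, 500.0]

unweighted = sum(outcome) / len(outcome)
weighted = weighted_mean(outcome, weight)

# Report both estimates so that the contrast can be discussed.
print(f"unweighted mean: {unweighted:.3f}")
print(f"weighted mean:   {weighted:.3f}")
```

A large gap between the two numbers, as in this toy sample, signals that respondents with large weights differ systematically from the rest of the sample on the outcome.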
SHARE provides calibrated cross-sectional and longitudinal weights. For cross-sectional analyses, the calibrated weight to be used depends on the basic sample unit of analysis. For example, in Wave 4, this is the variable cciw_w4 if the basic sample unit is the individual and cchw_w4 if the basic sample unit is the household.
For longitudinal analyses, the calibrated weight to be used depends on both the wave combination of interest (i.e. the waves used to form the panel) and the basic sample unit of analysis. For example, for the fully balanced panel (wave combination 1-2-3-4-5-6-7), this is the variable cliw_a if the basic sample unit is the individual, and clhw_a if the basic sample unit is the household.
For longitudinal analyses based on other wave combinations, users have to compute their own calibrated weights. To support users in this nontrivial methodological task, we provide a Stata ado-file called “sreweight.ado”, which implements the calibration procedure of Deville and Särndal (1992), and Stata do-files which illustrate step by step how to compute calibrated longitudinal weights at the individual and the household level. Further information is available here.
The ado-file sreweight.ado and the other files provided for generating calibrated longitudinal weights can be downloaded from the regular SHARE download website (“Generate Calibrated Weights Using Stata”).
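To give a rough idea of what calibration does, the following Python sketch rakes a set of design weights so that the weighted sample totals match known population margins. Raking corresponds to the multiplicative distance function, one special case in the Deville and Särndal (1992) family. This is an illustration only and not the implementation in sreweight.ado, whose options and distance function may differ. The function name `rake`, the toy design weights and the population totals are all hypothetical.

```python
def rake(design_weights, categories, targets, max_iter=100, tol=1e-10):
    """Rake design weights to known population margins.

    design_weights: starting weights, one per respondent.
    categories: dict of margin name -> list of category labels (one per respondent).
    targets: dict of margin name -> dict of category label -> population total.
    """
    weights = list(design_weights)
    for _ in range(max_iter):
        max_change = 0.0
        for margin, labels in categories.items():
            # Current weighted total in each category of this margin.
            totals = {}
            for w, label in zip(weights, labels):
                totals[label] = totals.get(label, 0.0) + w
            # Rescale so that the weighted totals match the population totals.
            for i, label in enumerate(labels):
                factor = targets[margin][label] / totals[label]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:  # all margins already matched in this pass
            break
    return weights


# Hypothetical toy sample of four respondents and made-up population totals.
design = [100.0, 150.0, 120.0, 80.0]
cats = {
    "gender": ["f", "f", "m", "m"],
    "age": ["50-64", "65+", "50-64", "65+"],
}
pop = {
    "gender": {"f": 260.0, "m": 240.0},
    "age": {"50-64": 300.0, "65+": 200.0},
}

calibrated = rake(design, cats, pop)
for w in calibrated:
    print(round(w, 2))
```

The calibrated weights stay as close as possible to the design weights, in the sense of the chosen distance function, while reproducing the population totals for each calibration variable.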
Sampling design weights may differ across waves because of changes in the national sampling designs. By contrast, calibrated cross-sectional and longitudinal weights are computed with the procedure of Deville and Särndal (1992) in all waves. The other main differences with respect to the previous waves are that: (i) we no longer distinguish between alternative variants of the SHARE sample (i.e. main sample alone, vignette sample alone and the two samples combined); (ii) we no longer provide calibrated cross-sectional weights for non-responding partners because of substantive changes in the imputation procedure used in wave 4; (iii) we no longer provide calibrated longitudinal weights for all possible wave combinations of the panel.
Missing values in the sampling design and calibrated weights may be due to (i) age-ineligibility (i.e. respondents younger than 50 years), (ii) missing sampling frame information, (iii) missing information on the set of calibration variables (age, gender, NUTS1 regional code), (iv) respondents not belonging to the selected balanced sample (only for calibrated longitudinal weights). Observations with missing weights due to (i) are not problematic if the aim is to make inference on the 50+ population. Since there are very few observations with missing weights due to (ii) and (iii), these observations can in general be dropped for substantive analyses of the SHARE data. Observations with missing longitudinal weights due to (iv) can be more problematic if the missing-data mechanism is not missing at random (conditional on the chosen set of conditioning variables). Note that, to compensate for attrition, users may condition on a larger set of variables by drawing on the information available from the starting wave. Alternative methods, such as weights based on the propensity score and sample selection models, could also be used to impose weaker assumptions on the missing-data mechanism associated with attrition.
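One simple form of propensity-score weighting for attrition is a weighting-class adjustment: the retention propensity is estimated as the weighted retention rate within cells defined from starting-wave variables, and the base weights of retained respondents are divided by it. The Python sketch below illustrates the idea under the assumption that attrition is missing at random within cells. The function name, the cell labels and the toy data are hypothetical, and this is not a procedure shipped with SHARE.

```python
from collections import defaultdict


def attrition_adjusted_weights(base_weights, cells, retained):
    """Divide the base weight of each retained respondent by the weighted
    retention rate of the respondent's cell; attriters get None."""
    total = defaultdict(float)  # weighted starting-wave count per cell
    kept = defaultdict(float)   # weighted count of retained respondents per cell
    for w, cell, stay in zip(base_weights, cells, retained):
        total[cell] += w
        if stay:
            kept[cell] += w

    adjusted = []
    for w, cell, stay in zip(base_weights, cells, retained):
        if not stay:
            adjusted.append(None)  # no longitudinal weight for attriters
        else:
            propensity = kept[cell] / total[cell]  # estimated retention propensity
            adjusted.append(w / propensity)
    return adjusted


# Hypothetical toy panel: starting-wave weights, a starting-wave cell per
# respondent, and an indicator for participation in the later wave.
base = [100.0, 100.0, 200.0, 200.0]
cell = ["a", "a", "b", "b"]
stay = [True, False, True, True]

adjusted = attrition_adjusted_weights(base, cell, stay)
print(adjusted)
```

Within each cell, the adjusted weights of the retained respondents sum to the weighted starting-wave total of that cell, so the retained sample is reweighted to represent the starting-wave sample.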