Ovarian Hyperstimulation and BP in Offspring: Part 2
Discussion
Although randomized experiments are the most commonly accepted method for causal inference (Hill, 1965), this approach is not feasible in many research settings. Testing hypothesized causal relations against observed data offers a way to assess the plausibility and consistency of causal models. In the last decade, causal graphs have become increasingly accepted and explored in both statistics and epidemiology (Greenland et al., 1999; Hernán et al., 2004; VanderWeele et al., 2008).
The results from the search algorithms indicate positive direct effects of COH-IVF on SBP percentiles and subscapular skinfold thickness. This is the first study to combine causal inference search algorithms with structural equation modeling in this context. This approach makes it possible to take into account, and estimate, the causal effects between all (background and outcome) variables in the model simultaneously. As such, it can detect possible confounding, distinguish between direct and indirect effects and account for the influence of latent variables. Multivariable regression analysis cannot do this; its results are therefore prone to biased effect estimates and may lead to incorrect conclusions regarding the observed associations (Greenland et al., 1999). Especially in this area, where the causal mechanism behind COH-IVF treatment effects is unclear and outcome variables may be closely related in different ways, this approach may offer more insight into, and more ways to disentangle, the underlying causal relationships than multivariable linear regression could provide.
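As an illustration of the distinction between direct and indirect effects in a linear structural equation model, the following minimal sketch sums the products of path coefficients over all directed paths from an exposure to an outcome. The variable names (`X`, `M`, `Y`), the coefficients and the helper `total_effect` are hypothetical and are not taken from our model; the sketch assumes an acyclic graph with linear effects.

```python
def total_effect(edges, source, target):
    """Total effect of source on target in a linear, acyclic path model.

    edges maps (parent, child) pairs to path coefficients. The total effect
    is the sum, over all directed paths, of the product of the coefficients
    along each path. Assumes the graph has no directed cycles.
    """
    if source == target:
        return 1.0
    return sum(
        coef * total_effect(edges, child, target)
        for (parent, child), coef in edges.items()
        if parent == source
    )


# Hypothetical coefficients, chosen for illustration only.
edges = {
    ("X", "Y"): 0.5,   # direct effect of exposure X on outcome Y
    ("X", "M"): 0.4,   # effect of X on mediator M
    ("M", "Y"): 0.25,  # effect of mediator M on Y
}

direct = edges[("X", "Y")]
total = total_effect(edges, "X", "Y")
indirect = total - direct  # effect transmitted through M

print(f"direct={direct:.2f} indirect={indirect:.2f} total={total:.2f}")
```

In this toy graph a regression of `Y` on `X` alone would estimate the total effect, whereas the path model reports the direct and mediated parts separately.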
Focusing on the result in Fig. 1, our data suggest that COH-IVF, i.e. IVF that includes ovarian stimulation, is associated with higher SBP percentiles and thicker subscapular skinfolds in 4-year-old offspring. This is in line with other studies describing adverse effects of IVF on cardiometabolic outcome (Belva et al., 2007, 2012; Ceelen et al., 2007, 2008; Sakka et al., 2010; Scherrer et al., 2012). In addition, we found no adverse, and even beneficial, effects of MNC-IVF, i.e. IVF without ovarian stimulation, on SBP percentiles. This suggests, for the first time, that ovarian stimulation is involved in the poorer cardiometabolic outcome seen in IVF offspring. The suggested beneficial effect of the whole IVF procedure on SBP percentiles was relatively small in comparison with the effect of COH-IVF (IVF: −2.7 percentiles; COH-IVF: +12.7 percentiles, see Fig. 1). It is conceivable that the IVF effect is spurious and related to the selection criteria for MNC-IVF. Moreover, an indirect adverse effect of COH-IVF on triceps skinfold thickness (peripheral fat) and a direct adverse effect on subscapular skinfold thickness (truncal fat) were shown, underlining the findings from Part I that COH-IVF may be associated with the cardiometabolic syndrome (Seggers et al., 2013).
The use of search algorithms for causal inference has not been without criticism (Humphreys and Freedman, 1996; Korb and Wallace, 1997), mainly concerning the required data assumptions and their strictness (see Supplementary data). Although this is a valid argument, it can be countered that similar data assumptions are made for most tests and universally accepted methods in statistics. If such limitations were allowed to prevent the application of statistical tests, very little statistics could ever be applied to real-world problems (Korb and Wallace, 1997; Spirtes et al., 1997).
As the method applied is an explorative one, results need to be interpreted with appropriate caution. They should be read as possible indications for new research hypotheses and do not necessarily represent 'the truth'. However, this caution is not restricted to search algorithms: applying a series of multivariable regression analyses with correction for different sets of potential confounders is also explorative in nature.
In applying the methods described here, some issues need to be taken into account with regard to model assumptions and generalizability of the results. Within the field of machine learning, many simulation studies have examined the application and performance of these methods (Spirtes et al., 2000; Ramsey et al., 2006; Ramsey, 2010). Less is known about their performance on real-life data, although progress has been made in various research fields as more such applications have been performed (Chickering, 2002; Acid et al., 2004; Mwebaze et al., 2010; La Bastide-Van Gemert et al., 2013).
As with conventional regression analysis, the algorithms assume all effects to be linear, an assumption which might not be strictly met by all variables. However, as Korb and Wallace (1997) pointed out, the fact that a true causal relationship is not linear does not necessarily mean that it cannot be detected using tests that assume linearity. Multivariate normality is another assumption which is not fully met by our variables, and extensions of the methods to arbitrary distributions are still being developed (Zhang and Spirtes, 2008; Hoyer et al., 2012). However, it can be shown that the algorithms tend to work well for unimodal, roughly symmetrical distributions (Tetrad manual, 2011), a criterion met by most of our (continuous) variables. For the limited number of dichotomous variables included, this was less of a problem, as most of them could be considered exogenous and hence treated as continuous variables by the algorithm. This is similar to regression, where binary explanatory variables can be treated as numerical variables in the analysis.
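The remark about binary explanatory variables can be made concrete with a small sketch: when a 0/1 variable is entered as a numerical regressor, the least-squares slope equals the difference between the two group means. The data and the helper `ols_slope` below are hypothetical and serve only as an illustration; they are not data from our study.

```python
def ols_slope(x, y):
    """Least-squares slope of y on a single regressor x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical toy data: a 0/1 exposure indicator and a continuous outcome.
exposed = [0, 0, 0, 1, 1, 1]
outcome = [50.0, 54.0, 52.0, 60.0, 66.0, 63.0]

slope = ols_slope(exposed, outcome)

# Difference between the mean outcome of the two exposure groups.
group1 = [y for x, y in zip(exposed, outcome) if x == 1]
group0 = [y for x, y in zip(exposed, outcome) if x == 0]
mean_diff = sum(group1) / len(group1) - sum(group0) / len(group0)

print(slope, mean_diff)
```

Because the slope and the mean difference coincide, coding a dichotomous exogenous variable as a number does not distort the estimated effect in a linear model.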
The issue of unmeasured confounding (i.e. the possible existence of latent variables disturbing the effects of interest) is a concern shared with more conventional statistical methods. In our case, unmeasured parental factors such as socioeconomic status could play such a role. CPC and GES are search algorithms that were not developed specifically to identify such latent variables, but in general, indications of their existence can be read from their results (see the Supplementary data). We also ran the CFCI algorithm, which is equipped to detect latent variables (Spirtes et al., 1995). However, no firm conclusions regarding latent variables could be drawn from the CFCI result. A likely explanation is that the parental background variables included in our analysis (education levels, ages at conception) act as a proxy for other unmeasured socioeconomic variables, partly capturing their effects.
Although not all assumptions were met, the selected graph (Fig. 1) has a good model fit. Given our relatively small sample size and the fairly large number of variables used, this could partly be an effect of over-fitting, a common concern when applying structural equation modeling techniques. However, our sample size was not too small to detect smaller associations (Shipley, 2000). This holds especially for the presence of the effects found (undirected edges between variables) in general. For the orientation of these effects, the error rate would be much larger at our sample size. However, owing to the time-order constraints imposed on the variables, the number of possible mistakes made by the algorithms in orienting the effects remains small (Shipley, 2000).
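How sample size limits the associations that can be detected may be sketched with a Fisher z test on a (partial) correlation, a test commonly used for conditional independence in constraint-based searches on continuous data. That this particular test was the one used in our analysis is an assumption of the sketch, as are the function name `fisher_z_pvalue` and the correlation and sample sizes shown, which are hypothetical.

```python
import math


def fisher_z_pvalue(r, n, n_conditioning=0):
    """Two-sided p-value for H0: (partial) correlation equals zero.

    r is the sample (partial) correlation, n the sample size and
    n_conditioning the number of variables conditioned on.
    Uses the Fisher z transform with a standard normal reference.
    """
    z = 0.5 * math.log((1.0 + r) / (1.0 - r))
    statistic = abs(z) * math.sqrt(n - n_conditioning - 3)
    # erfc(s / sqrt(2)) equals 2 * (1 - Phi(s)) for the standard normal Phi.
    return math.erfc(statistic / math.sqrt(2.0))


# Hypothetical values: the same modest correlation at two sample sizes.
p_large = fisher_z_pvalue(r=0.2, n=200)
p_small = fisher_z_pvalue(r=0.2, n=50)

print(f"n=200: p={p_large:.4f}; n=50: p={p_small:.4f}")
```

With these illustrative numbers the same correlation falls below a 0.05 threshold at the larger sample size but not at the smaller one, which is the sense in which small samples miss weaker edges.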
The fact that already known and expected mechanisms were detected by our models further underlines the validity of the results. As explained, by varying the alpha values and penalty discounts in the algorithms, alternative possible graphs underlying the data can be calculated. The different models with the best model fit indices (see the section 'Results') all showed similar (and hence stable) mechanisms concerning the effect of COH-IVF on SBP, indicating a certain consistency of the relations found between these variables. Moreover, the resulting graphs consistently discriminated between, and eliminated, a large number of theoretically plausible effects, again giving confidence that the resulting causal hypothesis is worth further investigation.
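The role of the penalty discount can be illustrated with a BIC-type score in which the discount multiplies the complexity term, which is how we understand score-based searches such as GES to use it; the exact form below is an assumption of this sketch. The function names and all numbers (residual sums of squares, sample size) are hypothetical.

```python
import math


def penalized_bic(rss, n, n_params, penalty_discount=1.0):
    """BIC-type score for a linear Gaussian regression; lower is better.

    rss is the residual sum of squares, n the sample size and n_params the
    number of estimated coefficients. The penalty discount scales the
    complexity term n_params * ln(n).
    """
    return n * math.log(rss / n) + penalty_discount * n_params * math.log(n)


def prefers_extra_edge(rss_without, rss_with, n, n_params_without,
                       penalty_discount):
    """True if adding one parent (one extra coefficient) improves the score."""
    score_without = penalized_bic(rss_without, n, n_params_without,
                                  penalty_discount)
    score_with = penalized_bic(rss_with, n, n_params_without + 1,
                               penalty_discount)
    return score_with < score_without


# Hypothetical case: a weak edge that reduces the residual sum of squares
# from 100 to 96 in a sample of 200.
for discount in (1.0, 2.0):
    keep = prefers_extra_edge(100.0, 96.0, 200, 1, discount)
    print(f"penalty discount {discount}: keep edge = {keep}")
```

In this toy case the weak edge is retained at a discount of 1 and dropped at a discount of 2, so effects that persist across settings, as reported above for COH-IVF and SBP, are the more stable ones.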
Despite the explorative character of the causal inference approach and the caution needed when interpreting its results, we feel that the analyses described here can be a valuable tool for developing causal hypotheses in a field where little is yet known about the underlying mechanisms. The validity of the proposed causal model should be tested, and possible spurious associations due to sampling error eliminated, using new data from multiple larger, carefully designed studies. Ideally, children of patients randomly assigned to COH-IVF or MNC-IVF who underwent single embryo transfer (as in the INeS study) should be followed (Bensdorp et al., 2009).
In our study population, increased time to pregnancy, which may be used as a proxy for the severity of subfertility, was associated with (slightly) lower rather than higher blood pressure levels, suggesting no adverse effect of subfertility in our data. Previously, we found that a longer time to pregnancy was associated with less trait anxiety and better mental health of parents 1 year after childbirth (Jongbloed-Pereboom et al., 2012). This might be an effect of self-selection: couples who are able to deal with a long period of subfertility and subsequent IVF treatments presumably cope well with stress. Via 'nature' and 'nurture', these parental characteristics may be associated with lower BP levels in offspring.
As our leading research question focused specifically on unravelling the causal effect of ovarian stimulation on outcome variables, we did not explicitly include ICSI as a separate variable in the model. Including ICSI in the causal model would have further split the effects of MNC-IVF and COH-IVF on outcome found here into a direct and an indirect (mediated by ICSI) effect. It would not have essentially altered the interpretation and conclusion concerning the derived causal effects, and would have unnecessarily complicated the interpretation of the model.
In conclusion, the results of the present study suggest that COH-IVF is associated with higher SBP percentiles and increased truncal fat in 4-year-old offspring. Future research needs to confirm the causal role of ovarian stimulation in poorer cardiometabolic outcome hypothesized here, as well as its generalizability, and should further investigate the underlying mechanisms, using our result as a research hypothesis to be tested with new data using causal inference and structural equation modeling. Our findings emphasize the importance of cardiometabolic monitoring of the growing number of children conceived through IVF worldwide.