Hypothesizing after the results are known (HARK) has been disparaged as data dredging, and safeguards including hypothesis preregistration and statistically rigorous oversight have been recommended. Despite potential drawbacks, HARK has deepened thinking about complex causal processes. Some of the HARK precautions, however, can conflict with the modern reality of researchers' obligations to use big, 'organic' data sources, from high-throughput genomics to social media streams. We here propose a HARK-solid, reproducible inference framework suitable for big data, based on models that represent formalizations of hypotheses. Reproducibility is attained by employing two levels of model validation: internal (relative to data collated around hypotheses) and external (independent of the hypotheses used to generate data, or of the data used to generate hypotheses).
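The two-level validation idea can be illustrated with a minimal sketch. This is not the authors' framework, only a hypothetical example under assumed names: a model is fitted on data collated around a hypothesis, checked internally on a held-out portion of that same collated dataset, and then checked externally on an independently generated dataset.

```python
import random

def fit_line(xs, ys):
    # Ordinary least squares for y = a + b*x (the "formalized hypothesis").
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

def mse(model, xs, ys):
    # Mean squared prediction error of the fitted line.
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def simulate(n, seed):
    # Stand-in data source; a different seed plays the role of an
    # independently collected dataset.
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys

# Data collated around the hypothesis.
xs, ys = simulate(200, seed=1)
train_x, train_y = xs[:150], ys[:150]
hold_x, hold_y = xs[150:], ys[150:]

model = fit_line(train_x, train_y)

# Internal validation: held-out portion of the same collated data.
internal_err = mse(model, hold_x, hold_y)

# External validation: an independently generated dataset.
ext_x, ext_y = simulate(200, seed=99)
external_err = mse(model, ext_x, ext_y)
```

If the model only reflects idiosyncrasies of the hypothesis-generating data, the external error will be noticeably worse than the internal one; agreement between the two is the reproducibility signal.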