Reproducible research

We have done a lot of work so far. But does it matter to other people if they cannot understand what was done and how it was done?

“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.” - D. Donoho

Imagine that you have done great research on crime in the USA, published the article, and then received tons of questions from readers about how you did it. If, on the other hand, you had provided the data, the code, the experimental workflow, and any other information relevant to your research, there would be far fewer questions.

Top reasons for making your research reproducible:

  • Ethics – we have a responsibility to show our work to move science forward.

  • Funding requirements – many funding agencies require the storage of data and the steps performed in the analysis as part of the effort to increase rigor and reproducibility.

  • Catch your mistakes – when you generate a reproducible data document, you are more likely to catch mistakes. Also, your documented steps allow you to trace back to where you went wrong.

  • Others can catch mistakes – it is far better that others, like the reviewers for your manuscript or your mentors, find the flaws in your analysis than for them to be buried while countless other students try to repeat your work.

  • Others can learn how to perform the analysis – a reproducible research document can be a powerful teaching tool for future lab members or others around the globe.

  • Better study design – when you write down why you performed a test, you are forced to think through the rationale, because ‘that’s how we always do this’ doesn’t quite cut it.

  • Other people who want to do research in the field can start from the current state of the art, instead of spending months trying to figure out what exactly was done in a certain paper. It is much easier to build on someone else’s work if documented code is also available.

  • It greatly simplifies comparing a new method with existing ones. Results can be compared more easily, and one can be sure that the implementation being compared against is the correct one.

We encourage you to look at a few articles that are publicly available on RPubs. All of them use R as the main data processing language, but for now we are interested only in the structure of the research papers.

The listed links are for educational purposes only.

Here are the structural elements common to most of these articles:

  • Title

  • Initials of the author

  • Introduction/Synopsis

  • Data processing/Environment preparation

  • Results

  • Summary
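The "Data processing/Environment preparation" part of such an article is easier to reproduce when it states exactly which software versions were used and which random seed was fixed. The snippet below is a minimal sketch of such a cell in Python, assuming your analysis runs in a Python notebook; the seed value 42, the function names, and the package list ("numpy", "pandas") are hypothetical choices for illustration, not requirements of the assignment.

```python
import platform
import random
from datetime import datetime, timezone
from importlib import metadata

SEED = 42  # hypothetical seed; pick one and report it in the article


def package_versions(names):
    """Return the installed version of each package, or 'not installed'."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def prepare_environment(seed=SEED, packages=("numpy", "pandas")):
    """Fix the seed of Python's `random` module and collect environment details.

    If you also use NumPy or another library with its own random generator,
    seed it here as well (assumption: not done in this sketch).
    """
    random.seed(seed)  # makes `random`-based steps repeatable between runs
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "seed": seed,
        "packages": package_versions(packages),
        "run_at_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }


if __name__ == "__main__":
    # Print the details so they end up in the exported article.
    for key, value in prepare_environment().items():
        print(f"{key}: {value}")
```

Running this as the first cell of the notebook means the exported article records the environment alongside the results, so a reader can see which versions and which seed produced the figures.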

Description of assignment

We hope you remember the data used in the previous assignment. Although all the assignments in this course follow the rules of reproducibility, this time we are going to prepare a .pdf article from a modified version of the previous lesson's notebook, to get a better sense of the theory in this section. The skills you gained in this module are essential for any machine learning practitioner. What really matters is understanding the data you are working with and being able to use it effectively for your purposes. Good luck.

Assignment 1

A PDF article describing your research.