Version Control with Git and GitHub

Version Control with Git and GitHub: Courses and Training

Requirements: basic programming knowledge of R, Python or another language; basic knowledge of Git

Duration: 4h

Git should be as much a part of every data scientist's skill set as data visualisation or simple descriptive techniques. It is the leading version control tool and has become the de facto industry standard across all sectors. Version control lets several people work on the same code, track changes and keep a history of them. Git is distributed: every user always has a complete copy of the code base, including its history. Local changes to a project can then be synchronised with a server, for example a hosted service such as GitHub.

GitHub, however, is not Git. GitHub is a company, acquired by Microsoft in 2018, that combines hosting of Git repositories with a social network in one central service. It offers a variety of collaboration features, including an issue tracker through which you can report problems and contact a project's developers directly. By now, there are only a few open source tools in the field of data science that cannot be found on GitHub.

We have been working with Git and GitHub since 2012. Our workflow essentially consists of the following steps:

  1. Whenever we start working on a new task in an existing project, we create a feature branch, i.e. a separate line of development that branches off the main code base.
  2. As soon as the work is completed, we open a pull request, i.e. a request for integration into the code base.
  3. In addition to automatic tests to check the quality of the code (test coverage, lints, etc.) and the correctness of the implementation (unit tests), GitHub enables a code review at this point, which can help to ensure or improve the quality of the code.
  4. A code review spreads knowledge within the team through precise feedback on the code. It follows the four-eyes principle (the reviewer must understand what is happening in the code) and, in our experience, reduces maintenance effort, because reviewed code tends to contain fewer errors.
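Steps 1 and 2 of the workflow above can be sketched as a short sequence of Git commands. The following Python snippet is only an illustration, not our tooling: the function names `feature_branch_steps` and `run_steps`, the remote name `origin` and the example branch name are assumptions made for this sketch. It builds the commands as argument lists and, if desired, runs them in an existing repository. The pull request itself is then opened on GitHub.

```python
import subprocess
from typing import List


def feature_branch_steps(branch: str, message: str, remote: str = "origin") -> List[List[str]]:
    """Return the git commands for steps 1 and 2 as argument lists.

    Hypothetical helper: 'remote' defaults to 'origin', which is an assumption.
    """
    return [
        # Step 1: create the feature branch and switch to it.
        ["git", "checkout", "-b", branch],
        # Stage the work done on the branch.
        ["git", "add", "--all"],
        # Record the staged changes in the local repository.
        ["git", "commit", "-m", message],
        # Step 2 (preparation): publish the branch so that a pull request
        # can be opened for it on GitHub.
        ["git", "push", "--set-upstream", remote, branch],
    ]


def run_steps(steps: List[List[str]], repo_dir: str) -> None:
    """Run each command inside repo_dir and stop at the first failure."""
    for argv in steps:
        subprocess.run(argv, cwd=repo_dir, check=True)
```

For example, `feature_branch_steps("feature/clean-data", "Add cleaning step")` returns four commands, and passing that list to `run_steps` together with the path of a local clone would execute them in order.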

In our Git workshop we teach you the workflow described above and pass on our experience. The technical basics are relatively easy to learn. We will focus the training mainly on the use of Git and GitHub in a data science team.

This course includes:

  • Git commands: pull, push, commit, branch, merge, checkout
  • Feature branch: workflow in a one-person team
  • Feature branch: workflow in a team of several people
  • Code reviews
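
To give an impression of how the commands listed above fit together, here is a minimal sketch of integrating a reviewed feature branch locally. It is an illustration under assumptions, not the course material: the function names `integrate_steps` and `format_steps`, the base branch `main` and the remote `origin` are chosen for this example. On GitHub, the merge is often done through the pull request itself.

```python
import shlex
from typing import List


def integrate_steps(branch: str, base: str = "main", remote: str = "origin") -> List[List[str]]:
    """Return the git commands that merge a reviewed feature branch into base.

    Hypothetical helper: the defaults 'main' and 'origin' are assumptions.
    """
    return [
        # Switch back to the base branch.
        ["git", "checkout", base],
        # Fetch and integrate what others have pushed in the meantime.
        ["git", "pull", remote, base],
        # Merge the feature branch and always create a merge commit.
        ["git", "merge", "--no-ff", branch],
        # Publish the updated base branch.
        ["git", "push", remote, base],
        # Delete the local feature branch once it has been merged.
        ["git", "branch", "-d", branch],
    ]


def format_steps(steps: List[List[str]]) -> List[str]:
    """Render each command as a shell-style line, quoting where needed."""
    return [shlex.join(argv) for argv in steps]
```

Calling `format_steps(integrate_steps("feature/clean-data"))` yields the commands as plain text lines that can be read or pasted into a terminal one by one.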