R Basics

Learn the essential fundamentals of data science in our two-day training and lay the foundation for your "hacking skills". Our practice-oriented introduction to the statistical programming environment R focuses on data import and export, data quality assurance, and data management.

Data Science Venn Diagram by Drew Conway, from http://drewconway.com/the-lab/

Introduction to R: We start with a detailed tour of the R programming environment, from installation to support resources, so that you can begin working with R independently, solve problems on your own, and know where to ask for help. Step by step, we introduce the basic components and operations of the R language. Examples and exercises invite you to run and write your own code right from the start.
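
To give a flavour of those basic components, here is a minimal sketch of a vector, a vectorised operation, logical indexing and a small function. The values and the names heights_cm and value_range are made up for illustration and are not taken from the course material.

```r
# A numeric vector, one of R's basic data structures (values are made up).
heights_cm <- c(172, 181, 165, 190)

# Operations are vectorised: this converts every element at once.
heights_m <- heights_cm / 100

# Logical indexing keeps only the elements that meet a condition.
tall <- heights_cm[heights_cm > 175]

# A small user-defined function (hypothetical helper).
value_range <- function(x) max(x) - min(x)

print(heights_m)
print(tall)
print(mean(heights_cm))
print(value_range(heights_cm))
```

Typing a line like these into the R console and changing the numbers is a quick way to see how R evaluates each expression.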

Import and export data: Before any analysis can begin, the data has to be imported: you need to connect to the place where the data is stored and read it in according to its format. Data that already sits in the preferred format in a file on the local computer is the exception rather than the rule. Whether the data comes directly from a database or from the Internet, and whether it is a .csv, .xls or .sav file, we prepare you for all eventualities.
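
As a rough illustration of the simplest case, the following sketch writes a small data frame to a .csv file and reads it back with base R. The data frame customers and its columns are hypothetical. Other formats typically need add-on packages (for example readxl for .xls or haven for .sav), which are not shown here.

```r
# Hypothetical example data; not taken from the course material.
customers <- data.frame(
  id      = 1:3,
  city    = c("Berlin", "Hamburg", "Munich"),
  revenue = c(120.5, 80.0, 99.9)
)

# Export: write the data frame to a temporary .csv file.
csv_path <- tempfile(fileext = ".csv")
write.csv(customers, csv_path, row.names = FALSE)

# Import: read the file back into a new data frame.
imported <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect column names and types of the imported data.
str(imported)
```

Checking the structure with str() right after an import is a cheap way to spot columns that were read with an unexpected type.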

Tidy data: In the information age, data quality plays a major role in the economic success of a company. Incorrect data or missing values can lead to faulty information and, ultimately, to inefficient or wrong decisions. The quality of a statistical analysis depends directly on the quality of the data used (the "garbage in, garbage out" principle). This module teaches you how to collect, clean and handle your data correctly.
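
A few typical cleaning steps can be sketched in base R as follows. The data frame survey is invented, and the plausibility limit of 120 years is an assumption chosen only for this illustration.

```r
# Hypothetical survey data with typical quality problems:
# a duplicated row, a missing age and an implausible age.
survey <- data.frame(
  id  = c(1, 2, 2, 3, 4),
  age = c(34, 41, 41, NA, 290)
)

# Count missing values per column.
print(colSums(is.na(survey)))

# Drop exact duplicate rows.
survey <- survey[!duplicated(survey), ]

# Treat implausible ages as missing (the limit of 120 is an assumption).
implausible <- !is.na(survey$age) & survey$age > 120
survey$age[implausible] <- NA

# Keep only complete rows for analyses that cannot handle NA.
complete <- survey[complete.cases(survey), ]

# Alternatively, ignore missing values inside a single computation.
print(mean(survey$age, na.rm = TRUE))
```

Whether to drop incomplete rows or to skip missing values per computation depends on the analysis; the sketch shows both options side by side.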

Data management: An alternative to data management with base R is the package dplyr, which is particularly helpful when dealing with large data sets. The concept behind this package is "Instead of moving the data to where the computation is, you want to send the computation to where the data is" (Hadley Wickham). One of the many advantages of dplyr is that calculations and data manipulations often run considerably faster than with the base approach.
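
To make the comparison concrete, here is a small sketch that filters and aggregates the same data once with base R and once with dplyr. The data frame sales is invented, and the dplyr part only runs if the package is installed. The example shows the difference in syntax, not in speed.

```r
# Hypothetical sales data; not taken from the course material.
sales <- data.frame(
  region = c("North", "South", "North", "South", "West"),
  amount = c(100, 250, 50, 150, 300)
)

# Base R: keep sales of at least 100, then sum the amounts per region.
large <- sales[sales$amount >= 100, ]
base_result <- aggregate(amount ~ region, data = large, FUN = sum)
print(base_result)

# dplyr: the same steps written as a pipeline (skipped if not installed).
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  dplyr_result <- sales %>%
    filter(amount >= 100) %>%
    group_by(region) %>%
    summarise(amount = sum(amount))
  print(dplyr_result)
}
```

Both versions should yield one row per region with the summed amounts; the dplyr pipeline reads top to bottom in the order the steps are applied.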

Prior knowledge: No prior knowledge is required.

Hardware and software: You will need a laptop with the current versions of R and RStudio installed. The statistical programming environment R can be downloaded from the website of the Comprehensive R Archive Network (CRAN). The free desktop version of RStudio is available on the RStudio website.

Amit Ghosh

Amit has 13 years’ experience as a data analytics consultant. After earning his Ph.D. with an emphasis on robust statistics, he built up and headed the statistical consulting unit at the Freie Universität Berlin. Amit is co-founder and managing director of INWT.

Sebastian Warnholz

Sebastian supports the predictive analytics team and works at the interface between software development and data science. He holds a Ph.D. in statistics and is a consultant and instructor at INWT with many years of teaching and business experience.


Curious? Catch a Glimpse of our Reader

R_Basics_preview.pdf (1.4 MiB)