INWT Blog: Data Science

Mon 30 Nov 2020·by

Now that Halloween is over and Advent is just around the corner, it is time for some Christmas decorations. And what better way to get into the holiday spirit than with a Python 🐍 project?

Tue 20 Oct 2020·by

How to protect your data from one of the most common (and potentially damaging) web security risks. 

Wed 16 Sep 2020·by

Traditionally, marketing decisions have been made by executives on the basis of instinct, experience, and what data are available. But what if this could be automated, with an artificial agent making use of huge amounts of data to automatically determine the optimal marketing strategy for every customer individually at a particular moment in time? This is precisely the promise of reinforcement learning.  

Mon 15 Jun 2020·by

Jenkins, written in Java, is currently the leading open source automation server. It is distributed under the MIT license, free of charge, and very flexible: it supports a wide range of version control systems and offers more than 1,500 plugins. In this blog article, we introduce the CI tool Jenkins and the essential aspects of its user interface.

Mon 15 Jun 2020·by

This article provides a theoretical introduction to Continuous Integration (CI) and an overview of the pros and cons of using CI tools. It also presents a selection of tools for getting started.

Tue 31 Mar 2020·by

Missing or incomplete data can have a huge negative impact on any data science project. In this blog post, we explore the kinds of missing data that exist and how we can overcome the challenges they present.

Thu 13 Feb 2020·by

In this post we’d like to introduce you to our new R package shinyMatrix. It provides you with an editable matrix input field for shiny apps.

Fri 27 Dec 2019·by

Business is changing as a result of the increasing quantity and variety of available data. Harnessing the knowledge contained in these data can open up significant new opportunities, if you know where to look. A data science team can help guide raw data through the analysis process and derive insights that are critical in today’s technologically competitive environment.

Mon 23 Dec 2019·by

Visualization tools in R and Python support projects in different ways. If you are still unsure which language is right for you, this article may help you decide. We present common packages in both languages and use them to create sample graphics.

Tue 19 Nov 2019·by

When you write code, you’re sure to run into problems from time to time. Here are some advanced tips and tricks for handling these errors, explained accessibly.

Mon 21 Oct 2019·by

One of the biggest challenges companies face is using their advertising budgets efficiently: advertising should reach customers when it has the most leverage, without being overwhelming, repetitive, or irrelevant. Marketing Mix Modeling can help to overcome this challenge.


Thu 26 Sep 2019·by

Multi-Armed Bandit algorithms are a modern alternative to traditional A/B testing. Like reinforcement learning, these algorithms optimize what is shown to the client to maximize rewards while simultaneously identifying the most successful option for your business.
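To make the explore/exploit idea concrete, here is a minimal Python sketch of epsilon-greedy, one of the simplest bandit strategies; it is not necessarily the algorithm used in the article. With probability `epsilon` it shows a random variant (explore), otherwise the variant with the best observed success rate so far (exploit). The function name `epsilon_greedy` and the conversion rates are hypothetical and exist only to drive the simulation.

```python
import random


def epsilon_greedy(true_rates, n_rounds=10_000, epsilon=0.1, seed=42):
    """Simulate an epsilon-greedy bandit over Bernoulli arms (e.g. ad variants).

    true_rates are hypothetical success probabilities that the algorithm
    never sees directly; it only observes the 0/1 reward of each round.
    Returns (pulls per arm, estimated success rate per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms
    successes = [0] * n_arms
    total_reward = 0

    for _ in range(n_rounds):
        # Explore with probability epsilon, and also until every arm has been tried once.
        if rng.random() < epsilon or 0 in pulls:
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest observed success rate.
            arm = max(range(n_arms), key=lambda a: successes[a] / pulls[a])

        reward = 1 if rng.random() < true_rates[arm] else 0
        pulls[arm] += 1
        successes[arm] += reward
        total_reward += reward

    estimates = [s / p if p else 0.0 for s, p in zip(successes, pulls)]
    return pulls, estimates, total_reward


# Hypothetical conversion rates for three variants.
pulls, estimates, total = epsilon_greedy([0.02, 0.05, 0.03])
print(pulls, [round(e, 3) for e in estimates], total)
```

Unlike a fixed 50/50 A/B split, most traffic drifts toward the variant that performs best while a small share keeps testing the others.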

Tue 17 Sep 2019·by

Having understandable, clean, and compliant data is a necessity for business success. Specific care is needed to ensure that analyses based on these data are reliable and offer value to an organization. In this context, the role of the Data Steward is becoming increasingly important. This article discusses the roles and tasks of Data Stewardship.

Mon 09 Sep 2019·by

This article describes best-practice approaches for developing shiny dashboards. Building the dashboard as a package and using unit tests helps to develop robust solutions and ensure high quality.

Thu 25 Jul 2019·by

An introduction to and comparison of the market leaders among statistics programs (R, Python, SAS, SPSS, and Stata) to help you pick the best one for your needs.

Tue 16 Jul 2019·by

In this article, we look at how to build a shiny app with clear code and reusable, automatically tested modules. To do so, we first cover the package structure and how to test a shiny app before focusing on the modules themselves.

Wed 19 Jun 2019·by

In current online marketing practice, short-term web traffic induced by TV advertising is usually quantified with a simple baseline correction. In this blog article, we show which measurement errors this approach introduces, how they can be avoided, and how the identified TV impact is correctly accounted for in attribution.

Tue 21 May 2019·by

In this article we present our R package rsync, which serves as an interface between R and the popular Linux command line tool rsync. Rsync allows users of Unix systems to synchronize local and remote files between two locations.

Tue 07 May 2019·by

When a code base grows, we may first think of splitting it into several files and sourcing them. Functions, of course, are rightly advocated to new R users and are the essential building block. Packages, however, are already the next level of abstraction we have to offer. With the modules package, I want to provide something in between.

Mon 25 Mar 2019·by

ggCorpIdent is a package for customizing ggplot2 graphics in R easily, without touching the plot code itself. It lets you use custom colors in a plot, which are interpolated if you have not specified as many colors as needed. You can also add custom fonts for the text elements within the plot and embed your corporate logo.

Wed 30 Jan 2019·by

In this post I'd like to introduce the R Markdown template for business reports by INWTlab. It's a nice and clean template for use in a corporate environment that is easy to customize in colors, cover and logo.

Wed 21 Nov 2018·by

In the first part of this blog series, we examined the theoretical foundations of cluster analysis. Now we put the theory into practice using R and find a cluster solution for the mtcars data set. Then the cluster solution is evaluated and interpreted.

Tue 06 Nov 2018·by

This article focuses on introducing the theoretical concepts of cluster analysis. You'll get a basic understanding of the underlying measures and the different methods that can be used for clustering. An evaluation method for group structures and cluster solutions is introduced towards the end of the article.

Thu 11 Oct 2018·by

This article describes how you can apply a programming technique, called Memoization, to speed up your R code and solve performance bottlenecks.
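The core idea of memoization is easy to show outside R as well. Below is a minimal Python sketch, assuming pure functions with hashable positional arguments: a decorator (here called `memoize`, a hypothetical name) stores each result in a dictionary and returns the cached value on repeated calls. Python's standard library offers the same behaviour via `functools.lru_cache`.

```python
from functools import wraps


def memoize(func):
    """Cache results of a pure function, keyed by its positional arguments."""
    cache = {}

    @wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)  # compute once, then reuse
        return cache[args]

    wrapper.cache = cache  # expose the cache for inspection
    return wrapper


call_count = 0  # counts how often the function body actually runs


@memoize
def fib(n):
    global call_count
    call_count += 1
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(80))      # 23416728348467685
print(call_count)   # 81: each n from 0 to 80 is computed exactly once
```

Without the cache, the naive recursion would recompute the same subproblems an exponential number of times; with it, the body runs once per distinct argument.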

Tue 25 Sep 2018·by

The Kernelheaping package also supports boundary-corrected kernel density estimation, which allows us to exclude certain areas where we know that the density must be zero. One example is estimating population density, where we would like to exclude uninhabited areas such as lakes, forests, and parks. The Kernelheaping package employs a boundary correction method in which each single kernel is restricted to the area of interest.

Mon 06 Aug 2018·by

The speed or run-time of models in R can be a critical factor, especially given the size and complexity of modern datasets. Both the number of data points and the number of features can easily reach the millions. Even relatively trivial modeling procedures can consume a lot of time, which matters for both optimizing and updating models. An easy way to speed up computations is to use an optimized BLAS (Basic Linear Algebra Subprograms) library. This is especially promising because R’s default BLAS is known for its stability and portability rather than its speed.

Fri 13 Jul 2018·by

Interval censoring can be generalised to rectangles or even arbitrary shapes, such as counties, zip codes, electoral districts, or administrative districts. Standard area-level mapping methods such as choropleth maps suffer from widely varying area sizes and odd area shapes, which can greatly distort the visual impression. The Kernelheaping package provides a way to convert these area-level data into a smooth point estimate.

Mon 28 May 2018·by

All over the world, at the newsstand, on public transport, and above all in countless betting communities, football fans are currently asking themselves: Who will win the 2018 FIFA World Cup? Using statistical data science models, we simulated the 2018 FIFA World Cup 10,000 times to determine each team's probability of winning and thus the World Cup favourites. Throughout the tournament, you will find the current top favourites here on our blog, updated daily and based on a lot of data and up-to-date statistical analyses.

Wed 04 Apr 2018·by

This article is a reflection on how I use different strategies to solve problems in R. "Design pattern" seems to be a big term, especially because of its use in object-oriented programming. But in the end, I think it is simply the correct label for recurring strategies in software design.

Mon 05 Mar 2018·by

The motivation for this plot is the function graphics::smoothScatter, which is essentially a plot of a two-dimensional density estimate. In the following, I reproduce its features with ggplot2.

Tue 06 Feb 2018·by

In this blog article I'd like to introduce the univariate kernel density estimation for heaped (i.e. rounded or interval censored) data with the Kernelheaping package.

Thu 25 Jan 2018·by

Sticking to a style guide helps you write cleaner code and makes working in a team more comfortable. In this article, we present the style guide we use at INWT and show how you can check your code for deviations from certain style rules.

Tue 12 Dec 2017·by

This is a ggplot2 reproduction of the (simple) bar plot from chapter 6.1.1 of Datendesign mit R.

Wed 22 Nov 2017·by

Which layout of an advertisement leads to more clicks? Would a different color or position of the purchase button lead to a higher conversion rate? Does a special offer really attract more customers, and which of two phrasings would work better? For a long time, people trusted their gut feeling to answer these questions. Today, all of them can be answered by conducting an A/B test.
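One common way to evaluate a simple A/B test on click or conversion rates is a two-proportion z-test; the article itself may use a different analysis. The Python sketch below uses only the standard library. The function name and the click counts (200 of 5,000 for A, 260 of 5,000 for B) are hypothetical and chosen purely for illustration.

```python
import math


def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    Uses the pooled proportion under the null hypothesis that both
    variants share the same true rate.
    """
    p_a = clicks_a / n_a
    p_b = clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # For a standard normal Z: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical results: variant A 200/5000 clicks, variant B 260/5000 clicks.
z, p = two_proportion_z_test(clicks_a=200, n_a=5000, clicks_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up numbers, z is roughly 2.86 and the two-sided p-value is about 0.004, so the difference between 4.0% and 5.2% would be unlikely under the assumption that both variants perform equally well.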

Wed 01 Nov 2017·by

As I began to (re)discover the usefulness of closures, I remembered some behaviour that seemed very strange at first sight. It is actually consistent with R's scoping rules, but it took me a while until my own understanding was just as consistent.
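Closures can surprise people in other languages too. As an analogue only (not necessarily the R behaviour the article discusses), here is the well-known late-binding case in Python: closures created in a loop look up the loop variable when they are called, not when they are created. A factory function (the name `make_adder` is just illustrative) binds the current value instead.

```python
# Late binding: each lambda looks up i when it is called, not when it is created.
# After the comprehension finishes, i is 2, so every adder adds 2.
adders = [lambda x: x + i for i in range(3)]
print([f(10) for f in adders])  # [12, 12, 12]


def make_adder(i):
    # Each call creates a new scope in which i is bound to the current value.
    return lambda x: x + i


adders_fixed = [make_adder(i) for i in range(3)]
print([f(10) for f in adders_fixed])  # [10, 11, 12]
```

The behaviour is consistent with Python's scoping rules: a closure captures the variable, not a snapshot of its value.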

Wed 16 Aug 2017·by

This last part is about visualizing the crash location and the flight route with the help of the R package leaflet.

Wed 16 Aug 2017·by

In this part, I'll request the geocoordinates of the crash location, the point of departure, and the intended arrival location from the Google Maps Geocoding API.

Tue 08 Aug 2017·by

Have you ever tried to find your way around in the file structure of an already existing project? To separate relevant from obsolete files in a historically grown directory? To find out in which order existing scripts should be executed? To make all this easier, it helps to have a consistent file and folder structure across your projects. In this article we present our file structure for R projects to help you get started. 

Tue 01 Aug 2017·by

This first part is about how to scrape information on aviation accidents from planecrashinfo.com. The site contains many nested tables with detailed information on aviation accidents of the last century.

Mon 20 Mar 2017·by

This article presents INWT's election forecast for the 2017 Bundestag elections. It describes a statistical forecasting model based on the survey results of leading German polling institutes. Unlike the polling institutes, we can also use our election forecast to predict the probabilities of possible coalitions after the election.

Tue 07 Mar 2017·by

MariaDB is currently the fastest-growing open source database solution. It is a fork of MySQL and is mainly developed by the MariaDB Corporation. This article describes our own solution for monitoring and optimizing our internal database infrastructure, implemented with R and Shiny: the MariaDB monitor. It is an open source alternative to existing fee-based or inflexible monitoring tools.

Wed 15 Feb 2017·by

A statistical analysis of more than 150 Lego building kits shows that the price of individual Lego components is determined not only by their size but also by the Lego theme, such as Star Wars.