Data Science and Minitab

Harsha Teja N
5 min read · Oct 22, 2020

How one can use Minitab to learn data science concepts! #5

Data science with Minitab

About this blog

This blog serves as both a tutorial and a synthesis of the various resources I utilized to learn data science. It provides an overview of how to use Minitab for statistical methods, supplemented by a basic introduction to statistics.

I realized this information might be beneficial to others who are new to statistics, as I initially struggled to find similar introductory materials. Sharing my knowledge of Minitab not only helps others but also enhances my own understanding of the tool and potentially connects me with experts who can expand my learning.

If you find this article useful, please consider letting me know or supporting the development of such tools by purchasing the Pro version of Minitab on their official website. It’s a worthwhile investment. For the purposes of this blog, I have utilized the trial period offered by Minitab.

Let’s start with inferential statistics and how we can implement its methods in Minitab.

As discussed in the first blog, inferential statistics is about collecting sample data and using it to estimate the parameters of the population.

To get a broader perspective on inferential statistics, let’s look at the topics under this category. There are two sub-topics: Estimations and the Test of Hypothesis (TOH). TOH in turn covers three topics: one-sample tests, two-sample tests, and ANOVA.

Estimations

Under estimations, we use sample data to estimate the mean (μ), the standard deviation (σ), and the proportion (p), aiming for values close to the true population parameters.

Why we choose these three and not any of the many other possible quantities is a large topic in itself. In short, the estimators commonly used for them are valued for properties such as unbiasedness, low variance, and linearity.

Each estimate comes with its own formula and reference table, for example the z-table or t-table used to build a confidence interval around the estimated value.
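As a rough sketch outside Minitab, the same kind of estimates can be computed in a few lines of Python. The fill volumes below are made-up numbers for illustration, and the 1.9-liter cutoff for the proportion is my own assumption; the critical value 2.262 is the usual t-table entry for a two-sided 95% interval with 9 degrees of freedom.

```python
import math
import statistics

# Hypothetical sample: fill volumes (liters) of 10 bottles. Made-up numbers.
sample = [1.80, 1.82, 1.79, 2.00, 1.81, 1.78, 1.80, 2.01, 1.83, 1.79]
n = len(sample)

mean_hat = statistics.mean(sample)   # point estimate of the population mean
sd_hat = statistics.stdev(sample)    # point estimate of sigma (divides by n - 1)

# Point estimate of a proportion: share of bottles filled below 1.9 liters
# (the 1.9 cutoff is an assumption chosen for this illustration).
p_hat = sum(1 for x in sample if x < 1.9) / n

# 95% confidence interval for the mean: mean +/- t * s / sqrt(n).
# 2.262 comes from a t-table (two-sided 95%, n - 1 = 9 degrees of freedom).
T_CRIT_DF9 = 2.262
margin = T_CRIT_DF9 * sd_hat / math.sqrt(n)
ci_low, ci_high = mean_hat - margin, mean_hat + margin

print(f"mean = {mean_hat:.3f}, sd = {sd_hat:.3f}, proportion = {p_hat:.2f}")
print(f"95% CI for the mean: ({ci_low:.3f}, {ci_high:.3f})")
```

The interval is the "validate" part: it shows how far the true mean could plausibly be from the single number we estimated.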

One-Sample Test and Two-Sample Test

The prerequisite to understanding one-sample and two-sample tests is hypothesis testing.

Hypothesis testing starts with a belief, backed by some data, that either contradicts or agrees with the accepted claim. To validate the belief we hold, we perform a hypothesis test on it. We do this by first writing down the null hypothesis (the accepted claim) and the alternative hypothesis (our belief).

Then we perform steps on the collected data to either reject or fail to reject the null hypothesis. Along the way, we come across the t-test, z-test, and F-test, and the corresponding tables used to look up critical values.

Now, in a one-sample test, we compare a sample against a standard value. For instance, when I bought 10 bottles of a 2-liter Coca-Cola beverage, I saw that 8 of those bottles were filled to only about 1.8 liters. I went to the customer executive to validate this. In this example, I’m comparing my measurements with the standard fill volume stated for the Coca-Cola beverage.

Minitab: One Sample tests
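To make the bottle example concrete, here is a minimal Python sketch of a one-sample t-test. The fill volumes are made up, the function name `one_sample_t` is my own, and the critical value of -1.833 is the t-table entry for a one-sided test at the 5% level with 9 degrees of freedom. The null hypothesis is that the mean fill is 2.0 liters; the alternative is that it is lower.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return the t statistic for H0: population mean equals mu0."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / standard_error


# Hypothetical fill volumes (liters) of the 10 bottles. Made-up numbers.
fills = [1.80, 1.82, 1.79, 2.00, 1.81, 1.78, 1.80, 2.01, 1.83, 1.79]

LABELED_VOLUME = 2.0   # standard value we compare against (H0: mu = 2.0)
T_CRIT = -1.833        # t-table: one-sided, alpha = 0.05, df = 9

t_stat = one_sample_t(fills, LABELED_VOLUME)

# Lower-tailed test (H1: mu < 2.0): reject H0 when t falls below the critical value.
reject_null = t_stat < T_CRIT

print(f"t = {t_stat:.2f}, reject H0: {reject_null}")
```

Minitab reports a p-value instead of asking you to look up a table, but the decision it leads to is the same.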

Similarly, when I have two data sets collected from performing the same activity on two different occasions, I consider two-sample tests. For example, let’s assume I made my team go through training to improve their performance. I believe that the money spent on the training was useful, and to validate this I’ll compare the performance results collected before the training with those collected after it.

Minitab: Two sample tests
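Here is a small Python sketch of the training example. Because the same team members are measured before and after, one common way to compare the two data sets is a paired t-test, which is a one-sample t-test on the per-person differences; that choice is my assumption, and for two independent groups a 2-sample t-test would be the analogue. The scores are made up, and 1.895 is the t-table critical value for a one-sided test at the 5% level with 7 degrees of freedom.

```python
import math
import statistics

# Hypothetical performance scores for the same 8 team members. Made-up numbers.
before = [62, 70, 58, 65, 73, 60, 68, 64]
after = [68, 74, 63, 66, 80, 65, 70, 71]

# Paired design: work with each person's improvement.
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)

mean_diff = statistics.mean(diffs)
standard_error = statistics.stdev(diffs) / math.sqrt(n)

# H0: mean improvement is 0; H1: mean improvement is greater than 0.
t_stat = mean_diff / standard_error

T_CRIT = 1.895  # t-table: one-sided, alpha = 0.05, df = n - 1 = 7
reject_null = t_stat > T_CRIT

print(f"mean improvement = {mean_diff:.3f}, t = {t_stat:.2f}, reject H0: {reject_null}")
```

Rejecting the null hypothesis here would support the belief that the training improved performance, at least for this made-up data.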

ANOVA

ANOVA, short for Analysis of Variance, is known as one of the most important concepts for data science and machine learning, as many other methods build on this basic idea.

For a simpler understanding, we can consider ANOVA when a factor has more than two levels (groups) and we want to know whether the mean outcome differs between them, for example to find the most efficient or most profitable level.

The results that we derive from ANOVA help us to perform further steps like Tukey’s and Dunnett’s methods, which compare the group means with each other or against a control group.

When there is more than one response variable, so that we compare multivariate sample means, we conduct MANOVA (Multivariate Analysis of Variance).

A balanced ANOVA is one in which every combination of factor levels has the same number of observations, so nothing is missing in the sample collected.
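The idea behind a one-way ANOVA can be sketched in Python by computing the F statistic by hand. The three groups and their scores are made up for illustration, the function name `one_way_anova_f` is my own, and 3.89 is the F-table critical value at the 5% level for 2 and 12 degrees of freedom.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores under three levels of one factor. Made-up numbers.
level_a = [82, 85, 88, 80, 85]
level_b = [75, 78, 74, 77, 76]
level_c = [90, 92, 88, 91, 89]

f_stat, df1, df2 = one_way_anova_f([level_a, level_b, level_c])

F_CRIT = 3.89  # F-table: alpha = 0.05, df = (2, 12)
reject_null = f_stat > F_CRIT  # H0: all level means are equal

print(f"F({df1}, {df2}) = {f_stat:.2f}, reject H0: {reject_null}")
```

A large F means the differences between the group means are big compared with the spread inside each group, which is exactly the comparison of variances that gives ANOVA its name.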

ANOVA is a vast and important topic that I think every data scientist should understand clearly.

With this, I’m wrapping up inferential statistics. You can explore all the options available under each tab. Minitab’s “Help” section is excellent, with a detailed explanation behind every aspect of each feature.

To learn and experiment with the theories that you know, you need some sample data, and Minitab gives you an option to generate random sample data under the “Calc” tab. Before getting results, you have to enter the parameters for the data distribution you have chosen.

Minitab: Random Data tabs
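If you want to try the same idea outside Minitab, Python’s standard `random` module can generate sample data in a similar way. This is only a sketch: the distributions, parameter values, and seed below are arbitrary choices of mine, not Minitab defaults.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the same data comes out on every run
SAMPLE_SIZE = 100

# Normal distribution: the parameters are the mean and the standard deviation.
normal_data = [rng.gauss(2.0, 0.05) for _ in range(SAMPLE_SIZE)]

# Uniform distribution: the parameters are the lower and upper bounds.
uniform_data = [rng.uniform(0, 10) for _ in range(SAMPLE_SIZE)]

# Bernoulli distribution: the parameter is the probability of a 1.
bernoulli_data = [1 if rng.random() < 0.3 else 0 for _ in range(SAMPLE_SIZE)]

print(f"normal sample mean:    {statistics.mean(normal_data):.3f}")
print(f"uniform sample mean:   {statistics.mean(uniform_data):.3f}")
print(f"bernoulli sample mean: {statistics.mean(bernoulli_data):.3f}")
```

As in Minitab, each distribution asks for its own parameters before it can produce data.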

In the next blog, I’ll write about how to perform regression in Minitab. See you there!

Disclaimer

I am not affiliated with any of the services mentioned in this article. Additionally, I do not claim to be an expert. If you believe that I have overlooked important details or omitted crucial steps, please feel free to point them out in the comments section or contact me directly. I welcome constructive feedback and suggestions for improvement.
