Data Science and Minitab

Harsha Teja N
4 min read · Oct 22, 2020

How one can use Minitab to learn data science concepts! #6

Data science with Minitab

About this blog

This blog serves as both a tutorial and a synthesis of the various resources I utilized to learn data science. It provides an overview of how to use Minitab for statistical methods, supplemented by a basic introduction to statistics.

I realized this information might be beneficial to others who are new to statistics, as I initially struggled to find similar introductory materials. Sharing my knowledge of Minitab not only helps others but also enhances my own understanding of the tool and potentially connects me with experts who can expand my learning.

If you find this article useful, please consider letting me know or supporting the development of such tools by purchasing the Pro version of Minitab on their official website. It’s a worthwhile investment. For the purposes of this blog, I have utilized the trial period offered by Minitab.

My professor once told me that “Regression is like an art,” which I didn’t understand at the time. Now that I have done many projects using regression, I can relate to what he said back then.
I have seen that most data science teams these days jump directly into machine learning and deep learning without first exploring whether the problem can be solved using regression.

Regression can loosely be thought of as the next level of MANOVA, where the number of parameters can be very large, and, similarly, machine learning as the next level of regression. The main difference I have observed between regression and machine learning is that regression can be chosen only when the data satisfy a few assumptions, namely those stated in the Gauss-Markov theorem.

There will be instances where the data cannot be made to satisfy these assumptions even after some preprocessing. In such cases, we turn to machine learning and other methods.
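One of the Gauss-Markov assumptions is that the errors have constant variance. As a rough illustration of how such an assumption might be checked outside Minitab, here is a minimal Python sketch that fits a line and compares the residual variance in the lower and upper halves of x. The function names, the simulated data, and the idea of splitting at the median are my own assumptions for illustration; this is only loosely in the spirit of formal tests such as Goldfeld-Quandt and is not a substitute for them.

```python
import random

def residuals(x, y):
    """Residuals of the least-squares line of y on x, in the order of x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def variance_ratio(x, y):
    """Residual variance for the upper half of x divided by the lower half."""
    pairs = sorted(zip(x, residuals(x, y)))  # sort residuals by x
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[half:]]
    mean_sq = lambda values: sum(r * r for r in values) / len(values)
    return mean_sq(high) / mean_sq(low)

# Hypothetical data: one series with constant noise, one whose noise grows with x.
rng = random.Random(1)
x = [float(i) for i in range(1, 201)]
y_constant = [5.0 + 0.3 * a + rng.gauss(0, 2.0) for a in x]
y_growing = [5.0 + 0.3 * a + rng.gauss(0, 0.1 * a) for a in x]

ratio_constant = variance_ratio(x, y_constant)
ratio_growing = variance_ratio(x, y_growing)
print(f"constant noise ratio: {ratio_constant:.2f}")
print(f"growing noise ratio:  {ratio_growing:.2f}")
```

A ratio near 1 is consistent with constant variance, while a ratio far from 1 suggests the assumption may not hold and that a transformation or a different method is worth considering.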

The regression equation is generally written as y = β₀ + β₁x + ε, where β₀ is the y-intercept, β₁ is the slope associated with the predictor x, and ε is the error term.
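To make the equation concrete, here is a minimal Python sketch of how the intercept and slope can be estimated by ordinary least squares for a single predictor. The function name and the simulated data (a true intercept of 2.0 and a true slope of 0.5) are assumptions made for illustration; this is not the data used in the Minitab screenshots.

```python
import random

def fit_simple_ols(x, y):
    """Return (intercept, slope) that minimize the sum of squared errors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: y = 2.0 + 0.5x plus Gaussian noise.
rng = random.Random(0)
x = [float(i) for i in range(50)]
y = [2.0 + 0.5 * a + rng.gauss(0, 1) for a in x]

b0, b1 = fit_simple_ols(x, y)
print(f"y = {b0:.3f} + {b1:.3f}x")
```

The estimates should land close to the values used to simulate the data, and the gap between them is the effect of the error term.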

Minitab: Regression tabs (1,2,3,4)

I have generated random data in columns labeled y (the response) and x (the predictor variable) in the first image.

You can find the option to perform regression under the “Stat” menu. Minitab offers other regression modeling methods there as well, which you can try once you have a basic understanding of what each one is for.

After you enter the respective columns, the tool produces the results. The output includes several statistics for judging the accuracy of the fitted model, which in this case is y = 99 + 0.0208x.

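Minitab also reports four summary values for judging the quality of the fit, including the coefficients of determination (R-squared and its variants), as well as the variance inflation factor (VIF), which indicates multicollinearity problems among the predictors. With a single predictor, as here, the VIF is always 1.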
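For readers who want to see what these numbers mean, here is a minimal Python sketch of R-squared, adjusted R-squared, and the VIF for the simplest multi-predictor case of two predictors. The function names and the small data set are hypothetical, and the formulas are the standard textbook ones rather than anything taken from Minitab's internals.

```python
def r_squared(x, y):
    """R-squared of the simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)

def adjusted_r_squared(r2, n, p):
    """Penalize R-squared for the number of predictors p, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def vif_two_predictors(x1, x2):
    """VIF of x1 when x2 is the only other predictor: 1 / (1 - R-squared of x1 on x2)."""
    return 1 / (1 - r_squared(x2, x1))

# Hypothetical data with five observations.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]

r2 = r_squared(x1, x2)
print(f"R-squared: {r2:.2f}")
print(f"Adjusted R-squared: {adjusted_r_squared(r2, n=5, p=1):.2f}")
print(f"VIF: {vif_two_predictors(x1, x2):.2f}")
```

A VIF close to 1 means a predictor is nearly unrelated to the others, while large values signal that predictors carry overlapping information.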

The first table, Analysis of Variance, shows how much of the variation in the response each variable explains and hence its effect on the overall model. In this case, we have only one variable, but real-world problems often involve many more, sometimes thousands.
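The quantities in an Analysis of Variance table for a one-predictor regression can be reproduced by hand. The following Python sketch, with a hypothetical function name and a small made-up data set, splits the total variation in y into the part explained by the regression and the part left in the residuals, then forms the F statistic from the two mean squares.

```python
def anova_simple_regression(x, y):
    """Sums of squares, degrees of freedom, and F for a one-predictor regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * a for a in x]

    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_error = sum((b - f) ** 2 for b, f in zip(y, fitted))
    ss_regression = ss_total - ss_error

    df_regression = 1      # one predictor
    df_error = n - 2       # n observations minus two estimated coefficients
    f_value = (ss_regression / df_regression) / (ss_error / df_error)
    return {
        "ss_regression": ss_regression,
        "ss_error": ss_error,
        "ss_total": ss_total,
        "df_regression": df_regression,
        "df_error": df_error,
        "f_value": f_value,
    }

# Hypothetical data with five observations.
table = anova_simple_regression([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 5.0])
for name, value in table.items():
    print(f"{name}: {value:.3f}")
```

A large F value relative to its reference distribution indicates that the predictor explains a meaningful share of the variation, which is what the p-value in the table summarizes.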

This is the last blog under this series. I hope I made you aware of Minitab’s basics and its practical use in learning and performing data science concepts. See you again!

Disclaimer

I am not affiliated with any of the services mentioned in this article. Additionally, I do not claim to be an expert. If you believe that I have overlooked important details or omitted crucial steps, please feel free to point them out in the comments section or contact me directly. I welcome constructive feedback and suggestions for improvement.
