Data Science and Minitab

Harsha Teja N
6 min read · Oct 22, 2020


How to use Minitab to learn data science concepts! #4

Data science with Minitab

About this blog

This blog serves as both a tutorial and a synthesis of the various resources I utilized to learn data science. It provides an overview of how to use Minitab for statistical methods, supplemented by a basic introduction to statistics.

I realized this information might be beneficial to others who are new to statistics, as I initially struggled to find similar introductory materials. Sharing my knowledge of Minitab not only helps others but also enhances my own understanding of the tool and potentially connects me with experts who can expand my learning.

If you find this article useful, please consider letting me know or supporting the development of such tools by purchasing the Pro version of Minitab on their official website. It’s a worthwhile investment. For the purposes of this blog, I have utilised the trial period offered by Minitab.

In this blog, we will discuss random variables and the categories of probability distributions they can follow.

“A random variable is a numerical description of the outcome of a statistical experiment. A random variable that may assume only a finite number or an infinite sequence of values is said to be discrete; one that may assume any value in some interval on the real number line is said to be continuous.”

In practice, we often conduct experiments on a small scale and use them to estimate the probability of a favourable outcome on a large scale. Before running any major calculations on the empirical data, we first try to understand how the data is distributed. Distributions are categorised by the nature of the data, which falls into two types: discrete and continuous.

The distributions that apply to discrete data include the Binomial, Bernoulli, Poisson, Negative Binomial, and others. The distributions that apply to continuous data include the Normal, Gamma, Exponential, and others.

The formula that describes a discrete distribution is called a Probability Mass Function (PMF), and the formula that describes a continuous distribution is called a Probability Density Function (PDF).
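To make the difference between the two kinds of function concrete, here is a minimal Python sketch using only the standard library. The fair die and the standard normal curve are my own illustrative choices, not examples from Minitab. A PMF gives a probability for each possible value, while a PDF gives a density whose area over an interval is the probability.

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete case: a PMF maps each possible value to its probability.
# Illustrative example: a fair six-sided die.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
pmf_total = sum(die_pmf.values())  # the probabilities add up to exactly 1

# Continuous case: a PDF returns a density, and probability is the
# area under the curve over an interval.
z = NormalDist(mu=0.0, sigma=1.0)  # standard normal, chosen for illustration
step = 0.001
# Midpoint Riemann sum of the density between -6 and 6.
area = sum(z.pdf(-6 + (i + 0.5) * step) * step for i in range(12000))

# Probability of falling within one standard deviation of the mean.
prob_within_one_sigma = z.cdf(1.0) - z.cdf(-1.0)

print(pmf_total)
print(round(area, 6))
print(round(prob_within_one_sigma, 4))
```

Both totals come out at, or very close to, 1, which is the property every PMF and PDF must satisfy.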

There are many distributions available for each type of data, each suited to particular kinds of events. You can visit the Help section in Minitab or search Google to learn more about the available distributions and the events they model.

I’ll briefly introduce a few distributions before going through the steps for using them in Minitab. Let’s start with the distributions for discrete data.

Bernoulli

“The Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli, is the discrete probability distribution of a random variable which takes the value 1 with probability p and the value 0 with probability q = 1 − p.”

The Bernoulli distribution applies to events whose outcome is binary. For instance: did someone pass the exam or not? Do I have enough money to buy something or not?

Parameters: A parameter of a distribution is a number or a vector of numbers describing some characteristic of that distribution. The Bernoulli distribution has a single parameter, p, the probability of success in its one trial.
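As a rough illustration outside Minitab, the Bernoulli distribution can be sketched in a few lines of Python. The helper name `bernoulli_pmf` and the pass probability of 0.7 are hypothetical values chosen for this example.

```python
import random

def bernoulli_pmf(x, p):
    """P(X = x) for a Bernoulli variable: p when x is 1, 1 - p when x is 0."""
    if x not in (0, 1):
        return 0.0
    return p if x == 1 else 1.0 - p

p_pass = 0.7  # hypothetical probability that a student passes the exam

# Simulate 10,000 independent students; each one is a single Bernoulli trial.
rng = random.Random(42)  # fixed seed so the run is repeatable
draws = [1 if rng.random() < p_pass else 0 for _ in range(10_000)]
observed_pass_rate = sum(draws) / len(draws)

print(bernoulli_pmf(1, p_pass), bernoulli_pmf(0, p_pass))
print(observed_pass_rate)
```

With this many simulated trials, the observed pass rate should land close to the chosen probability.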

You can find these distributions in the Calc menu under Probability Distributions.

As mentioned earlier, there are many distributions one can try to fit to the data.

In this image, Bernoulli does not appear as an option because it is a special case of the Binomial distribution: a Binomial with a single trial.

Next, we will look at how to perform a Bernoulli calculation using the Binomial option.

Binomial

“A binomial distribution can be thought of as simply the probability of a SUCCESS or FAILURE outcome in an experiment or survey that is repeated multiple times. The binomial is a type of distribution that has two possible outcomes (the prefix “bi” means two, or twice).”

In other words, the Binomial distribution is Bernoulli on a larger scale: it counts the number of successes when a Bernoulli trial is repeated many times, with each trial independent of the others and having the same event probability.

The parameters of the Binomial distribution are n and p: the number of trials and the event probability.

Minitab: Binomial Tabs

Before we can get any results, we have to enter the parameters in the input boxes for Number of trials and Event probability.
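For readers who want to check Minitab's numbers by hand, the Binomial calculation can be sketched in Python. The function names and the inputs (10 trials, event probability 0.5) are assumptions for illustration. The last line shows that setting the number of trials to 1 reduces the Binomial to a Bernoulli.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k events in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_cdf(k, n, p):
    """P(X <= k): cumulative probability of at most k events."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

# Hypothetical inputs, matching the two boxes in the Minitab dialog.
n_trials = 10
event_probability = 0.5

p_exactly_3 = binomial_pmf(3, n_trials, event_probability)  # 120/1024
p_at_most_3 = binomial_cdf(3, n_trials, event_probability)  # 176/1024

# With a single trial, the Binomial is just a Bernoulli distribution.
bernoulli_success = binomial_pmf(1, 1, event_probability)

print(p_exactly_3, p_at_most_3, bernoulli_success)
```

The same two inputs drive both the single-value probability and the cumulative probability; only the question being asked changes.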

Poisson

“The Poisson distribution is the discrete probability distribution of the number of events occurring in a given time period, given the average number of times the event occurs over that time period.”

The Poisson distribution rests on certain assumptions that are beyond the scope of this blog. Put simply, it is a good approximation to the Binomial when the number of trials is extremely large and the event probability is extremely low.

The parameter for the Poisson distribution is lambda (λ), the average number of events per unit. It is estimated as the total number of events (k) divided by the number of units (n) in the data: λ = k/n.

Minitab: Poisson Tabs

Before getting the results, we have to enter the parameter of the Poisson distribution, i.e. lambda.
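The Poisson calculation can also be sketched in Python. The event counts below (30 events over 10 units) are made-up numbers for illustration, and `poisson_pmf` is a helper written for this example. The final lines compare the result with a Binomial that has a very large number of trials and a very small event probability.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k): probability of exactly k events when the average is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hypothetical data: 30 events observed across 10 units of time.
total_events = 30
units = 10
lam = total_events / units  # lambda = k / n = 3.0

p_zero = poisson_pmf(0, lam)  # probability of no events in a unit
p_two = poisson_pmf(2, lam)   # probability of exactly two events in a unit

# A Binomial with many trials and a tiny event probability, chosen so that
# its mean (n * p) equals lambda, gives almost the same answer.
n = 10_000
p = lam / n
binom_two = math.comb(n, 2) * p**2 * (1 - p) ** (n - 2)

print(round(p_zero, 4), round(p_two, 4), round(binom_two, 4))
```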

Now, let’s look at the distributions for continuous data. Though there are many of them, the most widely used is the Normal distribution.

Normal

The normal distribution is the most widely referenced distribution in statistics because many other statistical methods rely on its properties. It is a continuous probability distribution that describes how the values of a variable are distributed. It is a symmetric distribution in which most of the observations cluster around the central peak (the bell curve), and the probabilities for values further away from the mean taper off equally in both directions.

Minitab: Normal curve (Probability Distribution Curve)

We see the bell curve formed by the data when plotted in a histogram.

The data tend to cluster around the center.

The parameters of the normal distribution are the mean (mu, the center of the curve, sometimes read as accuracy) and the standard deviation (sigma, the spread of the curve, sometimes read as consistency).

Minitab: Normal Tabs

Before getting the results, we have to enter the values of the mean and the standard deviation.
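Python's standard library includes a `NormalDist` class that takes the same two inputs, so the normal calculations can be sketched as follows. The mean of 100 and standard deviation of 15 are hypothetical values picked for the example.

```python
from statistics import NormalDist

# Hypothetical parameters, matching the two boxes in the Minitab dialog.
mean = 100.0
std_dev = 15.0
dist = NormalDist(mu=mean, sigma=std_dev)

density_at_mean = dist.pdf(mean)                     # height of the peak
p_below_115 = dist.cdf(115.0)                        # cumulative probability
p_within_one_sd = dist.cdf(115.0) - dist.cdf(85.0)   # roughly 68% of values
x_95 = dist.inv_cdf(0.95)                            # inverse cumulative probability

# Symmetry: points equally far from the mean have the same density.
left = dist.pdf(mean - 10.0)
right = dist.pdf(mean + 10.0)

print(round(p_below_115, 4), round(p_within_one_sd, 4), round(x_95, 2))
print(abs(left - right) < 1e-12)
```

The symmetry check at the end reflects the bell shape: the curve tapers off equally on both sides of the mean.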

Though there are many other distributions for continuous data, the normal is the one most people stick to. There’s also a result called the “Central Limit Theorem,” which says that the means of sufficiently large samples are approximately normally distributed, even when the underlying data is not normal.
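A small simulation can make the Central Limit Theorem easier to picture. This is a sketch under my own assumptions: the data comes from an exponential distribution (which is strongly skewed, not bell-shaped), and the sample size of 50 and the 2,000 repetitions are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

rate = 1.0          # exponential data with mean 1 and standard deviation 1
sample_size = 50    # observations per sample
num_samples = 2000  # number of samples drawn

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean(rng.expovariate(rate) for _ in range(sample_size))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# The theorem predicts a spread of sigma / sqrt(sample_size) for the means.
expected_sd = 1.0 / sample_size**0.5

print(round(mean_of_means, 3), round(sd_of_means, 3), round(expected_sd, 3))
```

Plotting `sample_means` as a histogram should show a shape close to the bell curve, even though the individual observations are skewed.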

In the next blog, I’ll be writing about inferential statistics. See you there!

Disclaimer

I am not affiliated with any of the services mentioned in this article. Additionally, I do not claim to be an expert. If you believe that I have overlooked important details or omitted crucial steps, please feel free to point them out in the comments section or contact me directly. I welcome constructive feedback and suggestions for improvement.
