Data Science and Minitab

Harsha Teja N
Oct 22, 2020

How one can use Minitab to learn data science concepts! #2

Data science with Minitab

About this blog

This blog serves as both a tutorial and a synthesis of the various resources I utilized to learn data science. It provides an overview of how to use Minitab for statistical methods, supplemented by a basic introduction to statistics.

I realized this information might be beneficial to others who are new to statistics, as I initially struggled to find similar introductory materials. Sharing my knowledge of Minitab not only helps others but also enhances my own understanding of the tool and potentially connects me with experts who can expand my learning.

If you find this article useful, please consider letting me know or supporting the development of such tools by purchasing the Pro version of Minitab on their official website. It’s a worthwhile investment. For the purposes of this blog, I have utilized the trial period offered by Minitab.

In this article, I will introduce some fundamental concepts in statistics that are essential for understanding the topics we will explore using Minitab.

Statistics for data science can broadly be divided into two categories: descriptive statistics and inferential statistics. Descriptive statistics help us summarize and describe data, while inferential statistics are used to make predictions or inferences about a larger population from a sample.

Data is typically viewed from two perspectives: sample data and population data. A quantity computed from sample data is called a statistic, whereas the same quantity computed for the whole population is called a parameter. For example, the average (sum of elements divided by the number of elements) computed on sample data is a statistic, the “sample mean,” while the same calculation performed on population data is a parameter, the “population mean.”
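To make the statistic/parameter distinction concrete, here is a minimal Python sketch (outside Minitab) that I put together for illustration. The population, its size, and the sample size are all made-up assumptions; the point is only that the same averaging calculation gives a parameter on the population and a statistic on the sample.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 values (an assumption for illustration only).
population = [random.gauss(50, 10) for _ in range(10_000)]

# A sample is a subset drawn from the population.
sample = random.sample(population, 100)

population_mean = statistics.fmean(population)  # a parameter
sample_mean = statistics.fmean(sample)          # a statistic

print(f"Population mean (parameter): {population_mean:.2f}")
print(f"Sample mean (statistic):     {sample_mean:.2f}")
```

The two numbers are usually close but not identical, which is exactly why inferential statistics is needed to say how well the statistic estimates the parameter.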

Our focus in descriptive statistics will be on understanding and describing the data we have collected. In inferential statistics, we aim to derive insights from the data by applying various theories, which help us estimate population parameters.

Let’s begin with descriptive statistics and then progress to inferential statistics to build a comprehensive understanding.

Descriptive Statistics

The first topic we should know under descriptive statistics is data visualisation charts. Plotting the given data points/observations in different chart types helps us find properties of the data such as central tendency, shape, and dispersion/spread.

Broadly, there are five types of data visualisations with which the given data points are plotted:

  1. Box Plot
  2. Line Chart / Run Chart
  3. Bar Graph
  4. Histogram
  5. Scatter Plot

Box Plot:

We plot our observations/data points in a box plot to find outliers in our data. (An outlier is an observation that lies at an abnormal distance from the other values.) A box plot is constructed from percentiles: the box spans the first quartile to the third quartile, with a line at the median.

Minitab — labeled Screens for Boxplot

From the image above, each screen (labeled alphabetically) signifies:

A ) We have pasted 1000 observations to perform boxplot analysis.

B ) We can find the option for “Boxplot” under the “Graphs” section in the tab bar.

C) After choosing Boxplot, we have to specify which category of plot we need for our analysis. In our case, because we have only one variable, the choice should be “simple.”

D) This box requires you to input the column to be plotted. You can also add further analyses that can be performed under Boxplot.

E) This is the boxplot that the tool has plotted based on the given data. The data points are distributed according to quartiles, and observations beyond the whiskers are considered outliers.

F) For the sake of example, I have added data points to the column that are far away from most of the other data points. In the resultant boxplot, we can see how the box has shrunk and how these observations lie far beyond the quartile values.
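The quartile and outlier logic behind screens E and F can be sketched in a few lines of Python. This is my own illustration, not Minitab's code: the function name `boxplot_summary` and the data are hypothetical, and I am assuming the common 1.5 × IQR rule for the whiskers, so results may differ slightly from what Minitab reports.

```python
import statistics

def boxplot_summary(values):
    """Return quartiles, fences, and outliers using the 1.5 * IQR rule."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }

# Hypothetical data with one value placed far away from the rest.
data = [12, 14, 15, 15, 16, 17, 18, 19, 20, 21, 95]
summary = boxplot_summary(data)
print(summary)
```

Any observation outside the two fences is what the box plot draws as an individual point beyond the whiskers.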

Line Chart:

We use a line chart to understand data collected in time order. Time series analysis, where the line chart/run chart is used extensively, is a mammoth topic of its own. A line chart is used to understand the trend of the data.

Minitab — Labeled screens of Line Chart

From the image above, each screen (labeled alphabetically) signifies:

A) I took 15 random observations to plot a line chart and chose the time series option under the “Graphs” tab.

B) Since my data has a single variable, I chose a simple chart.

C) Selected the column where I stored my observations. There are also other options that I can opt for, like naming the legends, deciding the range, and others. I sincerely suggest you explore all these options; the “help” option takes you to the manual page for each of them.

D) The resultant chart.
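When I say a line chart shows the “trend,” one rough way to put a number on it is the slope of a least-squares line through the time-ordered points. The sketch below is my own illustration with a hypothetical helper `trend_slope` and made-up observations; it assumes the observations are equally spaced in time.

```python
def trend_slope(observations):
    """Least-squares slope of observations against time steps 1, 2, ..., n."""
    n = len(observations)
    times = range(1, n + 1)
    mean_t = sum(times) / n
    mean_y = sum(observations) / n
    numerator = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, observations))
    denominator = sum((t - mean_t) ** 2 for t in times)
    return numerator / denominator

# Hypothetical time-ordered observations that drift upward.
observations = [3, 5, 4, 6, 8, 7, 9, 10]
slope = trend_slope(observations)
print(f"Average change per time step: {slope:.2f}")
```

A positive slope suggests an upward trend, a negative slope a downward one, and a slope near zero suggests no clear trend.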

Bar Graph:

A bar graph is used to compare two or more categories in the data. As a general guideline, a bar chart is used when the data collected is discrete. A bar chart can also be used to understand the trend of the data.

Minitab — labeled screens of Bar graph

From the image above, each screen (labeled alphabetically) signifies:

A) I took 15 random categorical observations (each one a, b, or c) to plot a bar graph and chose the bar chart option under the “Graphs” tab.

B) Since my data has a single variable, I chose a simple chart. There are two other options under “Bars represent” that we could use, depending on our objective.

C) Selected the column where I stored my observations. There are also other options that I can opt for, like naming the legends, deciding the scales, and others. I sincerely suggest you explore all these options; the “help” option directs you to the manual page for each of them.

D) The resultant chart.
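For a simple bar chart of one categorical column, each bar height is just the number of times a category appears. Here is a minimal Python sketch of that counting step; the 15 observations are made up for illustration and are not the ones in my screenshot.

```python
from collections import Counter

# Hypothetical column of 15 categorical observations (a, b, or c).
observations = list("abcabcaabbcacba")

# Count how often each category occurs; these counts are the bar heights.
counts = Counter(observations)

# Print a rough text version of the bar chart.
for category in sorted(counts):
    print(f"{category} | {'#' * counts[category]} ({counts[category]})")
```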

Histogram:

The histogram is one of the most important and widely used charts in data science. It is used to understand the distribution of the data. As a general guideline, a histogram is used when the data collected is continuous.

Minitab — labeled screens of Histogram

From the image above, each screen (labeled alphabetically) signifies:

A) I took 100 random observations to plot a histogram and chose the Histogram option under the “Graphs” tab.

B) Since my data has a single variable, I chose a simple chart. There are other options, like “with fit,” which overlays a fitted normal curve on the histogram.

C) Selected the column where I stored my observations. There are also other options that I can opt for, like naming the legends, deciding the scales, and others. I sincerely suggest you explore all these options; the “help” option directs you to the manual page for each of them.

D) The resultant chart, which shows the shape of the data's distribution.
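Under the hood, a histogram splits the range of a continuous variable into bins and counts how many observations fall in each one. The sketch below shows that idea with equal-width bins; the helper `histogram_counts`, the number of bins, and the random data are my own assumptions, and Minitab chooses its bins with its own rules.

```python
import random

def histogram_counts(values, bin_count):
    """Count values in equal-width bins spanning min(values) to max(values).

    Assumes the values are not all identical (otherwise the bin width is zero).
    """
    low, high = min(values), max(values)
    width = (high - low) / bin_count
    counts = [0] * bin_count
    for value in values:
        # The maximum value would land one past the last bin, so clamp it.
        index = min(int((value - low) / width), bin_count - 1)
        counts[index] += 1
    return counts

random.seed(0)  # fixed seed so the sketch is reproducible
observations = [random.gauss(0, 1) for _ in range(100)]  # hypothetical data

bin_counts = histogram_counts(observations, 8)
for i, count in enumerate(bin_counts):
    print(f"bin {i}: {'#' * count}")
```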

Scatter Plot:

A scatter plot helps us find the relationship between the variables we plot. It is used extensively to check the correlation or covariance between different variables in the given data.

Minitab — labeled screens of Scatter plot

From the image above, each screen (labeled alphabetically) signifies:

A) I took 100 random observations under X and Y variables to plot a Scatter plot, which would help me find the relationship between these variables. To perform this, I chose the option of Scatterplot under the “Graphs” tab.

B) Since my data is of two variables, I chose a simple chart.

C) Selected the respective columns where I stored my variable observations. There are also other options that I can opt for, like naming the legends, deciding the scales, and others. I sincerely suggest you explore all these options; the “help” option directs you to the manual page for each of them.

D) With the resultant plot, I can see some kind of negative relationship between the variables X and Y.
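The “negative relationship” I read off the plot can also be quantified with the Pearson correlation coefficient, which runs from -1 (perfect negative) to +1 (perfect positive). Below is a minimal Python sketch; the helper `pearson_r` and the generated X and Y values are hypothetical and are not the data from my screenshot.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    squares_x = sum((x - mean_x) ** 2 for x in xs)
    squares_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance_sum / math.sqrt(squares_x * squares_y)

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical data: Y tends to fall as X rises, plus some random noise.
xs = [random.uniform(0, 10) for _ in range(100)]
ys = [20 - 2 * x + random.gauss(0, 2) for x in xs]

r = pearson_r(xs, ys)
print(f"Pearson r: {r:.2f}")
```

A value close to -1 matches what a downward-sloping cloud of points looks like in the scatter plot.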

This ends the Data Visualization charts topic. In my next article, I’ll be writing about the properties of data like central tendency, shape, and spread of data. Thanks for your support. Please let me know if I have misrepresented any information. See you in my next article!

Disclaimer

I am not affiliated with any of the services mentioned in this article. Additionally, I do not claim to be an expert. If you believe that I have overlooked important details or omitted crucial steps, please feel free to point them out in the comments section or contact me directly. I welcome constructive feedback and suggestions for improvement.
