Summary of the steps for exploratory data analysis to understand hidden data


Chapter 1: Summary of the steps for exploratory data analysis to understand hidden data

A programming project typically goes through five essential steps:
- Gathering requirements
- Designing the system
- Developing the system
- Testing the system
- Installing the system

Once the system is complete, it starts generating data, and the shape of that data follows the design.

Data work is different. You are often handed the data up front, along with a question to answer. The first step is exploratory data analysis (EDA), which serves three purposes:
- To better understand the data: EDA is a preliminary exploration that looks at the data from several perspectives (descriptive, statistical, and relational). It gives you an overview of the data and shows how the variables relate to one another.
- To identify problems and errors in the data: EDA reveals missing, incorrect, or conflicting values, and finding them is the first step toward improving data quality.
- To discover new insights: EDA can surface patterns hidden in the data, such as relationships between variables that you did not expect, which may lead to better processes or decisions.

Let's take an example:

Imagine that a company wants to analyze its customer data to find ways to improve its products and services, and it hires you to do the analysis. What steps would you take?

Before the data is ready for further use, work through these steps:
- Check where the data is stored: a database such as PostgreSQL, MySQL, or MongoDB, or files such as Excel spreadsheets or CSV files.
- Check whether an entity-relationship (ER) diagram or other documentation describes how the tables relate.
- If neither is available, perform EDA to learn the characteristics of the data and the relationships within it.
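When no ER diagram exists, part of one can be recovered from the database's own catalog. The sketch below uses Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in for the company's system; the `customers` and `orders` tables and the `describe_schema` helper are hypothetical. SQLite exposes keys through `PRAGMA table_info` and `PRAGMA foreign_key_list`; in PostgreSQL or MySQL the same information lives in `information_schema` views such as `table_constraints` and `key_column_usage`. Only declared constraints show up this way, so relationships that were never declared still have to be inferred from column names and values.

```python
import sqlite3

# Hypothetical sample schema standing in for the company's database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT,
    city        TEXT
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    amount      REAL
);
""")

def describe_schema(conn):
    """Return {table: {"primary_key": [...], "foreign_keys": [(col, ref_table, ref_col)]}}."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        primary_key = [c[1] for c in cols if c[5] > 0]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        foreign_keys = [(fk[3], fk[2], fk[4])
                        for fk in conn.execute(f"PRAGMA foreign_key_list({table})")]
        schema[table] = {"primary_key": primary_key, "foreign_keys": foreign_keys}
    return schema

for table, info in describe_schema(conn).items():
    print(table, info)
```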

EDA usually repeats the same core checks:
- Count the rows in each table, however many tables there are.
- Check for null or blank values.
- Check each column's data type: numeric, character, date and time, or Boolean. Some columns hold semi-structured or encoded content, such as JSON documents or Base64 strings.
- Check the relationships between tables, such as which columns are primary keys and which are foreign keys.
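The first three checks can be scripted so they run the same way on every table. This is a minimal sketch, again with an in-memory SQLite database as a stand-in; the `customers` table, its sample rows, and the `profile_table` helper are hypothetical, and counting empty or space-only strings as missing is an assumption you may want to change. In SQLite the declared column type is not enforced, so it is worth inspecting sample values as well (for example with `LIMIT`).

```python
import sqlite3

# Hypothetical customers table with a few missing and blank values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT,
    city        TEXT,
    signup_date TEXT
);
INSERT INTO customers (name, city, signup_date) VALUES
    ('Ann',  'Bangkok', '2023-01-05'),
    ('Bob',  NULL,      '2023-02-11'),
    ('Chai', '',        NULL),
    (NULL,   'Phuket',  '2023-03-20');
""")

def profile_table(conn, table):
    """Return the row count and, per column, its declared type and missing count."""
    # Names are interpolated with f-strings: acceptable only for trusted
    # identifiers (e.g. read from the catalog), never for user input.
    row_count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    columns = {}
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    for col in conn.execute(f"PRAGMA table_info({table})").fetchall():
        name, declared_type = col[1], col[2]
        # Treat NULL and empty or space-only strings as missing.
        missing = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {name} IS NULL OR TRIM({name}) = ''"
        ).fetchone()[0]
        columns[name] = {"type": declared_type, "missing": missing}
    return row_count, columns

row_count, columns = profile_table(conn, "customers")
print(row_count)  # 4
for name, info in columns.items():
    print(name, info)
```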

You can run all of these checks directly with SQL, using commands such as COUNT, INNER JOIN, IS NULL, COALESCE, GROUP BY, LIMIT, and ORDER BY.
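As an illustration of how these commands combine, the sketch below counts customers with no city (`IS NULL`), then joins two hypothetical tables and summarizes orders per city. `COALESCE` replaces a missing city with the label `'unknown'` so those rows appear under a readable name rather than as a NULL group, and `LIMIT` keeps the output short on a large table. The schema and data are invented for the example, and SQLite stands in for the company's database.

```python
import sqlite3

# Hypothetical customers/orders tables standing in for the company's data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    amount      REAL
);
INSERT INTO customers VALUES (1, 'Ann', 'Bangkok'), (2, 'Bob', NULL), (3, 'Chai', 'Phuket');
INSERT INTO orders VALUES
    (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0), (13, 3, 40.0), (14, 2, 60.0);
""")

# IS NULL: how many customers have no city?
missing_city = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE city IS NULL"
).fetchone()[0]

# INNER JOIN + COALESCE + GROUP BY + ORDER BY + LIMIT: orders and revenue per city.
top_cities = conn.execute("""
    SELECT COALESCE(c.city, 'unknown') AS city_name,  -- label missing cities
           COUNT(*)                    AS n_orders,
           SUM(o.amount)               AS revenue
    FROM orders AS o
    INNER JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY COALESCE(c.city, 'unknown')
    ORDER BY n_orders DESC, city_name
    LIMIT 10
""").fetchall()

print(missing_city)  # 1
print(top_cities)    # [('Bangkok', 2, 350.0), ('unknown', 2, 135.0), ('Phuket', 1, 40.0)]
```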

In conclusion, whether you are working on a programming project or a data analysis project, you will always need to know EDA.

To be continued. If you don't want to miss the next post, please share and follow my profile.


Chapter 3: Summary of the steps for exploratory data analysis to understand hidden text data

The previous post summarized the concepts and steps of exploratory data analysis (EDA) for searching and working with databases. It showed how to start from scratch: checking the data, null values, and relationships between tables, and which SQL commands to use.

In this post, we focus on text data: data expressed as words or phrases. It can describe a wide variety of things, such as products, services, or people.

To work with text data in SQL, we need to understand how it can be stored. SQL has three main character data types:

Character: a fixed-length type. A CHAR(20) column always holds exactly 20 characters; if a shorter value is entered, the database pads it with spaces to fill the remaining length.

Character varying: a variable-length type with a maximum length. A VARCHAR(50) column stores up to 50 characters. What happens to a longer value depends on the database: PostgreSQL raises an error (unless the excess characters are all spaces, in which case the value is truncated), while MySQL in non-strict SQL mode truncates the value to the first 50 characters and issues a warning.

Text: a variable-length type with no declared maximum. In PostgreSQL, a TEXT column can hold text of any length up to about 1 GB per value; other systems differ (MySQL's TEXT, for example, holds up to 65,535 bytes).

Besides the data type, we also need to consider how the text is structured. There are two main kinds:

Categorical: the values come from a fixed set of categories. For example, a VARCHAR(20) column that stores the names of the days of the week holds categorical data.

Unstructured: free-form text that does not fall into categories. For example, a TEXT column that stores the full text of a book holds unstructured data.
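One rough way to tell the two apart is the ratio of distinct values to non-NULL values: categorical columns repeat a few values, while free text is almost always unique. The sketch below applies that heuristic; the `reviews` table, the `classify_text_column` helper, and the 0.5 threshold are hypothetical choices for illustration, not a standard. It misfires on small samples and on unique identifiers such as email addresses, so treat its label as a prompt to look at the data rather than a verdict.

```python
import sqlite3

# Hypothetical reviews table: one categorical column, one free-text column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reviews (review_id INTEGER PRIMARY KEY, weekday TEXT, comment TEXT);
INSERT INTO reviews (weekday, comment) VALUES
    ('Monday',  'Fast delivery, well packed.'),
    ('Monday',  'The size runs small.'),
    ('Tuesday', 'Great value for the price.'),
    ('Monday',  'Arrived late but works fine.'),
    ('Tuesday', 'Customer service was helpful.'),
    ('Monday',  'Would buy again.');
""")

def classify_text_column(conn, table, column, max_ratio=0.5):
    """Guess 'categorical' vs 'unstructured' from the distinct-value ratio.

    ratio = distinct non-NULL values / non-NULL values. The 0.5 threshold
    is an illustrative assumption. Average length is returned as a hint:
    long values usually indicate free text.
    """
    n, n_distinct, avg_len = conn.execute(
        f"SELECT COUNT({column}), COUNT(DISTINCT {column}), AVG(LENGTH({column})) "
        f"FROM {table}"
    ).fetchone()
    ratio = n_distinct / n if n else 0.0
    kind = "categorical" if ratio <= max_ratio else "unstructured"
    return {"distinct_ratio": ratio, "avg_length": avg_len, "kind": kind}

for col in ("weekday", "comment"):
    print(col, classify_text_column(conn, "reviews", col))
```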

Once we understand the different types and structures of text data, we can start to explore the data. Here are some common ways to explore text data using SQL:

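Checking for null values: This matters for any type of data, and especially for text data, because a missing text value may be stored as NULL, as an empty string (''), or as a string of spaces, and IS NULL only catches the first case. Null values can indicate that a piece of data is missing or invalid.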

Grouping data: This shows how the data is distributed. For example, we could group text data by category to see how many rows fall into each category.

Counting data: This can be used to see how much data there is. For example, we could count the number of rows in a table to see how many pieces of data there are.

Counting unique data: This can be used to see how many different pieces of data there are. For example, we could use the DISTINCT keyword to count the number of unique days of the week in a column.
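These checks behave differently on text than on numbers, because text is often messy: the same category may appear with different capitalization or trailing spaces, and missing values may hide as empty strings. The sketch below runs each check against a small hypothetical `visits` table in an in-memory SQLite database. Normalizing with `LOWER(TRIM(...))` is one possible cleanup choice, not the only one; note that `TRIM` with one argument removes spaces only, not tabs or newlines.

```python
import sqlite3

# Hypothetical visits table; weekday deliberately mixes clean values,
# inconsistent spellings, NULL, and blank strings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visits (visit_id INTEGER PRIMARY KEY, weekday TEXT);
INSERT INTO visits (weekday) VALUES
    ('Monday'), ('monday'), ('Monday '), ('Tuesday'), ('Tuesday'),
    ('Friday'), (NULL), (''), ('   ');
""")

# 1. Missing values: IS NULL alone misses empty and space-only strings.
null_values, blank_values = conn.execute("""
    SELECT SUM(CASE WHEN weekday IS NULL THEN 1 ELSE 0 END),
           SUM(CASE WHEN TRIM(weekday) = '' THEN 1 ELSE 0 END)
    FROM visits
""").fetchone()

# 2. Grouping: normalize case and surrounding spaces so 'Monday', 'monday'
#    and 'Monday ' form one group; NULL and blank rows are filtered out.
groups = conn.execute("""
    SELECT LOWER(TRIM(weekday)) AS day, COUNT(*) AS n
    FROM visits
    WHERE TRIM(weekday) <> ''
    GROUP BY LOWER(TRIM(weekday))
    ORDER BY n DESC, day
""").fetchall()

# 3. Counting: COUNT(*) counts rows, COUNT(col) skips NULLs, and
#    COUNT(DISTINCT col) counts different non-NULL raw spellings.
total, non_null, distinct_raw = conn.execute(
    "SELECT COUNT(*), COUNT(weekday), COUNT(DISTINCT weekday) FROM visits"
).fetchone()

print(null_values, blank_values)     # 1 2
print(groups)                        # [('monday', 3), ('tuesday', 2), ('friday', 1)]
print(total, non_null, distinct_raw) # 9 8 7
```

The gap between the raw distinct count (7) and the three normalized groups is itself a useful EDA signal: it shows how much cleanup the column needs before any analysis.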

Examples of how to explore text data using SQL:

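```sql
-- Replace my_table, my_column, and category with your own names.
-- (TABLE and COLUMN are reserved words, so they cannot be used unquoted as names.)

-- Check for null values
SELECT COUNT(*)
FROM my_table
WHERE my_column IS NULL;

-- Group data by category, largest groups first
SELECT category, COUNT(*) AS n
FROM my_table
GROUP BY category
ORDER BY n DESC;

-- Count data
SELECT COUNT(*)
FROM my_table;

-- Count unique data (COUNT(DISTINCT ...) ignores NULLs)
SELECT COUNT(DISTINCT my_column)
FROM my_table;
```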

Conclusion

Exploring text data can help us to understand the data and to identify patterns and trends. By using the techniques described in this post, we can gain valuable insights into the data that we are working with.
