Pandas and SQL Similarity 3

 


This is the third post in the series on the similarity between Pandas and SQL commands. Links to the first and second posts:

Pandas and SQL

Pandas and SQL similarity 2

 

Finding unique values from the datasets

Pandas

We can find the unique values of a column and their counts with the command below. This is often needed for in-depth data analysis.

Pandas Unique Value
SQL

 

SQL Unique/Distinct Values
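
Since the screenshots are not reproduced here, the following is a minimal sketch of what these commands might look like. The sample rows are made up, the table name orders is hypothetical, and an in-memory SQLite database stands in for the SQL Server instance used in this series.

```python
import sqlite3

import pandas as pd

# Made-up sample rows; only the "Ship Mode" column name comes from this series.
df = pd.DataFrame({
    "Row ID": [1, 2, 3, 4, 5],
    "Ship Mode": ["Regular Air", "Express Air", "Regular Air",
                  "Delivery Truck", "Regular Air"],
})

# pandas: distinct values, number of distinct values, and count per value
unique_modes = df["Ship Mode"].unique()
n_unique = df["Ship Mode"].nunique()
mode_counts = df["Ship Mode"].value_counts()

# SQL: the same questions, asked of a hypothetical "orders" table in SQLite
conn = sqlite3.connect(":memory:")
df.to_sql("orders", conn, index=False)
distinct_rows = conn.execute(
    'SELECT DISTINCT "Ship Mode" FROM orders'
).fetchall()
count_rows = conn.execute(
    'SELECT "Ship Mode", COUNT(*) FROM orders GROUP BY "Ship Mode"'
).fetchall()
conn.close()

print(mode_counts)
print(count_rows)
```

Here value_counts() plays the role of GROUP BY with COUNT(*), while unique() corresponds to SELECT DISTINCT.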

Adding column to existing datasets

Sometimes we have to add a new column to complete the analysis. The procedures for adding a new column in Pandas and SQL are

Pandas –

This command adds an empty column with the object data type to the existing data frame.

Pandas Add column
SQL
SQL Add column
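
As a rough illustration of adding a column on both sides, here is a small sketch. The column name Remarks and the table name orders are hypothetical, the rows are made up, and SQLite is used as a stand-in for SQL Server (which writes ALTER TABLE ... ADD without the COLUMN keyword).

```python
import sqlite3

import pandas as pd

# Made-up sample rows for illustration.
df = pd.DataFrame({
    "Row ID": [1, 2, 3],
    "Ship Mode": ["Regular Air", "Express Air", "Regular Air"],
})

# pandas: assigning None creates a new column that is empty in every row.
# "Remarks" is a hypothetical column name.
df["Remarks"] = None

# SQL: ALTER TABLE adds a new, nullable column to an existing table.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE orders ("Row ID" INTEGER, "Ship Mode" TEXT)')
conn.execute("ALTER TABLE orders ADD COLUMN Remarks TEXT")
# PRAGMA table_info returns one row per column; index 1 is the column name.
sql_columns = [row[1] for row in conn.execute("PRAGMA table_info(orders)")]
conn.close()

print(list(df.columns))
print(sql_columns)
```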

Finding the mean, sum, min and max values of a column

Summary statistics play an important role in the data analysis that goes into developing a machine learning model. The commands to find them are

Pandas
Pandas Mean,Max Value
SQL
SQL Mean,Max
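
The aggregate commands can be sketched as follows. The Sales column, its values, and the orders table are assumptions made for this example, and SQLite again stands in for SQL Server.

```python
import sqlite3

import pandas as pd

# "Sales" is a hypothetical numeric column with made-up values.
df = pd.DataFrame({"Sales": [100.0, 250.0, 50.0, 400.0]})

# pandas: compute several statistics of one column in a single call
stats = df["Sales"].agg(["mean", "sum", "min", "max"])

# SQL: the matching aggregate functions are AVG, SUM, MIN and MAX
conn = sqlite3.connect(":memory:")
df.to_sql("orders", conn, index=False)
sql_stats = conn.execute(
    "SELECT AVG(Sales), SUM(Sales), MIN(Sales), MAX(Sales) FROM orders"
).fetchone()
conn.close()

print(stats)
print(sql_stats)
```

Note that the pandas name for the average is mean, while SQL calls it AVG.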

Drop the column/s

During data analysis we often have to drop one or more columns that are irrelevant to developing machine learning models. The commands in Pandas and SQL are

Pandas
Pandas drop column
SQL
SQL Drop column
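
A minimal sketch of dropping columns, assuming hypothetical Remarks and Customer Name columns and a hypothetical orders table. The SQL statements are written in SQL Server syntax and shown as plain strings only; they are not executed here.

```python
import pandas as pd

# Made-up sample rows; "Remarks" and "Customer Name" are hypothetical columns.
df = pd.DataFrame({
    "Row ID": [1, 2],
    "Ship Mode": ["Regular Air", "Express Air"],
    "Remarks": [None, None],
    "Customer Name": ["A", "B"],
})

# pandas: drop() returns a new data frame and leaves df itself unchanged
one_dropped = df.drop(columns=["Remarks"])
two_dropped = df.drop(columns=["Remarks", "Customer Name"])

# SQL Server equivalents (strings only, not run in this sketch)
drop_one_sql = "ALTER TABLE orders DROP COLUMN Remarks;"
drop_two_sql = "ALTER TABLE orders DROP COLUMN Remarks, [Customer Name];"

print(list(one_dropped.columns))
print(list(two_dropped.columns))
```

Unlike the pandas call, the SQL statement changes the table permanently, so it is worth double-checking the column list before running it.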

I hope you enjoyed this third article on the similarity between Pandas and SQL commands. Stay tuned for the final article. You can find the same article on Medium.com here.

 

 

Pandas and SQL similarity 2

This article is the next in the series on Pandas and SQL similarity, which we discussed earlier.


Check the dtypes

Pandas –

Data types in pandas

To check the data type of each column in the data frame, use the ‘df.dtypes’ command
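
For readers without the screenshot, here is a small sketch of df.dtypes on a made-up data frame. The column names appear elsewhere in this series, but the values are invented for illustration.

```python
import pandas as pd

# Made-up sample rows for illustration.
df = pd.DataFrame({
    "Row ID": [1, 2, 3],
    "Ship Mode": ["Regular Air", "Express Air", "Regular Air"],
    "Product Base Margin": [0.38, None, 0.56],
})

# dtypes is an attribute (no parentheses) holding one data type per column
column_types = df.dtypes
print(column_types)
```

Whole numbers come back as an integer type, the column with a missing value as a float type, and text as a string or object type depending on the pandas version.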

SQL –

Type sp_help table_name in the query editor. You may be wondering what ‘nvarchar’ is.

The key difference between varchar and nvarchar is how they are stored: varchar is stored as regular 8-bit data (1 byte per character), while nvarchar stores Unicode data at 2 bytes per character. For this reason, nvarchar(n) can hold up to 4,000 characters, compared with 8,000 for varchar(n), and it takes twice the space of varchar for the same text.
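
The storage difference can be illustrated with a rough analogy in Python. This is not how SQL Server is queried; it only assumes that a single-byte encoding (latin-1 here) behaves like varchar storage and that UTF-16 behaves like nvarchar storage.

```python
text = "pandas"

# Analogy for varchar: a single-byte encoding uses 1 byte per character
one_byte_per_char = text.encode("latin-1")

# Analogy for nvarchar: UTF-16 uses 2 bytes for each of these characters
two_bytes_per_char = text.encode("utf-16-le")

print(len(text), len(one_byte_per_char), len(two_bytes_per_char))
```

The same six characters take 6 bytes in the first encoding and 12 in the second, which is the doubling described above.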

You can visit for more information on varchar and nvarchar.

Data types in SQL
Check the data

Pandas –

Once you load the data into pandas, you can view it by running the ‘df.head()’ command. By default, this command returns the first 5 rows with all columns. The rows appear in their original order, not sorted.

pandas head function to get the first n rows.

You can specify the number of rows you want to see e.g. 10

pandas head function to get first 10 rows

To view the last rows of the data frame, use the ‘df.tail()’ command

pandas tail function to get last n rows
pandas tail function to get last 10 rows
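
The head and tail commands above can be tried on a small made-up data frame. The 20 rows below are invented purely so that the slices are easy to check.

```python
import pandas as pd

# 20 made-up rows so the first and last rows are easy to recognise
df = pd.DataFrame({"Row ID": range(1, 21)})

first_five = df.head()     # default: first 5 rows
first_ten = df.head(10)    # first 10 rows
last_five = df.tail()      # default: last 5 rows
last_ten = df.tail(10)     # last 10 rows

print(first_five)
print(last_five)
```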

We can sort the data frame by a column in ascending order

pandas sort function to sort values by column in ascending order

or in descending order

pandas sort function to sort values by column in descending order

SQL –

In SQL, we use the ‘ORDER BY’ clause to sort the data.

SQL sorting by ‘ORDER BY’ clause in ascending order
SQL sorting by ‘ORDER BY’ clause in descending order
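
Sorting on both sides can be sketched like this. The Sales column, its values, and the orders table are assumptions for the example, and SQLite is used in place of SQL Server.

```python
import sqlite3

import pandas as pd

# Made-up sample rows; "Sales" is a hypothetical column.
df = pd.DataFrame({
    "Row ID": [1, 2, 3, 4],
    "Sales": [250.0, 50.0, 400.0, 100.0],
})

# pandas: sort_values sorts ascending by default
ascending = df.sort_values(by="Sales")
descending = df.sort_values(by="Sales", ascending=False)

# SQL: ORDER BY ... ASC / DESC on a hypothetical "orders" table
conn = sqlite3.connect(":memory:")
df.to_sql("orders", conn, index=False)
sql_asc = conn.execute(
    'SELECT "Row ID" FROM orders ORDER BY Sales ASC'
).fetchall()
sql_desc = conn.execute(
    'SELECT "Row ID" FROM orders ORDER BY Sales DESC'
).fetchall()
conn.close()

print(ascending)
print(sql_desc)
```

In both tools ascending is the default, so ASC and ascending=True can be left out.
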
Check the NULL Value

Most data sets have NULL or missing values in one or more columns. The command to count them is

Pandas –

df.isnull().sum()

The ‘Product Base Margin’ column has 72 missing values.

pandas isnull function

SQL –

In SQL, to find NULL values we have to run the query below for each column.

SELECT * FROM table WHERE columnname IS NULL

SQL IS NULL query
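
Here is a small sketch of the NULL check on both sides. The rows are made up (so the count is 2 rather than 72), the orders table is hypothetical, and SQLite stands in for SQL Server.

```python
import sqlite3

import pandas as pd

# Made-up sample rows with two missing margins.
df = pd.DataFrame({
    "Row ID": [1, 2, 3, 4],
    "Product Base Margin": [0.38, None, 0.56, None],
})

# pandas: number of missing values in every column at once
missing_per_column = df.isnull().sum()

# SQL: IS NULL has to be written per column; missing values become NULL
conn = sqlite3.connect(":memory:")
df.to_sql("orders", conn, index=False)
null_rows = conn.execute(
    'SELECT * FROM orders WHERE "Product Base Margin" IS NULL'
).fetchall()
null_count = conn.execute(
    'SELECT COUNT(*) FROM orders WHERE "Product Base Margin" IS NULL'
).fetchone()[0]
conn.close()

print(missing_per_column)
print(null_count)
```
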
Select the columns

What if we want to see the specific columns, how do we select them?

Pandas –

Note – Pandas will not display all rows in the output by default. To see all rows, run the command pd.set_option('display.max_rows', None)

Pandas select multiple column

SQL –

SELECT [Row ID], [Order Priority] FROM table_name

SQL select multiple column from the table
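
Selecting specific columns can be sketched as follows. The rows are made up, the orders table is hypothetical, and SQLite is used as a stand-in (it also accepts the square-bracket column names used in SQL Server).

```python
import sqlite3

import pandas as pd

# Made-up sample rows for illustration.
df = pd.DataFrame({
    "Row ID": [1, 2, 3],
    "Order Priority": ["High", "Low", "Medium"],
    "Ship Mode": ["Regular Air", "Express Air", "Regular Air"],
})

# pandas: pass a list of column names inside the indexing brackets
subset = df[["Row ID", "Order Priority"]]

# SQL: list the wanted columns after SELECT
conn = sqlite3.connect(":memory:")
df.to_sql("orders", conn, index=False)
sql_subset = conn.execute(
    "SELECT [Row ID], [Order Priority] FROM orders"
).fetchall()
conn.close()

print(subset)
print(sql_subset)
```
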
Filter the data

Every data scientist or data analyst has to filter data to gain insight. How do we do that in both pandas and SQL?

Pandas –

Let's say we need details of all the orders shipped through ‘Regular Air’ only

pandas filter function

and sent to California state.

pandas advanced filter function

SQL –

SELECT * FROM table WHERE [Ship Mode] = 'Regular Air'

SQL filter command
SQL advanced filter command
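
Both filters can be sketched on a made-up data frame. The column name State is an assumption (the real dataset may name it differently), the orders table is hypothetical, and SQLite stands in for SQL Server.

```python
import sqlite3

import pandas as pd

# Made-up sample rows; "State" is a hypothetical column name.
df = pd.DataFrame({
    "Row ID": [1, 2, 3, 4],
    "Ship Mode": ["Regular Air", "Express Air", "Regular Air", "Regular Air"],
    "State": ["California", "California", "Texas", "California"],
})

# pandas: a boolean mask keeps only the rows where the condition holds
regular_air = df[df["Ship Mode"] == "Regular Air"]

# pandas: combine conditions with & and wrap each one in parentheses
regular_air_ca = df[
    (df["Ship Mode"] == "Regular Air") & (df["State"] == "California")
]

# SQL: the same two conditions joined with AND in the WHERE clause
conn = sqlite3.connect(":memory:")
df.to_sql("orders", conn, index=False)
sql_rows = conn.execute(
    "SELECT [Row ID] FROM orders "
    "WHERE [Ship Mode] = 'Regular Air' AND State = 'California'"
).fetchall()
conn.close()

print(regular_air_ca)
print(sql_rows)
```

The parentheses around each pandas condition matter, because & binds more tightly than == in Python.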

I hope you liked part 2 of the series on pandas and SQL command similarity. Stay tuned for the next update.