We can find the unique values of a column, and how often each one occurs, with the commands below. This is often needed for in-depth data analysis.
Pandas Unique Value
SQL
SQL Unique/Distinct Values
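As a rough illustration of what the screenshots above show, here is a minimal sketch. The sample rows are made up, and an in-memory SQLite table stands in for the SQL Server table used in this series, so treat the table name and data as assumptions.

```python
import sqlite3
import pandas as pd

# Hypothetical sample data; the column name mirrors the Orders sheet used in this series.
df = pd.DataFrame(
    {"Ship Mode": ["Regular Air", "Express Air", "Regular Air", "Delivery Truck"]}
)

# Distinct values, in order of first appearance.
unique_modes = df["Ship Mode"].unique()
# Number of rows for each distinct value.
mode_counts = df["Ship Mode"].value_counts()

# SQL side: SQLite is an assumption so that the sketch runs without a server.
conn = sqlite3.connect(":memory:")
df.to_sql("Orders", conn, index=False)
sql_counts = pd.read_sql_query(
    'SELECT "Ship Mode", COUNT(*) AS n FROM Orders GROUP BY "Ship Mode"', conn
)
conn.close()

print(unique_modes)
print(mode_counts)
print(sql_counts)
```

In SQL, SELECT DISTINCT returns only the distinct values, while GROUP BY with COUNT(*) also returns how often each value occurs.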
Adding column to existing datasets
Sometimes we have to add a new column to complete the analysis. The procedures for adding a new column in Pandas and SQL are as follows.
Pandas –
This command adds an empty column to the existing data frame with the data type object.
Pandas Add column
SQL
SQL Add column
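A minimal sketch of adding a column on both sides might look like this. The column name Remarks is hypothetical, and the SQL part runs against an in-memory SQLite table rather than the SQL Server database used in this series.

```python
import sqlite3
import pandas as pd

# Hypothetical two-row frame standing in for the Orders data.
df = pd.DataFrame({"Row ID": [1, 2]})

# Assigning None creates an empty column; pandas stores it with dtype object.
df["Remarks"] = None
print(df.dtypes)

# SQL side: SQLite is an assumption so that the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Orders ("Row ID" INTEGER)')
# On SQL Server the column type would typically be something like NVARCHAR(50).
conn.execute("ALTER TABLE Orders ADD Remarks TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(Orders)")]
conn.close()
print(columns)
```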
Finding Mean, Sum, Min and Max values of a column
Statistical values play an important role in the data analysis that goes into developing a machine learning model. The commands to find them are
Pandas
Pandas Mean,Max Value
SQL
SQL Mean,Max
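The sketch below shows one way these statistics could be computed. The Sales column and its values are made up for illustration, and SQLite stands in for SQL Server.

```python
import sqlite3
import pandas as pd

# Hypothetical numeric column.
df = pd.DataFrame({"Sales": [10.0, 20.0, 30.0, 40.0]})

# One call returns all four statistics as a Series.
stats = df["Sales"].agg(["mean", "sum", "min", "max"])
print(stats)

# SQL side: AVG, SUM, MIN and MAX are the matching aggregate functions.
conn = sqlite3.connect(":memory:")
df.to_sql("Orders", conn, index=False)
sql_stats = conn.execute(
    "SELECT AVG(Sales), SUM(Sales), MIN(Sales), MAX(Sales) FROM Orders"
).fetchone()
conn.close()
print(sql_stats)
```

Each statistic is also available on its own, for example df["Sales"].mean().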
Drop the column/s
During data analysis we often have to drop one or more columns that are irrelevant to the machine learning model. The commands in Pandas and SQL are
Pandas
Pandas drop column
SQL
SQL Drop column
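Here is a small sketch of dropping a column in pandas, assuming a hypothetical Remarks column. The SQL statement is only stored as a string for comparison and is not executed.

```python
import pandas as pd

# Hypothetical frame with one column we no longer need.
df = pd.DataFrame({"Row ID": [1, 2], "Remarks": [None, None]})

# drop() returns a new frame; the original df keeps its columns.
trimmed = df.drop(columns=["Remarks"])
print(list(trimmed.columns))

# SQL Server counterpart, shown for comparison only (not executed here).
drop_sql = "ALTER TABLE dbo.Orders DROP COLUMN Remarks;"
print(drop_sql)
```

Several columns can be dropped in one call by listing them all, for example df.drop(columns=["A", "B"]).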
Hope you enjoyed this third article on the similarity between Pandas and SQL commands. Stay tuned for the final article. You can find the same article on Medium.com here.
This article is the next in the series on Pandas and SQL similarity, which we discussed earlier.
Pandas and SQL similarity (2)
Check the dtypes
Pandas –
Data types in pandas
To check the data types of the data frame, run the ‘df.dtypes’ command.
SQL –
Type sp_help table_name in the query editor. You may be wondering what ‘nvarchar’ is.
The key difference between varchar and nvarchar is the way they are stored: varchar is stored as regular 8-bit data (1 byte per character), while nvarchar stores Unicode data at 2 bytes per character. For this reason, nvarchar can hold up to 4000 characters (half of the 8000 allowed for varchar) and takes double the space of varchar.
You can visit for more information on varchar and nvarchar.
Data type in SQL
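On the pandas side, checking the data types might look like the sketch below. The three columns and their values are hypothetical stand-ins for the Orders data.

```python
import pandas as pd

# Hypothetical sample with an integer, a text and a float column.
df = pd.DataFrame(
    {
        "Row ID": [1, 2, 3],
        "Order Priority": ["High", "Low", "Medium"],
        "Product Base Margin": [0.38, None, 0.56],
    }
)

# dtypes is a Series that maps each column name to its data type.
dtypes = df.dtypes
print(dtypes)
```

A missing value in a numeric column is stored as NaN, which is why the margin column is reported as float64.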
Check the data
Pandas –
Once you load the data into pandas, you can view it by running the ‘df.head()’ command. By default, this command returns the first 5 rows and all columns. The rows appear in the order they were loaded and are not sorted.
pandas head function to get the first n rows.
You can specify the number of rows you want to see, e.g. ‘df.head(10)’ for 10 rows.
pandas head function to get first 10 rows
To view the last rows of the data frame, run the ‘df.tail()’ command.
pandas tail function to get last n rows
pandas tail function to get last 10 rows
We can sort the data frame by a column in ascending order
pandas sort function to sort value by column
or in descending order
pandas sort function to sort value by column
SQL –
In SQL we have to use the ‘ORDER BY’ clause to view the data in sorted order.
SQL sorting by ‘ORDER BY’ Clause in Ascending order
SQL sorting by ‘ORDER BY’ Clause in Descending order
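The commands from this section can be sketched together as follows. The twelve sample rows and the Sales column are made up, and SQLite is used in place of SQL Server so the example can run anywhere.

```python
import sqlite3
import pandas as pd

# Hypothetical data: 12 rows, deliberately unsorted by Sales.
df = pd.DataFrame(
    {
        "Row ID": range(1, 13),
        "Sales": [50, 20, 90, 10, 70, 30, 80, 60, 40, 100, 110, 120],
    }
)

first_five = df.head()      # first 5 rows by default
first_ten = df.head(10)     # first 10 rows
last_five = df.tail()       # last 5 rows by default

ascending = df.sort_values("Sales")                     # smallest Sales first
descending = df.sort_values("Sales", ascending=False)   # largest Sales first

# SQL side: ORDER BY ... ASC (the default) or DESC.
conn = sqlite3.connect(":memory:")
df.to_sql("Orders", conn, index=False)
sql_desc = pd.read_sql_query("SELECT * FROM Orders ORDER BY Sales DESC", conn)
conn.close()

print(first_five)
print(descending.head())
print(sql_desc.head())
```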
Check the NULL Value
Almost every data set has some NULL or missing values, in one column or in several. The commands to find NULL values are
Pandas –
df.isnull().sum()
The ‘Product Base Margin’ column has 72 missing values.
pandas isnull function
SQL –
In SQL, to find NULL values we have to run the query below for each column.
SELECT * FROM table WHERE columnname IS NULL
SQL IS NULL query
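A small sketch of counting missing values on both sides is given below. The four sample rows are made up (the real data set has 72 missing margins), and SQLite stands in for SQL Server.

```python
import sqlite3
import pandas as pd

# Hypothetical sample with two missing margins.
df = pd.DataFrame(
    {
        "Row ID": [1, 2, 3, 4],
        "Product Base Margin": [0.38, None, 0.56, None],
    }
)

# Missing values per column, for every column at once.
null_counts = df.isnull().sum()
print(null_counts)

# SQL side: count the NULLs of one column; NaN is written to the table as NULL.
conn = sqlite3.connect(":memory:")
df.to_sql("Orders", conn, index=False)
sql_nulls = conn.execute(
    'SELECT COUNT(*) FROM Orders WHERE "Product Base Margin" IS NULL'
).fetchone()[0]
conn.close()
print(sql_nulls)
```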
Select the columns
What if we want to see only specific columns? How do we select them?
Pandas –
Note – Pandas does not display all rows in the output by default. To see all rows, run the command pd.set_option('display.max_rows', None)
Pandas select multiple column
SQL –
SELECT [Row ID], [Order Priority] FROM table_name
SQL select multiple column from the table
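In pandas, selecting the same two columns could look like this sketch. The sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample with three columns.
df = pd.DataFrame(
    {
        "Row ID": [1, 2],
        "Order Priority": ["High", "Low"],
        "Ship Mode": ["Regular Air", "Express Air"],
    }
)

# A list of column names inside the brackets returns a DataFrame with just those columns.
subset = df[["Row ID", "Order Priority"]]
print(subset)
```

Note the double brackets: the outer pair indexes the data frame, and the inner pair is the Python list of column names.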
Filter the data
Every data scientist or data analyst has to filter data to get insights. How do we do that in both pandas and SQL?
Pandas –
Let’s say we need details of all the orders shipped through ‘Regular Air’ only
pandas filter function
and sent to California state.
pandas advance filter function
SQL –
SELECT * FROM table WHERE [Ship Mode] = 'Regular Air'
SQL filter command
SQL advance filter command
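Both filters from this section can be sketched as below. The State column name and the sample rows are assumptions, since the actual column name is only visible in the screenshots. SQLite is used in place of SQL Server; it accepts the same square-bracket column names.

```python
import sqlite3
import pandas as pd

# Hypothetical sample; "State" is an assumed column name.
df = pd.DataFrame(
    {
        "Row ID": [1, 2, 3, 4],
        "Ship Mode": ["Regular Air", "Express Air", "Regular Air", "Regular Air"],
        "State": ["California", "California", "Texas", "California"],
    }
)

# Simple filter: one condition.
regular_air = df[df["Ship Mode"] == "Regular Air"]

# Advanced filter: combine conditions with & and wrap each one in parentheses.
regular_air_ca = df[(df["Ship Mode"] == "Regular Air") & (df["State"] == "California")]

# SQL side: conditions are combined with AND in the WHERE clause.
conn = sqlite3.connect(":memory:")
df.to_sql("Orders", conn, index=False)
sql_rows = pd.read_sql_query(
    "SELECT * FROM Orders WHERE [Ship Mode] = 'Regular Air' AND [State] = 'California'",
    conn,
)
conn.close()

print(regular_air)
print(regular_air_ca)
print(sql_rows)
```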
Hope you liked this second part on pandas and SQL command similarity. Stay tuned for the next update.
Pandas and SQL are integral parts of data science. Pandas is used for data manipulation and analysis, while SQL (Structured Query Language) is usually used to fetch data, update the contents of a table, or operate on the structure of a database or its tables, typically through a database tool that provides a user interface for these operations.
I believe Pandas and SQL are similar when it comes to analyzing datasets, but freshers like me find it difficult to remember the command syntax of both tools. This is a small effort from my end to show how similar the commands are.
Let’s Start..
Pandas Installation –
To work with both tools, we first need to install them. I am using the Anaconda distribution on my laptop, which comes with all the required libraries pre-installed. To download Anaconda, please go to their site here
Launch the Jupyter Notebook
Jupyter notebook
Load the pandas module with ‘import pandas as pd’.
Note – If you are not using Anaconda, the pandas installation procedure can be found here
SQL Installation –
I am using Microsoft SQL Server Express edition for demo purposes. It is available for download here, and the installation procedure is here.
Note – The only expectation here is that you should know pandas syntax.
Importing the data –
Pandas –
pandas data frame
I am using ‘read_excel’ because my source file is an Excel file, and I added the sheet_name parameter because I want to import only the first sheet from it.
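A minimal sketch of the import step, wrapped in a helper so it can be reused, might look like this. The function name load_orders and the file name in the comment are hypothetical; the actual file used for this series is only visible in the screenshot.

```python
import pandas as pd


def load_orders(path):
    """Read only the first sheet of an Excel workbook into a DataFrame."""
    # sheet_name=0 selects the first sheet by position;
    # passing the sheet's name as a string also works.
    return pd.read_excel(path, sheet_name=0)


# Example call (the file name is hypothetical):
# df = load_orders("orders.xlsx")
```

Reading .xlsx files also requires the openpyxl package, which comes with Anaconda.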
SQL –
You can import an entire Excel file into the SQL database with the help of the SQL Server Import and Export Wizard.
SQL Import Export
Choose ‘Microsoft Excel’ as the data source, as shown in the screenshot below
SQL Import Export select source
Choose the Excel file which you want to import and the correct Excel version
SQL Import Export destination
Select ‘SQL Server Native Client 11.0’ in the ‘Choose a Destination’ screen
SQL Import Export select destination SQL Server
Pick the correct server name, authentication type and database
You can select the first option to copy the entire data from the Excel sheet, or you can write a custom SQL query to import only the required data. I will go ahead and select the first option.
Here, select the sheet you want to import in the first column. A new table will be created in the database with the sheet name (second column).
Select sheet to import in SQL database
The wizard will import the data and create a new table in the database we specified.
SQL Table created after import
Let’s work on the data now –
How to view entire data
Pandas –
pandas data frame
‘df’ is nothing but a variable that stores the data. Pandas reads different types of data into a structure called a DataFrame.
You can type ‘print(df)’ or ‘df’ to view the entire data.
SQL –
Connect to SQL Express using SQL Server Management Studio
Click on ‘New Query’
Type the command below in the query field
SELECT * FROM dbo.Orders
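To put the two side by side, the sketch below views a whole table from pandas and from SQL. The three sample rows are made up, and an in-memory SQLite table named Orders stands in for dbo.Orders on SQL Server.

```python
import sqlite3
import pandas as pd

# Hypothetical sample standing in for the imported Orders sheet.
df = pd.DataFrame(
    {
        "Row ID": [1, 2, 3],
        "Ship Mode": ["Regular Air", "Express Air", "Delivery Truck"],
    }
)

# pandas: printing the variable shows the whole frame (long frames are shortened).
print(df)

# SQL: SELECT * returns every row and column of the table.
conn = sqlite3.connect(":memory:")
df.to_sql("Orders", conn, index=False)
all_rows = pd.read_sql_query("SELECT * FROM Orders", conn)
conn.close()
print(all_rows)
```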
Hope you enjoyed the article on pandas and SQL. Please visit again for the next set of commands.