top of page
Search
Writer's pictureYuchen Liu

SQL vs. Python, Which Is Your Favorite?


Last week, I had a dinner break with my colleagues at Le District, New York. I asked one of them, "do you like coding?" His answer was, "I don't like it, but I think sometimes it's useful, so I practice." I feel coding is like driving - you probably don't like it, but it's useful, and sometimes you will find a lot of fun solving a puzzle in different ways. In one of my recent daily coding challenges, there was one problem that made me think about how I would choose programming languages and methods of solving it, and I found some pros and cons.


The problem that I was trying to solve is to swap the seat id of every two consecutive students. If the number of students is odd, the id of the last student won't change. The schema looks like this:

  • Id is the primary key column for this table.

  • Each row contains the name and the id of a student.

  • Id is a continuous increment.The sample table looks like this:

id

name

1

Mary

2

Jones

3

Debby

4

Shrio

5

Lucy

The result should be like this: Mary and Jones sit in positions 1 and 2, so they swap the seats; Debby and Shiro sit in positions 3 and 4, so they exchange the seats; Lucy sits in position 5, which is odd, so she doesn't move.

id

name

1

Jones

2

Mary

3

Shrio

4

Debby

5

Lucy

The solution is simple (in your mind), but how would we code the logic? If we're using MySQL, a natural way would be to use 'case when' clause to check:

  1. If the id is even, then id-1

  2. If the id is odd and less than the total count of id, then id+1

  3. If the id is odd and the last, then it remains the same

select case when id%2=0 then id-1
            when id%2=1 and id<(select count(*)from seat) then id+1
            when id=(select count(*) from table) then id 
            end as id,
        student
        from table
 order by id

When I write SQL query, I prefer to use "horizontal thinking" by applying its robust "join" function to expand the existing table with itself or other tables by specific criteria. But with Python, I will think more vertically to prioritize the "if" clause or loop function. The beauty of the SQL "join" function is that it has simplified the "if...else..." logic by the join key. Let me show you what I mean:

select t1.id as id, 
       ifnull(t2.student,t1.student) as student
from table t1
left join table t2
on (t1.id=t2.id-1 and t1.id%2=1) or 
   (t1.id=t2.id+1 and t1.id%2=0)

Beautiful! The logic is the same as the first solution, but the query shows how we think outside the box with the "self join".


Now let's talk about Python. As a data analyst, I think the beauty of this language is that you can generalize the solution by writing functions and reuse them hundreds of times. Thanks to Pandas and NumPy giving us so many dynamic ways to process and analyze data, we can try this solution by using "where" function in NumPy:

#Method 1 -----
df = pd.read_csv("~/Desktop/Sample Data.csv")
df['temp']=df.id%2
def my_function(var1, var2):
    if df['id'].iloc[-1]%2==1:
        df[var1]=np.where(df[var2]==1,df[var1]+1,df[var1]-1)
        df['id'].iloc[-1] = df['id'].iloc[-1]-1
    else:
        df[var1]=np.where(df[var2]==1,df[var1]+1,df[var1]-1)
    return df
df=my_function('id','temp')
df=df.sort_values(by='id',ascending=True)
df=df.drop(columns='temp')
print(df)

To make it slower but clear, we can also use "if...else..." and lambda:

#Method 2 ----
df = pd.read_csv("~/Desktop/Sample Data.csv")
df['temp']=df.id%2

def my_fun_2(x, var1, var2):
    if  df['id'].iloc[-1]%2==1:
        if  x[var2]==1:
            x[var1]=x[var1]+1
        else:
            x[var1]=x[var1]-1
        
    else:
        if  x[var2]==1:
            x[var1]=x[var1]+1
        else:
            x[var1]=x[var1]-1
    return x
df_2 = df.apply(lambda x: my_fun_2(x, 'id', 'temp'), axis=1)
print (df_2)

if  df['id'].iloc[-1]%2==1:
    df_2['id'].iloc[-1] = df_2['id'].iloc[-1]-1
print(df_2)

Thinking of different ways to solve one problem and generalize the solutions can help us build a "solution bank" in our minds to search for the templates when we see similar problems quickly. In this blog, I showed one of the reasons that SQL has been 50 years old (developed in the 1970s) but is still very popular today - simple and powerful. Python, not to mention the dynamic, enables every data analyst or data scientist to rethink how they process data and make predictions.



41 views0 comments

Recent Posts

See All

JSON - The Best Pal of Python

It's been a while since the last time I visited my favorite restaurants in New York. Hmm...I really miss them. I'm always wondering how...

Stay Home and Detect Fake News With Python

Stay home is not a bad thing, watching the news every day reminds me of a project during my graduate study at NYU, which was an...

Comments


bottom of page