Python/Pandas
[pandas] merge로 데이터 프레임 합치기
코딩하고분석하는돌스
2021. 4. 7. 11:07
Pandas - merge를 이용한 데이터 프레임 통합¶
In [5]:
import os
import pandas as pd
os.chdir(r"D:/bigdata/jupyter/data_preprocessing/1. 데이터 핸들링\데이터/")
merge 기초
In [6]:
df1 = pd.read_csv("merge_실습_employee_group.csv", engine="python")
df2 = pd.read_csv("merge_실습_employee_hire_date.csv", engine="python")
In [7]:
df1
Out[7]:
employee | group | |
---|---|---|
0 | Bob | Accounting |
1 | Jake | Engineering |
2 | Lisa | Engineering |
3 | Sue | HR |
In [8]:
df2
Out[8]:
employee | hire_date | |
---|---|---|
0 | Lisa | 2004 |
1 | Bob | 2008 |
2 | Jake | 2012 |
3 | Sue | 2014 |
In [11]:
# 합칠 컬럼은 자동으로
merged_df = pd.merge(df1, df2)
merged_df.head()
Out[11]:
employee | group | hire_date | |
---|---|---|---|
0 | Bob | Accounting | 2008 |
1 | Jake | Engineering | 2012 |
2 | Lisa | Engineering | 2004 |
3 | Sue | HR | 2014 |
In [13]:
# on을 사용하여 merge = 특정 컬럼의 키 값을 기준으로 통합
merged_df2 = pd.merge(df1, df2, on=["employee"])
merged_df2.head()
Out[13]:
employee | group | hire_date | |
---|---|---|---|
0 | Bob | Accounting | 2008 |
1 | Jake | Engineering | 2012 |
2 | Lisa | Engineering | 2004 |
3 | Sue | HR | 2014 |
In [14]:
df1 = pd.read_csv("merge_실습_employee_group.csv", engine="python")
df2 = pd.read_csv("merge_실습_name_hire_date.csv", engine="python")
In [15]:
df1
Out[15]:
employee | group | |
---|---|---|
0 | Bob | Accounting |
1 | Jake | Engineering |
2 | Lisa | Engineering |
3 | Sue | HR |
In [16]:
df2
Out[16]:
hire_date | name | |
---|---|---|
0 | 2004 | Lisa |
1 | 2008 | Bob |
2 | 2012 | Jake |
3 | 2014 | Sue |
In [23]:
# df1을 기준으로 df2에 있는 데이터를 합침
merged_df3 = pd.merge(df1, df2, left_on="employee", right_on="name")
merged_df3.head()
Out[23]:
employee | group | hire_date | name | |
---|---|---|---|---|
0 | Bob | Accounting | 2008 | Bob |
1 | Jake | Engineering | 2012 | Jake |
2 | Lisa | Engineering | 2004 | Lisa |
3 | Sue | HR | 2014 | Sue |
In [24]:
merged_df3.drop('name', inplace=True, axis=1)
merged_df3
Out[24]:
employee | group | hire_date | |
---|---|---|---|
0 | Bob | Accounting | 2008 |
1 | Jake | Engineering | 2012 |
2 | Lisa | Engineering | 2004 |
3 | Sue | HR | 2014 |
index를 사용한 데이터 통합¶
In [25]:
df1 = pd.read_csv("merge_실습_employee_group.csv", engine="python")
df2 = pd.read_csv("merge_실습_name_hire_date.csv", engine="python", index_col="name")
In [26]:
df1
Out[26]:
employee | group | |
---|---|---|
0 | Bob | Accounting |
1 | Jake | Engineering |
2 | Lisa | Engineering |
3 | Sue | HR |
In [27]:
df2
Out[27]:
hire_date | |
---|---|
name | |
Lisa | 2004 |
Bob | 2008 |
Jake | 2012 |
Sue | 2014 |
In [28]:
merged_df4 = pd.merge(df1, df2, left_on ="employee", right_index=True)
merged_df4
Out[28]:
employee | group | hire_date | |
---|---|---|---|
0 | Bob | Accounting | 2008 |
1 | Jake | Engineering | 2012 |
2 | Lisa | Engineering | 2004 |
3 | Sue | HR | 2014 |