How to group Pandas DataFrame entries by date in a non-unique column


optionbox 2020. 11. 10. 08:02

A Pandas DataFrame contains a column named "date" that holds non-unique datetime values. I can group the rows of this frame using:

data.groupby(data['date'])

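However, this splits the data by the individual datetime values. I would like to group the data by the year stored in the "date" column. This page shows how to group by year when the timestamps are used as the index, which is not the case here.

How do I achieve this grouping?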

I'm using pandas 0.16.2. This gives better performance on my large dataset:

data.groupby(data.date.dt.year)

Using the dt accessor also makes it much easier to play around with weekofyear, dayofweek, and so on.
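As a minimal sketch of grouping on other dt properties, the example below uses a small, hypothetical sample frame (the values are assumptions for illustration, not the asker's data). It groups by month and by day of week. On recent pandas versions the week number is, as far as I know, exposed through dt.isocalendar().week rather than dt.weekofyear, so it is left out here.

```python
import pandas as pd

# Hypothetical sample frame with a datetime column named "date".
data = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-01", "2012-06-01", "2015-01-01",
                            "2015-02-01", "2015-03-01"]),
    "a": [9, 5, 1, 2, 3],
})

# Group by calendar month (1-12), regardless of year.
by_month = data.groupby(data["date"].dt.month)["a"].sum()

# Group by day of week (Monday=0 ... Sunday=6).
by_weekday = data.groupby(data["date"].dt.dayofweek)["a"].sum()

print(by_month)
print(by_weekday)
```

Any integer-valued dt property can be used as a grouping key in the same way.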

ecatmur's solution works fine. However, this gives better performance on large datasets:

data.groupby(data['date'].map(lambda x: x.year))

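This may be easier to explain with a sample dataset.

Create sample data

Assume a single column of Timestamps, date, and another column, a, on which we want to perform an aggregation.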

import pandas as pd

df = pd.DataFrame({'date':pd.DatetimeIndex(['2012-1-1', '2012-6-1', '2015-1-1', '2015-2-1', '2015-3-1']),
                   'a':[9,5,1,2,3]}, columns=['date', 'a'])

df

        date  a
0 2012-01-01  9
1 2012-06-01  5
2 2015-01-01  1
3 2015-02-01  2
4 2015-03-01  3

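There are several ways to group by year.

Use the dt accessor with the year property
Put date in the index and use an anonymous function to access the year
Use the resample method
Convert to a pandas Period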

`.dt` accessor with `year` property

When you have a column (not an index) of pandas Timestamps, you can access many additional properties and methods through the dt accessor. For example:

df['date'].dt.year

0    2012
1    2012
2    2015
3    2015
4    2015
Name: date, dtype: int64

We can use this to form the groups and calculate some aggregations on a particular column:

df.groupby(df['date'].dt.year)['a'].agg(['sum', 'mean', 'max'])

      sum  mean  max
date                
2012   14     7    9
2015    6     2    3
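The result above is indexed by a key that is still called date even though it holds years. One possible way to tidy this up, sketched here under the assumption that the sample frame matches the one in this post, is to rename the key Series before grouping. The name "year" is an illustrative choice.

```python
import pandas as pd

# Same sample data as in this post.
df = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-01", "2012-06-01", "2015-01-01",
                            "2015-02-01", "2015-03-01"]),
    "a": [9, 5, 1, 2, 3],
})

# Extract the year and give the key a descriptive name ("year" is an assumption).
year = df["date"].dt.year.rename("year")

# The grouping key's name becomes the name of the result's index.
summary = df.groupby(year)["a"].agg(["sum", "mean", "max"])
print(summary)
```

The aggregated values are unchanged; only the index label differs.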

Put date in the index and use an anonymous function to access the year

If you set the date column as the index, it becomes a DatetimeIndex with the same properties and methods that the dt accessor gives to normal columns:

df1 = df.set_index('date')
df1.index.year

Int64Index([2012, 2012, 2015, 2015, 2015], dtype='int64', name='date')

Interestingly, you can pass a function to the groupby method. The function is called on each value of the DataFrame's index, and its return value is used as the group key. So we can get the same result as above with the following:

df1.groupby(lambda x: x.year)['a'].agg(['sum', 'mean', 'max'])

      sum  mean  max
2012   14     7    9
2015    6     2    3
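Once the dates are in the index, the anonymous function is not the only option. As a sketch under the same sample data, the year attribute of the index can be passed to groupby directly, which avoids calling a Python function per row.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-01", "2012-06-01", "2015-01-01",
                            "2015-02-01", "2015-03-01"]),
    "a": [9, 5, 1, 2, 3],
})
df1 = df.set_index("date")

# The function is called once per index value (a Timestamp).
via_function = df1.groupby(lambda ts: ts.year)["a"].sum()

# Alternative: pass the precomputed years of the index as the key.
via_index = df1.groupby(df1.index.year)["a"].sum()

print(via_function)
print(via_index)
```

Both calls produce the same yearly sums for this data.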

Use the `resample` method

If your date column is not in the index, you must specify the column with the on parameter. You also need to specify the offset alias as a string.

df.resample('AS', on='date')['a'].agg(['sum', 'mean', 'max'])

             sum  mean  max
date                       
2012-01-01  14.0   7.0  9.0
2013-01-01   NaN   NaN  NaN
2014-01-01   NaN   NaN  NaN
2015-01-01   6.0   2.0  3.0
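Unlike groupby, resample produces a row for every year in the range, including years with no data. The sketch below shows one possible way to keep only the populated years by also computing a count. It uses the 'YS' alias (year start), which I believe is the spelling preferred over 'AS' on newer pandas versions. Depending on the pandas version, an empty bin may report a sum of 0 rather than NaN, which is why the filter relies on the count.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-01", "2012-06-01", "2015-01-01",
                            "2015-02-01", "2015-03-01"]),
    "a": [9, 5, 1, 2, 3],
})

# One row per calendar year between the first and last date, empty or not.
yearly = df.resample("YS", on="date")["a"].agg(["count", "sum", "mean", "max"])

# Keep only the years that actually contain rows.
non_empty = yearly[yearly["count"] > 0]

print(yearly)
print(non_empty)
```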

Convert to pandas Period

You can also convert the date column to a pandas Period object. We must pass in the offset alias as a string to determine the length of the Period.

df['date'].dt.to_period('A')

0   2012
1   2012
2   2015
3   2015
4   2015
Name: date, dtype: object

We can then use this as the grouping key:

df.groupby(df['date'].dt.to_period('Y'))['a'].agg(['sum', 'mean', 'max'])


      sum  mean  max
2012   14     7    9
2015    6     2    3
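Grouping by a Period leaves a PeriodIndex on the result. As a small sketch with the same sample data, the index can be read back as plain years or converted to timestamps at the start of each period.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-01", "2012-06-01", "2015-01-01",
                            "2015-02-01", "2015-03-01"]),
    "a": [9, 5, 1, 2, 3],
})

# Group by yearly periods; the result is indexed by a PeriodIndex.
by_period = df.groupby(df["date"].dt.to_period("Y"))["a"].sum()

# Plain integer years from the PeriodIndex.
years = by_period.index.year

# Timestamps marking the start of each period.
starts = by_period.index.to_timestamp()

print(by_period)
print(list(years))
print(starts)
```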

This should work:

data.groupby(lambda x: data['date'][x].year)

This will also work:

data.groupby(data['date'].dt.year)
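The short answers in this post use three different grouping keys. The sketch below, built on hypothetical sample data with a default integer index, checks that they yield the same yearly sums. Note that the function-based variant receives index labels, so data['date'][label] is a label lookup and assumes the index labels are unique.

```python
import pandas as pd

# Hypothetical sample frame with a default integer index.
data = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-01", "2012-06-01", "2015-01-01",
                            "2015-02-01", "2015-03-01"]),
    "a": [9, 5, 1, 2, 3],
})

# Vectorized: year extracted through the dt accessor.
by_dt = data.groupby(data["date"].dt.year)["a"].sum()

# Element-wise: year extracted with a Python function per value.
by_map = data.groupby(data["date"].map(lambda ts: ts.year))["a"].sum()

# Function on index labels: looks up the date for each label.
by_label = data.groupby(lambda label: data["date"][label].year)["a"].sum()

print(by_dt.tolist(), by_map.tolist(), by_label.tolist())
```

No timings are shown because relative performance depends on the pandas version and the data size.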

Reference URL: https://stackoverflow.com/questions/11391969/how-to-group-pandas-dataframe-entries-by-date-in-a-non-unique-column
