developer tip

Numpy argsort - what is it doing?

optionbox 2020. 8. 18. 07:38


Why is numpy giving this result:

import numpy
x = numpy.array([1.48, 1.41, 0.0, 0.1])
print(x.argsort())

>[2 3 1 0]

when I'd expect it to do this:

[3 2 0 1]

Clearly my understanding of the function is lacking.


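According to the documentation,

Returns the indices that would sort an array.

  • 2 is the index of 0.0.
  • 3 is the index of 0.1.
  • 1 is the index of 1.41.
  • 0 is the index of 1.48.

[2, 3, 1, 0] indicates that the smallest element is at index 2, the next smallest at index 3, then index 1, then index 0.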
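One way to check this reading is to use the returned indices to index the original array, which should give the values in ascending order. A minimal sketch (the names `order` and `sorted_x` are only illustrative):

```python
import numpy as np

x = np.array([1.48, 1.41, 0.0, 0.1])

# argsort returns positions in x, ordered from the smallest value to the largest.
order = x.argsort()

# Indexing x with those positions yields the sorted values.
sorted_x = x[order]

print(order.tolist())     # [2, 3, 1, 0]
print(sorted_x.tolist())  # [0.0, 0.1, 1.41, 1.48]
```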

There are a number of ways to get the result you are looking for:

import numpy as np
import scipy.stats as stats

def using_indexed_assignment(x):
    "https://stackoverflow.com/a/5284703/190597 (Sven Marnach)"
    result = np.empty(len(x), dtype=int)
    temp = x.argsort()
    result[temp] = np.arange(len(x))
    return result

def using_rankdata(x):
    return stats.rankdata(x)-1

def using_argsort_twice(x):
    "https://stackoverflow.com/a/6266510/190597 (k.rooijers)"
    return np.argsort(np.argsort(x))

def using_digitize(x):
    unique_vals, index = np.unique(x, return_inverse=True)
    return np.digitize(x, bins=unique_vals) - 1

For example,

In [72]: x = np.array([1.48,1.41,0.0,0.1])

In [73]: using_indexed_assignment(x)
Out[73]: array([3, 2, 0, 1])

This checks that they all produce the same result:

x = np.random.random(10**5)
expected = using_indexed_assignment(x)
for func in (using_argsort_twice, using_digitize, using_rankdata):
    assert np.allclose(expected, func(x))

These IPython %timeit benchmarks suggest that using_indexed_assignment is the fastest for large arrays:

In [50]: x = np.random.random(10**5)
In [66]: %timeit using_indexed_assignment(x)
100 loops, best of 3: 9.32 ms per loop

In [70]: %timeit using_rankdata(x)
100 loops, best of 3: 10.6 ms per loop

In [56]: %timeit using_argsort_twice(x)
100 loops, best of 3: 16.2 ms per loop

In [59]: %timeit using_digitize(x)
10 loops, best of 3: 27 ms per loop

For small arrays, using_argsort_twice may be faster:

In [78]: x = np.random.random(10**2)

In [81]: %timeit using_argsort_twice(x)
100000 loops, best of 3: 3.45 µs per loop

In [79]: %timeit using_indexed_assignment(x)
100000 loops, best of 3: 4.78 µs per loop

In [80]: %timeit using_rankdata(x)
100000 loops, best of 3: 19 µs per loop

In [82]: %timeit using_digitize(x)
10000 loops, best of 3: 26.2 µs per loop

Also note that stats.rankdata gives you more control over how to handle elements of equal value.
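To illustrate the point about equal values, here is a small sketch that compares argsort-twice with a few of the `method` options of `scipy.stats.rankdata`. The input array is made up for the example, and 1 is subtracted so the ranks start at 0 as in the functions above:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 2.0, 3.0])  # illustrative data with one tie

# argsort twice breaks ties by position (a stable sort is requested explicitly).
ranks_argsort = np.argsort(np.argsort(x, kind="stable"))

# rankdata lets the caller choose a tie policy through `method`.
ranks_average = stats.rankdata(x, method="average") - 1  # the default policy
ranks_min = stats.rankdata(x, method="min") - 1
ranks_dense = stats.rankdata(x, method="dense") - 1
ranks_ordinal = stats.rankdata(x, method="ordinal") - 1

print(ranks_argsort.tolist())  # [0, 1, 2, 3]
print(ranks_average.tolist())  # [0.0, 1.5, 1.5, 3.0]
print(ranks_min.tolist())      # [0, 1, 1, 3]
print(ranks_dense.tolist())    # [0, 1, 1, 2]
print(ranks_ordinal.tolist())  # [0, 1, 2, 3]
```

Because the default is "average", the four functions above only agree when all values are distinct, as they almost surely are for random floats.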


As the documentation says, argsort:

Returns the indices that would sort an array.

That means the first element of the argsort is the index of the element that should be sorted first, the second element is the index of the element that should be second, etc.

What you seem to want is the rank order of the values, which is what is provided by scipy.stats.rankdata. Note that you need to think about what should happen if there are ties in the ranks.


numpy.argsort(a, axis=-1, kind='quicksort', order=None)

Returns the indices that would sort an array

Perform an indirect sort along the given axis using the algorithm specified by the kind keyword. It returns an array of indices of the same shape as a that index data along the given axis in sorted order.
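As a rough illustration of the axis and kind parameters, the sketch below uses a small made-up 2D array; the variable names are only for the example:

```python
import numpy as np

a = np.array([[3, 1, 2],
              [9, 7, 8]])

# The default axis=-1 sorts within each row; the result has the same shape as a.
idx_rows = np.argsort(a, axis=-1)

# axis=0 sorts down each column instead.
idx_cols = np.argsort(a, axis=0)

# kind="stable" keeps equal elements in their original relative order.
ties = np.array([2, 1, 2, 1])
idx_stable = np.argsort(ties, kind="stable")

print(idx_rows.tolist())    # [[1, 2, 0], [1, 2, 0]]
print(idx_cols.tolist())    # [[0, 0, 0], [1, 1, 1]]
print(idx_stable.tolist())  # [1, 3, 0, 2]
```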

Consider an example in Python with a list of values:

listExample = [0, 2, 2456, 2000, 5000, 0, 1]

Now we use the argsort function:

import numpy as np
list(np.argsort(listExample))

The output will be

[0, 5, 6, 1, 3, 2, 4]

This is the list of indices of the values in listExample. If you map these indices to their respective values, you get the following result:

[0, 0, 1, 2, 2000, 2456, 5000]

(I find this function useful in many places. For example, if you want the sorted order of a list or array without calling list.sort(), i.e. without changing the order of the values in the original list, you can use this function.)
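A minimal sketch of that use case: the indices from argsort give a sorted view of the list and can reorder a second, parallel list the same way, while the original stays untouched. The `labels` list is hypothetical and exists only for this example.

```python
import numpy as np

listExample = [0, 2, 2456, 2000, 5000, 0, 1]
labels = ["a", "b", "c", "d", "e", "f", "g"]  # hypothetical companion data

# kind="stable" makes the order of the two equal zeros deterministic.
order = np.argsort(listExample, kind="stable")

# Build sorted copies; listExample itself is not modified.
sorted_values = [listExample[i] for i in order]
sorted_labels = [labels[i] for i in order]

print(sorted_values)  # [0, 0, 1, 2, 2000, 2456, 5000]
print(sorted_labels)  # ['a', 'f', 'g', 'b', 'd', 'c', 'e']
print(listExample)    # [0, 2, 2456, 2000, 5000, 0, 1]
```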

For more details, see: https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.argsort.html


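First, argsort orders the array and returns the original index of each element in sorted order. Applying argsort a second time to that result gives the position of each original element in the sorted order: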


input:
import numpy as np
x = np.array([1.48,1.41,0.0,0.1])
x.argsort().argsort()

output:
array([3, 2, 0, 1])
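One way to see why the second argsort gives ranks: the first result is a permutation, and the argsort of a permutation is its inverse. The sketch below checks this by using the ranks to rebuild the original array from its sorted copy (variable names are illustrative):

```python
import numpy as np

x = np.array([1.48, 1.41, 0.0, 0.1])

order = x.argsort()      # positions of the elements in sorted order
ranks = order.argsort()  # inverse permutation: the rank of each element

# Indexing the sorted array by rank restores the original array.
restored = np.sort(x)[ranks]

print(order.tolist())     # [2, 3, 1, 0]
print(ranks.tolist())     # [3, 2, 0, 1]
print(restored.tolist())  # [1.48, 1.41, 0.0, 0.1]
```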


np.argsort returns the indices that would sort the array, using the algorithm given by 'kind' (which specifies the type of sorting algorithm). In contrast, np.argmax returns only the index of the largest element, while np.sort returns the sorted values themselves.
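The three functions can be compared side by side on the array from the question. This is only an illustrative sketch, and it assumes the maximum is unique:

```python
import numpy as np

values = [1.48, 1.41, 0.0, 0.1]

sorted_copy = np.sort(values)   # the values themselves, ascending
order = np.argsort(values)      # indices that would sort the values
largest_at = np.argmax(values)  # index of the single largest value

print(sorted_copy.tolist())  # [0.0, 0.1, 1.41, 1.48]
print(order.tolist())        # [2, 3, 1, 0]
print(int(largest_at))       # 0

# The last entry of argsort points at the same element as argmax.
print(int(order[-1]) == int(largest_at))  # True
```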


Just want to directly contrast the OP's original understanding against the actual implementation with code.

numpy.argsort is defined such that for 1D arrays:

x[x.argsort()] == numpy.sort(x) # this will be an array of True's

The OP originally thought that it was defined such that for 1D arrays:

x == numpy.sort(x)[x.argsort()] # this will not be True

Note: This code doesn't work in the general case (it only works for 1D arrays). This answer is purely for illustration purposes.
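For arrays with more than one dimension, a comparable identity can be written with `np.take_along_axis` (available since NumPy 1.15). This is a sketch on a made-up 2D array, not part of the original answer:

```python
import numpy as np

x = np.array([[3.0, 1.0, 2.0],
              [0.5, 0.7, 0.6]])

idx = np.argsort(x, axis=1)

# Plain x[idx] indexes along the first axis, so it does not sort each row.
# take_along_axis applies the per-row indices along the chosen axis.
sorted_rows = np.take_along_axis(x, idx, axis=1)

print(idx.tolist())          # [[1, 2, 0], [0, 2, 1]]
print(sorted_rows.tolist())  # [[1.0, 2.0, 3.0], [0.5, 0.6, 0.7]]
print(np.array_equal(sorted_rows, np.sort(x, axis=1)))  # True
```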

Reference URL: https://stackoverflow.com/questions/17901218/numpy-argsort-what-is-it-doing
