How to remove an element in lxml


optionbox 2020. 10. 16. 07:20


I need to completely remove elements with Python's lxml, based on the contents of an attribute. Example:

import lxml.etree as et

xml="""
<groceries>
  <fruit state="rotten">apple</fruit>
  <fruit state="fresh">pear</fruit>
  <fruit state="fresh">starfruit</fruit>
  <fruit state="rotten">mango</fruit>
  <fruit state="fresh">peach</fruit>
</groceries>
"""

tree=et.fromstring(xml)

for bad in tree.xpath("//fruit[@state='rotten']"):
    pass  # remove this element from the tree

# tostring() returns bytes on Python 3, so decode before printing
print(et.tostring(tree, pretty_print=True).decode())

I would like this to print:

<groceries>
  <fruit state="fresh">pear</fruit>
  <fruit state="fresh">starfruit</fruit>
  <fruit state="fresh">peach</fruit>
</groceries>

Is there a way to do this without storing a temporary variable and building the output by hand, like this?

newxml = "<groceries>\n"
for elt in tree.xpath("//fruit[@state='fresh']"):
    # encoding="unicode" makes tostring() return str instead of bytes
    newxml += et.tostring(elt, encoding="unicode")

newxml += "</groceries>"

Use the remove method of an xml Element.

tree=et.fromstring(xml)

for bad in tree.xpath("//fruit[@state='rotten']"):
    bad.getparent().remove(bad)     # here I grab the parent of the element to call the remove directly on it

print(et.tostring(tree, pretty_print=True, xml_declaration=True).decode())

Compared with @Acorn's version, this one also works when the elements to remove are not directly under the root node of the xml.
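To make that difference concrete, here is a minimal sketch. The nested XML and the variable names are made up for this illustration. Calling remove() on the root only accepts direct children and raises ValueError otherwise, whereas going through getparent() works at any depth.

```python
import lxml.etree as et

# Hypothetical sample: the rotten fruit sits one level below the root.
xml = """
<groceries>
  <punnet>
    <fruit state="rotten">strawberry</fruit>
    <fruit state="fresh">blueberry</fruit>
  </punnet>
</groceries>
"""

tree = et.fromstring(xml)
bad = tree.xpath("//fruit[@state='rotten']")[0]

# remove() only accepts direct children, so calling it on the root fails here.
try:
    tree.remove(bad)
    removed_via_root = True
except ValueError:
    removed_via_root = False

# Asking the element for its own parent works regardless of depth.
bad.getparent().remove(bad)

remaining = [fruit.text for fruit in tree.iter("fruit")]
print(removed_via_root, remaining)
```
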

You're looking for the remove function. Call the remove method on the element's parent and pass it the subelement you want to remove.

import lxml.etree as et

xml="""
<groceries>
  <fruit state="rotten">apple</fruit>
  <fruit state="fresh">pear</fruit>
  <punnet>
    <fruit state="rotten">strawberry</fruit>
    <fruit state="fresh">blueberry</fruit>
  </punnet>
  <fruit state="fresh">starfruit</fruit>
  <fruit state="rotten">mango</fruit>
  <fruit state="fresh">peach</fruit>
</groceries>
"""

tree=et.fromstring(xml)

for bad in tree.xpath("//fruit[@state='rotten']"):
    bad.getparent().remove(bad)

print(et.tostring(tree, pretty_print=True).decode())

Result:

<groceries>
  <fruit state="fresh">pear</fruit>
  <punnet>
    <fruit state="fresh">blueberry</fruit>
  </punnet>
  <fruit state="fresh">starfruit</fruit>
  <fruit state="fresh">peach</fruit>
</groceries>

I ran into one situation:

<div>
    <script>
        some code
    </script>
    text here
</div>

div.remove(script) also removes the "text here" part, which I did not intend.
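The reason is that lxml stores text following an element as that element's tail, not as text of the parent. A minimal sketch of the effect, using a whitespace-free version of the snippet above (the variable names are only illustrative):

```python
import lxml.etree as et

div = et.fromstring("<div><script>some code</script>text here</div>")
script = div.find("script")

# "text here" is not div.text; it is stored as the tail of <script>.
tail_before = script.tail

# Removing the element takes its tail along with it.
div.remove(script)

result = et.tostring(div, encoding="unicode")
print(repr(tail_before), result)
```
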

Following the answer here, I found that etree.strip_elements is a better solution for me: its with_tail=(bool) parameter controls whether the text after the element is removed as well.

I still don't know whether it can take an XPath filter instead of a tag name, so I'm posting this just for information.

Here is the doc:

strip_elements(tree_or_element, *tag_names, with_tail=True)

Delete all elements with the provided tag names from a tree or subtree. This will remove the elements and their entire subtree, including all their attributes, text content and descendants. It will also remove the tail text of the element unless you explicitly set the with_tail keyword argument option to False.

Tag names can contain wildcards as in _Element.iter.

Note that this will not delete the element (or ElementTree root element) that you passed even if it matches. It will only treat its descendants. If you want to include the root element, check its tag name directly before even calling this function.

Example usage::
   strip_elements(some_element,
       'simpletagname',             # non-namespaced tag
       '{http://some/ns}tagname',   # namespaced tag
       '{http://some/other/ns}*',   # any tag from a namespace
       lxml.etree.Comment           # comments
       )
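Applied to the div/script case above, a minimal sketch could look like this. The input is a whitespace-free version of that snippet, and passing with_tail=False is what keeps the trailing text.

```python
import lxml.etree as et

div = et.fromstring("<div><script>some code</script>text here</div>")

# Remove every <script> below div, but keep the text that follows each one.
et.strip_elements(div, "script", with_tail=False)

result = et.tostring(div, encoding="unicode")
print(result)  # <div>text here</div>
```

Note that strip_elements selects by tag name only, so it removes every matching element in the subtree rather than ones picked by an attribute test.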

As already mentioned, you can use the remove() method to delete (sub)elements from the tree:

for bad in tree.xpath("//fruit[@state='rotten']"):
    bad.getparent().remove(bad)

But it removes the element including its tail, which is a problem if you are processing mixed-content documents like HTML:

<div><fruit state="rotten">avocado</fruit> Hello!</div>

Becomes

<div></div>

Which, I suppose, is not always what you want :) I have created a helper function that removes just the element and keeps its tail:

def remove_element(el):
    parent = el.getparent()
    # el.tail is None when no text follows the element
    if el.tail and el.tail.strip():
        prev = el.getprevious()
        # compare with None: an lxml element without children is falsy
        if prev is not None:
            prev.tail = (prev.tail or '') + el.tail
        else:
            parent.text = (parent.text or '') + el.tail
    parent.remove(el)

for bad in tree.xpath("//fruit[@state='rotten']"):
    remove_element(bad)

This way it will keep the tail text:

<div> Hello!</div>
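The case with a preceding sibling is worth checking as well. The sketch below is a self-contained variation of the helper that assumes two defensive choices: it tests el.tail for None, and it compares prev with None explicitly, because an lxml element without children evaluates as false. The sample markup is made up for this illustration.

```python
import lxml.etree as et

def remove_element(el):
    """Remove el from its parent while keeping the text that follows it."""
    parent = el.getparent()
    if el.tail and el.tail.strip():
        prev = el.getprevious()
        if prev is not None:
            # Attach the tail to the previous sibling so text order is kept.
            prev.tail = (prev.tail or "") + el.tail
        else:
            # No previous sibling: the tail becomes part of the parent's text.
            parent.text = (parent.text or "") + el.tail
    parent.remove(el)

# Hypothetical input: <b> is a childless sibling right before the rotten fruit.
div = et.fromstring(
    '<div>Intro <b>bold</b><fruit state="rotten">avocado</fruit> Hello!</div>'
)
for bad in div.xpath("//fruit[@state='rotten']"):
    remove_element(bad)

result = et.tostring(div, encoding="unicode")
print(result)  # <div>Intro <b>bold</b> Hello!</div>
```

With a plain truthiness test on prev, the childless b element would be skipped and " Hello!" would be appended to the parent's text, ending up before the b element.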

Reference URL: https://stackoverflow.com/questions/7981840/how-to-remove-an-element-in-lxml
