Visualizing an undirected graph that's too large for GraphViz?

optionbox 2020. 11. 2. 07:53

I need advice on rendering an undirected graph with 178,000 nodes and 500,000 edges. I've tried Neato, Tulip, and Cytoscape. Neato doesn't come remotely close, and Tulip and Cytoscape claim they can handle it but don't seem to be able to. (Tulip does nothing, and Cytoscape claims to be working and then just stops.)

I'd just like a vector format file (ps or pdf) with a remotely reasonable layout of the nodes.

Graphviz itself provides a solution for rendering large graphs.

Namely, Graphviz includes sfdp, a multiscale version of fdp (also part of Graphviz, and similar to neato) for laying out large undirected graphs. It has been useful for drawing large graphs (70k nodes, 500k edges) in my project.

Documentation for this software can be found on the Graphviz website (http://www.graphviz.org/).

A paper describing the underlying techniques, with more information and examples, can be found at http://yifanhu.net/PUB/graph_draw_small.pdf.
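
As a rough sketch of how sfdp might be driven from a script: the helper names (write_dot, sfdp_command, render) and the file names are made up for illustration, and the snippet assumes the Graphviz build on the machine supports PDF output. Only the standard -T (output format) and -o (output file) options are used.

```python
import shutil
import subprocess


def write_dot(edges, dot_path):
    """Write an undirected edge list as a Graphviz DOT file."""
    with open(dot_path, "w") as f:
        f.write("graph G {\n")
        # Draw nodes as small points; labels are unreadable at this scale anyway.
        f.write("  node [shape=point];\n")
        for u, v in edges:
            f.write(f'  "{u}" -- "{v}";\n')
        f.write("}\n")


def sfdp_command(dot_path, pdf_path):
    """Build the sfdp command line: -Tpdf selects PDF, -o names the output."""
    return ["sfdp", "-Tpdf", dot_path, "-o", pdf_path]


def render(edges, dot_path="graph.dot", pdf_path="graph.pdf"):
    """Write the DOT file and run sfdp if it is installed (hypothetical helper)."""
    write_dot(edges, dot_path)
    if shutil.which("sfdp") is None:
        return None  # Graphviz is not on the PATH; only the DOT file is written
    subprocess.run(sfdp_command(dot_path, pdf_path), check=True)
    return pdf_path
```

Calling render(edge_list) with a list of (node, node) pairs would then leave graph.pdf next to the DOT file. For a graph of the size in the question, expect the layout step to take a while.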

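I suggest preprocessing the data first, for instance collapsing nodes into clusters and then visualizing the clusters. Collapsing reduces the number of nodes and makes it easier for algorithms such as Kamada-Kawai or Fruchterman-Reingold to render the resulting graph.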
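
A minimal sketch of the collapsing step, assuming a cluster label for every node is already available (how the clusters are found is left open here, and the names collapse_to_clusters and cluster_of are illustrative). Edges inside a cluster are dropped, and edges between two clusters are merged into a single weighted edge.

```python
from collections import Counter


def collapse_to_clusters(edges, cluster_of):
    """Collapse an undirected graph onto its clusters.

    edges      -- iterable of (u, v) node pairs
    cluster_of -- dict mapping each node to a cluster label (assumed given)

    Returns a dict {(cluster_a, cluster_b): number_of_edges_between_them}.
    """
    weights = Counter()
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu == cv:
            continue  # an edge inside one cluster disappears in the collapsed graph
        # Store each undirected cluster pair under one canonical key.
        key = (cu, cv) if cu <= cv else (cv, cu)
        weights[key] += 1
    return dict(weights)
```

The resulting weighted cluster graph is usually small enough for any force-directed layout, and the weights can be mapped to edge thickness.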

If you really need to visualize 500,000 nodes, you could consider using a simple circular layout. This is easy to render without the problems that force-based algorithms have. Take a look at Circos: http://mkweb.bcgsc.ca/circos/

Circos is a visualization tool developed by bioinformatics people, tailored to visualizing genomes and other very large and complex data sets.

It's a Perl-based package; I hope that's not a problem.
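
To illustrate why a circular layout is cheap, here is a small sketch that places nodes evenly on a circle. It is not how Circos itself works; the function name and the radius parameter are just for illustration. Each position is computed directly from the node's index, so the cost is linear in the number of nodes and no iterative force simulation is needed.

```python
import math


def circular_layout(nodes, radius=1.0):
    """Place nodes evenly on a circle, in the order given."""
    n = len(nodes)
    return {
        node: (
            radius * math.cos(2 * math.pi * i / n),
            radius * math.sin(2 * math.pi * i / n),
        )
        for i, node in enumerate(nodes)
    }
```

The node order matters a lot for readability: sorting nodes by cluster before calling this keeps most edges short.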

I've had good results using the graph-tool library in Python. The graph below has 1,490 nodes and 19,090 edges; it took around 5 minutes to render on my laptop.

[Figure: political blogging network]

The graph data comes from the political blogging network described by Adamic and Glance in "The political blogosphere and the 2004 US Election" (PDF link). If you zoom in, you can see the blog URL for each node.

Here's the code I used to draw it (blog: http://ryancompton.net/2014/10/22/stochastic-block-model-based-edge-bundles-in-graph-tool/):

import graph_tool.all as gt
import math

g = gt.collection.data["polblogs"] #  http://www2.scedu.unibo.it/roversi/SocioNet/AdamicGlanceBlogWWW.pdf
print(g.num_vertices(), g.num_edges())

#reduce to only connected nodes
g = gt.GraphView(g,vfilt=lambda v: (v.out_degree() > 0) and (v.in_degree() > 0) )
g.purge_vertices()

print(g.num_vertices(), g.num_edges())

#party label stored in the 'value' property: 1 -> Republican (red), 0 -> Democrat (blue)
red_blue_map = {1:(1,0,0,1),0:(0,0,1,1)}
plot_color = g.new_vertex_property('vector<double>')
g.vertex_properties['plot_color'] = plot_color
for v in g.vertices():
    plot_color[v] = red_blue_map[g.vertex_properties['value'][v]]

#edge colors
alpha=0.15
edge_color = g.new_edge_property('vector<double>')
g.edge_properties['edge_color']=edge_color
for e in g.edges():
    if plot_color[e.source()] != plot_color[e.target()]:
        if plot_color[e.source()] == (0,0,1,1):
            #orange on dem -> rep
            edge_color[e] = (255.0/255.0, 102/255.0, 0/255.0, alpha)
        else:
            #purple on rep -> dem
            edge_color[e] = (102.0/255.0, 51/255.0, 153/255.0, alpha)
    #red on rep-rep edges
    elif plot_color[e.source()] == (1,0,0,1):
        edge_color[e] = (1,0,0, alpha)
    #blue on dem-dem edges
    else:
        edge_color[e] = (0,0,1, alpha)

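#fit a nested stochastic block model, then lay out its hierarchy as a radial tree
#(calls as written in the 2014 blog post; newer graph-tool releases may have changed this API)
state = gt.minimize_nested_blockmodel_dl(g, deg_corr=True)
bstack = state.get_bstack()
t = gt.get_hierarchy_tree(bstack)[0]
tpos = gt.radial_tree_layout(t, t.vertex(t.num_vertices() - 1), weighted=True)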
cts = gt.get_hierarchy_control_points(g, t, tpos)
pos = g.own_property(tpos)
b = bstack[0].vp["b"]

#labels: rotate each label so it points radially outward from the center
text_rot = g.new_vertex_property('double')
g.vertex_properties['text_rot'] = text_rot
for v in g.vertices():
    #atan2 handles every quadrant and avoids dividing by zero when x == 0
    text_rot[v] = math.atan2(pos[v][1], pos[v][0])

gt.graph_draw(g, pos=pos, vertex_fill_color=g.vertex_properties['plot_color'], 
            vertex_color=g.vertex_properties['plot_color'],
            edge_control_points=cts,
            vertex_size=10,
            vertex_text=g.vertex_properties['label'],
            vertex_text_rotation=g.vertex_properties['text_rot'],
            vertex_text_position=1,
            vertex_font_size=9,
            edge_color=g.edge_properties['edge_color'],
            vertex_anchor=0,
            bg_color=[0,0,0,1],
            output_size=[4024,4024],
            output='polblogs_blockmodel.png')

Try Gephi; it has a new layout plugin called OpenOrd that scales to millions of nodes.

Mathematica could very likely handle it, but I have to admit my first reaction was along the lines of the comment that said "take a piece of paper and color it black." Is there no way to reduce the density of the graph?

A possible issue is that you seem to be looking for layout, not just rendering. I have no knowledge about the Big O characteristics of the layouts implemented by various tools, but intuitively I would guess that it might take a long time to lay out that much data.

Does it need to be truly accurate?

Depending on what you're trying to accomplish, it might be good enough to just graph 10% or 1% of the data volume. (Of course, it might also be completely useless; it all depends on what the visualization is for.)
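
One simple way to try this is uniform random sampling of edges, sketched below. The function name and the fixed seed are illustrative choices, and uniform edge sampling is only one of several possible strategies (it tends to break the graph into many small pieces, which may or may not matter for the picture you want).

```python
import random


def sample_edges(edges, fraction, seed=0):
    """Return a reproducible random subset containing `fraction` of the edges."""
    rng = random.Random(seed)  # fixed seed so repeated runs draw the same sample
    k = max(1, int(len(edges) * fraction))
    return rng.sample(edges, k)
```

For example, sample_edges(edge_list, 0.01) keeps about 1% of the edges of a list of (node, node) pairs.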

I expect edge clustering (http://www.visualcomplexity.com/vc/project_details.cfm?id=679&index=679&domain=) would help. This technique bundles related edges together, reducing the visual complexity of the graph. You may have to implement the algorithm yourself though.

BioFabric (www.BioFabric.org) is another tool for visualizing large graphs. It should be able to handle the network described (178,000 nodes and 500,000 edges) OK, though the initial layout may take a while. The network shown here (from the Stanford Large Network Dataset Collection) is the Stanford Web Network, which has 281,903 nodes and 2,312,497 edges:

[Figure: Stanford Web Network]

BioFabric's scalability is due to the fact that it represents nodes not as points, but as horizontal lines. The edges are then shown as vertical lines. For some intuition about how this works, there is the Super-Quick BioFabric Demo, which is a small network that is animated using D3.
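
The nodes-as-rows, edges-as-columns idea can be sketched in a few lines. This is only an illustration of the representation, not BioFabric's actual code: the function name is hypothetical, and the node and edge orderings are simply taken as given, whereas the choice of ordering is a large part of what makes a real layout readable.

```python
def fabric_layout(nodes, edges):
    """Compute line segments for a nodes-as-rows, edges-as-columns drawing.

    Returns (node_lines, edge_lines) where
      node_lines[node] = (row, first_column, last_column)   # horizontal line
      edge_lines[i]    = (column, top_row, bottom_row)      # vertical line
    """
    row = {node: i for i, node in enumerate(nodes)}
    edge_lines = []
    span = {}
    for col, (u, v) in enumerate(edges):
        # Each edge gets its own column and connects the rows of its endpoints.
        top, bottom = sorted((row[u], row[v]))
        edge_lines.append((col, top, bottom))
        # A node's horizontal line must reach every column where it has an edge.
        for node in (u, v):
            first, last = span.get(node, (col, col))
            span[node] = (min(first, col), max(last, col))
    node_lines = {node: (row[node],) + span[node] for node in nodes if node in span}
    return node_lines, edge_lines
```

Because every edge gets its own column, edges never overlap each other, which is why this style stays legible where a node-link drawing turns into a solid blob.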

The primary application is written in Java. At the moment, it can only export PNG images, not PDFs. There is a PDF export option from RBioFabric, though that is a very simple implementation that cannot handle really large networks yet.

Full disclosure: BioFabric is a tool that I wrote.

You might offer a sanitized version of the file to the developers of those tools as a debugging scenario, if all else fails.

You could try aiSee: http://www.aisee.com/manual/unix/56.htm

Check out the Java/Jython based GUESS: http://graphexploration.cond.org/

The Large Graph Layout (LGL) project helped me a lot with a similar problem. It handles layout and has a small Java app to draw the produced layouts in 2D. There is no vector output out of the box, so you'll have to draw the graph yourself (given the node coordinates produced by LGL).
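
Drawing the graph yourself from a set of coordinates can be quite short if SVG is acceptable as the vector format (it can be converted to PDF or PostScript with other tools). The sketch below assumes the coordinates have already been parsed into a dict; it makes no assumption about LGL's file format, and the function name and styling values are illustrative.

```python
def layout_to_svg(coords, edges, size=1000, margin=10):
    """Render node coordinates and edges as an SVG string.

    coords -- dict mapping node -> (x, y) in arbitrary layout units
    edges  -- iterable of (u, v) pairs whose nodes appear in coords
    """
    xs = [x for x, _ in coords.values()]
    ys = [y for _, y in coords.values()]
    min_x, min_y = min(xs), min(ys)
    # One uniform scale factor keeps the aspect ratio of the layout.
    extent = max(max(xs) - min_x, max(ys) - min_y, 1e-12)
    scale = (size - 2 * margin) / extent

    def to_px(node):
        x, y = coords[node]
        return margin + (x - min_x) * scale, margin + (y - min_y) * scale

    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">']
    for u, v in edges:
        (x1, y1), (x2, y2) = to_px(u), to_px(v)
        parts.append(
            f'<line x1="{x1:.2f}" y1="{y1:.2f}" x2="{x2:.2f}" y2="{y2:.2f}" '
            'stroke="black" stroke-opacity="0.2"/>'
        )
    for node in coords:
        x, y = to_px(node)
        parts.append(f'<circle cx="{x:.2f}" cy="{y:.2f}" r="1.5"/>')
    parts.append("</svg>")
    return "\n".join(parts)
```

Writing the returned string to a .svg file gives a zoomable drawing; the low stroke opacity helps dense regions show up as darker areas instead of solid black.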

A Windows tool that can visualize graphs is Pajek. It generates EPS output; however, I don't know if it can read your data.

There's a list of apps here: http://www.mkbergman.com/?p=414

Walrus and LGL are two tools supposedly suited for large graphs. However, both seem to require graphs to be input as text files in their own special format, which might be a pain.

I don't think you can come remotely close to visualising that in a flat layout.

I've been intrigued for some time by Hyperbolic Graphs, described in this research paper. Try the software from SourceForge.

Another idea is just graphing the nodes using a TreeMap as seen at Panopticode.

You can also try NAViGaTOR (disclosure: I'm one of the developers for that software). We've successfully visualized graphs with as many as 1.7 million edges with it, although such large networks are hard to manipulate (the user interface will get laggy). However, it does use OpenGL for the visualization, so some of the overhead is transferred to the graphics card.

Also note that you'll have to crank up the memory settings in the File->Preferences dialog box before you can successfully open a network that big.

Finally, as most of the other responses point out, you are better off re-organizing your data into something smaller and more meaningful.

First, I would like to second aliekens' suggestion to try sfdp. It is the large scale version of Neato.

As OJW suggests, you could also just plot the nodes in R^2. Your edges actually supply what he calls a "natural ordering." In particular, you can plot the components of the second and third eigenvectors of the normalized graph Laplacian. This is the matrix L in this Wikipedia page about spectral clustering. You should be able to write down this matrix without understanding the linear algebra behind it. Then you have reduced your problem to approximately computing the first few eigenvectors of a large sparse matrix. This is traditionally done by iterative methods and is implemented in standard linear algebra packages. This method should scale up to very large graphs.
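
A dependency-free sketch of that idea follows. It assumes a connected graph with nodes numbered 0..n-1 and uses plain orthogonal (power) iteration on I - L/2, where L = I - D^(-1/2) A D^(-1/2), so that the smallest eigenvalues of L become the largest ones. The function name, the iteration count, and the seed are illustrative; for a graph with hundreds of thousands of nodes, a sparse eigensolver from a linear algebra package (for example scipy.sparse.linalg.eigsh) would be the more realistic choice.

```python
import math
import random


def spectral_layout(num_nodes, edges, iterations=500, seed=0):
    """Return [(x, y), ...] from eigenvectors 2 and 3 of the normalized Laplacian."""
    neighbors = [[] for _ in range(num_nodes)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    deg = [len(adj) for adj in neighbors]
    if min(deg) == 0:
        raise ValueError("remove isolated nodes first")
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]

    def apply(vec):
        # Multiply by (I + D^-1/2 A D^-1/2) / 2, which equals I - L/2.
        scaled = [inv_sqrt[j] * vec[j] for j in range(num_nodes)]
        return [
            0.5 * (vec[i] + inv_sqrt[i] * sum(scaled[j] for j in neighbors[i]))
            for i in range(num_nodes)
        ]

    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]

    def remove(vec, basis):
        # Gram-Schmidt step: subtract the components along each basis vector.
        for b in basis:
            dot = sum(x * y for x, y in zip(vec, b))
            vec = [x - dot * y for x, y in zip(vec, b)]
        return vec

    rng = random.Random(seed)
    # The first eigenvector is known in closed form: sqrt(degree) per node.
    trivial = normalize([math.sqrt(d) for d in deg])
    second = normalize(remove([rng.uniform(-1, 1) for _ in deg], [trivial]))
    third = normalize(remove([rng.uniform(-1, 1) for _ in deg], [trivial, second]))
    for _ in range(iterations):
        second = normalize(remove(apply(second), [trivial]))
        third = normalize(remove(apply(third), [trivial, second]))
    return list(zip(second, third))
```

On a graph made of two triangles joined by a single edge, the x coordinates of one triangle come out with one sign and those of the other triangle with the opposite sign, which is the kind of separation that makes this layout useful for seeing coarse structure.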

Reference URL: https://stackoverflow.com/questions/238724/visualizing-undirected-graph-thats-too-large-for-graphviz