Message from @realz

Discord ID: 781076070567903283


2020-11-25 08:21:00 UTC  

OK I need a better way to visualize dot files

2020-11-25 08:21:45 UTC  

Can you do smaller chunks of the data and overlay the dots in layers?

2020-11-25 08:22:56 UTC  

oh interesting idea

2020-11-25 08:23:06 UTC  

I can probably see how many connected graphs there are

2020-11-25 08:23:59 UTC  

It would still let you visualize the data, but in smaller bite-size pieces rather than choking on the whole thing at once

2020-11-25 08:25:01 UTC  

Since SVG is just a glorified XML file, smaller ones can be merged with others into a bigger one
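
A minimal sketch of that merge idea, using only Python's standard library. The function name `merge_svgs`, the `tile_width` parameter, and the side-by-side layout are assumptions for illustration; real Graphviz output also carries `viewBox` and its own transforms, which this ignores.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize with a default namespace instead of an "ns0:" prefix.
ET.register_namespace("", SVG_NS)

def merge_svgs(svg_texts, tile_width):
    """Place each SVG side by side inside one parent <svg> element."""
    merged = ET.Element(f"{{{SVG_NS}}}svg")
    for index, text in enumerate(svg_texts):
        root = ET.fromstring(text)
        # Wrap each drawing in a <g> shifted right so the tiles do not overlap.
        group = ET.SubElement(
            merged,
            f"{{{SVG_NS}}}g",
            {"transform": f"translate({index * tile_width}, 0)"},
        )
        group.extend(list(root))
    merged.set("width", str(tile_width * len(svg_texts)))
    return ET.tostring(merged, encoding="unicode")

# Two tiny hand-written SVGs standing in for per-chunk output.
a = f'<svg xmlns="{SVG_NS}"><circle cx="5" cy="5" r="4"/></svg>'
b = f'<svg xmlns="{SVG_NS}"><rect x="1" y="1" width="8" height="8"/></svg>'
combined = merge_svgs([a, b], tile_width=10)
```

Setting every offset to zero would stack the chunks as layers instead of tiling them.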

2020-11-25 08:25:43 UTC  

yea there is no need to see the connected components together

2020-11-25 08:25:52 UTC  

I don't even need to merge it

2020-11-25 08:26:15 UTC  

lemme first see how many components there are

2020-11-25 08:26:27 UTC  

or if it is a giant hairball

2020-11-25 08:26:27 UTC  

Well there you go, that should optimize the process considerably.

2020-11-25 08:26:54 UTC  

Unless it's a rat's nest

2020-11-25 08:28:33 UTC  

there are 2092 connected components with 5 nodes or more

2020-11-25 08:29:20 UTC  

Let me switch over to my computer so I can actually look at the GitHub repo @DrSammyD posted. My phone chokes trying to open it

2020-11-25 08:34:53 UTC  

you are only going to make your computer suffer

2020-11-25 08:36:00 UTC  

```
G.number_of_nodes(): 90259
G.number_of_edges(): 313771
len(connected_component_sizes): 31528
len(connected_component_sizes): 2092
connected_component_sizes[:20] [1030, 1022, 996, 865, 762, 622, 610, 583, 493, 464, 421, 397, 384, 371, 347, 331, 330, 325, 315, 287]
Gp.number_of_nodes(): 10955
Gp.number_of_edges(): 164456

```

2020-11-25 08:36:23 UTC  

by picking the top 20 connected components, I got 1/2 the edges
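
A rough sketch of the filtering described here: find connected components, drop the small ones, keep the largest few. The calls in the output above look like networkx, but this version uses only the standard library so it runs anywhere; all function names and the toy edge list are made up for illustration. Nodes with no edges never appear in an edge list, so they are not counted.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Return a list of node sets, one per connected component."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one component.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        components.append(component)
    return components

def largest_components(edges, min_size=5, top=20):
    """Keep components with at least min_size nodes, largest first, capped at top."""
    big = [c for c in connected_components(edges) if len(c) >= min_size]
    big.sort(key=len, reverse=True)
    return big[:top]

def edges_within(edges, nodes):
    """Edges whose endpoints both lie in the given node set."""
    return [(u, v) for u, v in edges if u in nodes and v in nodes]

# Toy edge list (made up): a 5-node chain, a triangle, and a lone pair.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (10, 11), (11, 12), (12, 10), (20, 21)]
kept = largest_components(edges, min_size=3, top=1)
kept_nodes = set().union(*kept)
kept_edges = edges_within(edges, kept_nodes)
```

The defaults of 5 and 20 mirror the thresholds mentioned in the chat; the toy call uses smaller values so the example stays readable.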

2020-11-25 08:36:42 UTC  

though I do worry that cleaning the data this way might make it seem suspicious when it isn't or vice versa

2020-11-25 08:37:26 UTC  

now I have three fdp operations running to convert dot to svg at the same time
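
For reference, running several Graphviz conversions at once could be scripted along these lines. `fdp`, `-Tsvg`, and `-o` are standard Graphviz command-line usage; the helper names and the default of three workers are assumptions, and the sketch needs Graphviz installed to actually convert anything.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def build_command(dot_path, layout="fdp"):
    """Graphviz command that renders one DOT file to an SVG next to it."""
    dot_path = Path(dot_path)
    return [layout, "-Tsvg", str(dot_path), "-o", str(dot_path.with_suffix(".svg"))]

def convert_all(dot_paths, workers=3):
    """Run one Graphviz process per file, at most `workers` at a time."""
    def run(path):
        return subprocess.run(build_command(path), check=True)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every job and re-raises the first failure, if any.
        return list(pool.map(run, dot_paths))
```

Threads are enough here because each worker just waits on an external process; the heavy lifting happens in Graphviz itself.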

2020-11-25 08:39:49 UTC  

I already pulled my MBP out... so I am looking at the repo directly... just trying to first get a handle on what the code is doing before I try to wrap my head around the deluge of data

2020-11-25 08:40:16 UTC  

Indeed. I have a few things generated

2020-11-25 08:40:35 UTC  

That one is basically the Shiva data on a county-by-county scale per state

2020-11-25 08:43:35 UTC  

I don't think I've ever run my CPU this hot lol

2020-11-25 08:43:43 UTC  

It's a giant cluster of data to try and analyze. I know this might sound weird, but have you tried doing statistical analysis for mean, median, mode, quartiles, and outliers? It might let you look for out-of-the-ordinary things

2020-11-25 08:45:11 UTC  

OK I found a `neato` option that _drastically_ speeds up generation

2020-11-25 08:45:20 UTC  

I'm really looking at this as a clean set of eyes at the moment, not knowing what has or hasn't been done so not trying to derail anything just asking questions

2020-11-25 08:46:23 UTC  

there is nothing to derail heh

2020-11-25 08:46:28 UTC  

ask your questions

2020-11-25 08:47:51 UTC  

(Gephi is still frozen lol)

2020-11-25 08:48:16 UTC  

I've downloaded Tulip to see what it can do with this

2020-11-25 08:48:32 UTC  

someone on SO said it can handle "up to 100k nodes"

2020-11-25 08:49:15 UTC  

I'm just trying to wrap my head around what you have or haven't tried for looking at this data... like comparing all the counties in a state, looking not at Trump v. Biden but just at the mean, median, and mode, and breaking out things like turnout rates that are way outside any other county in the state
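
A small sketch of that kind of check with Python's `statistics` module. The county names and turnout figures are invented placeholders, and the 1.5 × IQR fence is one common rule of thumb for flagging outliers, not something anyone in the chat specified.

```python
import statistics

def summarize(values):
    """Mean, median, quartiles, and IQR-based outlier fences for a list of numbers."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "low_fence": q1 - 1.5 * iqr,
        "high_fence": q3 + 1.5 * iqr,
    }

def outliers(named_values):
    """Names whose value falls outside the 1.5 * IQR fences."""
    stats = summarize(list(named_values.values()))
    return sorted(
        name
        for name, value in named_values.items()
        if value < stats["low_fence"] or value > stats["high_fence"]
    )

# Made-up turnout rates for hypothetical counties, not real data.
turnout = {"A": 0.61, "B": 0.64, "C": 0.66, "D": 0.63, "E": 0.65, "F": 0.62, "G": 0.97}
flagged = outliers(turnout)
```

Mode is left out because it is rarely meaningful for continuous rates; `statistics.mode` would cover it for discrete values.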

2020-11-25 08:50:24 UTC  

@realz you could filter by ratios that don't repeat more than X number of times
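
One possible reading of this suggestion, sketched in Python: keep only the ratios that show up more than X times. What counts as a "ratio" in the dataset is not spelled out here, so the `(numerator, denominator)` pairs, the function name, and the sample values are all assumptions.

```python
from collections import Counter
from fractions import Fraction

def repeated_ratios(pairs, min_repeats):
    """Keep (numerator, denominator) pairs whose reduced ratio occurs more than min_repeats times."""
    # Fraction reduces 2/4 and 3/6 to 1/2, so equal ratios compare equal exactly.
    ratios = [Fraction(a, b) for a, b in pairs]
    counts = Counter(ratios)
    return [pair for pair, ratio in zip(pairs, ratios) if counts[ratio] > min_repeats]

# Made-up count pairs, not real data.
pairs = [(1, 2), (2, 4), (3, 6), (5, 7), (10, 14), (2, 3)]
kept = repeated_ratios(pairs, min_repeats=2)
```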

2020-11-25 08:50:45 UTC  

Just thinking it might help to highlight where to actually look for issues, rather than drowning in the data

2020-11-25 08:50:57 UTC  

@DrSammyD the connected components filter effectively does that

2020-11-25 08:51:09 UTC  

it gets rid of nodes that aren't connected to K other nodes

2020-11-25 08:51:17 UTC  

Ah...

2020-11-25 08:51:43 UTC  

You could divide the image up into bands of ratios
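
A hedged sketch of the banding idea: bucket each ratio into one of a fixed number of equal-width bands, so each band can be drawn separately. It assumes ratios lie between 0 and 1; the band count and function names are illustrative only.

```python
from collections import defaultdict

def band_index(ratio, band_count):
    """Map a ratio in [0, 1] to one of band_count equal-width bands."""
    if not 0 <= ratio <= 1:
        raise ValueError("ratio must be between 0 and 1")
    # A ratio of exactly 1 goes in the top band instead of a band of its own.
    return min(int(ratio * band_count), band_count - 1)

def group_by_band(ratios, band_count=4):
    """Group ratios by band index, preserving input order within each band."""
    bands = defaultdict(list)
    for ratio in ratios:
        bands[band_index(ratio, band_count)].append(ratio)
    return dict(bands)

# Made-up ratios, not real data.
bands = group_by_band([0.05, 0.30, 0.55, 0.60, 1.0], band_count=4)
```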