Message from @realz
Discord ID: 781078689382924308
Well there you go, that should optimize the process considerably.
Unless it's a rat's nest
there are 2092 connected components with 5 nodes or more
Let me switch over to my computer so I can actually look at the GitHub repo @DrSammyD posted. My phone chokes trying to open it
you are only going to make your computer suffer
```
G.number_of_nodes(): 90259
G.number_of_edges(): 313771
len(connected_component_sizes): 31528
len(connected_component_sizes): 2092
connected_component_sizes[:20] [1030, 1022, 996, 865, 762, 622, 610, 583, 493, 464, 421, 397, 384, 371, 347, 331, 330, 325, 315, 287]
Gp.number_of_nodes(): 10955
Gp.number_of_edges(): 164456
```
by picking the top 20 connected components, I got 1/2 the edges
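A minimal sketch of the "keep only the biggest components" step, for anyone following along. The log output above looks like it came from a graph library; this version uses only the standard library, and the function names and the plain edge-list input are assumptions made for illustration, not the code that produced those numbers.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Return the components (sets of nodes) of an undirected graph,
    largest first. Nodes with no edges never appear in an edge list,
    so they are not counted here."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one component.
        seen.add(start)
        comp, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    comp.add(nb)
                    queue.append(nb)
        comps.append(comp)
    return sorted(comps, key=len, reverse=True)

def keep_top_components(edges, k, min_size=5):
    """Keep only the edges inside the k largest components of at least min_size nodes."""
    comps = [c for c in connected_components(edges) if len(c) >= min_size][:k]
    keep = set().union(*comps)
    return [(u, v) for u, v in edges if u in keep and v in keep]
```

Called as `keep_top_components(edges, k=20)`, this would correspond to the "top 20" cut described above, under the assumption that the graph is undirected.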
though I do worry that cleaning the data this way might make it seem suspicious when it isn't or vice versa
@AntiFish03 This one should run on your phone https://bitcadia.github.io/DownBallot/compare.html
now I have three fdp operations running to convert dot to svg at the same time
I already pulled my MBP out... so I am looking at the repo directly... just trying to first get a handle on what the code is doing before I try to wrap my head around the deluge of data
Indeed. I have a few things generated
That one is basically the Shiva data on county by county scale per state
I don't think I've ever run my CPU this hot lol
It's a giant cluster of data to try and analyze. I know this might sound weird, but have you tried doing statistical analysis for the mean, median, mode, quartiles, and outliers? It might let you look for out-of-the-ordinary things
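A rough sketch of what that kind of summary could look like, using only Python's `statistics` module. The 1.5 × IQR fence for outliers is a common convention chosen here as an assumption, and `summarize` is a hypothetical helper, not something from the repo.

```python
import statistics

def summarize(values):
    """Mean, median, mode(s), quartiles, and values outside the 1.5 * IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),
        "q1": q1,
        "q3": q3,
        "outliers": [v for v in values if v < low or v > high],
    }

# Made-up turnout percentages for the counties of one state.
turnout = [61, 63, 64, 64, 66, 67, 98]
print(summarize(turnout)["outliers"])
```

Run per state, the `outliers` list would be the short list of counties worth a closer look. A flagged value only means "unusual relative to its neighbors", nothing more.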
OK I found a `neato` option that _drastically_ speeds up generation
I'm really looking at this as a clean set of eyes at the moment, not knowing what has or hasn't been done, so I'm not trying to derail anything, just asking questions
there is nothing to derail heh
ask your questions
I've downloaded Tulip to see what it can do with this
someone on SO said it can handle "up to 100k nodes"
I'm just trying to wrap my head around what you have or haven't tried with this data. For example, comparing all the counties in a state, looking not at Trump v. Biden but just at the mean, median, and mode, and breaking out things like turnout rates that are way outside any other county's in the state
@realz you could filter by ratios that don't repeat more than X number of times
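One possible reading of that suggestion, sketched with `collections.Counter`: count how often each ratio occurs and keep only reports whose ratio repeats more than X times. The `(report_id, ratio)` tuple shape is an assumption, and ratios are stored as `Fraction` so that equal ratios compare equal exactly. Flipping the comparison gives the opposite reading (keep only the rare ratios).

```python
from collections import Counter
from fractions import Fraction

def keep_repeated_ratios(reports, x):
    """Keep reports whose ratio occurs more than x times in the data.

    reports: iterable of (report_id, ratio) pairs (hypothetical shape).
    """
    reports = list(reports)
    counts = Counter(ratio for _, ratio in reports)
    return [(rid, ratio) for rid, ratio in reports if counts[ratio] > x]

# Made-up reports: 1/3 shows up three times, 2/5 only once.
reports = [
    ("a", Fraction(1, 3)),
    ("b", Fraction(2, 6)),  # same value as 1/3 once reduced
    ("c", Fraction(2, 5)),
    ("d", Fraction(1, 3)),
]
print(keep_repeated_ratios(reports, x=2))
```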
Just thinking it might help to highlight where to actually look for issues, rather than drowning in the data
it gets rid of nodes that aren't connected to K other nodes
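For reference, dropping nodes that are not connected to at least K others can be sketched as below. This version prunes repeatedly, because removing one node can push its neighbors under K as well; what is left is usually called the k-core. Whether the tool mentioned above prunes once or repeatedly is not stated, so the repeated pruning is an assumption, as are the function name and the edge-list input.

```python
from collections import defaultdict

def prune_low_degree(edges, k):
    """Repeatedly drop nodes with fewer than k neighbors; return the surviving edges.

    Assumes an undirected graph with no self-loops.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Start with every node that is already below the threshold.
    queue = [n for n, nbrs in adj.items() if len(nbrs) < k]
    removed = set(queue)
    while queue:
        node = queue.pop()
        for nb in adj[node]:
            if nb in removed:
                continue
            adj[nb].discard(node)
            # Losing a neighbor may push nb under the threshold too.
            if len(adj[nb]) < k:
                removed.add(nb)
                queue.append(nb)
    return [(u, v) for u, v in edges if u not in removed and v not in removed]
```

On a triangle with a two-node tail hanging off it, `k=2` strips the tail and keeps the triangle.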
Ah...
You could divide the image up into bands of ratios
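A small sketch of the banding idea: assign each report to one of N equal-width bands by its ratio, then draw each band separately. It assumes ratios are floats in the range 0 to 1 and that reports are `(report_id, ratio)` pairs; both are guesses about the data, and the names are made up for illustration.

```python
from collections import defaultdict

def split_into_bands(reports, n_bands):
    """Group (report_id, ratio) pairs into n_bands equal-width bands over [0, 1]."""
    bands = defaultdict(list)
    for rid, ratio in reports:
        # A ratio of exactly 1.0 is clamped into the last band.
        index = min(int(ratio * n_bands), n_bands - 1)
        bands[index].append(rid)
    return dict(bands)

# Made-up data: four bands covering 0-0.25, 0.25-0.5, 0.5-0.75, 0.75-1.
reports = [("a", 0.1), ("b", 0.26), ("c", 0.5), ("d", 0.99), ("e", 1.0)]
print(split_into_bands(reports, 4))
```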
yea
I mean
`[1030, 1022, 996, 865, 762, 622, 610, 583, 493, 464, 421, 397, 384, 371, 347, 331, 330, 325, 315, 287]`
this is the top 20 hairballs
I can't split any one of these hairballs up
a ratio of 1030 appears in this data?
no
1030 reports are connected to each other as having the same ratio within the threshold delta time
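To make that rule concrete, here is a hedged sketch of how such edges could be built: two reports are linked when they carry the same ratio and their timestamps are at most `max_delta` apart. The `(report_id, ratio, timestamp)` layout, the numeric timestamps, and the function name are all assumptions, not the repo's actual code.

```python
from collections import defaultdict
from fractions import Fraction

def ratio_edges(reports, max_delta):
    """Link reports that share a ratio and are within max_delta of each other in time.

    reports: iterable of (report_id, ratio, timestamp) triples (hypothetical shape).
    """
    by_ratio = defaultdict(list)
    for rid, ratio, ts in reports:
        by_ratio[ratio].append((ts, rid))
    edges = []
    for group in by_ratio.values():
        group.sort()  # order by timestamp so the inner loop can stop early
        for i, (t_i, id_i) in enumerate(group):
            for t_j, id_j in group[i + 1:]:
                if t_j - t_i > max_delta:
                    break
                edges.append((id_i, id_j))
    return edges

# Made-up reports: a and b share 1/3 and are 5 apart; c is too late; d has another ratio.
reports = [
    ("a", Fraction(1, 3), 0),
    ("b", Fraction(1, 3), 5),
    ("c", Fraction(1, 3), 20),
    ("d", Fraction(2, 5), 1),
]
print(ratio_edges(reports, max_delta=10))
```

Chains matter here: if a links to b and b links to c, all three land in one connected component even when a and c are too far apart to be linked directly, which is one way a component can grow to a thousand reports.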
Ah I see
those 1030 reports must show up on the same graph in order to have meaning
I am hoping I can fit 20 of them