Message from @realz

```
G.number_of_nodes(): 90259
G.number_of_edges(): 313771
len(connected_component_sizes): 31528
len(connected_component_sizes): 2092
connected_component_sizes[:20] [1030, 1022, 996, 865, 762, 622, 610, 583, 493, 464, 421, 397, 384, 371, 347, 331, 330, 325, 315, 287]
Gp.number_of_nodes(): 10955
Gp.number_of_edges(): 164456

```

realz

2020-11-25 08:36:23 UTC

by picking the top 20 connected components, I got 1/2 the edges

realz

2020-11-25 08:36:42 UTC

though I do worry that cleaning the data this way might make it seem suspicious when it isn't or vice versa

DrSammyD

2020-11-25 08:37:04 UTC

@AntiFish03 This one should run on your phone https://bitcadia.github.io/DownBallot/compare.html

realz

2020-11-25 08:37:26 UTC

now I have three fdp operations running to convert dot to svg at the same time

AntiFish03

2020-11-25 08:39:49 UTC

I already pulled my MBP out... so I am looking at the repo directly... just trying to first get a handle on what the code is doing before I try to wrap my head around the deluge of data

DrSammyD

2020-11-25 08:40:16 UTC

Indeed. I have a few things generated

DrSammyD

2020-11-25 08:40:35 UTC

That one is basically the Shiva data on county by county scale per state

realz

2020-11-25 08:43:35 UTC

I don't think I've ever run my CPU this hot lol

AntiFish03

2020-11-25 08:43:43 UTC

It's a giant cluster of data to try and analyze. I know this might sound weird, but have you tried doing statistical analysis for mean, median, mode, quartiles and the outliers? It might let you look for out-of-the-ordinary things

realz

2020-11-25 08:45:11 UTC

OK I found a `neato` option that _drastically_ speeds up generation

AntiFish03

2020-11-25 08:45:20 UTC

I'm really looking at this with a clean set of eyes at the moment, not knowing what has or hasn't been done, so I'm not trying to derail anything, just asking questions

realz

2020-11-25 08:46:23 UTC

there is nothing to derail heh

realz

2020-11-25 08:46:28 UTC

ask your questions

realz

2020-11-25 08:47:51 UTC

(gephi is still frozen lol)

realz

2020-11-25 08:48:16 UTC

I've downloaded Tulip to see what it can do with this

realz

2020-11-25 08:48:32 UTC

someone on SO said it can handle "up to 100k nodes"

AntiFish03

2020-11-25 08:49:15 UTC

I'm just trying to wrap my head around what you have or haven't tried when looking at this data... like comparing all the counties in a state, looking not at trump v. biden but just at the mean, median and mode, and breaking out things like turnout rates that are way outside any other county in the state
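
A rough sketch of the kind of per-state check described here, assuming turnout rates are available as a county-to-rate mapping. The function name `turnout_outliers`, the 1.5 × IQR cutoff, and the sample numbers are all illustrative assumptions, not anything from the repo or the real data.

```python
import statistics

def turnout_outliers(turnout_by_county, k=1.5):
    """Summarize turnout rates and flag counties outside the IQR fences.

    turnout_by_county: dict mapping county name -> turnout rate (hypothetical input).
    k: fence multiplier; 1.5 is the conventional Tukey choice, assumed here.
    """
    values = list(turnout_by_county.values())
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = {
        county: rate
        for county, rate in turnout_by_county.items()
        if rate < low or rate > high
    }
    summary = {
        "mean": statistics.mean(values),
        "median": median,
        "q1": q1,
        "q3": q3,
    }
    return summary, outliers

# Made-up turnout rates, for illustration only
sample = {"A": 0.61, "B": 0.64, "C": 0.66, "D": 0.63,
          "E": 0.65, "F": 0.62, "G": 0.97}
summary, outliers = turnout_outliers(sample)
print(summary)
print(outliers)  # only county "G" falls outside the fences
```

A flagged county is only a pointer to where to look first; a high or low value by itself says nothing about why it is unusual.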

DrSammyD

2020-11-25 08:50:24 UTC

@realz you could filter by ratios that don't repeat more than X number of times
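
One possible reading of this suggestion, sketched under assumptions: drop every ratio that does not occur more than X times and keep the rest. The helper name `frequent_ratios`, the use of `Fraction` to normalize equivalent ratios, and the sample values are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def frequent_ratios(ratios, min_repeats):
    """Keep only ratios that occur more than min_repeats times.

    ratios: iterable of hashable ratio values (Fraction is used here so
    that 1/2 and 2/4 count as the same ratio).
    """
    counts = Counter(ratios)
    return {ratio: n for ratio, n in counts.items() if n > min_repeats}

# Made-up ratios, for illustration only
sample = [Fraction(1, 2), Fraction(2, 4), Fraction(3, 6),
          Fraction(2, 3), Fraction(5, 7)]
kept = frequent_ratios(sample, min_repeats=2)
print(kept)  # only 1/2 appears more than twice
```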

AntiFish03

2020-11-25 08:50:45 UTC

Just thinking that trying to highlight points of where to actually look for issues rather than just trying to drown in the data

realz

2020-11-25 08:50:57 UTC

@DrSammyD the connected components filter effectively does that

realz

2020-11-25 08:51:09 UTC

it gets rid of nodes that aren't connected to K other nodes
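
The two component filters mentioned in this thread (dropping small components, and keeping only the largest N) can be sketched in plain Python as below. This is not realz's actual code: the `G.number_of_nodes()` output earlier suggests networkx, whose `connected_components` would do the grouping directly. The names `filter_components`, `min_size`, and `top_n` are assumptions, and the graph is taken as an undirected edge list, so isolated nodes with no edges are not represented.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Yield each connected component of an undirected graph as a set of nodes."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects its component
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        yield component

def filter_components(edges, min_size=1, top_n=None):
    """Return the edges that lie in components passing both filters."""
    components = sorted(connected_components(edges), key=len, reverse=True)
    components = [c for c in components if len(c) >= min_size]
    if top_n is not None:
        components = components[:top_n]
    keep = set().union(*components)
    return [(u, v) for u, v in edges if u in keep and v in keep]

# Toy graph: components of size 3, 2 and 3
edges = [(1, 2), (2, 3), (3, 1), (4, 5), (6, 7), (7, 8)]
at_least_three = filter_components(edges, min_size=3)
largest_only = filter_components(edges, top_n=1)
print(len(at_least_three), len(largest_only))
```

Under this reading, "connected to K other nodes" corresponds to `min_size = K + 1`. The 20 sizes listed in the output at the top sum to 10955, which matches `Gp.number_of_nodes()`, so `Gp` looks like the `top_n=20` case.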