Message from @AntiFish03

Discord ID: 781079418654949386


2020-11-25 08:36:00 UTC  

```
G.number_of_nodes(): 90259
G.number_of_edges(): 313771
len(connected_component_sizes): 31528
len(connected_component_sizes): 2092
connected_component_sizes[:20] [1030, 1022, 996, 865, 762, 622, 610, 583, 493, 464, 421, 397, 384, 371, 347, 331, 330, 325, 315, 287]
Gp.number_of_nodes(): 10955
Gp.number_of_edges(): 164456

```

2020-11-25 08:36:23 UTC  

by picking the top 20 connected components, I got 1/2 the edges
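A minimal sketch of that filtering step, assuming the graph is available as an undirected edge list. The helper names `connected_components` and `keep_top_components` and the toy edges are hypothetical, not taken from the repo. With networkx the same idea is usually written with `nx.connected_components` and `G.subgraph`; this stdlib version just makes the mechanics visible.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Return the connected components of an undirected graph as a list of node sets."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search from each unvisited node collects one component.
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components

def keep_top_components(edges, k):
    """Keep only the edges whose endpoints lie in the k largest components."""
    largest = sorted(connected_components(edges), key=len, reverse=True)[:k]
    kept_nodes = set().union(*largest) if largest else set()
    return [(u, v) for u, v in edges if u in kept_nodes and v in kept_nodes]

# Toy graph (made-up data): a 4-node cycle, a 3-node chain, and an isolated pair.
toy_edges = [(1, 2), (2, 3), (3, 4), (4, 1), (5, 6), (6, 7), (8, 9)]
kept = keep_top_components(toy_edges, k=2)
print(len(kept), "of", len(toy_edges), "edges kept")
```

Nodes with no edges at all never appear in an edge list, so they are dropped implicitly here.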

2020-11-25 08:36:42 UTC  

though I do worry that cleaning the data this way might make it seem suspicious when it isn't or vice versa

2020-11-25 08:37:26 UTC  

now I have three fdp operations running to convert dot to svg at the same time

2020-11-25 08:39:49 UTC  

I already pulled my MBP out... so I am looking at the repo directly... just trying to first get a handle on what the code is doing before I try to wrap my head around the deluge of data

2020-11-25 08:40:16 UTC  

Indeed. I have a few things generated

2020-11-25 08:40:35 UTC  

That one is basically the Shiva data on county by county scale per state

2020-11-25 08:43:35 UTC  

I don't think I've ever run my CPU this hot lol

2020-11-25 08:43:43 UTC  

It's a giant cluster of data to try and analyze. I know this might sound weird, but have you tried doing statistical analysis for mean, median, mode, quartiles and the outliers? It might let you look for out-of-the-ordinary things

2020-11-25 08:45:11 UTC  

OK I found a `neato` option that _drastically_ speeds up generation

2020-11-25 08:45:20 UTC  

I'm really looking at this as a clean set of eyes at the moment, not knowing what has or hasn't been done, so I'm not trying to derail anything, just asking questions

2020-11-25 08:46:23 UTC  

there is nothing to derail heh

2020-11-25 08:46:28 UTC  

ask your questions

2020-11-25 08:47:51 UTC  

(gephi is still frozen lol)

2020-11-25 08:48:16 UTC  

I've downloaded Tulip to see what it can do with this

2020-11-25 08:48:32 UTC  

someone on SO said it can handle "up to 100k nodes"

2020-11-25 08:49:15 UTC  

I'm just trying to wrap my head around what you have or haven't tried on this data... like comparing all the counties in a state, looking not at Trump v. Biden but just at the mean, median and mode, and flagging things like turnout rates that are way outside any other county in the state
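One way that per-state check could look, as a rough sketch using only the standard `statistics` module. The county names, turnout values, and the helper `turnout_outliers` are all made up for illustration, and the 1.5 × IQR fence is a common convention, not something anyone in the chat specified.

```python
import statistics

def turnout_outliers(turnout_by_county, fence=1.5):
    """Flag counties whose turnout falls outside the Tukey IQR fences."""
    values = list(turnout_by_county.values())
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - fence * iqr, q3 + fence * iqr
    return {c: t for c, t in turnout_by_county.items() if t < low or t > high}

# Hypothetical turnout rates for the counties of one state.
turnout = {
    "A": 0.61, "B": 0.64, "C": 0.66, "D": 0.63,
    "E": 0.65, "F": 0.62, "G": 0.67, "H": 0.97,
}

print("mean:  ", round(statistics.mean(turnout.values()), 4))
print("median:", round(statistics.median(turnout.values()), 4))
print("modes: ", statistics.multimode(round(t, 2) for t in turnout.values()))
print("outliers:", turnout_outliers(turnout))
```

A flagged county is only a place to look more closely; small or unusual counties can land outside the fences for ordinary reasons.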

2020-11-25 08:50:24 UTC  

@realz you could filter by ratios that don't repeat more than X number of times
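A small sketch of one reading of that suggestion: drop any ratio that does not occur more than X times and keep the rest. The function name `frequent_ratios`, the rounding precision, and the sample values are assumptions for illustration only.

```python
from collections import Counter

def frequent_ratios(ratios, min_repeats, ndigits=4):
    """Return {ratio: count} for ratios that occur more than min_repeats times."""
    # Round first so tiny floating-point differences do not split one ratio in two.
    counts = Counter(round(r, ndigits) for r in ratios)
    return {ratio: n for ratio, n in counts.items() if n > min_repeats}

# Made-up ratios: 0.5 appears three times, 0.25 twice, 0.75 once.
sample = [0.5, 0.5, 0.5, 0.25, 0.25, 0.75]
print(frequent_ratios(sample, min_repeats=2))
```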

2020-11-25 08:50:45 UTC  

Just thinking it might help to highlight where to actually look for issues rather than just drowning in the data

2020-11-25 08:50:57 UTC  

@DrSammyD the connected components filter effectively does that

2020-11-25 08:51:09 UTC  

it gets rid of nodes that aren't connected to K other nodes

2020-11-25 08:51:17 UTC  

Ah...

2020-11-25 08:51:43 UTC  

You could divide the image up into bands of ratios

2020-11-25 08:51:52 UTC  

yea

2020-11-25 08:52:05 UTC  

I mean

2020-11-25 08:52:13 UTC  

`[1030, 1022, 996, 865, 762, 622, 610, 583, 493, 464, 421, 397, 384, 371, 347, 331, 330, 325, 315, 287]`

2020-11-25 08:52:17 UTC  

these are the top 20 hairballs

2020-11-25 08:52:38 UTC  

I can't split any one of these hairballs up

2020-11-25 08:52:40 UTC  

a ratio of 1030 appears in this data?

2020-11-25 08:52:44 UTC  

no

2020-11-25 08:53:01 UTC  

1030 reports are connected to each other as having the same ratio within the threshold delta time
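To make that edge rule concrete, here is a hedged sketch of how such a graph might be built. The tuple layout `(report_id, ratio, timestamp_seconds)`, the function `same_ratio_edges`, and the exact-equality comparison of ratios are all assumptions; the chat does not say how the real code compares ratios or stores timestamps.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

def same_ratio_edges(reports, max_delta_seconds):
    """Link every pair of reports that share a ratio and lie within the time threshold."""
    by_ratio = defaultdict(list)
    for report_id, ratio, timestamp in reports:
        by_ratio[ratio].append((timestamp, report_id))
    edges = []
    for group in by_ratio.values():
        group.sort()  # earliest first, so ts_b >= ts_a in every pair below
        for (ts_a, id_a), (ts_b, id_b) in combinations(group, 2):
            if ts_b - ts_a <= max_delta_seconds:
                edges.append((id_a, id_b))
    return edges

# Made-up reports: r1 and r2 share 1/2 within 60 s; r3 shares it but is too late.
reports = [
    ("r1", Fraction(1, 2), 0),
    ("r2", Fraction(1, 2), 30),
    ("r3", Fraction(1, 2), 500),
    ("r4", Fraction(1, 4), 10),
]
print(same_ratio_edges(reports, max_delta_seconds=60))
```

Each connected component of the resulting edge list is one "hairball": a set of reports chained together by shared ratios, which is why it only makes sense when drawn as a whole.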

2020-11-25 08:53:10 UTC  

Ah I see

2020-11-25 08:53:27 UTC  

those 1030 reports must show up on the same graph in order to have meaning

2020-11-25 08:53:41 UTC  

I am hoping I can fit 20 of them

2020-11-25 08:53:54 UTC  

the thing is, without the full graph, we won't see how rare this is

2020-11-25 08:53:58 UTC  

we are selecting for it

2020-11-25 08:56:13 UTC  

Could you imagine if it appeared here as we're doing this, but never in other cities...

2020-11-25 08:56:41 UTC  

"So much fraud, we couldn't even render it"

2020-11-25 08:56:46 UTC  

lol