Message from @porco

Discord ID: 463368293240274948


2018-07-02 15:36:55 UTC  

and 24 cores

2018-07-02 15:37:12 UTC  

but really, splitting the database would only make it slower

2018-07-02 15:37:46 UTC  

my formula to decide if a video is relevant right now is `w = (log(dtf)+1)/sumdtf * U/(1+0.0115*U) * log((N-nf)/nf)`

2018-07-02 15:37:54 UTC  

```
dtf is the number of times the term appears in the document
sumdtf is the sum of (log(dtf)+1)'s for all terms in the same document
U is the number of Unique terms in the document
N is the total number of documents
nf is the number of documents that contain the term
```
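A minimal sketch of how that weight could be computed for one document, going only by the definitions above. The function name, the input shapes, the use of the natural log, and returning 0 when a term appears in every document are all assumptions for illustration, not the site's actual code.

```python
import math
from collections import Counter

PIVOT = 0.0115  # the constant from the formula above


def term_weights(doc_terms, doc_freq, total_docs):
    """Return {term: w} for a single document.

    doc_terms  -- list of terms in the document (hypothetical input shape)
    doc_freq   -- dict mapping term -> nf (documents containing the term)
    total_docs -- N, the total number of documents
    """
    counts = Counter(doc_terms)                      # dtf per term
    log_tf = {t: math.log(c) + 1 for t, c in counts.items()}
    sumdtf = sum(log_tf.values())                    # sum of (log(dtf)+1)
    unique = len(counts)                             # U

    weights = {}
    for term, ltf in log_tf.items():
        nf = doc_freq[term]
        if nf >= total_docs:
            # log((N-nf)/nf) is undefined here; assumption: treat as weight 0
            weights[term] = 0.0
            continue
        idf = math.log((total_docs - nf) / nf)
        weights[term] = ltf / sumdtf * unique / (1 + PIVOT * unique) * idf
    return weights


weights = term_weights(["a", "a", "b"], {"a": 1, "b": 5}, total_docs=10)
```

Note that the last factor goes negative once a term appears in more than half of all documents, so very common terms pull a document's score down rather than up.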

2018-07-02 15:38:06 UTC  

>math
🤢

2018-07-02 15:38:08 UTC  

```
U is the number of Unique terms in the document
N is the total number of documents
nf is the number of documents that contain the term
```

2018-07-02 15:38:12 UTC  

these won't work if I split the db

2018-07-02 15:38:18 UTC  

and you will get shit results
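A rough illustration of why, using made-up numbers: if each shard only knows its own `N` and `nf`, the `log((N-nf)/nf)` factor for the same term comes out different on every shard, so scores from different shards can't be compared directly. The corpus sizes and the `idf` helper below are hypothetical.

```python
import math


def idf(total_docs, docs_with_term):
    # the log((N-nf)/nf) factor from the formula above
    return math.log((total_docs - docs_with_term) / docs_with_term)


# Hypothetical corpus: 1000 documents, the term appears in 10 of them.
global_idf = idf(1000, 10)

# Same corpus split into two shards of 500 documents each,
# where 9 of the matches happen to land in shard A and 1 in shard B.
shard_a_idf = idf(500, 9)
shard_b_idf = idf(500, 1)

print(global_idf, shard_a_idf, shard_b_idf)
```

The same term looks less rare on shard A and much rarer on shard B than it really is across the whole database.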

2018-07-02 15:38:34 UTC  

my way right now is OK, the problem is the data quality

2018-07-02 15:38:50 UTC  

for this, I ordered myself a nice GTX 1080 TI to put into the server that I'm waiting for

2018-07-02 15:39:11 UTC  

about 30% of my database is tagged and categorized correctly, the rest is trash

2018-07-02 15:39:22 UTC  

I am working on a ML model to categorize the rest

2018-07-02 15:39:31 UTC  

1080ti? damn

2018-07-02 15:39:39 UTC  

I'm dumb so how come a GPU will help with things? <:GWchinaSakuraThinking:398950680217255977>

2018-07-02 15:39:44 UTC  

ml accel

2018-07-02 15:39:51 UTC  

Or will it boost the machine learning shit

2018-07-02 15:40:00 UTC  

look at their examples

2018-07-02 15:40:05 UTC  

like the titanic survival rate

2018-07-02 15:40:08 UTC  

Never got too deep into it other than a forced "something" tutorial on MATLAB

2018-07-02 15:40:10 UTC  

they are self-explanatory

2018-07-02 15:40:37 UTC  

Though, ideally, you should be grabbing something like Volta if you move to a different ML framework, like TensorFlow

2018-07-02 15:40:39 UTC  

ohshit

2018-07-02 15:40:44 UTC  

I didn't know C# had ML

2018-07-02 15:40:51 UTC  

I always thought that for ML you'd use python and only python

2018-07-02 15:40:59 UTC  

my priorities right now are

- get better hardware
- improve site performance
- improve database quality to deliver better search results
- add more features

2018-07-02 15:41:10 UTC  

add more features would be your filters

2018-07-02 15:41:19 UTC  

`- add more features`
pornspider pass <:GWnanaPepoHype:392308469488680961>

2018-07-02 15:41:32 UTC  

but since I do this in my free time and you can see I have more important tasks before that, it might take a while

2018-07-02 15:42:04 UTC  

because filters are useless if half my datasets don't have a duration in the first place

2018-07-02 15:42:14 UTC  

first, I need to increase the data quality

2018-07-02 15:42:38 UTC  

Also .NET ML is much nicer to use than python

2018-07-02 15:42:40 UTC  

give it a try

2018-07-02 15:42:50 UTC  

Microsoft is doing everything right except for windows, these days

2018-07-02 15:42:51 UTC  

I will, once I finish the MVC course

2018-07-02 15:43:16 UTC  

MS is doing Windows correctly

2018-07-02 15:43:20 UTC  

>

2018-07-02 15:43:20 UTC  

just not consumer Windows

2018-07-02 15:43:24 UTC  

>

2018-07-02 15:43:31 UTC  

Windows Server sucks ass

2018-07-02 15:43:40 UTC  

there's a reason I'm running .NET pornspider on Linux with mono