Message from @porco

Discord ID: 463367476013432833


2018-07-02 15:33:23 UTC  

Even basic shit like `longer than 20 minutes`

2018-07-02 15:33:38 UTC  

Since it easily cuts down shit like trailer clips and the like

2018-07-02 15:33:40 UTC  

yes, as I said, my server sucks. Pornspider has 20k views/day and runs on 2 GB of RAM

2018-07-02 15:33:47 UTC  

I really can't add any search complexity

2018-07-02 15:33:58 UTC  

>open pornspider
>click teen
>teenage guys masturbating in their own mouth

2018-07-02 15:34:00 UTC  

<:woah:333623269674713098>

2018-07-02 15:34:29 UTC  

Not even basic ```csharp
if (video.timeDuration > userFilter) { return video; }```

2018-07-02 15:34:49 UTC  

Though I have no idea about hardware usage of pornspider

2018-07-02 15:35:06 UTC  

if I do it exactly like that, it will slow down your search by about 15-20 minutes per request, because it would have to iterate through all 15 million videos

2018-07-02 15:35:22 UTC  

I use a lot of btree indexes
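The point about indexes can be sketched like this (Python here rather than C#, and a plain sorted list standing in for a real B-tree; the durations are made-up numbers): an indexed duration column lets the filter binary-search to the cutoff instead of scanning every row.

```python
import bisect

# Hypothetical durations in seconds; a B-tree index keeps its keys
# sorted, which a sorted list approximates for illustration.
durations = sorted([90, 300, 600, 1260, 1500, 2400])

# "longer than 20 minutes": find the first key past 1200 s in
# O(log n), instead of iterating through all 15 million videos.
cut = bisect.bisect_right(durations, 20 * 60)
longer_than_20_min = durations[cut:]
```

The same idea is why the unindexed version of the filter would cost a full scan per request.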

2018-07-02 15:35:27 UTC  

What about splitting the database?

2018-07-02 15:35:40 UTC  

0-5 > db1
10-15 > db2
20+ > db3?

2018-07-02 15:35:43 UTC  

then I can't use the vector-space formula anymore if I chunk the database

2018-07-02 15:35:48 UTC  

I just need more ram

2018-07-02 15:35:53 UTC  

download more RAM

2018-07-02 15:35:59 UTC  

I ordered hardware on ebay

2018-07-02 15:36:06 UTC  

$3,200 in server hardware

2018-07-02 15:36:18 UTC  

and I rented colocation space for $160/month near my home

2018-07-02 15:36:29 UTC  

then, I will be able to add whatever filters are possible

2018-07-02 15:36:38 UTC  

because I will have 196 GB of RAM

2018-07-02 15:36:55 UTC  

and 24 cores

2018-07-02 15:37:12 UTC  

but really, splitting the database would only make it slower

2018-07-02 15:37:46 UTC  

my formula to decide if a video is relevant right now is `w = (log(dtf)+1)/sumdtf * U/(1+0.0115*U) * log((N-nf)/nf)`

2018-07-02 15:37:54 UTC  

```
dtf is the number of times the term appears in the document
sumdtf is the sum of (log(dtf)+1)'s for all terms in the same document
U is the number of Unique terms in the document
N is the total number of documents
nf is the number of documents that contain the term
```

2018-07-02 15:38:06 UTC  

>math
🤢

2018-07-02 15:38:08 UTC  

```
U is the number of Unique terms in the document
N is the total number of documents
nf is the number of documents that contain the term
```

2018-07-02 15:38:12 UTC  

these won't work if I split the db

2018-07-02 15:38:18 UTC  

and you will get shit results
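A toy illustration of why (all counts made up): the `log((N-nf)/nf)` factor depends on corpus-wide totals, so after a split the same term gets a different weight in each shard and scores from different shards are no longer comparable.

```python
import math

# Whole corpus: 15M documents, the term appears in 300k of them.
N, nf = 15_000_000, 300_000
idf_global = math.log((N - nf) / nf)                   # log(49), about 3.89

# One shard after splitting: 5M documents, 250k contain the term.
N_shard, nf_shard = 5_000_000, 250_000
idf_shard = math.log((N_shard - nf_shard) / nf_shard)  # log(19), about 2.94
```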

2018-07-02 15:38:34 UTC  

my way right now is OK; the problem is the data quality

2018-07-02 15:38:50 UTC  

for this, I ordered myself a nice GTX 1080 Ti to put into the server that I'm waiting for

2018-07-02 15:39:11 UTC  

about 30% of my database is tagged and categorized correctly, the rest is trash

2018-07-02 15:39:22 UTC  

I am working on an ML model to categorize the rest

2018-07-02 15:39:31 UTC  

1080ti? damn

2018-07-02 15:39:39 UTC  

I'm dumb so how come a GPU will help with things? <:GWchinaSakuraThinking:398950680217255977>

2018-07-02 15:39:44 UTC  

ml accel

2018-07-02 15:39:51 UTC  

Or will it boost the machinelearning shit

2018-07-02 15:40:00 UTC  

look at their examples

2018-07-02 15:40:05 UTC  

like the Titanic survival rate

2018-07-02 15:40:08 UTC  

Never got too deep into it other than a forced "something" tutorial on MATLAB

2018-07-02 15:40:10 UTC  

they are self-explanatory