Message from @porco

Discord ID: 463367020340183067


2018-07-02 15:32:09 UTC  

Domain is .to iirc

2018-07-02 15:32:12 UTC  

it's 5 weeks old now

2018-07-02 15:32:15 UTC  
2018-07-02 15:32:20 UTC  

<#189811441174446080>

2018-07-02 15:32:22 UTC  

jk

2018-07-02 15:32:49 UTC  

if you get that one extra user from czech republic now

2018-07-02 15:32:51 UTC  

stop stalking me

2018-07-02 15:32:53 UTC  

thanks

2018-07-02 15:33:00 UTC  

and I actually had someone join the team, a ~~meme~~ machine learning specialist

2018-07-02 15:33:01 UTC  

But yeah I'd use more and attribute towards stats but right now I don't think I can use pornspider easily due to filters

2018-07-02 15:33:07 UTC  

As much as I meme about them, they are useful

2018-07-02 15:33:13 UTC  

filters?

2018-07-02 15:33:23 UTC  

Even basic shit like `longer than 20 minutes`

2018-07-02 15:33:38 UTC  

Since it easily cuts down shit like trailer clips and the like

2018-07-02 15:33:40 UTC  

yes as I said, my server sucks. Pornspider has 20k views / day and runs on 2 GB Ram

2018-07-02 15:33:47 UTC  

I really can't add any search complexity

2018-07-02 15:33:58 UTC  

>open pornspider
>click teen
>teenage guys masturbating in their own mouth

2018-07-02 15:34:00 UTC  

<:woah:333623269674713098>

2018-07-02 15:34:29 UTC  

Not even basic
```csharp
// Keep the video only if it is longer than the user's minimum duration
if (video.timeDuration > userFilter) { return video; }
```

2018-07-02 15:34:49 UTC  

Though I have no idea about hardware usage of pornspider

2018-07-02 15:35:06 UTC  

if I do it exactly like that, it will slow down your search by about 15-20 min / request because it would have to iterate through 15 million videos

2018-07-02 15:35:22 UTC  

I use a lot of btree indexes
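
A minimal sketch of why an index matters for a filter like `longer than 20 minutes`: a plain scan checks every row on each request, while a sorted index (a simple stand-in for a B-tree here) finds the cut-off with a binary search. The `videos` list, its field layout, and both function names are hypothetical and not Pornspider's actual schema. It covers the duration filter in isolation only; combining it with relevance ranking is a separate cost.

```python
import bisect

# Hypothetical data: (video_id, duration_in_seconds). Not the real schema.
videos = [(1, 95), (2, 1500), (3, 30), (4, 2400), (5, 1200)]


def filter_scan(rows, min_seconds):
    """Naive filter: looks at every row, so cost grows with the table size."""
    return [vid for vid, dur in rows if dur > min_seconds]


# Built once: rows sorted by duration, standing in for a B-tree index.
by_duration = sorted(videos, key=lambda row: row[1])
durations = [dur for _, dur in by_duration]


def filter_indexed(min_seconds):
    """Binary search for the first duration strictly above the cut-off."""
    start = bisect.bisect_right(durations, min_seconds)
    return [vid for vid, _ in by_duration[start:]]


print(sorted(filter_scan(videos, 1200)))  # [2, 4]
print(sorted(filter_indexed(1200)))       # [2, 4]
```

Both calls return the same ids; the indexed version only touches the rows past the cut-off instead of the whole table.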

2018-07-02 15:35:27 UTC  

What about splitting the database?

2018-07-02 15:35:40 UTC  

0-5 > db1
10-15 > db2
20+ > db3?

2018-07-02 15:35:43 UTC  

then I can't use the vector-space formula anymore if I chunk the database

2018-07-02 15:35:48 UTC  

I just need more ram

2018-07-02 15:35:53 UTC  

download mowe wam

2018-07-02 15:35:59 UTC  

I ordered hardware on ebay

2018-07-02 15:36:06 UTC  

3200 $ in server hardware

2018-07-02 15:36:18 UTC  

and I rented a colocation space for 160 $ / month near my home

2018-07-02 15:36:29 UTC  

then, I will be able to add whatever filters are possible

2018-07-02 15:36:38 UTC  

because I will have 196 GB ram

2018-07-02 15:36:55 UTC  

and 24 cores

2018-07-02 15:37:12 UTC  

but really, splitting the database would only make it slower

2018-07-02 15:37:46 UTC  

my formula to decide if a video is relevant right now is `w = (log(dtf)+1)/sumdtf * U/(1+0.0115*U) * log((N-nf)/nf)`

2018-07-02 15:37:54 UTC  

```
dtf is the number of times the term appears in the document
sumdtf is the sum of (log(dtf)+1)'s for all terms in the same document
U is the number of Unique terms in the document
N is the total number of documents
nf is the number of documents that contain the term
```
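
The formula and the definitions above can be sketched in Python roughly as follows. This assumes `log` means the natural logarithm (the chat does not say which base is used), and the function names `term_weight` and `document_weights` are made up for illustration.

```python
import math


def term_weight(dtf, sumdtf, U, N, nf):
    """w = (log(dtf)+1)/sumdtf * U/(1+0.0115*U) * log((N-nf)/nf)

    Assumes natural log; the base is not stated in the original message.
    """
    tf_part = (math.log(dtf) + 1) / sumdtf      # term frequency, normalized per document
    length_part = U / (1 + 0.0115 * U)          # factor based on unique terms in the document
    idf_part = math.log((N - nf) / nf)          # rarer terms across all documents weigh more
    return tf_part * length_part * idf_part


def document_weights(term_counts, N, doc_freq):
    """Weights for every term of one document.

    term_counts: {term: dtf} for this document (hypothetical input shape)
    doc_freq:    {term: nf} over the whole collection
    """
    sumdtf = sum(math.log(count) + 1 for count in term_counts.values())
    U = len(term_counts)
    return {
        term: term_weight(count, sumdtf, U, N, doc_freq[term])
        for term, count in term_counts.items()
    }


# Made-up numbers: a two-term document in a collection of 1000 documents.
weights = document_weights({"rare": 1, "common": 1}, 1000, {"rare": 10, "common": 400})
print(weights["rare"] > weights["common"])  # True
```

Note that `idf_part` turns negative once a term appears in more than half of all documents, since `(N-nf)/nf` then drops below 1.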

2018-07-02 15:38:06 UTC  

>math
🤢

2018-07-02 15:38:08 UTC  

```
U is the number of Unique terms in the document
N is the total number of documents
nf is the number of documents that contain the term
```

2018-07-02 15:38:12 UTC  

these won't work if I split the db

2018-07-02 15:38:18 UTC  

and you will get shit results
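
A small sketch of why `N` and `nf` stop working after a split: the `log((N-nf)/nf)` factor is computed from counts over the whole collection, so the same term gets a different weight in each chunk, and scores from different chunks are no longer comparable. The shard names and counts below are invented for illustration, and natural log is assumed.

```python
import math


def idf(N, nf):
    """The log((N-nf)/nf) factor from the formula, assuming natural log."""
    return math.log((N - nf) / nf)


# Hypothetical split by duration:
# shard name -> (documents in shard, documents in shard containing the term)
shards = {"short": (1000, 400), "long": (1000, 10)}

N_total = sum(n for n, _ in shards.values())
nf_total = sum(nf for _, nf in shards.values())

global_idf = idf(N_total, nf_total)
shard_idf = {name: idf(n, nf) for name, (n, nf) in shards.items()}

print(f"global: {global_idf:.3f}")
for name, value in shard_idf.items():
    print(f"{name}:  {value:.3f}")
```

With these numbers the term looks common in the `short` chunk and rare in the `long` one, so neither chunk reproduces the weight computed over the full database.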

2018-07-02 15:38:34 UTC  

my way right now is OK the problem is the data quality