Message from @porco

Discord ID: 463367476013432833


2018-07-02 15:33:23 UTC  

Even basic shit like `longer than 20 minutes`

2018-07-02 15:33:38 UTC  

Since it easily cuts down shit like trailer clips and the like

2018-07-02 15:33:40 UTC  

yes, as I said, my server sucks. Pornspider has 20k views/day and runs on 2 GB of RAM

2018-07-02 15:33:47 UTC  

I really can't add any search complexity

2018-07-02 15:33:58 UTC  

>open pornspider
>click teen
>teenage guys masturbating in their own mouth

2018-07-02 15:34:00 UTC  

<:woah:333623269674713098>

2018-07-02 15:34:29 UTC  

Not even basic ```csharp
if (video.timeDuration > userFilter) { return video; }```

2018-07-02 15:34:49 UTC  

Though I have no idea about hardware usage of pornspider

2018-07-02 15:35:06 UTC  

if I do it exactly like that, it will slow down your search by about 15-20 minutes per request, because it would have to iterate through all 15 million videos

2018-07-02 15:35:22 UTC  

I use a lot of btree indexes
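The point about indexes can be sketched like this (Python here rather than C#, and a plain sorted list standing in for a real B-tree; the durations are made-up numbers): an indexed duration column lets the filter binary-search to the cutoff instead of scanning every row.

```python
import bisect

# Hypothetical durations in seconds; a B-tree index keeps its keys
# sorted, which a sorted list approximates for illustration.
durations = sorted([90, 300, 600, 1260, 1500, 2400])

# "longer than 20 minutes": find the first key past 1200 s in
# O(log n), instead of iterating through all 15 million videos.
cut = bisect.bisect_right(durations, 20 * 60)
longer_than_20_min = durations[cut:]
```

The same idea is why the unindexed version of the filter would cost a full scan per request.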

2018-07-02 15:35:27 UTC  

What about splitting the database?

2018-07-02 15:35:40 UTC  

0-5 > db1
10-15 > db2
20+ > db3?

2018-07-02 15:35:43 UTC  

then I can't use the vector-space formula anymore if I chunk the database

2018-07-02 15:35:48 UTC  

I just need more ram

2018-07-02 15:35:53 UTC  

download more RAM

2018-07-02 15:35:59 UTC  

I ordered hardware on ebay

2018-07-02 15:36:06 UTC  

$3,200 in server hardware

2018-07-02 15:36:18 UTC  

and I rented colocation space for $160/month near my home

2018-07-02 15:36:29 UTC  

then, I will be able to add whatever filters are possible

2018-07-02 15:36:38 UTC  

because I will have 196 GB of RAM

2018-07-02 15:36:55 UTC  

and 24 cores

2018-07-02 15:37:12 UTC  

but really, splitting the database would only make it slower

2018-07-02 15:37:46 UTC  

my formula to decide if a video is relevant right now is `w = (log(dtf)+1)/sumdtf * U/(1+0.0115*U) * log((N-nf)/nf)`

2018-07-02 15:37:54 UTC  

```
dtf is the number of times the term appears in the document
sumdtf is the sum of (log(dtf)+1)'s for all terms in the same document
U is the number of Unique terms in the document
N is the total number of documents
nf is the number of documents that contain the term
```

2018-07-02 15:38:06 UTC  

>math
🤢

2018-07-02 15:38:08 UTC  

```
U is the number of Unique terms in the document
N is the total number of documents
nf is the number of documents that contain the term
```

2018-07-02 15:38:12 UTC  

these won't work if I split the db

2018-07-02 15:38:18 UTC  

and you will get shit results
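A toy illustration of why (all counts made up): the `log((N-nf)/nf)` factor depends on corpus-wide totals, so after a split the same term gets a different weight in each shard and scores from different shards are no longer comparable.

```python
import math

# Whole corpus: 15M documents, the term appears in 300k of them.
N, nf = 15_000_000, 300_000
idf_global = math.log((N - nf) / nf)                   # log(49), about 3.89

# One shard after splitting: 5M documents, 250k contain the term.
N_shard, nf_shard = 5_000_000, 250_000
idf_shard = math.log((N_shard - nf_shard) / nf_shard)  # log(19), about 2.94
```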

2018-07-02 15:38:34 UTC  

my way right now is OK; the problem is the data quality

2018-07-02 15:38:50 UTC  

for this, I ordered myself a nice GTX 1080 Ti to put into the server that I'm waiting for

2018-07-02 15:39:11 UTC  

about 30% of my database is tagged and categorized correctly, the rest is trash

2018-07-02 15:39:22 UTC  

I am working on an ML model to categorize the rest

2018-07-02 15:39:31 UTC  

1080ti? damn

2018-07-02 15:39:39 UTC  

I'm dumb so how come a GPU will help with things? <:GWchinaSakuraThinking:398950680217255977>

2018-07-02 15:39:44 UTC  

ml accel

2018-07-02 15:39:51 UTC  

Or will it boost the machinelearning shit

2018-07-02 15:40:00 UTC  

look at their examples

2018-07-02 15:40:05 UTC  

like the Titanic survival rate

2018-07-02 15:40:08 UTC  

Never got too deep into it other than a forced "something" tutorial on MATLAB

2018-07-02 15:40:10 UTC  

they are self-explanatory