Message from @porco

sum

2018-07-02 15:32:09 UTC

Domain is .to iirc

porco

2018-07-02 15:32:12 UTC

it's 5 weeks old now

porco

2018-07-02 15:32:15 UTC

`pornspider.to`

sum

2018-07-02 15:32:20 UTC

<#189811441174446080>

sum

2018-07-02 15:32:22 UTC

jk

Cole

2018-07-02 15:32:49 UTC

if you get that one extra user from czech republic now

Cole

2018-07-02 15:32:51 UTC

stop stalking me

Cole

2018-07-02 15:32:53 UTC

thanks

porco

2018-07-02 15:33:00 UTC

and I actually had someone join the team, a ~~meme~~ machine learning specialist

sum

2018-07-02 15:33:01 UTC

But yeah, I'd use it more and contribute towards the stats, but right now I don't think I can use pornspider easily because of the missing filters

sum

2018-07-02 15:33:07 UTC

As much as I meme about them, they are useful

bentech

2018-07-02 15:33:13 UTC

filters?

sum

2018-07-02 15:33:23 UTC

Even basic shit like `longer than 20 minutes`

sum

2018-07-02 15:33:38 UTC

Since it easily cuts down shit like trailer clips and the like

porco

2018-07-02 15:33:40 UTC

yes, as I said, my server sucks. Pornspider has 20k views / day and runs on 2 GB RAM

porco

2018-07-02 15:33:47 UTC

I really can't add any search complexity

Cole

2018-07-02 15:33:58 UTC

>open pornspider
>click teen
>teenage guys masturbating in their own mouth

Cole

2018-07-02 15:34:00 UTC

<:woah:333623269674713098>

sum

2018-07-02 15:34:29 UTC

Not even basic ```csharp
// return the video only if it runs longer than the user's minimum duration
if (video.timeDuration > userFilter) { return video; }```

sum

2018-07-02 15:34:49 UTC

Though I have no idea about hardware usage of pornspider

porco

2018-07-02 15:35:06 UTC

if I do it exactly like that, it will slow down your search by about 15-20 min / request because it would have to iterate through 15 million videos

porco

2018-07-02 15:35:22 UTC

I use a lot of btree indexes
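
To make the contrast between a full scan and an ordered index concrete, here is a minimal Python sketch. It is not pornspider's actual code: the sample data, the `longer_than` helper, and the in-memory sorted list standing in for a B-tree are all assumptions for illustration. It only covers the duration lookup on its own and says nothing about combining it with relevance ranking on a 2 GB machine.

```python
import bisect

# Hypothetical sample data: (duration_seconds, video_id).
videos = [(95, "a"), (310, "b"), (1260, "c"), (45, "d"), (2400, "e")]

# Keep the rows ordered by duration, which is the access pattern
# an on-disk B-tree index on the duration column would provide.
index = sorted(videos)
durations = [d for d, _ in index]

def longer_than(min_seconds):
    """Return ids of videos strictly longer than min_seconds."""
    # Binary search for the first duration > min_seconds (O(log n)),
    # instead of testing every row one by one.
    start = bisect.bisect_right(durations, min_seconds)
    return [vid for _, vid in index[start:]]

result = longer_than(20 * 60)  # "longer than 20 minutes"
print(result)
```

The lookup itself is cheap; whether the matching rows can then be intersected with the full-text results within the available memory is a separate question.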

sum

2018-07-02 15:35:27 UTC

What about splitting the database?

sum

2018-07-02 15:35:40 UTC

0-5 > db1
10-15 > db2
20+ > db3?

porco

2018-07-02 15:35:43 UTC

then I can't use the vector-space formula anymore if I chunk the database

porco

2018-07-02 15:35:48 UTC

I just need more ram

sum

2018-07-02 15:35:53 UTC

download mowe wam

porco

2018-07-02 15:35:59 UTC

I ordered hardware on ebay

porco

2018-07-02 15:36:06 UTC

3200 $ in server hardware

porco

2018-07-02 15:36:18 UTC

and I rented a colocation space for 160 $ / month near my home

porco

2018-07-02 15:36:29 UTC

then, I will be able to add whatever filters are possible

porco

2018-07-02 15:36:38 UTC

because I will have 196 GB ram

porco

2018-07-02 15:36:55 UTC

and 24 cores

porco

2018-07-02 15:37:12 UTC

but really, splitting the database would only make it slower

porco

2018-07-02 15:37:46 UTC

my formula to decide if a video is relevant right now is `w = (log(dtf)+1)/sumdtf * U/(1+0.0115*U) * log((N-nf)/nf)`

porco

2018-07-02 15:37:54 UTC

```
dtf is the number of times the term appears in the document
sumdtf is the sum of (log(dtf)+1)'s for all terms in the same document
U is the number of Unique terms in the document
N is the total number of documents
nf is the number of documents that contain the term
```
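
As a rough illustration of the formula with the definitions above, here is a small Python sketch. The function name `term_weights`, the sample document, and the document-frequency table are hypothetical, and the natural logarithm is an assumption since the log base is not stated.

```python
import math
from collections import Counter

def term_weights(doc_terms, N, doc_freq):
    """Weight every term of one document.

    doc_terms: list of terms in the document (repeats allowed)
    N:         total number of documents
    doc_freq:  dict mapping term -> nf (documents containing the term)
    """
    counts = Counter(doc_terms)           # dtf per term
    U = len(counts)                       # unique terms in the document
    sumdtf = sum(math.log(c) + 1 for c in counts.values())
    weights = {}
    for term, dtf in counts.items():
        nf = doc_freq[term]
        local = (math.log(dtf) + 1) / sumdtf   # (log(dtf)+1)/sumdtf
        norm = U / (1 + 0.0115 * U)            # U/(1+0.0115*U)
        idf = math.log((N - nf) / nf)          # log((N-nf)/nf)
        weights[term] = local * norm * idf
    return weights

# Hypothetical numbers: "a" is rare, "b" appears in half of all documents.
weights = term_weights(["a", "a", "b"], N=1000, doc_freq={"a": 10, "b": 500})
print(weights)
```

With these numbers the term found in exactly half of the documents gets weight 0, and a term found in more than half would get a negative weight; `nf` equal to `N` is undefined in this form.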

sum

2018-07-02 15:38:06 UTC

>math
🤢

porco

2018-07-02 15:38:08 UTC

```
U is the number of Unique terms in the document
N is the total number of documents
nf is the number of documents that contain the term
```

porco

2018-07-02 15:38:12 UTC

these won't work if I split the db

porco

2018-07-02 15:38:18 UTC

and you will get shit results
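
A small Python sketch of why the statistics stop lining up once the data is chunked. The shard names follow the db1/db2/db3 idea from earlier in the chat, but every count here is made up for illustration. It compares only the `log((N-nf)/nf)` factor, computed per shard and over the whole collection.

```python
import math

def idf(N, nf):
    """The log((N-nf)/nf) factor of the weighting formula."""
    return math.log((N - nf) / nf)

# Hypothetical counts per shard: (documents, documents containing the term).
shards = {"db1": (600, 30), "db2": (300, 90), "db3": (100, 5)}

# Statistics over the whole, unsplit collection.
global_N = sum(n for n, _ in shards.values())
global_nf = sum(nf for _, nf in shards.values())
global_idf = idf(global_N, global_nf)

# The same term scored with each shard's local statistics.
per_shard = {name: idf(n, nf) for name, (n, nf) in shards.items()}

print(global_idf)
print(per_shard)
```

The same term ends up with a different factor in each shard, so scores coming back from different shards are not directly comparable unless `N` and `nf` are still computed over everything.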

porco

2018-07-02 15:38:34 UTC

my way right now is OK, the problem is the data quality

Discord ID: 463367020340183067