Message from @porco
Discord ID: 463367020340183067
Domain is .to iirc
it's 5 weeks old now
<#189811441174446080>
jk
if you get that one extra user from czech republic now
stop stalking me
thanks
and I actually had someone join the team, a ~~meme~~ machine learning specialist
But yeah I'd use it more and contribute towards the stats, but right now I don't think I can use pornspider easily because of the missing filters
As much as I meme about them, they are useful
filters?
Even basic shit like `longer than 20 minutes`
Since it easily cuts down shit like trailer clips and the like
yes as I said, my server sucks. Pornspider has 20k views / day and runs on 2 GB RAM
I really can't add any search complexity
>open pornspider
>click teen
>teenage guys masturbating in their own mouth
<:woah:333623269674713098>
Not even basic ```csharp
// keep only videos longer than the user's minimum duration
if (vid.timeDuration > userFilter) { return vid; }```
Though I have no idea about hardware usage of pornspider
if I do it exactly like that, it will slow down your search by about 15-20 min / request because it would have to iterate through 15 million videos
I use a lot of btree indexes
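For illustration, here is a minimal Python sketch of the difference between checking every row and using a sorted index for a `longer than X` filter. The `videos` list, the ids and both function names are made up for the example, and a sorted list with `bisect` only stands in for a real btree index. It covers the filter by itself, not combining it with ranked search results, which is probably where the real cost sits.

```python
import bisect

# Hypothetical data: (duration_seconds, video_id) pairs kept sorted by duration,
# standing in for a btree index on a duration column.
videos = [(45, "a"), (90, "b"), (600, "c"), (1500, "d"), (3600, "e")]
durations = [d for d, _ in videos]

def longer_than_scan(min_seconds):
    # Full scan: looks at every row, O(n).
    return [vid for d, vid in videos if d > min_seconds]

def longer_than_index(min_seconds):
    # Index lookup: binary search for the first duration > min_seconds,
    # O(log n), then read only the matching tail.
    start = bisect.bisect_right(durations, min_seconds)
    return [vid for _, vid in videos[start:]]

# 20 minutes = 1200 seconds
print(longer_than_index(1200))
```

Both functions return the same ids; only the amount of data touched differs.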
What about splitting the database?
0-5 > db1
10-15 > db2
20+ > db3?
then I can't use the vector-space formula anymore if I chunk the database
I just need more ram
download mowe wam
I ordered hardware on ebay
$3200 in server hardware
and I rented a colocation space for $160 / month near my home
then, I will be able to add whatever filters are possible
because I will have 196 GB ram
and 24 cores
but really, splitting the database would only make it slower
my formula to decide if a video is relevant right now is `w = (log(dtf)+1)/sumdtf * U/(1+0.0115*U) * log((N-nf)/nf)`
```
dtf is the number of times the term appears in the document
sumdtf is the sum of (log(dtf)+1)'s for all terms in the same document
U is the number of Unique terms in the document
N is the total number of documents
nf is the number of documents that contain the term
```
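A direct Python transcription of the formula above, as a sketch only: the function name `term_weight` and the sample numbers are made up, and `log` is assumed to be the natural logarithm, which the formula does not state.

```python
import math

def term_weight(dtf, sumdtf, U, N, nf):
    """w = (log(dtf)+1)/sumdtf * U/(1+0.0115*U) * log((N-nf)/nf)

    Assumes dtf >= 1 and 0 < nf < N; natural log is an assumption.
    """
    local = (math.log(dtf) + 1) / sumdtf   # term frequency, normalised per document
    length_norm = U / (1 + 0.0115 * U)     # document length normalisation
    rarity = math.log((N - nf) / nf)       # how rare the term is in the whole corpus
    return local * length_norm * rarity

# Hypothetical document with two terms, one appearing twice and one once.
counts = {"first": 2, "second": 1}
sumdtf = sum(math.log(c) + 1 for c in counts.values())
U = len(counts)

# Hypothetical corpus: 1000 documents, 10 of them contain "first".
w = term_weight(counts["first"], sumdtf, U, N=1000, nf=10)
print(w)
```

Note that the last factor goes negative once a term appears in more than half of all documents, so very common terms count against a document.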
>math
🤢
```
U is the number of Unique terms in the document
N is the total number of documents
nf is the number of documents that contain the term
```
these won't work if I split the db
and you will get shit results
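A small sketch of why, with invented numbers: `N` and `nf` in particular are counted over the whole corpus, so each chunk computes a different `log((N-nf)/nf)` for the same term and the scores stop being comparable between chunks. The shard names and all counts below are hypothetical.

```python
import math

def rarity(N, nf):
    # Last factor of the weight formula: log((N - nf) / nf)
    return math.log((N - nf) / nf)

# Hypothetical whole database: 15 million videos, 150k contain the term.
global_value = rarity(15_000_000, 150_000)

# Hypothetical split by duration: the term is common among short clips
# and rare among long videos. Values are (N, nf) per chunk.
shards = {
    "db1": (5_000_000, 140_000),
    "db2": (5_000_000, 9_000),
    "db3": (5_000_000, 1_000),
}
per_shard = {name: rarity(n, nf) for name, (n, nf) in shards.items()}

print(global_value)
print(per_shard)
```

The same term is scored as fairly common in `db1` and as very rare in `db3`, so merging the ranked results from the chunks would mix incompatible weights.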
my way right now is OK, the problem is the data quality