Message from @porco
Discord ID: 463367323600814082
But yeah, I'd use it more and contribute to the stats, but right now I don't think I can use pornspider easily because of the lack of filters
As much as I meme about them, they are useful
filters?
Even basic shit like `longer than 20 minutes`
Since it easily cuts down shit like trailer clips and the like
yes, as I said, my server sucks. Pornspider gets 20k views / day and runs on 2 GB of RAM
I really can't add any search complexity
>open pornspider
>click teen
>teenage guys masturbating in their own mouth
:woah:
Not even basic ```csharp
if (video.timeDuration > userFilter) { return video; }```
Though I have no idea about hardware usage of pornspider
if I do it exactly like that, it will slow down your search by about 15-20 min / request because it would have to iterate through 15 million videos
I use a lot of btree indexes
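(Editor's sketch, not pornspider's actual code or schema: the difference between the naive per-request scan and an ordered index can be illustrated roughly like this. The `videos` list, the function names, and the use of a sorted list with `bisect` as a stand-in for a B-tree are all assumptions made for illustration.)

```python
import bisect

# Hypothetical toy data: (video_id, duration_in_seconds)
videos = [(1, 95), (2, 1500), (3, 30), (4, 2400), (5, 1200)]

def scan_longer_than(videos, min_seconds):
    """Naive filter: touches every row on every request, O(n)."""
    return [vid for vid, dur in videos if dur > min_seconds]

# Ordered index built once, sorted by duration (stand-in for a B-tree).
index = sorted((dur, vid) for vid, dur in videos)
durations = [dur for dur, _ in index]

def indexed_longer_than(min_seconds):
    """Binary-search to the first entry above the threshold, O(log n),
    then read only the matching tail of the index."""
    start = bisect.bisect_right(durations, min_seconds)
    return [vid for _, vid in index[start:]]

print(scan_longer_than(videos, 1200))
print(indexed_longer_than(1200))
```

Both calls return the same videos; the indexed version just avoids looking at rows below the threshold. It says nothing about how cheaply such a filter combines with relevance ranking, which is the part that costs memory here.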
What about splitting the database?
0-5 > db1
10-15 > db2
20+ > db3?
then I can't use the vector-space formula anymore if I chunk the database
I just need more ram
download mowe wam
I ordered hardware on ebay
3200 $ in server hardware
then, I will be able to add whatever filters are possible
because I will have 196 GB ram
and 24 cores
but really, splitting the database would only make it slower
my formula to decide if a video is relevant right now is `w = (log(dtf)+1)/sumdtf * U/(1+0.0115*U) * log((N-nf)/nf)`
```
dtf is the number of times the term appears in the document
sumdtf is the sum of (log(dtf)+1)'s for all terms in the same document
U is the number of Unique terms in the document
N is the total number of documents
nf is the number of documents that contain the term
```
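(Editor's sketch: the formula above, written out directly from the listed definitions. The function names and the `doc_freq` mapping are hypothetical; it assumes `dtf >= 1` and `0 < nf < N`, otherwise the logarithms are undefined.)

```python
import math
from collections import Counter

def term_weight(dtf, sumdtf, U, N, nf):
    """w = (log(dtf)+1)/sumdtf * U/(1+0.0115*U) * log((N-nf)/nf)"""
    tf_part = (math.log(dtf) + 1) / sumdtf   # dampened, normalized term frequency
    length_part = U / (1 + 0.0115 * U)       # factor based on unique terms in the document
    idf_part = math.log((N - nf) / nf)       # rarer terms across the corpus weigh more
    return tf_part * length_part * idf_part

def document_weights(terms, N, doc_freq):
    """Weights for every term of one document.

    terms:    list of tokens in the document
    N:        total number of documents
    doc_freq: mapping term -> nf (number of documents containing the term)
    """
    counts = Counter(terms)
    sumdtf = sum(math.log(c) + 1 for c in counts.values())
    U = len(counts)
    return {t: term_weight(c, sumdtf, U, N, doc_freq[t]) for t, c in counts.items()}

# Hypothetical usage: a 3-token document in a corpus of 1000 documents.
weights = document_weights(["hd", "hd", "long"], 1000, {"hd": 400, "long": 50})
print(weights)
```

Note that a term found in exactly half of all documents gets weight 0, and one found in more than half gets a negative weight, because of the `log((N-nf)/nf)` factor.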
>math
🤢
```
U is the number of Unique terms in the document
N is the total number of documents
nf is the number of documents that contain the term
```
these won't work if I split the db
and you will get shit results
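(Editor's sketch of why that happens: `N` and `nf` are corpus-wide counts, so each chunk computes a different `log((N-nf)/nf)` for the same term. The tiny shards below are made-up data, and the helper names are hypothetical.)

```python
import math

def idf(N, nf):
    """The log((N-nf)/nf) factor from the formula above."""
    return math.log((N - nf) / nf)

def doc_freq(docs, term):
    """nf: number of documents that contain the term."""
    return sum(1 for d in docs if term in d)

# Each document is just a set of terms; two shards of four documents each.
shard_a = [{"hd", "long"}, {"long"}, {"short"}, {"clip"}]
shard_b = [{"hd", "clip"}, {"hd"}, {"long"}, {"short"}]
full = shard_a + shard_b

global_idf = idf(len(full), doc_freq(full, "hd"))    # N=8, nf=3
idf_a = idf(len(shard_a), doc_freq(shard_a, "hd"))   # N=4, nf=1
idf_b = idf(len(shard_b), doc_freq(shard_b, "hd"))   # N=4, nf=2

print(global_idf, idf_a, idf_b)
```

The same term "hd" ends up with three different weights, and in `shard_b` it drops to zero, so scores from separate chunks are no longer comparable unless the global `N` and `nf` are kept somewhere shared.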
my way right now is OK the problem is the data quality
for this, I ordered myself a nice GTX 1080 TI to put into the server that I'm waiting for
about 30% of my database is tagged and categorized correctly, the rest is trash
I am working on a ML model to categorize the rest
1080ti? damn
I'm dumb so how come a GPU will help with things? :GWchinaSakuraThinking:
ml accel
Or will it boost the machine learning shit
look at their examples