Message from @porco
Discord ID: 463367958010265621
I use a lot of btree indexes
What about splitting the database?
0-5 > db1
10-15 > db2
20+ > db3?
then I can't use the vector-space formula anymore if I chunk the database
I just need more ram
download mowe wam
I ordered hardware on ebay
3200 $ in server hardware
and I rented a colocation space for 160 $ / month near my home
then, I will be able to add whatever filters are possible
because I will have 196 GB ram
and 24 cores
but really, splitting the database would only make it slower
my formula to decide if a video is relevant right now is `w = (log(dtf)+1)/sumdtf * U/(1+0.0115*U) * log((N-nf)/nf)`
```
dtf is the number of times the term appears in the document
sumdtf is the sum of (log(dtf)+1)'s for all terms in the same document
U is the number of Unique terms in the document
N is the total number of documents
nf is the number of documents that contain the term
```
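A minimal sketch of that formula in Python, to make the pieces concrete. Assumptions that are not in the message: `log` is the natural log, a document is just a list of tokens, and the names `term_weights`, `doc_terms`, `doc_freq` and `n_docs` are made up for illustration. The guard for `nf >= N` is also an assumption, added because `log((N-nf)/nf)` is undefined there.

```python
import math
from collections import Counter


def term_weights(doc_terms, doc_freq, n_docs):
    """Weight every term of one document using
    w = (log(dtf)+1)/sumdtf * U/(1+0.0115*U) * log((N-nf)/nf).

    doc_terms: list of tokens in the document (hypothetical input shape)
    doc_freq:  dict mapping term -> nf (documents containing the term)
    n_docs:    N, the total number of documents
    """
    counts = Counter(doc_terms)                          # dtf per term
    log_tf = {t: math.log(c) + 1 for t, c in counts.items()}
    sumdtf = sum(log_tf.values())                        # sum of (log(dtf)+1)
    unique = len(counts)                                 # U
    length_norm = unique / (1 + 0.0115 * unique)

    weights = {}
    for term, ltf in log_tf.items():
        nf = doc_freq.get(term, 0)
        if nf <= 0 or nf >= n_docs:
            # Assumption: skip terms where log((N-nf)/nf) is undefined.
            weights[term] = 0.0
            continue
        weights[term] = ltf / sumdtf * length_norm * math.log((n_docs - nf) / nf)
    return weights


w = term_weights(["cat", "cat", "dog"], {"cat": 10, "dog": 50}, 100)
```

Note that a term sitting in exactly half of the documents (`dog` here) gets weight 0, and anything more common than that goes negative, since `(N-nf)/nf` drops below 1.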
>math
🤢
```
U is the number of Unique terms in the document
N is the total number of documents
nf is the number of documents that contain the term
```
these won't work if I split the db
and you will get shit results
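A rough sketch of the `N` / `nf` part of that problem, with made-up counts for a single term across three hypothetical shards (natural log assumed). Each shard only sees its own `N` and `nf`, so the `log((N-nf)/nf)` factor comes out different in every database and none of them matches the unsplit value:

```python
import math


def idf(n_docs, nf):
    """The log((N-nf)/nf) factor from the formula."""
    return math.log((n_docs - nf) / nf)


# Hypothetical (N, nf) pairs for the same term in each shard.
shards = {"db1": (1000, 10), "db2": (1000, 400), "db3": (1000, 90)}

# What the unsplit database would see.
global_n = sum(n for n, _ in shards.values())
global_nf = sum(nf for _, nf in shards.values())
global_idf = idf(global_n, global_nf)

# What each shard would compute on its own.
per_shard = {name: idf(n, nf) for name, (n, nf) in shards.items()}

for name, value in per_shard.items():
    print(f"{name}: {value:.3f} (unsplit: {global_idf:.3f})")
```

With these numbers the term looks rare in db1 and common in db2, so scores from different shards are not comparable to each other.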
my way right now is OK, the problem is the data quality
about 30% of my database is tagged and categorized correctly, the rest is trash
I am working on a ML model to categorize the rest
1080ti? damn
I'm dumb so how come a GPU will help with things? <:GWchinaSakuraThinking:398950680217255977>
ml accel
Or will it boost the machine learning shit
look at their examples
like the titanic survival rate
Never got too deep into it other than a forced "something" tutorial on MATLAB
they are self-explanatory
Though, ideally, you should be grabbing something like Volta if you move to a different ML framework, like TensorFlow
ohshit
I didn't know C# had ML
I always thought that for ML you'd use python and only python
my priorities right now are
- get better hardware
- improve site performance
- improve database quality to deliver better search results
- add more features
add more features would be your filters
`- add more features`
pornspider pass <:GWnanaPepoHype:392308469488680961>
but since I do this in my free time and you can see I have more important tasks before that, it might take a while
because filters are useless, if half my datasets don't have a duration in the first place