Well, I did some work on it today. It’s now in extension form (the old version, prior to Ben Goodger’s changes). It’s also using a “database” (really just an array) of 18 keywords right now, with a fair amount of success.
Now the big topic will be creating an RDF schema and a method for scanning that is both efficient and “fuzzy”. Allow me to expand:
We can’t just ban a page because of the word “ass”, but the word “ass” together with several other words could indicate a page worth blocking. So what needs to be done is attach point values to all words (scientifically). Then, if the total point value for a page gets higher than 5.0, we block it. This is basically how SpamAssassin operates. So what I need is for someone to do some experimentation and find out exactly what keywords to use, and what point values to attach to them. A nice thing would be a little C++ app that could be used to generate scores based on data. I’m rather open to suggestions on how to do this. So… give suggestions, code solutions. Submit them to me, be a hero.
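To make the scoring idea concrete, here is a rough sketch in JavaScript. The keywords and point values are placeholders I made up for illustration (the real ones still have to come out of experimentation), and the names `KEYWORD_SCORES`, `scorePage` and `shouldBlock` are hypothetical, not code from the extension. Only the 5.0 threshold comes from the plan above.

```javascript
// Placeholder keywords and weights: assumptions for illustration only.
// Real values would have to be found by experimentation.
const KEYWORD_SCORES = {
  alpha: 2.5,
  beta: 1.5,
  gamma: 3.0,
};

// A page is blocked once its total score gets higher than this.
const BLOCK_THRESHOLD = 5.0;

// Add up the point value of every keyword occurrence in the text.
function scorePage(text, scores = KEYWORD_SCORES) {
  // One tokenizing pass; words that are not keywords contribute nothing.
  const words = text.toLowerCase().match(/[a-z0-9']+/g) || [];
  let total = 0;
  for (const word of words) {
    // hasOwnProperty guard so words like "constructor" are not
    // mistaken for keywords via the object prototype.
    if (Object.prototype.hasOwnProperty.call(scores, word)) {
      total += scores[word];
    }
  }
  return total;
}

// True when the page score is strictly higher than the threshold.
function shouldBlock(text, scores = KEYWORD_SCORES) {
  return scorePage(text, scores) > BLOCK_THRESHOLD;
}
```

With these placeholder weights, a page containing only “alpha” and “beta” stays under the threshold, while “alpha” plus “gamma” goes over it. No single word blocks a page on its own, which is the whole point.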
The RDF schema also needs to contain a method field that says how each keyword should be matched. Since regex matching is slow and heavy, we obviously don’t want to do that more than we need to. So we have the option to use window.find() instead. By using that method, there’s a speed increase (with obvious limitations).
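Here is one way the method field could drive the scanner, again only as a sketch. The entry layout (`pattern`, `method`, `score`), the method names `"find"` and `"regex"`, and the functions `matches` and `scoreWithMethods` are all my assumptions, since the actual schema isn't written yet. To keep the example runnable outside the browser, a plain substring check stands in for window.find(); in the extension that callback would wrap window.find() instead.

```javascript
// Hypothetical entries: each keyword record says how it should be matched.
const ENTRIES = [
  { pattern: "alpha", method: "find", score: 2.5 },           // cheap literal search
  { pattern: "\\bbeta\\d+\\b", method: "regex", score: 3.0 }, // needs a real pattern
];

// Decide whether one entry matches, using the cheapest method it allows.
// plainFind is a stand-in for window.find(): in the browser it could be
// (needle) => window.find(needle); here it is a substring check.
function matches(entry, text, plainFind) {
  if (entry.method === "find") {
    return plainFind(entry.pattern);
  }
  if (entry.method === "regex") {
    return new RegExp(entry.pattern, "i").test(text);
  }
  throw new Error("unknown method: " + entry.method);
}

// Total the scores of all matching entries (each entry counts once).
function scoreWithMethods(text, entries = ENTRIES) {
  const lower = text.toLowerCase();
  const plainFind = (needle) => lower.includes(needle.toLowerCase());
  let total = 0;
  for (const entry of entries) {
    if (matches(entry, text, plainFind)) {
      total += entry.score;
    }
  }
  return total;
}
```

The idea is that only the entries that really need a pattern pay the regex cost, and everything else goes through the literal search.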
So code is coming, hopefully with an initial checkin soon. I’m just not ready yet, and busy. I’ve had about 3 hours of free time today to play, and that was my break from the academic books. More to come, but let’s get the creative juices flowing.