Securita 0.1 in the works

Well I did some work on it today. It’s now in extension form (the old version, prior to Ben Goodger’s changes). Also using a “database” (array) of 18 keywords right now, with a fair amount of success.

Now the big topic will be creating a RDF schema and a method for scanning efficiently, and “fuzzy”. Allow me to expand:

We can’t just ban the page because of the word “ass”, but the word “ass”, and several other words could be potential page worth blocking. So what needs to be done is attach point values to all words (scientifically). Then based if the point value gets higher than 5.0, we block it. This is basically how SpamAssassin operates. So what I need is for someone to do some experimentation, and find out exactly what keywords to use, and what point values to attach to them. A nice thing would be a little C++ app that could be used to generate scores based on data. I’m rather open to suggestions on how to do this. So… give suggestions, code solutions. Submit them to me, be a hero.

The RDF schema also needs to contain a method field. Since regEx is extremely slow, and bloated, we obviously don’t want to do that more than we need to. So we have the option to use window.find(). By using that method, there’s a speed increase (with obvious limitations).

Perhaps in the future, changing the core engine to compiled binary would be better, but for now, we make do with javaScript. So far performance on a 1.8GHz system is actually not much slower at all, I really don’t notice it. But we will need some more keywords. I figure about 50-100, provided we use a scoring system like mentioned above.

So code is coming, hopefully an initial checkin soon, I’m just not ready yet, and busy. I’ve had about 3hrs today of free time to play, and that was my break from the academic books. More to come, but lets get the creative juices flowing.

3 thoughts on “Securita 0.1 in the works

  1. > So what needs to be done is attach point values to all words (scientifically).

    The obvious thing to do here is to use the Bayesian analysis code on each page. It does almost exactly what you want. You could either ship new databases every so often, or allow an admin to “add” new pages to the database so that it learns what the admin doesn’t want the user to see.

    Gerv

  2. Well the problem we run into there is that Bayes in JavaScript would be rather slow.

    So it can’t be for th end user. Though we could use it to generate the database for the user…. though it’s beyond my capabilities. Hopefully someone will come forward with that one.

  3. Why is it written in JS? Surely this is a Mozilla add-on, and can therefore leverage Mozilla services and use C++ XPCOM components? Or have I missed something?

    Gerv

Leave a Reply

Your email address will not be published. Required fields are marked *

Connect with Facebook

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>