|sporkmonger||yeah, i'm about 90% sure that won't work|
Maybe have it quit once the hash is empty.
|sporkmonger||that looks promising|
|Olathe||I thought of another way. Let me make it.|
if it'd make it any simpler
i don't need to know -where- in the string the substrings occur
just how many times they occur
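For reference, a minimal sketch of "count how often each substring occurs, ignoring positions". This is a brute-force illustration in Python, not the code from the pasties; the function name `substring_counts` is made up for the example, and overlapping occurrences are counted.

```python
from collections import Counter

def substring_counts(text):
    """Count every occurrence of every non-empty substring of text.

    Brute force: there are O(n^2) substrings, so this is only
    practical for short inputs. Positions are not recorded.
    """
    counts = Counter()
    n = len(text)
    for start in range(n):
        for end in range(start + 1, n + 1):
            counts[text[start:end]] += 1
    return counts

counts = substring_counts("ab ab")
print(counts["ab"])     # 2
print(counts["ab ab"])  # 1
```

Only the tallies are kept, which is all the question above asks for.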
heh, i'm still trying to decipher this code, but it looks pretty damn close to what i need
and it's about 1000x faster
Here's the beginnings of a faster one.
|cygnus128||sporkmonger: so in the string 'ab ab' you would find 'ab ab', 'ab a'(1), ' ab'(1), 'a '(1), ' a'(1), and 'ab'(2) right?|
sorry, 'ab ab' would be found once
You can probably do something with this: http://pastie.caboo.se/50469
Let me make an evaporating thingy.
The near-finished version: http://pastie.caboo.se/50470
|sporkmonger||sorry, was afk, back now|
cygnus128: i don't care about substrings that occur only once unless they're only one character long
the ultimate output is going to be a huffman coding compression tree
so everything in the string has to be able to be translated to a huffman coding
but strings that occur only once wouldn't compress very well
so i just ignore them
since they're weighted exactly the same as single characters
keeping track of singly occurring substrings would only serve to increase the size of the tree, but wouldn't improve the compression ratio
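The filtering rule described above (keep every single character, but keep a longer substring only if it repeats) could be sketched like this. It is an assumption-laden illustration in Python, not sporkmonger's actual code; `huffman_candidates` is a hypothetical name, and the counting is brute force.

```python
from collections import Counter

def huffman_candidates(text):
    """Return {substring: count} for the symbols worth weighting.

    Single characters are always kept so the whole string stays
    encodable; longer substrings are kept only when they occur
    more than once.
    """
    counts = Counter()
    n = len(text)
    for start in range(n):
        for end in range(start + 1, n + 1):
            counts[text[start:end]] += 1
    return {s: c for s, c in counts.items() if len(s) == 1 or c > 1}

print(huffman_candidates("ab ab"))
```

For 'ab ab' this keeps 'a', 'b', ' ' and 'ab', and drops singly occurring substrings such as 'ab a' and ' ab'.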
|Olathe||And the finale: http://pastie.caboo.se/50472|
"lobster" is a seven character repeat at positions 0 and 50 to the right of that. Note that  is in position zero of the seventh stage.
So, now lobs or ster don't appear; only the complete substring.
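One way to get the "only the complete substring" behavior is to drop any repeated substring that sits inside a longer repeated substring with the same count. The sketch below is a guess at the effect, written in Python with a made-up name (`maximal_repeats`) and a made-up sample string; it is not the algorithm from the pastie.

```python
from collections import Counter

def maximal_repeats(text):
    """Return repeated substrings not subsumed by a longer repeat.

    A repeat s is dropped when some longer repeat t contains s and
    occurs the same number of times, e.g. "lobs" inside "lobster".
    Brute force, so only suitable for short inputs.
    """
    counts = Counter()
    n = len(text)
    for start in range(n):
        for end in range(start + 1, n + 1):
            counts[text[start:end]] += 1
    repeats = {s: c for s, c in counts.items() if c > 1}
    return {
        s: c
        for s, c in repeats.items()
        if not any(
            len(t) > len(s) and s in t and tc == c
            for t, tc in repeats.items()
        )
    }

print(maximal_repeats("lobster and lobster"))
```

Here "lobs" and "ster" disappear because every one of their occurrences falls inside "lobster"; the space survives because no longer repeat contains it.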
yeah, i still haven't figured out if that's desirable behavior or not yet
i had a delete method in my algorithm that resulted in the same effect
but using it didn't speed the algorithm up
and i -think- that it reduces the effectiveness of the primary algorithm
but i'm not sure
what would it take to make "lobs" and "ster" show up as repeated substrings?
Just use offsets