ec2-18-232-179-37.compute-1.amazonaws.com | ToothyWiki | MoonShadow | RecentChanges | Login | Webcomic

If you're looking for instructions on writing generators for this wiki, they're at GeneratorsForDummies.

If you'd like a list of generators and related pages, it's [here].

Stuff below is really only of historical interest.

Could someone clarify the use of nth order Markov chains in the generation of text to me?  From what I recall/have found out, an nth order Markov chain is one that considers the past n items in the generation of the probabilities of the next item. (This is a random addition to make a point in a following paragraph).

I believe I understand the simple case of first-order...  Just make a note of every word used and record which words follow it to get a list of probabilities.  i.e. in the paragraph above, "in" would be followed by "the" with 2/3 probability and "a" with 1/3 probability.

In the fourth-order case, we must take account of the previous 4 words when generating the next.  Is this simply a case of recording the probabilities against a four word "phrase" instead of a single word?, i.e. "could someone clarify the" is followed by "use" with probability 1?  Obviously this will mean constructing a far larger 'dictionary' but I cannot see any great complication beyond this.  Am I missing something? --K

That's the heart of it. Complications arise when you want to optimise it to represent the dictionary in less space while still keeping it quick to search. - MoonShadow

Source for a simple BNF-templated stuff generator is [here]. It uses the ToothyWikiInternals/PageParser to read pages from a UseModWiki. There is a GitHub? repository [here].

bnf is the grammar name. Options can be set using option ::= option = value

The other half (still to do):

----------- cut here -----------
#! /usr/bin/perl -w
use English;
use strict;

# --------- settings
my $order = 1;       # (how many words next word depends on) - 1
my $spewcount = 50; # how many words to spew

# --------- 

my $dictionary = { };

sub store
  my($word, $context) = @_;

  my $container = $dictionary;

  # traverse down tree of hashes until context exhausted
  foreach my $item(@$context)
      $container->{$item} = { };
    $container = $container->{$item};

  # add one to word count and total
    $container->{$word} = 1;
  if(!defined($container->{' '}))
    $container->{' '} = 0;
    $container->{' '}++;

sub generate
  my($context) = @_;

  my $container = $dictionary;
  # traverse down tree of hashes to find leaf corresponding to current context
  foreach my $item(@$context)
     $container = $container->{$item};
      return 0; # todo: pick a random item that *is* defined.

  my $index = int(rand($container->{' '}));
  foreach my $key(keys %$container)
    $key eq ' ' and next;
    $index -= $container->{$key};
    if($index <= 0)
      return $key;

  return 0;

sub main
   my $fifo = [ ];

     s/\s+/ /g;
     my @words = split(/ /);
     foreach my $word(@words)
       if(length $word)
        if($#$fifo >= $order)
          &store($word, $fifo);
          shift @$fifo;
        push @$fifo, $word;

   # select random starting context
   $fifo = [ ];
    my $container = $dictionary;
    while($#$fifo < $order)
      my @keys = keys %$container;
      my $key = $keys[int(rand($#keys + 1))];
      print "$key ";
      push @$fifo, $key;
      $container = $container->{$key};
   while($spewcount-- > 0)
     my $word = &generate($fifo);
       print "$word ";
       shift @$fifo;
       push @$fifo, $word;
       print "\n";
       exit(0); # done
   print "\n";


----------- cut here -----------

 sl236@debian$ cat Door/door-* | perl markov.pl 
attention; they talked amongst themselves, remembering some sort after all? Yes? We were standing in a whisper: - We need a True Dawn... Just don't ask me what I need to do a little, deciding that the mystery of the Flying himself.. - Why not? When I saw a shadow unfold in

Not optimised; too memory-intensive for webserver at present. Order 1 is quite disjointed, as you can see above. Order 2 gives long passages that occasionally change subject. One really wants to be able to vary the context length randomly between those two, I think :) - MoonShadow

PeterTaylor wonders precisely what the debug is meant to do. Looking at the Perl above, he thinks it's meant to dump the BNF and to show a trace of how the result is produced. Correct?

If so, it seems not to work. PeterTaylor/MagicCardGenerator? currently has debug enabled, and is consistently printing only "destroy_effect ::= ". What I'd like is to see the chain of substitutions which leads to an output text.
Indeed, there was a bug, which I have just fixed. Should work now. - MoonShadow
(PeterTaylor) Not quite. The rules in the derivation are being shown backwards. It's useable now, though. Thanks.
Huh? The steps of the derivation should be shown in the order in which the tags get resolved. The lines in the state dump at the start are in random order (state is stored in a hash, which makes no guarantees about order of keys) - they're just there to make it possible to spot lines the parser isn't picking up which you think it should be and vice versa. - MoonShadow
The steps are shown in order, but each line of step is reversed. For example, I've just enabled debug for BNF: Sandbox/Generator, and it output the dump, the hr, and then
bnf ::= qualifier move optionalemphasis colourword
colourword ::= colour
colour ::= blue
optionalemphasis ::= nothing
nothing ::= 
move ::= pinch
qualifier ::= nothing
nothing ::= 
blue pinch

Request for syntax: Some text that can be used in a BNF generator page to be displayed as a link to another BNF generator page.

e.g. Generator on Kazuhiko/Generator includes text such as [1] which (if displayed) displays as a link to that BNF...

Not important, as I might not get round to doing anything with it but it could be interesting. --K

Ah, gomen MoonShadow, I think I didn't describe well enough (though the Wiki BNF syntax is also useful!).  Basically, I wanted the following to work:

 bnf ::= Would you like to go to test_location ?
test_location ::= [Kazuhiko's Bizarre Objects of Doom] | [Alex's Random Magic Cards]

Also, if the bold tag could be enabled, it would be appreciated... Thanks :) --K
Should now work - only in the form [wikipage label] or [bnf:wikipage label] though.  b, i, ul, h1, h2, and h3 have all been enabled. - MoonShadow
Could ToothyWikiInternals/InterWiki links be allowed too? Since those are trusted sites... --AlexChurchill
...[are they]? Before you suggest I remove the redirect script: since external sites are not under my control, any external link can potentially go [anywhere]. I suppose it's not a problem if people are expecting it, though. But it would still require me to integrate into the BNF script code for parsing the interwiki file, and for sanitising any parameters appended to the interwiki URL so people can't abuse the functionality to, for instance, launch nasties on mouseover.  Which isn't hard, but does involve a fair chunk of work whether I do it by despaghettifying the appropriate UseMod code or by writing it myself from scratch. And once I do all that, I may as well allow arbitrary URLs the way the wiki does. - MoonShadow
Hmm, I see... Count: wasn't listed on the ToothyWikiInternals/InterWiki list ;) Okay, I'll narrow down my request. Could MTG: links to wizards.com be allowed? (Of course, general URLs would work too, and that might be okay, but my requirements are much humbler than that). I suspect my next generator would work much better if people could look up the cards mentioned where necessary; but we can wait and see if you want. --AlexChurchill
Like I say, Count aside, it's surprising just [what] sites have publically accessible redirect scripts. Alphanumeric chars and the + symbol are sufficient for all the card names, right? - MoonShadow
Heh, ok. Yes, those would do: at first I thought we'd need apostrophe too, but it seems MTG: Teferis Curse works as well as MTG: Teferi's Curse. --AC
Enjoy ;) - MoonShadow
No longer true, although I don't know if anyone cares. --CH

Any chance of <pre>, and an option for spaces not to be stripped? I want to generate some AsciiArt :) --Bobacus
Sorry, ignore me. I hadn't escaped the < and >, nor separated it by spaces from the token... --Bobacus
Too late - made up an aa tag that works :)

(PeterTaylor) Although it's getting away from pure BNF, some syntax for weighting probabilities of branches would be useful.

Now implemented:
Whever a##x is about to be resolved, x is resolved one level first and iff the result consists of characters that are legal in a tag name (a-zA-Z0-9-), the result is appended to a, separated with an underscore. a##x##y, a##x##y##z, ... all behave appropriately.
a{x} - after a is resolved one level, the result goes into x as well as going on the pending stack as usual. a{x}{y}{z}... all work appropriately. a##x{y} would place result of  resolving a_x one level into y. Note that a{x}{y##x} would resolve a one level, place that into x, and use the new value of x when working out what to concatenate to y.
What happens about the same pattern being invoked further down the evaluation chain? Presumably you'd just lose the previously evaluated values. --AlexChurchill''
You lose the previously evaluated values. Usually that's a good thing. You can use ## to make yourself a stack-like thing and therefore local variables and stuff. - MoonShadow

<a::=x> - resolves x one level, assigns result to a there and then, generates no output and places nothing on the pending stack.

Please tell me if I've broken anything. Since all of these operations are deterministic, none of them count towards the iteration limit even though the debug log shows what's going on in gory detail. If anyone wants to have a stab at translating the above for GeneratorsForDummies, feel free; although I suspect only the diehard compscis will be interested...

tag{x} doesn't assign the end result of the lookup to x, just the first level:

x ::= f
f ::= g
x{v1} would set v1 to be f, not g

Then you can do
sentence ::= noun{x} which tobe##x blue
noun ::= singular | plural
singular ::= cat | dog
plural ::= cats | dogs
tobe_singular ::= is
tobe_plural ::= are

e.g. you could define if(x==y) then equal else different:
equality ::= <equality##x##y::=different> <equality##x##x::=equal> equality##x##y

something ::= a | <x::=src> <y::=creature> <equal::=b> <different::=something> equality
..which will, albeit clumsily, only ever return b if src was equal to creature when something was invoked; though it's getting away from the whole nonintrusiveness thing rather. Hm.

MoonShadow/Calculator sez this is enough to make the whole thing TuringComplete.

Is there a reason why quotes aren't supported inside <myVar::="value"> assignments? And is it legal to do <myVar::=value1 | value2>? --AlexChurchill

Ah - edit clash :) just answered that on your generator page. No, *nothing* is legal on the RHS of an assignment except a tag name with or without concatenation. For the former case, define something to be "value" and assign that. For the latter, pull the or outside the assignment, like this: assignMyVar?::=<myVar::=value1> | <myVar::=value2> ... and then invoke assignMyVar? - MoonShadow
Okay, fair enough, I guess. Next question: would you expect myLookup##flag resolve to myLookup_ when flag has been assigned to "" or nothing? If not, would it be problematic to allow it? --AC
No and yes. If flag is "", I would expect resolution to bail out and myLookup##flag to be the result (since "" cannot be part of a valid tag name, and removing the double quotes requires a second resolution). If flag is nothing, I would expect you to end up with myLookup_nothing (since nothing is the result of resolving flag one level - this is precisely as documented above). It would be problematic to behave otherwise since that would require me to magically divine how many levels you want to resolve in each case. Why don't you test for myLookup_nothing instead of myLookup_ ? - MoonShadow

Oops. Thank you, I'd been dereferencing too far in my head. I also didn't realise the double quotes weren't stripped out until display time. One more question before I go to the dentist: it seems that with spaces off, a non-recognised tag name still gets bracketed with spaces if it ends up displayed. Combined with no quotes allowed in assignments, this is resulting in me doing silly things like s ::= "s" . Is this as desired - are you expecting *every* literal to be quoted if spaces are turned off? (No offence, btw. I'm impressed with the new syntax. Just trying to establish what's permitted and what's intended.) --AlexChurchill
I also. If you want to see it in gory detail to try to reverse engineer how it actually works, I've been through and fixed a fair bit of problems with ChrisHowlett/ExaltedStatsGeneratorDebug. --CH
Hm. That's not supposed to happen. Do you think you could reproduce it in SandBox/Generator for me? I can't seem to. Thanks.. - MoonShadow
*embarrassed look* No action required. I just need to invoke "option ::= spaces = 0" rather than "options ::= spaces = 0"... --AlexChurchill

Next irritating feature request: both myself and ChrisHowlett (as creators of the two generators stretching this new funky syntax) have found occasions when it's rather frustrating to have ## only resolve one level. Although we agree it's a sensible default, is there any way an option could be added to the syntax to resolve more than one level? Ideally I suspect we'd like something equivalent to "tag###3#var", meaning "resolve var three levels down and then concatenate the result, if legal, onto tag". However, just providing some way to resolve two levels rather than one would do. Is there any chance of this, or would it be hairy? --AlexChurchill
It'd make the code really quite nasty. I'd have to start charging you for such resolutions - for each level! - since var could resolve to itself for instance. It seems like it'd be quite hairy to debug generators using it, too - you'd have to keep track of what's happening n levels down. You can currently work around the restriction by resolving and assigning before you hit the code that uses ##. Hm. - MoonShadow
Yeah, I feared it might be nasty. But what's this workaround you mention? I couldn't think of *any* way to resolve something multiple times, because e.g. <v1::=complicatedExpression> <v2::=v1> <v3::=v2> just sets them all to the same string... If there's a workaround, we'd happily use that! --AlexChurchill

instead of things like

a ::= b
b ::= c
bnf ::= thing##a
thing_c ::= Success!
(doesn't work)

you can do things like

a ::= b
b ::= <result::=c>
bnf ::= a thing##result
thing_c ::= Success!

because a resolves all the way before thing##result is examined.

Indeed. ChrisHowlett is already using this technique to allow him to increment the value of var##varAbilToAdd? in several places.

But what about if I want to create a particular instance of one large complicated definition, and then use that precise copy three times? As in,
 <myTokenType?::=tokendefinition> "Put a " myTokenType?: " into play." newline "Sacrifice a " myTokenType? ": " doStuff
tokendefinition ::= powertoughness colour creaturetype artifactness " creature token"
In this case each invocation of myTokenType? will create a different type of token. What I currently do is along the lines of
  tokendefinition ::= <myPT::=powertoughness> <myColour::=colour> <myCreatureType?::=creaturetype> myPT myColour myCreatureType? " creature token" 
... but the problem there is, if any of the nonterminals referenced are multilevel themselves, they only get derefed once. Eg powertoughness chooses whether to be a constant or X/X: then all that tokendefinition knows is that it will be a constant, not what that constant is. I suppose I could go through replacing all the way down the simple BNF choices with assignment-then-invocation... hmm. --AlexChurchill

Update: that doesn't work either. See SandBox/Generator: performing an assignment doesn't process embedded assignments. I think this means that *nothing* gets derefed more than once unless it's actually being displayed right now.
Yes, correct.
So even using assignment-then-invocation wouldn't work to avoid the issue mentioned above.
No, I think you misunderstood what I meant. Do the assignment at the point when the decision is made, not at top level. Separate the display and decision making. That way you can invoke something which makes all the decisions and does all the assignments but doesn't display anything, and then invoke something else that displays stuff. I'll go tweak the sandbox version to show you what I mean now.
It's not bothering me overmuch and there may not be much reason to fix it right now (Generator activity has died down recently), but just so you know. --AlexChurchill

And it's risen again. I'm now at the stage where I really want some way to evaluate something all the way down without displaying it. I had thought that enabling CSS "display:none" syntax would do that, but it won't, because "displayoff deepevaluation{variable} displayon" will only store the one-level-deep (I realise this is by design). But the need is rising for some way to display "A: B" with A being dynamically generated based on B, and not the other way around. I'd be happy to do it by something like "displayoff deepstuff{{B}} displayon calculatedA B" if that would be easier. What I can't do is rework everything from "a b | c d e" to "<myresult::=evaluateab> | storecde" format: even if it wasn't horrendous, I believe there are a number of cases where it just wouldn't work. So, is there any chance that deep-evaluate-and-store-without-display could be implemented? --AlexChurchill
Being able to arbitrarily invoke deep, silent evaluation would basically require me to rewrite the server-side code from scratch, designing it from the ground up to do that; either that, or have people nagging me for several months about corner cases that don't quite work. So if such a thing happened, it wouldn't happen until the holidays. I would also want some nice syntax that wouldn't break any existing generators, and can't think of any right now. Suggestions are invited. Such evaluation will be charged against the iteration limit, to stop you setting up infinite recursion. I would be very interested in the cases you suggest exist where separating the display code from the decision-making code could never work, since that would have interesting implications for turing-completeness of functional languages and in particular would imply that MoonShadow/Calculator cannot work as written. - MoonShadow
:P:P Okay, I guess I don't mean "is physically impossible". But in the same way that the Generator /could/ code a MandelbrotSet generator, but it wouldn't be pleasant, there are some aspects of the task I want to do that I can't see a sensible way of doing in the generator. Things like assembling an arbitrary length string from rather small parts, in one section (well, many sections) to be used in another, potentially involving recursion.
What about when it's not silent? That is, would it be any easier to make stuff{a} (which displays a before storing it) go deep without extra cost, seeing as it's all being generated at that point anyway? With a syntax of something like: "stuff{a} means evaluate stuff one level, store it in a, then evaluate it the rest of the way and display it; stuff{{a}} means evaluate stuff all the way, then display it and store it in a."  Just an idea; maybe we can discuss it at WA. --AC
Well.. it's *not* actually being generated at that point. The basic generator algorithm is the following:

..which is simple and fast for what it currently does; however, it flattens out all the recursion. The changes you propose basically mean that several hundred tags *after* the tag that needs to be deeply evaluated is first encountered, something needs to say "actually, all the output we've had between there and here should have something unusual done to it". Which is pretty horrible. If I rewrite the generator to do

..deep evaluation, and suppressing its results, suddenly becomes *much* nicer, at the cost of much higher runtime memory usage by the perl script due to the deep recursion, and possibly lower performance due to the recursive function call for every tag.

Time to revisit this just a little. Any chance table tags could be allowed? I don't care about border widths and all that kind of thing, but it would be quite useful to be able to have just <table>, <tr> and <td> supported at a very basic level. --CH
Hm. I don't *think* it'll break any existing generators. Yup, I'll do that. - MoonShadow
Thanks, that's useful. I had actually meant "I don't care about being able to specify border widths" - I'd ideally like borders still to be present, but the default settings are fine. Also, and I know this wasn't in the original spec, is there any chance we could get colspan implemented in the <td> tag? See ChrisHowlett/BetaExaltedStatsGenerator - I'd like "Physical" to be a heading to the 2-column-wide rows beneath it.
Done. Note that HTML escaping / unescaping / validation happens *before* BNF parsing and is not aware of BNF syntax; therefore, your HTML tags must be paired up in the BNF source in order to validate and get unescaped. - MoonShadow
I.e., "stuff ::= <table> things </table>" is legal, but "stuff ::= <table> things" and "things ::= foo </table>" isn't? That's fine. Many thanks! The issue with spaces hadn't occurred to me when I asked, I must admit. --CH

Bug reports

Source for a standalone version is [here].

CategoryGenerator; also seems to have become FunctionalProgrammingLanguage
See also http://www.seventhsanctum.com/

ec2-18-232-179-37.compute-1.amazonaws.com | ToothyWiki | MoonShadow | RecentChanges | Login | Webcomic
Edit this page | View other revisions | Recently used referrers
Last edited October 23, 2012 5:39 pm (viewing revision 99, which is the newest) (diff)