Story Scribbler was founded two years ago around a simple question: can children with demand avoidance be engaged by AI-powered content?
To answer this, I want to talk about Story Scribbler: who its target audience is, how it works, and why the product has been (by some metrics) life-changing for students, yet lacks wide enough market fit to be profitable.
The idea for Story Scribbler came about from discussing AI with a local school - the Dorset Center of Excellence / Coombe House School (CHS). CHS is specifically designed for young people from 5 to 19 with special educational needs. This means the challenges it faces are unique compared to traditional primary or secondary schools - for example, some older students read well below their age (for instance, a Year 12 student with a Year 2 reading level) but still wish to engage with content appropriate for people their age.
This often results in engagement with certain forms of media, like video games or movies, and a lack of interest in reading or writing.
So how do you engage students? You meet them at their interests. But how many books exist, for example, for teenagers that also follow a phonics scheme? The short answer: few to none. This is where Story Scribbler steps into the picture.
To reiterate, the problem here is two-fold: stories must be decodable at the student’s phonics level, while also matching their age and interests.
Try this exercise: come up with a short story (or even a few sentences) following these restrictions for Phase 3 phonics:
Phase 3 phonics in the UK (part of the government’s Letters and Sounds programme) introduces children (usually in Reception, age 4–5) to the remaining letters of the alphabet and common digraphs/trigraphs such as ch, sh, th, ng, ai, ee, igh, oa, oo, ar, or, ur, ow, oi, ear, air, ure, er, so they can represent almost every sound in simple words. To write stories for this stage, use short sentences made from regular CVC/CVCC words and these Phase 3 digraphs (e.g. The goat in the rain had a red coat), avoiding more complex spellings and advanced grammar.
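One practical way to enforce restrictions like these on generated text (a minimal sketch, not necessarily how Story Scribbler does it) is to validate every word of a draft against a curated allow-list of decodable words. The list below is an invented, heavily abridged stand-in:

```python
import re

# Hypothetical, abridged allow-list of Phase 3 decodable words; a real
# deployment would use a curated list from a phonics programme.
PHASE3_WORDS = {
    "the", "goat", "in", "rain", "had", "a", "red", "coat",
    "this", "that", "fish", "ship", "moon", "wait", "night",
}

def find_off_scheme_words(text: str) -> list[str]:
    """Return the words in `text` that are not on the Phase 3 allow-list."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in PHASE3_WORDS]

print(find_off_scheme_words("The goat in the rain had a red coat"))  # []
print(find_off_scheme_words("Once upon a time"))  # ['once', 'upon', 'time']
```

A real deployment would pair a much larger curated word list with a review step for names and proper nouns, which a blunt allow-list would otherwise flag.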
It’s surprisingly hard to pull off. Consider this common fairy-tale opener:
once upon a time, in a land far, far away...
Each of those highlighted terms is inaccessible to a person reading at a Phase 3 phonics level.
A quick primer on how AI models like the one powering ChatGPT work - they don’t understand words based on letters or phonics, they process tokens. A token is a unit of text - either single letters, whole words, or fragments of words. This matters since it makes it difficult for a model to definitively determine if a word matches the target phonics scheme.
For instance, our example above might be split up into this series of tokens:
Phonics are absolutely harder than it first appears
If we talk about how large language models “perceive” text, this illustration shows it best. Instead of the building blocks of the Latin alphabet, an LLM has a “vocabulary” of potentially hundreds of thousands of snippets of text. Some of these snippets are whole words, as we see above, whilst some follow a common structure (e.g. perceiving “ordering” as “order” plus “ing”, since “ing” is a common suffix for English words), and others might be unintuitive to a person (as the structure of the vocabulary arises through a machine learning process, rather than human labelling).
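To make this concrete, here is a toy greedy longest-match tokeniser over an invented vocabulary. Real tokenisers (e.g. BPE) learn far larger vocabularies from data and merge pieces rather than matching greedily, but the splitting behaviour is similar in spirit:

```python
# Invented vocabulary and IDs, purely for illustration.
TOY_VOCAB = {"order": 101, "ing": 102, "a": 103, "o": 104, "r": 105,
             "d": 106, "e": 107, "n": 108, "i": 109, "g": 110}

def tokenise(text: str) -> list[int]:
    """Repeatedly consume the longest vocabulary entry the text starts with."""
    ids = []
    while text:
        match = max((t for t in TOY_VOCAB if text.startswith(t)),
                    key=len, default=None)
        if match is None:
            raise ValueError(f"no vocabulary entry matches {text!r}")
        ids.append(TOY_VOCAB[match])
        text = text[len(match):]
    return ids

print(tokenise("ordering"))  # [101, 102] -- "order" + "ing"
```

Note how the whole word “order” wins over its individual letters because the longest match is taken first; a word with no multi-letter entry would fall back to single letters.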
Furthermore, if you click the toggle to show token IDs, you’ll observe that the LLM doesn’t even ingest words directly, but instead uses IDs corresponding to entries in its vocabulary (this is because a neural network operates on numbers, not text: each ID is looked up in an embedding table to produce the numerical vector the model actually processes).
We can use this to observe how whitespace is treated - based on the highlighting, we can see that it is attached (often prepended) to words. As it happens, the tokeniser assigns different IDs to a word preceded by a space versus the word by itself. For example, “ hello” (with a leading space) has the ID 24912, whereas “hello” on its own has the ID 40617 - the model might treat the two inputs almost identically, but it does see them as entirely separate “words”.
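We can illustrate this with a small invented vocabulary: greedy matching attaches the space to the word, and the space-prefixed entry carries its own ID (all names and IDs here are made up, not taken from a real tokeniser):

```python
# Invented vocabulary; the point is only that " hello" and "hello"
# are entirely distinct entries with distinct IDs.
VOCAB = {"say": 10, " hello": 11, "hello": 12}

def encode(text: str) -> list[int]:
    """Greedily consume the longest matching vocabulary entry."""
    ids = []
    while text:
        match = max((t for t in VOCAB if text.startswith(t)),
                    key=len, default=None)
        if match is None:
            raise ValueError(f"no entry matches {text!r}")
        ids.append(VOCAB[match])
        text = text[len(match):]
    return ids

print(encode("say hello"))  # [10, 11] -- the space belongs to the second token
print(encode("hello"))      # [12]     -- a different token entirely
```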
This is a fundamental limitation of LLMs. Although you might not observe any noticeable difference between saying “hello” with or without a leading space, the two inputs are genuinely distinct, and in certain scenarios this may confuse the model or create perturbations in its output, much like a butterfly effect.
Returning to the problem of phonics, it becomes clear that the model cannot directly perceive the phonics of a word. So how could an LLM, with generally acceptable accuracy, identify the phonics scheme of a particular word, or write a short story that adheres to one?
The answer lies in the training data used to create the model. Models are essentially trained by rote: they are exposed to a vast amount of text via a process that, through repetition, distils emergent behaviours into the LLM, such as the ability to seemingly reason and understand text.
In short, the model can approximate the phonics stage a word belongs to through two mechanisms: direct recall of phonics teaching material (such as word lists grouped by phase) present in its training data, and generalisation from the spelling patterns of similar words it has encountered. Between these two mechanisms, a model can produce a “good enough” approximation, and only mis-identify words on rare occasions.
So we’ve shown AI can understand phonics, but can it write good stories? Well, that depends on your interpretation of what a good story is…
Does this count as a good story? The answer is likely subjective based on your personal tastes, and even popular formulae like “Save the Cat”, “The Seven Basic Plots” or the “Hero’s Journey” don’t guarantee appeal (1).
For Story Scribbler, the greatest challenge faced was the passivity of AI. Modern AI is designed to be palatable, which arguably makes it excellent at writing stories that appeal to the widest audience, but in reality results in stories that are simply uncontroversial, or “bland”. Forcing a model to exhibit behaviours outside of this out-of-the-box style proved to be an interesting challenge, and represented the most fragile part of the entire project.
When models like GPT-3 were released, prompt engineering required concerted effort to get decent performance, but the state of the art has advanced rapidly in the last few years, to the point where the expectation for modern models is that prompt engineering is about squeezing out small performance gains. What we found developing the AI agent behind Story Scribbler, however, is that creative writing is still a frontier problem for models. Model upgrades or new features (e.g. our phonics feature) could easily collapse the performance of the model and revert it to passive, predictable stories. We didn’t discover a secret phrase for eliciting improved behaviour, but we did find that a model could be coaxed into being more creative when creative (high-temperature) sub-agents were guided by lower-temperature orchestration agents, balancing creativity with necessary adherence to, for example, JSON output or tool-calling structures.
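The orchestration pattern can be sketched as follows. Here `call_model` is a stub standing in for a real LLM API call, and all function names, prompts, and outputs are illustrative rather than Story Scribbler’s actual implementation:

```python
import json

def call_model(prompt: str, temperature: float) -> str:
    """Stub for an LLM API call; a real implementation would hit a model here."""
    if temperature >= 0.9:
        # Creative sub-agent: free-form prose, no structural guarantees.
        return "The goat ran in the rain, chasing the thin red moon."
    # Orchestrator: low temperature, strict adherence to a JSON schema.
    return json.dumps({"title": "The Goat", "text": prompt})

def write_story_page(topic: str) -> dict:
    # 1. A high-temperature sub-agent drafts creative prose.
    prose = call_model(f"Write a vivid sentence about {topic}.", temperature=1.0)
    # 2. A low-temperature orchestrator wraps the draft in the required JSON.
    structured = call_model(prose, temperature=0.1)
    return json.loads(structured)

page = write_story_page("a goat in the rain")
print(page["title"])  # The Goat
```

The design choice is the separation of concerns: the sub-agent is free to be erratic because it never touches the output format, while the orchestrator only ever handles structure, where determinism matters.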
Having developed this web app, we set about sharing it with our beta-tester school. Coombe House School was an ideal first customer, and we found that students readily engaged with the app, creating stories whilst retaining a sense of ownership (as evidenced by anecdotes of sharing with teachers, other students, and carers). Furthermore, the work to push stories beyond the model’s bland defaults paid off: we found students (particularly older students with lower reading abilities) would often tie their stories back to their own interests, ranging from football to grimdark fantasy (such as Trench Crusade).
Emboldened by this success, we sought market fit among less-specialist schools, working with industry contacts to meet with schools with traditionally higher-than-average concentrations of students with learning or behavioural needs. Whilst we found interest amongst leadership and English leads, it was difficult to canvass schools at a sufficient volume to see conversion from our free trial to paid school memberships.
Overall, we’d consider this project a success - whilst profitability would have allowed Story Scribbler to continue offering its services, the project’s vision wasn’t predominantly focussed on this (had it been, compromises to the user experience or more gamified tactics could have been deployed, but these were avoided due to concerns about how they would affect the educational value of the tool).
Although Story Scribbler may not be a unicorn venture, it has proven value to schools that adopt it - Coombe House School made the app an official part of its curriculum for the 2025/26 academic year, a capstone achievement for the project. Thanks to its tech stack, the project is well poised to transition into a native iOS or Android application, leveraging local large language models via device APIs and allowing for low-cost distribution. The roadmap also includes web-llm integration; the current state of the art produces a poor user experience, but once Chrome’s integration of Gemini Nano advances, this will be revisited (1, 2).
Get in touch: