The History of Elephant

I think I could build a better language learning system

— A friend of mine in college

Back then, I dismissed my friend’s idea. The apps out there were designed by teaching experts, after all! I hadn’t seriously studied a language for a long time at that point, just a bit of Duolingo. But I did start soon after.

I studied Korean - not very successfully, though. I studied on and off: using Duolingo, taking a class at the local Volkshochschule, making flashcards with Memrise or Anki, working through books, buying - and taking! - courses online, and making many other attempts.

At first, I was searching for a shortcut. I thought that if I just found the best tool, I wouldn’t have to put in so much effort. A very sweet lie I told myself, especially considering I was already using the best scientifically supported way to remember vocabulary: spaced repetition flashcards.

In the end, there is no shortcut around putting in the effort. The brain only remembers what you make it do: listen to the language, recall vocabulary, speak the language, and write the language. I was barely doing any of that, and to nobody’s surprise, I was not progressing.

Whenever I tried a new tool, I was excited, but I had to start from scratch. Read Chapter One, Lesson One, the introduction to Hangul. If I was lucky, there was a placement test, a self-evaluation, or a way to mark something as known. However, any lesson might contain new vocabulary, so I had to review everything anyway.

The worst was Memrise. Many courses had community-maintained Memrise decks, which is awesome! But the decks did not share a pool of vocabulary, so if you followed two courses, or more likely many, you ended up with a lot of duplicated cards. Software should be better.

Beyond spaced repetition cards, which are effective but hardly interesting, the tasks were repetitive. The feedback was restricted, often only a binary “correct” or “incorrect” with a given solution. There was little flexibility in translation tasks.

Even back then, I thought computers could do more. Computers should do more. Linguistic tools like morpheme analysers and all the rich results from natural language processing (NLP) research should be able to help us learn.

I was no expert in computational linguistics, though. I was a PhD student in distributed systems and cryptography - at least a field that qualified me to feel superior to everyone and overestimate my own abilities. After I finished my PhD, in the midst of the pandemic, I decided to do some research and work on the project.

At the time, I was waiting for a visa for a job abroad. The visa processing took a long time, as the government kept changing its mind between processing requests normally and requiring emergency requests, restarting half the process every time, changing forms, and never providing English translations. In short, I had some time.

I learned a lot about morpheme analysers, grammatical error correction, word2vec and more.

A picture formed in my mind, and I started working on a prototype. I named it “Hannibal” - based on the first syllable “Han” (한) for Korea. Hannibal would also be the name of a cute elephant, the mascot of the app. I can’t help it, I love the cute animal characters, like Daram, that are popular in Korea.

Alas, my visa fell through. I had to find a job, find a new apartment (I had cancelled my lease for the approved flight that never happened), and, overall, move on with my life quickly. At least I could go back to Korea to visit my girlfriend, as the border had opened up again. My life became too busy for Hannibal.

I switched jobs - again - and moved - again - and was working at a VC-funded startup when all AI hell broke loose: ChatGPT was released. I had been loosely following AI because of my interest in NLP, but ChatGPT brought AI back to everyone’s mind, including my friend’s. Suddenly, I got a call from him: “Do you remember the bit about building a better language learning tool?” I did. (I am paraphrasing; in reality, he asked, “Do you remember saying, ‘I never want to work in the same department as my friends?’ How serious were you about that?”, but he meant to ask the other question.)

The three of us - he, his wife, and I - got together and made plans to create what we had imagined. We started working in our off time. We went back to researching language learning techniques, implemented FSRS (really cool work! The world lacks open spaced repetition research), and worked out what the project should actually look like and which market and languages to target. Lastly, we had to figure out how to fund everything. We could not afford not to earn any money.

If you paid attention, you might have guessed it. We chose English for Chinese speakers, especially Chinese students preparing for the Gaokao. It offered a large market of users, a strong change in regulation (lots of private tutors had a hard time getting their visas extended), and good LLM support for English. We also had connections to a potential business lead and investors. All of it was built around the capabilities of the new LLMs.

I took a TEFL course to learn more about teaching English. I started implementing prototypes for interactions with the LLM, basic web endpoints, and an awful, basic design, which I will not show you.

Unfortunately, we never got beyond the first prototype and investment talks before we dropped the project. We couldn’t handle it on the side, and we didn’t ‘get’ the investment game. We rushed towards burnout while working our day jobs and were not willing or able to compromise.

However, Hannibal just wouldn’t stay out of my head. I wanted to realize the idea I had had years ago. By then, that idea was very different from what the three of us had worked on. And this time, I wanted it to be fun. Another friend got me interested in learning Rust, and it seemed fun - I also thought C++ was fun, so take that with a grain of salt.

I scrapped everything and began from scratch with a Rust prototype. Authentication, users, vocabulary data structures! I hit a roadblock on data, but then I struck gold: the Korean learners’ dictionary is not only a great resource, its maintainers also provide the data in a machine-readable format. (It was a bit of a pain to work with at first, though.)

With this, I thought I could build at least a dictionary and some flashcards. Flashcards, however, are already well served by Anki, and not what I wanted as a core feature. I had to start somewhere, though. So I implemented the data structures for the dictionary data - again - and they turned out very different from what I had originally imagined. When in doubt, take what the dictionary experts tell you, not what a computer scientist thought it should be.

Then, using lindera, a morpheme analyser and tokeniser written in Rust, I built a first vocabulary detection pipeline. After running into issues with Naver’s change in policy for using the Papago API, I started searching for another translation service for Korean. I stumbled upon Murf, a service providing high-quality translations and voice generation. I couldn’t stop myself from imagining all the things I could implement - and it was fun. I had no deadline. No pressure. When I wanted to program, I did. Finally, I arrived at something that I thought was worth sharing.

To share it, there had to be a domain and a name. Hannibal - I registered “hannibal.one” at some point, but was talked out of it: negative connotations, nobody would understand, not a good URL. Infinite reasons not to use it. By now, I at least agree that the URL was terrible. After lots of searching, I picked “Kokoa” - I couldn’t let go of the elephant and thought I’d start with “ko”, the first syllable of 코끼리 (elephant) - so funny, right?

Maybe you can guess why I did not stick with Kokoa.

But I still could not let go of the elephant. So the elephant became everything. In German, we have the word “Mammutprojekt”, a mammoth project: large in scope, long in duration, and probably expensive.

So this is my modern mammoth, my elephant project. Even without the elephant, because I couldn’t hire someone to draw it for me.

— David