Show HN: Personalized Duolingo (kind of) for vocabulary building

github.com

151 points by arbayi 2 days ago

Hi! I wanted to share a project I've really wanted to have. TL;DR: this app lets you create your own word lists and get a Duolingo-like experience (kind of; it still needs a lot of features) practicing those words in their context.

My English is not the best, but not the worst either. Still, I realized I couldn't push it past a certain level! I believe that to truly learn a language, you need to be exposed to it often. Vocabulary is the key factor here if you really want to improve in any language.

My experience is that when I read a book to improve my English vocabulary, I encounter unfamiliar words so often that my reading gets disrupted. I look up the meaning, come back, put it in context, re-read the passage, etc. It didn't work for me. So I tried audiobooks: I listen to the book and read along, and whenever I encounter an unknown word, I write it down. I end up with about 50 words in 2-3 pages, and I ask ChatGPT for their meanings. I read them, then go back to the book and read it on my own. That definitely helps, but after a while I lose those words because I never encounter them again. So, in order not to forget them, I need some kind of exercise, right? A flashcard app, maybe? But then I still have to go ask ChatGPT to create questions, put them into a flashcard app, etc. It's still time-consuming, and this is supposed to be fun!

I need to be exposed to English in my daily life. I just need to save the words somewhere and, whenever I want, be able to practice them in a fun way, maybe Duolingo style? Then I wondered: wouldn't it be better to store words in their own context? Say I read Harry Potter and keep a list of the words I encountered in it; say I watch Breaking Bad and keep a list of the words I encountered while watching it. I believe seeing those words together and practicing them together makes them easier to remember.

But I shouldn't be the one adding the meaning of each word and generating the exercises, right? It should all be automated. The exercise part will be handled by an LLM for sure, but for the meaning of the word, could I fetch it from a dictionary? I really don't like dictionary definitions, though, and a word can have different meanings in different contexts. So I need to use an LLM for this task too, to get the word's meaning in its own context.

You create a list for your context and add words; their meanings are added automatically, and each newly added word shows up in a different color (coloring is also a technique for remembering words). It all takes seconds. Whenever you want to practice a list, you can use learn mode to learn the words and quiz mode to test your knowledge. So I basically built this app (thanks to Claude 3.5 Sonnet). I want it to be like Duolingo, and of course I still have a long way to go, but I wanted to share it in the hope of getting contributors.
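A minimal sketch of the workflow described above, assuming each list is keyed by its context (a book or a show) and the meaning comes from an LLM prompt. `WordList`, `build_definition_prompt`, and the prompt wording are hypothetical illustrations, not the app's actual code, and the LLM call itself is left out:

```python
from dataclasses import dataclass, field


@dataclass
class WordList:
    """Hypothetical: a vocabulary list tied to one context, e.g. a book or a TV show."""
    context: str
    words: dict[str, str] = field(default_factory=dict)  # word -> contextual meaning

    def add(self, word: str, meaning: str) -> None:
        # Normalize the key so "Muggle" and "muggle" are the same entry.
        self.words[word.strip().lower()] = meaning


def build_definition_prompt(word: str, context: str) -> str:
    """Hypothetical prompt: ask for the meaning of `word` as used in `context`,
    rather than a generic dictionary definition."""
    return (
        f'Explain what the word "{word}" means as it is used in {context}. '
        "Answer in one or two plain sentences suitable for a language learner."
    )


hp = WordList(context="Harry Potter")
prompt = build_definition_prompt("muggle", hp.context)
# The LLM call is omitted; assume it returned the meaning stored below.
hp.add("Muggle", "A person who has no magical ability.")
print(prompt)
print(hp.words)
```

Keeping one list per context is what lets the prompt ask for the sense of the word in that particular book or show.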

You can read more in the repository. I would love to get your thoughts on this.

cat_multiverse a day ago

Hey there, quick suggestion as a PhD Linguistics candidate and avid language learner!

The best way I've found to identify vocabulary most important to my life is through journaling in the language I'm trying to learn. Describing exactly what I did that day, my thoughts, etc, as best I can.

I had thought of doing the journal entries digitally and gathering dictionary headwords from them, whether they're written in my mother tongue (English) or not, and using the resulting dictionary lists to drill vocab.

Traditionally you'd use a lemmatizer with a morphosyntactic tagger for the language to identify the dictionary words, but AI is serviceable these days for identifying dictionary words in long-form text in many languages, though honestly I would be surprised if AI already outperforms the traditional methods.

Good luck and have fun :)
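To make the headword idea concrete, here is a toy sketch that reduces a journal entry to counts of distinct headwords. The small `LEMMAS` table is a hypothetical stand-in for a real lemmatizer and morphosyntactic tagger, which would handle inflection and ambiguity far better than a lookup:

```python
import re
from collections import Counter

# Hypothetical lookup table standing in for a real lemmatizer:
# maps inflected forms to their dictionary headword.
LEMMAS = {
    "went": "go", "goes": "go", "going": "go",
    "ate": "eat", "eaten": "eat",
    "friends": "friend", "was": "be", "were": "be", "is": "be",
}


def headwords(text: str) -> Counter:
    """Count dictionary headwords in free-form text."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters, Unicode-aware
    return Counter(LEMMAS.get(tok, tok) for tok in tokens)


entry = "Today I went to the market with friends. We ate lunch and it was going well."
counts = headwords(entry)
print(counts.most_common(3))
```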

  • learning-tr a day ago

    Thoughts on FSI methodology? That's what I used for mine (my app).

    • cat_multiverse a day ago

      Honestly had never even heard of it! But adult language acquisition isn't really a domain of study I've ever been interested in. I can only speak to what I have found most helpful in my own adult language acquisition journeys. The journaling method was taught to me by a polyglot friend of mine and it sort of solved the "what actually is my everyday vocabulary anyway" side of language learning for me.

      • learning-tr a day ago

        tl;dr "The Foreign Service Institute (FSI) is the primary training institution to prepare American diplomats to advance U.S. foreign affairs interests, teaching, among other things, the languages of the countries where Foreign Service Officers will serve. "

        Apologies, I should have linked beforehand.

tkgally a day ago

This looks really good. I wish I had had something like this many years ago when I was studying languages.

Somebody has already suggested adding spaced repetition and audio, which I agree with completely.

One more suggestion: In addition to having the LLM give you the meaning and example for the context in which you originally saw the word, also ask it to provide the word’s other main meanings and examples of it being used in those senses. You might encounter a word first in a slang or technical sense; while it’s useful to learn that meaning, it’s also important to learn other, more common meanings.

Below are some examples of words you might encounter first in technical contexts but would also be worth knowing in their more general meanings. (Examples suggested and defined by ChatGPT o1.)

canonical

Religious/General: Relating to a canon (e.g., church law) or a recognized body of works.

Math/Computing: Conforming to a standard or simplest form (e.g., “canonical form” of an equation).

resolution

General: A firm decision or determination (often heard in “New Year’s resolution”).

Tech/Imaging: The detail an image holds, typically measured in pixels, dots per inch (DPI), etc.

protocol

Diplomatic/General: The official procedure or set of rules governing state or ceremonial events.

Computing: A set of conventions and rules for transmitting data between electronic devices.

flux

General: Continuous movement or change, often implying instability.

Physics/Engineering: The amount of some quantity (e.g., heat, magnetism) passing through a given area over time.
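One way an app could store this, sketched under the assumption that the LLM returns the contextual sense plus a few other common senses. The `Sense`/`Entry` classes and their fields are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    domain: str       # e.g. "General", "Computing"
    definition: str


@dataclass
class Entry:
    """Hypothetical word entry: the sense where the word was met, plus other main senses."""
    word: str
    contextual: Sense
    other_senses: list[Sense] = field(default_factory=list)

    def render(self) -> str:
        """Show the contextual sense first, then the other main senses."""
        lines = [f"{self.word} ({self.contextual.domain}): {self.contextual.definition}"]
        lines += [f"  also ({s.domain}): {s.definition}" for s in self.other_senses]
        return "\n".join(lines)


protocol = Entry(
    word="protocol",
    contextual=Sense("Computing", "A set of rules for transmitting data between devices."),
    other_senses=[Sense("Diplomatic", "The official procedure governing state or ceremonial events.")],
)
print(protocol.render())
```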

  • arbayi 9 hours ago

    Thank you! I think that would be a good feature. I use the app, and it works fine when I add words as I encounter them, but when I revisit a list later, I sometimes wish I could see the sentences where the word was originally used, to better understand it. Based on your suggestion, a 'show examples' button/link below the meaning would be a cool feature to add.

  • PeterSmit 18 hours ago

    I’ve had this same idea, and it doesn’t work. Or at least: it works quite well, but the problem is that you get hallucinations. And it can be incredibly discouraging to find out that the flashcards you’ve been cramming are completely wrong.

    • 3D30497420 16 hours ago

      I've had this same problem using ChatGPT and German. Even for basic German hallucinations can be unexpected and problematic. (I don't recall the model, but it was a recent one.)

      In one instance, I was having it correct accusative/dative/nominative sentences, and it would say a sentence was in one case when I knew it was in another. I'd ask ChatGPT if it was sure, and it would change its answer. If pressed further, it would change its answer again.

      I was originally quite excited about using an LLM for my language practice, but now I'm pretty cautious with it.

      It is also why I'm very skeptical of AI-based language learning apps, especially if the creator is not a native speaker.

      • arbayi 9 hours ago

        Would agentic workflows come in handy in these cases? I mean having a controller agent check each sentence after it's created, with access to web search, a database, or personal notes, to ensure everything is correct.
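        A minimal sketch of one cheap check in that direction, aimed at the "ask again and it changes its answer" problem above: sample the model several times and accept an answer only when a clear majority agrees. `ask_model` is a hypothetical stand-in for any LLM call, and agreement is not proof of correctness; a real controller could still look answers up in a dictionary or grammar database:

```python
from collections import Counter
from typing import Callable, Optional


def consistent_answer(
    ask_model: Callable[[str], str],   # hypothetical: any function that queries an LLM
    question: str,
    samples: int = 5,
    threshold: float = 0.8,
) -> Optional[str]:
    """Return the majority answer if at least `threshold` of the samples agree,
    otherwise None (meaning: flag it for review instead of trusting it)."""
    answers = [ask_model(question).strip().lower() for _ in range(samples)]
    best, count = Counter(answers).most_common(1)[0]
    return best if count / samples >= threshold else None


# Stub "models" for illustration: one steady, one that keeps changing its mind.
def steady(_q: str) -> str:
    return "Dative"


flaky_replies = iter(["dative", "accusative", "dative", "nominative", "accusative"])


def flaky(_q: str) -> str:
    return next(flaky_replies)


q = "Which case is 'dem Mann' in?"
print(consistent_answer(steady, q))  # dative
print(consistent_answer(flaky, q))   # None
```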

    • tkgally 18 hours ago

      What models have you been using for that? While I haven’t tried automating the production of vocabulary lists through an API, within the last few weeks I have had the chat versions of ChatGPT 4o, Claude Sonnet 3.5, and one of the latest Gemini models produce annotated vocabulary lists based on literary texts in English, Russian, and Latin. I didn’t spot any hallucinations.

      I was asking only for the meanings of the words and phrases, though. I didn’t ask for things like pronunciations, grammatical categories, etc. In the past, when I’ve tried to get that kind of granular information from LLMs, there were indeed errors, presumably because of tokenization issues.

      A few days ago, I ran some similar tests with Japanese, asking for readings of kanji and jukugo in an extended text. All of the models I had tried before for such tasks had screwed up. This time, however, ChatGPT o1 scored 100%. It also was able to analyze sentence grammar accurately, unlike the other models I tried. I was impressed.

      At current API prices, though, o1 might be a bit too expensive for such a task.

      • arbayi 9 hours ago

        I wonder if there are any benchmarks specifically designed to evaluate LLMs' performance on language-learning tasks.

        • tkgally 4 hours ago

          I haven’t heard of any. It would be great if there were...

arbayi 8 hours ago

Here's a list of all the apps mentioned in the comments (built by the commenters themselves!):

- LangTurbo (by @sebnun) - langturbo.com : Learn through podcasts with transcriptions and contextual word definitions

- Nuenki (by @Alex-Programs) - nuenki.app : Browser extension that translates appropriate-difficulty sentences across websites, with hover-for-definitions feature

- Manabi Reader (by @wahnfrieden) - reader.manabi.io : Japanese-focused integrated reader with SRS and Anki integration

- Vocabulary apps (by @muth02446) - Spanish: appicenter.net/Apps/VocabES/ , English: appicenter.net/Apps/VocabEN/ : Uses spaced repetition and audio for basic vocabulary learning

- Vocabuo (by @kebsup) - vocabuo.com : Combines SRS flashcards with ebook/YouTube/website reader, using AI for content generation

- LingoStories (by @laurentlb) - github.com/laurentlb/lingostories/ : Open-source language learning tool

- Turkish Learning Tool (by @learning-tr) : Browser extension for colloquial translations with audio and pronunciation features

- Language Reactor (by @davidzweig) : Planning to open-source soon, looking for contributors

Note: above list is summarized by Claude 3.5 Sonnet.

sebnun a day ago

Great work, I had a similar need, and built a similar app (using podcasts) [1]

I originally planned to add some kind of SRS to it, but I found that I learned much better just reading things in context instead of explicitly using SRS to memorize them. Steve Kaufmann (creator of LingQ) explains this better here [2]

[1] https://www.langturbo.com

[2] https://www.youtube.com/watch?v=t26IPxExmzs

  • arbayi a day ago

    Thank you so much both for your comment and for sharing your app! (There are definitely great tools out there that we're not aware of.) I am very happy to find your app because I actually needed something like this! I enjoy listening while working, and being able to see the transcription alongside it, with word definitions in context, is the kind of learning that really works for me! It's fantastic how it supports all those languages: you can listen, read, and look up definitions all in one place. Looking at this, the one I shared above seems very basic. You handle transcription, media playback, pronunciation testing, and (I guess) LLM interaction for contextual meanings and examples! My only question (sorry if this already exists, but I couldn't find it): is there a way to see a list of the words I've encountered and marked as known?

    As for the second part, I'm planning to include the SRS features @markvdb pointed out in the comments. Combining contextual learning with SRS would be interesting, I think.

  • claylimo a day ago

    Similar to LingQ there is Migaku which can do this for YouTube and other sites. It definitely has significantly aided my learning and made it a zero friction and even fun experience to learn another language.

    • arbayi a day ago

      Thank you for sharing! Looking at their blog, I saw this post about learning Japanese vocabulary (https://migaku.com/blog/japanese/how-to-learn-japanese-vocab...). They share a Japanese Netflix Frequency List - (https://docs.google.com/spreadsheets/d/15b3j9--RJ1K5hI9vz_2L...)

      "To recognize 99% of all the words in Netflix's subtitles, you'd need to know 37,247 words"

      Interesting approach! I really don't know how they managed to gather this list, but it's a clever method.
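      Figures like this are usually derived from a frequency list: sort words by how often they occur and count how many of the most frequent ones you need before they cover the target share of all tokens. A minimal sketch with made-up counts (for Japanese, the subtitles would first have to be split into words by a morphological analyzer):

```python
from collections import Counter


def words_needed(freq: Counter, coverage: float) -> int:
    """Smallest number of most-frequent words whose occurrences make up
    at least `coverage` (0..1) of all tokens in the corpus."""
    total = sum(freq.values())
    covered = 0
    for n, (_, count) in enumerate(freq.most_common(), start=1):
        covered += count
        if covered / total >= coverage:
            return n
    return len(freq)


# Toy counts, made up for illustration only (100 tokens in total).
toy = Counter({"the": 50, "you": 30, "is": 10, "time": 6, "travel": 3, "paradox": 1})
print(words_needed(toy, 0.80))  # 2
print(words_needed(toy, 0.99))  # 5
```

      The long tail is why the number jumps so sharply near 99%: the last few percent of tokens are spread across thousands of rare words.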

    • Alex-Programs a day ago

      There's also https://nuenki.app (disclaimer: I made it), which applies the same approach to every single website*. It translates appropriate-difficulty sentences into your target language, and you can hover for definitions, pronunciations, etc.

      *other than those blocked for privacy reasons

      • dpig_ 9 hours ago

        Any plans to add Hindi, being the third-most spoken language in the world?

      • arbayi a day ago

        I actually want to learn German, but I want to learn it by reading German texts, starting from zero, even though that makes it challenging. I need to look up definitions and such, but translating the entire page defeats the core purpose. In my case, this app is a perfect match! Thank you for sharing!

        • Alex-Programs a day ago

          Awesome! Let me know if you have any feedback!

    • davidzweig a day ago

      I'll drop this here: If anyone wants to work on Language Reactor (well compensated), my email is in my profile. I'm planning to start open-sourcing much of it soon.

  • wahnfrieden 16 hours ago

    I built a popular integrated reader and SRS (with Anki integration as an alternative option), similar to LingQ but currently focused on Japanese.

    https://reader.manabi.io

arbayi 13 hours ago

Thank you so much to everyone contributing to this thread! I learned a lot just by sharing this; the power of open source, I guess! From all these conversations and recommendations, I've gathered a list of features that will hopefully be built in the coming days:

- A more user-friendly approach to running the app

- LiteLLM integration so we can use any LLM (it's done! thanks to @enessusan00!)

- Running the database locally

- Customizable language preferences (e.g., learning German through Turkish)

- A live version where anyone can easily try the app

- A protection mechanism for LLM responses to ensure getting valid JSON

- Fixing small bugs

- Customizable exercise types (ability to enable/disable specific question formats)
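A minimal sketch of the JSON protection item, assuming the LLM is asked for an object with `question`, `options`, and `answer` keys (a hypothetical schema) and `call_llm` is any function that returns the raw reply text:

```python
import json
from typing import Callable

REQUIRED_KEYS = {"question", "options", "answer"}  # hypothetical exercise schema


def parse_exercise(raw: str) -> dict:
    """Parse and validate one exercise; raise ValueError if it is malformed."""
    # Replies are sometimes wrapped in prose or code fences; keep the outermost {...}.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no complete JSON object found")
    data = json.loads(raw[start:end + 1])  # json.JSONDecodeError subclasses ValueError
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        raise ValueError("missing required keys")
    if not isinstance(data["options"], list) or data["answer"] not in data["options"]:
        raise ValueError("answer is not one of the options")
    return data


def get_exercise(call_llm: Callable[[str], str], prompt: str, retries: int = 3) -> dict:
    """Ask again (up to `retries` times) until the reply validates."""
    last_error = None
    for _ in range(retries):
        try:
            return parse_exercise(call_llm(prompt))
        except ValueError as err:
            last_error = err
    raise RuntimeError(f"no valid JSON after {retries} attempts: {last_error}")


# Usage with a stubbed model: the first reply is truncated, the second is valid.
replies = iter([
    'Sure! {"question": "What does Zeitreise mean?"',
    '{"question": "What does Zeitreise mean?", '
    '"options": ["Time travel", "Train journey"], "answer": "Time travel"}',
])
exercise = get_exercise(lambda _prompt: next(replies), "make one exercise")
print(exercise["answer"])  # Time travel
```

Many LLM APIs also offer a structured-output or JSON mode, but validating on the app side still catches replies that parse yet don't match the expected shape.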

We'll be focusing on improving the app as much as we can, but help would be greatly appreciated! We'll be structuring the repository to make it easier for everyone to contribute together.

I'm truly amazed by all the insights and suggestions shared here. There are so many great ideas. Thank you all again for the support and for making this discussion so enriching. I'll keep sharing updates here! All the amazing suggestions shared here will be added to the roadmap in the README!

kebsup a day ago

Great app! I've been building something similar, but for less advanced language learners, who wouldn't understand definitions in their target language.

My app [1] is basically a combination of SRS flashcards with an ebook/YouTube/Website reader. Unlike Anki though, AI creates example sentences, definitions, images and audio.

I find it interesting that you want to get inspired by Duolingo. My approach is to have the most efficient grind possible - no gamification. I've found Duolingo was wasting so much of my time with exercises that did not really teach me anything and took a long time to complete + the XP points/levels etc. were quite distracting.

[1] https://vocabuo.com

  • kalido 18 hours ago

    With all the (somewhat competing, though aimed and monetized differently) products in this thread, are there any promotions in place for extensive testing and comparison?

    (E.g., your vocabuo website prominently points to possible promo codes.)

flemhans 20 hours ago

Is there a "Duolingo" that takes a web site as input and makes it into a course? So I could learn by reading e.g. a geeky news site in the language to be learned.

  • arbayi 9 hours ago

    Would be cool to have such a tool, maybe an extension using Chrome's built-in AI APIs?
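    A rough sketch of the first step of such a tool using only Python's standard library: strip the HTML, split the text into sentences, and keep the ones that contain words from your list. The regex sentence splitter is a simplification, and turning the sentences into exercises would be a separate (for example, LLM) step:

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def practice_sentences(html: str, vocab: set[str]) -> list[str]:
    """Return sentences from the page that contain at least one saved word."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.parts).split())      # normalize whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text)          # naive sentence split
    return [s for s in sentences
            if vocab & set(re.findall(r"[^\W\d_]+", s.lower()))]


page = ("<html><style>p {color: red}</style><body><p>Der Server ist abgestürzt.</p>"
        "<p>Das Wetter ist schön. Die Datenbank war leer!</p></body></html>")
print(practice_sentences(page, {"server", "datenbank"}))
```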

laurentlb a day ago

Interesting approach! Thanks for making it open-source; I think we need more open-source language learning tools. As I'm also building one (https://github.com/laurentlb/lingostories/), I'm going to take a look at what you did and the technical decisions.

You seem to focus on the English use-case. In my experience, getting exposure to other languages can be much more difficult, especially when you're not fluent yet. It would be interesting to see how to approach it: ideally, questions and answers should be in the target language, but the questions have to be very simple.

As someone else mentioned, having audio would be very useful. At some point, you could consider a hands-free mode: it reads the question out loud, pauses a few seconds, then reads the answer.
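A minimal sketch of that hands-free loop. `speak` is a placeholder that defaults to `print`; a real version would plug in a text-to-speech engine, and the pause length is an arbitrary assumption:

```python
import time
from typing import Callable, Iterable


def hands_free(cards: Iterable[tuple[str, str]],
               speak: Callable[[str], None] = print,  # placeholder for a TTS engine
               pause_s: float = 3.0) -> None:
    """Read each question aloud, pause so the learner can answer,
    then read the correct answer."""
    for question, answer in cards:
        speak(question)
        time.sleep(pause_s)
        speak(f"Answer: {answer}")


cards = [("How do you say 'time travel' in German?", "Zeitreise")]
hands_free(cards, pause_s=0.5)
```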

nikkwong 2 days ago

This looks neat. If you’re going to add Duolingo style features, please don’t add fill-in-the-blank or word matching to the question types; or at least make them optional. They are an incredibly frustrating waste of time on Duolingo—they take up a ton of time to solve and don’t actually improve comprehension. My biggest gripe with Duolingo is that half of the questions asked in a lesson are questions like these which have the pretense of helping you learn but don’t actually deliver. I think if you instead came up with some very difficult question types that really challenged someone’s comprehension, it would be stickier than Duolingo (especially for the HN crowd who is actually trying to learn) and not just here to “play a game” like a large portion of the Duolingo audience.

  • jghn a day ago

    Out of curiosity, do you have any citations showing that those exercises don't enable learning?

    On the latter part, there used to be a hard mode, at least in the browser version, that forced you to hand-type every word. I always really liked that, but then they got rid of it. Of course, with the heart system these days, I wouldn't last 5 minutes if I tried to do it that way, so such is life, I suppose.

  • arbayi a day ago

    Thank you! I am very interested in this project and want to keep working on it, hopefully getting help from open source contributors.

    I actually had this idea of using Duolingo-style exercises, but now with your comment, I realize some might not be appropriate for individual learners with different goals.

    The cool thing would be to have customizable exercise types, where users can choose which ones they want and which ones they don't want!

    I will add this to the roadmap in the README, pointing out this comment! Thanks again!
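    A minimal sketch of such a setting, assuming exercises carry a `type` field and are filtered against the learner's preferences. The exercise type names are hypothetical:

```python
from enum import Enum


class ExerciseType(Enum):
    # Hypothetical exercise types, named for illustration only.
    MULTIPLE_CHOICE = "multiple_choice"
    FILL_IN_THE_BLANK = "fill_in_the_blank"
    WORD_MATCHING = "word_matching"
    FREE_TRANSLATION = "free_translation"


def filter_exercises(exercises: list[dict], enabled: set[ExerciseType]) -> list[dict]:
    """Keep only exercises whose type the learner has enabled."""
    return [ex for ex in exercises if ExerciseType(ex["type"]) in enabled]


generated = [
    {"type": "fill_in_the_blank", "prompt": "I made a New Year's ____."},
    {"type": "free_translation", "prompt": "Translate: 'Die Zeitreise war kurz.'"},
]
# A learner who turns off blanks and matching, as suggested above:
prefs = {ExerciseType.MULTIPLE_CHOICE, ExerciseType.FREE_TRANSLATION}
print(filter_exercises(generated, prefs))
```

    Passing the enabled types into the generation prompt instead would avoid paying for exercises that are thrown away.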

trizoza 17 hours ago

This is absolutely amazing. Can I ask how you check retention of memorized words? Do you double the interval before testing a word again once it's successfully memorized: after 1 day, 2 days, 4 days, 8 days, etc.?

  • arbayi 9 hours ago

    Thank you! The app doesn't have such a feature yet, but I think it will be needed eventually, and it has been recommended before. I will add this to the roadmap.
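    For reference, a minimal sketch of the doubling schedule from the question (1, 2, 4, 8 days, and so on), with the common simplification that a wrong answer resets the interval to 1 day. This is a hypothetical design, not something the app currently does; Anki's own scheduler is more elaborate:

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Card:
    word: str
    interval_days: int = 0      # 0 = never answered correctly yet
    due: date = date.min        # new cards are due immediately

    def review(self, correct: bool, today: date) -> None:
        """Double the interval after a correct answer (starting at 1 day);
        reset to 1 day after a wrong one."""
        if correct:
            self.interval_days = max(1, self.interval_days * 2)
        else:
            self.interval_days = 1
        self.due = today + timedelta(days=self.interval_days)


def due_cards(cards: list[Card], today: date) -> list[Card]:
    """Cards whose next review date has arrived."""
    return [c for c in cards if c.due <= today]


card = Card("ephemeral")
day = date(2025, 1, 1)
for _ in range(4):
    card.review(correct=True, today=day)
    print(card.interval_days, card.due)
    day = card.due
```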

markvdb 2 days ago

Thank you for your contribution to the FOSS learning space.

Here are a few random suggestions:

- Spaced repetition. Again, Anki style.

- Audio. Can you make it easy to record a phrase, Anki style? Or maybe even have AI pronounce them correctly?

I would use something like that.

  • arbayi a day ago

    Thank you so much! I will definitely add those ideas to the roadmap in the README (pointing out this comment).

    I believe the spaced repetition feature must be prioritized because it's the most important thing in this app. I mean, what's the purpose of seeing words over and over again if I'm already confident with them?

    For the pronunciation feature, I've done similar work before, and there are great open source tools and libraries we can build upon that analyze your pronunciation and spot where you made mistakes. We can use open source TTS libraries to pronounce the correct version.

    I also would definitely want to see audio questions in exercises similar to Duolingo, and it would be great to work on those features.
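    One common way to spot where a pronunciation went wrong is to align the expected phonemes with the ones a recognizer heard, using edit distance. A minimal sketch, assuming a recognizer has already produced both phoneme lists (the transcription below is simplified, for illustration only):

```python
def align(expected: list[str], heard: list[str]) -> list[tuple[str, str, str]]:
    """Levenshtein alignment returning (op, expected, heard) steps, where op is
    'ok', 'sub', 'del' (phoneme missing) or 'ins' (extra phoneme)."""
    n, m = len(expected), len(heard)
    # dist[i][j] = edit distance between expected[:i] and heard[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if expected[i - 1] == heard[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # match / substitution
    # Walk back from the bottom-right corner to recover the operations.
    steps, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (expected[i - 1] != heard[j - 1]):
            op = "ok" if expected[i - 1] == heard[j - 1] else "sub"
            steps.append((op, expected[i - 1], heard[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            steps.append(("del", expected[i - 1], "-"))
            i -= 1
        else:
            steps.append(("ins", "-", heard[j - 1]))
            j -= 1
    return steps[::-1]


# "three" said with a /t/ instead of /θ/:
for step in align(["θ", "r", "iː"], ["t", "r", "iː"]):
    print(step)
```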

  • learning-tr a day ago

    I am learning Turkish so I built something like that for me. You can highlight any word online and it will translate colloquially so you can actually use it irl.

    It also has audio and pronunciation. It is around the halfway mark in the demo.

    demo: https://imgur.com/a/full-demo-so-far-O2fzBJn

    • arbayi 9 hours ago

      Harika görünüyor! (Looks great!) Are you planning to open-source it?

      Note: Bu benim en sevdiğim şarkılardan biridir!! (It's one of my favorite songs!)

groggo a day ago

As someone who's learning a language (French) with Duolingo and supplementing it with other methods (podcasts, social media, online chatting, talking to ChatGPT), I've also really wanted a way to get a Duolingo-type experience with my own set of vocabulary that I encounter. So I'll definitely check this out. Also, your English is impressive!

MH4GF a day ago

This is nice! (100th star I pressed!) I'd like to try it, but the setup was a pain, it would be nice to have a button to deploy immediately to Vercel and Supabase! ref: https://vercel.com/docs/deployments/deploy-button

  • arbayi 9 hours ago

    Hi!!! Thank you for being the 100th* star and for the suggestion! I've added this to the priority list.

    *thanks again!!

  • codenote a day ago

    I thought it was a great idea! I also often use the Vercel Deploy button because it allows me to try things out quickly.

getwiththeprog a day ago

From my research, the best language learning program is Anki. It is open source, and one can make custom 'decks' fairly simply. Perhaps a dictionary add-on for Anki would be a good idea?

  • krowek 21 hours ago

    I think pretty much the same, to the point where I've committed to saving every sentence I see in Duolingo to turn them into my own Anki deck. I wouldn't bother going back to Duolingo once I've finished the tree and there are no new units for me.
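    Since Anki can import plain-text files with one note per line and tab-separated fields, moving a word list into a deck can be as simple as writing such a file. A minimal sketch; the field order (front, back) and the file name are assumptions:

```python
import csv
from pathlib import Path


def export_for_anki(words: dict[str, str], path: Path) -> None:
    """Write one note per line as front<TAB>back, for Anki's text import."""
    with path.open("w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for word, meaning in words.items():
            writer.writerow([word, meaning])


export_for_anki(
    {"Zeitreise": "time travel", "abgestürzt": "crashed"},
    Path("german_words.txt"),  # hypothetical file name
)
```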

frizlab 2 days ago

Apologies if this is answered in the README, but does this support languages other than English?

  • arbayi a day ago

    Hi! I actually forgot to mention this in the README, thank you for pointing it out.

    The app would work for any language, but the definitions and exercises will be written in English. I created a list just now for German words and added the German word "Zeitreise". It generated this definition:

    <<"Zeitreise" in a German mystery series means time travel. It refers to the act of a character or characters moving through time, either to the past or the future, often as a central element of the mystery's plot.>>

    Exercises were asked in English.

    "What does "Zeitreise" mean?":

    - Time travel
    - Train journey
    - Long wait
    - Difficult puzzle

    Maybe a feature where you can choose the language would be cool. I mean, someone might prefer to learn German using German, or say Spanish using Turkish.

    Again, thank you for pointing it out. I will update the README and hopefully add a setting for the language used in definitions and exercises.
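    The exercise above maps onto a small data structure. A minimal sketch, assuming one correct option plus distractors as in the example; the class name is hypothetical, and a seeded `random.Random` keeps the option order reproducible:

```python
import random
from dataclasses import dataclass


@dataclass
class MultipleChoice:
    """Hypothetical multiple-choice exercise: one answer plus distractors."""
    question: str
    answer: str
    distractors: list[str]

    def options(self, rng: random.Random) -> list[str]:
        """Return the correct answer and the distractors in random order."""
        opts = [self.answer, *self.distractors]
        rng.shuffle(opts)
        return opts

    def grade(self, choice: str) -> bool:
        return choice == self.answer


q = MultipleChoice(
    question='What does "Zeitreise" mean?',
    answer="Time travel",
    distractors=["Train journey", "Long wait", "Difficult puzzle"],
)
opts = q.options(random.Random(42))
print(q.question, opts)
print(q.grade("Time travel"), q.grade("Long wait"))  # True False
```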

    • learning-tr a day ago

      Abi (bro), slick design!

      A feature where it supports TR -> EN and vice versa would be amazing!

inetknght 16 hours ago

Some thoughts

I'm on a Duolingo family plan, studying Ukrainian. It keeps throwing more and more difficult words without really building my knowledge and experience with the previous words.

I'm not sure if I can't hear the words correctly (it's possible, I'm partially deaf and it sounds like the voices it provides are low quality). I'm not sure if I'm not pronouncing them correctly (it often doesn't accept my pronunciations). Its feedback for improvement is extremely limited. For example, no matter how hard or slow or fast I pronounce it, it pretty much never accepts один ("oden" = "one") when I speak it.

When I was in 3rd through 6th grade, I learned English pronunciation using Spalding phonetics [0]. There are about 70 English phonograms, if I recall correctly. It would be handy to have that for other languages. It specifically taught how to put letters together to form sounds, and which letter combinations represent the same sound (but not spelling, which was a separate class based much more on memorizing rules and exceptions). I excelled in both of these classes.

I've also sometimes asked ChatGPT for translations of words. It seems semi-OK, but it's much better to ask Ukrainian friends and colleagues. Friends and colleagues don't have a lot of time or patience to teach, though. And they'd often throw in additional meaning or context that was difficult to understand (for example, English assigns grammatical gender to far fewer words).

Not too much later in life (8th grade or so), I started writing software. I was homeschooled then, and had a lot of time on my hands. So I'd write software for most of the day every day for months at a time. There came a point where I stopped thinking in English and started thinking in objects and code relationships. I didn't realize it until my mother asked me what I was doing and I had to think to translate to English.

I've heard similar anecdotes: you start to become a native speaker when you can think in that language. I want that from Duolingo but haven't yet achieved it after 2.5 years. I imagine what's missing is just as @cat_multiverse said [1]: I don't really use the Ukrainian language in my daily life and should just start doing so even if it's just a journal. But without any feedback about correct pronunciation or grammar I worry that I would end up with my own mini language instead of truly a Ukrainian one.

[0]: https://spalding.org/

[1]: https://news.ycombinator.com/item?id=42774032

  • agos 16 hours ago

    Wait, are you dictating into Duolingo? Is it available only for some languages?

    • inetknght 9 hours ago

      > wait, are you dictating into Duolingo?

      Am I dictating, in the sense of speaking free-form words and having it write them down? No.

      Duolingo will present a word and ask me to say it into the microphone.

      Duolingo will present a phrase or sentence and ask me to say it into the microphone.

      I know that Duolingo will do this also for learning German.

    • bigfishrunning 10 hours ago

      It's available for Spanish on Android (the only Duolingo combination I have experience with), and it seems to use Google's speech recognition backend.