What makes comprehensible input comprehensible?

(cij-analysis.streamlit.app)

29 points | by surprisetalk 4 days ago

5 comments

  • ragazzina 3 hours ago
    > Word length - At least in English and French (the languages I know best), longer words are generally considered harder.

    I think in a language with a lot of similar sounds or even homophones, longer words are easier. For a beginner Chinese learner who knows both words, hearing "chē" on its own will probably be ambiguous, but "chūzūchē" will be parsed immediately.

    • joshdavham 2 hours ago
      That’s a good point.

      I don’t think the ‘longer equals harder’ pattern holds for every language. I actually reached out to the head teacher at CIJ when I first made this analysis and she said the same.

  • joshdavham 4 hours ago
    Oh wow! I’m surprised to see someone post my analysis haha

    Happy to answer any questions here. I kept my analysis really high level for a general audience but since this is HN, we can get a bit nerdy :D

  • EdiX 2 hours ago
    I don't think this captures the whole situation. Much of what makes comprehensible input comprehensible, at lower levels, is the presence of visual hints.
    • joshdavham 2 hours ago
      That's exactly right.

      Many of the beginner videos make use of visual hints, as you say (images, props, etc.), and none of these were taken into account in my analysis.

      I do think it could be cool to do a 'visual' analysis of CI in the future where you attempt to measure how much context is present (or not) in each video and see what insights you could draw from that.

  • flippyhead 3 hours ago
    I love this. I made a totally free, just for fun, tool based around learning Japanese via Youtube using the CI approach. https://seikai.tv The trick is finding content that is at the right level but that you also find interesting. Great article, thank you!
  • joshdavham 2 hours ago
    Here's the source code for this analysis for those interested: https://github.com/joshdavham/cij-analysis

    I will note that the transcripts (and parsing scripts) are not included in the repo. The transcripts are not my intellectual property, so I can't share them (and the parsing scripts are a bit of a dumpster fire).