ARBML is a group of researchers working on democratizing Arabic NLP research and development:

  • 🙋‍♀️ All about Arabic NLP and ML, open source for the win!
  • 🏵️ Contribution guidelines: open an issue and, once given the go-ahead, submit a PR.
  • 👩‍💻 Some repos have specific contribution guidelines.
  • 📝 Remember to cite if you use one of our resources.

Pinned

  1. ARBML Public

    Implementations of many Arabic NLP and CV projects, offering a real-time experience through interfaces such as web, command line, and notebooks.

    JavaScript, 422 stars, 48 forks

  2. klaam Public

    Arabic speech recognition, classification and text-to-speech.

    Jupyter Notebook, 421 stars, 85 forks

  3. masader Public

    The largest public catalogue of Arabic NLP and speech datasets. It lists 500+ datasets annotated with more than 25 attributes.

    JavaScript, 193 stars, 35 forks

  4. Calliar Public

    A dataset for online Arabic calligraphy. A collection of 2500 annotated calligraphic styles.

    Jupyter Notebook, 153 stars, 20 forks

  5. tkseem Public

    Arabic Tokenization Library. It provides many tokenization algorithms.

    Jupyter Notebook, 110 stars, 21 forks

  6. CIDAR Public

    Instruction dataset for Arabic with 10,000 instruction and output pairs. CIDAR can be used to fine-tune LLMs to follow instructions.

    Jupyter Notebook, 43 stars, 8 forks

Repositories

Showing 10 of 30 repositories
  • masader Public

    The largest public catalogue of Arabic NLP and speech datasets. It lists 500+ datasets annotated with more than 25 attributes.

    JavaScript, 193 stars, GPL-3.0 license, 35 forks, 1 open issue, 1 open pull request, updated Jan 30, 2026
  • masader-webservice
    Python, 5 stars, MIT license, 7 forks, 2 open issues, 1 open pull request, updated Oct 13, 2025
  • CIDAR Public

    Instruction dataset for Arabic with 10,000 instruction and output pairs. CIDAR can be used to fine-tune LLMs to follow instructions.

    Jupyter Notebook, 43 stars, Apache-2.0 license, 8 forks, 0 open issues, 0 open pull requests, updated Apr 3, 2025
  • Calliar Public

    A dataset for online Arabic calligraphy. A collection of 2500 annotated calligraphic styles.

    Jupyter Notebook, 153 stars, MIT license, 20 forks, 2 open issues, 0 open pull requests, updated Jun 24, 2024
  • dar Public

    A simple semi-supervised approach for creating Hugging Face dataset loading scripts and uploading them to the Hub.

    Python, 11 stars, Apache-2.0 license, 2 forks, 1 open issue, 0 open pull requests, updated Jun 23, 2024
  • arbml.github.io
    HTML, 0 stars, 2 forks, 1 open issue, 0 open pull requests, updated May 10, 2024
  • .github Public
    1 star, 1 fork, 0 open issues, 0 open pull requests, updated Apr 13, 2024
  • CIDAR-v2 Public
    Jupyter Notebook, 6 stars, 2 forks, 3 open issues, 0 open pull requests, updated Mar 30, 2024
  • cidar_human_eval
    Python, 1 star, 1 fork, 1 open issue, 0 open pull requests, updated Mar 3, 2024
  • ARBML Public

    Implementations of many Arabic NLP and CV projects, offering a real-time experience through interfaces such as web, command line, and notebooks.

    JavaScript, 422 stars, MIT license, 48 forks, 10 open issues (4 need help), 0 open pull requests, updated Mar 1, 2024