Jlama

v0.8.3

Jlama: A modern LLM inference engine for Java

Features

Model support:

Gemma & Gemma 2 models
Llama & Llama2 & Llama3 models
Mistral & Mixtral models
Qwen2 models
IBM Granite models
GPT-2 models
BERT models
BPE tokenizers
WordPiece tokenizers

Implements:

Paged attention
Mixture of experts
Tool calling
Embedding generation
Classifier support
Huggingface SafeTensors model and tokenizer format
Support for F32, F16, BF16 types
Support for Q8, Q4 model quantization
Fast GEMM operations
Distributed inference!

Jlama requires Java 20 or later and uses the new Vector API for faster inference.

What is it used for?

Add LLM inference directly to your Java application.

Quick Start

How to use as a local client (with jbang!)

Jlama includes a command line tool that makes it easy to use.

The CLI can be run with jbang.

# Install jbang (or https://www.jbang.dev/download/)
curl -Ls https://sh.jbang.dev | bash -s - app setup

# Install Jlama CLI (will ask if you trust the source)
jbang app install --force jlama@tjake

Now that you have Jlama installed, you can download a model from Huggingface and chat with it. Note that pre-quantized models are available at https://hf.co/tjake

# Run the openai chat api and UI on a model
jlama restapi tjake/Llama-3.2-1B-Instruct-JQ4 --auto-download

Open a browser to http://localhost:8080/
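
Because the `restapi` command is described as OpenAI compatible, the server can presumably also be called from code instead of the browser UI. The following is a minimal sketch using the JDK's `java.net.http` client. The `/chat/completions` path and the request fields are assumptions based on the general shape of the OpenAI chat API, not something this page confirms, and the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class JlamaRestClientSketch {

    // ASSUMPTION: the endpoint path follows the OpenAI chat API shape; adjust if your server differs.
    static final String CHAT_PATH = "/chat/completions";

    // Escapes the characters that must be escaped inside a JSON string literal.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds a single-message chat request body (ASSUMPTION: OpenAI-style "model" and "messages" fields).
    static String chatBody(String model, String userMessage) {
        return "{\"model\":\"" + jsonEscape(model) + "\","
             + "\"messages\":[{\"role\":\"user\",\"content\":\"" + jsonEscape(userMessage) + "\"}]}";
    }

    // Builds the POST request without sending it.
    static HttpRequest chatRequest(String baseUrl, String model, String userMessage) {
        return HttpRequest.newBuilder(URI.create(baseUrl + CHAT_PATH))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(chatBody(model, userMessage), StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Requires the server started by `jlama restapi ...` to be listening on localhost:8080.
        HttpRequest request = chatRequest(
                "http://localhost:8080",
                "tjake/Llama-3.2-1B-Instruct-JQ4",
                "What is the best season to plant avocados?");
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

The request is built separately from being sent so the body and URL can be inspected before the server is running.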

Usage:

jlama [COMMAND]

Description:

Jlama is a modern LLM inference engine for Java !
Quantized models are maintained at https://hf.co/tjake

Choose from the available commands:

Inference:
  chat                 Interact with the specified model
  restapi              Starts a openai compatible rest api for interacting with this model
  complete             Completes a prompt using the specified model

Distributed Inference:
  cluster-coordinator  Starts a distributed rest api for a model using cluster workers
  cluster-worker       Connects to a cluster coordinator to perform distributed inference

Other:
  download             Downloads a HuggingFace model - use owner/name format
  list                 Lists local models
  quantize             Quantize the specified model

How to use in your Java project

The main purpose of Jlama is to provide a simple way to use large language models in Java.

The simplest way to embed Jlama in your app is with the Langchain4j integration.

If you would like to embed Jlama without Langchain4j, add the following Maven dependencies to your project:

<dependency>
  <groupId>com.github.tjake</groupId>
  <artifactId>jlama-core</artifactId>
  <version>${jlama.version}</version>
</dependency>

<dependency>
  <groupId>com.github.tjake</groupId>
  <artifactId>jlama-native</artifactId>
  <!-- supports linux-x86_64, macos-x86_64/aarch_64, windows-x86_64
       Use https://github.com/trustin/os-maven-plugin to detect os and arch -->
  <classifier>${os.detected.name}-${os.detected.arch}</classifier>
  <version>${jlama.version}</version>
</dependency>

Jlama uses Java 21 preview features. You can enable the features globally with:

export JDK_JAVA_OPTIONS="--add-modules jdk.incubator.vector --enable-preview"

Or enable the preview features by configuring the Maven compiler and failsafe plugins.
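
To confirm that the flags actually reached the JVM, a small preflight check can help. This is a sketch, not part of Jlama: the class and method names are hypothetical, the minimum version of 20 is taken from the requirement stated earlier on this page, and the `JDK_JAVA_OPTIONS` check is a plain substring test on the environment variable.

```java
class JlamaPreflightSketch {

    // True when the running JVM's feature version meets the required minimum.
    static boolean javaVersionOk(int runningFeature, int requiredFeature) {
        return runningFeature >= requiredFeature;
    }

    // True when the incubating Vector API module was resolved at JVM startup,
    // which is what --add-modules jdk.incubator.vector is meant to achieve.
    static boolean vectorModulePresent() {
        return ModuleLayer.boot().findModule("jdk.incubator.vector").isPresent();
    }

    // True when the given options text mentions both flags.
    // This is a substring check only; it does not parse the options.
    static boolean optionsLookComplete(String options) {
        if (options == null) {
            return false;
        }
        return options.contains("--enable-preview")
            && options.contains("jdk.incubator.vector");
    }

    public static void main(String[] args) {
        int feature = Runtime.version().feature();
        System.out.println("Java feature version: " + feature
                + (javaVersionOk(feature, 20) ? " (ok)" : " (too old)"));
        System.out.println("jdk.incubator.vector resolved: " + vectorModulePresent());
        System.out.println("JDK_JAVA_OPTIONS mentions both flags: "
                + optionsLookComplete(System.getenv("JDK_JAVA_OPTIONS")));
    }
}
```

If the module line prints `false`, the `--add-modules` flag did not take effect for that JVM launch.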

Then you can use the Model classes to run models:

import java.io.File;
import java.io.IOException;
import java.util.UUID;
// The Jlama classes used below (Downloader, ModelSupport, AbstractModel, DType,
// PromptContext, Generator) come from the jlama-core dependency and must be imported as well.

public void sample() throws IOException {
    String model = "tjake/Llama-3.2-1B-Instruct-JQ4";
    String workingDirectory = "./models";

    String prompt = "What is the best season to plant avocados?";

    // Downloads the model or just returns the local path if it's already downloaded
    File localModelPath = new Downloader(workingDirectory, model).huggingFaceModel();

    // Loads the quantized model and specifies the use of quantized memory
    AbstractModel m = ModelSupport.loadModel(localModelPath, DType.F32, DType.I8);

    PromptContext ctx;
    // Checks if the model supports chat prompting and adds the prompt in the expected format for this model
    if (m.promptSupport().isPresent()) {
        ctx = m.promptSupport()
                .get()
                .builder()
                .addSystemMessage("You are a helpful chatbot who writes short responses.")
                .addUserMessage(prompt)
                .build();
    } else {
        ctx = PromptContext.of(prompt);
    }

    System.out.println("Prompt: " + ctx.getPrompt() + "\n");
    // Generates a response to the prompt and prints it
    // The api allows for streaming or non-streaming responses
    // The response is generated with a temperature of 0.0 (the value passed below) and a max token length of 256
    Generator.Response r = m.generate(UUID.randomUUID(), ctx, 0.0f, 256, (s, f) -> {});
    System.out.println(r.responseText);
}
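
The sample above passes an empty callback, `(s, f) -> {}`, as the last argument to `generate`. For streaming, that callback can do real work. The sketch below assumes the callback has the shape `BiConsumer<String, Float>` (inferred from the lambda, not confirmed by this page) and treats the second argument as an opaque per-token number, since its meaning is not stated here. The class name and the simulated tokens are hypothetical, so the example runs without a model.

```java
import java.util.function.BiConsumer;

// Collects streamed tokens into a full response while counting them.
class TokenCollectorSketch implements BiConsumer<String, Float> {
    private final StringBuilder text = new StringBuilder();
    private int tokenCount = 0;
    private float valueSum = 0f;

    @Override
    public void accept(String token, Float value) {
        text.append(token);
        tokenCount++;
        // ASSUMPTION: the second argument is a per-token number; it is only summed here.
        if (value != null) {
            valueSum += value;
        }
    }

    String text() { return text.toString(); }
    int tokenCount() { return tokenCount; }
    float valueSum() { return valueSum; }

    public static void main(String[] args) {
        TokenCollectorSketch collector = new TokenCollectorSketch();
        // Chain a second consumer that prints each token as soon as it arrives.
        BiConsumer<String, Float> callback = collector.andThen((token, value) -> {
            System.out.print(token);
            System.out.flush();
        });

        // Simulated token stream standing in for a real model.
        String[] fakeTokens = {"Spring", " is", " a", " common", " choice", "."};
        for (String t : fakeTokens) {
            callback.accept(t, 1.5f);
        }
        System.out.println();
        System.out.println(collector.tokenCount() + " tokens collected");
    }
}
```

If the assumed callback type matches your Jlama version, `callback` could be passed in place of `(s, f) -> {}` so tokens print as they are generated while the collector keeps the full text.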

Give us a star!

If you like this project or are using it to build your own, please give us a star. It's a free way to show your support.

Roadmap

Support more and more models
~~Add pure Java tokenizers~~
~~Support quantization (e.g. k-quantization)~~
Add LoRA support
GraalVM support
~~Add distributed inference~~

License and Citation

The code is available under the Apache License.

If you find this project helpful in your research, please cite this work

@misc{jlama2024,
    title = {Jlama: A modern Java inference engine for large language models},
    url = {https://github.com/tjake/jlama},
    author = {T Jake Luciani},
    month = {January},
    year = {2024}
}

Additional information

Version: v0.8.3
Type: Other source code
Updated: 2025-02-25
Size: 3.19MB
Source: GitHub
