Jlama Download - Download the Jlama source code

Jlama

v0.8.3

Jlama: A modern LLM inference engine for Java

Features

Model Support:

Gemma & Gemma 2 Models
Llama & Llama2 & Llama3 Models
Mistral & Mixtral Models
Qwen2 Models
IBM Granite Models
GPT-2 Models
BERT Models
BPE Tokenizers
WordPiece Tokenizers

Implements:

Paged Attention
Mixture of Experts
Tool Calling
Generate Embeddings
Classifier Support
Huggingface SafeTensors model and tokenizer format
Support for F32, F16, BF16 types
Support for Q8, Q4 model quantization
Fast GEMM operations
Distributed Inference!
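
To make the Q8 entry in the list above concrete, here is a minimal sketch of symmetric 8-bit quantization in Java. It illustrates the general technique only: the `Q8Sketch` class, the single shared scale, and the max-abs scaling rule are assumptions for illustration and are not taken from Jlama's implementation or file format.

```java
public class Q8Sketch {
    /** Picks a scale so the largest magnitude maps to 127 (assumed max-abs scaling). */
    public static float scaleFor(float[] values) {
        float maxAbs = 0f;
        for (float v : values) {
            maxAbs = Math.max(maxAbs, Math.abs(v));
        }
        // Avoid dividing by zero when every value is 0
        return maxAbs == 0f ? 1f : maxAbs / 127f;
    }

    /** Rounds each value to the nearest multiple of scale and stores it as a signed byte. */
    public static byte[] quantize(float[] values, float scale) {
        byte[] q = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            int rounded = Math.round(values[i] / scale);
            q[i] = (byte) Math.max(-127, Math.min(127, rounded));
        }
        return q;
    }

    /** Recovers approximate floats; the error per value is about half a scale step at most. */
    public static float[] dequantize(byte[] q, float scale) {
        float[] out = new float[q.length];
        for (int i = 0; i < q.length; i++) {
            out[i] = q[i] * scale;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] weights = {0.5f, -1.0f, 0.25f, 0.0f};
        float scale = scaleFor(weights);
        byte[] q = quantize(weights, scale);
        float[] restored = dequantize(q, scale);
        for (int i = 0; i < weights.length; i++) {
            System.out.printf("%.4f -> %d -> %.4f%n", weights[i], q[i], restored[i]);
        }
    }
}
```

Each weight shrinks from four bytes to one, at the cost of a small rounding error. Real quantized formats usually keep one scale per small block of weights rather than one per tensor, which limits that error.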

Jlama requires Java 20 or later and uses the new Vector API for faster inference.

What is it used for?

Add LLM inference directly to your Java application.

Quick Start

How to use as a local client (with JBang!)

Jlama includes a command-line tool that makes it easy to use.

The CLI can be run with JBang.

# Install jbang (or https://www.jbang.dev/download/)
curl -Ls https://sh.jbang.dev | bash -s - app setup

# Install Jlama CLI (will ask if you trust the source)
jbang app install --force jlama@tjake

Now that you have Jlama installed, you can download a model from Hugging Face and chat with it. Note that I have pre-quantized models available at https://hf.co/tjake

# Run the openai chat api and UI on a model
jlama restapi tjake/Llama-3.2-1B-Instruct-JQ4 --auto-download

Open a browser to http://localhost:8080/
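
Because `jlama restapi` serves an OpenAI-style chat API, the same server can also be called from code. The following is a minimal sketch using the JDK's built-in `java.net.http` client. The `/chat/completions` route, the JSON field names (borrowed from the OpenAI chat format), and the `JlamaRestSketch` class itself are assumptions; check the running server for its actual routes.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class JlamaRestSketch {
    // Assumed base URL (from the quick start) and assumed OpenAI-style route
    static final String BASE_URL = "http://localhost:8080";
    static final String CHAT_PATH = "/chat/completions";

    /** Escapes backslashes, quotes and newlines so text can sit inside a JSON string. */
    public static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    /** Builds a single-message chat request body in the OpenAI chat format. */
    public static String chatBody(String model, String userMessage) {
        return "{\"model\":\"" + jsonEscape(model) + "\","
                + "\"messages\":[{\"role\":\"user\",\"content\":\""
                + jsonEscape(userMessage) + "\"}]}";
    }

    /** Builds the POST request without sending it. */
    public static HttpRequest chatRequest(String model, String userMessage) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + CHAT_PATH))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(chatBody(model, userMessage)))
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        HttpRequest request = chatRequest(
                "tjake/Llama-3.2-1B-Instruct-JQ4",
                "What is the best season to plant avocados?");
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP " + response.statusCode());
            System.out.println(response.body());
        } catch (IOException e) {
            // Typically means the server is not running on localhost:8080
            System.err.println("Could not reach the server: " + e);
        }
    }
}
```

The hand-rolled JSON keeps the sketch dependency-free; in a real application a JSON library would be a safer way to build and parse the payload.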

Usage:

jlama [COMMAND]

Description:

Jlama is a modern LLM inference engine for Java !
Quantized models are maintained at https://hf.co/tjake

Choose from the available commands:

Inference:
  chat                 Interact with the specified model
  restapi              Starts a openai compatible rest api for interacting with this model
  complete             Completes a prompt using the specified model

Distributed Inference:
  cluster-coordinator  Starts a distributed rest api for a model using cluster workers
  cluster-worker       Connects to a cluster coordinator to perform distributed inference

Other:
  download             Downloads a HuggingFace model - use owner/name format
  list                 Lists local models
  quantize             Quantize the specified model

How to use in your Java project

The main purpose of Jlama is to provide a simple way to use large language models in Java.

The simplest way to embed Jlama in your app is with the LangChain4j integration.

If you would like to embed Jlama without LangChain4j, add the following Maven dependencies to your project:

<dependency>
  <groupId>com.github.tjake</groupId>
  <artifactId>jlama-core</artifactId>
  <version>${jlama.version}</version>
</dependency>

<dependency>
  <groupId>com.github.tjake</groupId>
  <artifactId>jlama-native</artifactId>
  <!-- supports linux-x86_64, macos-x86_64/aarch_64, windows-x86_64
       Use https://github.com/trustin/os-maven-plugin to detect os and arch -->
  <classifier>${os.detected.name}-${os.detected.arch}</classifier>
  <version>${jlama.version}</version>
</dependency>

Jlama uses Java 21 preview features. You can enable the features globally with:

export JDK_JAVA_OPTIONS="--add-modules jdk.incubator.vector --enable-preview"

or enable the preview features by configuring the Maven compiler and Failsafe plugins.

Then you can use the Model classes to run models:
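
An application can also verify these prerequisites at startup instead of relying on the environment being set up correctly. The sketch below is one hedged way to do that with standard JDK APIs only; the `JlamaPreflight` class and its method names are hypothetical and are not part of Jlama.

```java
public class JlamaPreflight {
    // Minimum feature release, taken from the "Java 20 or later" requirement in this document
    static final int MIN_JAVA = 20;
    // Incubator module named in the JDK_JAVA_OPTIONS example
    static final String VECTOR_MODULE = "jdk.incubator.vector";

    /** True when the given Java feature release meets the stated minimum. */
    public static boolean javaIsRecentEnough(int featureVersion) {
        return featureVersion >= MIN_JAVA;
    }

    /** True when the Vector API module was resolved at startup, e.g. via --add-modules. */
    public static boolean vectorModuleLoaded() {
        return ModuleLayer.boot().findModule(VECTOR_MODULE).isPresent();
    }

    public static void main(String[] args) {
        int feature = Runtime.version().feature();
        System.out.println("Java " + feature + " recent enough: " + javaIsRecentEnough(feature));
        System.out.println(VECTOR_MODULE + " loaded: " + vectorModuleLoaded());
    }
}
```

Running this with and without the `JDK_JAVA_OPTIONS` export shows whether the JVM picked up the `--add-modules` flag.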

// JDK imports; the Jlama classes used below (Downloader, ModelSupport, AbstractModel,
// DType, PromptContext, Generator) come from the jlama-core dependency
import java.io.File;
import java.io.IOException;
import java.util.UUID;

public void sample() throws IOException {
    String model = "tjake/Llama-3.2-1B-Instruct-JQ4";
    String workingDirectory = "./models";

    String prompt = "What is the best season to plant avocados?";

    // Downloads the model or just returns the local path if it's already downloaded
    File localModelPath = new Downloader(workingDirectory, model).huggingFaceModel();

    // Loads the quantized model and specifies the use of quantized memory
    AbstractModel m = ModelSupport.loadModel(localModelPath, DType.F32, DType.I8);

    PromptContext ctx;
    // Checks if the model supports chat prompting and adds the prompt in the expected format for this model
    if (m.promptSupport().isPresent()) {
        ctx = m.promptSupport()
                .get()
                .builder()
                .addSystemMessage("You are a helpful chatbot who writes short responses.")
                .addUserMessage(prompt)
                .build();
    } else {
        ctx = PromptContext.of(prompt);
    }

    System.out.println("Prompt: " + ctx.getPrompt() + "\n");
    // Generates a response to the prompt and prints it
    // The api allows for streaming or non-streaming responses
    // The response is generated with a temperature of 0.0 and a max token length of 256
    Generator.Response r = m.generate(UUID.randomUUID(), ctx, 0.0f, 256, (s, f) -> {});
    System.out.println(r.responseText);
}
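
The last argument to `generate` in the sample is a callback that does nothing, `(s, f) -> {}`. To stream output as it is produced, a callback that handles each piece of text can be passed instead. The sketch below assumes the callback is a `BiConsumer<String, Float>` that receives a token's text and a timing value. That type and the `TokenPrinter` helper are assumptions inferred from the shape of the lambda, not confirmed from the Jlama API.

```java
import java.util.function.BiConsumer;

/** Hypothetical helper: prints tokens as they arrive and keeps the full text. */
public class TokenPrinter implements BiConsumer<String, Float> {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void accept(String token, Float timing) {
        // Print immediately so the user sees output while generation continues
        System.out.print(token);
        text.append(token);
    }

    /** Everything received so far, concatenated in arrival order. */
    public String text() {
        return text.toString();
    }

    public static void main(String[] args) {
        TokenPrinter printer = new TokenPrinter();
        // Stand-in for a model: feeds tokens to the callback one at a time
        for (String token : new String[] {"Spring", " is", " best", "."}) {
            printer.accept(token, 0.0f);
        }
        System.out.println();
        System.out.println("Collected: " + printer.text());
    }
}
```

If the parameter type matches, the sample's call would become `m.generate(UUID.randomUUID(), ctx, 0.0f, 256, printer)`.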

Give us a star!

If you like this project or are using it to build your own, please give us a star. It's a free way to show your support.

Roadmap

Support more and more models
~~Add pure Java tokenizers~~
~~Support quantization (e.g. k-quantization)~~
Add LoRA support
GraalVM support
~~Add distributed inference~~

License and Citation

The code is available under the Apache License.

If you find this project helpful in your research, please cite this work as

@misc{jlama2024,
    title = {Jlama: A modern Java inference engine for large language models},
    url = {https://github.com/tjake/jlama},
    author = {T Jake Luciani},
    month = {January},
    year = {2024}
}

Additional information

Version: v0.8.3
Type: Other source code
Updated: 2025-02-25
Size: 3.19 MB
Source: GitHub
