Phi-3-Vision-MLX
Phi-3.5-MLX
Phi-3-MLX is a versatile AI framework that leverages both the Phi-3-Vision multimodal model and the Phi-3-Mini-128K language model, optimized for Apple Silicon using the MLX framework. The project provides an easy-to-use interface for a wide range of AI tasks, from advanced text generation to visual question answering and code execution.
Phi-3-MLX is designed to run on Apple Silicon Macs. At minimum, an Apple Silicon Mac is required; on memory-constrained machines, quantization (via the quantize_model=True option) reduces memory use. For optimal performance, especially when working with larger models or datasets, a Mac with 16GB of RAM or more is recommended.
Install and launch Phi-3-MLX from the command line:
# Quick install (note: PyPI version may not always be up to date)
pip install phi-3-vision-mlx
phi3v
# For the latest version, you can install directly from the repository:
# git clone https://github.com/JosefAlbers/Phi-3-Vision-MLX.git
# cd Phi-3-Vision-MLX
# pip install -e .

To use the library in a Python script:
from phi_3_vision_mlx import generate

generate('What is shown in this image?', 'https://collectionapi.metmuseum.org/api/collection/v1/iiif/344291/725918/main-image')

# Model quantization
generate("Describe the water cycle.", quantize_model=True)

# Cache quantization
generate("Explain quantum computing.", quantize_cache=True)

# A list of prompts for batch generation
prompts = [
    "Write a haiku about spring.",
    "Explain the theory of relativity.",
    "Describe a futuristic city."
]

# Generate responses using Phi-3-Vision (multimodal model)
generate(prompts, max_tokens=100)

# Generate responses using Phi-3-Mini-128K (language-only model)
generate(prompts, max_tokens=100, blind_model=True)

from phi_3_vision_mlx import constrain
# Use constrain for structured generation (e.g., code, function calls, multiple-choice)
prompts = [
    "A 20-year-old woman presents with menorrhagia for the past several years. She says that her menses “have always been heavy”, and she has experienced easy bruising for as long as she can remember. Family history is significant for her mother, who had similar problems with bruising easily. The patient's vital signs include: heart rate 98/min, respiratory rate 14/min, temperature 36.1°C (96.9°F), and blood pressure 110/87 mm Hg. Physical examination is unremarkable. Laboratory tests show the following: platelet count 200,000/mm3, PT 12 seconds, and PTT 43 seconds. Which of the following is the most likely cause of this patient’s symptoms? A: Factor V Leiden B: Hemophilia A C: Lupus anticoagulant D: Protein C deficiency E: Von Willebrand disease",
    "A 25-year-old primigravida presents to her physician for a routine prenatal visit. She is at 34 weeks gestation, as confirmed by an ultrasound examination. She has no complaints, but notes that the new shoes she bought 2 weeks ago do not fit anymore. The course of her pregnancy has been uneventful and she has been compliant with the recommended prenatal care. Her medical history is unremarkable. She has a 15-pound weight gain since the last visit 3 weeks ago. Her vital signs are as follows: blood pressure, 148/90 mm Hg; heart rate, 88/min; respiratory rate, 16/min; and temperature, 36.6℃ (97.9℉). The blood pressure on repeat assessment 4 hours later is 151/90 mm Hg. The fetal heart rate is 151/min. The physical examination is significant for 2+ pitting edema of the lower extremity. Which of the following tests should confirm the probable condition of this patient? A: Bilirubin assessment B: Coagulation studies C: Hematocrit assessment D: Leukocyte count with differential E: 24-hour urine protein"
]
# Define constraints for the generated text
constraints = [(0, '\nThe'), (100, ' The correct answer is'), (1, 'X.')]
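The exact decoding semantics are documented in the API reference; as a loose illustration only (not the library's decoder), each (max_tokens, text) pair can be read as "decode at most max_tokens tokens, then splice in the given text". A toy sketch of that chained structure, using placeholder strings in place of model output:

```python
# Toy illustration of chained constraints (not the library's real decoder).
# Each (max_tokens, text) pair: decode up to max_tokens tokens freely,
# then force the literal text into the output.

def apply_constraints(free_segments, constraints):
    """Interleave freely generated segments with forced constraint strings."""
    out = []
    for segment, (max_tokens, forced) in zip(free_segments, constraints):
        # In a real decoder the free segment is model output capped at
        # max_tokens; here it is just a placeholder string.
        out.append(segment[:max_tokens])  # crude stand-in for a token cap
        out.append(forced)
    return ''.join(out)

constraints = [(0, '\nThe'), (100, ' The correct answer is'), (1, 'X.')]
free = ['', ' reasoning goes here.', ' ']
print(apply_constraints(free, constraints))
```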
# Apply constrained beam decoding
results = constrain(prompts, constraints, blind_model=True, quantize_model=True, use_beam=True)

from phi_3_vision_mlx import choose
# Select best option from choices for given prompts
prompts = [
    "What is the largest planet in our solar system? A: Earth B: Mars C: Jupiter D: Saturn",
    "Which element has the chemical symbol 'O'? A: Osmium B: Oxygen C: Gold D: Silver"
]
# For multiple-choice or decision-making tasks
choose(prompts, choices='ABCDE')

from phi_3_vision_mlx import train_lora, test_lora
# Train a LoRA adapter
train_lora(
    lora_layers=5,   # Number of layers to apply LoRA
    lora_rank=16,    # Rank of the LoRA adaptation
    epochs=10,       # Number of training epochs
    lr=1e-4,         # Learning rate
    warmup=0.5,      # Fraction of steps for learning rate warmup
    dataset_path="JosefAlbers/akemiH_MedQA_Reason"
)
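A LoRA adapter of rank r augments a frozen weight W with a low-rank update B @ A, so only r * (d_in + d_out) parameters are trained instead of d_in * d_out; that is why lora_rank=16 keeps the adapter small. A minimal pure-Python sketch of the idea (not the framework's internal implementation):

```python
import random

# Minimal sketch of the LoRA idea: frozen W plus a trainable low-rank
# update B @ A. Dimensions are illustrative; rank mirrors lora_rank above.
d_out, d_in, rank = 64, 64, 16

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]    # frozen
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # trainable
B = [[0.0] * rank for _ in range(d_out)]  # zero-init: adapter starts as a no-op

def lora_forward(x):
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))  # low-rank path
    return [b + u for b, u in zip(base, update)]

x = [random.gauss(0, 1) for _ in range(d_in)]
assert lora_forward(x) == matvec(W, x)  # exact no-op before training

# Trainable parameters: full fine-tune vs. rank-16 adapter
full = d_out * d_in                # 4096
lora = rank * d_in + d_out * rank  # 2048
print(f'full: {full}, LoRA: {lora}')
```

With B initialized to zero, the adapted layer is exactly the base layer at the start of training, which is the standard LoRA initialization.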
# Generate text using the trained LoRA adapter
generate("Describe the potential applications of CRISPR gene editing in medicine.",
    blind_model=True,
    quantize_model=True,
    use_adapter=True)
# Test the performance of the trained LoRA adapter
test_lora()

from phi_3_vision_mlx import Agent
# Create an instance of the Agent
agent = Agent()
# First interaction: Analyze an image
agent('Analyze this image and describe the architectural style:', 'https://images.metmuseum.org/CRDImages/rl/original/DP-19531-075.jpg')
# Second interaction: Follow-up question
agent('What historical period does this architecture likely belong to?')
# End conversation, clear memory for new interaction
agent.end()

# Ask the agent to generate and execute code to create a plot
agent('Plot a Lissajous Curve.')
# Ask the agent to modify the generated code and create a new plot
agent('Modify the code to plot 3:4 frequency')
agent.end()

# Request the agent to generate an image
agent('Draw "A perfectly red apple, 32k HDR, studio lighting"')
agent.end()
# Request the agent to convert text to speech
agent('Speak "People say nothing is impossible, but I do nothing every day."')
agent.end()

from phi_3_vision_mlx import add_text
# Define the toolchain as a string
toolchain = """
prompt = add_text(prompt)
responses = generate(prompt, images)
"""
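A toolchain is a newline-separated sequence of assignments that the Agent runs in order, feeding each step's outputs into the next. A rough sketch of how such a string could be interpreted (an assumption about the mechanism for illustration, not the Agent's actual code):

```python
# Toy interpreter for a toolchain string (illustrative only; the Agent's
# real execution logic may differ).
def run_toolchain(toolchain, tools, state):
    """Execute each 'lhs = tool(args)' line in order, sharing a namespace."""
    namespace = dict(tools, **state)
    for line in toolchain.strip().splitlines():
        exec(line, {}, namespace)  # each line reads and updates the namespace
    return namespace

# Stub tools standing in for the library's add_text and generate.
tools = {
    'add_text': lambda p: p + ' [context appended]',
    'generate': lambda p, images=None: f'response to: {p}',
}
state = {'prompt': 'How to inspect API endpoints?', 'images': None}
result = run_toolchain(
    "prompt = add_text(prompt)\nresponses = generate(prompt, images)",
    tools, state)
print(result['responses'])
```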
# Create an Agent instance with the custom toolchain
agent = Agent(toolchain, early_stop=100)
# Run the agent
agent('How to inspect API endpoints? @https://raw.githubusercontent.com/gradio-app/gradio/main/guides/08_gradio-clients-and-lite/01_getting-started-with-the-python-client.md')

from phi_3_vision_mlx import VDB
import datasets
# Simulate user input
user_input = 'Comparison of Sortino Ratio for Bitcoin and Ethereum.'
# Create a custom RAG tool
def rag(prompt, repo_id="JosefAlbers/sharegpt_python_mlx", n_topk=1):
    ds = datasets.load_dataset(repo_id, split='train')
    vdb = VDB(ds)
    context = vdb(prompt, n_topk)[0][0]
    return f'{context}\n<|end|>\n<|user|>\nPlot: {prompt}'
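VDB's job here is top-k retrieval: find the dataset entries most similar to the prompt. The retrieval step can be sketched with a minimal bag-of-words vector store (a simplification; the library's VDB works over real embeddings):

```python
import math
from collections import Counter

# Minimal top-k retrieval over bag-of-words vectors (illustration only;
# the library's VDB uses learned embeddings, not word counts).
def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def topk(query, docs, k=1):
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    'Plotting the Sortino ratio for crypto assets in matplotlib.',
    'Baking sourdough bread at home.',
]
print(topk('Comparison of Sortino Ratio for Bitcoin and Ethereum.', docs))
```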
# Define the toolchain
toolchain_plot = """
prompt = rag(prompt)
responses = generate(prompt, images)
files = execute(responses, step)
"""
# Create an Agent instance with the RAG toolchain
agent = Agent(toolchain_plot, False)
# Run the agent with the user input
_, images = agent(user_input)

# Continued from Example 2 above
agent_writer = Agent(early_stop=100)
agent_writer(f'Write a stock analysis report on: {user_input}', images)

# Create Agent with Mistral-7B-Instruct-v0.3 instead
agent = Agent(toolchain="responses, history = mistral_api(prompt, history)")
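The toolchain line above implies a contract: the tool takes a prompt plus the running conversation history and returns a response plus the updated history. A stand-in showing that shape (a hypothetical stub, not the library's mistral_api):

```python
# Illustrative stub for a chat tool with the assumed
# 'responses, history = tool(prompt, history)' contract.
def fake_chat_api(prompt, history=None):
    history = (history or []) + [('user', prompt)]
    response = f'[reply to: {prompt}]'  # a real tool would call the API here
    history.append(('assistant', response))
    return response, history

resp, hist = fake_chat_api('Write a neurology ICU admission note.')
resp, hist = fake_chat_api('DVT ppx for this patient?', hist)
print(len(hist))  # 4 entries: two user/assistant turns
```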
# Generate a neurology ICU admission note
agent('Write a neurology ICU admission note.')
# Follow-up questions (multi-turn conversation)
agent('Give me the inpatient BP goal for this patient.')
agent('DVT ppx for this patient?')
agent("Patient's prognosis?")
# End
agent.end()

from phi_3_vision_mlx import benchmark
benchmark()

| Task | Vanilla Model | Quantized Model | Quantized Cache | LoRA Adapter |
|---|---|---|---|---|
| Text Generation | 25.02 TPS | 61.01 TPS | 18.68 TPS | 24.72 TPS |
| Image Captioning | 21.29 TPS | 44.26 TPS | 5.56 TPS | 20.48 TPS |
| Batched Generation | 236.60 TPS | 149.23 TPS | 121.92 TPS | 232.78 TPS |
(Measured on an M1 Max with 64GB RAM.)
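From the table above, each configuration's effect can be summarized as a speedup over the vanilla model:

```python
# Speedups relative to the vanilla model, using the TPS figures from the
# benchmark table above.
vanilla = {'text': 25.02, 'image': 21.29, 'batch': 236.60}
quantized = {'text': 61.01, 'image': 44.26, 'batch': 149.23}

for task in vanilla:
    speedup = quantized[task] / vanilla[task]
    print(f'{task}: {speedup:.2f}x')
```

Note that model quantization roughly doubles single-prompt throughput but is slower than the vanilla model for batched generation, so the best configuration depends on the workload.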
API references and additional documentation are available at:
https://josefalbers.github.io/phi-3-vision-mlx/
Also check out the tutorial series available at:
https://medium.com/@albersj66
This project is licensed under the MIT License.