interpretable embeddings

❓ Question-Answering Embeddings ❓

Code for the QA-Emb paper, Crafting Interpretable Embeddings by Asking LLMs Questions.

QA-Emb builds an interpretable embedding by asking a series of yes/no questions to a pre-trained autoregressive LLM.
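To make the idea concrete, here is a minimal sketch of how a list of yes/no questions becomes an embedding: one dimension per question, set to 1 when the answer is "yes". The `qa_embed` helper and the `toy_answer` function are hypothetical. `toy_answer` is a keyword-matching stand-in for the LLM so that the example runs without a model; it is not how QA-Emb or imodelsx answers questions.

```python
def qa_embed(texts, questions, answer_fn):
    """Return one binary vector per text; entry j is 1 if question j is answered 'yes'."""
    return [[1 if answer_fn(q, t) else 0 for q in questions] for t in texts]


# Hypothetical stand-in for the LLM: a question is answered "yes"
# if any of its keywords appears in the text.
KEYWORDS = {
    'Does the input mention laughter?': ('giggl', 'laugh'),
    'Does the input contain a first-person pronoun?': (' i ', ' me ', ' my '),
}


def toy_answer(question, text):
    padded = f' {text.lower()} '  # pad so whole-word keywords match at the edges
    return any(keyword in padded for keyword in KEYWORDS[question])


questions = list(KEYWORDS)
texts = [
    'the kids were giggling about the silly things they did',
    'i was walking to school and then i saw a cat',
]
embeddings = qa_embed(texts, questions, toy_answer)
print(embeddings)  # [[1, 0], [0, 1]]
```

Swapping `toy_answer` for a function that prompts an LLM and parses a yes/no reply would give the same kind of embedding, at the cost of one model query per question and text.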

Quickstart

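If you just want to use QA-Emb in your own application, the easiest way is through the imodelsx package. To install, just run pip install imodelsx.

Then, you can generate your own interpretable embeddings by coming up with questions for your domain: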

from imodelsx import QAEmb
import pandas as pd

questions = [
    'Is the input related to food preparation?',
    'Does the input mention laughter?',
    'Is there an expression of surprise?',
    'Is there a depiction of a routine or habit?',
    'Does the sentence contain stuttering?',
    'Does the input contain a first-person pronoun?',
]
examples = [
    'i sliced some cucumbers and then moved on to what was next',
    'the kids were giggling about the silly things they did',
    'and i was like whoa that was unexpected',
    'walked down the path like i always did',
    'um no um then it was all clear',
    'i was walking to school and then i saw a cat',
]

checkpoint = 'meta-llama/Meta-Llama-3-8B-Instruct'

embedder = QAEmb(
    questions=questions, checkpoint=checkpoint, use_cache=False)
embeddings = embedder(examples)

df = pd.DataFrame(embeddings.astype(int), columns=[
    q.split()[-1] for q in questions])
df.index = examples
df.columns.name = 'Question (abbreviated)'
display(df.style.background_gradient(axis=None))
--------DISPLAYS ANSWER FOR EACH QUESTION IN EMBEDDING--------
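Because every column corresponds to one question, an embedding row can be read back as the list of questions answered "yes". The sketch below shows the column abbreviation used in the quickstart (the last word of each question) and a hypothetical `active_questions` helper. The `row` values are hand-written placeholders for illustration, not output from any model.

```python
questions = [
    'Is the input related to food preparation?',
    'Does the input mention laughter?',
    'Is there an expression of surprise?',
    'Is there a depiction of a routine or habit?',
    'Does the sentence contain stuttering?',
    'Does the input contain a first-person pronoun?',
]

# Same abbreviation as in the quickstart: keep the last word of each question.
labels = [q.split()[-1] for q in questions]
print(labels)
# ['preparation?', 'laughter?', 'surprise?', 'habit?', 'stuttering?', 'pronoun?']


def active_questions(embedding_row, questions):
    """Hypothetical helper: return the questions whose embedding entry is nonzero."""
    return [q for q, value in zip(questions, embedding_row) if value]


# Placeholder row (NOT model output): only the second question is marked "yes".
row = [0, 1, 0, 0, 0, 0]
print(active_questions(row, questions))  # ['Does the input mention laughter?']
```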

Dataset set up

Directions for installing the datasets required for reproducing the fMRI experiments in the paper.

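- Run python experiments/00_load_dataset.py to load the dataset
  - This creates a data directory under wherever you run it
- Set neuro1.config.root_dir to where you want to store the data
- To make flatmaps, point pycortex at {root_dir}/ds003020/derivative/pycortex-db/
- To run eng1000, grab the em_data directory from here and move its contents to {root_dir}/em_data
- Loading responses
  - neuro1.data.response_utils function load_response
  - Loads responses from {root_dir}/ds003020/derivative/preprocessed_data/{subject}, where they are stored in an h5 file for each story, e.g. wheretheressmoke.h5
- Loading stimulus
  - neuro1.features.stim_utils function load_story_wordseqs
  - Loads TextGrids from {root_dir}/ds003020/derivative/TextGrids, where each story has a TextGrid file, e.g. wheretheressmoke.textgrid
  - Uses {root_dir}/ds003020/derivative/respdict.json to get the length of each story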
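The directory layout described above can be summarized in a short sketch. The `dataset_paths` and `story_length` helpers are hypothetical and are not part of the neuro1 package; they only assemble the paths named in this section. `story_length` additionally assumes that respdict.json maps each story name directly to its length, which is not confirmed here.

```python
import json
from pathlib import Path


def dataset_paths(root_dir, subject, story):
    """Hypothetical helper: collect the paths this section refers to."""
    derivative = Path(root_dir) / 'ds003020' / 'derivative'
    return {
        'response': derivative / 'preprocessed_data' / subject / f'{story}.h5',
        'textgrid_dir': derivative / 'TextGrids',
        'respdict': derivative / 'respdict.json',
        'pycortex_db': derivative / 'pycortex-db',
        'em_data': Path(root_dir) / 'em_data',
    }


def story_length(respdict_path, story):
    """Assumes respdict.json is a mapping {story_name: length}."""
    with open(respdict_path) as f:
        return json.load(f)[story]


# '/data' is a placeholder root_dir; UTS03 and wheretheressmoke come from this README.
paths = dataset_paths('/data', 'UTS03', 'wheretheressmoke')
for name, path in paths.items():
    print(f'{name}: {path}')
```

Checking that these paths exist before fitting a model is a quick way to confirm that root_dir is set correctly.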

Code install

Directions for installing the code here as a package for full development.

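- From the repo directory, start with pip install -e . to locally install the neuro1 package
- python 01_fit_encoding.py --subject UTS03 --feature eng1000
  - The other optional parameters that encoding.py takes, such as sessions, ndelays, and single_alpha, allow the user to change the amount of data and the regularization aspects of the linear regression used.
  - This function will then save model performance metrics and model weights as numpy arrays.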
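For readers unfamiliar with this kind of encoding model, the sketch below shows one common way that time-delayed features and a regularized linear regression fit together. It is not the repo's implementation: `make_delayed` and `fit_ridge` are hypothetical helpers, the choice of delays 1..ndelays is an assumption, a single `alpha` shared across all outputs is only one plausible reading of single_alpha, and the data are synthetic.

```python
import numpy as np


def make_delayed(features, ndelays):
    """Stack copies of the features shifted by 1..ndelays timepoints (zero-padded)."""
    n_time, n_feat = features.shape
    delayed = np.zeros((n_time, n_feat * ndelays))
    for d in range(1, ndelays + 1):
        delayed[d:, (d - 1) * n_feat:d * n_feat] = features[:n_time - d]
    return delayed


def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_feat = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_feat), X.T @ Y)


rng = np.random.default_rng(0)
features = rng.normal(size=(200, 3))       # synthetic features (time x features)
X = make_delayed(features, ndelays=4)      # shape (200, 12)
true_W = rng.normal(size=(X.shape[1], 2))
Y = X @ true_W                             # synthetic noiseless responses (time x outputs)

W = fit_ridge(X, Y, alpha=1e-6)            # one alpha shared by all outputs
np.save('weights_sketch.npy', W)           # weights saved as a numpy array
print(X.shape, W.shape)
```

A larger `alpha` shrinks the weights more strongly, and more delays widen the time window over which past features may influence the response.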

Citation

@misc{benara2024crafting,
      title={Crafting Interpretable Embeddings by Asking LLMs Questions},
      author={Vinamra Benara and Chandan Singh and John X. Morris and Richard Antonello and Ion Stoica and Alexander G. Huth and Jianfeng Gao},
      year={2024},
      eprint={2405.16714},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}