Documind

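Documind is an advanced document processing tool that uses AI to extract structured data from PDFs. It is built to handle PDF conversion, extract relevant information, and format the results as specified by customizable schemas.

This repository is built on top of Zerox - https://github.com/getomni-ai/zerox. Zerox's MIT license is included in the core folder and is also referenced in the root license file.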

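Features

Converts PDFs to images for detailed AI processing.
Uses OpenAI's API to extract and structure information.
Lets users specify extraction schemas for a variety of document formats.
Designed for flexible deployment in local or cloud environments.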

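Try the hosted version

A demo of the hosted version of Documind will be available for you to try soon. The hosted version offers a seamless experience with a fully managed API, so you can skip the setup and start extracting data right away.

For full access to the hosted service, please request access and we will set you up.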

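Requirements

Before using documind, make sure the following software dependencies are installed:

System dependencies

Ghostscript: documind relies on Ghostscript for certain PDF operations.
GraphicsMagick: required for image processing during document conversion.

Install both on your system before proceeding: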

# On macOS
brew install ghostscript graphicsmagick

# On Debian/Ubuntu
sudo apt-get update
sudo apt-get install -y ghostscript graphicsmagick

Node.js & npm

Make sure Node.js (v18+) and npm are installed on your system.

Installation

You can install documind via npm:

npm install documind

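Environment setup

documind requires a .env file to store sensitive information such as your OpenAI API key.

Create a .env file in your project directory and add the following: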

OPENAI_API_KEY=your_openai_api_key
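
A missing key is easier to diagnose if it is caught before any extraction starts. The following is a minimal sketch of such a check, not part of documind: `requireEnv` is a hypothetical helper, and it assumes the variable has already been loaded into `process.env` (for example by a .env loader or by exporting it in the shell).

```javascript
// Hypothetical helper (not part of documind): return the value of a required
// environment variable, or throw a descriptive error if it is missing or blank.
// `env` defaults to process.env and can be replaced with a plain object in tests.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

Calling `requireEnv("OPENAI_API_KEY")` at startup fails fast with a clear message instead of surfacing later as an API error.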

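Usage

Basic example

First, import documind and define your schema. The schema outlines what information documind should look for in each document. Here is a quick setup to get started.

1. Define a schema

The schema is an array of objects, where each object defines:

name: the name of the field to extract.
type: the data type (e.g. "string", "number", "array", "object").
description: a description of the field.
children (optional): for arrays and objects, defines the nested fields.

Example schema for a bank statement: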

const schema = [
  {
    name: "accountNumber",
    type: "string",
    description: "The account number of the bank statement."
  },
  {
    name: "openingBalance",
    type: "number",
    description: "The opening balance of the account."
  },
  {
    name: "transactions",
    type: "array",
    description: "List of transactions in the account.",
    children: [
      {
        name: "date",
        type: "string",
        description: "Transaction date."
      },
      {
        name: "creditAmount",
        type: "number",
        description: "Credit Amount of the transaction."
      },
      {
        name: "debitAmount",
        type: "number",
        description: "Debit Amount of the transaction."
      },
      {
        name: "description",
        type: "string",
        description: "Transaction description."
      }
    ]
  },
  {
    name: "closingBalance",
    type: "number",
    description: "The closing balance of the account."
  }
];
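
Because a schema is plain data, its shape can be checked before it is sent for extraction. The sketch below is illustrative only: `validateSchema` is a hypothetical helper, not a documind function, and it checks only the four keys described above (name, type, description, and optional children). It does not restrict type to a fixed list, since the types named above are given as examples.

```javascript
// Hypothetical helper (not part of documind): collect human-readable problems
// found in a schema array. An empty result means the basic shape looks right.
function validateSchema(fields, path = "schema") {
  if (!Array.isArray(fields)) {
    return [`${path} must be an array of field objects`];
  }
  const problems = [];
  fields.forEach((field, i) => {
    const where = `${path}[${i}]`;
    if (field === null || typeof field !== "object") {
      problems.push(`${where} must be an object`);
      return;
    }
    if (typeof field.name !== "string" || field.name === "") {
      problems.push(`${where}.name must be a non-empty string`);
    }
    if (typeof field.type !== "string" || field.type === "") {
      problems.push(`${where}.type must be a non-empty string`);
    }
    if (typeof field.description !== "string") {
      problems.push(`${where}.description must be a string`);
    }
    // Nested fields follow the same rules, so recurse into children.
    if (field.children !== undefined) {
      problems.push(...validateSchema(field.children, `${where}.children`));
    }
  });
  return problems;
}
```

Running `validateSchema(schema)` on the bank statement schema returns an empty array; a field with a missing name or description is reported with its position, including positions inside children.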

2. Run `documind`

Use documind to process a PDF by passing the file URL and the schema.

import { extract } from 'documind';

const runExtraction = async () => {
  const result = await extract({
    file: 'https://bank_statement.pdf',
    schema
  });

  console.log("Extracted Data:", result);
};

runExtraction();

Example output

Here is an example of what the extracted result looks like:

{
  "success": true,
  "pages": 1,
  "data": {
    "accountNumber": "100002345",
    "openingBalance": 3200,
    "transactions": [
      {
        "date": "2021-05-12",
        "creditAmount": null,
        "debitAmount": 100,
        "description": "transfer to Tom"
      },
      {
        "date": "2021-05-12",
        "creditAmount": 50,
        "debitAmount": null,
        "description": "For lunch the other day"
      },
      {
        "date": "2021-05-13",
        "creditAmount": 20,
        "debitAmount": null,
        "description": "Refund for voucher"
      },
      {
        "date": "2021-05-13",
        "creditAmount": null,
        "debitAmount": 750,
        "description": "May's rent"
      }
    ],
    "closingBalance": 2420
  },
  "fileName": "bank_statement.pdf"
}
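
Since the extracted values come from an AI model, a simple arithmetic cross-check on the result can be worthwhile. The sketch below assumes the data shape shown in the example output and treats null amounts as zero; `reconcile` is a hypothetical helper, not part of documind.

```javascript
// Hypothetical helper (not part of documind): check that
// openingBalance + credits - debits equals closingBalance.
function reconcile(data) {
  let credits = 0;
  let debits = 0;
  for (const tx of data.transactions ?? []) {
    credits += tx.creditAmount ?? 0; // null means no credit on this row
    debits += tx.debitAmount ?? 0;   // null means no debit on this row
  }
  const expectedClosing = data.openingBalance + credits - debits;
  return {
    credits,
    debits,
    expectedClosing,
    // Allow for small floating-point differences in currency amounts.
    matches: Math.abs(expectedClosing - data.closingBalance) < 0.005
  };
}
```

For the example output above, the credits total 70 and the debits total 850, so the expected closing balance is 3200 + 70 - 850 = 2420, which matches the extracted closingBalance.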

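Templates

Documind comes with built-in templates for extracting data from popular document types such as invoices and bank statements. These templates make it easier to get started without defining your own schema.

List available templates

You can list all available templates using the templates.list function.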

import { templates } from 'documind';

// Use a different name for the result so it does not clash with the import
const templateNames = templates.list();
console.log(templateNames); // Logs all available template names

Use a template

To use a template, pass its name to the extract function along with the file you want to extract data from. Here is an example:

import { extract } from 'documind';

const runExtraction = async () => {
  const result = await extract({
    file: 'https://bank_statement.pdf',
    template: 'bank_statement'
  });

  console.log("Extracted Data:", result);
};

runExtraction();
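
A mistyped template name is easy to catch up front by comparing it with the list of available names. The sketch below assumes that templates.list() returns an array of name strings, which the text above suggests but does not spell out; `resolveTemplate` is a hypothetical helper, not part of documind.

```javascript
// Hypothetical helper (not part of documind): return the requested template
// name if it is in the list of available names, otherwise throw an error
// that lists the valid options.
function resolveTemplate(available, wanted) {
  if (!Array.isArray(available)) {
    throw new TypeError("available must be an array of template names");
  }
  if (!available.includes(wanted)) {
    throw new Error(
      `Unknown template "${wanted}". Available templates: ${available.join(", ")}`
    );
  }
  return wanted;
}
```

Under that assumption, `resolveTemplate(templates.list(), 'bank_statement')` could be passed as the template option to extract.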

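Read the templates documentation for more details on templates and how to contribute your own.

Contributing

Contributions are welcome! Please submit a pull request with any improvements or features.

License

This project is licensed under the AGPL v3.0 license.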

Additional information

Version: v1.0.9
Type: Other source code
Updated: 2025-02-26
Size: 882.86 KB
Source: GitHub
