[NB:此存储库是评估CHATGPT的相同推理任务的后续。]
大型语言模型(LLM),例如GPT-4,Chatgpt和Claude表现出了令人印象深刻的编程能力,并且能够解决各种语言及其分类法的问题。尽管取得了这些成功,但在这些模型表现出对这些语言基础的句法和操作规则的任何基本欣赏程度上,某些怀疑态度仍然存在。
在此及时工程存储库中,使用任意域特异性语言(DSL)探索GPT-4的编程能力。 DSL代表了研究LLMS的推理能力的有吸引力的底物,因为它们是新颖的,并且在训练中遇到广泛遇到和记忆的可能性较小。因此,它们可以更直接地测试LLM可以以几种方式推断出新型编程语言规则的程度。
在这里,选择了特定于域的语言SIL(对称集成语言),原因有两个。首先,GPT-4在培训过程中暴露于任何大量的SIL代码,因为这是由技术繁重的对冲基金(称为Symmetry Investments)开发的内部DSL。其次,作为一种编程语言,该模型具有一些有趣的功能(例如,这是一种强调表达性的功能性语言,但缺乏像Haskell或Ocaml中的let ”。

在一些示例代码提示之后,GPT-4尝试在一种称为“ SIL”的小说,功能性DSL中编写代码。
以下是一些提示的集合,其中包括SIL代码的简短示例,这些示例突出了其功能。在提示GPT-4执行任务并提供SIL代码样本之后(请参见下文;在此回购中也可以找到提示和生成的代码样本的完整库),我要求它在SIL中实现许多主流编程任务。
在下面的一节中,我显示了一些示例SIL代码脚本,并在此提示了该模型(可以在此处找到完整的示例)及其在SIL中实现各种问题的尝试。
第一个提示是处理一些邮件服务器功能的脚本。因此,它与我随后会提示chatgpt的问题类型有很大不同,但它确实说明了SIL(例如|> )的某些语法,数据结构和功能方面。
// example of using IMAP IDLE to run rules on new mail
import imap
moveMessages(session,ids,target) => if (ids.length > 0 ) then imap.moveUIDs(session,ids,target) else false
login = imap.ImapLogin(environment( " IMAP_USER " ),environment( " IMAP_PASS " ))
server = imap.ImapServer( " imap.fastmail.com " , " 993 " )
session = imap.Session(server,login, true ,imap.Options(debugMode: true )) | > imap.openConnection | > imap.login
rules=[ [
[ " INBOX/0receipts " ,
[
" FROM [email protected] " ,
" FROM interactivebrokers.com " ,
]],
[ " Junk " ,
[
" FROM Tapatalk " ,
]],
[ " INBOX/newsletters " ,
[
" FROM [email protected] " ,
" HEADER X-mailer mailgun " ,
" HEADER X-mailer WPMailSMTP/Mailer/mailgun 2.4.0 " ,
" HEADER X-mailer nlserver " ,
" FROM hbr.org " ,
" FROM elliottwave.com " ,
" OR FROM cio.com FROM cio.co.uk " ,
" FROM substack.com " ,
" FROM eaglealpha.com " ,
" FROM haaretz.com " ,
" FROM gavekal.com " ,
" FROM go.weka.io " ,
" FROM marketing.weka.io " ,
` HEADER list-unsubscribe "" ` ,
` HEADER list-Id "" ` ,
` HEADER list-Post "" ` ,
` HEADER list-owner"" ` ,
` HEADER Precedence bulk ` ,
` HEADER Precedence list ` ,
` HEADER list-bounces "" ` ,
` HEADER list-help "" ` ,
` HEADER List-Unsubscribe "" ` ,
" FROM no-reply " ,
]],
[ " INBOX/notifications " ,
[
` KEYWORD "$IsNotification" ` ,
" FROM [email protected] " ,
" FROM [email protected] " ,
" FROM skillcast.com " ,
" FROM reedmac.co.uk " ,
" FROM [email protected] " ,
" FROM [email protected] " ,
" FROM [email protected] " ,
]],
]
runRules(Session,Rules) => Rules
| > map(target => [target[ 0 ],(target[ 1 ] | > map(term => imap.search(Session,term).ids))])
| > mapa(set => moveMessages(Session,set[ 1 ] | > join,set[ 0 ]))
runRulesBox(Session,Rules,Mailbox) => {
imap.select(Session,Mailbox)
in runRules (Session,Rules)
}
inboxes=[ " INBOX " ]
result = inboxes | > mapa(inbox => runRulesBox(session,rules,imap.Mailbox(session,inbox)))
print(result)
import parallel;
threadFunction(x) => {
imap.idle(session)
in inboxes | > mapa(inbox => runRulesBox(session,rules,imap.Mailbox(session,inbox)))
}
parallel.runEvents((x) => false ,[threadFunction])第二个示例代码提示类似地旨在突出该DSL的某些功能,并引入了一些新的标准库功能,例如iota和fold 。
import imap
import imap_config
import string
// Get the configuration from the environment and command line.
config = imap_config.getConfig(commandLineArguments)
// -------------------------------------------------------------------------------------------------
// Some helper functions.
//
// Firstly, a function to join an array of strings.
joinFields(flds, sep) => {
len(flds) > 0 | > enforce( " Cannot join an empty array. " )
in fold (flds[ 1 :$], (str, fld) => str ~ sep ~ fld, flds[ 0 ])
}
// Secondly, a field formatter which strips the field prefix and pads to a fixed width.
// E.g., ("From: [email protected]" |> fmtField(20)) == "[email protected] "
fmtField(field, width) => {
pad(str) => iota(width - len(str)) | > fold((a, i) => a ~ " " , str)
in field
| > string .split( " : " )[ 1 :$]
| > joinFields( " : " )
| > pad
}
// And thirdly, a function which concatenates the headers into a formatted string.
fmtHeaders(outStr, headers) => {
outStr ~ " " ~ joinFields(headers, " | " ) ~ " n "
}
// -------------------------------------------------------------------------------------------------
// Connect to the inbox.
creds = imap.ImapLogin(config.user, config.pass)
server = imap.ImapServer(config.host, config.port)
session =
imap.Session(server, creds)
| > imap.openConnection()
| > imap.login()
inbox = imap.Mailbox(session, " INBOX " )
// Get the number of messages in the inbox.
msgCount = imap.status(session, inbox).messages
// Select the default inbox.
inbox | > imap.examine(session, _)
// Get the headers (date, from and subject) for each message, from oldest to newest, format and
// print them.
headers =
iota(msgCount)
| > map(id => " # " ~ toString(id + 1 ))
| > map(id =>
imap.fetchFields(session, id, " date from subject " ).lines
| > map(hdr => fmtField(hdr, 40 )))
| > fold(fmtHeaders, " INBOX: n " )
print(headers)第三个代码示例进一步说明了该DSL的一些异常功能,其目的是接下来将在其自己的实现中使用这些功能。
// This script will create a report based on a specific example set of automated 'support' emails.
// E.g.,
//
// "support.mbox": 17 messages 17 new
// N 1 [email protected] Mon May 11 22:26 28/1369 "Alert: new issue 123"
// N 2 [email protected] Tue May 12 12:20 22/933 "Notification: service change"
// N 3 [email protected] Tue May 12 12:36 26/1341 "Alert: new issue 124"
// N 4 [email protected] Wed May 13 02:13 21/921 "Resolution: issue 124"
// N 5 [email protected] Wed May 13 18:53 26/1332 "Email not from robot."
// N 6 [email protected] Thu May 14 03:13 27/1339 "Alert: new issue 125"
// N 7 [email protected] Thu May 14 08:46 26/1270 "Resolution: issue 123"
// N 8 [email protected] Thu May 14 17:06 25/1249 "Alert: new issue 126"
// N 9 [email protected] Fri May 15 09:46 24/1185 "Resolution: issue 126"
// N 10 [email protected] Fri May 15 12:33 23/1052 "Alert: new issue 127"
// N 11 [email protected] Fri May 15 15:20 27/1331 "Notification: service change"
// N 12 [email protected] Fri May 15 18:06 23/953 "Resolution: issue 127"
// N 13 [email protected] Mon May 18 12:46 27/1218 "Alert: new issue 128"
// N 14 [email protected] Mon May 18 15:33 32/1628 "Alert: new issue 129"
// N 15 [email protected] Tue May 19 05:26 25/1176 "Resolution: issue 128"
// N 16 [email protected] Tue May 19 08:13 26/1312 "Notification: service change"
// N 17 [email protected] Tue May 19 11:00 28/1275 "Alert: new issue 130"
//
//
// Each of these automated emails are from `robot` _except_ for message 5. Messages 2, 8 and 16 are
// from `robot` but are unrelated to issues.
//
// This script will search for emails and match new issue numbers with resolutions to report the
// number of outstanding alerts.
import imap
import * from imap.query
import imap_config
import dates
import string
// Get the configuration from the environment and command line.
config = imap_config.getConfig(commandLineArguments)
// Connect to the inbox.
creds = imap.ImapLogin(config.user, config.pass)
server = imap.ImapServer(config.host, config.port)
session =
imap.Session(server, creds)
| > imap.openConnection()
| > imap.login()
inbox = imap.Mailbox(session, " support " )
// Select the default inbox.
inbox | > imap.examine(session, _)
// These criteria are common for both our searches.
commonCrit = imap.Query()
| > and(from( ` [email protected] ` ))
| > and(sentSince(dates. Date ( 2020 , 5 , 13 )))
// Get each of the alerts and resolutions from the past week (13-19 May 2020).
alertMsgIds =
imap.search(session, imap.Query(subject( " Alert: new issue " )) | > and(commonCrit)).ids
resolutionMsgIds =
imap.search(session, imap.Query(subject( " Resolution: issue " )) | > and(commonCrit)).ids
// A function to get the alert ID from a message subject.
getAlertId(msgId) => {
imap.fetchFields(session, toString (msgId), " subject " ).lines[ 0 ]
| > string .split()[$ - 1 ]
}
// A function to remove an entry from a table whether it's there or not.
removeIfExists(tbl, key) => {
if find( keys (tbl), key) == [] then
tbl
else
removeEntry(tbl, key)
}
// Now find those alerts which have no resolution. Firstly the subject for each alert, get the
// issue number end and store it in a table.
allAlertTable = alertMsgIds | > fold((tbl, msgId) => addEntry(tbl, getAlertId(msgId), msgId), {})
// Go through the resolutions and remove their corresponding alerts from the table.
unresolvedAlertTable =
resolutionMsgIds | > fold((tbl, msgId) => removeIfExists(tbl, getAlertId(msgId)), allAlertTable)
// Create a report with the date of the unresolved alerts.
report =
keys (unresolvedAlertTable)
| > map(alertId => {
msgId = unresolvedAlertTable[alertId] | > toString
in [ alertId
, imap.fetchFields(session, msgId, " date " ).lines[ 0 ]
, imap.fetchText(session, msgId).lines[ 0 ]
]
})
| > fold((outStr, tuple) => {
outStr ~ " Issue: " ~ tuple[ 0 ] ~ " n " ~ tuple[ 1 ] ~ " n Summary: " ~ tuple[ 2 ] ~ " n "
}, " n UNRESOLVED ISSUES FROM THE PAST 10 DAYS: nn " )
// Close the session.
imap.closeConnection(session)
// Print the report last.
print(report)作为第一次练习,我要求模型在SIL中实现一个非常简单的关系数据库,具有基本功能(请注意, //表示SIL中的评论)。
import imap
import imap_config
import string
// Get the configuration from the environment and command line.
config = imap_config.getConfig(commandLineArguments)
// Connect to the server.
creds = imap.ImapLogin(config.user, config.pass)
server = imap.ImapServer(config.host, config.port)
session =
imap.Session(server, creds)
| > imap.openConnection()
| > imap.login()
// -------------------------------------------------------------------------------------------------
// Here's an attempt to create something resembling a simple relational database.
// Define two tables, 'Employees' and 'Departments'.
Employees = { " 1 " : [ " Alice " , " Engineering " ], " 2 " : [ " Bob " , " Marketing " ], " 3 " : [ " Charlie " , " Engineering " ] }
Departments = { " Engineering " : [ " Alice " , " Charlie " ], " Marketing " : [ " Bob " ] }
// Function to emulate SQL SELECT operation.
select = (table, id) => {
table[id] | > enforce( " Id does not exist in table. " )
}
// Function to emulate SQL INSERT operation.
insert = (table, id, values ) => {
addEntry(table, id, values )
}
// Function to emulate SQL DELETE operation.
delete = (table, id) => {
removeEntry(table, id)
}
// Function to emulate SQL JOIN operation.
join = (table1, table2, commonField) => {
// This is a very basic JOIN operation and does not account for multiple matches, etc.
joinedTable = {}
for key in keys (table1) {
value = table1[key][commonField]
if find( keys (table2), value) != [] {
joinedTable = addEntry(joinedTable, key, [table1[key], table2[value]])
}
}
joinedTable
}
// Close the session.
imap.closeConnection(session)
// Perform operations.
newEmployee = [ " David " , " Marketing " ]
Employees = insert(Employees, " 4 " , newEmployee)
Departments = insert(Departments, newEmployee[ 1 ], [ " David " ])
print(select(Employees, " 1 " ))
print(join(Employees, Departments, 1 ))实施排序算法是一个原型软件工程面试问题。在这里,我要求GPT-4在SIL中实现Bubblesort(从概念上讲是最基本的排序算法)。以下是回应:
// Helper function to bubble up the largest value in one pass
bubble(list) => {
if length(list) <= 1 then
list
else
if list[ 0 ] > list[ 1 ] then
[list[ 1 ]] ~ bubble([list[ 0 ]] ~ list[ 2 to $])
else
[list[ 0 ]] ~ bubble(list[ 1 to $])
}
// Main BubbleSort function
bubbleSort(list) => {
bubbled = bubble(list)
if bubbled == list then
list
else
bubbleSort(bubbled)
}接下来,我提示该模型编写一个用于计算列表中值的脚本(下面生成的响应)。实际上,辅助函数sortList确实与功能编程范式一致,但是在这里GPT-4假定分配运算符的存在:= 。
// Helper function to sort an array of integers.
sortList(list) => list | > len | > iota | > map(i => list | > min | > snd) | > each((i, minVal) => list := removeIndex(list, minVal))
// Function to compute the median of a set of integers.
computeMedian(list) => {
// Sort the list in ascending order.
sortedList = sortList(list)
// Get the length of the list.
length = len(sortedList)
// Check if the length is even.
isEven = length % 2 == 0
// If the length is even, the median is the average of the two middle numbers.
// Otherwise, the median is just the middle number.
median =
if isEven then
(sortedList[length / 2 - 1 ] + sortedList[length / 2 ]) / 2
else
sortedList[length / 2 ]
// Return the median.
median
}接下来提示GPT -4在SIL中实现K -Neart最邻居算法。除了幻觉/假设存在一个称为math的模块(可以说是本练习范围的合理假设)之外,这似乎是一种可接受的解决方案,该解决方案适当地利用了功能运算符,例如|> , map和fold 。
import math
distance(p1, p2) => {
math.sqrt(
math.pow(p1[ 0 ] - p2[ 0 ], 2 )
+ math.pow(p1[ 1 ] - p2[ 1 ], 2 )
)
}
majorityVote(labels) => {
labelCounts = labels | > fold((counts, label) => addEntry(counts, label, get (counts, label, 0 ) + 1 ), {})
labelCounts | > maxBy((_, count) => count)
}
knn(data, newPoint, k) => {
distances = data | > map((point, label) => (distance(point, newPoint), label))
sorted = distances | > sort((a, b) => a[ 0 ] < b[ 0 ])
nearest = sorted | > take(k)
labels = nearest | > map((_, label) => label)
majorityVote(labels)
}
功能语言的有趣特征之一是它们可以具有很高的表现力。在这里,我提示GPT-4实现基本功能运算map , filter和reduce (或在功能语言的背景下fold )。
// map
// The map-like function would take a function and a list, and apply the function to each element in the list.
mapList(func, list) => {
len(list) > 0 | > enforce( " Cannot map over an empty list. " )
in iota (len(list)) | > fold((result, i) => addEntry(result, func(list[i])), [])
}
// filter
// The filter-like function would take a predicate and a list, and return a list of elements for which the predicate returns true.
filterList(pred, list) => {
len(list) > 0 | > enforce( " Cannot filter an empty list. " )
in iota (len(list)) | > fold((result, i) => if pred(list[i]) then addEntry(result, list[i]) else result, [])
}
// reduce
// The reduce-like function would take a binary function, a list, and an initial accumulator, and apply the function to each element in the list with the accumulator.
reduceList(func, list, acc) => {
len(list) > 0 | > enforce( " Cannot reduce an empty list. " )
in iota (len(list)) | > fold((result, i) => func(result, list[i]), acc)
}
Euler项目是数学和编程问题的集合(我认为这是LLM的有趣提示)。在这里,我提示GPT-4使用SIL解决前两个项目Euler问题。
// Project Euler problem #1:
// Here, my prompt was to solve the (easy) first problem of the Project Euler series (https://projecteuler.net/problem=1)
// Define a range from 1 to 999.
range = iota( 999 ) | > map(x => x + 1 )
// Define a function to check if a number is a multiple of 3 or 5.
isMultipleOf3Or5 = (num) => num % 3 == 0 || num % 5 == 0
// Filter the range to get only multiples of 3 or 5.
multiples = range | > filter(isMultipleOf3Or5)
// Sum the multiples.
sum = multiples | > fold((a, b) => a + b, 0 )
// Print the sum.
print(sum)
// Project Euler problem #2:
// This prompt is to solve the second Project Euler problem (https://projecteuler.net/problem=2)
// Define a function to generate the next term in the Fibonacci sequence.
nextFibonacciTerm = (tuple) => [tuple[ 1 ], tuple[ 0 ] + tuple[ 1 ]]
// Start with the first two terms.
initialTerms = [ 1 , 2 ]
// Define a function to generate Fibonacci sequence up to a limit.
generateFibonacci = (limit) => {
fibonacciSequence = unfold(nextFibonacciTerm, initialTerms, (tuple) => tuple[ 1 ] <= limit)
fibonacciSequence
}
// Define a function to check if a number is even.
isEven = (num) => num % 2 == 0
// Generate the Fibonacci sequence up to four million.
fibonacciSequence = generateFibonacci( 4000000 )
// Extract the second term from each tuple in the sequence, which is the actual Fibonacci number.
fibonacciNumbers = fibonacciSequence | > map(tuple => tuple[ 1 ])
// Filter the sequence to get only even numbers.
evenFibonacciNumbers = fibonacciNumbers | > filter(isEven)
// Sum the even numbers.
sum = evenFibonacciNumbers | > fold((a, b) => a + b, 0 )
// Print the sum.
print(sum)
探索LLMs以几种方式推断和理解新型编程语言特征的能力仍然是一个开放而有趣的问题。在这里,这些能力是在促使GPT-4解决促使特定领域特定语言(DSL)的问题的背景下进行了探索的。 DSL是探索LLMS中推理与记忆的潜在有用的测试用例,因为它们通常具有独特的特征,并且在训练期间(如果有的话)中遇到了广泛的特征。
这项练习的最有趣的发现也许是,在相同的提示和任务上评估时,GPT-4的少量推理功能明显好于Chatgpt的推理功能(请参阅此处的Chatgpt类似练习)。
有证据表明,LLM记忆是通过训练示例演示的频率以及用于提示模型的相关令牌的数量来促进的。 ↩