query planner guide ดาวน์โหลด - query planner guide ซอร์สโค้ดดาวน์โหลด

[ตัวยึดตำแหน่ง]

การแนะนำ

การวางแผนการสืบค้นเป็นส่วนประกอบของระบบการจัดการฐานข้อมูล (DBMS) ที่รับผิดชอบในการสร้างแผนสำหรับการดำเนินการสืบค้นฐานข้อมูล แผนคิวรีระบุขั้นตอนที่ DBMS จะใช้เพื่อดึงข้อมูลที่ร้องขอโดยแบบสอบถาม เป้าหมายของการวางแผนการสืบค้นคือการสร้างแผนการที่มีประสิทธิภาพมากที่สุดเท่าที่จะเป็นไปได้ซึ่งหมายความว่าจะส่งคืนข้อมูลให้กับผู้ใช้โดยเร็วที่สุด

นักวางแผนการสืบค้นเป็นซอฟต์แวร์ที่ซับซ้อนและอาจเข้าใจได้ยาก คู่มือนี้เกี่ยวกับการดำเนินการวางแผนแบบสอบถามตามต้นทุนจะช่วยให้คุณมีภาพรวมทีละขั้นตอนของกระบวนการวิธีการใช้การวางแผนการสืบค้นตามต้นทุนของคุณเองในขณะที่ยังคงครอบคลุมแนวคิดพื้นฐานของการวางแผนการสืบค้น

เขียนโดย AI แก้ไขโดยมนุษย์

ผู้ชมเป้าหมาย

คู่มือนี้เขียนขึ้นสำหรับ:

ใครเคยทำงานกับเครื่องยนต์แบบสอบถาม
ใครอยากรู้อยากเห็นอยากทำสิ่งของของตัวเอง
ใครอยากเรียนรู้สิ่งของ แต่เกลียดคณิตศาสตร์

เป้าหมาย:

สามารถเข้าใจพื้นฐานของการวางแผนการสืบค้น
สามารถเขียนนักวางแผนการสืบค้นของคุณเองได้

สถาปัตยกรรมพื้นฐานของเครื่องยนต์แบบสอบถาม

 กราฟ TD
    ผู้ใช้ ((ผู้ใช้))
    Parser [Query Parser]
    Planner [Query Planner]
    Executor [Query Processor]
    ผู้ใช้ -ข้อความค้นหาข้อความ -> ตัวแยกวิเคราะห์
    Parser -Ast -> Planner
    Planner -แผนการทางกายภาพ -> ผู้บริหาร

สถาปัตยกรรมพื้นฐานของเอ็นจิ้นแบบสอบถามประกอบด้วยส่วนประกอบเหล่านั้น:

ตัวแยกวิเคราะห์แบบสอบถาม: ใช้ในการแยกวิเคราะห์อินพุตแบบสอบถามผู้ใช้โดยปกติจะอยู่ในรูปแบบข้อความที่มนุษย์อ่านได้ (เช่น SQL)
Query Planner: ใช้เพื่อสร้างแผน/กลยุทธ์เพื่อดำเนินการค้นหา โดยปกติผู้วางแผนการสืบค้นจะเลือกแผนการที่ดีที่สุดในหลายแผนที่สร้างขึ้นจากการสืบค้นเดียว
ตัวประมวลผลแบบสอบถาม: ใช้เพื่อดำเนินการตามแผนคิวรีซึ่งเป็นผลลัพธ์โดยผู้วางแผนการสืบค้น

ประเภทของนักวางแผนการสืบค้น

โดยปกติแล้วนักวางแผนการสืบค้นจะแบ่งออกเป็น 2 ประเภท:

นักวางแผนฮิวริสติก
นักวางแผนตามต้นทุน

นักวางแผนฮิวริสติกเป็นนักวางแผนการสืบค้นซึ่งใช้กฎที่กำหนดไว้ล่วงหน้าเพื่อสร้างแผนคิวรี

ผู้วางแผนที่ใช้ต้นทุนเป็นผู้วางแผนการสืบค้นซึ่งขึ้นอยู่กับค่าใช้จ่ายในการสร้างแบบสอบถามมันพยายามค้นหาแผนการที่ดีที่สุดตามต้นทุนของการสืบค้นอินพุต

ในขณะที่นักวางแผนฮิวริสติกมักจะพบแผนการที่ดีที่สุดโดยใช้กฎการแปลงถ้ารู้ว่าแผนการเปลี่ยนแปลงนั้นดีกว่านักวางแผนที่ใช้ต้นทุนจะค้นหาแผนที่ดีที่สุดโดยระบุแผนการที่เทียบเท่าและพยายามค้นหาแผนที่ดีที่สุดในหมู่พวกเขา

นักวางแผนการสืบค้นตามต้นทุน

ในการวางแผนแบบสอบถามตามต้นทุนมักจะประกอบด้วยเฟส:

วางแผนการแจกแจง
การเพิ่มประสิทธิภาพแบบสอบถาม

ในขั้นตอนการแจกแจงแผนผู้วางแผนจะระบุแผนการที่เทียบเท่าที่เป็นไปได้

หลังจากนั้นในขั้นตอนการเพิ่มประสิทธิภาพแบบสอบถามผู้วางแผนจะค้นหาแผนการที่ดีที่สุดจากรายการแผนการแจกแจง แผนการที่ดีที่สุดคือแผนมีต้นทุนต่ำสุดซึ่งมีการกำหนดรูปแบบต้นทุน (หรือฟังก์ชั่นต้นทุน)

เนื่องจากแผนตามธรรมชาติของแผนการมีโครงสร้างคล้ายต้นไม้ดังนั้นคุณสามารถคิดว่าการเพิ่มประสิทธิภาพ/การค้นหาเป็นปัญหาการค้นหาต้นไม้ และมีอัลกอริทึมการค้นหาต้นไม้มากมายที่นี่:

การค้นหาอย่างละเอียดเช่นการเขียนโปรแกรมแบบไดนามิกที่กำหนดขึ้น อัลกอริทึมจะดำเนินการค้นหาแผนการที่ดีที่สุดจนกว่าจะมีเงื่อนไขการยกเลิกการค้นหา
การค้นหาแบบสุ่มเช่นการค้นหาต้นไม้แบบสุ่ม อัลกอริทึมจะดำเนินการค้นหาแผนการที่ดีที่สุดจนกว่าจะมีเงื่อนไขการยกเลิกการค้นหา

หมายเหตุ: ในทางทฤษฎีเป็นไปได้ที่จะใช้อัลกอริทึมการค้นหาต้นไม้ทุกชนิด อย่างไรก็ตามในทางปฏิบัติมันเป็นไปไม่ได้เนื่องจากเวลาการค้นหาเพิ่มขึ้นเมื่ออัลกอริทึมการค้นหาของเราซับซ้อน

หมายเหตุ: เงื่อนไขการยุติการค้นหามักจะเป็น:

ค้นหาความอ่อนเพลีย (เมื่อไม่มีแผนการเยี่ยมชมอีกต่อไป)
เกณฑ์ต้นทุน (เมื่อพบแผนการที่ต้นทุนต่ำกว่าเกณฑ์ต้นทุนที่กำหนด)
เวลา (เมื่อขั้นตอนการค้นหาทำงานนานเกินไป)

นักวางแผนการสืบค้นภูเขาไฟ

นักวางแผนการสืบค้นภูเขาไฟ (หรือเครื่องกำเนิดเครื่องมือเพิ่มประสิทธิภาพของภูเขาไฟ) เป็นนักวางแผนการสืบค้นที่ใช้ต้นทุน

Volcano Planner ใช้วิธีการเขียนโปรแกรมแบบไดนามิกเพื่อค้นหาแผนคิวรีที่ดีที่สุดจากรายการแผนการแจกแจง

รายละเอียด: https://ieeexplore.ieee.org/document/344061 (ฉันขี้เกียจเกินไปที่จะอธิบายกระดาษที่นี่)

นี่คือคำอธิบายที่ยอดเยี่ยม: https://15721.courses.cs.cmu.edu/spring2017/slides/15-optimizer2.pdf#page=9

ร่างนักวางแผนการสืบค้นตามต้นทุนของเรา

นักวางแผนการสืบค้นของเราเป็นนักวางแผนการสืบค้นตามค่าใช้จ่ายตามแนวคิดพื้นฐานของการวางแผนการสืบค้นภูเขาไฟนักวางแผนของเราจะประกอบด้วย 2 ขั้นตอนหลัก:

ขั้นตอนการสำรวจ/การค้นหา
ขั้นตอนการใช้งาน/การเพิ่มประสิทธิภาพ

 กราฟ LR
    AST ((AST))
    logical_plan [แผน]
    สำรวจ _plans ["`
        แผน #1
        -
        แผน #n
    -
    edportation_plan ["แผน #x (แผนการที่ดีที่สุด)"]
    AST -แปลงเป็นแผนตรรกะ -> logical_plan
    logical_plan -ขั้นตอนการสำรวจ -> explored_plans
    Explored_plans -เฟสการเพิ่มประสิทธิภาพ -> edportation_plan
    LinkStyle 1,2 Color: Orange, Stroke: Orange, Stroke-Width: 5px

คำศัพท์

แผนตรรกะ

แผนตรรกะเป็นโครงสร้างที่เป็นนามธรรมของขั้นตอนการแปลงที่จำเป็นในการดำเนินการค้นหา

นี่คือตัวอย่างของแผนตรรกะ:

 กราฟ TD
    1 ["Project TBL1.ID, TBL1.Field1, TBL2.Field1, TBL2.Field2, TBL3.ID, TBL3.Field2, TBL3.Field2"];
    2 ["เข้าร่วม"];
    3 ["สแกน TBL1"];
    4 ["เข้าร่วม"];
    5 ["สแกน TBL2"];
    6 ["สแกน TBL3"];
    1 -> 2;
    2 -> 3;
    2 -> 4;
    4 -> 5;
    4 -> 6;

แผนการทางกายภาพ

ในขณะที่แผนเชิงตรรกะถือเป็นนามธรรมเพียงแผนทางกายภาพเป็นโครงสร้างข้อมูลที่ถือรายละเอียดการใช้งาน แต่ละแผนตรรกะจะมีแผนการทางกายภาพหลายอย่าง ตัวอย่างเช่นการเข้าร่วมเชิงตรรกะอาจมีแผนการทางกายภาพมากมายเช่น Hash Join, ผสานการเข้าร่วม, การเข้าร่วมออกอากาศ ฯลฯ

กลุ่มที่เทียบเท่า

กลุ่มที่เทียบเท่าเป็นกลุ่มของนิพจน์ที่เทียบเท่า (ซึ่งสำหรับแต่ละนิพจน์แผนตรรกะของพวกเขานั้นเทียบเท่ากับตรรกะ)

เช่น

 กราฟ TD
    กลุ่มย่อย#8
        Expr#8 ["Scan TBL2 (Field1, Field2, ID)"]
    จบ
    กลุ่มย่อย#2
        expr#2 ["สแกน TBL2"]
    จบ
    กลุ่มย่อย#11
        expr#11 ["เข้าร่วม"]
    จบ
    expr#11 -> กลุ่ม#7
    expr#11 -> กลุ่ม#10
    กลุ่มย่อย#5
        expr#5 ["เข้าร่วม"]
    จบ
    expr#5 -> กลุ่ม#1
    expr#5 -> กลุ่ม#4
    กลุ่มย่อย#4
        expr#4 ["เข้าร่วม"]
    จบ
    expr#4 -> กลุ่ม#2
    expr#4 -> กลุ่ม#3
    กลุ่มย่อย#7
        Expr#7 ["Scan TBL1 (ID, Field1)"]
    จบ
    กลุ่มย่อย#1
        expr#1 ["สแกน TBL1"]
    จบ
    กลุ่มย่อย#10
        expr#10 ["เข้าร่วม"]
    จบ
    expr#10 -> กลุ่ม#8
    expr#10 -> กลุ่ม#9
    กลุ่มย่อย#9
        Expr#9 ["Scan TBL3 (ID, Field2)"]
    จบ
    กลุ่มย่อย#3
        expr#3 ["สแกน TBL3"]
    จบ
    กลุ่มย่อย#6
        expr#12 ["Project TBL1.ID, TBL1.Field1, TBL2.Field1, TBL2.Field2, TBL3.ID, TBL3.Field2, TBL3.Field2"]
        expr#6 ["Project TBL1.ID, TBL1.Field1, TBL2.Field1, TBL2.Field2, TBL3.ID, TBL3.Field2, TBL3.Field2"]
    จบ
    expr#12 -> กลุ่ม#11
    expr#6 -> กลุ่ม#5

ที่นี่เราสามารถเห็น Group#6 มี 2 นิพจน์ที่เทียบเท่าซึ่งทั้งคู่เป็นตัวแทนของการสืบค้นเดียวกัน

กฎการเปลี่ยนแปลง

กฎการแปลงเป็นกฎในการแปลงจากแผนตรรกะหนึ่งไปเป็นแผนตรรกะที่เทียบเท่าเชิงตรรกะอื่น

ตัวอย่างเช่นแผน:

 กราฟ TD
    1 ["Project TBL1.ID, TBL1.Field1, TBL2.Field1, TBL2.Field2, TBL3.ID, TBL3.Field2, TBL3.Field2"];
    2 ["เข้าร่วม"];
    3 ["สแกน TBL1"];
    4 ["เข้าร่วม"];
    5 ["สแกน TBL2"];
    6 ["สแกน TBL3"];
    1 -> 2;
    2 -> 3;
    2 -> 4;
    4 -> 5;
    4 -> 6;

เมื่อใช้การแปลงการเปลี่ยนแปลงการฉายภาพจะถูกเปลี่ยนเป็น:

 กราฟ TD
    1 ["โครงการ *. *"];
    2 ["เข้าร่วม"];
    3 ["สแกน TBL1 (ID, Field1)"];
    4 ["เข้าร่วม"];
    5 ["สแกน TBL2 (Field1, Field2)"];
    6 ["สแกน TBL3 (ID, Field2, Field2)"];
    1 -> 2;
    2 -> 3;
    2 -> 4;
    4 -> 5;
    4 -> 6;

กฎการแปลงสามารถส่งผลกระทบโดยลักษณะเชิงตรรกะ/คุณสมบัติเช่นสคีมาตารางสถิติข้อมูล ฯลฯ

กฎการดำเนินการ

กฎการดำเนินการเป็นกฎที่จะส่งคืนแผนการทางกายภาพที่ได้รับแผนตรรกะ

กฎการใช้งานสามารถส่งผลกระทบโดยลักษณะทางกายภาพ/คุณสมบัติเช่นเค้าโครงข้อมูล (เรียงลำดับหรือไม่) ฯลฯ

ขั้นตอนการสำรวจ

ในขั้นตอนการสำรวจผู้วางแผนจะใช้กฎการเปลี่ยนแปลงสร้างแผนตรรกะที่เทียบเท่ากัน

ตัวอย่างเช่นแผน:

 กราฟ TD
    1326583549 ["Project TBL1.ID, TBL1.Field1, TBL2.ID, TBL2.Field1, TBL2.Field2, TBL3.ID, TBL3.Field2, TBL3.Field2"
    -425111028 ["เข้าร่วม"];
    -349388609 ["สแกน TBL1"];
    1343755644 ["เข้าร่วม"];
    -1043437086 ["สแกน TBL2"];
    -1402686787 ["สแกน TBL3"];
    1326583549 -> -425111028;
    -425111028 -> -349388609;
    -425111028 -> 1343755644;
    1343755644 -> -1043437086;
    1343755644 -> -1402686787;

หลังจากใช้กฎการแปลงแล้วส่งผลให้กราฟดังต่อไปนี้:

 กราฟ TD
    กลุ่มย่อย#8
        Expr#8 ["Scan TBL2 (ID, Field1, Field2)"]
    จบ
    กลุ่มย่อย#11
        expr#11 ["เข้าร่วม"]
        expr#14 ["เข้าร่วม"]
    จบ
    expr#11 -> กลุ่ม#7
    expr#11 -> กลุ่ม#10
    expr#14 -> กลุ่ม#8
    expr#14 -> กลุ่ม#12
    กลุ่มย่อย#2
        expr#2 ["สแกน TBL2"]
    จบ
    กลุ่มย่อย#5
        expr#5 ["เข้าร่วม"]
        expr#16 ["เข้าร่วม"]
    จบ
    expr#5 -> กลุ่ม#1
    expr#5 -> กลุ่ม#4
    expr#16 -> กลุ่ม#2
    expr#16 -> กลุ่ม#13
    กลุ่มย่อย#4
        expr#4 ["เข้าร่วม"]
    จบ
    expr#4 -> กลุ่ม#2
    expr#4 -> กลุ่ม#3
    กลุ่มย่อย#13
        expr#15 ["เข้าร่วม"]
    จบ
    expr#15 -> กลุ่ม#1
    expr#15 -> กลุ่ม#3
    กลุ่มย่อย#7
        Expr#7 ["Scan TBL1 (ID, Field1)"]
    จบ
    กลุ่มย่อย#1
        expr#1 ["สแกน TBL1"]
    จบ
    กลุ่มย่อย#10
        expr#10 ["เข้าร่วม"]
    จบ
    expr#10 -> กลุ่ม#8
    expr#10 -> กลุ่ม#9
    กลุ่มย่อย#9
        Expr#9 ["Scan TBL3 (ID, Field2)"]
    จบ
    กลุ่มย่อย#3
        expr#3 ["สแกน TBL3"]
    จบ
    กลุ่มย่อย#12
        expr#13 ["เข้าร่วม"]
    จบ
    expr#13 -> กลุ่ม#7
    expr#13 -> กลุ่ม#9
    กลุ่มย่อย#6
        expr#12 ["Project TBL1.ID, TBL1.Field1, TBL2.ID, TBL2.Field1, TBL2.Field2, TBL3.ID, TBL3.Field2, TBL3.Field2"]
        expr#6 ["Project TBL1.ID, TBL1.Field1, TBL2.ID, TBL2.Field1, TBL2.Field2, TBL3.ID, TBL3.Field2, TBL3.Field2"]
    จบ
    expr#12 -> กลุ่ม#11
    expr#6 -> กลุ่ม#5

ที่นี่เราจะเห็นได้ว่ากฎการจัดทำโปรเจ็กต์และการเข้าร่วมกฎใหม่จะถูกนำไปใช้

ขั้นตอนการเพิ่มประสิทธิภาพ

ขั้นตอนการเพิ่มประสิทธิภาพคือการสำรวจต้นไม้ที่ขยายตัวในขั้นตอนการสำรวจเพื่อค้นหาแผนการที่ดีที่สุดสำหรับการสืบค้นของเรา

"จริง" นี่คือการเพิ่มประสิทธิภาพการค้นหาแบบต้นไม้ดังนั้นคุณสามารถใช้อัลกอริทึมการค้นหาต้นไม้ใด ๆ ที่คุณสามารถจินตนาการได้ (แต่คุณต้องตรวจสอบให้แน่ใจว่าถูกต้อง)

นี่คือตัวอย่างของแผนการทางกายภาพที่สร้างขึ้นหลังจากขั้นตอนการเพิ่มประสิทธิภาพ:

 กราฟ TD
    กลุ่ม#6 ["
    กลุ่ม #6
เลือก: Project TBL1.ID, TBL1.Field1, TBL2.ID, TBL2.Field1, TBL2.Field2, TBL3.ID, TBL3.Field2, TBL3.Field2
ผู้ประกอบการ: ProjectOperator
ราคา: ราคา (CPU = 641400.00, MEM = 1020400012.00, TIME = 1000000.00)
-
กลุ่ม#6 -> กลุ่ม#11
กลุ่ม#11 ["
กลุ่ม #11
เลือก: เข้าร่วม
ผู้ประกอบการ: HashJoinoperator
ราคา: ราคา (CPU = 641400.00, MEM = 1020400012.00, TIME = 1000000.00)
-
กลุ่ม#11 -> กลุ่ม#7
กลุ่ม#11 -> กลุ่ม#10
กลุ่ม#7 ["
กลุ่ม #7
เลือก: สแกน TBL1 (ID, Field1)
ผู้ประกอบการ: normalscanoperator
ราคา: ราคา (CPU = 400.00, MEM = 400000.00, TIME = 1000.00)
-
กลุ่ม#10 ["
กลุ่ม #10
เลือก: เข้าร่วม
ผู้ประกอบการ: MergeJoinoperator
ลักษณะ: จัดเรียง
ราคา: ราคา (CPU = 640000.00, MEM = 20000012.00, TIME = 1100000.00)
-
กลุ่ม#10 -> กลุ่ม#8
กลุ่ม#10 -> กลุ่ม#9
กลุ่ม#8 ["
กลุ่ม #8
เลือก: สแกน TBL2 (ID, Field1, Field2)
ผู้ประกอบการ: normalscanoperator
ลักษณะ: จัดเรียง
ราคา: ราคา (CPU = 600000.00, MEM = 12.00, TIME = 1000000.00)
-
กลุ่ม#9 ["
กลุ่ม #9
เลือก: สแกน TBL3 (ID, Field2)
ผู้ประกอบการ: normalscanoperator
ลักษณะ: จัดเรียง
ราคา: ราคา (CPU = 40000.00, MEM = 20000000.00, TIME = 100000.00)
-

แผนการสร้างได้แสดงแผนตรรกะที่เลือกต้นทุนโดยประมาณและผู้ให้บริการทางกายภาพ

เพิ่มประสิทธิภาพ/การยกเลิกการค้นหา

ผู้วางแผนของเราจะทำการค้นหาความอ่อนเพลียเพื่อค้นหาแผนการที่ดีที่สุด

ดำน้ำในรหัส

เนื่องจากรหัสของผู้วางแผนมีขนาดใหญ่ดังนั้นฉันจะไม่เขียนคำแนะนำทีละขั้นตอน แต่ฉันจะอธิบายรหัสทุกชิ้นแทน

ภาษาคิวรี

ที่นี่เราจะกำหนดภาษาคิวรีซึ่งใช้บทช่วยสอนนี้อย่างละเอียด

 SELECT emp . id ,
       emp . code ,
       dept . dept_name ,
       emp_info . name ,
       emp_info . origin
FROM emp
         JOIN dept ON emp . id = dept . emp_id
         JOIN emp_info ON dept . emp_id = emp_info . id

ภาษาคิวรีที่เราจะใช้เป็นภาษาที่มีลักษณะคล้าย SQL อย่างไรก็ตามเพื่อความเรียบง่ายเราจะ จำกัด การทำงานและไวยากรณ์

ภาษาปรากฏในรูปแบบของ

 SELECT tbl . field , [...]
FROM tbl JOIN [...]

มันจะสนับสนุนเฉพาะ SELECT และ JOIN นอกจากนี้ฟิลด์ในคำสั่งที่เลือกจะต้องผ่านการรับรองอย่างสมบูรณ์ (ในรูปแบบของ table.field ) ฟังก์ชันอื่น ๆ ทั้งหมดจะไม่ได้รับการสนับสนุน

AST

ก่อนอื่นเราต้องกำหนด AST สำหรับภาษาของเรา AST (หรือทรีไวยากรณ์นามธรรม) เป็นต้นไม้ที่ใช้เพื่อแสดงโครงสร้างวากยสัมพันธ์ของข้อความ

เนื่องจากภาษาของเราง่ายมากเราจึงสามารถกำหนดโครงสร้าง AST ในรหัสหลายบรรทัด:

 sealed trait Identifier

case class TableID ( id : String ) extends Identifier

case class FieldID ( table : TableID , id : String ) extends Identifier

sealed trait Statement

case class Table ( table : TableID ) extends Statement

case class Join ( left : Statement , right : Statement , on : Seq [( FieldID , FieldID )]) extends Statement

case class Select ( fields : Seq [ FieldID ], from : Statement ) extends Statement

ตัวอย่างเช่นการสืบค้น

 SELECT tbl1 . id ,
       tbl1 . field1 ,
       tbl2 . id ,
       tbl2 . field1 ,
       tbl2 . field2 ,
       tbl3 . id ,
       tbl3 . field2 ,
       tbl3 . field2
FROM tbl1
         JOIN tbl2 ON tbl1 . id = tbl2 . id
         JOIN tbl3 ON tbl2 . id = tbl3 . id

สามารถแสดงเป็น

 Select (
  Seq (
    FieldID ( TableID ( " tbl1 " ), " id " ),
    FieldID ( TableID ( " tbl1 " ), " field1 " ),
    FieldID ( TableID ( " tbl2 " ), " id " ),
    FieldID ( TableID ( " tbl2 " ), " field1 " ),
    FieldID ( TableID ( " tbl2 " ), " field2 " ),
    FieldID ( TableID ( " tbl3 " ), " id " ),
    FieldID ( TableID ( " tbl3 " ), " field2 " ),
    FieldID ( TableID ( " tbl3 " ), " field2 " )
  ),
  Join (
    Table ( TableID ( " tbl1 " )),
    Join (
      Table ( TableID ( " tbl2 " )),
      Table ( TableID ( " tbl3 " )),
      Seq (
        FieldID ( TableID ( " tbl2 " ), " id " ) -> FieldID ( TableID ( " tbl3 " ), " id " )
      )
    ),
    Seq (
      FieldID ( TableID ( " tbl1 " ), " id " ) -> FieldID ( TableID ( " tbl2 " ), " id " )
    )
  )
)

ตัวแยกวิเคราะห์แบบสอบถามง่ายๆ

หลังจากกำหนดโครงสร้าง AST เราจะต้องเขียนตัวแยกวิเคราะห์แบบสอบถามซึ่งใช้ในการแปลงข้อความค้นหาเป็นรูปแบบ AST

เนื่องจากคู่มือนี้ใช้ Scala สำหรับการใช้งานเราจะเลือก Scala-Parser-combinators เพื่อสร้างตัวแยกวิเคราะห์แบบสอบถามของเรา

Query Parser Class:

 object QueryParser extends ParserWithCtx [ QueryExecutionContext , Statement ] with RegexParsers {

  override def parse ( in : String )( implicit ctx : QueryExecutionContext ) : Either [ Throwable , Statement ] = {
    Try (parseAll(statement, in) match {
      case Success (result, _) => Right (result)
      case NoSuccess (msg, _) => Left ( new Exception (msg))
    }) match {
      case util. Failure (ex) => Left (ex)
      case util. Success (value) => value
    }
  }

  private def select : Parser [ Select ] = ??? // we will implement it in later section

  private def statement : Parser [ Statement ] = select
}

จากนั้นกำหนดกฎการแยกวิเคราะห์บางอย่าง:

 // common
private def str : Parser [ String ] = """ [a-zA-Z0-9_]+ """ .r
private def fqdnStr : Parser [ String ] = """ [a-zA-Z0-9_]+.[a-zA-Z0-9_]+ """ .r

// identifier
private def tableId : Parser [ TableID ] = str ^^ (s => TableID (s))

private def fieldId : Parser [ FieldID ] = fqdnStr ^^ { s =>
  val identifiers = s.split( '.' )
  if (identifiers.length != 2 ) {
    throw new Exception ( " should never happen " )
  } else {
    val table = identifiers.head
    val field = identifiers( 1 )
    FieldID ( TableID (table), field)
  }
}

ต่อไปนี้เป็นกฎสองข้อซึ่งใช้ในการแยกวิเคราะห์ตัวระบุ: TableID และ FieldID

ID ตาราง (หรือชื่อตาราง) มักจะมีเฉพาะอักขระตัวเลขและขีดล่าง ( _ ) ดังนั้นเราจะใช้ regex ง่าย ๆ [a-zA-Z0-9_]+ เพื่อระบุชื่อตาราง

ในทางกลับกัน ID ฟิลด์ (สำหรับตัวระบุฟิลด์) ในภาษาของเราคือชื่อฟิลด์ที่ผ่านการรับรองอย่างสมบูรณ์ โดยปกติแล้วมันอยู่ในรูปแบบของ table.field และชื่อฟิลด์มักจะมีเฉพาะอักขระตัวเลขและขีดล่างดังนั้นเราจะใช้ regex [a-zA-Z0-9_]+.[a-zA-Z0-9_]+ เพื่อแยกวิเคราะห์ชื่อฟิลด์

หลังจากกำหนดกฎสำหรับการแยกวิเคราะห์ตัวระบุตอนนี้เราสามารถกำหนดกฎเพื่อแยกวิเคราะห์คำสั่งแบบสอบถาม:

 // statement
private def table : Parser [ Table ] = tableId ^^ (t => Table (t))
private def subQuery : Parser [ Statement ] = " ( " ~> select <~ " ) "

กฎของ table เป็นกฎง่ายๆเพียงแค่สร้างโหนด Table โดยใช้การแยก TableID จากกฎ tableId

subQuery คือกฎที่จะแยกวิเคราะห์คำถามย่อย ใน SQL เราสามารถเขียนแบบสอบถามที่มีลักษณะเช่นนี้:

 SELECT a
FROM ( SELECT b FROM c) d

SELECT b FROM c เป็นคำถามย่อยในคำสั่งข้างต้น ที่นี่ในภาษาการสืบค้นง่าย ๆ ของเราเราจะระบุคำสั่งเป็นคำถามย่อยถ้ามันถูกล้อมรอบด้วยวงเล็บคู่ ( () ) เนื่องจากภาษาของเรามีคำสั่งเลือกเท่านั้นเราจึงสามารถเขียนกฎการแยกวิเคราะห์ได้ดังต่อไปนี้:

 def subQuery : Parser [ Statement ] = " ( " ~> select <~ " ) "

ตอนนี้เราจะกำหนดกฎการแยกวิเคราะห์สำหรับคำสั่งเลือก:

 private def fromSource : Parser [ Statement ] = table ||| subQuery

private def select : Parser [ Select ] =
  " SELECT " ~ rep1sep(fieldId, " , " ) ~ " FROM " ~ fromSource ~ rep(
    " JOIN " ~ fromSource ~ " ON " ~ rep1(fieldId ~ " = " ~ fieldId)
  ) ^^ {
    case _ ~ fields ~ _ ~ src ~ joins =>
      val p = if (joins.nonEmpty) {
        def chain ( left : Statement , right : Seq [( Statement , Seq [( FieldID , FieldID )])]) : Join = {
          if (right.isEmpty) {
            throw new Exception ( " should never happen " )
          } else if (right.length == 1 ) {
            val next = right.head
            Join (left, next._1, next._2)
          } else {
            val next = right.head
            Join (left, chain(next._1, right.tail), next._2)
          }
        }

        val temp = joins.map { join =>
          val statement = join._1._1._2
          val joinOn = join._2.map(on => on._1._1 -> on._2)
          statement -> joinOn
        }
        chain(src, temp)
      } else {
        src
      }
      Select (fields, p)
  }

ใน SQL เราสามารถใช้คำถามย่อยเป็นแหล่งเข้าร่วม ตัวอย่างเช่น:

 SELECT * . *
FROM tbl1
    JOIN ( SELECT * . * FROM tbl2)
    JOIN tbl3

ดังนั้นตัวแยกวิเคราะห์ของเราจะใช้กฎเพื่อแยกวิเคราะห์คำถามย่อยในส่วนการเข้าร่วมของคำสั่งนั่นคือเหตุผลที่เรามีกฎการแยกวิเคราะห์:

 " SELECT " ~ rep1sep(fieldId, " , " ) ~ " FROM " ~ fromSource ~ rep( " JOIN " ~ fromSource ~ " ON " ~ rep1(fieldId ~ " = " ~ fieldId)

ดู queryparser.scala สำหรับการใช้งานเต็มรูปแบบ

ทดสอบตัวแยกวิเคราะห์แบบสอบถามของเรา

ดู queryparserspec.scala

แผนตรรกะ

หลังจากสร้าง AST จากแบบสอบถามข้อความเราสามารถแปลงเป็นแผนตรรกะได้โดยตรง

ก่อนอื่นให้กำหนดอินเทอร์เฟซสำหรับแผนตรรกะของเรา:

 sealed trait LogicalPlan {
  def children () : Seq [ LogicalPlan ]
}

children เป็นรายชื่อแผนตรรกะของเด็ก ตัวอย่างเช่น:

 กราฟ TD
    1326583549 ["Project TBL1.ID, TBL1.Field1, TBL2.ID, TBL2.Field1, TBL2.Field2, TBL3.ID, TBL3.Field2, TBL3.Field2"
    -425111028 ["เข้าร่วม"];
    -349388609 ["สแกน TBL1"];
    1343755644 ["เข้าร่วม"];
    -1043437086 ["สแกน TBL2"];
    -1402686787 ["สแกน TBL3"];
    1326583549 -> -425111028;
    -425111028 -> -349388609;
    -425111028 -> 1343755644;
    1343755644 -> -1043437086;
    1343755644 -> -1402686787;

โหนดลูกของโหนด PROJECT เป็นโหนด JOIN ครั้งแรก โหนด JOIN ครั้งแรกมีลูก 2 คนซึ่งเป็นโหนดเข้า JOIN และ SCAN tbl1 ที่สอง เร็วๆ นี้, ...

เนื่องจากภาษาคิวรีของเราง่ายเราจึงต้องการโหนดตรรกะเพียง 3 ประเภทเท่านั้น:

โครงการ: แสดงตัวดำเนินการฉายภาพในพีชคณิตที่สัมพันธ์กัน
เข้าร่วม: แสดงการเข้าร่วมแบบลอจิคัล
สแกน: แสดงการสแกนตาราง

 case class Scan ( table : ql. TableID , projection : Seq [ String ]) extends LogicalPlan {
  override def children () : Seq [ LogicalPlan ] = Seq .empty
}

case class Project ( fields : Seq [ql. FieldID ], child : LogicalPlan ) extends LogicalPlan {
  override def children () : Seq [ LogicalPlan ] = Seq (child)
}

case class Join ( left : LogicalPlan , right : LogicalPlan , on : Seq [(ql. FieldID , ql. FieldID )]) extends LogicalPlan {
  override def children () : Seq [ LogicalPlan ] = Seq (left, right)
}

จากนั้นเราสามารถเขียนฟังก์ชั่นเพื่อแปลง AST เป็นแผนตรรกะ:

 def toPlan ( node : ql. Statement ) : LogicalPlan = {
  node match {
    case ql. Table (table) => Scan (table, Seq .empty)
    case ql. Join (left, right, on) => Join (toPlan(left), toPlan(right), on)
    case ql. Select (fields, from) => Project (fields, toPlan(from))
  }
}

ดู LogicalPlan.scala สำหรับการใช้งานเต็มรูปแบบ

กลุ่มที่เทียบเท่า

กลุ่ม

เราสามารถกำหนดคลาสสำหรับกลุ่มดังต่อไปนี้:

 case class Group (
                  id : Long ,
                  equivalents : mutable. HashSet [ GroupExpression ]
                ) {
  val explorationMark : ExplorationMark = new ExplorationMark
  var implementation : Option [ GroupImplementation ] = None
}

case class GroupExpression (
                            id : Long ,
                            plan : LogicalPlan ,
                            children : mutable. MutableList [ Group ]
                          ) {
  val explorationMark : ExplorationMark = new ExplorationMark
  val appliedTransformations : mutable. HashSet [ TransformationRule ] = mutable. HashSet ()
}

Group คือชุดของแผนการที่เทียบเท่าเชิงตรรกะ

แต่ละ GroupExpression แสดงถึงโหนดแผนตรรกะ เนื่องจากเราได้กำหนดโหนดแผนตรรกะจะมีรายการของโหนดเด็ก (ในส่วนก่อนหน้า) และการ GroupExpression แสดงถึงโหนดแผนตรรกะและ Group แสดงชุดของแผนเทียบเท่าดังนั้นลูกของ GroupExpression จึงเป็นรายการของ Group

เช่น

 กราฟ TD
    กลุ่มย่อย#8
        Expr#8
    จบ
    กลุ่มย่อย#2
        Expr#2
    จบ
    กลุ่มย่อย#11
        Expr#11
    จบ
    expr#11 -> กลุ่ม#7
    expr#11 -> กลุ่ม#10
    กลุ่มย่อย#5
        Expr#5
    จบ
    expr#5 -> กลุ่ม#1
    expr#5 -> กลุ่ม#4
    กลุ่มย่อย#4
        Expr#4
    จบ
    expr#4 -> กลุ่ม#2
    expr#4 -> กลุ่ม#3
    กลุ่มย่อย#7
        Expr#7
    จบ
    กลุ่มย่อย#1
        Expr#1
    จบ
    กลุ่มย่อย#10
        Expr#10
    จบ
    expr#10 -> กลุ่ม#8
    expr#10 -> กลุ่ม#9
    กลุ่มย่อย#9
        Expr#9
    จบ
    กลุ่มย่อย#3
        Expr#3
    จบ
    กลุ่มย่อย#6
        Expr#12
        Expr#6
    จบ
    expr#12 -> กลุ่ม#11
    expr#6 -> กลุ่ม#5

อย่างที่เราเห็นที่นี่ Group#6 มี 2 นิพจน์เทียบเท่า: Expr#12 และ Expr#6 และ children of Expr#12 คือ Group#11

หมายเหตุ: เราจะใช้การแปลงรอบหลายรอบในขั้นตอนการสำรวจดังนั้นสำหรับแต่ละ Group และ GroupExpression เรามี ExplorationMark สถานะการสำรวจ

 class ExplorationMark {
  private var bits : Long = 0

  def get : Long = bits

  def isExplored ( round : Int ) : Boolean = BitUtils .getBit(bits, round)

  def markExplored ( round : Int ) : Unit = bits = BitUtils .setBit(bits, round, on = true )

  def markUnexplored ( round : Int ) : Unit = bits = BitUtils .setBit(bits, round, on = false )
}

ExplorationMark เป็นเพียงคลาส wrapper bitset มันจะทำเครื่องหมาย I-th บิตเป็น 1 ถ้ามีการสำรวจรอบ i-th, ทำเครื่องหมายเป็น 0 มิฉะนั้น

ExplorationMark ยังสามารถใช้เพื่อแสดงภาพการเปลี่ยนแปลงที่แน่นอนดูการสร้างภาพข้อมูลเพิ่มเติมสำหรับรายละเอียดเพิ่มเติม

บันทึก

บันทึกเป็นกลุ่มผู้ช่วยที่จะช่วยสร้างกลุ่มที่เทียบเท่า บันทึกประกอบด้วย HashMap หลายตัวเพื่อแคชกลุ่มและการแสดงออกของกลุ่มยังมีวิธีการในการลงทะเบียนกลุ่มใหม่หรือนิพจน์กลุ่ม

 class Memo (
            groupIdGenerator : Generator [ Long ] = new LongGenerator ,
            groupExpressionIdGenerator : Generator [ Long ] = new LongGenerator
          ) {
  val groups : mutable. HashMap [ Long , Group ] = mutable. HashMap [ Long , Group ]()
  val parents : mutable. HashMap [ Long , Group ] = mutable. HashMap [ Long , Group ]() // lookup group from group expression ID
  val groupExpressions : mutable. HashMap [ LogicalPlan , GroupExpression ] = mutable. HashMap [ LogicalPlan , GroupExpression ]()

  def getOrCreateGroupExpression ( plan : LogicalPlan ) : GroupExpression = {
    val children = plan.children()
    val childGroups = children.map(child => getOrCreateGroup(child))
    groupExpressions.get(plan) match {
      case Some (found) => found
      case None =>
        val id = groupExpressionIdGenerator.generate()
        val children = mutable. MutableList () ++ childGroups
        val expression = GroupExpression (
          id = id,
          plan = plan,
          children = children
        )
        groupExpressions += plan -> expression
        expression
    }
  }

  def getOrCreateGroup ( plan : LogicalPlan ) : Group = {
    val exprGroup = getOrCreateGroupExpression(plan)
    val group = parents.get(exprGroup.id) match {
      case Some (group) =>
        group.equivalents += exprGroup
        group
      case None =>
        val id = groupIdGenerator.generate()
        val equivalents = mutable. HashSet () + exprGroup
        val group = Group (
          id = id,
          equivalents = equivalents
        )
        groups.put(id, group)
        group
    }
    parents += exprGroup.id -> group
    group
  }
}

ดู memo.scala สำหรับการใช้งานเต็มรูปแบบ

การเริ่มต้น

ขั้นตอนแรกภายในผู้วางแผนคือการเริ่มต้น

 กราฟ LR
    แบบสอบถาม ((คำถาม))
    AST ((AST))
    root_plan ((rootplan))
    root_group ((rootgroup))
    Query -"queryparser.parse (คำถาม)" -> AST
    AST -"LogicalPlan.toplan (AST)" -> root_plan
    root_plan -"memo.getorcreategroup (rootplan)" -> root_group

ก่อนอื่นการสืบค้นจะถูกแยกวิเคราะห์เป็น AST จากนั้นแปลงเป็นแผนตรรกะเรียกว่า root plan แล้วเริ่มต้นกลุ่มจาก root plan ที่เรียกว่า root group

 def initialize ( query : Statement )( implicit ctx : VolcanoPlannerContext ) : Unit = {
  ctx.query = query
  ctx.rootPlan = LogicalPlan .toPlan(ctx.query)
  ctx.rootGroup = ctx.memo.getOrCreateGroup(ctx.rootPlan)
  // assuming this is first the exploration round,
  // by marking the initialRound(0) as explored,
  // it will be easier to visualize the different between rounds (added nodes, add connections)
  ctx.memo.groups.values.foreach(_.explorationMark.markExplored(initialRound))
  ctx.memo.groupExpressions.values.foreach(_.explorationMark.markExplored(initialRound))
}

ดู Volcanoplanner.scala สำหรับรายละเอียดเพิ่มเติม

ตัวอย่างเช่นการสืบค้น:

 SELECT tbl1 . id ,
       tbl1 . field1 ,
       tbl2 . id ,
       tbl2 . field1 ,
       tbl2 . field2 ,
       tbl3 . id ,
       tbl3 . field2 ,
       tbl3 . field2
FROM tbl1
         JOIN tbl2 ON tbl1 . id = tbl2 . id
         JOIN tbl3 ON tbl2 . id = tbl3 . id

หลังจากเริ่มต้นกลุ่มจะมีลักษณะเช่นนี้:

 กราฟ TD
    กลุ่มย่อย#2
        expr#2 ["สแกน TBL2"]
    จบ
    กลุ่มย่อย#5
        expr#5 ["เข้าร่วม"]
    จบ
    expr#5 -> กลุ่ม#1
    expr#5 -> กลุ่ม#4
    กลุ่มย่อย#4
        expr#4 ["เข้าร่วม"]
    จบ
    expr#4 -> กลุ่ม#2
    expr#4 -> กลุ่ม#3
    กลุ่มย่อย#1
        expr#1 ["สแกน TBL1"]
    จบ
    กลุ่มย่อย#3
        expr#3 ["สแกน TBL3"]
    จบ
    กลุ่มย่อย#6
        expr#6 ["Project TBL1.ID, TBL1.Field1, TBL2.ID, TBL2.Field1, TBL2.Field2, TBL3.ID, TBL3.Field2, TBL3.Field2"]
    จบ
    expr#6 -> กลุ่ม#5

ที่นี่คุณจะเห็นว่าทุกกลุ่มมีนิพจน์ที่เทียบเท่ากันอย่างแน่นอน

ขั้นตอนการสำรวจ

หลังจากเริ่มต้นตอนนี้คือขั้นตอนการสำรวจซึ่งจะสำรวจแผนการที่เป็นไปได้ทั้งหมด

วิธีการสำรวจค่อนข้างง่าย:

สำหรับแต่ละกลุ่มให้ใช้กฎการแปลงเพื่อค้นหานิพจน์กลุ่มที่เทียบเท่าทั้งหมดและเพิ่มในชุดเทียบเท่าจนกว่าเราจะไม่พบแผนใหม่ใด ๆ ที่เทียบเท่า
สำหรับการแสดงออกของแต่ละกลุ่มสำรวจกลุ่มเด็กทั้งหมด

กฎการเปลี่ยนแปลง

ก่อนที่จะดำน้ำในรหัสการสำรวจให้พูดคุยเกี่ยวกับกฎการเปลี่ยนแปลงก่อน

กฎการแปลงเป็นกฎที่ใช้ในการแปลงแผนตรรกะเป็นแผนตรรกะที่เทียบเท่ากันอื่นหากตรงกับเงื่อนไขกฎ

นี่คืออินเทอร์เฟซของกฎการแปลง:

 trait TransformationRule {
  def `match` ( expression : GroupExpression )( implicit ctx : VolcanoPlannerContext ) : Boolean

  def transform ( expression : GroupExpression )( implicit ctx : VolcanoPlannerContext ) : GroupExpression
}

เนื่องจากแผนตรรกะเป็นโครงสร้างที่เหมือนต้นไม้ดังนั้นการใช้งานการ match ของกฎการแปลงจึงเป็นการจับคู่รูปแบบบนต้นไม้

ตัวอย่างเช่นนี่คือการ match ที่ใช้เพื่อจับคู่โหนดโครงการในขณะเดียวกันก็ตรวจสอบว่าลูกหลานที่มีการเข้าร่วมและสแกนเท่านั้น:

 override def `match` ( expression : GroupExpression )( implicit ctx : VolcanoPlannerContext ) : Boolean = {
  val plan = expression.plan
  plan match {
    case Project (_, child) => check(child)
    case _ => false
  }
}

// check if the tree only contains SCAN and JOIN nodes
private def check ( node : LogicalPlan ) : Boolean = {
  node match {
    case Scan (_, _) => true
    case Join (left, right, _) => check(left) && check(right)
    case _ => false
  }
}

แผนนี้คือ "จับคู่":

 กราฟ TD
    กลุ่มย่อย#2
        expr#2 ["สแกน"]
    จบ
    กลุ่มย่อย#5
        expr#5 ["เข้าร่วม"]
    จบ
    expr#5 -> กลุ่ม#1
    expr#5 -> กลุ่ม#4
    กลุ่มย่อย#4
        expr#4 ["เข้าร่วม"]
    จบ
    expr#4 -> กลุ่ม#2
    expr#4 -> กลุ่ม#3
    กลุ่มย่อย#1
        expr#1 ["สแกน"]
    จบ
    กลุ่มย่อย#3
        expr#3 ["สแกน"]
    จบ
    กลุ่มย่อย#6
        Expr#6 ["Project"]
    จบ
    expr#6 -> กลุ่ม#5

ในขณะที่แผนนี้ไม่ได้:

 กราฟ TD
    กลุ่มย่อย#2
        expr#2 ["สแกน"]
    จบ
    กลุ่มย่อย#5
        expr#5 ["เข้าร่วม"]
    จบ
    expr#5 -> กลุ่ม#3
    expr#5 -> กลุ่ม#4
    กลุ่มย่อย#4
        expr#4 ["สแกน"]
    จบ
    กลุ่มย่อย#7
        Expr#7 ["Project"]
    จบ
    expr#7 -> กลุ่ม#6
    กลุ่มย่อย#1
        expr#1 ["สแกน"]
    จบ
    กลุ่มย่อย#3
        Expr#3 ["Project"]
    จบ
    expr#3 -> กลุ่ม#2
    กลุ่มย่อย#6
        expr#6 ["เข้าร่วม"]
    จบ
    expr#6 -> กลุ่ม#1
    expr#6 -> กลุ่ม#5

วางแผนการแจกแจง

อย่างที่เราเคยพูดไปก่อนหน้านี้วิธีการสำรวจคือ:

สำหรับแต่ละกลุ่มให้ใช้กฎการแปลงเพื่อค้นหานิพจน์กลุ่มที่เทียบเท่าทั้งหมดและเพิ่มในชุดเทียบเท่าจนกว่าเราจะไม่พบแผนใหม่ใด ๆ ที่เทียบเท่า
สำหรับการแสดงออกของแต่ละกลุ่มสำรวจกลุ่มเด็กทั้งหมด

และนี่คือรหัสการสำรวจ (ค่อนข้างง่าย, huh):

 private def exploreGroup (
                          group : Group ,
                          rules : Seq [ TransformationRule ],
                          round : Int
                        )( implicit ctx : VolcanoPlannerContext ) : Unit = {
  while ( ! group.explorationMark.isExplored(round)) {
    group.explorationMark.markExplored(round)
    // explore all child groups
    group.equivalents.foreach { equivalent =>
      if ( ! equivalent.explorationMark.isExplored(round)) {
        equivalent.explorationMark.markExplored(round)
        equivalent.children.foreach { child =>
          exploreGroup(child, rules, round)
          if (equivalent.explorationMark.isExplored(round) && child.explorationMark.isExplored(round)) {
            equivalent.explorationMark.markExplored(round)
          } else {
            equivalent.explorationMark.markUnexplored(round)
          }
        }
      }
      // fire transformation rules to explore all the possible transformations
      rules.foreach { rule =>
        if ( ! equivalent.appliedTransformations.contains(rule) && rule.`match`(equivalent)) {
          val transformed = rule.transform(equivalent)
          if ( ! group.equivalents.contains(transformed)) {
            group.equivalents += transformed
            transformed.explorationMark.markUnexplored(round)
            group.explorationMark.markUnexplored(round)
          }
        }
      }
      if (group.explorationMark.isExplored(round) && equivalent.explorationMark.isExplored(round)) {
        group.explorationMark.markExplored(round)
      } else {
        group.explorationMark.markUnexplored(round)
      }
    }
  }
}

ดู Volcanoplanner.scala สำหรับรายละเอียดเพิ่มเติม

ใช้กฎการแปลงบางอย่าง

ตอนนี้ถึงเวลาที่จะใช้กฎการเปลี่ยนแปลงบางอย่าง

การฉายภาพ

การฉายภาพแบบพุชดาวน์เป็นกฎการแปลงอย่างง่ายที่ใช้ในการผลักดันการฉายลงไปที่ชั้นเก็บข้อมูล

ตัวอย่างเช่นการสืบค้น

 SELECT field1, field2
from tbl

มีแผน

 กราฟ LR
    โครงการ [Project Field1, Field2]
    สแกน [สแกน TBL]
    โครงการ -> สแกน

ด้วยแผนนี้เมื่อดำเนินการแถวจากชั้นเก็บข้อมูล (ภายใต้การสแกน) จะถูกดึงอย่างเต็มที่และฟิลด์ที่ไม่จำเป็นจะถูกลดลง (โครงการ) ข้อมูลที่ไม่จำเป็นยังคงต้องย้ายจากโหนดสแกนไปยังโหนดโครงการดังนั้นจึงมีความพยายามที่สูญเปล่าบางอย่างที่นี่

เราสามารถทำให้ดีขึ้นได้เพียงแค่บอกเลเยอร์ที่เก็บข้อมูลเท่านั้นดึงฟิลด์ที่จำเป็น ตอนนี้แผนจะถูกเปลี่ยนเป็น:

 กราฟ LR
    โครงการ [Project Field1, Field2]
    สแกน ["สแกน TBL (Field1, Field2)"]
    โครงการ -> สแกน

ไปที่รหัสกันเถอะ:

 override def `match` ( expression : GroupExpression )( implicit ctx : VolcanoPlannerContext ) : Boolean = {
  val plan = expression.plan
  plan match {
    case Project (_, child) => check(child)
    case _ => false
  }
}

// check if the tree only contains SCAN and JOIN nodes
private def check ( node : LogicalPlan ) : Boolean = {
  node match {
    case Scan (_, _) => true
    case Join (left, right, _) => check(left) && check(right)
    case _ => false
  }
}

กฎการส่งข้อมูลของเราที่นี่จะตรงกับแผนเมื่อเป็นโหนดโครงการและลูกหลานทั้งหมดของมันจะสแกนและเข้าร่วมโหนดเท่านั้น

หมายเหตุ: ที่จริงแล้วการจับคู่การคาดการณ์การคาดการณ์ที่แท้จริงนั้นซับซ้อนกว่า แต่เพื่อความเรียบง่ายกฎการจับคู่ที่นี่เป็นเพียงโหนดโครงการที่มีการสแกนและเข้าร่วมลูกหลาน

และนี่คือรหัสแปลง:

 override def transform ( expression : GroupExpression )( implicit ctx : VolcanoPlannerContext ) : GroupExpression = {
  val plan = expression.plan. asInstanceOf [ Project ]
  val pushDownProjection = mutable. ListBuffer [ FieldID ]()
  extractProjections(plan, pushDownProjection)
  val newPlan = Project (plan.fields, pushDown(pushDownProjection.distinct, plan.child))
  ctx.memo.getOrCreateGroupExpression(newPlan)
}

private def extractProjections ( node : LogicalPlan , buffer : mutable. ListBuffer [ FieldID ]) : Unit = {
  node match {
    case Scan (_, _) => () : Unit
    case Project (fields, parent) =>
      buffer ++= fields
      extractProjections(parent, buffer)
    case Join (left, right, on) =>
      buffer ++= on.map(_._1) ++ on.map(_._2)
      extractProjections(left, buffer)
      extractProjections(right, buffer)
  }
}

private def pushDown ( pushDownProjection : Seq [ FieldID ], node : LogicalPlan ) : LogicalPlan = {
  node match {
    case Scan (table, tableProjection) =>
      val filteredPushDownProjection = pushDownProjection.filter(_.table == table).map(_.id)
      val updatedProjection =
        if (filteredPushDownProjection.contains( " * " ) || filteredPushDownProjection.contains( " *.* " )) {
          Seq .empty
        } else {
          (tableProjection ++ filteredPushDownProjection).distinct
        }
      Scan (table, updatedProjection)
    case Join (left, right, on) => Join (pushDown(pushDownProjection, left), pushDown(pushDownProjection, right), on)
    case _ => throw new Exception ( " should never happen " )
  }
}

รหัสการแปลงจะพบการคาดการณ์ทั้งหมดจากโหนดรูทโปรเจ็กต์แล้วกดลงไปที่โหนดสแกนทั้งหมดภายใต้

การมองเห็นกฎของเราเช่นแผน

 กราฟ TD
    กลุ่มย่อย#2
        expr#2 ["สแกน TBL2"]
    จบ
    กลุ่มย่อย#5
        expr#5 ["เข้าร่วม"]
    จบ
    expr#5 -> กลุ่ม#1
    expr#5 -> กลุ่ม#4
    กลุ่มย่อย#4
        expr#4 ["เข้าร่วม"]
    จบ
    expr#4 -> กลุ่ม#2
    expr#4 -> กลุ่ม#3
    กลุ่มย่อย#1
        expr#1 ["สแกน TBL1"]
    จบ
    กลุ่มย่อย#3
        expr#3 ["สแกน TBL3"]
    จบ
    กลุ่มย่อย#6
        expr#6 ["Project TBL1.ID, TBL1.Field1, TBL2.ID, TBL2.Field1, TBL2.Field2, TBL3.ID, TBL3.Field2, TBL3.Field2"]
    จบ
    expr#6 -> กลุ่ม#5

หลังจากใช้การแปลงการเปลี่ยนแปลงการฉายภาพจะส่งผลให้แผนเทียบเท่าใหม่กับการคาดการณ์จะถูกผลักลงไปที่การดำเนินการสแกน (แผนใหม่คือต้นไม้ที่มีโหนดชายแดนสีส้ม)

 กราฟ TD
    กลุ่มย่อย#8
        Expr#8 ["Scan TBL2 (ID, Field1, Field2)"]
    จบ
    กลุ่มย่อย#2
        expr#2 ["สแกน TBL2"]
    จบ
    กลุ่มย่อย#11
        expr#11 ["เข้าร่วม"]
    จบ
    expr#11 -> กลุ่ม#7
    expr#11 -> กลุ่ม#10
    กลุ่มย่อย#5
        expr#5 ["เข้าร่วม"]
    จบ
    expr#5 -> กลุ่ม#1
    expr#5 -> กลุ่ม#4
    กลุ่มย่อย#4
        expr#4 ["เข้าร่วม"]
    จบ
    expr#4 -> กลุ่ม#2
    expr#4 -> กลุ่ม#3
    กลุ่มย่อย#7
        Expr#7 ["Scan TBL1 (ID, Field1)"]
    จบ
    กลุ่มย่อย#10
        expr#10 ["เข้าร่วม"]
    จบ
    expr#10 -> กลุ่ม#8
    expr#10 -> กลุ่ม#9
    กลุ่มย่อย#1
        expr#1 ["สแกน TBL1"]
    จบ
    กลุ่มย่อย#9
        Expr#9 ["Scan TBL3 (ID, Field2)"]
    จบ
    กลุ่มย่อย#3
        expr#3 ["สแกน TBL3"]
    จบ
    กลุ่มย่อย#6
        expr#12 ["Project TBL1.ID, TBL1.Field1, TBL2.ID, TBL2.Field1, TBL2.Field2, TBL3.ID, TBL3.Field2, TBL3.Field2"]
        expr#6 ["Project TBL1.ID, TBL1.Field1, TBL2.ID, TBL2.Field1, TBL2.Field2, TBL3.ID, TBL3.Field2, TBL3.Field2"]
    จบ
    expr#12 -> กลุ่ม#11
    expr#6 -> กลุ่ม#5
    Style Expr#12 Stroke-Width: 4px, Stroke: Orange
    Style Expr#8 Stroke-Width: 4px, Stroke: Orange
    Style Expr#10 Stroke-Width: 4px, Stroke: Orange
    Style Expr#9 Stroke-Width: 4px, Stroke: Orange
    Style Expr#11 Stroke-Width: 4px, Stroke: Orange
    Style Expr#7 Stroke-Width: 4px, Stroke: Orange
    linkstyle 0 จังหวะความกว้าง: 4px, Stroke: Orange
    linkstyle 1 จังหวะความกว้าง: 4px, Stroke: Orange
    LinkStyle 6 จังหวะ: 4px, Stroke: Orange
    LinkStyle 7 จังหวะ: 4px, Stroke: Orange
    linkstyle 8 จังหวะความกว้าง: 4px, stroke: สีส้ม

ดู ProjectionPushDown.scala สำหรับการใช้งานเต็มรูปแบบ

เข้าร่วมใหม่

เข้าร่วมใหม่อีกครั้งยังเป็นหนึ่งในการเปลี่ยนแปลงที่ได้รับการยอมรับมากที่สุดในโลกของการวางแผนการสืบค้น ผู้วางแผนของเราจะใช้กฎการแปลงใหม่

เนื่องจากเข้าร่วมใหม่ในโลกแห่งความเป็นจริงไม่ใช่เรื่องง่ายที่จะนำไปใช้ ดังนั้นเราจะใช้งานที่เรียบง่ายและฉ้อโกงของ REORRORDER RULE ที่นี่

ขั้นแรก match คู่กฎ:

 // check if the tree only contains SCAN and JOIN nodes, and also extract all SCAN nodes and JOIN conditions
private def checkAndExtract (
                             node : LogicalPlan ,
                             buffer : mutable. ListBuffer [ Scan ],
                             joinCondBuffer : mutable. ListBuffer [(ql. FieldID , ql. FieldID )]
                           ) : Boolean = {
  node match {
    case node @ Scan (_, _) =>
      buffer += node
      true
    case Join (left, right, on) =>
      joinCondBuffer ++= on
      checkAndExtract(left, buffer, joinCondBuffer) && checkAndExtract(right, buffer, joinCondBuffer)
    case _ => false
  }
}

private def buildInterchangeableJoinCond ( conditions : Seq [(ql. FieldID , ql. FieldID )]) : Seq [ Seq [ql. FieldID ]] = {
  val buffer = mutable. ListBuffer [mutable. Set [ql. FieldID ]]()
  conditions.foreach { cond =>
    val set = buffer.find { set =>
      set.contains(cond._1) || set.contains(cond._2)
    } match {
      case Some (set) => set
      case None =>
        val set = mutable. Set [ql. FieldID ]()
        buffer += set
        set
    }
    set += cond._1
    set += cond._2
  }
  buffer.map(_.toSeq)
}

override def `match` ( expression : GroupExpression )( implicit ctx : VolcanoPlannerContext ) : Boolean = {
  val plan = expression.plan
  plan match {
    case node @ Join (_, _, _) =>
      val buffer = mutable. ListBuffer [ Scan ]()
      val joinCondBuffer = mutable. ListBuffer [(ql. FieldID , ql. FieldID )]()
      if (checkAndExtract(node, buffer, joinCondBuffer)) {
        // only match if the join is 3 tables join
        if (buffer.size == 3 ) {
          var check = true
          val interChangeableCond = buildInterchangeableJoinCond(joinCondBuffer)
          interChangeableCond.foreach { c =>
            check &= c.size == 3
          }
          check
        } else {
          false
        }
      } else {
        false
      }
    case _ => false
  }
}

กฎของเราจะถูกจับคู่เฉพาะถ้าเราจับคู่การเข้าร่วม 3 ทาง (จำนวนตารางที่เกี่ยวข้องต้องเป็น 3 และเงื่อนไขการเข้าร่วมจะต้องเป็น 3 ทางเช่น tbl1.field1 = tbl2.field2 = tbl3.field3 )

ตัวอย่างเช่น,

tbl1
    JOIN tbl2 ON tbl1 . field1 = tbl2 . field2
    JOIN tbl3 ON tbl1 . field1 = tbl3 . field3

คำสั่งการเข้าร่วมที่นี่จะ "จับคู่" เนื่องจากเป็นการเข้าร่วม 3 ทาง (เป็นการเข้าร่วมระหว่าง tbl1 , tbl2 , tbl3 และเงื่อนไขคือ tbl1.field1 = tbl2.field2 = tbl3.field3 )

ถัดไปคือรหัสแปลง:

 override def transform ( expression : GroupExpression )( implicit ctx : VolcanoPlannerContext ) : GroupExpression = {
  val plan = expression.plan. asInstanceOf [ Join ]
  val buffer = mutable. ListBuffer [ Scan ]()
  val joinCondBuffer = mutable. ListBuffer [(ql. FieldID , ql. FieldID )]()
  checkAndExtract(plan, buffer, joinCondBuffer)
  val interChangeableCond = buildInterchangeableJoinCond(joinCondBuffer)
  //
  val scans = buffer.toList
  implicit val ord : Ordering [ Scan ] = new Ordering [ Scan ] {
    override def compare ( x : Scan , y : Scan ) : Int = {
      val xStats = ctx.statsProvider.tableStats(x.table.id)
      val yStats = ctx.statsProvider.tableStats(y.table.id)
      xStats.estimatedTableSize.compareTo(yStats.estimatedTableSize)
    }
  }

  def getJoinCond ( left : Scan , right : Scan ) : Seq [(ql. FieldID , ql. FieldID )] = {
    val leftFields = interChangeableCond.flatMap { c =>
      c.filter(p => p.table == left.table)
    }
    val rightFields = interChangeableCond.flatMap { c =>
      c.filter(p => p.table == right.table)
    }
    if (leftFields.length != rightFields.length) {
      throw new Exception ( s " leftFields.length( ${leftFields.length} ) != rightFields.length( ${rightFields.length} ) " )
    } else {
      leftFields zip rightFields
    }
  }

  val sorted = scans.sorted
  val newPlan = Join (
    sorted( 0 ),
    Join (
      sorted( 1 ),
      sorted( 2 ),
      getJoinCond(sorted( 1 ), sorted( 2 ))
    ),
    getJoinCond(sorted( 0 ), sorted( 1 ))
  )
  ctx.memo.getOrCreateGroupExpression(newPlan)
}

รหัสการแปลงที่นี่จะสั่งซื้อตารางใหม่ด้วยขนาดโดยประมาณ

ตัวอย่างเช่นหากเรามี 3 ตาราง A, B, C ที่มีขนาดประมาณ 300B, 100B, 200B และคำสั่งเข้าร่วม A JOIN B JOIN C แล้วมันจะถูกเปลี่ยนเป็น B JOIN C JOIN A

หมายเหตุ: คุณอาจสังเกตเห็นในรหัสนี้เราได้ใช้สถิติตารางเพื่อให้คำแนะนำในการแปลงแผน ในทางปฏิบัติผู้วางแผนสามารถใช้สถิติทุกประเภทเพื่อช่วยในการเปลี่ยนแปลงเช่นขนาดตารางขนาดแถวจำนวนโมฆะฮิสโตแกรม ฯลฯ

การมองเห็นกฎของเราเช่นแผน

 กราฟ TD
    กลุ่มย่อย#2
        expr#2 ["สแกน TBL2"]
    จบ
    กลุ่มย่อย#5
        expr#5 ["เข้าร่วม"]
    จบ
    expr#5 -> กลุ่ม#1
    expr#5 -> กลุ่ม#4
    กลุ่มย่อย#4
        expr#4 ["เข้าร่วม"]
    จบ
    expr#4 -> กลุ่ม#2
    expr#4 -> กลุ่ม#3
    กลุ่มย่อย#1
        expr#1 ["สแกน TBL1"]
    จบ
    กลุ่มย่อย#3
        expr#3 ["สแกน TBL3"]
    จบ
    กลุ่มย่อย#6
        expr#6 ["Project TBL1.ID, TBL1.Field1, TBL2.ID, TBL2.Field1, TBL2.Field2, TBL3.ID, TBL3.Field2, TBL3.Field2"]
    จบ
    expr#6 -> กลุ่ม#5

หลังจากเข้าร่วมการแปลงใหม่

 กราฟ TD
    กลุ่มย่อย#2
        expr#2 ["สแกน TBL2"]
    จบ
    กลุ่มย่อย#5
        expr#5 ["เข้าร่วม"]
        expr#8 ["เข้าร่วม"]
    จบ
    expr#5 -> กลุ่ม#1
    expr#5 -> กลุ่ม#4
    expr#8 -> กลุ่ม#2
    expr#8 -> กลุ่ม#7
    กลุ่มย่อย#4
        expr#4 ["เข้าร่วม"]
    จบ
    expr#4 -> กลุ่ม#2
    expr#4 -> กลุ่ม#3
    กลุ่มย่อย#7
        expr#7 ["เข้าร่วม"]
    จบ
    expr#7 -> กลุ่ม#1
    expr#7 -> กลุ่ม#3
    กลุ่มย่อย#1
        expr#1 ["สแกน TBL1"]
    จบ
    กลุ่มย่อย#3
        expr#3 ["สแกน TBL3"]
    จบ
    กลุ่มย่อย#6
        expr#6 ["Project TBL1.ID, TBL1.Field1, TBL2.ID, TBL2.Field1, TBL2.Field2, TBL3.ID, TBL3.Field2, TBL3.Field2"]
    จบ
    expr#6 -> กลุ่ม#5
    Style Expr#8 Stroke-Width: 4px, Stroke: Orange
    Style Expr#7 Stroke-Width: 4px, Stroke: Orange
    linkstyle 2 จังหวะความกว้าง: 4px, Stroke: Orange
    LinkStyle 6 จังหวะ: 4px, Stroke: Orange
    linkstyle 3 จังหวะความกว้าง: 4px, Stroke: Orange
    LinkStyle 7 จังหวะ: 4px, Stroke: Orange

เราจะเห็นได้ว่า tbl2 JOIN tbl1 JOIN tbl3 ถูกสร้างขึ้นจาก tbl1 JOIN tbl2 JOIN tbl3 ถูกสร้างขึ้นโดยการแปลง (โหนดและขอบที่เพิ่มขึ้นใหม่จะถูกระบุโดยเส้นสีส้ม)

ดู x3tablejoinreorderbysize.scala สำหรับการใช้งานเต็มรูปแบบ

รวมการเปลี่ยนแปลงทั้งหมดเข้าด้วยกัน

ตอนนี้เราสามารถวางกฎการเปลี่ยนแปลงของเราไว้ในที่เดียว

 private val transformationRules : Seq [ Seq [ TransformationRule ]] = Seq (
  Seq ( new ProjectionPushDown ),
  Seq ( new X3TableJoinReorderBySize )
)

และเรียกใช้พวกเขาเพื่อสำรวจกลุ่มที่เทียบเท่า

 for (r <- transformationRules.indices) {
  exploreGroup(ctx.rootGroup, transformationRules(r), r + 1 )
}

ตัวอย่างเช่นแผน

 กราฟ TD
    กลุ่มย่อย#2
        expr#2 ["สแกน TBL2"]
    จบ
    กลุ่มย่อย#5
        expr#5 ["เข้าร่วม"]
    จบ
    expr#5 -> กลุ่ม#1
    expr#5 -> กลุ่ม#4
    กลุ่มย่อย#4
        expr#4 ["เข้าร่วม"]
    จบ
    expr#4 -> กลุ่ม#2
    expr#4 -> กลุ่ม#3
    กลุ่มย่อย#1
        expr#1 ["สแกน TBL1"]
    จบ
    กลุ่มย่อย#3
        expr#3 ["สแกน TBL3"]
    จบ
    กลุ่มย่อย#6
        expr#6 ["Project TBL1.ID, TBL1.Field1, TBL2.ID, TBL2.Field1, TBL2.Field2, TBL3.ID, TBL3.Field2, TBL3.Field2"]
    จบ
    expr#6 -> กลุ่ม#5

หลังจากถูกสำรวจแล้วจะส่งผลให้กราฟนี้

 กราฟ TD
    กลุ่มย่อย#8
        Expr#8 ["Scan TBL2 (ID, Field1, Field2)"]
    จบ
    กลุ่มย่อย#11
        expr#11 ["เข้าร่วม"]
        expr#14 ["เข้าร่วม"]
    จบ
    expr#11 -> กลุ่ม#7
    expr#11 -> กลุ่ม#10
    expr#14 -> กลุ่ม#8
    expr#14 -> กลุ่ม#12
    กลุ่มย่อย#2
        expr#2 ["สแกน TBL2"]
    จบ
    กลุ่มย่อย#5
        expr#5 ["เข้าร่วม"]
        expr#16 ["เข้าร่วม"]
    จบ
    expr#5 -> กลุ่ม#1
    expr#5 -> กลุ่ม#4
    expr#16 -> กลุ่ม#2
    expr#16 -> กลุ่ม#13
    กลุ่มย่อย#4
        expr#4 ["เข้าร่วม"]
    จบ
    expr#4 -> กลุ่ม#2
    expr#4 -> กลุ่ม#3
    กลุ่มย่อย#13
        expr#15 ["เข้าร่วม"]
    จบ
    expr#15 -> กลุ่ม#1
    expr#15 -> กลุ่ม#3
    กลุ่มย่อย#7
        Expr#7 ["Scan TBL1 (ID, Field1)"]
    จบ
    กลุ่มย่อย#1
        expr#1 ["สแกน TBL1"]
    จบ
    กลุ่มย่อย#10
        expr#10 ["เข้าร่วม"]
    จบ
    expr#10 -> กลุ่ม#8
    expr#10 -> กลุ่ม#9
    กลุ่มย่อย#9
        Expr#9 ["Scan TBL3 (ID, Field2)"]
    จบ
    กลุ่มย่อย#3
        expr#3 ["สแกน TBL3"]
    จบ
    กลุ่มย่อย#12
        expr#13 ["เข้าร่วม"]
    จบ
    expr#13 -> กลุ่ม#7
    expr#13 -> กลุ่ม#9
    กลุ่มย่อย#6
        expr#12 ["Project TBL1.ID, TBL1.Field1, TBL2.ID, TBL2.Field1, TBL2.Field2, TBL3.ID, TBL3.Field2, TBL3.Field2"]
        expr#6 ["Project TBL1.ID, TBL1.Field1, TBL2.ID, TBL2.Field1, TBL2.Field2, TBL3.ID, TBL3.Field2, TBL3.Field2"]
    จบ
    expr#12 -> กลุ่ม#11
    expr#6 -> กลุ่ม#5
    Style Expr#12 Stroke-Width: 4px, Stroke: Orange
    Style Expr#8 Stroke-Width: 4px, Stroke: Orange
    Style Expr#10 Stroke-Width: 4px, Stroke: Orange
    Style Expr#13 จังหวะ: 4px, Stroke: Orange
    Style Expr#14 Stroke-Width: 4px, Stroke: Orange
    Style Expr#11 Stroke-Width: 4px, Stroke: Orange
    Style Expr#9 Stroke-Width: 4px, Stroke: Orange
    Style Expr#15 Stroke-Width: 4px, Stroke: Orange
    Style Expr#7 Stroke-Width: 4px, Stroke: Orange
    Style Expr#16 Stroke-Width: 4px, Stroke: Orange
    linkstyle 0 จังหวะความกว้าง: 4px, Stroke: Orange
    LinkStyle 15 จังหวะ: 4px, Stroke: Orange
    linkstyle 12 จังหวะ: 4px, Stroke: Orange
    linkstyle 1 จังหวะความกว้าง: 4px, Stroke: Orange
    linkstyle 16 จังหวะความกว้าง: 4px, Stroke: Orange
    LinkStyle 13 จังหวะ: 4px, Stroke: Orange
    linkstyle 2 จังหวะความกว้าง: 4px, Stroke: Orange
    LinkStyle 6 จังหวะ: 4px, Stroke: Orange
    linkstyle 3 จังหวะความกว้าง: 4px, Stroke: Orange
    linkstyle 10 จังหวะความกว้าง: 4px, Stroke: Orange
    LinkStyle 7 จังหวะ: 4px, Stroke: Orange
    LinkStyle 14 จังหวะ: 4px, Stroke: Orange
    LinkStyle 11 จังหวะ: 4px, Stroke: Orange

ดู Volcanoplanner.scala สำหรับรายละเอียดเพิ่มเติม

ขั้นตอนการเพิ่มประสิทธิภาพ

หลังจากขั้นตอนการสำรวจตอนนี้เรามีต้นไม้ขยายอย่างเต็มที่ซึ่งมีแผนการที่เป็นไปได้ทั้งหมดตอนนี้คือขั้นตอนการเพิ่มประสิทธิภาพ

ในขั้นตอนนี้เราจะพบแผนการที่ดีที่สุดสำหรับกลุ่มรากของเรา กระบวนการปรับให้เหมาะสมมีการอธิบายดังต่อไปนี้:

สำหรับแต่ละกลุ่มเราจะพบการใช้งานที่ดีที่สุดโดยเลือกกลุ่มที่แสดงด้วยต้นทุนต่ำสุด
สำหรับการแสดงออกของแต่ละกลุ่มก่อนอื่นเราจะระบุการใช้งานทางกายภาพจากแผนตรรกะ จากนั้นสำหรับการใช้งานทางกายภาพแต่ละครั้งเราจะคำนวณค่าใช้จ่ายโดยใช้ค่าใช้จ่ายกลุ่มเด็ก

นี่คือตัวอย่าง

 กราฟ TD
    Subgraph Group#2 ["Group#2 (cost = 1)"]
        expr#2 ["expr#2 (cost = 1)"]
    จบ
    กลุ่มย่อย#5 ["กลุ่ม#5 (ค่าใช้จ่าย = 3)"]
        expr#5 ["expr#5 (ค่าใช้จ่าย = สูงสุด (3,2) = 3"]
    จบ
    expr#5 -> กลุ่ม#1
    expr#5 -> กลุ่ม#4
    กลุ่มย่อย#4 ["กลุ่ม#4 (ค่าใช้จ่าย = 2)"]
        expr#4 ["expr#4 (ค่าใช้จ่าย = สูงสุด (1,2) = 2)"]
        expr#7 ["expr#7 (cost = 1+2 = 3)"]
    จบ
    expr#4 -> กลุ่ม#2
    expr#4 -> กลุ่ม#3
    กลุ่มย่อย#1 ["กลุ่ม#1 (ค่าใช้จ่าย = 3)"]
        expr#1 ["expr#1 (cost = 3)"]
    จบ
    กลุ่มย่อย#3 ["กลุ่ม#3 (ค่าใช้จ่าย = 2)"]
        expr#3 ["expr#3 (cost = 2)"]
    จบ
    Subgraph Group#6 ["กลุ่ม#6 (ค่าใช้จ่าย = 4.5)"]
        expr#6 ["expr#6 (cost = 3*1.5 = 4.5)"]
    จบ
    expr#6 -> กลุ่ม#5
    กลุ่มย่อย#8 ["กลุ่ม#8 (ค่าใช้จ่าย = 1)"]
        expr#8 ["expr#8 (cost = 1)"]
    จบ
    Subgraph Group#9 ["กลุ่ม#9 (ค่าใช้จ่าย = 2)"]
        expr#9 ["expr#9 (cost = 2)"]
    จบ
    expr#7 -> กลุ่ม#8
    expr#7 -> กลุ่ม#9

ตัวอย่างเช่นค่าใช้จ่าย Expr#4 คำนวณโดยค่าใช้จ่ายกลุ่มเด็ก ( Group#2 และ Group#3 ) โดยใช้ฟังก์ชั่น max อีกตัวอย่างหนึ่งคือ Group#4 ค่าใช้จ่ายของมันจะคำนวณโดยการคำนวณค่าขั้นต่ำระหว่างต้นทุนของการแสดงออกที่เทียบเท่า

แผนการทางกายภาพ

เนื่องจากเป้าหมายของขั้นตอนการเพิ่มประสิทธิภาพคือการสร้างแผนทางกายภาพที่ดีที่สุดเนื่องจากการแสดงออกของกลุ่มที่สำรวจ เราสามารถกำหนดแผนทางกายภาพดังต่อไปนี้:

 sealed trait PhysicalPlan {
  def operator () : Operator

  def children () : Seq [ PhysicalPlan ]

  def cost () : Cost

  def estimations () : Estimations

  def traits () : Set [ String ]
}

operator เป็นผู้ให้บริการทางกายภาพซึ่งใช้ในการดำเนินการตามแผนเราจะครอบคลุมในส่วนต่อมา จากนั้น children ก็เป็นรายการของโหนดแผนเด็กพวกเขาใช้เพื่อเข้าร่วมในกระบวนการคำนวณต้นทุน แอตทริบิวต์ที่สามคือ cost cost เป็นข้อมูลค่าใช้จ่ายในการถือวัตถุ (เช่นต้นทุน CPU ค่าใช้จ่ายหน่วยความจำต้นทุน IO ฯลฯ ) estimations เป็นสถิติการถือครองทรัพย์สินโดยประมาณเกี่ยวกับแผน (เช่นจำนวนแถวขนาดแถว ฯลฯ ) นอกจากนี้ยังมีส่วนร่วมในการคำนวณต้นทุน ในที่สุด traits เป็นชุดของลักษณะทางกายภาพซึ่งส่งผลกระทบต่อกฎการดำเนินการเพื่อส่งผลกระทบต่อกระบวนการสร้างแผนทางกายภาพ

ต่อไปเราสามารถใช้คลาสโหนดทางกายภาพ:

 case class Scan (
                 operator : Operator ,
                 cost : Cost ,
                 estimations : Estimations ,
                 traits : Set [ String ] = Set .empty
               ) extends PhysicalPlan {
  override def children () : Seq [ PhysicalPlan ] = Seq .empty // scan do not receive any child
}

case class Project (
                    operator : Operator ,
                    child : PhysicalPlan ,
                    cost : Cost ,
                    estimations : Estimations ,
                    traits : Set [ String ] = Set .empty
                  ) extends PhysicalPlan {
  override def children () : Seq [ PhysicalPlan ] = Seq (child)
}

case class Join (
                 operator : Operator ,
                 leftChild : PhysicalPlan ,
                 rightChild : PhysicalPlan ,
                 cost : Cost ,
                 estimations : Estimations ,
                 traits : Set [ String ] = Set .empty
               ) extends PhysicalPlan {
  override def children () : Seq [ PhysicalPlan ] = Seq (leftChild, rightChild)
}

ดู physicalplan.scala สำหรับการใช้งานเต็มรูปแบบ

กฎการดำเนินการ

สิ่งแรกในขั้นตอนการเพิ่มประสิทธิภาพนั่นคือเราต้องใช้กฎการใช้งาน กฎการใช้งานเป็นกฎในการแปลงจากแผนตรรกะเป็นแผนทางกายภาพโดยไม่ดำเนินการ

เนื่องจากเราไม่ได้ดำเนินการตามแผนทางกายภาพโดยตรงในการวางแผนดังนั้นเราจะส่งคืนตัวสร้างแผนการทางกายภาพแทนและง่ายต่อการปรับแต่งฟังก์ชั่นต้นทุนสำหรับแต่ละโหนด

นี่คืออินเทอร์เฟซของกฎการใช้งาน:

 trait PhysicalPlanBuilder {
  def build ( children : Seq [ PhysicalPlan ]) : Option [ PhysicalPlan ]
}

trait ImplementationRule {
  def physicalPlanBuilders ( expression : GroupExpression )( implicit ctx : VolcanoPlannerContext ) : Seq [ PhysicalPlanBuilder ]
}

ที่นี่ PhysicalPlanBuilder เป็นอินเทอร์เฟซที่ใช้ในการสร้างแผนทางกายภาพเนื่องจากแผนการทางกายภาพของเด็ก

ตัวอย่างเช่นการเข้าร่วมแบบลอจิคัลมีการใช้งานทางกายภาพ 2 ครั้งคือการเข้าร่วมและผสานเข้าร่วม

 กราฟ TD
    เด็ก#1 ["ลูก#1"]
    เด็ก#2 ["ลูก#2"]
    เด็ก#3 ["ลูก#3"]
    เด็ก#4 ["เด็ก#4"]
    hash_join ["` `hash ino 
    ค่าใช้จ่าย = F (ค่าใช้จ่าย (เด็ก#1), ค่าใช้จ่าย (เด็ก#2))
    -
    merge_join ["` รวมเข้าร่วม
    ค่าใช้จ่าย = g (ค่าใช้จ่าย (เด็ก#3), ค่าใช้จ่าย (เด็ก#4))
    -
    hash_join --> child#1
    hash_join --> child#2
    merge_join --> child#3
    merge_join --> child#4

the HASH JOIN cost is using function f() to calculate cost, and MERGE JOIN is using function g() to calculate cost, both are using its children as function parameters. So it's easier to code if we're returning just the phyiscal plan builder from the implementation rule instead of the physical plan.

Finding the best plan

As we've said before, the optimization process is described as following:

For each group, we will find the best implementation by choosing the group expressing with the lowest cost
For each group expression, first we will enumerate the physical implementations from the logical plan. Then for each physical implementation, we will calculate its cost using its child group costs.

And here is the code:

 private def implementGroup ( group : Group , combinedRule : ImplementationRule )(
  implicit ctx : VolcanoPlannerContext
) : GroupImplementation = {
  group.implementation match {
    case Some (implementation) => implementation
    case None =>
      var bestImplementation = Option .empty[ GroupImplementation ]
      group.equivalents.foreach { equivalent =>
        val physicalPlanBuilders = combinedRule.physicalPlanBuilders(equivalent)
        val childPhysicalPlans = equivalent.children.map { child =>
          val childImplementation = implementGroup(child, combinedRule)
          child.implementation = Option (childImplementation)
          childImplementation.physicalPlan
        }
        // calculate the implementation, and update the best cost for group
        physicalPlanBuilders.flatMap(_.build(childPhysicalPlans)).foreach { physicalPlan =>
          val cost = physicalPlan.cost()
          bestImplementation match {
            case Some (currentBest) =>
              if (ctx.costModel.isBetter(currentBest.cost, cost)) {
                bestImplementation = Option (
                  GroupImplementation (
                    physicalPlan = physicalPlan,
                    cost = cost,
                    selectedEquivalentExpression = equivalent
                  )
                )
              }
            case None =>
              bestImplementation = Option (
                GroupImplementation (
                  physicalPlan = physicalPlan,
                  cost = cost,
                  selectedEquivalentExpression = equivalent
                )
              )
          }
        }
      }
      bestImplementation.get
  }
}

This code is an exhaustive search code, which is using recursive function to traverse all nodes. At each node (group), the function is called once to get its best plan while also calculate the optimal cost of that group.

Finally, the best plan for our query is the best plan of the root group

 val implementationRules = new ImplementationRule {

  override def physicalPlanBuilders (
                                     expression : GroupExpression
                                   )( implicit ctx : VolcanoPlannerContext ) : Seq [ PhysicalPlanBuilder ] = {
    expression.plan match {
      case node @ Scan (_, _) => implement. Scan (node)
      case node @ Project (_, _) => implement. Project (node)
      case node @ Join (_, _, _) => implement. Join (node)
    }
  }
}

ctx.rootGroup.implementation = Option (implementGroup(ctx.rootGroup, implementationRules))

See VolcanoPlanner.scala for full implementation

Here is an example of the plan after optimization, it's shown the selected logical node, the selected physical operator, and the estimated cost

 graph TD
    Group#6["
    Group #6
Selected: PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2
Operator: ProjectOperator
Cost: Cost(cpu=641400.00, mem=1020400012.00, time=1000000.00)
-
Group#6 --> Group#11
Group#11["
Group #11
Selected: JOIN
Operator: HashJoinOperator
Cost: Cost(cpu=641400.00, mem=1020400012.00, time=1000000.00)
-
Group#11 --> Group#7
Group#11 --> Group#10
Group#7["
Group #7
Selected: SCAN tbl1 (id, field1)
Operator: NormalScanOperator
Cost: Cost(cpu=400.00, mem=400000.00, time=1000.00)
-
Group#10["
Group #10
Selected: JOIN
Operator: MergeJoinOperator
Traits: SORTED
Cost: Cost(cpu=640000.00, mem=20000012.00, time=1100000.00)
-
Group#10 --> Group#8
Group#10 --> Group#9
Group#8["
Group #8
Selected: SCAN tbl2 (id, field1, field2)
Operator: NormalScanOperator
Traits: SORTED
Cost: Cost(cpu=600000.00, mem=12.00, time=1000000.00)
-
Group#9["
Group #9
Selected: SCAN tbl3 (id, field2)
Operator: NormalScanOperator
Traits: SORTED
Cost: Cost(cpu=40000.00, mem=20000000.00, time=100000.00)
-

Implement the implementation rules

Next, we will implement some implementation rules.

โครงการ

The first, easiest one is the implementation rule of logical PROJECT

 object Project {

  def apply ( node : logicalplan. Project )( implicit ctx : VolcanoPlannerContext ) : Seq [ PhysicalPlanBuilder ] = {
    Seq (
      new ProjectionImpl (node.fields)
    )
  }
}

class ProjectionImpl ( projection : Seq [ql. FieldID ]) extends PhysicalPlanBuilder {

  override def build ( children : Seq [ PhysicalPlan ]) : Option [ PhysicalPlan ] = {
    val child = children.head
    val selfCost = Cost (
      estimatedCpuCost = 0 ,
      estimatedMemoryCost = 0 ,
      estimatedTimeCost = 0
    ) // assuming the cost of projection is 0
    val cost = Cost (
      estimatedCpuCost = selfCost.estimatedCpuCost + child.cost().estimatedCpuCost,
      estimatedMemoryCost = selfCost.estimatedMemoryCost + child.cost().estimatedMemoryCost,
      estimatedTimeCost = selfCost.estimatedTimeCost + child.cost().estimatedTimeCost
    )
    val estimations = Estimations (
      estimatedLoopIterations = child.estimations().estimatedLoopIterations,
      estimatedRowSize = child.estimations().estimatedRowSize // just guessing the value
    )
    Some (
      Project (
        operator = ProjectOperator (projection, child.operator()),
        child = child,
        cost = cost,
        estimations = estimations,
        traits = child.traits()
      )
    )
  }
}

The implementation rule for logical PROJECT, is returning one physical plan builder ProjectionImpl . ProjectionImpl cost calculation is simple, it just inherits the cost from the child node (because the projection is actually not doing any intensive operation). Beside that, it also updates the estimation (in this code, estimation is also inherit from the child node)

See Project.scala for full implementation

เข้าร่วม

Writing implementation rule for logical JOIN is way harder than PROJECTION.

One first reason is, a logical JOIN has many physical implementation, such as HASH JOIN, MERGE JOIN, BROADCAST JOIN, etc.

The second reason is, estimating cost for physical JOIN is also hard, because it depends on lots of factors such as row count, row size, data histogram, indexes, data layout, etc.

So, to keep everything simple in this guide, I will only implement 2 physical JOIN: HASH JOIN and MERGE JOIN. The cost estimation functions are fictional (just to show how it works, I'm not trying to correct it). And in the MERGE JOIN, all data is assuming to be sorted by join key.

Here is the code:

 object Join {

  def apply ( node : logicalplan. Join )( implicit ctx : VolcanoPlannerContext ) : Seq [ PhysicalPlanBuilder ] = {
    val leftFields = node.on.map(_._1).map(f => s " ${f.table.id} . ${f.id} " )
    val rightFields = node.on.map(_._2).map(f => s " ${f.table.id} . ${f.id} " )
    Seq (
      new HashJoinImpl (leftFields, rightFields),
      new MergeJoinImpl (leftFields, rightFields)
    )
  }
}

The HASH JOIN:

 class HashJoinImpl ( leftFields : Seq [ String ], rightFields : Seq [ String ]) extends PhysicalPlanBuilder {

  private def viewSize ( plan : PhysicalPlan ) : Long = {
    plan.estimations().estimatedLoopIterations * plan.estimations().estimatedRowSize
  }

  // noinspection ZeroIndexToHead,DuplicatedCode
  override def build ( children : Seq [ PhysicalPlan ]) : Option [ PhysicalPlan ] = {
    // reorder the child nodes, the left child is the child with smaller view size (smaller than the right child if we're store all of them in memory)
    val (leftChild, rightChild) = if (viewSize(children( 0 )) < viewSize(children( 1 ))) {
      (children( 0 ), children( 1 ))
    } else {
      (children( 1 ), children( 0 ))
    }
    val estimatedLoopIterations = Math .max(
      leftChild.estimations().estimatedLoopIterations,
      rightChild.estimations().estimatedLoopIterations
    ) // just guessing the value
    val estimatedOutRowSize = leftChild.estimations().estimatedRowSize + rightChild.estimations().estimatedRowSize
    val selfCost = Cost (
      estimatedCpuCost = leftChild.estimations().estimatedLoopIterations, // cost to hash all record from the smaller view
      estimatedMemoryCost = viewSize(leftChild), // hash the smaller view, we need to hold the hash table in memory
      estimatedTimeCost = rightChild.estimations().estimatedLoopIterations
    )
    val childCosts = Cost (
      estimatedCpuCost = leftChild.cost().estimatedCpuCost + rightChild.cost().estimatedCpuCost,
      estimatedMemoryCost = leftChild.cost().estimatedMemoryCost + rightChild.cost().estimatedMemoryCost,
      estimatedTimeCost = 0
    )
    val estimations = Estimations (
      estimatedLoopIterations = estimatedLoopIterations,
      estimatedRowSize = estimatedOutRowSize
    )
    val cost = Cost (
      estimatedCpuCost = selfCost.estimatedCpuCost + childCosts.estimatedCpuCost,
      estimatedMemoryCost = selfCost.estimatedMemoryCost + childCosts.estimatedMemoryCost,
      estimatedTimeCost = selfCost.estimatedTimeCost + childCosts.estimatedTimeCost
    )
    Some (
      Join (
        operator = HashJoinOperator (
          leftChild.operator(),
          rightChild.operator(),
          leftFields,
          rightFields
        ),
        leftChild = leftChild,
        rightChild = rightChild,
        cost = cost,
        estimations = estimations,
        traits = Set .empty // don't inherit trait from children since we're hash join
      )
    )
  }
}

We can see that the cost function of HASH JOIN is composed of its children costs and estimations

 val selfCost = Cost (
  estimatedCpuCost = leftChild.estimations().estimatedLoopIterations, // cost to hash all record from the smaller view
  estimatedMemoryCost = viewSize(leftChild), // hash the smaller view, we need to hold the hash table in memory
  estimatedTimeCost = rightChild.estimations().estimatedLoopIterations
)

val childCosts = Cost (
  estimatedCpuCost = leftChild.cost().estimatedCpuCost + rightChild.cost().estimatedCpuCost,
  estimatedMemoryCost = leftChild.cost().estimatedMemoryCost + rightChild.cost().estimatedMemoryCost,
  estimatedTimeCost = 0
)

val estimations = Estimations (
  estimatedLoopIterations = estimatedLoopIterations,
  estimatedRowSize = estimatedOutRowSize
)

val cost = Cost (
  estimatedCpuCost = selfCost.estimatedCpuCost + childCosts.estimatedCpuCost,
  estimatedMemoryCost = selfCost.estimatedMemoryCost + childCosts.estimatedMemoryCost,
  estimatedTimeCost = selfCost.estimatedTimeCost + childCosts.estimatedTimeCost
)

Next, the MERGE JOIN:

 class MergeJoinImpl ( leftFields : Seq [ String ], rightFields : Seq [ String ]) extends PhysicalPlanBuilder {

  // noinspection ZeroIndexToHead,DuplicatedCode
  override def build ( children : Seq [ PhysicalPlan ]) : Option [ PhysicalPlan ] = {
    val (leftChild, rightChild) = (children( 0 ), children( 1 ))
    if (leftChild.traits().contains( " SORTED " ) && rightChild.traits().contains( " SORTED " )) {
      val estimatedTotalRowCount =
        leftChild.estimations().estimatedLoopIterations +
          rightChild.estimations().estimatedLoopIterations
      val estimatedLoopIterations = Math .max(
        leftChild.estimations().estimatedLoopIterations,
        rightChild.estimations().estimatedLoopIterations
      ) // just guessing the value
      val estimatedOutRowSize = leftChild.estimations().estimatedRowSize + rightChild.estimations().estimatedRowSize
      val selfCost = Cost (
        estimatedCpuCost = 0 , // no additional cpu cost, just scan from child iterator
        estimatedMemoryCost = 0 , // no additional memory cost
        estimatedTimeCost = estimatedTotalRowCount
      )
      val childCosts = Cost (
        estimatedCpuCost = leftChild.cost().estimatedCpuCost + rightChild.cost().estimatedCpuCost,
        estimatedMemoryCost = leftChild.cost().estimatedMemoryCost + rightChild.cost().estimatedMemoryCost,
        estimatedTimeCost = 0
      )
      val estimations = Estimations (
        estimatedLoopIterations = estimatedLoopIterations,
        estimatedRowSize = estimatedOutRowSize
      )
      val cost = Cost (
        estimatedCpuCost = selfCost.estimatedCpuCost + childCosts.estimatedCpuCost,
        estimatedMemoryCost = selfCost.estimatedMemoryCost + childCosts.estimatedMemoryCost,
        estimatedTimeCost = selfCost.estimatedTimeCost + childCosts.estimatedTimeCost
      )
      Some (
        Join (
          operator = MergeJoinOperator (
            leftChild.operator(),
            rightChild.operator(),
            leftFields,
            rightFields
          ),
          leftChild = leftChild,
          rightChild = rightChild,
          cost = cost,
          estimations = estimations,
          traits = leftChild.traits() ++ rightChild.traits()
        )
      )
    } else {
      None
    }
  }
}

Same with HASH JOIN, MERGE JOIN also uses its children costs and estimations to calculate its cost, but with different formulla:

 val selfCost = Cost (
  estimatedCpuCost = 0 , // no additional cpu cost, just scan from child iterator
  estimatedMemoryCost = 0 , // no additional memory cost
  estimatedTimeCost = estimatedTotalRowCount
)

val childCosts = Cost (
  estimatedCpuCost = leftChild.cost().estimatedCpuCost + rightChild.cost().estimatedCpuCost,
  estimatedMemoryCost = leftChild.cost().estimatedMemoryCost + rightChild.cost().estimatedMemoryCost,
  estimatedTimeCost = 0
)

val estimations = Estimations (
  estimatedLoopIterations = estimatedLoopIterations,
  estimatedRowSize = estimatedOutRowSize
)

val cost = Cost (
  estimatedCpuCost = selfCost.estimatedCpuCost + childCosts.estimatedCpuCost,
  estimatedMemoryCost = selfCost.estimatedMemoryCost + childCosts.estimatedMemoryCost,
  estimatedTimeCost = selfCost.estimatedTimeCost + childCosts.estimatedTimeCost
)

See HashJoinImpl.scala and MergeJoinImpl.scala for full implementation

คนอื่น

You can see other rules and physical plan builders here:

ดำเนินการ
builder

Putting all pieces together

Now, after done implementing the implementation rules, now we can find our best plan. Let's start over from the user query

 SELECT tbl1 . id ,
       tbl1 . field1 ,
       tbl2 . id ,
       tbl2 . field1 ,
       tbl2 . field2 ,
       tbl3 . id ,
       tbl3 . field2 ,
       tbl3 . field2
FROM tbl1
         JOIN tbl2 ON tbl1 . id = tbl2 . id
         JOIN tbl3 ON tbl2 . id = tbl3 . id

will be converted to the logical plan

 graph TD
    1326583549["PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2"];
    -425111028["JOIN"];
    -349388609["SCAN tbl1"];
    1343755644["JOIN"];
    -1043437086["SCAN tbl2"];
    -1402686787["SCAN tbl3"];
    1326583549 --> -425111028;
    -425111028 --> -349388609;
    -425111028 --> 1343755644;
    1343755644 --> -1043437086;
    1343755644 --> -1402686787;

After exploration phase, it will generate lots of equivalent plans

 graph TD
    subgraph Group#8
        Expr#8["SCAN tbl2 (id, field1, field2)"]
    จบ
    subgraph Group#11
        Expr#11["JOIN"]
        Expr#14["JOIN"]
    จบ
    Expr#11 --> Group#7
    Expr#11 --> Group#10
    Expr#14 --> Group#8
    Expr#14 --> Group#12
    subgraph Group#2
        Expr#2["SCAN tbl2"]
    จบ
    subgraph Group#5
        Expr#5["JOIN"]
        Expr#16["JOIN"]
    จบ
    Expr#5 --> Group#1
    Expr#5 --> Group#4
    Expr#16 --> Group#2
    Expr#16 --> Group#13
    subgraph Group#4
        Expr#4["JOIN"]
    จบ
    Expr#4 --> Group#2
    Expr#4 --> Group#3
    subgraph Group#13
        Expr#15["JOIN"]
    จบ
    Expr#15 --> Group#1
    Expr#15 --> Group#3
    subgraph Group#7
        Expr#7["SCAN tbl1 (id, field1)"]
    จบ
    subgraph Group#1
        Expr#1["SCAN tbl1"]
    จบ
    subgraph Group#10
        Expr#10["JOIN"]
    จบ
    Expr#10 --> Group#8
    Expr#10 --> Group#9
    subgraph Group#9
        Expr#9["SCAN tbl3 (id, field2)"]
    จบ
    subgraph Group#3
        Expr#3["SCAN tbl3"]
    จบ
    subgraph Group#12
        Expr#13["JOIN"]
    จบ
    Expr#13 --> Group#7
    Expr#13 --> Group#9
    subgraph Group#6
        Expr#12["PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2"]
        Expr#6["PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2"]
    จบ
    Expr#12 --> Group#11
    Expr#6 --> Group#5
    style Expr#12 stroke-width: 4px, stroke: orange
    style Expr#8 stroke-width: 4px, stroke: orange
    style Expr#10 stroke-width: 4px, stroke: orange
    style Expr#13 stroke-width: 4px, stroke: orange
    style Expr#14 stroke-width: 4px, stroke: orange
    style Expr#11 stroke-width: 4px, stroke: orange
    style Expr#9 stroke-width: 4px, stroke: orange
    style Expr#15 stroke-width: 4px, stroke: orange
    style Expr#7 stroke-width: 4px, stroke: orange
    style Expr#16 stroke-width: 4px, stroke: orange
    linkStyle 0 stroke-width: 4px, stroke: orange
    linkStyle 15 stroke-width: 4px, stroke: orange
    linkStyle 12 stroke-width: 4px, stroke: orange
    linkStyle 1 stroke-width: 4px, stroke: orange
    linkStyle 16 stroke-width: 4px, stroke: orange
    linkStyle 13 stroke-width: 4px, stroke: orange
    linkStyle 2 stroke-width: 4px, stroke: orange
    linkStyle 6 stroke-width: 4px, stroke: orange
    linkStyle 3 stroke-width: 4px, stroke: orange
    linkStyle 10 stroke-width: 4px, stroke: orange
    linkStyle 7 stroke-width: 4px, stroke: orange
    linkStyle 14 stroke-width: 4px, stroke: orange
    linkStyle 11 stroke-width: 4px, stroke: orange

And the at optimization phase, a final best plan is chose

 graph TD
    Group#6["
    Group #6
Selected: PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2
Operator: ProjectOperator
Cost: Cost(cpu=641400.00, mem=1020400012.00, time=1000000.00)
-
Group#6 --> Group#11
Group#11["
Group #11
Selected: JOIN
Operator: HashJoinOperator
Cost: Cost(cpu=641400.00, mem=1020400012.00, time=1000000.00)
-
Group#11 --> Group#7
Group#11 --> Group#10
Group#7["
Group #7
Selected: SCAN tbl1 (id, field1)
Operator: NormalScanOperator
Cost: Cost(cpu=400.00, mem=400000.00, time=1000.00)
-
Group#10["
Group #10
Selected: JOIN
Operator: MergeJoinOperator
Traits: SORTED
Cost: Cost(cpu=640000.00, mem=20000012.00, time=1100000.00)
-
Group#10 --> Group#8
Group#10 --> Group#9
Group#8["
Group #8
Selected: SCAN tbl2 (id, field1, field2)
Operator: NormalScanOperator
Traits: SORTED
Cost: Cost(cpu=600000.00, mem=12.00, time=1000000.00)
-
Group#9["
Group #9
Selected: SCAN tbl3 (id, field2)
Operator: NormalScanOperator
Traits: SORTED
Cost: Cost(cpu=40000.00, mem=20000000.00, time=100000.00)
-

Bonus: query execution

Now we've done building a functional query planner which can optimize the query from user, but our query plan could not run by itself. So it's the reason why now we will implement the query processor to test out our query plan.

Basically the query process receive input from the query planner, and execute them

 graph LR
    plan(("Physical Plan"))
    storage[("Storage Layer")]
    processor["Query Processor"]
    plan -- execute --> processor
    storage -- fetch --> processor

Volcano/Iterator model

Volcano/iterator model is the query processing model that is widely used in many DBMS. It is a pipeline architecture, which means that the data is processed in stages, with each stage passing the output of the previous stage to the next stage.

Each stage in the pipeline is represented by an operator. Operators are functions that perform a specific operation on the data, such as selecting rows, filtering rows, or aggregating rows.

Usually, operator can be formed directly from the query plan. For example, the query

 SELECT field_1
FROM tbl
WHERE field = 1

will have the plan

 graph TD
  project["PROJECT: field_1"]
  scan["SCAN: tbl"]
  filter["FILTER: field = 1"]
  project --> scan
  filter --> project

will create a chain of operators like this:

 scan = {
    next() // fetch next row from table "tbl"
}

project = {
    next() = {
        next_row = scan.next() // fetch next row from scan operator
        projected = next_row["field_1"]
        return projected
    }
}

filter = {
    next() = {
        next_row = {}
        do {
            next_row = project.next() // fetch next row from project operator
        } while (next_row["field"] != 1)
        return next_row
    }
}

results = []
while (row = filter.next()) {
    results.append(row)
}

notes : this pseudo code did not handle for end of result stream

The operators

The basic interface of an operator is described as following:

 trait Operator {
  def next () : Option [ Seq [ Any ]]
}

See Operator.scala for full implementation of all operators

Testing a simple query

Let's define a query

 SELECT emp . id ,
       emp . code ,
       dept . dept_name ,
       emp_info . name ,
       emp_info . origin
FROM emp
         JOIN dept ON emp . id = dept . emp_id
         JOIN emp_info ON dept . emp_id = emp_info . id

with some data and stats

 val table1 : Datasource = Datasource (
  table = " emp " ,
  catalog = TableCatalog (
    Seq (
      " id " -> classOf [ String ],
      " code " -> classOf [ String ]
    ),
    metadata = Map ( " sorted " -> " true " ) // assumes rows are already sorted by id
  ),
  rows = Seq (
    Seq ( " 1 " , " Emp A " ),
    Seq ( " 2 " , " Emp B " ),
    Seq ( " 3 " , " Emp C " )
  ),
  stats = TableStats (
    estimatedRowCount = 3 ,
    avgColumnSize = Map ( " id " -> 10 , " code " -> 32 )
  )
)

val table2 : Datasource = Datasource (
  table = " dept " ,
  catalog = TableCatalog (
    Seq (
      " emp_id " -> classOf [ String ],
      " dept_name " -> classOf [ String ]
    ),
    metadata = Map ( " sorted " -> " true " ) // assumes rows are already sorted by emp_id (this is just a fake trait to demonstrate how trait works)
  ),
  rows = Seq (
    Seq ( " 1 " , " Dept 1 " ),
    Seq ( " 1 " , " Dept 2 " ),
    Seq ( " 2 " , " Dept 3 " ),
    Seq ( " 3 " , " Dept 3 " )
  ),
  stats = TableStats (
    estimatedRowCount = 4 ,
    avgColumnSize = Map ( " emp_id " -> 10 , " dept_name " -> 255 )
  )
)

val table3 : Datasource = Datasource (
  table = " emp_info " ,
  catalog = TableCatalog (
    Seq (
      " id " -> classOf [ String ],
      " name " -> classOf [ String ],
      " origin " -> classOf [ String ]
    ),
    metadata = Map ( " sorted " -> " true " ) // assumes rows are already sorted by id (this is just a fake trait to demonstrate how trait works)
  ),
  rows = Seq (
    Seq ( " 1 " , " AAAAA " , " Country A " ),
    Seq ( " 2 " , " BBBBB " , " Country A " ),
    Seq ( " 3 " , " CCCCC " , " Country B " )
  ),
  stats = TableStats (
    estimatedRowCount = 3 ,
    avgColumnSize = Map ( " id " -> 10 , " name " -> 255 , " origin " -> 255 )
  )
)

The cost model is optimized for CPU

 val costModel : CostModel = ( currentCost : Cost , newCost : Cost ) => {
  currentCost.estimatedCpuCost > newCost.estimatedCpuCost
}

Now, executing the query by running this code:

 val planner = new VolcanoPlanner
QueryParser .parse(query) match {
  case Left (err) => throw err
  case Right (parsed) =>
    val operator = planner.getPlan(parsed)
    val result = Utils .execute(operator)
    // print result
    println(result._1.mkString( " , " ))
    result._2.foreach(row => println(row.mkString( " , " )))
}

it will print:

 emp.id,emp.code,dept.dept_name,emp_info.name,emp_info.origin
1,Emp A,Dept 1,AAAAA,Country A
1,Emp A,Dept 2,AAAAA,Country A
2,Emp B,Dept 3,BBBBB,Country A
3,Emp C,Dept 3,CCCCC,Country B

Voila, We've done building a fully functional query planner and query engine :). You can start writing one for your own, good luck

See Demo.scala for full demo code

ขอบคุณ

Thanks for reading this, this guide is quite long, and not fully correct, but I've tried my best to write it as understandably as possible ?

ขยาย