Unduh query planner guide - Unduh Kode Sumber query planner guide

[Placeholder]

Perkenalan

Perencana kueri adalah komponen dari sistem manajemen basis data (DBMS) yang bertanggung jawab untuk menghasilkan rencana untuk melaksanakan kueri basis data. Rencana kueri menentukan langkah -langkah yang akan diambil DBMS untuk mengambil data yang diminta oleh kueri. Tujuan dari perencana kueri adalah untuk menghasilkan rencana yang seefisien mungkin, yang berarti bahwa ia akan mengembalikan data kepada pengguna secepat mungkin.

Perencana kueri adalah perangkat lunak yang rumit, dan mereka bisa sulit dipahami. Panduan untuk mengimplementasikan perencana kueri berbasis biaya ini akan memberi Anda gambaran langkah demi langkah proses, cara mengimplementasikan perencana kueri berbasis biaya Anda sendiri, sambil tetap mencakup konsep dasar perencana kueri.

Ditulis oleh AI, diedit oleh manusia

Audiens yang ditargetkan

Panduan ini ditulis untuk:

yang dulu bekerja dengan mesin kueri
yang penasaran, ingin membuat barang sendiri
Siapa yang ingin belajar barang -barang DB tetapi membenci matematika

Sasaran:

Dapat memahami dasar perencanaan kueri
Dapat menulis perencana kueri Anda sendiri

Arsitektur dasar mesin kueri

 grafik td
    pengguna ((pengguna))
    parser [kueri parser]
    Perencana [Perencana Permintaan]
    pelaksana [prosesor kueri]
    Pengguna -kueri teks -> parser
    Parser -ast -> perencana
    Perencana -Rencana Fisik -> Pelaksana

Arsitektur dasar mesin kueri terdiri dari komponen -komponen tersebut:

Parser kueri: Digunakan untuk menguraikan input kueri pengguna, biasanya dalam format teks yang dapat dibaca manusia (seperti SQL)
Perencana kueri: Digunakan untuk menghasilkan rencana/strategi untuk menjalankan kueri. Biasanya perencana kueri akan memilih rencana terbaik di antara beberapa rencana yang dihasilkan dari satu kueri
Prosesor kueri: Digunakan untuk menjalankan rencana kueri, yang merupakan output oleh perencana kueri

Jenis Perencana Kueri

Biasanya, perencana kueri dibagi menjadi 2 jenis:

perencana heuristik
perencana berbasis biaya

Perencana Heuristik adalah perencana kueri yang menggunakan aturan yang telah ditentukan sebelumnya untuk menghasilkan rencana kueri.

Perencana berbasis biaya adalah perencana kueri yang berdasarkan biaya untuk menghasilkan kueri, ia mencoba untuk menemukan rencana optimal berdasarkan biaya kueri input.

Sementara perencana heuristik biasanya menemukan rencana terbaik dengan menerapkan aturan transformasi jika ia tahu bahwa rencana yang diubah lebih baik, perencana berbasis biaya menemukan rencana terbaik dengan menyebutkan rencana yang setara dan mencoba menemukan rencana terbaik di antara mereka.

Perencana Kueri Berbasis Biaya

Dalam perencana kueri berbasis biaya, biasanya terdiri dari fase:

Rencanakan enumerasi
Optimalisasi permintaan

Dalam fase Enumerasi Rencana, perencana akan menyebutkan kemungkinan rencana yang setara.

Setelah itu, dalam fase optimasi kueri, perencana akan mencari rencana terbaik dari daftar rencana yang disebutkan. Rencana terbaik adalah rencana memiliki biaya terendah, yang didefinisikan oleh model biaya (atau fungsi biaya).

Karena Rencana Logis alami, memiliki struktur seperti pohon, sehingga Anda dapat berpikir optimasi/pencarian sebenarnya adalah masalah pencarian pohon. Dan ada banyak algoritma pencarian pohon di sini:

Pencarian lengkap, seperti pemrograman dinamis deterministik. Algoritma akan melakukan pencarian rencana terbaik sampai kondisi penghentian pencarian
Pencarian acak, seperti pencarian pohon acak. Algoritma akan melakukan pencarian rencana terbaik sampai kondisi penghentian pencarian

Catatan: Secara teori adalah mungkin untuk menggunakan segala jenis algoritma pencarian pohon. Namun, secara praktis tidak layak karena waktu pencarian meningkat ketika algoritma pencarian kami rumit

Catatan: Kondisi pemutusan pencarian biasanya adalah:

Cari Kelelahan (ketika tidak ada lagi rencana untuk dikunjungi)
Ambang batas biaya (ketika ditemukan rencana bahwa biaya lebih rendah dari ambang biaya yang ditentukan)
waktu (saat fase pencarian berjalan terlalu lama)

Perencana Permintaan Gunung Berapi

Perencana Permintaan Gunung Berapi (atau Generator Pengoptimal Gunung Berapi) adalah perencana kueri berbasis biaya

Volcano Planner menggunakan pendekatan pemrograman dinamis untuk menemukan rencana kueri terbaik dari daftar rencana yang disebutkan.

Detail: https://ieExplore.ieee.org/document/344061 (Saya terlalu malas untuk menjelaskan kertas di sini)

Berikut adalah penjelasan yang bagus: https://15721.courses.cs.cmu.edu/spring2017/slides/15-optimizer2.pdf#page=9

Menyusun perencana permintaan berbasis biaya kami

Perencana kueri kami, adalah perencana kueri berbasis biaya, mengikuti ide dasar perencana permintaan gunung berapi perencana kami akan terdiri dari 2 fase utama:

Fase Eksplorasi/Pencarian
Fase Implementasi/Optimalisasi

 grafik lr
    ast ((ast))
    Logical_plan [Plan]
    Explored_plans ["`
        Rencana #1
        ...
        Rencanakan #N
    `"]
    implementasi_plan ["Plan #x (Paket Terbaik)"]
    AST -Konversi ke Rencana Logis -> Logical_Plan
    Logical_plan -fase eksplorasi -> explored_plans
    Explored_plans -Fase Optimasi -> Implementasi_Plan
    Linkstyle 1,2 Warna: Oranye, Stroke: Oranye, Stroke-lebar: 5px

Glosarium

Rencana logis

Rencana logis adalah struktur data yang memegang abstraksi langkah transformasi yang diperlukan untuk menjalankan kueri.

Berikut adalah contoh rencana logis:

 grafik td
    1 ["Project tbl1.id, tbl1.field1, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2"];
    2 ["bergabung"];
    3 ["pindai tbl1"];
    4 ["bergabung"];
    5 ["pindai tbl2"];
    6 ["pindai tbl3"];
    1 -> 2;
    2 -> 3;
    2 -> 4;
    4 -> 5;
    4 -> 6;

Rencana fisik

Sementara rencana logis hanya memegang abstraksi, rencana fisik adalah struktur data yang memegang detail implementasi. Setiap rencana logis akan memiliki banyak rencana fisik. Misalnya, gabungan logis mungkin memiliki banyak rencana fisik seperti hash gabungan, gabungan bergabung, menyiarkan gabung, dll.

Grup yang setara

Grup yang setara adalah kelompok ekspresi yang setara (yang untuk setiap ekspresi, rencana logisnya secara logis setara)

misalnya

 grafik td
    Grup Subgraph#8
        Expr#8 ["Pindai TBL2 (Field1, Field2, ID)"]
    akhir
    Grup Subgraph#2
        Expr#2 ["pindai tbl2"]
    akhir
    Grup Subgraph#11
        Expr#11 ["gabung"]
    akhir
    Expr#11 -> Group#7
    Expr#11 -> Group#10
    Grup Subgraph#5
        Expr#5 ["gabung"]
    akhir
    Expr#5 -> Group#1
    Expr#5 -> Group#4
    Grup Subgraph#4
        Expr#4 ["gabung"]
    akhir
    Expr#4 -> Group#2
    Expr#4 -> Group#3
    Grup Subgraph#7
        Expr#7 ["pindai tbl1 (id, field1)"]
    akhir
    Grup Subgraph#1
        Expr#1 ["Pindai TBL1"]
    akhir
    Grup Subgraph#10
        Expr#10 ["gabung"]
    akhir
    Expr#10 -> Group#8
    Expr#10 -> Group#9
    Grup Subgraph#9
        Expr#9 ["pindai tbl3 (id, field2)"]
    akhir
    Grup Subgraph#3
        Expr#3 ["Pindai TBL3"]
    akhir
    Grup Subgraph#6
        Expr#12 ["Project tbl1.id, tbl1.field1, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2"]
        Expr#6 ["Project tbl1.id, tbl1.field1, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2"]
    akhir
    Expr#12 -> Group#11
    Expr#6 -> Group#5

Di sini kita dapat melihat Group#6 memiliki 2 ekspresi yang setara, yang keduanya mewakili kueri yang sama (satu melakukan pemindaian dari meja kemudian proyek, satu mendorong proyeksi ke node pemindaian).

Aturan transformasi

Aturan transformasi adalah aturan untuk mengubah dari satu rencana logis ke rencana logis setara logis lainnya

Misalnya, rencananya:

 grafik td
    1 ["Project tbl1.id, tbl1.field1, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2"];
    2 ["bergabung"];
    3 ["pindai tbl1"];
    4 ["bergabung"];
    5 ["pindai tbl2"];
    6 ["pindai tbl3"];
    1 -> 2;
    2 -> 3;
    2 -> 4;
    4 -> 5;
    4 -> 6;

Saat menerapkan transformasi pushdown proyeksi, diubah menjadi:

 grafik td
    1 ["Proyek *. *"];
    2 ["bergabung"];
    3 ["pindai tbl1 (id, field1)"];
    4 ["bergabung"];
    5 ["pindai tbl2 (field1, field2)"];
    6 ["pindai Tbl3 (id, field2, field2)"];
    1 -> 2;
    2 -> 3;
    2 -> 4;
    4 -> 5;
    4 -> 6;

Aturan transformasi dapat dipengaruhi oleh sifat/sifat logis seperti skema tabel, statistik data, dll.

Aturan Implementasi

Aturan implementasi adalah aturan untuk mengembalikan rencana fisik yang diberikan rencana logis.

Aturan implementasi dapat dipengaruhi oleh sifat/sifat fisik seperti tata letak data (diurutkan atau tidak), dll.

Fase eksplorasi

Dalam fase eksplorasi, perencana akan menerapkan aturan transformasi, menghasilkan rencana logis yang setara

Misalnya, rencananya:

 grafik td
    1326583549 ["Project tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2"];
    -425111028 ["bergabung"];
    -349388609 ["pindai tbl1"];
    1343755644 ["bergabung"];
    -1043437086 ["pindai tbl2"];
    -1402686787 ["pindai tbl3"];
    1326583549 -> -425111028;
    -425111028 -> -349388609;
    -425111028 -> 1343755644;
    1343755644 -> -1043437086;
    1343755644 -> -1402686787;

Setelah menerapkan aturan transformasi, menghasilkan grafik berikut:

 grafik td
    Grup Subgraph#8
        Expr#8 ["Pindai TBL2 (ID, Field1, Field2)"]
    akhir
    Grup Subgraph#11
        Expr#11 ["gabung"]
        Expr#14 ["gabung"]
    akhir
    Expr#11 -> Group#7
    Expr#11 -> Group#10
    Expr#14 -> Group#8
    Expr#14 -> Group#12
    Grup Subgraph#2
        Expr#2 ["pindai tbl2"]
    akhir
    Grup Subgraph#5
        Expr#5 ["gabung"]
        Expr#16 ["gabung"]
    akhir
    Expr#5 -> Group#1
    Expr#5 -> Group#4
    Expr#16 -> Group#2
    Expr#16 -> Group#13
    Grup Subgraph#4
        Expr#4 ["gabung"]
    akhir
    Expr#4 -> Group#2
    Expr#4 -> Group#3
    Grup Subgraph#13
        Expr#15 ["gabung"]
    akhir
    Expr#15 -> Group#1
    Expr#15 -> Group#3
    Grup Subgraph#7
        Expr#7 ["pindai tbl1 (id, field1)"]
    akhir
    Grup Subgraph#1
        Expr#1 ["Pindai TBL1"]
    akhir
    Grup Subgraph#10
        Expr#10 ["gabung"]
    akhir
    Expr#10 -> Group#8
    Expr#10 -> Group#9
    Grup Subgraph#9
        Expr#9 ["pindai tbl3 (id, field2)"]
    akhir
    Grup Subgraph#3
        Expr#3 ["Pindai TBL3"]
    akhir
    Grup Subgraph#12
        Expr#13 ["Bergabung"]
    akhir
    Expr#13 -> Group#7
    Expr#13 -> Group#9
    Grup Subgraph#6
        Expr#12 ["Project tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2"]
        Expr#6 ["Project tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2"]
    akhir
    Expr#12 -> Group#11
    Expr#6 -> Group#5

Di sini kita dapat melihat bahwa aturan pushdown proyeksi dan bergabung dengan aturan pemesanan ulang diterapkan.

Fase optimasi

Fase optimasi, adalah melintasi pohon yang diperluas dalam fase eksplorasi, untuk menemukan rencana terbaik untuk kueri kami.

"Sebenarnya" ini adalah optimasi pencarian pohon, sehingga Anda dapat menggunakan algoritma pencarian pohon apa pun yang dapat Anda bayangkan (tetapi Anda harus memastikan itu benar).

Berikut adalah contoh rencana fisik yang dihasilkan setelah fase optimasi:

 grafik td
    Grup#6 ["
    Grup #6
Dipilih: Project TBL1.ID, TBL1.Field1, TBL2.ID, TBL2.Field1, TBL2.Field2, TBL3.ID, TBL3.Field2, TBL3.Field2
Operator: ProjectOperator
Biaya: Biaya (CPU = 641400.00, mem = 1020400012.00, waktu = 1000000.00)
"]
Grup#6 -> Grup#11
Grup#11 ["
Grup #11
Dipilih: Bergabunglah
Operator: Hashjoinoperator
Biaya: Biaya (CPU = 641400.00, mem = 1020400012.00, waktu = 1000000.00)
"]
Grup#11 -> Grup#7
Grup#11 -> Grup#10
Grup#7 ["
Grup #7
Dipilih: Pindai TBL1 (ID, Field1)
Operator: NormalsCanoperator
Biaya: Biaya (CPU = 400.00, mem = 400000.00, waktu = 1000.00)
"]
Grup#10 ["
Grup #10
Dipilih: Bergabunglah
Operator: Mergejoinoperator
Ciri -ciri: disortir
Biaya: Biaya (CPU = 640000.00, MEM = 20000012.00, Waktu = 1100000.00)
"]
Grup#10 -> Group#8
Grup#10 -> Group#9
Grup#8 ["
Grup #8
Dipilih: Pindai TBL2 (ID, Field1, Field2)
Operator: NormalsCanoperator
Ciri -ciri: disortir
Biaya: Biaya (CPU = 600000.00, mem = 12.00, waktu = 1000000.00)
"]
Grup#9 ["
Grup #9
Dipilih: Pindai TBL3 (ID, Field2)
Operator: NormalsCanoperator
Ciri -ciri: disortir
Biaya: Biaya (CPU = 40000.00, MEM = 20000000.00, Waktu = 100000.00)
"]

Rencana yang dihasilkan telah menunjukkan rencana logis yang dipilih, perkiraan biaya, dan operator fisik

Mengoptimalkan/Pengakhiran Pencarian

Perencana kami akan melakukan pencarian kelelahan untuk menemukan rencana terbaik

Menyelam ke dalam kode

Karena kode perencana itu besar, jadi saya tidak akan menulis panduan langkah demi langkah, tetapi saya akan menjelaskan setiap bagian dari kode sebagai gantinya

Bahasa kueri

Di sini kita akan mendefinisikan bahasa kueri yang digunakan secara menyeluruh tutorial ini

 SELECT emp . id ,
       emp . code ,
       dept . dept_name ,
       emp_info . name ,
       emp_info . origin
FROM emp
         JOIN dept ON emp . id = dept . emp_id
         JOIN emp_info ON dept . emp_id = emp_info . id

Bahasa kueri yang akan kami terapkan adalah bahasa seperti SQL. Namun, demi kesederhanaan, kami akan membatasi fungsionalitas dan sintaksinya.

Bahasa muncul dalam bentuk

 SELECT tbl . field , [...]
FROM tbl JOIN [...]

Ini hanya akan mendukung untuk SELECT dan JOIN , juga bidang dalam pernyataan SELECT harus memenuhi syarat sepenuhnya (dalam bentuk table.field .

Ast

Pertama, kita harus mendefinisikan AST untuk bahasa kita. AST (atau pohon sintaksis abstrak) adalah pohon yang digunakan untuk mewakili struktur sintaksis suatu teks.

Karena bahasa kami sangat sederhana, kami hanya dapat mendefinisikan struktur AST dalam beberapa baris kode:

 sealed trait Identifier

case class TableID ( id : String ) extends Identifier

case class FieldID ( table : TableID , id : String ) extends Identifier

sealed trait Statement

case class Table ( table : TableID ) extends Statement

case class Join ( left : Statement , right : Statement , on : Seq [( FieldID , FieldID )]) extends Statement

case class Select ( fields : Seq [ FieldID ], from : Statement ) extends Statement

Misalnya, kueri

 SELECT tbl1 . id ,
       tbl1 . field1 ,
       tbl2 . id ,
       tbl2 . field1 ,
       tbl2 . field2 ,
       tbl3 . id ,
       tbl3 . field2 ,
       tbl3 . field2
FROM tbl1
         JOIN tbl2 ON tbl1 . id = tbl2 . id
         JOIN tbl3 ON tbl2 . id = tbl3 . id

dapat diwakili sebagai

 Select (
  Seq (
    FieldID ( TableID ( " tbl1 " ), " id " ),
    FieldID ( TableID ( " tbl1 " ), " field1 " ),
    FieldID ( TableID ( " tbl2 " ), " id " ),
    FieldID ( TableID ( " tbl2 " ), " field1 " ),
    FieldID ( TableID ( " tbl2 " ), " field2 " ),
    FieldID ( TableID ( " tbl3 " ), " id " ),
    FieldID ( TableID ( " tbl3 " ), " field2 " ),
    FieldID ( TableID ( " tbl3 " ), " field2 " )
  ),
  Join (
    Table ( TableID ( " tbl1 " )),
    Join (
      Table ( TableID ( " tbl2 " )),
      Table ( TableID ( " tbl3 " )),
      Seq (
        FieldID ( TableID ( " tbl2 " ), " id " ) -> FieldID ( TableID ( " tbl3 " ), " id " )
      )
    ),
    Seq (
      FieldID ( TableID ( " tbl1 " ), " id " ) -> FieldID ( TableID ( " tbl2 " ), " id " )
    )
  )
)

Parser kueri sederhana

Setelah mendefinisikan struktur AST, kita harus menulis parser kueri, yang digunakan untuk mengubah kueri teks menjadi bentuk AST.

Karena panduan ini menggunakan Scala untuk implementasi, kami akan memilih kombinator Scala-Parser untuk membuat parser kueri kami.

Kelas Parser Kueri:

 object QueryParser extends ParserWithCtx [ QueryExecutionContext , Statement ] with RegexParsers {

  override def parse ( in : String )( implicit ctx : QueryExecutionContext ) : Either [ Throwable , Statement ] = {
    Try (parseAll(statement, in) match {
      case Success (result, _) => Right (result)
      case NoSuccess (msg, _) => Left ( new Exception (msg))
    }) match {
      case util. Failure (ex) => Left (ex)
      case util. Success (value) => value
    }
  }

  private def select : Parser [ Select ] = ??? // we will implement it in later section

  private def statement : Parser [ Statement ] = select
}

Kemudian tentukan beberapa aturan parse:

 // common
private def str : Parser [ String ] = """ [a-zA-Z0-9_]+ """ .r
private def fqdnStr : Parser [ String ] = """ [a-zA-Z0-9_]+.[a-zA-Z0-9_]+ """ .r

// identifier
private def tableId : Parser [ TableID ] = str ^^ (s => TableID (s))

private def fieldId : Parser [ FieldID ] = fqdnStr ^^ { s =>
  val identifiers = s.split( '.' )
  if (identifiers.length != 2 ) {
    throw new Exception ( " should never happen " )
  } else {
    val table = identifiers.head
    val field = identifiers( 1 )
    FieldID ( TableID (table), field)
  }
}

Berikut adalah dua aturan, yang digunakan untuk menguraikan pengidentifikasi: TableID dan FieldID .

ID tabel (atau nama tabel) biasanya hanya berisi karakter, angka, dan garis bawah ( _ ), jadi kami akan menggunakan regex sederhana [a-zA-Z0-9_]+ untuk mengidentifikasi nama tabel.

Di sisi lain, ID lapangan (untuk kualifikasi lapangan) dalam bahasa kami adalah nama lapangan yang sepenuhnya berkualitas. Biasanya itu dalam bentuk table.field , dan nama lapangan juga biasanya hanya berisi karakter, angka, dan garis bawah, jadi kita akan menggunakan regex [a-zA-Z0-9_]+.[a-zA-Z0-9_]+ untuk parser nama lapangan.

Setelah mendefinisikan aturan untuk parsing pengidentifikasi, kita sekarang dapat mendefinisikan aturan untuk menguraikan pernyataan kueri:

 // statement
private def table : Parser [ Table ] = tableId ^^ (t => Table (t))
private def subQuery : Parser [ Statement ] = " ( " ~> select <~ " ) "

Aturan table adalah aturan sederhana, itu hanya membuat node Table dengan menggunakan TableID parsed dari aturan tableId .

subQuery , adalah aturan untuk menguraikan sub-kuerinya. Di SQL, kita dapat menulis kueri yang terlihat seperti ini:

 SELECT a
FROM ( SELECT b FROM c) d

SELECT b FROM c adalah sub-query dalam pernyataan di atas. Di sini, dalam bahasa kueri sederhana kami, kami akan menunjukkan pernyataan adalah sub-kueri jika tertutup oleh sepasang tanda kurung ( () ). Karena bahasa kami hanya memiliki pernyataan pilih, kami dapat menulis aturan parse sebagai berikut:

 def subQuery : Parser [ Statement ] = " ( " ~> select <~ " ) "

Sekarang kita akan menentukan aturan parse untuk pernyataan pilih:

 private def fromSource : Parser [ Statement ] = table ||| subQuery

private def select : Parser [ Select ] =
  " SELECT " ~ rep1sep(fieldId, " , " ) ~ " FROM " ~ fromSource ~ rep(
    " JOIN " ~ fromSource ~ " ON " ~ rep1(fieldId ~ " = " ~ fieldId)
  ) ^^ {
    case _ ~ fields ~ _ ~ src ~ joins =>
      val p = if (joins.nonEmpty) {
        def chain ( left : Statement , right : Seq [( Statement , Seq [( FieldID , FieldID )])]) : Join = {
          if (right.isEmpty) {
            throw new Exception ( " should never happen " )
          } else if (right.length == 1 ) {
            val next = right.head
            Join (left, next._1, next._2)
          } else {
            val next = right.head
            Join (left, chain(next._1, right.tail), next._2)
          }
        }

        val temp = joins.map { join =>
          val statement = join._1._1._2
          val joinOn = join._2.map(on => on._1._1 -> on._2)
          statement -> joinOn
        }
        chain(src, temp)
      } else {
        src
      }
      Select (fields, p)
  }

Di SQL, kita dapat menggunakan sub-query sebagai sumber gabungan. Misalnya:

 SELECT * . *
FROM tbl1
    JOIN ( SELECT * . * FROM tbl2)
    JOIN tbl3

Jadi parser kami juga akan mengimplementasikan aturan untuk menguraikan sub-kuerinya di bagian gabungan dari pernyataan itu, itulah sebabnya kami memiliki aturan parse:

 " SELECT " ~ rep1sep(fieldId, " , " ) ~ " FROM " ~ fromSource ~ rep( " JOIN " ~ fromSource ~ " ON " ~ rep1(fieldId ~ " = " ~ fieldId)

Lihat queryparser.scala untuk implementasi penuh

Menguji parser kueri kami

Lihat queryparsspec.scala

Rencana logis

Setelah menghasilkan AST dari kueri teks, kita dapat secara langsung mengonversinya ke rencana logis

Pertama, mari kita tentukan antarmuka untuk rencana logis kita:

 sealed trait LogicalPlan {
  def children () : Seq [ LogicalPlan ]
}

children adalah daftar rencana logis anak. Misalnya:

 grafik td
    1326583549 ["Project tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2"];
    -425111028 ["bergabung"];
    -349388609 ["pindai tbl1"];
    1343755644 ["bergabung"];
    -1043437086 ["pindai tbl2"];
    -1402686787 ["pindai tbl3"];
    1326583549 -> -425111028;
    -425111028 -> -349388609;
    -425111028 -> 1343755644;
    1343755644 -> -1043437086;
    1343755644 -> -1402686787;

Node anak dari node PROJECT adalah simpul JOIN pertama. Node JOIN pertama memiliki 2 anak, yang merupakan simpul JOIN kedua dan SCAN tbl1 . Segera, ...

Karena bahasa kueri kami sederhana, kami hanya membutuhkan 3 jenis node logis:

Proyek: Mewakili operator proyeksi dalam hubungan aljabar
BERGABUNG: mewakili gabungan logis
Pemindaian: mewakili pemindaian meja

 case class Scan ( table : ql. TableID , projection : Seq [ String ]) extends LogicalPlan {
  override def children () : Seq [ LogicalPlan ] = Seq .empty
}

case class Project ( fields : Seq [ql. FieldID ], child : LogicalPlan ) extends LogicalPlan {
  override def children () : Seq [ LogicalPlan ] = Seq (child)
}

case class Join ( left : LogicalPlan , right : LogicalPlan , on : Seq [(ql. FieldID , ql. FieldID )]) extends LogicalPlan {
  override def children () : Seq [ LogicalPlan ] = Seq (left, right)
}

Kemudian kita dapat menulis fungsi untuk mengubah AST menjadi rencana logis:

 def toPlan ( node : ql. Statement ) : LogicalPlan = {
  node match {
    case ql. Table (table) => Scan (table, Seq .empty)
    case ql. Join (left, right, on) => Join (toPlan(left), toPlan(right), on)
    case ql. Select (fields, from) => Project (fields, toPlan(from))
  }
}

Lihat LogicalPlan.Scala untuk implementasi penuh

Kelompok yang setara

Kelompok

Kami dapat mendefinisikan kelas untuk grup sebagai berikut:

 case class Group (
                  id : Long ,
                  equivalents : mutable. HashSet [ GroupExpression ]
                ) {
  val explorationMark : ExplorationMark = new ExplorationMark
  var implementation : Option [ GroupImplementation ] = None
}

case class GroupExpression (
                            id : Long ,
                            plan : LogicalPlan ,
                            children : mutable. MutableList [ Group ]
                          ) {
  val explorationMark : ExplorationMark = new ExplorationMark
  val appliedTransformations : mutable. HashSet [ TransformationRule ] = mutable. HashSet ()
}

Group adalah himpunan rencana yang setara secara logis.

Setiap GroupExpression mewakili simpul rencana logis. Karena kami telah mendefinisikan node rencana logis akan memiliki daftar node anak (di bagian sebelumnya), dan GroupExpression mewakili node rencana logis, dan Group tersebut mewakili serangkaian rencana yang setara, sehingga anak -anak dari GroupExpression adalah daftar Group

misalnya

 grafik td
    Grup Subgraph#8
        Expr#8
    akhir
    Grup Subgraph#2
        Expr#2
    akhir
    Grup Subgraph#11
        Expr#11
    akhir
    Expr#11 -> Group#7
    Expr#11 -> Group#10
    Grup Subgraph#5
        Expr#5
    akhir
    Expr#5 -> Group#1
    Expr#5 -> Group#4
    Grup Subgraph#4
        Expr#4
    akhir
    Expr#4 -> Group#2
    Expr#4 -> Group#3
    Grup Subgraph#7
        Expr#7
    akhir
    Grup Subgraph#1
        Expr#1
    akhir
    Grup Subgraph#10
        Expr#10
    akhir
    Expr#10 -> Group#8
    Expr#10 -> Group#9
    Grup Subgraph#9
        Expr#9
    akhir
    Grup Subgraph#3
        Expr#3
    akhir
    Grup Subgraph#6
        Expr#12
        Expr#6
    akhir
    Expr#12 -> Group#11
    Expr#6 -> Group#5

Seperti yang dapat kita lihat di sini, Group#6 memiliki 2 ekspresi yang setara: Expr#12 dan Expr#6 , dan anak -anak dari Expr#12 adalah Group#11

Catatan: Kami akan menerapkan beberapa transformasi putaran dalam fase eksplorasi, jadi untuk setiap Group dan GroupExpression , kami memiliki indikasi tanda ExplorationMark status eksplorasi.

 class ExplorationMark {
  private var bits : Long = 0

  def get : Long = bits

  def isExplored ( round : Int ) : Boolean = BitUtils .getBit(bits, round)

  def markExplored ( round : Int ) : Unit = bits = BitUtils .setBit(bits, round, on = true )

  def markUnexplored ( round : Int ) : Unit = bits = BitUtils .setBit(bits, round, on = false )
}

ExplorationMark hanyalah kelas pembungkus bitset, itu akan menandai bit i-th sebagai 1 jika ronde I-th dieksplorasi, tandai sebagai 0 sebaliknya.

ExplorationMark juga dapat digunakan untuk memvisualisasikan transformasi yang tepat, lihat visualisasi untuk detail lebih lanjut

Memo

Memo adalah sekelompok pembantu untuk membantu membangun kelompok yang setara. Memo terdiri dari beberapa hashmap untuk menyimpan ekspresi grup dan grup, juga menyediakan metode untuk mendaftarkan ekspresi grup atau grup baru.

 class Memo (
            groupIdGenerator : Generator [ Long ] = new LongGenerator ,
            groupExpressionIdGenerator : Generator [ Long ] = new LongGenerator
          ) {
  val groups : mutable. HashMap [ Long , Group ] = mutable. HashMap [ Long , Group ]()
  val parents : mutable. HashMap [ Long , Group ] = mutable. HashMap [ Long , Group ]() // lookup group from group expression ID
  val groupExpressions : mutable. HashMap [ LogicalPlan , GroupExpression ] = mutable. HashMap [ LogicalPlan , GroupExpression ]()

  def getOrCreateGroupExpression ( plan : LogicalPlan ) : GroupExpression = {
    val children = plan.children()
    val childGroups = children.map(child => getOrCreateGroup(child))
    groupExpressions.get(plan) match {
      case Some (found) => found
      case None =>
        val id = groupExpressionIdGenerator.generate()
        val children = mutable. MutableList () ++ childGroups
        val expression = GroupExpression (
          id = id,
          plan = plan,
          children = children
        )
        groupExpressions += plan -> expression
        expression
    }
  }

  def getOrCreateGroup ( plan : LogicalPlan ) : Group = {
    val exprGroup = getOrCreateGroupExpression(plan)
    val group = parents.get(exprGroup.id) match {
      case Some (group) =>
        group.equivalents += exprGroup
        group
      case None =>
        val id = groupIdGenerator.generate()
        val equivalents = mutable. HashSet () + exprGroup
        val group = Group (
          id = id,
          equivalents = equivalents
        )
        groups.put(id, group)
        group
    }
    parents += exprGroup.id -> group
    group
  }
}

Lihat Memo.scala untuk implementasi penuh

Inisialisasi

Langkah pertama di dalam perencana, adalah inisialisasi

 grafik lr
    kueri ((kueri))
    ast ((ast))
    root_plan ((rootplan))
    root_group ((rootgroup))
    kueri -"queryparser.parse (kueri)" -> ast
    AST -"Logicalplan.toplan (ast)" -> root_plan
    root_plan -"memo.getorcreateGroup (rootplan)" -> root_group

Pertama, kueri akan diuraikan ke AST. Kemudian dikonversi ke rencana logis, yang disebut root plan , kemudian menginisialisasi grup dari root plan , yang disebut root group .

 def initialize ( query : Statement )( implicit ctx : VolcanoPlannerContext ) : Unit = {
  ctx.query = query
  ctx.rootPlan = LogicalPlan .toPlan(ctx.query)
  ctx.rootGroup = ctx.memo.getOrCreateGroup(ctx.rootPlan)
  // assuming this is first the exploration round,
  // by marking the initialRound(0) as explored,
  // it will be easier to visualize the different between rounds (added nodes, add connections)
  ctx.memo.groups.values.foreach(_.explorationMark.markExplored(initialRound))
  ctx.memo.groupExpressions.values.foreach(_.explorationMark.markExplored(initialRound))
}

Lihat volcanoplanner.scala untuk lebih jelasnya

Misalnya, kueri:

 SELECT tbl1 . id ,
       tbl1 . field1 ,
       tbl2 . id ,
       tbl2 . field1 ,
       tbl2 . field2 ,
       tbl3 . id ,
       tbl3 . field2 ,
       tbl3 . field2
FROM tbl1
         JOIN tbl2 ON tbl1 . id = tbl2 . id
         JOIN tbl3 ON tbl2 . id = tbl3 . id

Setelah inisialisasi, kelompok akan terlihat seperti ini:

 grafik td
    Grup Subgraph#2
        Expr#2 ["pindai tbl2"]
    akhir
    Grup Subgraph#5
        Expr#5 ["gabung"]
    akhir
    Expr#5 -> Group#1
    Expr#5 -> Group#4
    Grup Subgraph#4
        Expr#4 ["gabung"]
    akhir
    Expr#4 -> Group#2
    Expr#4 -> Group#3
    Grup Subgraph#1
        Expr#1 ["Pindai TBL1"]
    akhir
    Grup Subgraph#3
        Expr#3 ["Pindai TBL3"]
    akhir
    Grup Subgraph#6
        Expr#6 ["Project tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2"]
    akhir
    Expr#6 -> Group#5

Di sini Anda dapat melihatnya, setiap kelompok memiliki tepat satu ekspresi yang setara

Fase eksplorasi

Setelah inisialisasi, sekarang adalah fase eksplorasi, yang akan mengeksplorasi semua kemungkinan rencana yang setara.

Metode eksplorasi cukup sederhana:

Untuk setiap grup, terapkan aturan transformasi untuk menemukan semua ekspresi grup yang setara dan menambah set yang setara sampai kami tidak dapat menemukan rencana yang setara baru
Untuk setiap ekspresi kelompok, jelajahi semua kelompok anak

Aturan transformasi

Sebelum menyelam ke kode eksplorasi, mari kita bicara tentang aturan transformasi terlebih dahulu.

Aturan transformasi adalah aturan yang digunakan untuk mengubah rencana logis ke rencana logis setara lainnya jika cocok dengan kondisi aturan.

Berikut adalah antarmuka aturan transformasi:

 trait TransformationRule {
  def `match` ( expression : GroupExpression )( implicit ctx : VolcanoPlannerContext ) : Boolean

  def transform ( expression : GroupExpression )( implicit ctx : VolcanoPlannerContext ) : GroupExpression
}

Karena rencana logis adalah struktur data seperti pohon, sehingga implementasi match aturan transformasi adalah pencocokan pola pada pohon.

Misalnya, berikut adalah match yang digunakan untuk mencocokkan node proyek sementara juga periksa apakah keturunan yang berisi gabungan dan pindai saja:

 override def `match` ( expression : GroupExpression )( implicit ctx : VolcanoPlannerContext ) : Boolean = {
  val plan = expression.plan
  plan match {
    case Project (_, child) => check(child)
    case _ => false
  }
}

// check if the tree only contains SCAN and JOIN nodes
private def check ( node : LogicalPlan ) : Boolean = {
  node match {
    case Scan (_, _) => true
    case Join (left, right, _) => check(left) && check(right)
    case _ => false
  }
}

Rencana ini "cocok":

 grafik td
    Grup Subgraph#2
        Expr#2 ["scan"]
    akhir
    Grup Subgraph#5
        Expr#5 ["gabung"]
    akhir
    Expr#5 -> Group#1
    Expr#5 -> Group#4
    Grup Subgraph#4
        Expr#4 ["gabung"]
    akhir
    Expr#4 -> Group#2
    Expr#4 -> Group#3
    Grup Subgraph#1
        Expr#1 ["scan"]
    akhir
    Grup Subgraph#3
        Expr#3 ["scan"]
    akhir
    Grup Subgraph#6
        Expr#6 ["Proyek"]
    akhir
    Expr#6 -> Group#5

Meskipun rencana ini bukan:

 grafik td
    Grup Subgraph#2
        Expr#2 ["scan"]
    akhir
    Grup Subgraph#5
        Expr#5 ["gabung"]
    akhir
    Expr#5 -> Group#3
    Expr#5 -> Group#4
    Grup Subgraph#4
        Expr#4 ["scan"]
    akhir
    Grup Subgraph#7
        Expr#7 ["Proyek"]
    akhir
    Expr#7 -> Group#6
    Grup Subgraph#1
        Expr#1 ["scan"]
    akhir
    Grup Subgraph#3
        Expr#3 ["Proyek"]
    akhir
    Expr#3 -> Group#2
    Grup Subgraph#6
        Expr#6 ["gabung"]
    akhir
    Expr#6 -> Group#1
    Expr#6 -> Group#5

Rencanakan enumerasi

Seperti yang telah kami katakan sebelumnya, metode eksplorasi adalah:

Untuk setiap grup, terapkan aturan transformasi untuk menemukan semua ekspresi grup yang setara dan menambah set yang setara sampai kami tidak dapat menemukan rencana yang setara baru
Untuk setiap ekspresi kelompok, jelajahi semua kelompok anak

Dan di sini adalah kode eksplorasi (cukup sederhana, ya):

 private def exploreGroup (
                          group : Group ,
                          rules : Seq [ TransformationRule ],
                          round : Int
                        )( implicit ctx : VolcanoPlannerContext ) : Unit = {
  while ( ! group.explorationMark.isExplored(round)) {
    group.explorationMark.markExplored(round)
    // explore all child groups
    group.equivalents.foreach { equivalent =>
      if ( ! equivalent.explorationMark.isExplored(round)) {
        equivalent.explorationMark.markExplored(round)
        equivalent.children.foreach { child =>
          exploreGroup(child, rules, round)
          if (equivalent.explorationMark.isExplored(round) && child.explorationMark.isExplored(round)) {
            equivalent.explorationMark.markExplored(round)
          } else {
            equivalent.explorationMark.markUnexplored(round)
          }
        }
      }
      // fire transformation rules to explore all the possible transformations
      rules.foreach { rule =>
        if ( ! equivalent.appliedTransformations.contains(rule) && rule.`match`(equivalent)) {
          val transformed = rule.transform(equivalent)
          if ( ! group.equivalents.contains(transformed)) {
            group.equivalents += transformed
            transformed.explorationMark.markUnexplored(round)
            group.explorationMark.markUnexplored(round)
          }
        }
      }
      if (group.explorationMark.isExplored(round) && equivalent.explorationMark.isExplored(round)) {
        group.explorationMark.markExplored(round)
      } else {
        group.explorationMark.markUnexplored(round)
      }
    }
  }
}

Lihat volcanoplanner.scala untuk lebih jelasnya

Menerapkan beberapa aturan transformasi

Sekarang saatnya menerapkan beberapa aturan transformasi

Pushdown proyeksi

Proyeksi Pushdown adalah aturan transformasi sederhana, yang digunakan untuk mendorong proyeksi ke lapisan penyimpanan.

Misalnya, kueri

 SELECT field1, field2
from tbl

memiliki rencananya

 grafik lr
    Proyek [Proyek Field1, Field2]
    Pindai [pindai TBL]
    Proyek -> Pindai

Dengan rencana ini, saat mengeksekusi, baris dari lapisan penyimpanan (di bawah pemindaian) akan sepenuhnya diambil, dan kemudian bidang yang tidak perlu akan dijatuhkan (proyek). Data yang tidak perlu masih harus beralih dari node pemindaian ke node proyek, jadi ada beberapa upaya yang terbuang di sini.

Kita dapat membuatnya lebih baik dengan hanya memberi tahu lapisan penyimpanan hanya mengambil bidang yang diperlukan. Sekarang rencananya akan diubah menjadi:

 grafik lr
    Proyek [Proyek Field1, Field2]
    Pindai ["Pindai TBL (Field1, Field2)"]
    Proyek -> Pindai

Ayo masuk ke kode:

 override def `match` ( expression : GroupExpression )( implicit ctx : VolcanoPlannerContext ) : Boolean = {
  val plan = expression.plan
  plan match {
    case Project (_, child) => check(child)
    case _ => false
  }
}

// check if the tree only contains SCAN and JOIN nodes
private def check ( node : LogicalPlan ) : Boolean = {
  node match {
    case Scan (_, _) => true
    case Join (left, right, _) => check(left) && check(right)
    case _ => false
  }
}

Aturan pushdown proyeksi kami di sini, akan cocok dengan rencana ketika itu adalah simpul proyek, dan semua keturunannya dipindai dan hanya bergabung dengan simpul.

Catatan: Sebenarnya pertandingan proyeksi proyeksi sebenarnya lebih kompleks, tetapi demi kesederhanaan, aturan pertandingan di sini hanya node proyek dengan pemindaian dan bergabung dengan keturunan

Dan inilah kode transformasi:

 override def transform ( expression : GroupExpression )( implicit ctx : VolcanoPlannerContext ) : GroupExpression = {
  val plan = expression.plan. asInstanceOf [ Project ]
  val pushDownProjection = mutable. ListBuffer [ FieldID ]()
  extractProjections(plan, pushDownProjection)
  val newPlan = Project (plan.fields, pushDown(pushDownProjection.distinct, plan.child))
  ctx.memo.getOrCreateGroupExpression(newPlan)
}

private def extractProjections ( node : LogicalPlan , buffer : mutable. ListBuffer [ FieldID ]) : Unit = {
  node match {
    case Scan (_, _) => () : Unit
    case Project (fields, parent) =>
      buffer ++= fields
      extractProjections(parent, buffer)
    case Join (left, right, on) =>
      buffer ++= on.map(_._1) ++ on.map(_._2)
      extractProjections(left, buffer)
      extractProjections(right, buffer)
  }
}

private def pushDown ( pushDownProjection : Seq [ FieldID ], node : LogicalPlan ) : LogicalPlan = {
  node match {
    case Scan (table, tableProjection) =>
      val filteredPushDownProjection = pushDownProjection.filter(_.table == table).map(_.id)
      val updatedProjection =
        if (filteredPushDownProjection.contains( " * " ) || filteredPushDownProjection.contains( " *.* " )) {
          Seq .empty
        } else {
          (tableProjection ++ filteredPushDownProjection).distinct
        }
      Scan (table, updatedProjection)
    case Join (left, right, on) => Join (pushDown(pushDownProjection, left), pushDown(pushDownProjection, right), on)
    case _ => throw new Exception ( " should never happen " )
  }
}

Kode transformasi pertama -tama akan menemukan semua proyeksi dari node proyek root, dan kemudian mendorongnya ke semua node pemindaian di bawahnya.

Memvisualisasikan aturan kami, misalnya, rencana tersebut

 grafik td
    Grup Subgraph#2
        Expr#2 ["pindai tbl2"]
    akhir
    Grup Subgraph#5
        Expr#5 ["gabung"]
    akhir
    Expr#5 -> Group#1
    Expr#5 -> Group#4
    Grup Subgraph#4
        Expr#4 ["gabung"]
    akhir
    Expr#4 -> Group#2
    Expr#4 -> Group#3
    Grup Subgraph#1
        Expr#1 ["Pindai TBL1"]
    akhir
    Grup Subgraph#3
        Expr#3 ["Pindai TBL3"]
    akhir
    Grup Subgraph#6
        Expr#6 ["Project tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2"]
    akhir
    Expr#6 -> Group#5

Setelah menerapkan transformasi pushdown proyeksi, akan menghasilkan rencana setara baru dengan proyeksi didorong ke bawah ke operasi pemindaian (rencana baru adalah pohon dengan node perbatasan oranye).

 grafik td
    Grup Subgraph#8
        Expr#8 ["Pindai TBL2 (ID, Field1, Field2)"]
    akhir
    Grup Subgraph#2
        Expr#2 ["pindai tbl2"]
    akhir
    Grup Subgraph#11
        Expr#11 ["gabung"]
    akhir
    Expr#11 -> Group#7
    Expr#11 -> Group#10
    Grup Subgraph#5
        Expr#5 ["gabung"]
    akhir
    Expr#5 -> Group#1
    Expr#5 -> Group#4
    Grup Subgraph#4
        Expr#4 ["gabung"]
    akhir
    Expr#4 -> Group#2
    Expr#4 -> Group#3
    Grup Subgraph#7
        Expr#7 ["pindai tbl1 (id, field1)"]
    akhir
    Grup Subgraph#10
        Expr#10 ["gabung"]
    akhir
    Expr#10 -> Group#8
    Expr#10 -> Group#9
    Grup Subgraph#1
        Expr#1 ["Pindai TBL1"]
    akhir
    Grup Subgraph#9
        Expr#9 ["pindai tbl3 (id, field2)"]
    akhir
    Grup Subgraph#3
        Expr#3 ["Pindai TBL3"]
    akhir
    Grup Subgraph#6
        Expr#12 ["Project tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2"]
        Expr#6 ["Project tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2"]
    akhir
    Expr#12 -> Group#11
    Expr#6 -> Group#5
    Style Expr#12 Stroke-Width: 4px, Stroke: Orange
    Style expr#8 Stroke-lebar: 4px, stroke: oranye
    Style expr#10 Stroke-lebar: 4px, stroke: oranye
    Style expr#9 Stroke-lebar: 4px, stroke: oranye
    Style expr#11 Stroke-lebar: 4px, stroke: oranye
    Style expr#7 Stroke-lebar: 4px, stroke: oranye
    LinkStyle 0 Stroke-lebar: 4px, Stroke: Orange
    LinkStyle 1 Stroke-lebar: 4px, Stroke: Orange
    LinkStyle 6 Stroke-lebar: 4px, Stroke: Orange
    LinkStyle 7 Stroke-lebar: 4px, Stroke: Orange
    LinkStyle 8 Stroke-Width: 4px, Stroke: Orange

Lihat ProyectionPushdown.scala untuk implementasi penuh

Bergabunglah dengan Reorder

Bergabunglah dengan Reorder juga merupakan salah satu transformasi yang paling dikenal di dunia perencana kueri. Perencana kami, juga akan menerapkan aturan transformasi ulang.

Karena bergabung dengan Reorder di dunia nyata bukanlah bagian yang mudah untuk diterapkan. Jadi kami akan menerapkan versi rip-off aturan ulang yang sederhana dan rip-off di sini.

Pertama, aturan match :

 // check if the tree only contains SCAN and JOIN nodes, and also extract all SCAN nodes and JOIN conditions
private def checkAndExtract (
                             node : LogicalPlan ,
                             buffer : mutable. ListBuffer [ Scan ],
                             joinCondBuffer : mutable. ListBuffer [(ql. FieldID , ql. FieldID )]
                           ) : Boolean = {
  node match {
    case node @ Scan (_, _) =>
      buffer += node
      true
    case Join (left, right, on) =>
      joinCondBuffer ++= on
      checkAndExtract(left, buffer, joinCondBuffer) && checkAndExtract(right, buffer, joinCondBuffer)
    case _ => false
  }
}

private def buildInterchangeableJoinCond ( conditions : Seq [(ql. FieldID , ql. FieldID )]) : Seq [ Seq [ql. FieldID ]] = {
  val buffer = mutable. ListBuffer [mutable. Set [ql. FieldID ]]()
  conditions.foreach { cond =>
    val set = buffer.find { set =>
      set.contains(cond._1) || set.contains(cond._2)
    } match {
      case Some (set) => set
      case None =>
        val set = mutable. Set [ql. FieldID ]()
        buffer += set
        set
    }
    set += cond._1
    set += cond._2
  }
  buffer.map(_.toSeq)
}

override def `match` ( expression : GroupExpression )( implicit ctx : VolcanoPlannerContext ) : Boolean = {
  val plan = expression.plan
  plan match {
    case node @ Join (_, _, _) =>
      val buffer = mutable. ListBuffer [ Scan ]()
      val joinCondBuffer = mutable. ListBuffer [(ql. FieldID , ql. FieldID )]()
      if (checkAndExtract(node, buffer, joinCondBuffer)) {
        // only match if the join is 3 tables join
        if (buffer.size == 3 ) {
          var check = true
          val interChangeableCond = buildInterchangeableJoinCond(joinCondBuffer)
          interChangeableCond.foreach { c =>
            check &= c.size == 3
          }
          check
        } else {
          false
        }
      } else {
        false
      }
    case _ => false
  }
}

Aturan kami hanya akan dicocokkan, jika kami cocok dengan gabungan 3 arah (jumlah tabel yang terlibat harus 3, dan kondisi gabungan harus 3 arah, seperti tbl1.field1 = tbl2.field2 = tbl3.field3 )

Misalnya,

tbl1
    JOIN tbl2 ON tbl1 . field1 = tbl2 . field2
    JOIN tbl3 ON tbl1 . field1 = tbl3 . field3

Pernyataan gabungan di sini akan "dicocokkan" karena ini adalah 3 arah gabungan (ini adalah gabungan antara tbl1 , tbl2 , tbl3 , dan kondisinya tbl1.field1 = tbl2.field2 = tbl3.field3 )

Selanjutnya, adalah kode transformasi:

 override def transform ( expression : GroupExpression )( implicit ctx : VolcanoPlannerContext ) : GroupExpression = {
  val plan = expression.plan. asInstanceOf [ Join ]
  val buffer = mutable. ListBuffer [ Scan ]()
  val joinCondBuffer = mutable. ListBuffer [(ql. FieldID , ql. FieldID )]()
  checkAndExtract(plan, buffer, joinCondBuffer)
  val interChangeableCond = buildInterchangeableJoinCond(joinCondBuffer)
  //
  val scans = buffer.toList
  implicit val ord : Ordering [ Scan ] = new Ordering [ Scan ] {
    override def compare ( x : Scan , y : Scan ) : Int = {
      val xStats = ctx.statsProvider.tableStats(x.table.id)
      val yStats = ctx.statsProvider.tableStats(y.table.id)
      xStats.estimatedTableSize.compareTo(yStats.estimatedTableSize)
    }
  }

  def getJoinCond ( left : Scan , right : Scan ) : Seq [(ql. FieldID , ql. FieldID )] = {
    val leftFields = interChangeableCond.flatMap { c =>
      c.filter(p => p.table == left.table)
    }
    val rightFields = interChangeableCond.flatMap { c =>
      c.filter(p => p.table == right.table)
    }
    if (leftFields.length != rightFields.length) {
      throw new Exception ( s " leftFields.length( ${leftFields.length} ) != rightFields.length( ${rightFields.length} ) " )
    } else {
      leftFields zip rightFields
    }
  }

  val sorted = scans.sorted
  val newPlan = Join (
    sorted( 0 ),
    Join (
      sorted( 1 ),
      sorted( 2 ),
      getJoinCond(sorted( 1 ), sorted( 2 ))
    ),
    getJoinCond(sorted( 0 ), sorted( 1 ))
  )
  ctx.memo.getOrCreateGroupExpression(newPlan)
}

Kode transformasi di sini, akan memesan ulang tabel berdasarkan ukurannya yang diperkirakan.

Misalnya, jika kami memiliki 3 Tabel A, B, C dengan perkiraan ukuran 300B, 100B, 200B dan pernyataan gabungan A JOIN B JOIN C , maka itu akan diubah menjadi B JOIN C JOIN A

Catatan: Anda mungkin memperhatikan dalam kode ini, kami telah menggunakan statistik tabel, untuk memberikan petunjuk untuk mengubah rencana. Secara praktis, perencana dapat menggunakan semua jenis statistik untuk membantu transformasi seperti ukuran tabel, ukuran baris, jumlah nol, histogram, dll.

Memvisualisasikan aturan kami, misalnya, rencana tersebut

 grafik td
    Grup Subgraph#2
        Expr#2 ["pindai tbl2"]
    akhir
    Grup Subgraph#5
        Expr#5 ["gabung"]
    akhir
    Expr#5 -> Group#1
    Expr#5 -> Group#4
    Grup Subgraph#4
        Expr#4 ["gabung"]
    akhir
    Expr#4 -> Group#2
    Expr#4 -> Group#3
    Grup Subgraph#1
        Expr#1 ["Pindai TBL1"]
    akhir
    Grup Subgraph#3
        Expr#3 ["Pindai TBL3"]
    akhir
    Grup Subgraph#6
        Expr#6 ["Project tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2"]
    akhir
    Expr#6 -> Group#5

Setelah bergabung dengan transformasi ulang, hasilnya

 grafik td
    Grup Subgraph#2
        Expr#2 ["pindai tbl2"]
    akhir
    Grup Subgraph#5
        Expr#5 ["gabung"]
        Expr#8 ["gabung"]
    akhir
    Expr#5 -> Group#1
    Expr#5 -> Group#4
    Expr#8 -> Group#2
    Expr#8 -> Group#7
    Grup Subgraph#4
        Expr#4 ["gabung"]
    akhir
    Expr#4 -> Group#2
    Expr#4 -> Group#3
    Grup Subgraph#7
        Expr#7 ["gabung"]
    akhir
    Expr#7 -> Group#1
    Expr#7 -> Group#3
    Grup Subgraph#1
        Expr#1 ["Pindai TBL1"]
    akhir
    Grup Subgraph#3
        Expr#3 ["Pindai TBL3"]
    akhir
    Grup Subgraph#6
        Expr#6 ["Project tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2"]
    akhir
    Expr#6 -> Group#5
    Style expr#8 Stroke-lebar: 4px, stroke: oranye
    Style expr#7 Stroke-lebar: 4px, stroke: oranye
    Linkstyle 2 Stroke-lebar: 4px, stroke: oranye
    LinkStyle 6 Stroke-lebar: 4px, Stroke: Orange
    LinkStyle 3 Stroke-lebar: 4px, Stroke: Orange
    LinkStyle 7 Stroke-lebar: 4px, Stroke: Orange

Kita dapat melihat bahwa tbl2 JOIN tbl1 JOIN tbl3 dibuat dari tbl1 JOIN tbl2 JOIN tbl3 dihasilkan oleh transformasi (node dan tepi yang baru ditambahkan ditunjukkan oleh garis oranye)

Lihat x3TableJoinRoDerBysize.scala untuk implementasi penuh

Menyatukan semua transformasi

Sekarang kita bisa menempatkan aturan transformasi kita di satu tempat

 private val transformationRules : Seq [ Seq [ TransformationRule ]] = Seq (
  Seq ( new ProjectionPushDown ),
  Seq ( new X3TableJoinReorderBySize )
)

Dan jalankan mereka untuk menjelajahi kelompok yang setara

 for (r <- transformationRules.indices) {
  exploreGroup(ctx.rootGroup, transformationRules(r), r + 1 )
}

Misalnya, rencananya

 grafik td
    Grup Subgraph#2
        Expr#2 ["pindai tbl2"]
    akhir
    Grup Subgraph#5
        Expr#5 ["gabung"]
    akhir
    Expr#5 -> Group#1
    Expr#5 -> Group#4
    Grup Subgraph#4
        Expr#4 ["gabung"]
    akhir
    Expr#4 -> Group#2
    Expr#4 -> Group#3
    Grup Subgraph#1
        Expr#1 ["Pindai TBL1"]
    akhir
    Grup Subgraph#3
        Expr#3 ["Pindai TBL3"]
    akhir
    Grup Subgraph#6
        Expr#6 ["Project tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2"]
    akhir
    Expr#6 -> Group#5

Setelah dieksplorasi, akan menghasilkan grafik ini

 grafik td
    Grup Subgraph#8
        Expr#8 ["Pindai TBL2 (ID, Field1, Field2)"]
    akhir
    Grup Subgraph#11
        Expr#11 ["gabung"]
        Expr#14 ["gabung"]
    akhir
    Expr#11 -> Group#7
    Expr#11 -> Group#10
    Expr#14 -> Group#8
    Expr#14 -> Group#12
    Grup Subgraph#2
        Expr#2 ["pindai tbl2"]
    akhir
    Grup Subgraph#5
        Expr#5 ["gabung"]
        Expr#16 ["gabung"]
    akhir
    Expr#5 -> Group#1
    Expr#5 -> Group#4
    Expr#16 -> Group#2
    Expr#16 -> Group#13
    Grup Subgraph#4
        Expr#4 ["gabung"]
    akhir
    Expr#4 -> Group#2
    Expr#4 -> Group#3
    Grup Subgraph#13
        Expr#15 ["gabung"]
    akhir
    Expr#15 -> Group#1
    Expr#15 -> Group#3
    Grup Subgraph#7
        Expr#7 ["pindai tbl1 (id, field1)"]
    akhir
    Grup Subgraph#1
        Expr#1 ["Pindai TBL1"]
    akhir
    Grup Subgraph#10
        Expr#10 ["gabung"]
    akhir
    Expr#10 -> Group#8
    Expr#10 -> Group#9
    Grup Subgraph#9
        Expr#9 ["pindai tbl3 (id, field2)"]
    akhir
    Grup Subgraph#3
        Expr#3 ["Pindai TBL3"]
    akhir
    Grup Subgraph#12
        Expr#13 ["Bergabung"]
    akhir
    Expr#13 -> Group#7
    Expr#13 -> Group#9
    Grup Subgraph#6
        Expr#12 ["Project tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2"]
        Expr#6 ["Project tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2"]
    akhir
    Expr#12 -> Group#11
    Expr#6 -> Group#5
    Style Expr#12 Stroke-Width: 4px, Stroke: Orange
    Style expr#8 Stroke-lebar: 4px, stroke: oranye
    Style expr#10 Stroke-lebar: 4px, stroke: oranye
    Style expr#13 Stroke-lebar: 4px, stroke: oranye
    Style expr#14 Stroke-lebar: 4px, stroke: oranye
    Style expr#11 Stroke-lebar: 4px, stroke: oranye
    Style expr#9 Stroke-lebar: 4px, stroke: oranye
    Style expr#15 Stroke-lebar: 4px, stroke: oranye
    Style expr#7 Stroke-lebar: 4px, stroke: oranye
    Style expr#16 Stroke-lebar: 4px, stroke: oranye
    LinkStyle 0 Stroke-lebar: 4px, Stroke: Orange
    Linkstyle 15 Stroke-lebar: 4px, Stroke: Orange
    Linkstyle 12 Stroke-Width: 4px, Stroke: Orange
    LinkStyle 1 Stroke-lebar: 4px, Stroke: Orange
    LinkStyle 16 Stroke-lebar: 4px, Stroke: Orange
    Linkstyle 13 Stroke-lebar: 4px, Stroke: Orange
    Linkstyle 2 Stroke-lebar: 4px, stroke: oranye
    LinkStyle 6 Stroke-lebar: 4px, Stroke: Orange
    LinkStyle 3 Stroke-lebar: 4px, Stroke: Orange
    LinkStyle 10 Stroke-lebar: 4px, Stroke: Orange
    LinkStyle 7 Stroke-lebar: 4px, Stroke: Orange
    LinkStyle 14 Stroke-lebar: 4px, Stroke: Orange
    LinkStyle 11 Stroke-lebar: 4px, Stroke: Orange

Lihat volcanoplanner.scala untuk lebih jelasnya

Fase optimasi

Setelah fase eksplorasi, kami sekarang memiliki pohon yang diperluas sepenuhnya berisi semua rencana yang mungkin, sekarang adalah fase optimasi.

Dalam fase ini, kami akan menemukan rencana terbaik untuk grup root kami. Proses optimasi digambarkan sebagai berikut:

Untuk setiap kelompok, kami akan menemukan implementasi terbaik dengan memilih kelompok yang mengekspresikan dengan biaya terendah
Untuk setiap ekspresi kelompok, pertama -tama kami akan menyebutkan implementasi fisik dari rencana logis. Kemudian untuk setiap implementasi fisik, kami akan menghitung biayanya dengan menggunakan biaya kelompok anaknya.

Inilah contohnya

 grafik td
    Grup Subgraph#2 ["Grup#2 (Biaya = 1)"]
        Expr#2 ["expr#2 (biaya = 1)"]
    akhir
    Grup Subgraph#5 ["Grup#5 (Biaya = 3)"]
        Expr#5 ["expr#5 (biaya = maks (3,2) = 3"]
    akhir
    Expr#5 -> Group#1
    Expr#5 -> Group#4
    Grup Subgraph#4 ["Grup#4 (biaya = 2)"]
        Expr#4 ["expr#4 (biaya = maks (1,2) = 2)"]
        Expr#7 ["expr#7 (biaya = 1+2 = 3)"]
    akhir
    Expr#4 -> Group#2
    Expr#4 -> Group#3
    Grup Subgraph#1 ["Grup#1 (biaya = 3)"]
        Expr#1 ["expr#1 (biaya = 3)"]
    akhir
    Grup Subgraph#3 ["Grup#3 (Biaya = 2)"]
        Expr#3 ["expr#3 (biaya = 2)"]
    akhir
    Grup Subgraph#6 ["Grup#6 (Biaya = 4.5)"]
        Expr#6 ["Expr#6 (Biaya = 3*1.5 = 4.5)"]
    akhir
    Expr#6 -> Group#5
    Grup Subgraph#8 ["Grup#8 (Biaya = 1)"]
        Expr#8 ["expr#8 (biaya = 1)"]
    akhir
    Grup Subgraph#9 ["Grup#9 (biaya = 2)"]
        Expr#9 ["expr#9 (biaya = 2)"]
    akhir
    Expr#7 -> Group#8
    Expr#7 -> Group#9

Misalnya, biaya Expr#4 dihitung oleh biaya kelompok anaknya ( Group#2 dan Group#3 ) menggunakan fungsi max . Contoh lain, adalah Group#4 , biayanya dihitung dengan menghitung nilai min antara biaya ekspresi yang setara.

Rencana fisik

Karena tujuan fase optimasi adalah untuk menghasilkan rencana fisik terbaik mengingat ekspresi kelompok yang dieksplorasi. Kita dapat mendefinisikan rencana fisik sebagai berikut:

 sealed trait PhysicalPlan {
  def operator () : Operator

  def children () : Seq [ PhysicalPlan ]

  def cost () : Cost

  def estimations () : Estimations

  def traits () : Set [ String ]
}

operator adalah operator fisik, yang digunakan untuk melaksanakan rencana, kami akan membahasnya di bagian selanjutnya. Kemudian children adalah daftar node rencana anak, mereka digunakan untuk berpartisipasi dalam proses perhitungan biaya. Atribut ketiga adalah cost , cost adalah informasi biaya penahanan objek (seperti biaya CPU, biaya memori, biaya IO, dll.). estimations adalah perkiraan statistik penahanan properti tentang rencana (seperti jumlah baris, ukuran baris, dll.), Ini juga berpartisipasi dalam perhitungan biaya. Akhirnya, traits adalah serangkaian sifat fisik, yang mempengaruhi aturan implementasi untuk mempengaruhi proses pembuatan rencana fisik.

Selanjutnya, kita dapat mengimplementasikan kelas Node Fisik:

 case class Scan (
                 operator : Operator ,
                 cost : Cost ,
                 estimations : Estimations ,
                 traits : Set [ String ] = Set .empty
               ) extends PhysicalPlan {
  override def children () : Seq [ PhysicalPlan ] = Seq .empty // scan do not receive any child
}

case class Project (
                    operator : Operator ,
                    child : PhysicalPlan ,
                    cost : Cost ,
                    estimations : Estimations ,
                    traits : Set [ String ] = Set .empty
                  ) extends PhysicalPlan {
  override def children () : Seq [ PhysicalPlan ] = Seq (child)
}

case class Join (
                 operator : Operator ,
                 leftChild : PhysicalPlan ,
                 rightChild : PhysicalPlan ,
                 cost : Cost ,
                 estimations : Estimations ,
                 traits : Set [ String ] = Set .empty
               ) extends PhysicalPlan {
  override def children () : Seq [ PhysicalPlan ] = Seq (leftChild, rightChild)
}

Lihat Fisikplan.scala untuk implementasi penuh

Aturan Implementasi

Hal pertama dalam fase optimasi, yaitu, kita harus mengimplementasikan aturan implementasi. Aturan implementasi adalah aturan untuk dikonversi dari rencana logis ke rencana fisik tanpa melaksanakannya.

Karena kami tidak secara langsung menjalankan rencana fisik dalam perencana, jadi kami akan mengembalikan pembangun rencana fisik, juga lebih mudah untuk menyesuaikan fungsi biaya untuk setiap node.

Berikut adalah antarmuka aturan implementasi:

 trait PhysicalPlanBuilder {
  def build ( children : Seq [ PhysicalPlan ]) : Option [ PhysicalPlan ]
}

trait ImplementationRule {
  def physicalPlanBuilders ( expression : GroupExpression )( implicit ctx : VolcanoPlannerContext ) : Seq [ PhysicalPlanBuilder ]
}

Di sini PhysicalPlanBuilder adalah antarmuka yang digunakan untuk membangun rencana fisik, mengingat rencana fisik anak.

Misalnya, gabungan logis memiliki 2 implementasi fisik adalah hash gabungan dan bergabung bergabung

 grafik td
    Anak#1 ["Anak#1"]
    anak#2 ["anak#2"]
    anak#3 ["anak#3"]
    anak#4 ["anak#4"]
    hash_join ["` hash gabung 
    biaya = f (biaya (anak#1), biaya (anak#2))
    `"]
    merge_join ["` gabungan bergabung
    cost=g(cost(child#3),cost(child#4))
    `"]
    hash_join --> child#1
    hash_join --> child#2
    merge_join --> child#3
    merge_join --> child#4

the HASH JOIN cost is using function f() to calculate cost, and MERGE JOIN is using function g() to calculate cost, both are using its children as function parameters. So it's easier to code if we're returning just the phyiscal plan builder from the implementation rule instead of the physical plan.

Finding the best plan

As we've said before, the optimization process is described as following:

For each group, we will find the best implementation by choosing the group expressing with the lowest cost
For each group expression, first we will enumerate the physical implementations from the logical plan. Then for each physical implementation, we will calculate its cost using its child group costs.

And here is the code:

 private def implementGroup ( group : Group , combinedRule : ImplementationRule )(
  implicit ctx : VolcanoPlannerContext
) : GroupImplementation = {
  group.implementation match {
    case Some (implementation) => implementation
    case None =>
      var bestImplementation = Option .empty[ GroupImplementation ]
      group.equivalents.foreach { equivalent =>
        val physicalPlanBuilders = combinedRule.physicalPlanBuilders(equivalent)
        val childPhysicalPlans = equivalent.children.map { child =>
          val childImplementation = implementGroup(child, combinedRule)
          child.implementation = Option (childImplementation)
          childImplementation.physicalPlan
        }
        // calculate the implementation, and update the best cost for group
        physicalPlanBuilders.flatMap(_.build(childPhysicalPlans)).foreach { physicalPlan =>
          val cost = physicalPlan.cost()
          bestImplementation match {
            case Some (currentBest) =>
              if (ctx.costModel.isBetter(currentBest.cost, cost)) {
                bestImplementation = Option (
                  GroupImplementation (
                    physicalPlan = physicalPlan,
                    cost = cost,
                    selectedEquivalentExpression = equivalent
                  )
                )
              }
            case None =>
              bestImplementation = Option (
                GroupImplementation (
                  physicalPlan = physicalPlan,
                  cost = cost,
                  selectedEquivalentExpression = equivalent
                )
              )
          }
        }
      }
      bestImplementation.get
  }
}

This code is an exhaustive search code, which is using recursive function to traverse all nodes. At each node (group), the function is called once to get its best plan while also calculate the optimal cost of that group.

Finally, the best plan for our query is the best plan of the root group

 val implementationRules = new ImplementationRule {

  override def physicalPlanBuilders (
                                     expression : GroupExpression
                                   )( implicit ctx : VolcanoPlannerContext ) : Seq [ PhysicalPlanBuilder ] = {
    expression.plan match {
      case node @ Scan (_, _) => implement. Scan (node)
      case node @ Project (_, _) => implement. Project (node)
      case node @ Join (_, _, _) => implement. Join (node)
    }
  }
}

ctx.rootGroup.implementation = Option (implementGroup(ctx.rootGroup, implementationRules))

See VolcanoPlanner.scala for full implementation

Here is an example of the plan after optimization, it's shown the selected logical node, the selected physical operator, and the estimated cost

 graph TD
    Group#6["
    Group #6
Selected: PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2
Operator: ProjectOperator
Cost: Cost(cpu=641400.00, mem=1020400012.00, time=1000000.00)
"]
Group#6 --> Group#11
Group#11["
Group #11
Selected: JOIN
Operator: HashJoinOperator
Cost: Cost(cpu=641400.00, mem=1020400012.00, time=1000000.00)
"]
Group#11 --> Group#7
Group#11 --> Group#10
Group#7["
Group #7
Selected: SCAN tbl1 (id, field1)
Operator: NormalScanOperator
Cost: Cost(cpu=400.00, mem=400000.00, time=1000.00)
"]
Group#10["
Group #10
Selected: JOIN
Operator: MergeJoinOperator
Traits: SORTED
Cost: Cost(cpu=640000.00, mem=20000012.00, time=1100000.00)
"]
Group#10 --> Group#8
Group#10 --> Group#9
Group#8["
Group #8
Selected: SCAN tbl2 (id, field1, field2)
Operator: NormalScanOperator
Traits: SORTED
Cost: Cost(cpu=600000.00, mem=12.00, time=1000000.00)
"]
Group#9["
Group #9
Selected: SCAN tbl3 (id, field2)
Operator: NormalScanOperator
Traits: SORTED
Cost: Cost(cpu=40000.00, mem=20000000.00, time=100000.00)
"]

Implement the implementation rules

Next, we will implement some implementation rules.

PROYEK

The first, easiest one is the implementation rule of logical PROJECT

 object Project {

  def apply ( node : logicalplan. Project )( implicit ctx : VolcanoPlannerContext ) : Seq [ PhysicalPlanBuilder ] = {
    Seq (
      new ProjectionImpl (node.fields)
    )
  }
}

class ProjectionImpl ( projection : Seq [ql. FieldID ]) extends PhysicalPlanBuilder {

  override def build ( children : Seq [ PhysicalPlan ]) : Option [ PhysicalPlan ] = {
    val child = children.head
    val selfCost = Cost (
      estimatedCpuCost = 0 ,
      estimatedMemoryCost = 0 ,
      estimatedTimeCost = 0
    ) // assuming the cost of projection is 0
    val cost = Cost (
      estimatedCpuCost = selfCost.estimatedCpuCost + child.cost().estimatedCpuCost,
      estimatedMemoryCost = selfCost.estimatedMemoryCost + child.cost().estimatedMemoryCost,
      estimatedTimeCost = selfCost.estimatedTimeCost + child.cost().estimatedTimeCost
    )
    val estimations = Estimations (
      estimatedLoopIterations = child.estimations().estimatedLoopIterations,
      estimatedRowSize = child.estimations().estimatedRowSize // just guessing the value
    )
    Some (
      Project (
        operator = ProjectOperator (projection, child.operator()),
        child = child,
        cost = cost,
        estimations = estimations,
        traits = child.traits()
      )
    )
  }
}

The implementation rule for logical PROJECT, is returning one physical plan builder ProjectionImpl . ProjectionImpl cost calculation is simple, it just inherits the cost from the child node (because the projection is actually not doing any intensive operation). Beside that, it also updates the estimation (in this code, estimation is also inherit from the child node)

See Project.scala for full implementation

BERGABUNG

Writing implementation rule for logical JOIN is way harder than PROJECTION.

One first reason is, a logical JOIN has many physical implementation, such as HASH JOIN, MERGE JOIN, BROADCAST JOIN, etc.

The second reason is, estimating cost for physical JOIN is also hard, because it depends on lots of factors such as row count, row size, data histogram, indexes, data layout, etc.

So, to keep everything simple in this guide, I will only implement 2 physical JOIN: HASH JOIN and MERGE JOIN. The cost estimation functions are fictional (just to show how it works, I'm not trying to correct it). And in the MERGE JOIN, all data is assuming to be sorted by join key.

Here is the code:

 object Join {

  def apply ( node : logicalplan. Join )( implicit ctx : VolcanoPlannerContext ) : Seq [ PhysicalPlanBuilder ] = {
    val leftFields = node.on.map(_._1).map(f => s " ${f.table.id} . ${f.id} " )
    val rightFields = node.on.map(_._2).map(f => s " ${f.table.id} . ${f.id} " )
    Seq (
      new HashJoinImpl (leftFields, rightFields),
      new MergeJoinImpl (leftFields, rightFields)
    )
  }
}

The HASH JOIN:

 class HashJoinImpl ( leftFields : Seq [ String ], rightFields : Seq [ String ]) extends PhysicalPlanBuilder {

  private def viewSize ( plan : PhysicalPlan ) : Long = {
    plan.estimations().estimatedLoopIterations * plan.estimations().estimatedRowSize
  }

  // noinspection ZeroIndexToHead,DuplicatedCode
  override def build ( children : Seq [ PhysicalPlan ]) : Option [ PhysicalPlan ] = {
    // reorder the child nodes, the left child is the child with smaller view size (smaller than the right child if we're store all of them in memory)
    val (leftChild, rightChild) = if (viewSize(children( 0 )) < viewSize(children( 1 ))) {
      (children( 0 ), children( 1 ))
    } else {
      (children( 1 ), children( 0 ))
    }
    val estimatedLoopIterations = Math .max(
      leftChild.estimations().estimatedLoopIterations,
      rightChild.estimations().estimatedLoopIterations
    ) // just guessing the value
    val estimatedOutRowSize = leftChild.estimations().estimatedRowSize + rightChild.estimations().estimatedRowSize
    val selfCost = Cost (
      estimatedCpuCost = leftChild.estimations().estimatedLoopIterations, // cost to hash all record from the smaller view
      estimatedMemoryCost = viewSize(leftChild), // hash the smaller view, we need to hold the hash table in memory
      estimatedTimeCost = rightChild.estimations().estimatedLoopIterations
    )
    val childCosts = Cost (
      estimatedCpuCost = leftChild.cost().estimatedCpuCost + rightChild.cost().estimatedCpuCost,
      estimatedMemoryCost = leftChild.cost().estimatedMemoryCost + rightChild.cost().estimatedMemoryCost,
      estimatedTimeCost = 0
    )
    val estimations = Estimations (
      estimatedLoopIterations = estimatedLoopIterations,
      estimatedRowSize = estimatedOutRowSize
    )
    val cost = Cost (
      estimatedCpuCost = selfCost.estimatedCpuCost + childCosts.estimatedCpuCost,
      estimatedMemoryCost = selfCost.estimatedMemoryCost + childCosts.estimatedMemoryCost,
      estimatedTimeCost = selfCost.estimatedTimeCost + childCosts.estimatedTimeCost
    )
    Some (
      Join (
        operator = HashJoinOperator (
          leftChild.operator(),
          rightChild.operator(),
          leftFields,
          rightFields
        ),
        leftChild = leftChild,
        rightChild = rightChild,
        cost = cost,
        estimations = estimations,
        traits = Set .empty // don't inherit trait from children since we're hash join
      )
    )
  }
}

We can see that the cost function of HASH JOIN is composed of its children costs and estimations

 val selfCost = Cost (
  estimatedCpuCost = leftChild.estimations().estimatedLoopIterations, // cost to hash all record from the smaller view
  estimatedMemoryCost = viewSize(leftChild), // hash the smaller view, we need to hold the hash table in memory
  estimatedTimeCost = rightChild.estimations().estimatedLoopIterations
)

val childCosts = Cost (
  estimatedCpuCost = leftChild.cost().estimatedCpuCost + rightChild.cost().estimatedCpuCost,
  estimatedMemoryCost = leftChild.cost().estimatedMemoryCost + rightChild.cost().estimatedMemoryCost,
  estimatedTimeCost = 0
)

val estimations = Estimations (
  estimatedLoopIterations = estimatedLoopIterations,
  estimatedRowSize = estimatedOutRowSize
)

val cost = Cost (
  estimatedCpuCost = selfCost.estimatedCpuCost + childCosts.estimatedCpuCost,
  estimatedMemoryCost = selfCost.estimatedMemoryCost + childCosts.estimatedMemoryCost,
  estimatedTimeCost = selfCost.estimatedTimeCost + childCosts.estimatedTimeCost
)

Next, the MERGE JOIN:

 class MergeJoinImpl ( leftFields : Seq [ String ], rightFields : Seq [ String ]) extends PhysicalPlanBuilder {

  // noinspection ZeroIndexToHead,DuplicatedCode
  override def build ( children : Seq [ PhysicalPlan ]) : Option [ PhysicalPlan ] = {
    val (leftChild, rightChild) = (children( 0 ), children( 1 ))
    if (leftChild.traits().contains( " SORTED " ) && rightChild.traits().contains( " SORTED " )) {
      val estimatedTotalRowCount =
        leftChild.estimations().estimatedLoopIterations +
          rightChild.estimations().estimatedLoopIterations
      val estimatedLoopIterations = Math .max(
        leftChild.estimations().estimatedLoopIterations,
        rightChild.estimations().estimatedLoopIterations
      ) // just guessing the value
      val estimatedOutRowSize = leftChild.estimations().estimatedRowSize + rightChild.estimations().estimatedRowSize
      val selfCost = Cost (
        estimatedCpuCost = 0 , // no additional cpu cost, just scan from child iterator
        estimatedMemoryCost = 0 , // no additional memory cost
        estimatedTimeCost = estimatedTotalRowCount
      )
      val childCosts = Cost (
        estimatedCpuCost = leftChild.cost().estimatedCpuCost + rightChild.cost().estimatedCpuCost,
        estimatedMemoryCost = leftChild.cost().estimatedMemoryCost + rightChild.cost().estimatedMemoryCost,
        estimatedTimeCost = 0
      )
      val estimations = Estimations (
        estimatedLoopIterations = estimatedLoopIterations,
        estimatedRowSize = estimatedOutRowSize
      )
      val cost = Cost (
        estimatedCpuCost = selfCost.estimatedCpuCost + childCosts.estimatedCpuCost,
        estimatedMemoryCost = selfCost.estimatedMemoryCost + childCosts.estimatedMemoryCost,
        estimatedTimeCost = selfCost.estimatedTimeCost + childCosts.estimatedTimeCost
      )
      Some (
        Join (
          operator = MergeJoinOperator (
            leftChild.operator(),
            rightChild.operator(),
            leftFields,
            rightFields
          ),
          leftChild = leftChild,
          rightChild = rightChild,
          cost = cost,
          estimations = estimations,
          traits = leftChild.traits() ++ rightChild.traits()
        )
      )
    } else {
      None
    }
  }
}

Same with HASH JOIN, MERGE JOIN also uses its children costs and estimations to calculate its cost, but with different formulla:

 val selfCost = Cost (
  estimatedCpuCost = 0 , // no additional cpu cost, just scan from child iterator
  estimatedMemoryCost = 0 , // no additional memory cost
  estimatedTimeCost = estimatedTotalRowCount
)

val childCosts = Cost (
  estimatedCpuCost = leftChild.cost().estimatedCpuCost + rightChild.cost().estimatedCpuCost,
  estimatedMemoryCost = leftChild.cost().estimatedMemoryCost + rightChild.cost().estimatedMemoryCost,
  estimatedTimeCost = 0
)

val estimations = Estimations (
  estimatedLoopIterations = estimatedLoopIterations,
  estimatedRowSize = estimatedOutRowSize
)

val cost = Cost (
  estimatedCpuCost = selfCost.estimatedCpuCost + childCosts.estimatedCpuCost,
  estimatedMemoryCost = selfCost.estimatedMemoryCost + childCosts.estimatedMemoryCost,
  estimatedTimeCost = selfCost.estimatedTimeCost + childCosts.estimatedTimeCost
)

See HashJoinImpl.scala and MergeJoinImpl.scala for full implementation

Yang lain

You can see other rules and physical plan builders here:

melaksanakan
pembangun

Putting all pieces together

Now, after done implementing the implementation rules, now we can find our best plan. Let's start over from the user query

 SELECT tbl1 . id ,
       tbl1 . field1 ,
       tbl2 . id ,
       tbl2 . field1 ,
       tbl2 . field2 ,
       tbl3 . id ,
       tbl3 . field2 ,
       tbl3 . field2
FROM tbl1
         JOIN tbl2 ON tbl1 . id = tbl2 . id
         JOIN tbl3 ON tbl2 . id = tbl3 . id

will be converted to the logical plan

 graph TD
    1326583549["PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2"];
    -425111028["JOIN"];
    -349388609["SCAN tbl1"];
    1343755644["JOIN"];
    -1043437086["SCAN tbl2"];
    -1402686787["SCAN tbl3"];
    1326583549 --> -425111028;
    -425111028 --> -349388609;
    -425111028 --> 1343755644;
    1343755644 --> -1043437086;
    1343755644 --> -1402686787;

After exploration phase, it will generate lots of equivalent plans

 graph TD
    subgraph Group#8
        Expr#8["SCAN tbl2 (id, field1, field2)"]
    akhir
    subgraph Group#11
        Expr#11["JOIN"]
        Expr#14["JOIN"]
    akhir
    Expr#11 --> Group#7
    Expr#11 --> Group#10
    Expr#14 --> Group#8
    Expr#14 --> Group#12
    subgraph Group#2
        Expr#2["SCAN tbl2"]
    akhir
    subgraph Group#5
        Expr#5["JOIN"]
        Expr#16["JOIN"]
    akhir
    Expr#5 --> Group#1
    Expr#5 --> Group#4
    Expr#16 --> Group#2
    Expr#16 --> Group#13
    subgraph Group#4
        Expr#4["JOIN"]
    akhir
    Expr#4 --> Group#2
    Expr#4 --> Group#3
    subgraph Group#13
        Expr#15["JOIN"]
    akhir
    Expr#15 --> Group#1
    Expr#15 --> Group#3
    subgraph Group#7
        Expr#7["SCAN tbl1 (id, field1)"]
    akhir
    subgraph Group#1
        Expr#1["SCAN tbl1"]
    akhir
    subgraph Group#10
        Expr#10["JOIN"]
    akhir
    Expr#10 --> Group#8
    Expr#10 --> Group#9
    subgraph Group#9
        Expr#9["SCAN tbl3 (id, field2)"]
    akhir
    subgraph Group#3
        Expr#3["SCAN tbl3"]
    akhir
    subgraph Group#12
        Expr#13["JOIN"]
    akhir
    Expr#13 --> Group#7
    Expr#13 --> Group#9
    subgraph Group#6
        Expr#12["PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2"]
        Expr#6["PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2"]
    akhir
    Expr#12 --> Group#11
    Expr#6 --> Group#5
    style Expr#12 stroke-width: 4px, stroke: orange
    style Expr#8 stroke-width: 4px, stroke: orange
    style Expr#10 stroke-width: 4px, stroke: orange
    style Expr#13 stroke-width: 4px, stroke: orange
    style Expr#14 stroke-width: 4px, stroke: orange
    style Expr#11 stroke-width: 4px, stroke: orange
    style Expr#9 stroke-width: 4px, stroke: orange
    style Expr#15 stroke-width: 4px, stroke: orange
    style Expr#7 stroke-width: 4px, stroke: orange
    style Expr#16 stroke-width: 4px, stroke: orange
    linkStyle 0 stroke-width: 4px, stroke: orange
    linkStyle 15 stroke-width: 4px, stroke: orange
    linkStyle 12 stroke-width: 4px, stroke: orange
    linkStyle 1 stroke-width: 4px, stroke: orange
    linkStyle 16 stroke-width: 4px, stroke: orange
    linkStyle 13 stroke-width: 4px, stroke: orange
    linkStyle 2 stroke-width: 4px, stroke: orange
    linkStyle 6 stroke-width: 4px, stroke: orange
    linkStyle 3 stroke-width: 4px, stroke: orange
    linkStyle 10 stroke-width: 4px, stroke: orange
    linkStyle 7 stroke-width: 4px, stroke: orange
    linkStyle 14 stroke-width: 4px, stroke: orange
    linkStyle 11 stroke-width: 4px, stroke: orange

And the at optimization phase, a final best plan is chose

 graph TD
    Group#6["
    Group #6
Selected: PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2
Operator: ProjectOperator
Cost: Cost(cpu=641400.00, mem=1020400012.00, time=1000000.00)
"]
Group#6 --> Group#11
Group#11["
Group #11
Selected: JOIN
Operator: HashJoinOperator
Cost: Cost(cpu=641400.00, mem=1020400012.00, time=1000000.00)
"]
Group#11 --> Group#7
Group#11 --> Group#10
Group#7["
Group #7
Selected: SCAN tbl1 (id, field1)
Operator: NormalScanOperator
Cost: Cost(cpu=400.00, mem=400000.00, time=1000.00)
"]
Group#10["
Group #10
Selected: JOIN
Operator: MergeJoinOperator
Traits: SORTED
Cost: Cost(cpu=640000.00, mem=20000012.00, time=1100000.00)
"]
Group#10 --> Group#8
Group#10 --> Group#9
Group#8["
Group #8
Selected: SCAN tbl2 (id, field1, field2)
Operator: NormalScanOperator
Traits: SORTED
Cost: Cost(cpu=600000.00, mem=12.00, time=1000000.00)
"]
Group#9["
Group #9
Selected: SCAN tbl3 (id, field2)
Operator: NormalScanOperator
Traits: SORTED
Cost: Cost(cpu=40000.00, mem=20000000.00, time=100000.00)
"]

Bonus: query execution

Now we've done building a functional query planner which can optimize the query from user, but our query plan could not run by itself. So it's the reason why now we will implement the query processor to test out our query plan.

Basically the query process receive input from the query planner, and execute them

 graph LR
    plan(("Physical Plan"))
    storage[("Storage Layer")]
    processor["Query Processor"]
    plan -- execute --> processor
    storage -- fetch --> processor

Volcano/Iterator model

Volcano/iterator model is the query processing model that is widely used in many DBMS. It is a pipeline architecture, which means that the data is processed in stages, with each stage passing the output of the previous stage to the next stage.

Each stage in the pipeline is represented by an operator. Operators are functions that perform a specific operation on the data, such as selecting rows, filtering rows, or aggregating rows.

Usually, operator can be formed directly from the query plan. For example, the query

 SELECT field_1
FROM tbl
WHERE field = 1

will have the plan

 graph TD
  project["PROJECT: field_1"]
  scan["SCAN: tbl"]
  filter["FILTER: field = 1"]
  project --> scan
  filter --> project

will create a chain of operators like this:

 scan = {
    next() // fetch next row from table "tbl"
}

project = {
    next() = {
        next_row = scan.next() // fetch next row from scan operator
        projected = next_row["field_1"]
        return projected
    }
}

filter = {
    next() = {
        next_row = {}
        do {
            next_row = project.next() // fetch next row from project operator
        } while (next_row["field"] != 1)
        return next_row
    }
}

results = []
while (row = filter.next()) {
    results.append(row)
}

notes : this pseudo code did not handle for end of result stream

The operators

The basic interface of an operator is described as following:

 trait Operator {
  def next () : Option [ Seq [ Any ]]
}

See Operator.scala for full implementation of all operators

Testing a simple query

Let's define a query

 SELECT emp . id ,
       emp . code ,
       dept . dept_name ,
       emp_info . name ,
       emp_info . origin
FROM emp
         JOIN dept ON emp . id = dept . emp_id
         JOIN emp_info ON dept . emp_id = emp_info . id

with some data and stats

 val table1 : Datasource = Datasource (
  table = " emp " ,
  catalog = TableCatalog (
    Seq (
      " id " -> classOf [ String ],
      " code " -> classOf [ String ]
    ),
    metadata = Map ( " sorted " -> " true " ) // assumes rows are already sorted by id
  ),
  rows = Seq (
    Seq ( " 1 " , " Emp A " ),
    Seq ( " 2 " , " Emp B " ),
    Seq ( " 3 " , " Emp C " )
  ),
  stats = TableStats (
    estimatedRowCount = 3 ,
    avgColumnSize = Map ( " id " -> 10 , " code " -> 32 )
  )
)

val table2 : Datasource = Datasource (
  table = " dept " ,
  catalog = TableCatalog (
    Seq (
      " emp_id " -> classOf [ String ],
      " dept_name " -> classOf [ String ]
    ),
    metadata = Map ( " sorted " -> " true " ) // assumes rows are already sorted by emp_id (this is just a fake trait to demonstrate how trait works)
  ),
  rows = Seq (
    Seq ( " 1 " , " Dept 1 " ),
    Seq ( " 1 " , " Dept 2 " ),
    Seq ( " 2 " , " Dept 3 " ),
    Seq ( " 3 " , " Dept 3 " )
  ),
  stats = TableStats (
    estimatedRowCount = 4 ,
    avgColumnSize = Map ( " emp_id " -> 10 , " dept_name " -> 255 )
  )
)

val table3 : Datasource = Datasource (
  table = " emp_info " ,
  catalog = TableCatalog (
    Seq (
      " id " -> classOf [ String ],
      " name " -> classOf [ String ],
      " origin " -> classOf [ String ]
    ),
    metadata = Map ( " sorted " -> " true " ) // assumes rows are already sorted by id (this is just a fake trait to demonstrate how trait works)
  ),
  rows = Seq (
    Seq ( " 1 " , " AAAAA " , " Country A " ),
    Seq ( " 2 " , " BBBBB " , " Country A " ),
    Seq ( " 3 " , " CCCCC " , " Country B " )
  ),
  stats = TableStats (
    estimatedRowCount = 3 ,
    avgColumnSize = Map ( " id " -> 10 , " name " -> 255 , " origin " -> 255 )
  )
)

The cost model is optimized for CPU

 val costModel : CostModel = ( currentCost : Cost , newCost : Cost ) => {
  currentCost.estimatedCpuCost > newCost.estimatedCpuCost
}

Now, executing the query by running this code:

 val planner = new VolcanoPlanner
QueryParser .parse(query) match {
  case Left (err) => throw err
  case Right (parsed) =>
    val operator = planner.getPlan(parsed)
    val result = Utils .execute(operator)
    // print result
    println(result._1.mkString( " , " ))
    result._2.foreach(row => println(row.mkString( " , " )))
}

it will print:

 emp.id,emp.code,dept.dept_name,emp_info.name,emp_info.origin
1,Emp A,Dept 1,AAAAA,Country A
1,Emp A,Dept 2,AAAAA,Country A
2,Emp B,Dept 3,BBBBB,Country A
3,Emp C,Dept 3,CCCCC,Country B

Voila, We've done building a fully functional query planner and query engine :). You can start writing one for your own, good luck

See Demo.scala for full demo code

Terima kasih

Thanks for reading this, this guide is quite long, and not fully correct, but I've tried my best to write it as understandably as possible ?

Memperluas