This article describes the KNN algorithm implemented in Java. Share it for your reference, as follows:
Everyone should be familiar with KNN algorithms, and they are one of the top ten classic algorithms for data mining.
The idea of the algorithm is to group and classify unknown data for the grouping that has been classified in the training dataset. Among them, the distance is calculated based on the unknown point and the point in its training data, the point with the shortest distance is calculated, and the category where it is classified into that point.
Let's take a look at the algorithm's engineering:
1. Prepare data and preprocess the data
2. Select the appropriate data structure to store training data and test tuples
3. Set parameters, such as k
4. Maintain a priority queue of size k, from large to small, based on distance, to store nearest neighbor training tuples. Randomly select k tuples from the training tuple as the initial nearest neighbor tuple, calculate the distance between the test tuple and these k tuples, and store the training tuple label and distance into the priority queue.
5. Iterate through the training tuple set, calculate the distance between the current training tuple and the test tuple, and divide the resulting distance L to the maximum distance Lmax in the priority queue.
6. Make a comparison. If L>=Lmax, the tuple is discarded and the next tuple is traversed. If L < Lmax, delete the tuple with the largest distance in the priority queue and store the current training tuple in the priority queue.
7. After the traversal is completed, calculate the majority of the k tuples in the priority queue and use them as the category of the test tuple.
8. After the test tuple set is tested, calculate the error rate, continue to set different k values and retrain, and finally get the k value with the smallest error rate.
According to the algorithm process, we implement the Java language:
package KNN;/** * Coordinates of points x and y * @author Administrator * */public class PointBean {int x;int y;public int getX() { return x;}public void setX(int x) { this.x = x;}public int getY() { return y;}public void setY(int y) { this.y = y;}public PointBean(int x, int y) { super(); this.x = x; this.y = y;}public PointBean() { super();}@Overridepublic String toString() { return "PointBean [x=" + x + ", y=" + y + "]";}}KNN algorithm
package KNN;import java.util.ArrayList;/** * Methods of KNN implementation* @author Administrator * */public class KnnMain { public double getPointLength(ArrayList<PointBean> list,PointBean bb){ int b_x=bb.getX(); int b_y=bb.getY(); double temp=(b_x -list.get(0).getX())*(b_x -list.get(0).getX())+ (b_y -list.get(0).getY())*(b_y -list.get(0).getY()); // Find the minimum distance for(int i=1;i<list.size();i++){ if(temp<((b_x -list.get(i).getX())*(b_x -list.get(i).getX())+ (b_y -list.get(i).getY())*(b_y -list.get(i).getY()))){ temp=(b_x -list.get(i).getX())*(b_x -list.get(i).getY()); } } return Math.sqrt(temp); } /** * Get the length and find the smallest one for classification* @param list1 * @param list2 * @param list3 * @param bb */ public void getContent(ArrayList<PointBean> list1,ArrayList<PointBean> list2, ArrayList<PointBean> list3,PointBean bb){ double A=getPointLength(list1,bb); double B=getPointLength(list2,bb); double C=getPointLength(list3,bb); //Make a comparison if(A>B){ if(B>C){ System.out.println("This point:"+bb.getX()+" , "+bb.getY()+" " +" belongs to C"); }else { System.out.println("This point:"+bb.getX()+" , "+bb.getY()+" " +" belongs to B"); } }else { if(A>C){ System.out.println("This point:"+bb.getX()+" , "+bb.getY()+" " +" belongs to C"); }else { if(A>C){ System.out.println("This point:"+bb.getX()+" , "+bb.getY()+" " +" belongs to C"); }else { System.out.println("This point:"+bb.getX()+" , "+bb.getY()+" " +" belongs to A"); } } }}Main function
package KNN;import java.util.ArrayList;/* * Main function KNN */public class TestJava { static ArrayList< PointBean> listA; static ArrayList< PointBean> listB; static ArrayList< PointBean> listC; static ArrayList< PointBean> listD; public static void main(String[] args) { //Chuangjia Arraylist listA=new ArrayList< PointBean>(); listB=new ArrayList<PointBean>(); listC=new ArrayList<PointBean>(); listD=new ArrayList<PointBean>(); //Write data setDate(); getTestResult(); } /** * Get the result*/ private static void getTestResult() { //Create an object KnnMain km=new KnnMain(); for(int i=0;i<listD.size();i++){ km.getContent(listA, listB, listC, listD.get(i)); } } /** * Write data*/ private static void setDate() { //A's coordinate point int A_x[]={1,1,2,2,1}; int A_y[]={0,1,1,0,2}; //B's coordinate point int B_x[]={2,3,3,3,4}; int B_y[]={4,4,3,2,3}; //C's coordinate point int C_x[]={4,5,5,6,6}; int C_y[]={1,2,0,2,1}; // Test data//B's coordinate point int D_x[]={3,3,3,0,5}; int D_y[]={0,1,5,0,1}; // PointBean bA; for(int i=0;i<5;i++){ bA=new PointBean(A_x[i], A_y[i]); listA.add(bA); } // PointBean bB ; for(int i=0;i<5;i++){ bB=new PointBean(B_x[i], B_y[i]); listB.add(bB); } // PointBean bC ; for(int i=0;i<5;i++){ bC=new PointBean(C_x[i], C_y[i]); listC.add(bC); } // PointBean bD ; for(int i=0;i<5;i++){ bD=new PointBean(D_x[i], D_y[i]); listD.add(bD); } }}Test results:
This point: 3, 1 belongs to A
This point: 3, 5 belongs to B
This point: 0, 0 belongs to A
This point: 5, 1 belongs to C
At this point, the simple KNN algorithm has implemented the division of unknown points, which will help everyone understand the KNN algorithm. Some algorithms that improve KNN will be posted later. Learn and make progress together!
For more information about Java algorithms, readers who are interested in this site can view the topics: "Java Data Structure and Algorithm Tutorial", "Summary of Java Operation DOM Node Tips", "Summary of Java File and Directory Operation Tips" and "Summary of Java Cache Operation Tips"
I hope this article will be helpful to everyone's Java programming.