Part One Problem Description
1.1 Specific tasks
This job task is trajectory compression. Given a GPS data record file, each record contains two coordinate fields, longitude and dimension. All records have latitude and longitude coordinates to form a trajectory. It is required to use a suitable compression algorithm so that the distance error of the compressed trajectory is less than 30m.
1.2 Program input
The input of this program is a GPS data record file.
1.3 Data output
The output form is a file, including three parts, the ID sequence and coordinates of the compressed point, the number of points, the average distance error, and the compression rate.
Answers to the second part
According to the problem description, we solve the problem, and the problem solving is divided into the following steps:
2.1 Data preprocessing
This program input is a GPS data record file, with a total of 3150 lines of records, and each line of records is divided into several fields. According to the question, we only need to pay attention to the longitude and latitude coordinate fields. Some records of the original data file are shown in Figure 2.1:
Figure 2.1 Schematic diagram of partial records of original data file
As shown in Figure 2.1, the storage format of the latitude and longitude coordinate field data in each record in the original data file is a typical GPS coordinate expression method, that is, the degree division format, the form is dddmm.mmmmmm, where ddd represents degree, mm.mmmmmm represents fraction, the decimal point represents the integer part of the fraction, and the decimal point represents the decimal part of the fraction; in this data preprocessing, in order to facilitate the calculation of the distance between the next two coordinate points, we need to convert the latitude and longitude coordinate data in the degree division format into the form of degree, the conversion method is ddd+mm.mmmm/60. Here we retain the 6 digits after the decimal point, and the converted form is ddd.xxxxxx.
We take the latitude and longitude coordinates in the first record (11628.2491, 3955.6535) as an example. The conversion result is (116.470818, 39.927558). The latitude and longitude coordinates in all records are performed using the method, and an ID can be generated for each converted coordinate point and uniquely identified. After compression, we only need to output the IDs of all the reserved points.
2.2 Douglas-Peucker Trajectory Compression Algorithm
Trajectory compression algorithms are divided into two categories, namely lossless compression and lossy compression. Lossless compression algorithms mainly include Huffman coding, and lossy compression algorithms are divided into batch processing method and online data compression method. The batch processing method includes DP (Douglas-Peucker) algorithm, TD-TR (Top-Down Time-Ratio) algorithm and Bellman algorithm. Online data compression methods include sliding windows, open windows, safe area-based methods, etc.
Due to limited time, for this trajectory compression, we decided to use a relatively simple DP algorithm.
The steps of the DP algorithm are as follows:
(1) Connect a straight line AB between two points A and B at the beginning and end of the curve, and the straight line is the chord of the curve;
(2) traverse all other points on the curve, find the distance from each point to the straight line AB, find the point C with the maximum distance, and record the maximum distance as dmax;
(3) Compare the distance dmax with the predefined threshold Dmax size. If dmax<Dmax, the straight line AB is used as the approximation of the curve segment and the curve segment is processed;
(4) If dmax>=Dmax, point C divides curve AB into two sections AC and CB, and performs (1) to (3) steps of these two sections respectively;
(5) When all curves are processed, the polyline formed by each segmentation point is connected in turn, that is, the path of the original curve.
2.3 The distance from point to line
In the DP algorithm, we need to find the distance from a point to a straight line. This distance refers to the vertical Euro-like distance, that is, the distance d from point C outside the straight line AB to line AB. Here, points A, B, and C are all latitude and longitude coordinates; we use the equal area method of triangles to find the distance d. The specific method is: points A, B, and C form a triangle. There are two ways to find the area of this triangle, namely the ordinary method (bottom x height/2) and the Helen formula. The Helen formula is as follows:
Suppose there is a triangle with side lengths a, b, and c respectively. The area S of the triangle can be obtained from the following formula:
where p is half circumference:
We can find the area of the triangle through the Helen formula, and then we can find the size of the height, where the height is the distance d. To use the Helen formula, you must find the distance between three points A, B, and C. The distance formula is given by the teacher, and you can directly call the distance function.
Note: After finding the distance, add the absolute value to prevent the distance from being negative.
2.4 Solve the average error
The average error refers to the value obtained by dividing the sum of the distances from the points ignored during compression to the corresponding line segment by the total number of points.
2.5 Solution to compression rate
The compression ratio calculation formula is as follows:
2.6 Generation of data result files
After the above processing and calculation, we write the ID of the remaining points after compression and the number of points, average distance error, compression rate and other parameters into the final result file, and the answer to the question is completed.
Part 3 code implementation
This program is written in Java language, with the development environment of IntelliJ IDEA 14.0.2. The code is divided into two classes. One is the ENPoint class, which is used to save latitude and longitude point information, and the other is the TrajectoryCompressionMain class, which is used to write functions such as data processing, DP algorithm, point-to-line distance, and average error.
3.1 General procedure process
The entire program flow mainly includes the following steps:
(1) Define the related ArrayList array and File object, among which there are three ArrayList array objects, namely the original latitude and longitude coordinate array pGPSArryInit, the filtered point coordinate array pGPSArrayFilter, and the filtered and sorted point coordinate array pGPSArrayFilterSort; there are five File objects, namely the original data file object fGPS, the compressed result data file object oGPS, the data file fInitGPSPoint that maintains the converted original latitude and longitude coordinate points, the simulation test files fTestInitPoint and fTestFilterPoint.
(2) Obtain the coordinates of the original point and write them to the file, mainly including two operations: reading the file and writing the file;
(3) Perform trajectory compression;
(4) Sort the compressed latitude and longitude point coordinates;
(5) Generate simulation test files and use R language tools to draw graphics to obtain the final result;
(6) Find the average error and compression rate, the average error is obtained through a function, and the compression rate is directly calculated;
(7) Write the final result into the result file, including the filtered point ID, number of points, average error and compression rate;
3.2 Specific implementation code
(1) ENPoint.java
package cc.xidian.main;import java.text.DecimalFormat;/*** Created by hadoop on 2015/12/20.*/public class ENPoint implements Comparable<ENPoint>{public int id;//Point IDpublic double pe;//Longitude public double pn;//Dimension public ENPoint(){}//Empty constructor public String toString(){//DecimalFormat df = new DecimalFormat("0.000000");return this.id+"#"+this.pn+","+this.pe;}public String getTestString(){DecimalFormat df = new DecimalFormat("0.0000000");return df.format(this.pn)+","+df.format(this.pe);}public String getResultString(){DecimalFormat df = new DecimalFormat("0.000000");return this.id+"#"+df.format(this.pn)+","+df.format(this.pe);}@Overridepublic int compareTo(ENPoint other) {if(this.id<other.id) return -1; else if(this.id>other.id) return1; elsereturn0;}}(2) TrajectoryCompressionMain.java
package cc.xidian.main;import java.io.*;import java.text.DecimalFormat;import java.util.*;import java.util.List;/*** Created by hadoop on 2015/12/19.*/public class TrajectoryCompressionMain{public static void main(String[] args)throws Exception{//------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- pGPSArrayInit = new ArrayList<ENPoint>();//Original record latitude and longitude coordinate array ArrayList<ENPoint> pGPSArrayFilter = new ArrayList<ENPoint>();//Filtered latitude and longitude coordinate array ArrayList<ENPoint> pGPSArrayFilterSort = new ArrayList<ENPoint>();//Filtered and sorted latitude and longitude coordinate array File fGPS = new File("2007-10-14-GPS.log");//Original data file object File oGPS = new File("2015-12-25-GPS-Result.log");//Filtered result data file object//Keep the original latitude and longitude data file after being converted into degrees, keeping the format as "ID#Large and longitude value, latitude value", where the longitude and dimension units are degrees, and keep the 6-digit numbers after the decimal point File fInitGPSPoint = new File("2007-10-14-GPS-ENPoint.log");//Keep the converted data file of the original latitude and longitude coordinate point after the converted file fTestInitPoint = new File("2007-10-14-GPS-InitTestPoint.log");//Featured original latitude and longitude coordinate point data file File fTestFilterPoint = new File("2015-12-25-GPS-FilterTestPoint.log");// Used for the filtered latitude and longitude coordinate point data file after simulation//------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- pGPSArrayInit);//Write the converted original latitude and longitude point data into the file System.out.println(pGPSArrayInit.size());//Output the number of original latitude and longitude point coordinates//---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 30.0;//Set the maximum distance error threshold pGPSArrayFilter.add(pGPSArrayInit.get(0));//Get the coordinates of the first original latitude and longitude point and add them to the filtered array pGPSArrayFilter.add(pGPSArrayInit.get(pGPSArrayInit.size()-1));//Get the coordinates of the last original latitude and longitude point and add them to the filtered array ENPoint[] enpInit = new ENPoint[pGPSArrayInit.size()];//Use a point array to receive all point coordinates for the subsequent compression Iterator<ENPoint> iInit = pGPSArrayInit.iterator();int jj=0;while(iInit.hasNext()){enpInit[jj] = iInit.next();jj++;}//Copy the coordinates of the point in ArrayList into the point array int start = 0;//Start subscript int end = pGPSArrayInit.size()-1;//End subscript TrajCompressC(enpInit,pGPSArrayFilter,start,end,DMax);//DP compression algorithm System.out.println(pGPSArrayFilter.size());//Output compressed points//---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ENPoint[pGPSArrayFilter.size()];//Use an array of points to receive filtered point coordinates, for the following sorting Iterator<ENPoint> iF = pGPSArrayFilter.iterator();int i = 0;while(iF.hasNext()){enpFilter[i] = iF.next();i++;}//Copy the point coordinates in ArrayList to the point array Arrays.sort(enpFilter);//Sort for (int j=0;j<enpFilter.length;j++){pGPSArrayFilterSort.add(enpFilter[j]);//Write the sorted point coordinates into a new ArrayList array}//------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- pGPSArrayFilterSort);//Write the filtered latitude and longitude data points into the simulation file, the format is "Longitude, Dimension" //------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- getMeanDistError(pGPSArrayInit,pGPSArrayFilterSort);//Find the average error System.out.println(mDError);//----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- (double)pGPSArrayFilter.size()/pGPSArrayInit.size()*100;//Find the compression rate System.out.println(cRate);//---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- and compression ratio writeFilterPointToFile(oGPS,pGPSArrayFilterSort,mDError,cRate);//--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- @param fGPS: source data file* @return pGPSArrayInit: Return ArrayList array that saves all point coordinates* @throws Exception*/public static ArrayList<ENPoint> getENPointFromFile(File fGPS)throws Exception{ArrayList<ENPoint> pGPSArray = new ArrayList<ENPoint>();if(fGPS.exists()&&fGPS.isFile()){InputStreamReader read = new InputStreamReader(new FileInputStream(fGPS));BufferedReader bReader = new BufferedReader(read);String str;String[] strGPS;int i = 0;while((str = bReader.readLine())!=null){strGPS = str.split(");ENPoint p = new ENPoint();p.id = i;i++;p.pe = (dfTodu(strGPS[3]));p.pn = (dfTodu(strGPS[5]));pGPSArray.add(p);}bReader.close();}return pGPSArray;}/*** Function function: Write the latitude and longitude coordinates, average distance error, and compression ratio of the filtered point to the result file* @param outGPSFile: result file* @param pGPSPointFilter: filtered point* @param mDerror: average distance error* @param cRate: compression ratio* @throws Exception*/public static void writeFilterPointToFile(File outGPSFile,ArrayList<ENPoint> pGPSPointFilter,double mDerror,double cRate)throws Exception{Iterator<ENPoint> iFilter = pGPSPointFilter.iterator();RandomAccessFile rFilter = new RandomAccessFile(outGPSFile,"rw"); while(iFilter.hasNext()){ENPoint p = iFilter.next();String sFilter = p.getResultString()+"/n";byte[] bFilter = sFilter.getBytes();rFilter.write(bFilter);}String strmc = "#"+Integer.toString(pGPSPointFilter.size())+","+double.toString(mDerror)+","+double.toString(cRate)+"%"+"#"+"/n";byte[] bmc = strmc.getBytes();rFilter.write(bmc);rFilter.close();}/*** Function function: Save the converted original latitude and longitude data points into a file* @param outGPSFile* @param pGPSPointFilter* @throws Exception*/public static void writeInitPointToFile(File outGPSFile,ArrayList<ENPoint> pGPSPointFilter)throws Exception{Iterator<ENPoint> iFilter = pGPSPointFilter.iterator();RandomAccessFile rFilter = new RandomAccessFile(outGPSFile,"rw");while(iFilter.hasNext()){ENPoint p = iFilter.next();String sFilter = p.toString()+"/n";byte[] bFilter = sFilter.getBytes();rFilter.write(bFilter);}rFilter.close();}/*** Function function: Write latitude and longitude point coordinate data in the array into the test file for visual testing* @param outGPSFile: file object* @param pGPSPointFilter: point array* @throws Exception*/public static void writeTestPointToFile(File outGPSFile,ArrayList<ENPoint> pGPSPointFilter)throws Exception{Iterator<ENPoint> iFilter = pGPSPointFilter.iterator();RandomAccessFile rFilter = new RandomAccessFile(outGPSFile,"rw"); while(iFilter.hasNext()){ENPoint p = iFilter.next();String sFilter = p.getTestString()+"/n";byte[] bFilter = sFilter.getBytes();rFilter.write(bFilter);}rFilter.close();}/*** Function function: convert raw latitude and longitude coordinate data into degrees* @param str: original latitude and longitude coordinates* @return: Return the degree data for the corresponding */public static double dfTodu(String str){int indexD = str.indexOf('.');String strM = str.substring(0,indexD-2);String strN = str.substring(indexD-2);double d = double.parsedouble(strM)+double.parsedouble(strN)/60;return d;}/*** Function function: keep six decimal places of a double number* @param d: original double number* @return Return the converted double number*/public static double getPointSix(double d){DecimalFormat df = new DecimalFormat("0.000000"); return double.parsedouble(df.format(d));}/*** Function function: calculate the distance between the point pX and the straight line determined by point pA and pB using the equal method using the area of the triangle (find using the Helen formula). * @param pA: Start point* @param pB: End point* @param pX: Third point* @return distance: The distance between point pX and the straight line where pA and pB is located*/public static double distToSegment(ENPoint pA,ENPoint pB,ENPoint pX){double a = Math.abs(geoDist(pA, pB));double b = Math.abs(geoDist(pA, pX));double c = Math.abs(geoDist(pB, pX));double p = (a+b+c)/2.0;double s = Math.sqrt(Math.abs(p*(pa)*(pb)*(pc)));double d = s*2.0/a;return d;}/*** Function function: Use the method given by the teacher to find the distance between two latitude and longitude points that are incomprehensible* @param pA: Start point* @param pB: End point* @return distance: Distance*/public static double geoDist(ENPoint pA,ENPoint pB){double radLat1 = Rad(pA.pn);double radLat2 = Rad(pB.pn);double delta_lon = Rad(pB.pe - pA.pe);double top_1 = Math.cos(radLat2) * Math.sin(delta_lon);double top_2 = Math.cos(radLat1) * Math.sin(radLat2) - Math.sin(radLat1) * Math.cos(radLat2) * Math.cos(delta_lon);double top = Math.sqrt(top_1 * top_1 + top_2 * top_2);double bottom = Math.sin(radLat1) * Math.sin(radLat2) + Math.cos(radLat1) * Math.cos(radLat2) * Math.cos(delta_lon);double delta_sigma = Math.atan2(top, bottom);double distance = delta_sigma * 6378137.0;return distance;}/*** Function function: Angle radian* @param d: Angle* @return The returned radian*/public static double Rad(double d){return d * Math.PI / 180.0;}/*** Function function: According to the maximum distance limit, the original trajectory is recursively sampled using the DP method to obtain the compressed trajectory* @param enpInit: original latitude and longitude coordinate points array* @param enpArrayFilter: maintain the filtered point coordinate array* @param start: start subscript* @param end: endpoint subscript* @param DMax: prespecified maximum distance error*/public static void TrajCompressC(ENPoint[] enpInit,ArrayList<ENPoint> enpArrayFilter,int start,int end,double DMax){if(start < end){//Recursive condition double maxDist = 0;//Maximum distance int cur_pt = 0;//Current subscript for (int i=start+1;i<end;i++){double curDist = distToSegment(enpInit[start],enpInit[end],enpInit[i]);//Distance from the current point to the corresponding line segment if(curDist > maxDist){maxDist = curDist;cur_pt = i;}//Finding the subscript of the maximum distance and the corresponding point of the maximum distance}//If the current maximum distance is greater than the maximum distance error if(maxDist >= DMax){enpArrayFilter.add(enpInit[cur_pt]);//Add the current point into the filtered array//Disassemble the original line segment into two segments centered on the current point, and perform recursion processing respectively TrajCompressC(enpInit,enpArrayFilter,start,cur_pt,DMax); TrajCompressC(enpInit,enpArrayFilter,cur_pt,end,DMax);}}}/*** Function function: Find the average distance error* @param pGPSArrayInit: Original data point coordinates* @param pGPSArrayFilterSort: Filtered data point coordinates* @return: Return the average distance*/public static double getMeanDistError(ArrayList<ENPoint> pGPSArrayInit,ArrayList<ENPoint> pGPSArrayFilterSort){double sumDist = 0.0;for (int i=1;i<pGPSArrayFilterSort.size();i++){int start = pGPSArrayFilterSort.get(i-1).id;int end = pGPSArrayFilterSort.get(i).id;for (int j=start+1;j<end;j++){sumDist += distToSegment(pGPSArrayInit.get(start),pGPSArrayInit.get(end),pGPSArrayInit.get(j));}}double meansDist = sumDist/(pGPSArrayInit.size());return meanDist;}}Part IV Procedure Results
4.1 Program output results
Compressed results:
(1) Total number of points: 140 points; (2) Average distance error: 7.943786; (3) Compression rate: 4.4444%
4.2 Simulation results
After trajectory compression, we convert the original latitude and longitude coordinate points into compressed and filtered latitude and longitude coordinate points. We write these two points coordinate data into two files, and then draw the trajectories before and after compression according to these two files, and compare them. Based on the comparison results, we can see whether our trajectory compression algorithm is effective. The final comparison result is shown in Figure 4.1:
Summary of Part 5
During the writing process of this program, I encountered various problems and learned a lot of programming experience. The following is a summary of the problems and solutions encountered, and finally put forward suggestions for improvement for the shortcomings of the program.
5.1 Problems and solutions encountered
Question 1: Latitude and longitude coordinate order issue
Solution: The parameters in the distance formula are that the latitude is in front and longitude are in back, and the order of the latitude and longitude coordinate points need to be adjusted.
Question 2: The distance cannot be negative
Solution: Ensure that the distance found cannot be negative, add an absolute value function.
Question 3: DP algorithm implementation details
Solution: I started to use ArrayList array to solve the subscript problem. There was a huge error when solving recursively. Later, I switched to using ordinary array subscripts for recursion, and the result was much better.
5.2 Existing shortcomings and prospects
(1) When performing trajectory compression, the DP algorithm is the simplest algorithm and is not the best. Some algorithms with good results can be used to perform trajectory compression again;
(2) The data records of this experiment are 3,150, and the amount of data is not large. What should I do if there are 1 billion data? We can consider hardware, distributed, data preprocessing, data segmentation, and data warehouses with good performance.
The above is the full content of this article on the detailed code of Douglas-Peucker algorithm for Java programming to implement trajectory compression. I hope it will be helpful to everyone. If there are any shortcomings, please leave a message to point it out. Thank you friends for your support for this site!