EMAN Download - EMAN Source code download

EMAN

Other categories

1.0.0

Download

EMAN

A simple e-book recommendation system based on the SSM framework and item collaborative filtering algorithm (ItemCF)

Interface screenshot

img/Homepage 0.png

img/Homepage 1 I'm interested in you.png

img/book classification 0.png

img/book details page 0.png

img/book details page 1 recommendations and comments.png

img/book details page 2 comments.png

img/book details page 1 recommendations and comments.png

img/login.png

img/search page.png

System function analysis

Recommendation strategy

Because some recommendation algorithms need to use the user's favorite data as parameters. If the user is not logged in, the recommendation strategy for tourists will be adopted. If the user is logged in, the recommendation policy for the logged-in user is adopted. If the logged-in user has a partition record of interest in the database, a recommendation from the partition you are interested in will be added. Therefore, the recommendation strategy is divided into two situations: whether to log in or not.

Copy of recommended strategy.png

If the user is not logged in, the user rating display strategy for tourists will be adopted. If the user is logged in, the user rating display policy for the logged in user is adopted. If the logged-in user has already scored the e-book on the current details page, its rating record will be displayed.

Copy of the scoring strategy.png

�Crawler crawling strategy

img/crawler crawling strategy.png

System Analysis and Design

System Analysis

As shown in the use case diagram, there are three basic users in this system. These are tourists, registered users, and administrators. Visitors can visit the homepage of the e-book recommendation platform, user registration page, and view the e-book page. The function of registering users is that they can rate and comment on e-books and recommend e-books determined by the user's predicted interest. Administrators can regularly use the crawler module to update e-book information and use the statistics module to update classified statistics, e-book co-occurrence matrix and e-book cosine similarity matrix.

![User Case.png](/img/User Case.png)

Recommended function adopts the item-based collaborative filtering algorithm (ItemCF)

Considering that there is a cold start problem with this algorithm. That is, for new users, they often lack rating data, which leads to the recommendation based on the user's predicted interest level and cannot be smoothly carried out. To solve this problem, I added a module to directly use the cosine similarity matrix w to directly perform similar e-book recommendations, so that unregistered users and new users can better obtain recommendations. As for the sparse data matrix problems in collaborative filtering algorithms, some unpopular e-books may have no user ratings, low e-book correlation, and some users have few e-book ratings. To solve this problem, add a new book recommendation module to the homepage to better recommend e-books that have no rated; add a partition recommendation module to recommend e-books based on the user's interest in order to increase the interest of specific users to recommend e-books. It is worth noting that the e-book recommended by users to predict interest is calculated using formulas on the cosine similarity matrix W calculated by this algorithm. In addition, the user's favorite e-book is also required as input parameters during calculation. The recommended similar e-book is to use the string similarity matrix W for statistics directly. This means that the above two recommendations need to complete the calculation of the co-occurrence matrix of the e-book and the cosine similarity matrix of the e-book. Because the above two matrices are computationally large and all data is required for each calculation, it is set to be performed regularly by the administrator.

System Design

Database table

img/ER picture.png

ebook: Book details
ratinglist: �� Rating comment list
subclassify: Book subclassify
user: user information
classificationmainstatistics: main classification statistics
Recommendhomepage: Recommended books on the homepage (to be determined)
favorite: user favorite category list
matrixC: Co-occurrence matrix C and cosine similarity matrix W table

Class and related major documents

Symbol → indicates that the class is a regular manual operation of modules

com.controller

EBookController: a book-related interface
RatingListController: Interface related to user comments and ratings
StatisticsController: Statistics module interface
UserController: User-related interface
FavoriteController: Users like classification interfaces
ClassifyMainStatisticsController: Classification statistics-related interface

com.dao

DBAccess: used to obtain SqlSession database connection
EBookDao: Provides access to book lists
RatingListDao: Provide access to the rating review table
UserDao: Provides access to user tables
FavoriteDao: User favorite access to category tables
ClassifyMainStatisticsDao: Access to Classified Statistics related data tables
MatrixCDao: Provides access to co-occurrence matrix C and cosine similarity matrix W
EBookMapper.xml: SQL statement that provides access to the book table
RatingListMapper.xml: SQL statement that provides access to the rating comment table
UserMapper.xml: SQL statement that provides access to user tables
ClassifyMainStatisticsMapper.xml: SQL statement that provides access to the main partition statistics table
FavoriteMapper.xml: SQL statements that users like to access the classification table
ClassifyMainStatisticsMapper.xml: SQL statement for accessing classified statistics related data tables
MatrixCMapper.xml: SQL statement that provides access to co-occurrence matrix C and cosine similarity matrix W

com.entity

EBook: Books->ebooks
RatingList: Comments and ratings -> ratingList table
ClassifyMainStatistics: Main partition statistics module -> classifyMainStatistics table
User: User->user table
Favorite: Users like classified entities-> Favorite table
MatrixC: Co-occurrence matrix C and cosine similarity matrix W -> matrixC table

com.service

EBookService: Book module
RatingListService: Comment Scoring Module
UserService: User module
FavoriteService: Users like classification modules
ClassifyMainStatisticsService: Classification StatisticsModule

com.statistics

→ ItemCollaborationFilter: Used to calculate the recommended matrix for collaborative filtering based on items
→ StatisticsClassifyMain: StatisticsClassifyMain: Number of people rated, average rating, and variance of ratings for each score in the main category
→ StatisticsRatingValue: Book rating information statistics: count the data of the RatingValue table and write the statistical results to the EBook table corresponding to the book

com.util

ChartDataJsonCreater: Provides structured Json data format for converting � data into Chart.js plug-in
JSONConverter: Provides direct conversion of JavaBeans objects into structured Json data
RandomNumFactory: Random number generation tool class

spider (crawler package)

BookInfoSpider: Crawl e-book details pages and store them into the database
→ EBookListSpider: Crawl all e-books
HttpURLConnectionUtil:
→ RatingValueListSpider: Crawl Douban book rating list and user comments (read book information from the database and crawl the rating list)

Configuration File

SSM framework configuration file

applicationContext.xml: Spring framework configuration file
jdbc.properties: jdbc configuration file
mybatis-config.xml: MyBatis framework configuration file
web.xml: project configuration file

Crawler configuration file

setting.properties: EBookListSpider configuration file
user_spider_setting.properties: configuration file for RatingValueListSpider
user_spider_exception.log: Error log of user comment rating crawler (RatingValueListSpider)

img/EMAN class structure.png

Page list

index.jsp: Home page
head.jsp: Navigation bar
error.jsp: Error message page (to be implemented)
success.jsp: Successful operation of information jumps to page
/ebook (book related page)

list.jsp: Main category book list
info.jsp: Book details page
serrchResult.jsp�: Book search results page

/user(user related page)

login.jsp: User login
register.jsp: User registration
home.jsp: User favorite category selection

System implementation

Design and implementation of algorithm (itemCF based on item)

There are two main steps for the collaborative filtering algorithm based on items:

Calculate similarity between items.
Generate a recommendation list for users based on the similarity of items and the historical behavior of users.

Suppose N(i) is the number of users who like item i. N(i)⋂N(j) represents the number of users who like item i item j at the same time. Then the similarity between item i and item j is:

Calculate similarity of items.png

However, the above formula has a defect: when item j is a very popular product and everyone likes it, then wij will be very close to 1, that is, the above formula will make many items have a great similarity to popular products, so you can improve the formula:

Calculate item similarity improvement.png

Create a list of user items inverted (suppose uppercase letters represent users, lowercase letters represent items):

Item reverse order.png

Calculate the co-occurrence matrix C (the co-occurrence matrix C represents the number of users who like two items at the same time, and is calculated based on the user item inversion table):

Co-occurrence matrix.png

As shown in the figure, we can see that the diagonal elements of the co-occurrence matrix are all 0 and are real symmetric sparse matrices. The algorithm is implemented as follows:

com.statistics.ItemCollaborationFilter

    /**
	 * 计算共现矩阵C
	 */
	private void computerMatrixC (){
		// 建立用户物品倒排表
		// 若用户对物品评分大于等于4则认为喜欢(出现)
		List < User > allUser = userDao . queryAllUser ();
		for ( int i = 0 ; i < allUser . size (); i ++){ // 遍历全部用户
			// 获取一个用户的评分列表中>=4的评分记录
			List < RatingList > likeList = ratingListDao . selectRatingListByUidAndRatingValue ( allUser . get ( i ). getUid (), 4 );
			if ( likeList . size () <= 1 ){ // 若用户只喜欢一本或不喜欢任何图书
				continue ;
			}
			
			for ( int j = 0 ; j < likeList . size (); j ++){ // 计算likeList中两两出现的图书并写入同现矩阵C
				for ( int k = j + 1 ; k < likeList . size (); k ++){
					int a = Integer . valueOf ( likeList . get ( j ). getEid ());
					int b = Integer . valueOf ( likeList . get ( k ). getEid ());
					// 生成key
					String key = null ;
					if ( a < b ){
						key = a + "," + b ;
					} else {
						key = b + "," + a ;
					}
					// 检查key是否已经存在
					if ( this . matrixC . get ( key ) != null ){
						int value = this . matrixC . get ( key );
						this . matrixC . put ( key , value + 1 );
					} else {
						this . matrixC . put ( key , 1 );
					}
				}
			}
			System . out . println ( "[" + df . format ( new Date ())+ "]" + "[已完成" + i + ",共" + allUser . size ()+ "]:用户uid=" + allUser . get ( i ). getUid ()+ "的记录以计算完成,共" + likeList . size ()+ "本图书" ); 
		}
		
	}

The number of times each item appears is:

Number of items appearing.png

Calculate the cosine similarity matrix W: The cosine similarity matrix can be obtained using the improved formula.

Cosine similarity matrix.png

The algorithm is implemented as follows:

com.statistics.ItemCollaborationFilter

 /**
	 * 计算余弦相似度矩阵W
	 * 计算方法：
	 * 使用矩阵C的每个value作为分子，key中的两个图书的喜欢人数的积开根号作为分母
	 */
	private Double computerMatrixW ( String eida , String eidb , int value ){
		DecimalFormat df = new DecimalFormat ( "#.##" );
		// 查询每个图书有多少人喜欢
        try {
            Statement statemenet = conn . createStatement ();
            ResultSet rs = statemenet . executeQuery ( "select count(rid) from ratinglist where eid = '" + eida + "' and ratingValue >= 4;" );
            rs . next ();
            int likeANum = rs . getInt ( "count(rid)" );
            rs = statemenet . executeQuery ( "select count(rid) from ratinglist where eid = '" + eidb + "' and ratingValue >= 4;" );
            rs . next ();
            int likeBNum = rs . getInt ( "count(rid)" );
            if ( likeANum == 0 )
                likeANum = 1 ;
            if ( likeBNum == 0 )
                likeBNum = 1 ;
            // 开始计算
            Double answer = value * 1.0 / Math . sqrt ( likeANum * likeBNum );
            // 精确到小数点后两位
            Double result = Double . parseDouble ( df . format ( answer ));
            // 返回计算结果
            return result ;
        } catch ( SQLException e ) {
            e . printStackTrace ();
        }
		return null ;
	}

What item is ultimately recommended is determined by predicting interest.

Item j predicts interest = interest of item i that the user likes × similarity between item i and item j.

For example: a user likes items a, b, and c. Their interest is 1, 2 and 2 respectively. Then the predicted interest of items c, d, e, and f are:

c:1×0.58+2×0+2×0=0.58
d:1×1.0+2×1.44+2×0.45=4.78
e:1×0+2×0 +2×0=0.0
f:1×0+2×0.35+2×0=0.70

Therefore, item d should be recommended to the user. The algorithm is implemented as follows:

	@ Override
	public List < EBook > userRecommendedList ( String uid ) {
		// 获取用户喜爱图书列表
		List < RatingList > likeList = this . ratingListDao . selectRatingListByUidAndRatingValue ( uid , 4 );
		// debug
		System . out . println ( "uid=" + uid + "用户喜爱图书列表" );
		for ( RatingList r : likeList ){
			System . out . println ( r . getEid ()+ "," + r . getRatingValue ());
		}
		System . out . println ( "likeList.size=" + likeList . size ());
		// 定义计算用矩阵
		List < Item > matrix = new ArrayList <>();
		// 将用户喜爱的图书作为矩阵的列
		// 将与用户喜爱的图书同现的图书作为矩阵的行
		// 建立工作矩阵
		for ( int i = 0 ; i < likeList . size (); i ++){ // 遍历用户喜爱的图书
			RatingList temp = likeList . get ( i );
			// 获取同现图书
			List < MatrixC > itemList = this . matrixCDao . selectMatrixCByEidAOrEidB ( temp . getEid (), temp . getEid ());
			for ( int j = 0 ; j < itemList . size (); j ++){
				MatrixC c = itemList . get ( j );
				// 从matrixC的key中选出同现图书的eid
				String sEid = null ;
				if ( c . getEida (). equals ( temp . getEid ())){
					sEid = c . getEidb ();
				} else {
					sEid = c . getEida ();
				}
				// 在行中查询同现图书是否存在
				if ( matrix . indexOf ( sEid ) == - 1 ){ // 若列中不存在
					double [] col = new double [ likeList . size ()];
					// 将同现图书所在行对应喜爱图书的数组值设为对应的余弦相似度*用户喜爱程度(4分为1,5分为2)
					col [ likeList . indexOf ( temp )] = c . getCos_similarity ()*( temp . getRatingValue ()- 3 );
					matrix . add ( new Item ( sEid , col )); // 增加行

				} else { // 若列中存在
					// 则将同现图书所在行对应喜爱图书的数组值设为对应的余弦相似度*用户喜爱程度(4分为1,5分为2)
					matrix . get ( matrix . indexOf ( sEid )). col [ likeList . indexOf ( temp )] = c . getCos_similarity ()*( temp . getRatingValue ()- 3 );
				}
			}
		}
		// 计算预测兴趣度
		for ( int i = 0 ; i < matrix . size (); i ++){
			Item item = matrix . get ( i );
			double interestValue = 0 ;
			for ( int j = 0 ; j < item . col . length ; j ++){
				interestValue += item . col [ j ];
			}
			matrix . get ( i ). interestValue = interestValue ;
		}
		// 根据预测兴趣度进行排序
		Collections . sort ( matrix );
		// 返回推荐图书列表
		List < EBook > resultList = new ArrayList <>();
		for ( int i = 0 ; i < matrix . size () && i < 20 ; i ++){ // 返回排前10的书
			if ( matrix . get ( i ). interestValue > 0 ){
				EBook eBook = this . eBookDao . queryEBookByEid ( matrix . get ( i ). eid );
				resultList . add ( eBook );
				// debug
				System . out . println ( matrix . get ( i ). eid + "," + eBook . getEname ()+ ",interestValue=" + matrix . get ( i ). interestValue );
			}
		}
		return EBookServiceImpl . initEBookImgAddress ( resultList );
	}

The recommended module uses the data crawled by the crawler as input to output the calculation results into the matrixC table. The entire calculation process is divided into 2 stages. The first stage calculates the co-occurrence matrix C. The second stage calculates the cosine similarity w of the e-books that appear in pairs. For the recommended function based on user predicted interest, because the user likes the real-time and total calculation amount of e-book data is too large, the user will use real-time calculations when accessing the page. After multiple tests, the average waiting time of the user is within an acceptable range.