EMAN

A simple e-book recommendation system built on the SSM framework and the item-based collaborative filtering algorithm (ItemCF).

Screenshots

img/首頁0.png

img/首頁1感興趣猜你.png

img/圖書分類0.png

img/圖書詳情頁0.png

img/圖書詳情頁1推薦與評論.png

img/圖書詳情頁2評論.png

img/登錄.png

img/搜索頁.png

System feature analysis

功能.png

Crawling strategy

img/爬蟲爬取策略.png

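System analysis and design

System analysis

As the use case diagram shows, the system has three kinds of basic users: visitors, registered users, and administrators. Visitors can access the home page of the e-book recommendation platform, the user registration page, and the e-book pages. In addition to what visitors can do, registered users can rate and comment on e-books and receive e-book recommendations determined by their predicted interest. Administrators can periodically run the crawler module to update e-book information, and run the statistics module to update the category statistics, the e-book co-occurrence matrix, and the e-book cosine similarity matrix.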

![User Case.png](/img/User Case.png)

System design

Database tables

img/ER圖.png

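ebook: book details
ratinglist: list of ratings and comments
subclassify: book sub-categories
user: user information
classifymainstatistics: main-category statistics
recommendhomepage: books recommended on the home page (to be decided)
favorite: users' favorite categories
matrixC: co-occurrence matrix C and cosine similarity matrix W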

Classes and main related files

The symbol → marks a class that is a module run manually at regular intervals.

com.controller

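EBookController: book-related endpoints
RatingListController: endpoints for user comments and ratings
StatisticsController: statistics module endpoints
UserController: user-related endpoints
FavoriteController: endpoints for users' favorite categories
ClassifyMainStatisticsController: endpoints for category statistics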

com.dao

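DBAccess: obtains the SqlSession database connection
EBookDao: access to the book table
RatingListDao: access to the rating and comment table
UserDao: access to the user table
FavoriteDao: access to the users' favorite categories table
ClassifyMainStatisticsDao: access to the category statistics tables
MatrixCDao: access to the co-occurrence matrix C and the cosine similarity matrix W
EBookMapper.xml: SQL statements for accessing the book table
RatingListMapper.xml: SQL statements for accessing the rating and comment table
UserMapper.xml: SQL statements for accessing the user table
ClassifyMainStatisticsMapper.xml: SQL statements for accessing the category statistics tables, including the main-category statistics table
FavoriteMapper.xml: SQL statements for accessing the users' favorite categories table
MatrixCMapper.xml: SQL statements for accessing the co-occurrence matrix C and the cosine similarity matrix W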

com.entity

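EBook: book -> ebook table
RatingList: ratings and comments -> ratingList table
ClassifyMainStatistics: main-category statistics -> classifyMainStatistics table
User: user -> user table
Favorite: users' favorite categories -> favorite table
MatrixC: co-occurrence matrix C and cosine similarity matrix W -> matrixC table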

com.service

EBookService: book module
RatingListService: rating and comment module
UserService: user module
FavoriteService: users' favorite categories module
ClassifyMainStatisticsService: category statistics module

com.statistics

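→ ItemCollaborationFilter: computes the matrices used for item-based collaborative filtering recommendations
→ StatisticsClassifyMain: for each main category, computes the number of raters per score, the average rating, and the rating variance
→ StatisticsRatingValue: book rating statistics; aggregates the data in the RatingValue table and writes the results to the corresponding books in the EBook table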

com.util

ChartDataJsonCreater: converts data into the structured JSON format expected by the Chart.js plugin
JSONConverter: converts JavaBeans objects directly into structured JSON data
RandomNumFactory: random number utility class

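spider (crawler package)

BookInfoSpider: crawls e-book detail pages and stores them in the database
→ EBookListSpider: crawls all e-books
HttpURLConnectionUtil:
→ RatingValueListSpider: crawls Douban book rating lists and user comments (reads book information from the database, then crawls each book's rating list)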

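Configuration files

SSM framework configuration files

applicationContext.xml: Spring framework configuration file
jdbc.properties: JDBC configuration file
mybatis-config.xml: MyBatis framework configuration file
web.xml: project configuration file

Crawler configuration files

setting.properties: configuration file for the e-book crawler (EBookListSpider)
user_spider_setting.properties: configuration file for the user rating and comment crawler (RatingValueListSpider)
user_spider_exception.log: error log of the user rating and comment crawler (RatingValueListSpider)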

img/EMAN類結構.png

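Page list

index.jsp: home page
head.jsp: navigation bar
error.jsp: error message page (not yet implemented)
success.jsp: redirect page shown after a successful operation
/ebook (book-related pages)

list.jsp: book list for a main category
info.jsp: book details page
serrchResult.jsp: book search results page

/user (user-related pages)

login.jsp: user login
register.jsp: user registration
home.jsp: selection of the user's favorite categories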

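System implementation

Design and implementation of the algorithm (item-based collaborative filtering, ItemCF)

The item-based collaborative filtering algorithm has two main steps:

Compute the similarity between items.
Generate a recommendation list for the user from the item similarities and the user's history.

Let N(i) be the set of users who like item i, so that |N(i)| is the number of users who like item i and |N(i)⋂N(j)| is the number of users who like both item i and item j. The similarity between item i and item j is then: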

計算物品的相似度.png

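This formula has a flaw, however: when item j is very popular and almost everyone likes it, wij comes very close to 1. In other words, the formula gives many items a high similarity to popular items, so it can be improved as follows: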

計算物品相似度改進.png
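For readers without access to the two figures, the formulas can be written out as below. This is a reconstruction that assumes the figures show the standard ItemCF definitions, reading N(i) as the set of users who like item i and |·| as the size of a set; the second form matches the description in the comment of computerMatrixW later in this document.

```latex
w_{ij} = \frac{|N(i) \cap N(j)|}{|N(i)|}
\qquad \text{(original)}
\qquad\qquad
w_{ij} = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)|\,|N(j)|}}
\qquad \text{(improved)}
```

Dividing by the square root of |N(i)||N(j)| penalizes a very popular item j, since a large |N(j)| increases the denominator.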

Build the user-item inverted table (uppercase letters denote users and lowercase letters denote items):

物品倒排表.png

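Compute the co-occurrence matrix C (each entry of C is the number of users who like both items; it is computed from the user-item inverted table):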

共現矩陣.png

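As the figure shows, the diagonal entries of the co-occurrence matrix are all 0, and the matrix is real, symmetric, and sparse. The algorithm is implemented as follows: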

com.statistics.ItemCollaborationFilter

    /**
     * Compute the co-occurrence matrix C.
     */
    private void computerMatrixC() {
        // Build the user-item inverted table.
        // A book counts as liked (an occurrence) if the user rated it 4 or higher.
        List<User> allUser = userDao.queryAllUser();
        for (int i = 0; i < allUser.size(); i++) { // iterate over all users
            // Fetch this user's rating records with a rating >= 4
            List<RatingList> likeList = ratingListDao.selectRatingListByUidAndRatingValue(allUser.get(i).getUid(), 4);
            if (likeList.size() <= 1) { // the user likes at most one book, so there are no pairs
                continue;
            }

            for (int j = 0; j < likeList.size(); j++) { // count every pair of books in likeList into C
                for (int k = j + 1; k < likeList.size(); k++) {
                    int a = Integer.valueOf(likeList.get(j).getEid());
                    int b = Integer.valueOf(likeList.get(k).getEid());
                    // Build the key with the smaller id first
                    String key = null;
                    if (a < b) {
                        key = a + "," + b;
                    } else {
                        key = b + "," + a;
                    }
                    // Increment the count if the key already exists, otherwise start at 1
                    if (this.matrixC.get(key) != null) {
                        int value = this.matrixC.get(key);
                        this.matrixC.put(key, value + 1);
                    } else {
                        this.matrixC.put(key, 1);
                    }
                }
            }
            System.out.println("[" + df.format(new Date()) + "]" + "[done " + (i + 1) + " of " + allUser.size() + "]: records for user uid=" + allUser.get(i).getUid() + " processed, " + likeList.size() + " books");
        }
    }
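The counting step can also be tried in isolation, without the DAO layer. The sketch below is not part of the project: the class name, the `pairKey` and `cooccurrence` helpers, and the sample users and book ids are all hypothetical. It keeps the same "smaller id first" key scheme and uses `Map.merge` in place of the explicit null check.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CooccurrenceSketch {

    /** Build the pair key with the smaller id first, e.g. "3,7". */
    static String pairKey(int a, int b) {
        return a < b ? a + "," + b : b + "," + a;
    }

    /**
     * Count, for every unordered pair of books, how many users like both.
     * likedByUser maps a (hypothetical) user id to the ids of the books that user likes.
     */
    static Map<String, Integer> cooccurrence(Map<String, List<Integer>> likedByUser) {
        Map<String, Integer> matrixC = new TreeMap<>();
        for (List<Integer> liked : likedByUser.values()) {
            for (int j = 0; j < liked.size(); j++) {
                for (int k = j + 1; k < liked.size(); k++) {
                    // merge() stores 1 for a new key, otherwise adds 1 to the existing count
                    matrixC.merge(pairKey(liked.get(j), liked.get(k)), 1, Integer::sum);
                }
            }
        }
        return matrixC;
    }

    public static void main(String[] args) {
        // Made-up sample data: three users and the books each one likes
        Map<String, List<Integer>> likedByUser = new LinkedHashMap<>();
        likedByUser.put("A", Arrays.asList(1, 2, 4));
        likedByUser.put("B", Arrays.asList(2, 3));
        likedByUser.put("C", Arrays.asList(1, 2));
        System.out.println(cooccurrence(likedByUser));
    }
}
```

Storing only the pairs that actually occur, keyed by a string, is one way to hold a sparse symmetric matrix: each unordered pair appears once, and pairs that never co-occur take no space.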

Counting gives the number of occurrences of each item:

物品出現次數.png

Compute the cosine similarity matrix W: applying the improved formula yields the cosine similarity matrix.

餘弦相似度矩陣.png

The algorithm is implemented as follows:

com.statistics.ItemCollaborationFilter

    /**
     * Compute one entry of the cosine similarity matrix W.
     * Method: the numerator is the value stored in matrix C for the pair;
     * the denominator is the square root of the product of the numbers of
     * users who like the two books named in the key.
     */
    private Double computerMatrixW(String eida, String eidb, int value) {
        DecimalFormat df = new DecimalFormat("#.##");
        // Count how many users like each book; the ids are bound as parameters
        // rather than concatenated into the SQL text
        String sql = "select count(rid) from ratinglist where eid = ? and ratingValue >= 4";
        try (PreparedStatement statement = conn.prepareStatement(sql)) {
            statement.setString(1, eida);
            ResultSet rs = statement.executeQuery();
            rs.next();
            int likeANum = rs.getInt(1);
            statement.setString(1, eidb);
            rs = statement.executeQuery(); // re-executing closes the previous ResultSet
            rs.next();
            int likeBNum = rs.getInt(1);
            // Guard against division by zero
            if (likeANum == 0)
                likeANum = 1;
            if (likeBNum == 0)
                likeBNum = 1;
            // Compute; the cast to double keeps the product from overflowing int
            Double answer = value / Math.sqrt((double) likeANum * likeBNum);
            // Round to two decimal places
            Double result = Double.parseDouble(df.format(answer));
            // Return the result
            return result;
        } catch (SQLException e) {
            e.printStackTrace();
        }
        return null;
    }
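The arithmetic of the improved formula can be checked separately from the database queries. The sketch below is an illustration only: the class name, the `similarity` helper, and the sample counts are hypothetical, and it rounds with `Math.round` where the project method uses `DecimalFormat`, so results may differ in rare tie cases.

```java
public class CosineSimilaritySketch {

    /**
     * Improved ItemCF similarity: the co-occurrence count divided by the
     * square root of the product of the two like counts.
     */
    static double similarity(int cooccurrence, int likesA, int likesB) {
        // Treat a zero like count as 1 to avoid dividing by zero
        int a = Math.max(likesA, 1);
        int b = Math.max(likesB, 1);
        // Cast to double before multiplying so large counts cannot overflow int
        double raw = cooccurrence / Math.sqrt((double) a * b);
        // Round to two decimal places
        return Math.round(raw * 100.0) / 100.0;
    }

    public static void main(String[] args) {
        // Made-up counts: 2 users like both books, 4 users like each book
        System.out.println(similarity(2, 4, 4));
        // Made-up counts: 1 user likes both, 2 like the first book, 8 like the second
        System.out.println(similarity(1, 2, 8));
    }
}
```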

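Which items are finally recommended is determined by the predicted interest.

Predicted interest in item j = the sum, over every item i the user likes, of (the user's interest in item i × the similarity between item i and item j).

For example, suppose a user likes items a, b, and c, with interest 1, 2, and 2 respectively. The predicted interest in items c, d, e, and f is then: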

c:1×0.58+2×0 +2×0=0.58
d:1×1.0+2×1.44+2×0.45=4.78
e:1×0+2×0 +2×0=0.0
f:1×0+2×0.35+2×0=0.70
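The four sums above can be reproduced with a few lines of code. This is a minimal sketch: the class name and the `predict` helper are hypothetical, and the interest and similarity values are simply copied from the worked example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PredictedInterestSketch {

    /** Sum of (interest in liked item i) x (similarity between item i and the candidate). */
    static double predict(double[] interest, double[] similarityToLiked) {
        double sum = 0.0;
        for (int i = 0; i < interest.length; i++) {
            sum += interest[i] * similarityToLiked[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Interest in the liked items a, b, c, copied from the example
        double[] interest = {1, 2, 2};
        // Similarity of each candidate to a, b, c, copied from the example
        Map<String, double[]> candidates = new LinkedHashMap<>();
        candidates.put("c", new double[] {0.58, 0, 0});
        candidates.put("d", new double[] {1.0, 1.44, 0.45});
        candidates.put("e", new double[] {0, 0, 0});
        candidates.put("f", new double[] {0, 0.35, 0});
        for (Map.Entry<String, double[]> entry : candidates.entrySet()) {
            System.out.printf("%s: %.2f%n", entry.getKey(), predict(interest, entry.getValue()));
        }
    }
}
```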

Item d should therefore be recommended to this user. The algorithm is implemented as follows:

    @Override
    public List<EBook> userRecommendedList(String uid) {
        // Fetch the books the user likes (rating >= 4)
        List<RatingList> likeList = this.ratingListDao.selectRatingListByUidAndRatingValue(uid, 4);
        // debug
        System.out.println("uid=" + uid + " liked books");
        for (RatingList r : likeList) {
            System.out.println(r.getEid() + "," + r.getRatingValue());
        }
        System.out.println("likeList.size=" + likeList.size());
        // Working matrix: one column per liked book,
        // one row per book that co-occurs with a liked book
        List<Item> matrix = new ArrayList<>();
        for (int i = 0; i < likeList.size(); i++) { // iterate over the liked books
            RatingList temp = likeList.get(i);
            // Fetch the books that co-occur with this liked book
            List<MatrixC> itemList = this.matrixCDao.selectMatrixCByEidAOrEidB(temp.getEid(), temp.getEid());
            for (int j = 0; j < itemList.size(); j++) {
                MatrixC c = itemList.get(j);
                // Pick the co-occurring book's eid out of the matrixC key
                String sEid = null;
                if (c.getEida().equals(temp.getEid())) {
                    sEid = c.getEidb();
                } else {
                    sEid = c.getEida();
                }
                // Find the row for the co-occurring book by comparing eids
                Item row = null;
                for (Item item : matrix) {
                    if (item.eid.equals(sEid)) {
                        row = item;
                        break;
                    }
                }
                if (row == null) { // no row for this book yet: add one
                    row = new Item(sEid, new double[likeList.size()]);
                    matrix.add(row);
                }
                // Cell = cosine similarity * how much the user likes the book (rating 4 -> 1, rating 5 -> 2)
                row.col[i] = c.getCos_similarity() * (temp.getRatingValue() - 3);
            }
        }
        // Compute the predicted interest of each row
        for (int i = 0; i < matrix.size(); i++) {
            Item item = matrix.get(i);
            double interestValue = 0;
            for (int j = 0; j < item.col.length; j++) {
                interestValue += item.col[j];
            }
            item.interestValue = interestValue;
        }
        // Sort by predicted interest
        Collections.sort(matrix);
        // Build the list of recommended books
        List<EBook> resultList = new ArrayList<>();
        for (int i = 0; i < matrix.size() && i < 20; i++) { // at most the top 20 books
            if (matrix.get(i).interestValue > 0) {
                EBook eBook = this.eBookDao.queryEBookByEid(matrix.get(i).eid);
                resultList.add(eBook);
                // debug
                System.out.println(matrix.get(i).eid + "," + eBook.getEname() + ",interestValue=" + matrix.get(i).interestValue);
            }
        }
        return EBookServiceImpl.initEBookImgAddress(resultList);
    }
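The `Item` helper used by this method is not shown in the document. A minimal sketch of what it might look like follows, inferred only from how the method uses it: the field names `eid`, `col`, and `interestValue` and the two-argument constructor appear in the code above, while the descending `compareTo` and the `rank` helper are assumptions made here so that `Collections.sort` puts the most interesting book first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ItemSketch {

    /** Hypothetical stand-in for the Item helper used by userRecommendedList. */
    static class Item implements Comparable<Item> {
        final String eid;      // id of the candidate book
        final double[] col;    // one weighted similarity per liked book
        double interestValue;  // predicted interest (sum of col)

        Item(String eid, double[] col) {
            this.eid = eid;
            this.col = col;
        }

        @Override
        public int compareTo(Item other) {
            // Reversed comparison (an assumption): highest interest sorts first
            return Double.compare(other.interestValue, this.interestValue);
        }
    }

    /** Fill in interestValue for every row, then sort the rows by it. */
    static void rank(List<Item> matrix) {
        for (Item item : matrix) {
            double sum = 0.0;
            for (double v : item.col) {
                sum += v;
            }
            item.interestValue = sum;
        }
        Collections.sort(matrix);
    }

    public static void main(String[] args) {
        // Made-up rows for two candidate books
        List<Item> matrix = new ArrayList<>();
        matrix.add(new Item("x", new double[] {0.58, 0, 0}));
        matrix.add(new Item("y", new double[] {1.0, 2.88, 0.9}));
        rank(matrix);
        for (Item item : matrix) {
            System.out.println(item.eid + " interestValue=" + item.interestValue);
        }
    }
}
```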

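The recommendation module takes the crawled data as input and writes its results to the matrixC table. The computation runs in two phases: the first computes the co-occurrence matrix C, and the second computes the cosine similarity w for every pair of co-occurring e-books. Recommendations based on a user's predicted interest, by contrast, are computed in real time when the user visits a page, because the set of e-books a user likes changes continuously and precomputing recommendations for all users would be too expensive. In repeated tests, the average user wait time was within an acceptable range.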