Abstract
Block matrix-based MapReduce PageRank algorithm web structure mining applied effect research
Weizhong Yan
Web pages contain not only text but also hyperlinks that point from one page to another, and these hyperlinks carry implicit annotations. The large volume of hyperlink information on the Web reveals the relevance, quality, and structure of page content, reflecting containment, citation, or affiliation relations between documents. Web structure mining derives knowledge from the organizational structure of the World Wide Web and from the link relations among pages. In information retrieval, pages with high authority and pivot (hub) scores can be regarded as high-quality pages and offered to users first during a search; in the same way, analyzing the topology of hyperlinks makes it possible to discover network communities and to construct a directed graph for a set of search results or a specified set of pages. After introducing the Web structure graph, this paper analyzes the merits of applying the PageRank algorithm and then studies a block matrix-based MapReduce PageRank algorithm. The method uses block matrices to reduce the time spent in the mixed phase and rank phase of each iteration, so that each iteration executes only one MapReduce phase. The paper compares this algorithm with two other algorithms and finds that it is superior in running time, which provides a theoretical basis for Web structure mining techniques.
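As a rough illustration of the block-matrix idea described above, the following sketch simulates one map step and one reduce step per PageRank iteration in plain Python. It is not the paper's implementation: the function names (`build_blocks`, `map_phase`, `reduce_phase`, `pagerank`), the block size, the damping factor of 0.85, and the in-memory "MapReduce" are all assumptions made for illustration, and the sketch assumes every page has at least one outgoing link.

```python
from collections import defaultdict


def build_blocks(edges, block_size):
    """Group the entries of the column-stochastic link matrix into
    sparse blocks keyed by (block_row, block_col). Hypothetical helper."""
    out_degree = defaultdict(int)
    for src, _ in edges:
        out_degree[src] += 1
    blocks = defaultdict(list)
    for src, dst in edges:
        key = (dst // block_size, src // block_size)
        # Entry M[dst][src] = 1 / out_degree(src).
        blocks[key].append((dst, src, 1.0 / out_degree[src]))
    return blocks


def map_phase(blocks, rank):
    """Map: multiply each matrix block by the matching slice of the rank
    vector and emit (block_row, partial sums)."""
    for (block_row, _), entries in blocks.items():
        partial = defaultdict(float)
        for dst, src, weight in entries:
            partial[dst] += weight * rank[src]
        yield block_row, partial


def reduce_phase(mapped, n, damping):
    """Reduce: group partial sums by block row, add them up, and apply
    the damping factor."""
    grouped = defaultdict(list)
    for block_row, partial in mapped:
        grouped[block_row].append(partial)
    new_rank = [(1.0 - damping) / n] * n
    for partials in grouped.values():
        for partial in partials:
            for dst, value in partial.items():
                new_rank[dst] += damping * value
    return new_rank


def pagerank(edges, n, block_size=2, damping=0.85, iterations=50):
    """Run a fixed number of iterations, each made of one map step and
    one reduce step. Assumes no dangling pages."""
    blocks = build_blocks(edges, block_size)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        rank = reduce_phase(map_phase(blocks, rank), n, damping)
    return rank


if __name__ == "__main__":
    # Small hypothetical link graph: (source page, target page).
    edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
    for page, score in enumerate(pagerank(edges, n=4)):
        print(page, round(score, 4))
```

Because each block's partial product depends only on that block and on one slice of the rank vector, the map tasks are independent, and a single grouping by block row is enough to assemble the next rank vector.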