George Karypis

University of Minnesota

ORCID: 0000-0003-2753-1437

Publishes on Data Mining Algorithms and Applications, Advanced Graph Neural Networks, Recommender Systems and Techniques. 724 papers and 63.4k citations.

Top publications by citations

Item-based collaborative filtering recommendation algorithms
Cited by 9k

Recommender systems apply knowledge discovery techniques to the problem of making personalized recommendations for information, products or services during a live interaction. These systems, especially the k-nearest neighbor collaborative filtering based ones, are achieving widespread success on the Web. The tremendous growth in the amount of available information and the number of visitors to Web sites in recent years poses some key challenges for recommender systems. These are: producing high quality recommendations, performing many recommendations per second for millions of users and items and achieving high coverage in the face of data sparsity. In traditional collaborative filtering systems the amount of work increases with the number of participants in the system. New recommender system technologies are needed that can quickly produce high quality recommendations, even for very large-scale problems. To address these issues we have explored item-based collaborative filtering techniques. Item-based techniques first analyze the user-item matrix to identify relationships between different items, and then use these relationships to indirectly compute recommendations for users. In this paper we analyze different item-based recommendation generation algorithms. We look into different techniques for computing item-item similarities (e.g., item-item correlation vs. cosine similarities between item vectors) and different techniques for obtaining recommendations from them (e.g., weighted sum vs. regression model). Finally, we experimentally evaluate our results and compare them to the basic k-nearest neighbor approach. Our experiments suggest that item-based algorithms provide dramatically better performance than user-based algorithms, while at the same time providing better quality than th...
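
The two steps named in this abstract (cosine similarity between item vectors, then a weighted-sum prediction) can be illustrated with a minimal sketch. This is not the paper's implementation: the toy `ratings` data, the function names, and the choice to treat unrated entries as zero are all assumptions made for illustration.

```python
import math

# Hypothetical toy data: user -> {item: rating}. Missing entries mean "unrated".
ratings = {
    "alice": {"A": 5, "B": 3},
    "bob": {"A": 4, "B": 2, "C": 4},
    "carol": {"B": 5, "C": 1},
}

def cosine_item_similarity(ratings, i, j):
    """Cosine similarity between the rating vectors of items i and j.

    Assumption: an unrated entry counts as 0 in the item vector.
    """
    dot = sum(r[i] * r[j] for r in ratings.values() if i in r and j in r)
    norm_i = math.sqrt(sum(r[i] ** 2 for r in ratings.values() if i in r))
    norm_j = math.sqrt(sum(r[j] ** 2 for r in ratings.values() if j in r))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)

def predict_weighted_sum(ratings, user, target):
    """Predict the user's rating of `target` as a similarity-weighted
    average of the user's ratings on the other items."""
    numerator = 0.0
    denominator = 0.0
    for item, rating in ratings[user].items():
        if item == target:
            continue
        sim = cosine_item_similarity(ratings, item, target)
        numerator += sim * rating
        denominator += abs(sim)
    # None signals that no rated item is similar to the target.
    return numerator / denominator if denominator else None

prediction = predict_weighted_sum(ratings, "alice", "C")
```

Because the similarities here are non-negative, the prediction for `alice` is a weighted average of her existing ratings and therefore lies between them.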

A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs
George Karypis, Vipin Kumar|SIAM Journal on Scientific Computing|1998
Cited by 5.7k

Recently, a number of researchers have investigated a class of graph partitioning algorithms that reduce the size of the graph by collapsing vertices and edges, partition the smaller graph, and then uncoarsen it to construct a partition for the original graph [4, 26]. From the early work it was clear that multilevel techniques held great promise; however, it was not known if they can be made to consistently produce high quality partitions for graphs arising in a wide range of application domains. We investigate the effectiveness of many different choices for all three phases: coarsening, partition of the coarsest graph, and refinement. In particular, we present a new coarsening heuristic (called heavy-edge heuristic) for which the size of the partition of the coarse graph is within a small factor of the size of the final partition obtained after multilevel refinement. We also present a much faster variation of the Kernighan-Lin algorithm for refining during uncoarsening. We tes...
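
As a rough illustration of the coarsening phase, the sketch below computes a heavy-edge matching: each unmatched vertex is paired with the unmatched neighbor joined to it by the heaviest edge, and matched pairs would then be collapsed to form the coarser graph. The adjacency-dict representation, the function name, and the `order` and `seed` parameters are assumptions for illustration, not the interface of any released partitioning library.

```python
import random

# Hypothetical weighted path graph: a -(1)- b -(5)- c -(1)- d
graph = {
    "a": {"b": 1},
    "b": {"a": 1, "c": 5},
    "c": {"b": 5, "d": 1},
    "d": {"c": 1},
}

def heavy_edge_matching(adj, order=None, seed=0):
    """Return a dict mapping each vertex to its match (or to itself).

    adj: vertex -> {neighbor: edge weight}, assumed symmetric.
    order: optional visiting order; a seeded random order is used if omitted.
    """
    if order is None:
        order = list(adj)
        random.Random(seed).shuffle(order)
    match = {}
    for u in order:
        if u in match:
            continue
        # Unmatched neighbors of u, as (weight, vertex) pairs.
        candidates = [(w, v) for v, w in adj[u].items()
                      if v not in match and v != u]
        if candidates:
            _, v = max(candidates, key=lambda c: c[0])  # heaviest edge wins
            match[u] = v
            match[v] = u
        else:
            match[u] = u  # no free neighbor: u stays unmatched
    return match

matching = heavy_edge_matching(graph, order=["b", "a", "c", "d"])
```

Visiting `b` first pairs it with `c` through the weight-5 edge, so the heavy edge disappears inside a coarse vertex and only light edges remain to be cut.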

A Comparison of Document Clustering Techniques
Michael Steinbach, George Karypis, Vipin Kumar|University of Minnesota Digital Conservancy (University of Minnesota)|2000
Cited by 2.5k | Open Access

This paper presents the results of an experimental study of some common document clustering techniques. In particular, we compare the two main approaches to document clustering, agglomerative hierarchical clustering and K-means. (For K-means we used a "standard" K-means algorithm and a variant of K-means, "bisecting" K-means.) Hierarchical clustering is often portrayed as the better quality clustering approach, but is limited because of its quadratic time complexity. In contrast, K-means and its variants have a time complexity which is linear in the number of documents, but are thought to produce inferior clusters. Sometimes K-means and agglomerative hierarchical approaches are combined so as to "get the best of both worlds." However, our results indicate that the bisecting K-means technique is better than the standard K-means approach and as good or better than the hierarchical approaches that we tested for a variety of cluster evaluation metrics. We propose an explanation for these results that is based on an analysis of the specifics of the clustering algorithms and the nature of document data.
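
A minimal sketch of the bisecting idea follows: start with one cluster and repeatedly split a chosen cluster in two with basic K-means (K = 2) until the desired number of clusters is reached. Several simplifications here are assumptions rather than the paper's setup: points are plain numeric tuples compared by squared Euclidean distance instead of document vectors, the largest cluster is always the one split, and each split uses a single 2-means run.

```python
import random

def _mean(points):
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

def _dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def two_means(points, rng, iters=20):
    """Split points into two groups with basic K-means (K=2).

    Returns (left, right), or None if no valid split was found.
    """
    centers = rng.sample(points, 2)
    split = None
    for _ in range(iters):
        left, right = [], []
        for p in points:
            nearer_left = _dist2(p, centers[0]) <= _dist2(p, centers[1])
            (left if nearer_left else right).append(p)
        if not left or not right:
            break  # degenerate assignment: keep the last valid split
        split = (left, right)
        centers = [_mean(left), _mean(right)]
    return split

def bisecting_kmeans(points, k, seed=0):
    """Repeatedly bisect the largest cluster until there are k clusters."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=len)
        largest = clusters[-1]
        if len(largest) < 2:
            break  # nothing left to split
        split = two_means(largest, rng)
        if split is None:
            break
        clusters.pop()
        clusters.extend(split)
    return clusters

# Hypothetical toy data: eight distinct 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (21, 0)]
clusters = bisecting_kmeans(points, k=3)
```

Each bisection costs time linear in the size of the cluster being split, which is where the linear-time behavior attributed to K-means and its variants comes from.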

Interferon-inducible gene expression signature in peripheral blood cells of patients with severe lupus
Emily C. Baechler, Franak Batliwalla, George Karypis et al.|Proceedings of the National Academy of Sciences|2003
Cited by 2.2k

Systemic lupus erythematosus (SLE) is a complex, inflammatory autoimmune disease that affects multiple organ systems. We used global gene expression profiling of peripheral blood mononuclear cells to identify distinct patterns of gene expression that distinguish most SLE patients from healthy controls. Strikingly, about half of the patients studied showed dysregulated expression of genes in the IFN pathway. Furthermore, this IFN gene expression "signature" served as a marker for more severe disease involving the kidneys, hematopoietic cells, and/or the central nervous system. These results provide insights into the genetic pathways underlying SLE, and identify a subgroup of patients who may benefit from therapies targeting the IFN pathway.

Item-based top-N recommendation algorithms
Mukund Deshpande, George Karypis|ACM Transactions on Information Systems|2004
Cited by 2.2k

The explosive growth of the world-wide-web and the emergence of e-commerce have led to the development of recommender systems, a personalized information filtering technology used to identify a set of items that will be of interest to a certain user. User-based collaborative filtering is the most successful technology for building recommender systems to date and is extensively used in many commercial recommender systems. Unfortunately, the computational complexity of these methods grows linearly with the number of customers, which in typical commercial applications can be several millions. To address these scalability concerns model-based recommendation techniques have been developed. These techniques analyze the user-item matrix to discover relations between the different items and use these relations to compute the list of recommendations. In this article, we present one such class of model-based recommendation algorithms that first determines the similarities between the various items and then uses them to identify the set of items to be recommended. The key steps in this class of algorithms are (i) the method used to compute the similarity between the items, and (ii) the method used to combine these similarities in order to compute the similarity between a basket of items and a candidate recommender item. Our experimental evaluation on eight real datasets shows that these item-based algorithms are up to two orders of magnitude faster than the traditional user-neighborhood based recommender systems and provide recommendations with comparable or better quality.
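
Step (ii), combining item similarities into a score between a basket and each candidate item, might look like the sketch below. It assumes the item-item similarities from step (i) are already available in a hypothetical `item_sim` table, and it combines them by simple summation; the function name, the table, and its values are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical precomputed similarities: item -> {similar item: similarity}.
item_sim = {
    "bread": {"butter": 0.8, "jam": 0.5},
    "milk": {"butter": 0.3, "cereal": 0.6},
}

def top_n(basket, sim, n):
    """Score each candidate by summing its similarity to every basket item,
    then return the n highest-scoring candidates not already in the basket."""
    scores = defaultdict(float)
    for item in basket:
        for candidate, s in sim.get(item, {}).items():
            if candidate not in basket:
                scores[candidate] += s
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))[:n]

recommended = top_n({"bread", "milk"}, item_sim, n=2)
```

Since the similarity table is built once ahead of time, producing a recommendation list only touches the neighbors of the basket items rather than every user, which is consistent with the scalability argument in the abstract.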