Distribution of words in all NIPS papers from 1988 to 2003, including the joint distribution of words and authors. This extends a database prepared by Sam Roweis, which used papers from NIPS 1-12 that were OCR'ed by Yann LeCun. Word occurrences were calculated from PDF or PS files, most of which are available on the NIPS web site. The data was processed with the help of Amir Globerson.

Distribution of words, documents and authors
Download the data in MATLAB (V6, R12) format (35 MB)
Format description
Data preprocessing description

Additional data
Please contact me by email (gal@ai.stanford.edu) if you need other files (e.g. raw text files).

Papers that use this data
P. Sarkar, S. M. Siddiqi and G. J. Gordon. Approximate Kalman Filters for Embedding Author-Word Co-occurrence Data over Time. ICML SNA 2006.
C. Elkan. Clustering Documents with an Exponential-Family Approximation of the Dirichlet Compound Multinomial Distribution. ICML 2006.
A. Globerson, G. Chechik, F. Pereira and N. Tishby. Euclidean Embedding of Co-occurrence Data. NIPS 2004, p. 497-504.