At the 2010 Joint Statistical Meetings (JSM) in Vancouver, statisticians from around the world will report on and discuss the statistical implications of a variety of topics, including the Netflix Prize, Google Trends, sexual orientation issues, global warming, the Haitian earthquake, cyber security, and the shelf life of drugs, the American Statistical Association (ASA) said today. JSM, the world’s largest annual gathering of statisticians, is expected to draw about 5,500 statistics experts from government, industry and academia to the Vancouver Convention Centre July 31 to August 5.

“This is an exciting time to be a statistician,” said ASA President Sastry Pantula. “Massive amounts of data are being collected in every field of science, and decisions, discoveries and policies are being enabled by extracting important information from data. Statisticians are playing a key role in advancing science and in innovation. JSM provides statisticians the opportunity to share their research and findings in a greatly diverse variety of scientific areas.”

A small sample of the approximately 2,500 JSM session topics appears below; additional sessions of interest can be found in the online program, where you can search by keyword, presenter’s name or affiliation, and activity number.

Behind the Scenes at the Netflix Prize - Activity #94
Presenter: Robert Bell, AT&T Labs (one of the winners)
In October 2006, the DVD rental company Netflix released more than 100 million user ratings of movies for a competition to predict users' ratings based on prior ratings. The discussion leader was one of two statisticians on a multi-disciplinary team that won a $1,000,000 prize after 33 months by achieving a ten percent reduction in root mean squared prediction error relative to Netflix's current algorithm. Discussion topics may include models with more parameters than observations, contrasting perspectives of statisticians and computer scientists, the role of prizes as a way to advance science, and a race to the finish line that you would not believe in a movie.

The Netflix Prize Story: Methods and Madness - Activity #211
Presenters: Chris Volinsky and Robert Bell, AT&T Labs
In October 2006, Netflix released a data set of 100 million movie ratings as part of a $1 million competition to improve its online system for recommending movies. The competition generated rabid interest from parts of the statistics and data mining communities. Three years later, the competition ended and the prize was awarded to a multinational team of researchers who had mostly never met each other. This talk is from a member of that team, and will document some of the stories along the way: new methodologies for collaborative filtering, strategies for ensembles of models, optimizing team collaborations, and the overall role of contests and crowdsourcing for science.

Predicting the Present with Google Trends – Activity #364
Presenter: Hal R. Varian, Google
It is now possible to acquire real-time information on economic variables using various commercial sources. I illustrate how one can use Google Trends data to measure the state of the economy in various sectors, and discuss some of the ramifications for research and policy. Hal R. Varian is the Chief Economist at Google. He started in May 2002 as a consultant and has been involved in many aspects of the company, including auction design, econometrics, finance, corporate strategy and public policy. He also holds academic appointments at the University of California, Berkeley in three departments: business, economics, and information management. He received his S.B. degree from MIT in 1969 and his MA and Ph.D. from UC Berkeley in 1973. Professor Varian has published numerous papers in economic theory, econometrics, industrial organization, public finance, and the economics of information technology.

Measurement Issues and the Gay/Lesbian/Bisexual/Transgender Population – Activity #209
Presenter: Nancy Bates, U.S. Census Bureau
Reliable data on marriage, divorce and family composition are essential for measuring poverty and the employment and earnings situations of families; developing and evaluating health care, tax, and other policies; and addressing other issues of national and local concern. Without accurate measures, analyses may be incorrect or policies may be misdirected. The legal landscape around same-sex marriage has been changing rapidly. Since 2004, same-sex marriage has been made legal in several states, and other states have legalized civil unions or created domestic partnership registries. This roundtable discussion centers on the survey measurement issues related to Gay/Lesbian/Bisexual/Transgender individuals. Measurement issues include how to measure relationship status, family status, and health issues.

Analysis of Global Warming Data: A Contrarian Data-Based View – Activity #486
Presenter: Edward J. Wegman, George Mason University
The so-called Hockey Stick paleoclimate temperature reconstruction was a major feature of the Intergovernmental Panel on Climate Change (IPCC) 2001 Third Assessment Report. Drs. David Scott, Yasmin Said and I were asked to assemble a report to the House Committee on Energy and Commerce concerning the mathematical correctness of the principal components procedure that was used to develop the Hockey Stick. The principal components procedure was not done correctly, and both empirical and theoretical analyses showed the flaws. While our testimony was limited to the statistical validity of this particular graphic, it was assumed by many and widely reported in the press that we were arguing against anthropogenically induced global warming. Indeed, politically motivated personal attacks were made. This talk will review the data issues and report on our experiences.

The Philosophy and Intent of Stability Shelf Life – Activity #17
Presenter: David Christopher, Merck & Co., Inc.
Expanding the discussion of the current definition of shelf life required addressing a key problem: in clinical research and development, a pharmaceutical product is generally judged on the basis of mean response, while commercial batches are essentially judged by individual results. Perceived expectations of the consumer were also considered. A statistical methodology for estimating shelf life must be developed with attention to the issues and challenges of assessing and managing risk relative to different quality standards.

Statistics and Cyber Security: Understanding the Emerging Threat - Activity #494

  • Graph Anomalies in Cyber Communication — Scott Vander Wiel, Los Alamos National Laboratory; Curtis Storlie, Los Alamos National Laboratory
  • Graph-Based Network Anomaly Detection — Joshua Charles Neil, Los Alamos National Laboratory; Mike Fisk, Los Alamos National Laboratory; Curtis Storlie, University of New Mexico; Alexander Brugh, Los Alamos National Laboratory
  • When Science Meets Security — Roy Maxion, Carnegie Mellon University
  • Emerging Research Challenges in Cyber Security — Deborah Frincke, Pacific Northwest National Laboratory

Statistics Without Borders Post-Earthquake Efforts in Haiti - Activity #550

  • Considerations in the Study Design of a Mobile Phone Survey of the Haitian Population — James D. Ashley, U.S. Government Accountability Office (U.S. GAO); Fritz Scheuren, NORC
  • Survey Administration in the Wake of a Natural Disaster — Justin S. Fisher, U.S. Government Accountability Office
  • Overview of the Results of the Survey and Lessons Learned — Jean G. Orelien, SciMetrika, LLC

The Use of Statistics to Inform Economic Policy - Activity #471

  • The Effect of Weights and Sampling Plans on Price Index Estimates: A Simulation Study — Daniele Toninelli, University of Bergamo; Zdenek Patak, Statistics Canada
  • Estimation and Comparison of Seasonally Adjusted CPI-U Standard Errors with Official CPI-U Standard Errors — Owen Shoemaker, Bureau of Labor Statistics
  • Comparison of Variance Estimation Methods Using PPI Data — Andy Sadler, Bureau of Labor Statistics; Helen Chen, Bureau of Labor Statistics
  • Studying Simulated Mass Layoff Events and Employment/Unemployment Data with Factor Analysis, Multiple Regression, and Bayesian Methods — Zhe (Jason) Liu, Iowa State University; Mack Shelley, Iowa State University
  • Using Worker Flows in the Analysis of Establishment Turnover: Evidence from Germany — Tanja Hethey, Institute for Employment Research; Johannes F. Schmieder, Columbia University
  • Defining an Outlet: What Characteristics Are Truly Price Determining? — Sara Stanley, Bureau of Labor Statistics

Can You Maintain Confidentiality and Have Useful Data at the Same Time? - Activity #109
Panelists: Michael Link, The Nielsen Company; Jennifer Madans, National Center for Health Statistics; Elaine Murakami, Federal Highway Administration; Marilyn Seastrom, National Center for Education Statistics; and John Thompson, NORC
One of the largest issues facing those in the public and private sectors who collect survey and census data is the need to maintain confidentiality while delivering useful information. Clearly public trust has declined substantially in the past few decades. At the same time, the need to better understand the economy, our social structure, and medical information for health purposes has increased dramatically. The panelists will explore how confidentiality is maintained under various scenarios of data collection. These will include the implications of data linking; the need to protect DNA information; analyzing small area data; federal statistical agency cooperation; and using secure remote access locations.

