Cornell Theory Center Aids Social Science Researchers

A team at the Cornell University Theory Center is developing software cyberinfrastructure tools to obtain and examine information from large internet data collections.

Ithaca, NY (PRWEB) June 19, 2006 -- The Cornell Theory Center (CTC) is supporting a two-year National Science Foundation (NSF)-funded cyberinfrastructure project (2006-2008) entitled “Very Large Semi-Structured Datasets for Social Science Research.” CTC is an interdisciplinary research center at Cornell University focused on providing cyberinfrastructure resources for research and education. The project is also part of a three-year Cornell-funded initiative entitled “Getting Connected: Social Science in the Age of Networks.” A team at CTC is developing software cyberinfrastructure tools to obtain and examine information from large internet data collections. The rapid growth of electronic and internet communication networks has created vast amounts of data whose enormous potential for basic and applied investigations in the social sciences has thus far gone largely untapped.

In phase one, CTC is building software cyberinfrastructure tools to copy and rearrange the “Internet Archive” (www.archive.org) collection of Web “snapshots” taken every two months since 1996. The archive will be reconfigured as a database that makes very large online network data accessible to social science researchers at Cornell and elsewhere.

Cornell researchers involved in the endeavor include computer science (CS) professors William Arms, Jon Kleinberg, Daniel Huttenlocher, and Johannes Gehrke, who is also Associate Director of CTC. Also on the project team are Professor Michael Macy, department chair of sociology; David Strang, also in sociology; and Geri Gay, chair of the communication department and professor of information science in computing and information science (CIS).

The Internet Archive is a non-profit organization, started by Brewster Kahle, that is preserving a record of the internet by capturing snapshots of 55 billion web pages. CTC will transfer these pages from the Internet Archive servers to a computer server at CTC. The plan is to have 30% of that data (about 200 terabytes) transferred by 2008. As the data streams from the Internet Archive to the server, it passes through a parsing pipeline developed by the research team and the Theory Center. The pipeline separates each page into its URL information, its content, and its links. Using cyberinfrastructure tools developed by the research group and CTC, social science researchers will then be able to scrutinize and manipulate each piece of this data to study the internet networks. This will allow them to validate theoretical models and help identify new trends.
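The separation step described above can be illustrated with a minimal sketch. It is not the pipeline built by the research team and CTC, whose design is not described here; it only assumes that each snapshot arrives as a URL plus an HTML string. The names ParsedPage and parse_snapshot are hypothetical, and the example uses only the Python standard library.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlsplit


@dataclass
class ParsedPage:
    """Hypothetical record holding the three parts named in the release."""
    url: str                 # URL information
    host: str                # host portion of the URL
    text: str                # visible text content
    links: List[str] = field(default_factory=list)  # absolute outgoing links


class _Extractor(HTMLParser):
    """Collects visible text and href values from anchor tags."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.hrefs = []
        self._skip = 0  # nesting depth inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only non-empty text outside script and style elements.
        if not self._skip and data.strip():
            self.text_parts.append(data.strip())


def parse_snapshot(url, html):
    """Split one page snapshot into URL information, content, and links."""
    extractor = _Extractor()
    extractor.feed(html)
    extractor.close()
    # Resolve relative links against the page URL so they can be compared.
    links = [urljoin(url, href) for href in extractor.hrefs]
    return ParsedPage(
        url=url,
        host=urlsplit(url).netloc,
        text=" ".join(extractor.text_parts),
        links=links,
    )
```

Storing the links as absolute URLs alongside the source URL is one simple way to let the link structure be studied as a network separately from the page text.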

As Gehrke explains, “Previously if social scientists wanted to study a village they needed to go and live with the villagers. Today they can study human behavior and interactions in a new world of data via the internet.”

The data that CTC is transferring from the Internet Archive will allow social science researchers to study the Web as a social phenomenon. How does the Web play a role in the diffusion of ideas and innovations? In the spread of urban legends? As a source of information about contemporary social events? Previously, such studies could be based only on small, hand-coded samples. The transferred web data allows researchers to do large-scale studies and creates a highly convenient virtual laboratory, or “WebLab,” for the research. These tasks would not have been feasible without cybertools, because the internet data exists only as an archive of individual web pages, with no way to parse it into meaningful parts and structures.

Ultimately, CTC’s assistance with this first phase of the investigation will allow researchers to gain insight into social networks and help them develop more advanced tools for linking computational social scientists across disciplines and organizations.

In addition to using data from the Internet Archive, the WebLab team is using Web crawls, data collected from the Wayback Machine, and data from NetScan (http://netscan.research.microsoft.com), a Usenet analysis tool developed at Microsoft Research. These sources are used to build smaller, more focused datasets for studying specific networks, such as adolescent peer networks, and the relationship between individual attributes (e.g., personality, beliefs) and network position.

About CTC    
CTC is an interdisciplinary research center at Cornell University focused on providing cyberinfrastructure resources for research and education; these resources include high-performance and data-intensive computing hardware and expertise, visualization, and K-12 outreach. Scientific and engineering projects supported by CTC represent a wide variety of disciplines, including bioinformatics, behavioral and social sciences, computer science, engineering, geosciences, mathematics, physical sciences, and business.

###

CONTACT INFORMATION
Laura Cima
CORNELL UNIVERSITY
607-254-8757
ABOUT PRESS RELEASES
If you have any questions regarding information in these press releases please contact the company listed in the press release. Please do not contact PRWeb. We will be unable to assist you with your inquiry. PRWeb disclaims any content contained in these releases. Our complete disclaimer appears here.

© Copyright 1997-2008, Vocus PRW Holdings, LLC.
Vocus, PRWeb and Publicity Wire are trademarks or registered trademarks of Vocus, Inc. or Vocus PRW Holdings, LLC.