Metadata Technology Collaborating with Statistics Canada on New Data Repository for Research Data Centres

Metadata Technology North America Inc. is engaged in research and development of an iRODS-based Master Data Repository to support the management and delivery of microdata files to Canada Research Data Centres

Metadata Technology North America, Inc. is pleased to report that, in partnership with Breckenhill and The AIM Group, it has engaged in a project with Statistics Canada to establish a new RDC Master Data Repository. The repository's objective is to centrally administer survey datasets and documentation files and to automate their distribution to researchers across the Canadian Research Data Centre Network (CRDCN).

Managing thousands of data and documentation files is a challenge faced by most statistical data producers, archives, and research centers. Disk- or network-based file systems, traditionally used to store such resources, are ill-equipped to deliver a controlled environment capable of ensuring content consistency, supporting automation, enforcing business rules, maintaining linkages with reference metadata, or interacting with other infrastructure components. By leveraging innovative data grid and open source technologies, in particular the iRODS™ platform, we aim to design new solutions for the administration of large file collections that, combined with standards-driven metadata and security systems, will provide a comprehensive platform for statistical data and documentation file management.

The goal of this initial effort is to outline system requirements, architecture, and specifications in order to provide a road map for incremental implementation and to develop guidelines for integration with other infrastructure components and business processes. A key short-term objective is to automate the replication of Statistics Canada microdata master files to secure research centres across Canada. Enforcing business rules, such as file and folder naming conventions, is also a fundamental requirement. Machine-actionable metadata elements will be associated with resources stored in the system to support interaction with other internal components such as the Integrated MetadataBase (IMDB) or the Data Documentation Initiative (DDI) metadata repository. Integration with the LDAP-based user management and project databases is also anticipated. The platform will be implemented around the open source iRODS™ package, to be wrapped by a service-oriented architecture and complemented with administration tools and utilities. Together, these components will provide a comprehensive solution for managing and distributing file resources across the agency and research centres.
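
As a purely illustrative sketch of what enforcing a file naming convention and deriving machine-actionable metadata from it might look like, the short Python example below checks file names against a pattern and extracts their parts. The convention shown (`<survey>_<year>_<kind>.<ext>`), the pattern, and the function names are all hypothetical assumptions made for illustration; they are not Statistics Canada's actual rules, and the code is plain Python rather than the iRODS™ rule language.

```python
import re

# Hypothetical convention (an assumption, not an actual agency rule):
#   <survey>_<year>_<kind>.<ext>, for example "cchs_2010_master.csv"
NAME_PATTERN = re.compile(
    r"(?P<survey>[a-z]+)_(?P<year>\d{4})_(?P<kind>master|doc)\.(?P<ext>[a-z0-9]+)"
)


def check_name(filename):
    """Return the parsed name parts as a dict, or None if the name breaks the convention."""
    match = NAME_PATTERN.fullmatch(filename)
    return match.groupdict() if match else None


def split_by_convention(filenames):
    """Separate names into (accepted, rejected) lists, preserving input order."""
    accepted, rejected = [], []
    for name in filenames:
        if check_name(name) is not None:
            accepted.append(name)
        else:
            rejected.append(name)
    return accepted, rejected


if __name__ == "__main__":
    ok, bad = split_by_convention(["cchs_2010_master.csv", "CCHS 2010.csv"])
    print("accepted:", ok)
    print("rejected:", bad)
```

In a sketch like this, the parts returned by `check_name` (survey, year, kind) could serve as metadata elements attached to the stored file, while rejected names would be flagged before ingestion.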

The outputs of this initial project will be used to support implementation, anticipated in 2012-2013. General findings will be shared with the public community and other interested agencies. Given the widespread need for such a platform, we encourage and welcome collaborative efforts around research and implementation.

The project is led at Metadata Technology by M. Pascal Heus, Vice-President and Head of Research. He is joined by Dr. Bing Zhu from the Computational Science Research Center (CSRC) at San Diego State University (SDSU), who will provide technical expertise around iRODS™.

Statistics Canada is the Canadian federal government agency commissioned with producing statistics to help better understand Canada, its population, resources, economy, society, and culture. Internationally, Statistics Canada is held in high regard for the quality of its data and its methodology.

The Canadian Research Data Centre Network (CRDCN) gives Canada’s research community access to Canadian social and population health statistics and helps provide evidence for effective public policy and planning. Since 2000, the CRDCN, in partnership with Statistics Canada's Research Data Centre Program, has transformed quantitative social science research in Canada. In secure computer laboratories on university campuses across Canada, university, government and other approved researchers are able to analyse a vast array of social, economic and health data.

iRODS™, the Integrated Rule-Oriented Data System, is a data grid software system developed by the Data Intensive Cyber Environments research group (developers of the SRB, the Storage Resource Broker), and collaborators. One of the main ideas behind iRODS™ is to provide a system that enables a flexible, adaptive, customizable data management architecture. iRODS™ is a second generation data grid system providing a unified view and seamless access to distributed digital objects across a wide area network.

The Data Documentation Initiative (DDI) is an XML specification for the documentation and management of microdata, particularly in socio-economic sciences and the health sector. It is maintained by the DDI Alliance.

Metadata Technology North America Inc. is a privately owned eGovernment data management and information technology solution provider. The Company’s mission is to facilitate the production of, and open access to, statistical and scientific data, and to improve their quality and use. It specializes in products and services leveraging XML technology, metadata standards, and related best practices. Its key personnel include globally recognized experts in the Statistical Data and Metadata Exchange (SDMX) standard, the Data Documentation Initiative (DDI), and the management of socio-economic data, health data, and official statistics.

The views and opinions expressed in this article are those of the Company and do not necessarily reflect the official policy or position of other agencies and individuals.
