James Linden

ISO 639-2 Language Codes Parser

Overview

The LoC.gov ISO 639-2 Language Codes Parser is a PHP script that parses the Library of Congress raw text file for the ISO 639-2 standard and loads it into a local MySQL database.

Dataset Source

URL: http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt

Updates: infrequently
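
To give a feel for the source format: the file is UTF-8 text with one language per line and five pipe-separated fields (alpha-3 bibliographic code, alpha-3 terminologic code when given, alpha-2 code when given, English name, French name). The following is a minimal Python sketch of reading that layout, not the project's PHP script; the names LanguageCode, parse_line and parse_lines are hypothetical, and stripping a leading byte order mark is an assumption about how the file is encoded.

```python
from typing import Iterable, List, NamedTuple, Optional


class LanguageCode(NamedTuple):
    code3_bib: str             # alpha-3 bibliographic code
    code3_term: Optional[str]  # alpha-3 terminologic code, when given
    code2: Optional[str]       # alpha-2 code, when given
    name_eng: str              # English name
    name_fre: str              # French name


def parse_line(line: str) -> Optional[LanguageCode]:
    """Parse one pipe-delimited line; return None for blank or malformed lines."""
    # Assumption: the file may start with a UTF-8 byte order mark.
    line = line.lstrip("\ufeff").rstrip("\r\n")
    if not line.strip():
        return None
    fields = line.split("|")
    if len(fields) != 5:
        return None
    bib, term, code2, eng, fre = (f.strip() for f in fields)
    # Empty optional fields become None so they can be stored as NULL.
    return LanguageCode(bib, term or None, code2 or None, eng, fre)


def parse_lines(lines: Iterable[str]) -> List[LanguageCode]:
    """Parse many lines, silently skipping the ones parse_line rejects."""
    return [rec for rec in map(parse_line, lines) if rec is not None]
```

With a locally saved copy of the file, something like parse_lines(open(path, encoding="utf-8")) should yield one record per language.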

Environment

  • GNU/Linux
  • PHP 5.3+ (with mbstring)
  • MySQL 5.x

Notes

  • This script is designed to run on the command line.

Howto

Create the database (insert your database name):

CREATE DATABASE IF NOT EXISTS database_name DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_bin;

Create the table:

CREATE TABLE IF NOT EXISTS t_languagecode (
    c_id SERIAL PRIMARY KEY,
    c_code3 CHAR(3) DEFAULT NULL,
    c_code3b CHAR(3) DEFAULT NULL,
    c_code2 CHAR(2) DEFAULT NULL,
    c_nameeng VARCHAR(96) DEFAULT NULL,
    c_namefre VARCHAR(96) DEFAULT NULL
) ENGINE=MyISAM;
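
The column widths in this table are fixed, so a value longer than its column is either rejected or truncated by MySQL, depending on the SQL mode. The source file is believed to contain at least one range entry (qaa-qtz, reserved for local use) that is longer than three characters. As a rough Python sketch, and not something taken from the PHP script, a row could be checked against the widths before it is inserted; COLUMN_WIDTHS and oversized_columns are hypothetical names, and the widths are copied from the table definition above.

```python
from typing import Dict, List, Optional

# Maximum character lengths, copied from the CREATE TABLE statement.
COLUMN_WIDTHS: Dict[str, int] = {
    "c_code3": 3,
    "c_code3b": 3,
    "c_code2": 2,
    "c_nameeng": 96,
    "c_namefre": 96,
}


def oversized_columns(row: Dict[str, Optional[str]]) -> List[str]:
    """Return the names of columns whose value exceeds the column width.

    Missing keys and None values are treated as NULL and always fit.
    """
    return [
        col
        for col, width in COLUMN_WIDTHS.items()
        if row.get(col) is not None and len(row[col]) > width
    ]
```

An empty list means the row fits; anything else names the columns that would need special handling.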

Download loc.gov-iso639-2.php and edit it to set your database configuration:

$CFG['db']['host'] = 'localhost';
$CFG['db']['port'] = 3306;
$CFG['db']['user'] = null;
$CFG['db']['pass'] = null;
$CFG['db']['name'] = null;

Run the script from the command line; it should take only a few seconds.

License

This project is licensed under the 2-clause BSD license.
