Tuesday, February 15, 2011

Use Python to normalize database

Many occasions, data needs to be normalized to speed up query operations before entering a database. Large text files have to depend on Python, given its’ excellent row-wise data manipulation ability.
First thought is to use a nested list to fill in all the data, such as the codes below.
import csv, sqlite3

infile = open('mtcars.csv', 'r')
f = csv.reader(infile)
header = f.next()
header.pop(0)

data = []
for r in f:
name = r.pop(0)
for i in range(0, len(r)):
data.append([name, header[i], r[i]])
However, a dictionary will be much more convenient given its built-in iteration tools.
import csv, sqlite3

infile = open('mtcars.csv', 'r')
f = csv.DictReader(infile)

data = []
for r in f:
a = r.popitem()[1]
for key, value in r.iteritems():
data.append([a, key, float(value)])

infile.close()

# Create an empty table for the codes below
db = sqlite3.connect('mtcars.db')
c = db.cursor()
c.execute('create table if not exists display(name text, varname text, varvalue real) ')
db.commit()

# Insert the normalized data into SQLite database
c.executemany('insert into display values (?, ?, ?)', data)
db.commit()

# Show the result
for row in c.execute('select * from display'):
print row

No comments:

Post a Comment