Gigantron: Processor of Data
Get Version
0.1.3→ ‘gigantron’
What
Gigantron is a simple framework for the creation and organization of data processing projects. Data-processing transforms are created as Rake tasks and data is handled through ActiveRecord models. (DataMapper was the original plan, but it has problems playing nicely with JRuby for now).
Ruby is great for exploratory data processing. Data processing projects tend to grow up and encompass large numbers of random scripts and input files. It is easy to get lost in coding and lose organization. Gigantron is an attempt to use code generation and random magic to make maintaining organized DP projects simple. Code is separated into data (models) and operations on the data (tasks). Code generators stub out these files and the associated tests for the user.
Gigantron was written for my own needs working with atmospheric data and will evolve through use to reduce the trivialities that can sometimes dominate the work of developers.
Installing
sudo gem install gigantron
This should handle the major dependencies automatically except for your
database adapter. If you are on JRuby the gem is
activerecord-jdbcsqlite3-adapter for sqlite3 and
activerecord-jdbcmysql-adapter for mysql. On MRI be sure to install
sqlite3-ruby if you are using sqlite.
Note: JDBCSqlite3 is still a work in progress and migrations are basically broken for it.
The basics
# Generate new project
shell> $ gigantron project
create
create tasks
create db
create models
create lib
create test
create Rakefile
create database.yml
create initialize.rb
create tasks/import.rake
create test/test_helper.rb
create test/models
create test/tasks
create test/tasks/test_import.rb
dependency install_rubigen_scripts
create script
create script/generate
create script/destroy
shell> $ cd project
# Create new model
shell> $ script/generate model modis
exists models/
create models/modis.rb
exists test/
exists test/models/
create test/models/test_modis.rb
shell> $ script/generate task modis_to_kml
exists tasks/
create tasks/modis_to_kml.rake
exists test/
exists test/tasks/
create test/tasks/test_modis_to_kml.rb
One can edit these files to add functionality. Gigantron by default includes ActiveSupport for convenience.
Hacking
Gigantron is super minimal now, so modifying it is pretty easy. The gigantron
application generator (that which is invoked by the gigantron command) lives
in app_generators/gigantron/. The template files and the template dir
structure are in app_generators/gigantron/templates/. When adding new
templates, directories, or files, add the names first to the tests in
test/test_gigantron_generator.rb, then the stubs to
app_generators/gigantron/templates, and then finally describe them in the
manifest section of app_generators/gigantron/gigantron_generator.rb. It
should look something like
def manifest record do |m| ... m.file "new_file", "new_file" #straight file copy m.directory "my_new_dir" #create directory m.template "new_thing", "new_thing" #runs file through ERB when copying end end
It might be handy to know that in gigantron_generator.rb the name provided to the generator can be referenced as @name. This can be used like
record do |m| m.file "renamed_file", "#{@name}_file" end
The same value is available to your templates as just name. You can template
a file like
class Test<%= name.camelcase %> < Test::Unit::TestCase ... end
The same process applies to the model and task generator for gigantron
projects. These generators live in gigantron_generators/. Modifying them
is exactly the same as modifying the application generator.
All of this is pretty vanilla RubiGen, so if in doubt, check out the docos on that fine piece of work.
The only other place for code in Gigantron is is lib/gigantron/tasks/ where a few boilerplate test and db tasks live. I think I ripped the test tasks off of rails.
If you have any questions, do contact me. I am interested in anything that will make Gigantron suck less and be useful to people.
How to submit patches
git clone git://github.com/schleyfox/gigantron.git
Build and test instructions
cd gigantron cp test/template_database.yml.example test/template_database.yml vim test/template_database.yml rake test rake install_gem
License
This code is free to use under the terms of the MIT license.
Contact
Comments are welcome. Send an email to Ben Hughes
Ben Hughes, 24th June 2008
Theme extended from Paul Battley