Thursday, March 31, 2011

Day 19: In the Thick of It

We are now halfway through our scheduled work plan, finishing up the GUI and getting ready to write the code for the algorithm that computes the user's 'health score'.  When we divided tasks among ourselves a couple of weeks ago, we did not know how long each task would take.  As of now, we no longer need a database to store the food groups and other possible attributes and their values, so Jordan is free to research how to import the Activity into the Sugar software with the help of the tutorial we all followed at the beginning of this project.  Megan has come up with the algorithm that will calculate the health score, but as I said, we are not ready to code it just yet.  Our concern at the moment is helping Alex with the GUI (primarily finding out how to get the input values from the combo boxes and drop-down menus), the task that has required the most attention so far.
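
For reference, here is a minimal sketch of one way to read the selected value from a combo box.  It is not Alex's actual code.  It assumes each widget behaves like PyGTK's gtk.ComboBox, which provides get_active() (the index of the selected row, or -1 if nothing is selected) and get_model() (the rows behind the widget).  The helper names read_combo_value and collect_inputs, and the field names in the usage note, are made up for illustration.

```python
# Sketch only: assumes `combo` behaves like PyGTK's gtk.ComboBox, i.e. it
# offers get_active() (selected row index, -1 if none) and get_model()
# (rows that can be indexed as model[row][column]).


def read_combo_value(combo, column=0, default=None):
    """Return the value in `column` of the selected row, or `default`."""
    index = combo.get_active()
    if index < 0:  # nothing has been selected yet
        return default
    model = combo.get_model()
    return model[index][column]


def collect_inputs(combos):
    """Map each (hypothetical) field name to its combo box's current value."""
    return dict((name, read_combo_value(combo))
                for name, combo in combos.items())
```

In the Activity, something like collect_inputs({"food_group": group_combo, "servings": servings_combo}) could be called from the handler of a "calculate" button, where group_combo and servings_combo are the widgets loaded from the Glade file.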

A small setback like this does not mean we are behind schedule; in fact, we are right on time in providing each deliverable specified by our timeline.  Flux Capacitors plans to make a great first activity for use with the Sugar software!

Tuesday, March 29, 2011

Day 18: POSSCON

The open source conference in Columbia was a great experience for a student like me getting ready to graduate and trying to find an interesting company that uses database technologies.  One such company, BackType, makes use of database and data mining technologies in dealing with large data systems.  The speaker for BackType, Nathan Marz, was able to explain the eight properties of large data systems as follows:
  1. Robust
  2. Low latency reads and updates
  3. Scalable (horizontally, adding more machines as the data size grows)
  4. General (abstracting whenever possible)
  5. Extensible (able to add new features)
  6. Ad-hoc analysis (this is where the data mining comes into play)
  7. Minimal maintenance
  8. Debuggable
Marz also described dividing the system into two layers: the batch layer and the speed layer.  Using a tool called Hadoop, one is able to create this structure and use message passing and filters to create incremental algorithms that check for false positives in creating batch views.  Although the batch layer is slow, with high latency and high throughput, it contains the master copy.  The speed layer compensates for this by utilizing more complex algorithms and transient data (meaning that the data is discarded from the speed layer once it has been passed to the batch layer).  In a sense, both layers gather data, and their results are merged to create a real-time view.  The databases that hold the data are mostly Read/Write databases and not the widely used relational databases that I have always worked with.  However, one can still use MySQL, for example, to query the database.  Backups, as well as full recoveries, can be done using the batch layer while the speed layer continues to append more data to its log.
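
To make the two-layer idea concrete for myself, here is a toy sketch in Python.  It is only my reading of the talk, not BackType's code and not Hadoop: a batch view is recomputed from the master copy, a speed layer counts only the events the batch layer has not absorbed yet, and a query merges the two.  The page-view counting task and all of the names (build_batch_view, SpeedLayer, realtime_view) are assumptions made for illustration.

```python
from collections import Counter


def build_batch_view(master_events):
    """Recompute the whole view from the master copy (slow but complete)."""
    return Counter(master_events)


class SpeedLayer(object):
    """Holds counts only for events the batch layer has not absorbed yet."""

    def __init__(self):
        self.recent = Counter()

    def record(self, page):
        """Update the transient view as soon as an event arrives."""
        self.recent[page] += 1

    def discard(self):
        """Drop transient data once a new batch view covers it."""
        self.recent.clear()


def realtime_view(batch_view, speed_layer):
    """Answer a query by merging the batch view with the recent counts."""
    return batch_view + speed_layer.recent


# Hypothetical page-view events.
master = ["home", "home", "about"]
batch = build_batch_view(master)

speed = SpeedLayer()
for page in ["home", "contact"]:  # events that arrived after the batch run
    speed.record(page)

current = realtime_view(batch, speed)
```

Once the next batch run has processed the newer events, the speed layer's copy can be thrown away with discard(), which is the "transient data" part of the design.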

I was also able to sit down with two teachers from another school and a representative from Oracle.  I learned about Oracle's diverse range of software products; as the representative put it, one can find Oracle just about anywhere there is IT.  We also got into a discussion about the usefulness of VirtualBox, something I currently use to host a SQL Server 2008 server on Windows Server 2003.  The representative was very appreciative of the compliments we had for the product.  The two teachers mentioned an idea to make the software communicate with other instances of VirtualBox.  I only know of a feature in VirtualBox for communicating with the host computer, not with other instances.

As for our project, we have been able to simplify the problem by removing an inner portion of the linear regression equation.  Currently, we need only calculate the weight of each food group for use in the equation.  Alex, with her PyGTK skills, was able to reflect this change in the GUI when she presented it to us.

Tuesday, March 22, 2011

Day 17: Road Work Ahead

While preparing for POSSCON, and a possible chance to meet Walter Bender (one of the lead developers of the Sugar software), we are also laying the foundation for a new Sugar Activity.  Our team member Alex has presented us with a GUI she built using Glade, a program that utilizes PyGTK and XML to produce things like combo boxes and allows for data entry into categories.  We also suggested including class diagrams in the documentation, but we have not come across the need for many, so the number of diagrams will be fairly small.

We also started to think more about the algorithm involved in calculating the score of the user's diet.  I mentioned using a regression formula the week before, because the calculation involves a weighted sum.  With more research, I found the answer to be just that, straight-line linear regression.  In our case, the variables are as follows:
  • weight = the weight of each food group (according to a food pyramid TBD) = (Servings of Group) / (Total Servings)
  • numServings = the user will pick a number of servings of a certain food group they have eaten (Alex currently has the measurement of Fists)
  • foodGroupScore = sum(weight_n * numServings_n)  --> where n corresponds to a specific food group as selected by the user
Still, there is a deeper level that this calculation can ignore: the impact that each individual food within a category has on the suggested number of servings for its food group.  We can ignore it because the suggested number of servings is already given by the food pyramid of choice.
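
The weighted sum above can be sketched in a few lines of Python.  This is only an illustration of the formula as written, not the code we will ship.  The food pyramid is still to be decided, so the recommended servings below are made-up numbers, and the function names group_weights and food_group_score are placeholders.

```python
def group_weights(recommended_servings):
    """weight = (servings of group) / (total servings) for each food group."""
    total = float(sum(recommended_servings.values()))
    return dict((group, servings / total)
                for group, servings in recommended_servings.items())


def food_group_score(weights, num_servings):
    """foodGroupScore = sum(weight_n * numServings_n) over the chosen groups."""
    return sum(weights[group] * servings
               for group, servings in num_servings.items())


# Hypothetical pyramid: recommended servings per food group (placeholders).
pyramid = {"grains": 6, "vegetables": 4, "fruits": 2}

# What the user reports having eaten (Alex's GUI currently measures in Fists).
eaten = {"grains": 2, "vegetables": 3}

weights = group_weights(pyramid)
score = food_group_score(weights, eaten)
```

Here the weights come out to 6/12, 4/12, and 2/12, so each serving of grains counts for more than each serving of fruit.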

Thursday, March 17, 2011

Day 16: Preparing for POSSCON

In preparation for the POSSCON events on Thursday, March 24, I have come up with the following plan and questions for selected presenters:
  1. Early in the morning is the introduction to the event; I have to make sure I grab a guide of some kind.
  2. Chris Hinkley has a 15-minute track on web hosting addressing the application layer.  I want to ask what kinds of logic should be implemented in the database layer.
  3. John Mertec will be talking about deploying easy PHP application security, so I want to ask what he has in mind for database-driven websites.
  4. Immediately following his presentation is Nathan Marz on building large data systems.  I want to see if his designs are similar to the way SQL Server splits pages of data in order to create a new node and more space.
  5. Afterwards, there are a couple of leadership tracks on data governance and becoming efficient in a business.  I might have to just listen to their presentations to see if I have any relevant questions.
There is a lot to learn here, and unfortunately I will not be able to get to some of the concurrent events.  This schedule covers what I have picked as the most interesting and career-specific sessions for me.  On a side note, there are plenty of Linux organizations attending the event, so I am glad to have some experience using Ubuntu because I just might see it in use during the presentations.

As for our project, I have relayed the documentation and other resources mentioned in Day 15 to my team members, and now we are creating the algorithm that will calculate the score on the health meter for our Sugar activity.

Tuesday, March 15, 2011

Day 15: Back on Track

After some well-deserved time off and tranquility, we can now get back to working on our project.  I forgot to provide the link to the Sugar Labs API earlier; here it is:

http://api.sugarlabs.org/

The folder epydocs/ contains a table of contents listing the modules and their corresponding classes, functions, and variables.  The folder puppets/ is something I'm still not familiar with; it contains the same kinds of elements as epydocs/, but for a different set of modules.  One module I noticed in particular uses MySQL and PHP to create database-driven websites, something students in CSCI 332 have done before.  Lastly, the folder sphinx/ contains a table of contents for all of Sugar's documentation, including the source for how to build an activity and import graphics (lucky us!).

Saturday, March 12, 2011

Intermission

Although the team has adjourned for Spring Break, we have divided up some research and development tasks to keep our project moving.  I am currently researching Sugar's library of Python methods to become familiar with some of the important pieces we will use in creating our activity.

Tuesday, March 1, 2011

Day 14: The Game Plan

After submitting our fix to a couple of Python files in the Sugar code base, we gathered yesterday to discuss the course of the team's efforts for the rest of our time together this semester.  I do not think anyone in the group wanted to write test cases, since it would be a repeat performance of the projects we had completed in CSCI 362.  Instead, we decided to develop software by creating an activity for the Sugar environment.  Our team member, Megan, pointed us in the right direction by finding Sugar's Wikipedia page on how to create an activity.  This step-by-step guide shows how to create a simple activity by giving the reader the code needed to perform the setup operations.

Towards the bottom of the page, there is a command to install the compressed activity in the .xo format as follows:

sugar-install-bundle HelloWorld.xo

I mentioned earlier that the Surf.xo activity is recommended over Sugar's Browse feature.  I installed it using the command above, replacing HelloWorld.xo with Surf-115.xo.  The terminal window reported a successful installation and the activity appeared in the Sugar environment, but when I used the activity to search the Web, it failed and returned me to the start-up screen.  However, when I used a USB drive to transfer the program to the Sugar emulator, the program responded well.  I'll investigate further to see whether the file needs to be placed in a particular location for the command to work properly.

Getting back to our timeline, we have created some initial tasks to divide amongst ourselves for creating an activity that will present nutrition facts to children in the U.S.  There is a similar activity available for kids in Uruguay.  So far, we have planned:
  • A GUI
  • Algorithms to calculate a percentage of "healthiness"
  • A database that interacts with the GUI