Melvin's digital garden

Data driven programming assignments

date: 2009-12-23 09:48:06 +08:00 category: teaching

Originally appeared here on 09/01/09.

I attended a talk by Randall Bryant on Data Intensive Scalable Computing. His focus was on computer systems for processing large amounts of data.

I had realized the importance of data-driven computation earlier, through my experience setting programming assignments. I think it is important to have problems that are realistic, where you want to write the programs in order to see the results. Unfortunately, most of the time we are technique driven: we try to build a task around a specific method. However, most such problems have a rather trivial solution, so we need to impose unreasonable constraints, or inflate the problem to an unbelievably large size, in order to force the use of the intended technique.

I think the correct approach is to start from the data. There is a large amount of interesting data available on the web, from movies to tags. In the UNIX workshop I conducted for freshman orientation in 2008, I used the SMS corpus from the WING research group to motivate the use of UNIX pipes.
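As a rough illustration of the kind of task this data invites, a word-frequency count over SMS text is often written in the shell as `tr -cs 'A-Za-z' '\n' | tr 'A-Z' 'a-z' | sort | uniq -c | sort -rn | head`. The sketch below is not the workshop material. It is a Python analogue that assumes the messages are already available as an iterable of strings. Each hypothetical function (`split_words`, `lowercase`, `top_counts`) stands in for one pipeline stage, and generators hand data along lazily, much as a pipe does.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def split_words(lines: Iterable[str]) -> Iterator[str]:
    # Stage 1, like `tr -cs 'A-Za-z' '\n'`: emit each run of letters as a token.
    for line in lines:
        yield from re.findall(r"[A-Za-z]+", line)


def lowercase(words: Iterable[str]) -> Iterator[str]:
    # Stage 2, like `tr 'A-Z' 'a-z'`: normalise case so "See" and "see" match.
    for word in words:
        yield word.lower()


def top_counts(words: Iterable[str], n: int) -> List[Tuple[str, int]]:
    # Stages 3-5, like `sort | uniq -c | sort -rn | head -n N`:
    # count every token and return the n most frequent.
    return Counter(words).most_common(n)


if __name__ == "__main__":
    # Made-up messages standing in for an SMS corpus; not real corpus data.
    messages = ["See u at 8 lah", "ok see u later", "Can see or not?"]
    print(top_counts(lowercase(split_words(messages)), 3))
```

`Counter.most_common` replaces the `sort | uniq -c | sort -rn` stages. The counts agree with the shell version, although words with equal counts may be listed in a different order.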

Dealing with large amounts of publicly available, real-world data gives rise to realistic computational problems where the effect of efficient algorithms becomes apparent. Computations that take hours to run with a naive method can be completed in seconds with the right approach.
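The gap between a naive and an efficient method can be seen on a small, hypothetical task: finding messages that occur more than once in a corpus. The following is a minimal sketch, assuming the messages are already loaded as a list of strings. The function names are illustrative, and the timing run uses synthetic random strings rather than any real dataset. The naive version compares every pair, so its cost grows roughly with n², while the hashed version makes a single pass with a set, with expected cost roughly proportional to n.

```python
import random
import time
from typing import List, Set


def duplicates_naive(messages: List[str]) -> Set[str]:
    # Quadratic: compare each message with every message after it.
    dups: Set[str] = set()
    for i in range(len(messages)):
        for j in range(i + 1, len(messages)):
            if messages[i] == messages[j]:
                dups.add(messages[i])
    return dups


def duplicates_hashed(messages: List[str]) -> Set[str]:
    # Expected linear time: one pass, remembering messages seen so far in a hash set.
    seen: Set[str] = set()
    dups: Set[str] = set()
    for m in messages:
        if m in seen:
            dups.add(m)
        else:
            seen.add(m)
    return dups


if __name__ == "__main__":
    # Synthetic data: random numeric strings, NOT real SMS messages.
    rng = random.Random(0)
    data = [str(rng.randrange(10_000)) for _ in range(3_000)]
    for fn in (duplicates_naive, duplicates_hashed):
        start = time.perf_counter()
        result = fn(data)
        print(fn.__name__, len(result), f"{time.perf_counter() - start:.3f}s")
```

Both functions return the same set. Doubling the input roughly quadruples the naive running time but only roughly doubles the hashed one, which is why the difference only becomes dramatic at the scale of real corpora.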
