[Java-Tip] Non-Blocking Method To Download Files From Web

November 9th, 2010

The URLConnection class contains many methods that let you communicate with a URL over the network, but it doesn’t provide a callback mechanism for reporting data read progress. Java’s support for interfaces gives us the equivalent of callbacks. The trick is to define a simple interface that declares the methods we wish to be invoked, and to call those methods as data is read from the URLConnection.

We define our Event as follows.

// FileDownloadEvent.java
 
public interface FileDownloadEvent
{
    // This is just a regular method so it can return something or
    // take arguments if you like.
    public void dataReadProgress (int done, int total, byte data[]);
    public void done(boolean error);
}

There are two methods: dataReadProgress() and done(). We invoke dataReadProgress() each time we read a chunk of data, to report the read progress. We invoke done() to signal that the read has finished or that an error has occurred.

The class that signals the event needs to accept an object that implements the FileDownloadEvent interface and invoke its dataReadProgress() method as appropriate.
We keep a counter of the downloaded bytes and fire the dataReadProgress event each time we read a chunk of data.

// FileDownload.java
 
import java.io.BufferedInputStream;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.Arrays;

public class FileDownload extends Thread
{
    private final FileDownloadEvent ie;
    private int dataReadSize = 4096;
    private String downloadURL = null;

    public FileDownload (FileDownloadEvent event)
    {
        // Save the event object for later use.
        ie = event;
    }

    public void request (String url)
    {
        this.downloadURL = url;
        // Run the download on its own thread so the caller is not blocked.
        this.start();
    }

    //...
    public void run ()
    {
        boolean error = false;
        InputStream is = null;
        try {
            URL url = new URL(this.downloadURL);
            URLConnection fdCon = url.openConnection();

            // -1 if the server does not report a content length.
            int total = fdCon.getContentLength();

            // Read from the connection we already opened instead of
            // opening a second one with url.openStream().
            is = new BufferedInputStream(fdCon.getInputStream());  // throws an IOException

            byte[] data = new byte[dataReadSize];
            int progress = 0, n;
            while ((n = is.read(data)) != -1) {
                progress += n;
                // Hand over only the n bytes read in this chunk.
                this.ie.dataReadProgress (progress, total, Arrays.copyOf(data, n));
            }
        } catch (Exception e)
        {
            error = true;
        } finally {
            // Always release the connection's stream.
            if (is != null) {
                try { is.close(); } catch (Exception ignored) { }
            }
        }
        this.ie.done(error);
    }
    // ...
}

The code that wishes to receive the event notification must implement the FileDownloadEvent interface and pass a reference to itself to the event notifier, or use an anonymous class as in the code below.

// Download.java
 
public class Download
{
    public static void main(String args[])
    {
        // Create the event notifier and pass it an anonymous handler.
        FileDownload req = new FileDownload (new FileDownloadEvent() {
            // Define the actual handler for the event.
            public void dataReadProgress (int done, int total, byte[] data)
            {
                if (total > 0) {
                    System.out.println("Progress: " + ((float)done/(float)total) * 100 + "%");
                } else {
                    // The server did not report a content length.
                    System.out.println("Progress: " + done + " bytes");
                }
                // Do something with data...
            }
            public void done (boolean error)
            {
                System.out.println(error ? "Download Failed." : "Download Completed.");
                // Do something...
            }
        });
 
        req.request("http://somedomain/path/to/file.gz");
 
	// Do something
    }
}

That’s all there is to it. I hope this simple Java idiom will be useful to someone.

Categories: JAVA

Playing With Python And CouchDB

November 4th, 2010

Apache CouchDB is a document-oriented database that can be queried and indexed in a MapReduce fashion using JavaScript. CouchDB also offers incremental replication with bi-directional conflict detection and resolution.

CouchDB provides a RESTful JSON API that can be accessed from any environment that allows HTTP requests. There are myriad third-party client libraries that make this even easier from your programming language of choice. CouchDB’s built-in Web administration console speaks directly to the database using HTTP requests issued from your browser.
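
To make the "plain HTTP" point concrete, here is a minimal sketch that talks to CouchDB with nothing but the Python 3 standard library. It assumes a server at http://localhost:5984; the helper names couch_url, get_json and list_databases are made up for this illustration, while _all_dbs is CouchDB's endpoint for listing databases.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:5984"  # assumption: a local CouchDB instance


def couch_url(*parts, base=BASE_URL):
    """Join path segments onto the server URL, percent-encoding each one."""
    return base.rstrip("/") + "/" + "/".join(quote(p, safe="") for p in parts)


def get_json(url):
    """Issue an HTTP GET and decode the JSON response body."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def list_databases(base=BASE_URL):
    """Return the database names reported by GET /_all_dbs."""
    return get_json(couch_url("_all_dbs", base=base))
```

Calling list_databases() needs a running server. The client library used in the rest of this post wraps the same kind of requests in a friendlier interface.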

CouchDB is written in Erlang, a robust functional programming language ideal for building concurrent distributed systems. Erlang allows for a flexible design that is easily scalable and readily extensible.

See the introduction and the technical overview for more information.

Getting Started

The couchdb.client.Server class represents a CouchDB server. New databases can be created using the create method:

from couchdb.client import Server

server = Server('http://localhost:5984')
print(server)
try:
    # Create the database on first use...
    db = server.create('emails')
except Exception:
    # ...or open it if it already exists.
    db = server['emails']

print(db)

Output:

<database 'emails'>

This class behaves like a dictionary of databases. For example, to get a list of database names on the server, you can simply iterate over the server object.

# Use a separate loop variable so the db handle from above is not overwritten.
for name in server:
    print(name)

The output will be something like:

test
segfault
emails

Creating a Document

To define a document mapping, you declare a Python class inherited from Document, and add any number of Field attributes:

from couchdb.client import Server, Document
from couchdb.mapping import TextField, DateTimeField, ListField

Now create a subclass of Document and fill in the values.

from datetime import datetime

class Email(Document):
    frm = TextField()
    to = ListField(TextField())
    sub = TextField()
    # Pass the callable itself so the timestamp is taken per document,
    # not once when the class is defined.
    added = DateTimeField(default=datetime.now)

eml = Email()
eml['frm'] = "vinod@example.com"
eml['to'] = "user@domain.com"
eml['sub'] = "test"

To store the document, pass it to the database's save() method, which returns the document ID and revision. To update it later, set the new values and call save() again:

doc_id, doc_rev = db.save(eml)
print(doc_id, doc_rev)

Output:

8be7ef2e5d711f11e859972ca9d38a52 455397369

Retrieving documents

for docid in db:
    eml = db.get(docid)
    print(eml['frm'], eml['to'], eml['sub'])

Output:

vinod@example.com user@domain.com test

Working With Views

Views are the primary tool used for querying and reporting on CouchDB documents. There are two different kinds of views: permanent and temporary views.

Temporary Views

Temporary views are views that you don’t save in the CouchDB database. NOTE: Temporary views are only suitable during development. Final code should not rely on them, because they are expensive to compute on every call and get slower the more data you have in the database.

code = '''function(doc) { if(doc.frm == "vinod@example.com") emit(doc.frm, null); }'''
results = db.query(code)
 
for res in results:
    print(res.key)
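
To see what such a map function computes, here is a plain-Python sketch of the same logic run over a small in-memory list. It is only an illustration: map_from_email and run_map are hypothetical helpers, the two sample documents are invented, and CouchDB's real view engine does far more (for example, it builds and maintains an index).

```python
def map_from_email(doc):
    """Plain-Python counterpart of the JavaScript map function above."""
    if doc.get("frm") == "vinod@example.com":
        yield doc["frm"], None  # corresponds to emit(doc.frm, null)


def run_map(docs, map_fn):
    """Apply map_fn to every document and return the emitted rows sorted by key."""
    rows = []
    for doc in docs:
        for key, value in map_fn(doc):
            rows.append({"id": doc["_id"], "key": key, "value": value})
    return sorted(rows, key=lambda row: row["key"])


# Invented sample documents, shaped like the emails stored earlier.
docs = [
    {"_id": "a1", "frm": "vinod@example.com", "sub": "test"},
    {"_id": "b2", "frm": "someone@example.org", "sub": "hello"},
]
rows = run_map(docs, map_from_email)
print([(row["id"], row["key"]) for row in rows])
# prints [('a1', 'vinod@example.com')]
```

Each emitted row carries the ID of the document it came from, which is why a view result exposes both an id and a key.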

Permanent Views

Permanent views are stored inside special documents called design documents, and can be accessed via an HTTP GET request to the URI /{dbname}/{docid}/{viewname}, where {docid} has the prefix _design/ so that CouchDB recognizes the document as a design document, and {viewname} has the prefix _view/ so that CouchDB recognizes it as a view.
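
The URI scheme described above can be sketched as a small helper. The function name view_uri is made up for this illustration; the database, design document and view names are the ones used in this post.

```python
def view_uri(dbname, design_name, view_name):
    """Build the request path of a permanent view from its three components."""
    docid = "_design/" + design_name   # prefix marks a design document
    viewname = "_view/" + view_name    # prefix marks a view
    return "/{0}/{1}/{2}".format(dbname, docid, viewname)


print(view_uri("emails", "reports", "fromemail"))
# prints /emails/_design/reports/_view/fromemail
```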

Use the ViewDefinition class to create a permanent view in the database.

from couchdb.design import ViewDefinition
 
rpt_view = ViewDefinition('reports', 'fromemail', '''function(doc) { if(doc.frm == "vinod@example.com") emit(doc.frm, null); }''')
rpt_view.sync(db)

You can now see the view in the drop-down list of the CouchDB Web administration console.

Query permanent views

for res in db.view("_design/reports/_view/fromemail"):
    print(res.id, res.key)

Output

8be7ef2e5d711f11e859972ca9d38a52 vinod@example.com

Installing Python-CouchDB

If you are a Debian/Ubuntu user, install couchdb and python-couchdb via apt:

sudo aptitude install couchdb python-couchdb

Or you can download it from the python-couchdb project home.

Categories: PYTHON

How To Expand Usable Storage Space In Ubuntu

October 31st, 2010


1. Using LVM

Partitions created on the Linux Logical Volume Manager (LVM) at install time can be resized easily, by concatenating extents onto them or truncating extents from them across multiple storage devices, without major system reconfiguration.

Caution: Current LVM setups may weaken the protection against filesystem corruption that journaled filesystems such as ext3 offer, unless the hard disk's write cache is disabled at the cost of performance.

Run df from a terminal.

$ df
Filesystem	1K-blocks	Used	Available	Use%	Mounted on
/dev/mapper/VolGroup00-LogVol00	7935392	6773500	752292	91%	/
/dev/sda5	497829	20904	451223	5%	/boot
tmpfs	1037084	0	1037084	0%	/dev/shm
/dev/mapper/VolGroup00-LogVol01	70877776	14988144	51045372	23%	/home

We have two partitions of interest here: the / partition is about 8 GB and the /home partition is about 71 GB. We want to expand the / partition to about 10 GB by taking free space from /home.
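
df reports sizes in 1K-blocks, so it helps to convert them before choosing sizes for the resize commands. The sketch below converts the two figures from the df output above. The helper names are made up for this illustration. Note that resize2fs and the LVM tools read a plain "G" suffix as GiB (powers of 1024), while the round figures quoted above are closer to decimal GB.

```python
KIB = 1024  # df's "1K-blocks" column counts 1024-byte blocks


def blocks_to_gib(blocks_1k):
    """Convert 1K-blocks to GiB (2**30 bytes)."""
    return blocks_1k * KIB / 1024 ** 3


def blocks_to_gb(blocks_1k):
    """Convert 1K-blocks to decimal GB (10**9 bytes)."""
    return blocks_1k * KIB / 10 ** 9


# 1K-block counts copied from the df output above.
for mount, blocks in (("/", 7935392), ("/home", 70877776)):
    print("%-6s %6.2f GiB  %6.2f GB" % (mount, blocks_to_gib(blocks), blocks_to_gb(blocks)))
```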

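For /home, unmount it, check and shrink the filesystem, then shrink the logical volume. The size given to resize2fs must not exceed the size the logical volume will have after lvreduce; otherwise the end of the filesystem is cut off and data is lost.

$ sudo umount /home
$ sudo e2fsck -f /dev/VolGroup00/LogVol01
$ sudo resize2fs /dev/VolGroup00/LogVol01 69G
$ sudo lvreduce -L-2G /dev/VolGroup00/LogVol01
$ sudo mount /home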

For /, extend the logical volume first, then grow the filesystem to fill it:

$ sudo lvextend -L+2G /dev/VolGroup00/LogVol00
$ sudo resize2fs /dev/VolGroup00/LogVol00

e2fsck and resize2fs belong to package e2fsprogs.

After resizing you will get

$ df
Filesystem	1K-blocks	Used	Available	Use%	Mounted on
/dev/mapper/VolGroup00-LogVol00	9299624	6779304	2043564	77%	/
/dev/sda5	497829	20904	451223	5%	/boot
tmpfs	1037084	0	1037084	0%	/dev/shm
/dev/mapper/VolGroup00-LogVol01	68877776	14999888	51033628	23%	/home

Read the lvm-howto for detailed information.

2. Mounting another partition

If you have an empty partition (e.g., “/dev/sdx”), you can format it with mkfs.ext3(8) and mount(8) it on a directory where you need more space. (You need to copy the original contents over.)

$ sudo mv work-dir old-dir
$ sudo mkdir work-dir
$ sudo mkfs.ext3 /dev/sdx
$ sudo mount -t ext3 /dev/sdx work-dir
$ sudo cp -a old-dir/. work-dir/
$ sudo rm -rf old-dir

3. Using symlink

This might be the easiest way. If you have an empty directory (e.g., “/path/to/emp-dir”) in another partition with usable space, you can create a symlink to the directory with ln(1).

$ sudo mv work-dir old-dir
$ sudo mkdir -p /path/to/emp-dir
$ sudo ln -sf /path/to/emp-dir work-dir
$ sudo cp -a old-dir/. work-dir/
$ sudo rm -rf old-dir
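
The same sequence can be rehearsed safely in a scratch directory before touching real data. The Python 3.8+ sketch below mirrors the shell steps one by one. The function names and the ".old" suffix are made up for this illustration, and shutil.copytree is only an approximation of cp -a (it preserves modes and timestamps, not ownership).

```python
import os
import shutil
import tempfile


def relocate_with_symlink(work_dir, target_dir):
    """Move the contents of work_dir to target_dir and leave a symlink behind."""
    old_dir = work_dir + ".old"
    os.rename(work_dir, old_dir)              # mv work-dir old-dir
    os.makedirs(target_dir, exist_ok=True)    # mkdir -p /path/to/emp-dir
    os.symlink(target_dir, work_dir)          # ln -s /path/to/emp-dir work-dir
    # Roughly cp -a: copies hidden files too, keeps symlinks as symlinks.
    shutil.copytree(old_dir, work_dir, symlinks=True, dirs_exist_ok=True)
    shutil.rmtree(old_dir)                    # rm -rf old-dir


def demo():
    """Run the relocation inside a throwaway temporary directory."""
    base = tempfile.mkdtemp()
    try:
        work = os.path.join(base, "work-dir")
        target = os.path.join(base, "other-partition", "emp-dir")
        os.makedirs(work)
        for name in (".hidden", "notes.txt"):
            with open(os.path.join(work, name), "w") as handle:
                handle.write("data")
        relocate_with_symlink(work, target)
        return os.path.islink(work), sorted(os.listdir(target))
    finally:
        shutil.rmtree(base)


print(demo())
```

The demo includes a hidden file on purpose: a copy that relies on a shell glob such as old-dir/* would skip it.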

4. Using aufs

If you have usable space in another partition (e.g., “/path/to/”), you can create a directory in it and stack that onto a directory where you need space with aufs. With aufs you can unite several directories into a single virtual filesystem. By default the first branch listed is writable and the others are read-only, so new files end up in the new directory.

$ sudo mv work-dir old-dir
$ sudo mkdir work-dir
$ sudo mkdir -p /path/to/emp-dir
$ sudo mount -t aufs -o br:/path/to/emp-dir:old-dir none work-dir