Perl Hashes are Very Important
Hashes in Perl are just as important as arrays; maybe even more important. If you haven't mastered hashes, it's time to do so right now, with my help. It's easy, actually. Here we go.
Array Refresher
You have probably used arrays in other languages. An array is just a list of items, for example a list of people's names, or addresses, or prices of products, etc. Each item in the array is numbered with an "index number"; the first one is at index 0, the next is at index 1, and so forth, for as many items as you have in the array.
You can look up the items in the array by their index number, for example to see the 5th item, you would look at index #4. (0 thru 4 is 5 items total) This way you have instant access to any element in the array, no matter how large it is - so long as you know the index number it's stored at.
Visually you can think of ordinary arrays as a sequence of storage boxes, where each box has an index number identifying it. Indexes are a kind of "address" of each storage box.
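As a quick refresher in Perl terms, here is a minimal sketch of index-based access. The array name @names and its contents are made up for illustration.

```perl
use strict;
use warnings;

# An ordinary array: items are stored at index 0, 1, 2, ...
my @names = ('Alice', 'Bob', 'Carol', 'Dave', 'Erin');

my $fifth      = $names[4];   # index 4 holds the 5th item
my $last_index = $#names;     # highest index currently in use

print "The 5th name is $fifth\n";
print "There are ", scalar(@names), " names, indexed 0 through $last_index\n";
```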
Hashes are Similar to Arrays
Hashes are very similar to arrays - in fact they are sometimes called Associative Arrays. They are just a bunch of storage boxes, like an array, except that instead of marking each box with an index number, they are marked with a name, a word, or bunch of words (any string of text, basically).
You "look up" each particular storage box by a name, instead of a number. It's very simple, when you think about it.
Visually you can imagine a hash looking like this:
With hashes, the part you look up items by is called the "key", while the data element in the little box is called the "value". You may have heard of people talking about "key-value pairs", this is all they're talking about.
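To make "key" and "value" concrete, here is a minimal sketch in Perl. The hash name %phone and the entries in it are invented for illustration.

```perl
use strict;
use warnings;

# A hash maps keys (here, names) to values (here, phone numbers).
# The names and numbers are invented for illustration.
my %phone = (
    'Sally' => '555-1234',
    'Bob'   => '555-9876',
);

# Look up one value by its key.
print "Sally's number is $phone{'Sally'}\n";

# Add a new key-value pair.
$phone{'Maria'} = '555-4321';

# Count how many key-value pairs the hash holds.
my $count = scalar keys %phone;
print "The hash holds $count entries\n";
```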
When to Use a Hash
How do you know when to use a hash, instead of an array? After all, they both hold a bunch of items, as many as we want.
Which one you use depends on how you're going to look up the data later. If you simply need to list all the elements in sequence, a plain old array is fine. But if you need to look up specific items in the data by a non-numeric key, like by name or phone or some other ID, you need a hash.
Example Project
Suppose you want to ask the user for their name, and look up in a file to see if there's a message waiting for them. The file format is single lines of "name message", very simple.
The basic algorithm we would use is:
If you're used to using arrays for everything, you'd probably write step 3 like this:
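A looping version of step 3 could be sketched roughly as follows. This is an illustration, not the original listing: it assumes the file's lines have already been read into a hypothetical array @lines, and the sample data and the sub name find_message_by_scan are invented.

```perl
use strict;
use warnings;

# Hypothetical sample data in "name message" format.
my @lines = (
    "sally Your order has shipped",
    "bob Call the office",
    "maria Meeting moved to 3pm",
);

# Slow approach: examine the lines one by one until the name matches.
# The work grows with the number of lines in the data set.
sub find_message_by_scan {
    my ($name, @data) = @_;
    foreach my $line (@data) {
        # Split into the first word (name) and the rest (message).
        my ($who, $msg) = split ' ', $line, 2;
        return $msg if $who eq $name;
    }
    return;    # no match found
}

my $found = find_message_by_scan('maria', @lines);
print defined $found ? "$found\n" : "no message for you.\n";
```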
This is very bad. It's not right to loop thru the entire data set, when the amount of data you're fetching is a tiny subset of that data, based on an exact-name match. This would be very slow, and would get slower and slower, the more data is added to the system! Your users will be annoyed, and your system administrator will be upset at what a CPU hog your program is. We say this solution does not "scale well", because it gets progressively slower the more successful it becomes (the more data is added to the system, the more users you have; etc).
The proper way to write step 3 is with a hash. No more looping to find single items. You simply look up the exact element that matches the person's name (the key of the hash), and get the message (the data value in the hash at that key's location).
Here's a diagram of what the hash may look like in memory:
Once we've created a hash like that, any message can be fetched almost instantly, given the person's name! Assuming the data is stored in the hash %messages, and you have a name $name to check for messages, retrieval is easy:
my $msg = $messages{$name};
if (defined $msg) {
    print $msg, "\n";
} else {
    print "no message for you.\n";
}
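The retrieval code assumes %messages has already been filled in. A minimal sketch of loading it from the "name message" file format might look like this; an in-memory string stands in for the real file so the example runs on its own, and the sample contents are invented.

```perl
use strict;
use warnings;

# Stand-in for the real message file: an in-memory string read through a
# filehandle. The contents are invented; lines are in "name message" format.
my $file_contents = "sally Your order has shipped\nbob Call the office\n";
open my $fh, '<', \$file_contents or die "Cannot open data: $!";

my %messages;
while (my $line = <$fh>) {
    chomp $line;
    next unless length $line;                # skip blank lines
    my ($who, $text) = split ' ', $line, 2;  # first word is the name
    $messages{$who} = $text;                 # name is the key, message the value
}
close $fh;

# One direct lookup, with no loop over the data.
my $name = 'sally';
my $msg  = $messages{$name};
print defined $msg ? "$msg\n" : "no message for you.\n";
```

The file is read once to build the hash; after that, each lookup goes straight to the matching key.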
What's the Limit?
Here's a question for you. How many items can you store in a hash?
With some languages, there are limits to things like this. But with Perl, the answer is the same with hashes and arrays - there is no limit! You're only limited by the amount of memory in your computer. Perl was designed very well, so we programmers don't have to worry about size limits like that very often.
What Order are Hash Elements in?
With hashes, there's no way to know what "order" the elements are in. You can think of them as being in random order, effectively. They are not stored in sorted order. They also don't stay in the order you put them into the hash in the first place!
Hashes use a special formula (called a hash function) for finding the data elements quickly, almost as quickly as array index lookups. Because of this, the elements are stored in a non-predictable order, for easy access.
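When you do need a predictable order, the usual approach is to sort the keys at the moment you list them. A minimal sketch, using an invented %price hash:

```perl
use strict;
use warnings;

# Invented sample data: product names mapped to prices.
my %price = (pear => 0.40, apple => 0.25, fig => 1.10);

# keys() returns the keys in no particular order, so sort them when a
# predictable order matters.
my @by_name = sort keys %price;

# Sort by value instead: compare the looked-up prices numerically.
my @by_price = sort { $price{$a} <=> $price{$b} } keys %price;

print "$_ costs $price{$_}\n" for @by_name;
```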
Notes about Hashes
The key of a hash element is case-sensitive. It's important to remember that. In other words, the key "sally" is completely different from the key "Sally", because of that one uppercase letter! If you store data in a hash using an uppercase name and then try to find an element by the lowercase name, it won't be there.
That's why a lot of programmers force their keys to lowercase first, before storing anything into (or retrieving anything from) their hash.
Perl has a good function, lc(), to do lowercasing. Check it out:
my %addresses;
my $name = "Sally";
$addresses{lc($name)} = "1234 Programmer Lane";
print "$name lives at ", $addresses{lc("SALLY")}, "\n";
In this example I used two different upper/lowercase versions of the key. Luckily I used the lc() function every time I did a hash access, so it would work right.
Notice one subtle thing however: when I display the name as output, I didn't lowercase it. I only lowercased it for hash lookups. This is the sign of a professional program.
Goal: Present things to humans the way a human understands things best (with upper/lowercase done properly). Present things to computers the way a computer understands things best (all-lowercase for key lookups).
Another thing that can mess up key lookups is reading a string from the user, or from a data file, and forgetting to remove the newline character from the end! Use chomp() on the string to make sure there's no newline character.
Problems also occur if there are extra characters invisible to the human eye, in the string. For example, space or tab characters on the front or back of the string. That's why you see a lot of code do things like this:
print "Enter the person's name: ";
my $name = <STDIN>;
chomp $name;         # remove newline character
$name =~ s/^\s+//;   # remove spaces from front
$name =~ s/\s+$//;   # remove spaces from back
print "Address is: ", $addresses{lc($name)}, "\n";
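Since the same cleanup has to happen on every store and every lookup, one option is to put it in a single helper. The following is a minimal sketch; the sub name normalize_key is hypothetical, not a Perl built-in.

```perl
use strict;
use warnings;

# Hypothetical helper: turn raw input into a consistent hash key.
sub normalize_key {
    my ($raw) = @_;
    my $key = $raw;
    $key =~ s/^\s+//;    # remove whitespace from front
    $key =~ s/\s+$//;    # remove whitespace (including a newline) from back
    return lc $key;      # lowercase for lookups
}

my %addresses;
$addresses{ normalize_key("Sally") } = "1234 Programmer Lane";

# Messy input, as it might arrive from <STDIN> or a data file.
my $input = "  SALLY \n";
my $found = $addresses{ normalize_key($input) };
print "Address is: $found\n" if defined $found;
```

Routing every hash access through one helper means the store and the lookup can never disagree about how a key is cleaned up.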
Data Structure Planning Cheat-Sheet
The answer you give to #1 is the data record you're going to create in memory. The number of answers you give to #2 tells how many hashes-or-arrays you're going to need (2, in this case: one for Name, one for City-State). For each answer to #2: if the element is words or really large numbers, use a hash. If it's 0-based integer numbers, use an array.
In our example, both Name and City-State are strings, so we must use hashes for both.
What If I Want to Lookup by Any Data Field?
Then you need to put your data in a real relational database! That's one of many things that SQL was designed to excel at. There are many freely available and extremely powerful and reliable databases out there such as MySQL (my personal favorite), Postgres, Firebird, and others.
There are even some higher-end database systems that cost money, for really large-scale data needs. But even the free ones can work with hundreds of thousands of records, and still perform lookups on your data in well under a second.
Perl Code for the Above Example
There are many ways of storing data like this in memory. This is one of the best methods. First, draw a picture of what you're trying to do.
I know this diagram looks a little complex. It simply has (1) data elements floating in memory, (2) a hash to look up records by name, and (3) a hash to look up records by city-state.
Each lookup hash points to a lot of data record hashes, so we call this data structure a "hash of hashes".
Example code that stores data in this structure:
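# Assumes $fh is an open filehandle, and that each line holds tab-separated
# fields in the order name, addr, city, state, phone (the file layout is an
# assumption; adjust the split to match your data).
while (my $line = <$fh>) {
    chomp $line;
    my ($name, $addr, $city, $state, $phone) = split /\t/, $line;
    # create the data element in memory:
    my $elem = { name=>$name, addr=>$addr,
                 city=>$city, state=>$state, phone=>$phone };
    # attach data element to "name" hash:
    $byname{lc($name)} = $elem;
    # attach data element to "city-state" hash:
    $bycitystate{lc($city.'-'.$state)} = $elem;
}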
We're actually obtaining a "reference" to an anonymous hash in memory (holding the data elements name, addr, city, ...) and then storing the reference as a data value in two different hashes (our two lookup tables: byname, and bycitystate).
Example code that retrieves data from this structure:
# Given person name ($name), print the person's phone number:
if (!exists($byname{lc($name)})) {
    print "I don't see $name in the data set.\n";
}
else {
    # ${name} keeps Perl from reading "$name's" as the package variable $name::s
    print "${name}'s phone is ", $byname{lc($name)}{phone}, "\n";
}
# Given a city $city and state $state, print the person's info:
my $key = lc($city."-".$state);
if (!exists($bycitystate{$key})) {
    print "I don't see anyone with a city of $city ",
          "in the state of $state, in the data set.\n";
}
else {
    print "Person name: ", $bycitystate{$key}{name}, "\n";
    print " Phone: ", $bycitystate{$key}{phone}, "\n";
    print " Address: ", $bycitystate{$key}{addr}, "\n";   # stored under "addr"
    print " City: ", $bycitystate{$key}{city}, "\n";
    print " State: ", $bycitystate{$key}{state}, "\n";
}
Of course, this assumes that no two people have the same name, and that no two people live in the same city and state! Not the best assumptions. To be more practical, it would be better to store an array of data records at every city-state key, so that if 5 people lived in the same city-state, all 5 would be listed in the array. Since each lookup key then holds an array, and each array element is a record hash, this data structure would be called a "hash of arrays of hashes".
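A minimal sketch of storing an array of records under each city-state key might look like this. The sample records are invented, and the record layout follows the earlier example.

```perl
use strict;
use warnings;

# Invented sample records: two people share the same city and state.
my @records = (
    { name => 'Sally', city => 'Austin', state => 'TX', phone => '555-1234' },
    { name => 'Bob',   city => 'Austin', state => 'TX', phone => '555-9876' },
    { name => 'Maria', city => 'Boise',  state => 'ID', phone => '555-4321' },
);

my %bycitystate;
foreach my $elem (@records) {
    my $key = lc($elem->{city} . '-' . $elem->{state});
    # Each key holds an array reference; push creates it on first use.
    push @{ $bycitystate{$key} }, $elem;
}

# List everyone stored under one city-state key (empty list if none).
my @austin = @{ $bycitystate{'austin-tx'} || [] };
print "$_->{name}: $_->{phone}\n" for @austin;
```

The same change could be made to the name lookup table if two people might share a name.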
Learning More
Reading Perl documentation is a good way to learn more about these concepts. On Unix you can use the "perldoc" command (as well as the "man" command) to read the on-line docs that came with your version of Perl. On Windows, there's a whole documentation system that comes with ActivePerl, if you have that installed.
Copyright © 2006 - 2010 Keith Smith Internet Marketing LLC, all rights reserved.