This project is part of a memory management package for storing variable-length records in a large memory space. We will build the data structures needed to search and analyze a large song database. The records that you will store for this project are artist names and song track names from a subset of the Million Song database.
To store data for the song records, we will simplify the memory management. You should create a large array of bytes to store all song titles and artist names. The artist names and song titles will be stored separately. For each record, the first byte (called the flag) marks whether the data is active (0 means deleted and 1 means active). The next two bytes hold the (unsigned) length of the record, in (encoded) characters. Thus, the length of a record's string is limited to what two bytes can express, that is, fewer than 2^16 = 65536 characters or bytes. Following that will be the string itself. Access to all records will be controlled by “handles”. Each handle contains the start location (within the array) of its record.
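The record layout described above can be sketched as follows. This is only an illustration under stated assumptions: the names `encode_record` and `decode_record` are hypothetical, the length is counted in UTF-8 bytes, and the two length bytes are written big-endian. The specification does not fix any of these choices.

```python
import struct

ACTIVE, DELETED = 1, 0
# 1-byte flag followed by a 2-byte unsigned length.
# Big-endian byte order is an assumption, not part of the specification.
HEADER = struct.Struct(">BH")


def encode_record(text: str) -> bytes:
    """Build the bytes of one active record: flag, length, then the string."""
    payload = text.encode("utf-8")  # assumption: strings are stored as UTF-8
    if len(payload) > 0xFFFF:
        raise ValueError("string too long for a 2-byte length field")
    return HEADER.pack(ACTIVE, len(payload)) + payload


def decode_record(memory, handle: int):
    """Return the string stored at `handle`, or None if it is marked deleted."""
    flag, length = HEADER.unpack_from(memory, handle)
    if flag != ACTIVE:
        return None
    start = handle + HEADER.size
    return bytes(memory[start:start + length]).decode("utf-8")
```

Here a handle is simply the byte offset of the flag byte, so a record that sits at the start of the array has handle 0.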
To simplify the implementation, insertion appends the new record after the last record in your array and marks it as active. Deletion marks the corresponding data as deleted, but you don’t have to physically remove the record from the array. Whenever the array is too small, you will create a new, larger array and copy the old data into it. Access to the song records will be through several index structures: two closed hash tables, one for artist names and one for song titles, and two 2-3+ trees for range queries (a 2-3+ tree is a B+ tree of order 3; you must use a 2-3+ tree rather than any other ADT).
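A minimal sketch of the append, mark-as-deleted, and grow-and-copy behavior might look like the following. The class name `RecordStore`, its method names, the doubling growth policy, UTF-8 encoding, and big-endian length bytes are all assumptions made for illustration; the specification only requires that the new array be larger.

```python
class RecordStore:
    """Append-only byte array of records: flag byte, 2-byte length, string."""

    def __init__(self, initial_size=64):
        self.memory = bytearray(initial_size)
        self.used = 0  # offset of the first free byte

    def _ensure(self, extra):
        """Grow the array (by doubling, an assumed policy) until `extra` bytes fit."""
        needed = self.used + extra
        if needed <= len(self.memory):
            return
        new_size = max(len(self.memory), 1)
        while new_size < needed:
            new_size *= 2
        bigger = bytearray(new_size)
        bigger[:self.used] = self.memory[:self.used]  # copy the old data
        self.memory = bigger

    def insert(self, text):
        """Append a record after the last one and return its handle (offset)."""
        payload = text.encode("utf-8")
        if len(payload) > 0xFFFF:
            raise ValueError("string too long for a 2-byte length field")
        self._ensure(3 + len(payload))
        handle = self.used
        self.memory[handle] = 1  # active
        self.memory[handle + 1:handle + 3] = len(payload).to_bytes(2, "big")
        self.memory[handle + 3:handle + 3 + len(payload)] = payload
        self.used += 3 + len(payload)
        return handle

    def delete(self, handle):
        """Mark the record as deleted; its bytes stay in place."""
        self.memory[handle] = 0

    def get(self, handle):
        """Return the string at `handle`, or None if it was deleted."""
        if self.memory[handle] != 1:
            return None
        length = int.from_bytes(self.memory[handle + 1:handle + 3], "big")
        return self.memory[handle + 3:handle + 3 + length].decode("utf-8")
```

Because deleted records are never moved, handles issued earlier stay valid even after the array has been replaced by a larger one.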
For the hash tables, you will use the second string hash function described in the book (the hash function will be provided by your instructors), and you will use simple quadratic probing as your collision resolution method (the i'th probe step will be i^2 slots from the home slot). The key difference from what the book describes is that your hash tables must be extensible. That is, you will start with a hash table of a certain size (defined when the program starts). If the hash table would become more than 50% full, you will replace the array with another that is twice the size and rehash all of the records from the old array. For example, say that the hash table has 100 slots. Inserting 50 records is OK. When you try to insert the 51st record, you would first rehash all of the original 50 records into a table of 200 slots. Likewise, if the hash table started with 101 slots, you would also double it (to 202) just before inserting the 51st record. The hash tables will actually store “handles” to the relevant data records that are currently stored in memory. A handle is used to recover its record. For this project, it will be just the index of the data in the array.
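The doubling rule and the quadratic probe sequence can be sketched as below. Everything named here is hypothetical: `placeholder_hash` is only a stand-in for the hash function your instructors will provide, and `key_of` is an assumed callback that turns a handle back into its string (in the project this would read the record from the byte array). Removal is left out of the sketch.

```python
def placeholder_hash(key, size):
    """Stand-in hash; NOT the book's function, which the instructors supply."""
    total = 0
    for ch in key:
        total = (total * 31 + ord(ch)) % size
    return total


class ExtensibleHashTable:
    """Closed hash table of handles with quadratic probing and doubling."""

    def __init__(self, size, key_of):
        self.slots = [None] * size
        self.count = 0
        self.key_of = key_of  # assumed callback: handle -> key string

    def _probe(self, slots, key):
        """Return the slot holding `key`, else the first empty slot, else None."""
        size = len(slots)
        home = placeholder_hash(key, size)
        for i in range(size):
            pos = (home + i * i) % size  # i'th probe is i^2 slots from home
            handle = slots[pos]
            if handle is None or self.key_of(handle) == key:
                return pos
        return None

    def find(self, key):
        """Return the handle stored for `key`, or None if it is absent."""
        pos = self._probe(self.slots, key)
        return None if pos is None else self.slots[pos]

    def _grow(self):
        """Rehash every handle into a table twice the size."""
        bigger = [None] * (2 * len(self.slots))
        for handle in self.slots:
            if handle is not None:
                pos = self._probe(bigger, self.key_of(handle))
                if pos is None:
                    raise RuntimeError("no free slot while rehashing")
                bigger[pos] = handle
        self.slots = bigger

    def insert(self, key, handle):
        """Insert `handle` under `key`; return False if the key already exists."""
        if self.find(key) is not None:
            return False
        # Double first if this insert would push the table past 50% full.
        if (self.count + 1) * 2 > len(self.slots):
            self._grow()
        pos = self._probe(self.slots, key)
        if pos is None:
            raise RuntimeError("probe sequence found no free slot")
        self.slots[pos] = handle
        self.count += 1
        return True
```

With 100 slots the check `(count + 1) * 2 > 100` first becomes true for the 51st record, and with 101 slots it also first becomes true for the 51st, which matches the two examples above. Note that quadratic probing does not visit every slot for arbitrary table sizes, so the sketch raises an error rather than looping forever if no free slot is reachable.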