Our company collects several types of data in the field to conduct condition assessment of critical infrastructure. The data typically comes from sensors installed on field robots and includes images, videos, laser scans, environmental data and point measurements. It is accompanied by miscellaneous attributes such as reports, documents and annotations.
The data is currently hosted on local servers and is accessed by employees from their Windows, Linux and OSX machines using sshfs, NFS or Samba.
However, as the amount of data grows exponentially (currently 30 TB and expected to double in the next 6 months), we need to address a few challenges in data management.
Current challenges are:
- Data is stored across multiple server locations and is not seamlessly accessible
- Difficult to prioritise data between SSD and HDD when current projects require different access speeds
- No overview of all the data: where it is, when it was last saved, etc.
- Data is sometimes reshuffled in the back end, which breaks all code that points to hard-coded paths.
- Hard to manage permissions with many users trying to do prototyping and development at the same time
- Slow network access on Linux machines (could be a problem with sshfs)
These problems lead to the basic requirements of the project.
A. Central repository to store all data
A1. A central repository where data can be seamlessly stored. Data is currently spread over a few local machines. This needs to be aggregated such that the end user doesn't have to worry about where the data is actually stored.
A2. Set protocols for adding new data (data is added by plugging drives into the servers or over the network).
A3. Data access to be streamlined. Data is currently (and will be) stored on local Linux servers. Data must be accessible to users working on compute servers (connected to monitors and running Linux) or on their laptops (running Linux, Windows or OSX). File paths to the data need to be consistent for all Linux users so that code can be transferred between machines. The same applies to Windows and OSX machines, so that file paths are unified when working on data processing software across multiple machines.
A4. Establish correct mounting protocols/scripts needed to allow for maximum read/write speeds.
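One way to meet the consistent-path requirement in A3 is for code to refer to data by a repository-relative path and resolve it per platform at run time. The following is a minimal Python sketch of that idea, not a required design. The mount roots in MOUNT_ROOTS and the function name resolve are assumptions made for illustration; the real values depend on how the repository ends up being mounted.

```python
import platform
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical mount roots, one per platform. Placeholders only.
MOUNT_ROOTS = {
    "Linux": "/mnt/data",
    "Darwin": "/Volumes/data",  # platform.system() reports "Darwin" on OSX
    "Windows": "Z:\\",
}


def resolve(logical_path, system=None):
    """Map a repository-relative path to the local path on the given platform."""
    system = system or platform.system()
    if system not in MOUNT_ROOTS:
        raise ValueError(f"no mount root configured for {system!r}")

    logical = PurePosixPath(logical_path)
    # Reject paths that could point outside the repository.
    if logical.is_absolute() or ".." in logical.parts:
        raise ValueError("logical paths must be relative and stay inside the repository")

    path_cls = PureWindowsPath if system == "Windows" else PurePosixPath
    return str(path_cls(MOUNT_ROOTS[system], *logical.parts))
```

With this in place, processing code would only ever contain a call such as resolve("projectA/scans/run1.las"), so moving a script between a Linux compute server and a Windows laptop would not require editing paths.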
B. A barebones dashboard for the data repository (1 developer, 5-10 days including validation and testing)
B1. View basic stats on all data (where it is stored, how to access it, last edited, size of data, basic notes, etc.)
B2. Accessible on the network using user login details (if they are an allowed user)
B3. Perform storage medium switches, e.g. copy/move data in the back end between SSD and HDD (for speed boosts on short-term projects). This should be seamless to all users, who continue to access the data using the same path.
B4. Ability to change group owner of the different mount points
B5. Ability to see whether data is being uploaded to AWS, and to tag data for upload to AWS (currently done using a cron job and bash).
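The storage medium switch in B3 could be sketched as follows, assuming the path users see is a symlink that points at the real location on SSD or HDD. That symlink layout is an assumption for illustration and is not part of the current setup, and the function name switch_tier is hypothetical. The sketch copies the data to the new medium, then swaps the link with a rename so the user-facing path never disappears.

```python
import os
import shutil
from pathlib import Path


def switch_tier(public_path, target_root, remove_old=True):
    """Copy the data behind the symlink `public_path` into `target_root`, then repoint the link."""
    public_path = Path(public_path)
    if not public_path.is_symlink():
        raise ValueError(f"{public_path} must be a symlink to the current storage location")

    current = public_path.resolve()
    destination = Path(target_root).resolve() / current.name
    if destination.exists():
        raise FileExistsError(destination)

    # Copy first, so the old location stays valid until the new one is complete.
    shutil.copytree(current, destination, symlinks=True)

    # Create the new link under a temporary name and rename it over the old link.
    # On POSIX file systems the rename is atomic, so the path always resolves.
    tmp_link = public_path.with_name(public_path.name + ".tmp")
    os.symlink(destination, tmp_link, target_is_directory=True)
    os.replace(tmp_link, public_path)

    if remove_old:
        shutil.rmtree(current)
    return destination
```

A dashboard action could call, for example, switch_tier("/mnt/data/projectA", "/ssd") where both paths are placeholders. The sketch does not handle files that are written during the copy or that are still open at the old location, so a real implementation would need to pause writes or verify the copy before removing the old data.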