We need a client-side tool/GUI that allows users to upload data into our web-based system. We are looking for bidders with expertise in data science. Our desired candidate should have previous database creation and management experience, a creative approach to problems that arise, strong communication skills, and long-term foresight. A proof of concept, along with hands-on use of the data aggregation tool, may be required so that the candidate can demonstrate these skills. If the relationship is successful, we would be interested in extending it over the longer term. A significant part of this project will be understanding what the existing system does when it reads the existing data sets, and then building the import tool so that the end product meets the requirements set for those data sets.
Over the long term, we want to build the data import feature out to be fully automated: crawling websites for data updates, bringing them into the data aggregation tool automatically, formatting them appropriately, and analyzing the data for anomalies. Additionally, the feature would translate data headers (such as country names) into one uniform moniker. For example, the tool would convert any variant of the United States (US, USA, United States of America) into a single identifier.
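To make the header translation requirement concrete, here is a minimal sketch in Python of one possible approach: a lookup table keyed on a normalized form of each name. The alias table, the identifiers ("USA", "GBR"), and the function names are illustrative assumptions, not part of the existing system; a production version would need a far larger table or an external country-code source.

```python
import re

# Hypothetical alias table (an assumption for illustration only).
# Keys are the uniform identifiers; values are the variants seen in raw data.
COUNTRY_ALIASES = {
    "USA": ["US", "USA", "U.S.", "U.S.A.", "United States",
            "United States of America"],
    "GBR": ["UK", "U.K.", "United Kingdom", "Great Britain"],
}


def _key(name):
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


# Reverse index: normalized alias -> uniform identifier.
_LOOKUP = {
    _key(alias): code
    for code, aliases in COUNTRY_ALIASES.items()
    for alias in aliases
}


def normalize_country(name):
    """Return the uniform identifier for a country name, or None if unknown."""
    return _LOOKUP.get(_key(name))
```

Returning None for unknown names, rather than guessing, lets the import tool flag unrecognized headers for manual review.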
About the tool
The current data aggregation system is written in Python using the Django framework. At the backend it relies on MongoDB and MySQL. The tool also uses several third-party libraries, including jsonschema, xlrd, XlsxWriter, and xlwt. The tool aims to build a crowd-sourced data analytics platform that serves as a collaboration tool for sharing, cleansing, and augmenting data sets among an online research community, supporting domain-specific studies with data science methodologies including exploration, visualization, and quantitative analytics.
The system has a Django-based backend API that links to the repository and generates JSON data sets for reading into the web API. These data sets were loaded into the repository at the beginning of the design process. From these initial data sets, the backend API generates JSON-encoded data sets, which are fetched (using .getJSON), rendered into HTML, and populated into the user interface. The user interface displays a list of matrices whose content can be downloaded as JSON conforming to a specific parsing format.
In its current state, the system can display and export data in JSON format for the set of existing data sets. However, it does not have the tools to replicate the process of uploading these data sets, or to upload any new ones. The developer's task is therefore to build a tool that reads raw data in .xls format, converts it into the current JSON format, and finally produces a form compatible with the DataGator database.
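As a rough illustration of the .xls to JSON step, the following Python sketch reads the first sheet of a workbook with xlrd (one of the libraries the tool already uses) and serializes it. The output layout (the "columnHeaders" and "rows" keys) and all function names are assumptions made for this example; the real target is the parsing format described in the attached documents, which this sketch does not attempt to reproduce.

```python
import json


def rows_to_matrix(rows):
    """Split a header row plus data rows into a dict (hypothetical layout)."""
    if not rows:
        raise ValueError("sheet is empty")
    header = [str(cell).strip() for cell in rows[0]]
    body = []
    # Spreadsheet rows are 1-based and row 1 is the header, so data starts at 2.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            raise ValueError(
                "row %d has %d cells, expected %d"
                % (number, len(row), len(header))
            )
        body.append(list(row))
    return {"columnHeaders": header, "rows": body}


def read_xls_rows(path, sheet_index=0):
    """Read every row of one sheet as a list of cell values."""
    import xlrd  # third-party; imported here so the rest runs without it

    book = xlrd.open_workbook(path)
    sheet = book.sheet_by_index(sheet_index)
    return [sheet.row_values(i) for i in range(sheet.nrows)]


def xls_to_json(path):
    """Convert the first sheet of an .xls file to a JSON string."""
    return json.dumps(rows_to_matrix(read_xls_rows(path)), indent=2)
```

Keeping the row-to-matrix step separate from file reading means the conversion logic can be tested against the existing data sets without touching any spreadsheet files, and a schema check (for example with jsonschema) could be added on the resulting dict before it is stored.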
We have attached some supporting and background documents. In addition, we have attached Python and other code that was originally written to import data but was never put into use. Please familiarize yourself with all attached documents and respond with any questions or suggestions you may have.