
Data comparison in batch

$250-750 USD

Closed
Posted about 10 years ago

Paid on delivery
Develop a mechanism and software to identify similar content in a large collection of articles. The input is a CSV flat file. The output should indicate which entries are similar, with an indicator of "similarity strength". The programming language is flexible as long as it delivers the result. If you are interested, please send me a message and let me know how you would like to start. I can provide examples and our detailed requirements.
Project ID: 5489126

About the project

27 proposals
Remote project
Active 10 yrs ago

27 freelancers are bidding on average $527 USD for this job
Dear sir, I am really interested in development of this project, I have strong programming skills in several languages, so I have many options to develop this application. Thanks and regards, Yasser
$250 USD in 2 days
5.0 (132 reviews)
7.1
Hello. I'm interested in your project since I have experience in searching for similar content in databases. Please provide detailed requirements. Thanks.
$600 USD in 20 days
5.0 (33 reviews)
6.1
I can make this as a Python script.
$450 USD in 10 days
4.8 (79 reviews)
6.2
This has a lot to do with my Master's thesis, which was in the field of Artificial Intelligence applied to Linguistics. If the articles are in English, my first idea would be to do part-of-speech tagging and then compare only the sets of words that are relevant to your purpose, for example proper nouns. After that, something like a modified version of the Lesk algorithm might show good results. Would you rather this be something that you can run from a server, like a PHP script, or a Windows program? How many articles are there? Can you give me some examples of entries that you consider similar and entries that you don't consider similar? The similarity measure is a subjective function.
$600 USD in 10 days
5.0 (45 reviews)
5.7
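The first step of the proposal above (compare only a selected word class, such as proper nouns) could be sketched roughly as follows. This is a minimal illustration, not the bidder's implementation: a real version would use a part-of-speech tagger, whereas this sketch substitutes a crude capitalization heuristic as an explicit assumption, and it does not attempt the Lesk step. The function names are hypothetical.

```python
import re


def proper_noun_candidates(text):
    """Crude stand-in for POS tagging (assumption): capitalized words
    that do not start a sentence are treated as proper nouns."""
    candidates = set()
    # Split into sentences on '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
        # Skip the first word: it is capitalized regardless of word class.
        for word in words[1:]:
            if word[0].isupper():
                candidates.add(word.lower())
    return candidates


def jaccard(a, b):
    """Jaccard overlap of two sets: shared items / all items (0.0 if both empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def proper_noun_similarity(text_a, text_b):
    """Similarity strength in [0, 1] based only on proper-noun candidates."""
    return jaccard(proper_noun_candidates(text_a), proper_noun_candidates(text_b))
```

The heuristic misses proper nouns at the start of a sentence and misfires on capitalized common words, which is one reason a real tagger would be preferable.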
I might be able to do this project using locality-sensitive hashing or compressed sensing methods, depending on the details of your dataset. Please send me a few examples and I'll let you know if this is possible.
$333 USD in 10 days
5.0 (43 reviews)
5.9
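For readers unfamiliar with locality-sensitive hashing, one common variant (MinHash signatures with banding) can be sketched as below. This is an assumption about what such a solution might look like, not the bidder's design; the shingle size, number of hash functions, and band count are illustrative defaults, and compressed sensing is not covered.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Set of k-word shingles (overlapping word windows) of a text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _seeded_hash(seed, item):
    """Deterministic 64-bit hash of an item under a given seed."""
    digest = hashlib.md5(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items, num_hashes=64):
    """One minimum per seeded hash function. Assumes a non-empty item set."""
    return [min(_seeded_hash(seed, item) for item in items)
            for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def candidate_pairs(signatures, bands=16):
    """LSH banding: documents that share any identical band become candidates."""
    if not signatures:
        return set()
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for b in range(bands):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

Only the candidate pairs need a full comparison afterwards, which is what makes the approach attractive for a large article base; `estimated_jaccard` can serve as a rough "similarity strength".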
Hey there, thank you for posting the project overview. It looks very feasible and I am interested in doing it. Next steps: let's discuss the requirements and the data input/output in more detail, and I will start the work accordingly. I am an Excel/Access VBA automation professional (Data Analyst) with more than 5 years of experience in this domain. Please consider and contact me for further discussion. I am available online to answer any further queries. Thanks, Abhinav
$283 USD in 5 days
5.0 (53 reviews)
5.7
Hi, I have more than 14 years of experience and I am an expert in this kind of work. I have completed more than 200 projects. Please look at the feedback left by my employers to know more about my work. Waiting for your positive response. Thanks.
$277 USD in 10 days
4.9 (89 reviews)
5.9
I would like to give this a try. Please send me details on "similarity strength" and some sample input data. I will try to come up with a prototype in 2-3 days.
$277 USD in 5 days
4.7 (21 reviews)
5.7
Hi, I've completed many projects before, but I'm not sure what you need here. Please provide some examples and your requirements so that I can understand them clearly. Thanks, Zhining
$444 USD in 10 days
5.0 (26 reviews)
4.3
I have carefully read and understood your project requirements. I have 6+ years of total experience, including 2+ years as a Team Lead. I am responsible for managing teams and writing frameworks and scripts in Python. I have recently completed several projects in Python on oDesk, Elance and Freelancer with excellent (5 star) ratings. I am in the top 10% (11th rank) among Python test takers at oDesk, Elance and Freelancer. I assure you of accurate and on-time delivery of high-quality work. Please see my profile and portfolio. I assure you I am the one you are looking for as a Python developer. Looking forward to working with you. Thanks, Vikas
$250 USD in 7 days
4.6 (5 reviews)
4.0
Hello, I have good experience with Python and Ruby and I would like to know more about this project. Can you please send me some more details? Thanks, Vinod
$555 USD in 30 days
5.0 (6 reviews)
3.8
Greetings. You have an interesting project, and I propose using Perl to develop a web-based data comparison program. I am ready to help you and solve your task on time and within budget.
$530 USD in 25 days
4.8 (6 reviews)
3.6
Hello, I am interested in working with you on this project. I would choose C++ for this project as it is quite a fast programming language. I would need more details about the "similarity strength" and what exactly you would expect it to be. I hope we can have a good experience working on this project. Respectfully, Grig
$388 USD in 10 days
5.0 (9 reviews)
3.6
I have checked this requirement and have some questions, so we need to discuss it. Please tell me how we can start the discussion. To know more about us, please check the private message. We have a team of professionals with more than 11 years of experience, so we can manage this work and will give you a quality solution.
$600 USD in 17 days
2.5 (37 reviews)
6.8
Hi, I have been developing in Python for over 10 years and have experience in natural language processing. To accomplish this, I intend to create a statistical model based on the word distribution of your articles and use that as a comparative metric. This will remove the need to do a pairwise comparison of every file (which is impossible for a large database) and will be rather quick. If you have any questions or wish to see some of my work, don't hesitate to ask. Thank you, Chris
$555 USD in 4 days
4.6 (2 reviews)
3.6
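The proposal above does not say which statistical model is meant. As one possible reading only, corpus-wide word statistics can be used to avoid comparing every article with every other: index the articles by words that occur in just a few of them, and treat articles that share such a word as candidates. The function names and the `max_doc_freq` cutoff below are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def tokenize(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())


def candidate_pairs(docs, max_doc_freq=3):
    """docs maps article id -> text. Returns pairs of ids that share at
    least one word occurring in 2..max_doc_freq articles."""
    doc_words = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    # Corpus-level statistic: in how many articles does each word occur?
    doc_freq = Counter(word for words in doc_words.values() for word in words)
    # Inverted index restricted to rare but shared words.
    index = defaultdict(set)
    for doc_id, words in doc_words.items():
        for word in words:
            if 2 <= doc_freq[word] <= max_doc_freq:
                index[word].add(doc_id)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

Very common words never enter the index, so the number of candidate pairs stays far below the full pairwise count as long as the cutoff is small.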
I have a master's degree in applied mathematics and have worked with these kinds of problems previously. I will probably use Python or Ruby for the data processing.
$888 USD in 30 days
5.0 (2 reviews)
2.3
I would use Python 3 to do this. We can test it first. It runs on Windows or Linux, and it can also have a GUI to display the results.
$333 USD in 7 days
0.0 (0 reviews)
0.0
Hello, I implemented a similar project a few years ago, but it was a bit more complex. It processed thousands of text documents on a cluster of distributed computers, so I have solid experience with information retrieval techniques. Based on your description, I would first clean up each document automatically (remove punctuation etc.) and then extract the plain words. These words are combined into n-grams (for n = 1, ..., m, with a user-defined m) and weighted with "term frequency - inverse document frequency" (TF-IDF); finally, the documents are compared using cosine similarity. This produces a score from 0 (not similar) to 1 (equal). By thresholding the score, it is possible to identify all documents DS which are similar to a document D. And of course it is possible to identify the k most similar documents. I would implement it with Python and the scikit-learn package (BSD license). If you have any questions, do not hesitate to send me a message. Sincerely, Sebastian
$500 USD in 3 days
0.0 (0 reviews)
0.0
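The pipeline described in the proposal above (cleanup, n-grams, TF-IDF weighting, cosine similarity, thresholding) could look roughly like this. The sketch uses only the Python standard library rather than scikit-learn so that it runs without extra packages; the smoothed IDF formula, the default threshold, and all function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def ngrams(text, m=2):
    """Lowercase, drop punctuation, and return all word n-grams for n = 1..m."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(words[i:i + n])
            for n in range(1, m + 1)
            for i in range(len(words) - n + 1)]


def tfidf_vectors(docs, m=2):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    term_counts = [Counter(ngrams(doc, m)) for doc in docs]
    doc_freq = Counter(term for counts in term_counts for term in counts)
    n_docs = len(docs)
    vectors = []
    for counts in term_counts:
        # Smoothed IDF (assumption): log((1 + N) / (1 + df)) + 1.
        vectors.append({
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, tf in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similar_pairs(docs, threshold=0.5, m=2):
    """All (i, j, score) with i < j whose cosine score reaches the threshold."""
    vectors = tfidf_vectors(docs, m)
    result = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            score = cosine(vectors[i], vectors[j])
            if score >= threshold:
                result.append((i, j, score))
    return result
```

The double loop compares every pair of documents, which is fine for illustration but scales quadratically; for a very large article base it would need to be combined with some candidate-selection step.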
A proposal has not yet been provided
$444 USD in 10 days
0.0 (0 reviews)
0.0
Hello, I would definitely build it with Python. You could consider putting all the data in a relational database (e.g. SQLite) to speed up search queries, repeated queries, data analysis etc., because CSV files are not convenient for that.
$555 USD in 3 days
0.0 (0 reviews)
0.0
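Loading the CSV input into SQLite, as suggested in the proposal above, might be sketched as follows using Python's built-in `csv` and `sqlite3` modules. The column names `id` and `text` and the table name `articles` are assumptions, since the actual CSV layout was not published.

```python
import csv
import sqlite3


def load_csv_into_sqlite(csv_file, conn):
    """Load rows from an open CSV file with 'id' and 'text' columns
    (assumed names) into an 'articles' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles (id TEXT PRIMARY KEY, text TEXT)"
    )
    reader = csv.DictReader(csv_file)
    # Each row is a dict, so the named placeholders pick out the two columns.
    conn.executemany(
        "INSERT OR REPLACE INTO articles (id, text) VALUES (:id, :text)",
        reader,
    )
    conn.commit()


def fetch_article(conn, article_id):
    """Return the text of one article, or None if the id is unknown."""
    row = conn.execute(
        "SELECT text FROM articles WHERE id = ?", (article_id,)
    ).fetchone()
    return row[0] if row else None
```

A typical call would open the file with `open(path, newline="", encoding="utf-8")` and pass a connection from `sqlite3.connect(...)`; once the data is in a table, repeated lookups no longer require re-parsing the CSV.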

About the client

Flag of HONG KONG
Hong Kong, Hong Kong
5.0
35
Payment method verified
Member since Feb 23, 2010

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759) & Freelancer Online India Private Limited (CIN U93000HR2011FTC043854)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)