Resumé/CV and Job Parsing

CVs come in all sorts of variations, nearly all of them in unstructured text. So how do you make sense of them and get a clean, structured view of your data? This is where a parser comes in. A parser converts all the textual content of the CV into nicely formatted structured fields which can then be indexed and searched.

A parser has a very difficult job to do. It needs to scan through hundreds or thousands of lines of text written by hundreds or thousands of people and turn those lines into a standard format understandable by databases or search engines.

A parser must be accurate and intelligent. Simply scanning a document for known skills or roles doesn’t cut it. If it did then “Steven Carpenter” would always be offered woodwork roles.
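To see why a plain keyword scan falls short, here is a minimal Python sketch. The role list and the function name `naive_role_scan` are hypothetical illustrations and are not part of the reshufl parser.

```python
import re

# Hypothetical list of known roles; a real taxonomy would be far larger.
KNOWN_ROLES = {"carpenter", "baker", "developer"}

def naive_role_scan(text: str) -> set[str]:
    """Return every known role word found anywhere in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w in KNOWN_ROLES}

cv = "Steven Carpenter\nSenior software developer with 10 years of experience"
roles = naive_role_scan(cv)
print(roles)
```

The scan reports "carpenter" alongside "developer" because it has no way of knowing that the first line is the candidate's name.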

A parser needs, as much as possible, to act like an intelligent human being, who can skim through the text picking out the relevant data and determining what it means. The more intelligent the parser, the more accurate the data obtained.

A parser should be able to:

  • Pick out roles, skills, dates, companies, personal data and educational achievements

  • Disambiguate similar skills and roles. A “project manager” in Information Technology is very different from a “project manager” on a construction site

  • In a vacancy or job order, differentiate between “must have” and “nice to have” roles and skills

  • Identify the most relevant skills: “10 years java development” is usually more important than “hard working” (although you may want both!)

  • Ignore noise: many words are ambiguous and can be misinterpreted, e.g. “closing” might refer to closing deals or sales, but can just as easily mean closing the shop every night
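As a rough illustration of the “must have” versus “nice to have” distinction in the list above, the sketch below labels requirement lines by cue phrase. The cue phrases and the function name `classify_requirement` are assumptions made for this example; real job adverts vary far more, and this is not how the reshufl parser is implemented.

```python
import re

# Hypothetical cue phrases, chosen only for illustration.
MUST_HAVE_CUES = re.compile(r"\b(must have|required|essential)\b", re.IGNORECASE)
NICE_TO_HAVE_CUES = re.compile(
    r"\b(nice to have|desirable|preferred|a plus)\b", re.IGNORECASE
)

def classify_requirement(line: str) -> str:
    """Label a requirement line as 'must have', 'nice to have' or 'unspecified'."""
    # Check the softer cues first so "not required but preferred" is not
    # mistaken for a hard requirement.
    if NICE_TO_HAVE_CUES.search(line):
        return "nice to have"
    if MUST_HAVE_CUES.search(line):
        return "must have"
    return "unspecified"

labels = [
    classify_requirement("Must have 5 years of Java"),
    classify_requirement("Kubernetes experience is desirable"),
    classify_requirement("Team player"),
]
```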

A bad or inaccurate parser can give a totally distorted view of your CVs or jobs. Your data is all-important, and a good parser is crucial to help you understand your candidate base and unleash the potential of your talent pools.

Why use the reshufl parser?

The reshufl solution has three main strands, which are combined uniquely to provide unparalleled accuracy: semantic analysis, term disambiguation and a rich taxonomy.

Semantic Analysis


Our semantic analysis engine allows us to break down each Resumé into its component sections and subsections and really understand how each term is used.


Each Resumé section is parsed independently to look for relevant terms. So, for example, we don’t confuse skills from the hobbies section, such as football, with skills from the candidate’s employment section.
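A minimal sketch of section-by-section parsing, assuming sections are introduced by a heading on its own line. The heading names, skill list and function names are hypothetical and only illustrate the idea; they do not describe the reshufl engine itself.

```python
import re

# Hypothetical section headings and skill list, for illustration only.
SECTION_HEADINGS = {"employment", "education", "hobbies"}
KNOWN_SKILLS = {"football", "java", "sql"}

def split_sections(cv_text: str) -> dict[str, list[str]]:
    """Group CV lines under the most recent recognised heading."""
    sections: dict[str, list[str]] = {}
    current = "preamble"  # lines that appear before any heading
    for line in cv_text.splitlines():
        heading = line.strip().lower().rstrip(":")
        if heading in SECTION_HEADINGS:
            current = heading
            continue
        sections.setdefault(current, []).append(line)
    return sections

def skills_by_section(cv_text: str) -> dict[str, set[str]]:
    """Find known skills separately within each section."""
    result = {}
    for name, lines in split_sections(cv_text).items():
        words = set(re.findall(r"[a-z]+", " ".join(lines).lower()))
        result[name] = words & KNOWN_SKILLS
    return result

cv = """Employment
Java developer, built SQL reporting tools
Hobbies
Football and hiking"""
found = skills_by_section(cv)
```

Because each section is searched on its own, football is recorded only under hobbies, while java and sql are recorded only under employment.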


Our parser will also identify patterns that allow us to distinguish candidates who are CEOs from those who “work for the CEO”, vastly improving the parser’s accuracy and avoiding false positives when searching CVs or jobs.
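One simple way to picture this kind of pattern is with regular expressions. The sketch below is an assumption-laden illustration: the phrases in `REPORTING_PATTERN` and the function name `holds_ceo_role` are invented for the example, and a production parser would need far broader coverage.

```python
import re

# Hypothetical phrases in which "CEO" refers to someone other than the candidate.
REPORTING_PATTERN = re.compile(
    r"\b(?:work(?:s|ed|ing)?\s+for|report(?:s|ed|ing)?\s+to|assistant\s+to)"
    r"\s+the\s+CEO\b",
    re.IGNORECASE,
)
CEO_PATTERN = re.compile(r"\bCEO\b", re.IGNORECASE)

def holds_ceo_role(line: str) -> bool:
    """True if the line mentions CEO outside a 'works for the CEO' style phrase."""
    # Remove the reporting phrases first, then look for any remaining mention.
    remainder = REPORTING_PATTERN.sub("", line)
    return bool(CEO_PATTERN.search(remainder))

is_ceo = holds_ceo_role("CEO of a logistics company, 2015 to 2020")
is_assistant = holds_ceo_role("Worked for the CEO as executive assistant")
```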

Term Disambiguation


Term disambiguation helps us to distinguish between ‘java programming’ and ‘java coffee’. By understanding the meaning of words and phrases in context, we can differentiate and avoid false positives.
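As a toy illustration of disambiguation by context, the sketch below picks a sense for “java” by counting nearby cue words. The cue lists and the function name `disambiguate_java` are assumptions for this example; the actual reshufl approach is not described here and is certainly richer than two small word lists.

```python
import re

# Hypothetical context cues for each sense of "java".
SENSE_CUES = {
    "programming language": {"programming", "developer", "spring", "code", "software"},
    "coffee": {"coffee", "barista", "espresso", "roast", "cafe"},
}

def disambiguate_java(sentence: str) -> str:
    """Pick the sense whose cue words overlap most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    # With no supporting context at all, refuse to guess.
    return best if scores[best] > 0 else "unknown"

tech = disambiguate_java("Five years of java programming on software projects")
cafe = disambiguate_java("Barista serving java coffee and espresso")
```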


Another example of our role and skill disambiguation is that we correctly identify “project manager” in Information Technology as different from “project manager” in construction.


Each role within the CV is allocated one or more employment sectors or sub-sectors (e.g. IT, Finance, Cryptography), so each role is treated independently when the candidate has had a varied career.

Managed Taxonomy


Our taxonomy is so big that we don’t actually know exactly how many terms we have. We do know that we can deal with over 2 billion variations. The taxonomy is a very rich mix of job roles, skills, synonyms, plurals, skill roles, role clusters, modifiers, industry sectors, names, companies, locations, educational courses and educational establishments.


Each role and skill has multiple facets, including importance and confidence, and can also include a valuation (in global currency terms). Values are obtained from real jobs in multiple locations.


All roles are also classified by SOC (Standard Occupational Classification) codes. These are used for governmental reporting and statistics throughout Europe and the USA.

These three strands are combined using the latest platform-independent software. Our leading-edge, non-blocking code parses each CV or job with the benefit of a taxonomy covering over 2 billion term variations and intelligent semantic analysis, and returns super-fast, extremely accurate results.

What does the reshufl parser return?

Personal Information

Candidate name, location(s) (including lat/long), email addresses and telephone numbers

Hobbies and References

Distinguishing between fishing or sailing as a hobby and as a job is pretty crucial when it comes to understanding what makes your candidates tick. If you end up emailing fishing jobs to your IT candidates because they enjoy fishing on the weekend, they are not going to be happy!

It’s critical not to confuse the names of referees with the candidate’s name, or you can end up emailing the wrong person!

Employment History

Roles, skills, dates and companies, along with employment sectors and sub-sectors for individual roles

We can accurately identify which roles and skills candidates have been using, for how long and for whom, so that you can build a clear profile of your candidate.

Owing to our rich taxonomy of recruitment-related terms, we know that programmer, software developer, software programmer and IT programmer (or several thousand other combinations) essentially mean the same thing, so time spent in these roles in separate jobs can be accumulated. This is crucial to accurately matching and reporting on your database of candidates.
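The accumulation of experience across equivalent job titles can be sketched as follows. The synonym table, the function name `total_experience` and the sample job history are all hypothetical; a managed taxonomy would hold vastly more variations than this four-entry dictionary.

```python
from collections import defaultdict

# Hypothetical synonym table mapping job titles to one canonical role.
CANONICAL_ROLE = {
    "programmer": "software developer",
    "software developer": "software developer",
    "software programmer": "software developer",
    "it programmer": "software developer",
}

def total_experience(jobs: list[tuple[str, int]]) -> dict[str, int]:
    """Sum months of experience per canonical role.

    Titles missing from the table are kept under their own lowercased name.
    """
    totals: dict[str, int] = defaultdict(int)
    for title, months in jobs:
        key = title.strip().lower()
        totals[CANONICAL_ROLE.get(key, key)] += months
    return dict(totals)

# Invented job history: (title, months in role).
history = [("Programmer", 24), ("IT Programmer", 18), ("Software Developer", 36)]
totals = total_experience(history)
```

In this example the three differently titled jobs add up to 78 months under the single role “software developer”.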

Identifying the sector for each of the candidate’s roles also allows us to accurately determine how much time candidates have spent in their primary and secondary sectors, which gives a clearer picture of the candidate’s experience.

Educational Information

Education levels, attainments and educational establishments

A clear understanding of a candidate’s educational history has many benefits.

Firstly, jobs and careers can be tailored to the appropriate level of qualification.

Secondly, skills learnt at school or university should not be confused with skills learnt through work, e.g. studying “statistics” at school does not necessarily mean you have worked as a statistician.

Through our rich taxonomy of educational establishments, we recognise that candidates who list “University of Manchester” or “Manchester University” actually went to the same university, which becomes critical when trying to produce accurate reports.
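A minimal sketch of this kind of name normalisation, assuming a simple alias table. The table contents beyond the two names mentioned above and the function name `canonical_establishment` are hypothetical.

```python
# Hypothetical alias table; a managed taxonomy would hold many more entries.
ESTABLISHMENT_ALIASES = {
    "university of manchester": "University of Manchester",
    "manchester university": "University of Manchester",
}

def canonical_establishment(name: str) -> str:
    """Map a known alias to its canonical name; return unknown names unchanged."""
    # Lowercase and collapse repeated whitespace before the lookup.
    key = " ".join(name.lower().split())
    return ESTABLISHMENT_ALIASES.get(key, name)

first = canonical_establishment("Manchester University")
second = canonical_establishment("University of  Manchester")
```

Both spellings resolve to the same canonical name, so a report grouped by establishment counts these candidates together.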

All roles and skills are returned along with dates, length of service, freshness and associated company. Soft skills are also identified. Uniquely, roles and skills can be returned with valuations, so you can differentiate between the most important and lesser skills and roles. The valuations are in global currency terms and can vary according to the location of the candidate. If you are using the search and match solution, the parser can also return a document ready for seamless insertion into your search engine.

Why Choose Us?

Put simply, we are the most accurate parser available on the market, way ahead of all our rivals. But don’t take our word for it: compare us with the others now. Our demo is free to use and without obligation.

We don’t make ridiculous claims about 99.9% accuracy or 1000 CVs per second processing per CPU.

Our parser is pretty fast but, more importantly, it is linearly scalable. That means the more hardware you provide, the more throughput you get.

We don’t say that our parser will replace the recruiter. All jobs are different, as are all candidates, and a successful match between the two will almost always come down to factors other than those on a CV. However, the parser will dramatically cut down the time taken to shortlist candidates and provide the valuable insights into your data that you need.

We are also agile enough to grow with your business in ways that some of the incumbent players cannot. If you have special or bespoke requirements, we can move quickly to accommodate you. Please see our strategic partnerships page for more information, or get in touch with us.