{"id":160,"date":"2012-10-04T19:32:00","date_gmt":"2012-10-04T18:32:00","guid":{"rendered":"http:\/\/blog.espol.edu.ec\/xallam\/?p=160"},"modified":"2012-10-04T19:37:28","modified_gmt":"2012-10-04T18:37:28","slug":"finding-names-on-a-raw-text","status":"publish","type":"post","link":"https:\/\/blog.espol.edu.ec\/xallam\/2012\/10\/04\/finding-names-on-a-raw-text\/","title":{"rendered":"Finding names in raw text"},"content":{"rendered":"<p>Sometimes it is difficult to find names in a text. Maybe the most na\u00efve way is to take all the words that start with a capital letter, and that's it! But if you check this very paragraph, you will find \"names\" like \"Maybe\" or \"But\". Fortunately, there are smarter ideas, like <a title=\"Find names with Regex\" href=\"http:\/\/stackoverflow.com\/questions\/7653942\/find-names-with-regular-expression\" target=\"_blank\">this one<\/a>, which uses a regular expression built on a few rules:<\/p>\n<ul>\n<li>A name is made of at least two words, each starting with a capital letter (an initial such as \"J.\" also counts as a word).<\/li>\n<li>A name can have more than two words, like \"James Van de Putte\", including up to two lowercase particles such as \"de\" before the last word.<\/li>\n<li>The words are separated by whitespace.<\/li>\n<li>...and so on.<\/li>\n<\/ul>\n<p>This is the final regular expression used to extract names (that is, multi-word names) from a text:<\/p>\n<pre><code>[A-Z]([a-z]+|\\.)(?:\\s+[A-Z]([a-z]+|\\.))*(?:\\s+[a-z][a-z\\-]+){0,2}\\s+[A-Z]([a-z]+|\\.)<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Sometimes it is difficult to find names in a text. Maybe the most na\u00efve way is to take all the words that start with a capital letter, and that's it! 
But if you check this very paragraph, you will find &hellip; <a href=\"https:\/\/blog.espol.edu.ec\/xallam\/2012\/10\/04\/finding-names-on-a-raw-text\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":16,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[1],"tags":[42698,144566,146857],"class_list":["post-160","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-extract","tag-name","tag-regex"],"_links":{"self":[{"href":"https:\/\/blog.espol.edu.ec\/xallam\/wp-json\/wp\/v2\/posts\/160","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.espol.edu.ec\/xallam\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.espol.edu.ec\/xallam\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.espol.edu.ec\/xallam\/wp-json\/wp\/v2\/users\/16"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.espol.edu.ec\/xallam\/wp-json\/wp\/v2\/comments?post=160"}],"version-history":[{"count":4,"href":"https:\/\/blog.espol.edu.ec\/xallam\/wp-json\/wp\/v2\/posts\/160\/revisions"}],"predecessor-version":[{"id":165,"href":"https:\/\/blog.espol.edu.ec\/xallam\/wp-json\/wp\/v2\/posts\/160\/revisions\/165"}],"wp:attachment":[{"href":"https:\/\/blog.espol.edu.ec\/xallam\/wp-json\/wp\/v2\/media?parent=160"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.espol.edu.ec\/xallam\/wp-json\/wp\/v2\/categories?post=160"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.espol.edu.ec\/xallam\/wp-json\/wp\/v2\/tags?post=160"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
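The post's regex can be tried out with Python's `re` module. The sketch below is an illustration, not code from the original post: the choice of Python and the names `NAME_PATTERN` and `find_names` are assumptions made here. It splits the pattern into commented pieces so each rule from the list maps to one part of the regex.

```python
import re

# The regex from the post, written as raw strings so the backslashes reach the
# regex engine unchanged. The pieces are concatenated into one pattern.
# Note that ([a-z]+|\.) is a *capturing* group (it appears three times).
NAME_PATTERN = re.compile(
    r"[A-Z]([a-z]+|\.)"            # first capitalized word or initial, e.g. "James" or "J."
    r"(?:\s+[A-Z]([a-z]+|\.))*"    # any further capitalized words or initials
    r"(?:\s+[a-z][a-z\-]+){0,2}"   # up to two lowercase particles, e.g. "de", "van"
    r"\s+[A-Z]([a-z]+|\.)"         # final capitalized word or initial
)


def find_names(text: str) -> list[str]:
    """Return every substring of text that matches NAME_PATTERN (hypothetical helper)."""
    # re.findall would return tuples of the capture groups instead of the whole
    # match, because the pattern contains groups; finditer + group(0) avoids that.
    return [m.group(0) for m in NAME_PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = "Yesterday I met James Van de Putte and Ana Lopez at the office."
    print(find_names(sample))  # ['James Van de Putte', 'Ana Lopez']
```

The rules remain a heuristic. For example, a capitalized word at the start of a sentence is still absorbed when a name follows it directly, so "Yesterday John Smith said hi." yields the single match "Yesterday John Smith".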