It might be a long shot, but are you interested in named entities or natural language processing? Named entities are more important than you think in identifying what something is about and why it matters. Want to learn a lot more about them in under 14 minutes? Have a look at this presentation I gave and see how you can supercharge your ability to find and use relevant digital assets. - Joseph Busch #namedentities #naturallanguageprocessing #digitalassets #taxonomy
Transcript
There's a lot of software available that does a good job of identifying named entities: the names of people, organizations, events, places, and other things that occur in text. Too often these are just used as keywords to find digital assets without any further differentiation. In this talk, I will tell you how to refine, or supercharge, those named entities so you can tag content assets more specifically and then be able to find and use them more effectively. First I'll talk about what Named Entity Recognition, or NER, is. Then I'll present some case studies, and finally I'll identify some tools and resources.

Named Entity Recognition, or NER, is a stack of methods for programmatically identifying named entities mentioned in unstructured text, classifying them into predefined categories such as people, organizations, locations, and events, and identifying patterns such as dates, time expressions, quantities, monetary values, percentages, codes, and so on. Here's a snippet where person, location, organization, event, date, and other named entities have been identified in the text.

Named entity recognition is based on a stack of natural language processing, or NLP, methods that parse sentences, which are strings of words separated by white spaces and in some cases punctuation, into meaningful expressions that can be acted upon, for example to infer the intent of a search query. Identifying the part of speech, normalizing the terms, and analyzing the phrase structure is usually sufficient to identify the type of a named entity. Most noun phrases contain named entities. They may be verified by lookup in a lexicon, glossary, authority file, or other resource such as Wikidata. For example, the longitude and latitude of Cotonou and other related information can be retrieved from Wikidata. Patterns such as numbers and dates can be identified, normalized, and further contextualized, for example by inferring that 2.4 million is a quantity for the population of Cotonou.

Natural language processing doesn't just identify keywords, the strings in between the white spaces. It provides the capability to understand the context of the named entities and tell us what the content is really about. This requires understanding the grammatical relationships in the sentence. For example, does the string "Apple" refer to the fruit, the technology company, or the record company? It depends on whether it's related to the phrase "Granny Smith," "iPhone 14," or "The Beatles." What does the string "Cotonou" refer to? It's the French name of the largest city of Benin. What is the number 2.4 million? It's a quantity: the population of Cotonou. Where is Cotonou located on a map? What is its longitude and latitude?
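To make this concrete, here is a minimal sketch of an NER-plus-lookup step in Python, assuming the open-source spaCy library and its small English model are installed. spaCy and the sample sentence are my illustration here, not one of the tools demoed in this talk; the sketch extracts entities from a sentence about Cotonou and then checks the place name against Wikidata's public search API.

```python
# Minimal NER sketch using spaCy (an illustrative stand-in, not a tool from this talk).
# Setup: pip install spacy requests && python -m spacy download en_core_web_sm
import spacy
import requests

nlp = spacy.load("en_core_web_sm")

doc = nlp("Cotonou, the largest city of Benin, has a population of about 2.4 million.")

# Each recognized entity carries a predicted category: GPE (place), ORG, PERSON,
# CARDINAL (number), and so on.
for ent in doc.ents:
    print(ent.text, ent.label_)

# Verify and enrich a named entity by lookup in an external resource such as Wikidata.
resp = requests.get(
    "https://www.wikidata.org/w/api.php",
    params={"action": "wbsearchentities", "search": "Cotonou",
            "language": "en", "format": "json"},
    timeout=10,
)
for match in resp.json().get("search", []):
    print(match["id"], match.get("label"), "-", match.get("description", ""))
```

On a sentence like this, a small general-purpose model will typically tag Cotonou and Benin as places and 2.4 million as a number, and the Wikidata lookup returns the matching entity record (its Q-identifier and description), from which coordinates, population, and other related information can then be retrieved.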
How do you evaluate the accuracy of named entity extraction? A combination of techniques can be used to optimize recall and precision in named entity extraction. Ad hoc testing is used in the early stages of performance testing. This method uses iterative trial-and-error observation to obtain an initial level of NER performance. We use ad hoc testing with our clients to develop a proof of concept, to identify performance bottlenecks, to identify requirements, and to assess the client's comfort zone with the technology. Quality control scripting is a software engineering best practice in which a script consisting of test cases simulates, as much as possible, all NER cases. Quality assurance engineers develop the script based on the engineering specification and then test the product to ensure that it meets the specification. This technique is not feasible for gauging NER accuracy, but it is useful for testing systems-integration issues.

Random sampling is a statistically based approach that yields high confidence from random sampling of NER results. This is an established methodology adapted from the social sciences. A comparison of NER results against an existing collection of similar and correctly identified named entity results, such as the TREC (Text Retrieval Conference) test collection, can be used to validate NER performance. The drawback of this approach is that the NER requirements of the standard test collection might differ from the client's requirements, leading to misleading results. Creating a representative test collection of correctly identified named entities from the target source content, against which to test the NER application's performance, is the best possible approach for measuring the accuracy of the system as it relates to the customer's requirements for NER.

So how do you measure NER accuracy? You do it the same way you measure search accuracy: by recall and precision. Recall is the number of true positive results divided by the number of all samples that should have been identified as positive, while precision is the number of true positive results divided by the number of all positive results, including those not identified correctly. The test's accuracy is measured by the F1 score, which is calculated from the precision and recall of the test.
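As a worked example of these definitions (my own sketch; the entity lists are invented for illustration), here is how recall, precision, and F1 would be computed for a small sample of NER output scored against a hand-labeled gold set:

```python
# Scoring predicted named entities against a hand-labeled gold set.
# The (text, type) tuples below are invented for illustration.
gold = {("Central Michigan University", "ORG"),
        ("Department of Education", "ORG"),
        ("Cotonou", "LOC")}

predicted = {("Central Michigan University", "ORG"),
             ("Cotonou", "LOC"),
             ("University Athletic Cuts", "ORG")}  # a false positive

true_positives = len(gold & predicted)
recall = true_positives / len(gold)          # found / should have been found
precision = true_positives / len(predicted)  # correct / everything returned

# F1 is the harmonic mean of precision and recall: 2PR / (P + R).
f1 = 2 * precision * recall / (precision + recall)
print(f"recall={recall:.2f} precision={precision:.2f} F1={f1:.2f}")
```

Here two of the three gold entities are found and one of the three predictions is wrong, so recall and precision are both 0.67, giving an F1 of 0.67; because F1 is a harmonic mean, a system can't score well by maximizing one measure at the expense of the other.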
Here are some examples of NER use cases: Which organizations were mentioned in a news article? Were specified products mentioned in complaints or reviews? Does a tweet contain the name of a person?

Let's consider two content sources that are very different. The Chronicle of Higher Education is a newspaper and website that presents news, information, and jobs for college and university faculty and student affairs professionals; a subscription is required to read some articles. The Oracle Press Room is a website where you can search and sort press releases from the American multinational computer technology corporation.

Let's paste the text of a Chronicle story, "U.S. Is Investigating Whether University Athletic Cuts Harm Black Students," into PoolParty's Entity Recognition demo. Click on the Extract button, and then click on the Named Entity tab in the extraction results. This displays six organizations. The first is an error, probably identified as an organization because it is in title case and includes the string "University"; in fact, it's a snippet from the title of the article. A full-featured NER would have a method to identify the content structure and interpret titles and headings differently from body text. The remaining organizations look like organizations, even Central Michigan, which in the context of this article is a short form for Central Michigan University.

To supercharge this NER, the next step would be to classify these organizations by type: which are government agencies, which are colleges and universities, which are philanthropies, and so on. Besides classifying organizations by type, other related information such as location, size, and so on would also be interesting to identify and would provide the opportunity to enrich the context of this content. But how do you do this sort of enrichment? The information can be retrieved from Wikidata and other Internet sources. This slide shows some of the information about Central Michigan University that can be extracted from Wikidata.

Let's paste the text of an Oracle press release, "University of Tennessee System Upgrades Finance and HR Tech with Oracle Fusion Cloud Applications," into expert.ai's document analysis demo. Click on the Analyze button, and then click on the entities results in the left rail. This displays the people, organizations, places, and values such as currency amounts, percentages, and measures mentioned in the text, with corresponding links to open data sources such as Wikidata, DBpedia, and GeoNames. Seven organizations are displayed; the first three are errors and the last four are correct, but the University of Tennessee System doesn't have a link to its Wikidata record. The companies identified seem to be more accurate, but this NER has difficulty recognizing products.

To supercharge this NER, the Oracle products could be more accurately recognized and disambiguated by looking them up on the Oracle website. Oracle divides their products into three product lines: Infrastructure, currently called Oracle Cloud Infrastructure; Technology, currently called Hardware and Software; and Applications, currently called Oracle Cloud Applications. Oracle also provides industry solutions, for example for government and education. The specific application that the University of Tennessee System purchased is Enterprise Performance Management.

So what about non-text media? It's true that named entity recognition needs text to work on, but even for non-text media, text can sometimes be generated, for example by voice-to-text transcription or by using captions or surrounding text, as shown in this example from the Washington Post.

There are lots of toolkits and applications that can be used to build named entity recognizer environments. This is a selected list of them. I showed brief demos of expert.ai and PoolParty NER components. These are both semantic platforms that provide a wide range of functionality to build automated classification applications and to build and manage categories and their relationships. NetOwl has been used for NER in the news business for a long time. For those who are developers, or are located within organizations that have application development resources, toolkits are the way to go. Apache has almost everything you need to build semantic applications. Developers wonder why you would ever buy an expensive application when you can build one that has all the functionality you want. I don't know what you like to do; I like to play with toolkits, but I don't want to be responsible for building applications for my clients.

To summarize: in this talk I've explained what NER is, given you some examples of typical use cases, and demonstrated how NER works with a couple of applications. I've also shown you simple ways to enrich named entities with links to related information that provides more context to base content such as news articles and press releases. Please let me know if you have any questions.