This short video captures the key takeaways from three of our expert panellists on the qualities of a good business case for KM/KO: Tom Reamy, Cor Beetsma and Barry Byrne.
This video contains some key insights about Governance for knowledge organization, from Neo Kim Hai and Ahren Lehnert, following the Expert Panel on Governance that they participated in.
The 2016 IKO conference featured a number of great presentations, several interesting fishbowl expert panels, and two afternoons of fascinating case study cafes showcasing 16 case studies. It also featured two workshops, one on search and the other on text analytics.
The two workshop areas are intimately connected. Search is one of the most common and powerful applications of text analytics and text analytics is the best way to make search smarter.
Even though social media and sentiment analysis have been getting more press over the last couple of years, using text analytics to improve search actually delivers more business value to organizations: it saves time for users across the entire organization, cleans up chaotic collections of unorganized documents (including multiple duplicate and near-duplicate documents), and enhances business decisions by delivering the right information at the right time.
The payoff for improving search is so huge that the biggest problem is believing the numbers – but multiple studies keep demonstrating that those numbers are indeed true. Saving $6,750,000 a year per 1,000 employees is huge, and it is probably somewhat understated as the amount of unstructured text continues to grow (see note below).
But this raises the question – how do search and text analytics work together? We spent considerable time on this in the workshop I conducted at the IKO 2016 conference, but the basic idea for those of you who could not attend the workshop can be summarized as follows:
What is the “correct” way to use text analytics to improve search? The answer is, of course, that there is no “one size fits all” solution. The most successful basic model is what I called the hybrid model. The hybrid model does not try to use text analytics to automatically tag documents (except in some cases) nor does it rely on an army of human taggers. The way it works is to semi-automate the job of adding metadata tags as seen in this summary:
This model combines the best of human and machine, providing the consistency and scalability of the machine and the depth and intelligence of the human. It also overcomes the issues of author-generated tags by presenting the author with a value they can react to rather than asking them to generate all that metadata. Reacting to suggested values is a cognitively much easier task than thinking up the best keywords, and it turns out that authors are much more likely to actually provide this review. And if a number of authors simply say “yes” to whatever the software suggests, then you at least have the benefits of “automatic” tagging – not as good as a true hybrid solution, but better than no metadata at all.
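The suggest-then-confirm loop at the heart of the hybrid model can be pictured with a minimal sketch. Everything here – the tag vocabulary, the keyword-matching rule, and the function names – is invented for illustration; real text analytics engines use far richer categorization rules.

```python
# Illustrative sketch of hybrid tagging: the machine suggests metadata
# values and the author merely confirms or corrects them.
# The vocabulary and keyword rule are invented for this example.

TAG_VOCABULARY = {
    "search": ["search", "findability", "query"],
    "text analytics": ["categorization", "entity extraction", "sentiment"],
    "governance": ["policy", "compliance", "records"],
}

def suggest_tags(text):
    """Machine step: propose every tag whose keywords appear in the text."""
    lowered = text.lower()
    return [tag for tag, keywords in TAG_VOCABULARY.items()
            if any(kw in lowered for kw in keywords)]

def confirm_tags(suggested, accepted_by_author):
    """Human step: the author reacts to suggestions instead of inventing tags."""
    return [tag for tag in suggested if tag in accepted_by_author]

doc = "Improving findability with entity extraction and categorization rules."
suggested = suggest_tags(doc)                                  # machine proposes
final = confirm_tags(suggested, {"search", "text analytics"})  # author reviews
```

Even an author who accepts every suggestion unreviewed still yields the "automatic" baseline the post describes; the value of the hybrid comes from the cheap confirm step.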
There are, of course, variations on this model, and there are situations where it does not apply as well – for example, large collections of legacy documents or external documents. In those cases, the solution would normally be weighted more heavily toward the “automatic” side, but even there, a partial hybrid solution is still best. The human input in these cases comes about in at least two ways. First, as the “automatic” solution runs, tagging hundreds of thousands or millions of documents, subject-matter experts (SMEs) and/or a team of librarians or information analysts can periodically review the text analytics-suggested tags for quality. How many documents to review, and how often, will vary by organization, document collection, and anticipated applications.
The second avenue for human input is the feedback that SMEs, authors, and librarians/info analysts generate as they publish or review the text analytics results. This feedback can then be incorporated into the text analytics auto-categorization and entity extraction rules and models to refine and improve them. A sample document where a categorization or extraction rule went wrong not only gives the text analyst clues as to what happened, but also serves as a test case for a new, refined rule.
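A toy illustration of this feedback loop – using a flagged sample document to test a refined rule – might look like the following. The rules and the sample document are made up; production categorization rules are far more sophisticated than keyword lists.

```python
# Sketch of using a misclassified sample document (flagged by a reviewer)
# as a test case for a refined categorization rule.
# The rules and sample are hypothetical.

def categorize(text, rules):
    """Return the first category whose rule (a list of required keywords) matches."""
    lowered = text.lower()
    for category, keywords in rules:
        if all(kw in lowered for kw in keywords):
            return category
    return "uncategorized"

# A sample document a reviewer flagged as wrongly tagged "finance"
sample = "The bank of the river flooded after heavy rain."

old_rules = [("finance", ["bank"])]
# Refined rule: require a second, disambiguating term before tagging
new_rules = [("finance", ["bank", "loan"])]

assert categorize(sample, old_rules) == "finance"        # reproduces the reported error
assert categorize(sample, new_rules) == "uncategorized"  # refined rule no longer misfires
```

Keeping a growing suite of such flagged samples lets the text analyst verify that each refinement fixes the reported error without silently breaking earlier ones.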
These refined, improved rules can then be used not only to enhance the hybrid CM-text analytics-search model of tagging with facet metadata values, but also to improve the quality of tagging in those large-volume, more automatic cases.
There are a number of text analytics and search software vendors that like to claim that their solution is fully automatic. Just plug it in, and out comes quality metadata. My experience has been that these claims are almost always grossly overstated – both in terms of the effort needed to get them to work and the accuracy of the “automatic” solutions.
It does take a significant amount of work to develop highly accurate categorization, sentiment, and extraction capabilities, but that work is shrinking as we learn to build on early efforts with templates, better knowledge organization schemas, and shared best practices. In addition, developing these capabilities creates a platform that can be used for applications beyond search – business and customer intelligence, voice of the customer and voice of the employee, fraud detection, knowledge management applications like expertise location and community collaboration, and dozens more that tap that most under-utilized resource, unstructured text.
But let’s postpone that discussion for another post.
NOTE on dollar savings per year per thousand employees:
This calculation, in USD, is based on a 30% improvement of search through the application of text analytics, as reported in the workshop I conducted for the IKO conference, and on the figures for the cost of bad search reported in an IDC study by Sue Feldman. A good summary of search studies, including the IDC study, can be found on the Search Technologies website: http://www.searchtechnologies.com/enterprise-search-surveys
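Working backwards from the figures in the post, the arithmetic implies a baseline cost of bad search of about $22,500 per employee per year (this per-employee figure is derived here from the post's own numbers, not quoted from the IDC study directly):

```python
# Reconstructing the savings figure: a 30% search improvement saving
# $6,750,000 per year per 1,000 employees implies a baseline cost of
# bad search of $22,500 per employee per year.
# (Integer arithmetic avoids floating-point rounding.)

improvement_pct = 30        # % improvement from text analytics (from the post)
employees = 1_000
cost_per_employee = 22_500  # USD/year, implied by the post's figures

savings = employees * cost_per_employee * improvement_pct // 100
assert savings == 6_750_000
```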
by Tom Reamy August 2016
Governance. Lots of it. Lack of it. What is it? Where do you need it? Do you need it? The session Fishbowl (Expert Panel) 2 - Governance for Knowledge Organization: Challenges and Opportunities at the 2016 IKO Conference in Singapore addressed these and other questions about governance. Dave Clarke, a participant on the panel and a conference organizer, captured the following questions for consideration on the first day of the conference:
What principles determine what needs to be controlled and what can be “free for all”?
If we are talking about the governance of content, then it all comes down to purpose.
What is the purpose of the information? Is it lessons learned, training, or records? What is the intent? Is the information meant to be discussed in an open forum, used by a team collaboratively, or used as research? Is the information retained for compliance and subject to records management policies?
Most organizations have many different types of content with a designated purpose. Good governance should cover all of these variables and have a spectrum of control. Discussions, blogs, and other social communication are governed as part of the company’s intellectual property, but the governance is low and has more room for variance across the organization as this content should be easily accessed, shared, and discussed among employees. Contracts, company financials, and other corporate operational documents should be highly governed with little access and no sharing as there is likely good reason to have one official copy in a designated location. Again, all of the organizational information is governed, but the level of control within the governance framework is dictated by the nature and purpose of the content.
Should governance be top-down or based on consultation with stakeholders?
See the last question. Within a greater governance framework, the nature of the content will dictate the type of governance policies which apply. Corporate-level, proprietary information should be subject to strict top-down governance and control. Compliance is a legal issue and is not open for negotiation. While stakeholders should be able to make their case for access, there is not a lot of leeway for alternate treatments of the information. An organization with good information governance will set the appropriate legal and financial policies so corporate information is protected, secured, and complies with all applicable local, regional, and national laws. There is no reason that information of this type should be subject to alternative methods of governance at department or regional levels other than to account for regional legal differences.
Information on the rest of the governance spectrum should have stakeholder input, because it is the information directly related to their work. All employees should have the appropriate level of access and should have a stake in good information governance policies, particularly in how and when information is accessed. Without their input, existing inefficient information processes are never brought to light, addressed, or improved to make the organization more efficient.
Organising? Does it always work? There are cases where no organisation is better.
Having no information organization is still subject to governance, because it is no organization by design. For example, if a company decides to leave knowledge in an unstructured location such as a file share, offering only navigational access or access through a search interface with no other intervention or attempt at organizing the information, it does so by deliberate design. Whatever the motivations – perceived lack of value, lack of resources, reliance on search functionality, etc. – the information is still part of a greater governance scheme dictating that some information is highly governed and other information has very little governance. The governance spectrum spans the various types of information a company may possess, and unorganized pockets of knowledge exist within this framework.
Of course, in a perfect world, all information is governed from its creation until its disposition, but we all know we don’t live in a perfect world. The amount of time, resources, and money it takes an organization to retroactively organize information usually has very little return on investment. Making the deliberate decision to only manage and govern content based on its value, creation date, or some other factor important to the company makes governance a more palatable proposition for subsets of the greater information framework. Sometimes organization is just not worth the effort.
What can we do step by step to make the new technology support our business and bring value?
The final question is aimed directly at technology governance, but technology is only one of the three pillars of governance, alongside people and processes. You can’t have any one of these pillars in the absence of the others. I would argue it is possible to create and adhere to governance processes even in the absence of technology, though in the modern world this is rarely the case. However, if you treat governance from a “neutral” perspective – in other words, from the point of view that governance must work regardless of the specific technology – then you can start the work of establishing the right processes and people to support good governance at any stage in a technology implementation. Establishing technology governance policies when the technology is new, before go-live, is preferred, but that is not a luxury most companies have. In addition to the essential change management needed to get users to adopt a new technology, governance put in place at the beginning brings two things to the new system: trust and operational longevity.
Trust may be the single most important factor in getting users to adopt a new technology. When the system has the right level of governance and processes supporting everything from change requests to support, from flexibility to strict control where needed, and from access to understanding how and why to use the system, users feel comfortable and trust the system.
Likewise, good technology governance brings policies that avoid garbage in, garbage out, adding to the operational longevity of the system. When a system quickly becomes cluttered with poor information and information retrieval becomes more difficult over time, the chances that the system will continue to be used and offer value to the organization diminish. A clear governance framework for the new system is essential to realizing the full value of the technology.
In sum, governance is important, and whether a company does a lot or a little depends only on purpose and the nature of the content. It’s not whether you have governance or not, it’s whether you have the appropriate level of governance fit to purpose.
by Ahren Lehnert, July 29 2016
A couple of articles from zdnet.com:
Big data's biggest problem: It's too hard to get the data in: "According to a study by data integration specialist Xplenty, a third of business intelligence professionals spend 50% to 90% of their time cleaning up raw data and preparing to input it into the company's data platforms. ... The data cleansing problem also means that some of the most widely sought after professionals in the tech field right now are spending a big chunk of their time doing the mind-numbing work of sorting through and organizing data sets before they ever get analysed."
The dirtiest little secret about big data: Jobs: "As a result, many companies have tried to put machine learning and artificial intelligence to use in doing some of the data sorting and data cleansing... Still, a lot of these companies quickly run up against the limitations of AI. It's excellent for very narrow, specific purposes. But, it's not very good at making judgment calls or deciding on something that falls into a gray area... Because of that, many companies have discovered that humans do the work of data sorting much better than algorithms, and so they are putting people to work behind the scenes to help their big data projects succeed."
By Maish Nichani
Here are the slides for our two IKO workshops. Many thanks to ISKO Singapore and Milkk Consulting for their generous support for the workshops, and thanks to our presenters, Agnes, Maish and Tom!
1. Agnes Molnar and Maish Nichani - Getting Started in Search
2. Tom Reamy - Getting Started in Text Analytics
By Patrick Lambe
Thanks to our wonderful speakers for their contributions to the IKO Conference last week! We'll have more updates on this blog as we process the materials and video recordings, but for the time being, here are the slides for the three keynotes and twelve of our sixteen case studies (four case studies do not have slides available).
Here is the overall IKO 2016 Conference Guide, which contains all the case study outlines. If you want to learn more about the case studies that do not have slides, please contact us to be put in touch with the case presenter.
Here are the slides for the three keynotes:
1. Bob Glushko on The Discipline of Organising
2. Tom Reamy on Deep Text: Using New Approaches in Text Analytics
3. Matt Moore on Building Organisational Capabilities in Knowledge Organisation
Here are the slides for 12 of the 16 case studies (numbering follows the Conference Guide):
1. Agnes Molnar on Scoping Enterprise Search
3. Barry Byrne on The Irish Defence Forces' IKON Programme
4. Bob Glushko on Organising Single Source Content for a User Configurable Textbook
5. Cor Beetsma on KM Portal Implementation at Yokogawa Electric
6. Maish Nichani on Getting to Enterprise Search Pilot in 3 Weeks
9. Chris Khoo on Applying Multi-Document Summarisation Tools
11. James Robertson on Innovative Intranets with Taxonomies
12. Dave Clarke on Using a Taxonomy Management System for Distributed Governance
13. Matt Moore on Delivering Information in Context with Panviva SupportPoint
14. Patrick Lambe on Developing Faceted Taxonomies from Knowledge Maps
15. Ahren Lehnert on Establishing Governance for Taxonomy and Metadata
16. Tom Reamy on Using Content Analytics on Telco Customer Call Enquiries
We will be uploading videos of the case pitches and keynotes, and the workshop slides shortly!
By Patrick Lambe
The 2015 Findwise Enterprise Search and Findability Survey makes for sobering reading. Over 50% of respondents stated that it is difficult or very difficult for users to find the information they are looking for. Nearly as many respondents stated that a major obstacle for users finding information is that they don't know where to look. Organisations have a findability challenge.
One approach to these problems is to blame it on the search engine, buy or build a new one, and start again from scratch. While this approach can feel like progress, it avoids two major underlying issues that may doom it to failure.
The first issue is that search depends on people looking for what they want in a sensible manner. Search teams deal with this by promoting search tools and then trying to ensure that metadata structures and search weighting match user behaviours. However, in certain process- and procedure-driven environments – e.g. contact centres and back-office processing – we can predict the information that people will need. Rather than relying on users finding things, we can proactively serve it to them.
The second issue concerns the underlying quality of the content. Most enterprise content repositories are like teenagers' bedrooms. They are not kept as tidy as they should be. Things go on inside them that shouldn't. And their users can be thoughtless and distracted. Implementing enterprise search is like turning on a light in such an environment: all your illusions of order are swept away by the blunt, ugly reality in front of you. However, turning on a light is not enough to clean up the room – or keep it clean. If your underlying content is poorly written and poorly managed, then enterprise search simply allows your people to find bad stuff quicker. Unless findability strategies are linked to tools and techniques that maintain the quality of content, they will fail.
How do you achieve these goals? Well, you'll just have to come to IKO 2016 to find out!
By Matt Moore
Here's a nice piece from Image and Data Manager about potential risks in smart pattern-sensing and suggesting applications like MS Office Delve. It may detect patterns you wouldn't want disclosed, all because the permissions regime has not kept pace with the machine capabilities. This usually happens because governance and information security work on outdated habits and assumptions. On a related note, check out this talk on "Search Among Secrets" by Prof Douglas Oard, for ISKO Singapore on May 25th!
By Patrick Lambe
At an ISKO Singapore event we reported on the knowledge organisation competencies survey that we started after last year's IKO conference. Afterwards we had a lively discussion led by a panel of experts. The full report materials, together with the panel discussion report, and links to an ISKO UK event on the same topic, can be found at the ISKO Singapore event page. Check out the ISKO event materials page for links to resources from all their events.
We are using this blog to keep you updated on conference planning and organisation, and to link you to informative discussion materials.