Knoware Blog/News

Best practices in Data Analysis Quality Assurance

In this video, aimed at an audience of information analysts and their management, Colin Harris presents his views on how best to apply QA, or quality assurance, to data analysis.

The benefits for information analysts who use best practice techniques are achieving more correct and therefore usable results, in a shorter time period, with less rework.

This means greater productivity for everyone involved in information analysis including the decision maker or “end user” of the information.

 

This video discusses QA, or quality assurance, best practice for information analysts. The benefits for analysts following best practice techniques are getting better and more correct results, and improved productivity.

But first of all, a little bit of background. I am Colin Harris, Technical Director of Knoware, and we are based in Wellington, New Zealand. Knoware is an analytics, business intelligence and information management consultancy. We have worked with many analyst teams over many years and have seen what works well and what does not.

After many requests to provide best practice advice for analysts, we developed two half-day workshop sessions – one covering non-technical aspects, and the other, technical aspects.

This video covers the QA segment from the non-technical workshop. To put the segment in context I will start with a workshop introduction first. When we initially put these sessions together we felt that a lot of it was common sense, and wondered whether people would gain any valuable information from it.

What we have found, after a few hundred people have been through these sessions, is that by far the majority report back that they have learnt a lot of useful information.

Sometimes they did already know this but were not actually putting it into practice, or they had forgotten about some of the key points. There are two clear objectives of these sessions:

(1) getting the correct information, so the right, accurate, consistent information, out of the work that is being done; and

(2) to do the work productively or efficiently, and get good results quickly.

Here are the various segments and sections that are covered in the non-technical session.

First of all we talk about some common issues and then move on to the typical project phases. We will have a quick look at that. Communications are vital to getting the best outcomes. Then there are some data and information management concepts, and then some general tips.

We are going to look at one part of the project phases, but touch on common issues first. For the common issues, we put that out to the floor and get lots of feedback and ideas about the various issues people have in their different organisations. It is amazing how similar it is across all of the different sessions: the same key issues come out. I really like this Dilbert one because it shows some of the key issues that do come through. Continuous changes of requirements, unclear communication – these come out very strongly once we have put this issue out to the floor.

This slide shows typical project phases for a piece of work. It depends of course on what the work is and what the organisation is. This one was based on research and evaluation analysts’ work, but it could be BI developers developing reports or user interfaces. It could be a range of different types of work that needs to be done.

They typically follow the same sort of approach. In this one, the first phase is that a request for work comes in, then it goes down to someone assessing that work. It gets allocated out to the resources that are going to do the work, whether that is an individual or a whole team. Then we go down to what the requirements of the work are.

Often that is called a brief in some areas. Then comes looking into the background: getting some good background information before you get into the actual details of the work itself. Data exploration is another important stage: explore, look for outliers, and get a good feel for the data before going on and doing the preparation work.

Then there is the data preparation itself, and then the actual work that is required, whether that is some analysis, generating a report or something else. Once that work has been done, particularly analytical type work, there is some interpretation, adding some commentary before the results get delivered.

Then it is on to QA, or the quality assurance phase, which is really what we are going to be talking about in more detail in this particular video, followed by delivering the results and consolidation at the end. It is important to point out the arrows that we have on the left hand side there: an iterative approach is what we thoroughly recommend.

You may get down to the analysis part and then find that something is not quite right and you need to go back up to a previous phase. That could be right back to adjusting requirements, or back to different data preparation. To get correct, good results, it is really important that you allow for that as part of the development approach you are taking.

These days Agile is pretty popular as a formal approach but it could be all sorts of other iterative approaches that you may use to do that. So now we move onto the QA or quality assurance segment of the workshop.

A first, very important point to make is that QA should be happening throughout the whole project, not just as a discrete step that happens near the end after the development or the analysis is done. That is an important point for us.

The second point I have there is most organisations or many organisations will have their own documentation or a methodology on how they do their QA. A lot still don’t; we are finding that is the case. The second key point down there, the bigger point is about self review. Of course as an analyst or a developer is working through doing the work they should be reviewing, testing, QA-ing their own work.

That way, whatever they deliver to others to be checked, they are confident that it is producing the right result. So is the result that you are getting as expected, and how do you know what the expected result should be? That depends on the work that you are doing. You may be able to compare against other systems, other reports that have been done, or other pieces of work, and at least know what the ballpark numbers are going to be for the results that you are putting out there.
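As a rough illustration of the ballpark comparison described above, here is a minimal Python sketch. The function name `within_ballpark`, the 10% tolerance and the figures are all assumptions made for the example; they do not come from the workshop.

```python
def within_ballpark(actual, expected, tolerance=0.10):
    """Return True when actual is within +/- tolerance (a fraction) of expected."""
    if expected == 0:
        # A relative difference is undefined against zero, so require an exact match.
        return actual == 0
    return abs(actual - expected) / abs(expected) <= tolerance


# Hypothetical figures: the total from the new analysis versus the total
# already published in an existing report covering the same period.
new_total = 10_450
reference_total = 10_000

print(within_ballpark(new_total, reference_total))
```

A check like this does not prove the result is right; it only flags a result that is far enough from a known reference to deserve a closer look before it goes to a reviewer.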

That last bullet point there says check via the source system. Often the people that we are dealing with, the analyst groups, are working in a data warehouse type environment: the information from source systems or operational systems is extracted into the warehouse environment, and you do your analysis or your reporting from that data warehouse environment.

With the results that you are getting out the other end, it is often good to go back to that source system, the frontline system, look into it and see what the real numbers coming through are, and check that they are coming out in your final results as well. The next major point I want to make is about peer review.

We certainly highly recommend that a peer review takes place. It happens in some organisations but in many organisations it doesn’t. Someone produces results and they get sent off to whoever has required them, whether it is a senior manager, the customer, the public or the media.

Of course that is really dangerous if things are not checked appropriately. We would highly recommend that it is done. In other places where they say, “yeah, we do peer review”, it really is lip service that is being paid: someone does a quick check and says, “hey yeah, those numbers look about right, yep put it out”.

So we would say it shouldn’t just be a cursory check. It should be someone checking thoroughly, and the person who is checking should know the subject matter and the underlying data and where it is coming from, so they can ask sensible questions and do appropriate checks.

Another benefit of always doing peer reviews is that it cross-fertilises information between different teams or between members within the team. If Bob is checking what Jill has done, Bob is going to see the techniques that Jill has used and maybe learn some other things from that as well.

The next point there is about comments. This is really for when people are using code-based software, for example something like SAS, where you are writing an actual piece of code in a programming language. When you are commenting that logic, we think a really good guideline is that if you take all the logic out and just leave the comments in, those comments should tell the story of the work that you are doing and what is being produced.

The person doing the peer review should not just look at the actual results coming out the end but at all of the things listed there. Look at the documentation that is being produced: can you follow it and understand how you should be running this logic, whether that is on a weekly basis or just on an ad hoc basis? Look at the logic itself, the code or the program or whatever the term is that is being used. Look at any logs that are produced after running the logic.

That is really important, to see that there are no stray messages coming out saying something should be looked at, whether that is an error or some note that should be reviewed to check it is producing the right results. And of course, look at the results themselves.
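The log check described above can be partly automated. The sketch below is a minimal Python example that scans a log for messages a reviewer should read. It assumes SAS-style `ERROR:`, `WARNING:` and `NOTE:` prefixes; the list of patterns and the sample log are illustrative inventions, not something prescribed in the workshop.

```python
import re

# Message prefixes and phrases worth a reviewer's attention. This list is an
# illustrative assumption; adjust it to the tool that wrote the log.
SUSPECT_PATTERNS = [
    re.compile(r"^ERROR"),
    re.compile(r"^WARNING"),
    re.compile(r"^NOTE:.*(uninitialized|repeats of BY values|Missing values were generated)"),
]


def find_suspect_lines(log_text):
    """Return (line_number, line) pairs for log lines matching any suspect pattern."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SUSPECT_PATTERNS):
            hits.append((number, line))
    return hits


# Invented sample log, used only to demonstrate the scan.
sample_log = """NOTE: There were 1200 observations read from the data set WORK.SALES.
NOTE: Missing values were generated as a result of performing an operation on missing values.
WARNING: Variable REGION already exists on file WORK.SUMMARY.
NOTE: The data set WORK.SUMMARY has 12 observations and 4 variables."""

for number, line in find_suspect_lines(sample_log):
    print(number, line)
```

A scan like this only narrows down what to read. The reviewer still has to decide whether each flagged message actually affects the results.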

The next point is about using approved business rules. This was something that was covered earlier in the workshop: how important business rules are, and that organisations should have them. If that is the case, then part of the peer review process is checking that the logic used in this piece of work applies the appropriate business rules. And of course, if business rules don’t exist for the work that is being done, that check can’t be done.

A good simple rule of thumb is the bottom point there: the good old ‘run over by a bus’ test. If the person that developed or produced this leaves the company, or something does happen to them, can someone come along and pick up from what is left there? The documentation, the code and the folder structures that are being used should all make sense, so that someone can pick up and do this work in the future as required.

Next slide then. Not only a peer reviewer but also the person who requested the work, whether that is them individually or someone else in their team, should of course be checking the results. This shouldn’t happen right at the end of the process, once it has all been done and the person doing the work says, “Here it is, have a look at the results”. They should be involved throughout as well. We talked about the iterative process earlier on. Once someone has developed an initial cut or first version, they should check for themselves that it is okay, then get the requester or the business area that requires the information to have a look, and say: these are our preliminary results. How does that look? Is it laid out appropriately? Are these the sort of numbers you are looking for? Then you know that when you get through to the final result you should be on track and delivering what needs to be delivered.

The next point there is about a formal sign off process. Often this is skipped over particularly for smaller pieces of work. We have seen people bitten on the backside by that happening a number of times. The recommendation there is to always have some sort of formal sign off process. If it is a small piece of work it doesn’t have to be huge in terms of the sign off process. It can just be a matter of saying to the person or the business unit that has got the information, “Is that what you are after, is that correct?” and if it is, to put an email through to say, “Yes I agree; I have got the results I want, thanks”.

Saying it verbally makes it a bit hard to protect yourself further down the track: that person leaves and you have no record that you produced the correct results. As the note says there, how formal you get in the sign off depends on the size or the importance of the work, so it ranges from a simple email right through to a much more detailed document where you have test results and sign offs for the results or the work recorded in that document.

The last point we have there for this little segment is ongoing validation. It is something that a lot of people don’t think about. You have created the first piece of work. You have checked it out. It delivers the results. It is peer reviewed, it is all great and that is really good. But if this is something that is done regularly, such as weekly, monthly or quarterly reporting, there should really be something put in place that validates it on an ongoing basis. So maybe once a quarter or once every six months someone should go and check that the work is still returning valid results.

The best way to do that is, as the original piece of work is done and you are testing that the results are correct, to build up as part of that testing a little report suite that does the validation or the data quality checking. It is not just for that initial testing: put it to one side as a little suite that can be run in six months’ time, one that cross-checks back against some other numbers or produces some numbers that you need to manually check.

Then you can continue to say: great, this is producing our correct results. It is also really useful for troubleshooting. If someone says, “hey, the numbers are wrong, don’t trust these numbers”, or something really weird does come out, say a number that is obviously ten times too big because data changes over time, that checking suite of logic or programs can be used very nicely to help troubleshoot and identify where an issue is actually coming from.
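To make the idea of a small, re-runnable validation suite concrete, here is a minimal Python sketch. The two checks (reconciling against a source system total, and flagging a sudden jump since the previous run), the thresholds, the dictionary keys and the figures are all hypothetical choices made for illustration. It also assumes the totals being compared are non-negative.

```python
def check_matches_source(report_total, source_total, tolerance=0.01):
    """Pass when the reported total reconciles with the source system total."""
    if source_total == 0:
        return report_total == 0
    return abs(report_total - source_total) / abs(source_total) <= tolerance


def check_no_sudden_jump(current, previous, max_ratio=3.0):
    """Pass unless the figure grew or shrank by more than max_ratio since the last run."""
    if previous == 0 or current == 0:
        return current == previous
    ratio = current / previous
    return 1 / max_ratio <= ratio <= max_ratio


def run_validation(figures):
    """Run every check and return the names of those that failed."""
    checks = {
        "matches_source": check_matches_source(
            figures["report_total"], figures["source_total"]
        ),
        "no_sudden_jump": check_no_sudden_jump(
            figures["report_total"], figures["previous_total"]
        ),
    }
    return [name for name, passed in checks.items() if not passed]


# Hypothetical figures for one reporting period, with the report roughly
# ten times larger than both the source system and the previous run.
figures = {"report_total": 52_000, "source_total": 5_200, "previous_total": 5_100}
print(run_validation(figures))
```

Keeping each check as a separate named function means the same suite can be rerun every quarter, and a failing check name points directly at where to start troubleshooting.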

If you are interested in us doing more of these sorts of videos from other sections of these best practice workshops we have put together, please contact us, let us know and we will certainly consider putting the other sections up as part of some videos for you to review.

Thanks for listening.

Auckland event – SAS Global Forum 2016 feedback session

Date: 19th May 2016

Location: Auckland

SAS Global Forum is the big annual conference where major SAS announcements are made. This year 5,000 people attended the Las Vegas conference. Many presentations were given – from high-level strategy to detailed programming techniques. The big announcement this year was “SAS Viya”, the new open, cloud-enabled architecture.

Knoware’s Technical Director Colin Harris gave a summary of the presentations. You can view Colin’s presentation here (in PDF format 1.7MB)

Milo Davies from SAS also presented an executive summary of the conference along with a brief overview of SAS Viya. You can view Milo’s presentation here (in PDF format 0.97 MB)

TOPICS:
• SAS Viya
• Product announcements or enhancements such as SAS Customer Intelligence 360, more powerful analytics, event stream processing and SAS Analytics for IoT
• Trends in analytics
• Hadoop: How is Hadoop maturing? Plus latest on SAS and Hadoop integration
• SAS Visual Analytics enhancements + tips and tricks
• ODS Excel: Better SAS and Excel integration available now
• Keynote: The power of the introvert!
• Use of infographics for storytelling
• Better SAS Platform Administration

The NZ Analytics Forum

Date: 6th April 2016

Location: Wellington

This year’s conference was held at the Michael Fowler Centre, 111 Wakefield Street, Wellington. It was a really great opportunity to talk to your peers, make new contacts and catch up on the latest news.

Here is the official SUNZ website

Knoware again supported the SAS community NZ-wide by participating as both a sponsor and a speaker, and with members of our team. If you have any questions about SUNZ then please either call SAS NZ or give us a call at Knoware.

SAS Global Forum 2016 feedback session

Date: 6th April 2016

Location: Wellington

The Analytics Forum was held in Wellington on Wednesday 6th April 2016. The theme was “Stories from the Trenches” and the Forum featured 3 key speakers who have first-hand experience in working with data and ensuring it is used correctly. The following speakers shared their insider view of analytics.

Billie Gruschow – Insights Analyst at Trademe

Kathryn Greenbrook – Analytics Consultant at Kiwibank

Rosie Read – Formerly Consultant Analyst at PA Consulting Group

SAS Visual Analytics Special Interest Group Q1 2016

Date: 10th March 2016

Location:

Knoware hosted another successful SAS Visual Analytics Special Interest Group (VA-SIG) at SAS’s Wellington premises on Thursday, 10 March 2016. The event was well attended and it was nice to see some familiar faces.

We had three presentations: Gavin Knight, NZ Police and Milo Davies, SAS gave verbal presentations; and Milan Horvath, Knoware had a visual presentation. We have uploaded Milan’s presentation and the invitation for the evening.

You can view Milan’s presentation in PDF format here (2.2 MB)

You can view the Special Interest Group invite here

If you are interested in attending our next VA-SIG session mid-2016 please register your interest by emailing suzanne.turner@knoware.co.nz

Finding Keys to Business

Written by Clare Somerville (Managing Director of Knoware) & Trish O’Kane (Associate at Knoware)

Let’s look at the top 5 directions and challenges in using business intelligence to drive your organisation to achieve its goals and objectives.

  1. What are the biggest challenges for CIOs looking to meet users’ future needs?
  2. Most organisations have a good set of standards and guidelines for the management of information (with a focus on structured information), but why doesn’t this work well enough?
  3. Why is unstructured data so important to be able to analyse and so difficult at the same time?
  4. What are the tools we should be using to look at unstructured data?
  5. Do I still need an information Strategy and what does it need to deliver to really help me?

1. What are the biggest challenges for CIOs looking to meet users’ future needs?

  • Between 80% and 90% of an organisation’s key information is, or will be, unstructured data. That means it is either non-text-based, such as scans, photos, images and voice recordings, or text-based, such as email, documents, XML / web forms, and even social media chat.
  • There is growing recognition of the importance of making use of unstructured data for decision making, and much of this data is generated using mobile tools. For example, how easily and often do you make use of the calls to your call centre in making your strategic decisions? YouTube videos? Consumer TV?
  • Current strategies and investment in EDRMS to store documents can be undermined by something as simple as people avoiding them, because saving and managing documents is incidental to carrying out a business process (“I’ll do that later”).
    Instead of providing support to users, the EDRMS places demands on them and makes their work harder.
  • Use of spatial data and reporting is growing at an incredible rate and is heaping layers of complexity and depth onto the data pile that is available to organisations, e.g. Google Maps.
  • New techniques, tools and thinking are needed to make use of this array of information.

2. Most organisations have a good set of standards and guidelines for the management of information (with a focus on structured information), but why doesn’t this work well enough?

  • The standards set out what should be happening, but that doesn’t ensure that it is happening.
  • The key weakness is often the lack of connection between an organisation’s business processes and the documents created during these processes.

3. Why is unstructured data so important to be able to analyse and so difficult at the same time?

  • Structured data is predictable: a flight reservation, for example, or a banking transaction. People create unstructured information, however, and little about it is predictable as a result.
  • Different techniques and tools are needed to analyse unstructured data.

4. What are the tools we should be using to look at unstructured data?

  • Text – Text analytics tools – These are linguistic, statistical, and machine learning techniques that model and structure the textual content for BI and create metadata that can be searched.
  • Voice recordings – Organisations can use systems that convert speech into text, which can then be interrogated using “analytics tools”.
  • Techniques for images include using workflow, metadata, and outsourcing.

5. Do I still need an information strategy and what does it need to deliver to really help me?

  • Short answer is YES, you do need an information strategy, but you are going to need help to make it relevant and to make it work for you. Don’t reinvent the wheel, it is being reinvented every day out there.
  • Your strategy will have standard components such as current state, desired future state, goals and outcomes, but it must cover both structured and unstructured information.
  • The strategy must identify the priority business activities that are needed to deliver outstanding wins, i.e. frequently occurring activities that carry out a key function or bring in profitable revenue.
  • The strategy must also identify information compliance obligations and the link to key business processes. Information audits can be used to gather this information.

SAS Visual Analytics Special Interest Group update

Date: 10th November 2015

Location: Wellington

Knoware hosted the second meeting of the SAS Visual Analytics (VA) special interest group.

Presentations included:

If you would like to come to the next session in early 2016, please register your interest by emailing suzanne.turner@knoware.co.nz

SAS Visual Analytics Special Interest Group Q3 2015

Date: 25th August 2015

Location:

This is the first meeting of the SAS Visual Analytics (VA) special interest group, hosted by Knoware. We had three presentations and lots of good networking. Moving on from the stress of the job and the resulting effects on hair, Gavin Knight, Chief Data Scientist for New Zealand Police, talked about the classification of their users and the definition of their roles. These may change or blur over time, but it’s a good step forward to identifying the roles that people play in information delivery. Another key point was the demo information system delivered to managers, closely tied to the strategic priorities of the organisation – a good move away from the all too common operational silo approach to information delivery.

Colin Harris, Technical Director of Knoware, gave some tips and techniques for using Visual Analytics. These were based on commonly expressed user questions and useful tricks of the trade. It was a practical paper, and attendees came away with some new ideas to try out and some suggestions for future topic areas for the next VA SIG. Milo Davies, Principal Solutions Consultant at SAS, completed the presentations with an overview of future advances and changes in VA. VA has come a long way quickly, and it is being further developed rapidly with a 6-month release cadence. Watch this space! The feedback was that this was an excellent event for all attendees. Do come to the next one!

You can view Milo’s presentation here (PowerPoint in PDF format 4.2 MB)

You can view Colin Harris’s presentation here (PowerPoint in PDF format 1.5MB)

Designing for a Data Future

Date: 12:00 pm – 1:30 pm 28th May 2015

Location: Westpac, Level 1, Optimation House, 1 Grey Street, Wellington

Selena Smeaton (Knoware’s intellectual lead for information governance) and Colin Harris (Knoware’s Technical Director) spoke at May’s lunchtime session for IITP (Institute of IT Professionals). The subject of their presentation is directly relevant to all of you working for corporates or government.

“The promise of data has never had so much appeal as it has right now. Industries, organisations and data practitioners continue to debate, argue, deliberate and plot the data path from “insights to value”. Data governance methodologies, frameworks, tools and best practice approaches abound, and analytics capability gets top billing at the Board table. Selena and Colin’s talk covers the ‘rules of the game’, what they mean for data governance, and gets right down to details on what you actually need to consider at a grassroots solution design level.”

For more information go to this link
