Okay, hi, my name is Katherine Frania and you are logged into Data Karma: How to Deposit Data That Stand the Test of Time. This is a very visual webinar, so if you're only listening you won't be able to see my funny slide. It's slide number two, so I'm trying to be funny right away. I might not succeed, especially if you're only listening, but if you do have questions please type them in. I think it'll make it a lot more interesting. So we're moving on. Again, my name is Katherine Frania. I've been processing
for ICPSR for eight years. Officially, I became a supervisor four years ago in the General Archive, which is the members' archive. I worked in other archives at ICPSR, such as the demography archive, DSDR, and the aging archive, NACDA, and I've seen a lot of different data deposits. Today I hope that you will learn: what does a good data deposit look like? What happens after it is deposited? I'm hoping it'll be a quick answer. And the last thing is what happens if you realize you need to resubmit materials. So what does karma have to do with it? Well, for those of you who aren't sure, karma is a spiritual principle of cause
and effect, where the intent and actions of an individual influence the future of that individual. So then you have the actions, which are the cause, and the future, which is the effect, and I've made a fun little chart here that shows how good intent and good deeds lead to future happiness and bad deeds and bad intent lead to future suffering. I'm relating that to data deposits because when we receive deposits that are in great shape, it makes the process much easier for us, and we also think that it makes the collections more valuable to our users. In the chart on your right you can see that incomplete data deposits or
documentation can lead to a longer amount of time for us to process the materials, confusion upon secondary analysis, and possibly fewer data downloads. So good data deposits equal, or are approximately equal to, good karma. So, what does a good data deposit look like? Good data deposits have data with complete, unique variable labels and data with complete value labels; the sampling information, the methodology, and the weight information have been provided; and there is supporting documentation to the data, such as a code book or the
collection instrument / questionnaire, and basically they go together. They're just like yin and yang. Data and documentation, that's what makes a good deposit. We also refer to some documentation as metadata, which is data about data. So for now, to keep it simple, I'll refer to any kind of metadata as documentation as well. So, as I said, deposits need data and documentation. They complete each other. We like to see file names that look like they go together, the data file and whatever documentation file you have, just like peanut butter and jelly. We need to be able to connect all the materials, and if everything's named
straightforwardly, or if you provide a legend and labeling information, basically anything that we would need to replicate your analysis and make your collection look good to others who want to replicate your analysis, that's the way to go. For those of you who remember Sesame Street, all of these things that you see on this slide are like the other. Just to drive it home, having data without documentation or documentation without data in a deposit would be like trying to put together IKEA furniture without the Allen wrench, or it would be like working with building blocks and
trying to make a certain item without instructions. The same idea. In the picture here I have some Lego instructions, and you can see they show you which blocks you have and what you're supposed to be making at each point. This image that I'm showing now is exactly the same thing without those directions, so we can see the blocks that we're supposed to have and the end product, but we have no idea how they got there. We see a lot of data deposits without documentation and a lot of documentation files without data, for whatever reason. When we see this, we contact the depositor and let them know we need more. They should not be alone. The data file shouldn't be alone, the
documentation shouldn't be alone, and if you were to give us one without the other it's just going to be a tough road ahead. So when I say documentation, what do I mean? The documentation should support the data, and there are some images here that show that. We shouldn't have to guess. We don't want to see that somebody mistakenly deposited one questionnaire for one data collection with the data for a completely different data collection. Documentation includes anything from variable-level information, such as variable labels, value labels, and coding information; sometimes it includes frequencies. It's basically a data legend. Documentation
also means including the sampling, design, methodology, and weight information. Sometimes this information is just provided in the deposit form, but we prefer to see it in both the form and a document file. The documentation can be called a report, it can be called a user guide, it could just be called a document with the same name as the data file, or it can be a code book. We need that sampling information too, so you might deposit data and you might deposit your labeling information, but we also need the sampling and design. So here in this slide you can see an image. This is actually an image of one of the code books that gets deposited to one of our regular series, and they
really spell it out for us, which we greatly appreciate. So the variables in the data need to be labeled: in the documentation we need the labels and in the data we need labels. We can put two and two together, so if you gave us data and then just gave us documentation with the labels, we can put it together. But if it's not complete, then we have to contact you and find out what any kind of missing label information means. I have a quote here: "Quality means doing it right when no one is looking." That's by Henry Ford. Anybody could be looking at your data after we
release it, so we often get user requests that point us to inconsistencies that they have found, and if we can't answer them, or if it's not noted somewhere in the documentation, I almost personally feel like we didn't do our job in curating the data, because who knows if we'll be able to track down the producer and find the answer to our users' questions. What happens after you deposit your data and documentation? We curate it, and what does that really mean? I wonder if anybody could guess what that really means. Maybe you could write in your ideas to Dory? I'm sure you've heard the word before. Maybe you've heard
different definitions. When I was preparing for this presentation, I googled "curate," and this is what came up. Curate... and it has nothing to do with data. So then I typed in "data curation" and a great definition came up. I actually kind of wonder if ICPSR had a huge hand in creating this definition, because if you look at our website we have a very similar definition. So here it says data curation is the management of data throughout its lifecycle, from creation and initial storage to the time when it is archived for posterity or becomes obsolete and is deleted. The main purpose of data curation
is to ensure that data is reliably retrievable for future research purposes or reuse. So that's the data curation definition on Google. On ICPSR's page, I've underlined the definition here on this slide. We liken it to curating art, and we basically say that the curation process means that the data are organized, described, cleaned, enhanced, and preserved for public use, and this is similar to how someone might curate a painting to make sure that it stands the test of time. We like to say we care about the data when we're talking about curating the data, as well. You might see a lot of people on the internet talking about curating data, and they posted it, they've made a
website, you know, they shared it. Maybe they haven't cared for it, maybe they haven't ensured all the labeling is clear, maybe they haven't told you about how they collected the data. Making copies, posting, sharing: these are only pieces. Curation includes everything from receiving the data, reviewing the data, cleaning it up, and making sure all the questions are answered, to putting it out there into the world. We know that producers feel that their data are their babies, and we understand. I have a quote here by F. Scott Fitzgerald: "To write it took three months, to conceive it took three minutes, to collect the data in it all my life."
So we know you put a lot of effort into your data, and we care about that data that you've given to us. We want to put, you know, a good amount of effort into curating the data that it took you forever to collect. We do not want to change your data. Curation does not mean change. We want to make copies. We want to back it up. We want to share those copies. We want to make multiple formats, and we want to watch those copies turn into new data. Basically, watch them grow. So what does a processor do? They curate data. So I've got a lot of different images here from different file types and different documentation and data companies,
package companies that is, and that really has to do with processing. A processor has to be kind of a jack-of-all-trades when it comes to the data. So the processors, they manage the data. The data come from many different producers and projects, some from a single producer or project. The processors also have other duties. I mentioned user requests a bit ago. It's our processors that often respond to user requests. Sometimes we're updating web content, which mainly ties to these study description pages that you see for your data deposits. Processors also serve on organization-wide committees to help improve any kind of policies
that are already out there for the research community about data curation. So processors, we receive the data in any kind of file type. A couple of slides ago you saw the images for SAS and SPSS and ASCII data. Our deposits can be any kind of data; sometimes they're Excel, sometimes they're Access. We convert whatever data we receive into multiple file types for usability and preservation. Why does it take so long? Well, as processors, we receive these data and the documentation, we have to look through them, we have to extract information from the data and documentation in order to create those data description pages, we have to write a
summary about what the collection contains, we have to add all of that information to the description page, and we have to add subject terms to improve the search results for our topic. We also have to review the data for disclosure risk, and that is a huge time component for any data deposit, because we want to make sure that people feel comfortable giving out their data by knowing their data is not going to identify them. Another reason it takes so long: as we go through the process of extracting this information from these data and documentation, sometimes we come across
incomplete information, and then we have to reach out to the producers. Basically this all ties to: it takes a long time because it involves people. People provide us this information, people process the information, and it really takes a person to read through what's given to us and make those connections. We would like to think it could be automated, and for now it has not yet been and it cannot be. So I mentioned multiple file types, so it could be not only incomplete information within the files, but it could be: did they supply all the files that are necessary, and then how do we change those files to the desired, what we like to call, ICPSR full product suite.
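As a rough illustration of the kind of completeness review described in this talk (every variable has a unique label, every coded value has a value label), here is a minimal Python sketch. It is a hypothetical example, not ICPSR's actual tooling: the function name `check_deposit`, the dictionary layout, and the sample variable names are all assumptions made up for illustration.

```python
from collections import Counter


def check_deposit(variables, data_rows):
    """Return a list of human-readable problems found in a deposit.

    variables: dict of variable name -> {"label": str, "values": dict or None}
               (hypothetical layout; "values" maps codes to value labels,
               or is None for variables that are not coded)
    data_rows: list of dicts mapping variable name -> observed value
    """
    problems = []
    # Count labels case-insensitively so duplicates can be flagged.
    label_counts = Counter(
        meta.get("label", "").strip().lower() for meta in variables.values()
    )
    for name, meta in variables.items():
        label = meta.get("label", "").strip()
        if not label:
            problems.append(f"{name}: missing variable label")
        elif label_counts[label.lower()] > 1:
            problems.append(f"{name}: variable label is not unique")

        value_labels = meta.get("values")
        if value_labels is None:
            continue  # not a coded variable, nothing to compare
        # Every code that appears in the data should have a value label.
        observed = {row[name] for row in data_rows if name in row}
        for code in sorted(observed - set(value_labels)):
            problems.append(f"{name}: value {code} has no value label")
    return problems


# Hypothetical deposit: one coded variable and one unlabeled variable.
variables = {
    "MHEALTH": {"label": "Mental health", "values": {1: "Never", 2: "Sometimes"}},
    "AGE": {"label": "", "values": None},
}
data_rows = [{"MHEALTH": 1, "AGE": 34}, {"MHEALTH": 3, "AGE": 51}]

for problem in check_deposit(variables, data_rows):
    print(problem)
```

A check like this only flags gaps. As noted above, a person still has to read the documentation and contact the producer to find out what a missing label actually means.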
Just another example of some different ways that we might receive the documentation and the data, and how that translates to making sure that we create the usable files, ready-to-go files that is, and the preservation formats. We try to make sure that everything that seems unclear is at least made clear or noted somewhere in the documentation so that users know what to expect. Another reason it takes so long is that the stats packages are not equal to each other. They all have their quirks. What is okay in one is not okay in another. Any of you who have dealt with
Stata, you would know that if you provide a variable that has a range of one through five, and then 1.5 means something and you labeled 1.5, Stata says that's crazy, you never do that. 1.5 could be "somewhat likely," and SAS and SPSS will be totally fine with that; Stata is not. Another example of how the packages are not equal to each other is that, and not to pick on Stata, I love Stata, but Stata is not compatible with SPSS for conversion after Stata version 9. So if we receive a newer version of Stata, we have to convert it back to Stata version 9 in order to create an SPSS data file from it. Just one more example so that I'm not picking on Stata: SAS. SAS does not allow people to
use certain things in their variable names, such as, I want to say it's underscores, but I actually think it's periods. How I'm tripping up right now is a perfect example of how hard it can be to keep track of what's okay in one and not okay in another. Another reason that it takes so long is because we add question text. In the picture that you see here I have the same variable, and one has question text and the other one doesn't. So you can imagine that if you're looking through our variable search and you come across this mental health variable and then you look at the values,
if it does not have the question text, you have no clue that this variable relates to how worried or anxious the respondent might be, or whether they have felt tired or worn out or exhausted. You know, the label "mental health" could mean anything. Question text is a huge time suck, but we think it's a very valuable time suck. Another reason it takes so long, and I think that these are the short answers to the question of why it takes so long, because this is my last point about it, is that processors combine documents so that they're easy to click through. Everything that you're seeing in the image here, we
produce: that cover, we create those bookmarks, we write up those processing notes, and we embed fonts so that the PDF, you know, adheres to preservation standards. But you can't really see embedded fonts; you have to look at the properties to see that they are there. So we want to ensure secondary users do not have questions that they need to go back to the producers for, since the data will outlive the producer, and that's the curation standard. If you give us data deposits that have good data and good documentation, that's good karma, and you're adhering to preservation standards. We can't do this without you. Even if you, you know, make a mistake and you give us
the wrong materials, we will reach out to you, and you know that all ties to curation and the time that it takes. What happens if I realize I need to resubmit materials? If you realize you need to resubmit materials, please contact us. You can provide the new or updated materials as soon as you can and include the information needed to connect your deposits. Depending on what changes you made, it might not impact the time that much, the remaining time that it takes us to distribute the information. We'll work with you. If you've been in contact with anybody since you deposited your materials, such as some of my acquisition team or the archive
manager or the processor, you can email us along with depositing the new materials. If we know what to expect with that update, we can plan our time accordingly so that we don't continue to work on efforts that would later be in vain. Sometimes when people ask us questions about how long it might take to deposit their data, or what if they need to resubmit materials, we'll say it depends. I've created a few charts that show the cause and effect of changes you might make to your deposit and what that would in turn mean to us with our current system and how it works. The first example is that you may have added or removed variables and/or added or edited cases. This means that we need to
recompile our PDF variable description and frequencies; we often call it a codebook. More importantly, we need to reconvert all the data to those different file formats: the SPSS, SAS, Stata, R, and the ASCII text with the setup files. If you were to edit the title, the title affects almost every piece of documentation that we release, so you can see we need to recompile all the documentation, we need to reconvert the program files, or we call them setup files, and we need to update that study description page. So there's a couple more charts on the next slide too. This seems like it would be
overwhelming, but I can assure you, we got this. We do it all the time. I wanted to illustrate the effect so that you understand what it means to our workflow, but I think that knowing what factors contribute to a good data deposit, what we do with the data, and what to do if something in your files changes contributes to providing data that stand the test of time. It contributes to data that is sure to be preserved at its best and usable without confusion. So to sum it up, what does a good data deposit look like? You remember the IKEA
picture, all those pieces without that Allen wrench. Good data deposit? Data and documentation. What happens after it is deposited? Curation. What happens if you need to resubmit materials? You better not. Just kidding, just let us know; we'll figure out the rest. So my last image is a nice little picture that I found when I went to Creative Commons and looked up karma. We are all on an exciting journey of learning, best experienced when we stick together and share. So please share your data with us and please give us complete information about the data. I am ready for any questions. Okay, we'll give people a few moments to ask me questions.
People say they like the presentation. Thank you. Oh man, for those who watched and, you know, didn't just listen. We'll give the questions a few moments to queue up. I just wanted to remind people of a few webinars that we have tomorrow. Let's see, first up we have "Open Data Is Not Enough: Research Data Curation for the Reuse." Then at one o'clock, "Assisting Researchers Demonstrate Impact Using Data-Related Publications: How ICPSR Does It and How You Can Help." Then at 2 o'clock, "Collaborating for Open Data Access and Data Reuse: How Do We Do It."
Last, a DDI primer, an overview and examples of DDI in action. Okay, so let me go to a question here. "Do you try to meet with folks while they are applying for grants to work on a data management program for their data? Do you help them set up their variables, file structure, etc.?" That's two questions. Just kidding. Do we try to meet with people to help them with their data management plans? Yes, we are working on doing that. We have our acquisitions team, and they reach out to data producers and try to get there early in the game with their data and help them and let them know what we need so that they can best share the
materials. And then the second question was, do we help them set up their variables, their file structure? Oh, that's a good one. I would like to think so, I would like to think yes. I'm not clear on what kind of system we have for that. I would say that's more of a question for the acquisition team, and I can follow up with them and get back to you. I think that's something that we in general want to work towards, because I know that we have the variables database, and the goal in that is to tie together variables that are similar
across studies. If those variables were named similarly and if they used the same coding structure, that would be a huge jump in the success towards that. So I would like to say yes, but I'm not sure on the details at this time. Processors don't at this time, I can tell you that, though, and they're usually the ones who are looking at your deposits after we receive them and working on them. Okay, okay, thank you for those questions. Yeah, it looks like that will be our questions. I just want to remind people that this
presentation will be on YouTube along with the presentation slides. Okay, well, thank you so much for joining me today and for listening, and hopefully you learned something from it. Take care.